Exporting the results of a tMap component inner join: Rejected data to a new table

Steps: https://help.talend.com/en-US/joining-two-data-sources-tmap-talend-studio/8.0/displaying-results-tmap-joining?id=10

Talend: Configuring filters in a tMap component

Source:

https://help.talend.com/en-US/joining-two-data-sources-tmap-talend-studio/7.3/tmap-filters-year

Installing Talend in MAC

Talend Open Studio for Data Integration.

If you have intel based Mac, then the OSX installer will work. 

Else, you may install a Virtual Machine such as Parallels, then have Windows in it (Parallels may come with Windows), then have Talend under Windows

Another option can be use Linux Versions. 

Download and install 

Download from: https://www.qlik.com/us/trial/talend-data-fabric

(Talend download from https://www.talend.com/products/talend-open-studio/ leads to the above URL) Instructions are also on this site.

You may see which one works for you. You will need Java to install Linux Version on Mac. Make sure, you have the correct version of Java in your system.

Python: MatPlotLib: Plot Data

import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D


columns = ["J_ID", "P_ID", "D_ID", "Year", "JOB_TITLE", "Department_Name", "Total Salary", "Commission"]
df = pd.read_csv("datamart.csv", usecols=columns)


# Create a 3D plot
fig = plt.figure(figsize=(20, 15))
ax = fig.add_subplot(111, projection='3d')

# Scatter plot
ax.scatter(df['Year'], df['J_ID'], df['D_ID'], c=df['Total Salary'], cmap='viridis', marker='X')

# Labels
ax.set_xlabel('Year')
ax.set_ylabel('Job Title (Categorized)')
ax.set_zlabel('Departments')

# Title
ax.set_title('3D Plot of Total Salary by Year and Job Title and Department')


for i in range(df.shape[0]):
    if i%7 == 0:
        ax.text(df['Year'][i], df['J_ID'][1], df['D_ID'][i], '(' + str(df['Year'][i]) + ',' + str(df["JOB_TITLE"][i]) + ',' + str(df["Department_Name"][i]) + ',' + str(df['Total Salary'][i]) + ")")

# Show plot
plt.show()


Sample Data:

Dynamic SQL

“Dynamic SQL is the SQL statement that is constructed and executed at runtime based on input parameters passed”

https://www.sqlshack.com/dynamic-sql-in-sql-server/

Example: ChatGPT

DECLARE @sql AS NVARCHAR(MAX)

SET @sql = ‘SELECT * FROM Employees WHERE Department = ”’ + @department + ””

EXEC sp_executesql @sql

Another Example: Oracle

table_name := ‘HR.Employees’;

dyn_sql := ‘Select * from ‘ || table_name || ‘ FETCH FIRST 10 ROWS ONLY ‘;

EXECUTE IMMEDIATE dyn_sql;

DECLARE
    table_name VARCHAR2(100):= '';
    dyn_sql VARCHAR2(100):= '';
BEGIN     
    table_name := 'HR.Employees';
    dyn_sql := 'Select * from ' || table_name || ' FETCH FIRST 10 ROWS ONLY ';
    EXECUTE IMMEDIATE dyn_sql;
END;

A simple dynamic SQL in a stored Procedure (Oracle)

CREATE OR REPLACE PROCEDURE Retrieve_10_ROWS
IS
    dyn_sql VARCHAR2(200);
    table_name VARCHAR2(50);
BEGIN 
 table_name := 'HR.Employees';
 dyn_sql := 'Select * from ' || table_name || ' FETCH FIRST 10 ROWS ONLY ';
 
 EXECUTE IMMEDIATE dyn_sql;
END;

EXEC Retrieve_10_ROWS;

Dynamic/Reference Cursor and Dynamic SQL

CREATE OR REPLACE PROCEDURE Retrieve_10_ROWS_dyn_cur(p_table_to_query VARCHAR2) IS
    dyn_sql VARCHAR2(200);
    JOB_ID VARCHAR2(50);
    JOB_TITLE VARCHAR2(100);    
    TYPE cur_typ IS REF CURSOR;
    c cur_typ;
BEGIN
dyn_sql := 'Select JOB_ID, JOB_TITLE from ' || p_table_to_query || ' FETCH FIRST 10 ROWS ONLY ';
 OPEN c FOR dyn_sql; 
 --USING table_name;
    LOOP
        FETCH c INTO JOB_ID, JOB_TITLE;        
        EXIT WHEN c%NOTFOUND;
        -- process row here
        DBMS_OUTPUT.PUT_LINE(JOB_ID || ' ' || JOB_TITLE);
    END LOOP;
    CLOSE c;
 EXECUTE IMMEDIATE dyn_sql;
END;

What is a Dynamic Cursor in Oracle PL/SQL

Dynamic/Ref Cursor

CREATE OR REPLACE PROCEDURE query_invoice(
month VARCHAR2,
year VARCHAR2) IS
TYPE cur_typ IS REF CURSOR;
c cur_typ;

query_str VARCHAR2(200);
inv_num NUMBER;
inv_cust VARCHAR2(20);
inv_amt NUMBER;
BEGIN
query_str := 'SELECT num, cust, amt FROM inv_' || month ||'_'|| year
|| ' WHERE invnum = :id';

OPEN c FOR query_str USING inv_num;
LOOP
FETCH c INTO inv_num, inv_cust, inv_amt;
EXIT WHEN c%NOTFOUND;
-- process row here
END LOOP;
CLOSE c;
END;
/

Ref: https://docs.oracle.com/cd/B13789_01/appdev.101/b10795/adfns_dy.htm

cursor FOR loop

Cursor For Loop for an Explicit Cursor

Case Statement in Oracle PL/SQL

Write a SQL block that will categorize Employee Salaries. If the Salary is higher than average, it will show ‘Above Average’ to the output. If the Salary is lower than the average it will show ‘Below average’ to the output. Use HR.Employees Table. Also, utilize CASE statement.

SELECT employee_id,
       CASE 
           WHEN salary > (SELECT AVG(salary) FROM hr.employees) 
                THEN 'Above Average'
           WHEN salary < (SELECT AVG(salary) FROM hr.employees) 
                THEN 'Below Average'
           ELSE 'Average'
       END AS salary_comparison
FROM hr.employees
order by salary_comparison;

-- Ref: with a bit of information from the Internet

Variable Declaration Conventions in Oracle PL/SQL

Ref: https://trivadis.github.io/plsql-and-sql-coding-guidelines/v4.1/2-naming-conventions/naming-conventions/

3D Scatter Plot in Python

Visualizing 3-D numeric data with Scatter Plots

length, breadth and depth

Ref: https://towardsdatascience.com/the-art-of-effective-visualization-of-multi-dimensional-data-6c7202990c57

import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import matplotlib as mpl
import numpy as np
import seaborn as sns
%matplotlib inline

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection=’3d’)

xs = wines[‘residual sugar’]
ys = wines[‘fixed acidity’]
zs = wines[‘alcohol’]
ax.scatter(xs, ys, zs, s=50, alpha=0.6, edgecolors=’w’)

ax.set_xlabel(‘Residual Sugar’)
ax.set_ylabel(‘Fixed Acidity’)
ax.set_zlabel(‘Alcohol’)

Other Visualizations of Wine Data

Univariate Analysis

tight_layout, 15 bins

Correlation Maps: Multivariate Analysis

Seaborn – Heatmaps

plot parallel coordinates

BOX Plot

Violin plot: sns.violinplot(x=”quality”, y=”sulphates”, data=wines, ax=ax) : Seaborn

3d Pairwise Scatter Plot

Images are taken from my own executions.

How to Disable SELinux

Ref: https://www.ibm.com/docs/en/tnpm/1.4.5?topic=tasks-disable-selinux-linux-only

Sample File: