Oracle Functions


Example:

CREATE FUNCTION get_bal(acc_no IN NUMBER)
RETURN NUMBER
IS acc_bal NUMBER(11,2);
BEGIN
SELECT order_total
INTO acc_bal
FROM orders
WHERE customer_id = acc_no;
RETURN(acc_bal);
END;
/

Ref: https://docs.oracle.com/en/database/oracle/oracle-database/12.2/lnpls/CREATE-FUNCTION-statement.html
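To call the function from client code, here is a minimal sketch using the python-oracledb driver; the connection details and the customer id are placeholders of my own, not from the docs:

import oracledb

# placeholder credentials/DSN; adjust for your database
conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/orclpdb1")
cur = conn.cursor()

# callfunc(name, return_type, parameters) invokes the stored function
balance = cur.callfunc("get_bal", oracledb.NUMBER, [101])  # 101: a hypothetical customer_id
print("Balance:", balance)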

Oracle Stored Procedure: Create a simple stored procedure

Ref: https://docs.oracle.com/database/121/LNPLS/create_procedure.htm#LNPLS01373
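Since no code is shown here, a minimal sketch (my own illustration, not from the Oracle docs) that creates a trivial procedure and calls it via python-oracledb:

import oracledb

conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/orclpdb1")  # placeholders
cur = conn.cursor()

# DDL: a trivial procedure that doubles its IN OUT argument
cur.execute("""
CREATE OR REPLACE PROCEDURE double_it (n IN OUT NUMBER) IS
BEGIN
    n := n * 2;
END;
""")

# callproc passes parameters by position; an IN OUT bind variable carries the result back
n_var = cur.var(oracledb.NUMBER)
n_var.setvalue(0, 21)
cur.callproc("double_it", [n_var])
print(n_var.getvalue())  # 42.0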

Spearman Correlation Coefficient and Graph Mining

#!/usr/bin/env python

# coding: utf-8

# 3rd Model: DeepGraphCNN: Stock Price Prediction using DeepGraphCNN Neural Networks. It includes GCN layers and CNN layers. I have added an MLP as the last layer to predict stock prices.

#

# Input graphs were created from Pearson, Spearman, and Kendall's Tau correlation coefficients computed on historical stock prices. Also, another graph is created based on financial news articles.

#

# For the sake of making execution easier (and all at once), I have kept the multiple approaches (Pearson, Spearman, Kendall's Tau, and news-based) in the same file. One big code file can be difficult to handle; this is done just to make execution easier.

#

# Because I initially tried the approaches separately and then brought the code together, some code might be a bit redundant/repetitive. I have done some cleanup, though some repetition may remain.

#

# A use case of DeepGraphCNN for graph classification:

# https://stellargraph.readthedocs.io/en/latest/demos/graph-classification/dgcnn-graph-classification.html

#

# Import Libraries

In[1]:

# import libraries

import os
import pandas as pd
import math

In[2]:

# Import Libraries for Graph, GNN, and GCN

import stellargraph as sg
from stellargraph import StellarGraph
from stellargraph.layer import DeepGraphCNN
from stellargraph.mapper import FullBatchNodeGenerator
from stellargraph.mapper import PaddedGraphGenerator
from stellargraph.layer import GCN

In[3]:

# Machine Learning related library imports

from tensorflow.keras import layers, optimizers, losses, metrics, Model
from sklearn import preprocessing, model_selection
from IPython.display import display, HTML
import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')
from tensorflow.keras.layers import Dense, Conv1D, MaxPool1D, Dropout, Flatten
from tensorflow import keras

In[4]:

# If we want to drop NaN columns or rows for the stock price data.
# I did not need to use these options much.

drop_cols_with_na = 1
drop_rows_with_na = 1

# Dataset: Using 30 companies from the Fortune 500 companies (the paper used these stocks)

In[5]:

data_file = "per-day-fortune-30-company-stock-price-data.csv"
df_s = pd.read_csv("./data/" + data_file, low_memory=False)
df_s.head()

In[6]:

You can see that the ANTM stock price data is empty.

# Clean the data: replace missing/null values, use correct data types, sort by date (not strictly required)

In[7]:

# convert the Date field to a datetime type
df_s["Date"] = df_s["Date"].astype('datetime64[ns]')

# sort by date; no longer needed, as the data was already sorted when I generated it
df_s = df_s.sort_values(by=['Ticker', 'Date'], ascending=True)

df_s = df_s.sort_values(by='Date', ascending=True)
df_s.head()

In[8]:

# Ref: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html

df_s_transpose = df_s

try:
    df_s_transpose = df_s_transpose.interpolate(inplace=False)
except Exception:
    print("An exception occurred. Operation ignored")

# check if any value is null
df_s_transpose.isnull().values.any()

# show rows that still have a null in any column (axis=1)
df_s_transpose[df_s_transpose.isna().any(axis=1)]
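As a quick standalone illustration (toy data, not the stock dataset) of what interpolate() does:

# toy example: linear interpolation fills a missing value from its neighbors
s = pd.Series([1.0, None, 3.0])
print(s.interpolate().tolist())  # [1.0, 2.0, 3.0]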

In[9]:

df_s_transpose

In[10]:

df_s_transpose = df_s

if drop_cols_with_na == 1:
    df_s_transpose = df_s_transpose.dropna(axis=1)

print(df_s_transpose.shape)
df_s_transpose.head()

In[11]:

# further check and verify

df_s_transpose.isnull().values.any()
df_s_transpose[df_s_transpose.isna().any( axis = 1 )]

In[12]:

# make the Date column the index column of the dataset
df_s_transpose.index = df_s_transpose['Date']
df_s_transpose.index = df_s_transpose.index.astype('datetime64[ns]')

# Spearman Correlation Coefficient

In[13]:

df_s_transpose_spearman = df_s_transpose.corr(method='spearman', numeric_only=True)
df_s_transpose_spearman
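For reference, Spearman's rho is the Pearson correlation computed on the ranks of the data; with no tied ranks it reduces to

\rho = 1 - \frac{6 \sum_i d_i^2}{n (n^2 - 1)}

where d_i is the difference between the two ranks of observation i and n is the number of observations.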

# Spearman Correlation Coefficient based Adjacency Graph Matrix

In[14]:

df_s_transpose_spearman[df_s_transpose_spearman >= 0.4] = 1
df_s_transpose_spearman[df_s_transpose_spearman < 0.4] = 0
df_s_transpose_spearman
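As a toy illustration (standalone example, not the stock data) of how this thresholding turns a correlation matrix into a 0/1 adjacency matrix:

# toy example: threshold a small correlation matrix at 0.4
toy = pd.DataFrame([[1.0, 0.9, 0.1],
                    [0.9, 1.0, 0.5],
                    [0.1, 0.5, 1.0]])
toy[toy >= 0.4] = 1
toy[toy < 0.4] = 0
# toy is now binary; the diagonal is still 1 until it is zeroed out (next step)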

In[15]:

# set the diagonal elements to zero: no self loops/edges

import numpy as np
np.fill_diagonal(df_s_transpose_spearman.values, 0)
df_s_transpose_spearman

# Create and visualize the Graphs

In[17]:

import networkx as nx
Graph_spearman = nx.Graph(df_s_transpose_spearman)

In[18]:

nx.draw_networkx(Graph_spearman, pos=nx.circular_layout(Graph_spearman), node_color='r', edge_color='b')

# Experiment: we will divide the data into train, test, and validation graphs

In[19]:

df_s_transpose.corr(method='spearman', numeric_only=True)
#df_s_transpose[[{1,2,3}]]
#df_s_transpose.iloc[:, 0:10]

In[20]:

# Train Graph

In[21]:

df_s_spearman_train = df_s_transpose.iloc[:, 0:15]
df_s_transpose_spearman_train = df_s_spearman_train.corr(method='spearman', numeric_only=True)
np.fill_diagonal(df_s_transpose_spearman_train.values, 0)

df_s_transpose_spearman_train[df_s_transpose_spearman_train >= 0.4] = 1
df_s_transpose_spearman_train[df_s_transpose_spearman_train < 0.4] = 0
df_s_transpose_spearman_train

# Test Graph

In[22]:

df_s_spearman_test = df_s_transpose.iloc[:, 15:]  # df_s_transpose.iloc[:, 15:23]
df_s_transpose_spearman_test = df_s_spearman_test.corr(method='spearman', numeric_only=True)
np.fill_diagonal(df_s_transpose_spearman_test.values, 0)

df_s_transpose_spearman_test[df_s_transpose_spearman_test >= 0.4] = 1
df_s_transpose_spearman_test[df_s_transpose_spearman_test < 0.4] = 0
df_s_transpose_spearman_test

# Validation Graph

In[23]:

df_s_spearman_validation = df_s_transpose.iloc[:, 15:]  # df_s_transpose.iloc[:, 23:]
df_s_transpose_spearman_validation = df_s_spearman_validation.corr(method='spearman', numeric_only=True)
np.fill_diagonal(df_s_transpose_spearman_validation.values, 0)

df_s_transpose_spearman_validation[df_s_transpose_spearman_validation >= 0.4] = 1
df_s_transpose_spearman_validation[df_s_transpose_spearman_validation < 0.4] = 0
df_s_transpose_spearman_validation

In[24]:

graph_spearman_train = nx.Graph(df_s_transpose_spearman_train)
graph_spearman_test = nx.Graph(df_s_transpose_spearman_test)
graph_spearman_validation = nx.Graph(df_s_transpose_spearman_validation)

nx.draw_networkx(graph_spearman_train, pos=nx.circular_layout(graph_spearman_train), node_color='r', edge_color='b')

In[25]:

df_s_spearman_train.corr(numeric_only = True)

In[26]:

nx.draw_networkx(graph_spearman_test, pos=nx.circular_layout(graph_spearman_test), node_color='r', edge_color='b')

In[27]:

nx.draw_networkx(graph_spearman_validation, pos=nx.circular_layout(graph_spearman_validation), node_color='r', edge_color='b')

# Create GCN layer (Spearman)

# Find all stocks = nodes

In[28]:

# improvement: make sure only stocks/nodes that are in the graph are taken

all_stock_nodes = df_s_transpose_spearman.index.to_list()
all_stock_nodes[:5]

# Find all edges between nodes

#

This may need adjustment to reflect the train, test, and validation graphs.

In[29]:

source = []
target = []
edge_feature = []

for aStock in all_stock_nodes:
    for anotherStock in all_stock_nodes:
        if df_s_transpose_spearman[aStock][anotherStock] > 0:
            # print(df_s_transpose_spearman[aStock][anotherStock])
            source.append(aStock)
            target.append(anotherStock)
            edge_feature.append(1)

# the edge feature is not required except for the news-based graph
source, target, edge_feature
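As an aside, the same edge list can be built without the double loop; a minimal sketch of a vectorized variant (my own, assuming the same adjacency matrix as above):

# vectorized alternative to the double loop above
stacked = df_s_transpose_spearman.stack()   # (source, target) -> weight
stacked = stacked[stacked > 0]              # keep only existing edges
edges_vec = stacked.index.to_frame(index=False)
edges_vec.columns = ["source", "target"]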

# Find all edges in Train, Test, and Validation Graphs

In[30]:

trainSource = []
trainTarget = []
trainEdge_feature = []
trainNodeList = df_s_transpose_spearman_train.index.to_list()

testSource = []
testTarget = []
testEdge_feature = []
testNodeList = df_s_transpose_spearman_test.index.to_list()

validationSource = []
validationTarget = []
validationEdge_feature = []
validationNodeList = df_s_transpose_spearman_validation.index.to_list()

for aStock in trainNodeList:
    for anotherStock in trainNodeList:
        if df_s_transpose_spearman_train[aStock][anotherStock] > 0:
            # print(df_s_transpose_spearman_train[aStock][anotherStock])
            trainSource.append(aStock)
            trainTarget.append(anotherStock)
            trainEdge_feature.append(1)

for aStock in testNodeList:
    for anotherStock in testNodeList:
        if df_s_transpose_spearman_test[aStock][anotherStock] > 0:
            # print(df_s_transpose_spearman_test[aStock][anotherStock])
            testSource.append(aStock)
            testTarget.append(anotherStock)
            testEdge_feature.append(1)

for aStock in validationNodeList:
    for anotherStock in validationNodeList:
        if df_s_transpose_spearman_validation[aStock][anotherStock] > 0:
            # print(df_s_transpose_spearman_validation[aStock][anotherStock])
            validationSource.append(aStock)
            validationTarget.append(anotherStock)
            validationEdge_feature.append(1)

# the edge feature is not required except for the news-based graph
trainSource, trainTarget, trainEdge_feature
testSource, testTarget, testEdge_feature
validationSource, validationTarget, validationEdge_feature

# Create variables to create stellar graph

# Edges

In[31]:

# Ref: https://stellargraph.readthedocs.io/en/stable/demos/basics/loading-pandas.html

spearman_edges = pd.DataFrame(
{"source": source, "target": target}
)

spearman_edges_data = pd.DataFrame(
{"source": source, "target": target, "edge_feature": edge_feature}
)


spearman_edges_train = pd.DataFrame(
{"source": trainSource, "target": trainTarget}
)

spearman_edges_data_train = pd.DataFrame(
{"source": trainSource, "target": trainTarget, "edge_feature": trainEdge_feature}
)

spearman_edges_test = pd.DataFrame(
{"source": testSource, "target": testTarget}
)

spearman_edges_data_test = pd.DataFrame(
{"source": testSource, "target": testTarget, "edge_feature": testEdge_feature}
)

spearman_edges_validation = pd.DataFrame(
{"source": validationSource, "target": validationTarget}
)

spearman_edges_train[:10]

# Have the time series data as part of the nodes

# Structure the Feature Matrix so that it can be passed to the GCN

In[32]:

df_s_transpose_feature = df_s_transpose.reset_index(drop=True, inplace=False)

# df_s_transpose_feature = df_s_transpose_feature.values.tolist()
# print(df_s_transpose_feature.values.tolist())

# df_s_transpose_feature['WY'].values
df_s_transpose_feature['AAPL'].shape, df_s_transpose_feature['AAPL'].values

In[33]:

len(all_stock_nodes)

In[34]:

# bring/assign data to the nodes
node_Data = []
for x in all_stock_nodes:
    node_Data.append(df_s_transpose_feature[x].values)

node_Data

In[35]:

# convert the node data variable into a dataframe so that the data structure is compatible with the graph NN

spearman_graph_node_data = pd.DataFrame(node_Data, index = all_stock_nodes)
spearman_graph_node_data.head()

In[36]:

node_Data[14:15], len(validationNodeList), len(testNodeList)

In[37]:

# Node time series data based on the train, test, and validation graphs

In[38]:

# convert the node data variables into dataframes so that the data structure is compatible with the graph NN

spearman_graph_node_data_train = pd.DataFrame(node_Data[0:14], index = trainNodeList)
spearman_graph_node_data_train.head()

spearman_graph_node_data_test = pd.DataFrame(node_Data[14:], index = testNodeList) #pd.DataFrame(node_Data[15:23], index = testNodeList)
spearman_graph_node_data_test.head()

spearman_graph_node_data_validation = pd.DataFrame(node_Data[14:], index = validationNodeList) #pd.DataFrame(node_Data[22:30], index = validationNodeList)
spearman_graph_node_data_validation.head()

In[39]:

spearman_graph_node_data_train

# Graph (stellar) with features as part of Nodes

In[40]:

# Overall
spearman_graph_with_node_features = StellarGraph(spearman_graph_node_data, edges=spearman_edges, node_type_default="corner", edge_type_default="line")
print(spearman_graph_with_node_features.info())

# train
spearman_train_graph_with_node_features = StellarGraph(spearman_graph_node_data_train, edges=spearman_edges_train, node_type_default="corner", edge_type_default="line")
print(spearman_train_graph_with_node_features.info())

# test
spearman_test_graph_with_node_features = StellarGraph(spearman_graph_node_data_test, edges=spearman_edges_test, node_type_default="corner", edge_type_default="line")
print(spearman_test_graph_with_node_features.info())

# validation
spearman_validation_graph_with_node_features = StellarGraph(spearman_graph_node_data_validation, edges=spearman_edges_validation, node_type_default="corner", edge_type_default="line")
print(spearman_validation_graph_with_node_features.info())

# Adapting everything for DeepGraphCNN

In[41]:

spearman_graph_node_data.iloc[0:15, :]

# Graphs to be used for DeepGraphCNN

In[42]:

graphs = list()
#graphs.append(spearman_graph_with_node_features)
graphs.append(spearman_train_graph_with_node_features)
graphs.append(spearman_test_graph_with_node_features)
graphs.append(spearman_validation_graph_with_node_features)

In[43]:

summary = pd.DataFrame(
[(g.number_of_nodes(), g.number_of_edges()) for g in graphs],
columns=["nodes", "edges"],
)
summary.describe().round()

In[44]:

graph_labels = all_stock_nodes

In[45]:

# Generator

# generator = FullBatchNodeGenerator(spearman_graph_with_node_features, method="gcn")  # , sparse = False
# vars(generator)

generator = PaddedGraphGenerator(graphs=graphs)

# PaddedGraphGenerator expects a list of graphs, so the single-graph call stays commented out:
# generator = PaddedGraphGenerator(spearman_graph_with_node_features)

In[46]:

vars(generator)

# Train Test Split

# Commented out on 2023-04-18

# train_subjects, test_subjects = model_selection.train_test_split(
#     spearman_graph_node_data
# )
#
# val_subjects, test_subjects_step_2 = model_selection.train_test_split(
#     test_subjects
# )
# # , train_size = 500, test_size = None, stratify = test_subjects
#
# train_subjects.shape, test_subjects.shape, val_subjects.shape, test_subjects_step_2.shape
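The notebook stops before the model itself, so here is a minimal sketch of how the DeepGraphCNN + CNN + MLP head described at the top could be wired up, following the StellarGraph DGCNN demo linked above; k, the layer sizes, and the compile settings are my assumptions, not values from the original run:

# a minimal sketch following the StellarGraph DGCNN demo (assumed hyperparameters)
k = 10                                    # number of nodes kept by the SortPooling layer
layer_sizes = [32, 32, 32, 1]

dgcnn = DeepGraphCNN(
    layer_sizes=layer_sizes,
    activations=["tanh", "tanh", "tanh", "tanh"],
    k=k,
    bias=False,
    generator=generator,
)
x_inp, x_out = dgcnn.in_out_tensors()

# 1-D convolutions + pooling over the sorted node embeddings, then an MLP head
x_out = Conv1D(filters=16, kernel_size=sum(layer_sizes), strides=sum(layer_sizes))(x_out)
x_out = MaxPool1D(pool_size=2)(x_out)
x_out = Conv1D(filters=32, kernel_size=5, strides=1)(x_out)
x_out = Flatten()(x_out)
x_out = Dense(units=128, activation="relu")(x_out)
x_out = Dropout(rate=0.5)(x_out)
predictions = Dense(units=1)(x_out)       # single regression output (a stock price)

model = Model(inputs=x_inp, outputs=predictions)
model.compile(optimizer=optimizers.Adam(0.0001), loss="mean_squared_error", metrics=["mae"])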

In[98]:

# Python/ML Correlation Coefficients

# Pearson Correlation Coefficient

df_s_transpose_pearson = df_s_transpose.corr(method='pearson', numeric_only=True)
df_s_transpose_pearson
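For reference, the Pearson correlation between two series x and y is

r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}

i.e., the covariance of x and y normalized by the product of their standard deviations.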

Pearson Correlation Coefficient based Adjacency Graph Matrix

df_s_transpose_pearson[df_s_transpose_pearson >= 0.5] = 1
df_s_transpose_pearson[df_s_transpose_pearson < 0.5] = 0
df_s_transpose_pearson

Create a Graph

# first, set the diagonal elements to zero: no self loops/edges
import numpy as np
np.fill_diagonal(df_s_transpose_pearson.values, 0)

import networkx as nx
Graph_pearson = nx.Graph(df_s_transpose_pearson)

Draw the graph

nx.draw_networkx(Graph_pearson, pos=nx.circular_layout(Graph_pearson), node_color='r', edge_color='b')

Library Import in Python for ML/Graph ML

# import libraries

import os
import pandas as pd
import math

Import Libraries for Graph, GNN (Graph Neural Network), and GCN (Graph Convolutional Network)

import stellargraph as sg
from stellargraph import StellarGraph
from stellargraph.layer import DeepGraphCNN
from stellargraph.mapper import FullBatchNodeGenerator
from stellargraph.mapper import PaddedGraphGenerator
from stellargraph.layer import GCN

Machine Learning related library Imports (Tensorflow)

from tensorflow.keras import layers, optimizers, losses, metrics, Model
from sklearn import preprocessing, model_selection
from IPython.display import display, HTML
import matplotlib.pyplot as plt
%matplotlib inline
from tensorflow.keras.layers import Dense, Conv1D, MaxPool1D, Dropout, Flatten
from tensorflow import keras

# how to read data from CSV files
data_file = "per-day-fortune-30-company-stock-price-data.csv"
df_s = pd.read_csv("./data/" + data_file, low_memory=False)
df_s.head()

Convert Data Type and Sort Data

# convert the Date field to a datetime type
df_s["Date"] = df_s["Date"].astype('datetime64[ns]')

# sort by date; no longer needed, as the data was already sorted when I generated it
df_s = df_s.sort_values(by=['Ticker', 'Date'], ascending=True)

df_s = df_s.sort_values(by='Date', ascending=True)
df_s.head()

# drop not-available (NaN) data, column-wise
df_s_transpose = df_s_transpose.dropna(axis=1)

C# Thread-Safe Concurrent Collections

C# concurrent classes are provided through the System.Collections.Concurrent namespace.

Concurrent classes are for thread-safe operations, i.e., multiple threads can access the collections without creating problems.

Concurrent classes:

BlockingCollection<T>
ConcurrentBag<T>
ConcurrentStack<T>
ConcurrentQueue<T>
ConcurrentDictionary<TKey, TValue>
Partitioner
Partitioner<TSource>
OrderablePartitioner<TSource>

Core building blocks across JavaScript, PHP, C++, Java, Python, and Ruby

Ref: https://codesensei.medium.com/core-building-blocks-and-which-programming-language-to-study-first-446977d119a8

Java Spring Concepts

https://youtu.be/0_7xyuVOJbQ

K-Means Clustering


#!/usr/bin/env python

# coding: utf-8

In[1]:

# k-means clustering

from numpy import unique
from numpy import where
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from matplotlib import pyplot
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')

In[2]:

import warnings
warnings.filterwarnings('ignore')

In[3]:

# the combined data
data_folder = './nhanes_input_data/'

# import the CSV as a pandas dataframe
df = pd.read_csv(data_folder + '0_dietaryIntakeDataForClassificationAndAnalysisData.csv')
df.shape

In[4]:

df.head(5)

In[5]:

# parameters to be used for KMeans clustering: centres
# X and/or kdf will have only the features we want to build clusters around
X and/or kdf will have only features we want to create cluster around

kdf = df[
    [
        'RIDAGEYR_Age_in_years_at_screening',
        'URDACT_Albumin_creatinine_ratio_mg_g',
    ]
]
X = kdf
X[:5]

In[6]:

# ref: internet (not my code, using it as a library)
def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)

In[7]:

# X has the features to cluster around (centres: Age, ACR); df has the complete data.
# After clustering is done using the features in X, we find the positions (indices) of each data point
# in a cluster; then we use those index positions to select the corresponding rows from df.

X.shape, df.shape

In[8]:

X = clean_dataset(X)

In[9]:

# define the model
model = KMeans(n_clusters=10)  # , random_state=0, n_init="auto"

# fit the model
model.fit(X)
# model.labels_

In[10]:

# Create CSV files with the cluster data
# One CSV per cluster

In[11]:

howManyClusters = 10
for clusterId in range(howManyClusters):
    ind_list = np.where(model.labels_ == clusterId)[0]
    cluster = df.iloc[ind_list]
    cluster.to_csv('./nhanes_output_data/classifiedGroups/kmeanscluster/cluster-'
                   + str(clusterId) + '.csv')

In[12]:

model.cluster_centers_

In[13]:

# Scatter plot to see each cluster's points visually

std_data = StandardScaler().fit_transform(X)
plt.scatter(std_data[:,0], std_data[:,1], c = model.labels_, cmap = "rainbow")
plt.title("K-means Clustering of Diet and ACR data")
plt.show()
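The choice of n_clusters = 10 above is fixed by hand; a small sketch (my own addition, not from the original notebook) for sanity-checking a few values of k using inertia and silhouette scores:

# sanity-check a few values of k with inertia and silhouette score
from sklearn.metrics import silhouette_score

for k in (2, 5, 10, 15):
    km = KMeans(n_clusters=k).fit(X)
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))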

# References:

# print("Shape of cluster:", model.cluster_centers_.shape)

# https://stackoverflow.com/questions/50297142/get-cluster-points-after-kmeans-in-a-list-format
# https://machinelearningmastery.com/clustering-algorithms-with-python/
# https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
# https://datascience.stackexchange.com/questions/48693/perform-k-means-clustering-over-multiple-columns

>>> from sklearn.cluster import KMeans
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0, n_init="auto").fit(X)
>>> kmeans.labels_
array([1, 1, 1, 0, 0, 0], dtype=int32)
>>> kmeans.predict([[0, 0], [12, 3]])
array([1, 0], dtype=int32)
>>> kmeans.cluster_centers_
array([[10.,  2.],
       [ 1.,  2.]])


C# Application using VS Code

You need the C# Dev Kit extension, the .NET Install Tool (.NET Runtime Install Tool), the .NET SDK, and the extensions these depend on.
Then the >.NET commands in the Command Palette will give you the project create, open, build, and similar features (see the image below).

Having Visual Studio is the best option (whether in a Windows environment under a VM such as Parallels, Oracle VirtualBox, or VMware, or not).
Creating a .NET C# project in VS Code

Hope this helps.