Inductive and Deductive Methods for Data Analytics Projects

Deductive: Top-down approach. Start from an existing theory or hypothesis and test it against the data.

Inductive: Bottom-up approach. Observe the data and derive a hypothesis from the patterns found.

Examples in Data Analytics:

  • Inductive: Analyzing customer purchase data to identify recurring patterns in buying habits, which can lead to new marketing strategies or product recommendations.
  • Deductive: Testing a marketing hypothesis about the effectiveness of a new ad campaign by comparing its performance against a control group. 

Ref: Internet/Google AI

A data analytics project often uses both approaches. Typically you start with an inductive approach to explore and understand the data, then switch to a deductive approach to test a specific hypothesis that the exploration suggested.
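The two steps can be sketched in a few lines of Python. This is only an illustration: the purchase log, the campaign counts, and the function name two_proportion_z are all invented for the example, and a two-proportion z-test is just one common way to compare a campaign against a control group.

```python
import math
from collections import Counter

# Hypothetical purchase log: (customer_id, product) pairs, invented for illustration.
purchases = [
    (1, "coffee"), (1, "filter"), (2, "coffee"), (2, "filter"),
    (3, "tea"), (4, "coffee"), (4, "filter"), (5, "tea"),
]

# Inductive step: explore the data and see which products are bought together.
baskets = {}
for customer, product in purchases:
    baskets.setdefault(customer, set()).add(product)
pair_counts = Counter(
    tuple(sorted(basket)) for basket in baskets.values() if len(basket) > 1
)
most_common_pair = pair_counts.most_common(1)[0]


# Deductive step: test a stated hypothesis ("the new ad raises conversion")
# with a two-proportion z-test on hypothetical campaign counts.
def two_proportion_z(success_a, n_a, success_b, n_b):
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


# Group A saw the new ad, group B is the control group (made-up numbers).
z = two_proportion_z(success_a=120, n_a=1000, success_b=90, n_b=1000)
# One-sided p-value from the standard normal distribution.
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print("Most frequent product pair:", most_common_pair)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

The first half only describes what is in the data; the second half starts from a claim and checks whether the data supports it.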

Data Analytics, Machine Learning, Data Science

Data Manipulation for ML and Data Analytics Projects

• MySQL Data Manipulation:

https://www.databasejournal.com/mysql/mysql-data-manipulation-and-query-statements/

https://www.w3schools.com/sql/

https://www.tutorialspoint.com/sql/index.htm

• MySQL Workbench: https://www.tutorialspoint.com/create-a-new-database-with-mysql-workbench

• SQL Server Data Manipulation:

https://www.tutorialspoint.com/ms_sql_server/index.htm

• SQL Server Management Studio:

https://www.tutorialspoint.com/ms_sql_server/ms_sql_server_management_studio.htm

• Power BI Data Manipulation:

https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-tutorial-importing-and-analyzing-data-from-a-web-page

• Data Manipulation in Python:

https://www.analyticsvidhya.com/blog/2021/06/data-manipulation-using-pandas-essential-functionalities-of-pandas-you-need-to-know/

How to Report (or Present) the outcome of your Analytics/ML Project

Reporting and Analysis

Examples

•Results section: Page 51: STOCK MARKET PREDICTION USING ENSEMBLE OF GRAPH THEORY, MACHINE LEARNING AND DEEP LEARNING MODELS

https://scholarworks.sjsu.edu/cgi/viewcontent.cgi?article=1692&context=etd_projects

•Check Results and Discussion sections: A Comparative Study on Forecasting of Retail Sales

https://arxiv.org/ftp/arxiv/papers/2203/2203.06848.pdf

•May be complicated: Learning Context-Aware Classifier for Semantic Segmentation

https://arxiv.org/pdf/2303.11633.pdf

•Check results section; also Discussion Section: SPEECH INTELLIGIBILITY CLASSIFIERS FROM 550K DISORDERED SPEECH SAMPLES

https://arxiv.org/pdf/2303.07533.pdf

•Notice how results are reported under different criteria, and how tables and figures are used.

•Read the descriptions that accompany the tables and figures.

ROC/AUC: Classification: ROC and AUC

https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc

F1 Score in Machine Learning: https://www.geeksforgeeks.org/f1-score-in-machine-learning/

The F1 score is the harmonic mean of precision and recall: F1 = 2 × (Precision × Recall) / (Precision + Recall).

“This formula ensures that both precision and recall must be high for the F1 score to be high. If either one drops significantly, the F1 score will also drop.” Ref: GeeksforGeeks (link above)
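As a rough illustration, both metrics can be computed by hand in Python. The labels and scores below are made up, the 0.5 threshold is an arbitrary choice, and the function names are hypothetical. AUC is computed here using its ranking interpretation (the probability that a randomly chosen positive example scores higher than a randomly chosen negative one), which is fine for small examples but slow for large datasets.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean: drops sharply if either precision or recall is low.
    return precision, recall, 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up ground truth and model scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # arbitrary 0.5 threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f} AUC={auc:.4f}")
```

F1 depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds.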

LEAF and CNN:

LEAF: “Leaf: A learnable frontend for audio classification,” ICLR, 2021

ARIMA: Introducing ARIMA models

https://www.ibm.com/think/topics/arima-model

Autoregressive Integrated Moving Average (ARIMA) Prediction Model

https://www.investopedia.com/terms/a/autoregressive-integrated-moving-average-arima.asp

“What Is an Autoregressive Integrated Moving Average (ARIMA)?

An autoregressive integrated moving average, or ARIMA, is a statistical analysis model that uses time series data to either better understand the data set or to predict future trends. 

A statistical model is autoregressive if it predicts future values based on past values. For example, an ARIMA model might seek to predict a stock’s future prices based on its past performance or forecast a company’s earnings based on past periods.” Ref: Investopedia
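To make "predicts future values based on past values" concrete, here is a minimal Python sketch of only the autoregressive part, an AR(1) model fitted by least squares. It is not a full ARIMA implementation (no differencing and no moving-average term), the series is synthetic, and the function names fit_ar1 and forecast are made up for this note.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1]; returns (c, phi)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    phi = cov / var
    return mean_y - phi * mean_x, phi


def forecast(series, c, phi, steps):
    """Roll the fitted equation forward, feeding each prediction back in."""
    predictions, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Synthetic series generated from y[t] = 2 + 0.5 * y[t-1], so the fit is easy to check.
series = [10.0]
for _ in range(7):
    series.append(2.0 + 0.5 * series[-1])

c, phi = fit_ar1(series)
next_values = forecast(series, c, phi, steps=2)
print(f"c={c:.3f}, phi={phi:.3f}, next values={next_values}")
```

Real data is noisy, so the fitted coefficients would only approximate the underlying process; a full ARIMA model would normally come from a statistics library.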

GCN: Graph Convolutional Networks (GCNs): Architectural Insights and Applications

GCNs are tailored to work with non-Euclidean data, making them suitable for a wide range of applications including social networks, molecular structures, and recommendation systems.

https://www.geeksforgeeks.org/deep-learning/graph-convolutional-networks-gcns-architectural-insights-and-applications

Facebook Prophet:

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.

https://facebook.github.io/prophet

Single community based linear models:

Google AI Overview:

“Single community-based linear models refer to statistical models where a single linear equation is used to predict a response variable based on the characteristics of a single community or group. These models assume a linear relationship between the predictor variables and the outcome within that specific community”

Multiple Community-Based Linear Models

“The term “Multiple Community-Based Linear Models” likely refers to a modeling framework where separate linear models are fitted for different communities (e.g., neighborhoods, schools, cities, regions), rather than combining all data into a single model.” Reference: ChatGPT. A possibly related reference: https://www.stats.ox.ac.uk/~snijders/mlbook.htm
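Under that reading of the terms (which the quoted sources themselves describe only tentatively), the difference can be sketched in Python as fitting one straight line per community versus one line for all data pooled together. The community names, the data points, and the helper fit_line are invented for illustration.

```python
def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical (x, y) observations for two communities.
data = {
    "north": [(1, 3), (2, 5), (3, 7)],
    "south": [(1, 10), (2, 9), (3, 8)],
}

# Separate model per community.
per_community = {name: fit_line(points) for name, points in data.items()}

# One model fitted on all observations combined.
pooled = fit_line([p for points in data.values() for p in points])

for name, (slope, intercept) in per_community.items():
    print(f"{name}: y = {intercept:.1f} + {slope:.1f} * x")
print(f"pooled: y = {pooled[1]:.1f} + {pooled[0]:.1f} * x")
```

In this made-up data the two communities have opposite trends, which the pooled line hides. That is the usual argument for fitting per-group models or the multilevel models discussed in the linked book.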

Data Analytics Concepts and Methods

What is Analysis? [1]

• “A comprehensive, data-driven strategy for problem solving”

Analytics

• “Analytics uses logic, inductive and deductive reasoning, critical thinking, and quantitative methods along with data to examine phenomena and determine its essential features”

• “any solution that supports the identification of meaningful patterns and relationships among data.”

CONCEPTS [1]

Analytics Methods [1]:

[1] Ref: The Analytics Lifecycle Toolkit: A Practical Guide for an Effective Analytics Capability, Wiley

Some Basic SQLs in Oracle

-- Find tables in a schema

-- For the SH schema
SELECT owner, table_name
FROM all_tables
WHERE owner = 'SH';

-- For the HR schema
SELECT owner, table_name
FROM all_tables
WHERE owner = 'HR';


-- Create a table based on another table

-- Drop any earlier copy (raises ORA-00942 if the table does not exist yet)
DROP TABLE MyCustomer;

-- Create the structure only, with no data
CREATE TABLE MyCustomer AS
SELECT Cust_ID, Cust_First_Name, Cust_Last_Name
FROM sh.Customers
WHERE ROWNUM < 0;  -- never true, so no rows are copied

SELECT *
FROM MyCustomer;

DROP TABLE MyCustomer;

-- Create the structure and copy the data
CREATE TABLE MyCustomer AS
SELECT Cust_ID, Cust_First_Name, Cust_Last_Name
FROM sh.Customers;

SELECT *
FROM MyCustomer;

BRING DATA FROM ANOTHER TABLE

-- Remove all data from table MyCustomer
TRUNCATE TABLE MyCustomer;

SELECT *
FROM MyCustomer;

-- Bring data from another table (FETCH FIRST requires Oracle 12c or later)
INSERT INTO MyCustomer
SELECT Cust_ID, Cust_First_Name, Cust_Last_Name
FROM SH.CUSTOMERS
FETCH FIRST 10 ROWS ONLY;

-- Show the inserted data
SELECT *
FROM MyCustomer;
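If no Oracle instance is at hand, the same pattern can be tried from Python with the standard-library sqlite3 module. This is a sketch under several assumptions: an in-memory SQLite database stands in for Oracle, a small made-up customers table stands in for SH.CUSTOMERS, and the SQL is adjusted to SQLite's dialect (LIMIT instead of FETCH FIRST, DELETE instead of TRUNCATE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Made-up source table standing in for SH.CUSTOMERS.
cur.execute(
    "CREATE TABLE customers ("
    "cust_id INTEGER, cust_first_name TEXT, cust_last_name TEXT, cust_city TEXT)"
)
cur.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(i, f"First{i}", f"Last{i}", "City") for i in range(1, 26)],
)

# Create the structure only: the WHERE clause is never true, so no rows are copied.
cur.execute(
    "CREATE TABLE my_customer AS "
    "SELECT cust_id, cust_first_name, cust_last_name FROM customers WHERE 1 = 0"
)
empty_count = cur.execute("SELECT COUNT(*) FROM my_customer").fetchone()[0]

# Bring the first 10 rows over from the source table.
cur.execute(
    "INSERT INTO my_customer "
    "SELECT cust_id, cust_first_name, cust_last_name FROM customers LIMIT 10"
)
loaded_count = cur.execute("SELECT COUNT(*) FROM my_customer").fetchone()[0]

# SQLite has no TRUNCATE TABLE; an unqualified DELETE removes all rows instead.
cur.execute("DELETE FROM my_customer")
after_delete_count = cur.execute("SELECT COUNT(*) FROM my_customer").fetchone()[0]

column_names = [row[1] for row in cur.execute("PRAGMA table_info(my_customer)")]
print(empty_count, loaded_count, after_delete_count, column_names)
conn.close()
```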

Oracle PL/SQL Concepts

  • Block
    • DECLARE … BEGIN … END;
  • Cursor
  • Trigger
  • Programming clauses:
    • IF … THEN … ELSE, CASE WHEN, loops (FOR, WHILE)
  • Stored Procedure
  • Function
  • Advanced SQL and Analytic Functions
    • GROUP BY ROLLUP(), GROUP BY CUBE()
    • RANK(), DENSE_RANK(), ROW_NUMBER()
    • PARTITION BY, ORDER BY X NULLS LAST, ORDER BY X NULLS FIRST
    • Hierarchical queries: CONNECT BY PRIOR
    • Windowing clauses: RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND INTERVAL '30' DAY FOLLOWING
    • Grouping Sets
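The three ranking functions in the list differ only in how they treat ties. The Python sketch below mimics their behaviour for a single partition ordered descending. It illustrates the semantics only and is not how Oracle implements them; the function name and the sample amounts are made up.

```python
def ranking_functions(values):
    """Return (value, row_number, rank, dense_rank) tuples, ordered descending."""
    ordered = sorted(values, reverse=True)
    rows = []
    rank = dense_rank = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number   # RANK skips positions after a tie
            dense_rank += 1     # DENSE_RANK never leaves gaps
            previous = value
        rows.append((value, row_number, rank, dense_rank))
    return rows


# Made-up sales amounts with one tie.
rows = ranking_functions([90, 100, 80, 90])
for value, row_number, rank, dense_rank in rows:
    print(f"{value:>5} row_number={row_number} rank={rank} dense_rank={dense_rank}")
```

With the tie at 90, ROW_NUMBER gives 2 and 3, RANK gives 2 and 2 and then jumps to 4, and DENSE_RANK gives 2 and 2 and continues with 3.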

Misc. Short Notes on Visual Studio and C#

Download Visual Studio Community Edition:

https://visualstudio.microsoft.com/vs/community

Compare Different Versions of Visual Studio:

https://visualstudio.microsoft.com/vs/compare

IPO Diagram for Your Code (Application)

An IPO (Input-Process-Output) diagram shows the key inputs, the processes or operations applied to them, and the outputs those operations produce.

Ref: https://www.youtube.com/watch?v=a10a11oxjrA&pp=0gcJCdgAo7VqN5tD

For UML class diagram Concepts, please check:

https://www.visual-paradigm.com/guide/uml-unified-modeling-language/uml-class-diagram-tutorial

An example from the URL above:

In Object Oriented Design:

Ref: https://www.visual-paradigm.com/guide/uml-unified-modeling-language/uml-aggregation-vs-composition/

Associations are relationships between classes in a UML class diagram, i.e. how the classes are related in the real world. Two special types are Aggregation and Composition.

Aggregation and Composition are specific cases of association. In both, an object of one class “owns” an object of another class, but there is a subtle difference:

  • Aggregation implies a relationship where the child can exist independently of the parent. Example: Class (parent) and Student (child). Delete the Class and the Students still exist.
  • Composition implies a relationship where the child cannot exist independently of the parent. Example: House (parent) and Room (child). Rooms don’t exist separately from a House.
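In code, the difference usually shows up in who creates the child objects. The following Python sketch uses the same Class/Student and House/Room examples. The class names (SchoolClass is used because class is a reserved word) and attributes are made up for illustration.

```python
class Student:
    def __init__(self, name):
        self.name = name


class SchoolClass:
    """Aggregation: the class receives Student objects that were created elsewhere."""

    def __init__(self, title, students):
        self.title = title
        self.students = list(students)


class Room:
    def __init__(self, name):
        self.name = name


class House:
    """Composition: the house creates its own Room objects and nothing else holds them."""

    def __init__(self, room_names):
        self._rooms = [Room(name) for name in room_names]

    def room_names(self):
        return [room.name for room in self._rooms]


# Aggregation: students exist before the class and survive its deletion.
alice, bob = Student("Alice"), Student("Bob")
math_class = SchoolClass("Math", [alice, bob])
del math_class
surviving_students = [alice.name, bob.name]

# Composition: rooms are created inside the house and are only reachable through it.
house = House(["kitchen", "bedroom"])
print(surviving_students, house.room_names())
```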

“Generalization is a mechanism for combining similar classes of objects into a single, more general class.”

Specialization is the reverse of generalization: it creates new subclasses from an existing class.

For OOP Concepts: Polymorphism, Encapsulation, Data Abstraction and Inheritance in Object-Oriented Programming

Check: https://raygun.com/blog/oop-concepts-java/

https://www.nerd.vision/post/polymorphism-encapsulation-data-abstraction-and-inheritance-in-object-oriented-programming

https://www.geeksforgeeks.org/understanding-encapsulation-inheritance-polymorphism-abstraction-in-oops

Linux Certification (RedHat, Ubuntu, Generic)

For UBUNTU:

CUE.01 Linux Quick Certification (QC)

CUE.02 Desktop Quick Certification (QC)

CUE.03 Server Quick Certification (QC)

Ref: https://ubuntu.com/credentials

RedHat:

Red Hat Certified System Administrator

Red Hat Certified Engineer

Red Hat Certified Specialist in Containers

Red Hat Certified OpenShift Administrator

Ref: https://www.redhat.com/en/services/certifications

Linux Professional Institute LPIC-1

LPIC-1, LPIC-2, LPIC-3

Ref: https://www.lpi.org/our-certifications/lpic-1-overview/

CompTIA Linux+

https://www.comptia.org/certifications/linux

Linux Foundation Certified System Administrator (LFCS):

https://training.linuxfoundation.org/certification/linux-foundation-certified-sysadmin-lfcs

GIAC Certified UNIX Security Administrator (GCUX):

GIAC Certifications:

https://www.giac.org/certifications

PowerShell : Check the Block Size of a Drive

Command:

Get-CimInstance -ClassName Win32_Volume | Select-Object Name, FileSystem, Label, Size, BlockSize | Sort-Object Name | Format-Table -AutoSize

Sample Output:

Context: I needed to format a 1 TB memory card. 64 KB seemed to be a good block size for general use, but the default Windows format dialog did not offer a smaller block size. PowerShell commands or third-party software can be used instead.

Java Creational Design Patterns

Five types of creational design patterns:

  1. Factory Design Pattern: creates objects and keeps object creation centralized.
  2. Abstract Factory Design Pattern
  3. Singleton Design Pattern: limits instantiation of a class to a single instance.
  4. Prototype Design Pattern: creates objects by copying a prototype instance; simpler object creation than Factory.
  5. Builder Design Pattern: creates complex objects while keeping their construction separate from their representation.
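Two of the patterns in this list, Singleton and Prototype, are small enough to sketch directly. The example is written in Python rather than Java to keep it short. The class names Config and Report are hypothetical, and this is one possible way to express each pattern, not a canonical implementation.

```python
import copy


class Config:
    """Singleton: every call to Config() returns the same instance."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.settings = {}
        return cls._instance


class Report:
    """Prototype: new objects are made by copying an existing, configured instance."""

    def __init__(self, title, sections):
        self.title = title
        self.sections = sections

    def clone(self):
        # Deep copy so the clone does not share the sections list with the prototype.
        return copy.deepcopy(self)


first, second = Config(), Config()
first.settings["mode"] = "test"

template = Report("Monthly", ["summary", "details"])
draft = template.clone()
draft.title = "Weekly"
draft.sections.append("appendix")

print(first is second, second.settings)
print(template.title, template.sections)
print(draft.title, draft.sections)
```

This simple Singleton is not thread-safe; a Java version would typically need synchronization or an enum-based singleton.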

Factory variants: Simple Factory, Factory Method, Abstract Factory.

Factory Method: Separates construction from implementation. Objects can be created without specifying the exact class of the object to be created.

Abstract Factory: One layer above Factory Method, sometimes called a super factory. It creates other factories, which in turn create objects.

Builder Design Pattern: Helps create complex objects step by step. The same construction process can be used to create different representations of an object.
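Factory Method and Builder can be sketched as follows, again in Python rather than Java for brevity. All class names (Exporter, ReportJob, QueryBuilder, and so on) are invented for this note, and the sketch is one of several reasonable ways to express each pattern.

```python
from abc import ABC, abstractmethod


class Exporter(ABC):
    @abstractmethod
    def export(self, rows):
        """Turn rows into text."""


class CsvExporter(Exporter):
    def export(self, rows):
        return "\n".join(",".join(str(v) for v in row) for row in rows)


class TsvExporter(Exporter):
    def export(self, rows):
        return "\n".join("\t".join(str(v) for v in row) for row in rows)


class ReportJob(ABC):
    """Factory Method: run() uses an Exporter without knowing its concrete class."""

    @abstractmethod
    def create_exporter(self):
        """Subclasses decide which Exporter to create."""

    def run(self, rows):
        return self.create_exporter().export(rows)


class CsvReportJob(ReportJob):
    def create_exporter(self):
        return CsvExporter()


class TsvReportJob(ReportJob):
    def create_exporter(self):
        return TsvExporter()


class QueryBuilder:
    """Builder: assemble a complex object (here a SQL string) step by step."""

    def __init__(self):
        self._table = None
        self._columns = ["*"]
        self._conditions = []

    def table(self, name):
        self._table = name
        return self

    def columns(self, *names):
        self._columns = list(names)
        return self

    def where(self, condition):
        self._conditions.append(condition)
        return self

    def build(self):
        sql = f"SELECT {', '.join(self._columns)} FROM {self._table}"
        if self._conditions:
            sql += " WHERE " + " AND ".join(self._conditions)
        return sql


csv_text = CsvReportJob().run([(1, "a"), (2, "b")])
tsv_text = TsvReportJob().run([(1, "a")])
query = QueryBuilder().table("customers").columns("id", "name").where("id > 10").build()
print(csv_text)
print(query)
```

The calling code only ever talks to ReportJob and QueryBuilder, so a new export format or an extra query clause can be added without changing existing callers.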

Ref: https://www.geeksforgeeks.org/creational-design-pattern/

Builder Pattern: Ref: Wikipedia

Factory Method Design Pattern: (Wikipedia)

Abstract Factory Design Pattern: (Wikipedia)