Experimental Design Examples (Data Analytics Projects)

Data Analytics, Machine Learning, Data Science

Evaluating Your Data Analytics Project Outcome

Regression Projects

• R-squared (goodness of fit)

• RMSE (root mean squared error)
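As a minimal sketch of the two regression measures listed above, the following computes R-squared and RMSE in plain Python. The function names and the sample values are illustrative assumptions, not part of the course material.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

# Hypothetical observed and predicted values
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

r2 = r_squared(y_true, y_pred)
err = rmse(y_true, y_pred)
print(f"R-squared = {r2:.3f}, RMSE = {err:.3f}")
```

R-squared is unitless (closer to 1 is better), while RMSE is in the units of the target (closer to 0 is better), so the two are usually reported together.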

Classification Projects

• Confusion Matrix

• ROC

• Accuracy, Recall, Precision
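The classification measures above can all be derived from the four cells of a binary confusion matrix. The sketch below assumes binary labels with 1 as the positive class; the function name and the sample labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Hypothetical ground-truth labels and model predictions
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # of the predicted positives, how many are correct
recall = tp / (tp + fn)      # of the actual positives, how many were found
print(tp, fp, fn, tn, accuracy, precision, recall)
```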

RL – Reinforcement Learning

• Cumulative reward
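One common way to compute a cumulative reward for an episode is the discounted return. The discount factor `gamma` is an assumption here (the notes only mention cumulative reward); setting `gamma=1.0` gives the plain sum of rewards.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step adds its reward plus the discounted future
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Hypothetical rewards collected over one episode
rewards = [1.0, 0.0, 2.0]

undiscounted = discounted_return(rewards, gamma=1.0)
discounted = discounted_return(rewards, gamma=0.5)
print(undiscounted, discounted)
```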

Dimensionality Reduction

Some Approaches

• Feature Selection

• Feature Extraction

  • PCA
  • SVD
  • LDA
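To illustrate feature extraction, here is a small sketch of PCA that finds only the first principal component by power iteration on the covariance matrix. This is a teaching simplification written in plain Python; in practice a library implementation (typically based on SVD) would be used. The function name and the data are hypothetical.

```python
def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component."""
    n, d = len(rows), len(rows[0])
    # Center each column on its mean
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Project each centered row onto the component (2 features -> 1)
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Hypothetical data lying exactly on the line y = 2x
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
component, scores = first_principal_component(rows)
print(component, scores)
```

Because the two features are perfectly correlated, a single component captures all of the variance, so the two columns can be replaced by one score per row.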

Initial Analysis of Text and Image Data (Data Analytics and ML Projects)

Initial Analysis of Text Data

• Stop word filter

• Lemmatization

• POS (part-of-speech) tagging

• Vocabulary Analysis
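A minimal sketch of two of the text steps above, stop word filtering and vocabulary analysis, using only the standard library. The stop word list is a tiny illustrative assumption; lemmatization and POS tagging need an NLP library and are not shown.

```python
import re
from collections import Counter

# Tiny illustrative stop word list (a real project would use a fuller one)
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def vocabulary(texts):
    """Count how often each non-stop-word token occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(tok for tok in tokenize(text) if tok not in STOP_WORDS)
    return counts

# Hypothetical documents
docs = ["The model is a classifier.", "A classifier of images and text."]
vocab = vocabulary(docs)
print(vocab.most_common(3))
```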

Image Data: Initial Analysis

• Fix image size, ratios

• Image Scaling

• Convert to grayscale

• Standardize
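The grayscale and standardize steps above might look like the following for a tiny image stored as nested lists of (R, G, B) tuples. The luminance weights are the widely used ITU-R BT.601 values; interpreting "standardize" as zero mean and unit variance over pixel values is an assumption, and the image itself is made up.

```python
def to_gray(image):
    """Convert rows of (r, g, b) pixels to luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in image]

def standardize(gray):
    """Rescale pixel values to zero mean and unit variance."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    std = (sum((p - mean) ** 2 for p in pixels) / len(pixels)) ** 0.5
    return [[(p - mean) / std for p in row] for row in gray]

# Hypothetical 2x2 image: white, black, red, blue
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]

gray = to_gray(image)
standardized = standardize(gray)
print(gray)
print(standardized)
```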

Data Requirements for Data Analytics Projects

Data

• Dataset Characteristics

  • Large scale, real, representative, relevant features, balanced classes, unit relevant

• Adapting data/dataset for the project

  • Clean, normalize/standardize, collect more data, and collect more data of the missing type

• Data Suitability for the project

• Check the R-squared measure

• Check for bias and variance

• Do Exploratory Analysis

• Initial and Exploratory Analysis

Initial and Exploratory Analysis for Data Analytics Projects

Goal: to gain a thorough understanding of the data.

Two Types:

• Initial Analysis

• Exploratory Analysis

Initial Analysis:

  • Univariate
  • Bi-Variate
  • Multi-Variate

Univariate Analysis

• Deciding/Determining the dependent (target) variable

• Assigning the correct data types, appropriate column names

• Address inconsistencies, missing values, and outliers

• Address categorical variables with too many levels

• Understand the distributions of the variables (are they a good fit for the project?)

• Imbalance in the dependent variable

• Time variables

• Univariate visualizations

• A detailed data dictionary

• Low variance filter
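Two of the univariate checks above, the low variance filter and imbalance in the dependent variable, can be sketched as follows. The threshold value, the column names, and the data are hypothetical.

```python
from collections import Counter
from statistics import pvariance

def low_variance_columns(columns, threshold=0.01):
    """Names of numeric columns whose variance falls below the threshold."""
    return [name for name, values in columns.items()
            if pvariance(values) < threshold]

def class_shares(labels):
    """Fraction of rows belonging to each class of the target variable."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}

# Hypothetical numeric columns and target labels
columns = {
    "age": [23, 35, 47, 52],
    "constant_flag": [1, 1, 1, 1],
    "income": [30.0, 42.5, 58.0, 61.0],
}
labels = ["no", "no", "no", "yes"]

to_drop = low_variance_columns(columns)
shares = class_shares(labels)
print(to_drop, shares)
```

A column that barely varies carries little information for a model, and a target where one class dominates may call for resampling or a metric other than accuracy.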

Bivariate Analysis

• Pairwise relations

• Pairwise visualizations

• Correlation analysis
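For the correlation analysis step, a minimal sketch of the Pearson correlation coefficient between two numeric variables is given below. The variable names and values are illustrative assumptions.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)

# Hypothetical pair of variables
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 70]

r = pearson(hours_studied, exam_score)
print(f"r = {r:.3f}")
```

Values near +1 or -1 indicate a strong linear relation, and values near 0 indicate little linear relation; a scatter plot is still worth checking because Pearson's r only captures linear association.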

Multivariate Analysis

• Multivariate relations

• Statistical tools

Exploratory Analysis

• Normalizing

• Subsetting the data

• Clustering

Others

• Decision rules, association rules, n-grams

• Time series analysis
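As one possible reading of the normalizing and subsetting steps listed above, the sketch below applies min-max normalization to a column and then selects the rows above a cut-off. The function name, the cut-off, and the data are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical income column
incomes = [20.0, 30.0, 40.0, 60.0]
scaled = min_max_normalize(incomes)

# Subsetting: keep only the rows whose scaled income is at least 0.5
high_income_rows = [i for i, s in enumerate(scaled) if s >= 0.5]
print(scaled, high_income_rows)
```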

Inductive and Deductive Methods for Data Analytics Projects

Deductive: Top-down approach. Start from existing theories and apply them to the data.

Inductive: Bottom-up approach. Observe the data and derive a hypothesis.

Examples in Data Analytics:

  • Inductive: Analyzing customer purchase data to identify recurring patterns in buying habits, which can lead to new marketing strategies or product recommendations.
  • Deductive: Testing a marketing hypothesis about the effectiveness of a new ad campaign by comparing its performance against a control group. 

Ref: Internet/Google AI

In a data analytics project, you may use both approaches. Initially, you may use an inductive approach to explore and understand the data. Then you can use the deductive method to test a specific hypothesis.
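The deductive step, testing a specific hypothesis against a control group, could be sketched with an exact one-sided permutation test. The choice of a permutation test is an assumption made for illustration (other tests would serve equally well), and the campaign and control figures are made up.

```python
from itertools import combinations

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(treatment, control):
    """Exact one-sided p-value for mean(treatment) > mean(control).

    Enumerates every way of splitting the pooled data into two groups of
    the original sizes, so it is only practical for small samples.
    """
    pooled = treatment + control
    observed = mean(treatment) - mean(control)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(treatment)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Count splits at least as extreme as the observed difference
        if mean(group_a) - mean(group_b) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total

# Hypothetical daily sales with and without the new ad campaign
campaign = [4.0, 5.0, 6.0]
control = [1.0, 2.0, 3.0]

p_value = permutation_p_value(campaign, control)
print(f"p-value = {p_value:.2f}")
```

A small p-value means that a difference this large would rarely arise if the campaign had no effect, which is evidence in favor of the hypothesis being tested.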

Data Manipulation for ML and Data Analytics Projects.

• MySQL Data Manipulation:

https://www.databasejournal.com/mysql/mysql-data-manipulation-and-query-statements/

https://www.w3schools.com/sql/

https://www.tutorialspoint.com/sql/index.htm

• Workbench: https://www.tutorialspoint.com/create-a-new-database-with-mysql-workbench

• SQL Server Data Manipulation:

https://www.tutorialspoint.com/ms_sql_server/index.htm

• Management Studio:

https://www.tutorialspoint.com/ms_sql_server/ms_sql_server_management_studio.htm

• Power BI Data Manipulation:

https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-tutorial-importing-and-analyzing-data-from-a-web-page

• Data Manipulation in Python:

https://www.analyticsvidhya.com/blog/2021/06/data-manipulation-using-pandas-essential-functionalities-of-pandas-you-need-to-know/

How to Report (or Present) the outcome of your Analytics/ML Project

Reporting and Analysis

Examples

• Results section (page 51): STOCK MARKET PREDICTION USING ENSEMBLE OF GRAPH THEORY, MACHINE LEARNING AND DEEP LEARNING MODELS

https://scholarworks.sjsu.edu/cgi/viewcontent.cgi?article=1692&context=etd_projects

• Results and Discussion sections: A Comparative Study on Forecasting of Retail Sales

https://arxiv.org/ftp/arxiv/papers/2203/2203.06848.pdf

• May be complicated: Learning Context-Aware Classifier for Semantic Segmentation

https://arxiv.org/pdf/2303.11633.pdf

• Results section, and also the Discussion section: SPEECH INTELLIGIBILITY CLASSIFIERS FROM 550K DISORDERED SPEECH SAMPLES

https://arxiv.org/pdf/2303.07533.pdf

• Notice how results are reported under different criteria, and how tables and figures are used.

• Read the accompanying descriptions.

ROC/AUC: Classification: ROC and AUC

https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
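As a small illustration of AUC, the sketch below uses its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count as one half). This pairwise form is convenient for small data; the function name and the scores are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise positive/negative comparisons."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))

# Hypothetical true labels and predicted scores
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]

area = auc(labels, scores)
print(f"AUC = {area:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.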

F1 Score in Machine Learning: https://www.geeksforgeeks.org/f1-score-in-machine-learning/

The F1 score is the harmonic mean of precision and recall: F1 = 2 × (precision × recall) / (precision + recall).

“This formula ensures that both precision and recall must be high for the F1 score to be high. If either one drops significantly, the F1 score will also drop.” (Ref: GeeksforGeeks, link above)
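The behavior described in the quotation can be checked with a short sketch of the harmonic-mean formula, F1 = 2PR / (P + R). The function name and the precision/recall values are illustrative assumptions.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Both measures high: F1 is high as well
balanced = f1_score(0.8, 0.8)

# Precision very high but recall very low: F1 is pulled down sharply
lopsided = f1_score(0.95, 0.10)

print(f"balanced = {balanced:.3f}, lopsided = {lopsided:.3f}")
```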

LEAF and CNN:

LEAF: “Leaf: A learnable frontend for audio classification,” ICLR, 2021

ARIMA: Introducing ARIMA models

https://www.ibm.com/think/topics/arima-model

Autoregressive Integrated Moving Average (ARIMA) Prediction Model

https://www.investopedia.com/terms/a/autoregressive-integrated-moving-average-arima.asp

“What Is an Autoregressive Integrated Moving Average (ARIMA)?

An autoregressive integrated moving average, or ARIMA, is a statistical analysis model that uses time series data to either better understand the data set or to predict future trends. 

A statistical model is autoregressive if it predicts future values based on past values. For example, an ARIMA model might seek to predict a stock’s future prices based on its past performance or forecast a company’s earnings based on past periods.” (Ref: Investopedia)
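To make the "autoregressive" idea concrete, the sketch below fits only an AR(1) model, x_t = c + phi * x_(t-1), by ordinary least squares. This is not a full ARIMA model (there is no differencing and no moving-average term), and the series is synthetic.

```python
def fit_ar1(series):
    """Least-squares estimates (c, phi) for x_t = c + phi * x_(t-1)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    phi = (sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
           / sum((a - mean_prev) ** 2 for a in prev))
    c = mean_curr - phi * mean_prev
    return c, phi

# Synthetic series generated by x_t = 1 + 0.5 * x_(t-1), starting at 1.0
series = [1.0, 1.5, 1.75, 1.875, 1.9375]

c, phi = fit_ar1(series)
next_value = c + phi * series[-1]   # one-step-ahead forecast
print(c, phi, next_value)
```

In practice a time series library would be used to fit the full model, including the choice of differencing order and moving-average terms.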

GCN: Graph Convolutional Networks (GCNs): Architectural Insights and Applications

GCNs are tailored to work with non-Euclidean data, making them suitable for a wide range of applications including social networks, molecular structures, and recommendation systems.

https://www.geeksforgeeks.org/deep-learning/graph-convolutional-networks-gcns-architectural-insights-and-applications

Facebook Prophet:

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.

https://facebook.github.io/prophet

Single Community-Based Linear Models:

Google AI Overview:

“Single community-based linear models refer to statistical models where a single linear equation is used to predict a response variable based on the characteristics of a single community or group. These models assume a linear relationship between the predictor variables and the outcome within that specific community”

Multiple Community-Based Linear Models

“The term ‘Multiple Community-Based Linear Models’ likely refers to a modeling framework where separate linear models are fitted for different communities (e.g., neighborhoods, schools, cities, regions), rather than combining all data into a single model.” (Reference: ChatGPT). A possibly related reference: https://www.stats.ox.ac.uk/~snijders/mlbook.htm
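Under the interpretation quoted above (one separate linear model per community), a minimal sketch might look like the following. The community names and the data are made up, and simple one-predictor least squares is assumed.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def fit_per_community(records):
    """Fit one line per community from (community, x, y) records."""
    grouped = {}
    for community, x, y in records:
        xs, ys = grouped.setdefault(community, ([], []))
        xs.append(x)
        ys.append(y)
    return {name: fit_line(xs, ys) for name, (xs, ys) in grouped.items()}

# Hypothetical records: (community, predictor, response)
records = [
    ("north", 1, 3), ("north", 2, 5), ("north", 3, 7),
    ("south", 1, 10), ("south", 2, 9), ("south", 3, 8),
]

models = fit_per_community(records)
print(models)
```

Here the two communities have opposite trends, which a single model fitted to the pooled data would average away.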

Data Analytics Concepts and Methods

What is Analysis? [1]

• “A comprehensive, data-driven strategy for problem solving”

Analytics

• “Analytics uses logic, inductive and deductive reasoning, critical thinking, and quantitative methods along with data to examine phenomena and determine its essential features”

• “any solution that supports the identification of meaningful patterns and relationships among data.”

CONCEPTS [1]

Analytics Methods [1]:

[1] Ref: The Analytics Lifecycle Toolkit: A Practical Guide for an Effective Analytics Capability, Wiley.
