Threats to Validity for Your Data Analytics Projects

• Internal

• External

• Construct

• Statistical Conclusion

• Internal: An informative variable is missing. Bring in data from other sources.

• External: The Fixation variable makes the result perfect, so the model may not generalize to other data.

• Construct: Class imbalance badly affects the outcome.

• Statistical Conclusion: Depending on the statistical measure used, the conclusion can be incorrect.


Data Mining: Association: Support, Confidence, and Lift
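For a rule A → B, support is the fraction of transactions that contain both A and B, confidence is support(A and B) / support(A), and lift is confidence / support(B). The following is a minimal sketch of the three measures; the transactions and the helper names (support, confidence, lift) are made up for illustration.

```python
# Hypothetical toy transactions; each basket is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence divided by the baseline support of the consequent."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

antecedent, consequent = {"bread"}, {"milk"}
print("support   ", support(antecedent | consequent, transactions))
print("confidence", confidence(antecedent, consequent, transactions))
print("lift      ", lift(antecedent, consequent, transactions))
```

A lift above 1 suggests the two item sets appear together more often than independence would predict; a lift below 1 suggests the opposite.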

Data Analytics, Machine Learning, Data Science

McNemar’s Test
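McNemar's test is commonly used to compare two classifiers evaluated on the same test cases, using only the cases where they disagree. The sketch below assumes the chi-square form of the test with one degree of freedom and an optional continuity correction; the function name and the counts b and c are hypothetical.

```python
import math

def mcnemar(b, c, continuity=True):
    """McNemar's chi-square statistic and p-value (1 degree of freedom).

    b: cases model A classified correctly and model B did not
    c: cases model B classified correctly and model A did not
    """
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # continuity correction
    stat = diff ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

mcnemar_stat, mcnemar_p = mcnemar(b=15, c=5)  # hypothetical disagreement counts
print(f"statistic={mcnemar_stat:.3f}, p-value={mcnemar_p:.4f}")
```

For small disagreement counts an exact binomial version of the test is usually preferred over this approximation.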

Chi-Square


Chi-square: “A chi-square test is used to help determine if observed results are in line with expected results, and to rule out that observations are due to chance.” A coin flip is a typical example [1].
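The coin-flip example can be sketched as a goodness-of-fit test: compare the observed heads and tails with the counts a fair coin would be expected to give. The counts below are invented, and the p-value formula assumes one degree of freedom (two outcomes).

```python
import math

def chi_square_gof(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

heads, tails = 60, 40            # hypothetical result of 100 flips
n_flips = heads + tails
expected = [n_flips / 2, n_flips / 2]  # a fair coin

chi2_stat = chi_square_gof([heads, tails], expected)
# With two categories there is 1 degree of freedom, so the
# chi-square survival function reduces to erfc(sqrt(x / 2)).
chi2_p = math.erfc(math.sqrt(chi2_stat / 2))

print(f"chi-square={chi2_stat:.2f}, p-value={chi2_p:.4f}")
```

A small p-value suggests the observed split would be unlikely under a fair coin; what counts as "small" depends on the significance level chosen beforehand.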

References:
1. https://www.investopedia.com/terms/c/chi-square-statistic.asp


Statistics for Data Analytics and Machine Learning Projects

Null Hypothesis

[2]

Paired t-test

Unpaired t-test

•Pearson Correlation

One-Way Analysis of Variance

Spearman Correlation


•Kendall Tau Coefficient

Wilcoxon Rank-Sum Test

Basic EDA

•McNemar’s Test

•Friedman test

•Kruskal-Wallis Test

Two-Way Analysis of Variance

•K-Fold Cross Validation paired t-test

•Wilcoxon Signed Rank Test
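As one illustration from the list above, the k-fold cross-validation paired t-test compares two models through the per-fold differences in their scores. The sketch below computes only the t statistic and its degrees of freedom; the fold scores are invented, and the function name is hypothetical.

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """t statistic and degrees of freedom for paired samples."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # sample standard deviation (n - 1)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical accuracies of two models on the same 5 folds.
model_a = [0.81, 0.79, 0.84, 0.80, 0.82]
model_b = [0.78, 0.77, 0.80, 0.79, 0.80]

t_stat, dof = paired_t_statistic(model_a, model_b)
print(f"t={t_stat:.3f}, degrees of freedom={dof}")
```

The statistic is then compared against a t distribution with the returned degrees of freedom. Because the folds share training data, the scores are not fully independent, so the result is best read as approximate.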


Make Sense of your Data: For Data Analytics Project

Hypothesis-based versus data-driven analysis

“Only those data analysts who are given time to explore and analyze data thoughtfully and thoroughly are consistently successful.”

Data Identification and Prioritization

Use augmented data alongside the data pipeline

Analytics Sandbox


Characterizing the Data—Exploring a Single Variable

Data: Descriptive analysis options

Find: Distribution of quantitative variables
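A first look at a single quantitative variable usually combines summary statistics with a rough picture of its distribution. This is a minimal sketch using only the standard library; the values and the bin width of 10 are assumptions chosen for illustration.

```python
import statistics
from collections import Counter

values = [12, 15, 15, 18, 21, 22, 22, 22, 30, 45]  # hypothetical measurements

summary = {
    "n": len(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.stdev(values),              # sample standard deviation
    "min": min(values),
    "max": max(values),
    "quartiles": statistics.quantiles(values, n=4),  # three cut points
}
for name, value in summary.items():
    print(f"{name:>9}: {value}")

# Crude text histogram: count values per bin of width 10.
bin_width = 10
histogram = Counter((v // bin_width) * bin_width for v in values)
for start in sorted(histogram):
    print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * histogram[start]}")
```

A mean noticeably above the median, as with the single large value here, is a quick hint that the distribution is right-skewed.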

Reference: [1] Gregory S. Nelson. The Analytics Lifecycle Toolkit: A Practical Guide for an Effective Analytics Capability. John Wiley & Sons, 2018.

Data Analytics, Machine Learning


Factors/Variables to Consider For Experimental Design for Data Analytics Projects

Design of experiments fishbone

REF: [1] Gregory S. Nelson. The Analytics Lifecycle Toolkit: A Practical Guide for an Effective Analytics Capability. John Wiley & Sons, 2018. Chapter 6: Problem Framing


Data Analytics Project: Problem Framing and Project Lifecycle

REF: Internet and Gregory S. Nelson. The Analytics Lifecycle Toolkit: A Practical Guide for an Effective Analytics Capability. John Wiley & Sons, 2018. Chapter 6: Problem Framing


Model Selection

• Optimizations/Machine Learning/Data Mining/Deep Learning/Reinforcement Learning/Graph Mining/NLP/Genetic Algorithms

• Regression

• Linear

• Non-Linear

• Classifications

• Logistic Regression

• Sigmoid : Binary

• Softmax: Multi-Class

• Bayes Classifier

• SVM

• Bayesian: Regression/Classification

• Clustering

• K-NN

• KNN+

• K-means, Hierarchical, Density-based

•Machine Learning/Data Mining/Deep Learning/Reinforcement Learning/Graph Mining/NLP

•Time Series Analysis

•Decision (Regression, Classification) Trees

•Univariate

•Multivariate

•Random Forest

•Reinforcement Learning

•Q-Learning

•Monte Carlo

•Deep Learning (Know variations, find a fit)

•MLP

•LSTM

•RNN

•Ensemble Methods

•Multiple Learners Together

Ref: Internet, Demir Slides
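The sigmoid and softmax entries under logistic regression differ in what they output: sigmoid maps one score to a probability for the binary case, while softmax maps a vector of scores to a probability distribution over several classes. A minimal sketch follows; the input scores are arbitrary.

```python
import math

def sigmoid(z):
    """Map a real-valued score to (0, 1); written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def softmax(scores):
    """Map a list of scores to probabilities that sum to 1."""
    shift = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

binary_prob = sigmoid(0.8)                    # probability of the positive class
class_probs = softmax([2.0, 1.0, 0.1])        # one probability per class

print(f"sigmoid: {binary_prob:.3f}")
print("softmax:", [round(p, 3) for p in class_probs])
```

With exactly two classes, softmax over the scores [z, 0] gives the same positive-class probability as sigmoid(z), which is why the binary case is usually written with sigmoid alone.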


Model Selection for your Project

Potential Models

• Statistical Models

• Parametric and Non-Parametric

• Mathematical Model (Optimization)

• Machine Learning

• Data Mining

• Deep Learning

• Reinforcement Learning

• Graph Mining

• NLP

• Optimization

• Genetic Algorithm

•Association

•Basket Association

•Apriori Algorithm

•Supervised

•Classification

•Regression

•Unsupervised

•Clustering/Customer Segmentation

•Reinforcement

•Learn a policy (interactively)

•Game Playing

•Robot in a Maze

•Genetic

•Optimization
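The "learn a policy interactively" and "robot in a maze" items can be illustrated with tabular Q-learning on a very small maze. The sketch below assumes a one-dimensional corridor of five cells with the goal at the right end; the environment, reward, and hyperparameters are all invented for illustration.

```python
import random

N_STATES = 5                 # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)           # index 0: step left, index 1: step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def step(state, action):
    """Move within the corridor; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning update toward reward + discounted best next value.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
            if done:
                break
    return q

q_table = train()
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

The agent is never told where the goal is; it learns the policy only from the rewards it receives while interacting with the environment.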


Possible Data Analytics Project Goals

• Examine relations

• Test Hypothesis

• Validate

• Find groups/classes/rules

• Learn a policy

• Maximize Reward interactively

• Predict (Class or Value)

• Forecast (numeric, sales)

• Compare

• Classify

• Cluster


Experimental Design Examples (Data Analytics Projects)
