• Internal • External • Construct • Statistical Conclusion • Internal: Informative variable missing. Bring data from other sources • External: The fixation variable makes the result perfect. The model may not generalize • Construct: Class imbalance affects the outcome badly • Statistical Conclusion: Based on the statistical measure used, the conclusion can be incorrect. • Data Mining: Association: Support, Confidence, and Lift Internal Validity Is your …
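The association measures named in the excerpt above (support, confidence, and lift) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `association_metrics` and the toy shopping baskets are hypothetical and do not come from the original post.

```python
def association_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Hypothetical helper for illustration; transactions is a list of item sets.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    # Count transactions containing the antecedent, the consequent, and both.
    count_a = sum(1 for t in transactions if antecedent <= set(t))
    count_c = sum(1 for t in transactions if consequent <= set(t))
    count_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    if n == 0 or count_a == 0 or count_c == 0:
        raise ValueError("rule is undefined: empty data or an item set never occurs")
    support = count_both / n              # P(A and C)
    confidence = count_both / count_a     # P(C | A)
    lift = confidence / (count_c / n)     # P(C | A) / P(C)
    return support, confidence, lift


# Toy data (made up for this sketch).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = association_metrics(baskets, {"bread"}, {"milk"})
```

A lift below 1, as in this toy data, suggests the two items co-occur less often than they would if they were independent; a lift above 1 suggests a positive association.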
Category: Root
May 22
Threat To Validity for Your Data Analytics Projects
• Internal • External • Construct • Statistical Conclusion • Internal: Informative variable missing. Bring data from other sources • External: The fixation variable makes the result perfect. The model may not generalize • Construct: Class imbalance affects the outcome badly • Statistical Conclusion: Based on the statistical measure used, the conclusion can be incorrect. …
May 22
McNemar’s Test
Chi-Square • McNemar’s Test • Chi-square: “A chi-square test is used to help determine if observed results are in line with expected results, and to rule out that observations are due to chance.” A coin flip as an example [1]. References: [1] https://www.investopedia.com/terms/c/chi-square-statistic.asp Data Analytics, Machine Learning, Data Science
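The coin-flip example mentioned in the excerpt above can be worked through in code. The following is a minimal sketch that uses only the Python standard library; the function names and the counts are hypothetical, and the McNemar statistic is shown without a continuity correction. It relies on the fact that for one degree of freedom the chi-square upper-tail probability equals `erfc(sqrt(x / 2))`.

```python
import math


def chi_square_1df_p_value(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def coin_flip_chi_square(heads, tails):
    """Goodness-of-fit test of observed flips against a fair coin (50/50)."""
    expected = (heads + tails) / 2.0
    statistic = ((heads - expected) ** 2 + (tails - expected) ** 2) / expected
    return statistic, chi_square_1df_p_value(statistic)


def mcnemar(b, c):
    """McNemar's test on the two discordant counts of a paired 2x2 table.

    b: pairs where only the first method is correct;
    c: pairs where only the second method is correct.
    No continuity correction is applied in this sketch.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the statistic is undefined")
    statistic = (b - c) ** 2 / (b + c)
    return statistic, chi_square_1df_p_value(statistic)


# 60 heads and 40 tails out of 100 flips (made-up counts).
coin_stat, coin_p = coin_flip_chi_square(60, 40)   # statistic is 4.0, p is about 0.0455
# 15 vs 5 discordant pairs between two classifiers (made-up counts).
mc_stat, mc_p = mcnemar(15, 5)                     # statistic is 5.0, p is about 0.025
```

With small discordant counts, an exact binomial version of McNemar's test is usually preferred over the chi-square approximation used here.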
May 22
Statistics for Data Analytics and Machine Learning Projects
• Null Hypothesis • [2] • Paired t-test • Unpaired t-test • Pearson Correlation • One-Way Analysis of Variance • Spearman Correlation • Kendall Tau Coefficient • Wilcoxon Rank-Sum Test • Basic EDA • McNemar’s Test • Friedman Test • Kruskal-Wallis Test • Two-Way Analysis of Variance • K-Fold Cross-Validation Paired t-test • Wilcoxon Signed-Rank Test Data Analytics, Machine Learning, Data Science
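As one illustration of the tests listed above, the paired t-test applied to k-fold cross-validation scores can be sketched as follows. The function name `paired_t_statistic` and the fold scores are hypothetical; the sketch returns only the t statistic and degrees of freedom, which would then be compared against a t distribution (from a table or a statistics library) to obtain a p-value.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t statistic, degrees of freedom) for paired samples.

    Hypothetical helper for illustration; each position holds the two
    models' scores on the same cross-validation fold.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    differences = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(differences)
    std_diff = statistics.stdev(differences)  # sample standard deviation (n - 1)
    if std_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    standard_error = std_diff / math.sqrt(len(differences))
    return mean_diff / standard_error, len(differences) - 1


# Accuracy of two models on the same 5 folds (made-up numbers).
model_a = [0.80, 0.82, 0.78, 0.85, 0.81]
model_b = [0.78, 0.80, 0.77, 0.83, 0.80]
t_stat, dof = paired_t_statistic(model_a, model_b)
```

Fold scores from the same dataset are not fully independent, so the resulting p-value tends to be optimistic; it is best read as a rough guide.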
May 22
Make Sense of your Data: For Data Analytics Project
Hypothesis-based versus data-driven analysis “Only those data analysts who are given time to explore and analyze data thoughtfully and thoroughly are consistently successful.” Data Identification and Prioritization Use Augmented data besides Data Pipeline Analytics Sandbox Characterizing the Data—Exploring a Single Variable Data: Descriptive analysis options Find: Distribution of quantitative variables Reference: [1]. Gregory S. Nelson. …
May 21
Model Selection
• Optimizations/Machine Learning/Data Mining/Deep Learning/Reinforcement Learning/Graph Mining/NLP/Genetic Algorithms • Regression • Linear • Non-Linear • Classification • Logistic Regression • Sigmoid: Binary • Softmax: Multi-Class • Bayes Classifier • SVM • Bayesian: Regression/Classification • Clustering • K-NN • KNN+ • K-means, Hierarchical, Density • Machine Learning/Data Mining/Deep Learning/Reinforcement Learning/Graph Mining/NLP • Time Series Analysis • Decision (Regression, …
May 21
Model Selection for your Project
Potential Models • Statistical Models • Parametric and Non-Parametric • Mathematical Model (Optimization) • Machine Learning • Data Mining • Deep Learning • Reinforcement Learning • Graph Mining • NLP • Optimization • Genetic Algorithm •Association •Basket Association •Apriori Algorithm •Supervised •Classification •Regression •Unsupervised •Clustering/Customer Segmentation •Reinforcement •Learn a policy (interactively) •Game Playing •Robot in …
May 21
Possible Data Analytics Project Goals
• Examine relations • Test Hypothesis • Validate • Find groups/classes/rules • Learn a policy • Maximize Reward interactively • Predict (Class or Value) • Forecast (numeric, sales) • Compare • Classify • Cluster Data Analytics, Machine Learning, Data Science