Category: Analytics and Machine Learning Project Development

Pearson vs Spearman Correlation

Pearson: Measures the strength of a linear relationship and assumes linearity. Example: the correlation between height and weight. Sensitive to outliers.

Spearman: Measures a monotonic relationship, one that consistently increases or decreases but need not be linear. Example: higher marks lead to a lower (better) rank number, but generally not linearly. Does not assume linearity. It works on ranks, so it suits ordinal (ranked) variables, and it is less sensitive to outliers.

Ref: Internet sources
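The contrast between the two coefficients can be sketched in plain Python. This is a minimal illustration, not a library implementation: the helper names (`pearson`, `ranks`, `spearman`) and the sample data are made up for this example. Spearman is computed here as Pearson applied to the ranks of the data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Illustrative data (invented): monotonic but not linear.
x = [1, 2, 3, 4, 5, 6]
y_cubic = [v ** 3 for v in x]

# Illustrative data (invented): linear except for one outlier.
y_outlier = [2, 4, 6, 8, 10, 120]

print("cubic:   pearson =", round(pearson(x, y_cubic), 3),
      " spearman =", round(spearman(x, y_cubic), 3))
print("outlier: pearson =", round(pearson(x, y_outlier), 3),
      " spearman =", round(spearman(x, y_outlier), 3))
```

In both cases the ordering of the points is perfectly preserved, so Spearman stays at 1, while Pearson drops because the relationship is not a straight line or is distorted by the outlier.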

Reporting (Results and Discussion) for your Data Analytics Projects

Evaluation, Results, Analysis, Reporting

Evaluation: What and How
• Evaluate the accuracy and generality of the model (done earlier under model evaluation and threats to validity).
• Now evaluate whether the model meets the business objectives.
• Determine whether there are business reasons why the model is deficient.
• Evaluation: take the model and apply it to a real-world case …


Examples: Experiment Design

Experiment 1: Forecast which nations will have the most suicides.
• Data:
• Output variables:
• Method/Algorithm for this experiment:

Experiment 2: Find the association of GDP and population size with suicide rates.
• Data:
• Output variables:
• Method/Algorithm for this experiment:

Experiment 3: Predict which age groups are most prone to commit suicide.
• Data:
• Output variables:
• Method/Algorithm …


Tools and Tutorials for Data Manipulation

Join Data from Multiple Sources
• Power BI
• Python
• SQL
• Databases and Data Warehouse: https://durhamcollege.desire2learn.com/d2l/le/content/467097/viewContent/6376898/View
• Data Modeling and SQL: https://durhamcollege.desire2learn.com/d2l/le/content/467097/viewContent/6376900/View
• Microsoft Power BI: https://durhamcollege.desire2learn.com/d2l/le/content/467097/viewContent/6377023/View

Tutorials and Examples
• MySQL Data Manipulation:
  https://www.databasejournal.com/mysql/mysql-data-manipulation-and-query-statements/
  https://www.w3schools.com/sql/
  https://www.tutorialspoint.com/sql/index.htm
  Workbench: https://www.tutorialspoint.com/create-a-new-database-with-mysql-workbench
• SQL Server Data Manipulation:
  https://www.tutorialspoint.com/ms_sql_server/index.htm
  Management Studio: https://www.tutorialspoint.com/ms_sql_server/ms_sql_server_management_studio.htm
• Power BI Data Manipulation:
  https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-tutorial-importing-and-analyzing-data-from-a-web-page
• Data Manipulation in Python:
  https://www.analyticsvidhya.com/blog/2021/06/data-manipulation-using-pandas-essential-functionalities-of-pandas-you-need-to-know/

Data Analytics, Machine …
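As a small illustration of joining data from two sources with SQL driven from Python, the sketch below uses the standard-library `sqlite3` module and an in-memory database. The table names, columns, and rows are invented for the example; a real project would connect to MySQL, SQL Server, or a data warehouse instead.

```python
import sqlite3

# In-memory database so the example is self-contained.
conn = sqlite3.connect(":memory:")

# Two hypothetical source tables: customers and their orders.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0);
""")

# LEFT JOIN keeps customers with no orders; COALESCE turns their NULL total into 0.
rows = conn.execute("""
SELECT c.name, COUNT(o.id), COALESCE(SUM(o.amount), 0)
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id
ORDER BY c.id
""").fetchall()

for name, order_count, total in rows:
    print(name, order_count, total)

conn.close()
```

A LEFT JOIN is used rather than an INNER JOIN so that customers without any orders still appear in the result, which is usually what an analyst wants when checking coverage across sources.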


Threat To Validity for Your Data Analytics Projects

• Internal
• External
• Construct
• Statistical Conclusion

• Internal: An informative variable is missing. Bring in data from other sources.
• External: A fixation variable makes the result look perfect, so the model may not generalize.
• Construct: Class imbalance badly affects the outcome.
• Statistical Conclusion: Depending on the statistical measure used, the conclusion can be incorrect.
• Data Mining: Association: Support, Confidence, and Lift

Internal Validity
Is your …
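The three association measures named in the list above (support, confidence, and lift) can be sketched as follows. The transactions and function names are invented for illustration; this uses the standard textbook definitions rather than any particular library.

```python
# Hypothetical market-basket transactions (invented for this sketch).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent): support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence relative to the baseline frequency of the consequent."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)

print("support(bread, milk)     =", rule_support)
print("confidence(bread -> milk) =", rule_confidence)
print("lift(bread -> milk)       =", rule_lift)
```

A lift below 1, as in this toy data, means the consequent is slightly less likely when the antecedent is present than it is overall. A rule with high confidence can therefore still be uninformative, which is one way the choice of measure affects the conclusion.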



McNemar’s Test

Chi-Square, McNemar’s Test

Chi-square: “A chi-square test is used to help determine if observed results are in line with expected results, and to rule out that observations are due to chance.” A coin flip serves as an example [1].

References:
[1]. https://www.investopedia.com/terms/c/chi-square-statistic.asp

Data Analytics, Machine Learning, Data Science
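The coin-flip example can be worked through with a short sketch. All counts below are hypothetical, and the function names are made up for this illustration. The chi-square statistic compares observed counts with expected counts. McNemar’s statistic is shown in its uncorrected form (no continuity correction), applied to the two discordant cells of a paired 2x2 table. With one degree of freedom, the p-value can be obtained from the complementary error function.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

def mcnemar_statistic(b, c):
    """Uncorrected McNemar statistic from the two discordant counts b and c."""
    return (b - c) ** 2 / (b + c)

# Hypothetical coin: 100 flips, 60 heads and 40 tails; a fair coin expects 50/50.
coin_stat = chi_square_statistic([60, 40], [50, 50])
coin_p = p_value_df1(coin_stat)

# Hypothetical paired comparison of two classifiers on the same test cases:
# b = cases only the first got right, c = cases only the second got right.
mcnemar_stat = mcnemar_statistic(15, 5)
mcnemar_p = p_value_df1(mcnemar_stat)

print("coin flip: chi2 =", coin_stat, " p =", round(coin_p, 4))
print("McNemar:   chi2 =", mcnemar_stat, " p =", round(mcnemar_p, 4))
```

Both statistics are compared against the same chi-square distribution with one degree of freedom, whose 5% critical value is about 3.84.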

Statistics for Data Analytics and Machine Learning Projects

• Null Hypothesis [2]
• Paired t-test
• Unpaired t-test
• Pearson Correlation
• One-Way Analysis of Variance
• Spearman Correlation
• Kendall’s Tau Coefficient
• Wilcoxon Rank-Sum Test
• Basic EDA
• McNemar’s Test
• Friedman Test
• Kruskal-Wallis Test
• Two-Way Analysis of Variance
• K-Fold Cross-Validation Paired t-test
• Wilcoxon Signed-Rank Test

Data Analytics, Machine Learning, Data Science
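As one example from the list above, the paired t-test on k-fold cross-validation scores can be sketched with the standard library. The per-fold accuracies are invented for illustration, and `paired_t_statistic` is a hypothetical helper name. The statistic is the mean of the per-fold differences divided by its standard error.

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """t statistic for paired samples: mean(d) / (stdev(d) / sqrt(n))."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # sample stdev (n - 1)
    return mean_diff / std_err, n - 1

# Hypothetical accuracies of two models on the same 5 folds (invented numbers).
model_a = [0.81, 0.79, 0.84, 0.80, 0.83]
model_b = [0.78, 0.77, 0.80, 0.79, 0.80]

t_stat, dof = paired_t_statistic(model_a, model_b)

# Two-sided 5% critical value of the t distribution with 4 degrees of freedom.
T_CRITICAL_DF4 = 2.776

print("t =", round(t_stat, 3), " degrees of freedom =", dof)
print("reject null hypothesis of equal mean accuracy:", abs(t_stat) > T_CRITICAL_DF4)
```

The test is paired because both models are scored on the same folds, so each difference removes fold-to-fold variation. Note that the folds share training data, so the differences are not fully independent and the result should be read with some caution.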

Make Sense of your Data: For Data Analytics Project

Hypothesis-based versus data-driven analysis

“Only those data analysts who are given time to explore and analyze data thoughtfully and thoroughly are consistently successful.”

Data Identification and Prioritization
Use Augmented data besides
Data Pipeline
Analytics Sandbox
Characterizing the Data: Exploring a Single Variable
Data: Descriptive analysis options
Find: Distribution of quantitative variables

Reference: [1]. Gregory S. Nelson. …


Factors/Variables to Consider For Experimental Design for Data Analytics Projects

Design of experiments fishbone

Reference: [1]. Gregory S. Nelson. The Analytics Lifecycle Toolkit: A Practical Guide for an Effective Analytics Capability, John Wiley & Sons, 2018. Chapter 6: Problem Framing.

Data Analytics, Machine Learning, Data Science