Model Selection for Your Project

Potential Models

• Statistical Models

• Parametric and Non-Parametric

• Mathematical Model (Optimization)

• Machine Learning

• Data Mining

• Deep Learning

• Reinforcement Learning

• Graph Mining

• NLP

• Optimization

• Genetic Algorithm

• Association
  • Basket Association
  • Apriori Algorithm

• Supervised
  • Classification
  • Regression

• Unsupervised
  • Clustering/Customer Segmentation

• Reinforcement
  • Learn a policy (interactively)
  • Game Playing
  • Robot in a Maze

• Genetic
  • Optimization
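
As an illustration of the association (basket) item above, here is a minimal sketch of the first two passes of an Apriori-style search: count single items, keep the frequent ones, then count only pairs built from those frequent items. The function name frequent_itemsets, the min_support default, and the sample baskets are assumptions made for this example; a full Apriori implementation would continue to larger itemsets and derive rules.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support=0.5):
    """Return {itemset_tuple: support} for frequent single items and pairs."""
    n = len(transactions)
    # Pass 1: support of each single item (count an item once per basket).
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}

    # Pass 2: only pairs made of frequent items can themselves be frequent.
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    result = {(i,): item_counts[i] / n for i in frequent_items}
    result.update({p: c / n for p, c in pair_counts.items() if c / n >= min_support})
    return result


# Hypothetical baskets, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
itemsets = frequent_itemsets(baskets, min_support=0.5)
```

Pairs are stored as alphabetically sorted tuples, so ("bread", "milk") and ("milk", "bread") are counted as the same itemset.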

Data Analytics, Machine Learning, Data Science

Possible Data Analytics Project Goals

• Examine relations

• Test Hypothesis

• Validate

• Find groups/classes/rules

• Learn a policy

• Maximize Reward interactively

• Predict (Class or Value)

• Forecast (numeric, sales)

• Compare

• Classify

• Cluster


Experimental Design Examples (Data Analytics Projects)


Evaluating Your Data Analytics Project Outcome

Regression Projects

• R-squared (goodness of fit)

• RMSE

Classification Projects

• Confusion Matrix

• ROC curve

• Accuracy, Recall, Precision

Reinforcement Learning (RL) Projects

• Cumulative reward
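
The evaluation measures listed above can be sketched in plain Python as follows. This is a minimal illustration, not a replacement for a statistics or ML library; the function names, the binary "positive" label convention, and the optional discount factor gamma are assumptions made for this example. ROC analysis is omitted because it needs predicted scores rather than hard labels.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r_squared, rmse) for paired lists of numbers."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


def classification_metrics(y_true, y_pred, positive=1):
    """Return confusion-matrix counts plus accuracy, precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def cumulative_reward(rewards, gamma=1.0):
    """Sum of rewards over one episode, optionally discounted by gamma."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))
```

For example, regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]) gives an R-squared of 0.8 and an RMSE of 0.5.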


Dimensionality Reduction

Some Approaches

• Feature Selection

• Feature Extraction
  • PCA
  • SVD
  • LDA
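
To make feature extraction concrete, the following is a minimal sketch of PCA reduced to its first principal component, found by power iteration on the sample covariance matrix. The function name, the fixed starting vector, and the iteration count are assumptions made for this example; in practice a library routine based on SVD or an eigendecomposition would normally be used.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores): the top principal axis and each row projected onto it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Power iteration: repeatedly multiply by cov and renormalize.
    # Caveat: this fails if the start vector is exactly orthogonal to the top axis.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]

    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical 2-D data lying on a straight line: one component captures everything.
direction, scores = first_principal_component(
    [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
)
```

The scores list is the one-dimensional representation of the original two-dimensional rows.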


Initial Analysis of Text and Image Data (Data Analytics and ML Projects)

Initial Analysis of Text Data

• Stop word filter

• Lemmatization

• Part-of-speech (POS) tagging

• Vocabulary Analysis
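
A minimal sketch of the stop word filter and vocabulary analysis steps, using only the standard library. The tiny STOP_WORDS set, the regular expression used for tokenizing, and the function names are assumptions made for this example. Lemmatization and POS tagging are left out because they normally rely on an NLP library.

```python
import re
from collections import Counter

# Tiny illustrative stop word list; real projects use a much longer one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in stop_words]


def vocabulary_stats(tokens):
    """Summarize token count, vocabulary size and the most frequent tokens."""
    counts = Counter(tokens)
    return {
        "n_tokens": len(tokens),
        "vocab_size": len(counts),
        "top": counts.most_common(3),
    }


tokens = remove_stop_words(tokenize("The cat sat on the mat and the cat slept."))
stats = vocabulary_stats(tokens)
```

Here the ten raw tokens shrink to five after filtering, with "cat" as the most frequent remaining word.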

Image Data: Initial Analysis

• Fix image size and aspect ratio

• Image Scaling

• Convert to grayscale

• Standardize
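
The image steps above can be sketched on a plain nested list of (R, G, B) tuples. This is a minimal illustration under stated assumptions: the function names are hypothetical, grayscale uses the common 0.299/0.587/0.114 luma weights, resizing is nearest-neighbour, and standardizing means zero mean and unit variance over all pixels. An image library would normally handle these steps.

```python
def to_grayscale(image):
    """Convert rows of (r, g, b) tuples to rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]


def resize_nearest(gray, new_h, new_w):
    """Resize a 2-D grid to new_h x new_w by nearest-neighbour sampling."""
    old_h, old_w = len(gray), len(gray[0])
    return [[gray[i * old_h // new_h][j * old_w // new_w] for j in range(new_w)]
            for i in range(new_h)]


def standardize(gray):
    """Shift and scale pixel values to zero mean and unit variance."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    std = (sum((p - mean) ** 2 for p in pixels) / len(pixels)) ** 0.5 or 1.0
    return [[(p - mean) / std for p in row] for row in gray]


# Hypothetical 2x2 checkerboard image, for illustration only.
image = [[(255, 255, 255), (0, 0, 0)],
         [(0, 0, 0), (255, 255, 255)]]
gray = to_grayscale(image)
resized = resize_nearest(gray, 4, 4)
normalized = standardize(resized)
```

Applying the same target size and the same standardization to every image keeps the inputs comparable across the dataset.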


Data Requirements for Data Analytics Projects

Data

• Dataset characteristics
  • Large-scale, real, representative, relevant features, balanced classes, relevant units

• Adapting the data/dataset for the project
  • Clean, normalize/standardize, collect more data, and collect more data of any missing type

• Data suitability for the project
  • Check the R-squared measure
  • Check for bias and variance
  • Do initial and exploratory analysis
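
Two of the checks above, normalizing/standardizing a feature and checking class balance, can be sketched as follows. The function names are assumptions made for this example, and the z-score uses the population standard deviation.

```python
from collections import Counter


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


def z_score_standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5 or 1.0
    return [(v - mean) / std for v in values]


def class_balance(labels):
    """Return the share of each class label, to spot imbalance."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}
```

For example, class_balance(["a", "a", "a", "b"]) returns {"a": 0.75, "b": 0.25}, which suggests collecting more data for class "b".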


Initial and Exploratory Analysis for Data Analytics Projects

Goal: to gain a thorough understanding of the data.

Two Types:

• Initial Analysis

• Exploratory Analysis

Initial Analysis:

  • Univariate
  • Bivariate
  • Multivariate

Univariate Analysis

• Deciding/Determining the dependent (target) variable

• Assigning the correct data types, appropriate column names

• Addressing inconsistencies, missing values, and outliers

• Addressing categorical variables with too many levels

• Understanding the distributions of the variables (are they a good fit for the project?)

• Checking for imbalance in the dependent variable

• Time variables

• Univariate visualizations

• A detailed data dictionary

• Low variance filter
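
Several of the univariate checks above can be sketched with the standard library: counting missing values, flagging outliers with the 1.5 x IQR rule, and applying a low variance filter. The function names, the use of None for missing values, and the thresholds are assumptions made for this example.

```python
import statistics


def missing_count(column):
    """Count missing entries, assuming missing values are stored as None."""
    return sum(1 for v in column if v is None)


def iqr_outliers(column, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(column, n=4)
    iqr = q3 - q1
    return [v for v in column if v < q1 - k * iqr or v > q3 + k * iqr]


def low_variance_columns(table, threshold=0.01):
    """Return names of columns whose population variance is below threshold."""
    return [name for name, col in table.items()
            if statistics.pvariance(col) < threshold]


# Hypothetical columns, for illustration only.
outliers = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])
to_drop = low_variance_columns({"const": [5, 5, 5, 5], "x": [1, 2, 3, 4]})
```

Columns returned by low_variance_columns carry almost no information and are candidates for removal before modeling.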

Bivariate Analysis

• Pairwise relations

• Pairwise visualizations

• Correlation analysis
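
For the correlation analysis step, a minimal sketch of the Pearson correlation coefficient and a pairwise correlation table follows. The function names and the dict-of-columns layout are assumptions made for this example.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(table):
    """Return {(col_a, col_b): r} for every pair of columns in a dict of lists."""
    return {(a, b): pearson(table[a], table[b])
            for a, b in combinations(table, 2)}


# Hypothetical columns, for illustration only.
corr = pairwise_correlations({"x": [1, 2, 3], "double_x": [2, 4, 6], "rev_x": [3, 2, 1]})
```

Values near +1 or -1 indicate a strong linear relation between a pair; values near 0 indicate little linear relation. A constant column has zero variance and would need to be removed first.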

Multivariate Analysis

• Multivariate relations

• Statistical tools

Exploratory Analysis

• Normalizing

• Subsetting the data

• Clustering

Others

• Decision rules, association rules, n-grams

• Time series analysis


Inductive and Deductive Methods for Data Analytics Projects


Deductive: Top-down approach. Start from an existing theory and test it against the data.

Inductive: Bottom-up approach. Observe the data and derive a hypothesis from it.

Examples in Data Analytics:

  • Inductive: Analyzing customer purchase data to identify recurring patterns in buying habits, which can lead to new marketing strategies or product recommendations.
  • Deductive: Testing a marketing hypothesis about the effectiveness of a new ad campaign by comparing its performance against a control group. 

Ref: Internet/Google AI

In a data analytics project, you may use both approaches. Initially, you may use an inductive approach to explore and understand the data. Then you may use the deductive method to test a specific hypothesis.
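
The deductive example (testing an ad campaign against a control group) can be sketched as a two-proportion z-test. This is a minimal illustration under stated assumptions: the function name, the conversion counts, and the choice of a two-sided test with a pooled standard error are made up for this example and are not taken from a real campaign.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p_value) for H0: both groups convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical counts: 120 of 1000 conversions with the new ad, 90 of 1000 in the control group.
z, p_value = two_proportion_z_test(120, 1000, 90, 1000)
```

A small p-value (for instance below 0.05) would count as evidence against the hypothesis that the campaign and the control group perform the same.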


Data Manipulation for ML and Data Analytics Projects

• MySQL Data Manipulation:

https://www.databasejournal.com/mysql/mysql-data-manipulation-and-query-statements/

https://www.w3schools.com/sql/

https://www.tutorialspoint.com/sql/index.htm

• MySQL Workbench:

https://www.tutorialspoint.com/create-a-new-database-with-mysql-workbench

• SQL Server Data Manipulation:

https://www.tutorialspoint.com/ms_sql_server/index.htm

• SQL Server Management Studio:

https://www.tutorialspoint.com/ms_sql_server/ms_sql_server_management_studio.htm

• Power BI Data Manipulation:

https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-tutorial-importing-and-analyzing-data-from-a-web-page

• Data Manipulation in Python:

https://www.analyticsvidhya.com/blog/2021/06/data-manipulation-using-pandas-essential-functionalities-of-pandas-you-need-to-know/
