Data Analytics Project: Problem Framing and Project Lifecycle

REF: Internet and Gregory S. Nelson, The Analytics Lifecycle Toolkit: A Practical Guide for an Effective Analytics Capability, John Wiley & Sons, © 2018. Chapter 6 – Problem Framing

Data Analytics, Machine Learning, Data Science

Model Selection

• Optimizations/Machine Learning/Data Mining/Deep Learning/Reinforcement Learning/Graph Mining/NLP/Genetic Algorithms

• Regression

• Linear

• Non-Linear

• Classifications

• Logistic Regression

• Sigmoid: Binary

• Softmax: Multi-Class

• Bayes Classifier

• SVM

• Bayesian: Regression/Classification

• Clustering

• K-NN

• KNN+

• K-means, Hierarchical, Density-based

• Machine Learning/Data Mining/Deep Learning/Reinforcement Learning/Graph Mining/NLP

• Time Series Analysis

• Decision (Regression, Classification) Trees

• Univariate

• Multivariate

• Random Forest

• Reinforcement Learning

• Q-Learning

• Monte Carlo

• Deep Learning (Know variations, find a fit)

• MLP

• LSTM

• RNN

• Ensemble Methods

• Multiple Learners Together

Ref: Internet, Demir Slides
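To make the sigmoid (binary) versus softmax (multi-class) distinction in the logistic regression bullets concrete, here is a minimal sketch in plain Python. The function names and the example scores are illustrative assumptions, not taken from the slides. In a real project the scores would come from a fitted model.

```python
import math


def sigmoid(z: float) -> float:
    """Map one real-valued score to a probability in (0, 1) (binary case)."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def softmax(scores: list[float]) -> list[float]:
    """Map one score per class to probabilities that sum to 1 (multi-class case)."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores, for illustration only.
binary_prob = sigmoid(0.0)              # a score of 0 sits on the decision boundary
class_probs = softmax([2.0, 1.0, 0.1])  # three classes
predicted_class = class_probs.index(max(class_probs))
```

With two classes, softmax over the scores (z, 0) gives the same probability as sigmoid(z), which is why the binary case is usually written with a single sigmoid output.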

Model Selection for your Project

Potential Models

• Statistical Models

• Parametric and Non-Parametric

• Mathematical Model (Optimization)

• Machine Learning

• Data Mining

• Deep Learning

• Reinforcement Learning

• Graph Mining

• NLP

• Optimization

• Genetic Algorithm

• Association

• Basket Association

• Apriori Algorithm

• Supervised

• Classification

• Regression

• Unsupervised

• Clustering/Customer Segmentation

• Reinforcement

• Learn a policy (interactively)

• Game Playing

• Robot in a Maze

• Genetic

• Optimization
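The Apriori bullet under basket association can be illustrated with a small level-wise search for frequent itemsets. This is a simplified sketch, not a tuned implementation: the function name, the basket data, and the 0.6 support threshold are all assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets with support >= min_support.

    min_support is a fraction of all transactions, between 0 and 1.
    Returns a dict mapping each frequent itemset (a frozenset) to its support.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = sorted({item for b in baskets for item in b})
    current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]

    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Candidates of size k are unions of two frequent (k-1)-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        previous = set(current)
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        }
        current = [c for c in candidates if support(c) >= min_support]
    return result


# Hypothetical shopping baskets, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
freq = frequent_itemsets(baskets, min_support=0.6)
```

The pruning step is what makes the approach Apriori-like: an itemset is only counted if all of its smaller subsets were already found to be frequent.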

Possible Data Analytics Project Goals

• Examine relations

• Test Hypothesis

• Validate

• Find groups/classes/rules

• Learn a policy

• Maximize Reward interactively

• Predict (Class or Value)

• Forecast (numeric, sales)

• Compare

• Classify

• Cluster

Experimental Design Examples (Data Analytics Projects)

Evaluating Your Data Analytics Project Outcome

Regression Projects

• R-squared (goodness of fit)

• RMSE

Classification Projects

• Confusion Matrix

• ROC

• Accuracy, Recall, Precision

Reinforcement Learning (RL) Projects

• Cumulative reward
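The evaluation measures listed above can be computed by hand on small examples, which helps when checking what a library reports. The sketch below uses plain Python. All function names and sample values are illustrative assumptions, and ROC is left out because it needs predicted scores rather than hard labels.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error for a regression model."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def cumulative_reward(rewards, gamma=1.0):
    """Sum of rewards over one episode, optionally discounted by gamma per step."""
    return sum((gamma ** step) * r for step, r in enumerate(rewards))


# Hypothetical values, for illustration only.
fit = r_squared([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
err = rmse([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])

tp, fp, fn, tn = confusion_counts([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

episode_return = cumulative_reward([1.0, 0.0, 2.0], gamma=0.5)
```

Accuracy alone can mislead when classes are imbalanced, which is why precision and recall are listed alongside it.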

Dimensionality Reduction

Some Approaches

• Feature Selection

• Feature Extraction

• PCA

• SVD

• LDA
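As one example of feature extraction, the first principal component in PCA can be found by power iteration on the covariance matrix. The sketch below uses only the standard library and recovers a single component. The function name and data are assumptions for illustration, and in practice a numerical library would be used instead.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the top principal component.

    rows: list of equal-length numeric lists, one per observation.
    Power iteration is used, so the starting vector must not be exactly
    orthogonal to the top component. This is assumed, not checked.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data
        v = [x / norm for x in w]

    # Project each centered observation onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical data lying exactly on the line y = 2x.
direction, scores = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
```

Here the two original features collapse into one score per observation without losing information, because the points lie on a single line.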

Initial Analysis of Text and Image Data (Data Analytics and ML Projects)

Initial Analysis of Text Data

• Stop word filter

• Lemmatization

• Part-of-speech (POS) tagging

• Vocabulary Analysis
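A stop word filter and a basic vocabulary analysis can be sketched with the standard library alone. The stop word list, function names, and sample sentences below are illustrative assumptions. Lemmatization and POS tagging need an NLP library and are not shown.

```python
import re
from collections import Counter

# A tiny illustrative stop word list. Real projects use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_profile(documents, stop_words=STOP_WORDS):
    """Count tokens and distinct words after removing stop words."""
    counts = Counter()
    total_tokens = 0
    for doc in documents:
        tokens = [t for t in tokenize(doc) if t not in stop_words]
        counts.update(tokens)
        total_tokens += len(tokens)
    return {
        "tokens": total_tokens,
        "vocabulary_size": len(counts),
        "most_common": counts.most_common(3),
    }


# Hypothetical documents, for illustration only.
docs = ["The cat sat on the mat.", "A cat and a dog sat in the garden."]
profile = vocabulary_profile(docs)
```

Comparing the token count with the vocabulary size gives a quick sense of how repetitive the corpus is before any modeling starts.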

Image Data: Initial Analysis

• Fix image size, ratios

• Image Scaling

• Convert to grayscale

• Standardize
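The image steps above (fixed size, grayscale, standardization) can be sketched on a toy image stored as nested lists. An image library would be used in practice. The function names, the nearest-neighbor resizing, and the luminance weights 0.299/0.587/0.114 (a common convention) are choices made for this example.

```python
def to_gray(image):
    """Convert rows of (r, g, b) tuples (0-255) to rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]


def resize_nearest(gray, new_h, new_w):
    """Resize to a fixed height and width by nearest-neighbor sampling."""
    old_h, old_w = len(gray), len(gray[0])
    return [
        [gray[i * old_h // new_h][j * old_w // new_w] for j in range(new_w)]
        for i in range(new_h)
    ]


def standardize(gray):
    """Shift and scale pixels to zero mean and unit standard deviation."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    std = (sum((p - mean) ** 2 for p in pixels) / len(pixels)) ** 0.5
    if std == 0.0:
        return [[0.0 for _ in row] for row in gray]  # constant image
    return [[(p - mean) / std for p in row] for row in gray]


# Hypothetical 2 x 4 RGB image, for illustration only.
image = [
    [(255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 0, 255)],
    [(0, 255, 0), (128, 128, 128), (0, 0, 0), (255, 255, 255)],
]
gray = to_gray(image)
small = resize_nearest(gray, 2, 2)
ready = standardize(small)
```

Applying the same size and the same standardization to every image keeps the inputs comparable across the dataset.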

Data Requirements for Data Analytics Projects

Data

• Dataset Characteristics

• Large scale, real, representative, relevant features, balanced classes, unit relevant

• Adapting data/dataset for the project

• Clean, normalize/standardize, collect more data, and collect more data of the underrepresented type

• Data Suitability for the project

• Check the R-squared measure

• Check for bias and variance

• Do Exploratory Analysis

• Initial and Exploratory Analysis
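Two of the adaptation steps above, normalizing/standardizing and checking class balance, can be sketched as follows. The function names and sample values are illustrative assumptions.

```python
import statistics
from collections import Counter


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # constant column
    return [(v - lo) / (hi - lo) for v in values]


def z_score_standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def class_shares(labels):
    """Fraction of rows per class, a quick check for imbalance."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


# Hypothetical values, for illustration only.
normalized = min_max_normalize([10, 20, 30])
standardized = z_score_standardize([10, 20, 30])
shares = class_shares(["a", "a", "a", "b"])
```

If one class holds most of the rows, that is the signal to collect more data of the underrepresented type.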

Initial and Exploratory Analysis for Data Analytics Projects

Goal: to gain a thorough understanding of the data.

Two Types:

• Initial Analysis

• Exploratory Analysis

Initial Analysis:

• Univariate

• Bivariate

• Multivariate

Univariate Analysis

• Deciding/Determining the dependent (target) variable

• Assigning the correct data types, appropriate column names

• Address: Inconsistencies, missing values, outliers

• Categorical variables with too many levels (address the issue)

• Understand the distributions of the variables (are they a good fit for the project?)

• Imbalance in the dependent variable

• Time variables

• Univariate visualizations

• A detailed data dictionary

• Low variance filter
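Several of the univariate checks above (missing values, outliers, low variance filter) can be sketched per column. The 1.5 x IQR rule for outliers and the variance threshold are common conventions chosen for this example, not something the slides prescribe. Column names and data are hypothetical.

```python
import statistics


def missing_count(values):
    """Number of missing entries, with None standing in for a missing value."""
    return sum(1 for v in values if v is None)


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], ignoring missing entries."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]


def low_variance_filter(columns, threshold=0.0):
    """Keep only columns whose population variance exceeds the threshold."""
    kept = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if statistics.pvariance(present) > threshold:
            kept[name] = values
    return kept


# Hypothetical columns, for illustration only.
columns = {
    "age": [10, 12, None, 11, 13, 12, 95],
    "constant_flag": [1, 1, 1, 1, 1, 1, 1],
}
n_missing = missing_count(columns["age"])
outliers = iqr_outliers(columns["age"])
kept = low_variance_filter(columns)
```

A flagged value is only a candidate for review. Whether 95 is an error or a real observation depends on what the column measures.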

Bivariate Analysis

• Pairwise relations

• Pairwise visualizations

• Correlation analysis
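For the correlation analysis step, the Pearson coefficient between each pair of numeric columns can be computed directly. This is a minimal sketch. The function names and sample columns are assumptions, and the coefficient only captures linear relations.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of named numeric columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical columns, for illustration only.
data = {
    "hours": [1, 2, 3, 4],
    "score": [2, 4, 6, 8],
    "errors": [8, 6, 4, 2],
}
corr = correlation_matrix(data)
```

Pairs with a coefficient close to 1 or -1 carry nearly the same information, which is useful to know before feature selection.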

Multivariate Analysis

• Multivariate relations

• Statistical tools

Exploratory Analysis

• Normalizing

• Subsetting the data

• Clustering

Others

• Decision rules, association rules, n-grams

• Time series analysis
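Two of the exploratory tools listed under "Others" are simple enough to sketch directly: n-gram counts for text and a moving average for a time series. The function names, sample sentence, and window size are illustrative assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def moving_average(series, window):
    """Simple moving average. The result has len(series) - window + 1 points."""
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Hypothetical inputs, for illustration only.
bigram_counts = Counter(ngrams("to be or not to be".split(), 2))
smoothed = moving_average([1, 2, 3, 4, 5], window=3)
```

The moving average smooths short-term noise so that a trend is easier to see, at the cost of losing the first window - 1 points.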