Data Analytics, Machine Learning, Data Science
May 21
Regression Projects
• R-squared (R²), goodness of fit
• RMSE
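As a minimal sketch of the two regression metrics listed above, the following computes RMSE and R-squared by hand. The function names and the sample values are illustrative assumptions, not part of the original notes.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed values and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

print(rmse(y_true, y_pred))       # lower is better, same unit as the target
print(r_squared(y_true, y_pred))  # closer to 1 means a better fit
```

RMSE is reported in the unit of the target variable, while R-squared is unitless, so the two are usually read together.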
Classification Projects
• Confusion Matrix
• ROC
• Accuracy, Recall, Precision
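The classification metrics above all derive from the four cells of a binary confusion matrix. The sketch below assumes binary labels with 1 as the positive class; the helper names and sample labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    # Share of predicted positives that are truly positive.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Share of true positives that the model found.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical ground truth and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)
print(accuracy(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))
```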
Reinforcement Learning (RL) Projects
• Cumulative reward
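One common way to compute a cumulative reward is the discounted return over an episode. The sketch below assumes a discount factor `gamma`, which the notes do not mention; with `gamma=1.0` it reduces to a plain sum of rewards.

```python
def cumulative_reward(rewards, gamma=1.0):
    """Discounted return: r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    total = 0.0
    # Work backwards so each step needs only one multiply and one add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# Hypothetical rewards collected over one episode.
episode = [1.0, 0.0, 2.0]

print(cumulative_reward(episode))             # undiscounted sum
print(cumulative_reward(episode, gamma=0.5))  # later rewards count less
```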
May 21
Initial Analysis of Text Data
• Stop-word filtering
• Lemmatization
• Part-of-speech (POS) tagging
• Vocabulary Analysis
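Two of the steps above, stop-word filtering and vocabulary analysis, can be sketched with the standard library alone. The stop-word list here is a tiny hypothetical one, and lemmatization and POS tagging are left out because they normally need an NLP library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; real projects use a much larger one.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in", "on"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    return [t for t in tokens if t not in stop_words]

def vocabulary_stats(tokens):
    """Token count, vocabulary size, and the most frequent tokens."""
    counts = Counter(tokens)
    return {
        "n_tokens": len(tokens),
        "vocab_size": len(counts),
        "top": counts.most_common(3),
    }

text = "The cat sat on the mat and the dog sat on the rug."
tokens = remove_stop_words(tokenize(text))
stats = vocabulary_stats(tokens)
print(tokens)
print(stats)
```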
Image Data: Initial Analysis
• Fix image size and aspect ratio
• Image scaling
• Convert to grayscale
• Standardize
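The image steps above can be illustrated on a tiny image stored as nested lists of (R, G, B) tuples. This is a pure-Python sketch to show the arithmetic only; in practice an imaging library would do this work. The function names are hypothetical, the grayscale weights are the common BT.601 luminance weights, and the resize is a simple nearest-neighbor choice made for illustration.

```python
def to_gray(image):
    """Convert rows of (R, G, B) pixels to luminance using BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in image]

def resize_nearest(gray, new_h, new_w):
    """Nearest-neighbor resize to a fixed height and width."""
    h, w = len(gray), len(gray[0])
    return [[gray[i * h // new_h][j * w // new_w] for j in range(new_w)]
            for i in range(new_h)]

def scale_01(gray):
    """Scale 0..255 intensities into the range 0..1."""
    return [[v / 255.0 for v in row] for row in gray]

def standardize(gray):
    """Shift to zero mean and scale to unit variance."""
    values = [v for row in gray for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0  # avoid dividing by zero for a constant image
    return [[(v - mean) / std for v in row] for row in gray]

# Hypothetical 2 x 4 RGB image.
image = [
    [(255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (128, 128, 128), (255, 255, 0), (0, 0, 0)],
]

gray = to_gray(image)
small = resize_nearest(gray, 2, 2)
prepared = standardize(scale_01(small))
print(prepared)
```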
May 21
Data
• Dataset characteristics: large scale, real, representative, relevant features, balanced classes, unit relevant
• Adapting the data/dataset for the project: clean, normalize/standardize, bring in more data, especially of the missing type
• Data suitability for the project
• Check the R-squared measure
• Check for bias and variance
• Do exploratory analysis
• Initial and Exploratory Analysis
May 21
Goal: to gain a thorough understanding of the data.
Two types:
• Initial Analysis
• Exploratory Analysis
Initial Analysis:
Univariate Analysis
• Determine the dependent (target) variable
• Assign correct data types and appropriate column names
• Address inconsistencies, missing values, and outliers
• Address categorical variables with too many levels
• Understand the distributions of the variables (are they a good fit for the project?)
• Check for imbalance in the dependent variable
• Time variables
• Univariate visualizations
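A few of the univariate checks above (missing values, outliers, and imbalance in the target) can be sketched as follows. The column values are hypothetical, and the 1.5 × IQR rule is one common outlier convention assumed here, not something the notes prescribe.

```python
import statistics
from collections import Counter

def count_missing(values):
    """Number of entries recorded as None."""
    return sum(1 for v in values if v is None)

def iqr_outliers(values):
    """Values outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR], ignoring missing entries."""
    data = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in data if v < low or v > high]

def class_shares(labels):
    """Share of each class in the target variable."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}

# Hypothetical numeric column and target column.
ages = [23, 25, 24, None, 26, 22, 25, 24, 95]
labels = ["no"] * 9 + ["yes"]

print(count_missing(ages))
print(iqr_outliers(ages))
print(class_shares(labels))
```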
Bivariate Analysis
• Pairwise relations
• Pairwise visualizations
• Correlation analysis
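For the correlation analysis step, a minimal sketch of pairwise Pearson correlation over numeric columns might look like this. The column names and values are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Pairwise correlations for a dict of column name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical numeric columns.
columns = {
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 55, 61, 64, 70],
    "errors": [9, 7, 6, 4, 2],
}

matrix = correlation_matrix(columns)
for pair, r in matrix.items():
    print(pair, round(r, 3))
```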
Multivariate Analysis
Exploratory Analysis:
• Subsetting the data
• Clustering
Others
• Decision rules, association rules, n-grams
• Time series analysis
May 20
Inductive and Deductive Methods for Data Analytics Projects
Deductive: Top-down approach. Take existing theories and apply them to the data.
Inductive: Bottom-up approach. Observe the data and derive a hypothesis from it.
Examples in Data Analytics:
Ref: Internet/Google AI
In a data analytics project, you may use both approaches. You may start with an inductive approach to explore and understand the data, then use the deductive method to test a specific hypothesis.
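One way to picture the two-stage workflow: first look at group means to form a hypothesis (inductive), then test that hypothesis (deductive). The sketch below uses a permutation test as the testing step; the data, the group names, and the choice of test are all assumptions made for illustration.

```python
import random
import statistics

def mean_difference(a, b):
    return statistics.mean(a) - statistics.mean(b)

def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean_difference(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean_difference(pooled[:len(a)], pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical measurements for two groups.
group_a = [12.1, 11.8, 12.6, 12.9, 12.4, 12.2]
group_b = [10.9, 11.2, 10.7, 11.5, 11.0, 11.3]

# Inductive step: explore the data and notice that group A looks higher.
observed_gap = mean_difference(group_a, group_b)

# Deductive step: test the hypothesis "the group means differ".
p_value = permutation_p_value(group_a, group_b)
print(observed_gap, p_value)
```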
May 18
• MySQL Data Manipulation:
• https://www.databasejournal.com/mysql/mysql-data-manipulation-and-query-statements/
• https://www.w3schools.com/sql/
• https://www.tutorialspoint.com/sql/index.htm
• Workbench: https://www.tutorialspoint.com/create-a-new-database-with-mysql-workbench
• SQL Server Data Manipulation:
• https://www.tutorialspoint.com/ms_sql_server/index.htm
• Management Studio:
• https://www.tutorialspoint.com/ms_sql_server/ms_sql_server_management_studio.htm
• Power BI Data Manipulation
• Data Manipulation in Python
May 18
Reporting and Analysis
• Examples
• Results section, page 51: STOCK MARKET PREDICTION USING ENSEMBLE OF GRAPH THEORY, MACHINE LEARNING AND DEEP LEARNING MODELS
• https://scholarworks.sjsu.edu/cgi/viewcontent.cgi?article=1692&context=etd_projects

• Check the Results and Discussion sections
• https://arxiv.org/ftp/arxiv/papers/2203/2203.06848.pdf
• A Comparative Study on Forecasting of Retail Sales

• https://arxiv.org/pdf/2303.11633.pdf
• Learning Context-Aware Classifier for Semantic Segmentation
• Check the Results section and the Discussion section: SPEECH INTELLIGIBILITY CLASSIFIERS FROM 550K DISORDERED SPEECH SAMPLES
• https://arxiv.org/pdf/2303.07533.pdf

• Note that results are reported under different criteria, with tables and figures.
• Notice and read the descriptions.
ROC/AUC: Classification: ROC and AUC
https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
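As a rough sketch of what the ROC curve and AUC measure: sweep a decision threshold from high to low, record the false positive rate and true positive rate at each step, and take the area under the resulting curve. The labels and scores below are hypothetical, and the trapezoid rule is one standard way to compute the area.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold over the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical labels (1 = positive) and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
print(points)
print(auc(points))
```

An AUC of 0.5 corresponds to random ranking, and 1.0 to a perfect separation of the two classes.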
F1 Score in Machine Learning: https://www.geeksforgeeks.org/f1-score-in-machine-learning/
F1 = 2 × (Precision × Recall) / (Precision + Recall)
“This formula ensures that both precision and recall must be high for the F1 score to be high. If either one drops significantly, the F1 score will also drop.” (Ref: the GeeksforGeeks page above)
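A small sketch makes the quoted behaviour concrete: the F1 score is the harmonic mean of precision and recall, so one low value pulls it down far more than an arithmetic mean would. The input values are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

balanced = f1_score(0.9, 0.9)  # both high
lopsided = f1_score(0.9, 0.1)  # recall collapses

print(balanced)
print(lopsided)  # well below the arithmetic mean of 0.5
```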
LEAF and CNN:
LEAF: “Leaf: A learnable frontend for audio classification,” ICLR, 2021
ARIMA: Introducing ARIMA models
https://www.ibm.com/think/topics/arima-model
Autoregressive Integrated Moving Average (ARIMA) Prediction Model
https://www.investopedia.com/terms/a/autoregressive-integrated-moving-average-arima.asp
“What Is an Autoregressive Integrated Moving Average (ARIMA)?
An autoregressive integrated moving average, or ARIMA, is a statistical analysis model that uses time series data to either better understand the data set or to predict future trends.
A statistical model is autoregressive if it predicts future values based on past values. For example, an ARIMA model might seek to predict a stock’s future prices based on its past performance or forecast a company’s earnings based on past periods.” (Ref: Investopedia)
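To illustrate the "autoregressive" and "integrated" parts of the name, the sketch below differences a series once and fits a first-order autoregression to the differences by least squares. It is a hand-rolled toy in the spirit of ARIMA(1,1,0), not how a statistics library estimates the model, and the price series is hypothetical.

```python
def difference(series):
    """First difference: the 'integrated' part works on changes, not levels."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(values):
    """Least-squares fit of values[t] = c + phi * values[t - 1]."""
    x, y = values[:-1], values[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx if sxx else 0.0  # constant differences: no AR signal
    c = my - phi * mx
    return c, phi

def forecast_next(series):
    """One-step forecast: last level plus the predicted next change."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    return series[-1] + c + phi * diffs[-1]

# Hypothetical price series whose changes alternate between 2 and 1.
prices = [100.0, 102.0, 103.0, 105.0, 106.0, 108.0, 109.0, 111.0]

print(forecast_next(prices))
```

The toy series alternates exactly, which makes the fit easy to follow by hand; real data would also call for choosing the model order and checking residuals.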
GCN: Graph Convolutional Networks (GCNs): Architectural Insights and Applications
“GCNs are tailored to work with non-Euclidean data, making them suitable for a wide range of applications including social networks, molecular structures, and recommendation systems.”
Facebook Prophet:
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
https://facebook.github.io/prophet
Single community-based linear models:
Google AI Overview:
“Single community-based linear models refer to statistical models where a single linear equation is used to predict a response variable based on the characteristics of a single community or group. These models assume a linear relationship between the predictor variables and the outcome within that specific community”
“The term “Multiple Community-Based Linear Models” likely refers to a modeling framework where separate linear models are fitted for different communities (e.g., neighborhoods, schools, cities, regions), rather than combining all data into a single model.” (Reference: ChatGPT)
A possible further reference: https://www.stats.ox.ac.uk/~snijders/mlbook.htm
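Reading the two quoted descriptions literally, the contrast is between one linear fit on pooled data and a separate linear fit per community. The sketch below shows that contrast with simple least squares; the community names, data, and helper functions are hypothetical.

```python
from collections import defaultdict

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_per_community(rows):
    """Fit one line per community from (community, x, y) rows."""
    groups = defaultdict(list)
    for community, x, y in rows:
        groups[community].append((x, y))
    return {name: fit_line(points) for name, points in groups.items()}

# Hypothetical data: the two communities follow opposite trends.
rows = [
    ("north", 1, 3), ("north", 2, 5), ("north", 3, 7),
    ("south", 1, 10), ("south", 2, 9), ("south", 3, 8),
]

pooled = fit_line([(x, y) for _, x, y in rows])
per_community = fit_per_community(rows)
print("pooled:", pooled)
print("per community:", per_community)
```

In this toy data the pooled slope hides the fact that the two communities trend in opposite directions, which is the usual argument for fitting them separately.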
May 17
What is Analysis? [1]
• “A comprehensive, data-driven strategy for problem solving”
Analytics
• “Analytics uses logic, inductive and deductive reasoning, critical thinking, and quantitative methods along with data to examine phenomena and determine its essential features”
• “any solution that supports the identification of meaningful patterns and relationships among data.”
CONCEPTS [1]

Analytics Methods [1]:

[1] Ref: The Analytics Lifecycle Toolkit: A Practical Guide for an Effective Analytics Capability, Wiley