Misc. : Classifier Performance and Model Selection

Cross Validation:



Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation.

Ref: A Gentle Introduction to k-fold Cross-Validation (machinelearningmastery.com › k-fold-cross-validation, May 23, 2018)
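
As a rough illustration of the k-fold idea described above, here is a minimal sketch in plain Python. The function names (k_fold_indices, cross_validate) and the toy "predict the training mean" model are assumptions made for this note, not part of the referenced article. Each fold is held out once while the model is fit on the remaining folds.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Split indices 0..n_samples-1 into k folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # The first (n_samples % k) folds get one extra sample.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds

def cross_validate(n_samples, k, fit, score, seed=0):
    """Return one held-out score per fold; fit and score are caller-supplied."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th one.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit(train_idx)
        scores.append(score(model, test_idx))
    return scores

# Toy example (hypothetical data): the "model" is just the mean of the training targets.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

def fit_mean(train_idx):
    return sum(y[i] for i in train_idx) / len(train_idx)

def mse(mean, test_idx):
    return sum((y[i] - mean) ** 2 for i in test_idx) / len(test_idx)

scores = cross_validate(len(y), k=5, fit=fit_mean, score=mse)
print("per-fold MSE:", scores)
print("average MSE:", sum(scores) / len(scores))
```

The average of the per-fold scores is the usual cross-validation estimate. In practice a library routine would normally be used instead of hand-written splitting.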


Model selection - Wikipedia

Model selection is the task of selecting a statistical model from a set of candidate models, given data. In the simplest cases, a pre-existing set of data is considered. However, the task can also involve the design of experiments such that the data collected is well-suited to the problem of model selection. Given candidate models of similar predictive or explanatory power, the simplest model is most likely to be the best choice (Occam's razor).

Model Selection

Machine Learning Model Evaluation

Evaluation methods: Holdout, Cross-Validation

Classification Metrics

  • Classification Accuracy
  • Confusion matrix
  • Logarithmic Loss
  • Area under curve (AUC)
  • F-Measure
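
The classification metrics listed above can be sketched from first principles as follows. This is a minimal illustration for binary labels (1 = positive class); the sample labels and probabilities are made up for the example, and AUC is left out here because it needs the full ranking of scores rather than a single threshold.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score; beta=1 gives F1, the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the true labels under predicted P(y = 1)."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Made-up example data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
p_pred = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]  # predicted P(y = 1)
y_pred = [1 if p >= 0.5 else 0 for p in p_pred]    # threshold at 0.5

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
f1 = f_measure(tp, fp, fn)
ll = log_loss(y_true, p_pred)
print("confusion (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", acc)
print("F1:", f1)
print("log loss:", ll)
```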

Regression Metrics

Root Mean Squared Error and Mean Absolute Error.
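
A small sketch of the two regression metrics named above, assuming plain lists of true and predicted values (the numbers are invented for illustration). RMSE squares the errors before averaging, so it penalizes large errors more heavily than MAE does.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 3.0]  # errors: 1, 0, -2, 4

mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
print("MAE:", mae)
print("RMSE:", rmse)
```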


Model Assessment and Selection:

Training Error


Training error is the error that you get when you run the trained model back on the training data. Remember that this data has already been used to train the model, and this does not necessarily mean that the model, once trained, will perform accurately when applied back to the training data itself.

Test Error

Test error is the error that you get when you run the trained model on a set of data that it has never been exposed to before. This data is often used to measure the accuracy of the model before it is shipped to production.

Ref: What is a training and test error? - Quora (www.quora.com › What-is-a-training-and-test-error)
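
To make the difference concrete, here is a minimal sketch using a 1-nearest-neighbour classifier on invented one-dimensional data. The data, the assumed true rule (label 1 when x > 5), and the function names are assumptions for this note. Because 1-NN memorizes its training points, it scores perfectly on them, including the two noisy labels, and does worse on unseen points.

```python
def predict_1nn(train, x):
    """Predict the label of the training point closest to x."""
    nearest_x, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label

def error_rate(train, data):
    """Fraction of points in data that the 1-NN model built on train gets wrong."""
    wrong = sum(1 for x, label in data if predict_1nn(train, x) != label)
    return wrong / len(data)

# Assumed true rule: label is 1 when x > 5. Two training labels are noisy.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
test = [(1.5, 0), (2.8, 0), (3.2, 0), (4.5, 0),
        (5.5, 1), (6.8, 1), (7.4, 1), (8.6, 1)]

train_error = error_rate(train, train)  # model evaluated on data it has seen
test_error = error_rate(train, test)    # model evaluated on unseen data
print("training error:", train_error)
print("test error:", test_error)
```

The gap between the two numbers is why test error, not training error, is used to judge a model before deployment.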



Curse of dimensionality - Wikipedia

The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions) that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience.
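
One of these phenomena, the way distances "concentrate" in high dimensions, can be sketched with a small simulation. This is an illustrative assumption-laden example (uniform random points in the unit cube, a fixed seed, a helper named relative_contrast), not a general proof. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min distance from a random query to random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return (max(dists) - min(dists)) / min(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

As the dimension grows, the contrast shrinks, which is one reason nearest-neighbour search becomes less meaningful in very high-dimensional spaces.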


Bias–variance tradeoff - Wikipedia

The bias–variance dilemma or bias–variance problem is the conflict in trying to simultaneously minimize two sources of error, bias and variance, that prevent supervised learning algorithms from generalizing beyond their training set. The bias error is an error from erroneous assumptions in the learning algorithm. The variance is an error from sensitivity to small fluctuations in the training set.

Bias Variance Dilemma


What is bias and variance?

Bias is the set of simplifying assumptions made by the model to make the target function easier to approximate. Variance is the amount that the estimate of the target function will change given different training data. The trade-off is the tension between the error introduced by the bias and the error introduced by the variance.

Ref: Gentle Introduction to the Bias-Variance Trade-Off in Machine ... (machinelearningmastery.com › gentle-introduction-to-the-bias-variance-..., Mar 18, 2016)
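
For squared-error loss, the trade-off can be written as the standard decomposition below. The notation is an assumption added for this note: the data follow y = f(x) + \varepsilon with zero-mean noise of variance \sigma^2, \hat{f} is the model fit on a random training set, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Simpler models tend to raise the first term and lower the second; more flexible models tend to do the opposite. The third term cannot be reduced by any choice of model.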

ROC Curve




A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The ROC curve is created by plotting the true positive rate against the false positive rate at various threshold settings.
Ref: https://en.wikipedia.org/wiki/Receiver_operating_characteristic
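
The construction described above can be sketched as follows: sweep the threshold over the distinct scores, record (false positive rate, true positive rate) at each one, and take the area under the resulting curve (AUC) with the trapezoidal rule. The function names and the sample scores are assumptions for illustration, and the sketch assumes both classes are present in y_true.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs, one per distinct threshold, from (0, 0) to (1, 1)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos  # both n_pos and n_neg must be non-zero
    points = [(0.0, 0.0)]
    # Lower the threshold one distinct score at a time, highest first.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Made-up labels and classifier scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

points = roc_points(y_true, scores)
area = auc(points)
print(points)
print("AUC:", area)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of positives above negatives.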

What is MLE
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable.
Ref: https://en.wikipedia.org/wiki/Maximum_likelihood_estimation
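
As a small worked example of this definition, consider coin flips modelled as Bernoulli(p). The choice of model, the data, and the function names are assumptions made for this note. The closed-form MLE is the sample proportion of ones, and a brute-force scan over candidate values of p should land on the same answer.

```python
import math

def bernoulli_log_likelihood(p, data):
    """Log-likelihood of 0/1 observations under Bernoulli(p), for 0 < p < 1."""
    ones = sum(data)
    zeros = len(data) - ones
    return ones * math.log(p) + zeros * math.log(1 - p)

def bernoulli_mle(data):
    """Closed-form MLE: the sample proportion of ones."""
    return sum(data) / len(data)

data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # 7 ones out of 10 (made-up sample)
p_hat = bernoulli_mle(data)

# Numerical check: scan a grid of candidate p values and keep the most likely one.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: bernoulli_log_likelihood(p, data))

print("closed-form MLE:", p_hat)
print("grid-search MLE:", p_grid)
```

Working with the log-likelihood rather than the likelihood itself gives the same maximizer and avoids numerical underflow when there are many observations.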


MLE conceptually


Important Basic Concepts: Statistics for Big Data


Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada


Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc
