Part 1: Bootstrapping, Bagging, Random Forests

What is a Classification Tree?

Source: Classification Tree | solver (www.solver.com/classification-tree)

A Classification tree labels, records, and assigns variables to discrete classes. A Classification tree can also provide a measure of confidence that the classification is correct. A Classification tree is built through a process known as binary recursive partitioning.
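To make "binary recursive partitioning" concrete, here is a toy sketch (not the solver.com implementation) of the core step: scanning thresholds on a single numeric feature and choosing the one that minimizes the weighted Gini impurity of the two resulting child nodes. The function names and data are illustrative only.

```python
# Toy sketch of one binary-partitioning step: find the threshold on a
# single feature that minimizes weighted Gini impurity of the two children.

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Return (threshold, impurity) of the best binary split x <= t."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left  = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["A", "A", "A", "B", "B", "B"]
print(best_split(xs, ys))  # (3.0, 0.0) — a perfect split at x <= 3.0
```

A full tree is built by applying this step recursively to each child node until a stopping criterion (depth, purity, minimum node size) is met.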

Pros and Cons of Classification Trees

Advantages:

  1. Requires less effort for data preparation
  2. Normalization is not required
  3. Scaling of data is not required
  4. Missing values in the data do not affect tree building much
  5. Easy to explain

Disadvantages:

  1. A small change in the data can cause a large change in the decision tree
  2. Calculations can sometimes become far more complex
  3. Longer time to train the model
  4. Relatively expensive to train

https://medium.com/@dhiraj8899/top-5-advantages-and-disadvantages-of-decision-tree-algorithm-428ebd199d9a

What is Ensemble Learning?

"In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone." Wikipedia

Source: Ensemble Learning to Improve Machine Learning Results (blog.statsbot.co/ensemble-learning-d1dcd548e936)


"Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance (bagging), bias (boosting), or improve predictions (stacking)." (Aug 22, 2017)
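The core ensemble idea in the quote can be sketched with a simple majority vote: if several (hypothetical) base classifiers err on different inputs, the combined prediction can outvote each individual mistake. The model predictions below are made up for illustration.

```python
# Minimal sketch of ensembling by majority vote: errors made by one model
# on different inputs are outvoted by the other models.
from collections import Counter

def majority_vote(predictions):
    """Combine the per-model predictions for one input."""
    return Counter(predictions).most_common(1)[0][0]

# Three toy models; each misclassifies a different input.
preds_model_1 = [1, 0, 1, 1]   # wrong on input 1
preds_model_2 = [1, 1, 0, 1]   # wrong on input 2
preds_model_3 = [1, 1, 1, 0]   # wrong on input 3
true_labels   = [1, 1, 1, 1]

ensemble = [majority_vote(p)
            for p in zip(preds_model_1, preds_model_2, preds_model_3)]
print(ensemble)  # [1, 1, 1, 1] — every single-model error is corrected
```

This only works when the base models' errors are not strongly correlated, which is exactly what bagging's random resampling tries to encourage.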

What is Bootstrapping?

"In statistics, bootstrapping is any test or metric that relies on random sampling with replacement. Bootstrapping allows assigning measures of accuracy (defined in terms of bias, variance, confidence intervals, prediction error or some other such measure) to sample estimates."

Source: Bootstrapping (statistics) - Wikipedia (en.wikipedia.org/wiki/Bootstrapping_(statistics))
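The definition above can be sketched in a few lines: resample the data with replacement many times, recompute the statistic on each resample, and read a confidence interval off the empirical distribution of the resampled estimates. The function name and dataset are illustrative only.

```python
# Minimal bootstrap sketch: percentile confidence interval for the mean.
import random
from statistics import mean

def bootstrap_ci(data, stat=mean, n_resamples=5000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    estimates = sorted(
        stat([rng.choice(data) for _ in data])  # one same-size resample
        for _ in range(n_resamples)
    )
    lo = estimates[int(alpha / 2 * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

data = [2.1, 2.5, 2.8, 3.0, 3.2, 3.7, 4.1, 4.5]
lo, hi = bootstrap_ci(data)
print(f"95% bootstrap CI for the mean: ({lo:.2f}, {hi:.2f})")
```

The same machinery gives accuracy measures (bias, variance, intervals) for almost any sample estimate, which is why bagging reuses it.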

Bagging Steps:
"Suppose there are N observations and M features in the training data set. A sample from the training data set is taken randomly with replacement. A subset of the M features is selected randomly, and whichever feature gives the best split is used to split the node iteratively. The tree is grown to the largest extent." (Feb 19, 2018)

Source: Bagging and Boosting - Analytics India Magazine (analyticsindiamag.com/primer-ensemble-learning-bagging-boosting)
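The sampling part of those steps can be sketched as follows: for each tree, draw a bootstrap sample of the N observations and a random subset of the M features that the tree is allowed to split on (the best-split search and tree growing are left abstract here; all names are illustrative).

```python
# Sketch of the per-tree sampling in bagging / random forests:
# bootstrap the rows, then restrict the tree to a random feature subset.
import random

def bagging_sample(X, y, n_feature_subset, rng):
    n, m = len(X), len(X[0])
    rows = [rng.randrange(n) for _ in range(n)]      # sample with replacement
    feats = rng.sample(range(m), n_feature_subset)   # random feature subset
    X_boot = [[X[i][j] for j in feats] for i in rows]
    y_boot = [y[i] for i in rows]
    return X_boot, y_boot, feats

rng = random.Random(42)
X = [[0.1, 1.0, 5.0], [0.2, 2.0, 6.0], [0.3, 3.0, 7.0], [0.4, 4.0, 8.0]]
y = ["A", "A", "B", "B"]
X_boot, y_boot, feats = bagging_sample(X, y, n_feature_subset=2, rng=rng)
print(len(X_boot), len(y_boot), feats)  # 4 rows kept; 2 of 3 features chosen
```

Repeating this for many trees and aggregating their predictions (majority vote for classification, averaging for regression) completes the bagging procedure; restricting each split to a random feature subset is the extra twist that makes it a random forest.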

***. ***. ***
Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada

*** . *** *** . *** . *** . ***

Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Online and Offline Training: http://Training.SitesTree.com (some offerings are free or low-cost)

Facebook Group/Form to discuss (Q & A): https://www.facebook.com/banglasalearningschool

Our free or paid training events: https://www.facebook.com/justetcsocial

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development, and related topics. You can also create your own course to sell to others. http://sitestree.com/training/

If you want to contribute to occasional free and/or low-cost online/offline training, or to charitable/non-profit work in the education, health, or social service sectors, you can contribute financially to: safoundation at salearningschool.com using PayPal or Credit Card (on http://sitestree.com/training/enrol/index.php?id=114).