KL Divergence in Pictures and Examples
"Kullback–Leibler divergence is the difference between the cross entropy H(P, Q) and the true entropy H(P)."
[1]
"And this is what we use as a loss function while training Neural Networks. When we have an image classification problem, the training data and corresponding correct labels represent P, the true distribution. The NN predictions are our estimations Q."
Reference for the above (including image) : https://towardsdatascience.com/entropy-cross-entropy-kl-divergence-binary-cross-entropy-cb8f72e72e65
The above URL is a pretty great read.
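Because KL divergence equals cross entropy minus entropy, and H(P) is fixed by the training data, minimizing the cross-entropy loss is equivalent to minimizing KL(P ‖ Q). A minimal sketch of this identity for discrete distributions (my own illustration using only the standard library; the example distributions `p` and `q` are made up):

```python
import math

def entropy(p):
    """Shannon entropy H(P) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross entropy H(P, Q) in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """KL(P || Q) computed directly from its definition."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]   # "true" distribution P (hypothetical)
q = [0.5, 0.3, 0.2]   # approximating distribution Q (hypothetical)

# The identity KL(P || Q) = H(P, Q) - H(P) holds numerically:
assert abs(kl_divergence(p, q) - (cross_entropy(p, q) - entropy(p))) < 1e-12
```

Since H(P) does not depend on the model's parameters, a gradient step on cross entropy and a gradient step on KL are the same step.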
****
Everything below is from the Internet including images and equations esp. from [1]
"
What's the KL Divergence?
The Kullback-Leibler divergence (hereafter written as KL divergence) is a measure of how a probability distribution differs from another probability distribution.
The KL divergence measures the distance from the approximate distribution Q to the true distribution P."
KL Divergence from Q to P
[1]
Note: KL divergence is not symmetric (KL(P ‖ Q) ≠ KL(Q ‖ P) in general), so it is not a distance metric.
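The asymmetry is easy to verify numerically. A quick check with hypothetical two-outcome distributions:

```python
import math

def kl(p, q):
    """KL(P || Q) for discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.9, 0.1]   # hypothetical "true" distribution
q = [0.5, 0.5]   # hypothetical approximation

forward = kl(p, q)   # KL(P || Q)
reverse = kl(q, p)   # KL(Q || P)

# The two directions give different values, so KL is not a metric:
assert forward != reverse
```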
Can be written as:

D_KL(P ‖ Q) = H(P, Q) − H(P)

[1]

The first term is the cross entropy between P and Q; the second term is the entropy of P.
Forward and Reverse KL
Forward KL: mean-seeking behaviour. Wherever P(x) has high probability, Q(x) must also have high probability. Q is pulled to cover all of P's mass: in the figure from [1], P has two peaks, and Q settles around their mean.
[1]
Reverse KL: mode-seeking behaviour. Wherever Q(x) has high probability, P(x) must also have high probability, so Q is free to ignore parts of P and concentrates its mass on a single mode.
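The mean-seeking vs. mode-seeking contrast can be reproduced numerically. The sketch below (my own construction, not from [1]) fits the mean of a fixed-width Gaussian Q to a discretized bimodal P by grid search: minimizing forward KL places Q between the two peaks, while minimizing reverse KL locks Q onto one of them.

```python
import math

xs = [i * 0.1 for i in range(-80, 81)]  # grid on [-8, 8]

def normal(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normalize(w):
    s = sum(w)
    return [wi / s for wi in w]

# P: a bimodal mixture with peaks at -3 and +3 (hypothetical target)
p = normalize([0.5 * normal(x, -3, 0.6) + 0.5 * normal(x, 3, 0.6) for x in xs])

def kl(a, b):
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def best_mean(direction):
    """Grid-search the mean of a unit-width Gaussian Q minimizing the chosen KL direction."""
    best, best_mu = float("inf"), None
    for mu10 in range(-50, 51):
        mu = mu10 / 10
        q = normalize([normal(x, mu, 1.0) for x in xs])
        d = kl(p, q) if direction == "forward" else kl(q, p)
        if d < best:
            best, best_mu = d, mu
    return best_mu

mu_forward = best_mean("forward")   # lands between the peaks (near 0)
mu_reverse = best_mean("reverse")   # lands on one mode (near -3 or +3)
```

Forward KL punishes Q for putting little mass where P has a lot, so Q must cover both peaks; reverse KL punishes Q for putting mass where P has little, so Q avoids the valley between the modes.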
[1]
References:
[1] https://dibyaghosh.com/blog/probability/kldivergence.html
[2] https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-understanding-kl-divergence-2b382ca2b2a8
*** ***
"What is KL divergence used for?
Very often in probability and statistics we'll replace observed data or a complex distribution with a simpler, approximating distribution. KL divergence helps us measure just how much information we lose when we choose an approximation."
Kullback-Leibler Divergence Explained — Count Bayesie: https://www.countbayesie.com/blog/kullback-leibler-divergence-explained
***. ***. ***
Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada
*** . *** *** . *** . *** . ***
Sayed Ahmed
BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc
Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Online and Offline Training: http://Training.SitesTree.com (some offerings are free or low cost)
Facebook Group/Form to discuss (Q & A): https://www.facebook.com/banglasalearningschool
Our free or paid training events: https://www.facebook.com/justetcsocial
Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development, and related topics. You can also create your own course to sell to others: http://sitestree.com/training/