{"id":16755,"date":"2020-02-06T22:23:36","date_gmt":"2020-02-07T03:23:36","guid":{"rendered":"http:\/\/bangla.salearningschool.com\/recent-posts\/?p=16755"},"modified":"2020-02-08T09:41:19","modified_gmt":"2020-02-08T14:41:19","slug":"kl-divergence-entropy-cross-entropy-example-use-cases-equations-as-well","status":"publish","type":"post","link":"http:\/\/bangla.sitestree.com\/?p=16755","title":{"rendered":"KL Divergence: Entropy: Cross Entropy: Example Use Cases. Equations as well."},"content":{"rendered":"\n\n<p><strong>KL Divergence in Picture and Examples<\/strong><\/p>\n<p><a href=\"https:\/\/i0.wp.com\/bangla.salearningschool.com\/wp-content\/uploads\/2020\/02\/image-9.png\" rel=\"attachment wp-att-16760\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone  wp-image-16760\" title=\"image-9-png\" src=\"https:\/\/i0.wp.com\/bangla.salearningschool.com\/wp-content\/uploads\/2020\/02\/image-9.png?resize=621%2C327\" alt=\"\" width=\"621\" height=\"327\" srcset=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-9.png?w=1368 1368w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-9.png?resize=300%2C158 300w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-9.png?resize=1024%2C539 1024w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-9.png?resize=768%2C404 768w\" sizes=\"auto, (max-width: 621px) 100vw, 621px\" \/><\/a><\/p>\n<p>&#8220;Kullback\u2013Leibler divergence is the difference between the Cross Entropy H for PQ and the true Entropy H for P.&#8221;<\/p>\n<div id=\"attachment_16761\" style=\"width: 401px\" class=\"wp-caption alignnone\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-16761\" class=\"wp-image-16761 \" title=\"image-10-png\" 
src=\"https:\/\/i0.wp.com\/bangla.salearningschool.com\/wp-content\/uploads\/2020\/02\/image-10-e1581046176643.png?resize=401%2C43\" alt=\"KL\" width=\"401\" height=\"43\" srcset=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-10-e1581046176643.png?w=420 420w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-10-e1581046176643.png?resize=300%2C32 300w\" sizes=\"auto, (max-width: 401px) 100vw, 401px\" \/><p id=\"caption-attachment-16761\" class=\"wp-caption-text\">KL<\/p><\/div>\n<p>[1]<\/p>\n<p>&#8220;And this is what we use as a loss function while training Neural Networks. When we have an image classification problem, the training data and corresponding correct labels represent P, the true distribution. The NN predictions are our estimations Q.&#8221;<\/p>\n<p>Reference for the above (including image) : <a href=\"https:\/\/towardsdatascience.com\/entropy-cross-entropy-kl-divergence-binary-cross-entropy-cb8f72e72e65\">https:\/\/towardsdatascience.com\/entropy-cross-entropy-kl-divergence-binary-cross-entropy-cb8f72e72e65<\/a><br \/>The above URL is a pretty great read.<\/p>\n<p>****<br \/>Everything below is from the Internet including images and equations esp. 
from [1]<\/p>\n<h2>What&#8217;s the KL Divergence?<\/h2>\n<p>&#8220;The <em>Kullback-Leibler divergence<\/em> (hereafter written as KL divergence) is a measure of how one probability distribution differs from another probability distribution.<\/p>\n<p>The KL divergence measures the distance <strong>from<\/strong> the approximate distribution Q <strong>to<\/strong> the true distribution P.&#8221;<\/p>\n<p><strong>KL Divergence from Q to P<\/strong><\/p>\n<p><a href=\"https:\/\/i0.wp.com\/bangla.salearningschool.com\/wp-content\/uploads\/2020\/02\/image-5.png\" rel=\"attachment wp-att-16756\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone  wp-image-16756\" title=\"image-5-png\" src=\"https:\/\/i0.wp.com\/bangla.salearningschool.com\/wp-content\/uploads\/2020\/02\/image-5.png?resize=585%2C130\" alt=\"\" width=\"585\" height=\"130\" srcset=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-5.png?w=405 405w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-5.png?resize=300%2C67 300w\" sizes=\"auto, (max-width: 585px) 100vw, 585px\" \/><\/a><\/p>\n<p>[1]<br \/><br \/><strong>Not a distance metric: the KL divergence is not symmetric, so in general KL(P || Q) differs from KL(Q || P).<\/strong><\/p>\n<p><strong>It can be written as:<\/strong><\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-16757 \" title=\"image-6-png\" src=\"https:\/\/i0.wp.com\/bangla.salearningschool.com\/wp-content\/uploads\/2020\/02\/image-6-e1581046232784.png?resize=383%2C41\" alt=\"\" width=\"383\" height=\"41\" srcset=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-6-e1581046232784.png?w=420 420w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-6-e1581046232784.png?resize=300%2C32 300w\" sizes=\"auto, (max-width: 383px) 100vw, 383px\" \/><\/p>\n<p>[1]<\/p>\n<p>The first term is the <em>cross entropy<\/em> between P and Q. 
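<\/p>\n<p>As a quick numerical check of the decomposition above, the sketch below computes entropy, cross entropy, and KL divergence for two small discrete distributions (the numbers are arbitrary illustrative choices, not from the original post):<\/p>

```python
import math

# Two arbitrary discrete distributions over the same three outcomes.
P = [0.5, 0.3, 0.2]   # the true distribution
Q = [0.4, 0.4, 0.2]   # the approximating distribution

# Entropy of P:       H(P)    = -sum_i p_i * log p_i
# Cross entropy:      H(P, Q) = -sum_i p_i * log q_i
# KL divergence:      KL(P || Q) = sum_i p_i * log(p_i / q_i)
entropy_p = -sum(p * math.log(p) for p in P)
cross_entropy = -sum(p * math.log(q) for p, q in zip(P, Q))
kl_pq = sum(p * math.log(p / q) for p, q in zip(P, Q))
kl_qp = sum(q * math.log(q / p) for p, q in zip(P, Q))

# KL(P || Q) is exactly the cross entropy minus the entropy of P,
# and it is not symmetric: KL(P || Q) and KL(Q || P) generally differ.
assert abs(kl_pq - (cross_entropy - entropy_p)) < 1e-12
assert abs(kl_pq - kl_qp) > 1e-6
```
<p>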
The second term is the <em>entropy<\/em> of P.<\/p>\n<h2>Forward and Reverse KL<\/h2>\n<p><strong>Forward KL: mean-seeking behaviour.<\/strong> Wherever P(.) has high probability, Q(.) must also have high probability, so Q spreads its mass to cover every region where P has mass.<\/p>\n<p>In the figure below, P is the bimodal distribution (two peaks); the fitted Q roughly averages over both peaks, placing its mass around the mean.<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/bangla.salearningschool.com\/wp-content\/uploads\/2020\/02\/image-7.png\" rel=\"attachment wp-att-16758\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone  wp-image-16758\" title=\"image-7-png\" src=\"https:\/\/i0.wp.com\/bangla.salearningschool.com\/wp-content\/uploads\/2020\/02\/image-7.png?resize=557%2C209\" alt=\"\" width=\"557\" height=\"209\" srcset=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-7.png?w=576 576w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-7.png?resize=300%2C113 300w\" sizes=\"auto, (max-width: 557px) 100vw, 557px\" \/><\/a><\/p>\n<p>[1]<\/p>\n<p><strong>Reverse KL: mode-seeking behaviour.<\/strong><br \/>Wherever Q(.) has high probability, P(.) must also have high probability, so Q concentrates its mass on a single mode of P.<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/bangla.salearningschool.com\/wp-content\/uploads\/2020\/02\/image-8.png\" rel=\"attachment wp-att-16759\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone  wp-image-16759\" title=\"image-8-png\" src=\"https:\/\/i0.wp.com\/bangla.salearningschool.com\/wp-content\/uploads\/2020\/02\/image-8.png?resize=557%2C209\" alt=\"\" width=\"557\" height=\"209\" srcset=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-8.png?w=576 576w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-8.png?resize=300%2C113 300w\" sizes=\"auto, (max-width: 557px) 100vw, 557px\" \/><\/a><\/p>\n<p>[1]<\/p>\n<p>References:<br \/>[1] <a href=\"https:\/\/dibyaghosh.com\/blog\/probability\/kldivergence.html\">https:\/\/dibyaghosh.com\/blog\/probability\/kldivergence.html<\/a><br \/>[2] <a href=\"https:\/\/towardsdatascience.com\/light-on-math-machine-learning-intuitive-guide-to-understanding-kl-divergence-2b382ca2b2a8\">https:\/\/towardsdatascience.com\/light-on-math-machine-learning-intuitive-guide-to-understanding-kl-divergence-2b382ca2b2a8<\/a><\/p>\n<p>*** ***<\/p>\n<p>&#8220;What is KL divergence used for?<br \/>Very often in Probability and Statistics we&#8217;ll replace observed data or a complex distribution with a simpler, approximating distribution. <strong>KL Divergence<\/strong> helps us to measure just how much information we lose when we choose an approximation.&#8221; (May 10, 2017)<\/p>\n<p><a href=\"https:\/\/www.countbayesie.com\/blog\/2017\/5\/9\/kullback-leibler-divergence-explained\">www.countbayesie.com \u203a blog \u203a kullback-leibler-divergence-explained<\/a><\/p>\n<h3><a href=\"https:\/\/www.countbayesie.com\/blog\/2017\/5\/9\/kullback-leibler-divergence-explained\">Kullback-Leibler Divergence Explained \u2014 Count Bayesie<\/a><\/h3>\n<p><em><strong>***. ***. 
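<\/strong><\/em><\/p>\n<p>The mean-seeking vs. mode-seeking contrast can also be reproduced with a tiny grid experiment. The sketch below is an illustrative assumption, not from the sources above: it fits the centre of a single Gaussian-shaped Q to a bimodal P by brute force, once minimizing forward KL and once minimizing reverse KL.<\/p>

```python
import math

# A 1-D grid, a bimodal true distribution P with peaks at -2 and +2,
# and a family of single-bump (discretized Gaussian) candidates Q(mu).
xs = [i * 0.1 for i in range(-60, 61)]

def normalize(ws):
    s = sum(ws)
    return [w / s for w in ws]

def gauss(mu, sigma):
    return normalize([math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) for x in xs])

P = normalize([math.exp(-2.0 * (x + 2) ** 2) + math.exp(-2.0 * (x - 2) ** 2)
               for x in xs])

def kl(a, b):
    # KL(a || b) for discrete distributions on the same grid.
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

# Brute-force the best centre mu for each direction of the divergence.
mus = [i * 0.1 for i in range(-40, 41)]
forward_mu = min(mus, key=lambda m: kl(P, gauss(m, 2.0)))  # min KL(P || Q): mean-seeking
reverse_mu = min(mus, key=lambda m: kl(gauss(m, 1.0), P))  # min KL(Q || P): mode-seeking

# Forward KL places Q between the two peaks; reverse KL near one peak.
print(forward_mu, reverse_mu)
```

<p>Minimizing the forward KL settles Q between the two peaks (near the mean of P), while minimizing the reverse KL locks Q onto one of the modes, just as in the figures above.<\/p>\n<p><em><strong>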
***<\/strong><\/em><br \/><em><strong>Note: Older short-notes from this site are posted on Medium: <\/strong><\/em><a href=\"https:\/\/medium.com\/@SayedAhmedCanada\">https:\/\/medium.com\/@SayedAhmedCanada<\/a><\/p>\n<p>*** . *** *** . *** . *** . ***<br \/><br \/><em><strong>Sayed Ahmed<\/strong><br \/><\/em><br \/><em><strong>BSc. Eng. in Comp. Sc. &amp; Eng. (BUET)<\/strong><\/em><br \/><em><strong>MSc. in Comp. Sc. (U of Manitoba, Canada)<\/strong><\/em><br \/><em><strong>MSc. in Data Science and Analytics (Ryerson University, Canada)<\/strong><\/em><br \/><em><strong>Linkedin<\/strong>: <a href=\"https:\/\/ca.linkedin.com\/in\/sayedjustetc\">https:\/\/ca.linkedin.com\/in\/sayedjustetc<\/a><br \/><\/em><\/p>\n<p><em><strong>Blog<\/strong>: <a href=\"http:\/\/bangla.salearningschool.com\/\">http:\/\/Bangla.SaLearningSchool.com<\/a>, <a href=\"http:\/\/sitestree.com\">http:\/\/SitesTree.com<\/a><\/em><br \/><em><strong>Online and Offline Training<\/strong>: <a href=\"http:\/\/training.SitesTree.com\">http:\/\/Training.SitesTree.com<\/a> (Also, can be free and low cost sometimes)<\/em><\/p>\n<p><em>Facebook Group\/Form to discuss (Q &amp; A): <\/em><a href=\"https:\/\/www.facebook.com\/banglasalearningschool\">https:\/\/www.facebook.com\/banglasalearningschool<\/a><\/p>\n<p>Our free or paid training events: <a href=\"https:\/\/www.facebook.com\/justetcsocial\">https:\/\/www.facebook.com\/justetcsocial<\/a><\/p>\n<p><em>Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. 
<\/em><a href=\"http:\/\/sitestree.com\/training\/\">http:\/\/sitestree.com\/training\/<\/a><\/p>","protected":false},"excerpt":{"rendered":"<p>KL Divergence in Picture and Examples &#8220;Kullback\u2013Leibler divergence is the difference between the Cross Entropy H for PQ and the true Entropy H for P.&#8221; [1] &#8220;And this is what we use as a loss function while training Neural Networks. When we have an image classification problem, the training data and corresponding correct labels represent &hellip; <\/p>\n<p><a class=\"more-link btn\" href=\"http:\/\/bangla.sitestree.com\/?p=16755\">Continue reading<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1910,182],"tags":[],"class_list":["post-16755","post","type-post","status-publish","format-standard","hentry","category-ai-ml-ds-rl-dl-nn-nlp-data-mining-optimization","category---blog","item-wrap"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":71079,"url":"http:\/\/bangla.sitestree.com\/?p=71079","url_meta":{"origin":16755,"position":0},"title":"What is entropy in decision tree?","author":"Sayed","date":"September 20, 2021","format":false,"excerpt":"If you can, answer the question below: Write your answer in the comment box. What is entropy in decision tree?","rel":"","context":"In &quot;Introduction to Machine Learning&quot;","block_context":{"text":"Introduction to Machine Learning","link":"http:\/\/bangla.sitestree.com\/?cat=1945"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":16296,"url":"http:\/\/bangla.sitestree.com\/?p=16296","url_meta":{"origin":16755,"position":1},"title":"Deep Learning (DL): can you answer these introductory questions on DL? 
Target: Starters in DL","author":"Sayed","date":"October 5, 2019","format":false,"excerpt":"Deep Learning - 001: Introduction to Deep Learning. Deep Learning (DL): can you answer these introductory questions on DL? Target: Starters in DL Can you define AI, ML, DL? Can you draw a diagram to show the relations of AI, ML, DL? What is Symbolic AI? Is Symbolic AI good\u2026","rel":"","context":"In &quot;\u09ac\u09cd\u09b2\u0997 \u0964 Blog&quot;","block_context":{"text":"\u09ac\u09cd\u09b2\u0997 \u0964 Blog","link":"http:\/\/bangla.sitestree.com\/?cat=182"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":14751,"url":"http:\/\/bangla.sitestree.com\/?p=14751","url_meta":{"origin":16755,"position":2},"title":"Applications and Research on Reinforcement Learning","author":"Sayed","date":"May 3, 2019","format":false,"excerpt":"\"WHAT ARE MAJOR REINFORCEMENT LEARNING ACHIEVEMENTS & PAPERS FROM 2018?\" Reference: https:\/\/www.topbots.com\/most-important-ai-reinforcement-learning-research\/#ai-rl-paper-2018-10 \" Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures Temporal Difference Models: Model-Free Deep RL for Model-Based Control Addressing Function Approximation Error in Actor-Critic Methods\u2026","rel":"","context":"In &quot;\u09ac\u09cd\u09b2\u0997 \u0964 Blog&quot;","block_context":{"text":"\u09ac\u09cd\u09b2\u0997 \u0964 Blog","link":"http:\/\/bangla.sitestree.com\/?cat=182"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":14893,"url":"http:\/\/bangla.sitestree.com\/?p=14893","url_meta":{"origin":16755,"position":3},"title":"AI\/ML\/Data Science: Cross Validation: KFold Cross validation: Concepts, Examples, Projects","author":"Sayed","date":"July 9, 2019","format":false,"excerpt":"Train\/Test Split and Cross Validation in Python 
https:\/\/towardsdatascience.com\/train-test-split-and-cross-validation-in-python-80b61beca4b6 sklearn.ensemble.RandomForestRegressor\u00b6 https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.ensemble.RandomForestRegressor.html \"A random forest regressor. A random forest is a meta estimator that fits a number of classifying decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is\u2026","rel":"","context":"In &quot;AI ML DS RL DL NN NLP Data Mining Optimization&quot;","block_context":{"text":"AI ML DS RL DL NN NLP Data Mining Optimization","link":"http:\/\/bangla.sitestree.com\/?cat=1910"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":16626,"url":"http:\/\/bangla.sitestree.com\/?p=16626","url_meta":{"origin":16755,"position":4},"title":"linkedin profile","author":"Sayed","date":"January 10, 2020","format":false,"excerpt":"Director\/Course Designer\/Trainer Justetc (Just Et Cetera) Social Services Non-profit. As we can basis, as time and schedule permit basis. Courses: https:\/\/SitesTree.com\/training (Free and\/or low cost workshops and training). Events: http:\/\/facebook.com\/justetcsocial. Subject areas in order: Big-Data & Machine Learning (ML, DL, NN, RL, NLP, Visualization), Cloud, Linux\/System Admin, Security, DBMS\/BI, Web\/Mobile\/Software\u2026","rel":"","context":"In &quot;AI ML DS RL DL NN NLP Data Mining Optimization&quot;","block_context":{"text":"AI ML DS RL DL NN NLP Data Mining Optimization","link":"http:\/\/bangla.sitestree.com\/?cat=1910"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":16744,"url":"http:\/\/bangla.sitestree.com\/?p=16744","url_meta":{"origin":16755,"position":5},"title":"Misc. 
: Classifier Performance and Model Selection","author":"Sayed","date":"February 6, 2020","format":false,"excerpt":"Cross Validation: \" en.wikipedia.org Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. As such, the procedure is\u2026","rel":"","context":"In &quot;AI ML DS RL DL NN NLP Data Mining Optimization&quot;","block_context":{"text":"AI ML DS RL DL NN NLP Data Mining Optimization","link":"http:\/\/bangla.sitestree.com\/?cat=1910"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2020\/02\/image-1.jpeg?resize=350%2C200","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/16755","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=16755"}],"version-history":[{"count":3,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/16755\/revisions"}],"predecessor-version":[{"id":16765,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/16755\/revisions\/16765"}],"wp:attachment":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=16755"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=16755"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2F
wp%2Fv2%2Ftags&post=16755"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}