{"id":76515,"date":"2024-12-27T02:29:05","date_gmt":"2024-12-27T02:29:05","guid":{"rendered":"http:\/\/bangla.sitestree.com\/?p=76515"},"modified":"2024-12-27T02:29:05","modified_gmt":"2024-12-27T02:29:05","slug":"topics-reinforcement-learning-interactive-learning-in-decision-processes","status":"publish","type":"post","link":"http:\/\/bangla.sitestree.com\/?p=76515","title":{"rendered":"Topics: Reinforcement Learning (Interactive Learning in Decision Processes):"},"content":{"rendered":"\n<p>What is: Reinforcement Learning (Interactive Learning in Decision Processes)?<\/p>\n\n\n\n<p>&#8212; Is there a way to learn by interacting with an environment?<\/p>\n\n\n\n<p>&#8212; i.e., interact, gain experience, and use that experience to learn (predict the future)<\/p>\n\n\n\n<p>&#8212; Interact to explore, and exploit what improves the learning goal\/outcome<\/p>\n\n\n\n<p>&#8212; The computational approach to this kind of learning is Reinforcement Learning (Interactive Learning in Decision Processes)<\/p>\n\n\n\n<p>&#8212; It is goal-oriented learning from interaction<\/p>\n\n\n\n<p>&#8212; It has its roots in the <strong>Markov decision process<\/strong>\u00a0(<strong>MDP<\/strong>)<\/p>\n\n\n\n<p>&#8212; A <strong>Markov decision process<\/strong>\u00a0(<strong>MDP<\/strong>) is a model for\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Sequential_decision_making\">sequential decision making<\/a>\u00a0when\u00a0<a href=\"https:\/\/en.wikipedia.org\/wiki\/Outcome_(probability)\">outcomes<\/a>\u00a0are uncertain.<sup><a href=\"https:\/\/en.wikipedia.org\/wiki\/Markov_decision_process#cite_note-1\">[1]<\/a><\/sup> [Wikipedia]<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"287\" src=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-55.png?resize=750%2C287\" alt=\"\" class=\"wp-image-76516\" 
srcset=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-55.png?w=816 816w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-55.png?resize=300%2C115 300w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-55.png?resize=768%2C294 768w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-55.png?resize=750%2C287 750w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure>\n\n\n\n<p><strong>Reinforcement Learning (Interactive Learning in Decision Processes) Involves:<\/strong><\/p>\n\n\n\n<p>Markov decision processes<br>Dynamic Programming<br>Monte Carlo methods<br>Temporal-difference learning<br>Function approximation methods<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"230\" src=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-56.png?resize=750%2C230\" alt=\"\" class=\"wp-image-76517\" srcset=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-56.png?w=868 868w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-56.png?resize=300%2C92 300w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-56.png?resize=768%2C235 768w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-56.png?resize=750%2C230 750w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"250\" src=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-57.png?resize=750%2C250\" alt=\"\" class=\"wp-image-76518\" srcset=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-57.png?w=863 863w, 
https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-57.png?resize=300%2C100 300w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-57.png?resize=768%2C256 768w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-57.png?resize=750%2C250 750w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure>\n\n\n\n<p><strong>Monte Carlo methods<\/strong><\/p>\n\n\n\n<p>Solve problems by repeated random sampling.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"253\" src=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-58.png?resize=750%2C253\" alt=\"\" class=\"wp-image-76519\" srcset=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-58.png?w=873 873w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-58.png?resize=300%2C101 300w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-58.png?resize=768%2C260 768w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-58.png?resize=750%2C253 750w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure>\n\n\n\n<p><strong>Temporal-difference learning<\/strong>: a combination of the Monte Carlo (MC) method and the Dynamic Programming (DP) method.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"219\" src=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-60.png?resize=750%2C219\" alt=\"\" class=\"wp-image-76521\" srcset=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-60.png?w=854 854w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-60.png?resize=300%2C87 300w, 
https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-60.png?resize=768%2C224 768w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-60.png?resize=750%2C219 750w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"251\" src=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-59.png?resize=750%2C251\" alt=\"\" class=\"wp-image-76520\" srcset=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-59.png?w=886 886w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-59.png?resize=300%2C101 300w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-59.png?resize=768%2C257 768w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-59.png?resize=750%2C251 750w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure>\n\n\n\n<p><strong>Function approximation methods<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"750\" height=\"278\" src=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-61.png?resize=750%2C278\" alt=\"\" class=\"wp-image-76522\" srcset=\"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-61.png?w=925 925w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-61.png?resize=300%2C111 300w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-61.png?resize=768%2C285 768w, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2024\/12\/image-61.png?resize=750%2C278 750w\" sizes=\"auto, (max-width: 750px) 100vw, 750px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>What is: 
Reinforcement Learning (Interactive Learning in Decision Processes)? &#8212; Is there a way to learn by interacting &#8212; i.e. interact have experience and use the experience to learn (predict the future) &#8212; Interact to explore and utilize what makes learning (goal\/outcome) enhanced &#8212; The computation approach of this method is Reinforcement Learning (Interactive Learning &hellip; <\/p>\n<p><a class=\"more-link btn\" href=\"http:\/\/bangla.sitestree.com\/?p=76515\">Continue reading<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-76515","post","type-post","status-publish","format-standard","hentry","category-root","item-wrap"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":14773,"url":"http:\/\/bangla.sitestree.com\/?p=14773","url_meta":{"origin":76515,"position":0},"title":"Reinforcement Learning Concepts Explained in a Simple Way.","author":"Sayed","date":"May 17, 2019","format":false,"excerpt":"Reinforcement Learning Concepts Explained in a Simple (or not) Way. This is intended for the beginners who want to know the concepts used in Reinforcement Learning i.e. Interactive Learning. 
Reinforcement Learning is also one aspect of Machine Learning, Data Science, and AI Summary of Tabular Methods in Reinforcement Learning Comparison\u2026","rel":"","context":"In &quot;AI ML DS RL DL NN NLP Data Mining Optimization&quot;","block_context":{"text":"AI ML DS RL DL NN NLP Data Mining Optimization","link":"http:\/\/bangla.sitestree.com\/?cat=1910"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":17349,"url":"http:\/\/bangla.sitestree.com\/?p=17349","url_meta":{"origin":76515,"position":1},"title":"Lecture Slides: Introduction to Machine Learning","author":"Sayed","date":"August 17, 2020","format":false,"excerpt":"https:\/\/www.cmpe.boun.edu.tr\/~ethem\/i2ml2e\/ Lecture Slides: Introduction (pdf,ppt) Supervised Learning (pdf, ppt) Bayesian Decision Theory (pdf, ppt) Parametric Methods (pdf, ppt) Multivariate Methods (pdf, ppt) Dimensionality Reduction (pdf, ppt) Clustering (pdf, ppt) Nonparametric Methods (pdf, ppt) Decision Trees (pdf, ppt) Linear Discrimination (pdf, ppt) Multilayer Perceptrons (pdf, ppt) Local Models (pdf, ppt) Kernel\u2026","rel":"","context":"In &quot;\u09ac\u09cd\u09b2\u0997 \u0964 Blog&quot;","block_context":{"text":"\u09ac\u09cd\u09b2\u0997 \u0964 Blog","link":"http:\/\/bangla.sitestree.com\/?cat=182"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":24971,"url":"http:\/\/bangla.sitestree.com\/?p=24971","url_meta":{"origin":76515,"position":2},"title":"AI Implementation Platforms: Reinforcement Learning Platforms and Applications: #Root","author":"Author-Check- Article-or-Video","date":"April 14, 2021","format":false,"excerpt":"\" Gym: https:\/\/gym.openai.com\/ Gym is a toolkit for developing and comparing .... It supports teaching agents everything from walking to playing games like Pong or Pinball. \" https:\/\/gym.openai.com\/ ---- \"Project Malmo integrates (deep) reinforcement learning, cognitive science, and many ideas from artificial intelligence. 
\" https:\/\/www.microsoft.com\/en-us\/research\/project\/project-malmo\/ ---- DeepMind: \"DeepMind's scientific mission\u2026","rel":"","context":"In &quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":24334,"url":"http:\/\/bangla.sitestree.com\/?p=24334","url_meta":{"origin":76515,"position":3},"title":"Matlab: Reinforcement Learning Examples","author":"Sayed","date":"April 7, 2021","format":false,"excerpt":"Getting Started Train a DQN Agent to Balance a Cart-Pole System Train a Q-Learning Agent to Solve Grid World Problems Train a Reinforcement Learning Agent in an MDP Environment Reinforcement Learning A Motivation for a Powertrain Control Engineer (21:26 Automated Driving Train DDPG Agent for Adaptive Cruise Control Train DQN\u2026","rel":"","context":"In &quot;\u09ac\u09cd\u09b2\u0997 \u0964 Blog&quot;","block_context":{"text":"\u09ac\u09cd\u09b2\u0997 \u0964 Blog","link":"http:\/\/bangla.sitestree.com\/?cat=182"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":78217,"url":"http:\/\/bangla.sitestree.com\/?p=78217","url_meta":{"origin":76515,"position":4},"title":"Model Selection","author":"Sayed","date":"May 21, 2025","format":false,"excerpt":"\u2022 Optimizations\/Machine Learning\/Data Mining\/Deep Learning\/Reinforcement Learning\/Graph Mining\/NLP\/Genetic Algorithms \u2022 Regression \u2022 Linear \u2022 Non-Linear \u2022 Classifications \u2022 Logistics Regression \u2022 Sigmoid : Binary \u2022 Softmax: Multi-Class \u2022 Bayes Classifier \u2022 SVM \u2022 Bayesian: Regression\/Classification \u2022 Clustering \u2022 K-NN \u2022 KNN+ \u2022 Kmeans, Hierarchical, Density \u2022Machine Learning\/Data Mining\/Deep Learning\/Reinforcement Learning\/Graph Mining\/NLP\u2026","rel":"","context":"In &quot;Analytics and Machine Learning Project 
Development&quot;","block_context":{"text":"Analytics and Machine Learning Project Development","link":"http:\/\/bangla.sitestree.com\/?cat=1974"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":24965,"url":"http:\/\/bangla.sitestree.com\/?p=24965","url_meta":{"origin":76515,"position":5},"title":"Learn Reinforcement Learning #Root","author":"Author-Check- Article-or-Video","date":"April 14, 2021","format":false,"excerpt":"Learn Reinforcement Learning: Why and where to use Reinforcement Learning? Robotics for sure, Autonomous Vehicles for sure, Finance (creating better investment portfolio), Healthcare\/Medical, Inventory Management, Manufacturing or similar. https:\/\/chatbotsmagazine.com\/reinforcement-learning-and-its-practical-applications-8499e60cf751 10 Sets of Presentation Slides: https:\/\/www.slideserve.com\/search\/presentations\/sutton-reinforcement-learning\/1 https:\/\/www.slideserve.com\/search\/presentations\/sutton-reinforcement-learning\/2 https:\/\/www.slideserve.com\/search\/presentations\/sutton-reinforcement-learning\/3 https:\/\/www.slideserve.com\/search\/presentations\/sutton-reinforcement-learning\/4 https:\/\/www.slideserve.com\/search\/presentations\/sutton-reinforcement-learning\/5 https:\/\/www.slideserve.com\/search\/presentations\/sutton-reinforcement-learning\/6 https:\/\/www.slideserve.com\/search\/presentations\/sutton-reinforcement-learning\/7 https:\/\/www.slideserve.com\/search\/presentations\/sutton-reinforcement-learning\/8 https:\/\/www.slideserve.com\/search\/presentations\/sutton-reinforcement-learning\/9 https:\/\/www.slideserve.com\/search\/presentations\/sutton-reinforcement-learning\/10 Sayed Ahmed Linkedin: https:\/\/ca.linkedin.com\/in\/sayedjustetc Blog: http:\/\/sitestree.com,\u2026","rel":"","context":"In 
&quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/76515","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=76515"}],"version-history":[{"count":1,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/76515\/revisions"}],"predecessor-version":[{"id":76523,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/76515\/revisions\/76523"}],"wp:attachment":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=76515"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=76515"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=76515"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
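The post above describes temporal-difference (TD) learning as a combination of the Monte Carlo method (learning from sampled experience) and Dynamic Programming (bootstrapping from current value estimates). As a minimal illustrative sketch of that idea — not taken from the post; the random-walk task, function name, and parameter choices are all assumptions for illustration — tabular TD(0) can estimate state values of a simple 5-state random walk, whose true values are known to be 1/6 through 5/6:

```python
import random

def td0_random_walk(episodes=10000, alpha=0.05, seed=0):
    """Tabular TD(0) value estimation on the 5-state random walk.

    States 1..5; each episode starts at state 3 and moves left or right
    with equal probability. Stepping off the left end (to state 0)
    terminates with reward 0; stepping off the right end (to state 6)
    terminates with reward 1. Like Monte Carlo, TD(0) learns from sampled
    transitions; like Dynamic Programming, it bootstraps from the current
    value estimate of the successor state.
    """
    rng = random.Random(seed)
    V = [0.5] * 7          # V[1..5] are learned; V[0] and V[6] are terminal
    V[0] = V[6] = 0.0
    for _ in range(episodes):
        s = 3
        while 1 <= s <= 5:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s2 == 6 else 0.0
            # TD(0) update: move V(s) toward the bootstrapped target r + V(s')
            V[s] += alpha * (r + V[s2] - V[s])
            s = s2
    return V[1:6]
```

With enough episodes the estimates approach the true values [1/6, 2/6, 3/6, 4/6, 5/6]; note that the update uses `V[s2]` (a current estimate) rather than the full sampled return, which is exactly the bootstrapping borrowed from Dynamic Programming.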