{"id":68329,"date":"2021-08-01T04:10:04","date_gmt":"2021-08-01T08:10:04","guid":{"rendered":"http:\/\/bangla.salearningschool.com\/recent-posts\/some-big-data-terms-big-data\/"},"modified":"2021-08-01T04:10:04","modified_gmt":"2021-08-01T08:10:04","slug":"some-big-data-terms-big-data","status":"publish","type":"post","link":"http:\/\/bangla.sitestree.com\/?p=68329","title":{"rendered":"Some Big Data Terms #Big Data"},"content":{"rendered":"<h1 id=\"firstHeading\" class=\"firstHeading\">MapReduce<\/h1>\n<p>&#8220;<\/p>\n<p><b>MapReduce<\/b> is a <a title=\"Programming model\" href=\"https:\/\/en.wikipedia.org\/wiki\/Programming_model\">programming model<\/a> and an associated implementation for processing and generating <a title=\"Big data\" href=\"https:\/\/en.wikipedia.org\/wiki\/Big_data\">big data<\/a> sets with a <a title=\"Parallel computing\" href=\"https:\/\/en.wikipedia.org\/wiki\/Parallel_computing\">parallel<\/a>, <a title=\"Distributed computing\" href=\"https:\/\/en.wikipedia.org\/wiki\/Distributed_computing\">distributed<\/a> algorithm on a <a class=\"mw-redirect\" title=\"Cluster (computing)\" href=\"https:\/\/en.wikipedia.org\/wiki\/Cluster_%28computing%29\">cluster<\/a>.<sup id=\"cite_ref-1\" class=\"reference\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/MapReduce#cite_note-1\">[1]<\/a><\/sup><sup id=\"cite_ref-2\" class=\"reference\"><a href=\"https:\/\/en.wikipedia.org\/wiki\/MapReduce#cite_note-2\">[2]<\/a><\/sup><\/p>\n<p>A MapReduce program is composed of a <a title=\"Map (parallel pattern)\" href=\"https:\/\/en.wikipedia.org\/wiki\/Map_%28parallel_pattern%29\"><b>Map()<\/b><\/a> <a class=\"mw-redirect\" title=\"Procedure (computing)\" href=\"https:\/\/en.wikipedia.org\/wiki\/Procedure_%28computing%29\">procedure<\/a> (method) that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a <b>Reduce()<\/b> method that performs a summary operation (such as counting the number of students 
in each queue, yielding name frequencies). The &#8220;MapReduce System&#8221; (also called &#8220;infrastructure&#8221; or &#8220;framework&#8221;) orchestrates the processing by <a title=\"Marshalling (computer science)\" href=\"https:\/\/en.wikipedia.org\/wiki\/Marshalling_%28computer_science%29\">marshalling<\/a> the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for <a title=\"Redundancy (engineering)\" href=\"https:\/\/en.wikipedia.org\/wiki\/Redundancy_%28engineering%29\">redundancy<\/a> and <a title=\"Fault-tolerant computer system\" href=\"https:\/\/en.wikipedia.org\/wiki\/Fault-tolerant_computer_system\">fault tolerance<\/a>.<\/p>\n<p>&#8221;<\/p>\n<p><strong>Reference:<\/strong> https:\/\/en.wikipedia.org\/wiki\/MapReduce<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Apache Pig<\/strong><\/p>\n<p>&#8220;is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets.&#8221;<\/p>\n<p><strong>Reference:<\/strong> https:\/\/pig.apache.org\/<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Hive:<\/strong><\/p>\n<p>&#8220;<span class=\"_Tgc _BBn\"><b>Hive<\/b> provides a SQL-like interface to <b>data<\/b> stored in HDP.<\/span>&#8221;<\/p>\n<p>&#8220;<span class=\"_Tgc _y9e _BBn\">Hive has three main functions: <b>data summarization<\/b>, query and analysis. It supports queries expressed in a language called HiveQL, which automatically translates SQL-like queries into MapReduce jobs executed on Hadoop. 
In addition, HiveQL supports custom MapReduce scripts to be plugged into queries.<\/span>&#8221;<\/p>\n<p><strong>Reference:<\/strong> http:\/\/searchdatamanagement.techtarget.com\/definition\/Apache-Hive<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Flume<\/strong>:<\/p>\n<p><span class=\"st\"><em>Flume<\/em> lets Hadoop users ingest high-volume streaming data into HDFS for storage. &#8230;<\/span><\/p>\n<p><strong>Reference<\/strong>: https:\/\/hortonworks.com\/apache\/flume\/<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Sqoop<\/strong>:<\/p>\n<p><span class=\"st\">Apache <em>Sqoop<\/em> efficiently transfers bulk data between Apache Hadoop and &#8230;<\/span><\/p>\n<p><strong>Reference<\/strong>: https:\/\/hortonworks.com\/apache\/sqoop\/<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Oozie<\/strong>:<\/p>\n<p><span class=\"st\">Apache <em>Oozie<\/em> is a Java Web application used to schedule Apache Hadoop jobs. &#8230; 
<\/span><\/p>\n<p><strong>Reference<\/strong>: https:\/\/hortonworks.com\/apache\/oozie\/<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp; From: http:\/\/sitestree.com\/?p=10657<br \/> Categories:Big Data<br \/>Tags:<br \/> Post Data:2017-06-23 12:49:30<\/p>\n","protected":false},"excerpt":{"rendered":"<p>MapReduce &#8221; MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.[1][2] A MapReduce program is composed of a Map() procedure (method) that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) &hellip; <\/p>\n<p><a class=\"more-link btn\" href=\"http:\/\/bangla.sitestree.com\/?p=68329\">Continue 
reading<\/a><\/p>\n","protected":false},"author":8,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[1917],"tags":[],"class_list":["post-68329","post","type-post","status-publish","format-standard","hentry","category-fromsitestree-com","item-wrap"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":20653,"url":"http:\/\/bangla.sitestree.com\/?p=20653","url_meta":{"origin":68329,"position":0},"title":"Hadoop Random Notes","author":"Author-Check- Article-or-Video","date":"February 25, 2021","format":false,"excerpt":"Some Random Notes on Hadoop... Why Hadoop? ...randomly came across again....so, Just some random stuff. I may try to relate my exposure to it (or something), even that remotely relates to Hadoop What is Hadoop? 
Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets\u2026","rel":"","context":"In &quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":68333,"url":"http:\/\/bangla.sitestree.com\/?p=68333","url_meta":{"origin":68329,"position":1},"title":"Big Data and AWS #Big Data","author":"Author-Check- Article-or-Video","date":"August 1, 2021","format":false,"excerpt":"AWS Concepts and Tools for Big Data Implementation Amazon Kinesis Big Data Streaming and Amazon Kinesis Example: Using Amazon Kinesis to Stream and Analyze Apache Server Log Data Amazon Athena Big Data Processing and Analytics Example: Using Amazon Athena to Query Log Data From Amazon S3 DynamoDB No SQL DBMS:\u2026","rel":"","context":"In &quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":22237,"url":"http:\/\/bangla.sitestree.com\/?p=22237","url_meta":{"origin":68329,"position":2},"title":"On Data Management from Enterprise Data Analytics to Data-Based Decision Making #Root #Data Warehouse Misc","author":"Author-Check- Article-or-Video","date":"March 12, 2021","format":false,"excerpt":"On Data Management from Enterprise Data Analytics to Data-Based Decision Making Establish data quality standards and train others in this regard Five steps to an improved data quality assurance plan Data Quality Standards Handbook on Data Quality Assessment Methods and Tools Five Fundamental Data Quality Practices data management\/analysis software tools\u2026","rel":"","context":"In 
&quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":67640,"url":"http:\/\/bangla.sitestree.com\/?p=67640","url_meta":{"origin":68329,"position":3},"title":"What is Data Warehousing? #Data Warehouse #Data Warehouse &#8211; 001 #Data Warehouse #Data Warehouse Misc","author":"Author-Check- Article-or-Video","date":"July 26, 2021","format":false,"excerpt":"A Data Warehouse is a storage of an organization's historical data. Usually, data warehouse is built to provide a decision support system to the management. Data warehouse can be mined (using data mining) and queried to collect overall and summarized business information that significantly helps in decision making. It can\u2026","rel":"","context":"In &quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":22577,"url":"http:\/\/bangla.sitestree.com\/?p=22577","url_meta":{"origin":68329,"position":4},"title":"Some Data Analysis Tools #Root #By Sayed Ahmed #Data Warehouse Misc #Big Data","author":"Author-Check- Article-or-Video","date":"March 16, 2021","format":false,"excerpt":"Just some Links: Data Collection and Analysis Tools http:\/\/asq.org\/learn-about-quality\/data-collection-analysis-tools\/overview\/overview.html \u00a0 Big Data Analytics: Time For New Tools http:\/\/www.informationweek.com\/big-data\/big-data-analytics\/big-data-analytics-time-for-new-tools\/a\/d-id\/1318106 \u00a0 Data analysis tools target non-experts Tools simplify the application of advanced analytics and the interpretation of results http:\/\/radar.oreilly.com\/2013\/08\/data-analysis-tools-target-non-experts.html \u00a0 Guide to big data analytics tools, trends and best practices 
http:\/\/searchbusinessanalytics.techtarget.com\/essentialguide\/Guide-to-big-data-analytics-tools-trends-and-best-practices \u00a0\u2026","rel":"","context":"In &quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":67638,"url":"http:\/\/bangla.sitestree.com\/?p=67638","url_meta":{"origin":68329,"position":5},"title":"Random Information on BI #Data Warehouse #Data Warehouse &#8211; 001 #Data Warehouse #Data Warehouse Misc","author":"Author-Check- Article-or-Video","date":"July 26, 2021","format":false,"excerpt":"Real internet money\/business lies in B2B not in B2C. Decision Support System (DSS): Full spectrum of systems that allow\/help management to take decisions, such as, reporting, OLAP, data mining. Java provides support for web enabled DSS JDM is Java's Data mining\/BI API. BI Applications: Balanced Score-card, Activity Based Costing. Data\u2026","rel":"","context":"In 
&quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/68329","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=68329"}],"version-history":[{"count":0,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/68329\/revisions"}],"wp:attachment":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=68329"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=68329"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=68329"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}