MapReduce
“
MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.[1][2]
A MapReduce program is composed of a Map() procedure (method) that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() method that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The “MapReduce System” (also called “infrastructure” or “framework”) orchestrates the processing by marshalling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for redundancy and fault tolerance.
”
Reference: https://en.wikipedia.org/wiki/MapReduce
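The student-name example in the quote can be sketched in a few lines of single-process Python. This is only an illustration of the map, group, and reduce steps, not a distributed implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample records are assumptions made up for this sketch.

```python
from collections import defaultdict

def map_phase(students):
    """Map: emit a (first_name, 1) pair for every student record."""
    for first_name, _last_name in students:
        yield first_name, 1

def shuffle(pairs):
    """Group values by key, standing in for the framework's sort/shuffle step."""
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return queues

def reduce_phase(queues):
    """Reduce: summarize each queue by counting its entries."""
    return {name: sum(values) for name, values in queues.items()}

# Hypothetical sample data: (first_name, last_name) records.
students = [
    ("Ana", "Lopez"),
    ("Ben", "Katz"),
    ("Ana", "Silva"),
    ("Cy", "Young"),
    ("Ben", "Ortiz"),
]

name_counts = reduce_phase(shuffle(map_phase(students)))
print(name_counts)  # {'Ana': 2, 'Ben': 2, 'Cy': 1}
```

In a real MapReduce system the map and reduce calls would run as parallel tasks on different servers, and the grouping step would be handled by the framework rather than by a local dictionary.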
Apache Pig
“is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.”
Reference: https://pig.apache.org/
Hive:
“Hive provides a SQL-like interface to data stored in HDP.”
“Hive has three main functions: data summarization, query and analysis. It supports queries expressed in a language called HiveQL, which automatically translates SQL-like queries into MapReduce jobs executed on Hadoop. In addition, HiveQL supports custom MapReduce scripts to be plugged into queries.”
Reference: http://searchdatamanagement.techtarget.com/definition/Apache-Hive
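To give a rough feel for how a SQL-like aggregation can be expressed as map and reduce steps, here is a small Python sketch. It is not how Hive actually plans or executes queries; the `employees` table, its columns, and the helper names are hypothetical, and the query being imitated is something like `SELECT dept, AVG(salary) FROM employees GROUP BY dept`.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept, salary).
employees = [
    ("ana", "eng", 100),
    ("ben", "eng", 120),
    ("cy", "ops", 90),
]

def map_row(row):
    """Map: project the GROUP BY key and the column being aggregated."""
    _name, dept, salary = row
    return dept, salary

def reduce_group(dept, salaries):
    """Reduce: compute the aggregate (here AVG) for one group."""
    return dept, sum(salaries) / len(salaries)

# Group mapped values by key, standing in for the shuffle step.
groups = defaultdict(list)
for key, value in map(map_row, employees):
    groups[key].append(value)

result = dict(reduce_group(dept, salaries) for dept, salaries in groups.items())
print(result)  # {'eng': 110.0, 'ops': 90.0}
```

The GROUP BY column plays the role of the map output key, and the aggregate function plays the role of the reducer.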
Flume:
Flume lets Hadoop users ingest high-volume streaming data into HDFS for storage.
Reference: https://hortonworks.com/apache/flume/
Sqoop:
Apache Sqoop efficiently transfers bulk data between Apache Hadoop and …
Reference: https://hortonworks.com/apache/sqoop/
Oozie:
Apache Oozie is a Java Web application used to schedule Apache Hadoop jobs.
Reference: https://hortonworks.com/apache/oozie/
From: http://sitestree.com/?p=10657
Categories:Big Data
Post Date: 2017-06-23 12:49:30
Medium: https://medium.com/@SayedAhmedCanada