Tuesday, May 21, 2019

Hadoop

Hadoop is a framework that lets you break up big-data tasks into smaller pieces.
It is free, open-source software that lets many computers store and process the same data set together.
How it works:
A large amount of data is broken into smaller chunks (Hadoop calls these 'blocks' or 'splits'; the 'cluster' is the group of computers that works on them). Because there is a lot of data, the chunks are spread across several computers. Each computer processes the chunks it holds and outputs its results as key/value pairs (the 'map' step). The pairs are then sorted and grouped by key, so that all values for the same key end up together. Finally, the values for each key are combined into one result (the 'reduce' step), and these results together form the final output. Hadoop makes processing big data much quicker because several computers work on different parts of the same data at the same time.
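
The steps above can be sketched in a few lines of plain Python. This is only a toy illustration that runs on one machine and does not use Hadoop itself; the function names (map_chunk, shuffle, reduce_group, word_count) and the sample text are made up for this example. It counts words, which is the usual first exercise for this style of processing.

```python
from collections import defaultdict


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key and sort the keys, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_group(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # On a real cluster each chunk would be mapped on a different computer;
    # here the chunks are simply processed one after another.
    mapped = [pair for chunk in chunks for pair in map_chunk(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_group(key, values) for key, values in grouped.items())


# Hypothetical input: two chunks of one larger text.
chunks = ["big data is big", "data is everywhere"]
counts = word_count(chunks)
print(counts)  # {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

Because each chunk is mapped on its own and each key is reduced on its own, the work can be shared out between as many computers as are available.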
