Traditionally, SQL (relational) databases were enough to handle small and medium-sized datasets. At the rate data is now growing, however, these traditional databases are no longer sufficient. As a result, the terms "big data" and "NoSQL" have been in use for several years.
Big data generally refers to large sets of raw, unstructured data that are impractical to handle with simple traditional database management systems.
Hadoop, Hive, and HBase are popular platforms for working with these large data sets.
NoSQL databases, in contrast to SQL databases, provide a mechanism to store and retrieve data under a looser consistency model, which brings advantages such as horizontal scaling, much higher availability, and faster access.
Popular NoSQL data stores include MongoDB and Solr.
Hadoop is an open source implementation of the MapReduce paper published by Google, built to handle big data for analytical purposes.
Hadoop is used to process data for analysis rather than in real time. It comes to mind when a large set of data is involved: hundreds of gigabytes, or even hundreds of petabytes.
Hadoop is used for after-the-fact data analysis; processing time is often measured in minutes and hours, sometimes even days.
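To make the MapReduce idea concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps applied to a word count. It is plain Python, not Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are made up for illustration. A real Hadoop job would run the same phases in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over all input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input; a real job would read files from distributed storage.
sample_lines = [
    "big data needs batch processing",
    "batch processing takes time",
]
counts = word_count(sample_lines)
print(counts)
```

Note that every input line is read before any result is available, which is why this style of processing suits offline analysis rather than interactive queries.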
MongoDB, on the other hand, is designed for real-time processing. Even though MongoDB can store a massive amount of data, each operation typically processes only a small subset of it.
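The difference between the two access patterns can be sketched in plain Python. This does not use MongoDB or Hadoop; the records, the `city` and `temp_c` fields, and the helper names are all hypothetical. The full scan stands in for a batch job that reads everything, while the dictionary index stands in for a database index that lets a query touch only the matching records.

```python
from typing import Dict, List

# Hypothetical records; in practice these would live in a data store.
records: List[dict] = [
    {"id": i, "city": "Oslo" if i % 1000 == 0 else "Other", "temp_c": i % 40}
    for i in range(100_000)
]


def full_scan_average(rows: List[dict]) -> float:
    """Batch style: read every record to compute one aggregate."""
    return sum(row["temp_c"] for row in rows) / len(rows)


def build_index(rows: List[dict], field: str) -> Dict[str, List[int]]:
    """Build a lookup from field value to the positions of matching rows."""
    index: Dict[str, List[int]] = {}
    for position, row in enumerate(rows):
        index.setdefault(row[field], []).append(position)
    return index


# The index is built once, ahead of time.
city_index = build_index(records, "city")


def find_by_city(city: str) -> List[dict]:
    """Real-time style: use the index to fetch only the matching rows."""
    return [records[pos] for pos in city_index.get(city, [])]


average = full_scan_average(records)  # touches all 100,000 records
matches = find_by_city("Oslo")        # touches only the matching records
print(average, len(matches))
```

The lookup cost depends on the number of matches rather than on the total size of the data set, which is what makes millisecond responses possible even when a lot of data is stored.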
Hadoop:
- Used when big data is involved.
- For analytical purposes.
- Processing time measured in minutes and hours.
- For offline processing.
- E.g.: weather forecasting.
MongoDB:
- Used when each operation works on a small subset of the data.
- Processing time measured in milliseconds.
- For real-time processing.
- E.g.: searching data in real time.