Welcome, everyone, to this week’s Hadoop tutorial. Previously we discussed the top reasons to use Hadoop. In this part, let’s look at when to use Hadoop and when not to.
When to use Hadoop:
1. Data Size and Data Diversity
If you need to handle large volumes of data arriving from many sources and in many formats, Hadoop is a good fit. Hadoop is a distributed storage and processing framework rather than a database, and it can process structured, semi-structured, and unstructured data at scale.
2. Multiple Frameworks for Big Data
Hadoop has an ecosystem of tools for different purposes and integrates with many analytics tools, for example Mahout for machine learning, Spark for fast in-memory and near-real-time processing, and HBase as a NoSQL database.
3. Lifetime Data Availability
If you want your data to stay available over the long term, Hadoop works well as a data store. You can increase storage capacity at any time as your needs grow, at relatively low cost, by adding nodes, and there is no fixed limit on the size of a Hadoop cluster.
4. Storing Diverse Data Sets
This is one of the best features of the Hadoop framework: it can store diverse data sets, ranging from text to image files. Because the data is kept in its raw form, you can change the processing query at any time.
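The idea of storing mixed data as-is and changing the query later can be sketched in a few lines of plain Python. This is only a toy illustration, not Hadoop code: the `RAW_STORE` list stands in for files on HDFS, and the sample records, the `read_records` helper, and both query functions are hypothetical names made up for this example.

```python
import csv
import io
import json

# Raw records kept exactly as they arrived. In a real cluster these would be
# files on HDFS; the sample data here is invented for illustration.
RAW_STORE = [
    ("json", '{"user": "alice", "amount": 30}'),
    ("csv", "bob,12"),
    ("json", '{"user": "alice", "amount": 5}'),
    ("text", "free-form note with no structure"),
]


def read_records(store):
    """Apply a schema at read time, skipping formats this reader does not parse."""
    for fmt, payload in store:
        if fmt == "json":
            doc = json.loads(payload)
            yield {"user": doc["user"], "amount": int(doc["amount"])}
        elif fmt == "csv":
            user, amount = next(csv.reader(io.StringIO(payload)))
            yield {"user": user, "amount": int(amount)}
        # Other formats (text, images, ...) stay in the store untouched.


def total_per_user(store):
    """First query: sum of amounts per user."""
    totals = {}
    for rec in read_records(store):
        totals[rec["user"]] = totals.get(rec["user"], 0) + rec["amount"]
    return totals


def largest_amount(store):
    """A different query written later against the same unchanged raw data."""
    return max(rec["amount"] for rec in read_records(store))
```

Both queries read the same untouched store, so adding a third query later would not require reloading or reshaping the data.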
When not to use Hadoop:
1. Real-Time Analytics
If you need real-time analytics and fast results, Hadoop should not be used directly: it is built for batch processing, so its response times are high.
Because Hadoop cannot be used directly for real-time analytics, people keep the data in HDFS and run a faster engine on top of it. Spark is a common choice for this, and the Spark project reports speedups of up to 100 times over Hadoop MapReduce for some in-memory workloads.
“If I have 30 milliseconds to look up information in a database that has 300 million people, there’s no way Hadoop can do it, it’s not the technology for quick access,” says Claudia Perlich, chief scientist at Dstillery.
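The gap between batch processing and quick access can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not a benchmark of Hadoop: `batch_scan_lookup` imitates a batch job that reads every record, `indexed_lookup` imitates the keyed access a database index provides, and the `PEOPLE` data set and all function names are hypothetical.

```python
def batch_scan_lookup(records, person_id):
    """Batch-style lookup: read every record, like a job that scans the whole data set."""
    match = None
    scanned = 0
    for rec in records:
        scanned += 1
        if rec["id"] == person_id:
            match = rec
    return match, scanned


def build_index(records):
    """Build a key-to-record index once, up front."""
    return {rec["id"]: rec for rec in records}


def indexed_lookup(index, person_id):
    """Keyed lookup: a single hash probe, independent of the data set size."""
    return index.get(person_id)


# Made-up sample data; a real data set would be far larger.
PEOPLE = [{"id": i, "name": f"person-{i}"} for i in range(100_000)]
INDEX = build_index(PEOPLE)
```

The scan touches all 100,000 records to answer one question, while the indexed lookup touches one. The same asymmetry, at a much larger scale and with job start-up cost added, is why a batch system is a poor fit for millisecond lookups.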
2. Not Going to Replace Existing Infrastructure
Hadoop does not replace your existing database infrastructure, and your database cannot replace Hadoop either.
Hadoop stores the raw data on HDFS, then processes it and transforms it into a structured format. After processing, the results are typically transferred to a relational database for reporting.
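The hand-off described above, a batch transform followed by a load into a relational database, can be sketched with Python’s standard library. This is a minimal illustration under assumptions: `batch_transform` is a stand-in for the Hadoop job, an in-memory SQLite database stands in for the reporting database, and the log format, table name, and sample lines are all invented for the example.

```python
import sqlite3
from collections import defaultdict

# Made-up raw input; in practice this would be files stored on HDFS.
RAW_LOG_LINES = [
    "2024-01-01 alice purchase 30",
    "2024-01-01 bob purchase 12",
    "malformed line",
    "2024-01-02 alice purchase 5",
]


def batch_transform(lines):
    """Stand-in for the Hadoop job: parse raw lines and aggregate spend per user."""
    totals = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or not parts[3].isdigit():
            continue  # skip records that do not fit the expected layout
        _date, username, _event, amount = parts
        totals[username] += int(amount)
    return sorted(totals.items())


def load_into_rdbms(rows):
    """Load the structured result into a relational table for reporting."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_spend (username TEXT PRIMARY KEY, total INTEGER)"
    )
    conn.executemany("INSERT INTO user_spend VALUES (?, ?)", rows)
    conn.commit()
    return conn


conn = load_into_rdbms(batch_transform(RAW_LOG_LINES))
report = conn.execute(
    "SELECT username, total FROM user_spend ORDER BY total DESC"
).fetchall()
```

The reporting query runs against the small structured table rather than the raw data, which is the division of labor the two systems are suited to.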
3. Lack of Hadoop Expertise
If you do not have a solid understanding of the Hadoop framework, using it in production is not recommended. Hadoop should come with a disclaimer, “Handle with care”, because it is a technology that demands expertise.
4. Small Data Set Processing
Hadoop is not recommended for processing small data sets, because tools such as Excel or an RDBMS can already handle the task. Using Hadoop in such cases may even prove more expensive.