Difference Between Apache Hadoop and Spark


Apache Hadoop:

Apache Hadoop is an open-source, Java-based framework for reliable, scalable, distributed computing. Hadoop is a popular platform for storing and processing large amounts of data across clusters of commodity hardware.
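Hadoop processes data with the MapReduce model: a map phase emits key-value pairs, the framework groups them by key (the shuffle), and a reduce phase combines each group. The following is a minimal single-process sketch of that model applied to a word count. It only simulates the three phases in plain Python. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and are not part of Hadoop's Java API. Nothing here is distributed or stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper (hypothetical): emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: group all emitted values by key, sorted by key, as the
    # framework does between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer (hypothetical): combine all values for one key into one result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the three phases over every input line.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    data = ["Hadoop stores data", "Spark processes data"]
    print(word_count(data))
    # {'data': 2, 'hadoop': 1, 'processes': 1, 'spark': 1, 'stores': 1}
```

In a real Hadoop cluster, mappers and reducers run as separate tasks on many nodes, and the framework performs the shuffle itself, spilling intermediate map output to disk.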

Apache Spark:

Apache Spark is a fast, general-purpose computing engine for processing large data sets, including data stored in Hadoop. It supports a wide range of applications such as ETL, stream processing, machine learning, and graph computation. Spark is commonly used as part of the Hadoop ecosystem, although it can also run independently of Hadoop.
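A key idea behind Spark's speed is that it records a chain of transformations and only runs them, in memory, when a result is requested. The sketch below imitates that idea with a hypothetical LazyDataset class. Its map, filter, and collect methods borrow RDD method names but are not PySpark's API. The sketch runs in one process and assumes the input fits in memory.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


class LazyDataset:
    """Hypothetical toy dataset that records transformations lazily,
    loosely mimicking how a Spark RDD builds up a lineage of steps."""

    def __init__(self, source: Iterable[Any],
                 steps: Optional[List[Tuple[str, Callable]]] = None):
        self._source = list(source)
        self._steps = steps or []

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Transformation: returns a new dataset; nothing is computed yet.
        return LazyDataset(self._source, self._steps + [("map", fn)])

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        # Transformation: also lazy, only recorded in the lineage.
        return LazyDataset(self._source, self._steps + [("filter", pred)])

    def lineage(self) -> List[str]:
        # Names of the recorded transformations, in order.
        return [name for name, _ in self._steps]

    def collect(self) -> List[Any]:
        # Action: apply every recorded step to each in-memory record.
        result = []
        for record in self._source:
            keep = True
            for name, fn in self._steps:
                if name == "map":
                    record = fn(record)
                elif not fn(record):
                    keep = False
                    break
            if keep:
                result.append(record)
        return result


if __name__ == "__main__":
    logs = ["INFO start", "ERROR disk full", "INFO done", "ERROR timeout"]
    errors = (LazyDataset(logs)
              .filter(lambda line: line.startswith("ERROR"))
              .map(lambda line: line.split(" ", 1)[1]))
    print(errors.lineage())  # ['filter', 'map']
    print(errors.collect())  # ['disk full', 'timeout']
```

Because each transformation only extends the lineage, the whole filter-then-map chain runs in a single pass over the data when collect() is called, with no intermediate results written out between steps.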


The following comparison covers the main feature differences between Apache Hadoop and Apache Spark.

1. Data processing engine
   Apache Hadoop: MapReduce is Hadoop's data processing engine.
   Apache Spark: Spark Core is Spark's data processing engine. Spark can also run as part of the Hadoop ecosystem.

2. Visualization
   Apache Hadoop: Zoomdata is a visualization tool for Hadoop that connects directly to HDFS and to Hadoop technologies such as Hive and Impala.
   Apache Spark: Apache Zeppelin is a visualization tool for Spark: a web-based notebook for collaboration and data discovery.

3. Security
   Apache Hadoop: Hadoop supports Kerberos authentication, which can be painful to manage. However, third-party vendors have enabled organizations to use Active Directory Kerberos and LDAP for authentication.
   Apache Spark: Spark's security is comparatively sparse; at the time of writing, it supports authentication only via a shared secret (password authentication).

4. Cost
   Apache Hadoop: Hadoop is open source and runs on inexpensive commodity hardware.
   Apache Spark: Spark needs more RAM, and adding memory across a cluster gradually raises the cost.

5. Abstraction
   Apache Hadoop: Hadoop MapReduce has no high-level data abstraction; programs work directly with key-value pairs.
   Apache Spark: Spark provides two abstractions: the RDD (Resilient Distributed Dataset) and, for streaming, the DStream (discretized stream).

6. SQL support
   Apache Hadoop: Hive is used to run SQL queries on Hadoop.
   Apache Spark: Spark SQL is used to run SQL queries in Spark.

7. Caching
   Apache Hadoop: MapReduce cannot cache data in memory for reuse; each job reads its input from disk.
   Apache Spark: Spark can cache data in memory and reuse it in later operations.

8. Language support
   Apache Hadoop: Java is Hadoop's primary language; C++, Ruby, and Python are also supported (for example, through Hadoop Streaming).
   Apache Spark: Spark supports Scala, Java, Python, and R.

9. Processing speed
   Apache Hadoop: Hadoop MapReduce is slower than Spark, largely because it writes intermediate results to disk.
   Apache Spark: Spark can be up to 100 times faster than Hadoop MapReduce for in-memory workloads.

10. Lines of code
   Apache Hadoop: Hadoop has roughly 120,000 lines of code.
   Apache Spark: Spark has roughly 20,000 lines of code. A smaller code base does not by itself make user programs execute faster.
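The caching difference in the comparison above can be illustrated with a small sketch. It assumes a hypothetical CachedStage class whose cache() method borrows the name of Spark's RDD.cache() but is not Spark's implementation. The class counts how many times its (possibly expensive) computation runs. Without caching, every action recomputes the stage from its source; with caching, the first result is kept in memory and reused.

```python
from typing import Callable, Iterable, List, Optional


class CachedStage:
    """Hypothetical toy stage, loosely mimicking Spark's cache(): without
    caching, every action recomputes the stage from its source data."""

    def __init__(self, source: Iterable[int], fn: Callable[[int], int]):
        self._source = list(source)
        self._fn = fn
        self._use_cache = False
        self._cached: Optional[List[int]] = None
        self.computations = 0  # how many times the stage was evaluated

    def cache(self) -> "CachedStage":
        # Mark the stage so its first result is kept in memory for reuse.
        self._use_cache = True
        return self

    def _compute(self) -> List[int]:
        if self._use_cache and self._cached is not None:
            return self._cached  # served from memory, no recomputation
        self.computations += 1
        result = [self._fn(x) for x in self._source]
        if self._use_cache:
            self._cached = result
        return result

    # Two "actions" that each need the stage's output.
    def count(self) -> int:
        return len(self._compute())

    def total(self) -> int:
        return sum(self._compute())


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    plain = CachedStage(range(5), square)
    print(plain.count(), plain.total(), plain.computations)     # 5 30 2

    cached = CachedStage(range(5), square).cache()
    print(cached.count(), cached.total(), cached.computations)  # 5 30 1
```

Both stages return the same results, but the cached one evaluates its computation only once across two actions. This is the kind of reuse that benefits iterative workloads such as machine learning.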