Difference Between Apache Hadoop and Spark

Apache Hadoop:

Apache Hadoop is an open-source, Java-based framework for reliable, distributed computing. Hadoop is often loosely called a database, but it is really a platform for storing and processing large amounts of data across a cluster of machines.
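To make the processing side concrete, here is a minimal plain-Python sketch of the MapReduce model that Hadoop uses for data processing. It runs on a single machine and does not call any Hadoop API. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are illustrative assumptions, not part of Hadoop.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

In a real cluster the map and reduce functions run in parallel on many nodes and the intermediate pairs are written to disk between phases. This sketch keeps everything in one process only to show the shape of the model.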

Apache Spark:

Apache Spark is a general-purpose computing engine for fast processing of large data sets, including data stored in Hadoop, across a wide range of applications such as ETL, stream processing, machine learning, and graph computation. Spark is usually regarded as part of the Hadoop ecosystem, although it can also run without Hadoop.


The table below compares the features of Apache Hadoop and Apache Spark.

Sl. No.	Feature	Apache Hadoop	Apache Spark
1	Data Processing Engine	MapReduce is Hadoop's data processing engine.	Spark Core is Spark's data processing engine. Spark itself is considered part of the Hadoop ecosystem.
2	Visualization	Zoomdata is a visualization tool for Hadoop that connects directly to HDFS and to Hadoop technologies such as Hive and Impala.	Apache Zeppelin is a visualization tool for Spark that also serves as a collaboration and data discovery tool.
3	Security	Hadoop supports Kerberos authentication, which can be painful to manage. However, third-party vendors have enabled organizations to use Active Directory Kerberos and LDAP for authentication.	Spark's built-in security is comparatively sparse: its native authentication relies on a shared secret (password authentication).
4	Cost	Hadoop is open source and runs on relatively inexpensive hardware.	Spark needs more RAM, so cost rises as memory is added across the cluster.
5	Abstraction	Hadoop MapReduce offers no comparable high-level data abstraction; programs work directly with key/value pairs.	Spark offers abstractions such as the RDD (Resilient Distributed Dataset) and the DStream (discretized stream).
6	SQL Support	In Hadoop, Hive is used to run SQL queries.	In Spark, Spark SQL is used to run SQL queries.
7	Caching	Hadoop MapReduce does not cache data in memory for later use.	Spark can cache data in memory for later use.
8	Language Support	Java is the primary language for Hadoop; C++, Ruby, and Python are also supported.	Spark supports Scala, Java, Python, and R.
9	Processing Speed	Hadoop MapReduce is slower than Spark.	Spark can run up to 100 times faster than Hadoop MapReduce for in-memory workloads.
10	Lines of Code	Hadoop is reported to have about 120,000 lines of code, making it the larger codebase.	Spark is reported to have about 20,000 lines of code, making it the more compact codebase.
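The caching row in the table above can be illustrated with a small plain-Python sketch. TinyDataset is a hypothetical single-machine stand-in written for this example, not Spark's API. It only shows the idea that a cached result is computed once and reused, while an uncached one is recomputed from the source on every use. The load_from_disk function and the reads counter are likewise assumptions for illustration.

```python
class TinyDataset:
    """Hypothetical stand-in for a lazily evaluated, cacheable dataset (not Spark's API)."""

    def __init__(self, compute):
        self._compute = compute    # function that (re)produces the data
        self._cached = None        # in-memory copy, filled on first use after cache()
        self._use_cache = False

    def map(self, fn):
        # Lazy: nothing is computed until collect() is called on the result.
        return TinyDataset(lambda: [fn(x) for x in self.collect()])

    def cache(self):
        # Mark this dataset so its result is kept in memory after the first collect().
        self._use_cache = True
        return self

    def collect(self):
        if self._use_cache:
            if self._cached is None:
                self._cached = self._compute()
            return self._cached
        return self._compute()


reads = {"count": 0}


def load_from_disk():
    # Stands in for an expensive read; counts how many times it runs.
    reads["count"] += 1
    return [1, 2, 3, 4]


# Without caching, every collect() goes back to the source.
squares = TinyDataset(load_from_disk).map(lambda x: x * x)
squares.collect()
squares.collect()
uncached_reads = reads["count"]

# With caching, the source is read once and the result is reused.
reads["count"] = 0
cached = TinyDataset(load_from_disk).map(lambda x: x * x).cache()
cached.collect()
cached.collect()
cached_reads = reads["count"]

print(uncached_reads, cached_reads)
```

Workloads that reuse the same data many times, such as iterative machine learning, benefit most from this kind of reuse, which is one reason in-memory processing is associated with Spark's speed advantage.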