Apache Hadoop is an open-source, Java-based framework for reliable, distributed computing. It is widely used to store and process large amounts of data across clusters of machines.
Apache Spark is a general-purpose computing engine for fast processing of large data sets, including data stored in Hadoop, across a wide range of applications such as ETL, stream processing, machine learning, and graph computation. Even though Spark is part of the Hadoop ecosystem, it differs from Hadoop in several respects.
The table below compares the features of Apache Hadoop and Apache Spark.
No. | Feature | Apache Hadoop | Apache Spark |
---|---|---|---|
1 | Data Processing Engine | MapReduce is Hadoop's data processing engine. | Spark Core is the data processing engine of Apache Spark, which is itself part of the Hadoop ecosystem. |
2 | Visualization | Zoomdata is a visualization tool for Hadoop that connects directly to HDFS and to Hadoop technologies such as Hive and Impala. | Apache Zeppelin is the visualization tool for Spark; it is a notebook for collaboration and data discovery. |
3 | Security | Hadoop supports Kerberos authentication, which can be painful to manage. However, third-party vendors have enabled organizations to use Active Directory Kerberos and LDAP for authentication. | Spark's own security is comparatively sparse: its built-in authentication relies on a shared secret (password authentication). |
4 | Cost | Hadoop is open source and runs on relatively inexpensive commodity hardware. | Spark needs more RAM, and adding memory across the cluster steadily increases cost. |
5 | Abstraction | Hadoop MapReduce provides no comparable data abstraction. | Spark provides two abstractions: the RDD (Resilient Distributed Dataset) and the DStream (discretized stream). |
6 | SQL Support | In Hadoop, Hive is used to run SQL queries. | In Spark, Spark SQL is used to run SQL queries. |
7 | Caching | Hadoop MapReduce cannot cache data in memory for later use; intermediate results are written to disk. | Spark can cache data in memory for later use. |
8 | Language Support | Java is the primary language for Hadoop; C++, Ruby, and Python are also supported. | Spark supports Scala, Java, Python, and R. |
9 | Processing Speed | Hadoop MapReduce processes data more slowly than Spark. | Spark is reported to run up to 100 times faster than Hadoop MapReduce, mainly for in-memory workloads. |
10 | Lines of Code | Hadoop has about 120,000 lines of code. | Apache Spark has about 20,000 lines of code, a much smaller codebase than Hadoop's. |
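
To make the data processing row more concrete, here is a minimal sketch of the MapReduce model in plain Python. It is not Hadoop or Spark code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are hypothetical, chosen only to show the three phases a MapReduce word count goes through on a single machine.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group together all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order, as a MapReduce job would."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, standing in for lines read from a file.
lines = ["spark and hadoop", "hadoop stores data", "spark caches data"]
counts = word_count(lines)
print(counts)
```

In Hadoop, each phase runs as a separate distributed step and intermediate results go to disk. In Spark, the same job is typically written as chained RDD transformations such as `flatMap`, `map`, and `reduceByKey`, and the intermediate data can stay in memory.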