Apache Hadoop is an open-source, Java-based framework for reliable, distributed computing. Hadoop is widely used for storing and processing large amounts of data. It is not a database itself; it combines a distributed file system (HDFS) with the MapReduce processing model.
Apache Spark is a general-purpose computing engine for fast processing of large data sets, including data stored in Hadoop, across a wide range of applications such as ETL, stream processing, machine learning, and graph computation. Spark is often counted as part of the Hadoop ecosystem, although it can also run independently of Hadoop.
Here we discuss the feature differences between Apache Hadoop and Apache Spark.
|No.||Feature||Apache Hadoop||Apache Spark|
|1||Data Processing Engine||MapReduce is the data processing engine of Hadoop.||Spark Core is the data processing engine of Apache Spark, which is often run as part of the Hadoop ecosystem.|
|2||Visualization||Zoomdata is a visualization tool for Hadoop that connects directly to HDFS and to Hadoop technologies such as Hive and Impala.||Apache Zeppelin is a visualization tool for Spark that also serves as a collaboration and data discovery tool.|
|3||Security||Hadoop supports Kerberos authentication, which can be painful to manage. However, third-party vendors have enabled organizations to leverage Active Directory Kerberos and LDAP for authentication.||Apache Spark's security is somewhat sparse: at the time of writing it supports authentication only via a shared secret (password authentication).|
|4||Cost||Hadoop is open-source software and runs on relatively inexpensive hardware.||Spark requires more RAM, and adding memory across the cluster gradually increases cost.|
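|5||Abstraction||Hadoop MapReduce offers no comparable high-level data abstraction; programs work directly with key-value pairs.||Spark provides two abstractions: the RDD (Resilient Distributed Dataset) and the DStream (discretized stream) for streaming data.|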
|6||SQL Support||In Hadoop, Hive is used to run SQL queries.||In Spark, Spark SQL is used to run SQL queries.|
|7||Caching||Hadoop MapReduce does not cache data in memory for later use; intermediate results are written to disk.||Spark can cache data in memory for reuse in later operations.|
|8||Language Support||Java is the primary language for Hadoop; other languages such as C++, Ruby, and Python are also supported (for example via Hadoop Streaming or Hadoop Pipes).||Spark supports Scala, Java, Python, and R.|
|9||Processing Speed||Hadoop MapReduce is generally slower than Spark.||Spark is claimed to run up to 100 times faster than Hadoop MapReduce for in-memory workloads.|
|10||Lines of Code||The Hadoop codebase is cited at about 120,000 lines of code.||The Apache Spark codebase is cited at about 20,000 lines of code, a considerably smaller core. Codebase size alone does not determine how fast programs execute.|
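
Row 1 names MapReduce as Hadoop's processing engine. As a rough illustration of that programming model, the plain-Python sketch below runs a word count through map, shuffle, and reduce steps inside a single process. It does not use the Hadoop or Spark APIs. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values under their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: collapse each key's list of values into one count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three steps, as a single-process stand-in for a MapReduce job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, not taken from any real data set.
    sample = ["spark and hadoop", "hadoop stores data", "spark caches data"]
    print(word_count(sample))
```

In a real cluster each step is distributed across machines. Hadoop MapReduce writes intermediate results to disk between steps, whereas Spark can keep them in memory, which is the difference the Caching and Processing Speed rows describe.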