Hadoop Interview Questions and Answers Set 7

61. How does Spark use Hadoop?

Spark has its own cluster management for computation and mainly uses Hadoop (HDFS) for storage, although it can also run on Hadoop YARN.

62. What is Spark SQL?

Spark SQL (the successor to the earlier Shark project) is a Spark module for working with structured data and performing structured data processing. Through this module, Spark executes relational SQL queries on the data. The core of the component supports a special kind of RDD called SchemaRDD (renamed DataFrame in Spark 1.3), composed of row objects and a schema that defines the data type of each column in the row. It is similar to a table in a relational database.
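To make the "rows plus schema" idea concrete, here is a minimal pure-Python sketch of a SchemaRDD-like table. It is a toy model, not Spark's API: the class ToyTable and its where and select methods are hypothetical names chosen for illustration. In real Spark the same query would be written in SQL or through the DataFrame API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


# Hypothetical toy model of a SchemaRDD: a schema plus a list of rows.
@dataclass
class ToyTable:
    schema: List[Tuple[str, type]]  # (column name, column type) pairs
    rows: List[tuple]

    def __post_init__(self) -> None:
        # Check every row against the schema, like a typed relational table.
        for row in self.rows:
            if len(row) != len(self.schema):
                raise ValueError(f"row {row!r} does not match schema width")
            for value, (name, col_type) in zip(row, self.schema):
                if not isinstance(value, col_type):
                    raise TypeError(f"column {name!r} expects {col_type.__name__}")

    def _as_dict(self, row: tuple) -> Dict[str, Any]:
        return {name: value for value, (name, _) in zip(row, self.schema)}

    def where(self, predicate: Callable[[Dict[str, Any]], bool]) -> "ToyTable":
        # Roughly SQL's WHERE clause: keep rows for which the predicate is true.
        kept = [r for r in self.rows if predicate(self._as_dict(r))]
        return ToyTable(self.schema, kept)

    def select(self, *columns: str) -> "ToyTable":
        # Roughly SQL's SELECT list: project onto the named columns (unknown names are skipped).
        idx = [i for name in columns for i, (n, _) in enumerate(self.schema) if n == name]
        new_schema = [self.schema[i] for i in idx]
        new_rows = [tuple(r[i] for i in idx) for r in self.rows]
        return ToyTable(new_schema, new_rows)


people = ToyTable([("name", str), ("age", int)], [("Ann", 34), ("Bob", 19)])
# Equivalent in spirit to: SELECT name FROM people WHERE age > 21
adults = people.where(lambda r: r["age"] > 21).select("name")
print(adults.rows)  # [('Ann',)]
```

The schema travels with the rows, so every derived table is still typed; that is the property that lets an engine like Spark SQL plan relational queries over the data.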

63. What are the additional benefits YARN brings to Hadoop?

Effective utilization of resources, as multiple applications can run in YARN while sharing a common pool of cluster resources.

YARN is backward compatible, so existing MapReduce jobs can run on it.

Using YARN, one can even run applications that are not based on the MapReduce model.
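The resource-sharing benefit can be illustrated with a toy model in which one pool of cluster memory is handed out as containers to several applications, including a non-MapReduce one. This is a hypothetical sketch: the ToyResourceManager class and its methods are invented for illustration and ignore YARN's real schedulers, queues, and CPU (vcore) accounting.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyResourceManager:
    total_mb: int  # memory available across the whole (toy) cluster
    allocated: Dict[str, int] = field(default_factory=dict)  # app name -> MB in use

    def free_mb(self) -> int:
        return self.total_mb - sum(self.allocated.values())

    def request_container(self, app: str, mb: int) -> bool:
        # Grant the container only if the shared pool still has room.
        if mb > self.free_mb():
            return False
        self.allocated[app] = self.allocated.get(app, 0) + mb
        return True

    def release(self, app: str) -> None:
        # When an application finishes, its memory returns to the shared pool.
        self.allocated.pop(app, None)


rm = ToyResourceManager(total_mb=8192)
print(rm.request_container("mapreduce-wordcount", 4096))  # True
print(rm.request_container("spark-etl", 4096))            # True (non-MapReduce app)
print(rm.request_container("another-job", 1024))          # False: pool exhausted
rm.release("mapreduce-wordcount")
print(rm.request_container("another-job", 1024))          # True: freed memory is reused
```

Because every framework draws from the same pool, memory released by one application is immediately usable by another instead of sitting idle in a framework-specific slot.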

64. Compare Sqoop and Flume

Criteria	Sqoop	Flume
Application	Importing data from an RDBMS into Hadoop	Moving bulk streaming data into HDFS
Architecture	Connector-based: connects to the respective data source	Agent-based: fetches the right data
Loading of data	Not event driven	Event driven

65. What is Sqoop metastore?

Sqoop metastore is a shared metadata repository that lets remote users define and execute saved jobs (created with sqoop job) stored in the metastore. The sqoop-site.xml file should be configured to connect to the metastore.


66. What are the elements of Kafka?

The most important elements of Kafka are:

Topic: a stream of messages of the same kind

Producer: publishes messages to a topic

Consumer: subscribes to one or more topics and pulls data from the brokers

Brokers: the servers where published messages are stored
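How the four elements fit together can be sketched with an in-memory toy in which the broker keeps one append-only list per topic, the producer appends to it, and the consumer pulls everything after its last read position. This is not Kafka's client API; ToyBroker, ToyProducer, and ToyConsumer are hypothetical names, and partitions, replication, and networking are left out.

```python
from collections import defaultdict
from typing import Dict, List


class ToyBroker:
    """Stores published messages per topic (hypothetical toy, not Kafka's API)."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[str]] = defaultdict(list)

    def append(self, topic: str, message: str) -> int:
        # Messages are appended to the topic's log; return the new message's offset.
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1

    def read(self, topic: str, offset: int) -> List[str]:
        # Return every message at or after the given offset.
        return self.topics[topic][offset:]


class ToyProducer:
    def __init__(self, broker: ToyBroker) -> None:
        self.broker = broker

    def send(self, topic: str, message: str) -> int:
        return self.broker.append(topic, message)


class ToyConsumer:
    def __init__(self, broker: ToyBroker, topics: List[str]) -> None:
        self.broker = broker
        self.positions = {t: 0 for t in topics}  # next offset to read, per subscribed topic

    def poll(self) -> List[str]:
        # Pull any new messages from the broker for every subscribed topic.
        batch: List[str] = []
        for topic, pos in self.positions.items():
            new = self.broker.read(topic, pos)
            batch.extend(new)
            self.positions[topic] = pos + len(new)
        return batch


broker = ToyBroker()
producer = ToyProducer(broker)
consumer = ToyConsumer(broker, ["clicks"])
producer.send("clicks", "page=/home")
producer.send("clicks", "page=/cart")
first = consumer.poll()
second = consumer.poll()
print(first)   # ['page=/home', 'page=/cart']
print(second)  # []
```

Note the pull model: the broker never pushes to consumers, it only keeps the log, and each consumer tracks how far it has read.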

67. What is Kafka?

Wikipedia defines Kafka as “an open-source message broker project developed by the Apache Software Foundation written in Scala, where the design is heavily influenced by transaction logs”. It is essentially a distributed publish-subscribe messaging system.

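68. What is the role of ZooKeeper in Kafka?

Kafka uses ZooKeeper to store the offsets of messages consumed for a specific topic and partition by a specific consumer group. This applies to older Kafka consumers; newer Kafka releases store consumer offsets in an internal Kafka topic instead.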
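The point that offsets are tracked per consumer group, topic, and partition can be shown with a small stand-in for ZooKeeper: a flat map from znode-like paths to offsets. The path layout is modeled on the /consumers/<group>/offsets/<topic>/<partition> znodes used by older Kafka consumers; ToyOffsetStore is a hypothetical class, not a ZooKeeper or Kafka client.

```python
from typing import Dict


class ToyOffsetStore:
    """Hypothetical stand-in for ZooKeeper: a flat map from znode-like paths to offsets."""

    def __init__(self) -> None:
        self.nodes: Dict[str, int] = {}

    @staticmethod
    def path(group: str, topic: str, partition: int) -> str:
        # Layout modeled on the /consumers/<group>/offsets/<topic>/<partition> znodes.
        return f"/consumers/{group}/offsets/{topic}/{partition}"

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        self.nodes[self.path(group, topic, partition)] = offset

    def fetch(self, group: str, topic: str, partition: int) -> int:
        # In this sketch a group that has never committed starts from offset 0.
        return self.nodes.get(self.path(group, topic, partition), 0)


store = ToyOffsetStore()
store.commit("billing", "orders", 0, 42)
store.commit("analytics", "orders", 0, 7)
print(store.fetch("billing", "orders", 0))    # 42
print(store.fetch("analytics", "orders", 0))  # 7: each group tracks its own position
print(store.fetch("billing", "orders", 1))    # 0: nothing committed for partition 1
```

Keying by group is what lets two independent applications read the same topic at different speeds without interfering with each other.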

69. What are the key benefits of using Storm for Real Time Processing?

Easy to operate: Operating Storm is quite easy.

Real fast: A benchmark cited by the Storm project clocked it at over a million tuples processed per second per node.

Fault tolerant: It detects faults automatically and restarts failed workers, reassigning tasks to other machines when a node dies.

Reliable: It guarantees that each unit of data (tuple) will be processed at least once; exactly-once processing is available through the Trident API.

Scalable: It runs across a cluster of machines.
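The at-least-once guarantee rests on acking and replay: the spout remembers every tuple it has emitted until it is acknowledged, and re-emits it if processing fails. The sketch below is a simplified, hypothetical model of that idea (ToySpout, run, and flaky_bolt are invented names, not Storm's Java API; real Storm also tracks tuple trees and timeouts).

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple


class ToySpout:
    """Hypothetical spout that replays failed tuples (at-least-once delivery)."""

    def __init__(self, messages: List[str]) -> None:
        self.queue: Deque[Tuple[int, str]] = deque(enumerate(messages))
        self.pending: Dict[int, str] = {}  # emitted but not yet acked

    def next_tuple(self) -> Optional[Tuple[int, str]]:
        if not self.queue:
            return None
        msg_id, msg = self.queue.popleft()
        self.pending[msg_id] = msg
        return msg_id, msg

    def ack(self, msg_id: int) -> None:
        # Fully processed: the spout can forget the tuple.
        self.pending.pop(msg_id)

    def fail(self, msg_id: int) -> None:
        # Processing failed: put the tuple back so it is emitted again.
        self.queue.append((msg_id, self.pending.pop(msg_id)))


def run(spout: ToySpout, bolt: Callable[[str], bool]) -> List[str]:
    attempts: List[str] = []  # every processing attempt, including replays
    while (item := spout.next_tuple()) is not None:
        msg_id, msg = item
        attempts.append(msg)
        if bolt(msg):
            spout.ack(msg_id)
        else:
            spout.fail(msg_id)
    return attempts


failed_once = set()


def flaky_bolt(msg: str) -> bool:
    # Fail "b" the first time it is seen and succeed on every later attempt.
    if msg == "b" and msg not in failed_once:
        failed_once.add(msg)
        return False
    return True


spout = ToySpout(["a", "b", "c"])
attempts = run(spout, flaky_bolt)
print(attempts)  # ['a', 'b', 'c', 'b']
```

No tuple is lost, but "b" is processed twice; that possible duplication is exactly why plain Storm promises at-least-once rather than exactly-once.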

70. List the different stream groupings in Apache Storm.

Shuffle grouping
Fields grouping
Global grouping
All grouping
None grouping
Direct grouping
Local or shuffle grouping
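A stream grouping decides which of a bolt's tasks receives each tuple. The sketch below shows the routing rule behind four of the groupings as plain functions. The function names are hypothetical, and CRC-32 is used as the hash purely as an assumption for this sketch (Storm's own fields grouping uses its own hashing); the point is only the rule each grouping applies.

```python
import random
import zlib
from typing import List


def shuffle_grouping(num_tasks: int, rng: random.Random) -> int:
    # Tuples are distributed randomly across the bolt's tasks.
    return rng.randrange(num_tasks)


def fields_grouping(key: str, num_tasks: int) -> int:
    # Tuples with the same value in the grouping field always go to the same task.
    # CRC-32 is used because Python's built-in str hash is salted per process.
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def all_grouping(num_tasks: int) -> List[int]:
    # Every tuple is replicated to all of the bolt's tasks.
    return list(range(num_tasks))


def global_grouping() -> int:
    # The whole stream goes to a single task (the one with the lowest id).
    return 0


rng = random.Random(0)
words = ["storm", "kafka", "storm", "spark", "storm"]
fields_targets = [fields_grouping(w, 4) for w in words]
shuffle_targets = [shuffle_grouping(4, rng) for _ in words]
print(fields_targets[0] == fields_targets[2] == fields_targets[4])  # True
print(all(0 <= t < 4 for t in shuffle_targets))                     # True
print(all_grouping(3))                                              # [0, 1, 2]
```

Fields grouping is what makes per-key state (such as a running word count) safe to keep inside a single task, while shuffle grouping simply balances load.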
