Hadoop Interview Questions and Answers Set 10

91. What are some of the best tools for data analysis?

Tableau
RapidMiner
OpenRefine
KNIME
Google Search Operators
Solver
NodeXL
io
Wolfram Alpha
Google Fusion Tables

92. What are some common problems faced by data analysts?

Some of the common problems faced by data analysts are:

Common misspellings
Duplicate entries
Missing values
Illegal values
Varying value representations
Identifying overlapping data
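
Several of these problems can be detected mechanically. The following is a minimal plain-Python sketch, not a production data-cleaning tool. It assumes hypothetical records with `id`, `country`, and `age` fields and an assumed valid age range of 0 to 129. It flags duplicate entries, missing values, illegal values, and varying representations of the same value ("USA", "U.S.A.", "usa").

```python
# Hypothetical sample records; field names and the valid age range are assumptions
records = [
    {"id": 1, "country": "USA", "age": 34},
    {"id": 2, "country": "U.S.A.", "age": None},   # missing value
    {"id": 3, "country": "usa", "age": -5},        # illegal value
    {"id": 1, "country": "USA", "age": 34},        # duplicate entry
]


def profile(rows, valid_age=range(0, 130)):
    """Return duplicates, ids with missing/illegal ages, and inconsistent spellings."""
    seen = set()
    duplicates, missing, illegal = [], [], []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if key in seen:
            duplicates.append(row)
        seen.add(key)
        if row["age"] is None:
            missing.append(row["id"])
        elif row["age"] not in valid_age:
            illegal.append(row["id"])

    # Normalize spellings to group varying representations of the same value
    variants = {}
    for row in rows:
        canonical = row["country"].replace(".", "").upper()
        variants.setdefault(canonical, set()).add(row["country"])
    inconsistent = {k: v for k, v in variants.items() if len(v) > 1}
    return duplicates, missing, illegal, inconsistent


duplicates, missing, illegal, inconsistent = profile(records)
```

The normalization rule (strip dots, uppercase) is only an illustration; real data usually needs domain-specific mapping tables.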

93. What tools are used in Big Data?

Hadoop
Hive
Pig
Flume
Mahout
Sqoop

94. Which language is more suitable for text analytics: R or Python?

Python is generally considered more suitable. It has pandas, a rich library that gives analysts high-level data structures and data-analysis tools. R is not without comparable features (its built-in data frames cover similar ground), so in practice the choice also depends on the available libraries and the team's experience.

95. What is logistic regression?

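Logistic regression is a statistical model used to analyze a dataset and predict a binary outcome, that is, an outcome that takes one of two values, such as 0 or 1, or yes or no. It models the probability of the outcome by passing a linear combination of the inputs through the logistic (sigmoid) function.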
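
A minimal sketch of the idea in plain Python, assuming a single feature and a small hypothetical dataset (hours studied versus pass/fail). The model computes p = 1 / (1 + e^-(w*x + b)), fits the weight `w` and bias `b` by batch gradient descent on the log-loss, and thresholds p at 0.5 to produce the binary prediction. The learning rate and epoch count are arbitrary choices; libraries such as scikit-learn provide production implementations.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit(xs, ys, lr=0.1, epochs=5000):
    """Fit w and b for one feature by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of log-loss w.r.t. the score
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return the binary outcome: 1 if the predicted probability reaches the threshold."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical toy data: hours studied and whether the exam was passed (1) or not (0)
hours = [1, 2, 3, 4, 5, 6, 7, 8]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit(hours, passed)
```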

96. Can you list a few commonly used Hive services?

Command Line Interface (cli)
Hive Web Interface (hwi)
HiveServer (hiveserver)
rcfilecat, a tool for printing the contents of an RC file
Jar
Metastore

97. What is indexing and why do we need it?

Indexing is one of Hive's query optimization methods. A Hive index speeds up access to a column or set of columns in a Hive table: with an index, the system does not need to read every row in the table to find the rows a query selects. Note that indexing was removed in Hive 3.0.
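
The benefit can be illustrated with a purely conceptual plain-Python sketch; it does not model how Hive stores its index tables on disk. The table rows and the `country` column are hypothetical. A full scan reads and tests every row, while a prebuilt index maps each column value to the positions of the rows that hold it, so a lookup reads only the matching rows.

```python
from collections import defaultdict

# Hypothetical table rows: (order_id, country)
rows = [(1, "IN"), (2, "US"), (3, "IN"), (4, "UK"), (5, "US")]


def full_scan(table, country):
    """Without an index: every row is read and tested."""
    return [row for row in table if row[1] == country]


def build_index(table):
    """Map each country value to the positions of the rows holding it."""
    index = defaultdict(list)
    for pos, (_, country) in enumerate(table):
        index[country].append(pos)
    return index


def indexed_lookup(table, index, country):
    """With an index: only the matching row positions are read."""
    return [table[pos] for pos in index.get(country, [])]


index = build_index(rows)
print(indexed_lookup(rows, index, "US"))  # prints [(2, 'US'), (5, 'US')]
```

The trade-off is the usual one for indexes: the index must be built and kept up to date, which costs storage and write time in exchange for faster reads.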

98. What are the components of the Hive query processor?

The components of the Hive query processor include:

Logical plan generation
Physical plan generation
Execution engine
UDFs and UDAFs
Semantic analyzer
Type checking

99. If you run a SELECT * query in Hive, why does it not run MapReduce?

This is due to the hive.fetch.task.conversion property. When it is enabled, Hive answers simple queries (SELECT, FILTER, LIMIT, and similar) with a fetch task that reads the data directly, skipping the MapReduce job and its startup overhead, which lowers query latency.

100. What is the use of explode in Hive?

explode is a built-in table-generating function (UDTF) in Hive that converts a column of a complex data type into rows. Given an array, it emits each element as a separate row; given a map, it emits each key-value pair as a row with two columns.
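
A conceptual sketch of these semantics in plain Python (not HiveQL). `explode` mirrors the array and map cases described above, and the hypothetical `lateral_view_explode` helper mimics how explode is usually combined with `LATERAL VIEW` to keep the other columns of each source row. In this sketch, a `None` (NULL) or empty collection yields no rows, so the row for `bob` disappears from the output.

```python
def explode(collection):
    """Yield one output row (a tuple) per element of an array or entry of a map."""
    if collection is None:
        return  # a NULL collection produces no rows in this sketch
    if isinstance(collection, dict):
        # Map: one (key, value) row per entry
        for key, value in collection.items():
            yield (key, value)
    else:
        # Array: one single-column row per element
        for element in collection:
            yield (element,)


def lateral_view_explode(rows, column, alias):
    """Hypothetical helper: join each row with every element of its array column."""
    for row in rows:
        rest = {k: v for k, v in row.items() if k != column}
        for (element,) in explode(row[column]):
            yield {**rest, alias: element}


# Hypothetical input rows with an array column "tags"
pages = [
    {"user": "alice", "tags": ["hive", "pig"]},
    {"user": "bob", "tags": []},
]
print(list(lateral_view_explode(pages, "tags", "tag")))
# prints [{'user': 'alice', 'tag': 'hive'}, {'user': 'alice', 'tag': 'pig'}]
```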
