Hadoop Interview Questions and Answers Set 10

91. List some of the best tools that can be useful for data analysis.

  • Tableau
  • RapidMiner
  • OpenRefine
  • KNIME
  • Google Search Operators
  • Solver
  • NodeXL
  • import.io
  • Wolfram Alpha
  • Google Fusion Tables

92. List some common problems faced by data analysts.

Some of the common problems faced by data analysts are:

  • Common misspellings
  • Duplicate entries
  • Missing values
  • Illegal values
  • Varying value representations
  • Identifying overlapping data
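
As a rough illustration of how several of these problems might be handled in practice, here is a minimal plain-Python sketch. The record layout, the `clean_records` helper, the alias table, and the 0 to 120 age range are all assumptions made for this example, not part of any particular tool.

```python
def clean_records(records, valid_countries, aliases):
    """Return (cleaned, rejected) after normalizing and de-duplicating records."""
    seen = set()
    cleaned, rejected = [], []
    for rec in records:
        name = (rec.get("name") or "").strip().title()
        raw_country = (rec.get("country") or "").strip().lower()
        # Misspellings and varying representations: map to one canonical value.
        country = aliases.get(raw_country, raw_country.upper())
        age = rec.get("age")
        # Missing values: reject rows that lack a required field.
        if not name or age is None:
            rejected.append((rec, "missing value"))
            continue
        # Illegal values: implausible age or unknown country.
        if not (0 <= age <= 120) or country not in valid_countries:
            rejected.append((rec, "illegal value"))
            continue
        # Duplicate entries: keep only the first occurrence.
        key = (name, country, age)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "country": country, "age": age})
    return cleaned, rejected


# Hypothetical sample data and alias table.
ALIASES = {"us": "US", "usa": "US", "united states": "US", "untied states": "US"}
records = [
    {"name": "alice ", "country": "usa", "age": 30},
    {"name": "Alice", "country": "United States", "age": 30},  # duplicate once normalized
    {"name": "Bob", "country": "US", "age": None},             # missing value
    {"name": "Carol", "country": "US", "age": 250},            # illegal value
    {"name": "Dan", "country": "Untied States", "age": 41},    # misspelling
]
cleaned, rejected = clean_records(records, {"US"}, ALIASES)
```

Here only the first Alice row and the Dan row survive; the Bob and Carol rows are returned in `rejected` together with the reason.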

93. What are the tools used in Big Data?

  • Hadoop
  • Hive
  • Pig
  • Flume
  • Mahout
  • Sqoop

94. Which language is more suitable for text analytics? R or Python?

Python is generally considered more suitable for text analytics. It has a rich library called Pandas, which gives analysts high-level data analysis tools and easy-to-use data structures, whereas R's strengths lie more in statistical modelling.

95. What is logistic regression?

Logistic regression is a statistical model used to analyze a dataset and predict a binary outcome, that is, an outcome with only two possible values, such as 0 or 1, or yes or no. The model estimates the probability of the positive outcome from one or more independent variables.
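
To make the idea concrete, here is a minimal sketch of a one-feature logistic regression fitted by batch gradient descent, using only the Python standard library. The toy data (hours studied versus pass/fail), the learning rate, and the epoch count are assumptions chosen for illustration; in practice a library implementation would normally be used.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    """Turn the predicted probability into a binary outcome (0 or 1)."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Toy data: hours studied -> failed (0) or passed (1).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(hours, passed)
```

After fitting, `predict(w, b, 1.0)` gives 0 and `predict(w, b, 4.0)` gives 1, reflecting the binary outcome described above.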

96. Can you list a few commonly used Hive services?

  • Command Line Interface (cli)
  • Hive Web Interface (hwi)
  • HiveServer (hiveserver)
  • RCFile cat tool (rcfilecat), which prints the contents of an RC file
  • Jar (jar)
  • Metastore

97. What is indexing and why do we need it?

Indexing is one of Hive's query optimization methods. A Hive index speeds up access to a column or set of columns in a Hive table: with an index, Hive does not need to read every row in the table to find the selected data.
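
The benefit of an index can be illustrated with a small conceptual sketch in Python. This is not how Hive stores or uses indexes internally; the sample rows and the helper names (`full_scan`, `build_index`, `indexed_lookup`) are hypothetical and only contrast a full scan with an index lookup.

```python
from collections import defaultdict

# Hypothetical table contents.
rows = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Delhi", "amount": 75},
    {"id": 3, "city": "Pune", "amount": 40},
    {"id": 4, "city": "Mumbai", "amount": 310},
]


def full_scan(rows, column, value):
    """Without an index: every row has to be inspected."""
    return [r for r in rows if r[column] == value]


def build_index(rows, column):
    """Map each distinct column value to the positions of the rows holding it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[column]].append(pos)
    return index


def indexed_lookup(rows, index, value):
    """With an index: go straight to the matching row positions."""
    return [rows[pos] for pos in index.get(value, [])]


city_index = build_index(rows, "city")
pune_rows = indexed_lookup(rows, city_index, "Pune")
```

Both approaches return the same rows; the indexed lookup simply avoids touching rows whose `city` value does not match, at the cost of building and maintaining the index.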

98. What are the components of the Hive query processor?

The components of a Hive query processor include

  • Logical Plan Generation
  • Physical Plan Generation
  • Execution Engine
  • UDFs and UDAFs
  • Semantic Analyzer
  • Type Checking

99. If you run a select * query in Hive, why does it not run MapReduce?

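The hive.fetch.task.conversion property lets Hive avoid the latency of launching a MapReduce job for simple queries. When it applies, a query that involves only SELECT, FILTER (a WHERE clause), LIMIT, and similar operations is executed as a single fetch task that reads the data directly, skipping MapReduce altogether.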
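
The property accepts the values none, minimal, and more. The following Python function is a deliberately simplified model of how those modes decide whether a query can run as a fetch task; it is a sketch for illustration, not Hive code. The function name and its boolean parameters are hypothetical, and other conditions that Hive checks, such as subqueries, lateral views, and input-size thresholds, are left out.

```python
def runs_as_fetch_task(mode, *, select_star, has_filter,
                       filter_on_partition_columns_only=False,
                       has_join_or_aggregation=False):
    """Simplified model of hive.fetch.task.conversion (not Hive's actual logic)."""
    if mode not in {"none", "minimal", "more"}:
        raise ValueError(f"unsupported mode: {mode!r}")
    # Joins and aggregations always need a MapReduce (or other engine) job.
    if mode == "none" or has_join_or_aggregation:
        return False
    if mode == "minimal":
        # SELECT *, filters on partition columns only, LIMIT.
        return select_star and (not has_filter or filter_on_partition_columns_only)
    # "more": any simple SELECT, FILTER, LIMIT query.
    return True


# SELECT * FROM t LIMIT 10
plain_select_star = runs_as_fetch_task("minimal", select_star=True, has_filter=False)
# SELECT col FROM t WHERE col > 5
filtered_projection = runs_as_fetch_task("more", select_star=False, has_filter=True)
```

Under this model a plain `SELECT * ... LIMIT` qualifies even in minimal mode, while a projection with an arbitrary filter needs more mode, and nothing qualifies when the property is set to none.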

100. What is the use of explode in Hive?

explode in Hive converts complex data types into a tabular form. It is a UDTF (user-defined table-generating function) that takes an array and emits each of its elements as a separate row; given a map, it emits one row per key-value pair.
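
The row-multiplying behaviour can be mimicked in a few lines of Python. This is only a sketch of the semantics for arrays, as seen when explode is combined with LATERAL VIEW; the `orders` data and the helper names are hypothetical.

```python
def explode(array):
    """Yield one value per array element, mirroring explode() on an ARRAY."""
    for element in array or []:
        yield element


def lateral_view_explode(rows, array_column, alias):
    """Pair each row's other columns with every element of its array column."""
    out = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != array_column}
        for element in explode(row[array_column]):
            out.append({**base, alias: element})
    return out


# Hypothetical table with an ARRAY<STRING> column named "items".
orders = [
    {"order_id": 1, "items": ["pen", "ink"]},
    {"order_id": 2, "items": ["paper"]},
    {"order_id": 3, "items": []},
]
exploded = lateral_view_explode(orders, "items", "item")
```

Order 1 yields two rows and order 2 yields one. Order 3 yields none because its array is empty, which matches a plain LATERAL VIEW in Hive; LATERAL VIEW OUTER would keep such a row with a NULL item.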