Job Description
Primary keywords: Hive SQL, Spark, and Sqoop; the rest are optional.
- Experience working with the Big Data ecosystem, including tools such as Hadoop, MapReduce, YARN, Hive, Pig, Impala, Spark, Kafka, and Storm, to name a few (particularly Hive, Pig, and Spark)
- Strong data analysis and SQL skills in relational/columnar/Big Data environments (joins, unions, ranking, GROUP BY, ORDER BY, etc.); Hive SQL experience preferred
- Experience with NoSQL databases such as HBase, Apache Cassandra, Vertica, or MongoDB
- Should have a good data warehousing background, with experience in the design, build, and support of data warehouses/marts. Hands-on experience with data integration, data quality, migration, and report testing using SQL. Must have a clear grasp of data warehousing concepts
- Must come from a DEVELOPMENT background only
- Ability to build and test MapReduce code in a rapid, iterative manner
- Ability to articulate reasons behind the design choices being made
- Demonstrate excellent written and verbal communication skills
- Able to work independently in a fast-paced environment
- Experience with Agile implementation methodology and working in a globally distributed team structure
- Experience with Hadoop distributions (Apache / Cloudera / Hortonworks); a basic understanding of the Hortonworks distribution is an added advantage
- Ability to deploy and maintain a multi-node Hadoop cluster
- Knowledgeable in techniques for designing Hadoop-based file layouts optimized to meet business needs
- Understands the tradeoffs between different approaches to Hadoop file design
- Experience with techniques of performance optimization for both data loading and data retrieval
- Experience in UNIX Shell scripting and Python is an added advantage
- Able to translate business requirements into logical and physical file structure design
- Basic understanding of scheduling process
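The MapReduce skills listed above can be illustrated with a minimal, self-contained sketch of the pattern: a word count in plain Python, with no Hadoop cluster required. All function names here are illustrative only and are not part of any Hadoop API.

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    """Map phase: emit a (word, 1) pair for each word in a line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle phase: group values by key, as Hadoop does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Reduce phase: sum the counts collected for a single word."""
    return key, sum(values)

def word_count(lines):
    """Chain the three phases together over an iterable of text lines."""
    mapped = chain.from_iterable(mapper(line) for line in lines)
    return dict(reducer(k, v) for k, v in shuffle(mapped).items())

lines = ["big data and big clusters", "data pipelines and data marts"]
print(word_count(lines))
# -> {'big': 2, 'data': 3, 'and': 2, 'clusters': 1, 'pipelines': 1, 'marts': 1}
```

In a real Hadoop job the mapper and reducer would run distributed across the cluster and the framework would handle the shuffle; this sketch only shows the shape of the code a candidate would be expected to build and iterate on quickly.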