Job Location: Chennai
Experience: 2+ Years
Primary Skill: Hadoop
Responsibilities:
- Design, build, test and maintain scalable and stable off-the-shelf applications to support distributed processing using the Hadoop ecosystem
- Implement ETL and data processes for structured and unstructured data
- Build pipelines for optimal extraction of data from a wide variety of data sources, covering ingestion, transformation, conversion and validation
- Conduct root cause analysis and advanced performance tuning for complex business processes and functionality
- Review frameworks and design principles for suitability in the project context
- Client orientation:
  - Propose the right solutions to the client by identifying and understanding critical pain points
  - Contribute to the entire implementation process, including driving the definition of improvements based on business needs and architectural improvements
  - Propose, pitch, sell, implement and prove success in continuous improvement initiatives
  - Work and collaborate with multiple teams and stakeholders
- Agile orientation:
  - Take part in Agile ceremonies to groom stories and develop defect-free code for them
  - Review code for quality and implementation best practices
  - Promote coding, testing and deployment best practices through hands-on research and demonstration
  - Write testable code that enables extremely high levels of code coverage
  - Mentor junior engineers and guide them to become great engineers
Desired Skills/Experience:
- Preferably 4 to 7 years of experience
- Highly skilled in:
  - PySpark and Spark
  - PySpark SQL and DataFrame APIs (see the sketch after this list)
  - Interpreting the Spark execution DAG as displayed in the ApplicationMaster
  - Writing optimal PySpark code, plus deep knowledge of Spark parameter tuning for execution optimization
  - Python (2 and 3), including knowledge of libraries like NumPy, Pandas, etc.
  - Writing Sqoop scripts for ETL from Teradata
  - SQL and analytical thinking
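As an illustration of the PySpark SQL and DataFrame skills above, here is a minimal sketch; the dataset path, column names and configuration values are hypothetical, and the tuning settings are examples of the kind of parameters one would tweak, not recommendations.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical session: the shuffle-partition and memory settings are
# illustrative examples of Spark parameter tuning, not recommended values.
spark = (
    SparkSession.builder
    .appName("orders-etl-sketch")
    .config("spark.sql.shuffle.partitions", "200")
    .config("spark.executor.memory", "4g")
    .getOrCreate()
)

# Hypothetical input: an orders dataset stored as Parquet.
orders = spark.read.parquet("/data/orders")

# DataFrame API: revenue per customer, highest first.
revenue = (
    orders
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total_amount"))
    .orderBy(F.desc("total_amount"))
)

# Equivalent Spark SQL over a temporary view.
orders.createOrReplaceTempView("orders")
revenue_sql = spark.sql(
    "SELECT customer_id, SUM(amount) AS total_amount "
    "FROM orders GROUP BY customer_id ORDER BY total_amount DESC"
)

revenue.show(10)
```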
- Strong understanding of:
  - Hadoop and Spark architectures and the MapReduce framework
  - Big data stores like HDFS, HBase, Cassandra
  - Data formats like Avro, Parquet, ORC, etc.
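To make the data-format point concrete, here is a minimal, self-contained sketch of writing and reading two of the columnar formats named above; the paths and sample rows are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-sketch").getOrCreate()

# Hypothetical DataFrame built inline so the sketch is self-contained.
df = spark.createDataFrame(
    [("c1", 120.0), ("c2", 75.5)],
    ["customer_id", "total_amount"],
)

# Write the same data in two of the columnar formats named above.
df.write.mode("overwrite").parquet("/tmp/revenue_parquet")
df.write.mode("overwrite").orc("/tmp/revenue_orc")

# Reading back preserves the schema in both formats.
# (Avro requires the external spark-avro package, so it is omitted here.)
spark.read.parquet("/tmp/revenue_parquet").printSchema()
spark.read.orc("/tmp/revenue_orc").printSchema()
```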
- Exposure to at least one big data platform like Hortonworks, Cloudera, HDP, AWS EMR, MapR, etc.
- Prior experience with:
  - Using monitoring and administration tools like Ambari, Ganglia, etc.
  - Scheduling big data applications using Oozie (including workflow and coordinator properties)
- Good OO skills, including good design patterns knowledge
- Good understanding of technologies like Hive, Pig, Presto, Impala, etc.
- Prior experience in building Spark infrastructure (cluster setup, administration, performance tuning), on-premise (bare metal) and/or cloud-based
- Knowledge of software best practices, like Test-Driven Development (TDD) and Continuous Integration (CI)
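As a hedged illustration of the TDD and testable-code expectations above, here is a minimal pytest-style unit test for a small PySpark transformation; the function, fixture and sample data are hypothetical.

```python
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_total_with_tax(df, rate):
    # Transformation under test: add a tax-inclusive total column.
    return df.withColumn("total_with_tax", F.col("amount") * (1 + rate))


@pytest.fixture(scope="module")
def spark():
    # Local session so the test runs without a cluster.
    session = (
        SparkSession.builder
        .master("local[2]")
        .appName("tdd-sketch")
        .getOrCreate()
    )
    yield session
    session.stop()


def test_add_total_with_tax(spark):
    df = spark.createDataFrame([("o1", 100.0)], ["order_id", "amount"])
    result = add_total_with_tax(df, rate=0.1)
    assert result.first()["total_with_tax"] == pytest.approx(110.0)
```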