- Creating complex data processing pipelines as part of diverse, high-energy teams
- Designing scalable implementations of the models developed by our Data Scientists
- Hands-on programming based on TDD, usually in a pair programming environment
- Deploying data pipelines in production based on Continuous Delivery practices
- Advising clients on the usage of different distributed storage and computing technologies from the plethora of options available in the ecosystem
What do we look for in you?
- Strong development experience is a must, along with a consistent track record in education and professional career.
- Experience with Apache Spark (required)
- Experience with Hadoop administration and development (required)
- Experience with Storm, Kafka, NiFi, Spark Streaming, Spark MLlib, Spark GraphX, Flink, Samza, or MapReduce is good to have
- Familiarity with data loading tools like Flume and Sqoop.
- Knowledge of workflow schedulers like Oozie.
- Proven understanding of Hadoop, HBase, Hive, and Pig.
- Good understanding of object-oriented design and design patterns
- Has done development or debugging on Linux/Unix platforms.
- Motivation to learn innovative approaches to programming, debugging, and deploying
- Self-starter with excellent self-study skills and growth aspirations
- Excellent written and verbal communication skills, a flexible attitude, and the ability to perform under pressure.
- Test-driven development, a commitment to quality, and a thorough approach to the work.
- A good team player with the ability to meet tight deadlines in a fast-paced environment
- Suitable qualifications and industry certifications
Skills we’re looking for
- 4+ years of Big Data ecosystem experience, including administration, development, cloud, and application integration
- 3+ years of consulting experience
- 3+ years of experience on enterprise projects: customer centricity, optimization, predictive engines, enterprise data hub
- Experience in Big Data application development involving various data processing techniques: data ingestion, in-stream data processing, and batch analytics
- Excellent knowledge of, and experience with, the Hadoop stack (Hadoop, Spark, Spark Streaming, H2O.ai, HBase, Sqoop, Flume, Shark, Oozie, etc.).
- Solid exposure to Core Java and distributed computing
- Good understanding of NoSQL platforms like HBase, Couchbase, Vertica, MongoDB, and Cassandra
- Proficient in SQL queries and stored procedures.
- Proficient in SQL, NoSQL, relational database design, and methods for efficiently retrieving data.
- Prior experience with Hadoop, HBase, Hive, Pig, and MapReduce.