Job Location: Chennai
Experience: 2+ Years
Primary Skill: Hadoop
In this role you will be responsible for:
- Designing architecture and extending the processes and components at the core of BigTapp’s systems, using today’s best practices and modern technologies
- Building, optimizing, and maintaining a high-performance data pipeline that ingests consumer data from multiple sources (see the sketch after this list)
- Ensuring the correctness of data flowing through the data pipeline
- Productionizing results from Data Science
- Performing code reviews and maintaining appropriate documentation
- Building and deploying containerized applications with Docker, and understanding how infrastructure as code is built with Terraform or Kubernetes
- Taking an abstract idea and producing a design from concept to production fairly independently
- Building the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies
- Ensuring code has tests that catch gross errors prior to deployment
- Staying current with the state of the art in data and data pipelines
- Collaborating with Data Science on implementing and testing designs
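
As a concrete illustration of the pipeline responsibilities above, here is a minimal sketch of a Flink ingestion job in Java 8, the role’s primary stack. The broker address, topic name, and class name are hypothetical placeholders, not BigTapp’s actual configuration.

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ConsumerIngestJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka source configuration; broker and group id are illustrative.
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "consumer-ingest");

        // "consumer-events" is a hypothetical topic carrying raw consumer data.
        DataStream<String> raw = env.addSource(
                new FlinkKafkaConsumer<>("consumer-events", new SimpleStringSchema(), props));

        // Basic correctness gate: drop null/empty records before downstream processing.
        raw.filter(record -> record != null && !record.isEmpty())
           .print();

        env.execute("consumer-ingest-job");
    }
}
```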
Communication is critical for this role:
- Maintain communication with the Data Team and the rest of the company
- Participate in status checks and technical discussions about data
- Assist with presentations to BigTapp teams and leaders on release updates, demos, progress, challenges, etc.
Your Background:
- Proficient in Java 8
- Experience with data platform frameworks such as Flink, Spark, Beam, or Google Dataflow (we use Flink)
- Experience with NoSQL stores such as MongoDB, Elasticsearch, or Cassandra (we use MongoDB and Cassandra)
- Debugging expertise
- Four-year Bachelor’s degree in Computer Science or Engineering
- Expertise in Golang, Python, Java, or C++
- Experience building and optimizing ‘big data’ pipelines, architectures, and data sets
- Strong analytical skills for working with unstructured datasets
- Experience using big data tools: Hadoop, Spark, Kafka, etc. (a minimal Kafka producer sketch follows this list)
- Preferred: experience with AWS cloud services (EC2, EMR, RDS, Redshift) and stream-processing systems (Storm, Spark Streaming, etc.)
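
Since Kafka appears in both the required and nice-to-have lists, the following is a minimal sketch of publishing a record with the Kafka Java client; the broker address, topic, and payload are hypothetical.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // try-with-resources closes the producer and flushes pending records.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Topic, key, and payload are placeholders for illustration only.
            producer.send(new ProducerRecord<>("consumer-events", "user-123",
                    "{\"event\":\"signup\"}"));
        }
    }
}
```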
It will be awesome if you have experience with some or all of the following:
- Kafka
- Python, R, JavaScript
- Queuing (ActiveMQ, SQS, Redis, etc.; see the SQS sketch after this list)
- AWS cloud (EC2, ECS, VPC, IAM, CloudFormation)
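
For the queuing item above, here is a minimal sketch of sending a message to SQS with the AWS SDK for Java v1; the queue URL and payload are hypothetical placeholders.

```java
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;

public class QueuePublisher {
    public static void main(String[] args) {
        // Credentials and region come from the SDK's default provider chain.
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();

        // Hypothetical queue URL; replace with a real queue in practice.
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/consumer-events";
        sqs.sendMessage(queueUrl, "{\"event\":\"signup\",\"userId\":\"user-123\"}");
    }
}
```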