GSS Infotech Hiring Big Data Hadoop Developer

Job Location: Chennai

Experience: 2+ Years

Primary Skill: Knowledge of Hadoop

Job Duties:

  • Responsible for designing and implementing Spark and Hadoop (Hive, Hive UDFs & Sqoop) modules.
  • Contribute to the architecture of the Spark engine for fast in-memory data processing and for persisting data to Hive tables on HDFS.
  • Contribute to data ingestion framework solutions that move data from Oracle and other RDBMS sources into the Hadoop environment.
  • Work on XLS-based rules that dynamically create the pipe-delimited files and tags used to process data from source to destination.
  • Use Spark SQL to process large volumes of structured data.
  • Work on Hive partitioning and bucketing, and perform different types of joins on Hive tables (see the Hive table sketch after this list).
  • Develop custom user-defined functions (UDFs) in Hive to transform large volumes of data according to business requirements (a UDF sketch follows this list).
  • Develop Spark code to replace the mainframe process for generating system-generated key files, a prerequisite for processing the data.
  • Implement multiple use cases with Spark Core, Spark SQL, and Spark Streaming.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Work on loading and transforming large sets of structured data.
  • Analyze large data sets to determine the optimal way to aggregate and report on them.
  • Use the Spark API over Hadoop YARN to perform analytics on data stored in Hive.
  • Work on a fast data computation platform (a customized platform built with Apache Spark, Kafka, SQL, Java, and Scala libraries) to design and develop analytic solutions, including writing software for parallel/distributed computation.
  • Explore Spark to improve performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, Spark Streaming, and YARN, working with CSV, text, Avro, and Parquet file formats.
  • Deploy Spark jobs and monitor their performance by analyzing the Spark DAG in the Spark UI.
  • Work on tuning Spark jobs by broadcasting large variables, setting data locality, caching judiciously, applying appropriate storage levels, using the right level of parallelism, applying Kryo serialization, and setting execution and storage memory (see the tuning sketch after this list).
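
The following is a minimal sketch of the Hive partitioning, bucketing, and join work described above, run through Spark SQL in Scala. The table and column names (sales_fact, customer_dim, customer_id, load_date) are hypothetical placeholders, and a configured Hive metastore is assumed.

    import org.apache.spark.sql.SparkSession

    object HivePartitioningSketch {
      def main(args: Array[String]): Unit = {
        // enableHiveSupport lets Spark SQL read and write tables in the Hive metastore.
        val spark = SparkSession.builder()
          .appName("hive-partitioning-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Partition by load date and bucket by customer_id (names are hypothetical).
        spark.sql(
          """CREATE TABLE IF NOT EXISTS sales_fact (
            |  customer_id BIGINT,
            |  amount      DOUBLE
            |)
            |PARTITIONED BY (load_date STRING)
            |CLUSTERED BY (customer_id) INTO 32 BUCKETS
            |STORED AS PARQUET""".stripMargin)

        // A typical join against a dimension table, pruning on the partition column.
        val joined = spark.sql(
          """SELECT d.customer_name, SUM(f.amount) AS total_amount
            |FROM sales_fact f
            |JOIN customer_dim d ON f.customer_id = d.customer_id
            |WHERE f.load_date = '2023-01-01'
            |GROUP BY d.customer_name""".stripMargin)

        joined.show()
        spark.stop()
      }
    }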
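
A sketch of a custom Hive UDF written in Scala, assuming the simple org.apache.hadoop.hive.ql.exec.UDF API (newer Hive versions favor GenericUDF). The package, class name, and masking rule are illustrative only.

    package com.example.udfs

    import org.apache.hadoop.hive.ql.exec.UDF
    import org.apache.hadoop.io.Text

    // Hive finds the evaluate method by reflection once the class is packaged
    // into a JAR and registered with CREATE FUNCTION.
    class MaskAccountNumber extends UDF {
      def evaluate(input: Text): Text = {
        if (input == null) return null
        val value = input.toString
        // Keep only the last four characters visible (illustrative business rule).
        val masked =
          if (value.length <= 4) value
          else "*" * (value.length - 4) + value.takeRight(4)
        new Text(masked)
      }
    }

Once built into a JAR, the function could be registered in Hive roughly as follows (the JAR path, function name, and table are hypothetical):

    ADD JAR /path/to/udfs.jar;
    CREATE TEMPORARY FUNCTION mask_account AS 'com.example.udfs.MaskAccountNumber';
    SELECT mask_account(account_no) FROM accounts;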
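
A sketch of the Spark tuning points mentioned above: Kryo serialization, a broadcast join, an explicit storage level, and memory/parallelism settings. The configuration values and table names are assumptions for illustration, not recommendations for a specific cluster.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast
    import org.apache.spark.storage.StorageLevel

    object SparkTuningSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("spark-tuning-sketch")
          // Kryo is typically faster and more compact than Java serialization.
          .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          // Share of heap used for execution + storage, and the storage portion of it.
          .config("spark.memory.fraction", "0.6")
          .config("spark.memory.storageFraction", "0.5")
          // Level of parallelism for SQL shuffles (joins, aggregations).
          .config("spark.sql.shuffle.partitions", "200")
          .enableHiveSupport()
          .getOrCreate()

        val fact = spark.table("sales_fact")    // large Hive table (hypothetical)
        val dim  = spark.table("customer_dim")  // small dimension table (hypothetical)

        // Broadcast the small side so the large table is not shuffled for the join.
        val joined = fact.join(broadcast(dim), "customer_id")

        // Cache judiciously with an explicit storage level when the result is reused.
        joined.persist(StorageLevel.MEMORY_AND_DISK_SER)
        joined.count()

        spark.stop()
      }
    }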