Capgemini Hiring Python Data Scientist

Job Description

  • Working experience implementing various AI-powered models with NLP and deep learning algorithms
  • Experience in unstructured text data analysis and in language, speech, image and video data analysis across multiple industries, e.g. manufacturing, retail, etc.
  • Develop connections from Spark Streaming to Kafka/Flume using Python
  • Examine streaming performance and provide optimal and precise development using Python/PySpark, including connecting to data (structured or unstructured), extraction, and cleaning
  • Develop classification or clustering models using MLlib or Anaconda
  • Gather, evaluate and document business requirements related to analytics, translate them into an analytics solution definition, and implement the solution using Python or Scala
  • Extract data from raw files using Python/Anaconda or built-in tools for POC
  • Pull or create data from different sources such as HBase, Hive, Impala or MongoDB
  • Responsible for analyzing data from multiple data sources (DBs, flat files, etc.) and building predictive models using Python
  • Linux shell scripting with Python and cron jobs to schedule runs (batch or real time)
  • Scala knowledge is preferable in some cases
  • Develop different models using Python, Pandas and MLlib/PySpark, compare their performance in real-time and batch settings, and choose the better solution for each case
  • Validate the models statistically as well as from a business perspective in discussions with business stakeholders
  • Ability to support and guide model deployment and model lifecycle management
  • Create model documentation as per client/regulatory standards
  • Degree in a quantitative field (Math, Statistics, Economics, Computer Science, and/or Engineering) or MBA
  • Experienced and skilled in Python, incl. PySpark, Spark, MLlib
  • Hands-on experience in analytical techniques including sampling, clustering, decision trees, forecasting, SVM, Random Forest and linear/logistic regression
  • Hands-on experience in Python, PySpark, MLlib, Spark/Mesos
  • Hands-on experience using Hive, HBase, Impala
  • Knowledge of Kafka and Flume is a plus
  • Data exploration using OpenCV, NumPy, Matplotlib, SciPy and Pandas for image analysis
  • Good knowledge of Python-based libraries, e.g. Keras, TensorFlow
  • Knowledge of Scala will be beneficial
  • Working with AWS, Cloudera/Hortonworks
  • Knowledge of where analytics fits into an end-to-end business solution
  • Ability to work with business and technology teams to build and deploy an analytical solution as per client needs
  • Ability to multitask, solve problems and think strategically
  • Strong communication and collaboration skills
  • Working experience in other data science technologies, e.g. R, SAS, SPSS, Matlab, is also preferred