Capgemini Hiring Python Data Scientist

Job Description

Working experience implementing AI-powered models with NLP and deep learning algorithms
Experience in unstructured text data analysis and in language, speech, image, and video data analysis across multiple industries (e.g. manufacturing, retail)
Develop connections from Spark Streaming to Kafka/Flume using Python
Examine streaming performance and provide optimal and precise development using Python/PySpark, including connecting to data (structured or unstructured), extraction, and cleaning
Develop models (classification or clustering) using MLlib or Anaconda
Gather, evaluate, and document business requirements related to analytics, translate them into an analytics solution definition, and implement the solution using Python or Scala
Extract data from raw files using Python (Anaconda or built-in) for POCs
Pull or create data from different sources such as HBase, Hive, Impala, or MongoDB
Responsible for analyzing data from multiple data sources (DBs, flat files, etc.) and building predictive models using Python
Linux shell scripting with Python and cron jobs to schedule runs (batch or real-time)
Scala knowledge is preferable in some cases
Develop different models using Python, Pandas, MLlib, and PySpark, compare their performance in real-time and batch settings, and choose the better solution for each case
Validate the models statistically as well as from a business perspective, in discussions with business stakeholders
Ability to support and guide model deployment and model lifecycle management
Create model documentation as per client regulatory standards
Degree in a quantitative field (Math, Statistics, Economics, Computer Science, and/or Engineering), MBA
Experienced and skilled in Python, incl. PySpark and Spark MLlib
Hands-on experience in analytical techniques including sampling, clustering, decision trees, forecasting, SVM, Random Forest, and linear/logistic regression
Hands-on experience in Python, PySpark, MLlib, Spark, and Mesos
Hands-on experience using Hive, HBase, and Impala
Knowledge of Kafka and Flume is a plus
Data exploration using OpenCV, NumPy, Matplotlib, SciPy, and Pandas for image analysis
Good knowledge of Python-based libraries (e.g. Keras, TensorFlow); knowledge of Scala will be beneficial
Experience working with AWS, Cloudera, and Hortonworks
Knowledge of where analytics fits into an end-to-end business solution
Ability to work with business and technology teams to build and deploy an analytical solution as per client needs
Ability to multitask, solve problems, and think strategically
Strong communication and collaboration skills
Working experience in other data science technologies (e.g. R, SAS, SPSS, MATLAB) is also preferred