Overview
Position: Data Scientist (Cheminfo)
Location: Pune
ACS International India Pvt Ltd (ACS-I India) is looking for a Data Scientist who will apply analytical, statistical, and programming skills to collect, process, and analyze large data sets in the chemistry, materials science, or life sciences domains, such as chemical structures, material or chemical properties, and drug activity measures. You will collaborate with stakeholders across CAS to develop innovative data-driven solutions, using consultative problem-solving to address complex challenges.
About ACS-I India
ACS International India Pvt Ltd. (ACS-I India) is a wholly owned subsidiary of ACS International Ltd, USA, and a part of the American Chemical Society. ACS-I India represents the products and services provided by ACS divisions, including Chemical Abstracts Service (CAS), to the world’s most important scientific companies, government organizations, global patent offices, and academic institutions to promote research and discovery.
About CAS
Chemical Abstracts Service (CAS) is a division of the American Chemical Society and a source of chemical information. It provides products, services, and solutions for researchers, as well as support and training. CAS has provided the most comprehensive repository of research in chemistry and related sciences for over 100 years. CAS finds, collects, and organizes all publicly disclosed substance information and creates the world's most valuable chemistry databases. Scientists and patent professionals across the world rely on these databases.
Job Responsibilities
- Communicate efficiently with other scientists on the project, and actively and creatively develop solutions that support the overall project goals.
- Combine strong software development skills with a working knowledge of basic chemistry/physics/biology to develop sophisticated informatics solutions that drive efficiencies in data-based insights development.
- Build predictive models using machine learning algorithms and frameworks such as TensorFlow, PyTorch, and scikit-learn.
- Apply NLP, machine learning, and deep learning in various domains.
- Present information and insights using data visualization tools such as Matplotlib and Plotly.
- Conduct self-directed research within the broader goals set by the group.
- Manage multiple projects at any given time along with tracking project milestones.
- Teach and train team members in all of the above areas as and when required.
Qualifications
- PhD in Computer Science, Cheminformatics, Bioinformatics, Computational Biology, Medicinal Chemistry, Applied Statistics, or a related field.
- 3+ years of post-degree experience working with large data sets or in software development.
- Experience building applications for public cloud environments (AWS preferred).
- Proficiency in programming languages such as Java/Scala/JavaScript/TypeScript/Python.
- Proficiency in Linux/Unix environments.
- Experience with database technologies (relational, NoSQL, property graph, RDF/triple store).
- Self-motivated and proactive, with excellent communication skills.
- Experience with cheminformatics toolkits (e.g., OpenEye, CDK, RDKit) is a plus.
- Experience with big data technology stack (Hadoop, Spark, HDFS, EMR, Glue).
- Experience with AWS DevOps tools (CodeCommit, Cloud Development Kit, CDK Pipeline).
- Experience with Databricks, SageMaker, DataRobot, MLflow, or other ML and MLOps tools.
- Experience building applications using AWS Serverless technologies such as Lambda, SQS, Fargate, DynamoDB, S3.
- Experience with Generative AI, LLMs, and prompt engineering.
- Experience with Neo4j and graph analytics.
- Experience working with teams on other continents (e.g., North America, Europe).