ACS-I India is looking for a Data Scientist who will apply analytical, statistical, and programming skills to collect, process, and analyze large data sets in the chemistry, materials science, and life sciences domains, such as chemical structures, material or chemical properties, and drug activity measures. You will work with other scientists and stakeholders to understand business problems, develop data-driven solutions, and communicate insights using data visualization techniques. You will also help formulate strategies for data extraction, aggregation, automated data mining, and model development.
About ACS-I India:
ACS International India Pvt Ltd. (ACS-I India) is a wholly owned subsidiary of ACS International Ltd, USA, and part of the American Chemical Society. ACS-I India represents products and services provided by ACS divisions, including CAS, to the world's most important scientific companies, government organizations, global patent offices, and academic institutions to promote research and discovery.
About CAS:
CAS is a division of the American Chemical Society and a source of chemical information. CAS provides products, services, and solutions for researchers and professional searchers, along with support and training. For over 100 years, CAS has maintained the most comprehensive repository of research in chemistry and related sciences. CAS finds, collects, and organizes all publicly disclosed substance information and creates the world's most valuable chemistry databases. Scientists and patent professionals around the world rely on these databases.
Job Responsibilities:
- Communicate efficiently with other scientists on the project, and actively and creatively develop solutions that support the overall project goals.
- Combine strong software development skills with a working knowledge of basic chemistry/physics/biology to develop sophisticated informatics solutions that drive efficiencies in data-based insights development.
- Gather data from various internal and external sources, such as public and proprietary databases and the scientific literature.
- Process, extract, clean, transform, and integrate data using appropriate tools and methods.
- Build predictive models using machine learning algorithms and frameworks such as TensorFlow, PyTorch, and scikit-learn.
- Present information and insights using data visualization tools such as matplotlib, seaborn, plotly, Power BI, and Tableau.
- Conduct self-directed research within the broader goals set by the group.
- Manage multiple projects at once while tracking project milestones.
- Teach and train team members in all of the above areas as required.
Ideal Candidate will have:
- Experience with data engineering tools and techniques.
- Experience with big data technology stack (Hadoop, Spark, HDFS, EMR, Glue).
- Experience with AWS DevOps tools (CodeCommit, Cloud Development Kit, CDK Pipeline).
- Experience with Databricks/SageMaker/DataRobot, MLFlow or other ML and MLOps tools.
- Experience with cheminformatics toolkits (e.g., OpenEye, CDK, RDKit) is a plus.
- Experience building applications using AWS serverless technologies such as Lambda, SQS, Fargate, DynamoDB, and S3.
- Experience with scientific databases such as MEDLINE, NCBI, PubChem, EMBL, and SciFinder.
- Experience working with teams on other continents (e.g., North America, Europe).
Job Requirements:
- PhD in Computer Science, Cheminformatics, Bioinformatics, Computational Biology, Medicinal Chemistry, Applied Statistics, or a related field.
- 3-5 years of post-degree experience working with large data sets or in software development.
- Experience building applications for public cloud environments (AWS preferred).
- Proficiency in programming languages such as Java, Scala, JavaScript, TypeScript, or Python.
- Proficiency in Linux/Unix environments.
- Experience with database technologies (relational, NoSQL, property graph, RDF/triple store).
- Self-motivated and proactive, with excellent communication skills.
If interested, please send your cover letter and CV to [Confidential Information].