Role: NLP Data Scientist AI Real World Data (RWD)
Location: India, Remote (Work from Anywhere in India)
Minimum Qualification: Master's or Ph.D. degree in Computer Science, Data Science
Indicative Experience: 2-7 Years
Salary: Best in the Market
Domain: Preferably Life Sciences/Pharma.
Other benefits: Health Insurance, Provident Fund, Life Insurance, Reimbursement of Certification Expenses, Gratuity, 24x7 Health Desk
About Norstella
At Norstella, our mission is simple: to help our clients bring life-saving therapies to market quicker and help patients in need.
Founded in 2022, but with a history going back to 1939, Norstella unites best-in-class brands to help clients navigate the complexities at each step of the drug development life cycle and get the right treatments to the right patients at the right time.
Each organization (Citeline, Evaluate, MMIT, Panalgo, The Dedham Group) delivers must-have answers for critical strategic and commercial decision-making. Together, via our market-leading brands, we help our clients:
- Citeline: accelerate the drug development cycle
- Evaluate: bring the right drugs to market
- MMIT: identify barriers to patient access
- Panalgo: turn data into insight faster
- The Dedham Group: think strategically for specialty therapeutics
By combining the efforts of each organization under Norstella, we can offer an even wider breadth of expertise, cutting-edge data solutions and expert advisory services alongside advanced technologies such as real-world data, machine learning and predictive analytics.
As one of the largest global pharma intelligence solution providers, Norstella has a footprint across the globe with teams of experts delivering world-class solutions in the USA, UK, the Netherlands, Japan, China and India.
Job Description
We are seeking a skilled NLP data scientist with a focus on language models to join our AI and Life Sciences Solutions team. Your expertise in processing and understanding natural language data, along with your knowledge of Electronic Health Records (EHR) and laboratory report analysis, will be instrumental in driving our data science initiatives and innovations, particularly in the development of rich multimodal real-world datasets to expedite RWD-driven drug development in pharma.
Responsibilities
- Apply NLP techniques and open-source Large Language Models (LLMs) such as Llama 2, Mixtral, and BERT to extract, process, and interpret unstructured medical data from diverse sources like EHRs, medical notes, and laboratory reports.
- Collaborate with clinical scientists and data scientists to create efficient NLP models for healthcare, exhibiting an understanding of both the technical and medical aspects of the data.
- Conduct data cleaning, preprocessing, and validation to maintain the accuracy and reliability of insights gathered from NLP processes.
- Validate and present data findings to stakeholders, exhibiting clear and effective communication skills.
Required Skills/Qualifications
- Master's or Ph.D. degree in Computer Science, Data Science, Computational Linguistics, or a related analytical field.
- Deep understanding and direct experience (2+ years) in handling and interpreting electronic health records (EHR) and laboratory test results are a must.
- Proven experience (2+ years) in NLP with a strong knowledge of NLP techniques such as Named Entity Recognition (NER), text summarization, and topic modelling, and their applied use in healthcare.
- Expert-level understanding and practical experience (1+ years) with Large Language Models (LLMs), including inference and fine-tuning.
- Proficient in Python and SQL, with strong experience in NLP libraries such as NLTK, spaCy, and Hugging Face Transformers, and deep learning libraries such as PyTorch and TensorFlow.
- Familiarity with common data science and ML practices, e.g., version control systems, agile methodologies, and documentation.
- Experience working with the AWS cloud environment and large databases (e.g., AWS Redshift).
- Experience in managing the ML lifecycle using open-source tools (e.g., MLflow).
- Detail-oriented with strong analytical and problem-solving abilities.
- Excellent verbal and written communication skills, with the ability to present complex data to a non-technical audience.
Preferred Qualifications
- Experience dealing with protected health information (PHI) and familiarity with healthcare-related data privacy laws such as HIPAA.
- Familiarity with standard healthcare codes and terminologies such as ICD-10, CPT, LOINC, and SNOMED CT.
- Experience with Retrieval-Augmented Generation (RAG) and vector stores for storing and querying large volumes of unstructured healthcare documents.
Our guiding principles for success at Norstella
01: Bold, Passionate, Mission-First
We have a lofty mission to Smooth Access to Life Saving Therapies, and we will get there by being bold and passionate about the mission and our clients. Our clients and the mission we are trying to accomplish must be at the forefront of our minds in everything we do.
02: Integrity, Truth, Reality
We make promises that we can keep, and goals that push us to new heights. Our integrity offers us the opportunity to learn and improve by being honest about what works and what doesn't. By being true to the data and producing realistic metrics, we are able to create plans and resources to achieve our goals.
03: Kindness, Empathy, Grace
We will empathize with everyone's situation, provide positive and constructive feedback with kindness, and accept opportunities for improvement with grace and gratitude. We use this principle across the organization to collaborate and build lines of open communication.
04: Resilience, Mettle, Perseverance
We will persevere even in difficult and challenging situations. Our ability to recover from missteps and failures in a positive way will help us to be successful in our mission.
05: Humility, Gratitude, Learning
We will be true learners by showing humility and gratitude in our work. We recognize that the smartest person in the room is the one who is always listening, learning, and willing to shift their thinking.
Skills: NLP, natural language processing, Python, PyTorch, TensorFlow, data science, SQL, AWS, Amazon Redshift