Responsibilities:
- Apply NLP and open-source large language models (LLMs) such as Llama 2, Mixtral, and BERT to extract, process, and interpret unstructured medical data from diverse sources like EHRs, medical notes, and laboratory reports.
- Collaborate with clinical scientists and data scientists to create efficient NLP models for healthcare, exhibiting an understanding of both the technical and medical aspects of the data.
- Conduct data cleaning, preprocessing, and validation to maintain the accuracy and reliability of insights gathered from NLP processes.
- Validate and present data findings to stakeholders, exhibiting clear and effective communication skills.
Required Skills/Qualifications:
- Master's or Ph.D. degree in Computer Science, Data Science, Computational Linguistics, or a related analytical field.
- Deep understanding of, and direct experience (2+ years) in, handling and interpreting electronic health records (EHR) and laboratory test results.
- Proven experience (2+ years) in NLP, with strong knowledge of techniques such as Named Entity Recognition (NER), text summarization, and topic modeling, and of their application in healthcare.
- Expert-level understanding of and practical experience (1+ year) with large language models (LLMs), including inference and fine-tuning.
- Proficient in Python and SQL, with strong experience in NLP libraries such as NLTK, spaCy, and Hugging Face Transformers, and deep learning libraries such as PyTorch and TensorFlow.
- Familiarity with common data science and ML practices, e.g., version control systems, agile methodologies, and documentation.
- Experience working with the AWS cloud environment and large databases (e.g., AWS Redshift).
- Experience in managing the ML lifecycle using open-source tools (e.g., MLflow).
- Detail-oriented with strong analytical and problem-solving abilities.
- Excellent verbal and written communication skills, with the ability to present complex data to non-technical audiences.
Preferred Qualifications:
- Experience dealing with protected health information (PHI) and familiarity with healthcare-related data privacy laws such as HIPAA.
- Familiarity with standard healthcare codes and terminologies such as ICD-10, CPT, LOINC, and SNOMED CT.
- Experience with Retrieval-Augmented Generation (RAG) and vector stores for storing and querying large volumes of unstructured healthcare documents.