- Analyze and correlate complex data from multiple sources.
- Perform exploratory data analysis to gain a deeper understanding of the data.
- Work with stakeholders, including domain experts and product teams, to understand use cases and domain-specific problems, and to identify opportunities where machine learning can drive outcomes.
- Develop tools and algorithms for generating synthetic data sets.
- Define KPIs to measure model performance.
- Develop and test statistical and machine learning models for efficacy and operational impact.
- Write production-quality code and work with other software engineering teams to deploy models into production.
- Support deployed models as a subject matter expert.
- Be creative and engineer novel features and methods to push beyond our current capabilities.
We are looking for someone who has:
- A proven track record in the research, data science, and engineering involved in building and shipping machine learning or statistical models that scale to high volumes of data (billions of data points).
- Proficiency with Python and SQL.
- Experience using key ML libraries, including scikit-learn, TensorFlow, PyTorch, and Spark ML.
- Experience with data fusion and data cleansing techniques for creating labelled data sets.
- Experience with developing synthetic data sets using generative and statistical modelling techniques.
- Understanding of classical ML and DNN modelling techniques for supervised learning, unsupervised learning, and sequence modelling.
- Understanding of at least three of the following: probabilistic graphical models (PGMs), reinforcement learning (RL), graph neural networks (GNNs), statistical modelling, or time-series modelling.
- Understanding of ensemble and multi-modal machine learning techniques for solving complex problems.
- Familiarity with large-scale distributed systems and related technologies such as Spark, Elasticsearch, and Kubernetes is a plus.
- Familiarity with cloud platforms (we use AWS here) and automation technologies (e.g., Kubernetes, Jenkins, Chef) is a plus.
- Familiarity with the information/cyber security domain is a plus.