We are constantly looking for junior IT developers and analysts with an interest in text mining, Natural Language Processing (NLP), and machine learning.
Please email your resume to careers@sahanatech.com to be contacted about future openings.
We currently have one opening, outlined below, which is ready to be filled now.
Applied Machine Learning Engineer (NLP / AI Systems)
Responsibilities
- Design, develop, and maintain AI/ML solutions to support NIH grant application intake, peer review workflows, and analytics.
- Develop and apply machine learning techniques (e.g., embeddings, classification, clustering, similarity analysis) in NLP and computer vision for tasks such as reviewer–application matching, keyword extraction, and document analysis.
- Build, evaluate, and iteratively improve machine learning models using structured and unstructured data, including text, documents, and images where applicable.
- Design and implement end-to-end ML pipelines, including data ingestion, preprocessing, feature/embedding generation, model execution, evaluation, and output generation.
- Debug, test, and optimize ML pipelines and tools to ensure reliable, consistent, and reproducible results.
- Refactor and improve code to enhance performance, scalability, and maintainability.
- Work with large and complex datasets, including implementing data validation, quality checks, and preprocessing workflows.
- Conduct experiments to evaluate model performance, analyze results, and refine approaches based on quantitative and qualitative findings.
- Collaborate with cross-functional teams (data scientists, analysts, program staff, and engineers) to translate business needs into practical AI/ML solutions.
- Ensure reproducibility and transparency through documentation, versioning, and structured workflows.
- Communicate methods, results, and limitations clearly to both technical and non-technical stakeholders.
- Stay current with advancements in applied AI/ML, including NLP, embeddings, and generative AI, and evaluate their applicability to NIH use cases.
Required Qualifications
- Strong programming expertise in Python (preferred) and/or R.
- Hands-on experience developing and applying machine learning models and workflows.
- Experience with NLP techniques such as text classification, embeddings, semantic similarity, or related methods.
- Experience with computer vision for medical images.
- Demonstrated ability to apply innovative ideas to solve complex data problems.
- Experience working with real-world datasets, including data cleaning, preprocessing, and feature engineering.
- Ability to debug, test, and improve complex code and analytical workflows.
- Familiarity with at least one modern ML framework (e.g., PyTorch, TensorFlow, scikit-learn, or equivalent).
- Strong analytical and problem-solving skills, including the ability to evaluate model performance and interpret results.
- Excellent communication skills, with the ability to explain technical concepts to diverse audiences.
Preferred / Desired Qualifications
- Experience with transformer-based models or large language models (LLMs), including practical applications such as text analysis or document processing.
- Experience with reviewer matching, recommendation systems, or document similarity problems.
- Familiarity with distributed data processing tools (e.g., Spark, Dask, Ray).
- Experience with experiment tracking, model versioning, or reproducible workflows (e.g., MLflow or similar tools).
- Familiarity with NIH data systems, biomedical text, or scientific research data.
- Understanding of evaluation metrics (e.g., accuracy, precision/recall) and model robustness.
Education & Experience
- Master’s in Data Science, Computer Science, Computational Linguistics, or a related field (or equivalent experience).
- 3–8+ years of relevant experience in applied machine learning, data science, or a related field.
- Demonstrated experience delivering end-to-end ML solutions, including model development, evaluation, and pipeline implementation.