Careers

We are constantly looking for junior IT developers and analysts with an interest in text mining, Natural Language Processing (NLP) & machine learning.

Please email your resume to careers@sahanatech.com to be contacted about future openings.

We currently have one opening, described below, and are looking to fill it immediately.

Applied Machine Learning Engineer (NLP / AI Systems)

Responsibilities
  • Design, develop, and maintain AI/ML solutions to support NIH grant application intake, peer review workflows, and analytics.
  • Develop and apply machine learning techniques (e.g., embeddings, classification, clustering, similarity analysis) to NLP and computer vision tasks such as reviewer–application matching, keyword extraction, and document analysis.
  • Build, evaluate, and iteratively improve machine learning models using structured and unstructured data, including text, documents, and images where applicable.
  • Design and implement end-to-end ML pipelines, including data ingestion, preprocessing, feature/embedding generation, model execution, evaluation, and output generation.
  • Debug, test, and optimize ML pipelines and tools to ensure reliable, consistent, and reproducible results.
  • Refactor and improve code to enhance performance, scalability, and maintainability.
  • Work with large and complex datasets, including implementing data validation, quality checks, and preprocessing workflows.
  • Conduct experiments to evaluate model performance, analyze results, and refine approaches based on quantitative and qualitative findings.
  • Collaborate with cross-functional teams (data scientists, analysts, program staff, and engineers) to translate business needs into practical AI/ML solutions.
  • Ensure reproducibility and transparency through documentation, versioning, and structured workflows.
  • Communicate methods, results, and limitations clearly to both technical and non-technical stakeholders.
  • Stay current with advancements in applied AI/ML, including NLP, embeddings, and generative AI, and evaluate their applicability to NIH use cases.

Required Qualifications
  • Strong programming expertise in Python (preferred) and/or R.
  • Hands-on experience developing and applying machine learning models and workflows.
  • Experience with NLP techniques such as text classification, embeddings, semantic similarity, or related methods.
  • Experience with computer vision for medical images.
  • Demonstrated ability to develop innovative approaches to complex data problems.
  • Experience working with real-world datasets, including data cleaning, preprocessing, and feature engineering.
  • Ability to debug, test, and improve complex code and analytical workflows.
  • Familiarity with at least one modern ML framework (e.g., PyTorch, TensorFlow, scikit-learn, or equivalent).
  • Strong analytical and problem-solving skills, including the ability to evaluate model performance and interpret results.
  • Excellent communication skills, with the ability to explain technical concepts to diverse audiences.

Preferred / Desired Qualifications

  • Experience with transformer-based models or large language models (LLMs), including practical applications such as text analysis or document processing.
  • Experience with reviewer matching, recommendation systems, or document similarity problems.
  • Familiarity with distributed data processing tools (e.g., Spark, Dask, Ray).
  • Experience with experiment tracking, model versioning, or reproducible workflows (e.g., MLflow or similar tools).
  • Familiarity with NIH data systems, biomedical text, or scientific research data.
  • Understanding of evaluation metrics (e.g., accuracy, precision/recall) and model robustness.

Education & Experience

  • Master’s degree in Data Science, Computer Science, Computational Linguistics, or a related field (or equivalent experience).
  • 3–8+ years of relevant experience in applied machine learning, data science, or a related field.
  • Demonstrated experience delivering end-to-end ML solutions, including model development, evaluation, and pipeline implementation.