Data Science Intern

Responsibilities / Core Projects / Tasks:

·  Content & Data Quality for AI Search

  • Audit and improve chunk quality across OpenSearch indexes
  • Identify and fix missing or low‑quality metadata via scripts

·  Embedding & Search Evaluation

  • Evaluate and compare embedding models for retrieval quality
  • Build small evaluation datasets (Q&A pairs) to benchmark results

·  LLM‑Powered Enrichment Pipelines

  • Assist in tuning pipelines that extract summaries, keywords, and tags
  • Help monitor enrichment coverage and quality

·  Observability & Reporting

  • Create scripts or notebooks to report on index health and enrichment status

What the Intern Will Learn:

  • How production RAG (Retrieval-Augmented Generation) systems are assembled, evaluated, and iterated
  • Practical vector search fundamentals (indexing, chunking, metadata, and relevance tuning) in OpenSearch
  • LLM-powered data enrichment pipelines at scale
  • Exposure to graph concepts and applied data modeling (reviewing existing Cypher/Memgraph queries and contributing small improvements)
  • ETL pipeline development for scientific datasets
  • AI search evaluation methodology and benchmarking
  • Working with large technical document collections (Handbooks, magazines, products, videos)

Requirements:

  • Senior undergraduate (final-year) or graduate student in Data Science 
  • Python proficiency (production-level scripting)
  • Familiarity with data manipulation (pandas, JSON, REST APIs)
  • Genuine interest in AI and language models
  • Curiosity about materials science or scientific data – no domain expertise required
  • Comfort working in a Linux/Docker environment

Nice to Have:

  • Experience with vector databases (OpenSearch, Pinecone, Weaviate, Chroma, etc.)
  • Exposure to LangChain, LLM APIs, or RAG patterns
  • Experience with graph databases (Neo4j, Memgraph, Cypher)
  • Familiarity with NLP libraries (Hugging Face Transformers, spaCy)