Data Science Intern

Responsibilities / Core Projects / Tasks:

· Content & Data Quality for AI Search

· Embedding & Search Evaluation

· LLM‑Powered Enrichment Pipelines

· Observability & Reporting

What the Intern Will Learn:

How production RAG (Retrieval-Augmented Generation) systems are assembled, evaluated, and iterated
Practical vector search fundamentals (indexing, chunking, metadata, and relevance tuning) in OpenSearch
LLM-powered data enrichment pipelines at scale
Exposure to graph concepts and applied data modeling (reviewing existing Cypher/Memgraph queries and contributing small improvements)
ETL pipeline development for scientific datasets
AI search evaluation methodology and benchmarking
Working with large technical document collections (handbooks, magazines, products, videos)

Requirements:

Senior undergraduate (final-year) or graduate student in Data Science
Python proficiency (production-level scripting)
Familiarity with data manipulation (pandas, JSON, REST APIs)
Genuine interest in AI and language models
Curiosity about materials science or scientific data – no domain expertise required
Comfort working in a Linux/Docker environment

Nice to Have: