Data Engineer — AI & Data Infrastructure

We’re looking for a Data Engineer with a strong foundation in data pipelines and a meaningful edge in AI-native data infrastructure — specifically RAG pipelines, vector search, embedding workflows, and semantic retrieval systems.

You’ll work on two interconnected problem sets:

The first is foundational: consolidating eight legacy systems into a unified, reliable data platform — ETL pipelines, a data warehouse, and cross-system client identity resolution.

The second is where the work gets genuinely interesting: transforming three decades of institutional research into an intelligent, searchable, interactive knowledge layer that clients can query in ways that weren’t possible two years ago.

This is a small, senior team. You’ll work directly with the CTO, have real architectural ownership, and build systems that are in production — not in a sandbox.

 

What You’ll Work On

Data Foundation & Migration

  • Lead the data engineering work for our research portal migration — extracting, transforming, and loading data from legacy systems into modern cloud infrastructure
  • Build and maintain ETL/ELT pipelines across multiple integration points: CRM, research distribution platforms, trading systems, and third-party APIs
  • Design and implement our “Golden Record” initiative — cross-system client identity resolution across eight legacy databases with no unified identifiers
  • Implement event-driven data flows using AWS EventBridge as the central routing layer, treating each source system as a swappable adapter
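To make the “Golden Record” idea concrete, here is a minimal sketch of cross-system identity resolution: exact matching on a normalized email plus fuzzy name matching, clustered greedily. The field names, thresholds, and matching rules are illustrative assumptions, not the team’s actual design — a production system would add blocking, richer features, and human review.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass(frozen=True)
class ClientRecord:
    source_system: str   # hypothetical source tag, e.g. "crm" or "trading"
    source_id: str
    name: str
    email: str

def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so cosmetic differences don't block a match."""
    return " ".join(s.lower().split())

def is_same_client(a: ClientRecord, b: ClientRecord, name_threshold: float = 0.9) -> bool:
    """Match on identical normalized email, or a sufficiently similar name."""
    if a.email and normalize(a.email) == normalize(b.email):
        return True
    return SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio() >= name_threshold

def resolve_golden_records(records: list[ClientRecord]) -> list[set[ClientRecord]]:
    """Greedy single-pass clustering: each record joins the first cluster it matches."""
    clusters: list[set[ClientRecord]] = []
    for rec in records:
        for cluster in clusters:
            if any(is_same_client(rec, existing) for existing in cluster):
                cluster.add(rec)
                break
        else:
            clusters.append({rec})
    return clusters
```

Each resulting cluster is a candidate “golden” client whose source-system IDs can then be mapped to a single unified identifier.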

 

AI-Native Data Infrastructure (RAG & Search)

  • Design and build production-grade RAG (Retrieval-Augmented Generation) pipelines over AGCO’s research archive — ingestion, chunking strategy, embedding generation, vector storage, and retrieval
  • Implement hybrid search approaches that combine semantic (vector) search with keyword and metadata filtering, appropriate for structured financial research queries
  • Build and maintain embedding pipelines that keep the vector store current as new research is published, with full observability and freshness guarantees
  • Evaluate and implement emerging retrieval strategies as the space evolves:
      • Re-ranking with cross-encoders
      • Hypothetical Document Embeddings (HyDE)
      • Query expansion and decomposition
      • Graph-based retrieval (e.g., GraphRAG) for analyst relationship mapping
      • Structured metadata retrieval for faceted financial queries
  • Wire retrieval layers into LLM interfaces for research summarization, analyst Q&A, and recommendation-change tracking across the archive
  • Enable client queries such as: “Show me all emerging market buy recommendations from analysts with 10+ years of coverage who changed their view in the last 6 months”
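One common way to fuse semantic and keyword result lists, as in the hybrid search described above, is reciprocal rank fusion (RRF). This is an illustrative sketch, not the posting’s prescribed stack; the document IDs below are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs from different retrievers.

    Each document scores sum(1 / (k + rank)) across the lists it appears in.
    k = 60 is the constant from the original RRF paper; it damps the
    influence of any single ranker's top positions.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical output from a vector retriever and a keyword (BM25) retriever:
vector_hits = ["doc_em_outlook", "doc_rates", "doc_fx"]
keyword_hits = ["doc_rates", "doc_fx", "doc_em_outlook"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Metadata filters (analyst tenure, recommendation type, date range) would typically be applied before or alongside fusion to serve faceted queries like the example above.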

 

DevOps & Data Infrastructure

  • Apply DataOps practices across all pipelines: version control, CI/CD, environment parity across dev/staging/production, and infrastructure as code
  • Monitor pipeline health, embedding freshness, retrieval quality, and LLM call latency — build alerting that catches problems before users do
  • Work within AGCO’s AWS environment (App Runner, EventBridge, CDK) and contribute to IaC best practices
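The embedding-freshness monitoring mentioned above can be reduced to a simple invariant: the vector store should never trail the research feed by more than an agreed SLA. A minimal sketch, with an assumed one-hour SLA (the real target would be set by the team):

```python
from datetime import datetime, timedelta, timezone

# Assumed freshness target; tune to the actual publishing cadence.
FRESHNESS_SLA = timedelta(hours=1)

def embedding_lag(latest_published: datetime, latest_embedded: datetime) -> timedelta:
    """How far the vector store trails the newest published research (never negative)."""
    return max(latest_published - latest_embedded, timedelta(0))

def is_stale(latest_published: datetime, latest_embedded: datetime,
             sla: timedelta = FRESHNESS_SLA) -> bool:
    """True when newly published research has waited longer than the SLA to be embedded."""
    return embedding_lag(latest_published, latest_embedded) > sla
```

In practice `is_stale` would feed an alerting check (e.g. a scheduled job emitting a metric) so staleness pages the team before a client notices missing research.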

 

Collaboration & Documentation

  • Partner with the CTO, product team, and application developers to translate business requirements into sound data and retrieval architecture decisions
  • Document data flows, schema designs, chunking strategies, and retrieval logic so systems are maintainable and not a black box
  • Contribute to evaluation frameworks for retrieval quality — precision, recall, answer faithfulness — so we know when the system is actually working
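The retrieval-quality metrics named above have standard definitions; a minimal sketch of precision@k and recall@k for a single query (answer faithfulness needs LLM-based judging and is out of scope here):

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str],
                          k: int) -> tuple[float, float]:
    """Precision@k and recall@k for one query.

    retrieved: ranked document IDs returned by the retriever
    relevant:  ground-truth relevant IDs for the query
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaging these over a labeled query set gives the baseline numbers against which chunking and retrieval changes can be compared.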