Data Engineer - New Graduate
Redrob | New York, NY | Hybrid
About Redrob
Redrob is a Series A blitzscaling startup building the Android to ChatGPT's iPhone, creating accessible, application-layer LLMs that democratize AI for businesses and professionals worldwide. With $14M in funding from top-tier investors (including early backers of SpaceX and Lyft) and the South Korean government, we're on a mission to bridge the technology gap between developed and emerging markets.
Our global presence spans Seoul, New Delhi, Mumbai, and New York, positioning us at the forefront of making AI economically viable for every individual worldwide.
The Role
We’re seeking a data engineer who bridges software, MLOps, and infrastructure and is passionate about supporting real-world AI products. You should enjoy working hands-on with data pipelines, large-scale AWS services, and machine learning environments.
You’ll work directly with the AI Engineering Team driving the Redrob LLM roadmap. Together, you’ll build the data backbone that feeds our fine-tuned AI systems and help deliver a production-ready model.
What You'll Do
- Design and implement scalable data pipelines on AWS to ingest, clean, and transform structured and unstructured HR/Sales datasets (e.g., CRM exports, chat logs, resumes, support tickets).
- Develop and manage a data lake architecture using AWS S3, Glue Data Catalog, Athena, and Redshift to support model training, analytics, and evaluation workflows.
- Integrate directly with SageMaker by building preprocessing pipelines, training data manifests, and calibration datasets for distillation, quantization, and fine-tuning.
- Automate ETL and data validation with Glue, Lambda, or Step Functions to enforce schema integrity and ensure JSONL compliance for instruction-tuning datasets.
- Implement robust data governance and lineage tracking using SageMaker Lineage, Glue Data Catalog, and LakeFS or MLflow for reproducibility and auditability.
- Optimize data workflows for performance and cost through Spot Instance usage, S3 Select, and Glue job tuning.
- Ensure security and compliance with enterprise-grade controls: IAM roles, KMS encryption, private VPC endpoints, and data access monitoring via CloudWatch.
- Collaborate cross-functionally with AI, backend, and DevOps teams to integrate high-quality datasets into LLM training pipelines and downstream APIs.
Requirements
- 1+ years of experience in data engineering or ML data infrastructure.
- Strong proficiency in Python (Pandas, PySpark, or Dask).
- Deep knowledge of AWS services such as S3, Glue, Athena, Redshift, Step Functions, Lambda, IAM, and KMS.
- Experience designing ETL pipelines using Airflow, Glue Workflows, or Step Functions.
- Familiarity with SageMaker Processing and Training workflows (data ingestion, manifests, and job orchestration).
- Strong SQL and data modeling skills for analytical and ML workloads.
- Experience implementing data versioning and lineage tracking (LakeFS, DVC, MLflow, or similar).
- Solid understanding of data security, encryption, and access management within AWS environments.
- Must be authorized to work in the United States (we cannot sponsor work visas at this time).
What We Offer
- Base Compensation: $100,000 - $120,000 USD
- Benefits: Comprehensive health, dental, and vision insurance
- Retirement: 401(k) plan
- Time Off: Unlimited PTO policy
- Work Style: Hybrid flexibility with our NYC office at 1 Penn Plaza
- Growth: Join a rapidly scaling startup with opportunities for accelerated career growth
- Impact: Work on products that democratize AI access for millions globally
Why Join Redrob?
This is a rare opportunity to join a well-funded startup at an inflection point, working on technology that rivals the biggest names in AI. As a new graduate, you'll have unprecedented exposure to senior leadership, direct impact on product strategy, and the chance to grow your career alongside a company poised for explosive growth.
If you're ready to build the future of accessible AI and make your mark on the global technology landscape, we want to hear from you.
Apply now and help us build the world's most accessible, end-to-end AI stack.
Redrob is an equal opportunity employer committed to building a diverse and inclusive team.