Data Engineering Intern
This internship is affiliated with Georgetown Startup Interns (GSI). Learn more at eship.georgetown.edu/GSI
Spurt! is a technology solutions company that helps entrepreneurs build more efficient, more productive, and more strategic businesses. Our practice focuses on technology deployment, human-centred design thinking, and data analytics for business growth and expansion. We leverage digital and social technologies to trigger and nurture the transformative growth of businesses. Our group's operations are clustered into five operating arms: S.T.E.P., Paperclip, Solutions, MadeIn! and SpurtX!
The Data Engineering intern will work hands-on to build, test, and optimise data pipelines that power Spurt!'s suite of business tools, SpurtX!. This internship focuses on transforming raw data from recruitment, engagement, and performance workflows into reliable, high-quality datasets that fuel analytics dashboards, AI-driven insights, and operational intelligence.
Scope of work includes software development (Python/SQL scripting), data engineering (ETL/pipeline building), research (AI integration, tooling evaluation), and technical documentation.
The internship is remote-first within a collaborative team, with structured mentorship from a Product Manager and Product Development Associates. Interns work autonomously on defined projects with daily sprint planning, weekly check-ins, and code reviews.
Key Deliverables
By the end of the internship, the Data Engineering intern will deliver:
- Production-ready data pipeline(s) for at least one core workflow (e.g., recruitment analytics, performance metrics aggregation) that processes real user data with 99%+ accuracy
- Automated data quality framework including validation rules, cleansing procedures, and monitoring dashboards that proactively detect and alert on data anomalies (see the illustrative sketch after this section)
- AI-ready dataset(s) prepared for machine learning models, complete with feature engineering documentation, versioning schema, and bias/privacy audit report
- Comprehensive technical documentation covering:
  - Pipeline architecture diagrams and data flow maps
  - Transformation logic and SQL/Python code with inline comments
  - Runbook for monitoring, troubleshooting, and maintaining pipelines
  - Recommendations for future enhancements (performance optimisation, new data sources, AI integration opportunities)
- Cloud infrastructure improvements such as:
  - Monitoring/alerting setup for pipeline health metrics
  - Cost optimisation recommendations based on usage analysis
  - Automation scripts for deployment or scaling tasks
- Final presentation demonstrating pipeline functionality, data quality metrics, and impact on downstream analytics/AI features, delivered to Product and Engineering teams
Quality standards: All code must pass peer review and follow Spurt! coding standards.
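To give a concrete sense of the data quality deliverable, the sketch below shows one way such validation rules might be expressed. It is illustrative only: the dataset, column names, and allowed stage values are assumptions rather than SpurtX!'s actual schema, and pandas is used purely as an example library.

```python
import pandas as pd

# Hypothetical validation rules for a recruitment-analytics dataset;
# the column names (candidate_id, applied_at, stage) are illustrative only.
RULES = {
    "candidate_id": lambda s: s.notna() & ~s.duplicated(),
    "applied_at":   lambda s: pd.to_datetime(s, errors="coerce").notna(),
    "stage":        lambda s: s.isin(["applied", "screened", "interviewed", "hired"]),
}

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per rule with a pass rate, so anomalies can trigger alerts."""
    results = []
    for column, rule in RULES.items():
        passed = rule(df[column])
        results.append({
            "column": column,
            "pass_rate": round(passed.mean(), 4),
            "failed_rows": int((~passed).sum()),
        })
    return pd.DataFrame(results)

if __name__ == "__main__":
    sample = pd.DataFrame({
        "candidate_id": [1, 2, 2, None],
        "applied_at": ["2024-01-05", "2024-01-06", "not a date", "2024-01-07"],
        "stage": ["applied", "screened", "hired", "rejected"],
    })
    print(validate(sample))  # "rejected" and the duplicate id surface as failures
```

In practice the pass rates would be written to a monitoring dashboard and compared against thresholds, rather than printed to the console.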
Technical Skills
- Python programming: Proficiency in writing clean, efficient scripts for data manipulation, ETL workflows, and automation
- SQL: Strong ability to write complex queries (joins, subqueries, aggregations, window functions) for data extraction and transformation (see the sketch after this list)
- Data pipeline fundamentals: Understanding of ETL/ELT concepts, data modelling (star/snowflake schemas), and workflow orchestration
- Cloud platforms: Familiarity with AWS services (S3, Lambda, CloudWatch) or GCP services (e.g., BigQuery), or willingness to learn quickly
- Version control: Experience using Git/GitHub for collaborative development
- ETL/orchestration tools: Exposure to Airflow, dbt, or similar pipeline frameworks
- Data warehousing: Basic knowledge of columnar storage, partitioning, and indexing strategies
- Containerisation: Familiarity with Docker for deploying data workflows
- Streaming data: Awareness of real-time data processing concepts (Kafka, Kinesis) - exposure is a plus but not required
- AI/ML basics: Understanding of how data feeds into machine learning models (feature engineering, train/test splits, embeddings)
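As a pointer to the level expected for the Python and SQL items above, here is a minimal, self-contained sketch. The engagement_events table, its columns, and the deduplication logic are invented for illustration; SQLite stands in for whatever warehouse is actually in use.

```python
import sqlite3

# Illustrative extraction step: a window function ranks each user's events by
# recency so the pipeline keeps only the latest record per user. The
# engagement_events table and its columns are hypothetical.
QUERY = """
SELECT user_id,
       event_type,
       event_time,
       ROW_NUMBER() OVER (
           PARTITION BY user_id
           ORDER BY event_time DESC
       ) AS recency_rank
FROM engagement_events
"""

def latest_events(conn: sqlite3.Connection) -> list[tuple]:
    """Extract the most recent event per user (recency_rank = 1)."""
    wrapped = f"SELECT * FROM ({QUERY}) WHERE recency_rank = 1"
    return conn.execute(wrapped).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE engagement_events (user_id INT, event_type TEXT, event_time TEXT)")
    conn.executemany(
        "INSERT INTO engagement_events VALUES (?, ?, ?)",
        [(1, "login", "2024-03-01"), (1, "comment", "2024-03-05"), (2, "login", "2024-03-02")],
    )
    print(latest_events(conn))  # one row per user, the most recent event only
```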
Analytical & Professional Skills
- Attention to detail: Obsessive about data accuracy, edge cases, and quality validation
- Problem-solving: Proactive mindset to identify issues before they escalate and propose preventive solutions
- Ownership: Takes full responsibility for projects from start to finish, delivering outcomes that exceed expectations
- Communication: Able to document technical work clearly and explain complex data concepts to non-technical stakeholders
- Collaboration: Comfortable working across Product, Engineering, and AI teams to align data systems with business needs
- Continuous learning: Passion for staying current with data engineering and AI trends, willingness to research new tools/techniques
Mindset & Work Style
- Self-directed with ability to work independently while seeking help when stuck
- Structured thinker who documents decisions and processes meticulously
- Bias for action: ships working solutions iteratively rather than waiting for perfection
- Comfortable with ambiguity in an early-stage startup environment where requirements evolve