Data Engineering Intern
This internship is affiliated with Georgetown Startup Interns (GSI). Learn more at eship.georgetown.edu/GSI
Spurt! is a technology solutions company that helps entrepreneurs build more efficient, more productive, and more strategic businesses. Our practice focuses on technology deployment, human-centred design thinking, and data analytics for business growth and expansion. We leverage digital and social technologies to trigger and nurture the transformative growth of businesses. Our group's operations are clustered into five operating arms: S.T.E.P., Paperclip, Solutions, MadeIn! and SpurtX!
The Data Engineering intern will work hands-on to build, test, and optimise data pipelines that power Spurt!'s suite of business tools, SpurtX!. This internship focuses on transforming raw data from recruitment, engagement, and performance workflows into reliable, high-quality datasets that fuel analytics dashboards, AI-driven insights, and operational intelligence.
Scope of work includes software development (Python/SQL scripting), data engineering (ETL/pipeline building), research (AI integration, tooling evaluation), and technical documentation.
The internship is remote-first within a collaborative team, with structured mentorship from a Product Manager and Product Development Associates. Interns work autonomously on defined projects with daily sprint planning, weekly check-ins, and code reviews.
Key Deliverables
By the end of the internship, the Data Engineering intern will deliver:
- Production-ready data pipeline(s) for at least one core workflow (e.g., recruitment analytics, performance metrics aggregation) that processes real user data with 99%+ accuracy
- Automated data quality framework including validation rules, cleansing procedures, and monitoring dashboards that proactively detect and alert on data anomalies (see the illustrative sketch after this section)
- AI-ready dataset(s) prepared for machine learning models, complete with feature engineering documentation, versioning schema, and bias/privacy audit report
- Comprehensive technical documentation covering:
  - Pipeline architecture diagrams and data flow maps
  - Transformation logic and SQL/Python code with inline comments
  - Runbook for monitoring, troubleshooting, and maintaining pipelines
  - Recommendations for future enhancements (performance optimisation, new data sources, AI integration opportunities)
- Cloud infrastructure improvements such as:
  - Monitoring/alerting setup for pipeline health metrics
  - Cost optimisation recommendations based on usage analysis
  - Automation scripts for deployment or scaling tasks
- Final presentation demonstrating pipeline functionality, data quality metrics, and impact on downstream analytics/AI features, delivered to Product and Engineering teams
Quality standards: All code must pass peer review and follow Spurt! coding standards.
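To give a concrete sense of the data quality deliverable, the sketch below shows one way such validation rules might be expressed. It is illustrative only: the dataset, column names, and allowed stage values are assumptions rather than SpurtX!'s actual schema, and pandas is used purely as an example library.

```python
import pandas as pd

# Hypothetical validation rules for a recruitment-analytics dataset;
# the column names (candidate_id, applied_at, stage) are illustrative only.
RULES = {
    "candidate_id": lambda s: s.notna() & ~s.duplicated(),
    "applied_at":   lambda s: pd.to_datetime(s, errors="coerce").notna(),
    "stage":        lambda s: s.isin(["applied", "screened", "interviewed", "hired"]),
}

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per rule with a pass rate, so anomalies can trigger alerts."""
    results = []
    for column, rule in RULES.items():
        passed = rule(df[column])
        results.append({
            "column": column,
            "pass_rate": round(passed.mean(), 4),
            "failed_rows": int((~passed).sum()),
        })
    return pd.DataFrame(results)

if __name__ == "__main__":
    sample = pd.DataFrame({
        "candidate_id": [1, 2, 2, None],
        "applied_at": ["2024-01-05", "2024-01-06", "not a date", "2024-01-07"],
        "stage": ["applied", "screened", "hired", "rejected"],
    })
    print(validate(sample))  # "rejected" and the duplicate id surface as failures
```

In practice the pass rates would be written to a monitoring dashboard and compared against thresholds, rather than printed to the console.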
Technical Skills
- Python programming: Proficiency in writing clean, efficient scripts for data manipulation, ETL workflows, and automation
- SQL: Strong ability to write complex queries (joins, subqueries, aggregations, window functions) for data extraction and transformation (see the sketch after this list)
- Data pipeline fundamentals: Understanding of ETL/ELT concepts, data modelling (star/snowflake schemas), and workflow orchestration
- Cloud platforms: Familiarity with AWS services (S3, Lambda, CloudWatch) or GCP services (e.g., BigQuery), or willingness to learn quickly
- Version control: Experience using Git/GitHub for collaborative development
- ETL/orchestration tools: Exposure to Airflow, dbt, or similar pipeline frameworks
- Data warehousing: Basic knowledge of columnar storage, partitioning, and indexing strategies
- Containerisation: Familiarity with Docker for deploying data workflows
- Streaming data: Awareness of real-time data processing concepts (Kafka, Kinesis) - exposure is a plus but not required
- AI/ML basics: Understanding of how data feeds into machine learning models (feature engineering, train/test splits, embeddings)
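As a pointer to the level expected for the Python and SQL items above, here is a minimal, self-contained sketch. The engagement_events table, its columns, and the deduplication logic are invented for illustration; SQLite stands in for whatever warehouse is actually in use.

```python
import sqlite3

# Illustrative extraction step: a window function ranks each user's events by
# recency so the pipeline keeps only the latest record per user. The
# engagement_events table and its columns are hypothetical.
QUERY = """
SELECT user_id,
       event_type,
       event_time,
       ROW_NUMBER() OVER (
           PARTITION BY user_id
           ORDER BY event_time DESC
       ) AS recency_rank
FROM engagement_events
"""

def latest_events(conn: sqlite3.Connection) -> list[tuple]:
    """Extract the most recent event per user (recency_rank = 1)."""
    wrapped = f"SELECT * FROM ({QUERY}) WHERE recency_rank = 1"
    return conn.execute(wrapped).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE engagement_events (user_id INT, event_type TEXT, event_time TEXT)")
    conn.executemany(
        "INSERT INTO engagement_events VALUES (?, ?, ?)",
        [(1, "login", "2024-03-01"), (1, "comment", "2024-03-05"), (2, "login", "2024-03-02")],
    )
    print(latest_events(conn))  # one row per user, the most recent event only
```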
Analytical & Professional Skills
- Attention to detail: Obsessive about data accuracy, edge cases, and quality validation
- Problem-solving: Proactive mindset to identify issues before they escalate and propose preventive solutions
- Ownership: Takes full responsibility for projects from start to finish, delivering outcomes that exceed expectations
- Communication: Able to document technical work clearly and explain complex data concepts to non-technical stakeholders
- Collaboration: Comfortable working across Product, Engineering, and AI teams to align data systems with business needs
- Continuous learning: Passion for staying current with data engineering and AI trends, willingness to research new tools/techniques
Mindset & Work Style
- Self-directed with ability to work independently while seeking help when stuck
- Structured thinker who documents decisions and processes meticulously
- Bias for action: ships working solutions iteratively rather than waiting for perfection
- Comfortable with ambiguity in an early-stage startup environment where requirements evolve