Data Engineering Intern (3_2025.SIP)
Data Engineering Intern – Data Foundations
Affinity Solutions (Affinity) is the leading consumer purchase insights company. We provide a complete view of U.S. and U.K. consumer spending, across and between brands, via exclusive access to fully permissioned data from over 140 million debit and credit cards. This data is transformed into privacy-compliant, actionable intelligence for marketers, consultancies, and financial services companies to drive strategic growth and lasting customer relationships. Visit us at www.affinitysolutions.com to discover how we're shaping the future of consumer purchase insights.
Affinity Solutions seeks a smart, curious, and technically savvy intern to join our cutting-edge Data Science team.
About Your Internship Role:
The Data Science team at Affinity Solutions builds statistical and machine learning models that help clean, normalize, and transform our data from semi-structured into structured form and power all our adtech/martech products at scale. We build efficient and scalable data pipelines and ML models to understand credit card transaction strings, develop methodology and tools to precisely and effectively measure marketing campaign effects, and research in-house and public data sources for insights into consumer spending behavior.
In this role, you will be working on improving the accuracy, efficiency, and coverage of our geolocation information. You will work with other data engineers and data scientists on normalizing location information in credit card transaction data, improving the coverage and quality of location ground truth datasets, improving the matching algorithms that assign specific store locations to credit card transactions, and developing and improving metrics to assess the quality of the matching process.
This role will follow a hybrid structure, working remotely (within the USA) and going into our NY (Manhattan), CA (San Jose), or TX (Plano) office.
Duration of Internship Role: Summer 2025 (June through August).
Salary: $25/hr for undergraduate students, $30/hr for current graduate students.
Location: NY (Manhattan), CA (San Jose), or TX (Plano). Must be located in the US for the entirety of the internship.
Your Contributions:
- Engage in R&D to develop and improve data pipelines and location-matching algorithms.
- Develop metrics and quality measurement frameworks to ensure high quality of results.
- Mine large consumer datasets in the cloud environment to identify new opportunities for improvements.
- Communicate methodologies and results to management and non-technical stakeholders.
Your Qualifications:
- Currently pursuing an advanced degree in Computer Science or another field that provides advanced training in software engineering and data science.
- Strong software engineering and data engineering skills, especially Python and SQL.
- Strong understanding of database and data warehouse technologies (PostgreSQL, Redshift, Snowflake).
- Entrepreneurial, highly self-motivated, and collaborative attitude, with keen attention to detail; able to learn quickly and to prioritize and execute tasks effectively in a demanding environment.
- Great communication skills (verbal, written, and presentation).
- Experience building ETL pipelines for large-scale data in the cloud and writing production-quality code are both highly preferred.