
Software Development Internship
Location: Remote or On-Site (Austin, TX)
Duration: 3–12 Months
Start Date: Flexible
Commitment: Part-time (15–30 hrs/week)
About the Role:
We are seeking a motivated and technically skilled Software Development Intern with a strong foundation in Python programming to join our data engineering team. This internship offers hands-on experience in building scalable ETL pipelines, designing and maintaining web scraping systems, and working with data storage and indexing technologies such as PostgreSQL and Elasticsearch. The ideal candidate is passionate about data, detail-oriented, and eager to solve real-world problems with code.
Key Responsibilities:
- Design and develop robust web scraping scripts to collect structured and unstructured data from diverse sources.
- Build and maintain ETL (Extract, Transform, Load) pipelines to ingest and process large volumes of data.
- Develop modular, reusable, and efficient Python code using popular libraries (e.g., requests, BeautifulSoup, Scrapy, pandas, SQLAlchemy).
- Integrate and manage data storage using PostgreSQL, ensuring reliability and performance.
- Index and search large datasets with Elasticsearch for scalable querying and analytics.
- Collaborate with cross-functional teams to define data requirements and support data-driven projects.
- Write clear documentation and unit tests for your code.
Required Qualifications:
- Current enrollment in a Bachelor's or Master's program in Computer Science, Data Science, Engineering, or a related field.
- Solid programming experience in Python.
- Familiarity with web scraping tools and techniques (e.g., handling captchas, rotating proxies, parsing HTML/JSON/XML).
- Understanding of ETL concepts and pipeline architecture.
- Experience with PostgreSQL (or another SQL-based DBMS) and basic SQL querying.
- Strong problem-solving skills, attention to detail, and a growth mindset.
Preferred Qualifications:
- Experience using Docker and Git-based workflows.
- Familiarity with asynchronous programming (e.g., asyncio, aiohttp) for high-performance scraping.
- Exposure to task scheduling frameworks.
- Familiarity with REST APIs and data validation techniques.
- Interest in working with large language models and machine learning to refine data.
What You'll Gain:
- Real-world experience working on scalable data engineering infrastructure.
- Mentorship from senior engineers and data professionals.
- An opportunity to contribute to meaningful projects and see your work in production.