Junior Data Engineer
We're seeking a Junior Data Engineer with a passion for solving real-world problems using large, complex datasets. This is an exciting opportunity to join a collaborative team that’s laying the foundation for future AI/ML applications. In this role, you’ll help us tackle internal data challenges and develop scalable, high-quality datasets critical to machine learning success.
You’ll work closely with a cross-functional team handling diverse datasets, including invoices and software and hardware asset data, and play a key role in data wrangling, preparation, and quality assurance. This is a growth-focused position, ideal for candidates eager to develop their careers toward AI/ML engineering or data science.
Responsibilities
Collaborate with data scientists and engineers to clean, preprocess, and structure large-scale datasets for AI/ML readiness.
Support the team in building ETL pipelines and applying data wrangling techniques to prepare real-world datasets.
Conduct data quality assessments and document issues or anomalies for remediation.
Contribute to feature engineering efforts and create ML-ready datasets.
Help manage and manipulate data using Python, Spark, and cloud-based data services.
Participate in Agile ceremonies and adapt to evolving project needs.
Learn about model development, deployment, and experimentation, and contribute to them as you grow.
Primary Skills (Highest Priority)
Data Engineering: Advanced skills in ETL, data wrangling, and preprocessing large structured and semi-structured datasets.
Dataset Quality Assessment: Ability to detect and address data issues to ensure high-quality inputs for ML models.
Feature Engineering: Experience transforming raw data into features used in ML pipelines.
Python Programming: Proficiency in Python for data manipulation and scripting.
ML Framework Familiarity: Exposure to libraries such as Scikit-learn, TensorFlow, PyTorch, or Keras.
Spark Proficiency: Familiarity with Spark DataFrames, RDDs, and Datasets for large-scale data processing.
Secondary Skills (Nice to Have)
Experience with open-source LLMs (e.g., LLaMA, Mixtral) and orchestration frameworks (e.g., LangChain, CrewAI).
Understanding of how common ML/DL algorithms are developed and tuned.
Exposure to cloud platforms (AWS, Azure, or GCP), including services such as Amazon SageMaker or Amazon Bedrock.
Familiarity with MLOps practices such as model versioning, deployment, and monitoring.
Git version control and collaborative coding experience.
Tertiary Skills (Desirable)
AI Application Development: Experience building AI solutions end to end.
API development for AI/ML model integration.
Prompt engineering for LLM-based use cases.
Agile/Scrum team experience.
Willingness and ability to mentor and grow with the team.
Why Join Us
Growth Path: This role is designed as a launchpad to more advanced AI/ML positions.
Collaborative Culture: Work in a supportive team that values knowledge-sharing and continuous learning.
Real Impact: Help shape the foundational datasets that fuel meaningful AI/ML innovations.