
Data Engineer - Machine Learning Pipelines (Ray/Kubernetes) - Internship

ABOUT RIBBON COMMUNICATIONS

Ribbon Communications is a company with two decades of leadership in real-time communications. Built on world-class technology and intellectual property, the company delivers intelligent, secure, embedded real-time communications for today’s world. The company transforms fixed, mobile and enterprise networks from legacy environments to secure IP and cloud-based architectures, enabling highly productive communications for consumers and businesses. With 64 locations in 27 countries around the globe, Ribbon’s innovative, market-leading portfolio empowers service providers and enterprises with rapid service creation in a fully virtualized environment. To learn more, visit rbbn.com.

OPPORTUNITY

Ribbon Communications is looking for a Data Engineering Intern to help build and optimize machine learning pipelines using Ray in a Kubernetes environment for Ribbon Analytics. You will work closely with our Machine Learning Engineers and Data Scientists to develop a scalable pipeline that ingests data from an SQL database and performs anomaly detection on large datasets.

Ribbon Analytics is a big data network analytics and security product that collects, processes, and reacts to massive amounts of data gathered from the network, leveraging machine learning and other techniques to analyze trends and outliers in the data and take action to mitigate security threats, fraud, and other risks in a customer’s network.

LOCATION

Hybrid/Westford, MA

What You’ll Do: (Responsibilities)

Implement Machine Learning Pipelines:

  • Develop and implement machine learning pipelines using Ray for distributed data processing and anomaly detection.
  • Design and build efficient data ingestion pipelines to extract data from SQL databases.
  • Implement data preprocessing and feature engineering steps within the Ray environment.
  • Integrate anomaly detection models (e.g., isolation forests, autoencoders) into the pipelines.

Kubernetes Deployment:

  • Deploy and manage Ray clusters on Kubernetes using the Ray operator.
  • Create and maintain Kubernetes YAML configurations for Ray deployments.
  • Assist in troubleshooting and optimizing Ray deployments on Kubernetes.
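As one hypothetical example of the YAML configurations mentioned above, a `RayCluster` resource managed by the KubeRay operator might look roughly like this (names, image tags, and resource limits are illustrative, not Ribbon's actual configuration):

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: analytics-pipeline        # hypothetical cluster name
spec:
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"   # expose the Ray dashboard inside the pod
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
            resources:
              limits:
                cpu: "2"
                memory: 4Gi
  workerGroupSpecs:
    - groupName: workers
      replicas: 2                 # scale workers independently of the head
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
              resources:
                limits:
                  cpu: "2"
                  memory: 4Gi
```

Applying this with `kubectl apply -f` lets the operator create and supervise the head and worker pods, which is the deployment model this role would help maintain.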

SQL Database Interaction:

  • Write and optimize SQL queries to extract and transform data from relational databases.
  • Implement data ingestion strategies for continuous or batch data consumption.
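A common batch-ingestion strategy for the work described above is watermark-based incremental extraction: each query picks up only rows newer than the last one seen. Here is a minimal sketch, using `sqlite3` as a stand-in for the production database; the table and column names are hypothetical.

```python
# Watermark-based batch ingestion from a SQL database.
# sqlite3 stands in for the production database here.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_records (id INTEGER PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO call_records (value) VALUES (?)",
    [(v,) for v in [1.0, 1.2, 0.9, 42.0, 1.1]],
)

def fetch_batch(conn, watermark, batch_size=3):
    """Return the next batch of rows after `watermark` and the new watermark."""
    rows = conn.execute(
        "SELECT id, value FROM call_records WHERE id > ? ORDER BY id LIMIT ?",
        (watermark, batch_size),
    ).fetchall()
    new_watermark = rows[-1][0] if rows else watermark
    return rows, new_watermark

batch1, wm = fetch_batch(conn, watermark=0)   # first three rows
batch2, wm = fetch_batch(conn, watermark=wm)  # remaining two rows
```

The same pattern supports continuous consumption: a scheduler simply calls `fetch_batch` repeatedly, persisting the watermark between runs.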

Anomaly Detection:

  • Assist in the selection and implementation of appropriate anomaly detection algorithms.
  • Evaluate and monitor the performance of anomaly detection models.
  • Trigger incidents based on detected anomalies.
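The "trigger incidents" step above usually means turning model scores into structured alerts once they cross a threshold. The sketch below uses a simple z-score as a stand-in for a trained model's anomaly score; the incident fields and threshold are hypothetical.

```python
# Threshold-based incident triggering from anomaly scores.
# The z-score here stands in for a trained model's score.
from statistics import mean, pstdev

def score(values):
    m, s = mean(values), pstdev(values) or 1.0
    return [abs(v - m) / s for v in values]

def trigger_incidents(values, threshold=1.5):
    # Emit one structured incident per value whose score exceeds the threshold.
    return [
        {"index": i, "value": v, "score": z}
        for i, (v, z) in enumerate(zip(values, score(values)))
        if z > threshold
    ]

# Hypothetical metric series with one obvious outlier at index 4.
incidents = trigger_incidents([1.0, 1.1, 0.9, 1.05, 9.0])
```

In a production pipeline the incident dict would be forwarded to an alerting or ticketing system rather than returned; monitoring model performance then amounts to tracking how often these triggers turn out to be true positives.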

Collaboration and Documentation:

  • Collaborate with data scientists and engineers to understand requirements and implement solutions.
  • Document pipeline designs, implementations, and configurations.
  • Contribute to code reviews and knowledge sharing.

 

What We’re Looking For: (Qualifications)

· Currently pursuing or recently completed a degree in Computer Science, Data Engineering, Machine Learning, or a related field.

· Proficiency in writing SQL queries for analytics applications.

· Experience with Python and ML frameworks (e.g., Scikit-learn, TensorFlow, PyTorch).

· Familiarity with Ray for distributed computing.

· Knowledge of Kubernetes and containerized deployments.

· Some exposure to anomaly detection techniques (e.g., Isolation Forest, Autoencoders, Statistical Methods).

· Strong problem-solving skills and ability to work in a collaborative team environment.

· Ability to work in the Westford, MA office two days per week.

 

Ways To Stand Out from the Crowd (Preferred Skills)

· Experience with Ray or other distributed computing frameworks (e.g., Flink, Spark) in a Kubernetes environment.

· Experience implementing distributed machine learning training and inference pipelines.