Learn More
Help

You are viewing a preview of this job. Log in or register to view more details about this job.

Summer Research Assistant: Attacks and Defenses in LLMs

Large Language Models (LLMs) are increasingly deployed in real-world AI systems, where they interact with users, external tools, and sensitive data. Despite recent advances in alignment and safety, LLM-powered systems remain vulnerable to a wide range of attacks, including data poisoning, backdoor insertion, prompt injection, and jailbreaks. These vulnerabilities pose serious risks to model reliability, privacy, and regulatory compliance.

This project focuses on identifying and understanding security and safety weaknesses in LLMs, as well as developing effective defense mechanisms to improve their trustworthiness. We will study how malicious behaviors can be introduced during training, fine-tuning, or deployment, and how they may be triggered through carefully crafted inputs or compliance-driven operations such as model unlearning. On the defense side, we aim to design techniques that enhance robustness, transparency, and interpretability of LLMs, enabling practitioners to better detect, analyze, and mitigate hidden threats.

Interns will gain hands-on experience with state-of-the-art LLMs, attack and defense methodologies, and experimental evaluation of AI safety mechanisms. The project is suitable for students interested in AI security, trustworthy machine learning, and foundation model research, and provides an opportunity to contribute to cutting-edge research at the intersection of machine learning, security, and responsible AI.