Research Scientist

Research Scientist, Large Language Models & Agents

Responsibilities:

1. Conduct systematic research in cutting-edge areas of Large Language Models (LLMs) and Agents, including but not limited to supervised fine-tuning (SFT) on instruction data, function calling, reinforcement learning from human or AI feedback (RLHF/RLAIF), multimodal fusion, and long-context modeling.

2. Track the latest papers and open-source advancements in the LLM and Agent fields, rapidly reproduce research findings, validate experiments, and produce research reports, benchmark results, and improvement plans.

3. Investigate in depth how well large and small models adapt to industry scenarios and what application potential they hold, conducting research on model comparison, capability evaluation, compression, and optimization, while exploring localized adaptation and innovative tuning strategies.

4. Propose innovative methods for cutting-edge problems in model training, inference, and deployment (e.g., efficient parallelism, optimized operators, data strategies), and drive feasibility validation in experimental environments.

5. Help consolidate and share research results through internal workshops, academic papers, patents, and open-source projects, driving continued breakthroughs for the team at the LLM and Agent frontier.

Requirements:

1. Master's or Ph.D. degree in Computer Science, Artificial Intelligence, Machine Learning, Data Science, or a related field (preferred).

2. Possess relevant research or project experience in large models, including but not limited to pre-training, instruction fine-tuning, reinforcement learning, and agent methodologies.

3. Be familiar with the Transformer architecture and its core mechanisms (self-attention, positional encoding, etc.), and understand key large-model optimization techniques (GQA, MQA, FlashAttention, etc.).

4. Understand and be able to implement large model training techniques (model parallelism, data parallelism, pipeline parallelism, etc.), and be proficient in at least one large model training framework (DeepSpeed, Megatron, Colossal-AI, etc.); experience with multi-node, multi-GPU experiments is a plus.

5. Possess an in-depth understanding of the data pipelines required for large model training and evaluation, including data construction, cleaning, augmentation, and labeling methods, and be able to propose research-oriented improvements for data quality and distribution issues.

6. Possess a solid theoretical foundation in machine learning and deep learning, and be proficient in research toolchains such as PyTorch and Hugging Face.

7. Demonstrate strong research capabilities, including problem abstraction, experimental design, result analysis, and academic writing; experience with paper publication or open-source contributions is a plus.