Senior Multimodal Algorithm Engineer (PhD)
Job responsibilities
1. Track and implement recent algorithms in multimodal large-model understanding and generation, image and text information extraction, screen understanding, and other industry applications, and standardize best-practice processes for them;
2. Research new algorithm frameworks to solve core technical problems in these fields;
3. Work closely with the R&D team to apply multimodal understanding and generation technology to real projects in support of cutting-edge innovation goals or business goals.
Job requirements
1. Background in computer science, electronic engineering, mathematics, or a related field;
2. Have in-depth research experience in at least one of the following areas: multimodal feature learning, object detection/grounding, OCR, image and video captioning, VQA, LLM training, RL, etc.;
3. Be proficient with mainstream deep learning frameworks such as PyTorch or TensorFlow;
4. Have a good foundation in algorithms and data structures and a solid foundation in mathematics and statistics, including familiarity with optimization, probability, linear algebra, and related fields;
5. Have strong teamwork and communication skills, and be able to work independently in a team environment;
6. Be proactive, have good problem-solving and innovation abilities, and be passionate about cutting-edge technologies.
The working language is Mandarin.