Post-Doctorate Researcher

Researchers in the Physical and Computational Sciences Directorate (PCSD) lead major R&D efforts in experimental and theoretical interfacial chemistry, chemical analysis, high energy physics, interfacial catalysis, multifunctional materials, and integrated high-performance and data-intensive computing.

PCSD is PNNL’s primary steward for research supported by the Offices of Basic Energy Sciences, Advanced Scientific Computing Research, and Nuclear Physics, all within the Department of Energy’s Office of Science.

Additionally, Directorate staff perform research and development for private industry and other government agencies, such as the Department of Defense and NASA. The Directorate's researchers are members of interdisciplinary teams tackling challenges of national importance that cut across all missions of the Department of Energy.

The Data Sciences & Machine Intelligence group in the Advanced Computing, Mathematics, and Data Division at PNNL seeks a multifaceted post-doctorate researcher to lead and support scientific research in the broad areas of data science, artificial intelligence (AI), and machine learning (ML), with a focus on advancing natural language processing (NLP) and data science for scientific domains. This is an excellent opportunity to develop your scientific career in an outstanding research institution by joining an interdisciplinary research team that focuses on accelerating scientific discovery. The position emphasizes growing existing capabilities and crafting new ones in AI, scientific machine learning, and NLP to strengthen the group’s leadership position in data science and machine intelligence. A successful candidate will have demonstrable expertise in the broad areas of data science and NLP, such as: (i) training and evaluation of language models; (ii) scientific machine learning; (iii) high performance computing; and (iv) high-level languages such as Python and AI/ML libraries such as PyTorch/HuggingFace. We are looking for a proactive, highly motivated individual with an aptitude for contributing to multidisciplinary teams.

  • Develop novel machine learning/NLP methods for a wide range of applications in basic science and environmental review/permitting.
  • Apply your knowledge of NLP, multimodal representation learning, and data analytics to integrate and clean data, recognize patterns, pose questions, and/or make discoveries from structured and/or unstructured data, primarily in the areas of basic science and environmental review/permitting, but generalizable to other domains.
  • Develop and maintain high quality software for machine learning/NLP projects.
  • Publish results in high-impact scientific computing journals, and present them at top-tier conferences and to sponsoring agencies.
  • Mentor and train graduate and undergraduate interns.

Minimum Qualifications:

  • Candidates must have received a PhD within the past five years (60 months) or within the next 8 months from an accredited college or university.

Preferred Qualifications:

  • PhD in Computer Science, Electrical and Computer Engineering, or Data Science.
  • Experience working in Python, including PyTorch and the libraries and tools commonly used in machine learning for NLP (e.g., HuggingFace, DeepSpeed, containerization technologies).
  • Solid knowledge of core skills in data science and machine learning, including curating data and information from large collections of unstructured data (text, images) in the form of PDFs.
  • Hands-on experience in training, studying, or analyzing large language models for domain-specific tasks.
  • Proficiency in computer science and engineering skills. Experience on a production-grade deployment team is preferred, especially with applications in a cloud environment (AWS, Azure, Vertex AI).
  • Demonstrable experience in conducting scientific exploration and research, evidenced by artifacts such as scientific publications in top-tier venues (ACL, EMNLP, NAACL, AAAI, NeurIPS, ICLR, etc.) and publicly released software tools.