Mitigating Data Scarcity in Polymer Property Prediction via Multi-task Auxiliary Learning

Event Speaker
Gabriel Pinheiro
Visiting PhD student in Computer Science from Universidade Federal de São Paulo
Event Type
CBEE Seminar
Date
Event Location
Kelley 1003 and Zoom
Event Description

Polymeric materials are composed of repeating molecular units (monomers) bonded into chains or networks. They constitute everyday items such as plastic bottles, containers, bags, and synthetic fibers, as well as specialty products like artificial heart valves. The chemical and physical properties of these materials can be tailored by adjusting the molecular structures and composition of the monomers, as well as the structural characteristics of the chains/networks. This vast design space presents machine learning opportunities for polymer design.

However, data scarcity and the question of how to represent a polymer's chain architecture and monomer sequence for machine learning remain open challenges. To tackle them, we compiled a large dataset of polymers labeled with various properties obtained from both molecular simulations and wet-lab experiments. We then developed a supervised training framework that uses this dataset as a source of auxiliary training tasks, mitigating data scarcity for a target polymer property prediction task.
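As a rough illustration of the general idea of auxiliary multi-task learning (not the framework presented in the talk, whose details are not given here), the sketch below uses hard parameter sharing: one shared linear encoder with a separate linear head per property, trained on a weighted sum of per-task squared errors. Everything in it is an assumption made for illustration: the synthetic data, the function names (`make_model`, `predict`, `train`, `mse`), the task weights, and the choice of task 0 as the data-scarce target.

```python
import random

def make_model(n_features, n_hidden, n_tasks, rng):
    """Shared linear encoder W plus one linear head per task (hypothetical setup)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)] for _ in range(n_hidden)]
    heads = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_tasks)]
    return W, heads

def predict(model, x, task):
    """Return the prediction for one task and the shared hidden representation."""
    W, heads = model
    h = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return sum(v * hj for v, hj in zip(heads[task], h)), h

def train(model, samples, task_weights, lr=0.01, epochs=200, rng=None):
    """SGD on a weighted sum of per-task squared errors.

    samples: list of (x, task, y); each sample carries a label for one task only.
    Returns the mean (unweighted) squared error observed during each epoch.
    """
    W, heads = model
    rng = rng or random.Random(0)
    order = samples[:]  # shuffle a copy so the caller's list is untouched
    history = []
    for _ in range(epochs):
        rng.shuffle(order)
        total = 0.0
        for x, task, y in order:
            pred, h = predict(model, x, task)
            err = pred - y
            total += err * err
            step = lr * task_weights[task] * err
            head = heads[task]
            for j in range(len(W)):
                v_j = head[j]  # head value before its update, used for the encoder gradient
                head[j] -= step * h[j]
                for i in range(len(x)):
                    W[j][i] -= step * v_j * x[i]
        history.append(total / len(order))
    return history

def mse(model, samples, task):
    """Mean squared error over the samples that belong to one task."""
    errs = [(predict(model, x, t)[0] - y) ** 2 for x, t, y in samples if t == task]
    return sum(errs) / len(errs)

# Synthetic stand-in data: all tasks depend on the same hidden latent features,
# which is the assumption that makes auxiliary tasks useful to the target task.
rng = random.Random(0)
n_features, n_hidden, n_tasks = 4, 2, 4  # task 0 = target, tasks 1-3 = auxiliary
A = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_hidden)]
B = [[rng.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_tasks)]

def label(x, task):
    z = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    return sum(b * zj for b, zj in zip(B[task], z))

samples = []
for task, n in enumerate([8, 50, 50, 50]):  # few target labels, many auxiliary labels
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(n_features)]
        samples.append((x, task, label(x, task)))

model = make_model(n_features, n_hidden, n_tasks, rng)
history = train(model, samples, task_weights=[1.0, 0.5, 0.5, 0.5], rng=rng)
target_mse = mse(model, samples, 0)
print(f"first-epoch loss {history[0]:.4f}, last-epoch loss {history[-1]:.4f}")
print(f"target-task training MSE {target_mse:.4f}")
```

In this toy setup the auxiliary tasks shape the shared encoder, so the target head only has to fit a few parameters from its handful of labels. Real polymer models would replace the linear encoder with a learned molecular representation, and the task weights would need tuning.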

Speaker Biography

Gabriel A. Pinheiro is a computer scientist at the Federal University of São Paulo. He is currently completing a PhD internship at Oregon State University under the mentorship of Prof. Cory M. Simon and Prof. Xiaoli Fern. He earned his MSc in Applied Computing from the National Institute for Space Research in 2020. His research focuses on machine learning strategies for chemistry problems. During his internship, he has been applying machine learning to the challenges of small-data regimes and data representation in polymer materials science.