Collaboration develops AI to discover new molecules and materials

Xiaoli Fern and Cory Simon

When chemical engineer Cory Simon and computer scientist Xiaoli Fern first met at a social event at Oregon State University, little did they know that a friendly game of cornhole would lead to a long, fruitful collaboration. Their interdisciplinary partnership, which merges artificial intelligence and chemistry, aims to speed up and reduce the cost of discovering materials and molecules.

The researchers have combined their expertise to create machine learning models capable of predicting the properties of new materials and molecules, with implications for various industry applications, including separating, storing, and sensing gases.

“Machine learning can play a crucial role in accelerating material and molecular discovery by predicting their properties and making the search more efficient,” Fern said.

Harnessing graph neural networks for molecular property prediction

Their work centers around graph neural networks, which are particularly well-suited for prediction tasks on molecules.

“GNNs are a class of deep learning models that can operate directly on graph-structured data,” Fern said. “Since molecules can be naturally represented as graphs, with atoms as nodes and bonds as edges, GNNs are a natural fit for molecular property prediction tasks. In particular, we employ message-passing neural networks to encode local and global features of molecular graphs effectively.”
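
To make the graph picture concrete, here is a minimal sketch of a molecule stored as a graph and a single round of message passing over it. It is an illustration only, not the researchers' model: the molecule (ethanol, heavy atoms only), the two-element feature vocabulary, and every function name are assumptions, and the update uses a plain sum where a real message-passing network would apply learned weights and a nonlinearity.

```python
# Toy molecular graph for ethanol (heavy atoms only): C-C-O.
# Atoms are nodes, bonds are edges. All names here are illustrative.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

ELEMENTS = ["C", "O"]  # assumed vocabulary for one-hot atom features


def one_hot(symbol):
    """Encode an element symbol as a one-hot vector over ELEMENTS."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def neighbors(num_atoms, bonds):
    """Build an adjacency list; bonds are undirected."""
    adj = {i: [] for i in range(num_atoms)}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def message_passing_step(features, adj):
    """Each atom sums its neighbors' features (the message) and adds its own.

    Simplification: no learned weights and no nonlinearity.
    """
    updated = []
    for i, h in enumerate(features):
        message = [0.0] * len(h)
        for j in adj[i]:
            message = [m + x for m, x in zip(message, features[j])]
        updated.append([a + b for a, b in zip(h, message)])
    return updated


def readout(features):
    """Sum node features into one graph-level vector."""
    return [sum(column) for column in zip(*features)]


features = [one_hot(a) for a in atoms]
adj = neighbors(len(atoms), bonds)
h1 = message_passing_step(features, adj)
graph_vector = readout(h1)
print(h1, graph_vector)
```

After one step, each atom's vector already reflects its bonded neighbors (local information), and the readout condenses all atoms into a single vector (global information) that a downstream predictor could map to a property.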

The crystal structure of a metal-organic framework, IRMOF-1. This material exhibits nano-sized pores that adsorb gas molecules. Loosely, the internal surface of IRMOF-1 provides “parking spaces” for gas molecules. Because different gas species are attracted to the pore walls to different degrees, these materials can also be exploited for separating gases. Hundreds of thousands of different materials can be made in the lab, and AI can help sort through the possibilities and predict which structure will be optimal for a given gas storage, separation, or sensing task.

Fern and Simon’s model ingests molecular structures and uses GNNs to predict properties such as solubility, gas adsorption capacity, or even the perceived smell of a molecule. To achieve this, the researchers employ a combination of supervised and unsupervised learning techniques, including pre-training.

“The ability to computationally predict the properties of molecules and materials accurately is essential for the efficient discovery of new molecules and materials, as it helps researchers focus on the most promising candidates in the lab,” Simon said. “Moreover, if we quantify the uncertainty in the model’s predictions, we can guide decision-making in the lab for optimization and exploration of molecules and materials.”

Addressing scarcity of labeled data

One key challenge in applying machine learning to molecular discovery is the scarcity of labeled data. To address this, Fern and Simon have employed transfer learning, a technique that leverages knowledge learned from one task to improve performance on another, related task.

“We utilize transfer learning to make the most of the limited labeled data we have,” Fern said. “By pre-training our GNNs on large, unlabeled data sets, we can extract meaningful representations that can be fine-tuned on smaller, labeled data sets to achieve better performance.”

Machine learning is often employed in domains where the underlying rules are not explicitly known, because it can build predictive models directly from data. Learning algorithms differ in how interpretable they are. Decision trees, for example, are considered interpretable: they reach a decision by asking a sequence of questions about specific features, so the basis for each prediction is clear. Modern methods, particularly deep learning, are generally less interpretable, which makes it difficult to understand which structures contribute to a prediction.

There is much ongoing research into explaining the predictions of AI systems. Explaining graph neural networks that operate on molecules is a significant challenge, because explanation methods often measure the importance of specific nodes or edges by removing them from the graph and observing the effect on the prediction. For chemical structures, removing these components results in invalid structures, which complicates interpretation. Overall, explaining the basis for machine learning predictions in complex domains remains a difficult problem, and it is one the group is addressing.
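
The removal-based idea can be sketched in a few lines. This is a hedged illustration rather than the group's method: `predict` is a hypothetical stand-in for a trained model, with made-up element weights and a made-up bonus for carbon-oxygen bonds, and the molecule is again ethanol without hydrogens.

```python
def predict(atoms, bonds):
    """Hypothetical stand-in for a trained model.

    Scores a graph from per-element weights plus a bonus for each C-O bond.
    """
    weights = {"C": 0.5, "O": 2.0}
    score = sum(weights[a] for a in atoms)
    score += sum(1.0 for i, j in bonds if {atoms[i], atoms[j]} == {"C", "O"})
    return score


def remove_atom(atoms, bonds, k):
    """Delete atom k and every bond touching it, then reindex the rest."""
    kept = [i for i in range(len(atoms)) if i != k]
    new_index = {old: new for new, old in enumerate(kept)}
    kept_atoms = [atoms[i] for i in kept]
    kept_bonds = [(new_index[i], new_index[j]) for i, j in bonds if k not in (i, j)]
    return kept_atoms, kept_bonds


def occlusion_importance(atoms, bonds):
    """Importance of each atom = change in prediction when it is removed."""
    base = predict(atoms, bonds)
    return [
        abs(base - predict(*remove_atom(atoms, bonds, k)))
        for k in range(len(atoms))
    ]


atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
importance = occlusion_importance(atoms, bonds)
print(importance)
```

Removing the middle carbon leaves a carbon and an oxygen with no bond between them, which is no longer a molecule at all. The model is therefore scored on an input unlike anything it would meet in practice, and that is the interpretation problem described above.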

The future of interdisciplinary collaboration

Looking ahead, Fern and Simon are optimistic about the potential of their interdisciplinary collaboration to advance the field.

“We believe that the marriage of machine learning and chemistry will continue to revolutionize how we discover and design new materials and molecules,” Fern said.

Indeed, Fern and Simon’s collaboration is a compelling example of the potential for interdisciplinary research to drive innovation. As their work continues to evolve, they are exploring new ways to enhance their models and expand their applications, for example by incorporating multiple sources of data to strengthen a model’s predictions.

Active learning strategies

Another area of exploration is the development of active learning strategies, which involve iteratively refining the model by selecting the most informative examples for training.

“Active learning allows our model to iteratively select the most valuable data points to learn from, thereby improving its performance and reducing the need for extensive labeled data,” Fern said. “This can be particularly useful in molecular discovery, where acquiring labeled data can be time-consuming and costly.”
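
A minimal sketch of such a loop follows, under stated assumptions. Each candidate is reduced to a single made-up numeric descriptor, `run_experiment` is a hypothetical stand-in for a costly lab measurement, and the uncertainty measure is deliberately crude: the distance from a candidate to the nearest example that already has a label. A real system would more likely use the predictive uncertainty of the model itself.

```python
def run_experiment(x):
    """Hypothetical stand-in for a costly lab measurement."""
    return (x - 0.6) ** 2


def uncertainty(x, labeled):
    """Crude uncertainty proxy: distance to the closest labeled descriptor."""
    return min(abs(x - x_labeled) for x_labeled in labeled)


def active_learning(pool, initial, budget):
    """Label `budget` candidates, always choosing the most uncertain one."""
    labeled = {x: run_experiment(x) for x in initial}
    unlabeled = [x for x in pool if x not in labeled]
    queried = []
    for _ in range(budget):
        # Pick the candidate farthest from anything measured so far.
        x_next = max(unlabeled, key=lambda x: uncertainty(x, labeled))
        labeled[x_next] = run_experiment(x_next)  # acquire the new label
        unlabeled.remove(x_next)
        queried.append(x_next)
    return labeled, queried


pool = [0.0, 0.1, 0.2, 0.5, 0.9, 1.0]
labeled, queried = active_learning(pool, initial=[0.0], budget=3)
print(queried)
```

Starting from one measured candidate, the loop spends its three experiments on points spread across the descriptor range rather than on near-duplicates of what is already known, which is how active learning limits the number of labels needed.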

Expanding applications and collaborations

The researchers are also examining the potential for their model to be applied to other domains, such as polymers.

“Our fundamental work has the potential to impact a wide range of chemical industries, from the development of new nano-porous materials for gas separations to the optimization of polymers for medical imaging,” Simon said. “The application of machine learning to the chemical sciences can rapidly accelerate the design and discovery of molecules and materials – especially when combined with automation in the lab.”

Simon and Fern also recognize the potential benefits of collaborating with industry to further their research and promote workforce development. They invite external partners to join their efforts. Their research teams boast top-tier doctoral candidates from Oregon State, offering valuable access to emerging talent. By fostering these collaborative relationships, they aim to drive innovation in material and molecular discovery while supporting the professional growth of the next generation of experts.

Fern and Simon’s partnership demonstrates the transformative potential of combining machine learning and chemical engineering. Their work underscores the importance of interdisciplinary research and sheds light on the future of material and molecular discovery.

If you’re interested in connecting with the AI and Robotics Program for hiring and collaborative projects, please contact


Nov. 27, 2023