Using machine learning to accurately count species

Computer science and ecology may seem like an unlikely combination at first, but it’s exactly the niche Oregon State University assistant professor Rebecca Hutchinson envisioned. Her research uses machine learning and statistical modeling to help scientists answer questions like: What will happen to monarch butterflies under climate change? What are the habitat requirements of olive-sided flycatchers? How can we build a reserve that birds will want to live in?

Hutchinson did not start her research in ecology, however. Her Ph.D. work at Carnegie Mellon University was applied to brain imaging. Realizing that her passion lay with the environment, she moved to Corvallis for postdoctoral research that let her apply computer science to fields related to sustainability. The move paid off when she received a Science, Engineering, and Education for Sustainability (SEES) fellowship from the National Science Foundation (NSF) and began her interdisciplinary research.

Earlier this year, NSF renewed its support with a prestigious CAREER award to Hutchinson, now an assistant professor with appointments in both the College of Engineering (computer science) and the College of Agricultural Sciences (fisheries, wildlife, and conservation sciences). She plans to use the $564,000 award to tackle challenges facing the machine learning methods typically used to build species distribution models, or SDMs.

“You build an SDM by correlating observations of species — are they there or not? — with environmental features,” she said. “Then you can use the model to understand why species live where they do and how likely a species is to occur at a new site. But the spatial aspects of both species and environmental data can be problematic for the machine learning currently used in the models.”
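To make the idea of "correlating observations with environmental features" concrete, here is a minimal sketch of one common baseline SDM: logistic regression relating presence/absence at surveyed sites to an environmental variable. This is an illustration only, not the specific method used in Hutchinson's work. The single feature (fraction of forest cover), the simulated data, and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear(w, x):
    """w holds one weight per feature followed by an intercept."""
    return sum(w[j] * x[j] for j in range(len(x))) + w[-1]

def fit_sdm(features, presences, lr=0.5, epochs=2000):
    """Fit a logistic-regression SDM with batch gradient descent on the log loss.
    features: one list of environmental values per surveyed site.
    presences: 1 if the species was observed at that site, else 0."""
    n, m = len(features), len(features[0])
    w = [0.0] * (m + 1)
    for _ in range(epochs):
        grad = [0.0] * (m + 1)
        for x, y in zip(features, presences):
            err = sigmoid(linear(w, x)) - y  # gradient of the log loss w.r.t. the logit
            for j in range(m):
                grad[j] += err * x[j]
            grad[-1] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def predict_occurrence(w, x):
    """Estimated probability that the species occurs at a site with features x."""
    return sigmoid(linear(w, x))

# Hypothetical survey: one feature (fraction of forest cover, 0 to 1) per site;
# occurrence is simulated to become more likely as forest cover increases.
rng = random.Random(0)
sites = [[rng.random()] for _ in range(300)]
observed = [1 if rng.random() < sigmoid(6 * (s[0] - 0.5)) else 0 for s in sites]

w = fit_sdm(sites, observed)
print("forest-cover weight:", round(w[0], 2))
print("P(occur | 90% forest):", round(predict_occurrence(w, [0.9]), 2))
print("P(occur | 10% forest):", round(predict_occurrence(w, [0.1]), 2))
```

The fitted weight shows why the species lives where it does (more forest, higher occurrence), and `predict_occurrence` answers "how likely is the species at a new site?" for any feature value.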

Hutchinson will investigate methods that lower the risk of bias creeping into estimates of model quality and that account for the inevitable underreporting of species in biodiversity surveys conducted primarily by citizen science groups.

“To assess model quality, typically some data are held out from model building,” she said. “Then the model’s ability to predict the unseen data is used to measure its quality. With spatial data, however, randomly selecting data to hold out can lead to optimistic bias in quality estimates.”
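Why does random holdout flatter a model? Nearby sites tend to share environments and species, so randomly held-out sites usually sit right next to training sites and are easy to predict. The sketch below compares ordinary random holdout with spatial block holdout, one widely used remedy in which whole map cells are withheld. It is an illustration, not necessarily the approach Hutchinson's project will take; the 10 x 10 map, the 2.5-unit block size, and all function names are assumptions. Distance from each test site to its nearest training site serves as a rough proxy for how "easy" the test set is.

```python
import math
import random
from collections import defaultdict

def block_id(x, y, block_size):
    """Grid cell (spatial block) containing the point (x, y)."""
    return (int(x // block_size), int(y // block_size))

def random_split(sites, test_frac, rng):
    """Ordinary holdout: test sites are drawn at random from anywhere on the map."""
    shuffled = list(sites)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

def spatial_block_split(sites, block_size, test_frac, rng):
    """Spatial holdout: whole blocks go to the test set, so no test site has
    training sites from its own block."""
    blocks = defaultdict(list)
    for x, y in sites:
        blocks[block_id(x, y, block_size)].append((x, y))
    keys = sorted(blocks)
    rng.shuffle(keys)
    test_keys = set(keys[:max(1, int(len(keys) * test_frac))])
    train = [s for k in keys if k not in test_keys for s in blocks[k]]
    test = [s for k in keys if k in test_keys for s in blocks[k]]
    return train, test

def mean_nearest_train_distance(train, test):
    """Average distance from each test site to its closest training site."""
    return sum(min(math.hypot(tx - x, ty - y) for x, y in train)
               for tx, ty in test) / len(test)

# Hypothetical survey sites scattered uniformly over a 10 x 10 map.
rng = random.Random(1)
sites = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(400)]

train_r, test_r = random_split(sites, 0.25, random.Random(2))
train_b, test_b = spatial_block_split(sites, 2.5, 0.25, random.Random(2))
print("random holdout, mean distance to training:",
      round(mean_nearest_train_distance(train_r, test_r), 2))
print("block holdout, mean distance to training:",
      round(mean_nearest_train_distance(train_b, test_b), 2))
```

Block holdout pushes test sites farther from anything the model was trained on, which is closer to the situation of predicting at a genuinely new location; the trade-off is that the choice of block size itself affects the quality estimate.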

“The error introduced by underreporting can be corrected by conducting multiple observations at the same site and estimating the probability of detecting the species, but community science programs usually aren’t set up that way,” she said. “Our award will support research to create groups of multiple observations after the fact to better account for underreporting.”
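Repeated visits help because a site where the species is sometimes seen and sometimes missed reveals how often observers miss it. The sketch below uses the standard single-season site-occupancy likelihood, with psi as the probability a site is occupied and p as the per-visit detection probability, fitted by a simple grid search. The grouping step, which pools single-visit checklists from the same grid cell into one pseudo-site, is a hypothetical stand-in for "creating groups after the fact" and is not the project's actual method. It assumes occupancy does not change across the pooled checklists. The simulated data and parameter values are illustrative.

```python
import math
import random
from collections import Counter, defaultdict

def group_checklists(checklists, cell_size):
    """Hypothetical grouping rule: pool single-visit checklists whose coordinates
    fall in the same grid cell into one pseudo-site. Each pseudo-site becomes a
    detection history (list of 0/1), one entry per checklist."""
    groups = defaultdict(list)
    for x, y, detected in checklists:
        groups[(int(x // cell_size), int(y // cell_size))].append(detected)
    return list(groups.values())

def site_log_likelihood(k, d, psi, p):
    """Log-likelihood of one site with d detections in k visits."""
    if d > 0:
        # Detected at least once, so the site must be occupied.
        return math.log(psi) + d * math.log(p) + (k - d) * math.log(1 - p)
    # Never detected: either occupied but missed on every visit, or unoccupied.
    return math.log(psi * (1 - p) ** k + (1 - psi))

def fit_occupancy(histories, steps=99):
    """Maximum-likelihood (psi, p) by brute-force grid search over (0, 1)^2."""
    # Sites with the same (visits, detections) contribute identical terms.
    summary = Counter((len(h), sum(h)) for h in histories)
    best = None
    for i in range(1, steps + 1):
        psi = i / (steps + 1)
        for j in range(1, steps + 1):
            p = j / (steps + 1)
            ll = sum(n * site_log_likelihood(k, d, psi, p)
                     for (k, d), n in summary.items())
            if best is None or ll > best[0]:
                best = (ll, psi, p)
    return best[1], best[2]

# Simulated checklists: 500 grid cells, 4 checklists per cell, with assumed
# true occupancy psi = 0.6 and detection probability p = 0.4.
rng = random.Random(42)
TRUE_PSI, TRUE_P, VISITS = 0.6, 0.4, 4
checklists = []
for c in range(500):
    cx, cy = c % 25, c // 25
    occupied = rng.random() < TRUE_PSI
    for _ in range(VISITS):
        x = cx + 0.1 + 0.8 * rng.random()
        y = cy + 0.1 + 0.8 * rng.random()
        detected = 1 if occupied and rng.random() < TRUE_P else 0
        checklists.append((x, y, detected))

histories = group_checklists(checklists, cell_size=1.0)
naive = sum(1 for h in histories if sum(h) > 0) / len(histories)
psi_hat, p_hat = fit_occupancy(histories)
print(f"naive occupancy: {naive:.2f}  corrected psi: {psi_hat:.2f}  p: {p_hat:.2f}")
```

The naive estimate (share of sites with any detection) undercounts occupied sites because some occupied sites are missed on every visit; the occupancy model corrects for that by estimating p from the repeated visits.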

Story by Steve Frandzel and Chris Palmer

Dec. 30, 2021
