Jim Crutchfield wants to teach a machine to “see” in a new way, discovering patterns that evolve over time instead of recognizing patterns based on a stored template.
It sounds like an easy task. After all, any animal with basic vision can see a moving object, decide whether it is food or a threat, and react accordingly. But what comes easily to a scallop is a challenge for the world’s biggest supercomputers.
Crutchfield, along with physics graduate student Adam Rupe and postdoc Ryan James, is designing these new machine learning systems so that supercomputers can spot large-scale atmospheric structures, such as hurricanes and atmospheric rivers, in climate data. The UC Davis Complexity Sciences Center, which Crutchfield leads, was recently named an Intel Parallel Computing Center and is collaborating with Intel Research, the Department of Energy’s National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory, Stanford University, and the University of Montreal. The overall Big Data Center project is led by Prabhat, head of the Data and Analytics Services Group at Berkeley Lab.
The team works on NERSC’s Cori supercomputer, which ranks among the world’s five fastest machines and has more than 600,000 CPU cores.
Modern science is full of “big data.” In climate science, that includes satellite and ground-based measurements spanning the planet, as well as “big” simulations.
“We need a new kind of machine learning to interpret very large data and planet-wide simulations,” Crutchfield said. Climate and weather systems evolve over time, so machines need to be able to find patterns not only in space but also in time.
“Dynamics are key to this,” Crutchfield said. Humans (and other visual animals) recognize dynamic changes very quickly, but it’s much harder for machines.
Pattern Discovery Is More Than Pattern Recognition
Today, computers recognize patterns by matching them against a stored template. Voice recognition systems work this way, comparing your voice with a catalog of known sounds. These pattern recognition systems can be very useful, but they cannot identify anything truly new, that is, anything not already represented in their templates.
Crutchfield and his team are taking a different approach, based on pattern discovery. They are working on algorithms that allow computers to identify structures in data without knowing what they are in advance.
“Learning novel patterns is what humans are uniquely good at, but machines can’t do it,” he said.
Using pattern discovery, a supercomputer would learn on its own to identify hurricanes and other features in climate and weather data. It might also identify new kinds of structures that are too complex for humans to perceive at all.
Although this application focuses on global climate modeling, Crutchfield hopes to make pattern discovery a new paradigm for analyzing very large datasets.
“Usually, you apply known models to interpret the data. To say that you will extract your model directly from the data is a radical claim,” he said.
The collaboration is part of the Intel Parallel Computing Centers program, which provides funding to universities, institutions, and research labs to modernize key community codes used across a wide range of disciplines to run on industry-standard parallel architectures.
Video: Global simulation of atmospheric water vapor produced by the Cori supercomputer at NERSC