Taking Cues From Speech Recognition, New Machine Learning Algorithm Finds Patterns in RNA Structures

By Greg Watry

Software inspired by speech recognition technology could help scientists understand the secret language inside cells. A machine learning algorithm called patteRNA, designed by UC Davis researchers, rapidly mines ribonucleic acid, commonly called RNA, for specific structures, providing a new method to establish links between structure, function and disease.

The study, co-authored by integrative genetics and genomics Ph.D. student Mirko Ledda and Assistant Professor Sharon Aviran, UC Davis Genome Center, appears in Genome Biology.

Deciphering the biological role of RNA structures

RNA is essential to all biological processes, from gene expression and regulation to protein synthesis. While DNA stores an organism’s genetic information, RNA puts that genetic information to use.

“Think of DNA like the hard drive of a computer and RNA as files. RNA is basically what’s extracted from the hard drive and used by a cell,” said Ledda.

Unlike the stable genome, an RNA transcriptome—the entire range of RNA molecules expressed in a cell—constantly changes. The molecular structure of RNA is influenced by a variety of factors, from the external environment to an organism’s developmental stage.

A visual overview of how patteRNA works. Courtesy photo

RNA molecules form various substructures—called “structural motifs”—that help carry out specific biological functions. But establishing a link between structure and function has proven very difficult. Often, the first step in studying function is to determine the RNA molecule’s structure.

“This is the greatest challenge currently in the field,” said Ledda. “When we try to predict RNA structures using computers, most of the time we’re not even 30 percent accurate. This makes studying the biological role of RNA structures rather difficult.”

PatteRNA doesn’t try to predict structures. Rather, it uses a method called structure profiling, which provides a snapshot of all RNA structures in a cell. RNA structural motifs will produce specific signatures in the data. A biologist interested in a specific motif can ask the algorithm to search this snapshot for that particular signature. For example, bacteria and yeast possess RNA structures called thermosensors that sense heat or cold shocks.
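The idea of searching a structure profile for a motif's signature can be illustrated with a toy example. The sketch below is not patteRNA's actual model, which this article describes only as inspired by speech recognition. It assumes, for illustration, that paired nucleotides give low probing values and unpaired nucleotides give high ones, that each follows a Gaussian with made-up parameters, and that a motif is written in dot-bracket notation. The function names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def motif_scores(profile, motif, paired_mean=0.1, unpaired_mean=0.9, sd=0.25):
    """Score every window of `profile` against `motif`.

    profile: list of per-nucleotide probing values (assumed: low = paired).
    motif:   dot-bracket string, e.g. "(((....)))" for a small hairpin.
    Returns one log-likelihood ratio per window start: motif vs. a null
    model in which each nucleotide is paired or unpaired with equal odds.
    The means and sd are illustrative assumptions, not fitted values.
    """
    if len(motif) > len(profile):
        raise ValueError("motif is longer than the profile")
    means = {"(": paired_mean, ")": paired_mean, ".": unpaired_mean}
    scores = []
    for start in range(len(profile) - len(motif) + 1):
        score = 0.0
        for offset, symbol in enumerate(motif):
            x = profile[start + offset]
            log_motif = gaussian_logpdf(x, means[symbol], sd)
            log_p = gaussian_logpdf(x, paired_mean, sd)
            log_u = gaussian_logpdf(x, unpaired_mean, sd)
            # Null model: 50/50 mixture of the two states (log-sum-exp).
            m = max(log_p, log_u)
            log_null = m + math.log(0.5 * math.exp(log_p - m) + 0.5 * math.exp(log_u - m))
            score += log_motif - log_null
        scores.append(score)
    return scores


def best_match(profile, motif, **kwargs):
    """Return (start_index, score) of the highest-scoring window."""
    scores = motif_scores(profile, motif, **kwargs)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# A made-up profile: uninformative values flanking a hairpin-like signal.
hairpin = "(((....)))"
profile = [0.5] * 5 + [0.1] * 3 + [0.9] * 4 + [0.1] * 3 + [0.5] * 5
start, score = best_match(profile, hairpin)
```

Here `best_match` picks out the window where the low-high-low pattern of the hairpin lines up with the data. A real transcriptome-wide search would need a model fitted to noisy measurements rather than fixed parameters.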

“Researchers think thermosensors exist in more complex organisms, such as plants, but nobody has actually proven it,” said Ledda. With patteRNA, a biologist could take the structural motif of a thermosensor and search for its presence in the transcriptome of various cells and organisms.

“PatteRNA will hopefully allow us to find thermosensors in various organisms and, more generally, functional RNA structures in the context of living cells,” said Ledda.

Finding a signal in the noise

PatteRNA’s design was informed by speech recognition algorithms. Ledda likened RNA transcriptomes to vocal signals.

“When you speak, you produce a vocal signal,” said Ledda. “A speech recognition algorithm parses through the noise and identifies spoken sounds, making sense of the sentence. In the same way, I’m trying to infer the presence of RNA substructures based on the structure profiling signal I see.”

As previously noted, RNA molecules are versatile and malleable in their structure. Form precedes function, and it’s possible that snapshots of the same RNA transcriptome will differ depending on the time and situation. It’s a genetic map that’s constantly shifting, like the Marauder’s Map of the Harry Potter series.

“PatteRNA is a little bit ahead of its time,” Ledda said. “In terms of applications, its usefulness is going to increase with time as we acquire more and more knowledge of what various RNA structures actually do in the cell.”

A new tool for understanding disease

It’s well known that genetic mutations can lead to diseases. One possible way these mutations influence health is by changing the structure of a cell’s RNA. Several diseases are linked to such mutations, including retinoblastoma (an eye cancer) and hypertension. To better understand these diseases, scientists could use patteRNA to search through a cell’s RNA transcriptome for similar structural changes and identify new RNAs that play a role in diseases. Though still in its infancy, patteRNA has the potential to help find new drug targets for genetic diseases of all kinds.

“Until this paper, there were no methods that allowed us to study specific RNA structures at the scale of an entire cell, so patteRNA is filling this much needed gap,” said Ledda.

More information

PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures (Genome Biology)

PatteRNA is open-source and freely available for download on GitHub.

Greg Watry is a writer for the UC Davis College of Biological Sciences. This story originally appeared on the College of Biological Sciences web site.
