The Protein Folding Problem

Javier Viaña
Ph.D. Explainable AI
Published January 27, 2020

The challenge

Proteins are the essential constituents of living cells. Among other functions, they are responsible for building tissues, enzymes, hormones and antibodies. Yet for such a basic building block, their shape is highly complex.

Each protein can be represented as a chain of amino acids, that is, as a sequence of data. This sequence determines the shape the chain adopts: the amino acids fold, stretch and twist the protein into its three-dimensional structure. The longer the sequence, the more complex the final shape tends to be.
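To make "a chain of amino acids as a sequence of data" concrete, here is a minimal sketch of one common numerical encoding, one-hot vectors over the 20 standard amino acids. The peptide string and the function name are hypothetical, chosen only for illustration; this is not necessarily the encoding AlphaFold uses.

```python
# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Turn a protein sequence into a list of 20-dimensional one-hot vectors."""
    encoded = []
    for residue in sequence.upper():
        if residue not in INDEX:
            raise ValueError(f"Unknown amino acid: {residue!r}")
        vector = [0] * len(AMINO_ACIDS)
        vector[INDEX[residue]] = 1  # mark which amino acid sits at this position
        encoded.append(vector)
    return encoded


# Hypothetical short peptide, used only as an example input.
peptide = "MKTAYIAK"
matrix = one_hot_encode(peptide)
print(len(matrix), "residues x", len(matrix[0]), "features")
```

Each row describes one position in the chain, so a longer protein simply produces a longer matrix.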

The real challenge is to accurately predict the shape of a protein based on the amino acids that make it up.

The history

The problem dates back to 1972, when Christian Anfinsen, winner of the Nobel Prize in Chemistry, postulated that this was possible.

For the past 50 years, protein structures have mostly been determined experimentally, because the computational cost of predicting them was often excessive. In fact, it has been estimated that a typical protein has around 10^300 possible configurations. Checking them one by one would take longer than the age of our universe.
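A back-of-the-envelope calculation shows why exhaustive search is hopeless. The sketch below takes the 10^300 figure from the text and assumes a sampling rate of 10^13 configurations per second, which is an illustrative assumption and not a measured value; the age of the universe is taken as roughly 13.8 billion years.

```python
CONFIGURATIONS = 10 ** 300        # figure quoted in the text
SAMPLES_PER_SECOND = 10 ** 13     # assumed sampling rate (hypothetical)
AGE_OF_UNIVERSE_YEARS = 13.8e9    # approximate current estimate
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Time needed to visit every configuration exactly once.
seconds_needed = CONFIGURATIONS // SAMPLES_PER_SECOND

# How many "ages of the universe" that search would last.
age_of_universe_seconds = AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR
ratio = seconds_needed / age_of_universe_seconds

print(f"Exhaustive search would take about {ratio:.1e} times the age of the universe")
```

Even if the assumed sampling rate were wrong by many orders of magnitude, the conclusion would not change.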

But 3 months ago, this great challenge changed radically. AlphaFold, the algorithm created by DeepMind, obtained results far more accurate than those of previous methods. DeepMind is the English startup that had already made history when AlphaGo defeated the famous professional Go player Lee Sedol.

The method

AlphaFold was trained on the Protein Data Bank database. The algorithm uses both the amino acid sequence and the final structure of each protein to optimize its parameters.

First, it calculates a covariation matrix of the input amino acid chain with respect to homologous amino acid sequences obtained from the database. The resulting matrix is treated as an image and fed into a convolutional neural network for processing.
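As a rough illustration of what a covariation matrix captures, the sketch below scores every pair of columns in a tiny alignment of homologous sequences by their mutual information. Mutual information is a simple stand-in chosen for this example, the alignment is invented, and the function names are hypothetical; the features AlphaFold actually computes are richer.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def covariation_matrix(alignment):
    """L x L matrix of pairwise column scores for equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("All aligned sequences must have the same length")
    columns = list(zip(*alignment))  # one tuple of residues per position
    return [
        [mutual_information(columns[i], columns[j]) for j in range(length)]
        for i in range(length)
    ]


# Invented toy alignment: positions 1 and 3 always mutate together.
alignment = ["AKLE", "AKLE", "ARLD", "GRMD"]
cov = covariation_matrix(alignment)
for row in cov:
    print(" ".join(f"{value:.2f}" for value in row))
```

Positions that change together across homologous sequences score high, which hints that they may be in contact in the folded structure. The square matrix can then be handled like a one-channel image.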

The output of this architecture is a feature map of the protein chain. From this result, a first approximation of the three-dimensional structure can be obtained.

At this stage, the shape of the protein is characterized by two distributions: one of distances and one of rotations. Roughly speaking, the first indicates where in space the fragments of the protein lie relative to one another, and the second describes the angle, or how twisted the chain is, at each point.
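The two geometric quantities behind these distributions can be sketched in a few lines. Given four hypothetical backbone points, the code below computes the distance between each pair, assigns it to a distance bin, and computes the torsion (dihedral) angle that measures how twisted the chain is. The coordinates and bin edges are made up for illustration; a trained network would output probabilities over such bins rather than a single value.

```python
import math
from bisect import bisect_right


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance(p, q):
    return math.sqrt(dot(sub(p, q), sub(p, q)))


def distance_bin(d, edges):
    """Index of the bin containing d; bin 0 lies below the first edge."""
    return bisect_right(edges, d)


def torsion_degrees(p0, p1, p2, p3):
    """Dihedral angle around the p1-p2 axis, in degrees."""
    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    axis = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the axis.
    v = sub(b0, tuple(dot(b0, axis) * x for x in axis))
    w = sub(b2, tuple(dot(b2, axis) * x for x in axis))
    return math.degrees(math.atan2(dot(cross(axis, v), w), dot(v, w)))


# Hypothetical coordinates in angstroms, 3.8 apart along the chain.
points = [(3.8, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 3.8), (0.0, 3.8, 3.8)]
edges = [4.0, 8.0, 12.0, 16.0]  # assumed bin edges in angstroms

bins = [[distance_bin(distance(p, q), edges) for q in points] for p in points]
torsion = torsion_degrees(*points)
print(bins)
print(f"torsion angle: {torsion:.1f} degrees")
```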

Using the information from these two distributions, the algorithm builds a potential that is specific to each protein. This potential is a differentiable mathematical function, so it can be minimized by gradient descent. The final, refined geometry is compared with the experimental structure obtained by X-ray crystallography or high-resolution cryo-electron microscopy.
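The idea of descending the gradient of a potential can be shown with a toy example. The sketch below assumes a much simpler potential than AlphaFold's: the sum of squared differences between current and target distances for three points. The target distances, starting coordinates, step size and function names are all hypothetical.

```python
import math


def potential(coords, targets):
    """Sum of squared errors between current and target pairwise distances."""
    return sum((math.dist(coords[i], coords[j]) - d) ** 2
               for (i, j), d in targets.items())


def gradient(coords, targets):
    """Analytic gradient of the potential with respect to every coordinate."""
    grad = [[0.0] * len(c) for c in coords]
    for (i, j), d_target in targets.items():
        d = math.dist(coords[i], coords[j])
        if d == 0.0:
            continue  # direction undefined when the two points coincide
        scale = 2.0 * (d - d_target) / d
        for k in range(len(coords[i])):
            diff = coords[i][k] - coords[j][k]
            grad[i][k] += scale * diff
            grad[j][k] -= scale * diff
    return grad


def minimize(coords, targets, step=0.05, iterations=500):
    """Plain gradient descent: repeatedly move against the gradient."""
    coords = [list(c) for c in coords]
    for _ in range(iterations):
        grad = gradient(coords, targets)
        for i in range(len(coords)):
            for k in range(len(coords[i])):
                coords[i][k] -= step * grad[i][k]
    return coords


# Hypothetical target distances (angstroms) between three points.
targets = {(0, 1): 3.8, (1, 2): 3.8, (0, 2): 5.0}
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]

initial_potential = potential(start, targets)
refined = minimize(start, targets)
final_potential = potential(refined, targets)
print(f"potential before: {initial_potential:.3f}, after: {final_potential:.6f}")
```

Because the potential is differentiable, every step has a well-defined direction of improvement, which is what makes this kind of refinement practical.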

The consequences

These results mark a new milestone in the history of Artificial Intelligence. The applications are many, from helping to discover new medicines and understand certain diseases to providing the keys to degrading the plastic in the oceans.

We are on the threshold of an era of research into the evolution of proteins, a new stage in molecular biology full of discoveries.