Our bodies hold a mystery at the molecular level: how do the chains of amino acids that make up all our proteins fold into 3D shapes? When your body uses a protein, it does not read it from one end of the chain to the other, as it does with DNA or as we do when we read a sentence. Instead, the protein chain folds in a particular manner into a specific 3D shape, and that shape is what allows it to carry out its biological function. Take hemoglobin, for instance. Its shape cradles iron-containing heme groups that bind oxygen, which lets it carry oxygen throughout the body in our blood. Other proteins sit in cell membranes and act as receptors for neurotransmitters. Receptors are often pictured as locks (proteins) into which keys (neurotransmitters) fit. When the neurotransmitter binds, it changes the shape of the receptor and, thus, the functioning of the cell. What matters for function is the protein's 3D shape; the amino acid sequence matters because it determines that shape.
Some diseases, such as Alzheimer’s, are thought to be caused by the buildup of misfolded proteins. In the current pandemic, researchers’ ability to describe the shape of the proteins of SARS-CoV-2, the virus that causes COVID-19, helped make our current vaccines possible.
AI helps out
Given the importance of proteins’ shapes for normal function and disease, predicting their 3D shape from their amino acid sequence is crucially important. Computational biologists have been working on this problem since the 1960s, with limited success. Determining the actual shape of a protein experimentally is difficult: it typically requires either growing crystals of the protein for X-ray crystallography or, more recently, imaging it with cryo-electron microscopy. Multiple Nobel Prizes have been awarded for contributions to understanding protein shapes and how they function.
In recent years, artificial intelligence (AI) has been applied to this protein folding problem. Early attempts (circa 1980) worked poorly: they were built around a few cases and failed when applied to other proteins. In 1994 a competition, the Critical Assessment of protein Structure Prediction (CASP), was set up and has been run every two years since to see whether researchers could develop better programs. About 100 protein sequences whose structures have not yet been made public are released, and teams use computer programs to try to predict their 3D shapes. Other groups determine the proteins’ actual shapes experimentally, and the predictions are then compared with the experimental results. Each prediction receives a score from 0 to 100 reflecting how closely it matches the experimental structure. In the first competition, the programs did reasonably well on easy proteins but poorly on more difficult ones, scoring below 20. By 2016 the best programs scored about 40 on the most challenging proteins. In 2018 DeepMind entered a program called AlphaFold, which won with a score of about 60 on the difficult proteins.
In this year’s competition, the current version of AlphaFold scored 92.4 overall and 87 on the most challenging proteins. These scores are remarkable (a score of about 90 is considered roughly comparable to experimental results), and some scientists have declared the protein folding problem solved. The contest judges were so wary that the program might somehow be cheating that they gave it an unusual challenge: a protein that one of the judges had been studying for 10 years but whose shape could not be determined from X-ray data. AlphaFold suggested a structure that quickly made sense of the experimental data and finally solved this long-standing problem.
Being able to rapidly predict the shape of proteins means experimental data can be interpreted more quickly, and it should now be possible to understand basic biological processes more clearly. We should also see more clearly where these processes go wrong when proteins misfold, which could speed the development of drugs that interact effectively with those proteins and lead to treatments for currently incurable diseases. This computational advance can help us better understand how wonderfully we are made, and how woefully we can go wrong in disease.