Google identifies millions of mutations in proteins that can cause disease

It is the holy grail of modern medicine: identifying the changes in the genome that lead to diseases of genetic origin. The task is not easy, because each person carries thousands of mutations in the genetic information inherited from their parents. Most are harmless, but a fraction can be pathogenic. Now researchers at Google DeepMind, Alphabet’s artificial intelligence company, have cataloged 71 million of these mutations. Their program was also able to classify them and found that a third could alter the function of proteins and cause serious pathologies.

DNA contains the instructions for the development of all living things. This book holds each of its recipes for creating cells, organs, and functions in the form of sequences of its basic components. The building blocks of life are proteins. They are made up of a chain of amino acids, sometimes hundreds, each of which is encoded by a trio of nucleotides, the letters of the genetic alphabet. If one of these nucleotides is replaced by another and the substitution changes an amino acid, the mutation is called a missense variant. For the most part, these variants have no influence on the function of the protein. In other cases, however, the mutation is catastrophic and leads to pathologies such as amyotrophic lateral sclerosis (ALS) of genetic origin or sickle cell anemia.
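The difference between a harmless substitution and a missense variant can be illustrated with a short sketch. It uses the standard genetic code and the well-known sickle cell substitution in the beta-globin gene, where the codon GAG (glutamic acid) becomes GTG (valine). The function name `classify_substitution` and its labels are hypothetical, chosen only for this illustration; this is not part of AlphaMissense.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, position, new_base):
    """Replace one nucleotide in a DNA codon and describe the effect.

    Hypothetical helper for illustration. Returns a tuple of
    (mutated codon, original amino acid, new amino acid, kind).
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        kind = "synonymous"   # same amino acid, protein unchanged
    elif after == "*":
        kind = "nonsense"     # premature stop codon
    elif before == "*":
        kind = "stop-lost"    # a stop codon now encodes an amino acid
    else:
        kind = "missense"     # one amino acid swapped for another
    return mutated, before, after, kind


# Sickle cell substitution: GAG (E, glutamic acid) becomes GTG (V, valine).
print(classify_substitution("GAG", 1, "T"))
```

Only the last category, where one amino acid is exchanged for another, is what the study catalogs.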

So far, about 4 million of these missense variants have been identified in the 19,233 proteins that make up every human. But only 2% of them have been annotated, that is, it is known whether they are harmless (the majority) or whether they can be a source of disease. Artificial intelligence (AI) has now increased the number of known variants sixfold and classified most of them according to their potential impact on protein function.

The authors of this achievement, published in the renowned journal Science, are DeepMind scientists. It is the same group that several years ago developed AlphaFold, an AI program that can predict the structure of virtually all proteins and is considered one of the greatest advances in computational biology. What they have done now is redesign it and point it at detecting missense mutations in proteins. In addition, the new tool, AlphaMissense, classifies with a high degree of probability the effect that each variant could have on the function of the protein.

AlphaMissense

DeepMind researcher Jun Cheng, lead author of the study, explains what AlphaMissense does: “We knew that AlphaFold was a very good model for predicting the three-dimensional structure of proteins from their sequence. We also knew that this 3D structure of proteins is very important for their function and basically shows what it is,” explains Cheng. If a protein's function can be inferred from its structure, any change in that structure could be the result of a mutation. Another fundamental aspect is AlphaMissense’s ability to learn from the evolutionary constraints of related sequences. That is, evolution has shaped what the structure of a protein can be and what it should not be if problems are to be avoided. To improve knowledge of this aspect, the system was trained on the structures of human and primate proteins. “Through training, it sees millions of protein sequences and learns what a normal protein sequence looks like. And if we give it one with a mutation, it can tell us whether it’s bad or not,” he adds.

Finally, Cheng draws a comparison: “It’s very similar to human language. When we replace a word in an English sentence, a person familiar with the language can immediately tell whether that word replacement changes the meaning of the sentence or not.” AlphaMissense was able to classify 89% of the 71 million missense variants identified. Of the total, 57% were classified as probably benign and about a third as probably pathogenic. For the remaining 11%, the AI could not determine the impact. “The model assigns each variant a value between zero and one that indicates the probability that the variant is pathogenic. By pathogenic we mean that the variant is more likely to be associated with or cause a disease,” explains the scientist.
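How a score between zero and one becomes one of three labels can be sketched in a few lines. The article does not give the cutoffs, so the two thresholds below are placeholder assumptions, and the names `classify_score` and `summarize` are hypothetical; this is an illustration of the idea, not DeepMind's code.

```python
from collections import Counter


def classify_score(score, benign_below=0.34, pathogenic_above=0.564):
    """Map a pathogenicity score in [0, 1] to a label.

    The default cutoffs are placeholder assumptions for this sketch;
    the article does not state which values the model uses.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < benign_below:
        return "likely benign"
    if score > pathogenic_above:
        return "likely pathogenic"
    return "uncertain"


def summarize(scores):
    """Return the fraction of variants that falls under each label."""
    counts = Counter(classify_score(s) for s in scores)
    total = len(scores)
    return {label: count / total for label, count in counts.items()}


# Four made-up scores, purely for illustration.
print(summarize([0.1, 0.2, 0.9, 0.45]))
```

Variants that land between the two cutoffs stay unlabeled, which corresponds to the share the model leaves unclassified.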

Cheng’s explanations highlight both the strength of AlphaMissense, its very high ability to classify variants, and one of its weaknesses: the percentages refer to probabilities. Until the era of powerful computers and AI, characterizing the structure of a protein or its mutations was a gigantic task. Before the advent of these technologies, the structure of about 200,000 proteins had been determined, a task that took 60 years and involved thousands of scientists. This required many hours in the laboratory or the use of particle accelerators. But they were real observations of the true structure of a real protein. In the case of computational biology, these are virtual proteins and variants that then need to be confirmed. In the case of AlphaMissense, the accuracy of the calculations is 90%.

“Understanding the disease”

Regarding possible applications, Žiga Avsec, also from DeepMind and co-lead author of the study, said in an online conference: “The first step in finding treatments is to try to understand the disease well, both for complex diseases and for rare ones. That means finding the genes associated with them.” For Avsec, tools like AlphaMissense “can help us better identify variants and potentially discover new genes. By better understanding genetics, we will be able to have a more informed opinion about some genes that we may not have been sure before were related to the disease.” “That’s the general idea: through better genetics, discovering new genes and gaining additional statistical power to detect new relationships. But this will not directly lead to new drugs as such,” he added.

A few days ago, an analysis was published of the 200 million protein structures that AlphaFold predicted last year. The Spanish bioinformatician Íñigo Barrio was involved in this key analysis. “AlphaFold changed the world,” says Barrio, who is less enthusiastic about AlphaMissense. “It is relevant, it is a new method for assessing variants and could be used to monitor rare diseases. But there are already other prediction software available.” Barrio also points out one of the limitations of this artificial intelligence. AlphaMissense catalogs missense variants individually, but many genetic pathologies “are the product of the combination of several of these mutations,” he recalls.

Biologist José Antonio Márquez, who heads the crystallography platform of the European Molecular Biology Laboratory, has a similar opinion: “It is one of the applications of the method [AlphaFold]. Perhaps it is not so relevant at a scientific level, but it is in the sense that it begins to translate a discovery into possible applications.” Among these applications, Márquez highlights the acceleration of “research into genetic diseases and especially rare diseases, as it helps to develop hypotheses about the mechanism that causes the disease.”
