Analysis of the 200 million known proteins suggests that humans have 13 unique three-dimensional shapes

In 1961, the American architect Irving Geis received an unusual assignment: to draw by hand the first protein structure ever determined using X-rays, that of myoglobin, which supplies oxygen to muscles and gives meat its red color. The protein is a kind of necklace of 153 beads that folds into eight intertwined spirals. It took Geis six months to draw it, but his effort sparked worldwide fascination with this invisible inner world. Science has since accelerated. Last year, DeepMind, the artificial intelligence company owned by Google, accurately predicted the structure of more than 200 million proteins, almost all of those known. A Spanish bioinformatician, Iñigo Barrio, has helped bring order to this chaos by grouping the proteins by similarity. His work reveals surprising figures. Humans have 13 exclusive structures that are not found in any other living being. A bacterium ubiquitous in soil, Acidobacteria bacterium, has nearly 1,900 unique forms.

The DNA of a living being is a recipe book for producing proteins, the basic building blocks of life. Humans have approximately 30,000 different types, which handle vital functions such as energy production, structural support and defense against viruses. They are large, complex molecules. Some have simple shapes (spheres, cylinders, rings, stars, spirals and even swastikas), while others have almost unimaginable structures, such as hemoglobin, which carries oxygen in the blood from the lungs to the rest of the body. Hemoglobin is made up of thousands of carbon, hydrogen, nitrogen, oxygen, sulfur and iron atoms. Its formula is C₂₉₅₂H₄₆₆₄N₈₁₂O₈₃₂S₈Fe₄.

Barrio, born 36 years ago in Pamplona, experienced this tidal wave at the European Bioinformatics Institute in Hinxton, United Kingdom. He and his colleagues have developed a new algorithm, Foldseek Cluster, that can identify similar patterns in this huge tangle. Barrio applied the tool to the AlphaFold database, a jungle of 215 million proteins. The team identified 2.3 million types of structures, of which more than 700,000 are unknown. Understanding the structure of a protein is important for understanding its function and potentially for developing new drugs, as the researchers explain in their study, published this Wednesday in the journal Nature, a showcase for the world’s best science.

“There is almost always a connection between the structure of a protein and its function. Almost always. In biology you should never say ‘always’,” says Barrio, who recently joined the Wellcome Sanger Institute, also in Hinxton, near Cambridge. His work has linked proteins of known function with other, unexplored proteins. “If proteins A and B have a very similar structure, one can infer that they have a similar function,” explains the researcher. His work is reminiscent of an archaeologist retrieving mysterious prehistoric tools from underground. “If you see something in the shape of a beak, you might think it is used for stabbing, but there are exceptions. A fork and a comb look very similar, but are not used for the same purpose,” he warns.

The AlphaFold database contains predictions from DeepMind and the European Bioinformatics Institute, which is part of the European Molecular Biology Laboratory, an organization with more than 1,800 employees at sites in Spain, France, Germany, Italy and the United Kingdom. Analysis of the 215 million proteins suggests that most of the structures arose very early in the evolution of living things, in the common ancestors of animals and plants or even earlier. Only 4% of configurations appear to be specific to a single species.

“Humans have 13 groups of proteins with unique structures,” emphasizes Barrio. That figure contrasts with those of the five organisms that have the most unique three-dimensional shapes: the bacteria Acidobacteria bacterium, Escherichia coli and Chloroflexi bacterium, the Asian spider Araneus ventricosus and the pharaoh cuttlefish, each with between 1,400 and 1,900 exclusive structures. “We tend to think of evolution as a linear process, but it’s more of a tree. We’re at the end of a branch, but bacteria have continued to evolve in their own branches. There are bacteria that are newer than us,” explains the bioinformatician. “Furthermore, developing a new framework for a new problem is not always the best way to advance. Structures are often recycled. There are proteins in the human species that may have a different function than the one they had in our ancestors,” argues Barrio.

The bioinformatician Iñigo Barrio, photographed this Wednesday at the Wellcome Sanger Institute in Hinxton (United Kingdom). Wellcome Sanger Institute

The British company DeepMind boasts that its artificial intelligence system achieves 95% accuracy. However, according to Barrio, nine of the 13 specifically human structures are based on predictions with high uncertainty, possibly because they are particularly disordered conformations. The remaining four are VPS53, which is involved in transport within cells; U54, a herpesvirus protein integrated into the human genome; annexins, which are involved in transport across cell membranes; and a fourth, little-understood protein that may be no more than a fragment. The 30,000 types of human proteins are grouped into about 9,000 structures.

Another lead author of the study, the Portuguese bioinformatician Pedro Beltrao, highlights the discovery of human proteins involved in the immune system that are very similar to bacterial proteins of unknown function. “This suggests that the proteins involved in the immune system may have an ancient evolutionary origin that we share with bacterial species. If true, this could transform our knowledge of immunity,” said Beltrao, of the Swiss Federal Institute of Technology in Zurich (Switzerland), in a statement.

The biologist Júlia Domingo considers the new work, in which she was not involved, to be “very necessary”. “We are entering a new era of big data and we need new tools to process, analyze and use this data at high speed,” she reflects. Domingo, together with colleagues at the Center for Genomic Regulation (CRG) in Barcelona, developed a method to identify a type of hidden button that changes the function of proteins. She warns that structure alone is not enough to work out what a protein does. “Other functional levels are involved, such as energies and affinity for other proteins,” she emphasizes.

It took architect Irving Geis six months to draw myoglobin in 1961. The British chemist John Kendrew, who provided him with the data, won the Nobel Prize in Chemistry in 1962 for discovering this first structure using X-rays. According to Iñigo Barrio, the possibilities that are now opening up with artificial intelligence and new algorithms are unimaginable. “With the previous methods, this work would have taken us ten years. It took us five days,” he says.
