Protein structure prediction: On the route from sequence to function

BioStrand (a subsidiary of IPA)
Mar 22, 2023

As discussed in our previous blog post in this series, there is a large gap between the number of sequenced proteins and the number of proteins with a resolved structure. The cost of protein sequencing has dropped dramatically in recent decades, while experimentally determining protein structures remains costly, relying on expensive and fallible experimental setups. Furthermore, beyond natural protein sequences, the space of (stable) proteins is also growing significantly due to proteins synthesized in a biotechnological setting. The previous blog post showed that in silico protein structure prediction, pioneered by AlphaFold, changed the game by making protein structures readily available [1].

In this blog post, we will go into the subtleties of protein structure prediction and highlight some areas in which AlphaFold and its competitors lack accuracy. Combining structure prediction with a physics-based approach can overcome these limitations. Subsequently, we will show how the availability of protein structures, combined with recent advances in machine learning, accelerates the functional annotation of proteins.

The physical rules of protein structure

In silico pipelines that determine functional characteristics of proteins from their sequences benefit heavily from the addition of structural information. Indeed, a protein’s structure strongly influences its function, with notable examples being binding properties and mechanical stability. Still, determining functional properties from protein structures remains a non-trivial problem. In one of its latest releases, AlphaFold introduced AlphaFold-Multimer [2], a specialized model for predicting the structures of protein complexes. While this is an important step toward predicting protein-protein binding, the problem remains challenging (especially for antibodies), and the model is limited to binding pose prediction. To quantify the binding affinity of protein-protein complexes, physics-based approaches such as molecular dynamics are still necessary.

Another functional property is mechanical stability. AlphaFold and other structure prediction tools predict atomic coordinates from sequence representations of the protein. However, this view does not fully match reality, where proteins are large molecules embedded in a solvent, moving according to physical equations of motion determined by atomic interactions and temperature. Proteins are not rigid bodies but flexible assemblies of atoms. Again, molecular dynamics simulations make it possible to capture these dynamics. Recent observations have linked AlphaFold’s predicted quality metrics to mechanical instability [3,4]. Concretely, AlphaFold’s pLDDT (predicted local distance difference test) and PAE (predicted aligned error) metrics have been observed to correlate with the local and global flexibility of the protein in question. Even though AlphaFold predicts a static structure from a sequence, these quality metrics hint at dynamic properties of the protein as well.
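To make the link between quality metrics and flexibility more concrete, here is a minimal Python sketch that extracts per-residue pLDDT scores from an AlphaFold model and flags low-confidence residues as candidate flexible regions. It relies on the fact that AlphaFold writes pLDDT into the B-factor column of its PDB output. The function names, the toy input, and the cutoff of 70 (a commonly used boundary for low confidence) are our own illustrative assumptions, not part of any published pipeline.

```python
def read_plddt(pdb_text):
    """Return {(chain, residue_number): pLDDT} using the C-alpha atom of each residue."""
    scores = {}
    for line in pdb_text.splitlines():
        # PDB is a fixed-column format: atom name in columns 13-16,
        # chain ID in column 22, residue number in 23-26, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            residue_number = int(line[22:26])
            scores[(chain, residue_number)] = float(line[60:66])
    return scores


def low_confidence_residues(scores, threshold=70.0):
    """Residues with pLDDT below the threshold (an assumed cutoff), sorted."""
    return sorted(key for key, value in scores.items() if value < threshold)


# Toy input for illustration only; a real model file has all atoms of every residue.
EXAMPLE_PDB = "\n".join([
    "ATOM      1  N   MET A   1      10.000  12.000  13.000  1.00 35.20",
    "ATOM      2  CA  MET A   1      11.000  12.000  13.000  1.00 35.20",
    "ATOM      3  CA  LYS A   2      14.800  12.000  13.000  1.00 91.50",
    "ATOM      4  CA  VAL A   3      18.600  12.000  13.000  1.00 64.10",
])

scores = read_plddt(EXAMPLE_PDB)
flexible_candidates = low_confidence_residues(scores)
print(flexible_candidates)
```

Low pLDDT only suggests flexibility or disorder. Confirming actual dynamics still calls for simulations such as molecular dynamics.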

The symbiosis of deep learning: building a neural network ecosystem

AlphaFold combines bioinformatics tools such as multiple sequence alignment with a deep learning approach. The modular nature of deep learning makes it possible to build neural networks on top of AlphaFold’s output, and these networks could be employed to predict functional properties. In recent years, geometric deep learning has gained traction [5]. It offers a perspective on deep learning focused on designing neural networks in symbiosis with their targeted input domain. Here, this input domain corresponds to the outputs of AlphaFold: three-dimensional coordinates of the protein’s atoms. A natural representation for such data is a graph, where nodes represent atoms and edges represent the relative orientations between atoms.
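As a rough illustration of the graph representation, the Python sketch below turns a list of 3D coordinates into a contact graph. It is deliberately simplified: edges carry only the Euclidean distance between two nodes rather than a full relative orientation, and the function name, the 8 Å cutoff, and the toy coordinates are assumptions made for this example.

```python
import math


def build_contact_graph(coords, cutoff=8.0):
    """Connect every pair of points closer than `cutoff` (in angstroms).

    coords: list of (x, y, z) tuples, one per node (atom or residue).
    Returns an adjacency dict {node_index: [(neighbor_index, distance), ...]}.
    """
    graph = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            distance = math.dist(coords[i], coords[j])
            if distance <= cutoff:
                # Undirected graph: store the edge in both directions.
                graph[i].append((j, distance))
                graph[j].append((i, distance))
    return graph


# Toy coordinates: three points spaced 3.8 angstroms apart plus one distant point.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
graph = build_contact_graph(coords)
for node, edges in graph.items():
    print(node, [neighbor for neighbor, _ in edges])
```

The pairwise loop is quadratic in the number of nodes, which is fine for a sketch. For full atomic models, a spatial index such as a k-d tree would be the more usual choice.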

The popularity of graph neural networks has surged in recent years, making them one of the most popular topics at major machine learning conferences such as ICLR. These rapid advances are also being adopted in machine learning pipelines that predict protein function from structure. A notable example is paratope/epitope prediction for antibody binding [6,7]. Another approach uses a graph neural network to predict molecular functions indexed by the Gene Ontology [8]. Although predictions by neural networks are still hard to interpret, methods exist that uncover the input features a model focuses on to make its prediction. These approaches can be used to identify residues in a protein that play a role in binding ligands or have a catalytic role.
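For readers unfamiliar with graph neural networks, the core operation is message passing: each node updates its features using those of its neighbors. The Python sketch below shows a single round with plain mean aggregation. It is a didactic simplification and not the architecture of any of the cited models [6,7,8]. Real layers add learned weights and nonlinearities, and the function name and toy data are assumptions.

```python
def message_passing_step(features, neighbors):
    """One round of mean aggregation over a graph.

    features: {node: [float, ...]} feature vector per node.
    neighbors: {node: [node, ...]} adjacency lists.
    Each node's new vector is the element-wise mean of its own vector
    and the vectors of its neighbors (no learned parameters).
    """
    updated = {}
    for node, own_vector in features.items():
        group = [own_vector] + [features[n] for n in neighbors.get(node, [])]
        updated[node] = [sum(values) / len(group) for values in zip(*group)]
    return updated


# Toy chain graph 0 - 1 - 2 with two-dimensional node features.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 0.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}

updated = message_passing_step(features, neighbors)
print(updated[0])  # [0.5, 0.5]
print(updated[2])  # [0.0, 0.5]
```

Stacking several such rounds lets information travel further along the graph, which is how a residue’s representation can come to reflect its wider structural environment.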

In conclusion, protein structure prediction is a vital step towards the functional characterization of proteins. Given AlphaFold’s results, subsequent modeling and simulations are needed to uncover all relevant properties of unannotated proteins. These modeling efforts will prove to be paramount in the years ahead, and building a platform around them will accelerate research in functional protein characterization.

Sources:

[1] John Jumper, Richard Evans, Alexander Pritzel, et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021)

[2] Richard Evans, Michael O’Neill, Alexander Pritzel, et al. Protein complex prediction with AlphaFold-Multimer, bioRxiv 2021.10.04.463034

[3] Alexander Jussupow, Ville R. I. Kaila. Effective Molecular Dynamics from Neural-Network Based Structure Prediction Models, bioRxiv 2022.10.17.512476

[4] Diego del Alamo, Davide Sala, Hassane S Mchaourab, Jens Meiler. Sampling alternative conformational states of transporters and receptors with AlphaFold2, eLife 11:e75751

[5] Michael M. Bronstein, Joan Bruna, Taco Cohen, Petar Veličković. Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges, arXiv:2104.13478

[6] Lewis Chinery, Newton Wahome, Iain Moal, Charlotte M. Deane Paragraph — Antibody Paratope prediction using Graph Neural Networks with minimal feature vectors, bioRxiv 2022.06.10.495640

[7] Jerome Tubiana, Dina Schneidman-Duhovny, Haim J. Wolfson. ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction. Nature Methods 19, 730–739 (2022)

[8] Vladimir Gligorijević, P. Douglas Renfrew, Tomasz Kosciolek, et al. Structure-based protein function prediction using graph convolutional networks. Nature Communications 12, 3168 (2021)

Originally published at https://blog.biostrand.ai on March 22, 2023.
