Foldseek: Revolutionizing Protein Structure Search with 3Di Technology

Summary

  • AI-Driven Protein Structure Access: AI tools like AlphaFold2 have made predicted structures available for nearly all known proteins, producing massive structure databases.

  • Challenge in Data Analysis: Traditional methods for comparing protein structures are too slow for large databases.

  • Foldseek Introduction: Foldseek, a new tool by Michel van Kempen's team, enables rapid and efficient protein structure searches.

  • Innovative 3Di Alphabet: Foldseek uses a 3Di alphabet representing amino acid interactions, making structure comparisons faster.

  • Impact and Potential: Foldseek is orders of magnitude faster than existing tools while retaining most of their sensitivity, which can accelerate research in drug discovery, disease understanding, and bioinformatics.

We are entering a new phase in biology, in which accurate predictions of protein structures are readily available for nearly all known proteins. This advance is driven by artificial intelligence systems such as AlphaFold2, which have greatly increased public access to protein structures.

Databases such as the AlphaFold Protein Structure Database, hosted by the European Bioinformatics Institute, and the ESM Atlas now contain hundreds of millions of predicted models. This wealth of structural data offers a unique chance to understand the mechanisms of life at the molecular scale. Still, progress in structural biology is held back by the difficulty of searching and analyzing such large datasets effectively.

Conventional techniques for comparing protein structures are computationally burdensome, rendering them unfeasible for searching databases that house hundreds of millions of structures. Consider the challenge of locating a particular grain of sand amidst a vast expanse of beach; the endeavour appears overwhelming.

Here is where Foldseek comes into play. Foldseek, created by Michel van Kempen and his team, is an innovative tool that greatly speeds up the search for protein structures. This tool makes it possible to efficiently explore the extensive databases of protein structures.

Recently published in the scientific journal Nature Biotechnology, this research marks a significant step forward in our capacity to explore and examine the ever-growing collection of protein structures.

Foldseek's exceptional speed comes from the way it represents protein structures. Rather than relying exclusively on the linear sequence of amino acids, Foldseek encodes the three-dimensional interactions among the amino acids within a protein. The resulting "3Di alphabet" is a representation that captures the essential shape of a protein's fold, enabling more efficient comparisons.

How it works

Explaining the 3Di alphabet to a newcomer is a little like describing the architecture of a building. Instead of describing each individual brick and window (comparable to amino acids), we can use more general architectural components such as arches, columns, or domes (comparable to 3Di states).

These components capture the overall structural arrangement of the building, which makes it quicker to compare different architectural styles. In the same way, the 3Di alphabet characterizes recurring patterns of amino acid interactions, so the overall structure of a protein can be written down compactly and searched quickly.

The research team emphasizes the innovation of their method in their paper, stating that the 3Di alphabet developed for Foldseek focuses on tertiary residue-residue interactions rather than backbone conformations, which proved crucial for achieving high sensitivities.
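
To make the idea of a structural alphabet concrete, here is a minimal toy sketch in Python. It is not Foldseek's encoder: the published 3Di states are learned from data and use richer geometric features. The six-letter alphabet, the `toy_state` binning, and the `encode` helper are all hypothetical and exist only for illustration. The sketch keeps a single idea from the description above: each residue is labelled by its interaction with its nearest neighbour in space, not by the local shape of the backbone.

```python
import math

def toy_state(offset):
    """Map the sequence offset to the nearest spatial neighbour onto one of six letters.

    Hypothetical binning for illustration only; it is not the 3Di alphabet.
    """
    span = abs(offset)
    if span <= 4:
        pair = "AB"   # short-range contact
    elif span <= 12:
        pair = "CD"   # medium-range contact
    else:
        pair = "EF"   # long-range contact
    # First letter: neighbour lies later in the chain; second letter: earlier.
    return pair[0] if offset > 0 else pair[1]

def encode(ca_coords, min_separation=2):
    """Turn a list of C-alpha (x, y, z) coordinates into a toy state string.

    Assumes the chain is long enough that every residue has at least one
    candidate neighbour (four or more residues with the default separation).
    """
    letters = []
    for i, xyz in enumerate(ca_coords):
        # Skip the residue itself and its direct chain neighbours.
        candidates = [j for j in range(len(ca_coords)) if abs(i - j) >= min_separation]
        nearest = min(candidates, key=lambda j: math.dist(xyz, ca_coords[j]))
        letters.append(toy_state(nearest - i))
    return "".join(letters)

# Six residues arranged as a flat hairpin: three going out, three coming back.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 4.5, 0.0), (3.8, 4.5, 0.0), (0.0, 4.5, 0.0)]
print(encode(hairpin))  # CAABBD
```

Structures with similar contact patterns yield similar strings, which is what allows a sequence-style comparison to stand in for a full three-dimensional one.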


Foldseek uses the 3Di representation to turn the task of comparing structures into the much cheaper problem of aligning sequences. It first converts both the query protein structure and the database structures into 3Di sequences. It then applies sophisticated sequence search algorithms, similar to those used to find related DNA or protein sequences, to quickly identify comparable structures.
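
Once structures are written as strings of states, standard sequence machinery applies. The Python sketch below ranks a tiny made-up database against a query using a plain Smith-Waterman local alignment score. It is an illustration under stated assumptions, not Foldseek's search: the match, mismatch, and gap values are arbitrary, the state strings and entry names are hypothetical, and the real tool's scoring and speed optimizations are not reproduced here.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are arbitrary illustrative choices.
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic-programming table
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def search(query, database):
    """Score every database entry against the query, best hit first."""
    hits = [(local_alignment_score(query, seq), name) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))

# Hypothetical state strings standing in for encoded structures.
database = {
    "hairpin_like": "EECAABBDEE",
    "partial": "CAAEE",
    "unrelated": "FFFFEEEE",
}
for score, name in search("CAABBD", database):
    print(name, score)
# hairpin_like 12
# partial 6
# unrelated 0
```

Each comparison here costs time proportional to the product of the two string lengths and needs no three-dimensional superposition, which is the source of the efficiency gain described above.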

How Foldseek performed in benchmarks

Comparative benchmarks show that Foldseek is dramatically faster than existing state-of-the-art methods while retaining most of their sensitivity. Van Kempen and his team report that Foldseek reduces computation times by four to five orders of magnitude while reaching 86%, 88%, and 133% of the sensitivity of Dali, TM-align, and CE, respectively. In other words, Foldseek gives up a little sensitivity relative to Dali and TM-align, exceeds CE, and is vastly faster than all three. To put the speed in perspective, a search that would occupy TM-align for about a month on a single CPU core can be completed by Foldseek in seconds to minutes.
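
The month-long comparison can be checked with simple arithmetic. The Python sketch below assumes a 30-day month and takes the reported speedup of four to five orders of magnitude at face value; the function name `accelerated_runtime` is made up for this example.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 s, assuming a 30-day month

def accelerated_runtime(baseline_seconds, orders_of_magnitude):
    """Runtime after a speedup of 10 ** orders_of_magnitude."""
    return baseline_seconds / 10 ** orders_of_magnitude

for orders in (4, 5):
    seconds = accelerated_runtime(SECONDS_PER_MONTH, orders)
    print(f"10^{orders} speedup: about {seconds:.0f} s")
# 10^4 speedup: about 259 s
# 10^5 speedup: about 26 s
```

Under these assumptions, a one-month search shrinks to somewhere between roughly half a minute and a few minutes.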


The consequences of this speed are significant. Foldseek enables researchers to efficiently explore extensive databases such as AlphaFoldDB, which house vast collections of protein structures. This tool facilitates the identification of related proteins, the discovery of evolutionary connections, and the recognition of potential targets for drug development. The authors emphasize the transformative potential of structure-based analysis, stating that the wide availability of high-quality structures for almost every folded protein is beneficial for biology and bioinformatics.


Foldseek's influence extends beyond fundamental research, offering the potential to accelerate drug discovery, improve our understanding of diseases, and advance new biotechnologies. By providing rapid and convenient protein structure search, it enables researchers from diverse fields to explore the complexities of protein structure.

This empowers researchers to uncover novel findings that could have substantial implications for human health and beyond.

The creation of Foldseek represents a significant advance in our capacity to make use of protein structure data, ushering in a new era in the field of structural bioinformatics.

This article is based on the research paper published in Nature Biotechnology.
