AI reveals shared protein signals behind convergent traits across distant species
A research team from the Chinese Academy of Sciences used an advanced protein language model to spot hidden shared features in proteins from unrelated species that evolved similar traits. Think echolocation in bats and dolphins, or flight in birds and insects. The work points to sequence-level patterns that map to function, even when overall sequences look different.
Published in the Proceedings of the National Academy of Sciences, the study argues that an underappreciated sequence-level basis contributes to the convergence of functional traits. Rather than focusing on short, obvious similarities such as shared motifs or individual residues, the model surfaces higher-order signals linked to 3D structure and function.
Why this matters
Convergent evolution has been a useful natural experiment for decades, but most tools compare local sequence motifs or specific residues. This approach scales to whole-protein representations and can capture features that correlate with structure and biophysical behavior. For evolutionary biology, that means clearer hypotheses about how function emerges. For protein engineering, it hints at new routes to design by function rather than strict homology.
What the team did differently
Instead of relying on hand-crafted alignments, the researchers applied an AI model trained on large protein sequence corpora to embed proteins into a high-dimensional space. In that space, proteins that support similar functions clustered by shared features despite distant ancestry. This made it possible to detect signals linked to traits like echolocation or flight without forcing a direct sequence alignment.
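To make the idea of "clustering in embedding space" concrete, here is a minimal sketch in Python. It is not the study's pipeline: the four-dimensional vectors and the protein names are invented for illustration, and a real workflow would obtain much larger embeddings from a protein language model. It only shows how cosine similarity can place two proteins from distant lineages next to each other without any sequence alignment.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for this example only.
# Real protein language models emit far more dimensions per protein.
toy_embeddings = {
    "bat_protein": [0.9, 0.1, 0.8, 0.2],
    "dolphin_protein": [0.8, 0.2, 0.9, 0.1],
    "non_echolocator_protein": [0.2, 0.9, 0.1, 0.8],
}

def nearest_neighbor(name, embeddings):
    """Return the other entry whose embedding is most similar to `name`."""
    query = embeddings[name]
    others = (key for key in embeddings if key != name)
    return max(others, key=lambda key: cosine_similarity(query, embeddings[key]))

print(nearest_neighbor("bat_protein", toy_embeddings))
```

With these made-up numbers, the bat entry's nearest neighbor is the dolphin entry, which mirrors the kind of function-linked grouping the article describes.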
Examples that illustrate the signal
Traits such as echolocation in bats and dolphins arise independently but rely on proteins that handle similar sensory processing demands. Flight involves metabolic, muscular, and structural demands that recur across distant lineages. The model's embeddings highlight these convergences by surfacing function-linked features that standard similarity searches often miss.
Implications for research
For evolutionary studies, this provides a quantitative way to test convergence at the protein level beyond anecdotal residue overlaps. For functional genomics, it offers a shortcut to prioritizing candidates in non-model species by mapping them to functional neighborhoods. For protein design, it suggests alternative scaffolds that encode the same functional "signature," widening the search space.
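One generic way to put a number on convergence is a permutation test: ask whether a set of proteins from convergent lineages sits closer together in embedding space than random sets of the same size. The sketch below is an assumption about how such a test could look, not the statistical method used in the study. The vectors are invented, the function names are hypothetical, and the naive shuffling ignores phylogenetic relatedness, which a real analysis would need to control for.

```python
import itertools
import math
import random

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_pairwise_similarity(vectors):
    """Average cosine similarity over all unordered pairs."""
    pairs = list(itertools.combinations(vectors, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)

def permutation_p_value(convergent, background, n_permutations=2000, seed=0):
    """Share of random same-size draws at least as tight as the convergent set."""
    rng = random.Random(seed)
    observed = mean_pairwise_similarity(convergent)
    pool = convergent + background
    hits = 0
    for _ in range(n_permutations):
        sample = rng.sample(pool, len(convergent))
        # Small tolerance so float rounding does not hide an exact tie.
        if mean_pairwise_similarity(sample) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

# Invented toy embeddings: three tightly grouped "convergent" proteins
# and six scattered background proteins.
convergent = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.0, 0.0, 0.1]]
background = [
    [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, 0.0, 0.0],
    [0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [0.0, 0.7, 0.7],
]

p_value = permutation_p_value(convergent, background)
print(f"permutation p-value: {p_value:.4f}")
```

A small p-value here only says the toy group is unusually tight relative to random draws from the pool; it does not by itself establish convergence.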
Practical notes and caveats
Embeddings capture statistical correlations; they still need experimental validation. Dataset biases and model training objectives can skew which features are emphasized. A strong workflow pairs these embeddings with orthogonal evidence: structure prediction, mutational scans, and phenotype assays.
How to apply this in your work
- Use protein language model embeddings to cluster sequences by function, not only by homology.
- Compare convergent lineages to extract conserved functional signatures for targets of interest.
- Prioritize variants for wet-lab testing using embedding proximity plus structure and conservation scores.
- Document negative results to refine feature thresholds and reduce false positives.
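The prioritization step in the list above can be sketched as a simple weighted ranking. Everything specific here is an assumption made for illustration: the `Candidate` fields, the weights, the 0.6 threshold, and the variant scores are hypothetical, and each score is assumed to be pre-scaled to the range 0 to 1. Weights and threshold are exactly the sort of values that documented negative results would help tune.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    embedding_proximity: float  # similarity to a reference signature, 0..1
    structure_score: float      # structure-based support, 0..1
    conservation: float         # conservation summary, 0..1

# Hypothetical weights; they sum to 1 so the combined score stays in 0..1.
WEIGHTS = {"embedding_proximity": 0.5, "structure_score": 0.3, "conservation": 0.2}

def combined_score(candidate, weights=WEIGHTS):
    """Weighted sum of the three evidence scores."""
    return sum(weight * getattr(candidate, attr) for attr, weight in weights.items())

def prioritize(candidates, threshold=0.6):
    """Drop candidates below the threshold, then rank the rest best-first."""
    kept = [c for c in candidates if combined_score(c) >= threshold]
    return sorted(kept, key=combined_score, reverse=True)

# Invented example values, not data from the study.
candidates = [
    Candidate("variant_A", 0.92, 0.80, 0.70),
    Candidate("variant_B", 0.55, 0.90, 0.95),
    Candidate("variant_C", 0.30, 0.40, 0.50),
]

for c in prioritize(candidates):
    print(f"{c.name}: {combined_score(c):.2f}")
```

A linear weighted sum is the simplest possible choice; it keeps each evidence source visible and easy to re-weight as wet-lab results come in.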
Further reading
- Proceedings of the National Academy of Sciences (PNAS)
- Convergent evolution overview (Nature Education)