BioML ZK

Flow matching and diffusion perform comparably on biomolecular structure prediction

Flow matching and diffusion perform comparably on biomolecular structure prediction[1]. This is, to my knowledge, the only head-to-head comparison of these two approaches for protein structure modeling or design.

References

1. Gong, C., Chen, X., Zhang, Y., Song, Y., Zhou, H., & Xiao, W. (2025). Protenix-Mini: Efficient Structure Predictor

Not all high-fitness sequences have plausible evolutionary paths from lower-fitness starting points via sequential introduction of mutations

I first saw this precise idea articulated by Weinreich et al. in their aptly named "Darwinian evolution can follow only very few mutational paths to fitter proteins"[1]. Many mutation combinations don't have additive effects on every aspect governing protein fitness (expression, stability, function, etc.), and
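To make the idea of an accessible path concrete, here is a minimal sketch assuming a toy landscape where the fitness of every genotype is known. The function name and the fitness values are hypothetical. It enumerates every order in which the mutations can be introduced one at a time and counts the orders along which fitness strictly increases at every step.

```python
from itertools import permutations

def count_accessible_paths(fitness, n_sites):
    """Count mutation orders from (0,...,0) to (1,...,1) along which fitness
    strictly increases at every single-mutation step.

    fitness: dict mapping binary genotype tuples to fitness values (hypothetical).
    Returns (accessible_paths, total_paths).
    """
    start = (0,) * n_sites
    accessible, total = 0, 0
    for order in permutations(range(n_sites)):
        total += 1
        genotype = list(start)
        previous = fitness[start]
        ok = True
        for site in order:
            genotype[site] = 1  # introduce one mutation
            current = fitness[tuple(genotype)]
            if current <= previous:  # a neutral or deleterious step blocks the path
                ok = False
                break
            previous = current
        accessible += ok
    return accessible, total

# Hypothetical two-site landscape with sign epistasis: mutation 2 is deleterious
# on its own but beneficial once mutation 1 is present.
toy = {(0, 0): 1.0, (1, 0): 1.5, (0, 1): 0.8, (1, 1): 2.0}
print(count_accessible_paths(toy, 2))
```

In this toy landscape only one of the two orders is accessible, even though the double mutant is the fittest genotype.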

Conformational entropy could still matter in miniprotein binder design

Antibody V-regions improve their affinity for targets by both creating more high-energy interactions and reducing the conformational entropy of their antigen-binding loops[1][2]. Entropy's importance in antibody-antigen affinity seems obvious, given that loop residues largely mediate binding[3][4]. But for de novo-designed miniprotein binders, which often
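The two routes to higher affinity map onto the two terms of the binding free energy. The Gibbs relation below is standard; isolating a loop conformational-entropy term is my own simplification for illustration:

```latex
\Delta G_{\text{bind}} = \Delta H_{\text{bind}} - T\,\Delta S_{\text{bind}},
\qquad
\Delta S_{\text{bind}} = \Delta S_{\text{loops}} + \Delta S_{\text{other}},
\qquad
\Delta S_{\text{loops}} = S_{\text{loops}}^{\text{bound}} - S_{\text{loops}}^{\text{free}}
```

More high-energy interactions make $\Delta H_{\text{bind}}$ more negative. Lowering $S_{\text{loops}}^{\text{free}}$ (rigidifying the loops before binding) makes $\Delta S_{\text{loops}}$ less negative, which shrinks the $-T\,\Delta S_{\text{bind}}$ penalty. Both changes make $\Delta G_{\text{bind}}$ more favorable.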

Glutamate- and lysine-rich designs are susceptible to expression failure resulting from adenosine-rich sequences

High rates of glutamate and lysine introduction are a staple of structure-based sequence design by ProteinMPNN and related models, regardless of who trains them[1]. Recently, an analysis of the Bits in Bio competition showed that high glutamate/lysine content is predictive of expression failure[2]: The authors traced this
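The link between E/K-rich designs and adenosine-rich DNA follows from the standard genetic code: glutamate is encoded by GAA/GAG and lysine by AAA/AAG. Below is a small sketch for measuring both quantities. The helper names and the example design are hypothetical, and the sketch does not model expression itself.

```python
# Glu and Lys codons in the standard genetic code; all four contain adenosine.
GLU_LYS_CODONS = {"E": ("GAA", "GAG"), "K": ("AAA", "AAG")}

def glu_lys_fraction(protein: str) -> float:
    """Fraction of residues that are glutamate (E) or lysine (K)."""
    protein = protein.upper()
    return sum(aa in "EK" for aa in protein) / len(protein)

def adenosine_fraction(dna: str) -> float:
    """Fraction of nucleotides that are adenosine (A)."""
    dna = dna.upper()
    return dna.count("A") / len(dna)

def mean_codon_a_content(codon_table=GLU_LYS_CODONS) -> float:
    """Average A fraction over a set of codons, weighting each codon equally."""
    codons = [codon for synonyms in codon_table.values() for codon in synonyms]
    return sum(adenosine_fraction(codon) for codon in codons) / len(codons)

design = "MEKKLEEAKKLLEEK"  # hypothetical E/K-rich design
print(f"E+K fraction: {glu_lys_fraction(design):.2f}")
print(f"Mean A content of E/K codons: {mean_codon_a_content():.2f}")
```

The E/K codons average two-thirds adenosine, versus roughly one-quarter for a base-balanced sequence, so any design dominated by E and K will be A-rich at the DNA level whatever codons are chosen.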

Essential proteins are more thermostable and over-expressed than non-essential proteins

Several properties of proteins have a sigmoid-like effect on cellular fitness. Thermostability is one of them: beyond a certain threshold, further increases in thermostability confer no benefit, and decreases that stay above the threshold impose no penalty[1]. This concept, termed the principle of marginal stability[2], is somewhat contradicted by the observation that essential
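A sigmoid makes the threshold behavior easy to see. This is a minimal sketch assuming a logistic mapping; the function name, the threshold, and the steepness are hypothetical illustration values, not fitted parameters.

```python
import math

def fitness_from_stability(dg_unfold, threshold=2.0, steepness=2.0):
    """Hypothetical logistic mapping from folding stability (kcal/mol,
    positive = more stable) to relative fitness in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-steepness * (dg_unfold - threshold)))

# The same 2 kcal/mol change near saturation versus around the threshold.
far = fitness_from_stability(8.0) - fitness_from_stability(6.0)
near = fitness_from_stability(3.0) - fitness_from_stability(1.0)
print(f"gain well above threshold: {far:.4f}")
print(f"gain around threshold:     {near:.4f}")
```

Well above the threshold, a large stability change barely moves fitness. Around the threshold, the same change matters a great deal.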

Training inverse folding models with label smoothing improves fitness prediction performance

BERT-style protein language models and inverse folding models learn to predict masked tokens[1]; coevolutionary patterns and residue propensities are expected to emerge from this training in an unsupervised manner[2]. This works well when training on sequence data, which is plentiful. However, it is not clear that there are enough experimental
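For reference, here is what label smoothing does to a masked-token cross-entropy. It is a NumPy sketch of the common convention (the true label gets weight 1 - ε plus ε/K, and every other class gets ε/K), not necessarily the exact setup used in the note; the function name and ε are assumptions.

```python
import numpy as np

def label_smoothed_cross_entropy(logits, target, epsilon=0.1):
    """Mean cross-entropy against smoothed one-hot targets.

    logits: (n_positions, n_classes) array, e.g. 20 amino acids per masked position.
    target: (n_positions,) integer array of true residue indices.
    """
    n_classes = logits.shape[-1]
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Spread epsilon uniformly over all classes, keep 1 - epsilon on the true label.
    smoothed = np.full_like(log_probs, epsilon / n_classes)
    smoothed[np.arange(len(target)), target] += 1.0 - epsilon
    return float(-(smoothed * log_probs).sum(axis=-1).mean())

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 20))
target = np.array([1, 2, 3, 4])
print(label_smoothed_cross_entropy(logits, target, epsilon=0.0))
print(label_smoothed_cross_entropy(logits, target, epsilon=0.1))
```

With ε > 0 the model is penalized for putting all its probability on a single residue, which keeps some mass on plausible alternatives at each position.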

Training protein structure-based neural networks exclusively on predicted protein structures worsens performance on experimental structures due to the training data's idealized local geometry

Predicted protein structures, particularly monomeric structures, have become ubiquitous thanks to the release of the AlphaFold Database[1] and its successors[2]. Yet training structure-based neural networks exclusively on these synthetic structures has now been widely shown to worsen performance on experimental structures. Hsu et al., who trained the structure-based

Language models matter less than search algorithms for difficult protein design tasks

Two papers from the last week show how search algorithms improve the performance of deep learning-based protein design when design restraints are available. The first compared AbLang-2, ESM-2, and other models, finding middling success rates and low variance when used as-is[1]. However, a noticeable performance boost was observed when
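As a generic illustration of a search loop wrapped around a model score, here is a greedy first-improvement hill climb over single substitutions. It is a sketch under loose assumptions, not the algorithm from either paper: `greedy_search` and the `score` callable are hypothetical, and in practice `score` might combine a language model's likelihood with restraint terms.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def greedy_search(seq, score, max_rounds=50):
    """Repeatedly accept any single substitution that raises score(seq);
    stop when a full pass over all positions yields no improvement."""
    best, best_score = seq, score(seq)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(best)):
            for aa in ALPHABET:
                if aa == best[i]:
                    continue
                candidate = best[:i] + aa + best[i + 1:]
                candidate_score = score(candidate)
                if candidate_score > best_score:
                    best, best_score, improved = candidate, candidate_score, True
        if not improved:
            break
    return best, best_score

# Hypothetical score: number of positions matching a target motif.
def toy_score(s):
    return sum(a == b for a, b in zip(s, "MKTEL"))

print(greedy_search("AAAAA", toy_score))
```

Swapping the scoring model changes what the search optimizes, while the search itself decides how thoroughly sequence space gets explored. That separation is what makes the model and the search algorithm separable factors in design performance.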

Diffusion-based structure prediction can be guided by backpropagating to the conditioning embeddings rather than the atomic coordinates directly, and such embeddings can be re-refined in subsequent iterations

Diffusion-based biomolecular structure prediction, as used in latest-generation methods like AlphaFold3[1] and BioEmu[2], can be guided or steered into specific conformations by backpropagating to the conditioning representations rather than to the atomic coordinates being diffused[3][4]. This was recently shown by two methods, EmbedOpt and IT-Optimization. There
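Here is a toy illustration of the core idea: the gradient of a restraint loss flows through a frozen model into the conditioning embedding, and the coordinates are then regenerated from the updated embedding. Everything here is a stand-in assumption. `decode` is a hypothetical linear "structure module" rather than a diffusion model, and `guide_embedding` is not the EmbedOpt or IT-Optimization procedure.

```python
import numpy as np

def decode(W, z):
    """Hypothetical frozen 'structure module': embedding z (d,) -> (N, 3) coordinates."""
    return (W @ z).reshape(-1, 3)

def guide_embedding(W, z, i, j, target, lr=0.05, steps=500):
    """Gradient descent on the embedding z (not on the coordinates) so that the
    decoded atoms i and j satisfy the distance restraint |x_i - x_j| = target."""
    A = W[3 * i:3 * i + 3] - W[3 * j:3 * j + 3]  # linear map z -> x_i - x_j
    z = z.copy()
    for _ in range(steps):
        r = A @ z
        dist = max(np.linalg.norm(r), 1e-12)
        # Chain rule through the decoder: d/dz (dist - target)^2
        grad = 2.0 * (dist - target) * (A.T @ r) / dist
        z -= lr * grad
    return z

rng = np.random.default_rng(0)
n_atoms, d = 5, 16
W = rng.normal(size=(3 * n_atoms, d)) / np.sqrt(d)
z0 = rng.normal(size=d)

z = guide_embedding(W, z0, i=0, j=4, target=5.0)
# A later round can restart from the refined embedding and re-refine it.
z = guide_embedding(W, z, i=0, j=4, target=6.0)
x = decode(W, z)
print(np.linalg.norm(x[0] - x[4]))
```

Because the restraint is enforced in embedding space, every coordinate the model produces stays on the model's own output manifold. Editing coordinates directly would give no such guarantee.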

Antibodies show evidence of significant epistasis in biophysical properties but not binding

Antibodies undergo somatic mutation during affinity maturation in ways that show evidence of epistasis in biophysical properties like expression and polyspecificity, but not in binding[1][2]. In a study by Kirby et al. (2025)[1] on 12 anti-SARS-CoV-2 antibodies using deep mutational scanning and yeast display, no combination of
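Epistasis here means deviation from additivity. For a pair of mutations A and B, a common definition on an additive scale (for example log affinity or log expression) is ε = f(AB) − f(A) − f(B) + f(WT). Below is a minimal sketch; the function name and the example values are hypothetical.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation.

    All values must be on an additive scale (e.g. log10 K_D, log expression).
    Zero means the two mutations combine additively.
    """
    return f_ab - f_a - f_b + f_wt

# Hypothetical values: binding combines additively, expression does not.
binding = pairwise_epistasis(f_wt=0.0, f_a=0.5, f_b=0.25, f_ab=0.75)
expression = pairwise_epistasis(f_wt=0.0, f_a=-0.5, f_b=-0.5, f_ab=-2.0)
print(binding, expression)
```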

Antibody-antigen complex prediction by AF3-generation methods is data-limited

The recent report of Protenix-v1[1] included an architecturally identical second model trained with a more recent train:test split. Unlike other outstanding problems in biomolecular structure prediction, antibody-antigen complex prediction showed remarkable improvement in the fraction of models reaching DockQ ≥ 0.23, shown in the third
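DockQ ≥ 0.23 is the lower bound of the CAPRI-style "acceptable" class, so the metric is simply a success rate over predicted complexes. Here is a small sketch; the function name and the scores are hypothetical.

```python
def fraction_acceptable(dockq_scores, cutoff=0.23):
    """Fraction of predicted complexes whose DockQ score meets the cutoff."""
    scores = list(dockq_scores)
    if not scores:
        raise ValueError("need at least one DockQ score")
    return sum(score >= cutoff for score in scores) / len(scores)

# Hypothetical DockQ scores for the same targets from two models.
older_split = [0.05, 0.12, 0.31, 0.08]
newer_split = [0.27, 0.12, 0.55, 0.24]
print(fraction_acceptable(older_split), fraction_acceptable(newer_split))
```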
