UniESA: a unified data-driven framework for enzyme stereoselectivity and activity prediction
Abstract
High stereoselectivity and activity are key to the industrial application of enzymes, and the rational design of variants with improved fitness is essential for developing and optimizing enzymes across multiple application scenarios. However, current in silico prediction tools show limited performance in analyzing the effects of single or multiple amino acid substitutions. Here, we developed UniESA, a unified framework that combines AAindex or protein language model encoding, signal processing refinement, and machine learning regression. Enzyme sequences were represented numerically and used as input to build sequence-stereoselectivity and sequence-activity models. UniESA demonstrated satisfactory performance on six public datasets and, in a carbonyl reductase evolution task, identified two mutants with strict stereoselectivity (dep > 99.5%) and one mutant with 2.8-fold improved activity, while requiring only one-tenth or even one-thousandth of the experimental workload of traditional directed evolution. UniESA is thus a valuable tool that offers new insights into enzyme engineering, dramatically reduces experimental workload, and supports the discovery of high-fitness enzymes for green industrial applications.
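The three stages named above (encoding, signal processing refinement, regression) can be illustrated with a minimal sketch. It is not the UniESA implementation. It assumes a single AAindex property (the Kyte-Doolittle hydropathy scale, AAindex entry KYTJ820101), a plain discrete Fourier transform as the signal processing step, and k-nearest-neighbour regression as a stand-in for the machine learning regressor. The function names, the toy variants, and their fitness values are hypothetical placeholders, not data from this work.

```python
import cmath
import math

# Kyte-Doolittle hydropathy (AAindex KYTJ820101); an assumed example index,
# not necessarily one of the indices used by UniESA.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map each residue to its property value, giving a numerical signal."""
    return [HYDROPATHY[aa] for aa in seq]


def power_spectrum(signal):
    """Mean-centred DFT magnitudes for frequencies 0..n//2 (naive O(n^2) DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centred)
        )
        spectrum.append(abs(coeff) / n)
    return spectrum


def featurize(seq):
    """Encoding followed by spectral refinement; sequences must share one length."""
    return power_spectrum(encode(seq))


def knn_predict(train_X, train_y, x, k=2):
    """Predict fitness as the mean label of the k closest training vectors."""
    ranked = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy variants with made-up fitness values, for illustration only.
variants = ["MKVLAG", "MKVLSG", "MKILAG", "MRVLAG"]
fitness = [1.0, 1.4, 0.7, 2.1]

train_X = [featurize(s) for s in variants]
prediction = knn_predict(train_X, fitness, featurize("MKVLTG"), k=2)
print(f"predicted fitness for MKVLTG: {prediction:.2f}")
```

In this sketch every variant has the same length, as is the case for substitution-only libraries, so the spectral feature vectors are directly comparable. Sequences of different lengths would need padding or alignment first.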