Property-driven Protein Inverse Folding With Multi-Objective Preference Alignment
arXiv:2603.06748v1
Abstract: Protein sequence design must balance designability, defined as the ability to recover a target backbone, with multiple, often competing, developability properties such as solubility, thermostability, and expression. Existing approaches address these properties through post hoc mutation, inference-time biasing, or retraining on property-specific subsets, yet they are target dependent and demand substantial domain expertise or careful hyperparameter tuning. In this paper, we introduce ProtAlign, a multi-objective preference alignment framework that fine-tunes pretrained inverse folding models to satisfy diverse developability objectives while preserving structural fidelity. ProtAlign employs a semi-online Direct Preference Optimization strategy with a flexible preference margin to mitigate conflicts among competing objectives and constructs preference pairs using in silico property predictors. Applied to the widely used ProteinMPNN backbone, the resulting model MoMPNN enhances developability without compromising designability across tasks including sequence design for CATH 4.3 crystal structures, de novo generated backbones, and real-world binder design scenarios, making it an appealing framework for practical protein sequence design.
Executive Summary
This article introduces ProtAlign, a multi-objective preference alignment framework for protein sequence design that improves competing developability properties such as solubility, thermostability, and expression while preserving structural fidelity. The framework fine-tunes pretrained inverse folding models using a semi-online Direct Preference Optimization (DPO) strategy with a flexible preference margin, and it constructs the preference pairs using in silico property predictors. According to the abstract, MoMPNN, the model obtained by applying ProtAlign to ProteinMPNN, enhances developability without compromising designability across three tasks: sequence design for CATH 4.3 crystal structures, de novo generated backbones, and real-world binder design scenarios. The framework targets the limitations of existing approaches (post hoc mutation, inference-time biasing, and retraining on property-specific subsets), which are target dependent and require substantial domain expertise or careful hyperparameter tuning.
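To make the idea of building preference pairs from in silico predictors concrete, here is a minimal sketch. The abstract does not state how ProtAlign selects pairs, so the Pareto-dominance rule, the function names `dominates` and `build_preference_pairs`, the property keys, and the toy scores are all assumptions made for illustration only.

```python
from itertools import combinations

def dominates(a, b):
    """True if score dict `a` is >= `b` on every property and > on at least one.

    Assumes both dicts share the same keys and that higher is better.
    """
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)

def build_preference_pairs(candidates):
    """Return (chosen, rejected) sequence pairs from scored candidates.

    `candidates` is a list of (sequence, scores) tuples, where `scores`
    holds predicted property values. This dominance rule is an assumption,
    not the pairing rule used in the paper.
    """
    pairs = []
    for (seq_a, s_a), (seq_b, s_b) in combinations(candidates, 2):
        if dominates(s_a, s_b):
            pairs.append((seq_a, seq_b))
        elif dominates(s_b, s_a):
            pairs.append((seq_b, seq_a))
        # Candidates that trade one property for another yield no pair.
    return pairs

# Toy candidates for one backbone; the scores are made up, not predictor output.
toy_candidates = [
    ("MKTA", {"solubility": 0.90, "thermostability": 0.80}),
    ("MKSA", {"solubility": 0.50, "thermostability": 0.60}),
    ("MRTA", {"solubility": 0.95, "thermostability": 0.30}),
]
toy_pairs = build_preference_pairs(toy_candidates)
```

Under this rule only unambiguous comparisons become training pairs; sequences that improve one property at the expense of another are left unpaired, which is one simple way to avoid feeding conflicting preferences to the optimizer.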
Key Points
- ▸ ProtAlign is a multi-objective preference alignment framework for protein sequence design.
- ▸ The framework fine-tunes pretrained inverse folding models using a semi-online Direct Preference Optimization strategy with a flexible preference margin, with preference pairs built from in silico property predictors.
- ▸ Applied to ProteinMPNN, the resulting model, MoMPNN, enhances developability without compromising designability.
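The abstract does not define "semi-online". One common reading is that preference data is resampled from the current model only at intervals, rather than never (offline) or at every step (online). The sketch below illustrates that reading only; `semi_online_training`, `refresh_every`, and the two callbacks are hypothetical names and do not come from the paper.

```python
def semi_online_training(num_steps, refresh_every, sample_pairs, update_policy):
    """Run a training loop that rebuilds preference pairs every `refresh_every` steps.

    `sample_pairs(step)` stands in for sampling sequences from the current
    model and scoring them with property predictors; `update_policy(pairs)`
    stands in for one optimization step. Both are placeholders.
    Returns the steps at which the pairs were refreshed.
    """
    refresh_steps = []
    pairs = []
    for step in range(num_steps):
        if step % refresh_every == 0:
            # Pairs are on-policy at refresh time and grow stale until the next refresh.
            pairs = sample_pairs(step)
            refresh_steps.append(step)
        update_policy(pairs)
    return refresh_steps

# Usage with no-op placeholders: 10 steps, refreshing every 4 steps.
refreshed_at = semi_online_training(10, 4, lambda step: [], lambda pairs: None)
```

In this reading, `refresh_every` trades sampling cost against how closely the preference data tracks the model being trained.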
Merits
Robustness to Multiple Objectives
ProtAlign aligns a single model to several, often competing, developability properties at once, such as solubility, thermostability, and expression.
Flexibility
The semi-online Direct Preference Optimization strategy uses a flexible preference margin to mitigate conflicts among competing objectives, and the approach is evaluated on several distinct design tasks.
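As a rough illustration of how a preference margin enters a DPO-style objective, the sketch below computes the standard DPO loss for one pair, -log σ(β[(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]), with a margin subtracted inside the sigmoid. The placement of the margin and the `flexible_margin` rule, which scales the margin by the predicted property gap, are assumptions; the abstract does not give ProtAlign's actual formula.

```python
import math

def flexible_margin(score_chosen, score_rejected, scale=1.0):
    """Hypothetical margin rule: a larger predicted property gap asks for a larger margin."""
    return scale * max(score_chosen - score_rejected, 0.0)

def dpo_margin_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1, margin=0.0):
    """DPO loss for one (chosen, rejected) pair with an additive margin.

    logp_w / logp_l: sequence log-probabilities under the model being tuned.
    ref_logp_w / ref_logp_l: the same under the frozen reference model.
    """
    # Scaled difference of implicit rewards between chosen and rejected.
    delta = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    z = delta - margin
    # -log(sigmoid(z)) written as a numerically stable softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

# Toy numbers only: the same pair scored without and with a margin.
loss_plain = dpo_margin_loss(-10.0, -12.0, -11.0, -11.0)
loss_margin = dpo_margin_loss(-10.0, -12.0, -11.0, -11.0,
                              margin=flexible_margin(0.9, 0.4))
```

With a positive margin the loss stays higher until the chosen sequence beats the rejected one by more than the margin, so pairs with a clear predicted advantage push the model harder than near-ties.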
Preservation of Structural Fidelity
ProtAlign fine-tunes pretrained inverse folding models toward developability objectives while preserving structural fidelity, that is, the ability to recover the target backbone.
Demerits
Dependence on Pretrained Models
ProtAlign fine-tunes an existing pretrained inverse folding model (ProteinMPNN in the reported experiments), so it presupposes that a suitable pretrained model is available for the design problem at hand.
Complexity of Hyperparameter Tuning
The flexible preference margin and the semi-online Direct Preference Optimization strategy introduce hyperparameters of their own and may require careful tuning, the same burden the abstract attributes to existing approaches.
Limited Generalizability
ProtAlign's performance may not extend beyond the backbones and tasks evaluated: CATH 4.3 crystal structures, de novo generated backbones, and binder design scenarios.
Expert Commentary
ProtAlign is a meaningful step for protein sequence design because it addresses limitations of existing approaches, namely target dependence and the need for domain expertise or careful hyperparameter tuning. However, its dependence on pretrained models and its own potential need for hyperparameter tuning may limit adoption in practice. Additional research is also needed to establish how well its performance generalizes beyond the evaluated tasks. Overall, ProtAlign is a promising framework that could enhance the developability of designed protein sequences and support the design of novel biotherapeutics and vaccines.
Recommendations
- ✓ Further research is needed to evaluate how well ProtAlign's performance generalizes across different backbones and design tasks.
- ✓ New methods for hyperparameter tuning, and extensions of ProtAlign to other protein design tasks, would enhance its practical application.