[Figure caption fragment: Agent_MPO is slower than Agent_HER2 at learning good HER2 specificity. C. Learning curve of FvNetCharge.]

antibody sequences with a higher success rate than the traditional propose-then-filter approach. It has the potential to be used in practical antibody design, thus empowering the antibody discovery and development process. The source code of AB-Gen is freely available at Zenodo (https://doi.org/10.5281/zenodo.7657016) and BioCode (https://ngdc.cncb.ac.cn/biocode/tools/BT007341).

Keywords: Protein design, Transformer, Reinforcement learning, Generative modeling, Multi-objective optimization

== Introduction ==

Antibodies have become an increasingly important class of therapeutics for many diseases because of their ability to bind antigens with high specificity and affinity[1],[2]. To discover antibodies with high specificity, hybridoma and phage display methods are typically used to identify potential lead candidates. However, the lead optimization process usually takes up the majority of the preclinical discovery and development cycle. During this process, the lead candidates are further optimized for multiple properties, including pharmacokinetics, solubility, viscosity, expression levels, and immunogenicity[3],[4],[5]. This is largely due to the low throughput of late-stage development and because addressing one issue often causes another[6].

In recent years, especially after the success of AlphaFold2[7], de novo protein design has gained attention, and several methods have been developed to design proteins with certain structures[8],[9]. For example, RFDesign was proposed to design proteins with specific functions, such as immunogens, enzyme activity, and protein-protein interactions[10]. These methods are guided by structure-based constraints and aim to design novel protein sequences with particular structural patterns, and thus new functions[8],[10],[11].
Though promising, these methods are not designed to optimize properties that have no clear association with structure, such as solubility and viscosity, and are thus not suitable for the multi-property optimization task in antibody design.

In silico antibody design is an emerging topic with notable progress. A few deep learning methods have been proposed to generate novel antibody sequences. An auto-regressive dilated convolutional neural network was trained on 1.2 million natural nanobody sequences and used to generate complementarity determining region 3 (CDR3) sequences[12]. The designed library, filtered from the model-generated sequences, showed better expression than a 1000-fold larger synthetic library, demonstrating the power of generative models in learning the space of antibodies that can be expressed. Another work pre-trained a long short-term memory (LSTM) network[13] on 70,000 heavy chain complementarity determining region 3 (CDRH3) sequences and fine-tuned it on molecular docking datasets or with experimentally validated predictors to generate high-affinity sequences against antigens[6],[14]. Transformers[15] have also been used to design antibody sequences. One work[16] used a Transformer decoder[17] to generate CDRH3 sequences. The model was trained on 558 million antibody variable region sequences, conditioned on chain type and species of origin, and produced better designs than random baselines. Another work[18] used a Transformer encoder to separate human and non-human sequences with high accuracy, thus guiding the humanization of antibody sequences. Although these studies showed the power of generative models to learn useful information from antibody sequences, none of them aimed to solve the multi-property optimization problem in antibody design.
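The generate-then-filter workflow mentioned above, where a library is filtered from model-generated sequences, can be illustrated with a minimal Python sketch. It is not taken from any of the cited works: the two scorers (`net_charge`, `hydrophobic_fraction`), the thresholds, and the candidate sequences are hypothetical placeholders chosen only to show how requiring several property constraints at once lowers the fraction of candidates that survive filtering.

```python
from typing import Callable, Dict, List, Tuple

# (scorer, minimum, maximum): a sequence passes when minimum <= scorer(seq) <= maximum.
Constraint = Tuple[Callable[[str], float], float, float]


def net_charge(seq: str) -> float:
    """Toy proxy (assumption): count of K/R minus count of D/E residues."""
    return float(sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE"))


def hydrophobic_fraction(seq: str) -> float:
    """Toy proxy (assumption): fraction of residues in a fixed hydrophobic set.

    Expects a non-empty sequence.
    """
    return sum(seq.count(a) for a in "AILMFVW") / len(seq)


def passes(seq: str, constraints: Dict[str, Constraint]) -> bool:
    """A sequence succeeds only if every property lies inside its allowed range."""
    return all(low <= scorer(seq) <= high for scorer, low, high in constraints.values())


def filter_library(candidates: List[str], constraints: Dict[str, Constraint]) -> List[str]:
    """Keep the proposed sequences that satisfy all constraints."""
    return [seq for seq in candidates if passes(seq, constraints)]


def success_rate(candidates: List[str], constraints: Dict[str, Constraint]) -> float:
    """Fraction of proposed sequences that survive filtering (0.0 for an empty list)."""
    if not candidates:
        return 0.0
    return len(filter_library(candidates, constraints)) / len(candidates)


# Placeholder candidates standing in for model-generated sequences.
candidates = ["ARDYW", "KKRGG", "DDEGS", "AILVK"]

# Hypothetical thresholds, for illustration only.
one_property = {"charge": (net_charge, -1.0, 1.0)}
two_properties = {**one_property, "hydrophobicity": (hydrophobic_fraction, 0.0, 0.5)}

print(success_rate(candidates, one_property))    # 0.5
print(success_rate(candidates, two_properties))  # 0.25
```

Because a candidate must pass every constraint, adding a property can only keep the success rate the same or reduce it, which is one way to see why filtering alone becomes inefficient as the number of required properties grows.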
In this study, we developed a reinforcement learning (RL) framework, called AB-Gen, to design antibody libraries that fulfill multi-property constraints. Specifically, we used AB-Gen to explore the CDRH3 sequence space, which is the most diverse region of antibodies. More than 75 million CDRH3 sequences were obtained from the Observed Antibody Space (OAS) database[19] to train a prior model. A generative pre-trained Transformer (GPT) was used as the policy network of the agent, and the prior model was used to initialize it. We trained AB-Gen under two different settings to illustrate the improvement gained from multi-property optimization. In the first setting, an agent named Agent_HER2 was trained to optimize only human epidermal growth factor receptor-2 (HER2) specificity[6]. In the second setting, another agent, named Agent_MPO, was trained to optimize multiple desirable properties, including HER2 specificity, major histocompatibility complex (MHC) II affinity[20], clearance, and viscosity[4]. The results showed that the prior model could learn the sequence space of CDRH3s and generate sequences with property distributions similar to those of the training dataset. In addition, both Agent_HER2 and Agent_MPO were capable of generating novel CDRH3 sequences that fulfilled the predefined property constraints, but Agent_MPO achieved a clearly higher success rate in generating sequences with desirable properties. Finally, an antibody library targeting HER2 was designed, and highly conserved residues were found among the generated sequences. The importance.