CN116741260B - Antibody structure optimization method and device based on deep learning model

Info

Publication number
CN116741260B
Authority
CN
China
Prior art keywords
antibody
antigen
optimization
model
file
Prior art date
Legal status
Active
Application number
CN202311009931.3A
Other languages
Chinese (zh)
Other versions
CN116741260A
Inventor
司马鹏
Current Assignee
Suzhou Chuangteng Software Co ltd
Original Assignee
Suzhou Chuangteng Software Co ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Chuangteng Software Co ltd
Priority to CN202311009931.3A
Publication of CN116741260A
Application granted
Publication of CN116741260B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Biotechnology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Bioethics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an antibody structure optimization method and device based on a deep learning model. The method comprises: obtaining an antigen structure file and an antibody structure file, and performing non-standard residue processing on each of them; docking the processed antigen structure file with the processed antibody structure file to obtain an antigen-antibody complex; and inputting the antigen-antibody complex as an input file into a pre-trained antibody structure optimization model, which outputs an optimization result according to a pre-input optimization task. The antibody structure optimization model is obtained by training on antigen-antibody complex samples and optimization task parameters based on a diffusion probability model and an equivariant neural network. This solves the technical problem in the prior art that antibody sequences and structures with specific functions cannot be generated efficiently.

Description

Antibody structure optimization method and device based on deep learning model
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an antibody structure optimization method and device based on a deep learning model.
Background
Antibodies are important immune proteins produced during immune responses and are widely developed for the treatment of cancer, infectious diseases, inflammation and other conditions. An antibody consists of two heavy chains and two light chains with similar overall structures. Six variable regions, known as complementarity determining regions (CDRs) and denoted h_cdr1, h_cdr2, h_cdr3, l_cdr1, l_cdr2 and l_cdr3 (H for the heavy chain, L for the light chain), determine the antibody's specificity for its antigen. The most important step in developing an effective therapeutic antibody is therefore to design CDRs that bind a particular antigen; the evolutionary process by which antibodies come to bind antigens with higher affinity and specificity is known as affinity maturation. Traditional experimental methods for improving antibody affinity are expensive, complex and time-consuming.
Published computational methods for antibody optimization and design rely on sampling protein sequences and structures from complex biophysical energy functions, which is often time-consuming and prone to getting trapped in local optima. Meanwhile, deep learning models are transforming protein structure prediction, engineering and design. Some deep-learning-based protein design methods already exist, but antibody design focuses mainly on the CDRs, which are highly variable and flexible, making the design task particularly difficult. In recent years, deep learning methods that generate protein sequences with language models have shown promise for antibody design. Although such sequence-based methods are more efficient, they can only produce new antibodies resembling previously observed ones, and they struggle to produce antibodies that target a specific antigen structure.
Therefore, providing a method and device for optimizing antibody structures based on a deep learning model that can efficiently generate antibody sequences and structures with specific functions is a problem to be solved by those skilled in the art.
Disclosure of Invention
Therefore, the embodiment of the invention provides an antibody structure optimization method and device based on a deep learning model, so as to solve the technical problem that an antibody sequence and structure with specific functions cannot be generated efficiently in the prior art.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
The invention provides an antibody structure optimization method based on a deep learning model, comprising the following steps:
obtaining an antigen structure file and an antibody structure file, and performing non-standard residue processing on each of them;
docking the processed antigen structure file with the processed antibody structure file to obtain an antigen-antibody complex;
inputting the antigen-antibody complex as an input file into a pre-trained antibody structure optimization model, so as to output an optimization result according to a pre-input optimization task;
wherein the antibody structure optimization model is obtained by training on antigen-antibody complex samples and optimization task parameters based on a diffusion probability model and an equivariant neural network.
In some embodiments, obtaining the antibody structure optimization model by training on antigen-antibody complex samples and optimization task parameters based on a diffusion probability model and an equivariant neural network specifically comprises the following steps:
determining, based on the diffusion probability model, the Markov chains of a forward diffusion process and a generative diffusion process;
generating an initial distribution state according to the antigen-antibody complex sample and the optimization task parameters;
inputting the initial distribution state into the diffusion probability model, and gradually adding noise to the data through the forward diffusion process until the data distribution approximately reaches the prior distribution state;
starting from the prior distribution state, using the equivariant neural network to iteratively transform it into a generated distribution state through the generative diffusion process, so as to obtain the antibody structure optimization model.
In some embodiments, using the equivariant neural network to iteratively transform the prior distribution state into the generated distribution state through the generative diffusion process specifically comprises:
taking the antigen-antibody complex sample as input and initializing the CDRs of the antibody with an arbitrary sequence, position and orientation;
aggregating information from the antigen and the antibody framework, and iteratively updating the amino acid type, position and orientation of each amino acid in the CDRs to obtain the generated distribution state.
In some embodiments, the optimization task includes at least one of:
sequence-structure co-design;
sequence design for a given backbone structure;
structure design for a given sequence;
antibody regeneration.
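One way to picture the four optimization tasks is as switches over which CDR properties the generative process may change. The minimal Python sketch below is illustrative only: the names `DesignTask`, `TASKS` and `generation_masks` are hypothetical, and treating antibody regeneration as the only task that ignores the original CDR state is an assumption drawn from the later description of that task.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DesignTask:
    """Hypothetical task flags; names are illustrative, not from the patent."""
    label: str
    generate_sequence: bool    # CDR amino-acid types are re-sampled
    generate_structure: bool   # CDR positions/orientations are re-sampled
    reference_original: bool   # original CDR state is used as context (assumption)


TASKS: Dict[str, DesignTask] = {
    "codesign": DesignTask("sequence-structure co-design", True, True, True),
    "fixed_backbone": DesignTask("sequence design for a given backbone", True, False, True),
    "fixed_sequence": DesignTask("structure design for a given sequence", False, True, True),
    "regeneration": DesignTask("antibody regeneration", True, True, False),
}


def generation_masks(task: DesignTask, cdr_mask: List[bool]) -> Tuple[List[bool], List[bool]]:
    """Return per-residue masks marking which sequence/structure entries are denoised.

    Residues outside the selected CDR (framework and antigen) are never changed.
    """
    seq_mask = [is_cdr and task.generate_sequence for is_cdr in cdr_mask]
    struct_mask = [is_cdr and task.generate_structure for is_cdr in cdr_mask]
    return seq_mask, struct_mask
```

For example, with the fixed-backbone task only the sequence mask is set inside the CDR, so the sampler would mutate residues while leaving the backbone untouched.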
In some embodiments, the diffusion probability model comprises two parameterized Markov chains and uses variational inference to generate samples consistent with the original data distribution after a finite time;
the Markov chains comprise a forward chain and a reverse chain. The forward chain gradually adds Gaussian noise to the data according to a pre-designed noise schedule until the data distribution approaches the prior distribution; the reverse chain starts from the given prior and learns to gradually recover the original data distribution using parameterized Gaussian transition kernels, thereby yielding the antibody structure optimization model.
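For the Gaussian (coordinate) part of the model, the two chains described above are commonly written in the standard denoising-diffusion form below. This is a sketch under the assumption that the patent follows the usual DDPM parameterization; the noise schedule β_t, the number of steps T, the variance σ_t² and the mean network μ_θ are notation introduced here, not taken from the source.

```latex
\begin{align*}
q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) &= \mathcal{N}\big(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t\mathbf{I}\big), \quad t = 1,\dots,T\\
q(\mathbf{x}_t \mid \mathbf{x}_0) &= \mathcal{N}\big(\mathbf{x}_t;\ \sqrt{\bar\alpha_t}\,\mathbf{x}_0,\ (1-\bar\alpha_t)\mathbf{I}\big), \quad \bar\alpha_t = \prod_{s=1}^{t}(1-\beta_s)\\
p(\mathbf{x}_T) &= \mathcal{N}(\mathbf{0}, \mathbf{I})\\
p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) &= \mathcal{N}\big(\mathbf{x}_{t-1};\ \boldsymbol{\mu}_\theta(\mathbf{x}_t, t),\ \sigma_t^2\mathbf{I}\big)
\end{align*}
```

The first line is the forward chain; the second is its closed form, which lets training jump directly to any step t; the last two are the standard Gaussian prior and the learned reverse transition kernel, whose mean would be supplied by the equivariant network.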
In some embodiments, the initial distribution state of the diffusion probability model includes the original structure, sequence, and CDR states of the antigen-antibody complex.
In some embodiments, the architecture of the equivariant neural network comprises an embedding layer, an encoding layer and an output layer, wherein the embedding layer comprises an amino acid embedding layer and a pairwise embedding layer;
the amino acid embedding layer is used to extract the amino acid type embedding, the local coordinates of the amino acid's heavy atoms, the amino acid backbone dihedral angles and the CDR region tag;
the pairwise embedding layer is used to extract, for each pair of amino acids, their amino acid types, relative sequence position and pairwise distance;
the encoding layer is used to encode the current diffusion state so as to capture the relationships among amino acids and provide a high-level representation of each residue for denoising;
the output layer is used to output the diffusion state result obtained by the encoding layer.
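The pairwise inputs listed above can be sketched as follows. This is a hypothetical NumPy illustration, not the patent's implementation: it assumes residues are encoded as integer types 0 to 19, uses C-alpha coordinates for the pairwise distance, and clips the relative sequence offset to an assumed window of plus or minus 32, with one extra bucket for residue pairs on different chains.

```python
import numpy as np


def pair_features(aa_types, ca_coords, chain_ids, max_rel=32):
    """Hypothetical pairwise inputs for residues i, j.

    aa_types: (N,) ints in 0..19; ca_coords: (N, 3); chain_ids: (N,) chain labels.
    Returns (type_pair, rel_pos, dist), each of shape (N, N).
    """
    n = len(aa_types)
    # Joint index of the two residue types, 0..399.
    type_pair = aa_types[:, None] * 20 + aa_types[None, :]
    # Relative sequence offset j - i, clipped to [-max_rel, max_rel] and shifted to >= 0.
    idx = np.arange(n)
    rel_pos = np.clip(idx[None, :] - idx[:, None], -max_rel, max_rel) + max_rel
    # Pairs on different chains share one extra bucket.
    same_chain = chain_ids[:, None] == chain_ids[None, :]
    rel_pos = np.where(same_chain, rel_pos, 2 * max_rel + 1)
    # Euclidean distance between C-alpha atoms.
    dist = np.linalg.norm(ca_coords[:, None, :] - ca_coords[None, :, :], axis=-1)
    return type_pair, rel_pos, dist
```

In a full model each of these integer or distance features would be passed through its own embedding (for example a lookup table or a radial basis expansion) before being combined into the pair representation.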
The invention also provides an antibody structure optimization device based on the deep learning model, which comprises:
a data acquisition unit, configured to obtain an antigen structure file and an antibody structure file and perform non-standard residue processing on each of them;
a data processing unit, configured to dock the processed antigen structure file with the processed antibody structure file to obtain an antigen-antibody complex;
a result generation unit, configured to input the antigen-antibody complex as an input file into a pre-trained antibody structure optimization model, so as to output an optimization result according to a pre-input optimization task;
wherein the antibody structure optimization model is obtained by training on antigen-antibody complex samples and optimization task parameters based on a diffusion probability model and an equivariant neural network.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as described above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as described above.
In the antibody structure optimization method and device based on a deep learning model provided by the invention, an antigen structure file and an antibody structure file are obtained and each undergoes non-standard residue processing; the processed antigen structure file is docked with the processed antibody structure file to obtain an antigen-antibody complex; and the complex is input as an input file into a pre-trained antibody structure optimization model, which outputs an optimization result according to a pre-input optimization task. This solves the technical problem in the prior art that antibody sequences and structures with specific functions cannot be generated efficiently.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only and that other implementations can be obtained from the extensions of the drawings provided without inventive effort.
The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the ambit of the technical disclosure.
FIG. 1 is a flowchart of an antibody structure optimization method based on a deep learning model according to the present invention;
FIG. 2 is a schematic flow chart of parameter selection in the method for optimizing antibody structure provided by the invention;
FIG. 3 is a second flowchart of the method for optimizing antibody structure based on deep learning model according to the present invention;
FIG. 4 is a third flowchart of the method for optimizing antibody structure based on deep learning model according to the present invention;
FIG. 5 is a fourth flowchart of the method for optimizing antibody structure based on deep learning model according to the present invention;
FIG. 6 is one of the flow charts of model training provided by the present invention;
FIG. 7 is a second flow chart of model training provided by the present invention;
FIG. 8 is a first protein structure diagram of an embodiment of the present invention;
FIG. 9 is a second protein structure diagram of an embodiment of the present invention;
FIG. 10 is a block diagram of an antibody structure optimization device based on a deep learning model according to the present invention;
FIG. 11 is a block diagram of a computer device according to the present invention.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following detailed description, which illustrates some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
To solve the technical problem in the prior art that antibody sequences and structures with specific functions cannot be generated efficiently, the invention provides an antibody structure optimization method based on a deep learning model. Using an equivariant diffusion probability model, the deep learning model optimizes the complementarity determining regions (CDRs) of an existing antibody in the context of a specific antigen, improving the antibody's antigen binding activity so as to achieve affinity maturation and greatly reduce experimental cost. In practical use, the antibody structure optimization model can be embedded into OpenMM (an open-source molecular simulation package) to achieve antibody design at atomic resolution: after optimization, the side-chain atoms of the antigen-antibody complex are repaired and the energy is minimized.
Referring to fig. 1, fig. 1 is a flowchart of an antibody structure optimization method based on a deep learning model according to the present invention.
In a specific embodiment, the method for optimizing the antibody structure based on the deep learning model provided by the invention comprises the following steps:
S110: obtaining an antigen structure file and an antibody structure file, and performing non-standard residue processing on each of them;
S120: docking the processed antigen structure file with the processed antibody structure file to obtain an antigen-antibody complex;
S130: inputting the antigen-antibody complex as an input file into a pre-trained antibody structure optimization model so as to output an optimization result according to a pre-input optimization task. There are four types of optimization task: sequence-structure co-design, sequence design for a given backbone structure, structure design for a given sequence, and antibody regeneration.
The antibody structure optimization model is obtained by training on antigen-antibody complex samples and optimization task parameters based on a diffusion probability model and an equivariant neural network.
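The patent does not specify how non-standard residues are processed in S110. A common approach, shown here as a minimal sketch under that assumption, is to map modified amino acids back to their parent residue and drop waters, ions and ligands from the PDB-format ATOM/HETATM records. The mapping table and the function name `clean_pdb_lines` are hypothetical; atoms specific to a modification (for example the selenium of MSE) are left untouched here and would normally be rebuilt by a structure-repair tool such as PDBFixer.

```python
from typing import List

# Hypothetical mapping of common modified residues to their parent amino acids.
NONSTANDARD_TO_PARENT = {
    "MSE": "MET",  # selenomethionine
    "SEP": "SER",  # phosphoserine
    "TPO": "THR",  # phosphothreonine
    "PTR": "TYR",  # phosphotyrosine
    "HYP": "PRO",  # hydroxyproline
    "CSO": "CYS",  # S-hydroxycysteine
}

STANDARD_AA = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def clean_pdb_lines(lines: List[str]) -> List[str]:
    """Rename modified residues to their parents and drop non-protein records."""
    cleaned = []
    for line in lines:
        record = line[:6]
        if record not in ("ATOM  ", "HETATM"):
            cleaned.append(line)  # keep TER, END, headers, etc.
            continue
        res_name = line[17:20]    # residue name occupies PDB columns 18-20
        if res_name in NONSTANDARD_TO_PARENT:
            line = "ATOM  " + line[6:17] + NONSTANDARD_TO_PARENT[res_name] + line[20:]
        elif res_name not in STANDARD_AA:
            continue              # water, ions, ligands and unknown residues are dropped
        cleaned.append(line)
    return cleaned
```

Working on fixed PDB columns keeps the sketch dependency-free; a production pipeline would more likely use a structure library that understands the full format.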
In practical use, the antibody structure optimization model is embedded into an OpenMM-based program as an antibody CDR generation component on the interface of an open-source platform. The component has a "number of output models" option, which sets how many CDR-optimized antigen-antibody complex models the algorithm outputs. In the algorithm flow, the overall parameter selection is as shown in FIG. 2: optimization settings and output settings are configured from the input component parameters, and the optimization settings determine which CDR region and which optimization region are regenerated. After the CDR region is regenerated, back mutation is performed if generation is complete, with a configurable number of back-mutation steps; if generation is not complete, the structural features are retained so as to preserve the original CDR backbone and sequence. For an existing antigen-antibody complex, no antigen-antibody docking step is needed, and the workflow shown in FIG. 3 can be used directly to optimize the antigen binding activity of the existing antibody. The output of the antibody structure optimization model includes the name of each output antibody (derived from the input antibody name, the selected CDR region and the number of output models), the root mean square deviation (RMSD) between the optimized structure and the original structure, the optimized structure file, the OpenMM-repaired structure file, and a structure display.
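The RMSD reported in the output can be computed as in the sketch below, which superimposes the optimized coordinates onto the original ones with the Kabsch algorithm before measuring the deviation. Which atoms are compared (for example CDR C-alpha atoms only), and whether superposition is done on the framework instead of on the compared atoms, is not stated in the source; both are assumptions left to the caller.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between (N, 3) coordinate sets P and Q after optimal rigid superposition."""
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c                              # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T      # rotation mapping P onto Q
    diff = P_c @ R.T - Q_c
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


# Usage with toy coordinates: a rotated and shifted copy should give RMSD near 0.
rng = np.random.default_rng(0)
original = rng.normal(size=(15, 3))
c, s = np.cos(0.5), np.sin(0.5)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
optimized = original @ Rz.T + np.array([1.0, 2.0, 3.0])
rmsd_rigid = kabsch_rmsd(original, optimized)
```

Superposing first ensures that a structure which has merely been moved or rotated, rather than changed, is not penalized.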
In some embodiments, as shown in FIG. 4, obtaining the antibody structure optimization model by training on antigen-antibody complex samples and optimization task parameters based on a diffusion probability model and an equivariant neural network specifically comprises the following steps:
S410: determining, based on the diffusion probability model, the Markov chains of a forward diffusion process and a generative diffusion process; the initial distribution state of the diffusion probability model comprises the original structure, sequence and CDR state of the antigen-antibody complex;
S420: generating an initial distribution state according to the antigen-antibody complex sample and the optimization task parameters;
S430: inputting the initial distribution state into the diffusion probability model, and gradually adding noise to the data through the forward diffusion process until the data distribution approximately reaches the prior distribution state;
S440: starting from the prior distribution state, using the equivariant neural network to iteratively transform it into a generated distribution state through the generative diffusion process, so as to obtain the antibody structure optimization model.
Specifically, the flow based on the diffusion probability model and the equivariant neural network is as follows:
1. The diffusion probability model defines the Markov chains of two diffusion processes: a forward diffusion process and a generative diffusion process. The forward diffusion process gradually adds noise to the data until the data distribution approximates the prior distribution. The generative diffusion process starts from the prior distribution and iteratively transforms it into the desired distribution.
2. Model training relies on the forward diffusion process to produce noisy data.
3. An equivariant neural network parameterizes the generative diffusion process; each layer is equivariant and can therefore handle spatial transformations such as rotations and translations.
4. A sampling algorithm is used for a variety of antibody design tasks, including sequence-structure co-design, CDR design with a fixed framework, antibody optimization, and the like.
In the above embodiment, the optimization method learns to generate data by denoising samples drawn from the prior distribution, and the equivariant neural network can process non-Euclidean data, handling operations such as rotations and translations of a protein structure consistently. Combining the two allows sequences and structures to be modeled jointly and protein molecules with specific functions to be generated. The resulting model can therefore be highly competitive in binding affinity, as measured by biophysical energy functions, and in other protein design metrics. The invention first builds and trains the equivariant diffusion deep learning model, and then uses the trained model to complete various antibody design tasks.
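The statement that an equivariant model handles rotations and translations consistently can be checked numerically. In this illustrative sketch, pairwise distances are an invariant feature (unchanged by a rigid motion), while the centroid is an equivariant quantity (it moves together with the structure). The helper names are hypothetical and the coordinates are random toy data.

```python
import numpy as np


def random_rotation(rng):
    """Random 3x3 rotation matrix (orthogonal, determinant +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # make the factorization unique
    if np.linalg.det(q) < 0:     # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q


def pairwise_distances(x):
    """Invariant feature: (N, N) matrix of Euclidean distances."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)


def centroid(x):
    """Equivariant quantity: moves with the structure under rotation/translation."""
    return x.mean(axis=0)


rng = np.random.default_rng(0)
x = rng.normal(size=(30, 3))              # toy C-alpha coordinates
R = random_rotation(rng)
t = np.array([5.0, -2.0, 1.0])
x_moved = x @ R.T + t                     # rigid motion applied to every atom
```

An equivariant network is built so that every layer behaves like `centroid` for geometric outputs and like `pairwise_distances` for scalar outputs, which is why it needs no data augmentation over poses.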
In this embodiment, the invention optimizes the antigen binding activity of an existing antibody to achieve affinity maturation, using a deep generative model, based on a diffusion probability model and an equivariant neural network, that jointly models the sequence and structure of the antibody's complementarity determining regions (CDRs). Importantly, the joint distribution of the CDR sequences and structures is conditioned directly on the antigen structure. Given as input a protein complex consisting of the antigen and the antibody framework, the CDRs are initialized with an arbitrary sequence, position and orientation. The diffusion probability model first aggregates information from the antigen and the antibody framework and then iteratively updates the amino acid type, position and orientation of each amino acid in the CDRs. The model supports four design functions: sequence-structure co-design; sequence design for a given backbone structure (the backbone is fixed and only residue mutations are allowed); structure design for a given sequence (the sequence is unchanged and only the structure is optimized); and antibody regeneration (the optimization does not refer to the original CDR structure). Finally, side chains are filled in using OpenMM, reconstructing the CDR structure at the atomic level. The antibody structure optimization model is embedded in an open-source system as an antibody CDR generation component, and the processing flow on the platform is shown in FIG. 5.
In some embodiments, using the equivariant neural network to iteratively transform the prior distribution state into the generated distribution state through the generative diffusion process specifically comprises:
taking the antigen-antibody complex sample as input and initializing the CDRs of the antibody with an arbitrary sequence, position and orientation;
aggregating information from the antigen and the antibody framework, and iteratively updating the amino acid type, position and orientation of each amino acid in the CDRs to obtain the generated distribution state.
The diffusion probability model comprises two parameterized Markov chains and uses variational inference to generate samples consistent with the original data distribution after a finite time. The Markov chains comprise a forward chain and a reverse chain: the forward chain gradually adds Gaussian noise to the data according to a pre-designed noise schedule until the data distribution approaches the prior distribution; the reverse chain starts from the given prior and learns to gradually recover the original data distribution using parameterized Gaussian transition kernels, thereby yielding the antibody structure optimization model.
Specifically, in one use scenario, the flow of training the antibody structure optimization model based on the diffusion probability model and the equivariant neural network is shown in FIG. 6. The diffusion probability model consists of two parameterized Markov chains and uses variational inference to generate samples consistent with the original data distribution after a finite time. The forward chain perturbs the data, gradually adding Gaussian noise according to a pre-designed noise schedule until the data distribution approaches the prior, i.e., a standard Gaussian distribution. The reverse chain starts from the given prior and uses parameterized Gaussian transition kernels to learn to recover the original data distribution step by step, yielding a highly flexible generative model that remains computationally tractable.
The data set used to train the model comes from the SAbDab database, from which structures with a resolution worse than 4 Å are removed. The original structure, sequence and CDR state of the antigen-antibody complexes in the data set constitute the initial distribution state of the diffusion probability model.
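Resolution filtering of the training set can be sketched as below. The sketch assumes a tab-separated summary table with `pdb` and `resolution` columns (the column names are an assumption, not a documented SAbDab interface), keeps entries at 4.0 Å or better, and excludes entries whose resolution is missing or non-numeric.

```python
import csv
import io
from typing import List


def filter_by_resolution(tsv_text: str, max_resolution: float = 4.0) -> List[str]:
    """Return PDB IDs whose resolution is at most max_resolution (in angstroms)."""
    kept = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        try:
            resolution = float(row["resolution"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric resolution: excluded (assumption)
        if resolution <= max_resolution:
            kept.append(row["pdb"])
    return kept
```

Whether the boundary value of exactly 4.0 Å is kept is not specified in the source; the `<=` comparison here is a choice made for the sketch.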
In the forward diffusion process, Gaussian noise is added along the forward chain until the data distribution approaches the prior distribution (a standard Gaussian distribution). The noise is added mainly to the backbone atom coordinates of the antigen-antibody complex, and the coordinates of the whole structure can be translated and scaled so that the distribution of atom coordinates approximately matches a standard normal distribution; in this way the data set enters the prior distribution state.
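A minimal NumPy sketch of this forward step, assuming backbone coordinates of shape (residues, 4 atoms, 3) and a standard DDPM linear noise schedule: the coordinates are centered and divided by a fixed scale, then a noisy sample at step t is drawn in closed form. The scale of 10.0, the schedule endpoints and T = 1000 are hypothetical values, not taken from the patent.

```python
import numpy as np


def normalize_coords(x, scale=10.0):
    """Center all backbone atoms at the origin and divide by a fixed scale."""
    center = x.reshape(-1, 3).mean(axis=0)
    return (x - center) / scale, center


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    eps = rng.normal(size=x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


# Hypothetical usage: 50 residues x 4 backbone atoms (N, CA, C, O) x 3 coordinates.
rng = np.random.default_rng(0)
backbone = rng.normal(loc=20.0, scale=15.0, size=(50, 4, 3))
x0, center = normalize_coords(backbone)
betas = np.linspace(1e-4, 0.02, 1000)      # linear noise schedule, T = 1000
alpha_bar = np.cumprod(1.0 - betas)        # abar_t = prod_{s<=t} (1 - beta_s)
x_t, eps = q_sample(x0, 999, alpha_bar, rng)
```

With this schedule the cumulative product at the final step is tiny, so x_t at the last step is essentially standard Gaussian noise, which matches the prior state described above.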
The reverse chain starts from the prior distribution state and uses parameterized Gaussian transition kernels to learn to gradually restore the original data distribution; this is the generative diffusion process of the invention. In this process, the Gaussian transition kernels are parameterized by the equivariant neural network, which generates new antigen-antibody structures by denoising. By constraining part of the state during training and sampling, the invention can carry out the four design tasks described in the technical solution, and the sampled new antigen-antibody structure state is the generated distribution state of the invention.
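The reverse chain can be illustrated with standard DDPM ancestral sampling restricted to CDR positions. This is a sketch under assumptions: only coordinates are denoised (the patent also updates amino acid types and orientations), `eps_model` is a hypothetical placeholder standing in for the equivariant neural network, and the variance choice σ_t² = β_t is one common option.

```python
import numpy as np


def p_sample_loop(x_T, cdr_mask, eps_model, betas, rng):
    """DDPM ancestral sampling that only updates residues flagged in cdr_mask.

    x_T: (N, 3) coordinates; CDR rows start as Gaussian noise, other rows hold
    the fixed antigen/framework context. eps_model(x, t) predicts the noise.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = x_T.copy()
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = eps_model(x, t)
        # Model mean of p(x_{t-1} | x_t) under the noise-prediction parameterization.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.normal(size=x.shape) if t > 0 else np.zeros_like(x)
        x_new = mean + np.sqrt(betas[t]) * noise
        # Antigen and framework coordinates are kept fixed; only CDR rows move.
        x = np.where(cdr_mask[:, None], x_new, x)
    return x


# Toy usage with a placeholder noise predictor (hypothetical; not a trained model).
rng = np.random.default_rng(0)
context = rng.normal(size=(12, 3))
cdr_mask = np.zeros(12, dtype=bool)
cdr_mask[4:8] = True
x_T = np.where(cdr_mask[:, None], rng.normal(size=(12, 3)), context)
betas = np.linspace(1e-4, 0.02, 200)
x_gen = p_sample_loop(x_T, cdr_mask, lambda x, t: np.zeros_like(x), betas, rng)
```

Holding non-CDR rows fixed at every step is what conditions the generated CDRs on the antigen and framework; a trained network would read that context when predicting the noise.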
In some embodiments, the network mechanism of the constant neural network comprises an embedded layer, a coding layer and an output layer, wherein the embedded layer comprises an amino acid embedded layer and a pair of embedded layers;
the amino acid embedding layer is used for extracting the embedding of the amino acid type, the local coordinates of heavy atoms of the amino acid, the dihedral angle of an amino acid framework and the marking information of a CDR region;
the paired embedding layers are used for extracting the amino acid types, the sequence relative positions and the paired distances of two amino acids;
the coding layer is used for coding the current diffusion state so as to capture the relation among amino acids and provide advanced characterization for each residue for denoising;
and the output layer is used for outputting the diffusion state result obtained by the coding layer.
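The pairwise features listed above (the types of two amino acids, their relative sequence position and their pairwise distance) can be computed as in the following sketch. It assumes a single chain with consecutive residue indices, uses Cα coordinates for the distance, and clips the sequence offset at a hypothetical `max_offset`; the learned embedding tables that would consume these features are not shown, and the function name is illustrative.

```python
import numpy as np

def pair_features(aa_types, ca_coords, max_offset=32):
    """Return (pair type index, clipped relative position, CA-CA distance) matrices."""
    aa_types = np.asarray(aa_types)
    n = len(aa_types)
    idx = np.arange(n)
    # Relative sequence offset j - i, clipped to [-max_offset, max_offset].
    rel_pos = np.clip(idx[None, :] - idx[:, None], -max_offset, max_offset)
    # Pairwise Euclidean distances between CA atoms.
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Index into a 20 x 20 = 400-entry table of ordered amino acid type pairs.
    aa_pair = aa_types[:, None] * 20 + aa_types[None, :]
    return aa_pair, rel_pos, dist
```

Because pairwise distances and sequence offsets do not change when the complex is rotated or translated, features of this kind are a natural input to a network meant to respect rigid-body symmetry.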
Specifically, as shown in fig. 7, the equivariant neural network mainly consists of an embedding layer, a coding layer and an output layer, wherein the embedding layer consists of an amino acid embedding layer and a paired embedding layer. The amino acid embedding layer extracts the following information: the amino acid type embedding (each of the 20 amino acid types is represented by an embedding vector), the local coordinates of the amino acid heavy atoms, the amino acid backbone dihedral angles, the CDR region tags, and the like. The paired embedding layer extracts information on the relationship between two residues: the amino acid types of the two amino acids, their relative sequence positions, their pairwise distance, and the like. The coding layer, which encodes the current diffusion state, consists of a stack of orientation-aware 3-dimensional multi-head attention layers. Its purpose is to capture the relationships among amino acids and provide a high-level representation of each residue for denoising. To train the model, the invention samples a time step, adds noise to the training samples through the diffusion process in a manner equivariant to spatial rotations and translations, computes the loss from the noised data, and back-propagates the loss to update the model parameters. In the generative diffusion process, at each step the equivariant neural network takes the current antigen-antibody complex and CDR state as input and parameterizes the distributions of the CDR sequence, positions and orientations for the next step. Finally, a full-atom structure is built with a side-chain packing algorithm and output; this is the output layer of the equivariant neural network.
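For the coordinate part, the training procedure in the paragraph above (sample a time step, add noise through the diffusion process, compute a loss and back-propagate) corresponds to the usual noise-prediction objective. The sketch below computes that loss for one sample and noises only residues flagged in a CDR mask. The mean-squared-error loss, the mask handling, the assumed linear schedule and the name `eps_model` are assumptions; back-propagation is left to whatever deep learning framework hosts the real network.

```python
import numpy as np

T = 100
ALPHA_BAR = np.cumprod(1.0 - np.linspace(1e-4, 0.2, T))  # assumed linear schedule

def diffusion_loss(x0, cdr_mask, eps_model, rng):
    """Noise-prediction loss for one training sample; only CDR residues are noised."""
    t = int(rng.integers(1, T + 1))  # uniformly sampled time step in 1..T
    abar = ALPHA_BAR[t - 1]
    eps = rng.standard_normal(x0.shape)
    noised = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps
    # Antigen and framework residues keep their clean coordinates as context.
    x_t = np.where(cdr_mask[:, None], noised, x0)
    eps_hat = eps_model(x_t, t)
    # Mean squared error between predicted and true noise on CDR residues only.
    return float(np.mean((eps_hat[cdr_mask] - eps[cdr_mask]) ** 2))
```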
The root mean square deviation (RMSD) of the CDRs of novel antibodies generated by the trained model is shown in the table below:
To facilitate understanding, the implementation of the antibody structure optimization method provided by the invention is briefly described below, taking the following specific usage scenario as an example.
The method runs on a platform into which the antibody structure optimization model is embedded as an "antibody CDR generation" component. During execution, the antigen and antibody structure files are first read in by a "read structure file" component, and the non-standard residues of the macromolecules are then processed by a "macromolecule preprocessing" component. Antigen-antibody docking can then be performed, and the docked antigen-antibody complex serves as the input file of the "antibody CDR generation" component of the invention; input of multiple antigen-antibody complex structures is supported.
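One common approach to non-standard residue processing is to map modified residues back to their parent amino acids and set aside anything the model cannot represent. The following sketch shows that idea on residue names only. The small mapping table is illustrative and hypothetical, not the invention's actual rule set; on full structures, a tool such as PDBFixer (`findNonstandardResidues()` / `replaceNonstandardResidues()`) is one way to do this.

```python
STANDARD = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}

# Hypothetical, deliberately small table of modified residues -> parent residue.
PARENT_RESIDUE = {
    "MSE": "MET",  # selenomethionine
    "SEP": "SER",  # phosphoserine
    "TPO": "THR",  # phosphothreonine
    "PTR": "TYR",  # phosphotyrosine
    "HYP": "PRO",  # hydroxyproline
    "MLY": "LYS",  # N-dimethyl-lysine
}

def standardize_residues(res_names):
    """Return (kept, dropped) lists of (index, residue name) pairs."""
    kept, dropped = [], []
    for i, name in enumerate(res_names):
        name = name.upper()
        if name in STANDARD:
            kept.append((i, name))
        elif name in PARENT_RESIDUE:
            kept.append((i, PARENT_RESIDUE[name]))  # mapped to the parent amino acid
        else:
            dropped.append((i, name))  # e.g. waters, ligands, unknown residues
    return kept, dropped
```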
The optimization region parameter of the "antibody CDR generation" component covers six CDR regions: H_CDR1, H_CDR2, H_CDR3, L_CDR1, L_CDR2 and L_CDR3, where H denotes the heavy chain and L denotes the light chain. Multiple regions can be selected, and the model optimizes only the selected CDR regions. The optimization task parameter of the "antibody CDR generation" component offers four functional designs: (1) sequence-structure co-design; (2) sequence design for a given backbone structure (the backbone is fixed and only residue mutations are allowed); (3) structural design for a given sequence (the sequence is unchanged and only the structure is optimized); and (4) antibody regeneration (the optimization process does not refer to the original CDR region structure).
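The multi-select optimization region can be turned into a per-residue design mask, so that only residues in the selected CDRs are diffused. The sketch assumes that each residue already carries a region label (for example from an upstream antibody numbering step), with framework residues labelled `"FR"`; the label strings and the function name `design_mask` are hypothetical.

```python
CDR_REGIONS = ("H_CDR1", "H_CDR2", "H_CDR3", "L_CDR1", "L_CDR2", "L_CDR3")

def design_mask(residue_labels, selected):
    """Return one boolean per residue: True if the residue lies in a selected CDR."""
    unknown = set(selected) - set(CDR_REGIONS)
    if unknown:
        raise ValueError(f"unknown CDR regions: {sorted(unknown)}")
    if not selected:
        raise ValueError("select at least one CDR region")
    chosen = set(selected)
    return [label in chosen for label in residue_labels]
```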
When task (1) is selected, the sequence and the backbone structure of the CDR region are allowed to change simultaneously. Because the optimization algorithm is a diffusion probability model, back mutation can be realized through the denoising steps of the generative diffusion process: when this option is enabled, some of the mutation steps for the CDR region are turned into back-mutation steps. The invention therefore adds a parameter for the number of back-mutation steps to task (1): select "regenerate CDR region", set "perform back mutation" to "yes", and enter the number of back-mutation steps. The more steps of the whole optimization process that are used for back mutation, the more similar the generated CDR region is to the original one, but the weaker the improvement in CDR affinity may be. The total number of steps for CDR region optimization is 100, so at most 100 back-mutation steps can be set. Setting this value to 0 is equivalent to fully regenerating the CDR region, and setting it to 100 should return the original input CDR region.
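One way to read the back-mutation parameter is as a choice of how far along the 100-step chain the original CDR is noised before denoising: with b back-mutation steps, the original CDR is noised to step 100 - b and then denoised back to step 0. This interpretation is an assumption, but it reproduces the stated behaviour: b = 0 starts from almost pure noise (full regeneration), and b = 100 returns the input CDR unchanged. The schedule, `eps_model` and the function names are hypothetical.

```python
import numpy as np

T = 100  # total optimization steps stated in the description
BETAS = np.linspace(1e-4, 0.2, T)  # assumed linear noise schedule
ALPHAS = 1.0 - BETAS
ALPHA_BAR = np.cumprod(ALPHAS)

def start_step(back_steps, total=T):
    """Map the number of back-mutation steps to the diffusion step where denoising starts."""
    if not 0 <= back_steps <= total:
        raise ValueError("back-mutation steps must lie in 0..total")
    return total - back_steps

def optimize_cdr(x0, back_steps, eps_model, rng):
    """Partially noise the original CDR coordinates x0, then denoise back to step 0."""
    t0 = start_step(back_steps)
    if t0 == 0:
        return x0.copy()  # all steps are back mutation: the original CDR is returned
    abar0 = ALPHA_BAR[t0 - 1]
    x = np.sqrt(abar0) * x0 + np.sqrt(1.0 - abar0) * rng.standard_normal(x0.shape)
    for t in range(t0, 0, -1):
        beta, alpha, abar = BETAS[t - 1], ALPHAS[t - 1], ALPHA_BAR[t - 1]
        mean = (x - beta / np.sqrt(1.0 - abar) * eps_model(x, t)) / np.sqrt(alpha)
        x = mean if t == 1 else mean + np.sqrt(beta) * rng.standard_normal(x.shape)
    return x
```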
When task (2) is selected, "retain original CDR region protein backbone" under "structural feature retention" is selected.
When task (3) is selected, "original CDR sequence" under "structural feature retention" is selected; the optimized structure may then contain breaks in the protein chain of the CDR region, as shown in the boxed area of fig. 8.
The output structure is therefore checked, and OpenMM is used to rebuild the missing side-chain atoms of every broken protein structure. During the OpenMM optimization, positional restraints with a force constant are applied to all atoms outside the selected CDR region, so that energy minimization acts only on the atoms of the selected CDR region and the atoms of incomplete residues are repaired; the result is shown in fig. 9.
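The restrained minimization can be expressed with harmonic positional restraints on every atom outside the selected CDR residues. The sketch below selects the restrained atoms and evaluates the restraint energy 0.5 * k * |x - x0|^2 in plain NumPy. In OpenMM itself such a term is usually added with a `CustomExternalForce` (for example the expression `0.5*k*((x-x0)^2+(y-y0)^2+(z-z0)^2)` with per-particle parameters x0, y0, z0), and missing atoms are often rebuilt with PDBFixer. The force constant of 1000 kJ/mol/nm^2 and the function names are assumptions.

```python
import numpy as np

def restrained_atoms(atom_residue_ids, cdr_residue_ids):
    """Indices of atoms that do NOT belong to the selected CDR residues."""
    cdr = set(cdr_residue_ids)
    return [i for i, res in enumerate(atom_residue_ids) if res not in cdr]

def restraint_energy(coords, ref_coords, restrained, k=1000.0):
    """Harmonic positional restraint energy summed over the restrained atoms.

    k is an assumed force constant in kJ/mol/nm^2 (OpenMM's unit conventions);
    CDR atoms are not restrained, so only they can move freely during minimization.
    """
    idx = np.asarray(restrained, dtype=int)
    d2 = np.sum((coords[idx] - ref_coords[idx]) ** 2, axis=1)
    return 0.5 * k * float(d2.sum())
```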
When task (4) is selected, select "regenerate CDR region" and then set "perform back mutation" to "no".
By adopting an equivariant diffusion-based generative model, the invention optimizes the antigen-binding activity of existing antibodies. The model iteratively generates CDR candidates in the joint sequence-structure space, and the sampling process can be intervened in and constrained so as to support a wider range of design tasks. The model also takes the 3D structure of the antigen into account when designing antibody sequences and structures. In addition, the method designs the side-chain orientation of each amino acid, achieving antibody design at atomic resolution.
In the above specific embodiment, the antibody structure optimization method based on a deep learning model obtains an antigen structure file and an antibody structure file and performs non-standard residue processing on each of them; docks the processed antigen structure file with the processed antibody structure file to obtain an antigen-antibody complex; and inputs the antigen-antibody complex as an input file into a pre-trained antibody structure optimization model, which outputs an optimization result according to a pre-input optimization task. This solves the technical problem in the prior art that antibody sequences and structures with specific functions cannot be generated efficiently.
In addition to the above method, the present invention also provides an antibody structure optimization device based on a deep learning model, as shown in fig. 10, the device includes:
a data acquisition unit 1010, configured to acquire an antigen structure file and an antibody structure file, and perform nonstandard residue processing on the antigen structure file and the antibody structure file respectively;
a data processing unit 1020 for docking the antigen structure file after the nonstandard residue treatment with the antibody structure file after the nonstandard residue treatment to obtain an antigen-antibody complex;
a result generating unit 1030 for inputting the antigen-antibody complex as an input file into a pre-trained antibody structure optimization model to output an optimization result according to a pre-input optimization task;
the antibody structure optimization model is obtained by training on antigen-antibody complex samples and optimization task parameters based on a diffusion probability model and an equivariant neural network.
In some embodiments, obtaining the antibody structure optimization model by training on antigen-antibody complex samples and optimization task parameters based on a diffusion probability model and an equivariant neural network specifically comprises the following steps:
determining the Markov chains of the forward diffusion process and of the generative diffusion process based on the diffusion probability model;
generating an initial distribution state according to the antigen-antibody complex sample and the optimization task parameters;
inputting the initial distribution state into the diffusion probability model, and gradually adding noise to the data through a forward diffusion process until the data distribution approximately reaches the prior distribution state;
using the equivariant neural network, iteratively transforming the prior distribution state into a generated distribution state through the generative diffusion process, so as to obtain the antibody structure optimization model.
In some embodiments, iteratively transforming the prior distribution state into the generated distribution state through the generative diffusion process by using the equivariant neural network specifically comprises:
taking the antigen-antibody complex sample as input, initializing the CDRs of the antibody with arbitrary sequences, positions and orientations;
aggregating information from the antigen and the antibody framework, and iteratively updating the amino acid type, position and orientation of each amino acid on the CDRs to obtain the generated distribution state.
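The initialization step above can be sketched as follows: CDR residues receive random amino acid types, Gaussian-distributed positions and uniformly random orientations, while antigen and framework residues keep their input values. Representing orientations as 3x3 rotation matrices, drawing them from normalized Gaussian quaternions, and working in normalized coordinate units are assumptions of this sketch, as are the function names.

```python
import numpy as np

def random_rotations(n, rng):
    """Uniformly random rotation matrices from normalized Gaussian quaternions (w, x, y, z)."""
    q = rng.standard_normal((n, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    w, x, y, z = q.T
    return np.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w),
        2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w),
        2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y),
    ], axis=-1).reshape(n, 3, 3)

def init_cdr_state(aa, pos, rot, cdr_mask, rng, num_aa_types=20):
    """Randomize type, position and orientation of CDR residues; copy all others unchanged."""
    aa, pos, rot = aa.copy(), pos.copy(), rot.copy()
    m = int(cdr_mask.sum())
    aa[cdr_mask] = rng.integers(0, num_aa_types, size=m)  # random amino acid types
    pos[cdr_mask] = rng.standard_normal((m, 3))           # prior positions in normalized units
    rot[cdr_mask] = random_rotations(m, rng)              # random residue orientations
    return aa, pos, rot
```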
In some embodiments, the optimization task includes at least one of:
sequence-structure co-design;
designing a sequence for a given skeleton structure;
structural design is carried out for a given sequence;
antibody regeneration.
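The four optimization tasks differ mainly in which CDR variables are generated and which are kept from the input. The mapping below is an interpretation of the task descriptions in this document (task (2) fixes the backbone, task (3) fixes the sequence, task (4) does not refer to the original CDR); the enum, field and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Task(Enum):
    CODESIGN = 1      # (1) sequence-structure co-design
    FIX_BACKBONE = 2  # (2) sequence design for a given backbone structure
    FIX_SEQUENCE = 3  # (3) structural design for a given sequence
    REGENERATE = 4    # (4) antibody regeneration

@dataclass(frozen=True)
class TaskSpec:
    generate_sequence: bool
    generate_structure: bool
    use_original_cdr: bool  # whether the input CDR is referenced at all

TASK_SPECS = {
    Task.CODESIGN: TaskSpec(True, True, True),
    Task.FIX_BACKBONE: TaskSpec(True, False, True),
    Task.FIX_SEQUENCE: TaskSpec(False, True, True),
    Task.REGENERATE: TaskSpec(True, True, False),
}

def diffused_variables(task):
    """Names of the CDR variables that the generative process updates for a task."""
    spec = TASK_SPECS[task]
    out = set()
    if spec.generate_sequence:
        out.add("sequence")
    if spec.generate_structure:
        out |= {"position", "orientation"}
    return out
```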
In some embodiments, the diffusion probability model comprises two parameterized Markov chains and uses variational inference to generate samples consistent with the original data distribution after a finite time;
the Markov chains comprise a forward chain and a reverse chain, wherein the forward chain is used for gradually adding Gaussian noise to the data according to a pre-designed noise schedule until the data distribution approaches the prior distribution; the reverse chain is used for learning, starting from a given prior and using parameterized Gaussian transition kernels, to gradually recover the original data distribution, so as to generate the antibody structure optimization model.
In some embodiments, the initial distribution state of the diffusion probability model includes the original structure, sequence, and CDR states of the antigen-antibody complex.
In some embodiments, the network architecture of the equivariant neural network comprises an embedding layer, a coding layer and an output layer, wherein the embedding layer comprises an amino acid embedding layer and a paired embedding layer;
the amino acid embedding layer is used for extracting the amino acid type embedding, the local coordinates of the amino acid heavy atoms, the amino acid backbone dihedral angles and the CDR region tag information;
the paired embedding layer is used for extracting the amino acid types of two amino acids, their relative sequence positions and their pairwise distance;
the coding layer is used for encoding the current diffusion state so as to capture the relationships among amino acids and provide a high-level representation of each residue for denoising;
and the output layer is used for outputting the diffusion state result obtained by the coding layer.
In the above specific embodiment, the antibody structure optimization device based on a deep learning model provided by the invention obtains an antigen structure file and an antibody structure file and performs non-standard residue processing on each of them; docks the processed antigen structure file with the processed antibody structure file to obtain an antigen-antibody complex; and inputs the antigen-antibody complex as an input file into a pre-trained antibody structure optimization model, which outputs an optimization result according to a pre-input optimization task. This solves the technical problem in the prior art that antibody sequences and structures with specific functions cannot be generated efficiently.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 11. The computer device includes a processor, a memory and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store static and dynamic information data. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed by the processor, the computer program implements the steps of the above method embodiments.
It will be appreciated by those skilled in the art that the structure shown in FIG. 11 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
Corresponding to the above embodiments, the present invention further provides a computer storage medium containing one or more program instructions, which are used to execute the method described above.
The present invention also provides a computer program product comprising a computer program storable on a non-transitory computer readable storage medium, the computer program being capable of performing the above method when being executed by a processor.
In the embodiment of the invention, the processor may be an integrated circuit chip with signal processing capability. The processor may be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field programmable gate array (Field Programmable Gate Array, FPGA for short), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components.
The processor may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may reside in storage media well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory or registers. The processor reads the information in the storage medium and performs the steps of the above method in combination with its hardware.
The storage medium may be memory, for example, may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.
The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
The storage media described in embodiments of the present invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that, in one or more of the examples described above, the functions described in the present invention may be implemented by a combination of hardware and software. When implemented in software, the functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, the latter including any medium that facilitates the transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The foregoing detailed description of the invention has been presented for purposes of illustration and description, and it should be understood that the foregoing is by way of illustration and description only, and is not intended to limit the scope of the invention.

Claims (9)

1. An antibody structure optimization method based on a deep learning model, which is characterized by comprising the following steps:
obtaining an antigen structure file and an antibody structure file, and respectively carrying out nonstandard residue treatment on the antigen structure file and the antibody structure file;
docking the antigen structure file treated by the non-standard residues with the antibody structure file treated by the non-standard residues to obtain an antigen-antibody complex;
inputting the antigen-antibody complex as an input file into a pre-trained antibody structure optimization model so as to output an optimization result according to a pre-input optimization task;
the antibody structure optimization model is obtained by training on antigen-antibody complex samples and optimization task parameters based on a diffusion probability model and an equivariant neural network;
the optimization task comprises sequence-structure co-design, sequence design for a given backbone structure, structural design for a given sequence, and antibody regeneration;
the antibody structure optimization model is embedded in an OpenMM program, and an antibody CDR generation component is embedded in the interface of an open-source platform; optimization settings and output settings are made according to the input component parameters, and the optimization settings include regenerating the CDR region and the optimization region; if regeneration of the CDR region is selected, back mutation is performed and the number of back-mutation steps is set; if it is not selected, structural features are retained so as to retain the original CDR region protein backbone and sequence.
2. The antibody structure optimization method according to claim 1, wherein obtaining the antibody structure optimization model by training on antigen-antibody complex samples and optimization task parameters based on a diffusion probability model and an equivariant neural network specifically comprises the following steps:
determining the Markov chains of the forward diffusion process and of the generative diffusion process based on the diffusion probability model;
generating an initial distribution state according to the antigen-antibody complex sample and the optimization task parameters;
inputting the initial distribution state into the diffusion probability model, and gradually adding noise to the data through a forward diffusion process until the data distribution approximately reaches the prior distribution state;
using the equivariant neural network, iteratively transforming the prior distribution state into a generated distribution state through the generative diffusion process, so as to obtain the antibody structure optimization model.
3. The antibody structure optimization method according to claim 2, wherein iteratively transforming the prior distribution state into the generated distribution state through the generative diffusion process by using the equivariant neural network specifically comprises:
taking the antigen-antibody complex sample as input, initializing the CDRs of the antibody with arbitrary sequences, positions and orientations;
aggregating information from the antigen and the antibody framework, and iteratively updating the amino acid type, position and orientation of each amino acid on the CDRs to obtain the generated distribution state.
4. The antibody structure optimization method according to any one of claims 1-3, wherein the diffusion probability model comprises two parameterized Markov chains and uses variational inference to generate samples consistent with the original data distribution after a finite time;
the Markov chains comprise a forward chain and a reverse chain, wherein the forward chain is used for gradually adding Gaussian noise to the data according to a pre-designed noise schedule until the data distribution approaches the prior distribution; the reverse chain is used for learning, starting from a given prior and using parameterized Gaussian transition kernels, to gradually recover the original data distribution, so as to generate the antibody structure optimization model.
5. The method of claim 4, wherein the initial distribution state of the diffusion probability model comprises the original structure, sequence and CDR states of the antigen-antibody complex.
6. The antibody structure optimization method according to any one of claims 1 to 3, wherein the network architecture of the equivariant neural network comprises an embedding layer, a coding layer and an output layer, wherein the embedding layer comprises an amino acid embedding layer and a paired embedding layer;
the amino acid embedding layer is used for extracting the amino acid type embedding, the local coordinates of the amino acid heavy atoms, the amino acid backbone dihedral angles and the CDR region tag information;
the paired embedding layer is used for extracting the amino acid types of two amino acids, their relative sequence positions and their pairwise distance;
the coding layer is used for encoding the current diffusion state so as to capture the relationships among amino acids and provide a high-level representation of each residue for denoising;
and the output layer is used for outputting the diffusion state result obtained by the coding layer.
7. An antibody structure optimization device based on a deep learning model, which is characterized by comprising:
the data acquisition unit is used for acquiring an antigen structure file and an antibody structure file, and respectively carrying out nonstandard residue treatment on the antigen structure file and the antibody structure file;
the data processing unit is used for docking the antigen structure file after non-standard residue processing with the antibody structure file after non-standard residue processing to obtain an antigen-antibody complex;
the result generation unit is used for inputting the antigen-antibody complex as an input file into a pre-trained antibody structure optimization model so as to output an optimization result according to a pre-input optimization task;
the antibody structure optimization model is obtained by training on antigen-antibody complex samples and optimization task parameters based on a diffusion probability model and an equivariant neural network.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-6 when the program is executed.
9. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1-6.
CN202311009931.3A 2023-08-11 2023-08-11 Antibody structure optimization method and device based on deep learning model Active CN116741260B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311009931.3A CN116741260B (en) 2023-08-11 2023-08-11 Antibody structure optimization method and device based on deep learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311009931.3A CN116741260B (en) 2023-08-11 2023-08-11 Antibody structure optimization method and device based on deep learning model

Publications (2)

Publication Number Publication Date
CN116741260A CN116741260A (en) 2023-09-12
CN116741260B true CN116741260B (en) 2023-11-28

Family

ID=87901540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311009931.3A Active CN116741260B (en) 2023-08-11 2023-08-11 Antibody structure optimization method and device based on deep learning model

Country Status (1)

Country Link
CN (1) CN116741260B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114822696A (en) * 2022-04-29 2022-07-29 北京深势科技有限公司 Attention mechanism-based antibody non-sequencing prediction method and device

Non-Patent Citations (1)

Title
Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures; Shitong Luo et al.; https://doi.org/10.1101/2022.07.10.499510; pp. 1-21 *

Also Published As

Publication number Publication date
CN116741260A (en) 2023-09-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant