CN115458048B - Antibody humanization method based on sequence coding and decoding - Google Patents


Publication number
CN115458048B
Authority
CN
China
Prior art keywords
antibody
gene
training
antibody molecule
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211128757.XA
Other languages
Chinese (zh)
Other versions
CN115458048A (en
Inventor
袁红
郭凌敏
吴彤
徐永凤
李月
戴佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Meisai Biomedical Technology Co ltd
Original Assignee
Hangzhou Meisai Biomedical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Meisai Biomedical Technology Co ltd
Priority to CN202211128757.XA
Publication of CN115458048A
Application granted
Publication of CN115458048B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The application relates to the field of biological research and discloses an antibody humanization method based on sequence encoding and decoding. The method uses an artificial intelligence model based on natural-language semantic understanding and treats a gene sequence as a text sequence. It characterizes the feature distribution of the gene sequence of an antibody molecule in the human body, and separately that of the gene sequence of the modified rabbit-derived antibody, by fusing the global implicit features of each gene sequence with its multi-scale neighborhood-associated features under different gene spans. A transfer matrix of the gene features of the modified rabbit-derived antibody relative to the gene features of the antibody molecule is then used to evaluate, and thereby verify, the homology between the gene sequence of the modified rabbit-derived antibody and that of the antibody molecule in the human body.

Description

Antibody humanization method based on sequence coding and decoding
Technical Field
The present application relates to the field of biological research, and more particularly, to a sequence codec-based antibody humanization method.
Background
Antibody humanization is an important part of experimental research on the production and preparation of recombinant (monoclonal) antibodies. Here, antibody humanization is the process of developing a humanized antibody from a rabbit antibody. Most monoclonal antibodies in clinical use are murine, and because of the species difference between humans and mice their use is limited in several ways, including the production of anti-drug antibodies.
A mouse or rabbit antibody that enters the human body as a foreign protein can trigger a response from the human immune system, which produces specific antibodies that treat the foreign antibody as an antigen; for mouse antibodies these are human anti-mouse antibodies (HAMA). The foreign protein is usually cleared quickly from the body and has a short half-life. Rabbit antibodies share this disadvantage and require a humanized design to reduce immunogenicity. Because rabbit-derived antibodies are limited in clinical applications in these ways, recombinant DNA techniques have been used to humanize them.
Traditionally, a mouse or rabbit antibody is humanized by genetically modifying it so that it closely resembles an antibody molecule in the human body, thereby evading recognition by the human immune system and avoiding induction of the HAMA reaction. Antibody humanization should follow two basic principles: maintain or increase the affinity and specificity of the antibody, and greatly reduce or substantially eliminate its immunogenicity.
In the prior art, the traditional method of humanizing a rabbit antibody is similar to the murine method: the rabbit-derived amino acid sequence in the framework region is mutated to a human-derived one using homology modeling, and the affinity of the antibody is finally determined by ELISA, SPR, or a similar assay in order to select the humanized version. Because rabbit antibodies have very low homology with human antibodies, the structures produced by homology modeling are unreliable, and the mutated humanized versions generally show reduced affinity compared with the rabbit antibody.
Thus, an optimized antibody humanization scheme is desired to achieve a higher degree of homology.
Disclosure of Invention
The present application has been made in order to solve the above technical problems. The embodiments of the application provide an antibody humanization method based on sequence encoding and decoding. The method uses an artificial intelligence model based on natural-language semantic understanding and treats a gene sequence as a text sequence. It characterizes the feature distribution of the gene sequence of an antibody molecule in the human body, and separately that of the gene sequence of the modified rabbit-derived antibody, by fusing the global implicit features of each gene sequence with its multi-scale neighborhood-associated features under different gene spans. A transfer matrix of the gene features of the modified rabbit-derived antibody relative to the gene features of the antibody molecule is then used to evaluate, and thereby verify, the homology between the gene sequence of the modified rabbit-derived antibody and that of the antibody molecule in the human body.
According to one aspect of the present application, there is provided a sequence codec-based antibody humanization method comprising:
a training phase comprising:
acquiring training data, wherein the training data comprises a gene sequence of a training antibody molecule, a gene sequence of a training modified rabbit-derived antibody, and a true value of the homology between the gene sequence of the training antibody molecule and the gene sequence of the training modified rabbit-derived antibody;
passing the gene sequence of the training antibody molecule and the gene sequence of the training modified rabbit-derived antibody respectively through the Transformer-based context encoder and the multi-scale neighborhood feature extraction module to obtain a training antibody molecule gene feature vector and a training modified rabbit-derived antibody gene feature vector;
calculating a transfer matrix of the training modified rabbit-derived antibody gene feature vector relative to the training antibody molecule gene feature vector as a training classification feature matrix;
passing the training classification feature matrix through the classifier to obtain a classification loss function value;
calculating a classification mode digestion suppression loss function value of the training modified rabbit-derived antibody gene feature vector and the training antibody molecule gene feature vector, wherein the classification mode digestion suppression loss function value is related to the square of the two-norm of the difference feature vector between the training modified rabbit-derived antibody gene feature vector and the training antibody molecule gene feature vector; and
training the Transformer-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier with a weighted sum of the classification loss function value and the classification mode digestion suppression loss function value as the loss function value; and
an inference phase comprising:
obtaining the gene sequence of an antibody molecule in a human body;
passing the gene sequence of the antibody molecule through the trained Transformer-based context encoder to obtain a plurality of gene expression feature vectors, and cascading the plurality of gene expression feature vectors to obtain an antibody molecule global gene feature vector;
passing the gene sequence of the antibody molecule through the trained multi-scale neighborhood feature extraction module to obtain a multi-scale neighborhood antibody molecule feature vector;
cascading the antibody molecule global gene feature vector and the multi-scale neighborhood antibody molecule feature vector to obtain an antibody molecule gene feature vector;
obtaining the gene sequence of the modified rabbit-derived antibody;
processing the gene sequence of the modified rabbit-derived antibody through the Transformer-based context encoder and the multi-scale neighborhood feature extraction module to obtain a modified rabbit-derived antibody gene feature vector;
calculating a transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antibody molecule gene feature vector as a classification feature matrix; and
passing the classification feature matrix through the trained classifier to obtain a class probability value, wherein the class probability value represents the homology of the modified rabbit-derived antibody with the antibody molecule in the human body.
In the above method for humanizing an antibody based on sequence encoding and decoding, the step of passing the training classification feature matrix through the classifier to obtain a classification loss function value includes: processing the training classification feature matrix using the classifier to generate a training classification result according to a formula (given in the original as an image, not reproduced in this text), whose symbols denote, respectively, the projection of the training classification feature matrix as a vector, the weight matrix of the fully connected layer, and the bias matrix of the fully connected layer; and calculating a cross entropy value between the training classification result and the true value of the homology between the gene sequence of the training antibody molecule and the gene sequence of the training modified rabbit-derived antibody in the training data as the classification loss function value.
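The classifier step described above (projection of the feature matrix to a vector, a fully connected layer, a classification result, then cross entropy against the true value) can be sketched as follows. This is a minimal illustration only: the single fully connected layer, the softmax output, the two-class setup, and all names and numbers (`classify`, `F`, `W`, `B`) are assumptions, since the patent's formula image is not reproduced in the text.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature_matrix, weights, bias):
    """Flatten the matrix, apply one fully connected layer, then softmax.

    weights has one row per class; bias has one entry per class.
    """
    # Projection of the classification feature matrix as a vector.
    x = [v for row in feature_matrix for v in row]
    logits = [sum(w_i * x_i for w_i, x_i in zip(w_row, x)) + b
              for w_row, b in zip(weights, bias)]
    return softmax(logits)

def cross_entropy(probs, true_class):
    """Cross entropy against a one-hot label given as a class index."""
    return -math.log(probs[true_class])

# Hypothetical 2x2 classification feature matrix and a 2-class layer.
F = [[0.5, -1.0], [2.0, 0.0]]
W = [[0.1, 0.2, 0.3, 0.4], [0.0, 0.0, 0.0, 0.0]]
B = [0.0, 0.0]
probs = classify(F, W, B)
loss = cross_entropy(probs, true_class=0)
```

At inference time only `classify` would be needed; the probability of the "homologous" class plays the role of the class probability value.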
In the above method for humanizing an antibody based on sequence encoding and decoding, calculating the classification mode digestion suppression loss function value of the training modified rabbit-derived antibody gene feature vector and the training antibody molecule gene feature vector comprises: calculating the classification mode digestion suppression loss function value of the two feature vectors according to a formula that is given in the original as an image and is not reproduced in this text. The symbols of that formula denote, in order: the training modified rabbit-derived antibody gene feature vector and the training antibody molecule gene feature vector; the weight matrices of the classifier for the training modified rabbit-derived antibody gene feature vector and for the training antibody molecule gene feature vector, respectively; the square of the two-norm of a vector; the F-norm (Frobenius norm) of a matrix; and an exponential operation on a matrix or a vector, that is, computing the natural exponential function value raised to the power of the feature value at each position of the matrix or vector.
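Because the loss formula itself is not reproduced, the sketch below does not attempt to reconstruct it. It only illustrates the ingredients the text names (squared two-norm of a difference vector, Frobenius norm of a matrix, position-wise natural exponential) and the weighted sum of the two loss values used for training. The function names and the weights `alpha` and `beta` are hypothetical.

```python
import math

def squared_l2_norm(vec):
    """Square of the two-norm of a vector."""
    return sum(v * v for v in vec)

def frobenius_norm(mat):
    """F-norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in mat for v in row))

def elementwise_exp(values):
    """Natural exponential applied at each position of a vector."""
    return [math.exp(v) for v in values]

def total_loss(classification_loss, suppression_loss, alpha=1.0, beta=1.0):
    """Weighted sum of the two loss values; alpha and beta are assumed weights."""
    return alpha * classification_loss + beta * suppression_loss

# Hypothetical feature vectors for the modified rabbit-derived antibody
# and the antibody molecule.
v_rabbit = [1.0, 2.0, 3.0]
v_human = [1.0, 0.0, 1.0]
diff = [a - b for a, b in zip(v_rabbit, v_human)]
diff_sq = squared_l2_norm(diff)
weight_norm = frobenius_norm([[3.0, 0.0], [0.0, 4.0]])
```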
In the above antibody humanization method based on sequence encoding and decoding, passing the gene sequence of the antibody molecule through the trained multi-scale neighborhood feature extraction module to obtain the multi-scale neighborhood antibody molecule feature vector comprises: inputting the gene sequence of the antibody molecule into a first convolution layer of the multi-scale neighborhood feature extraction module to obtain a first neighborhood scale antibody molecule feature vector, wherein the first convolution layer has a first one-dimensional convolution kernel of a first length; inputting the gene sequence of the antibody molecule into a second convolution layer of the multi-scale neighborhood feature extraction module to obtain a second neighborhood scale antibody molecule feature vector, wherein the second convolution layer has a second one-dimensional convolution kernel of a second length, and the first length is different from the second length; and cascading the first neighborhood scale antibody molecule feature vector and the second neighborhood scale antibody molecule feature vector to obtain the multi-scale neighborhood antibody molecule feature vector.
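The two-branch structure described above can be sketched as two one-dimensional convolutions with kernels of different lengths whose outputs are concatenated. This is a minimal sketch under simplifying assumptions: the input is a single-channel numeric sequence rather than a one-hot encoded gene sequence, there is no padding, bias, or activation, and the kernel values are hypothetical rather than trained.

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` without padding (cross-correlation form)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def multi_scale_features(signal, kernel_a, kernel_b):
    """Run two convolution branches of different kernel length and cascade them."""
    first = conv1d_valid(signal, kernel_a)    # first neighborhood scale
    second = conv1d_valid(signal, kernel_b)   # second neighborhood scale
    return first + second                     # cascading = concatenation

signal = [1.0, 2.0, 3.0, 4.0, 5.0]
features = multi_scale_features(
    signal,
    [1.0, 1.0, 1.0],               # hypothetical kernel of length 3
    [1.0, -1.0, 1.0, -1.0, 1.0],   # hypothetical kernel of length 5
)
```

The shorter kernel responds to local patterns of a few adjacent positions, while the longer kernel spans a wider neighborhood; concatenation keeps both views in one vector.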
In the above method for humanizing an antibody based on sequence encoding and decoding, inputting the gene sequence of the antibody molecule into the first convolution layer of the multi-scale neighborhood feature extraction module to obtain a first neighborhood scale antibody molecule feature vector includes: performing one-dimensional convolution coding on the gene sequence of the antibody molecule using the first convolution layer of the multi-scale neighborhood feature extraction module, according to a formula that is given in the original as an image and is not reproduced in this text, to obtain the first neighborhood scale antibody molecule feature vector. In that formula, a is the width of the first convolution kernel in the x direction, the remaining two symbols (shown only as images) are the first convolution kernel parameter vector and the local vector matrix operated on with the convolution kernel function, w is the size of the first convolution kernel, and X represents the gene sequence of the antibody molecule.
In the above method for humanizing an antibody based on sequence encoding and decoding, inputting the gene sequence of the antibody molecule into the second convolution layer of the multi-scale neighborhood feature extraction module to obtain a second neighborhood scale antibody molecule feature vector includes: performing one-dimensional convolution coding on the gene sequence of the antibody molecule using the second convolution layer of the multi-scale neighborhood feature extraction module, according to a formula that is given in the original as an image and is not reproduced in this text, to obtain the second neighborhood scale antibody molecule feature vector. In that formula, b is the width of the second convolution kernel in the x direction, the remaining two symbols (shown only as images) are the second convolution kernel parameter vector and the local vector matrix operated on with the convolution kernel function, m is the size of the second convolution kernel, and X represents the gene sequence of the antibody molecule.
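One form that is consistent with the symbol definitions above is a sum over kernel positions of the kernel parameter multiplied by the corresponding local input. It is offered only as an assumption: the formula images are not reproduced, and the names Cov_1, Cov_2, F and G are placeholders introduced here for the quantities that appear only as images.

```latex
\mathrm{Cov}_1(X) = \sum_{a=1}^{w} F(a) \cdot G(x - a),
\qquad
\mathrm{Cov}_2(X) = \sum_{b=1}^{m} F(b) \cdot G(x - b)
```

Here F stands for the convolution kernel parameter vector of the respective layer, G for the local vector matrix of the input X around position x, and w and m for the two kernel sizes, with w different from m.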
In the above method for humanizing an antibody based on sequence encoding and decoding, calculating the transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antibody molecule gene feature vector as the classification feature matrix includes: calculating the transfer matrix according to an equation that is given in the original as images and is not reproduced in this text. The symbols of that equation denote, in order: the modified rabbit-derived antibody gene feature vector, the antibody molecule gene feature vector, the classification feature matrix, and matrix multiplication.
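A transfer matrix relating two vectors through matrix multiplication is not unique, and the text does not say how it is solved for. The sketch below assumes the relation "rabbit vector = M times antibody molecule vector" (the direction is an assumption) and picks one simple solution, the rank-one matrix built from the outer product of the two vectors. The function names are hypothetical.

```python
def transfer_matrix(v_target, v_source):
    """Return one matrix M with M @ v_source == v_target (rank-one solution)."""
    denom = sum(s * s for s in v_source)
    if denom == 0.0:
        raise ValueError("v_source must be non-zero")
    # Outer product of target and source, scaled by the squared norm of source.
    return [[t * s / denom for s in v_source] for t in v_target]

def matvec(mat, vec):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]

# Hypothetical feature vectors.
v_rabbit = [2.0, 0.0, 4.0]
v_human = [1.0, 1.0, 0.0]
M = transfer_matrix(v_rabbit, v_human)
recovered = matvec(M, v_human)
```

The resulting matrix `M` is what would be flattened and passed to the classifier as the classification feature matrix.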
According to another aspect of the present application, there is provided a sequence codec-based antibody humanization system comprising:
a training module, comprising:
a training data acquisition unit, configured to acquire training data, wherein the training data comprises a gene sequence of a training antibody molecule, a gene sequence of a training modified rabbit-derived antibody, and a true value of the homology between the gene sequence of the training antibody molecule and the gene sequence of the training modified rabbit-derived antibody;
a feature vector extraction unit, configured to pass the gene sequence of the training antibody molecule and the gene sequence of the training modified rabbit-derived antibody respectively through the Transformer-based context encoder and the multi-scale neighborhood feature extraction module to obtain a training antibody molecule gene feature vector and a training modified rabbit-derived antibody gene feature vector;
a training classification feature matrix generation unit, configured to calculate a transfer matrix of the training modified rabbit-derived antibody gene feature vector relative to the training antibody molecule gene feature vector as a training classification feature matrix;
a classification loss function value calculation unit, configured to pass the training classification feature matrix through the classifier to obtain a classification loss function value;
a classification mode digestion suppression loss function value calculation unit, configured to calculate a classification mode digestion suppression loss function value of the training modified rabbit-derived antibody gene feature vector and the training antibody molecule gene feature vector, wherein the classification mode digestion suppression loss function value is related to the square of the two-norm of the difference feature vector between the training modified rabbit-derived antibody gene feature vector and the training antibody molecule gene feature vector; and
a training unit, configured to train the Transformer-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier with a weighted sum of the classification loss function value and the classification mode digestion suppression loss function value as the loss function value; and
an inference module comprising:
a physiological information acquisition unit, configured to acquire the gene sequence of an antibody molecule in a human body;
an antibody molecule global gene feature vector generation unit, configured to pass the gene sequence of the antibody molecule through the trained Transformer-based context encoder to obtain a plurality of gene expression feature vectors, and to cascade the plurality of gene expression feature vectors to obtain an antibody molecule global gene feature vector;
a multi-scale neighborhood feature extraction unit, configured to pass the gene sequence of the antibody molecule through the trained multi-scale neighborhood feature extraction module to obtain a multi-scale neighborhood antibody molecule feature vector;
a cascade unit, configured to cascade the antibody molecule global gene feature vector and the multi-scale neighborhood antibody molecule feature vector to obtain an antibody molecule gene feature vector;
a rabbit-derived antibody gene sequence acquisition unit, configured to acquire the gene sequence of the modified rabbit-derived antibody;
a modified rabbit-derived antibody gene feature extraction unit, configured to process the gene sequence of the modified rabbit-derived antibody through the Transformer-based context encoder and the multi-scale neighborhood feature extraction module to obtain a modified rabbit-derived antibody gene feature vector;
a classification feature matrix generation unit, configured to calculate a transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antibody molecule gene feature vector as a classification feature matrix; and
a class probability value generation unit, configured to pass the classification feature matrix through the trained classifier to obtain a class probability value, wherein the class probability value represents the homology of the modified rabbit-derived antibody with the antibody molecule in the human body.
According to still another aspect of the present application, there is provided an electronic apparatus including: a processor; and a memory having stored therein computer program instructions that, when executed by the processor, cause the processor to perform the sequence codec based antibody humanization method as described above.
According to yet another aspect of the present application, there is provided a computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform a sequence codec based antibody humanization method as described above.
Compared with the prior art, the antibody humanization method based on sequence encoding and decoding provided by the application uses an artificial intelligence model based on natural-language semantic understanding and treats a gene sequence as a text sequence. It characterizes the feature distribution of the gene sequence of an antibody molecule in the human body, and separately that of the gene sequence of the modified rabbit-derived antibody, by fusing the global implicit features of each gene sequence with its multi-scale neighborhood-associated features under different gene spans. A transfer matrix of the gene features of the modified rabbit-derived antibody relative to the gene features of the antibody molecule is then used to evaluate, and thereby verify, the homology between the gene sequence of the modified rabbit-derived antibody and that of the antibody molecule in the human body.
Drawings
The foregoing and other objects, features and advantages of the present application will become more apparent from the following more particular description of embodiments of the present application, as illustrated in the accompanying drawings. The accompanying drawings are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification; together with the embodiments they serve to explain the application and do not constitute a limitation of the application. In the drawings, like reference numerals generally refer to like parts or steps.
FIG. 1 illustrates a schematic representation of a heavy chain of a rabbit antibody according to an embodiment of the present application.
FIG. 2 illustrates another schematic of the heavy chain of a rabbit antibody according to an embodiment of the present application.
FIG. 3 illustrates a schematic diagram of a structural model built by homologous modeling according to an embodiment of the present application.
FIG. 4 illustrates a schematic diagram of a structural model built by co-evolutionary modeling according to an embodiment of the present application.
FIG. 5 illustrates a flowchart of the training phase in a sequence codec-based antibody humanization method according to an embodiment of the present application.
FIG. 6 illustrates a flowchart of the inference phase in a sequence codec-based antibody humanization method according to an embodiment of the present application.
FIG. 7 illustrates an architectural diagram of the training phase in a sequence codec-based antibody humanization method according to an embodiment of the present application.
FIG. 8 illustrates an architectural diagram of the inference phase in a sequence codec-based antibody humanization method according to an embodiment of the present application.
FIG. 9 illustrates a flowchart of the antibody molecule multi-scale neighborhood feature extraction process in a sequence codec-based antibody humanization method according to an embodiment of the present application.
FIG. 10 illustrates a block diagram of a sequence codec-based antibody humanization system according to an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application and not all of the embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
Scene overview
As noted above, conventional methods readily lose or reduce affinity. In one example, a kinetics experiment model was selected using a FORTEBIO instrument and its associated software, and the analysis showed that the affinity of individual humanized antibodies designed by conventional methods was reduced.
Therefore, in the technical solution of the present application, a non-homologous modeling method is used so that the affinity can be maintained or even increased. The specific steps are as follows:
Step 1: Build the rabbit antibody structure model using a non-homologous modeling method: build a structural model by a co-evolution method, build a structural model with AlphaFold 2 and perform sequence decoding, and finally design the amino acid sequence to obtain a humanized antibody with consistent affinity.
Step 2: Humanize progressively: first select a humanized antibody with 80-90% homology as the structural model, then select a fully human antibody with 90-99% homology as the structural model, so that a higher degree of humanization is achieved.
Step 3: Verify the affinity of the designed humanized antibody.
Although a higher degree of humanization is achieved by first selecting a humanized antibody with 80-90% homology as the structural model and then a fully human antibody with 90-99% homology, the homology between the gene sequence of the modified rabbit-derived antibody and that of an antibody molecule in the human body still has to be checked. Therefore, the technical solution of the present application uses an artificial intelligence model based on natural-language semantic understanding and treats a gene sequence as a text sequence. It characterizes the feature distribution of the gene sequence of the antibody molecule in the human body, and separately that of the gene sequence of the modified rabbit-derived antibody, by fusing the global implicit features of each gene sequence with its multi-scale neighborhood-associated features under different gene spans. A transfer matrix of the gene features of the modified rabbit-derived antibody relative to the gene features of the antibody molecule is then used to evaluate, and thereby verify, the homology between the two gene sequences.
Specifically, in the technical scheme of the present application, the gene sequence of an antibody molecule in the human body is first obtained. Next, because each gene in the gene sequence of the antibody molecule carries contextual semantic feature information, the gene sequence is processed with a Transformer-based context encoder to extract its global high-dimensional semantic features, which are better suited to characterizing the essential gene features of the antibody molecule in the human body; this yields a plurality of gene expression feature vectors. The plurality of gene expression feature vectors are then cascaded to integrate the global implicit feature information of the genes of the antibody molecule in the human body, thereby obtaining the antibody molecule global gene feature vector.
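The idea of context encoding followed by cascading can be illustrated with a single self-attention step, the core operation of a Transformer encoder. This is a heavily simplified sketch, not the encoder of the application: it uses one head, identity query/key/value projections instead of learned weights, and omits positional encoding, feed-forward layers, and normalization. All names and the example tokens are hypothetical.

```python
import math

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        # Similarity of this token to every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Context-aware vector: attention-weighted average of all tokens.
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens))
                        for j in range(d)])
    return outputs

def global_feature_vector(tokens):
    """Cascade (concatenate) the per-token context vectors into one vector."""
    return [x for vec in self_attention(tokens) for x in vec]

# Hypothetical 3-token sequence with 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
global_vec = global_feature_vector(tokens)
```

Each output vector mixes information from every position in the sequence, which is what makes the resulting features "global" in contrast to the fixed-width neighborhoods of the convolution branches.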
In particular, a gene is composed of many bases, each base being a site, and DNA contains four bases: A, T, C and G. The gene sequence of an antibody molecule in the human body is therefore a sequence over these four bases. Accordingly, in the technical solution of the present application, the gene sequence of the antibody molecule in the human body is one-hot encoded to convert it into input vectors before it is encoded by the context encoder.
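One-hot encoding of a base sequence can be sketched as follows. The column order A, T, C, G and the function name are assumptions chosen for illustration; the text only states that the sequence is one-hot encoded.

```python
BASES = "ATCG"  # assumed column order

def one_hot_encode(sequence):
    """Convert a DNA string into a list of 4-element one-hot vectors."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = []
    for base in sequence.upper():
        if base not in index:
            raise ValueError(f"unexpected base: {base!r}")
        row = [0] * len(BASES)
        row[index[base]] = 1
        encoded.append(row)
    return encoded

encoded = one_hot_encode("ATCG")
```

Each position of the sequence becomes a vector with a single 1, so a sequence of length n becomes an n-by-4 input for the encoder.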
It will be appreciated that since there are different implicit features for each gene segment under different gene segment spans in the gene sequence of the antibody molecule in the human body, the multi-scale neighborhood features can extract the relevant features under different gene segment spans. Therefore, in the technical scheme of the application, a multi-scale neighborhood feature extraction module is further used for encoding the gene sequences of the antibody molecules so as to extract multi-scale neighborhood related features of the gene sequences of the antibody molecules in the human body under different gene fragment spans, thereby obtaining multi-scale neighborhood antibody molecule feature vectors.
In this way, the antibody molecule global gene feature vector and the multi-scale neighborhood antibody molecule feature vector are cascaded to perform feature fusion to obtain the antibody molecule gene feature vector.
Further, in order to accurately evaluate the homology between the modified rabbit antibody and the antibody molecule in the human body, the gene sequence of the modified rabbit antibody must also be obtained. Similarly, the gene sequence of the modified rabbit-source antibody is processed by the converter-based context encoder and the multi-scale neighborhood feature extraction module to obtain the modified rabbit-source antibody gene feature vector, which carries both global features and multi-scale neighborhood associated features under different gene segment spans.
Then, since the gene features of the modified rabbit antibody and those of the antibody molecule in the human body do not share the same feature dimensions in the high-dimensional feature space, and since the humanized antibody needs a high affinity, the transfer matrix of the modified rabbit antibody gene feature vector relative to the antibody molecule gene feature vector is calculated and used for classification. The homology between the modified rabbit antibody and the antibody molecule in the human body is evaluated in this way, so as to obtain a higher degree of humanization.
Particularly, in the technical solution of the present application, the classification feature matrix is a transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antibody molecule gene feature vector. During training of the classifier, when the gradient back-propagates through the feature extraction models of the two feature vectors, that is, the converter-based context encoder together with the multi-scale neighborhood feature extraction module, abnormal gradient branching may cause digestion of the feature patterns expressed by the modified rabbit-derived antibody gene feature vector and the antibody molecule gene feature vector. A classification mode digestion inhibition loss function is therefore introduced:
Figure SMS_25
wherein Figure SMS_26 and Figure SMS_27 are respectively the modified rabbit source antibody gene feature vector and the antibody molecule gene feature vector, Figure SMS_28 and Figure SMS_29 are respectively the weight matrices of the classifier for Figure SMS_30 and Figure SMS_31, and Figure SMS_32 represents the square of the two-norm of a vector.
Here, by introducing the classification mode digestion inhibition loss function, the pseudo-difference of the classifier weights is pushed toward the real feature distribution difference between the modified rabbit source antibody gene feature vector and the antibody molecule gene feature vector. This ensures that the directional derivative is regularized near the gradient branching point during gradient back propagation, that is, the gradient is weighted between the feature extraction modes of the two feature vectors, thereby inhibiting classification mode digestion of the features and improving the classification accuracy. In this way, the homology between the modified rabbit antibody and the antibody molecule in the human body can be accurately evaluated, so that the homology between the gene sequence of the modified rabbit antibody and the gene sequence of the antibody molecule in the human body can be accurately checked.
Based on this, the present application provides a method for humanizing an antibody based on sequence encoding and decoding, comprising: a training phase comprising: acquiring training data, wherein the training data comprises a gene sequence of a training antibody molecule, a gene sequence of a rabbit source antibody after training modification, and a true value of homology between the gene sequence of the training antibody molecule and the gene sequence of the rabbit source antibody after training modification; the gene sequence of the training antibody molecule and the gene sequence of the rabbit antibody after training modification are respectively passed through the context encoder based on the converter and the multiscale neighborhood feature extraction module to obtain a training antibody molecule gene feature vector and a training modification rabbit antibody gene feature vector; calculating a transfer matrix of the training transformed rabbit source antibody gene feature vector relative to the training antibody molecule gene feature vector as a training classification feature matrix; passing the training classification feature matrix through the classifier to obtain a classification loss function value; calculating a classification mode digestion inhibition loss function value of the training transformed rabbit antibody gene feature vector and the training antibody molecule gene feature vector, wherein the classification mode digestion inhibition loss function value is related to the square of the two norms of the difference feature vector between the training transformed rabbit antibody gene feature vector and the training antibody molecule gene feature vector; and training the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier with a weighted sum of the classification loss function value and the classification mode digestion suppression loss function value as a loss function value; further 
comprises: an inference phase comprising: obtaining the gene sequence of an antibody molecule in a human body; passing the gene sequence of the antibody molecule through the trained converter-based context encoder to obtain a plurality of gene expression feature vectors, and cascading the plurality of gene expression feature vectors to obtain an antibody molecule global gene feature vector; passing the gene sequence of the antibody molecule through the trained multi-scale neighborhood feature extraction module to obtain a multi-scale neighborhood antibody molecule feature vector; cascading the antibody molecule global gene feature vector and the multi-scale neighborhood antibody molecule feature vector to obtain an antibody molecule gene feature vector; obtaining the gene sequence of the modified rabbit antibody; processing the gene sequence of the modified rabbit source antibody through the converter-based context encoder and the multi-scale neighborhood feature extraction module to obtain a modified rabbit source antibody gene feature vector; calculating a transfer matrix of the modified rabbit source antibody gene feature vector relative to the antibody molecule gene feature vector as a classification feature matrix; and passing the classification feature matrix through the trained classifier to obtain a class probability value, wherein the class probability value represents the homology of the modified rabbit antibody with the antibody molecule in the human body.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
Exemplary humanization methods
Embodiment one: rabbit antibody sequence analysis
The rabbit antibody sequences were selected as follows:
Heavy chain > VH
QSVKESEGGLFKPTDTLTLTCTVSGFSLSSYAISWVRQAPGNGLEWIGIINSYGSTYYASWAKSRSTITRNTNENTVTLKMTSLTAADTATYFCARGYAGSSGGYIWGPGTLVTVSS
Light chain > VL
AAVLTQTPSPVSAAVGGTVTIKCQSSQSVYNNNLLSWYQQKPGQPPKLLIYDASNLPSGVPDRFSGSGSGTQFTLTISGVQCDDAATYYCLGGYYGSDAGGNTFGGGTEVVVK
The rabbit antibody heavy chain sequence is compared with the human germline antibody sequences; the homology of the rabbit antibody sequence with the human antibody sequence is lower than 70%. Heavy-chain residues such as VK/EG/F/T are sites to be humanized, as shown in FIG. 1.
The rabbit antibody light chain sequence is compared with the human germline antibody sequences; the homology of the rabbit antibody sequence with the human antibody sequence is lower than 70%. Light-chain residues such as AVL/PV/A/GT are sites to be humanized, as shown in FIG. 2.
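The percentage figures quoted above can be read as a percent identity over aligned positions. The following minimal sketch assumes the two sequences have already been aligned to equal length with "-" marking gaps, and that gapped positions are excluded from the count; other definitions of identity exist. The function name and the short example strings are hypothetical and are not the sequences of this embodiment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the aligned, ungapped positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip positions where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / compared

# Toy five-residue strings differing at one position.
identity = percent_identity("QSVKE", "QSLKE")
```

Positions that differ from the human germline sequence in such a comparison are the candidate sites for humanization.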
Embodiment two: building a rabbit antibody structure model through homology modeling
Homology modeling: 5-10 optimal structural solutions are selected using the SWISS-MODEL homology modeling method. Loop regions are generally modeled by homology modeling; if the alignment of the CDR amino acid sequences gives less than 50% identity, the CDR3 structural model is built by de novo modeling. The 10 antibody crystal structures closest in sequence (structural resolution better than 2.5 angstroms) were extracted using PDB BLAST, and the optimal structural model was selected by comparison with the automatically built model. The structural model built by homology modeling is shown in fig. 3.
Embodiment three:
Co-evolution modeling: 2 optimal structural solutions are selected using the AlphaFold2 co-evolution modeling method; a humanized antibody with 80-90% homology is first selected as the structural model, and then a fully humanized antibody with 90-99% homology is selected as the structural model. The structural model built by co-evolution modeling is shown in fig. 4.
Embodiment four:
Traditional humanization design scheme: the original rabbit sequence was mutated to a human sequence by database alignment. From the original sequence of the rabbit antibody, a plurality of humanized amino acid sequences (huVH1, huVH2, huVH3, huVL1, huVL2, huVL3) were designed, and the designed sequences were combined into humanized antibodies for expression in the Expi293 mammalian expression system.
The sequencing results of the above humanized amino acid sequences are shown in the following table.
Figure SMS_33
Purification of the humanized antibodies: the series of humanized antibodies expressed in Expi293 cells were collected from the cell supernatant and purified according to standard protein purification procedures. Characterization showed that the purity of the purified humanized antibodies exceeded 90%.
Embodiment five:
Activity detection of the humanized antibodies designed by homology modeling:
The binding activity of the humanized antibody to the antigen was detected by ELISA:
The plate was coated with the target antigen at 0.5 μg/mL, the humanized antibody samples purified in Embodiment four were tested over a concentration gradient of 0.00004-1.2 μg/mL, and the OD value of the binding activity was measured to determine the binding affinity of the humanized antibody for the target antigen; the test result is shown in the figure. As can be seen from the figure, the OD value rises markedly as the sample concentration increases, clear upper and lower plateaus appear quickly, and the window is large. Compared with the parent antibody, the humanized antibody has weaker antigen binding affinity: the degree of humanization is high, but the affinity drops markedly.
The calculated EC50 values are shown in the following table:
Figure SMS_34
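The present text does not state how the EC50 values were calculated. A common choice for ELISA dose-response curves with upper and lower plateaus is the four-parameter logistic model, sketched below under that assumption; the function name and all parameter values are hypothetical and are not data from this embodiment.

```python
def four_parameter_logistic(conc, bottom, top, ec50, hill):
    """OD predicted at a given concentration by a four-parameter logistic curve."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)

# At a concentration equal to the EC50, the response is halfway between plateaus.
midpoint = four_parameter_logistic(0.01, bottom=0.1, top=2.1, ec50=0.01, hill=1.0)
```

In this model the EC50 is the concentration at which the OD lies midway between the lower and upper plateaus, so a lower EC50 corresponds to stronger apparent binding.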
Embodiment six:
Affinity kinetic assay
The ForteBio instrument and its software were started, and the kinetics experiment mode was selected. Analysis showed that the affinity of the individual humanized antibodies designed by the conventional method was reduced.
Figure SMS_35
Embodiment seven:
Improved humanization design scheme: the original rabbit sequence was mutated to a human sequence by database alignment. From the original sequence of the rabbit antibody, a plurality of humanized amino acid sequences (huVH4, huVH5, huVH6, huVL4, huVL5, huVL6) were designed, and the designed sequences were combined into humanized antibodies for expression in the Expi293 mammalian expression system. The results for the above humanized amino acid sequences are shown in the following table.
Figure SMS_36
Purification of the humanized antibodies: the series of humanized antibodies expressed in Expi293 cells were collected from the cell supernatant and purified according to standard protein purification procedures. Characterization showed that the purity of the purified humanized antibodies exceeded 90%.
Embodiment seven:
Activity detection of the humanized antibodies designed by modeling with the present method:
The binding activity of the humanized antibody to the antigen was detected by ELISA:
The plate was coated with the target antigen at 0.5 μg/mL, the humanized antibody samples purified in Embodiment seven were tested over a concentration gradient of 0.00004-1.2 μg/mL, and the OD value of the binding activity was measured to determine the binding affinity of the humanized antibody for the target antigen; the test result is shown in the figure. As can be seen from the figure, the OD value rises markedly as the sample concentration increases, clear upper and lower plateaus appear quickly, and the window is large. The antigen binding affinity of the humanized antibody is equivalent to that of the parent antibody: the degree of humanization is high, the affinity is basically unchanged, and the humanization is successful.
The calculated EC50 values are shown in the following table:
Figure SMS_37
Embodiment eight:
Affinity kinetic assay
The ForteBio instrument and its software were started, and the kinetics experiment mode was selected. Analysis showed that the affinity of the individual humanized antibodies designed by the present method remained consistent with that of the parent antibody, and the humanization design was successful.
Figure SMS_38
Embodiment nine: determination of the rabbit antibody crystal structure
The rabbit antibody light chain and heavy chain sequences were synthesized separately, expressed in 293F cells and purified. 20 mg of high-purity protein was screened for crystals using a crystallization robot, data were collected by X-ray crystallography, and the structure was solved by molecular replacement or similar methods. The structures predicted by the different methods were analyzed and compared using PyMOL software: the RMSD value of the structure obtained by the traditional homology modeling method is 1.42 angstroms, while the RMSD value of the structure obtained by the present technical method is 0.41 angstroms. The antibody structure predicted by the present technical method is therefore more accurate, and the success rate and activity after humanization are higher than those of the traditional method.
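The RMSD values compared above measure the average distance between corresponding atoms of two structures. A minimal sketch of the calculation follows; it assumes the two coordinate sets are already superposed and list the same atoms in the same order, which structure analysis software normally handles before reporting an RMSD. The function name and the coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Two atoms, each displaced by 1 angstrom along z.
value = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
             [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])
```

A smaller RMSD against the crystal structure means the predicted model lies closer to the experimentally determined coordinates.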
In particular, in the technical solution of the present application, it is considered that although a humanized antibody having 80-90% homology is selected as a structural model first, and then a fully human antibody having 100% homology is selected as a structural model to achieve a higher degree of humanization, it is still necessary to examine the homology of the gene sequence of the modified rabbit antibody with the gene sequence of an antibody molecule in a human body. Therefore, in the technical solution of the present application, it is desirable to characterize the feature distribution information of the antibody molecule gene sequence in the human body and the gene sequence of the modified rabbit-derived antibody by adopting an artificial intelligence model based on natural semantic understanding, regarding the gene sequence as a text sequence, and fusing global implicit features of the gene sequence and multi-scale neighborhood associated features under different gene spans. And evaluating the homology of the modified rabbit antibody with the antibody molecules in the human body by using a transfer matrix of the gene characteristics of the modified rabbit antibody relative to the gene characteristics of the antibody molecules, so as to verify the homology of the gene sequences of the modified rabbit antibody with the gene sequences of the antibody molecules in the human body.
Exemplary homology checking method
Fig. 5 illustrates a flow chart of a training phase in a sequence codec based antibody humanization method according to an embodiment of the present application. As shown in fig. 5, the sequence codec-based antibody humanization method according to an embodiment of the present application includes a training phase comprising:
S110, obtaining training data, wherein the training data comprises a gene sequence of a training antibody molecule, a gene sequence of a rabbit source antibody after training modification, and a true value of homology between the gene sequence of the training antibody molecule and the gene sequence of the rabbit source antibody after training modification;
S120, respectively passing the gene sequence of the training antibody molecule and the gene sequence of the rabbit antibody after training modification through the context encoder based on the converter and the multi-scale neighborhood feature extraction module to obtain a training antibody molecule gene feature vector and a training modified rabbit antibody gene feature vector;
S130, calculating a transfer matrix of the gene feature vector of the rabbit source antibody after training modification relative to the gene feature vector of the training antibody molecule as a training classification feature matrix;
S140, passing the training classification feature matrix through the classifier to obtain a classification loss function value;
S150, calculating a classification mode digestion inhibition loss function value of the rabbit source antibody gene feature vector after training modification and the training antibody molecule gene feature vector, wherein the classification mode digestion inhibition loss function value is related to the square of the two norms of the differential feature vector between the rabbit source antibody gene feature vector after training modification and the training antibody molecule gene feature vector; and
S160, training the context encoder based on the converter, the multi-scale neighborhood feature extraction module and the classifier by taking the weighted sum of the classification loss function value and the classification mode digestion inhibition loss function value as the loss function value.
Fig. 6 illustrates a flow chart of an inference phase in a sequence codec based antibody humanization method according to an embodiment of the present application. As shown in fig. 6, the sequence codec-based antibody humanization method according to an embodiment of the present application further includes an inference phase, including the steps of:
S210, obtaining a gene sequence of an antibody molecule in a human body;
S220, passing the gene sequence of the antibody molecule through the trained converter-based context encoder to obtain a plurality of gene expression feature vectors, and cascading the plurality of gene expression feature vectors to obtain an antibody molecule global gene feature vector;
S230, passing the gene sequence of the antibody molecule through the trained multi-scale neighborhood feature extraction module to obtain a multi-scale neighborhood antibody molecule feature vector;
S240, cascading the antibody molecule global gene feature vector and the multi-scale neighborhood antibody molecule feature vector to obtain an antibody molecule gene feature vector;
S250, obtaining a gene sequence of the modified rabbit antibody;
S260, processing the gene sequence of the modified rabbit source antibody through the context encoder based on the converter and the multi-scale neighborhood feature extraction module to obtain a modified rabbit source antibody gene feature vector;
S270, calculating a transfer matrix of the gene feature vector of the modified rabbit source antibody relative to the gene feature vector of the antibody molecule as a classification feature matrix; and
S280, passing the classification feature matrix through the trained classifier to obtain a class probability value, wherein the class probability value represents the homology of the modified rabbit antibody with the antibody molecule in the human body.
Fig. 7 illustrates an architectural diagram of a training phase in a sequence codec-based antibody humanization method according to an embodiment of the present application. As shown in fig. 7, in the training phase, in the network structure, first, training data including a gene sequence of a training antibody molecule, a gene sequence of a rabbit-derived antibody after training modification, and a true value of homology between the gene sequence of the training antibody molecule and the gene sequence of the rabbit-derived antibody after training modification are acquired; then, the obtained gene sequence of the training antibody molecule and the gene sequence of the rabbit antibody after training modification are respectively passed through the context encoder based on the converter and the multiscale neighborhood feature extraction module to obtain a training antibody molecule gene feature vector and a training modified rabbit antibody gene feature vector; calculating a transfer matrix of the training transformed rabbit source antibody gene feature vector relative to the training antibody molecule gene feature vector to serve as a training classification feature matrix; the obtained classification characteristic matrix passes through the classifier to obtain a classification loss function value; secondly, calculating a classification mode digestion inhibition loss function value of the rabbit source antibody gene feature vector after training modification and the training antibody molecule gene feature vector, wherein the classification mode digestion inhibition loss function value is related to the square of the two norms of the differential feature vector between the rabbit source antibody gene feature vector after training modification and the training antibody molecule gene feature vector; further, training the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier with a weighted sum of the classification 
loss function value and the classification mode digestion suppression loss function value as a loss function value.
More specifically, in the training phase, in step S110, training data including a gene sequence of a training antibody molecule, a gene sequence of a training engineered rabbit-source antibody, and a true value of homology between the gene sequence of the training antibody molecule and the gene sequence of the training engineered rabbit-source antibody is acquired. It is contemplated that although a humanized antibody having 80-90% homology is first selected as a structural model, and then a fully human antibody having 100% homology is selected as a structural model to achieve a higher degree of humanization, it is still necessary to examine the homology of the gene sequence of the modified rabbit antibody with that of an antibody molecule in a human body. Therefore, in the technical scheme of the application, the gene sequence of the training antibody molecule, the gene sequence of the rabbit source antibody after training modification and the true value of the homology between the gene sequence of the training antibody molecule and the gene sequence of the rabbit source antibody after training modification can be obtained through a gene sequence analyzer.
More specifically, in the training phase, in step S120, the gene sequence of the training antibody molecule and the gene sequence of the training modified rabbit antibody are passed through the context encoder based on the converter and the multi-scale neighborhood feature extraction module, respectively, to obtain a training antibody molecule gene feature vector and a training modified rabbit antibody gene feature vector. Considering that each gene in the gene sequence of the antibody molecule in the human body carries contextual semantic feature information, the gene sequence of the training antibody molecule is processed using the context encoder based on the converter. In particular, in the technical scheme of the present application, it is considered that a gene is composed of many bases, each base being a site, and that DNA contains four bases: A, T, C and G. The gene sequence of the antibody molecule in the human body is therefore a base sequence composed of a plurality of A, T, C and G bases. Thus, in the solution of the present application, the gene sequence of the antibody molecule in the human body is one-hot encoded to convert it into an input vector before the gene sequence is encoded by the context encoder. Further, the input vector is passed through the multi-scale neighborhood feature extraction module to obtain the training antibody molecule gene feature vector. Further, in order to accurately evaluate the homology between the modified rabbit antibody and the antibody molecule in the human body, the gene sequence of the modified rabbit antibody needs to be obtained. More specifically, the gene sequence of the modified rabbit antibody is processed by the context encoder based on the converter and the multi-scale neighborhood feature extraction module, so as to obtain the gene feature vector of the modified rabbit antibody, which carries global features and multi-scale neighborhood associated features under different gene segment spans.
More specifically, in the training phase, in step S130, a transfer matrix of the training engineered rabbit source antibody gene feature vector relative to the training antibody molecule gene feature vector is calculated as a training classification feature matrix. Because the genetic features of the rabbit antibody after training and transformation are different from the feature dimensions of the training antibody molecules in the human body in a high-dimensional feature space, and the humanized antibody needs higher affinity, in order to accurately judge the homology of the rabbit antibody after training and transformation with the training antibody molecules in the human body, the transfer matrix of the rabbit antibody gene feature vector after training and transformation relative to the training antibody molecule gene feature vector is further calculated to classify, and the homology of the rabbit antibody after training and transformation with the training antibody molecules in the human body is further evaluated to obtain higher degree of humanization. In a specific example of the application, calculating a transfer matrix of the training transformed rabbit source antibody gene feature vector relative to the training antibody molecule gene feature vector as the training classification feature matrix according to the following formula;
Wherein, the formula is:
Figure SMS_39 = Figure SMS_40
wherein Figure SMS_41 represents the gene feature vector of the rabbit source antibody after training modification, Figure SMS_42 represents the gene feature vector of the training antibody molecule, Figure SMS_43 represents the classification feature matrix, and Figure SMS_44 represents matrix multiplication.
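The formula image itself is not reproduced in this text. As a minimal sketch, assume the relation has the form target = M ⊗ source, where M is the transfer matrix and ⊗ is matrix multiplication; which of the two gene feature vectors plays the role of source and which the role of target is left open here. Under that assumption, one matrix that satisfies the relation is the rank-one outer product shown below. This construction and the function names are illustrative assumptions and are not stated in the present application.

```python
def transfer_matrix(source, target):
    """Return a matrix M such that M applied to source reproduces target."""
    norm_sq = sum(s * s for s in source)
    if norm_sq == 0:
        raise ValueError("source vector must be non-zero")
    # Rank-one solution: M = target * source^T / (source . source)
    return [[t * s / norm_sq for s in source] for t in target]

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

source = [1.0, 2.0]
target = [3.0, 4.0, 5.0]
M = transfer_matrix(source, target)
recovered = mat_vec(M, source)
```

The two vectors need not have the same length in this sketch: a source of length 2 and a target of length 3 give a 3 x 2 matrix, which is one way to relate feature vectors of different dimensions.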
More specifically, in a training phase, in step S140, the training classification feature matrix is passed through the classifier to obtain a classification loss function value. In a specific example of the present application, the passing the training classification feature matrix through the classifier to obtain a classification loss function value includes: processing the training classification feature matrix using the classifier to generate a training classification result with the following formula:
Figure SMS_45
wherein Figure SMS_46 represents the projection of the training classification feature matrix as a vector, Figure SMS_47 represents the weight matrix of the fully connected layer, and Figure SMS_48 represents the bias matrix of the fully connected layer; and calculating a cross entropy value between the training classification result and the true value of homology between the gene sequence of the training antibody molecule and the gene sequence of the training modified rabbit source antibody in the training data as the classification loss function value.
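The classification step can be sketched as follows, assuming a single fully connected layer followed by a softmax, with the classification feature matrix projected to a vector by row-major flattening. The number of layers, the flattening order and all names and values are assumptions for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature_matrix, weights, bias):
    """Flatten the matrix, apply one fully connected layer, then softmax."""
    flat = [value for row in feature_matrix for value in row]
    logits = [sum(w * x for w, x in zip(row, flat)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

def cross_entropy(probabilities, true_class):
    """Cross entropy between predicted probabilities and the true class index."""
    return -math.log(probabilities[true_class])

# 2 x 2 feature matrix, two classes, all-zero weights (hypothetical values).
probs = classify([[0.0, 0.0], [0.0, 0.0]],
                 [[0.0] * 4, [0.0] * 4],
                 [0.0, 0.0])
loss = cross_entropy(probs, 0)
```

With all-zero weights both classes are equally likely, so the loss equals the natural logarithm of 2; training lowers this value by moving probability toward the true class.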
More specifically, in the training phase, in step S150, a classification mode digestion inhibition loss function value of the training modified rabbit source antibody gene feature vector and the training antibody molecule gene feature vector is calculated, wherein the classification mode digestion inhibition loss function value is related to the square of the two norms of the differential feature vector between the training modified rabbit source antibody gene feature vector and the training antibody molecule gene feature vector. Particularly, in the technical scheme of the application, the classification feature matrix is a transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antibody molecule gene feature vector. During training of the classifier, when the gradient back-propagates through the feature extraction models of the two feature vectors, that is, the converter-based context encoder together with the multi-scale neighborhood feature extraction module, abnormal gradient branching may cause digestion of the feature patterns expressed by the modified rabbit-derived antibody gene feature vector and the antibody molecule gene feature vector, so a classification mode digestion inhibition loss function is introduced. In a specific example of the present application, calculating the classification mode digestion inhibition loss function value of the training modified rabbit source antibody gene feature vector and the training antibody molecule gene feature vector includes: calculating the classification mode digestion inhibition loss function value of the training modified rabbit source antibody gene feature vector and the training antibody molecule gene feature vector according to the following formula;
Wherein, the formula is:
Figure SMS_49
wherein Figure SMS_50 and Figure SMS_51 respectively represent the training modified rabbit source antibody gene feature vector and the training antibody molecule gene feature vector, Figure SMS_52 and Figure SMS_53 respectively represent the weight matrices of the classifier for the training modified rabbit source antibody gene feature vector and the training antibody molecule gene feature vector, Figure SMS_54 represents the square of the two-norm of a vector, Figure SMS_55 represents the F-norm of a matrix, and Figure SMS_56 represents the exponential operation on a matrix or a vector: for a matrix, the natural exponential function value is computed with the feature value at each position in the matrix as the exponent, and for a vector, the natural exponential function value is computed with the feature value at each position in the vector as the exponent.
Here, by introducing the classification mode digestion inhibition loss function, the pseudo-difference of the classifier weights is pushed toward the real feature distribution difference between the modified rabbit source antibody gene feature vector and the antibody molecule gene feature vector. This ensures that the directional derivative is regularized near the gradient branching point during gradient back propagation, that is, the gradient is weighted between the feature extraction modes of the two feature vectors, so that classification mode digestion of the features is inhibited and the classification accuracy is improved. Thus, the homology between the modified rabbit antibody and the antibody molecule in the human body can be accurately evaluated, so that a higher degree of humanization can be obtained while the affinity is maintained or even increased.
More specifically, in the training phase, in step S160, the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier are trained with a weighted sum of the classification loss function value and the classification mode digestion inhibition loss function value as the loss function value. That is, the parameters of the context encoder, of the multi-scale neighborhood feature extraction module, and of the classifier are updated using the weighted sum of the classification loss function value and the classification mode digestion inhibition loss function value.
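The weighted-sum loss of step S160 can be sketched as follows. This is only an illustration under stated assumptions: the exact classification mode digestion inhibition loss is given by a formula image that is not reproduced here, so the sketch uses just the squared two-norm of the difference vector, which the text says the loss is related to, as a stand-in. The function names, the weights `alpha` and `beta`, and the sample values are hypothetical.

```python
def squared_l2_difference(v1, v2):
    """Squared two-norm of the difference vector v1 - v2."""
    if len(v1) != len(v2):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(v1, v2))


def total_loss(classification_loss, suppression_loss, alpha=1.0, beta=0.1):
    """Weighted sum of the two loss values (alpha and beta are hypothetical weights)."""
    return alpha * classification_loss + beta * suppression_loss


# Hypothetical feature vectors for the modified rabbit antibody and the human antibody molecule.
v_rabbit = [0.5, 1.0, -0.5]
v_human = [0.0, 1.0, 0.5]

# Stand-in for the suppression term: only the squared two-norm ingredient, not the full formula.
suppression = squared_l2_difference(v_rabbit, v_human)

# 0.35 is a made-up classification (cross-entropy) loss value used for illustration.
loss = total_loss(0.35, suppression)
```

In an actual training loop this scalar would be back-propagated through the encoder, the multi-scale module, and the classifier; the relative weighting of the two terms is a tuning choice the text does not specify.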
After training is completed, the inference phase is entered. That is, once the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier have been trained as described above, the trained components are used in actual inference to obtain a more accurate classification result for the homology between the modified rabbit antibody and the antibody molecules in the human body.
Fig. 8 illustrates an architectural diagram of the inference phase in the sequence codec-based antibody humanization method according to an embodiment of the present application. As shown in Fig. 8, in the network structure of the inference phase, first, the gene sequence of the antibody molecule in the human body and the gene sequence of the modified rabbit antibody are obtained; then, the gene sequence of the antibody molecule is passed through the trained converter-based context encoder to obtain a plurality of gene expression feature vectors, and the plurality of gene expression feature vectors are cascaded to obtain an antibody molecule global gene feature vector; meanwhile, the gene sequence of the modified rabbit antibody is processed by the converter-based context encoder and the multi-scale neighborhood feature extraction module to obtain a modified rabbit-source antibody gene feature vector; next, the gene sequence of the antibody molecule is passed through the trained multi-scale neighborhood feature extraction module to obtain a multi-scale neighborhood antibody molecule feature vector; the antibody molecule global gene feature vector and the multi-scale neighborhood antibody molecule feature vector are cascaded to obtain an antibody molecule gene feature vector; a transfer matrix of the modified rabbit-source antibody gene feature vector relative to the antibody molecule gene feature vector is calculated as a classification feature matrix; and finally, the classification feature matrix is passed through the trained classifier to obtain a class probability value, wherein the class probability value represents the homology between the modified rabbit antibody and the antibody molecules in the human body.
More specifically, in the inference phase, in step S210 and step S220, the gene sequence of the antibody molecule in the human body is obtained, and the gene sequence of the antibody molecule is passed through the trained converter-based context encoder to obtain a plurality of gene expression feature vectors, which are cascaded to obtain an antibody molecule global gene feature vector. Because each gene in the gene sequence of an antibody molecule in the human body carries contextual semantic feature information, the gene sequence of the antibody molecule is processed by the converter-based context encoder to extract global high-dimensional semantic features of the sequence, which are better suited to characterizing the essential features of the genes of the antibody molecule in the human body. The plurality of gene expression feature vectors are then cascaded to integrate the global implicit feature information of the genes of the antibody molecule in the human body, thereby obtaining the antibody molecule global gene feature vector.
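As a rough illustration of steps S210 and S220, the sketch below encodes a sequence with a single self-attention pass and then cascades (concatenates) the per-position context vectors into one global vector. It rests on several assumptions that are not in the text: the gene sequence is treated as a nucleotide string over `ACGT`, a fixed one-hot embedding replaces a learned one, and the attention has no learned query/key/value projections, positional encoding, multiple heads, or feed-forward layers as a real transformer-style encoder would. All names are hypothetical.

```python
import math

ALPHABET = "ACGT"  # assumed token set; the text does not specify the tokenization


def embed(seq):
    """One-hot embedding per symbol (a stand-in for a learned embedding)."""
    return [[1.0 if ch == a else 0.0 for a in ALPHABET] for ch in seq]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with queries, keys and values all equal to the input."""
    d = len(tokens[0])
    contextual = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        contextual.append(
            [sum(weights[j] * tokens[j][c] for j in range(len(tokens))) for c in range(d)]
        )
    return contextual


def global_feature_vector(seq):
    """Cascade (concatenate) the per-position context vectors into one global vector."""
    return [x for vec in self_attention(embed(seq)) for x in vec]


vec = global_feature_vector("ACGTAC")  # 6 positions x 4 dimensions
```

Each context vector mixes information from every other position in the sequence, which is the sense in which the encoder captures global context before the vectors are cascaded.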
More specifically, in the inference phase, in step S230 and step S240, the gene sequence of the antibody molecule is passed through the trained multi-scale neighborhood feature extraction module to obtain a multi-scale neighborhood antibody molecule feature vector, and the antibody molecule global gene feature vector and the multi-scale neighborhood antibody molecule feature vector are cascaded to obtain an antibody molecule gene feature vector. It will be appreciated that, in the gene sequence of the antibody molecule in the human body, each gene segment exhibits different implicit features under different gene segment spans, and multi-scale neighborhood feature extraction can capture the associated features under those different spans. Therefore, in the technical solution of the present application, a multi-scale neighborhood feature extraction module is further used to encode the gene sequence of the antibody molecule so as to extract multi-scale neighborhood associated features of the gene sequence under different gene segment spans, thereby obtaining the multi-scale neighborhood antibody molecule feature vector. The antibody molecule global gene feature vector and the multi-scale neighborhood antibody molecule feature vector are then cascaded for feature fusion to obtain the antibody molecule gene feature vector.
Fig. 9 illustrates a flowchart of the antibody molecule multi-scale neighborhood feature extraction process in the sequence codec-based antibody humanization method according to an embodiment of the present application. As shown in Fig. 9, the antibody molecule multi-scale neighborhood feature extraction process includes: S231, inputting the gene sequence of the antibody molecule into a first convolution layer of the multi-scale neighborhood feature extraction module to obtain a first neighborhood scale antibody molecule feature vector, wherein the first convolution layer has a first one-dimensional convolution kernel of a first length; S232, inputting the gene sequence of the antibody molecule into a second convolution layer of the multi-scale neighborhood feature extraction module to obtain a second neighborhood scale antibody molecule feature vector, wherein the second convolution layer has a second one-dimensional convolution kernel of a second length, and the first length is different from the second length; and S233, cascading the first neighborhood scale antibody molecule feature vector and the second neighborhood scale antibody molecule feature vector to obtain the multi-scale neighborhood antibody molecule feature vector. More specifically, one-dimensional convolution coding is performed on the gene sequence of the antibody molecule using the first convolution layer of the multi-scale neighborhood feature extraction module according to the following formula to obtain the first neighborhood scale antibody molecule feature vector;
Wherein, the formula is:
[Figure SMS_57]
wherein a is the width of the first convolution kernel in the x direction, [Figure SMS_58] is the first convolution kernel parameter vector, [Figure SMS_59] is the local vector matrix operated on by the convolution kernel function, w is the size of the first convolution kernel, and X represents the gene sequence of the antibody molecule; and one-dimensional convolution coding is further performed on the gene sequence of the antibody molecule using the second convolution layer of the multi-scale neighborhood feature extraction module according to the following formula to obtain the second neighborhood scale antibody molecule feature vector;
wherein, the formula is:
[Figure SMS_60]
wherein b is the width of the second convolution kernel in the x direction, [Figure SMS_61] is the second convolution kernel parameter vector, [Figure SMS_62] is the local vector matrix operated on by the convolution kernel function, m is the size of the second convolution kernel, and X represents the gene sequence of the antibody molecule.
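Steps S231 to S233 can be sketched as two one-dimensional convolutions with kernels of different lengths whose outputs are concatenated. The sketch assumes the gene sequence has already been turned into a list of numbers (the text does not say how), uses "valid" convolution without padding, bias, or activation, and picks arbitrary kernel values; in the described method the kernels would be learned. All names are hypothetical.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution in the cross-correlation form used by most deep-learning libraries."""
    k = len(kernel)
    if k > len(signal):
        raise ValueError("kernel is longer than the signal")
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def multi_scale_features(signal, first_kernel, second_kernel):
    """Apply two kernels of different lengths and cascade (concatenate) the results."""
    if len(first_kernel) == len(second_kernel):
        raise ValueError("the two kernels must have different lengths")
    first_scale = conv1d(signal, first_kernel)
    second_scale = conv1d(signal, second_kernel)
    return first_scale + second_scale


# Hypothetical numeric encoding of a short sequence and two illustrative kernels.
encoded_sequence = [1.0, 2.0, 3.0, 4.0, 5.0]
first_kernel = [1.0, 0.0, -1.0]   # first length: 3
second_kernel = [0.5, 0.5]        # second length: 2

features = multi_scale_features(encoded_sequence, first_kernel, second_kernel)
```

The longer kernel responds to patterns spanning more positions and the shorter one to more local patterns, which is why the two lengths are required to differ.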
More specifically, in the inference phase, in step S250, the gene sequence of the engineered rabbit antibody is obtained. It will be appreciated that in order to be able to accurately assess and judge the homology of the engineered rabbit antibody with the antibody molecules in the human body, it is also necessary to obtain the gene sequences of the engineered rabbit antibody. In the technical scheme of the application, the gene sequence of the modified rabbit antibody can be obtained through a gene sequence analyzer.
More specifically, in the inference phase, in step S260, the gene sequence of the modified rabbit antibody is processed by the converter-based context encoder and the multi-scale neighborhood feature extraction module to obtain a modified rabbit-source antibody gene feature vector. Similarly, processing the gene sequence of the modified rabbit antibody with the converter-based context encoder and the multi-scale neighborhood feature extraction module yields a modified rabbit-source antibody gene feature vector that carries both global features and multi-scale neighborhood associated features under different gene segment spans.
More specifically, in the inference phase, in step S270 and step S280, a transfer matrix of the modified rabbit-source antibody gene feature vector relative to the antibody molecule gene feature vector is calculated as a classification feature matrix, and the classification feature matrix is passed through the trained classifier to obtain a class probability value, wherein the class probability value represents the homology between the modified rabbit antibody and the antibody molecules in the human body. In a specific example of the present application, calculating the transfer matrix of the modified rabbit-source antibody gene feature vector relative to the antibody molecule gene feature vector as the classification feature matrix includes: calculating the transfer matrix of the modified rabbit-source antibody gene feature vector relative to the antibody molecule gene feature vector as the classification feature matrix according to the following formula;
Wherein, the formula is:
[Figure SMS_63] = [Figure SMS_64]
wherein [Figure SMS_65] represents the modified rabbit-source antibody gene feature vector, [Figure SMS_66] represents the antibody molecule gene feature vector, [Figure SMS_67] represents the classification feature matrix, and [Figure SMS_68] represents matrix multiplication.
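One way to picture a transfer matrix between two feature vectors is sketched below. The formula itself is an image that is not reproduced in this text, so both the direction of the relation and the construction are assumptions: the sketch takes the relation to be "rabbit vector = M x antibody molecule vector" and, since many matrices satisfy that equation, picks the rank-one minimum-norm solution M = v_rabbit v_human^T / (v_human . v_human). The function names and sample vectors are hypothetical.

```python
def transfer_matrix(v_target, v_source):
    """Rank-one matrix M such that M @ v_source == v_target (one of many valid solutions)."""
    norm_sq = sum(s * s for s in v_source)
    if norm_sq == 0.0:
        raise ValueError("source vector must be non-zero")
    return [[t * s / norm_sq for s in v_source] for t in v_target]


def matvec(matrix, vector):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical feature vectors.
v_rabbit = [1.0, 2.0]   # modified rabbit-source antibody gene feature vector
v_human = [3.0, 4.0]    # antibody molecule gene feature vector

M = transfer_matrix(v_rabbit, v_human)   # assumed direction: v_rabbit = M @ v_human
reconstructed = matvec(M, v_human)
```

Whatever the exact construction, the resulting matrix encodes how one feature vector maps onto the other, and it is this matrix rather than either vector alone that is handed to the classifier.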
In summary, the sequence codec-based antibody humanization method according to the embodiments of the present application has been described. It uses an artificial intelligence model based on natural semantic understanding to treat a gene sequence as a text sequence, and characterizes the feature distribution of the gene sequence of the antibody molecule in the human body and of the gene sequence of the modified rabbit antibody by fusing the global implicit features of each gene sequence with its multi-scale neighborhood associated features under different gene spans. The homology between the modified rabbit antibody and the antibody molecule in the human body is then evaluated using the transfer matrix of the gene features of the modified rabbit antibody relative to the gene features of the antibody molecule, thereby verifying the homology between the gene sequence of the modified rabbit antibody and the gene sequence of the antibody molecule in the human body.
Exemplary System
Fig. 10 illustrates a block diagram of a sequence codec-based antibody humanization system according to an embodiment of the present application. As shown in fig. 10, a sequence codec-based antibody humanization system 500 according to an embodiment of the present application includes: a training module 510 and an inference module 520.
As shown in fig. 10, the training module 510 includes: a training data obtaining unit 511, configured to obtain training data, where the training data includes a gene sequence of a training antibody molecule, a gene sequence of a rabbit antibody after training modification, and a true value of homology between the gene sequence of the training antibody molecule and the gene sequence of the rabbit antibody after training modification; a feature vector extraction unit 512, configured to pass the gene sequence of the training antibody molecule and the gene sequence of the training transformed rabbit antibody through the context encoder based on the converter and the multi-scale neighborhood feature extraction module, respectively, to obtain a training antibody molecule gene feature vector and a training transformed rabbit antibody gene feature vector; the training classification characteristic matrix generation unit 513 is configured to calculate a transfer matrix of the training transformed rabbit source antibody gene characteristic vector relative to the training antibody molecule gene characteristic vector as a training classification characteristic matrix;
a classification loss function value calculation unit 514, configured to pass the training classification feature matrix through the classifier to obtain a classification loss function value; a classification mode digestion inhibition loss function value calculation unit 515, configured to calculate a classification mode digestion inhibition loss function value of the training modified rabbit-source antibody gene feature vector and the training antibody molecule gene feature vector, wherein the classification mode digestion inhibition loss function value is related to the square of the two-norm of the difference feature vector between the training modified rabbit-source antibody gene feature vector and the training antibody molecule gene feature vector; and a training unit 516, configured to train the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier with a weighted sum of the classification loss function value and the classification mode digestion inhibition loss function value as the loss function value.
As shown in Fig. 10, the inference module 520 includes: a physiological information acquisition unit 521, configured to acquire the gene sequence of an antibody molecule in a human body; an antibody molecule global gene feature vector generation unit 522, configured to pass the gene sequence of the antibody molecule through the trained converter-based context encoder to obtain a plurality of gene expression feature vectors, and to cascade the plurality of gene expression feature vectors to obtain an antibody molecule global gene feature vector; a multi-scale neighborhood feature extraction unit 523, configured to pass the gene sequence of the antibody molecule through the trained multi-scale neighborhood feature extraction module to obtain a multi-scale neighborhood antibody molecule feature vector; a cascade unit 524, configured to cascade the antibody molecule global gene feature vector and the multi-scale neighborhood antibody molecule feature vector to obtain an antibody molecule gene feature vector; a rabbit-source antibody gene sequence acquisition unit 525, configured to acquire the gene sequence of the modified rabbit antibody; a modified rabbit antibody gene feature extraction unit 526, configured to process the gene sequence of the modified rabbit antibody with the converter-based context encoder and the multi-scale neighborhood feature extraction module to obtain a modified rabbit-source antibody gene feature vector; a classification feature matrix generating unit 527, configured to calculate a transfer matrix of the modified rabbit-source antibody gene feature vector relative to the antibody molecule gene feature vector as a classification feature matrix; and a class probability value generating unit 528, configured to pass the classification feature matrix through the trained classifier to obtain a class probability value, wherein the class probability value represents the homology between the modified rabbit antibody and the antibody molecules in the human body.
In one example, in the above-described sequence codec-based antibody humanization system 500, the classification loss function value calculation unit 514 is configured to: process the training classification feature matrix using the classifier to generate a training classification result according to the following formula: [Figure SMS_69], wherein [Figure SMS_70] represents projecting the training classification feature matrix as a vector, [Figure SMS_71] represents the weight matrix of the fully connected layer, and [Figure SMS_72] represents the bias matrix of the fully connected layer; and calculate a cross-entropy value between the training classification result and the true value of homology between the gene sequence of the training antibody molecule and the gene sequence of the training modified rabbit antibody in the training data as the classification loss function value.
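The classifier and its cross-entropy loss can be sketched as follows: the classification feature matrix is projected (flattened) to a vector, passed through one fully connected layer, and normalized with softmax. The exact classifier formula is an image not reproduced here, so the single-layer structure, the two-class output, and every weight and input value below are illustrative assumptions.

```python
import math


def flatten(matrix):
    """Project a matrix to a vector by concatenating its rows."""
    return [x for row in matrix for x in row]


def fully_connected(x, weights, bias):
    """One fully connected layer: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(matrix, weights, bias):
    """Class probabilities for a classification feature matrix."""
    return softmax(fully_connected(flatten(matrix), weights, bias))


def cross_entropy(probs, true_class):
    """Cross entropy between predicted probabilities and a one-hot true label."""
    return -math.log(max(probs[true_class], 1e-12))


# Hypothetical 2x2 classification feature matrix and 2-class layer parameters.
feature_matrix = [[1.0, 0.0], [0.0, 1.0]]
weights = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
bias = [0.0, 0.0]

probs = classify(feature_matrix, weights, bias)   # logits are [2.0, 0.0]
loss = cross_entropy(probs, 0)                    # class 0 taken as the true label here
```

During training the true label comes from the homology value stored with each training pair; at inference only the probabilities are used.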
In one example, in the above-described sequence codec-based antibody humanized system 500, the classification mode digestion suppression loss function value calculation unit 515 includes: calculating the classification mode digestion inhibition loss function values of the training transformed rabbit source antibody gene feature vector and the training antibody molecule gene feature vector according to the following formula;
wherein, the formula is:
[Figure SMS_73]
wherein [Figure SMS_74] and [Figure SMS_75] respectively represent the training modified rabbit-source antibody gene feature vector and the training antibody molecule gene feature vector, [Figure SMS_76] and [Figure SMS_77] respectively represent the weight matrices of the classifier for the training modified rabbit-source antibody gene feature vector and the training antibody molecule gene feature vector, [Figure SMS_78] represents the square of the two-norm of a vector, [Figure SMS_79] represents the F-norm (Frobenius norm) of a matrix, and [Figure SMS_80] represents the exponential operation on a matrix or a vector, that is, computing the natural exponential function value raised to the power of the feature value at each position of the matrix or vector.
In one example, in the above-described sequence codec-based antibody humanization system 500, the multi-scale neighborhood feature extraction unit 523 is further configured to: inputting the gene sequence of the antibody molecule into a first convolution layer of the multi-scale neighborhood feature extraction module to obtain a first neighborhood scale antibody molecule feature vector, wherein the first convolution layer is provided with a first one-dimensional convolution kernel with a first length; inputting the gene sequence of the antibody molecule into a second convolution layer of the multi-scale neighborhood feature extraction module to obtain a second neighborhood scale antibody molecule feature vector, wherein the second convolution layer has a second one-dimensional convolution kernel of a second length, and the first length is different from the second length; and cascading the first neighborhood scale antibody molecule feature vector and the second neighborhood scale antibody molecule feature vector to obtain the multi-scale neighborhood antibody molecule feature vector.
In one example, in the above-described sequence codec-based antibody humanized system 500, the classification feature matrix generating unit 527 includes: calculating a transfer matrix of the modified rabbit source antibody gene feature vector relative to the antibody molecule gene feature vector as the classification feature matrix according to the following formula;
wherein, the formula is:
[Figure SMS_81] = [Figure SMS_82]
wherein [Figure SMS_83] represents the modified rabbit-source antibody gene feature vector, [Figure SMS_84] represents the antibody molecule gene feature vector, [Figure SMS_85] represents the classification feature matrix, and [Figure SMS_86] represents matrix multiplication.
In summary, the sequence codec-based antibody humanization system according to the embodiments of the present application has been described. It uses an artificial intelligence model based on natural semantic understanding to treat a gene sequence as a text sequence, and characterizes the feature distribution of the gene sequence of the antibody molecule in the human body and of the gene sequence of the modified rabbit antibody by fusing the global implicit features of each gene sequence with its multi-scale neighborhood associated features under different gene spans. The homology between the modified rabbit antibody and the antibody molecule in the human body is then evaluated using the transfer matrix of the gene features of the modified rabbit antibody relative to the gene features of the antibody molecule, thereby verifying the homology between the gene sequence of the modified rabbit antibody and the gene sequence of the antibody molecule in the human body.

Claims (8)

1. A method for humanizing an antibody based on sequence encoding and decoding, comprising:
obtaining the gene sequence of an antibody molecule in a human body;
passing the gene sequence of the antibody molecule through a trained converter-based context encoder to obtain a plurality of gene expression feature vectors, and cascading the plurality of gene expression feature vectors to obtain an antibody molecule global gene feature vector;
passing the gene sequence of the antibody molecule through a trained multi-scale neighborhood feature extraction module to obtain a multi-scale neighborhood antibody molecule feature vector;
cascading the antibody molecule global gene feature vector and the multi-scale neighborhood antibody molecule feature vector to obtain an antibody molecule gene feature vector;
obtaining the gene sequence of the modified rabbit antibody;
processing the gene sequence of the modified rabbit source antibody through the context encoder based on the converter and the multi-scale neighborhood feature extraction module to obtain a modified rabbit source antibody gene feature vector;
calculating a transfer matrix of the modified rabbit source antibody gene feature vector relative to the antibody molecule gene feature vector as a classification feature matrix; and
passing the classification feature matrix through a trained classifier to obtain a class probability value, wherein the class probability value represents the homology between the modified rabbit antibody and the antibody molecules in the human body.
2. The sequence codec-based antibody humanization method according to claim 1, wherein said passing the gene sequence of the antibody molecule through a trained multi-scale neighborhood feature extraction module to obtain a multi-scale neighborhood antibody molecule feature vector comprises:
inputting the gene sequence of the antibody molecule into a first convolution layer of the multi-scale neighborhood feature extraction module to obtain a first neighborhood scale antibody molecule feature vector, wherein the first convolution layer is provided with a first one-dimensional convolution kernel with a first length;
inputting the gene sequence of the antibody molecule into a second convolution layer of the multi-scale neighborhood feature extraction module to obtain a second neighborhood scale antibody molecule feature vector, wherein the second convolution layer has a second one-dimensional convolution kernel of a second length, and the first length is different from the second length; and
and cascading the first neighborhood scale antibody molecule feature vector and the second neighborhood scale antibody molecule feature vector to obtain the multi-scale neighborhood antibody molecule feature vector.
3. The method of sequence codec based antibody humanization according to claim 2, wherein said inputting the gene sequence of the antibody molecule into the first convolution layer of the multi-scale neighborhood feature extraction module to obtain a first neighborhood-scale antibody molecule feature vector, comprises:
performing one-dimensional convolution coding on the gene sequence of the antibody molecule by using a first convolution layer of the multi-scale neighborhood feature extraction module according to the following formula to obtain a first neighborhood scale antibody molecule feature vector;
wherein, the formula is:
[Figure QLYQS_1]
wherein a is the width of the first convolution kernel in the x direction, [Figure QLYQS_2] is the first convolution kernel parameter vector, [Figure QLYQS_3] is the local vector matrix operated on by the convolution kernel function, w is the size of the first convolution kernel, and X represents the gene sequence of the antibody molecule.
4. A method of sequence codec based antibody humanization according to claim 3, wherein said inputting the gene sequence of the antibody molecule into the second convolution layer of the multi-scale neighborhood feature extraction module to obtain a second neighborhood-scale antibody molecule feature vector, comprises:
performing one-dimensional convolution coding on the gene sequence of the antibody molecules by using a second convolution layer of the multi-scale neighborhood feature extraction module according to the following formula to obtain feature vectors of the second neighborhood scale antibody molecules;
wherein, the formula is:
[Figure QLYQS_4]
wherein b is the width of the second convolution kernel in the x direction, [Figure QLYQS_5] is the second convolution kernel parameter vector, [Figure QLYQS_6] is the local vector matrix operated on by the convolution kernel function, m is the size of the second convolution kernel, and X represents the gene sequence of the antibody molecule.
5. The method of sequence codec-based antibody humanization according to claim 4, wherein said calculating a transfer matrix of said engineered rabbit source antibody gene feature vector relative to said antibody molecule gene feature vector as a classification feature matrix, comprises:
calculating a transfer matrix of the modified rabbit source antibody gene feature vector relative to the antibody molecule gene feature vector as the classification feature matrix according to the following formula;
wherein, the formula is:
[Figure QLYQS_7] = [Figure QLYQS_8]
wherein [Figure QLYQS_9] represents the modified rabbit-source antibody gene feature vector, [Figure QLYQS_10] represents the antibody molecule gene feature vector, [Figure QLYQS_11] represents the classification feature matrix, and [Figure QLYQS_12] represents matrix multiplication.
6. The sequence codec based antibody humanization method according to claim 1, further comprising training the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier;
The training of the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier includes:
acquiring training data, wherein the training data comprises a gene sequence of a training antibody molecule, a gene sequence of a rabbit source antibody after training modification, and a true value of homology between the gene sequence of the training antibody molecule and the gene sequence of the rabbit source antibody after training modification;
the gene sequence of the training antibody molecule and the gene sequence of the rabbit antibody after training modification are respectively passed through the context encoder based on the converter and the multiscale neighborhood feature extraction module to obtain a training antibody molecule gene feature vector and a training modification rabbit antibody gene feature vector;
calculating a transfer matrix of the training transformed rabbit source antibody gene feature vector relative to the training antibody molecule gene feature vector as a training classification feature matrix;
passing the training classification feature matrix through the classifier to obtain a classification loss function value;
calculating a classification mode digestion inhibition loss function value of the training transformed rabbit antibody gene feature vector and the training antibody molecule gene feature vector, wherein the classification mode digestion inhibition loss function value is related to the square of the two norms of the difference feature vector between the training transformed rabbit antibody gene feature vector and the training antibody molecule gene feature vector; and
And training the context encoder based on the converter, the multi-scale neighborhood feature extraction module and the classifier by taking the weighted sum of the classification loss function value and the classification mode digestion inhibition loss function value as the loss function value.
7. The method of sequence codec based antibody humanization according to claim 6, wherein said calculating a classification pattern digestion inhibition loss function value for said training engineered rabbit source antibody gene feature vector and said training antibody molecule gene feature vector, comprises:
calculating the classification mode digestion inhibition loss function values of the training transformed rabbit source antibody gene feature vector and the training antibody molecule gene feature vector according to the following formula;
wherein, the formula is:
[Figure QLYQS_13]
wherein [Figure QLYQS_14] and [Figure QLYQS_15] respectively represent the training modified rabbit-source antibody gene feature vector and the training antibody molecule gene feature vector, [Figure QLYQS_16] and [Figure QLYQS_17] respectively represent the weight matrices of the classifier for the training modified rabbit-source antibody gene feature vector and the training antibody molecule gene feature vector, [Figure QLYQS_18] represents the square of the two-norm of a vector, [Figure QLYQS_19] represents the F-norm (Frobenius norm) of a matrix, and [Figure QLYQS_20] represents the exponential operation on a matrix or a vector, that is, computing the natural exponential function value raised to the power of the feature value at each position of the matrix or vector.
8. The method of sequence codec based antibody humanization according to claim 7, wherein said passing the training classification feature matrix through the classifier to obtain a classification loss function value comprises:
processing the training classification feature matrix using the classifier to generate a training classification result according to the following formula: [Figure QLYQS_21], wherein [Figure QLYQS_22] represents projecting the training classification feature matrix as a vector, [Figure QLYQS_23] represents the weight matrix of the fully connected layer, and [Figure QLYQS_24] represents the bias matrix of the fully connected layer; and
and calculating a cross entropy value between the training classification result and a true value of homology between the gene sequence of the training antibody molecule and the gene sequence of the rabbit antibody after training modification in the training data as the classification loss function value.
CN202211128757.XA 2022-09-16 2022-09-16 Antibody humanization method based on sequence coding and decoding Active CN115458048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211128757.XA CN115458048B (en) 2022-09-16 2022-09-16 Antibody humanization method based on sequence coding and decoding

Publications (2)

Publication Number Publication Date
CN115458048A (en) 2022-12-09
CN115458048B (en) 2023-05-26

Family

ID=84304404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211128757.XA Active CN115458048B (en) 2022-09-16 2022-09-16 Antibody humanization method based on sequence coding and decoding

Country Status (1)

Country Link
CN (1) CN115458048B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103145834A (en) * 2013-01-17 2013-06-12 广州泰诺迪生物科技有限公司 Antibody humanization transformation method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1695252A2 (en) * 2003-12-08 2006-08-30 Xencor, Inc. Protein engineering with analogous contact environments
US8440797B2 (en) * 2010-12-06 2013-05-14 Dainippon Sumitomo Pharma Co., Ltd. Human monoclonal antibody
WO2018132752A1 (en) * 2017-01-13 2018-07-19 Massachusetts Institute Of Technology Machine learning based antibody design
WO2019084559A1 (en) * 2017-10-27 2019-05-02 Apostle, Inc. Predicting cancer-related pathogenic impact of somatic mutations using deep learning-based methods
TWI749367B (en) * 2018-09-14 2021-12-11 美商美國禮來大藥廠 Cd200r agonist antibodies and uses thereof
CN114664376A (en) * 2022-03-31 2022-06-24 重庆邮电大学 miRNA-mRNA target prediction method based on sequence statistical characterization learning

Also Published As

Publication number Publication date
CN115458048A (en) 2022-12-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant