CN115458048A - Antibody humanization method based on sequence encoding and decoding - Google Patents

Publication number
CN115458048A
CN115458048A CN202211128757.XA CN202211128757A CN115458048A CN 115458048 A CN115458048 A CN 115458048A CN 202211128757 A CN202211128757 A CN 202211128757A CN 115458048 A CN115458048 A CN 115458048A
Authority
CN
China
Prior art keywords
gene
training
antigen molecule
antibody
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211128757.XA
Other languages
Chinese (zh)
Other versions
CN115458048B (en)
Inventor
袁红
郭凌敏
吴彤
徐永凤
李月
戴佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Meisai Biomedical Technology Co ltd
Original Assignee
Hangzhou Meisai Biomedical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Meisai Biomedical Technology Co ltd
Priority to CN202211128757.XA
Publication of CN115458048A
Application granted
Publication of CN115458048B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The application relates to the field of biological research and discloses an antibody humanization method based on sequence encoding and decoding. The method uses an artificial intelligence model based on natural-language semantic understanding to treat a gene sequence as a text sequence, and represents the feature distribution of the gene sequence of an antigen molecule in the human body and of the gene sequence of a modified rabbit-derived antibody by fusing the global implicit features of each gene sequence with its multi-scale neighborhood correlation features over different gene spans. The transfer matrix of the gene features of the modified rabbit-derived antibody relative to the gene features of the antigen molecule is then used to evaluate the homology between the modified rabbit-derived antibody and the antigen molecule in the human body, and thereby to check the homology between the gene sequence of the modified rabbit-derived antibody and the gene sequence of antibody molecules in the human body.

Description

Antibody humanization method based on sequence encoding and decoding
Technical Field
The present application relates to the field of biological research, and more particularly to an antibody humanization method based on sequence encoding and decoding.
Background
Antibody humanization is an important part of experimental research on the production of recombinant (monoclonal) antibodies. It is the process of converting a rabbit-derived antibody into a human-like antibody. Most monoclonal antibodies in clinical use are mouse-derived; because of the species differences between humans and mice, the use of mouse-derived antibodies is limited and they elicit anti-drug antibodies.
When a mouse-derived or rabbit-derived antibody enters the human body as a foreign protein, the human immune system responds by producing specific antibodies against it; for a mouse antibody these are human anti-mouse antibodies (HAMA). Such heterologous proteins are usually cleared quickly from the body and have a very short half-life. Rabbit-derived antibodies share this drawback and need to be humanized to reduce their immunogenicity. Because of these limitations in clinical application, recombinant DNA technology is used to humanize rabbit-derived antibodies.
Traditional humanization of a mouse-derived or rabbit-derived antibody uses genetic modification to make the antibody closely resemble antibody molecules in the human body, so that it evades recognition by the human immune system and avoids inducing a HAMA response. Humanization should follow two basic principles: maintain or increase the affinity and specificity of the antibody, and greatly reduce or essentially eliminate its immunogenicity.
In existing technical schemes, the traditional humanization of a rabbit-derived antibody resembles that of a mouse-derived antibody: a homology modeling method is used to mutate the rabbit-derived amino acid sequence of the framework region into a human one, the affinity of the antibody is then determined by ELISA (enzyme-linked immunosorbent assay), SPR (surface plasmon resonance) or a similar method, and a humanized version is selected. Because rabbit and human antibodies have low homology, the structures produced by homology modeling are unreliable, and the affinity of the mutated humanized version is generally lower than that of the original rabbit antibody.
Therefore, an optimized antibody humanization scheme is desired to achieve a higher degree of homology.
Disclosure of Invention
The present application is proposed to solve the above-mentioned technical problems. Embodiments of the application provide an antibody humanization method based on sequence encoding and decoding. The method uses an artificial intelligence model based on natural-language semantic understanding to treat a gene sequence as a text sequence, and represents the feature distribution of the gene sequence of an antigen molecule in the human body and of the gene sequence of a modified rabbit-derived antibody by fusing the global implicit features of each gene sequence with its multi-scale neighborhood correlation features over different gene spans. The transfer matrix of the gene features of the modified rabbit-derived antibody relative to the gene features of the antigen molecule is then used to evaluate the homology between the modified rabbit-derived antibody and the antigen molecule in the human body, and thereby to check the homology between the gene sequence of the modified rabbit-derived antibody and the gene sequence of antibody molecules in the human body.
According to one aspect of the present application, there is provided an antibody humanization method based on sequence encoding and decoding, comprising:
a training phase comprising:
acquiring training data, wherein the training data comprise a gene sequence of a training antigen molecule, a gene sequence of a training modified rabbit-derived antibody, and a true value of the homology between the two gene sequences;
passing the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody respectively through a transformer-based context encoder and a multi-scale neighborhood feature extraction module to obtain a training antigen molecule gene feature vector and a training modified rabbit-derived antibody gene feature vector;
calculating a transfer matrix of the training modified rabbit-derived antibody gene feature vector relative to the training antigen molecule gene feature vector as a training classification feature matrix;
passing the training classification feature matrix through a classifier to obtain a classification loss function value;
calculating a classification mode digestion inhibition loss function value of the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector, wherein the classification mode digestion inhibition loss function value is related to the square of the two-norm of the difference feature vector between the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector; and
training the transformer-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier with a weighted sum of the classification loss function value and the classification mode digestion inhibition loss function value as the loss function value; and
an inference phase comprising:
obtaining a gene sequence of an antigen molecule in a human body;
passing the gene sequence of the antigen molecule through the trained transformer-based context encoder to obtain a plurality of gene expression feature vectors, and cascading the plurality of gene expression feature vectors to obtain an antigen molecule global gene feature vector;
passing the gene sequence of the antigen molecule through the trained multi-scale neighborhood feature extraction module to obtain an antigen molecule multi-neighborhood scale feature vector;
cascading the antigen molecule global gene feature vector and the antigen molecule multi-neighborhood scale feature vector to obtain an antigen molecule gene feature vector;
obtaining a gene sequence of the modified rabbit-derived antibody;
processing the gene sequence of the modified rabbit-derived antibody through the transformer-based context encoder and the multi-scale neighborhood feature extraction module to obtain a modified rabbit-derived antibody gene feature vector;
calculating a transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector as a classification feature matrix; and
passing the classification feature matrix through the trained classifier to obtain a class probability value, wherein the class probability value represents the degree of homology between the modified rabbit-derived antibody and the antigen molecule in the human body.
In the above antibody humanization method based on sequence encoding and decoding, passing the training classification feature matrix through the classifier to obtain a classification loss function value includes: processing the training classification feature matrix using the classifier with the following formula to generate a training classification result, wherein the formula is:
$\mathrm{softmax}\{(M_c, B_c) \mid \mathrm{Project}(F)\}$, where $\mathrm{Project}(F)$ represents the projection of the training classification feature matrix as a vector, $M_c$ is the weight matrix of the fully connected layer, and $B_c$ is the bias matrix of the fully connected layer; and calculating the cross-entropy value between the training classification result and the true value of the homology between the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody in the training data as the classification loss function value.
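The classification step above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the applicants' implementation: it assumes two classes, treats Project(F) as a row-by-row flattening of the matrix, uses a single fully connected layer with a bias vector in place of the bias matrix, and fills the weights with random toy values. The names `classify` and `cross_entropy` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def classify(F, M_c, B_c):
    """Flatten F (assumed form of Project(F)), apply one fully connected layer, then softmax."""
    v = F.reshape(-1)
    return softmax(M_c @ v + B_c)

def cross_entropy(probs, true_class):
    """Cross entropy between predicted probabilities and a one-hot true label."""
    return -np.log(probs[true_class] + 1e-12)

rng = np.random.default_rng(0)
F = rng.normal(size=(4, 4))            # toy classification feature matrix
M_c = rng.normal(size=(2, 16)) * 0.1   # toy weights: 2 classes x 16 flattened features
B_c = np.zeros(2)                      # toy bias
probs = classify(F, M_c, B_c)
loss = cross_entropy(probs, true_class=1)
```

Here `probs` plays the role of the training classification result and `loss` the role of the classification loss function value.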
In the above antibody humanization method based on sequence encoding and decoding, calculating the classification mode digestion inhibition loss function value of the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector includes: calculating the classification mode digestion inhibition loss function value according to the following formula;
wherein the formula is:
[Formula image: Figure BDA0003850065750000031]
where $V_1$ and $V_2$ respectively represent the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector, $M_1$ and $M_2$ respectively represent the weight matrices of the classifier for the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector, $\|\cdot\|_2^2$ represents the square of the two-norm of a vector, $\|\cdot\|_F$ represents the Frobenius norm of a matrix, and $\exp(\cdot)$ represents the exponential operation on a matrix or a vector, that is, computing the natural exponential function with the feature value at each position of the matrix or vector as the exponent.
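Because the loss formula itself survives only as an image, the sketch below does not attempt to reproduce it. It illustrates only the two ingredients the text states explicitly: the squared two-norm of the difference feature vector, and the weighted sum of two loss values used for training. The weight `alpha`, the function names, and the use of the raw squared distance as the auxiliary term are assumptions; the classifier weight matrices, the Frobenius norm, and the exponential terms are omitted.

```python
import numpy as np

def squared_l2_difference(v1, v2):
    """Square of the two-norm of the difference feature vector v1 - v2."""
    d = np.asarray(v1, dtype=float) - np.asarray(v2, dtype=float)
    return float(d @ d)

def total_loss(classification_loss, auxiliary_loss, alpha=0.5):
    """Weighted sum of the two loss values; alpha is a hypothetical weight."""
    return alpha * classification_loss + (1.0 - alpha) * auxiliary_loss

v1 = np.array([1.0, 2.0, 3.0])   # toy antibody feature vector
v2 = np.array([1.0, 0.0, 1.0])   # toy antigen feature vector
aux = squared_l2_difference(v1, v2)   # 0^2 + 2^2 + 2^2 = 8
loss = total_loss(classification_loss=0.7, auxiliary_loss=aux, alpha=0.5)
```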
In the above antibody humanization method based on sequence encoding and decoding, passing the gene sequence of the antigen molecule through the trained multi-scale neighborhood feature extraction module to obtain the antigen molecule multi-neighborhood scale feature vector includes: inputting the gene sequence of the antigen molecule into a first convolution layer of the multi-scale neighborhood feature extraction module to obtain a first neighborhood scale antigen molecule feature vector, wherein the first convolution layer has a first one-dimensional convolution kernel with a first length; inputting the gene sequence of the antigen molecule into a second convolution layer of the multi-scale neighborhood feature extraction module to obtain a second neighborhood scale antigen molecule feature vector, wherein the second convolution layer has a second one-dimensional convolution kernel with a second length, and the first length is different from the second length; and cascading the first neighborhood scale antigen molecule feature vector and the second neighborhood scale antigen molecule feature vector to obtain the antigen molecule multi-neighborhood scale feature vector.
In the above antibody humanization method based on sequence encoding and decoding, inputting the gene sequence of the antigen molecule into the first convolution layer of the multi-scale neighborhood feature extraction module to obtain the first neighborhood scale antigen molecule feature vector includes: performing one-dimensional convolution encoding on the gene sequence of the antigen molecule using the first convolution layer of the multi-scale neighborhood feature extraction module according to the following formula to obtain the first neighborhood scale antigen molecule feature vector;
wherein the formula is:
[Formula image: Figure BDA0003850065750000041]
where $a$ is the width of the first convolution kernel in the $X$ direction, $F(a)$ is the parameter vector of the first convolution kernel, $G(X-a)$ is the local vector matrix operated on by the convolution kernel function, $w$ is the size of the first convolution kernel, and $X$ represents the gene sequence of the antigen molecule.
In the above antibody humanization method based on sequence encoding and decoding, inputting the gene sequence of the antigen molecule into the second convolution layer of the multi-scale neighborhood feature extraction module to obtain the second neighborhood scale antigen molecule feature vector includes: performing one-dimensional convolution encoding on the gene sequence of the antigen molecule using the second convolution layer of the multi-scale neighborhood feature extraction module according to the following formula to obtain the second neighborhood scale antigen molecule feature vector;
wherein the formula is:
[Formula image: Figure BDA0003850065750000042]
where $b$ is the width of the second convolution kernel in the $X$ direction, $F(b)$ is the parameter vector of the second convolution kernel, $G(X-b)$ is the local vector matrix operated on by the convolution kernel function, $m$ is the size of the second convolution kernel, and $X$ represents the gene sequence of the antigen molecule.
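The two-branch extraction described above can be illustrated with a small NumPy sketch. It assumes a single-channel numeric input, unpadded sliding windows, and hand-picked kernels of lengths 3 and 5; a trained module would learn its kernels and would likely operate on a multi-channel encoding of the bases. The function names are hypothetical.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Slide a 1-D kernel over x without padding, taking a dot product at each
    position (cross-correlation, as is usual for convolution layers)."""
    w = len(kernel)
    return np.array([np.dot(kernel, x[i:i + w]) for i in range(len(x) - w + 1)])

def multi_scale_features(x, kernel_a, kernel_b):
    """Apply two kernels of different lengths and concatenate the two outputs."""
    if len(kernel_a) == len(kernel_b):
        raise ValueError("the two kernels must have different lengths")
    return np.concatenate([conv1d_valid(x, kernel_a), conv1d_valid(x, kernel_b)])

x = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 0.0])   # toy numeric sequence
k3 = np.array([1.0, 0.0, -1.0])                # first kernel, length 3
k5 = np.ones(5) / 5.0                          # second kernel, length 5
features = multi_scale_features(x, k3, k5)
```

The shorter kernel responds to short-range patterns and the longer one to wider spans; concatenating both outputs gives one vector that carries both scales.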
In the above antibody humanization method based on sequence encoding and decoding, calculating the transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector as the classification feature matrix includes: calculating the transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector as the classification feature matrix according to the following formula;
wherein the formula is:
[Formula image: Figure BDA0003850065750000043]
where $V_1$ represents the modified rabbit-derived antibody gene feature vector, $V_2$ represents the antigen molecule gene feature vector, $M$ represents the classification feature matrix, and the operator shown in Figure BDA0003850065750000044 represents matrix multiplication.
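The formula image is not available, so the following sketch rests on an assumption: that the transfer matrix $M$ is defined by the relation $V_1 = M V_2$. That relation alone does not determine $M$ uniquely, and the text does not say which solution is used; the sketch picks the minimum-Frobenius-norm solution $M = V_1 V_2^{\top} / (V_2^{\top} V_2)$ purely for illustration. The function name is hypothetical.

```python
import numpy as np

def transfer_matrix(v1, v2):
    """Return the minimum-Frobenius-norm M satisfying v1 = M @ v2 (assumed relation)."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    denom = v2 @ v2
    if denom == 0.0:
        raise ValueError("v2 must be non-zero")
    return np.outer(v1, v2) / denom

v1 = np.array([2.0, 0.0, 1.0])   # toy antibody gene feature vector
v2 = np.array([1.0, 1.0, 0.0])   # toy antigen gene feature vector
M = transfer_matrix(v1, v2)      # 3 x 3 classification feature matrix
```

Multiplying `M` by `v2` recovers `v1`, which is the property the assumed definition requires.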
According to another aspect of the present application, there is provided an antibody humanization system based on sequence encoding and decoding, comprising:
a training module comprising:
a training data acquisition unit, configured to acquire training data, wherein the training data comprise a gene sequence of a training antigen molecule, a gene sequence of a training modified rabbit-derived antibody, and a true value of the homology between the two gene sequences;
a feature vector extraction unit, configured to pass the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody respectively through a transformer-based context encoder and a multi-scale neighborhood feature extraction module to obtain a training antigen molecule gene feature vector and a training modified rabbit-derived antibody gene feature vector;
a training classification feature matrix generating unit, configured to calculate a transfer matrix of the training modified rabbit-derived antibody gene feature vector relative to the training antigen molecule gene feature vector as a training classification feature matrix;
a classification loss function value calculation unit, configured to pass the training classification feature matrix through a classifier to obtain a classification loss function value;
a classification mode digestion inhibition loss function value calculation unit, configured to calculate a classification mode digestion inhibition loss function value of the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector, wherein the classification mode digestion inhibition loss function value is related to the square of the two-norm of the difference feature vector between the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector; and
a training unit, configured to train the transformer-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier with a weighted sum of the classification loss function value and the classification mode digestion inhibition loss function value as the loss function value; and
an inference module comprising:
a physiological information acquisition unit, configured to acquire a gene sequence of an antigen molecule in a human body;
an antigen molecule global gene feature vector generating unit, configured to pass the gene sequence of the antigen molecule through the trained transformer-based context encoder to obtain a plurality of gene expression feature vectors, and to cascade the plurality of gene expression feature vectors to obtain an antigen molecule global gene feature vector;
a multi-scale neighborhood feature extraction unit, configured to pass the gene sequence of the antigen molecule through the trained multi-scale neighborhood feature extraction module to obtain an antigen molecule multi-neighborhood scale feature vector;
a cascade unit, configured to cascade the antigen molecule global gene feature vector and the antigen molecule multi-neighborhood scale feature vector to obtain an antigen molecule gene feature vector;
a rabbit-derived antibody gene sequence acquisition unit, configured to acquire a gene sequence of the modified rabbit-derived antibody;
a modified rabbit-derived antibody gene feature extraction unit, configured to process the gene sequence of the modified rabbit-derived antibody through the transformer-based context encoder and the multi-scale neighborhood feature extraction module to obtain a modified rabbit-derived antibody gene feature vector;
a classification feature matrix generating unit, configured to calculate a transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector as a classification feature matrix; and
a class probability value generating unit, configured to pass the classification feature matrix through the trained classifier to obtain a class probability value, wherein the class probability value represents the degree of homology between the modified rabbit-derived antibody and the antigen molecule in the human body.
According to still another aspect of the present application, there is provided an electronic apparatus including: a processor; and a memory storing computer program instructions which, when executed by the processor, cause the processor to perform the antibody humanization method based on sequence encoding and decoding as described above.
According to yet another aspect of the present application, there is provided a computer readable medium storing computer program instructions which, when executed by a processor, cause the processor to perform the antibody humanization method based on sequence encoding and decoding as described above.
Compared with the prior art, the antibody humanization method based on sequence encoding and decoding provided by the application uses an artificial intelligence model based on natural-language semantic understanding to treat a gene sequence as a text sequence, and represents the feature distribution of the gene sequence of an antigen molecule in the human body and of the gene sequence of a modified rabbit-derived antibody by fusing the global implicit features of each gene sequence with its multi-scale neighborhood correlation features over different gene spans. The transfer matrix of the gene features of the modified rabbit-derived antibody relative to the gene features of the antigen molecule is then used to evaluate the homology between the modified rabbit-derived antibody and the antigen molecule in the human body, and thereby to check the homology between the gene sequence of the modified rabbit-derived antibody and the gene sequence of antibody molecules in the human body.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 illustrates a schematic diagram of a heavy chain of a rabbit antibody according to an embodiment of the present application.
FIG. 2 illustrates another schematic diagram of a heavy chain of a rabbit antibody according to an embodiment of the present application.
FIG. 3 illustrates a schematic diagram of a structural model built by homology modeling according to an embodiment of the present application.
FIG. 4 illustrates a schematic diagram of a structural model built by co-evolutionary modeling according to an embodiment of the present application.
FIG. 5 illustrates a flowchart of the training phase in the antibody humanization method based on sequence encoding and decoding according to an embodiment of the present application.
FIG. 6 illustrates a flowchart of the inference phase in the antibody humanization method based on sequence encoding and decoding according to an embodiment of the present application.
FIG. 7 illustrates an architecture diagram of the training phase in the antibody humanization method based on sequence encoding and decoding according to an embodiment of the present application.
FIG. 8 illustrates an architecture diagram of the inference phase in the antibody humanization method based on sequence encoding and decoding according to an embodiment of the present application.
FIG. 9 illustrates a flowchart of the antigen molecule multi-scale neighborhood feature extraction process in the antibody humanization method based on sequence encoding and decoding according to an embodiment of the present application.
FIG. 10 illustrates a block diagram of an antibody humanization system based on sequence encoding and decoding according to an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Scenario Overview
As noted above, the conventional method easily causes loss or reduction of affinity. In one example, affinity was measured with a FORTEBIO instrument and its software in the kinetics experiment mode, and the analysis showed that the affinity of individual humanized antibodies designed by the traditional method was reduced.
Therefore, the technical scheme of the application uses a non-homology modeling method, which can keep the affinity unchanged or even increase it. The specific steps are as follows:
Step 1: A structural model of the rabbit antibody is constructed by a non-homology modeling method: the structural model is built by a co-evolution method using AlphaFold 2, sequence decoding is performed, and finally an amino acid sequence is designed to obtain a humanized antibody with consistent affinity.
Step 2: Humanization is performed progressively: a humanized antibody with 80-90% homology is first selected as the structural model, and then a fully human antibody with 90-99% homology is selected as the structural model, so that a higher degree of humanization is achieved.
Step 3: The affinity of the designed humanized antibody is verified.
Even with this progressive selection, in which a humanized antibody with 80-90% homology is chosen first as the structural model and a fully human antibody with 90-99% homology is chosen next, it is still necessary to check the homology between the gene sequence of the modified rabbit-derived antibody and the gene sequence of antibody molecules in the human body. Therefore, the technical scheme of the application uses an artificial intelligence model based on natural-language semantic understanding to treat a gene sequence as a text sequence, and represents the feature distribution of the gene sequence of the antigen molecule in the human body and of the gene sequence of the modified rabbit-derived antibody by fusing the global implicit features of each gene sequence with its multi-scale neighborhood correlation features over different gene spans. The transfer matrix of the gene features of the modified rabbit-derived antibody relative to the gene features of the antigen molecule is then used to evaluate the homology between the modified rabbit-derived antibody and the antigen molecule in the human body, and thereby to check the homology between the gene sequence of the modified rabbit-derived antibody and the gene sequence of antibody molecules in the human body.
Specifically, in the technical scheme of the application, the gene sequence of an antigen molecule in the human body is first obtained. Because each gene in that sequence carries contextual semantic information, the transformer-based context encoder is used to process the gene sequence of the antigen molecule and extract, from its global high-dimensional semantic features, the essential gene features that best characterize the antigen molecule in the human body. The resulting gene expression feature vectors are then cascaded (concatenated) to integrate the global implicit feature information of the genes of the antigen molecule, yielding the antigen molecule global gene feature vector.
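The text does not specify the internal architecture of the context encoder. As a rough illustration, the sketch below assumes it relies on the scaled dot-product self-attention typical of Transformer encoders, reduced here to a single head with random toy weights and without positional encoding, layer normalization, or feed-forward sublayers. It shows how each position receives a context-aware vector and how those vectors can be concatenated into one global vector. All names and dimensions are hypothetical.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over token embeddings X (n x d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n_tokens, d_model = 6, 4                          # toy sizes
X = rng.normal(size=(n_tokens, d_model))          # toy input embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
context_vectors = self_attention(X, Wq, Wk, Wv)   # one vector per position
global_vector = context_vectors.reshape(-1)       # concatenate them end to end
```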
In particular, a gene is composed of many bases, each base being a site, and DNA contains four bases: A, T, C and G. The gene sequence of the antigen molecule in the human body is therefore a base sequence made up of these four bases. Accordingly, before the gene sequence is encoded by the context encoder, it is first one-hot encoded to convert it into input vectors.
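One-hot encoding of a base sequence can be sketched as follows. The column order A, T, C, G and the function name `one_hot_encode` are assumptions made for illustration; any fixed ordering would serve equally well.

```python
import numpy as np

BASES = "ATCG"  # assumed column order

def one_hot_encode(sequence):
    """Map a DNA string to an (n, 4) matrix with a single 1 in each row."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(sequence), len(BASES)), dtype=np.float32)
    for pos, base in enumerate(sequence.upper()):
        if base not in index:
            raise ValueError(f"unexpected base {base!r} at position {pos}")
        encoded[pos, index[base]] = 1.0
    return encoded

encoded = one_hot_encode("ATCGGA")
```

Each row of `encoded` corresponds to one base and can be fed to the encoder as an input vector.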
It should be understood that, since the gene segments in the gene sequence of the antigen molecule in the human body carry different implicit features under different gene segment spans, multi-scale neighborhood features can capture the correlated features under those different spans. Therefore, in the technical scheme of the application, a multi-scale neighborhood feature extraction module is further used to encode the gene sequence of the antigen molecule and extract its multi-scale neighborhood correlation features under different gene segment spans, so that an antigen molecule multi-neighborhood scale feature vector is obtained.
Then, the antigen molecule global gene feature vector and the antigen molecule multi-neighborhood scale feature vector are cascaded for feature fusion to obtain the antigen molecule gene feature vector.
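A multi-scale neighborhood module of this kind is commonly built from one-dimensional convolutions with different kernel widths. The sketch below assumes that design, with kernel widths 3, 5 and 7, ReLU activation and global max pooling, all of which are illustrative choices rather than details stated in the application. It then fuses the result with a placeholder global vector by concatenation.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Slide a (width, channels) kernel over a (length, channels) input."""
    width = kernel.shape[0]
    steps = x.shape[0] - width + 1
    return np.array([np.sum(x[i:i + width] * kernel) for i in range(steps)])

def multi_scale_features(x, kernels):
    """One max-pooled ReLU response per kernel, i.e. per neighborhood scale."""
    return np.array([np.maximum(conv1d_valid(x, k), 0.0).max() for k in kernels])

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 4))                              # toy encoded sequence
kernels = [rng.normal(size=(w, 4)) for w in (3, 5, 7)]    # assumed spans
multi_scale_vector = multi_scale_features(x, kernels)

global_vector = rng.normal(size=8)   # placeholder for the global feature vector
fused_vector = np.concatenate([global_vector, multi_scale_vector])
```

A practical module would use many kernels per width and learned weights; a single kernel per scale keeps the example short.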
Furthermore, in order to accurately evaluate the homology between the modified rabbit-derived antibody and the antigen molecule in the human body, the gene sequence of the modified rabbit-derived antibody also needs to be obtained. Similarly, the gene sequence of the modified rabbit-derived antibody is processed by the converter-based context encoder and the multi-scale neighborhood feature extraction module, so that a modified rabbit-derived antibody gene feature vector is obtained that combines global features with multi-scale neighborhood correlation features under different gene segment spans.
Then, because the gene features of the modified rabbit-derived antibody and the gene features of the antigen molecule in the human body have different feature scales in the high-dimensional feature space, and because a humanized antibody needs high affinity, a transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector is further calculated and used for classification. The homology between the modified rabbit-derived antibody and the antigen molecule in the human body is thereby evaluated, so as to obtain a higher degree of humanization.
In particular, in the technical solution of the present application, the classification feature matrix is the transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector. During training of the classifier, when the back-propagated gradient passes through the respective feature extraction models of the two feature vectors, that is, the converter-based context encoder together with the multi-scale neighborhood feature extraction module, abnormal gradient branches may cause digestion of the feature patterns expressed by the modified rabbit-derived antibody gene feature vector and the antigen molecule gene feature vector. A classification pattern digestion inhibition loss function is therefore introduced:
Figure BDA0003850065750000091
where $V_1$ and $V_2$ are respectively the modified rabbit-derived antibody gene feature vector and the antigen molecule gene feature vector, $M_1$ and $M_2$ are respectively the weight matrices of the classifier for $V_1$ and $V_2$, and $\|\cdot\|_2^2$ represents the square of the two-norm of a vector.
Here, by introducing the classification pattern digestion inhibition loss function, the pseudo difference of the classifier weights is pushed toward the real feature distribution difference between the modified rabbit-derived antibody gene feature vector and the antigen molecule gene feature vector, which ensures that the directional derivative is regularized near the gradient branch point when the gradient is back-propagated. In other words, the gradient is weighted between the feature extraction patterns of the modified rabbit-derived antibody gene feature vector and the antigen molecule gene feature vector, so that classification pattern digestion of the features is inhibited and the classification accuracy is improved. In this way, the homology between the modified rabbit-derived antibody and the antigen molecule in the human body can be accurately evaluated, and the homology between the gene sequence of the modified rabbit-derived antibody and the gene sequence of the antibody molecule in the human body can be accurately checked.
Based on this, the present application provides an antibody humanization method based on sequence encoding and decoding, comprising a training phase and an inference phase.
The training phase comprises: acquiring training data, wherein the training data comprises a gene sequence of a training antigen molecule, a gene sequence of a training modified rabbit-derived antibody, and a true value of the homology between the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody; passing the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody respectively through the converter-based context encoder and the multi-scale neighborhood feature extraction module to obtain a training antigen molecule gene feature vector and a training modified rabbit-derived antibody gene feature vector; calculating a transfer matrix of the training modified rabbit-derived antibody gene feature vector relative to the training antigen molecule gene feature vector as a training classification feature matrix; passing the training classification feature matrix through the classifier to obtain a classification loss function value; calculating a classification pattern digestion inhibition loss function value of the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector, wherein the classification pattern digestion inhibition loss function value is related to the square of the two-norm of the difference feature vector between the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector; and training the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier with a weighted sum of the classification loss function value and the classification pattern digestion inhibition loss function value as the loss function value.
The inference phase comprises: obtaining a gene sequence of an antigen molecule in the human body; passing the gene sequence of the antigen molecule through the trained converter-based context encoder to obtain a plurality of gene expression feature vectors, and cascading the plurality of gene expression feature vectors to obtain an antigen molecule global gene feature vector; passing the gene sequence of the antigen molecule through the trained multi-scale neighborhood feature extraction module to obtain an antigen molecule multi-neighborhood scale feature vector; cascading the antigen molecule global gene feature vector and the antigen molecule multi-neighborhood scale feature vector to obtain an antigen molecule gene feature vector; obtaining a gene sequence of the modified rabbit-derived antibody; processing the gene sequence of the modified rabbit-derived antibody through the converter-based context encoder and the multi-scale neighborhood feature extraction module to obtain a modified rabbit-derived antibody gene feature vector; calculating a transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector as a classification feature matrix; and passing the classification feature matrix through the trained classifier to obtain a class probability value, wherein the class probability value represents the homology between the modified rabbit-derived antibody and the antigen molecule in the human body.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described with reference to the accompanying drawings.
Exemplary methods of humanization
Example one: rabbit antibody sequence analysis
The rabbit antibody sequences were selected as follows:
heavy chain > VH
QSVKESEGGLFKPTDTLTLTCTVSGFSLSSYAISWVRQAPGNGLEWIGIINSYGSTYYASWAKSRSTITRNTNENTVTLKMTSLTAADTATYFCARGYAGSSGGYIWGPGTLVTVSS
Light chain > VL
AAVLTQTPSPVSAAVGGTVTIKCQSSQSVYNNNLLSWYQQKPGQPPKLLIYDASNLPSGVPDRFSGSGSGTQFTLTISGVQCDDAATYYCLGGYYGSDAGGNTFGGGTEVVVK
The rabbit antibody heavy chain sequence was compared with human germline antibody sequences, and the comparison shows that the homology between the rabbit antibody sequence and the human antibody sequence is lower than 70%. Heavy chain sites such as VK/EG/F/T are all sites to be humanized, as shown in FIG. 1.
The rabbit antibody light chain sequence was compared with human germline antibody sequences, and the comparison shows that the homology between the rabbit antibody sequence and the human antibody sequence is lower than 70%. Light chain sites such as AVL/PV/A/GT are all sites to be humanized, as shown in FIG. 2.
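A percentage such as the 70% threshold above can be computed from a pairwise alignment. The sketch below assumes one common definition, identical residues divided by aligned non-gap positions; the application does not state which definition or alignment tool was used, and the two strings are toy examples rather than the sequences of this application.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared

# Toy pre-aligned strings (hypothetical): 8 matches over 9 compared positions.
identity = percent_identity("ACDEFGHIKL", "ACDQFGH-KL")
needs_humanization = identity < 70.0
```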
Example two: homologous modeling building rabbit anti-structural model
Homologous modeling: selecting 5-10 optimal structural solutions by adopting a swiss-model homologous modeling method, modeling a Loop region by using the homologous modeling method generally, and building a CDR3 structural model by using a de novo modeling method if the CDR amino acid sequence comparison result shows that the content is lower than 50 percent. The PDB BLAST was used to retrieve the closest 10 antibody crystal structure models (structure resolution higher than 2.5 a) of the sequences, compared to the automated modeling model, and the optimal structure model was selected. And (3) modeling the constructed structural model by homologous modeling, as shown in FIG. 3.
Example three:
co-evolutionary modeling: selecting 2 optimal structural solutions by adopting an alpha fold II coevolution modeling method, selecting a humanized antibody with 80-90% of homology as a structural model, and then selecting a fully humanized antibody with 90-99% of homology as the structural model. And (4) carrying out coevolution modeling on the constructed structural model, as shown in FIG. 4.
Example four:
Traditional humanization design scheme: the original murine sequence is mutated into a human sequence by database comparison. The original sequence of the rabbit-derived antibody was designed into a plurality of humanized amino acid sequences (huVH1, huVH2, huVH3, huVL1, huVL2, huVL3), the designed sequences were combined into humanized antibodies, and the antibodies were expressed in the Expi293 mammalian expression system.
The results of sequencing the above humanized amino acid sequence are shown in the following table.
Figure BDA0003850065750000111
Purification of the humanized antibodies: the series of humanized antibodies expressed in Expi293 cells were collected from the cell supernatant and purified according to a standard protein purification protocol. Characterization of the results showed that the purity of the purified humanized antibodies exceeds 90%.
Example five:
Activity detection of the humanized antibodies designed by homology modeling:
The binding activity of the humanized antibodies to the antigen was detected by ELISA:
For the ELISA, the assay plate was coated with the target antigen at 0.5 μg/mL, the humanized antibody samples purified in Example four were applied over a concentration gradient of 0.00004-1.2 μg/mL, and the OD value of the binding activity was measured to determine the binding affinity of the humanized antibodies for the target antigen; the test result is shown in the figure. As can be seen from the figure, the OD value rises markedly as the sample concentration rises, clear upper and lower plateaus are reached quickly, and the window is large. The antigen binding affinity of the humanized antibodies is weaker than that of the parent: although the degree of humanization is high, the affinity is markedly reduced.
The calculated EC50 values are given in the following table:
Sample ID | EC50 (μg/mL)
mVH+mVL | 0.02054
H1L1 | 1.4298
H2L2 | 3.6758
H3L3 | /
Positive control | 0.01683
Example six:
Affinity kinetics assay
The FORTEBIO instrument and its software were started, and the Kinetics experiment mode was selected. The analysis shows that the affinity of each humanized antibody designed by the traditional method is reduced.
Sample name | KD (M) | ka (1/Ms) | kd (1/s) | R² | Rmax (nm) | Ratio: WT/Variant
Parent | 1.12E-09 | 1.46E+05 | 1.64E-04 | 0.997 | 0.476 | 1.62
H1L1 | 2.20E-07 | 1.73E+04 | 3.81E-03 | 0.997 | 0.464 | 1.67
H2L2 | 3.13E-08 | 1.78E+05 | 5.57E-03 | 0.996 | 0.489 | 1.69
H3L3 | 4.36E-07 | 1.96E+05 | 8.55E-03 | 0.996 | 0.435 | 1.80
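The kinetic constants in the table are related by the standard 1:1 binding relation KD = kd / ka, which the following sketch applies to the tabulated values. The dictionary layout and variable names are assumptions made for the example. For the H3L3 row as printed, kd / ka is about one tenth of the tabulated KD; the sketch only reports this ratio and does not decide which entry is affected.

```python
# (reported KD in M, ka in 1/Ms, kd in 1/s), copied from the table above
rows = {
    "Parent": (1.12e-09, 1.46e+05, 1.64e-04),
    "H1L1": (2.20e-07, 1.73e+04, 3.81e-03),
    "H2L2": (3.13e-08, 1.78e+05, 5.57e-03),
    "H3L3": (4.36e-07, 1.96e+05, 8.55e-03),
}

# Equilibrium dissociation constant recomputed from the rate constants.
computed_kd = {name: kd / ka for name, (_, ka, kd) in rows.items()}

# Ratio of recomputed to reported KD; values near 1 indicate agreement.
agreement = {name: computed_kd[name] / rows[name][0] for name in rows}

# Fold change of the reported KD relative to the parent (larger = weaker binding).
fold_vs_parent = {name: rows[name][0] / rows["Parent"][0] for name in rows}

for name in rows:
    print(f"{name}: kd/ka = {computed_kd[name]:.3e} M, "
          f"ratio to reported = {agreement[name]:.3f}, "
          f"fold vs parent = {fold_vs_parent[name]:.1f}")
```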
Example seven:
Improved humanization design scheme: the original murine sequence is mutated into a human sequence by database comparison. The original sequence of the rabbit-derived antibody was designed into a plurality of humanized amino acid sequences (huVH4, huVH5, huVH6, huVL4, huVL5, huVL6), the designed sequences were combined into humanized antibodies, and the antibodies were expressed in the Expi293 mammalian expression system. The resulting humanized amino acid sequences are shown in the following table.
Figure BDA0003850065750000121
Figure BDA0003850065750000131
Purification of the humanized antibodies: the series of humanized antibodies expressed in Expi293 cells were collected from the cell supernatant and purified according to a standard protein purification protocol. Characterization of the results showed that the purity of the purified humanized antibodies exceeds 90%.
Example seven:
Activity detection of the humanized antibodies designed by modeling with the method of the present technology:
The binding activity of the humanized antibodies to the antigen was detected by ELISA:
For the ELISA, the assay plate was coated with the target antigen at 0.5 μg/mL, the humanized antibody samples purified in Example seven were applied over a concentration gradient of 0.00004-1.2 μg/mL, and the OD value of the binding activity was measured to determine the binding affinity of the humanized antibodies for the target antigen; the test result is shown in the figure. As can be seen from the figure, the OD value rises markedly as the sample concentration rises, clear upper and lower plateaus are reached quickly, and the window is large. The antigen binding affinity of the humanized antibodies is comparable to that of the parent: the degree of humanization is high, the affinity is essentially unchanged, and the humanization is successful.
The calculated EC50 values are given in the following table:
Sample ID | EC50 (μg/mL)
mVH+mVL | 0.01649
H4L4 | 0.01156
H5L5 | 0.02173
H6L6 | 0.01124
Positive control | 0.01019
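The two EC50 tables can be compared by expressing each humanized antibody's EC50 relative to the parent (mVH+mVL) measured in the same run. The sketch below does this with the tabulated values; the function name `relative_ec50` is hypothetical, and H3L3 is omitted because its EC50 is listed as "/".

```python
# EC50 values in μg/mL, copied from the two ELISA tables
traditional = {"mVH+mVL": 0.02054, "H1L1": 1.4298, "H2L2": 3.6758}
improved = {"mVH+mVL": 0.01649, "H4L4": 0.01156, "H5L5": 0.02173, "H6L6": 0.01124}

def relative_ec50(table, parent="mVH+mVL"):
    """EC50 of each sample divided by the parent EC50 from the same run."""
    return {name: value / table[parent]
            for name, value in table.items() if name != parent}

traditional_ratio = relative_ec50(traditional)
improved_ratio = relative_ec50(improved)

for label, ratios in (("traditional", traditional_ratio), ("improved", improved_ratio)):
    for name, ratio in ratios.items():
        print(f"{label} {name}: EC50 is {ratio:.2f} x parent")
```

A ratio well above 1 indicates weaker binding than the parent, and a ratio close to 1 indicates comparable binding.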
Example eight:
Affinity kinetics assay
The FORTEBIO instrument and its software were started, and the Kinetics experiment mode was selected. The analysis shows that the affinity of each humanized antibody designed by the method of the present technology is consistent with that of the parent, and the humanization design is successful.
Figure BDA0003850065750000141
Example nine: determination of the rabbit antibody crystal structure
The rabbit antibody light and heavy chain sequences were synthesized separately and expressed and purified in 293F cells to obtain 20 mg of high-purity protein; crystals were screened with a crystallization robot, data were collected by X-ray crystal diffraction, and the structure was solved by methods such as molecular replacement. The structures predicted by the different methods were analyzed and compared using PyMOL software: the RMSD of the structure obtained by the traditional homology modeling method was 1.42 Å, while the RMSD of the structure obtained by the method of the present technology was 0.41 Å. The antibody structure predicted by the method of the present technology is therefore more accurate, and the humanized antibody has a higher synthesis success rate and activity, which is superior to the traditional method.
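For reference, an RMSD between two structures with matched atoms is usually computed after optimal rigid-body superposition. The sketch below implements that definition with the Kabsch algorithm on toy coordinates; it is not PyMOL's procedure, and the coordinates are randomly generated rather than taken from the structures in this example.

```python
import numpy as np

def kabsch_rmsd(p, q):
    """RMSD between matched (n, 3) coordinate sets after optimal superposition of p onto q."""
    p = p - p.mean(axis=0)           # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q                      # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # avoid improper rotation
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rotated = p @ rotation.T
    return float(np.sqrt(np.mean(np.sum((p_rotated - q) ** 2, axis=1))))

rng = np.random.default_rng(1)
reference = rng.normal(size=(10, 3))              # toy "crystal" coordinates
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
# A rotated and shifted copy should superpose back with RMSD close to zero.
moved_copy = reference @ rot_z.T + np.array([1.0, 2.0, 3.0])
rmsd_same = kabsch_rmsd(moved_copy, reference)

# Adding coordinate noise produces a non-zero RMSD.
noisy = moved_copy + rng.normal(scale=0.5, size=moved_copy.shape)
rmsd_noisy = kabsch_rmsd(noisy, reference)
```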
In particular, in the technical scheme of the present application, it is considered that, although a humanized antibody with 80-90% homology is first selected as the structural model and a fully human antibody with 100% homology is then selected as the structural model to achieve a higher degree of humanization, the homology between the gene sequence of the modified rabbit-derived antibody and the gene sequence of the antibody molecule in the human body still needs to be checked. Therefore, in the technical scheme of the application, an artificial intelligence model based on natural semantic understanding is used to treat a gene sequence as a text sequence, and the feature distribution information of the gene sequence of the antigen molecule in the human body and of the gene sequence of the modified rabbit-derived antibody is each represented by fusing the global implicit features of the gene sequence with its multi-scale neighborhood correlation features under different gene spans. The transfer matrix of the gene features of the modified rabbit-derived antibody relative to the gene features of the antigen molecule is then used to evaluate the homology between the modified rabbit-derived antibody and the antigen molecule in the human body, and thereby to check the homology between the gene sequence of the modified rabbit-derived antibody and the gene sequence of the antibody molecule in the human body.
Exemplary homology check method
Fig. 5 illustrates a flowchart of the training phase in the antibody humanization method based on sequence encoding and decoding according to an embodiment of the present application. As shown in Fig. 5, the antibody humanization method based on sequence encoding and decoding according to an embodiment of the present application includes a training phase comprising:
S110, obtaining training data, wherein the training data comprises a gene sequence of a training antigen molecule, a gene sequence of a training modified rabbit-derived antibody, and a true value of the homology between the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody;
S120, passing the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody respectively through the converter-based context encoder and the multi-scale neighborhood feature extraction module to obtain a training antigen molecule gene feature vector and a training modified rabbit-derived antibody gene feature vector;
S130, calculating a transfer matrix of the training modified rabbit-derived antibody gene feature vector relative to the training antigen molecule gene feature vector as a training classification feature matrix;
S140, passing the training classification feature matrix through the classifier to obtain a classification loss function value;
S150, calculating a classification pattern digestion inhibition loss function value of the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector, wherein the classification pattern digestion inhibition loss function value is related to the square of the two-norm of the difference feature vector between the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector; and
S160, training the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier with a weighted sum of the classification loss function value and the classification pattern digestion inhibition loss function value as the loss function value.
Fig. 6 illustrates a flowchart of the inference phase in the antibody humanization method based on sequence encoding and decoding according to an embodiment of the present application. As shown in Fig. 6, the antibody humanization method based on sequence encoding and decoding according to an embodiment of the present application further includes an inference phase comprising the steps of:
S210, obtaining a gene sequence of an antigen molecule in the human body;
S220, passing the gene sequence of the antigen molecule through the trained converter-based context encoder to obtain a plurality of gene expression feature vectors, and cascading the plurality of gene expression feature vectors to obtain an antigen molecule global gene feature vector;
S230, passing the gene sequence of the antigen molecule through the trained multi-scale neighborhood feature extraction module to obtain an antigen molecule multi-neighborhood scale feature vector;
S240, cascading the antigen molecule global gene feature vector and the antigen molecule multi-neighborhood scale feature vector to obtain an antigen molecule gene feature vector;
S250, obtaining a gene sequence of the modified rabbit-derived antibody;
S260, processing the gene sequence of the modified rabbit-derived antibody through the converter-based context encoder and the multi-scale neighborhood feature extraction module to obtain a modified rabbit-derived antibody gene feature vector;
S270, calculating a transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector as a classification feature matrix; and
S280, passing the classification feature matrix through the trained classifier to obtain a class probability value, wherein the class probability value represents the homology between the modified rabbit-derived antibody and the antigen molecule in the human body.
Fig. 7 illustrates an architecture diagram of the training phase in the antibody humanization method based on sequence encoding and decoding according to an embodiment of the present application. As shown in Fig. 7, in the network structure of the training phase, training data is first obtained, where the training data includes a gene sequence of a training antigen molecule, a gene sequence of a training modified rabbit-derived antibody, and a true value of the homology between the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody. Then, the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody are respectively passed through the converter-based context encoder and the multi-scale neighborhood feature extraction module to obtain a training antigen molecule gene feature vector and a training modified rabbit-derived antibody gene feature vector. Next, a transfer matrix of the training modified rabbit-derived antibody gene feature vector relative to the training antigen molecule gene feature vector is calculated as a training classification feature matrix, and the training classification feature matrix is passed through the classifier to obtain a classification loss function value. Then, a classification pattern digestion inhibition loss function value of the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector is calculated, wherein the classification pattern digestion inhibition loss function value is related to the square of the two-norm of the difference feature vector between the two feature vectors. Finally, the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier are trained with a weighted sum of the classification loss function value and the classification pattern digestion inhibition loss function value as the loss function value.
More specifically, in the training phase, in step S110, training data is obtained, where the training data includes a gene sequence of a training antigen molecule, a gene sequence of a training modified rabbit-derived antibody, and a true value of the homology between the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody. It is considered that, although a humanized antibody with 80-90% homology is first selected as the structural model and a fully human antibody with 100% homology is then selected as the structural model to achieve a higher degree of humanization, the homology between the gene sequence of the modified rabbit-derived antibody and the gene sequence of the antibody molecule in the human body still needs to be checked. Therefore, in the technical scheme of the application, the gene sequence of the training antigen molecule, the gene sequence of the training modified rabbit-derived antibody, and the true value of the homology between the two gene sequences can be obtained through a gene sequence analyzer.
More specifically, in the training phase, in step S120, the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody are respectively passed through the converter-based context encoder and the multi-scale neighborhood feature extraction module to obtain a training antigen molecule gene feature vector and a training modified rabbit-derived antibody gene feature vector. Considering that each gene in the gene sequence of the antigen molecule in the human body has contextual semantic feature information, the gene sequence of the training antigen molecule is processed using the converter-based context encoder to obtain the training antigen molecule gene feature vector. In particular, in the technical scheme of the present application, it is considered that a gene is composed of a plurality of bases, each base being a site, and that DNA contains four bases, A, T, C and G. The gene sequence of the antigen molecule in the human body is therefore a base sequence composed of a plurality of A, T, C and G bases.
Therefore, in the technical solution of the present application, before the context encoder is used to encode the gene sequence, the gene sequence of the antigen molecule in the human body is one-hot encoded to convert it into an input vector; the input vector is further passed through the multi-scale neighborhood feature extraction module, and the training antigen molecule gene feature vector is obtained. Further, in order to accurately evaluate the homology between the modified rabbit-derived antibody and the antigen molecule in the human body, the gene sequence of the training modified rabbit-derived antibody needs to be obtained. More specifically, the gene sequence of the training modified rabbit-derived antibody is processed by the converter-based context encoder and the multi-scale neighborhood feature extraction module, so as to obtain the training modified rabbit-derived antibody gene feature vector, which combines global features with multi-scale neighborhood correlation features under different gene segment spans.
More specifically, in the training phase, in step S130, a transfer matrix of the training modified rabbit-derived antibody gene feature vector relative to the training antigen molecule gene feature vector is calculated as a training classification feature matrix. Because the gene features of the training modified rabbit-derived antibody and the gene features of the training antigen molecule in the human body have different feature scales in the high-dimensional feature space, and because a humanized antibody needs high affinity, the transfer matrix of the training modified rabbit-derived antibody gene feature vector relative to the training antigen molecule gene feature vector is calculated and used for classification. The homology between the training modified rabbit-derived antibody and the training antigen molecule in the human body is thereby evaluated, so as to obtain a higher degree of humanization. In a specific example of the application, the transfer matrix of the training modified rabbit-derived antibody gene feature vector relative to the training antigen molecule gene feature vector is calculated as the training classification feature matrix according to the following formula;
wherein the formula is:
Figure BDA0003850065750000171
wherein $V_1$ represents the training modified rabbit-derived antibody gene feature vector, $V_2$ represents the training antigen molecule gene feature vector, $M$ represents the classification feature matrix, and the operator symbol (Figure BDA0003850065750000172) represents matrix multiplication.
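Since the formula image is not reproduced in this text, the following is only a sketch of one way to obtain a matrix relating the two vectors by matrix multiplication. It assumes the convention $M V_2 = V_1$ and picks the minimum-Frobenius-norm solution $M = V_1 V_2^{\top} / (V_2^{\top} V_2)$; both the convention and this particular choice are assumptions, and the vectors are toy values.

```python
import numpy as np

def transfer_matrix(v1, v2):
    """Minimum-norm M satisfying M @ v2 == v1 (assumed convention)."""
    denominator = float(v2 @ v2)
    if denominator == 0.0:
        raise ValueError("v2 must be a non-zero vector")
    return np.outer(v1, v2) / denominator

v1 = np.array([1.0, 2.0, 3.0])     # toy antibody feature vector
v2 = np.array([0.5, -1.0, 2.0])    # toy antigen feature vector
m = transfer_matrix(v1, v2)

# Applying the matrix to v2 recovers v1.
recovered = m @ v2
```

Under the opposite convention the roles of the two vectors are simply swapped.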
More specifically, in the training phase, in step S140, the training classification feature matrix is passed through the classifier to obtain a classification loss function value. In a specific example of the present application, passing the training classification feature matrix through the classifier to obtain a classification loss function value includes: processing the training classification feature matrix using the classifier according to the following formula to generate a training classification result, wherein the formula is:
$\mathrm{softmax}\{(M_c, B_c) \mid \mathrm{Project}(F)\}$, where $\mathrm{Project}(F)$ represents projecting the training classification feature matrix as a vector, $M_c$ is the weight matrix of the fully connected layer, and $B_c$ represents the bias matrix of the fully connected layer; and calculating the cross-entropy value between the training classification result and the true value of the homology between the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody in the training data as the classification loss function value.
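The classifier step can be sketched as follows, assuming that Project(F) flattens the matrix row by row, that a single fully connected layer is used, and that there are two classes; the sizes, random weights and label are illustrative only.

```python
import numpy as np

def classify(feature_matrix, weight, bias):
    """Flatten the matrix, apply one fully connected layer, then softmax."""
    projected = feature_matrix.reshape(-1)        # Project(F)
    logits = weight @ projected + bias            # fully connected layer (M_c, B_c)
    logits = logits - logits.max()                # numerical stability
    exponentials = np.exp(logits)
    return exponentials / exponentials.sum()

def cross_entropy(probabilities, true_class):
    """Negative log-probability assigned to the true class."""
    return float(-np.log(probabilities[true_class] + 1e-12))

rng = np.random.default_rng(0)
feature_matrix = rng.normal(size=(3, 3))   # toy classification feature matrix
weight = rng.normal(size=(2, 9))           # two classes (assumed)
bias = np.zeros(2)

probabilities = classify(feature_matrix, weight, bias)
classification_loss = cross_entropy(probabilities, true_class=1)
```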
More specifically, in the training phase, in step S150, a classification pattern digestion inhibition loss function value of the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector is calculated, wherein the classification pattern digestion inhibition loss function value is related to the square of the two-norm of the difference feature vector between the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector. In particular, in the technical solution of the present application, the classification feature matrix is the transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector. During training of the classifier, when the back-propagated gradient passes through the respective feature extraction models of the two feature vectors, that is, the converter-based context encoder together with the multi-scale neighborhood feature extraction module, abnormal gradient branches may cause digestion of the feature patterns expressed by the modified rabbit-derived antibody gene feature vector and the antigen molecule gene feature vector; a classification pattern digestion inhibition loss function is therefore introduced. In a specific example of the present application, calculating the classification pattern digestion inhibition loss function value of the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector includes: calculating the classification pattern digestion inhibition loss function value of the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector according to the following formula;
wherein the formula is:
Figure BDA0003850065750000181
wherein $V_1$ and $V_2$ respectively represent the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector, $M_1$ and $M_2$ respectively represent the weight matrices of the classifier for the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector, $\|\cdot\|_2^2$ represents the square of the two-norm of a vector, $\|\cdot\|_F$ represents the F norm (Frobenius norm) of a matrix, and $\exp(\cdot)$ represents the exponential operation of a matrix or a vector: the exponential operation of a matrix calculates the natural exponential function value raised to the power of the feature value at each position in the matrix, and the exponential operation of a vector calculates the natural exponential function value raised to the power of the feature value at each position in the vector. Here, introducing the classification mode digestion inhibition loss function pushes the pseudo-difference of the classifier weights toward the real feature distribution difference between the modified rabbit-derived antibody gene feature vector and the antigen molecule gene feature vector, so that the directional derivative is regularized near the gradient branch point when the gradient is back-propagated. In other words, the gradient is re-weighted between the feature extraction modes of the modified rabbit-derived antibody gene feature vector and the antigen molecule gene feature vector, which inhibits the digestion of the classification mode of the features and improves the classification accuracy. In this way, the homology between the modified rabbit-derived antibody and the antigen molecule in the human body can be accurately evaluated so as to obtain a higher degree of humanization while keeping the affinity unchanged or even increasing it.
More specifically, in the training phase, in step S160, the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier are trained with a weighted sum of the classification loss function value and the classification mode digestion inhibition loss function value as the loss function value. That is, the weighted sum of the classification loss function value and the classification mode digestion inhibition loss function value is used to update the parameters of the context encoder, the parameters of the multi-scale neighborhood feature extraction module, and the parameters of the classifier.
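The joint update described above can be sketched as follows, assuming plain gradient descent. The weights alpha and beta, the learning rate, and the function names are hypothetical; the application does not state their values or the optimizer used.

```python
def total_loss(classification_loss, inhibition_loss, alpha=1.0, beta=0.1):
    """Weighted sum of the two loss values; alpha and beta are hypothetical weights."""
    return alpha * classification_loss + beta * inhibition_loss

def sgd_step(params, grads_cls, grads_inh, alpha=1.0, beta=0.1, lr=0.01):
    """One gradient-descent step on the weighted sum.

    The gradient of a weighted sum is the weighted sum of the gradients, so the
    encoder, the multi-scale module, and the classifier are updated together.
    """
    return {
        name: params[name] - lr * (alpha * grads_cls[name] + beta * grads_inh[name])
        for name in params
    }

# Toy scalar parameters standing in for the three trainable components.
params = {"context_encoder": 1.0, "multi_scale_module": 0.5, "classifier": -0.2}
grads_cls = {"context_encoder": 2.0, "multi_scale_module": 0.0, "classifier": 1.0}
grads_inh = {"context_encoder": 10.0, "multi_scale_module": 5.0, "classifier": 0.0}
updated = sgd_step(params, grads_cls, grads_inh)
```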
After training is completed, the inference phase is entered. That is, the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier trained as described above are used in actual inference to obtain a more accurate classification result for the homology between the modified rabbit-derived antibody and the antigen molecule in the human body.
Fig. 8 illustrates an architectural diagram of the inference stage in the antibody humanization method based on sequence encoding and decoding according to an embodiment of the present application. As shown in fig. 8, in the network structure of the inference stage, first, the gene sequence of the antigen molecule in the human body and the gene sequence of the modified rabbit-derived antibody are obtained; then, the gene sequence of the antigen molecule is passed through the trained converter-based context encoder to obtain a plurality of gene expression feature vectors, and the gene expression feature vectors are cascaded to obtain an antigen molecule global gene feature vector; meanwhile, the gene sequence of the modified rabbit-derived antibody is processed by the converter-based context encoder and the multi-scale neighborhood feature extraction module to obtain a modified rabbit-derived antibody gene feature vector; next, the gene sequence of the antigen molecule is passed through the trained multi-scale neighborhood feature extraction module to obtain an antigen molecule multi-neighborhood scale feature vector; the antigen molecule global gene feature vector and the antigen molecule multi-neighborhood scale feature vector are cascaded to obtain an antigen molecule gene feature vector; then, a transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector is calculated as a classification feature matrix; finally, the classification feature matrix is passed through the trained classifier to obtain a class probability value, which represents the homology between the modified rabbit-derived antibody and the antigen molecule in the human body.
More specifically, in the inference phase, in steps S210 and S220, the gene sequence of the antigen molecule in the human body is acquired, and the gene sequence of the antigen molecule is passed through the trained converter-based context encoder to obtain a plurality of gene expression feature vectors, which are cascaded to obtain an antigen molecule global gene feature vector. Considering that each gene in the gene sequence of the antigen molecule in the human body carries contextual semantic feature information, the converter-based context encoder is used to process the gene sequence of the antigen molecule and extract its essential features based on global high-dimensional semantic features, so that the antigen molecule in the human body is better characterized. The plurality of gene expression feature vectors are then cascaded to integrate the global implicit feature information of each gene of the antigen molecule in the human body, thereby obtaining the antigen molecule global gene feature vector.
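The encode-then-cascade step can be illustrated with a small sketch. The application does not specify the tokenization or the internal structure of the converter-based context encoder, so this example assumes one-hot nucleotide tokens and uses a single self-attention layer as a stand-in; all names and dimensions are hypothetical.

```python
import numpy as np

NUCLEOTIDES = "ACGT"

def one_hot(sequence):
    """Encode each nucleotide as a one-hot row vector, shape (L, 4)."""
    idx = [NUCLEOTIDES.index(ch) for ch in sequence]
    return np.eye(len(NUCLEOTIDES))[idx]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the whole sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v                                     # one context vector per token

def global_gene_feature_vector(sequence, w_q, w_k, w_v):
    """Cascade (concatenate) the per-token context vectors into one global vector."""
    per_token = self_attention(one_hot(sequence), w_q, w_k, w_v)
    return per_token.reshape(-1)

rng = np.random.default_rng(0)
d = 3                                                      # hypothetical feature width
w_q, w_k, w_v = (rng.normal(size=(4, d)) for _ in range(3))
antigen_global = global_gene_feature_vector("ACGTTGCA", w_q, w_k, w_v)
```

Because every output row attends to every input position, each per-token vector already reflects the context of the whole sequence before the vectors are cascaded.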
More specifically, in the inference phase, in step S230 and step S240, the gene sequence of the antigen molecule is passed through the trained multi-scale neighborhood feature extraction module to obtain an antigen molecule multi-neighborhood scale feature vector, and the antigen molecule global gene feature vector and the antigen molecule multi-neighborhood scale feature vector are cascaded to obtain the antigen molecule gene feature vector. It should be understood that the gene segments in the gene sequence of the antigen molecule in the human body carry different implicit features under different gene segment spans, and multi-scale neighborhood feature extraction can capture the associated features under these different spans. Therefore, in the technical solution of the present application, the multi-scale neighborhood feature extraction module is further used to encode the gene sequence of the antigen molecule and extract the multi-scale neighborhood associated features of the gene sequence of the antigen molecule in the human body under different gene segment spans, thereby obtaining the antigen molecule multi-neighborhood scale feature vector. The antigen molecule global gene feature vector and the antigen molecule multi-neighborhood scale feature vector are then cascaded for feature fusion to obtain the antigen molecule gene feature vector.
Fig. 9 illustrates a flowchart of the extraction process of the antigen molecule multi-scale neighborhood features in the antibody humanization method based on sequence encoding and decoding according to an embodiment of the present application. As shown in fig. 9, the process of extracting the multi-scale neighborhood features of the antigen molecule includes: S231, inputting the gene sequence of the antigen molecule into a first convolution layer of the multi-scale neighborhood feature extraction module to obtain a first neighborhood scale antigen molecule feature vector, wherein the first convolution layer has a first one-dimensional convolution kernel with a first length; S232, inputting the gene sequence of the antigen molecule into a second convolution layer of the multi-scale neighborhood feature extraction module to obtain a second neighborhood scale antigen molecule feature vector, wherein the second convolution layer has a second one-dimensional convolution kernel with a second length, and the first length is different from the second length; and S233, cascading the first neighborhood scale antigen molecule feature vector and the second neighborhood scale antigen molecule feature vector to obtain the multi-scale neighborhood antigen molecule feature vector. More specifically, the gene sequence of the antigen molecule is subjected to one-dimensional convolution encoding by the first convolution layer of the multi-scale neighborhood feature extraction module according to the following formula to obtain the first neighborhood scale antigen molecule feature vector;
wherein the formula is:
Figure BDA0003850065750000202
wherein a is the width of the first convolution kernel in the X direction, F (a) is a parameter vector of the first convolution kernel, G (X-a) is a local vector matrix operated with the convolution kernel function, w is the size of the first convolution kernel, and X represents the gene sequence of the antigen molecule; further, the gene sequence of the antigen molecule is subjected to one-dimensional convolution encoding by the second convolution layer of the multi-scale neighborhood feature extraction module according to the following formula to obtain the second neighborhood scale antigen molecule feature vector;
wherein the formula is:
Figure BDA0003850065750000201
wherein b is the width of the second convolution kernel in the X direction, F (b) is a second convolution kernel parameter vector, G (X-b) is a local vector matrix operated with the convolution kernel function, m is the size of the second convolution kernel, and X represents the gene sequence of the antigen molecule.
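Steps S231 to S233 can be sketched as two one-dimensional convolutions with kernels of different lengths followed by concatenation. The formula images are not reproduced in this text, so the sketch assumes the standard discrete convolution, in which each output is a sum over kernel positions of F(a) multiplied by G(x-a). The numeric encoding of nucleotides and the averaging kernels are hypothetical; trained kernels would be learned parameters.

```python
import numpy as np

def encode(sequence):
    """Hypothetical numeric encoding of a nucleotide sequence."""
    table = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
    return np.array([table[ch] for ch in sequence])

def conv1d(x, kernel):
    """Discrete 1-D convolution over positions where the kernel fully overlaps x."""
    return np.convolve(x, kernel, mode="valid")

def multi_scale_features(sequence, kernel_a, kernel_b):
    """Apply two kernels of different lengths and cascade the two results."""
    x = encode(sequence)
    first_scale = conv1d(x, kernel_a)     # first neighborhood scale feature vector
    second_scale = conv1d(x, kernel_b)    # second neighborhood scale feature vector
    return np.concatenate([first_scale, second_scale])

kernel_a = np.ones(3) / 3.0               # first length w = 3
kernel_b = np.ones(5) / 5.0               # second length m = 5, different from w
features = multi_scale_features("ACGTACGT", kernel_a, kernel_b)
```

With a sequence of length 8, the two kernels give 6 and 4 output positions, so the cascaded vector has 10 entries. The shorter kernel responds to local patterns and the longer kernel to patterns over a wider span.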
More specifically, in the inference phase, in step S250, the gene sequence of the modified rabbit-derived antibody is obtained. It should be understood that, in order to accurately evaluate the homology between the modified rabbit-derived antibody and the antigen molecule in the human body, the gene sequence of the modified rabbit-derived antibody needs to be obtained. In the technical solution of the present application, the gene sequence of the modified rabbit-derived antibody can be obtained with a gene sequence analyzer.
More specifically, in the inference phase, in step S260, the gene sequence of the modified rabbit-derived antibody is processed by the context encoder based on converter and the multi-scale neighborhood feature extraction module to obtain a modified rabbit-derived antibody gene feature vector. Similarly, the gene sequence of the modified rabbit-derived antibody is processed through the context encoder based on the converter and the multi-scale neighborhood feature extraction module, so that a modified rabbit-derived antibody gene feature vector with global multi-scale neighborhood correlation features under different gene segment spans is obtained.
More specifically, in the inference phase, in steps S270 and S280, a transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector is calculated as a classification feature matrix, and the classification feature matrix is passed through the trained classifier to obtain a class probability value, which represents the homology between the modified rabbit-derived antibody and the antigen molecule in the human body. In a specific example of the present application, calculating the transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector as the classification feature matrix includes: calculating the transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector as the classification feature matrix according to the following formula;
wherein the formula is:
Figure BDA0003850065750000211
wherein $V_1$ represents the modified rabbit-derived antibody gene feature vector, $V_2$ represents the antigen molecule gene feature vector, $M$ represents the classification feature matrix, and $\otimes$ represents matrix multiplication.
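One way to compute such a transfer matrix is sketched below. The formula image is not reproduced in this text, so the sketch assumes the relation V1 = M ⊗ V2. Because many matrices satisfy that relation for a single pair of vectors, the sketch picks the rank-one solution with the smallest Frobenius norm; this choice and the function name are assumptions, not something stated in the application.

```python
import numpy as np

def transfer_matrix(v1, v2):
    """Minimum-Frobenius-norm matrix M such that M @ v2 equals v1 (rank-one solution)."""
    denom = float(v2 @ v2)
    if denom == 0.0:
        raise ValueError("v2 must be a non-zero vector")
    return np.outer(v1, v2) / denom

v1 = np.array([1.0, 2.0, 3.0])      # toy modified rabbit-derived antibody gene feature vector
v2 = np.array([0.5, -1.0, 2.0])     # toy antigen molecule gene feature vector
M = transfer_matrix(v1, v2)
```

Multiplying M by v2 recovers v1, which is the property the text relies on when it treats M as the classification feature matrix.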
In summary, the antibody humanization method based on sequence encoding and decoding has been described. It adopts an artificial intelligence model based on natural semantic understanding, treats a gene sequence as a text sequence, and fuses the global implicit features of the gene sequence with multi-scale neighborhood associated features under different gene spans to represent the feature distribution information of the gene sequence of the antigen molecule in the human body and of the gene sequence of the modified rabbit-derived antibody, respectively. The transfer matrix of the gene features of the modified rabbit-derived antibody relative to the gene features of the antigen molecule is then used to evaluate the homology between the modified rabbit-derived antibody and the antigen molecule in the human body, and thereby to check the homology between the gene sequence of the modified rabbit-derived antibody and the gene sequence of the antibody molecule in the human body.
Exemplary System
FIG. 10 illustrates a block diagram of an antibody humanization system based on sequence coding, according to an embodiment of the present application. As shown in fig. 10, the antibody humanization system 500 based on sequence coding according to an embodiment of the present application includes: a training module 510 and an inference module 520.
As shown in fig. 10, the training module 510 includes: a training data obtaining unit 511, configured to obtain training data, where the training data includes a gene sequence of a training antigen molecule, a gene sequence of a training modified rabbit-derived antibody, and a true value of a degree of homology between the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody; a feature vector extraction unit 512, configured to pass the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody through the converter-based context encoder and the multi-scale neighborhood feature extraction module, respectively, to obtain a training antigen molecule gene feature vector and a training modified rabbit-derived antibody gene feature vector; a training classification feature matrix generating unit 513, configured to calculate a transfer matrix of the rabbit-derived antibody gene feature vector after the training modification relative to the training antigen molecule gene feature vector as a training classification feature matrix;
a classification loss function value calculating unit 514, configured to pass the training classification feature matrix through the classifier to obtain a classification loss function value; a classification mode digestion inhibition loss function value calculation unit 515, configured to calculate a classification mode digestion inhibition loss function value of the rabbit-derived antibody gene feature vector after the training modification and the training antigen molecule gene feature vector, where the classification mode digestion inhibition loss function value is related to a square of a two-norm of a difference feature vector between the rabbit-derived antibody gene feature vector after the training modification and the training antigen molecule gene feature vector; and a training unit 516 for training the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier with a weighted sum of the classification loss function values and the classification mode digestion suppression loss function values as loss function values.
As shown in fig. 10, the inference module 520 includes: a physiological information acquisition unit 521, configured to acquire the gene sequence of the antigen molecule in the human body; an antigen molecule global gene feature vector generating unit 522, configured to pass the gene sequence of the antigen molecule through the trained converter-based context encoder to obtain a plurality of gene expression feature vectors, and cascade the plurality of gene expression feature vectors to obtain an antigen molecule global gene feature vector; a multi-scale neighborhood feature extraction unit 523, configured to pass the gene sequence of the antigen molecule through the trained multi-scale neighborhood feature extraction module to obtain an antigen molecule multi-neighborhood scale feature vector; a cascading unit 524, configured to cascade the antigen molecule global gene feature vector and the antigen molecule multi-neighborhood scale feature vector to obtain an antigen molecule gene feature vector; a rabbit-derived antibody gene sequence acquisition unit 525, configured to acquire the gene sequence of the modified rabbit-derived antibody; a modified rabbit-derived antibody gene feature extraction unit 526, configured to process the gene sequence of the modified rabbit-derived antibody through the converter-based context encoder and the multi-scale neighborhood feature extraction module to obtain a modified rabbit-derived antibody gene feature vector; a classification feature matrix generating unit 527, configured to calculate a transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector as a classification feature matrix; and a class probability value generating unit 528, configured to pass the classification feature matrix through the trained classifier to obtain a class probability value, where the class probability value represents the degree of homology between the modified rabbit-derived antibody and the antigen molecule in the human body.
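The way units 521 to 528 fit together can be sketched as a small class that wires four interchangeable components in the order described. This is only a structural illustration: the class name, its fields, and the toy stand-in components are hypothetical, and trained models would take the place of the stand-ins.

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class InferenceModule:
    context_encoder: Callable[[str], np.ndarray]        # units 522 / 526 (global features)
    multi_scale_extractor: Callable[[str], np.ndarray]  # units 523 / 526 (neighborhood features)
    transfer_matrix: Callable[[np.ndarray, np.ndarray], np.ndarray]  # unit 527
    classifier: Callable[[np.ndarray], np.ndarray]      # unit 528

    def gene_feature_vector(self, sequence: str) -> np.ndarray:
        # Cascade the global vector and the multi-scale neighborhood vector (unit 524).
        return np.concatenate(
            [self.context_encoder(sequence), self.multi_scale_extractor(sequence)]
        )

    def infer(self, antigen_sequence: str, antibody_sequence: str) -> np.ndarray:
        v2 = self.gene_feature_vector(antigen_sequence)    # antigen molecule
        v1 = self.gene_feature_vector(antibody_sequence)   # modified rabbit-derived antibody
        return self.classifier(self.transfer_matrix(v1, v2))

# Toy stand-ins so the sketch runs end to end; they are not trained models.
def toy_encoder(seq):
    return np.array([seq.count(ch) / len(seq) for ch in "ACGT"])

def toy_extractor(seq):
    return np.array([float(len(seq))])

def toy_transfer(v1, v2):
    return np.outer(v1, v2) / float(v2 @ v2)

def toy_classifier(m):
    z = np.array([m.sum(), -m.sum()])
    e = np.exp(z - z.max())
    return e / e.sum()

module = InferenceModule(toy_encoder, toy_extractor, toy_transfer, toy_classifier)
probs = module.infer("ACGTACGT", "ACGTTCGA")
```

Keeping the components as injected callables mirrors the unit structure of the system and lets the same wiring serve both sequences.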
In one example, in the antibody humanization system 500 based on the sequence codec described above, the classification loss function value calculating unit 514 includes: processing the training classification feature matrix using the classifier with the following formula to generate a training classification result, wherein the formula is:
$\mathrm{softmax}\{(M_c, B_c) \mid \mathrm{Project}(F)\}$, where $\mathrm{Project}(F)$ represents projecting the training classification feature matrix as a vector, $M_c$ is the weight matrix of the fully connected layer, and $B_c$ is the bias matrix of the fully connected layer; and calculating the cross-entropy value between the training classification result and the true value of the homology between the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody in the training data as the classification loss function value.
In one example, in the antibody humanization system 500 based on the sequence codec described above, the classification pattern digestion inhibition loss function value calculation unit 515 includes: calculating the classification mode digestion inhibition loss function values of the rabbit source antibody gene characteristic vector after the training transformation and the training antigen molecule gene characteristic vector according to the following formula;
wherein the formula is:
Figure BDA0003850065750000231
wherein $V_1$ and $V_2$ respectively represent the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector, $M_1$ and $M_2$ respectively represent the weight matrices of the classifier for the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector, $\|\cdot\|_2^2$ represents the square of the two-norm of a vector, $\|\cdot\|_F$ represents the F norm (Frobenius norm) of a matrix, and $\exp(\cdot)$ represents the exponential operation of a matrix or a vector: the exponential operation of a matrix calculates the natural exponential function value raised to the power of the feature value at each position in the matrix, and the exponential operation of a vector calculates the natural exponential function value raised to the power of the feature value at each position in the vector.
In one example, in the antibody humanization system 500 based on sequence coding, the multi-scale neighborhood feature extraction unit 523 is further configured to: inputting the gene sequence of the antigen molecule into a first convolution layer of the multi-scale neighborhood characteristic extraction module to obtain a first neighborhood scale antigen molecule characteristic vector, wherein the first convolution layer has a first one-dimensional convolution kernel with a first length; inputting the gene sequence of the antigen molecule into a second convolution layer of the multi-scale neighborhood characteristic extraction module to obtain a second neighborhood scale antigen molecule characteristic vector, wherein the second convolution layer has a second one-dimensional convolution kernel with a second length, and the first length is different from the second length; and cascading the first neighborhood scale antigen molecule feature vector and the second neighborhood scale antigen molecule feature vector to obtain the multi-scale neighborhood antigen molecule feature vector.
In one example, in the antibody humanization system 500 based on the sequence codec, the classification feature matrix generating unit 527 includes: calculating a transfer matrix of the modified rabbit source antibody gene characteristic vector relative to the antigen molecule gene characteristic vector by using the following formula as the classification characteristic matrix;
wherein the formula is:
Figure BDA0003850065750000233
wherein $V_1$ represents the modified rabbit-derived antibody gene feature vector, $V_2$ represents the antigen molecule gene feature vector, $M$ represents the classification feature matrix, and $\otimes$ represents matrix multiplication.
In summary, the antibody humanization system based on sequence encoding and decoding has been described. It adopts an artificial intelligence model based on natural semantic understanding, treats a gene sequence as a text sequence, and fuses the global implicit features of the gene sequence with multi-scale neighborhood associated features under different gene spans to represent the feature distribution information of the gene sequence of the antigen molecule in the human body and of the gene sequence of the modified rabbit-derived antibody, respectively. The transfer matrix of the gene features of the modified rabbit-derived antibody relative to the gene features of the antigen molecule is then used to evaluate the homology between the modified rabbit-derived antibody and the antigen molecule in the human body, and thereby to check the homology between the gene sequence of the modified rabbit-derived antibody and the gene sequence of the antibody molecule in the human body.

Claims (8)

1. A method for humanizing an antibody based on a sequence encoding/decoding, comprising:
obtaining a gene sequence of an antigen molecule in a human body;
obtaining a plurality of gene expression characteristic vectors by passing the gene sequence of the antigen molecule through a trained context encoder based on a converter, and cascading the gene expression characteristic vectors to obtain an antigen molecule global gene characteristic vector;
the gene sequence of the antigen molecule passes through a trained multi-scale neighborhood feature extraction module to obtain an antigen molecule multi-neighborhood scale feature vector;
cascading the antigen molecule global gene feature vector and the antigen molecule multi-neighborhood scale feature vector to obtain an antigen molecule gene feature vector;
obtaining a gene sequence of the modified rabbit source antibody;
processing the gene sequence of the modified rabbit-derived antibody through the context encoder based on the converter and the multi-scale neighborhood feature extraction module to obtain a modified rabbit-derived antibody gene feature vector;
calculating a transfer matrix of the modified rabbit source antibody gene characteristic vector relative to the antigen molecule gene characteristic vector as a classification characteristic matrix; and
passing the classification feature matrix through the trained classifier to obtain a class probability value, wherein the class probability value represents the degree of homology between the modified rabbit-derived antibody and the antigen molecule in the human body.
2. The antibody humanization method based on sequence coding and decoding according to claim 1, wherein the passing the gene sequence of the antigen molecule through a trained multi-scale neighborhood feature extraction module to obtain an antigen molecule multi-neighborhood feature vector comprises:
inputting the gene sequence of the antigen molecule into a first convolution layer of the multi-scale neighborhood characteristic extraction module to obtain a first neighborhood scale antigen molecule characteristic vector, wherein the first convolution layer has a first one-dimensional convolution kernel with a first length;
inputting the gene sequence of the antigen molecule into a second convolution layer of the multi-scale neighborhood characteristic extraction module to obtain a second neighborhood scale antigen molecule characteristic vector, wherein the second convolution layer has a second one-dimensional convolution kernel with a second length, and the first length is different from the second length; and
cascading the first neighborhood scale antigen molecule feature vector and the second neighborhood scale antigen molecule feature vector to obtain the multi-scale neighborhood antigen molecule feature vector.
3. The method of claim 2, wherein inputting the gene sequence of the antigen molecule into the first convolution layer of the multi-scale neighborhood feature extraction module to obtain a first neighborhood scale antigen molecule feature vector comprises:
performing one-dimensional convolution encoding on the gene sequence of the antigen molecule by using the first convolution layer of the multi-scale neighborhood feature extraction module according to the following formula to obtain the first neighborhood scale antigen molecule feature vector;
wherein the formula is:
Figure FDA0003850065740000021
wherein a is the width of the first convolution kernel in the X direction, F (a) is a parameter vector of the first convolution kernel, G (X-a) is a local vector matrix operated with a convolution kernel function, w is the size of the first convolution kernel, and X represents the gene sequence of the antigen molecule.
4. The method of claim 3, wherein inputting the gene sequence of the antigen molecule into a second convolution layer of the multi-scale neighborhood feature extraction module to obtain a second neighborhood scale antigen molecule feature vector comprises:
performing one-dimensional convolution encoding on the gene sequence of the antigen molecule by using the second convolution layer of the multi-scale neighborhood feature extraction module according to the following formula to obtain the second neighborhood scale antigen molecule feature vector;
wherein the formula is:
Figure FDA0003850065740000022
wherein b is the width of the second convolution kernel in the X direction, F (b) is a second convolution kernel parameter vector, G (X-b) is a local vector matrix operated with the convolution kernel function, m is the size of the second convolution kernel, and X represents the gene sequence of the antigen molecule.
5. The method of claim 4, wherein the step of calculating a transfer matrix of the modified rabbit-derived antibody gene feature vector relative to the antigen molecule gene feature vector as a classification feature matrix comprises:
calculating a transfer matrix of the modified rabbit source antibody gene characteristic vector relative to the antigen molecule gene characteristic vector by using the following formula as the classification characteristic matrix;
wherein the formula is:
Figure FDA0003850065740000024
wherein $V_1$ represents the modified rabbit-derived antibody gene feature vector, $V_2$ represents the antigen molecule gene feature vector, $M$ represents the classification feature matrix, and $\otimes$ represents matrix multiplication.
6. The sequence codec-based antibody humanization method according to claim 1, further comprising training the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier;
the training the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier includes:
acquiring training data, wherein the training data comprise a gene sequence of a training antigen molecule, a gene sequence of a training modified rabbit-derived antibody and a true value of the homology between the gene sequence of the training antigen molecule and the gene sequence of the training modified rabbit-derived antibody;
respectively enabling the gene sequence of the training antigen molecule and the gene sequence of the rabbit source antibody after training modification to pass through the context encoder based on the converter and the multi-scale neighborhood feature extraction module to obtain a training antigen molecule gene feature vector and a rabbit source antibody gene feature vector after training modification;
calculating a transfer matrix of the rabbit-derived antibody gene characteristic vector after training modification relative to the training antigen molecule gene characteristic vector as a training classification characteristic matrix;
passing the training classification feature matrix through the classifier to obtain a classification loss function value;
calculating a classification mode digestion inhibition loss function value of the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector, wherein the classification mode digestion inhibition loss function value is related to the square of the two-norm of the difference feature vector between the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector; and
training the converter-based context encoder, the multi-scale neighborhood feature extraction module, and the classifier with a weighted sum of the classification loss function value and the classification mode digestion inhibition loss function value as the loss function value.
7. The method of claim 6, wherein the calculating of the classification mode digestion inhibition loss function value of the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector comprises:
calculating the classification mode digestion inhibition loss function values of the rabbit source antibody gene characteristic vector after the training transformation and the training antigen molecule gene characteristic vector according to the following formula;
wherein the formula is:
Figure FDA0003850065740000031
wherein V_1 and V_2 respectively represent the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector, M_1 and M_2 respectively represent the weight matrices of the classifier for the training modified rabbit-derived antibody gene feature vector and the training antigen molecule gene feature vector, ||·||_2^2 represents the square of the two-norm of a vector, ||·||_F represents the F norm (Frobenius norm) of a matrix, and exp(·) represents the exponential operation on a matrix or a vector, wherein the exponential operation on a matrix computes the natural exponential function raised to the power of the feature value at each position in the matrix, and the exponential operation on a vector computes the natural exponential function raised to the power of the feature value at each position in the vector.
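The formula of claim 7 is given only as an image and is not reproduced here, so the following NumPy sketch covers only the three operations that the claim defines in words. The function names are illustrative assumptions.

```python
import numpy as np

def squared_two_norm(v):
    """Square of the two-norm of a vector: sum of squared entries."""
    return float(np.sum(np.square(v)))

def frobenius_norm(m):
    """F norm of a matrix: square root of the sum of squared entries."""
    return float(np.sqrt(np.sum(np.square(m))))

def positionwise_exp(x):
    """Exponential applied at each position of a vector or matrix,
    as the claim describes (not the matrix exponential)."""
    return np.exp(x)

diff_norm = squared_two_norm(np.array([3.0, 4.0]))
weight_norm = frobenius_norm(np.array([[3.0, 0.0], [0.0, 4.0]]))
exp_of_zeros = positionwise_exp(np.zeros((2, 2)))
```

The position-wise reading matters: for a matrix, it differs from the matrix exponential defined by a power series, which the claim language does not describe.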
8. The method of claim 7, wherein passing the training classification feature matrix through the classifier to obtain a classification loss function value comprises:
processing the training classification feature matrix using the classifier with a formula to generate a training classification result, wherein the formula is:
softmax{(M_c, B_c) | Project(F)}, where Project(F) represents the projection of the training classification feature matrix into a vector, M_c represents the weight matrix of the fully connected layer, and B_c represents the bias matrix of the fully connected layer; and
calculating a cross entropy value between the training classification result and the true value of the homology between the training antigen molecule gene sequence in the training data and the training modified rabbit-derived antibody gene sequence, as the classification loss function value.
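The classifier step of claim 8 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: Project(F) is taken to be a simple flattening of the matrix, there is a single fully connected layer, and the function names, the two-class label encoding, and the small epsilon inside the logarithm are all hypothetical choices rather than details from the patent.

```python
import numpy as np

def classify(feature_matrix, weights, bias):
    """Flatten the feature matrix (assumed form of Project(F)), apply one
    fully connected layer, then a numerically stable softmax."""
    x = feature_matrix.reshape(-1)
    logits = weights @ x + bias
    shifted = logits - np.max(logits)   # avoids overflow in exp
    exp_logits = np.exp(shifted)
    return exp_logits / np.sum(exp_logits)

def cross_entropy(probs, true_label):
    """Cross entropy against an integer class label; eps guards log(0)."""
    eps = 1e-12
    return float(-np.log(probs[true_label] + eps))

# Toy example: a 2x2 feature matrix and two homology classes.
F = np.zeros((2, 2))
M_c = np.zeros((2, 4))   # weight matrix of the fully connected layer
B_c = np.zeros(2)        # bias of the fully connected layer
probs = classify(F, M_c, B_c)
loss_value = cross_entropy(probs, true_label=1)
```

With all-zero weights the two classes are equally likely, so the loss equals the natural logarithm of 2, which is a convenient sanity check before training.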
CN202211128757.XA 2022-09-16 2022-09-16 Antibody humanization method based on sequence coding and decoding Active CN115458048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211128757.XA CN115458048B (en) 2022-09-16 2022-09-16 Antibody humanization method based on sequence coding and decoding

Publications (2)

Publication Number Publication Date
CN115458048A true CN115458048A (en) 2022-12-09
CN115458048B CN115458048B (en) 2023-05-26

Family

ID=84304404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211128757.XA Active CN115458048B (en) 2022-09-16 2022-09-16 Antibody humanization method based on sequence coding and decoding

Country Status (1)

Country Link
CN (1) CN115458048B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005057486A2 (en) * 2003-12-08 2005-06-23 Xencor, Inc. Protein engineering with analogous contact environments
US20120141486A1 (en) * 2010-12-06 2012-06-07 Dainippon Sumitomo Pharma Co., Ltd. Human monoclonal antibody
CN103145834A (en) * 2013-01-17 2013-06-12 广州泰诺迪生物科技有限公司 Antibody humanization transformation method
US20190065677A1 (en) * 2017-01-13 2019-02-28 Massachusetts Institute Of Technology Machine learning based antibody design
US20200087395A1 (en) * 2018-09-14 2020-03-19 Eli Lilly And Company Cd200r agonist antibodies and uses thereof
US20200342955A1 (en) * 2017-10-27 2020-10-29 Apostle, Inc. Predicting cancer-related pathogenic impact of somatic mutations using deep learning-based methods
CN114664376A (en) * 2022-03-31 2022-06-24 重庆邮电大学 miRNA-mRNA target prediction method based on sequence statistical characterization learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YAGHOUB SAFDARI et al.: "Antibody humanization methods–a review and update" *
YI-FAN ZHANG et al.: "Humanization of rabbit monoclonal antibodies via grafting combined Kabat/IMGT/Paratome complementarity-determining regions: Rationale and examples" *
MA WEI: "Construction of machine-learning-based virtual screening models, and homology modeling and structural validation of the PAR4 protein" *

Also Published As

Publication number Publication date
CN115458048B (en) 2023-05-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant