WO2024088381A1 - Method for constructing a humanized antibody sequence evaluation model and its application - Google Patents

Method for constructing a humanized antibody sequence evaluation model and its application

Info

Publication number
WO2024088381A1
Authority
WO
WIPO (PCT)
Prior art keywords
humanized antibody
sequence
antibody
sequences
heavy chain
Prior art date
Application number
PCT/CN2023/127070
Other languages
English (en)
French (fr)
Inventor
郝小虎
樊隆
Original Assignee
南京金斯瑞生物科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京金斯瑞生物科技有限公司
Publication of WO2024088381A1

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search

Definitions

  • the present application relates to the field of biomedicine technology, and specifically to a method for constructing a humanized antibody sequence evaluation model and its application.
  • Monoclonal antibody drugs have the advantages of strong specificity, few side effects and significant efficacy in the treatment of tumors, infectious diseases, autoimmune diseases and other diseases.
  • cell fusion and hybridoma technology are the most reliable methods for preparing monoclonal antibodies.
  • the animals commonly used for immunization are mice, rats, sheep, rabbits and other animals.
  • the monoclonal antibodies obtained by this method are of animal origin.
  • humanization must be carried out to reduce the human anti-animal antibody reaction caused by these heterologous antibodies. At the same time, it can also more effectively activate the human immune system, reduce the clearance rate of antibody drugs, and prolong the half-life.
  • CDR-grafted antibodies and phage-display humanized antibodies are considered the most fully humanized, but antibodies humanized by these two methods often show a serious decrease in affinity, or lose affinity completely.
  • Surface remodeled humanized antibodies are obtained based on the three-dimensional structure analysis of antibody molecules. Compared with CDR transplanted antibodies, they have certain advantages in maintaining affinity, but there are still a large number of mouse-derived amino acids in the antibodies, and their affinity is maintained at the expense of a certain degree of humanization.
  • a method for constructing a humanized antibody sequence evaluation model comprising: obtaining multiple human antibody template amino acid sequences and numbering them; calculating the entropy value at each numbered position; and constructing a humanized antibody sequence evaluation model based on the entropy value.
  • the entropy value is determined by a position-specific scoring matrix, wherein the method further comprises: determining a multiple sequence alignment result of the multiple human antibody template amino acid sequences according to the longest sequence after numbering, wherein vacant positions are filled by inserting symbols; and constructing the position-specific scoring matrix of the multiple human antibody template amino acid sequences based on the multiple sequence alignment result.
  • the entropy value is calculated by a formula (the formula itself is not reproduced in this text), where n is the total number of distinct amino acid types and inserted symbols that appear at a given numbered position (at most 21), i is the index running from 1 to n, and p_i is the probability of occurrence of the i-th amino acid at that position, obtained from the position-specific scoring matrix.
  • constructing a humanized antibody sequence evaluation model includes: determining a weight at each numbered position, wherein the weight is negatively correlated with the entropy value.
  • the weight is calculated by a formula (not reproduced in this text), where w_pos is the weight of a given numbered position, e_pos is the entropy value of that position, and N is the numbering length of the longest sequence.
  • the humanized antibody sequence evaluation model is represented by a formula (not reproduced in this text), where Score_target is the output evaluation value, w_pos is the weight of a given numbered position in the humanized antibody sequence, and p_aa is the probability of occurrence of the amino acid at that position.
  • the humanized antibody sequence evaluation model is used to evaluate humanized sequences of antibodies from all species.
  • a method for obtaining a candidate humanized antibody sequence by evaluation comprising: numbering the humanized antibody sequence to be evaluated; determining the weight and the occurrence probability of the amino acid at each numbered position in the humanized antibody sequence to be evaluated; based on the weight and the occurrence probability of the amino acid, using the humanized antibody sequence evaluation model constructed by the above method, evaluating the humanized antibody sequence to be evaluated to obtain an evaluation value; if the evaluation value meets a preset condition, determining the humanized antibody sequence to be evaluated corresponding to the evaluation value as the candidate humanized antibody sequence.
  • the preset condition is that the evaluation value is greater than a preset threshold, or that the evaluation value, after the evaluation values are sorted, ranks within a preset number of top positions.
  • the method further includes determining the target humanized antibody sequence by: predicting the antibody structure of each candidate humanized antibody sequence; simulating the binding of the predicted antibody structure to the corresponding antigen to obtain candidate antibody structures; selecting candidate antibody structures for biological experimental verification; and determining the target humanized antibody sequence based on the biological experimental verification results.
  • a method for humanizing a monoclonal antibody comprising: respectively determining a humanized antibody light chain template sequence and a humanized antibody heavy chain template sequence of the monoclonal antibody; respectively replacing the CDR regions of the light chain and heavy chain of the monoclonal antibody with the CDR regions corresponding to the humanized antibody light chain template sequence and the humanized antibody heavy chain template sequence to obtain a humanized antibody light chain template sequence and a humanized antibody heavy chain template sequence after CDR region replacement; performing an E-F loop treatment on the humanized antibody light chain template sequence after CDR region replacement to obtain a plurality of candidate humanized antibody light chain sequences; performing a D-E loop treatment on the humanized antibody heavy chain template sequence after CDR region replacement to obtain a plurality of candidate humanized antibody heavy chain sequences.
  • performing E-F loop processing on the humanized antibody light chain template sequence after the CDR region is replaced to obtain multiple candidate humanized antibody light chain sequences includes: searching the humanized antibody light chain template sequence after the CDR region is replaced in a database to obtain a multiple sequence alignment result; constructing a position-specific scoring matrix for the E-F loop according to the multiple sequence alignment result; generating multiple E-F loop sequences according to the position-specific scoring matrix for the E-F loop; and replacing the multiple E-F loop sequences into the humanized antibody light chain template sequence after the CDR region is replaced to obtain the multiple candidate humanized antibody light chain sequences.
  • performing D-E loop processing on the humanized antibody heavy chain template sequence after the CDR region is replaced to obtain multiple candidate humanized antibody heavy chain sequences includes: searching the humanized antibody heavy chain template sequence after the CDR region is replaced in a database to obtain a multiple sequence alignment result; constructing a position-specific scoring matrix for the D-E loop according to the multiple sequence alignment result; generating multiple D-E loop sequences according to the position-specific scoring matrix for the D-E loop; and replacing the multiple D-E loop sequences into the humanized antibody heavy chain template sequence after the CDR region is replaced to obtain the multiple candidate humanized antibody heavy chain sequences.
  • the method further comprises: performing back mutations on highly conserved sites in the plurality of candidate humanized antibody light chain sequences and the plurality of candidate humanized antibody heavy chain sequences, respectively.
  • back-mutating the highly conserved sites in the multiple candidate humanized antibody light chain sequences and the multiple candidate humanized antibody heavy chain sequences comprises: respectively determining the highly conserved sites in the light chain and heavy chain sequences of the monoclonal antibody; judging whether the amino acids of the multiple candidate humanized antibody light chain sequences and the multiple candidate humanized antibody heavy chain sequences are respectively consistent with the amino acids of the light chain and heavy chain of the monoclonal antibody at the highly conserved sites; if not, replacing the amino acids of the multiple candidate humanized antibody light chain sequences and the multiple candidate humanized antibody heavy chain sequences at the highly conserved sites back to the amino acids at the corresponding positions on the light chain and heavy chain of the monoclonal antibody.
  • the monoclonal antibody is a rabbit monoclonal antibody
  • the method further comprises: predicting the three-dimensional structure of the light chain and the heavy chain of the rabbit monoclonal antibody; according to the predicted three-dimensional structure, if there is a cysteine pair in the rabbit monoclonal antibody and the distance between the cysteine pairs is between 4 angstroms and 7 angstroms, there is no amino acid at the corresponding position on the humanized antibody light chain template sequence and the humanized antibody heavy chain template sequence after the CDR region is replaced, and serine is inserted at the position, and the position does not include the CDR region; and if there is only a single cysteine in the rabbit monoclonal antibody, there is no amino acid at the corresponding position on the humanized antibody light chain template sequence and the humanized antibody heavy chain template sequence after the CDR region is replaced, and serine is inserted at the position.
  • the method further comprises: using the humanized antibody sequence evaluation model constructed by the above method to respectively evaluate the light chain sequence and heavy chain sequence of the candidate humanized antibody to obtain the light chain sequence and heavy chain sequence of the humanized antibody.
  • the method further comprises: performing biological experiment verification on the humanized antibody to determine the target humanized antibody sequence.
  • FIG. 1 is a flowchart of a method for humanizing a monoclonal antibody according to some embodiments of the present application.
  • FIG. 2 is a flowchart of a method for constructing a humanized antibody sequence evaluation model according to some embodiments of the present application.
  • FIG. 3 is a flowchart of a method for obtaining candidate humanized antibody sequences by evaluation according to some embodiments of the present application.
  • FIG. 4(a) shows the classification result and ROC curve of the humanized antibody sequence evaluation model for distinguishing between human and mouse antibody sequences (heavy chain).
  • FIG. 4(b) shows the classification result and ROC curve of the humanized antibody sequence evaluation model for distinguishing between human and mouse antibody sequences (light chain).
  • FIG. 5(a) shows the classification result and ROC curve of the humanized antibody sequence evaluation model for distinguishing between antibody sequences of human origin and those of other species (heavy chain).
  • FIG. 5(b) shows the classification result and ROC curve of the humanized antibody sequence evaluation model for distinguishing between antibody sequences of human origin and those of other species (light chain).
  • FIG. 6 shows the results of ELISA binding experiments of the humanized antibody cloned from 81E11.
  • the present application provides a method for humanizing a monoclonal antibody, wherein the light and heavy chains of a monoclonal antibody from any animal source are analyzed, designed and processed based on important sites and positions of the sequence to obtain a humanized sequence.
  • Fig. 1 is a flowchart of a method for humanizing a monoclonal antibody according to some embodiments of the present application. As shown in Fig. 1, the method comprises the following steps. The method is performed by a first computing device.
  • the computing device may include a processing device (or processor), a memory, an input/output interface, and a communication port.
  • the processing device may execute a computing instruction (program code) and perform the method steps described in the present application.
  • the computing instruction may include a program, an object, a component, a data structure, a process, a module, and a function (the function refers to a specific function described in the present application).
  • the processing device may include a microcontroller, a microprocessor, a reduced instruction set computer (RISC), an application specific integrated circuit (ASIC), an application specific instruction set processor (ASIP), a central processing unit (CPU), a graphics processing unit (GPU), a physical processing unit (PPU), a microcontroller unit, a digital signal processor (DSP), a field programmable gate array (FPGA), an advanced RISC machine (ARM), a programmable logic device, and any circuit and processor capable of performing one or more functions, or any combination thereof.
  • Step 101: respectively determining the human antibody light chain template sequence and the human antibody heavy chain template sequence of the monoclonal antibody.
  • the monoclonal antibody may be a monoclonal antibody derived from any animal.
  • the monoclonal antibody may be a mouse monoclonal antibody, a rabbit monoclonal antibody, etc.
  • sequence similarity may be calculated against a database (e.g., the Observed Antibody Space (OAS) database) to obtain the human antibody light chain and heavy chain template sequences.
  • the variable region amino acid sequences of the light chain and heavy chain of the monoclonal antibody may be used as input, and the OAS database may be searched using the BLAST tool to obtain the amino acid sequences of the human antibody light chain and heavy chain templates, respectively.
  • the human antibody light chain template sequence or heavy chain template sequence may include one or more.
  • the human antibody light chain template sequence or heavy chain template sequence may be a sequence with the highest similarity to the amino acid sequence of the monoclonal antibody light chain or heavy chain.
  • the human antibody light chain template sequence or heavy chain template sequence may be a sequence whose amino acid sequence similarity to the monoclonal antibody light chain or heavy chain meets certain conditions (e.g., greater than 98%, 95%, 90%, etc.).
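  • The two selection rules above (highest similarity, or similarity above a threshold) can be sketched as follows. This is a minimal illustration, not the BLAST search against OAS described in the text: it assumes the sequences are already aligned to the same length, uses plain percent identity as the similarity measure, and the names `percent_identity` and `select_templates` are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    # Gap symbols never count as a match.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def select_templates(query, candidates, min_identity=None):
    """Return the best template, or all templates at or above min_identity."""
    scored = sorted(((percent_identity(query, c), c) for c in candidates), reverse=True)
    if not scored:
        return []
    if min_identity is None:
        # Rule 1: keep only the single most similar template.
        return [scored[0][1]]
    # Rule 2: keep every template that meets the similarity condition.
    return [c for score, c in scored if score >= min_identity]
```

  • Returning a list in both cases reflects the statement that there may be one or more template sequences.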
  • Step 102: respectively replacing the corresponding CDR regions of the human antibody light chain template sequence and the human antibody heavy chain template sequence with the CDR regions of the light chain and heavy chain of the monoclonal antibody, to obtain a humanized antibody light chain template sequence and a humanized antibody heavy chain template sequence after CDR region replacement.
  • a numbering system can be used to number the monoclonal antibody light chain sequence and heavy chain sequence, as well as the humanized antibody light chain template sequence and the humanized antibody heavy chain template sequence, in order to identify the CDRs, the framework regions, and the residues affecting antibody-antigen binding affinity and antibody specificity in each sequence, so that sequences or sites in the CDR and non-CDR regions can be exchanged between the monoclonal antibody sequence and the humanized antibody template sequence.
  • for example, the ANARCI tool can be used to number these sequences uniformly.
  • the Chothia numbering scheme can be used for numbering.
  • the Kabat, Martin, Gelfand, IMGT or Honegger numbering schemes can also be used.
  • the variable regions of each antibody sequence may be numbered. These sequences are aligned to the same coordinate system, and vacant positions can be filled with symbols, for example, "-", "*", etc.
  • the CDR regions of the light chain and heavy chain sequences of the monoclonal antibody, and of the humanized antibody light chain and heavy chain template sequences, can be marked; the CDR regions of the template sequences are then replaced with the corresponding CDR regions of the monoclonal antibody light chain and heavy chain sequences, respectively, to obtain the humanized antibody light chain template sequence and the humanized antibody heavy chain template sequence after CDR region replacement.
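  • A minimal sketch of the CDR replacement on numbered sequences, assuming the usual CDR-grafting direction in which the CDR residues of the monoclonal antibody are placed into the template framework. Each sequence is represented as a dictionary from numbered position to residue; integer positions, the `cdr_ranges` argument, and the function name `graft_cdrs` are simplifications introduced here (real numbering schemes also use insertion codes such as 100A).

```python
def graft_cdrs(template, donor, cdr_ranges):
    """Return a copy of `template` whose CDR positions carry the donor's residues.

    template, donor: dict mapping numbered position -> one-letter residue.
    cdr_ranges: list of (start, end) numbered positions, both inclusive.
    """
    grafted = dict(template)  # leave the input template untouched
    for start, end in cdr_ranges:
        for pos in range(start, end + 1):
            # A position missing from the donor is filled with the gap symbol.
            grafted[pos] = donor.get(pos, "-")
    return grafted


# Toy example: positions 2 and 3 stand in for a CDR.
template = {1: "A", 2: "B", 3: "C", 4: "D"}
donor = {1: "W", 2: "X", 3: "Y", 4: "Z"}
grafted = graft_cdrs(template, donor, [(2, 3)])
```

  • Because both sequences share one coordinate system after numbering, the replacement reduces to copying residues position by position.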
  • Step 103: performing E-F loop processing on the humanized antibody light chain template sequence after the CDR region is replaced to obtain multiple candidate humanized antibody light chain sequences.
  • the E-F loop is a loop structure located in the non-binding region of the antibody light chain that helps support the overall antibody structure.
  • E-F loop processing of the humanized antibody light chain template sequence after the CDR region is replaced refers to replacing the E-F loop in that sequence with an additionally generated E-F loop sequence. It is achieved as follows.
  • the humanized antibody light chain template sequence after the CDR region is replaced is searched in a database (e.g., the UniRef90 database, the UniRef100 database, etc.) to obtain a multiple sequence alignment (MSA) result.
  • the human antibody light chain template sequence can also be searched in a database to obtain a multiple sequence alignment result.
  • according to the multiple sequence alignment result, a position-specific scoring matrix (PSSM) of the E-F loop is constructed.
  • the PSSM of the E-F loop represents the possibility of each amino acid appearing at each position of the E-F loop.
  • the number of rows of the PSSM is 20, corresponding to 20 different amino acids, and the number of columns is the sequence length of the E-F loop.
  • multiple E-F loop sequences can be generated according to the probability of occurrence of the amino acid corresponding to each position.
  • multiple E-F loop sequences can be randomly generated according to the position-specific scoring matrix.
  • the multiple E-F loop sequences are replaced into the humanized antibody light chain template sequence after the CDR region is replaced, so as to obtain the multiple candidate humanized antibody light chain sequences.
  • the probability of the amino acids appearing at each position can also be determined by other means, for example, a hidden Markov model.
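  • The loop generation described in step 103 can be sketched as follows. This is an illustration under stated assumptions: the matrix holds plain per-column amino acid frequencies rather than log-odds scores, the loop columns have already been cut out of the alignment, and the function names, the `seed` argument and the zero-based loop coordinates are all hypothetical.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_frequency_pssm(loop_alignment):
    """Return one {amino acid: frequency} dict per aligned loop column."""
    pssm = []
    for col in range(len(loop_alignment[0])):
        counts = Counter(s[col] for s in loop_alignment if s[col] in AMINO_ACIDS)
        total = sum(counts.values())
        if total == 0:
            # Column holds only gaps: fall back to a uniform distribution.
            pssm.append({aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS})
        else:
            pssm.append({aa: counts[aa] / total for aa in AMINO_ACIDS})
    return pssm


def sample_loops(pssm, n, seed=0):
    """Randomly generate n loop sequences, one residue per column."""
    rng = random.Random(seed)
    loops = []
    for _ in range(n):
        residues = [
            rng.choices(AMINO_ACIDS, weights=[col[aa] for aa in AMINO_ACIDS])[0]
            for col in pssm
        ]
        loops.append("".join(residues))
    return loops


def replace_loop(sequence, start, end, loop):
    """Swap sequence[start:end] for a generated loop (zero-based, end exclusive)."""
    return sequence[:start] + loop + sequence[end:]
```

  • Each generated loop, substituted into the CDR-replaced template with `replace_loop`, yields one candidate sequence. The same sketch applies to the D-E loop of the heavy chain.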
  • Step 104: performing D-E loop processing on the humanized antibody heavy chain template sequence after the CDR region is replaced to obtain multiple candidate humanized antibody heavy chain sequences.
  • the D-E loop is a loop structure located in the non-binding region of the antibody heavy chain that helps support the overall antibody structure.
  • D-E loop processing of the humanized antibody heavy chain template sequence after the CDR region is replaced refers to replacing the D-E loop in that sequence with an additionally generated D-E loop sequence.
  • Its implementation is similar to step 103.
  • the humanized antibody heavy chain template sequence after the CDR region is replaced is searched in a database (e.g., UniRef90 database) to obtain a multiple sequence alignment result; it should be noted that the humanized antibody heavy chain template sequence can also be searched in a database to obtain a multiple sequence alignment result.
  • a position-specific scoring matrix of the D-E loop is constructed; according to the position-specific scoring matrix of the D-E loop, a plurality of D-E loop sequences are generated; the plurality of D-E loop sequences are replaced in the humanized antibody heavy chain template sequence after the CDR region is replaced to obtain the plurality of candidate humanized antibody heavy chain sequences.
  • through steps 103 and 104, multiple E-F loop and D-E loop sequences are obtained. Candidate humanized antibody light chain and heavy chain sequences with a higher degree of humanization can thus be obtained, which better maintain the antibody structure and provide more candidates from which to obtain the target humanized antibody light chain and heavy chain sequences.
  • Step 105: back-mutating the highly conserved sites in the plurality of candidate humanized antibody light chain sequences and the plurality of candidate humanized antibody heavy chain sequences, respectively.
  • Highly conserved sites are sites that are relatively important for maintaining the structure of the monoclonal antibody. If the amino acids at these sites in the candidate humanized antibody light and heavy chain sequences differ from those in the monoclonal antibody, the structure of the candidate humanized antibodies may change, reducing or even abolishing antibody affinity. These sites are therefore kept consistent with the corresponding sites of the monoclonal antibody.
  • the highly conserved sites are determined in a manner similar to the above-mentioned E-F loop sequence, and can be determined by the PSSM of the light and heavy chains of the monoclonal antibody.
  • the PSSM can be obtained by inputting the light and heavy chains of the monoclonal antibody into a protein database and searching for them.
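  • A minimal sketch of the back mutation in step 105, assuming the candidate and the monoclonal antibody sequence are aligned strings of equal length and the PSSM is a list of per-position frequency dictionaries. The 0.9 conservation cutoff and both function names are hypothetical; the text does not state how "highly conserved" is thresholded.

```python
def find_conserved_positions(pssm, threshold=0.9):
    """Positions whose most frequent residue reaches the (assumed) cutoff."""
    return [i for i, col in enumerate(pssm) if max(col.values()) >= threshold]


def back_mutate(candidate, parent, conserved_positions):
    """Restore the parental residue wherever the candidate differs at a conserved site."""
    if len(candidate) != len(parent):
        raise ValueError("candidate and parent must be aligned to the same length")
    residues = list(candidate)
    for pos in conserved_positions:
        if residues[pos] != parent[pos]:
            residues[pos] = parent[pos]  # revert to the monoclonal antibody residue
    return "".join(residues)
```

  • Positions outside `conserved_positions` keep the humanized residue, so only the structurally important sites revert to the original antibody.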
  • the method may further include performing cysteine treatment on the rabbit monoclonal antibody. When the variable region of the rabbit monoclonal antibody is made chimeric with the constant region of a human antibody, the cysteine at position 80 of the rabbit antibody light chain is left unpaired (a free thiol), which can cause polymerization between antibody light chains and render the antibody ineffective. Therefore, the cysteine at the corresponding position on the template sequence needs to be treated to avoid antibody failure.
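  • the three-dimensional structures of the light chain and heavy chain of the rabbit monoclonal antibody are predicted. According to the predicted structure, if there is a cysteine pair in the rabbit monoclonal antibody, the distance between the two cysteines is between 4 angstroms and 7 angstroms, and there is no amino acid at the corresponding position on the humanized antibody light chain or heavy chain template sequence after the CDR region is replaced, serine is inserted at that position, provided the position is not in a CDR region; if there is only a single cysteine in the rabbit monoclonal antibody and there is no amino acid at the corresponding position on the template sequence after the CDR region is replaced, serine is likewise inserted at that position.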
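  • The distance test on cysteine pairs can be sketched as follows, using the 4 to 7 angstrom window given in the summary of the method earlier in this document. The sketch assumes one 3D coordinate per cysteine taken from the predicted structure; which atom is measured is not stated in the text, so that choice, the function names and the toy coordinates are assumptions.

```python
import math
from itertools import combinations


def cysteine_pairs_in_range(cys_coords, low=4.0, high=7.0):
    """Return (pos_a, pos_b, distance) for cysteine pairs within [low, high] angstroms.

    cys_coords: dict mapping numbered position -> (x, y, z) in angstroms.
    """
    pairs = []
    for (pos_a, xyz_a), (pos_b, xyz_b) in combinations(sorted(cys_coords.items()), 2):
        distance = math.dist(xyz_a, xyz_b)
        if low <= distance <= high:
            pairs.append((pos_a, pos_b, distance))
    return pairs


def unpaired_cysteines(cys_coords, pairs):
    """Cysteine positions that take part in no in-range pair."""
    paired = {pos for a, b, _ in pairs for pos in (a, b)}
    return sorted(set(cys_coords) - paired)


# Toy coordinates: 23 and 88 are close together, 80 is isolated.
coords = {23: (0.0, 0.0, 0.0), 88: (5.0, 0.0, 0.0), 80: (30.0, 0.0, 0.0)}
pairs = cysteine_pairs_in_range(coords)
single = unpaired_cysteines(coords, pairs)
```

  • Both the paired and the unpaired positions would then be checked against the CDR-replaced template to decide where serine is inserted.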
  • steps 103 and 104 can be combined into one step or performed in either order.
  • for example, step 103 can be performed after step 104.
  • the present application also provides a method for constructing a humanized antibody sequence evaluation model.
  • the above method can be used to humanize monoclonal antibodies of animal origin to obtain multiple candidate humanized antibodies.
  • a humanized antibody sequence evaluation model can then be constructed based on the method described below, through which multiple candidate humanized antibodies can be evaluated and screened to screen out effective target humanized antibodies.
  • Fig. 2 is a flow chart of a method for constructing a humanized antibody sequence evaluation model according to some embodiments of the present application. As shown in Fig. 2, the method comprises the following steps. The method is performed by a second device.
  • Step 201: obtaining multiple human antibody template amino acid sequences and numbering them.
  • the human antibody template amino acid sequence can be obtained from a database of human antibodies.
  • the database may be an OAS database.
  • the human antibody template amino acid sequence may be all or part of a sequence in a database.
  • a numbering system may be used to number the human antibody template amino acid sequence.
  • the numbering method may be the same as the numbering method in step 102.
  • the human antibody template amino acid sequence may be a variable region sequence or a complete antibody sequence obtained from a database. The variable region sequence is used to illustrate the method for constructing the model.
  • Step 202: calculating the entropy value at each numbered position.
  • the entropy value is determined by a position-specific scoring matrix.
  • the method may further include constructing a position-specific scoring matrix for the multiple human antibody template amino acid sequences.
  • constructing the position-specific scoring matrix for the multiple human antibody template amino acid sequences includes: determining the multiple sequence alignment results of the multiple human antibody template amino acid sequences according to the longest sequence after numbering, wherein the vacant positions are filled with symbols; based on the multiple sequence alignment results, constructing the position-specific scoring matrix for the multiple human antibody template amino acid sequences. Constructing the position-specific scoring matrix for the human antibody template amino acid sequences is similar to constructing the position-specific scoring matrix for the E-F loop in step 103 of Figure 1, and will not be repeated here.
  • the position-specific scoring matrix represents the probability score of each amino acid at each numbered position. Based on the position-specific scoring matrix, the probability of each amino acid appearing can be determined, thereby determining the entropy value at each numbered position.
  • the entropy value represents how stable the amino acid at each numbered position is. The higher the entropy value, the more the amino acid at that position varies and the less conserved the position is.
  • the entropy value is calculated by formula (1) (the formula itself is not reproduced in this text), where n is the total number of distinct amino acid types and inserted symbols that appear at a given numbered position (at most 21), i is the index running from 1 to n, and p_i is the probability of the i-th amino acid appearing at that position, which can be obtained from the position-specific scoring matrix.
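  • Formula (1) is not legible in this text. The sketch below assumes it is the standard Shannon entropy, e = -sum over i of p_i * ln(p_i), which matches the definitions of n, i and p_i given above; the natural logarithm and the function name `column_entropy` are assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the symbols at one numbered position.

    column: the residues, including gap symbols such as "-", observed at this
    position across all aligned human template sequences.
    """
    counts = Counter(column)
    total = len(column)
    entropy = 0.0
    for count in counts.values():
        p = count / total  # p_i: observed probability of the i-th symbol
        entropy -= p * math.log(p)
    return entropy
```

  • Under this assumption a fully conserved position has entropy 0, and a position where all 21 symbols (20 amino acids plus the inserted symbol) are equally likely has the maximum entropy, ln(21).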
  • the entropy value is determined by a hidden Markov model.
  • the probability of occurrence of an amino acid at each position can be determined by a hidden Markov model, and the entropy value is determined based on the probability of occurrence.
  • Step 203: constructing a humanized antibody sequence evaluation model based on the entropy value.
  • constructing a humanized antibody sequence evaluation model includes: determining a weight at each numbered position.
  • the weight is negatively correlated with the entropy value. The larger the entropy value, the more variable the amino acid at the position and the lower its importance, so the position is assigned a correspondingly lower weight.
  • the weight is calculated by formula (2) (not reproduced in this text), where w_pos is the weight of a given numbered position, e_pos is the entropy value of that position as determined by formula (1), and N is the numbering length of the longest sequence.
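  • Formula (2) is not legible in this text, so its exact form is unknown. The sketch below shows one possible weighting that satisfies the stated properties (negatively correlated with entropy, defined over N numbered positions): w_pos = (1 - e_pos / sum of all entropies) / (N - 1). This form is an assumption chosen for illustration, not the formula of the application, and the function name is hypothetical.

```python
def entropy_weights(entropies):
    """Assumed weighting: lower entropy gives higher weight; weights sum to 1.

    entropies: list of entropy values, one per numbered position (length N).
    """
    n = len(entropies)
    if n == 0:
        return []
    total = sum(entropies)
    if n == 1 or total == 0:
        # Nothing to differentiate: every position gets the same weight.
        return [1.0 / n] * n
    return [(1 - e / total) / (n - 1) for e in entropies]
```

  • With this form the weights always sum to 1, because the N terms (1 - e_pos / total) add up to N - 1.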
  • the humanized antibody sequence evaluation model is represented by formula (3) (not reproduced in this text), where Score_target is the output evaluation value, w_pos is the weight of a given numbered position of the humanized antibody sequence, and p_aa is the probability of occurrence of the amino acid at that position, determined according to the position-specific scoring matrix.
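  • Formula (3) is not legible in this text. Given the definitions of w_pos and p_aa, a natural reading is a weighted sum over numbered positions, Score_target = sum over pos of w_pos * p_aa; the sketch below assumes that form. The function name and the list-of-dictionaries layout of the matrix are hypothetical.

```python
def humanness_score(sequence, weights, pssm):
    """Assumed model: sum over positions of weight times residue probability.

    sequence: numbered, aligned sequence to evaluate (one symbol per position).
    weights:  per-position weights derived from the human template entropies.
    pssm:     per-position {symbol: probability} dicts from the human templates.
    """
    if not (len(sequence) == len(weights) == len(pssm)):
        raise ValueError("sequence, weights and pssm must cover the same positions")
    # A residue never observed in the human templates contributes probability 0.
    return sum(w * col.get(aa, 0.0) for aa, w, col in zip(sequence, weights, pssm))


toy_pssm = [{"A": 1.0}, {"G": 0.5, "S": 0.5}]
toy_weights = [0.5, 0.5]
```

  • If the weights sum to 1, the score stays between 0 and 1, which is consistent with the example thresholds of 0.5 to 0.7 mentioned later in the text.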
  • the humanized antibody sequence evaluation model is used to evaluate the sequences of humanized antibodies of all species. The specific evaluation method can refer to Figure 3 below and its description.
  • the humanized antibody sequence evaluation model can effectively distinguish the light chains of human antibody sequences from those of mouse antibody sequences, with a corresponding AUC value of 1.00; the model can also distinguish the heavy chains of human antibody sequences from those of mouse antibody sequences.
  • the humanized antibody sequence evaluation model can distinguish the light chains of human antibody sequences and antibody sequences from other species, with a corresponding AUC value of 0.89; the humanized antibody sequence evaluation model can distinguish the heavy chains of human antibody sequences and antibody sequences from other species, with a corresponding AUC value of 0.94.
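  • The AUC values reported above summarize how well the evaluation values separate human sequences from non-human ones. A minimal sketch of computing AUC from two lists of scores by pairwise comparison; the function name and the toy scores in the usage note are hypothetical and are not the data behind the reported values.

```python
def roc_auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

  • For example, `roc_auc([0.9, 0.8], [0.1, 0.2])` gives 1.0 because every human score exceeds every non-human score, which is the situation an AUC of 1.00 describes.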
  • the present application also provides a method for obtaining candidate humanized antibody sequences by evaluation.
  • the method is based on the humanized antibody sequence evaluation model constructed above, evaluates the humanized antibody sequence to be evaluated, and determines the target humanized antibody sequence.
  • Fig. 3 is a flow chart of a method for obtaining candidate humanized antibody sequences by evaluation according to some embodiments of the present application. As shown in Fig. 3, the method mainly includes the following steps. The method is mainly performed by a third device.
  • Step 301: numbering the humanized antibody sequences to be evaluated.
  • the humanized antibody sequence to be evaluated can be numbered according to the numbering method mentioned above. In some embodiments, the humanized antibody sequence to be evaluated can be determined based on the method shown in FIG. 1 .
  • Step 302: determining the weight and occurrence probability of the amino acid at each numbered position in the humanized antibody sequence to be evaluated.
  • the entropy value is determined according to a position-specific scoring matrix of multiple human antibody template sequences.
  • the entropy value and weight are determined by the above formulas (1) and (2), respectively.
  • Step 303: based on the weight and the amino acid occurrence probability, the humanized antibody sequence evaluation model constructed by the method described above is used to evaluate the humanized antibody sequence to be evaluated and obtain an evaluation value. Specifically, the occurrence probability of the amino acid at each numbered position of each humanized antibody sequence to be evaluated is determined from the position-specific scoring matrix of multiple human antibody template sequences. The weights and the occurrence probabilities are input into the humanized antibody sequence evaluation model, that is, formula (3), and the model outputs the evaluation value of each humanized antibody sequence to be evaluated.
  • Step 304: if the evaluation value meets the preset condition, the humanized antibody sequence to be evaluated corresponding to that evaluation value is determined as a candidate humanized antibody sequence.
  • the preset condition may be that the evaluation value is greater than a preset threshold value (e.g., 0.5, 0.6, 0.7, etc.). For example, a humanized antibody sequence to be evaluated with an evaluation value greater than 0.6 is determined as a candidate humanized antibody sequence.
  • the evaluation values can also be ranked, and the preset condition may be that a sequence ranks within a given number of top positions (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, etc.). For example, the humanized antibody sequences to be evaluated that correspond to the top 15 evaluation values are determined as candidate humanized antibody sequences.
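The two preset conditions in Step 304 (a score threshold and a top-k ranking) can be sketched as follows. This is a minimal illustration only: the function name `select_candidates`, the sequence identifiers, and the scores are hypothetical and do not come from the application.

```python
def select_candidates(scores, threshold=None, top_k=None):
    """Return sequence ids that pass a score threshold and/or fall in the top k.

    scores: dict mapping a sequence id to its evaluation value.
    """
    # Rank from the highest evaluation value to the lowest.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(name, value) for name, value in ranked if value > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [name for name, _ in ranked]


# Hypothetical evaluation values for four sequences to be evaluated.
scores = {"H1": 0.72, "H2": 0.58, "H3": 0.66, "H4": 0.49}
by_threshold = select_candidates(scores, threshold=0.6)
by_rank = select_candidates(scores, top_k=3)
```

Either condition can be used alone, or both can be combined by passing `threshold` and `top_k` together.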
  • the evaluation method may also include determining the target humanized antibody sequence as follows: predicting the antibody structure of each candidate humanized antibody sequence; simulating the binding of the predicted antibody structure to the corresponding antigen to obtain candidate antibody structures; selecting candidate antibody structures for biological experimental verification; and determining the target humanized antibody sequence based on the results of that verification.
  • the light and heavy chains of the candidate humanized antibody sequences are combined pairwise, and the structure of each paired candidate antibody is predicted using antibody structure prediction software.
  • the antigen structure is predicted using antigen structure prediction software.
  • the binding of each candidate antibody structure to the antigen structure is simulated computationally, the candidate antibody structures with the better simulation results are selected for subsequent biological experimental verification, and their affinity is measured to determine the target humanized antibody sequence.
  • by way of example only, the candidate humanized antibody sequence is produced by gene synthesis and cloned into a eukaryotic expression vector containing the constant regions of the light chain and heavy chain, so that the complete antibody molecule can be expressed. The light chain and heavy chain vectors are then co-transfected into 293 or CHO cells for transient expression, and the cell culture supernatant is collected after transfection. ELISA is used to verify binding of the candidate antibody to the antigen, and the antibodies that bind are further purified using Protein A; antibody affinity is then measured by ELISA, FACS, or SPR and compared with that of the parent antibody to obtain the affinity result.
  • the first computing device, the second computing device and the third computing device can be independent computing devices, or they can be combined into one computing device.
  • the method for humanizing a monoclonal antibody of the present application can effectively obtain a humanized monoclonal antibody; according to the constructed entropy-based humanized antibody sequence evaluation model, the humanized monoclonal antibody can be evaluated, thereby screening out effective humanized antibodies.
  • the human antibody sequences in the OAS database were used for model construction and test evaluation.
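  • the OAS database contains more than 2.5 billion antibody sequences from multiple species, including human, mouse, and monkey. 5,000,000 human antibody sequences were randomly selected from the OAS database, and the humanized antibody sequence evaluation model was constructed as follows.
  • all 5,000,000 human antibody sequences obtained from the OAS database were numbered and annotated using the ANARCI unified numbering system.
  • a multiple sequence alignment (MSA) of all the human antibody sequences was built according to the numbering annotation of the longest sequence, with "-" inserted to pad the vacant positions of shorter sequences.
  • based on the MSA, a position-specific scoring matrix (PSSM) of all the human antibody sequences was constructed.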
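As a rough illustration of how a gap-padded alignment can be turned into per-position occurrence probabilities, the sketch below builds a simple frequency matrix. It assumes the PSSM is used as a table of observed frequencies (real PSSM tools often store log-odds scores instead), and the four short aligned sequences are made-up toy data.

```python
from collections import Counter


def build_pssm(aligned_seqs):
    """Build per-position symbol frequencies from equal-length, gap-padded sequences.

    Returns one dict per alignment column, mapping each observed symbol
    (an amino acid or the padding symbol "-") to its frequency.
    """
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    pssm = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        pssm.append({symbol: n / len(column) for symbol, n in counts.items()})
    return pssm


# Toy alignment (hypothetical), padded with "-" at a vacant position.
msa = ["DIQM", "DIVM", "EIVL", "D-VM"]
pssm = build_pssm(msa)
```

Each column dictionary sums to 1, so it can be read directly as the occurrence probabilities used in the later entropy and scoring steps.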
  • based on the PSSM, the entropy value is calculated for each position in the numbering annotation.
  • the entropy value is calculated as $e_{pos} = -\sum_{i=1}^{n} p_i \log p_i$, where $n$ is the total number of distinct amino acid types and "-" insertions observed at a given numbered position (at most 21), $i$ is the index over those $n$ symbols, and $p_i$ is the occurrence probability of the $i$-th amino acid at that position, obtained from the PSSM.
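The per-position entropy can be sketched in a few lines. The logarithm base is not stated in the text, so base 2 is an assumption here, and the function name `position_entropy` is illustrative.

```python
import math


def position_entropy(probabilities):
    """Shannon entropy of the symbol distribution at one numbered position.

    probabilities: occurrence probabilities of the symbols observed at the
    position (up to 20 amino acids plus the "-" padding symbol).
    Base-2 logarithm is an assumption; the source does not give the base.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


conserved = position_entropy([1.0])             # one residue always present
variable = position_entropy([0.5, 0.5])         # two equally likely residues
maximal = position_entropy([1 / 21] * 21)       # all 21 symbols equally likely
```

A fully conserved position has entropy 0, while a position at which all 21 symbols are equally likely reaches the maximum of $\log_2 21$.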
  • the calculated entropy value of each position is normalized, and each position is weighted in reverse according to the normalized entropy value, that is, the position with the highest entropy value is given the smallest weight, and the position with the lowest entropy value is given the highest weight.
  • the weight is calculated by a formula in which $w_{pos}$ is the weight of a given numbered position, $e_{pos}$ is the entropy value of that position, and $N$ is the numbering annotation length of the longest sequence.
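The text states only that entropies are normalized and that weights are assigned inversely (highest entropy, smallest weight). The sketch below is therefore an assumed weighting, not the applicant's formula: it divides each entropy by the maximum, inverts it, and rescales so the $N$ weights sum to 1.

```python
def entropy_weights(entropies):
    """Assign larger weights to lower-entropy (more conserved) positions.

    ASSUMPTION: normalize by the maximum entropy, invert, then rescale so the
    weights sum to 1. The source does not give the exact formula.
    """
    n = len(entropies)
    e_max = max(entropies)
    if e_max == 0:
        return [1.0 / n] * n          # every position fully conserved
    inverted = [1.0 - e / e_max for e in entropies]
    total = sum(inverted)
    if total == 0:
        return [1.0 / n] * n          # every position equally variable
    return [value / total for value in inverted]


# Hypothetical entropies for three numbered positions.
weights = entropy_weights([0.0, 1.0, 2.0])
```

With weights that sum to 1 and probabilities between 0 and 1, a weighted sum of probabilities stays in the range 0 to 1, which is consistent with thresholds such as 0.5, 0.6, or 0.7.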
  • a humanized antibody sequence evaluation model was constructed, expressed by the formula $Score_{target} = \sum_{pos=1}^{N} w_{pos}\, p_{aa}$, where $Score_{target}$ is the output evaluation value, $w_{pos}$ is the weight of a given numbered position in the humanized antibody sequence, and $p_{aa}$ is the occurrence probability of the amino acid at that position, obtained from the PSSM.
  • the humanized antibody sequence encoding method comprises the following steps:
  • the humanized antibody sequence is annotated with a number, and the length of the number annotation is N.
  • the occurrence probability of the amino acid at each position in the humanized antibody sequence is queried.
  • the humanized antibody sequence is scored according to the formula $\sum_{pos} w_{pos}\, p_{aa}$ (i.e., weight × probability, summed over the numbered positions).
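The three scoring steps above (number the sequence, look up each residue's probability, combine weight × probability) can be sketched as a weighted sum. The PSSM, the weights, and the three-residue sequences are toy values chosen for illustration, and `humanness_score` is a hypothetical name.

```python
def humanness_score(numbered_seq, weights, pssm):
    """Sum of weight x occurrence probability over the numbered positions.

    numbered_seq: residues (or "-") at positions 1..N, as a string.
    weights: per-position weights.
    pssm: per-position dicts mapping a symbol to its occurrence probability.
    """
    if not (len(numbered_seq) == len(weights) == len(pssm)):
        raise ValueError("sequence, weights and PSSM must cover the same positions")
    # A residue never seen at a position in the human set contributes 0.
    return sum(w * column.get(aa, 0.0)
               for aa, w, column in zip(numbered_seq, weights, pssm))


# Toy three-position model (hypothetical values).
pssm = [{"D": 0.75, "E": 0.25}, {"I": 1.0}, {"Q": 0.5, "V": 0.5}]
weights = [0.25, 0.5, 0.25]
human_like = humanness_score("DIQ", weights, pssm)
less_human = humanness_score("AIQ", weights, pssm)
```

A residue that is common in human sequences at a low-entropy position raises the score the most, which is why the non-human residue at position 1 lowers the second score.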
  • the humanization evaluation model constructed in this application can distinguish the light chain and heavy chain sequences of human/mouse antibodies, and the AUC values corresponding to the light and heavy chains are 1.00 and 0.92 respectively; it can distinguish the light chain and heavy chain sequences of antibodies from human/other species, and the AUC values corresponding to the light and heavy chains are 0.89 and 0.94 respectively.
  • the test results show that the humanized antibody sequence evaluation model can effectively distinguish between human antibodies and mouse antibodies, human antibodies and antibodies from other species, and is effective in the application of evaluating humanized antibody sequences.
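An AUC of the kind reported above can be computed directly from two sets of scores as the probability that a randomly chosen human sequence scores higher than a randomly chosen non-human one. The sketch below uses that pairwise definition with made-up scores; for millions of sequences a sorting-based implementation would be used instead.

```python
def roc_auc(human_scores, nonhuman_scores):
    """AUC as the fraction of (human, non-human) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of scores,
    so suitable only for small illustrative inputs.
    """
    correct = 0.0
    for h in human_scores:
        for m in nonhuman_scores:
            if h > m:
                correct += 1.0
            elif h == m:
                correct += 0.5
    return correct / (len(human_scores) * len(nonhuman_scores))


# Hypothetical evaluation values.
perfect = roc_auc([0.9, 0.8, 0.7], [0.6, 0.5, 0.4])
partial = roc_auc([0.9, 0.5], [0.6, 0.4])
```

An AUC of 1.00 means every human sequence outscored every non-human sequence; 0.5 would indicate no separation.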
  • the 4-1BB agonist antibody (TNFRSF9 protein) was used as a specific example to evaluate the rabbit monoclonal antibody humanized antibody candidate sequences.
  • the specific content is shown in the following examples.
  • Using TNFRSF9 protein as the antigen, prepare the rabbit monoclonal antibody by ordering the MonoRab™ rabbit monoclonal antibody customization service (https://www.genscript.com.cn/custom-rabbit-monoclonal-antibody-generation.html) from Nanjing GenScript Biotechnology Co., Ltd.
  • Using the complete variable region sequence of the rabbit monoclonal antibody light chain obtained in step A (SEQ ID NO: 2) as input, the OAS (Observed Antibody Space) database was searched using the BLAST tool to obtain the human antibody sequence with the highest sequence similarity as the template sequence of the light chain; the template sequence of the human antibody light chain is:
  • Using the complete variable region sequence of the rabbit monoclonal antibody heavy chain obtained in step A (SEQ ID NO: 3) as input, the OAS (Observed Antibody Space) database was searched using the BLAST tool to obtain the human antibody sequence with the highest sequence similarity as the template sequence of the heavy chain; the template sequence of the human antibody heavy chain is:
  • Use the ANARCI unified numbering system, specifically the Chothia numbering scheme, to number the rabbit monoclonal antibody light chain complete variable region sequence (SEQ ID NO: 2), the rabbit monoclonal antibody heavy chain complete variable region sequence (SEQ ID NO: 3), the human antibody light chain template obtained in step 1 (SEQ ID NO: 4), and the human antibody heavy chain template (SEQ ID NO: 5).
  • the exemplary numbered sequences are shown below.
  • According to the numbering of step 2, mark the CDR regions of the rabbit monoclonal antibody light chain and heavy chain and of the human antibody light chain template and heavy chain template obtained in step 1, respectively.
  • the sequences of these CDR regions are:
  • Rabbit monoclonal antibody heavy chain CDR-H1: GFSLSYN (SEQ ID NO: 6);
  • Rabbit monoclonal antibody heavy chain CDR-H2: NYDGT (SEQ ID NO: 7);
  • Rabbit monoclonal antibody heavy chain CDR-H3: FN (SEQ ID NO: 8);
  • Rabbit monoclonal antibody light chain CDR-L1: SQSVDNNNY (SEQ ID NO: 9);
  • Rabbit monoclonal antibody light chain CDR-L2: SAS (SEQ ID NO: 10);
  • Rabbit monoclonal antibody light chain CDR-L3: EFSASSGDWN (SEQ ID NO: 11);
  • Human antibody heavy chain template CDR-H1: GGSIDTY (SEQ ID NO: 12);
  • Human antibody heavy chain template CDR-H2: Y (SEQ ID NO: 13);
  • Human antibody heavy chain template CDR-H3: PNRAAAGAFD (SEQ ID NO: 14);
  • Human antibody light chain template CDR-L1: SQSISTY (SEQ ID NO: 15);
  • Human antibody light chain template CDR-L2: VAS (SEQ ID NO: 16);
  • Human antibody light chain template CDR-L3: YNSFEL (SEQ ID NO: 17).
  • the bold part indicates the replaced CDR region.
  • Cysteine (Cys) treatment, comprising the following steps:
  • Replacement of a single Cys: if a single Cys exists in the rabbit monoclonal antibody sequence and there is no amino acid at the corresponding position of the CDR-replaced human antibody template, Ser is inserted. The light and heavy chains after Cys treatment are as follows; in this embodiment, no replacement was made in the light or heavy chain sequences:
  • Light chain E-F loop processing, comprising the following steps:
  • Based on the multiple sequence alignment, construct the position-specific scoring matrix (PSSM) of the light chain E-F loop.
  • EF loop candidate sequences are randomly generated according to the probability of occurrence of the amino acid corresponding to each position; the EF loop candidate sequences are (taking 10 sequences as an example):
  • Heavy chain D-E loop processing, comprising the following steps:
  • Based on the MSA obtained in step 6.1, construct the PSSM of the heavy chain D-E loop.
  • DE loop candidate sequences are randomly generated according to the probability of occurrence of the amino acid corresponding to each position; the DE loop candidate sequences are (taking 10 as an example):
  • Back mutation, comprising the following steps:
  • From the PSSM obtained in step 7.1, highly conserved sites in the light and heavy chain variable regions were obtained; these sites are:
  • Heavy chain 13P, 17L, 19L, 21C.
  • Light chain 1A, 5T, 6Q, 16G, 23C.
  • Determine whether, at the sites obtained in step 7.2, the amino acids in the candidate sequences obtained in step 5.4 and step 6.4 are consistent with those of the parent rabbit monoclonal antibody sequence. If not, replace them with the amino acids at the corresponding positions of the parent rabbit monoclonal antibody sequence.
  • the light and heavy chain variable region sequences are (taking one candidate antibody as an example):
  • the underlined part indicates the amino acid replaced by the back mutation.
  • the first site of the light chain underwent a back mutation, and the heavy chain was not modified.
  • Example 4 Using the humanized antibody sequence evaluation model to evaluate candidate sequences
  • the evaluation of candidate sequences using the humanized antibody sequence evaluation model includes the following steps:
  • All candidate humanized antibody sequences finally obtained in Example 3 were scored according to the humanized antibody sequence encoding method of Example 2.
  • All scored candidate humanized antibody sequences are sorted from high to low according to $Score_{target}$.
  • Perform biological validation on the three antibodies selected in step 1.5, named 81E11H3L1, 81E11H1L1, and 81E11H4L1, specifically including the following steps:
  • Signal peptides are added to the N-termini of the heavy chain variable region and the light chain variable region of the candidate antibody.
  • the signal peptide sequences are:
  • the constant region sequence is added to the C-terminus of the heavy chain variable region to form a complete heavy chain.
  • the constant region sequence is:
  • the constant region sequence is added to the C-terminus of the light chain variable region to form a complete light chain.
  • the constant region sequence is:
  • the experimental methods in the following examples are conventional methods unless otherwise specified.
  • the experimental materials used in the following examples are purchased from conventional biochemical reagent companies unless otherwise specified.
  • the quantitative tests in the following examples are repeated three times, and the results are averaged.
  • This application uses 4-1BB agonist antibodies and obtains 31 positive clones by immunizing rabbits. Taking the 81E11 clone as an example, a total of 25 humanized antibodies are obtained through the above implementation steps. The obtained humanized antibodies are subjected to computational simulation of antibody-antigen binding, and the top 3 antibodies with the best ranking are selected for expression, and ELISA binding experiments are performed to verify their positive binding to the antigen.
  • Antibody affinity ELISA test: the full antibody sequences were submitted to the antibody department of Nanjing GenScript Biotechnology Co., Ltd. for full antibody synthesis and antigen-antibody affinity testing.
  • the antibody affinity detection method is an enzyme-linked immunosorbent assay (ie, ELISA) based on the indirect method.
  • Indirect ELISA is used to evaluate the binding ability of purified antibodies to the antigen TNFRSF9 protein.
  • the specific steps include: coating the ELISA plate with 0.5 µg/ml recombinant TNFRSF9 protein in PBS (100 µl/well) at 4°C overnight.
  • the plate was washed four times with PBST, and then TMB chromogenic solution (GenScript) was added and incubated in the dark at 25°C for 15 minutes. The reaction was terminated by adding 50 µl of 1 M HCl stop solution (Sinopharm, 10011018). The plate was read at 450 nm using a microplate reader. The results of the ELISA binding experiment are shown in Figure 6. Experimental verification yielded humanized antibodies with positive binding, demonstrating the effectiveness of the antibody humanization method of the present application.
  • the method for constructing a humanized antibody sequence evaluation model, the method for obtaining a candidate humanized antibody sequence by evaluation, and the method for humanizing a monoclonal antibody disclosed in the present application bring beneficial effects including but not limited to: (1) the humanized antibody sequence evaluation model constructed in the present application can evaluate the multiple candidate humanized sequences designed and ultimately yield an effective humanized antibody.
  • the model is applicable to the humanization evaluation of antibodies from all species, and has low requirements for computing resources, low cost, and less time.
  • the method for obtaining a candidate humanized antibody sequence by evaluation provided in the present application, using a humanized antibody sequence evaluation model constructed based on entropy value can score the sequence to be evaluated and screen out an effective humanized antibody sequence.
  • the method for humanizing a monoclonal antibody yields a humanized antibody that is positive for antigen binding, with affinity loss after humanization within an acceptable range; it is effective and can be used for diagnosis and detection, antibody imaging, and treatment of diseases sensitive to monoclonal antibody-based therapies. It should be noted that different embodiments may produce different beneficial effects. In different embodiments, the beneficial effects produced may be any one or a combination of the above, or any other beneficial effect that may be obtained.

Landscapes

  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Peptides Or Proteins (AREA)

Abstract

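The present application provides a method for constructing a humanized antibody sequence evaluation model and applications thereof. The method for constructing the humanized antibody sequence evaluation model comprises: obtaining a plurality of human antibody template amino acid sequences and numbering them; calculating the entropy value at each numbered position; and constructing the humanized antibody sequence evaluation model based on the entropy values. The method for obtaining candidate humanized antibody sequences by evaluation comprises: numbering the humanized antibody sequence to be evaluated; determining the weight and the amino acid occurrence probability at each numbered position of the humanized antibody sequence to be evaluated; based on the weights and the amino acid occurrence probabilities, evaluating the humanized antibody sequence to be evaluated using the humanized antibody sequence evaluation model constructed by the above method to obtain an evaluation value; and, if the evaluation value satisfies a preset condition, determining the corresponding humanized antibody sequence to be evaluated as a candidate humanized antibody sequence. The present application also provides a method for humanizing a monoclonal antibody.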

Description

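Method for constructing a humanized antibody sequence evaluation model and application thereof
Cross-reference
The present invention claims priority to Chinese patent application No. 202211335547.8, filed on October 28, 2022, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of biomedical technology, and in particular to a method for constructing a humanized antibody sequence evaluation model and applications thereof.
Background
Monoclonal antibody drugs offer strong specificity, few side effects, and significant efficacy in treating tumors, infectious diseases, autoimmune diseases, and other conditions. At present, cell fusion and hybridoma technology are the most reliable methods for preparing monoclonal antibodies, and the animals commonly immunized include mice, rats, sheep, rabbits, and others. Monoclonal antibodies obtained in this way are of animal origin. To develop animal-derived monoclonal antibody drugs for use in humans, the antibodies must be humanized in order to reduce the human anti-animal antibody response caused by these heterologous antibodies; humanization also activates the human immune system more effectively, slows clearance of the antibody drug, and extends its half-life.
There are many conventional antibody humanization methods, most of which rely on primary sequence analysis or static three-dimensional structure analysis, for example chimeric antibodies, CDR (complementarity-determining region) grafted antibodies, resurfaced humanized antibodies, and phage display humanized antibodies. However, these cannot be applied to all animal-derived antibodies. For rabbit antibodies, direct chimerization does not yield a functional antibody: when the variable region of a rabbit antibody is fused to the constant region of a human antibody, the cysteine at position 80 of the rabbit antibody light chain forms a free disulfide bond, causing polymerization between antibody light chains and inactivating the antibody. CDR-grafted antibodies and phage display humanized antibodies are considered to have the highest degree of humanization, but antibodies humanized by these two methods often suffer a severe loss of affinity, or lose it completely. Resurfaced humanized antibodies are derived from analysis of the three-dimensional structure of the antibody molecule and retain affinity better than CDR-grafted antibodies, but a large number of murine amino acids remain in the antibody interior, so affinity is preserved at the cost of some degree of humanization.
Conventional humanization methods cannot predict how an amino acid substitution will affect antibody structure or antibody affinity while the antibody is being engineered. Methods based on biological experiments in particular have long production cycles and high costs, and require strict control of internal and external environmental factors. By contrast, accurate computational antibody structure prediction and molecular docking can quickly reveal the interaction between antibody and antigen after amino acid substitution, so that effective humanized antibodies can be screened from the many designed candidate humanized sequences; the humanization process is then fast and inexpensive.
In addition, it is very important in antibody humanization to screen effective humanized antibodies from multiple candidate humanized antibody sequences quickly and conveniently. At present, humanized antibody sequences can be evaluated by computer simulation, but this approach is mainly applied to murine monoclonal antibodies and is not suitable for monoclonal antibodies from other species. Moreover, it is based on machine learning or deep learning: the database used to train the model is very large, the training process is complex, and the demand for computing resources is extremely high.
It is therefore necessary to establish a computational-simulation-based monoclonal antibody humanization method applicable to all animal-derived antibodies, a method for constructing a humanized antibody sequence evaluation model, and a method for evaluating humanized antibodies based on that model. These methods can be applied to antibodies from all species, require little data, take little time, and are inexpensive.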
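Summary of the invention
According to one aspect of the present application, a method for constructing a humanized antibody sequence evaluation model is provided, comprising: obtaining a plurality of human antibody template amino acid sequences and numbering them; calculating the entropy value at each numbered position; and constructing the humanized antibody sequence evaluation model based on the entropy values.
In some embodiments, the entropy value is determined from a position-specific scoring matrix, wherein the method further comprises: determining a multiple sequence alignment of the plurality of human antibody template amino acid sequences according to the longest numbered sequence, with a symbol inserted to pad vacant positions; and constructing the position-specific scoring matrix of the plurality of human antibody template amino acid sequences based on the multiple sequence alignment.
In some embodiments, the entropy value is calculated by the formula $e_{pos} = -\sum_{i=1}^{n} p_i \log p_i$, where $n$ is the total number of distinct amino acid types and inserted symbols observed at a given numbered position (at most 21), $i$ is the index over $n$, and $p_i$ is the occurrence probability of the $i$-th amino acid at that position, obtained from the position-specific scoring matrix.
In some embodiments, constructing the humanized antibody sequence evaluation model based on the entropy values comprises: determining a weight at each numbered position, the weight being negatively correlated with the entropy value.
In some embodiments, the weight is calculated by a formula in which $w_{pos}$ is the weight of a given numbered position, $e_{pos}$ is the entropy value of that position, and $N$ is the numbering annotation length of the longest sequence.
In some embodiments, the humanized antibody sequence evaluation model is expressed by the formula $Score_{target} = \sum_{pos=1}^{N} w_{pos}\, p_{aa}$, where $Score_{target}$ is the output evaluation value, $w_{pos}$ is the weight of a given numbered position of the humanized antibody sequence, and $p_{aa}$ is the occurrence probability of the amino acid at that position.
In some embodiments, the humanized antibody sequence evaluation model is used to evaluate humanized sequences of antibodies from all species.
According to another aspect of the present application, a method for obtaining candidate humanized antibody sequences by evaluation is provided, comprising: numbering the humanized antibody sequence to be evaluated; determining the weight and the amino acid occurrence probability at each numbered position of the humanized antibody sequence to be evaluated; based on the weights and the amino acid occurrence probabilities, evaluating the humanized antibody sequence to be evaluated using the humanized antibody sequence evaluation model constructed by the above method to obtain an evaluation value; and, if the evaluation value satisfies a preset condition, determining the corresponding humanized antibody sequence to be evaluated as a candidate humanized antibody sequence.
In some embodiments, the preset condition is that the evaluation value is greater than a preset threshold, or that, when the evaluation values are ranked, the sequence ranks within a given number of top positions.
In some embodiments, the method further comprises determining a target humanized antibody sequence by: predicting the antibody structure of the candidate humanized antibody sequence; simulating the binding of the predicted antibody structure to the corresponding antigen to obtain candidate antibody structures; selecting candidate antibody structures for biological experimental verification; and determining the target humanized antibody sequence according to the results of the biological experimental verification.
According to yet another aspect of the present application, a method for humanizing a monoclonal antibody is provided, comprising: determining a human antibody light chain template sequence and a human antibody heavy chain template sequence for the monoclonal antibody, respectively; substituting the CDRs of the light chain and heavy chain of the monoclonal antibody into the corresponding CDRs of the human antibody light chain template sequence and the human antibody heavy chain template sequence, respectively, to obtain a CDR-replaced humanized antibody light chain template sequence and a CDR-replaced humanized antibody heavy chain template sequence; performing E-F loop processing on the CDR-replaced humanized antibody light chain template sequence to obtain a plurality of candidate humanized antibody light chain sequences; and performing D-E loop processing on the CDR-replaced humanized antibody heavy chain template sequence to obtain a plurality of candidate humanized antibody heavy chain sequences.
In some embodiments, performing E-F loop processing on the CDR-replaced humanized antibody light chain template sequence to obtain a plurality of candidate humanized antibody light chain sequences comprises: searching a database with the CDR-replaced human antibody light chain template sequence to obtain a multiple sequence alignment; constructing a position-specific scoring matrix of the E-F loop from the multiple sequence alignment; generating a plurality of E-F loop sequences from the position-specific scoring matrix of the E-F loop; and substituting the plurality of E-F loop sequences into the CDR-replaced humanized antibody light chain template sequence to obtain the plurality of candidate humanized antibody light chain sequences.
In some embodiments, performing D-E loop processing on the CDR-replaced humanized antibody heavy chain template sequence to obtain a plurality of candidate humanized antibody heavy chain sequences comprises: searching a database with the CDR-replaced human antibody heavy chain template sequence to obtain a multiple sequence alignment; constructing a position-specific scoring matrix of the D-E loop from the multiple sequence alignment; generating a plurality of D-E loop sequences from the position-specific scoring matrix of the D-E loop; and substituting the plurality of D-E loop sequences into the CDR-replaced humanized antibody heavy chain template sequence to obtain the plurality of candidate humanized antibody heavy chain sequences.
In some embodiments, the method further comprises: performing back mutation at highly conserved sites in the plurality of candidate humanized antibody light chain sequences and the plurality of candidate humanized antibody heavy chain sequences, respectively.
In some embodiments, performing back mutation at the highly conserved sites comprises: determining the highly conserved sites in the light chain and heavy chain sequences of the monoclonal antibody, respectively; determining whether, at the highly conserved sites, the plurality of candidate humanized antibody light chain sequences and heavy chain sequences have the same amino acids as the light chain and heavy chain of the monoclonal antibody, respectively; and, if not, replacing the amino acids at the highly conserved sites in the candidate humanized antibody light chain and heavy chain sequences with the amino acids at the corresponding positions of the light chain and heavy chain of the monoclonal antibody.
In some embodiments, the monoclonal antibody is a rabbit monoclonal antibody, and the method further comprises: predicting the three-dimensional structures of the light chain and heavy chain of the rabbit monoclonal antibody; according to the predicted three-dimensional structures, if a cysteine pair exists in the rabbit monoclonal antibody, the distance between the paired cysteines is between 4 Å and 7 Å, and there is no amino acid at the corresponding position of the CDR-replaced humanized antibody light chain template sequence or heavy chain template sequence, inserting serine at that position, where the position does not include the CDR regions; and, if only a single cysteine exists in the rabbit monoclonal antibody and there is no amino acid at the corresponding position of the CDR-replaced humanized antibody light chain template sequence or heavy chain template sequence, inserting serine at that position.
In some embodiments, the method further comprises: evaluating the light chain sequences and heavy chain sequences of the candidate humanized antibodies, respectively, using the humanized antibody sequence evaluation model constructed by the above method, to obtain the light chain sequence and heavy chain sequence of the humanized antibody.
In some embodiments, the method further comprises: performing biological experimental verification on the humanized antibody to determine the target humanized antibody sequence.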
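Brief description of the drawings
Fig. 1 is a flow chart of a method for humanizing a monoclonal antibody according to some embodiments of the present application.
Fig. 2 is a flow chart of a method for constructing a humanized antibody sequence evaluation model according to some embodiments of the present application.
Fig. 3 is a flow chart of a method for obtaining candidate humanized antibody sequences by evaluation according to some embodiments of the present application.
Fig. 4(a) shows the classification results and ROC curve of the humanized antibody sequence evaluation model distinguishing human from murine antibody sequences (heavy chain).
Fig. 4(b) shows the classification results and ROC curve of the humanized antibody sequence evaluation model distinguishing human from murine antibody sequences (light chain).
Fig. 5(a) shows the classification results and ROC curve of the humanized antibody sequence evaluation model distinguishing human antibody sequences from those of other species (heavy chain).
Fig. 5(b) shows the classification results and ROC curve of the humanized antibody sequence evaluation model distinguishing human antibody sequences from those of other species (light chain).
Fig. 6 shows the ELISA binding results for the humanized antibodies of clone 81E11.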
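Detailed description
To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only some examples or embodiments of the present application; a person of ordinary skill in the art can apply the present application to other similar scenarios based on these drawings without creative effort. Unless apparent from the context or otherwise stated, the same reference numerals in the drawings denote the same structures or operations.
As used in the present application and the claims, unless the context clearly indicates otherwise, the words "a", "an", "one", and/or "the" do not refer specifically to the singular and may also include the plural. In general, the terms "comprise" and "include" only indicate that the explicitly identified steps and elements are included; these steps and elements do not constitute an exclusive list, and a method may also include other steps or elements.
Flow charts are used in the present application to illustrate operations performed by a system according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed exactly in order. Instead, the steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more operations may be removed from them.
The present application provides a method for humanizing a monoclonal antibody. The light and heavy chains of a monoclonal antibody from any animal source are analyzed, designed, and processed at important sequence-based sites and positions to obtain humanized sequences.
Fig. 1 is a flow chart of a method for humanizing a monoclonal antibody according to some embodiments of the present application. As shown in Fig. 1, the method includes the following steps. The method is performed by a first computing device. In some embodiments, a computing device may include a processing device (or processor), a memory, an input/output interface, and a communication port. The processing device can execute computing instructions (program code) and perform the method steps described in the present application. The computing instructions may include programs, objects, components, data structures, procedures, modules, and functions (functions here meaning the specific functions described in the present application). In some embodiments, the processing device may include a microcontroller, a microprocessor, a reduced instruction set computer (RISC), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a central processing unit (CPU), a graphics processing unit (GPU), a physics processing unit (PPU), a microcontroller unit, a digital signal processor (DSP), a field-programmable gate array (FPGA), an advanced RISC machine (ARM), a programmable logic device, any circuit or processor capable of performing one or more functions, or any combination thereof.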
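Step 101: determine a human antibody light chain template sequence and a human antibody heavy chain template sequence for the monoclonal antibody, respectively.
In some embodiments, the monoclonal antibody may be derived from any animal. In some embodiments, the monoclonal antibody may be a murine monoclonal antibody, a rabbit monoclonal antibody, or the like. Based on the sequence of the monoclonal antibody, a database of human antibody sequences (for example, the Observed Antibody Space (OAS) database) can be searched and sequence similarity calculated to obtain human antibody light chain and heavy chain template sequences. For example, the variable region amino acid sequences of the light and heavy chains of the monoclonal antibody can be used as input to search the OAS database with the BLAST tool, yielding the amino acid sequences of the human antibody light chain and heavy chain templates, respectively. In some embodiments, there may be one or more human antibody light chain or heavy chain template sequences. For example, the template sequence may be the sequence with the highest similarity to the amino acid sequence of the monoclonal antibody light chain or heavy chain. As another example, the template sequence may be a sequence whose similarity to the amino acid sequence of the monoclonal antibody light chain or heavy chain satisfies a given condition (for example, greater than 98%, 95%, 90%, etc.).
Step 102: substitute the CDRs of the light chain and heavy chain of the monoclonal antibody into the corresponding CDRs of the human antibody light chain template sequence and the human antibody heavy chain template sequence, respectively, to obtain a CDR-replaced humanized antibody light chain template sequence and a CDR-replaced humanized antibody heavy chain template sequence.
After the light chain and heavy chain template sequences are obtained, a numbering system can be used to number the light and heavy chain sequences of the monoclonal antibody and the light chain and heavy chain template sequences, in order to identify the CDRs, the framework regions, and the residues affecting antibody-antigen binding affinity and antibody specificity in each sequence, so that substitutions of sequences or sites in CDR and non-CDR regions can be performed on the monoclonal antibody sequences and the template sequences. In some embodiments, the ANARCI unified numbering system can be used to number these sequences uniformly. In some embodiments, the Chothia numbering scheme can be used. In some embodiments, the Kabat, Martin, Gelfand, IMGT, or Honegger's numbering scheme can also be used. In some embodiments, what is numbered may be the variable region of each antibody sequence. The sequences are aligned to the same coordinate system, and vacant positions can be padded with an inserted symbol such as "-" or "*". Based on the numbering, the CDRs of the monoclonal antibody light and heavy chain sequences and of the light chain and heavy chain template sequences can be marked, and the CDRs of the monoclonal antibody light and heavy chain sequences are then substituted into the CDRs of the human antibody light chain and heavy chain template sequences, respectively, to obtain the CDR-replaced humanized antibody light chain template sequence and humanized antibody heavy chain template sequence.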
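Once both chains share a numbering, CDR replacement reduces to swapping residues at the numbered CDR positions. The sketch below assumes each sequence is held as a mapping from numbering label (including insertion codes such as "3A") to residue; the five-position toy sequences and the CDR position set are hypothetical and are not real Chothia boundaries.

```python
import re


def position_key(label):
    """Sort key for numbering labels such as '30', '30A', '30B'."""
    match = re.fullmatch(r"(\d+)([A-Z]?)", label)
    if match is None:
        raise ValueError(f"unrecognized numbering label: {label}")
    return int(match.group(1)), match.group(2)


def graft_cdrs(template, donor, cdr_positions):
    """Keep template framework residues and take CDR residues from the donor."""
    grafted = {pos: aa for pos, aa in template.items() if pos not in cdr_positions}
    for pos in cdr_positions:
        if pos in donor:              # donor insertions are carried over,
            grafted[pos] = donor[pos]  # template-only CDR positions are dropped
    return grafted


def to_sequence(numbered):
    """Join residues in numbering order."""
    return "".join(numbered[pos] for pos in sorted(numbered, key=position_key))


# Toy numbered sequences (hypothetical); positions 3, 3A and 4 play the CDR.
template = {"1": "D", "2": "I", "3": "R", "4": "A", "5": "K"}
donor = {"1": "A", "2": "A", "3": "Q", "3A": "N", "4": "S", "5": "V"}
grafted = graft_cdrs(template, donor, {"3", "3A", "4"})
grafted_seq = to_sequence(grafted)
```

Working on numbering labels rather than string offsets is what lets a donor CDR that is longer or shorter than the template CDR be swapped in without shifting the framework.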
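Step 103: perform E-F loop processing on the CDR-replaced humanized antibody light chain template sequence to obtain a plurality of candidate humanized antibody light chain sequences.
The E-F loop is a loop structure located in the non-binding region of the antibody light chain and helps support the whole antibody. E-F loop processing of the CDR-replaced humanized antibody light chain template sequence means replacing the E-F loop in that sequence with additionally generated E-F loop sequences. This is done as follows.
In some embodiments, the CDR-replaced human antibody light chain template sequence is searched against a database (for example, the UniRef90 or UniRef100 database) to obtain a multiple sequence alignment (MSA). It should be noted that the human antibody light chain template sequence itself may also be searched against the database to obtain the MSA. From the MSA, a position-specific scoring matrix (PSSM) of the E-F loop is constructed. The PSSM of the E-F loop represents how likely each amino acid is to occur at each position of the E-F loop. As an example, the PSSM has 20 rows, corresponding to the 20 different amino acids, and as many columns as the E-F loop has residues. Multiple E-F loop sequences can be generated from the PSSM of the E-F loop according to the occurrence probability of the amino acids at each position. In some embodiments, multiple E-F loop sequences can be generated randomly from the PSSM. The multiple E-F loop sequences are substituted into the CDR-replaced humanized antibody light chain template sequence to obtain the plurality of candidate humanized antibody light chain sequences. It should be noted that the occurrence probabilities of amino acids at each position can also be determined in other ways, for example with a hidden Markov model.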
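Random generation of loop sequences from a PSSM can be sketched as independent sampling at each loop position. The three-position PSSM below is a made-up example, `sample_loops` is a hypothetical name, and treating positions as independent is an assumption of this sketch.

```python
import random


def sample_loops(loop_pssm, n_loops, seed=0):
    """Draw loop sequences position by position from per-position probabilities.

    loop_pssm: list of dicts, one per loop position, mapping an amino acid to
    its occurrence probability. Positions are sampled independently.
    """
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    loops = []
    for _ in range(n_loops):
        residues = [
            rng.choices(list(column), weights=list(column.values()))[0]
            for column in loop_pssm
        ]
        loops.append("".join(residues))
    return loops


# Hypothetical three-position loop PSSM.
loop_pssm = [{"S": 0.6, "G": 0.4}, {"G": 1.0}, {"S": 0.5, "T": 0.3, "A": 0.2}]
loops = sample_loops(loop_pssm, n_loops=10)
```

Each generated loop would then be substituted into the CDR-replaced template at the loop positions to give one candidate sequence; duplicates among the sampled loops can be removed before doing so.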
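Step 104: perform D-E loop processing on the CDR-replaced humanized antibody heavy chain template sequence to obtain a plurality of candidate humanized antibody heavy chain sequences.
The D-E loop is a loop structure located in the non-binding region of the antibody heavy chain and helps support the whole antibody. D-E loop processing of the CDR-replaced humanized antibody heavy chain template sequence means replacing the D-E loop in that sequence with additionally generated D-E loop sequences. Its implementation is similar to step 103. Specifically, the CDR-replaced human antibody heavy chain template sequence is searched against a database (for example, the UniRef90 database) to obtain a multiple sequence alignment; it should be noted that the human antibody heavy chain template sequence itself may also be searched against the database to obtain the multiple sequence alignment. From the multiple sequence alignment, a position-specific scoring matrix of the D-E loop is constructed; multiple D-E loop sequences are generated from that matrix; and the multiple D-E loop sequences are substituted into the CDR-replaced humanized antibody heavy chain template sequence to obtain the plurality of candidate humanized antibody heavy chain sequences.
Through steps 103 and 104, multiple E-F loop and D-E loop sequences are obtained and used to replace the E-F loop and D-E loop in the CDR-replaced humanized antibody light chain and heavy chain template sequences. This yields candidate humanized antibody light and heavy chain sequences with a higher degree of humanization that better maintain the antibody structure, and provides more candidates from which to obtain the target humanized antibody light and heavy chain sequences.
Step 105: perform back mutation at highly conserved sites in the plurality of candidate humanized antibody light chain sequences and the plurality of candidate humanized antibody heavy chain sequences, respectively.
Highly conserved sites are sites that are important for maintaining the structure of the monoclonal antibody. If the amino acids at these sites in the candidate humanized light and heavy chain sequences differ from those in the monoclonal antibody, the structure of the candidate humanized antibodies may change, reducing or even abolishing antibody affinity. These sites therefore need to be kept consistent with the corresponding sites of the monoclonal antibody. Specifically, the highly conserved sites in the light chain and heavy chain sequences of the monoclonal antibody are first determined; it is then determined whether, at those sites, the candidate humanized antibody light chain and heavy chain sequences have the same amino acids as the light chain and heavy chain of the monoclonal antibody, respectively; if not, the amino acids at those sites in the candidate sequences are replaced with the amino acids at the corresponding positions of the light chain and heavy chain of the monoclonal antibody.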
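The back-mutation check can be sketched as a comparison at the conserved positions only. The example uses light chain position 1, where the rabbit parent carries A and the human template carries D in the numbered sequences of this application; the dictionary representation and the function name `back_mutate` are illustrative assumptions.

```python
def back_mutate(candidate, parent, conserved_positions):
    """Restore the parent residue wherever the candidate differs at a conserved site.

    candidate, parent: dicts mapping a numbering label to a residue.
    Returns a new dict; the candidate passed in is left unchanged.
    """
    mutated = dict(candidate)
    for pos in conserved_positions:
        if pos in parent and mutated.get(pos) != parent[pos]:
            mutated[pos] = parent[pos]
    return mutated


# Light chain positions 1, 5 and 6 (subset of the conserved sites listed in the text).
parent_light = {"1": "A", "5": "T", "6": "Q"}
candidate_light = {"1": "D", "5": "T", "6": "Q"}
result = back_mutate(candidate_light, parent_light, ["1", "5", "6"])
changed = [pos for pos in result if result[pos] != candidate_light[pos]]
```

Only position 1 changes in this example, because positions 5 and 6 already match the parent.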
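In some embodiments, the highly conserved sites are determined in a manner similar to the E-F loop sequences above, namely from PSSMs of the light and heavy chains of the monoclonal antibody. These PSSMs can be obtained by searching a protein database with the light and heavy chains of the monoclonal antibody as input.
Additionally or alternatively, where the monoclonal antibody is a rabbit monoclonal antibody, the method may further include cysteine treatment of the rabbit monoclonal antibody. When the variable region of a rabbit monoclonal antibody is fused to the constant region of a human antibody, the cysteine at position 80 of the rabbit antibody light chain forms a free disulfide bond, causing polymerization between antibody light chains and inactivating the antibody. The cysteine at the corresponding position of the template sequence therefore needs to be treated to avoid inactivation. Specifically, the three-dimensional structures of the light chain and heavy chain of the rabbit monoclonal antibody are predicted. According to the predicted structures, if a cysteine pair exists in the rabbit monoclonal antibody, the distance between the paired cysteines is between 4 Å and 7 Å, and there is no amino acid at the corresponding position of the CDR-replaced humanized antibody light chain template sequence or heavy chain template sequence, serine is inserted at that position, where the position does not include the CDR regions. If only a single cysteine exists in the rabbit monoclonal antibody and there is no amino acid at the corresponding position of the CDR-replaced humanized antibody light chain template sequence or heavy chain template sequence, serine is inserted at that position.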
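The 4 Å to 7 Å distance test on a predicted structure can be sketched as a pairwise distance check. The text does not say which atoms the distance is measured between, so using one representative coordinate per cysteine is an assumption, and the residue labels and coordinates below are invented for illustration.

```python
import math


def cysteine_pairs_in_range(cys_coords, low=4.0, high=7.0):
    """Return cysteine pairs whose distance (in angstroms) lies within [low, high].

    cys_coords: dict mapping a residue label to one (x, y, z) coordinate per
    cysteine. Which atom that coordinate represents is an assumption here.
    """
    labels = sorted(cys_coords)
    pairs = []
    for index, first in enumerate(labels):
        for second in labels[index + 1:]:
            distance = math.dist(cys_coords[first], cys_coords[second])
            if low <= distance <= high:
                pairs.append((first, second, distance))
    return pairs


# Hypothetical coordinates for three cysteines in a predicted light chain.
coords = {"L23": (0.0, 0.0, 0.0), "L88": (3.0, 4.0, 0.0), "L80": (20.0, 0.0, 0.0)}
pairs = cysteine_pairs_in_range(coords)
paired = {label for first, second, _ in pairs for label in (first, second)}
unpaired = sorted(set(coords) - paired)
```

Cysteines that appear in no in-range pair, such as the hypothetical L80 here, are the ones that would be handled by the single-cysteine rule.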
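It should be noted that the above description of the method steps of Fig. 1 and their order is for example and illustration only and does not limit the scope of application of the present application. Those skilled in the art may make various modifications and changes to the steps under the guidance of the present application; such modifications and changes remain within the scope of the present application. For example, steps 103 and 104 may be combined into one step or their order may be swapped; for example, step 103 may follow step 104.
The present application also provides a method for constructing a humanized antibody sequence evaluation model. The method above humanizes an animal-derived monoclonal antibody and yields multiple candidate humanized antibodies. A humanized antibody sequence evaluation model can then be constructed by the method described below, and the model can be used to evaluate and screen the candidate humanized antibodies to identify effective target humanized antibodies.
Fig. 2 is a flow chart of a method for constructing a humanized antibody sequence evaluation model according to some embodiments of the present application. As shown in Fig. 2, the method includes the following steps. The method is performed by a second computing device.
Step 201: obtain a plurality of human antibody template amino acid sequences and number them.
In some embodiments, the human antibody template amino acid sequences can be obtained from a database of human antibodies. The database may be the OAS database. In some embodiments, the human antibody template amino acid sequences may be all or part of the sequences in the database. A numbering system can be used to number the human antibody template amino acid sequences. In some embodiments, the numbering may be the same as in step 102. In some embodiments, the human antibody template amino acid sequences may be variable region sequences or full antibody sequences obtained from the database. The model construction method is described here using variable region sequences.
Step 202: calculate the entropy value at each numbered position.
In some embodiments, the entropy value is determined from a position-specific scoring matrix. The method may further include constructing the position-specific scoring matrix of the plurality of human antibody template amino acid sequences. In some embodiments, constructing this matrix comprises: determining a multiple sequence alignment of the plurality of human antibody template amino acid sequences according to the longest numbered sequence, with a symbol inserted to pad vacant positions; and constructing the position-specific scoring matrix of the plurality of human antibody template amino acid sequences based on the multiple sequence alignment. Constructing the position-specific scoring matrix of the human antibody template amino acid sequences is similar to constructing the position-specific scoring matrix of the E-F loop in step 103 of Fig. 1 and is not repeated here.
The position-specific scoring matrix represents, for each numbered position, a score for how likely each amino acid is to occur. From this matrix, the occurrence probability of each amino acid can be determined, and hence the entropy value at each numbered position. The entropy value represents the stability of the amino acid at each numbered position: the higher the entropy, the less stable the amino acid at that position, the more it varies, and the less conserved the position is.
In some embodiments, the entropy value is calculated by the following formula (1): $e_{pos} = -\sum_{i=1}^{n} p_i \log p_i$,
where $n$ is the total number of distinct amino acid types and inserted symbols observed at a given numbered position (at most 21), $i$ is the index over $n$, and $p_i$ is the occurrence probability of the $i$-th amino acid at that position, which can be obtained from the position-specific scoring matrix.
In some embodiments, the entropy value is determined with a hidden Markov model. For example, a hidden Markov model can be used to determine the occurrence probabilities of the amino acids at each position, and the entropy value is determined from those probabilities.
Step 203: construct the humanized antibody sequence evaluation model based on the entropy values.
In some embodiments, constructing the humanized antibody sequence evaluation model based on the entropy values comprises determining a weight at each numbered position. The weight is negatively correlated with the entropy value. The larger the entropy, the less stable the amino acid at that position and hence the less important the position. Weights are therefore assigned inversely, so that a larger entropy corresponds to a lower weight.
In some embodiments, the weight is calculated by formula (2),
where $w_{pos}$ is the weight of a given numbered position, $e_{pos}$ is the entropy value of that position, determined by formula (1), and $N$ is the numbering annotation length of the longest sequence.
In some embodiments, the humanized antibody sequence evaluation model is expressed by the following formula (3): $Score_{target} = \sum_{pos=1}^{N} w_{pos}\, p_{aa}$,
where $Score_{target}$ is the output evaluation value, $w_{pos}$ is the weight of a given numbered position of the humanized antibody sequence, and $p_{aa}$ is the occurrence probability of the amino acid at that position, determined from the position-specific scoring matrix. In some embodiments, the humanized antibody sequence evaluation model is used to evaluate humanized sequences of antibodies from all species. For the specific evaluation procedure, see Fig. 3 and its description below.
In some embodiments, validation of the model shows that the humanized antibody sequence evaluation model can effectively distinguish the light chains of human antibody sequences from those of murine antibody sequences, with a corresponding AUC value of 1.00, and can distinguish the heavy chains of human antibody sequences from those of murine antibody sequences, with a corresponding AUC value of 0.92. The model can distinguish the light chains of human antibody sequences from those of antibody sequences of other species, with a corresponding AUC value of 0.89, and can distinguish the heavy chains of human antibody sequences from those of antibody sequences of other species, with a corresponding AUC value of 0.94. These results show that the humanized antibody sequence evaluation model of the present application can effectively distinguish human antibody sequences from antibody sequences of other species and can be applied to the subsequent evaluation and screening of candidate antibodies.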
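The present application also provides a method for obtaining candidate humanized antibody sequences by evaluation. Based on the humanized antibody sequence evaluation model constructed above, the method evaluates the humanized antibody sequences to be evaluated and determines the target humanized antibody sequence.
Fig. 3 is a flow chart of a method for obtaining candidate humanized antibody sequences by evaluation according to some embodiments of the present application. As shown in Fig. 3, the method mainly includes the following steps. The method is mainly performed by a third computing device.
Step 301: number the humanized antibody sequence to be evaluated.
In some embodiments, the humanized antibody sequence to be evaluated can be numbered according to the numbering method described above. In some embodiments, the humanized antibody sequence to be evaluated may be one determined by the method shown in Fig. 1.
Step 302: determine the weight and the amino acid occurrence probability at each numbered position of the humanized antibody sequence to be evaluated.
In some embodiments, to determine the weight at each numbered position of the humanized antibody sequence to be evaluated, the entropy value at each numbered position must be determined. The entropy value is determined from the position-specific scoring matrix of multiple human antibody template sequences. The entropy value and the weight are determined by formulas (1) and (2) above, respectively.
Step 303: based on the weights and the amino acid occurrence probabilities, evaluate the humanized antibody sequence to be evaluated using the humanized antibody sequence evaluation model constructed by the method described above, and obtain an evaluation value. Specifically, the occurrence probability of the amino acid at each numbered position of each humanized antibody sequence to be evaluated is determined from the position-specific scoring matrix of multiple human antibody template sequences. The weights and the amino acid occurrence probabilities are input into the humanized antibody sequence evaluation model, that is, formula (3), and the model outputs the evaluation value of each humanized antibody sequence to be evaluated.
Step 304: if the evaluation value satisfies a preset condition, determine the corresponding humanized antibody sequence to be evaluated as a candidate humanized antibody sequence. In some embodiments, the preset condition may be that the evaluation value is greater than a preset threshold (for example, 0.5, 0.6, 0.7, etc.). For example, humanized antibody sequences to be evaluated with an evaluation value greater than 0.6 are determined as candidate humanized antibody sequences. In some embodiments, the evaluation values can be ranked, and the preset condition may be that a sequence ranks within a given number of top positions (for example, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, etc.); for example, the humanized antibody sequences to be evaluated that correspond to the top 15 evaluation values are determined as candidate humanized antibody sequences.
Additionally or alternatively, the evaluation method may further include determining the target humanized antibody sequence by: predicting the antibody structure of the candidate humanized antibody sequence; simulating the binding of the predicted antibody structure to the corresponding antigen to obtain candidate antibody structures; selecting candidate antibody structures for biological experimental verification; and determining the target humanized antibody sequence according to the results of the biological experimental verification. Specifically, the light and heavy chains of the candidate humanized antibody sequences are combined pairwise, and antibody structure prediction software is used to predict the structure of each paired candidate antibody. Antigen structure prediction software is used to predict the antigen structure. The binding of each candidate antibody structure to the antigen structure is simulated computationally, the candidate antibody structures with the better simulation results are selected for subsequent biological experimental verification, and their affinity is measured to determine the target humanized antibody sequence.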
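The pairwise combination of candidate light and heavy chains can be sketched with a Cartesian product. Five heavy and five light chain candidates give 25 pairings, matching the 25 humanized antibodies reported for clone 81E11; the placeholder strings stand in for real variable region sequences, and the naming pattern merely mirrors names such as 81E11H3L1.

```python
from itertools import product


def pair_chains(heavy_chains, light_chains, prefix=""):
    """Combine every candidate heavy chain with every candidate light chain.

    heavy_chains, light_chains: dicts mapping a short name (e.g. "H3", "L1")
    to a variable region sequence.
    """
    return [
        {"name": f"{prefix}{h_name}{l_name}", "heavy": h_seq, "light": l_seq}
        for (h_name, h_seq), (l_name, l_seq)
        in product(heavy_chains.items(), light_chains.items())
    ]


# Placeholder sequences (hypothetical) for five heavy and five light candidates.
heavy = {f"H{i}": f"<heavy variable region {i}>" for i in range(1, 6)}
light = {f"L{i}": f"<light variable region {i}>" for i in range(1, 6)}
pairs = pair_chains(heavy, light, prefix="81E11")
names = [pair["name"] for pair in pairs]
```

Each pairing would then be passed to structure prediction and binding simulation, with the best-ranked pairings carried forward to expression and ELISA.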
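By way of example only, the candidate humanized antibody sequence is produced by gene synthesis and cloned into a eukaryotic expression vector containing the constant regions of the light chain and heavy chain, so that the complete antibody molecule can be expressed. The light chain and heavy chain vectors are then co-transfected into 293 or CHO cells for transient expression, and the cell culture supernatant is collected after transfection. ELISA is used to verify binding of the candidate antibody to the antigen, and the antibodies that bind are further purified using Protein A; antibody affinity is then measured by ELISA, FACS, SPR, or similar techniques and compared with that of the parent antibody to obtain the affinity result.
It should be noted that the above description of the method steps of Fig. 1, Fig. 2, and Fig. 3 and their order is for example and illustration only and does not limit the scope of application of the present application. Those skilled in the art may make various modifications and changes to the steps under the guidance of the present application; such modifications and changes remain within the scope of the present application. In some embodiments, the first computing device, the second computing device, and the third computing device may be independent computing devices or may be combined into one computing device.
The method for humanizing a monoclonal antibody of the present application can effectively yield a humanized monoclonal antibody; the entropy-based humanized antibody sequence evaluation model constructed herein can evaluate the humanized monoclonal antibodies and thereby screen out effective humanized antibodies.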
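Examples
Example 1: Construction of the humanized antibody sequence evaluation model
Human antibody sequences in the OAS database were used for model construction and test evaluation. The OAS database holds more than 2.5 billion antibody sequences from multiple species, including human, mouse, and monkey. 5,000,000 human antibody sequences were randomly drawn from the OAS database, and the humanized antibody sequence evaluation model was constructed as follows.
All 5,000,000 human antibody sequences obtained from the OAS database were numbered and annotated using the ANARCI unified numbering system.
A multiple sequence alignment (MSA) of all the human antibody sequences was built according to the numbering annotation of the longest sequence, with "-" inserted to pad the vacant positions of shorter sequences.
Based on the MSA, a position-specific scoring matrix (PSSM) of all the human antibodies was constructed.
Based on the PSSM, the entropy value was calculated for each position in the numbering annotation as $e_{pos} = -\sum_{i=1}^{n} p_i \log p_i$, where $n$ is the total number of distinct amino acid types and "-" insertions observed at a given numbered position (at most 21), $i$ is the index over $n$, and $p_i$ is the occurrence probability of the $i$-th amino acid at that position, obtained from the PSSM.
The entropy value calculated for each position was normalized, and weights were assigned inversely according to the normalized entropy: the position with the highest entropy received the smallest weight, and the position with the lowest entropy received the largest weight. In the weight formula, $w_{pos}$ is the weight of a given numbered position, $e_{pos}$ is the entropy value of that position, and $N$ is the numbering annotation length of the longest sequence.
The humanized antibody sequence evaluation model was constructed and is expressed by the formula $Score_{target} = \sum_{pos=1}^{N} w_{pos}\, p_{aa}$, where $Score_{target}$ is the output evaluation value, $w_{pos}$ is the weight of a given numbered position of the humanized antibody sequence, and $p_{aa}$ is the occurrence probability of the amino acid at that position, obtained from the PSSM.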
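Example 2: Performance of the humanized antibody sequence evaluation model
An additional 4,999,335 human antibody sequences (heavy chain), 4,936,514 murine antibody sequences (heavy chain), 727,184 human antibody sequences (light chain), and 727,184 murine antibody sequences (light chain) were randomly drawn from the OAS database for testing, to evaluate how well the model distinguishes human antibodies from murine antibodies.
An additional 10,016,375 human antibody sequences (heavy chain), 10,016,375 antibody sequences from other species (heavy chain), 2,026,539 human antibody sequences (light chain), and 2,026,539 antibody sequences from other species (light chain) were randomly drawn from the OAS database for testing, to evaluate how well the model distinguishes human antibodies from antibodies of other species.
The above sequences were input into the humanized antibody sequence evaluation model constructed in Example 1 to obtain the humanization evaluation value of each sequence, in order to evaluate how well the model distinguishes human antibodies from antibodies of other species.
The humanized antibody sequence encoding method comprises the following steps:
The humanized antibody sequence is numbered and annotated; the numbering annotation length is N.
Using the PSSM constructed in Example 1, the occurrence probability of the amino acid at each position of the humanized antibody sequence is looked up.
Using the weight of each position determined in Example 1 and the occurrence probability of the amino acid at each position, the humanized antibody sequence is scored according to the formula $\sum_{pos} w_{pos}\, p_{aa}$ (i.e., weight × probability, summed over the numbered positions).
Treating this as a classification problem (0 for non-human, 1 for human), ROC (receiver operating characteristic) curves were plotted for the above results; the test results are shown in Figs. 4(a), 4(b), 5(a), and 5(b).
As shown in Figs. 4(a), 4(b), 5(a), and 5(b), the humanization evaluation model constructed in the present application can distinguish the light chain and heavy chain sequences of human and murine antibodies, with AUC values of 1.00 and 0.92 for the light and heavy chains, respectively; it can also distinguish the light chain and heavy chain sequences of human antibodies and antibodies from other species, with AUC values of 0.89 and 0.94 for the light and heavy chains, respectively. The test results show that the humanized antibody sequence evaluation model can effectively distinguish human antibodies from murine antibodies and from antibodies of other species, and is effective for evaluating humanized antibody sequences.
Further, after the humanized antibody sequence evaluation model constructed above was validated, a 4-1BB agonist antibody (TNFRSF9 protein) was used as a specific example to evaluate candidate humanized sequences of a rabbit monoclonal antibody. Based on the evaluation results, the top 5 (N=5) humanized rabbit monoclonal antibody light chain sequences and the top 5 (N=5) humanized rabbit monoclonal antibody heavy chain sequences were given and verified by biological experiments; see the following examples for details.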
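Example 3: Determination of candidate humanized sequences of a rabbit monoclonal antibody
Specifically, the following steps are included:
A. Obtaining the original rabbit monoclonal antibody:
1. Using TNFRSF9 protein as the antigen, the rabbit monoclonal antibody was prepared by ordering the MonoRab™ rabbit monoclonal antibody customization service (https://www.genscript.com.cn/custom-rabbit-monoclonal-antibody-generation.html) from Nanjing GenScript Biotechnology Co., Ltd. The antigen sequence is:
2. The rabbit monoclonal antibody obtained was sequenced to obtain the amino acid sequences of its light and heavy chain variable regions. Taking clone 81E11 as an example, the complete variable region sequences of its light and heavy chains are:
Complete variable region of the rabbit monoclonal antibody light chain:
Complete variable region of the rabbit monoclonal antibody heavy chain:
B. Humanization of the rabbit monoclonal antibody:
1. Finding the optimal human template:
1.1 Using the complete variable region sequence of the rabbit monoclonal antibody light chain obtained in step A (SEQ ID NO: 2) as input, the OAS (Observed Antibody Space) database was searched with the BLAST tool to obtain the human antibody sequence with the highest sequence similarity as the light chain template sequence; the human antibody light chain template sequence is:
1.2 Using the complete variable region sequence of the rabbit monoclonal antibody heavy chain obtained in step A (SEQ ID NO: 3) as input, the OAS (Observed Antibody Space) database was searched with the BLAST tool to obtain the human antibody sequence with the highest sequence similarity as the heavy chain template sequence; the human antibody heavy chain template sequence is:
2. The ANARCI unified numbering system, specifically the Chothia numbering scheme, was used to number the rabbit monoclonal antibody light chain complete variable region sequence (SEQ ID NO: 2), the rabbit monoclonal antibody heavy chain complete variable region sequence (SEQ ID NO: 3), the human antibody light chain template obtained in step 1 (SEQ ID NO: 4), and the human antibody heavy chain template (SEQ ID NO: 5). Exemplary numbered sequences are shown below.
Numbered rabbit monoclonal antibody light chain complete variable region sequence: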
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,30A,30B,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,95A,95B,95C,95D,96,97,98,99,100,101,102,103,104,105,106,107。
A,A,V,L,T,Q,T,P,S,P,V,S,V,T,V,G,G,T,V,T,I,N,C,Q,A,S,Q,S,V,D,N,N,N,Y,L,A,W,F,Q,Q,K,P,G,Q,P,P,K,Q,L,I,Y,S,A,S,T,L,A,S,G,V,S,S,R,F,K,G,S,G,S,G,T,Q,F,T,L,T,I,S,G,V,Q,C,D,D,A,A,T,Y,Y,C,L,G,E,F,S,A,S,S,G,D,W,N,A,F,G,G,G,T,E,V,V,V,K。
Numbered rabbit monoclonal antibody heavy chain complete variable region sequence:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,82A,82B,82C,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113。
Q,-,S,V,K,E,S,E,G,G,L,F,K,P,T,D,T,L,T,L,A,C,T,V,S,g,f,s,l,s,Y,N,A,I,T,W,V,R,Q,A,P,G,N,G,L,E,W,I,G,V,I,N,Y,D,G,T,T,V,Y,A,S,W,A,K,S,R,S,T,I,T,R,N,T,N,L,N,T,V,T,L,K,M,T,S,L,T,A,A,D,T,A,T,Y,F,C,A,R,N,F,-,-,-,-,N,I,W,G,P,G,T,L,V,T,V,S,S。
Numbered human antibody light-chain template sequence:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107。
D,I,Q,M,T,Q,S,P,S,S,L,S,A,S,V,G,D,T,V,T,I,T,C,R,A,S,Q,S,I,S,T,Y,L,S,W,F,Q,Q,K,P,G,K,A,P,K,L,L,I,Y,V,A,S,S,L,Q,S,G,V,P,S,R,F,S,G,S,G,S,G,T,E,F,T,L,T,I,A,G,L,Q,L,D,D,L,A,T,Y,Y,C,Q,Q,Y,N,S,F,E,L,S,F,G,G,G,T,K,V,D,I,K。
Numbered human antibody heavy-chain template sequence:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,82A,82B,82C,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,100A,100B,100C,100D,101,102,103,104,105,106,107,108,109,110,111,112,113。
Q,V,Q,L,Q,E,S,G,P,G,L,V,K,P,S,E,T,L,S,L,T,C,T,V,S,G,G,S,I,D,T,Y,Y,W,S,W,I,R,Q,P,P,G,K,G,L,E, W,I,G,-,-,-,-,-,-,Y,L,-,Y,N,P,S,L,K,S,R,A,T,I,S,L,D,T,S,K,N,Q,I,S,L,K,M,R,S,M,T,A,A,D,T,A,M,Y,F,C,A,R,D,P,N,R,A,A,A,G,A,F,D,I,W,G,P,G,T,M,V,T,V,S,S。
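In the numbered sequences above, each position label in the first row corresponds to the residue at the same index in the second row. For example, in the numbered complete light-chain variable region sequence of the rabbit monoclonal antibody, position 1 corresponds to A; in the numbered complete heavy-chain variable region sequence of the rabbit monoclonal antibody, "-" denotes a vacant position.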
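The pairing of position labels and residues described above can be sketched as a small parser. This is an illustrative helper only: `parse_numbered` is a hypothetical name, it does not call ANARCI, and it assumes the two comma-separated rows have already been produced by a numbering tool. Gaps ("-") are dropped and lower-case residues are normalized to upper case.

```python
def parse_numbered(labels_row, residues_row):
    """Map Chothia position labels (e.g. '30A') to residues, skipping gaps."""
    labels = [item.strip() for item in labels_row.split(",")]
    residues = [item.strip() for item in residues_row.split(",")]
    if len(labels) != len(residues):
        raise ValueError("label row and residue row differ in length")
    return {
        label: residue.upper()
        for label, residue in zip(labels, residues)
        if residue != "-"
    }


# First ten positions of the numbered rabbit heavy chain listed above.
heavy_start = parse_numbered("1,2,3,4,5,6,7,8,9,10", "Q,-,S,V,K,E,S,E,G,G")
print(heavy_start)
```

Keeping the labels as dictionary keys means that insertion positions such as 30A or 82B remain addressable in later steps without shifting any other index.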
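3. CDR replacement in the human antibody light-chain and heavy-chain template sequences, comprising the following steps:
3.1 The rabbit monoclonal antibody light chain and heavy chain, and the human antibody light-chain and heavy-chain template sequences obtained in step 1, are aligned to the same coordinate system, with vacant positions filled with "-".
3.2 According to the numbering of step 2, the CDRs of the rabbit monoclonal antibody light chain and heavy chain, and of the human antibody light-chain and heavy-chain templates obtained in step 1, are marked. These CDR sequences are:
Rabbit monoclonal antibody heavy chain CDR-H1: GFSLSYN (SEQ ID NO: 6);
Rabbit monoclonal antibody heavy chain CDR-H2: NYDGT (SEQ ID NO: 7);
Rabbit monoclonal antibody heavy chain CDR-H3: FN (SEQ ID NO: 8);
Rabbit monoclonal antibody light chain CDR-L1: SQSVDNNNY (SEQ ID NO: 9);
Rabbit monoclonal antibody light chain CDR-L2: SAS (SEQ ID NO: 10);
Rabbit monoclonal antibody light chain CDR-L3: EFSASSGDWN (SEQ ID NO: 11);
Human antibody heavy-chain template CDR-H1: GGSIDTY (SEQ ID NO: 12);
Human antibody heavy-chain template CDR-H2: Y (SEQ ID NO: 13);
Human antibody heavy-chain template CDR-H3: PNRAAAGAFD (SEQ ID NO: 14);
Human antibody light-chain template CDR-L1: SQSISTY (SEQ ID NO: 15);
Human antibody light-chain template CDR-L2: VAS (SEQ ID NO: 16);
Human antibody light-chain template CDR-L3: YNSFEL (SEQ ID NO: 17).
3.3 The light-chain and heavy-chain CDRs of the rabbit monoclonal antibody are grafted into the corresponding CDRs of the human antibody light-chain and heavy-chain templates of step 3.2. The resulting humanized antibody light-chain and heavy-chain template sequences are, respectively:
Humanized antibody light-chain template sequence after replacement:
Humanized antibody heavy-chain template sequence after replacement:
The bold portions indicate the replaced CDRs.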
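The CDR replacement of step 3 can be sketched on position-labelled sequences. This is a hedged illustration rather than the applicant's implementation. The CDR boundaries in `CDR_RANGES` are inferred from the CDR sequences listed in step 3.2 (for example, rabbit CDR-H3 "FN" occupies positions 96 to 101), and the function names are hypothetical. Each chain is a dictionary from Chothia label to residue, with gaps omitted.

```python
import re

# Inclusive CDR boundaries consistent with the CDR sequences in step 3.2.
CDR_RANGES = {
    "L": [(26, 32), (50, 52), (91, 96)],
    "H": [(26, 32), (52, 56), (96, 101)],
}


def position_key(label):
    """Split a label such as '95A' into (95, 'A') so labels sort correctly."""
    match = re.fullmatch(r"(\d+)([A-Z]?)", label)
    if match is None:
        raise ValueError(f"unrecognized position label: {label}")
    return int(match.group(1)), match.group(2)


def in_cdr(label, chain):
    number, _ = position_key(label)
    return any(low <= number <= high for low, high in CDR_RANGES[chain])


def graft_cdrs(donor, template, chain):
    """Keep template framework residues and take CDR residues from the donor."""
    grafted = {k: v for k, v in template.items() if not in_cdr(k, chain)}
    grafted.update({k: v for k, v in donor.items() if in_cdr(k, chain)})
    return "".join(grafted[k] for k in sorted(grafted, key=position_key))


# Light-chain positions 49-53 taken from the numbered sequences above.
rabbit = {"49": "Y", "50": "S", "51": "A", "52": "S", "53": "T"}
human = {"49": "Y", "50": "V", "51": "A", "52": "S", "53": "S"}
print(graft_cdrs(rabbit, human, "L"))
```

Because CDR positions are removed from the template before the donor residues are added, a donor CDR that is longer or shorter than the template CDR (insertions such as 30A and 30B) is transferred as a whole.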
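4. Cysteine (CYS) treatment, comprising the following steps:
4.1 AlphaFold II is used to predict the three-dimensional structures of the rabbit monoclonal antibody light chain and heavy chain separately.
4.2 Based on the predicted three-dimensional structures of the rabbit monoclonal antibody light and heavy chains, the distances between all CYS pairs in the rabbit monoclonal antibody sequence are calculated. For a CYS pair whose distance is between 4 Å and 7 Å, if the human antibody template after CDR replacement has no amino acid at the corresponding position, serine (SER) is inserted at that position. CYS pairs in the CDRs are not replaced.
4.3 On the basis of step 4.2, single CYS residues are treated: if a single (unpaired) CYS is present in the rabbit monoclonal antibody sequence and the human antibody template after CDR replacement has no amino acid at the corresponding position, Ser is inserted. The light and heavy chains after CYS treatment are shown below; in this example, no replacement occurred in either chain:
Light-chain sequence after CYS replacement:
Heavy-chain sequence after CYS replacement: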
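The distance screen of step 4.2 can be sketched as follows. The source does not state which atom is used for the distance. The 4 Å to 7 Å window is consistent with Cα-Cα distances of disulfide-bonded cysteines, so this sketch assumes Cα coordinates taken from the predicted structure. The function name and the coordinates are hypothetical, and reading coordinates from a structure file is left out.

```python
import math
from itertools import combinations


def cys_pairs_in_range(cys_coords, low=4.0, high=7.0):
    """Return (label_a, label_b, distance) for cysteine pairs within [low, high] Å.

    cys_coords maps a position label to the (x, y, z) coordinates, in Å,
    of one representative atom per cysteine (assumed here to be C-alpha).
    """
    hits = []
    for (label_a, xyz_a), (label_b, xyz_b) in combinations(sorted(cys_coords.items()), 2):
        distance = math.dist(xyz_a, xyz_b)
        if low <= distance <= high:
            hits.append((label_a, label_b, round(distance, 2)))
    return hits


# Hypothetical coordinates for illustration only.
coords = {
    "23": (0.0, 0.0, 0.0),
    "88": (3.0, 4.0, 0.0),
    "110": (20.0, 0.0, 0.0),
}
print(cys_pairs_in_range(coords))
```

Pairs reported by this screen would then be checked against the CDR-replaced template: only where the template has no residue at the corresponding position, and the position lies outside the CDRs, is serine inserted.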
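5. Light-chain E-F loop treatment, comprising the following steps:
5.1 BLAST is used to search the UniRef90 database with the human antibody light-chain template sequence to obtain a light-chain multiple sequence alignment (MSA).
5.2 From the MSA obtained in step 5.1, a position-specific scoring matrix (PSSM) of the light-chain E-F loop is constructed.
5.3 From the PSSM obtained in step 5.2, several E-F loop candidate sequences are randomly generated according to the probability of occurrence of each amino acid at each position. The E-F loop candidate sequences are (10 shown as an example):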

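5.4 The E-F loop candidate sequences generated in step 5.3 are substituted in turn into the light-chain sequence obtained in step 4.3 (SEQ ID NO: 20), producing several candidate sequences. Taking the substitution of SSLQPED (SEQ ID NO: 22) as an example, one candidate sequence is listed schematically: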
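Steps 5.2 and 5.3 can be sketched as building a per-position frequency table from aligned loop segments and sampling from it. This is a simplified stand-in. A PSSM is often stored as log-odds scores, whereas this sketch keeps plain frequencies, which is all that sampling "according to the probability of occurrence" requires. Apart from SSLQPED, which appears in the text, the aligned loops and all function names are hypothetical.

```python
import random
from collections import Counter


def column_frequencies(aligned_loops):
    """Per-position residue frequencies from equal-length aligned loop segments."""
    length = len(aligned_loops[0])
    if any(len(loop) != length for loop in aligned_loops):
        raise ValueError("all loop segments must have the same aligned length")
    frequencies = []
    for column in zip(*aligned_loops):
        counts = Counter(column)
        frequencies.append({aa: n / len(column) for aa, n in counts.items()})
    return frequencies


def sample_loops(frequencies, n_candidates, seed=0):
    """Draw loop candidates position by position, weighted by frequency."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_candidates):
        residues = [
            rng.choices(list(column), weights=list(column.values()))[0]
            for column in frequencies
        ]
        candidates.append("".join(residues))
    return candidates


# Hypothetical MSA excerpt restricted to the loop columns.
msa_loops = ["SSLQPED", "SSLQPED", "SGLQAED", "SSVQPED"]
freqs = column_frequencies(msa_loops)
candidates = sample_loops(freqs, n_candidates=10)
print(candidates)
```

Each sampled loop would then replace the loop segment of the CYS-treated chain to give one candidate sequence. The same procedure applies to the heavy-chain D-E loop.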
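6. Heavy-chain D-E loop treatment, comprising the following steps:
6.1 BLAST is used to search the UniRef90 database with the human antibody heavy-chain template sequence to obtain a heavy-chain MSA.
6.2 From the MSA obtained in step 6.1, a PSSM of the heavy-chain D-E loop is constructed.
6.3 From the PSSM obtained in step 6.2, several D-E loop candidate sequences are randomly generated according to the probability of occurrence of each amino acid at each position. The D-E loop candidate sequences are (10 shown as an example):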

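6.4 The D-E loop candidate sequences generated in step 6.3 are substituted in turn into the heavy-chain sequence obtained in step 4.3 (SEQ ID NO: 21), producing several candidate sequences. Taking the substitution of VDTSKN (SEQ ID NO: 32) as an example, one candidate sequence is listed schematically:
7. Back mutation, comprising the following steps:
7.1 BLAST is used to search the UniRef90 database with the light-chain and heavy-chain variable region sequences of the parental rabbit monoclonal antibody (SEQ ID NO: 2 and SEQ ID NO: 3) separately, giving light-chain and heavy-chain MSAs, from which the corresponding PSSMs are derived.
7.2 From the PSSMs obtained in step 7.1, the highly conserved sites of the light-chain and heavy-chain variable regions are identified. These sites are:
Heavy chain: 13P, 17L, 19L, 21C.
Light chain: 1A, 5T, 6Q, 16G, 23C.
7.3 At each site obtained in step 7.2, it is determined whether the amino acid at the corresponding position of the candidate sequences obtained in steps 5.4 and 6.4 is identical to that of the parental rabbit monoclonal antibody sequence. If not, it is replaced with the amino acid at the corresponding position of the parental rabbit monoclonal antibody sequence. At this point, the light-chain and heavy-chain variable region sequences are (one candidate antibody as an example):
Light-chain variable region sequence after back mutation:
Heavy-chain variable region sequence after back mutation:
The underlined portion indicates amino acids restored by back mutation. The first position of the light chain was back-mutated, and the heavy chain was not modified.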
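The back-mutation rule of step 7.3 can be sketched as a position-wise comparison. This sketch treats the conserved sites as 1-based indices into equal-length aligned strings, which is an assumption about the coordinate system, and `back_mutate` is a hypothetical name. The example uses the first 23 light-chain residues from the numbered sequences above, a framework segment that CDR grafting and E-F loop replacement leave unchanged.

```python
def back_mutate(candidate, parent, conserved_positions):
    """Restore the parental residue at each conserved position where they differ.

    candidate and parent are aligned strings of equal length; positions are
    1-based. Returns the mutated sequence and the list of changed positions.
    """
    if len(candidate) != len(parent):
        raise ValueError("candidate and parent must be aligned to equal length")
    residues = list(candidate)
    changed = []
    for position in conserved_positions:
        if residues[position - 1] != parent[position - 1]:
            residues[position - 1] = parent[position - 1]
            changed.append(position)
    return "".join(residues), changed


rabbit_light_fr1 = "AAVLTQTPSPVSVTVGGTVTINC"  # parental positions 1-23
human_light_fr1 = "DIQMTQSPSSLSASVGDTVTITC"   # template positions 1-23
light_conserved = [1, 5, 6, 16, 23]           # sites listed in step 7.2

mutated, changed = back_mutate(human_light_fr1, rabbit_light_fr1, light_conserved)
print(mutated, changed)
```

Only position 1 differs between template and parent at the conserved sites, which agrees with the statement that the first position of the light chain was back-mutated.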
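Example 4 Evaluation of candidate sequences using the humanized antibody sequence evaluation model
Evaluating the candidate sequences with the humanized antibody sequence evaluation model comprises the following steps:
All candidate humanized antibody sequences to be evaluated, as finally obtained in Example 3, are encoded according to the humanized antibody sequence encoding method of Example 2.
Using the humanized antibody sequence evaluation model constructed in Example 1, all candidate humanized antibody sequences to be evaluated are scored. The meaning of each symbol is exactly as described in Example 1.
All scored candidate humanized antibody sequences are sorted by their score Score_target from high to low.
The five highest-scoring light chains and the five highest-scoring heavy chains are selected. The scores are shown in Table 1:
Table 1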
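The scoring and ranking of this example can be sketched as follows. The formulas of Example 1 are not reproduced in this section, so the sketch rests on stated assumptions: Shannon entropy with base-2 logarithm per position, a weight `(1 - e_pos / log2(21)) / N` that decreases with entropy, and a score equal to the weighted sum of the observed residue's probability. Only the negative correlation between weight and entropy, the alphabet size of 21, and the symbols w_pos, e_pos, N and p_aa come from the text. The exact functional forms, function names, and toy sequences are hypothetical.

```python
import math
from collections import Counter

ALPHABET_SIZE = 21  # 20 amino acids plus the gap symbol


def column_profiles(aligned_templates):
    """Per-position probabilities from aligned human template sequences."""
    return [
        {aa: n / len(column) for aa, n in Counter(column).items()}
        for column in zip(*aligned_templates)
    ]


def entropy(profile):
    """Shannon entropy (bits) of one position; assumed form."""
    return -sum(p * math.log2(p) for p in profile.values() if p > 0)


def position_weights(profiles):
    """Assumed weight: high for conserved (low-entropy) positions."""
    max_entropy = math.log2(ALPHABET_SIZE)
    n = len(profiles)
    return [(1 - entropy(p) / max_entropy) / n for p in profiles]


def score(sequence, profiles, weights):
    """Assumed score: sum over positions of weight times residue probability."""
    return sum(w * prof.get(aa, 0.0) for aa, prof, w in zip(sequence, profiles, weights))


def top_n(candidates, profiles, n=5):
    weights = position_weights(profiles)
    return sorted(candidates, key=lambda s: score(s, profiles, weights), reverse=True)[:n]


# Toy aligned templates and candidates, for illustration only.
profiles = column_profiles(["ACD", "ACE", "ACD", "AGD"])
ranked = top_n(["AGE", "ACD", "ACE"], profiles, n=2)
print(ranked)
```

Under these assumptions a candidate is rewarded most for matching the common human residue at well-conserved positions, while mismatches at variable positions cost little.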

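Example 5 Computational simulation and biological verification of antibody-antigen binding
1. Computational simulation of antibody-antigen binding, comprising the following steps:
1.1 The five highest-scoring light chains and the five highest-scoring heavy chains obtained in Example 4 are combined in all pairwise light-heavy pairings.
1.2 ABodyBuilder is used to predict the structures of the paired antibodies of step 1.1.
1.3 AlphaFold II is used to predict the antigen structure.
1.4 ZDOCK is used to simulate binding between each antibody structure predicted in step 1.2 and the antigen structure predicted in step 1.3.
1.5 The simulation results of step 1.4 are ranked, and the three best-ranked combinations are selected.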
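The pairing and selection logic of steps 1.1 and 1.5 can be sketched as below. Structure prediction and docking are not performed here. The docking scores are placeholders, the helper names are hypothetical, and the sketch assumes that a higher docking score ranks better. The naming pattern follows the antibody names used in this document (for example 81E11H3L1).

```python
from itertools import product


def pair_chains(clone, heavy_ids, light_ids):
    """All heavy-light pairings, named like '81E11H3L1'."""
    return [f"{clone}{h}{l}" for h, l in product(heavy_ids, light_ids)]


def best_pairs(docking_scores, k=3):
    """Names of the k best-scoring pairings (assumes higher score is better)."""
    return sorted(docking_scores, key=docking_scores.get, reverse=True)[:k]


heavy_ids = [f"H{i}" for i in range(1, 6)]
light_ids = [f"L{i}" for i in range(1, 6)]
pairs = pair_chains("81E11", heavy_ids, light_ids)

# Placeholder scores for illustration only; real values would come from docking.
placeholder_scores = {name: float(index) for index, name in enumerate(pairs)}
selected = best_pairs(placeholder_scores, k=3)
print(len(pairs), selected)
```

Five heavy chains and five light chains give 25 pairings, which matches the 25 humanized antibodies reported for clone 81E11.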
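2. The three antibodies from step 1.5, named 81E11H3L1, 81E11H1L1 and 81E11H4L1, are verified biologically, specifically comprising the following steps:
2.1 A signal peptide is added to the N-terminus of both the heavy-chain variable region and the light-chain variable region of each candidate antibody. The signal peptide sequence is:
2.2 A constant region sequence is added to the C-terminus of the heavy-chain variable region to form the complete heavy chain. The constant region sequence is:
2.3 A constant region sequence is added to the C-terminus of the light-chain variable region to form the complete light chain. The constant region sequence is:
Example 6 Verification of humanized antibodies of the 4-1BB agonist antibody
Unless otherwise specified, the experimental methods in the following examples are conventional methods. Unless otherwise specified, the test materials used in the following examples were purchased from conventional biochemical reagent companies. All quantitative tests in the following examples were performed in triplicate, and the results were averaged.
The present application uses a 4-1BB agonist antibody. Thirty-one positive clones were obtained by immunizing rabbits. Taking clone 81E11 as an example, a total of 25 humanized antibodies were obtained by the steps described above. Antibody-antigen binding was simulated computationally for the humanized antibodies obtained, and the three best-ranked antibodies were selected for expression and tested in an ELISA binding assay to verify positive binding to the antigen.
Antibody affinity ELISA: the full antibody sequences were ordered from the Antibody Department of Nanjing GenScript Biotech Co., Ltd. (南京金斯瑞生物科技有限公司) for full antibody synthesis and antigen-antibody affinity testing. The affinity assay is an indirect enzyme-linked immunosorbent assay (ELISA), used here to assess the binding of purified antibody to the antigen TNFRSF9 protein. The specific steps were as follows. ELISA plates were coated overnight at 4 °C with 100 μl/well of recombinant TNFRSF9 protein at 0.5 μg/ml in PBS. The plates were washed with PBS-T (0.05% Tween) and blocked with 250 μl/well of PBST containing 1% BSA at 37 °C for 2 hours. The blocking solution was then discarded, 100 μl of purified antibody at 1 μg/ml was added to the first well and serially diluted 3-fold, giving 11 test concentrations plus one blank well. The plates were then incubated at 37 °C for 1 hour. The plates were washed three times with PBST and incubated with 100 μl/well of horseradish peroxidase-conjugated goat anti-mouse IgG (Fc-specific) secondary antibody (Jackson, 115-035-071) at 37 °C for 0.5 hour. The plates were washed four times with PBST, then TMB chromogenic solution (GenScript) was added and the plates were incubated at 25 °C in the dark for 15 minutes. The reaction was stopped by adding 50 μl of 1 M HCl stop solution (Sinopharm (国药), 10011018). The plates were read at 450 nm with a microplate reader. The results of the ELISA binding assay are shown in FIG. 6. Experimental verification yielded binding-positive humanized antibodies, demonstrating the effectiveness of the antibody humanization method described in the present application.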
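As a worked illustration of the dilution scheme above, the 11 test concentrations follow from a starting concentration of 1 μg/ml and a 3-fold dilution factor. The four-parameter logistic function is included only as a commonly used model for ELISA binding curves. The source does not state how the EC50 values were derived, so its use here is an assumption and all function names are hypothetical.

```python
def dilution_series(start, fold, n_points):
    """Concentrations of a serial dilution, highest first."""
    return [start / fold**i for i in range(n_points)]


def four_pl(x, bottom, top, ec50, hill):
    """Four-parameter logistic response at concentration x (assumed model)."""
    return bottom + (top - bottom) / (1 + (ec50 / x) ** hill)


# 1 ug/ml starting concentration, 3-fold dilutions, 11 test concentrations.
series = dilution_series(1.0, 3, 11)
for concentration in series:
    print(f"{concentration:.6g} ug/ml")
```

The lowest test concentration is 1/3^10 μg/ml, roughly 1.7e-5 μg/ml, so the series spans close to five orders of magnitude. In the four-parameter model, the response at x = EC50 lies halfway between the bottom and top plateaus.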
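The EC50 values are shown in Table 2.
Table 2
The results in Table 2 show that, relative to the control 81E11WTH, the three humanized antibodies 81E11H1L1, 81E11H3L1 and 81E11H4L1 have good binding activity, and the loss of affinity is within an acceptable range.
The method for constructing a humanized antibody sequence evaluation model, the method for obtaining candidate humanized antibody sequences by evaluation, and the method for humanizing a monoclonal antibody disclosed in the present application provide beneficial effects including, but not limited to, the following. (1) The humanized antibody sequence evaluation model constructed in the present application can evaluate multiple candidate humanized sequences and ultimately yield effective humanized antibodies. The model is applicable to humanization evaluation of antibodies from all species, has low computational requirements, is inexpensive and takes little time. (2) The method for obtaining candidate humanized antibody sequences by evaluation uses an entropy-based humanized antibody sequence evaluation model, which can score the sequences to be evaluated and screen out effective humanized antibody sequences. (3) The method for humanizing a monoclonal antibody yields humanized antibodies that bind the antigen, with loss of affinity after humanization within an acceptable range. It is effective and can be used for diagnosis and detection, antibody imaging, and the treatment of diseases sensitive to monoclonal-antibody-based therapies. It should be noted that different embodiments may produce different beneficial effects; in different embodiments, the beneficial effects may be any one or a combination of the above, or any other beneficial effect that may be obtained.
Those skilled in the art should understand that the above examples are only illustrative of the present application and do not limit it. Any modifications, equivalent substitutions and variations made within the spirit and principles of the present application shall fall within its scope of protection.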

Claims (18)

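  1. A method for constructing a humanized antibody sequence evaluation model, characterized by comprising:
    obtaining a plurality of human antibody template amino acid sequences and numbering them;
    calculating an entropy value at each numbered position;
    constructing the humanized antibody sequence evaluation model based on the entropy values.
  2. The method according to claim 1, characterized in that the entropy value is determined by a position-specific scoring matrix, wherein the method further comprises:
    determining a multiple sequence alignment of the plurality of human antibody template amino acid sequences according to the longest numbered sequence, wherein vacant positions are filled by inserting a symbol;
    constructing the position-specific scoring matrix of the plurality of human antibody template amino acid sequences based on the multiple sequence alignment.
  3. The method according to claim 1 or 2, characterized in that the entropy value is calculated by the following formula:
    wherein n is the total number of amino acid types and inserted symbols occurring at a given numbered position, with a maximum of 21; i is the index over n; and p_i is the probability of occurrence of the i-th amino acid at that position as obtained from the position-specific scoring matrix.
  4. The method according to any one of claims 1-3, characterized in that constructing the humanized antibody sequence evaluation model based on the entropy values comprises:
    determining a weight at each numbered position, the weight being negatively correlated with the entropy value.
  5. The method according to claim 4, characterized in that the weight is calculated by the following formula:
    wherein w_pos is the weight of a given numbered position, e_pos is the entropy value at that position, and N is the annotated numbering length of the longest sequence.
  6. The method according to claim 5, characterized in that the humanized antibody sequence evaluation model is expressed by the following formula:
    wherein Score_target denotes the output evaluation value, w_pos is the weight of a given numbered position of the humanized antibody sequence, and p_aa denotes the probability of occurrence of the amino acid at that position.
  7. The method according to any one of claims 1-6, characterized in that the humanized antibody sequence evaluation model is used to evaluate humanized sequences of antibodies from all species.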
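  8. A method for obtaining a candidate humanized antibody sequence by evaluation, characterized by comprising:
    numbering a humanized antibody sequence to be evaluated;
    determining the weight and the probability of occurrence of the amino acid at each numbered position of the humanized antibody sequence to be evaluated;
    based on the weights and the probabilities of occurrence of the amino acids, evaluating the humanized antibody sequence to be evaluated using the humanized antibody sequence evaluation model constructed by the method according to claim 6, to obtain an evaluation value;
    if the evaluation value satisfies a preset condition, determining the humanized antibody sequence to be evaluated that corresponds to the evaluation value as a candidate humanized antibody sequence.
  9. The method according to claim 8, characterized in that the preset condition is that the evaluation value is greater than a preset threshold, or that, when the evaluation values are ranked, the rank is above a certain value.
  10. The method according to claim 8 or 9, further comprising determining a target humanized antibody sequence by:
    predicting the antibody structure of the candidate humanized antibody sequence;
    simulating the binding of the candidate antibody structure to the corresponding antigen to obtain candidate antibody structures;
    selecting candidate antibody structures for biological experimental verification;
    determining the target humanized antibody sequence according to the results of the biological experimental verification.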
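  11. A method for humanizing a monoclonal antibody, characterized by comprising:
    determining a human antibody light-chain template sequence and a human antibody heavy-chain template sequence for the monoclonal antibody, respectively;
    replacing the corresponding CDRs of the human antibody light-chain template sequence and the human antibody heavy-chain template sequence with the CDRs of the light chain and heavy chain of the monoclonal antibody, respectively, to obtain a CDR-replaced humanized antibody light-chain template sequence and a CDR-replaced humanized antibody heavy-chain template sequence;
    performing E-F loop treatment on the CDR-replaced humanized antibody light-chain template sequence to obtain a plurality of candidate humanized antibody light-chain sequences;
    performing D-E loop treatment on the CDR-replaced humanized antibody heavy-chain template sequence to obtain a plurality of candidate humanized antibody heavy-chain sequences.
  12. The method according to claim 11, characterized in that performing E-F loop treatment on the CDR-replaced humanized antibody light-chain template sequence to obtain a plurality of candidate humanized antibody light-chain sequences comprises:
    searching a database with the CDR-replaced humanized antibody light-chain template sequence to obtain a multiple sequence alignment;
    constructing a position-specific scoring matrix of the E-F loop according to the multiple sequence alignment;
    generating a plurality of E-F loop sequences according to the position-specific scoring matrix of the E-F loop;
    substituting the plurality of E-F loop sequences into the CDR-replaced humanized antibody light-chain template sequence to obtain the plurality of candidate humanized antibody light-chain sequences.
  13. The method according to claim 11, characterized in that performing D-E loop treatment on the CDR-replaced humanized antibody heavy-chain template sequence to obtain a plurality of candidate humanized antibody heavy-chain sequences comprises:
    searching a database with the CDR-replaced humanized antibody heavy-chain template sequence to obtain a multiple sequence alignment;
    constructing a position-specific scoring matrix of the D-E loop according to the multiple sequence alignment;
    generating a plurality of D-E loop sequences according to the position-specific scoring matrix of the D-E loop;
    substituting the plurality of D-E loop sequences into the CDR-replaced humanized antibody heavy-chain template sequence to obtain the plurality of candidate humanized antibody heavy-chain sequences.
  14. The method according to any one of claims 11-13, characterized in that the method further comprises:
    performing back mutation at highly conserved sites in the plurality of candidate humanized antibody light-chain sequences and in the plurality of candidate humanized antibody heavy-chain sequences, respectively.
  15. The method according to claim 14, characterized in that performing back mutation at highly conserved sites in the plurality of candidate humanized antibody light-chain sequences and in the plurality of candidate humanized antibody heavy-chain sequences, respectively, comprises:
    determining the highly conserved sites in the light-chain and heavy-chain sequences of the monoclonal antibody, respectively;
    determining whether, at the highly conserved sites, the plurality of candidate humanized antibody light-chain sequences and the plurality of candidate humanized antibody heavy-chain sequences are identical to the amino acids of the light chain and heavy chain of the monoclonal antibody, respectively;
    if not, replacing the amino acids at the highly conserved sites of the plurality of candidate humanized antibody light-chain sequences and the plurality of candidate humanized antibody heavy-chain sequences with the amino acids at the corresponding positions of the light chain and heavy chain of the monoclonal antibody.
  16. The method according to any one of claims 11-15, wherein the monoclonal antibody is a rabbit monoclonal antibody, the method further comprising:
    predicting the three-dimensional structures of the light chain and heavy chain of the rabbit monoclonal antibody;
    according to the predicted three-dimensional structures, if a cysteine pair is present in the rabbit monoclonal antibody, the distance between the cysteines of the pair is between 4 Å and 7 Å, and there is no amino acid at the corresponding position of the CDR-replaced humanized antibody light-chain template sequence and the humanized antibody heavy-chain template sequence, inserting serine at said position, said position excluding the CDRs; and
    if only a single cysteine is present in the rabbit monoclonal antibody and there is no amino acid at the corresponding position of the CDR-replaced humanized antibody light-chain template sequence and the humanized antibody heavy-chain template sequence, inserting serine at said position.
  17. The method according to any one of claims 11-16, characterized in that the method further comprises:
    evaluating the light-chain sequences and heavy-chain sequences of the candidate humanized antibodies separately using the humanized antibody sequence evaluation model constructed by the method according to claim 6, to obtain the light-chain sequence and heavy-chain sequence of a humanized antibody.
  18. The method according to claim 17, characterized in that the method further comprises:
    performing biological experimental verification on the humanized antibody to determine the target humanized antibody sequence.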
PCT/CN2023/127070 2022-10-28 2023-10-27 Method for constructing a humanized antibody sequence evaluation model and its application WO2024088381A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211335547.8 2022-10-28
CN202211335547 2022-10-28

Publications (1)

Publication Number Publication Date
WO2024088381A1 (zh) 2024-05-02

Family

ID=90830120

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/127070 WO2024088381A1 (zh) 2022-10-28 2023-10-27 Method for constructing a humanized antibody sequence evaluation model and its application

Country Status (1)

Country Link
WO (1) WO2024088381A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
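CN1839144A (zh) * 2003-08-07 2006-09-27 宜康公司 Method for humanizing rabbit monoclonal antibodies
CN103145834A (zh) * 2013-01-17 2013-06-12 广州泰诺迪生物科技有限公司 Antibody humanization method
CN103265631A (zh) * 2013-05-07 2013-08-28 中国人民解放军第四军医大学 Heavy-chain and light-chain variable regions of an anti-human CRT monoclonal antibody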

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23881956

Country of ref document: EP

Kind code of ref document: A1