WO2024046205A1 - High-throughput identification method for antibody epitopes based on polypeptide chips - Google Patents

High-throughput identification method for antibody epitopes based on polypeptide chips

Info

Publication number
WO2024046205A1
Authority
WO
WIPO (PCT)
Prior art keywords
amino acid
subsequence
signal
short
spatial
Prior art date
Application number
PCT/CN2023/114686
Other languages
English (en)
French (fr)
Inventor
王俊
李英睿
刘兵行
解春兰
纪兴文
李丹妮
熊邦柱
Original Assignee
珠海碳云智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 珠海碳云智能科技有限公司
Publication of WO2024046205A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/543 Immunoassay; Biospecific binding assay; Materials therefor with an insoluble carrier for immobilising immunochemicals
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs

Definitions

  • The invention belongs to the field of biotechnology, and specifically relates to a high-throughput method for identifying antibody epitopes based on polypeptide chips.
  • Antigenic epitopes are special chemical groups in antigen molecules that determine antigen specificity. Because protein antigens are structurally complex, they often contain a variety of different epitopes. Epitopes formed by short peptides of consecutive, linearly arranged amino acid residues are linear epitopes; epitopes formed by amino acids that are discontinuous in sequence but form a specific conformation in space are called spatial epitopes. Antigen epitopes are the basis of protein antigenicity, so in-depth research on protein antigen epitopes is of great significance for the diagnosis and prognosis of disease, the targeted modification of protein molecules to reduce the immunogenicity of protein drugs, the design of artificial vaccines without toxic side effects, and immune intervention treatments.
  • The antigen protein is synthesized.
  • The protein DNA sequence is usually inserted into a plasmid vector and then transfected into active cells such as E. coli for expression. After expression, centrifugation, column chromatography, and column separation are performed. The same approach is then used to synthesize the heavy and light chains of the antibody.
  • The two are mixed and incubated at a 1:1 molar ratio, and the antigen-antibody complex is then separated by size-exclusion chromatography for crystallization. X-ray irradiation is then used to obtain the spatial structure from the diffraction pattern (this approach contributes 90% of the structures in the PDB database).
  • Single-particle cryo-electron microscopy (cryo-EM)
  • The display vector can be a bacterial vector, such as a staphylococcal display vector or pSCEM1, or a phage such as pHORF. If a bacterial vector is used, antibodies are added to the cell culture medium, flow cytometry is used to separate the cells bound to the antibodies, and these cells are then sequenced. The sequencing data are linearly aligned with the antigen sequence to locate the epitope region.
  • the antigen protein is displayed on the phage surface as a fusion protein.
  • the library is then combined with the antibodies coated in ELISA, and the phages that can bind the antibodies are isolated, and then infected with host bacteria (such as E.coli ER2738) for amplification, and the next round of antibody binding-elution-amplification is performed. After 3-5 rounds of this process, a high degree of enrichment of phages that can bind to the antibody can be obtained. The enriched phages are then sequenced to obtain epitope sequences. Another idea is to integrate a large number of random sequences into the expression system.
  • the Ph.D-12 phage display system integrates a DNA sequence encoding a 12-amino acid random peptide into the capsid protein of the M13 phage.
  • The library theoretically has about 10^9 transformation sequences.
  • VirScan integrates a DNA sequence encoding 56 amino acids into the T7 bacteriophage display system. Both ideas are used to try to identify linear and spatial epitopes.
  • The library theoretically has 10^8 sequences.
  • These display systems have three main development directions in the field of antibodies: 1) challenging spatial epitopes, 2) improving throughput, and 3) studying the population virome. The third direction usually assumes that a linear epitope is identified, so it directly searches for which peptides appear in the protein sequences of which viruses.
  • the first direction challenges whether this method can identify spatial epitopes.
  • Johan Rockberg first demonstrated in Nature Methods in 2008 that the bacterial display system can identify linear epitopes of monoclonal antibodies and polyclonal antibodies.
  • In 2012, he published an article in Scientific Reports demonstrating that linear and spatial epitopes can be identified.
  • Another team published an article in 2017 proving that the signal peptide obtained by the expression system is formed into a motif, and then the motif is spatially aligned to the antigen protein, and the structural epitope can be identified.
  • the second direction challenges whether random sequence library construction can identify epitopes with high throughput.
  • Short peptide chips have a clearly defined peptide space, and the experimental procedure is simple and stable. Many studies have shown that short peptide chips can be used to identify linear epitopes. There are currently two approaches to identifying epitopes from chip-based short peptides. One is to linearly align short peptides to the antigen sequence and then enrich and score the aligned regions; the regions that are significantly enriched and have the highest scores are the epitope regions. The other is to look for commonalities among short peptides to form a motif, and then align the motif to the linear protein sequence to obtain the epitope region. However, it is not clear whether this method can identify spatial epitopes, how to identify spatial epitopes accurately, or to what extent spatial epitopes can be identified.
  • The solution proposed by the present invention aims to solve the problem of whether various epitopes, especially spatial epitopes, can be identified using polypeptide chips.
  • The present disclosure provides a method, based on polypeptide chips, that generates binding amino acid identification sequences to identify antigen spatial epitopes and main binding sites. It is intended to be used to identify the spatial epitopes of antigens bound by antibodies and to screen out the primary binding sites on those epitopes.
  • the present disclosure provides a method of identifying an antigenic epitope, comprising:
  • the key binding sites of the antigenic epitopes contained in the spatial alignment region are identified.
  • the present disclosure provides a method of generating an amino acid identification sequence of a specific length, comprising:
  • the second subsequence is an amino acid identifier subsequence that contains a preset number of amino acid identifiers, where those identifiers appear non-contiguously in the amino acid identification sequence
  • a blank-setting operation is performed and a second subsequence set is generated, including:
  • the first and last amino acid identifiers of each amino acid identification sequence in the amino acid identification sequence set are not set to blank, and it is determined whether the number of remaining amino acid identifiers in the second subsequence generated after adding blanks is greater than or equal to the preset number of added blanks, where the number of remaining amino acid identifiers excludes the amino acid identifiers at the first and last positions;
  • the blank-setting operation includes:
  • n is the preset number of blanks
  • the amino acid labeling operation includes:
  • L is the preset length of the second subsequence
  • n is the preset number of blanks.
  • Figure 1 shows the binding spectra of four antibodies Campath-1H, 9E10, 6A7 and Nivolumab.
  • The abscissa is the Zscore value of the antibody binding to the peptide chip, and the ordinate is the second signal intensity value of the antibody binding to the peptide chip.
  • The color of each point represents the linear alignment result of the key amino acid identification sequence against the target antigen protein: black marks key amino acid identification sequences that can be linearly aligned with the antigen protein sequence, and gray marks those that cannot.
  • After each of these four antibodies bound to the peptide chip, there were 4aa key amino acid identification sequences with relatively large Zscore and second signal intensity values that could be linearly aligned with the antigen protein.
  • Figure 2 shows the antibody binding spectra of six antibodies a.4.6.1, F11.2.32, e111, Golimumab, Canakinumab and Bevacizumab.
  • The abscissa is the Zscore value of the antibody binding to the peptide chip, and the ordinate is the second signal intensity value of the antibody binding to the peptide chip.
  • The color of each point represents the linear alignment result of the key amino acid identification sequence against the target antigen protein: black marks key amino acid identification sequences that can be linearly aligned with the antigen protein sequence, and gray marks those that cannot.
  • Figure 3 shows the binding spectra of seven poorly binding antibodies S309, hu5c8-Ruplizumab, rhPM-1-Tocilizumab, hu1124-Efalizumab, Certolizumab, D2E7-Adalimumab and CR3022, where the abscissa is the Zscore value of the antibody binding to the peptide chip and the ordinate is the second signal intensity value of the antibody binding to the peptide chip.
  • The color of each point represents the linear alignment result of the key amino acid identification sequence against the target antigen protein: black marks key amino acid identification sequences that can be linearly aligned with the antigen protein sequence, and gray marks those that cannot. None of these seven antibodies has a 4aa key amino acid identification sequence with large Zscore and second signal intensity values, and the number of 4aa key amino acid identification sequences is extremely small.
  • Figure 4 shows the coverage of known sites by sites identified by this method and sites supported by key amino acid identification sequences.
  • Figure 5 shows the spatial structure of Bax protein predicted by AlphaFold2.
  • The amino acids at positions 13-18 have a large exposed area and are connected into a ring.
  • I19 (black) is located deep in the depression, and the area it exposes on the surface of the intact protein is extremely small.
  • Figure 6 shows part of the spatial structure of TNFα protein predicted by AlphaFold2.
  • The gray + black area is the area formed by known binding sites, and the black area is the area formed by identified binding sites.
  • Figure 7 shows the relationship between the binding strength information of the Golimumab antibody binding site and the site score in the spatial epitope identification results.
  • The position shown here is the position in the intact protein minus 76, and the coordinates are consistent with the coordinates of spatially bound free TNFα.
  • BSA: buried surface area
  • DeltaG: solvation energy effect
  • Figure 8 shows the spatially resolved structure of segment 299-365 of E protein, from the spatial structure PDB: 4FFZ.
  • Gray + black marks the known spatial binding sites, which almost completely wrap this part of the structure. Black marks the three sites where hydrogen bonds and salt bridges form.
  • Figure 9 shows the relationship between the binding strength information of the e111 antibody binding site and the site score in the spatial epitope identification results. Based on the spatially resolved structure, the buried surface area (BSA) and solvation energy effect (DeltaG) of each site can be obtained.
  • the term “comprises” or “includes” means the inclusion of the stated element, integer or step, but not the exclusion of any other element, integer or step.
  • When the term "comprises" or "includes" is used herein, it also encompasses a combination of the stated elements, integers, or steps unless otherwise specified.
  • The term "antibody" is used in the broadest sense and specifically encompasses monoclonal antibodies (including full-length monoclonal antibodies), polyclonal antibodies, multispecific antibodies (e.g., bispecific antibodies), antibodies carrying one or more CDRs or antibody fragments or synthetic polypeptides derived from CDR sequences, as long as these polypeptides exhibit the desired biological activity.
  • Antibodies (Abs) and immunoglobulins (Igs) are glycoproteins with the same structural characteristics.
  • "Antibody" may also refer to immunoglobulins and immunoglobulin fragments, whether natural or partially or fully synthetically (e.g., recombinantly) produced, including any fragment that comprises at least a portion of the variable region of the immunoglobulin molecule and retains the binding specificity of the full-length immunoglobulin.
  • An antibody includes any protein having a binding domain that is homologous or substantially homologous to an immunoglobulin antigen-binding domain (antibody binding site).
  • Antibodies include antibody fragments, such as anti-tumor stem cell antibody fragments.
  • The term "antibody" therefore includes synthetic antibodies, recombinantly produced antibodies, multispecific antibodies (e.g., bispecific antibodies), human antibodies, non-human antibodies, humanized antibodies, chimeric antibodies, intrabodies, and antibody fragments, such as but not limited to Fab fragments, Fab' fragments, F(ab')2 fragments, Fv fragments, disulfide-linked Fv (dsFv), Fd fragments, Fd' fragments, single-chain Fv (scFv), single-chain Fab (scFab), diabodies, anti-idiotypic (anti-Id) antibodies, or antigen-binding fragments of any of the above antibodies.
  • Antibodies provided herein include any immunoglobulin type (e.g., IgG, IgM, IgD, IgE, IgA, and IgY), any class (e.g., IgG1, IgG2, IgG3, IgG4, IgA1 and IgA2) or subclass (e.g., IgG2a and IgG2b) ("type" and "class", and "subtype" and "subclass", are used interchangeably herein).
  • Native or wild-type (i.e., derived from members of a population that have not been artificially manipulated) antibodies and immunoglobulins are typically heterotetrameric glycoproteins of approximately 150,000 Daltons composed of two identical light chains (L) and two identical heavy chains (H). Each heavy chain has a variable domain (VH) at one end, followed by multiple constant domains. Each light chain has a variable domain (VL) at one end and a constant domain at the other end.
  • "Not artificially manipulated" means that it has not been processed to contain or express foreign antigen-binding molecules.
  • Wild type may refer to the most prevalent allele or species found in a population, or to antibodies derived from unmanipulated animals, as opposed to alleles or polymorphisms, or to variants or derivatives obtained by some form of manipulation, such as mutagenesis or the use of recombinant methods to modify the amino acids of the antigen-binding molecule.
  • The term "monoclonal antibody" refers to a population of identical antibodies, meaning that each individual antibody molecule in the population of monoclonal antibodies is identical to every other antibody molecule. This property is in contrast to that of a polyclonal population of antibodies, which contains antibodies with a variety of different sequences.
  • Monoclonal antibodies can be prepared by a number of well-known methods. For example, monoclonal antibodies can be prepared by immortalizing B cells, for example by fusion with myeloma cells to generate hybridoma cell lines or by infecting B cells with a virus such as EBV. Recombinant techniques can also be used to prepare antibodies in vitro from clonal populations of host cells by transforming the host cells with a plasmid carrying an artificial sequence of nucleotides encoding the antibody.
  • The term "epitope" includes any determinant capable of specific binding to an antigen-binding protein.
  • Antigenic epitope determinants usually consist of chemically active surface groups of molecules, such as amino acids, sugar side chains, phosphoryl or sulfonyl groups, and often have specific three-dimensional structural features and unique charge characteristics.
  • Antigenic epitopes can be "linear" or "spatial" structures. In a linear epitope, all interaction sites between a protein and interacting molecules (such as antibodies) are linearly arranged on the primary amino acid sequence of the protein. In a spatial epitope, the interaction sites occur on amino acid residues that are separated from each other in the protein sequence.
  • The term "amino acid identification sequence" refers to a string sequence derived from the primary amino acid sequence of a protein or peptide chain and representing the primary structure of the protein or peptide chain through specific identification characters, which may include identification characters indicating both amino acids and amino acid positions.
  • The term "identifier" herein refers to characters that can represent amino acids or amino acid positions in short peptides, including letters, numbers, arithmetic symbols, punctuation marks and other symbols, as well as some functional symbols.
  • The term "amino acid identifier" herein refers to characters that can represent amino acids, such as the common single-letter abbreviations of amino acids. Other computer-recognizable characters, such as Greek letters or numbers, can also be used to represent amino acids.
  • For example, the amino acid identification sequence "RHS" contains three amino acid identifiers and represents an amino acid sequence composed of three amino acids: arginine (Arg), histidine (His) and serine (Ser). The amino acid identification sequence "R.S" contains two amino acid identifiers and one placeholder and represents an amino acid sequence consisting of three amino acids: arginine (Arg), an arbitrary amino acid and serine (Ser).
  • The amino acid identifier subsequences of the amino acid identification sequence "RHS" include one subsequence "RHS" composed of three amino acid identifiers and three subsequences "RH", "HS" and "R.S" composed of two amino acid identifiers, where the subsequence "R.S" additionally contains a designated placeholder.
  • The term "seed" refers to the starting point from which iteration begins in the region growing process.
  • The seed used herein is a single amino acid position that can be aligned with an amino acid position on the target protein. For example, if the spatial structure of a short peptide is compared with that of a target protein, and the output of the comparison software shows that an amino acid at a certain position in the short peptide matches an amino acid at the corresponding position in the spatial structure of the target protein, then this single matched amino acid is the above-mentioned site, that is, the seed.
  • Peptide chips are used to detect the signal intensity of monoclonal antibody samples.
  • The smallest loading unit of the chip is a slide, and each slide is sufficient to display all the short peptides of the chip in replicate.
  • A set of diluted lysates of the monoclonal antibody to be tested is added to each slide.
  • A blank group is reserved to extract the baseline signal value. After the signal intensity value of each group is obtained, the first signal intensity value of each short peptide in the chip is calculated. The larger the first signal intensity value, the stronger the signal and the stronger the binding between the short peptide and the antibody.
  • The short peptides in the peptide chip are sorted from large to small by first signal intensity value, and the short peptides whose first signal intensity value is greater than the preset threshold and falls within the preset sorting range are defined as signal short peptides.
  • The first signal intensity value can be the difference between the logarithm of the signal value of the monoclonal antibody group and the logarithm of the signal value of the blank group (the reference signal intensity value), or it can be the ratio of the signal intensity value of the short peptide to the reference signal intensity value.
  • The reference signal intensity value can be the signal intensity value of the reserved blank group. If no blank group is reserved, the generally stable background signal of the short peptides on the chip can be used as the reference signal value; alternatively, depending on the usage scenario, the reference signal value can be set to a customized signal intensity value or to the signal intensity value of a reference product or control sample.
  • The first 500 to 2000 short peptides can be taken as signal short peptides (for example, the first 500, the first 1000 or the first 2000 short peptides).
  • (1) The first signal intensity value is the difference between the logarithmic value of the monoclonal antibody group signal value and the logarithmic value of the blank group signal value (the reference signal intensity value).
  • (2) The first signal intensity value is the ratio of the signal intensity value of the short peptide to the reference signal intensity value.
  • The first 2000 short peptides with a first signal intensity value > 2 are taken as signal short peptides, indicating that these short peptides on the peptide chip can bind the monoclonal antibody to generate a signal.
  • When the base of the logarithm used to obtain the first signal intensity value is 2, the two situations represented by (1) and (2) are the same.
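The equivalence of cases (1) and (2) under a base-2 logarithm can be checked numerically. The following is a minimal sketch, not the patent's implementation: the function names, the peptide names and the intensity values are hypothetical, and the thresholds (log difference > 1, ratio > 2) are taken from the conditions stated elsewhere in this description.

```python
import math

def first_signal_log_diff(signal, reference):
    """Case (1): log2(signal) - log2(reference)."""
    return math.log2(signal) - math.log2(reference)

def first_signal_ratio(signal, reference):
    """Case (2): signal / reference."""
    return signal / reference

# Hypothetical raw intensities (arbitrary units), for illustration only.
reference = 300.0
raw = {"pep_1": 900.0, "pep_2": 500.0, "pep_3": 2400.0}

for name, signal in raw.items():
    by_log = first_signal_log_diff(signal, reference) > 1   # threshold for case (1)
    by_ratio = first_signal_ratio(signal, reference) > 2    # threshold for case (2)
    # Because log2(s) - log2(r) = log2(s / r), both tests agree.
    print(name, by_log, by_ratio)
```

Since log2(s) - log2(r) = log2(s/r), a log difference above 1 is the same statement as a ratio above 2, so either form selects the same short peptides.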
  • To determine which amino acids and amino acid combinations each short peptide uses to bind the antibody, all amino acid identifier subsequences with a length ⁇ 7 are generated from the signal short peptides, including both the first subsequences, which leave no blanks, and the second subsequences, in which blanks are marked with the designated placeholder ".". Each subsequence may contain 1-6 amino acid (aa) identifiers.
  • For example, all subsequences of the amino acid identification sequence ABCD include four 1aa subsequences (A, B, C, D), six 2aa subsequences (AB, A.C, A..D, BC, B.D, CD), four 3aa subsequences (ABC, AB.D, A.CD, BCD) and one 4aa subsequence (ABCD). Of these, "AB", "ABC" and "BCD" are first subsequences, which leave no blanks, whereas "A.C", "A..D" and "AB.D" are second subsequences, which contain blanks.
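The ABCD example can be reproduced with a short enumeration. This is a sketch under the assumption that a subsequence is obtained by choosing any non-empty set of positions and filling the skipped positions between the first and last chosen ones with "."; the function name and the max_identifiers parameter are illustrative, not terms from the disclosure.

```python
from itertools import combinations

def gapped_subsequences(seq, max_identifiers=6, placeholder="."):
    """Group subsequences of `seq` by their number of amino acid identifiers."""
    result = {}
    for k in range(1, min(max_identifiers, len(seq)) + 1):
        subs = []
        for positions in combinations(range(len(seq)), k):
            chosen = set(positions)
            first, last = positions[0], positions[-1]
            # Keep chosen positions, blank out skipped ones inside the span.
            subs.append("".join(seq[i] if i in chosen else placeholder
                                for i in range(first, last + 1)))
        result[k] = subs
    return result

subs = gapped_subsequences("ABCD")
print(subs[2])  # ['AB', 'A.C', 'A..D', 'BC', 'B.D', 'CD']
print(subs[3])  # ['ABC', 'AB.D', 'A.CD', 'BCD']
```

Subsequences without a placeholder correspond to the first subsequences and those containing "." to the second subsequences. For peptides with repeated residues the lists may contain duplicates, which a caller could collapse with a set.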
  • The amino acid positions at the beginning and end of a subsequence cannot be replaced with blanks.
  • Therefore, the number of blanks that can be added to an amino acid identifier subsequence of length L is 1 to L-2.
  • If the number of blanks is n, the number of remaining amino acids after adding the blanks, L-2-n, is compared with the number of blanks n.
  • If the number of remaining amino acids after adding blanks is greater than or equal to the number of blanks, the second subsequence is generated by adding blanks; if the number of remaining amino acids after adding blanks is less than the number of blanks, the second subsequence is generated by adding amino acids.
  • Adding blanks means that, given a full amino acid identification string composed only of amino acids, amino acids are replaced with the designated placeholder, such as ".", generating the various forms of second subsequence one by one, from 1 blank up to n blanks.
  • Adding amino acids means that, given an all-blank string composed only of the designated placeholder, such as ".", placeholders are replaced with amino acids, generating the various forms of second subsequence one by one, from 1 amino acid up to L-2-n amino acids.
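The choice between adding blanks and adding amino acids can be sketched as follows. This is an illustrative reading rather than the disclosed implementation: it handles one fixed number of blanks n for a window of length L, it assumes the two methods are meant to reach the same patterns with fewer replacements, and the function name second_subsequences is hypothetical.

```python
from itertools import combinations

def second_subsequences(window, n, placeholder="."):
    """All forms of `window` with exactly n interior blanks (ends are kept)."""
    L = len(window)
    if not 1 <= n <= L - 2:
        raise ValueError("n must be between 1 and L-2")
    interior = range(1, L - 1)
    remaining = L - 2 - n          # interior amino acids left after blanking
    out = []
    if remaining >= n:
        # Adding blanks: start from the full string, blank n interior positions.
        for blanks in combinations(interior, n):
            chars = list(window)
            for i in blanks:
                chars[i] = placeholder
            out.append("".join(chars))
    else:
        # Adding amino acids: start from an all-blank interior, restore `remaining`.
        for kept in combinations(interior, remaining):
            chars = [window[0]] + [placeholder] * (L - 2) + [window[-1]]
            for i in kept:
                chars[i] = window[i]
            out.append("".join(chars))
    return sorted(out)

print(second_subsequences("ABCDE", 1))  # ['A.CDE', 'AB.DE', 'ABC.E']
print(second_subsequences("ABCDE", 2))  # ['A..DE', 'A.C.E', 'AB..E']
```

With L = 5 and n = 1 there are two remaining interior amino acids, so blanks are added; with n = 2 only one remains, so amino acids are added instead. Either branch enumerates the same set of patterns, and the branch with fewer replacements per pattern is used.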
  • The sample to be tested is subjected to polypeptide chip detection to obtain signal short peptides. Based on the obtained signal short peptides, an amino acid identifier subsequence set is generated, and the number of short peptides containing a given amino acid identifier subsequence is counted and recorded as X;
  • The number of short peptides randomly selected from all the short peptides covered by the chip should be the same as the number of signal short peptides obtained from the sample to be tested;
  • For example, the first 2000 short peptides with a first signal intensity value > 1 are taken as signal short peptides, a set of amino acid identifier subsequences is generated from them, and the number of short peptides containing a given amino acid identifier subsequence is counted.
  • The distribution of the number of occurrences of each subsequence over 10,000 random selections is obtained, its mean (mean) and standard deviation (sd) are calculated, and the Zscore is calculated as Zscore = (X - mean) / sd.
  • The Zscore value reflects whether the subsequence appears in the signal short peptides at random.
  • The larger the Zscore, the more frequently the subsequence appears among the signal short peptides overall, which theoretically indicates that it independently contributes to the binding of antigen and antibody.
  • the Zscore is likely to be too large, so the credibility is low.
  • Zscore > 10 and a second signal intensity value > 3 are selected as the criteria defining the binding amino acid identification sequence.
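The enrichment test described above can be sketched as a random-sampling computation. This is a minimal sketch under stated assumptions: the Zscore is taken to be the standard (X - mean) / sd, the placeholder "." is treated as a regular-expression wildcard, and the function names, the fixed seed and the toy peptide lists are hypothetical.

```python
import random
import re
import statistics

def count_peptides_with(subseq, peptides):
    """Number of peptides containing `subseq`; "." matches any amino acid."""
    pattern = re.compile("".join("." if c == "." else re.escape(c) for c in subseq))
    return sum(1 for p in peptides if pattern.search(p))

def zscore(subseq, signal_peptides, all_peptides, n_draws=10000, seed=0):
    """Assumed form: (X - mean) / sd over random draws of equal size."""
    rng = random.Random(seed)
    x = count_peptides_with(subseq, signal_peptides)
    counts = [
        count_peptides_with(subseq, rng.sample(all_peptides, len(signal_peptides)))
        for _ in range(n_draws)
    ]
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)
    if sd == 0:
        return 0.0  # no background variation: undefined, this sketch returns 0.0
    return (x - mean) / sd

# Toy data: "RHS" is rare on the chip but present in every signal peptide.
chip = ["ARHSG"] * 5 + ["AGGGG"] * 95
signal = ["ARHSG"] * 5
print(zscore("RHS", signal, chip, n_draws=1000))
```

Each draw samples as many short peptides from the whole chip as there are signal short peptides, matching the requirement that the two numbers be equal.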
  • Each short peptide usually has multiple alignment results.
  • Alignment records whose amino acid identification sequence is supported by a key amino acid identification sequence are selected. For example, if the key amino acid identification sequence is RHS and xRHSxx exists in the alignment region, the alignment region is considered to be supported by the key amino acid identification sequence.
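The support check in the RHS example can be expressed as a pattern match. This is a sketch that assumes key amino acid identification sequences consist only of single-letter amino acid codes and the placeholder ".", with "." matching any amino acid; the function name is_supported is hypothetical.

```python
import re

def is_supported(alignment_region, key_sequences):
    """True if any key sequence occurs in the aligned region."""
    for key in key_sequences:
        # Keep "." as a wildcard and escape everything else literally.
        pattern = "".join("." if c == "." else re.escape(c) for c in key)
        if re.search(pattern, alignment_region):
            return True
    return False

print(is_supported("ARHSGG", ["RHS"]))         # True: matches the form xRHSxx
print(is_supported("ARGSGG", ["R.S"]))         # True: "." stands for G here
print(is_supported("ARGGGG", ["RHS", "R.S"]))  # False: no key sequence occurs
```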
  • After the alignment records supported by key amino acid identification sequences are obtained, the alignment records of different short peptides need to be integrated to form an epitope region.
  • Short peptides and spatial alignment records are sorted in descending order by multiple keys: the first signal intensity value of the short peptide and the spatial alignment score.
  • The strong-signal short peptides are sorted in descending order of first signal intensity value, and the spatial alignment records of each strong-signal short peptide are then sorted in descending order of spatial alignment score. The first short peptide in the sorted order is selected, and its aligned sites are used as seeds, which are defined as sites of the existing region; the spatial alignment records of the other, unused strong-signal short peptides are then traversed.
  • Each region has the short peptides and alignment records used to form it.
  • Any site in an alignment record that is supported by a key amino acid identification sequence is a key binding site.
  • For each binding site, the sum of the first signal intensity values of the strong-signal short peptides is the cumulative score of the site.
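The multi-key sort, the choice of seeds and the cumulative site score can be sketched as below. The record layout, the field names and the values are hypothetical, each alignment record is assumed to contribute its peptide's first signal intensity value once per aligned site, and the criterion for merging further records into a region is not covered by this sketch.

```python
from collections import defaultdict

# Hypothetical alignment records: peptide, first signal intensity value,
# spatial alignment score, and the target-protein sites it aligns to.
records = [
    {"peptide": "P1", "signal": 5.0, "align_score": 0.90, "sites": [13, 14, 15]},
    {"peptide": "P2", "signal": 3.0, "align_score": 0.70, "sites": [14, 15, 16]},
    {"peptide": "P3", "signal": 5.0, "align_score": 0.95, "sites": [15, 19]},
]

def rank_records(records):
    """Sort descending by first signal intensity, then by alignment score."""
    return sorted(records, key=lambda r: (r["signal"], r["align_score"]), reverse=True)

def cumulative_site_scores(records):
    """Sum, per site, the first signal intensity of every record covering it."""
    scores = defaultdict(float)
    for r in records:
        for site in set(r["sites"]):
            scores[site] += r["signal"]
    return dict(scores)

ranked = rank_records(records)
seeds = ranked[0]["sites"]  # aligned sites of the top-ranked record
print([r["peptide"] for r in ranked], seeds)
print(cumulative_site_scores(records))
```

In this toy input, P3 ranks first because it ties with P1 on signal but has the higher alignment score, so its sites 15 and 19 become the seeds, and site 15 accumulates the largest score because all three records cover it.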
  • the present disclosure provides a method of identifying an antigenic epitope, comprising:
  • the key binding sites of the antigenic epitopes contained in the spatial alignment region are identified.
  • a collection of amino acid identification sequences in the polypeptide chip that generate a signal when combined with the antibody to be tested is obtained, including:
  • the first signal intensity value of the short peptide is determined based on the signal intensity value of the signal generated after the short peptide binds to the antibody to be tested, wherein the first signal intensity value is within a preset range;
  • a set of amino acid identification sequences is generated based on the amino acid sequence of each short signal peptide.
  • a set of signal short peptides is determined among the short peptides in the polypeptide chip, including:
  • the short peptides whose first signal intensity value satisfies at least one of the following conditions among each short peptide are determined as a signal short peptide set:
  • the first condition: the first signal intensity value is the difference between the logarithmic value of the short peptide's signal intensity value and the logarithmic value of the reference signal intensity value, and the first signal intensity value is > 1;
  • the second condition: the first signal intensity value is the ratio of the signal intensity value of the short peptide to the reference signal intensity value, and the first signal intensity value is greater than the preset ratio threshold;
  • the third condition: the first signal intensity values of the short peptides are sorted from large to small, and the rank of the first signal intensity value is within the preset sorting range;
  • the signal short peptide is a short peptide that satisfies both the first condition and the third condition.
  • the signal short peptide is a short peptide that satisfies both the second condition and the third condition.
  • the signal short peptide is a short peptide whose first signal intensity value satisfies one or more of the following conditions:
  • the first signal intensity value is the difference between the logarithmic value of the short peptide signal and the logarithmic value of the reference signal, and the first signal intensity value is >1;
  • the first signal intensity value is the ratio of the signal value of the short peptide to the reference signal value, and the signal value of the short peptide is more than twice the reference signal value;
  • the reference signal is selected from the blank group signal or the background signal, wherein the blank group signal is the signal of a reserved short peptide that is not bound to the antibody to be tested, and the background signal is the signal of the common short peptide in use.
  • the signal short peptide is a short peptide that ranks in the top 2000 when the short peptides with a first signal intensity value >1 are sorted from large to small, or a short peptide with a first signal intensity value >1 among the top 2000 short peptides when all first signal intensity values are sorted from large to small.
  • the signal short peptide is a short peptide that ranks in the top 2000 when the short peptides whose signal value is more than 2 times the reference signal value are sorted from large to small, or a short peptide whose signal value is more than 2 times the reference signal value among the top 2000 short peptides when all first signal intensity values are sorted from large to small.
  • a set of key amino acid identity sequences is generated based on a set of amino acid identity sequences, including:
  • a set of amino acid identifier subsequences is generated based on a set of amino acid identifier sequences, including:
  • the second subsequence is an amino acid identification subsequence containing a preset number of amino acid identifications, and the amino acid identifications in the second subsequence appear non-contiguously in the amino acid identification sequence
  • a null operation is performed and a second subsequence set is generated, including:
  • the first and last positions of each amino acid identification sequence in the amino acid identification sequence set are set to be non-blank, and it is determined whether the number of remaining amino acid identifications in the second subsequence generated after adding blanks is greater than or equal to the preset number of blanks, where the remaining amino acid identifications are the amino acid identifications excluding those at the first and last positions;
  • the null operation includes:
  • n is the preset number of empty spaces
  • the amino acid labeling operation includes:
  • L is the preset length of the second subsequence
  • n is the preset number of spaces.
  • the set of amino acid identifier sequences includes amino acid identifier sequences that are less than or equal to 13 amino acids in length.
  • the set of amino acid identifier sequences includes amino acid identifier sequences that are less than or equal to 6 amino acids in length.
  • screening the set of amino acid identifier subsequences and obtaining a set of binding amino acid identifier sequences includes:
  • the second signal intensity value is obtained by sorting the signal intensity values of the short peptides corresponding to the amino acid identifier subsequence from large to small and averaging the signal intensity values that fall within the preset sorting range;
  • the enrichment analysis value is used to characterize the randomness of the occurrence of the amino acid identifier subsequence in the short signal peptide;
  • the amino acid identifier subsequence is a binding amino acid identifier sequence
  • the amino acid identifier subsequence is not a binding amino acid identifier sequence.
  • the enrichment analysis value is Zscore, where obtaining Zscore includes:
  • the sample to be tested is subjected to polypeptide chip detection to obtain a short signal peptide. Based on the short signal peptide, a collection of amino acid identifier subsequences is generated, and the number of short peptides containing a certain amino acid identifier subsequence is counted, recorded as X;
  • the number of short peptides extracted from the preset number should be equal to the number of short signal peptides obtained when the sample to be tested is subjected to peptide chip detection;
  • the preset sorting range is the first three short peptides sorted.
  • the preset number of times is 10,000, and the preset number of short peptides sampled each time is 2,000.
  • the amino acid identifier subsequence is a binding amino acid subsequence.
  • screening the set of binding amino acid identity sequences and obtaining a set of key amino acid identity sequences includes:
  • the binding amino acid identifier sequence is taken as the first binding amino acid identifier sequence, and the following recursive filtering operations are performed, including:
  • the amino acid identification sequence corresponding to the second binding amino acid identification sequence can generate the first binding amino acid identification sequence and the number of amino acid identifications in the second binding amino acid identification sequence is m+1;
  • the second binding amino acid identification sequence is a key amino acid identification sequence
  • the second binding amino acid identification sequence is not a critical amino acid identification sequence.
  • m is 1.
  • based on the position of the short peptide in the peptide chip that generates a strong signal when combined with the antibody to be tested in the spatial structure of the target protein including:
  • the first signal intensity value of the short peptide is determined based on the signal intensity value of the signal generated after the short peptide binds to the antibody to be tested, wherein the first signal intensity value is within a preset range;
  • the strong signal short peptide corresponding to the spatial alignment record is not added to the set of epitope regions to be integrated.
  • the strong signal short peptide is a short peptide whose first signal intensity value satisfies one or more of the following conditions:
  • the first signal intensity value is the difference between the logarithmic value of the short peptide signal and the logarithmic value of the reference signal, and the first signal intensity value is >3;
  • the first signal intensity value is the ratio of the signal value of the short peptide to the reference signal value, and the signal value of the short peptide is more than 8 times the reference signal value;
  • the strong signal short peptide is a short peptide that ranks in the top 100 when the short peptides with a first signal intensity value >3 are sorted from large to small, or a short peptide with a first signal intensity value >3 among the top 100 short peptides when all first signal intensity values are sorted from large to small.
  • the strong signal short peptide is a short peptide that ranks in the top 100 when the short peptides whose signal value is more than 8 times the reference signal value are sorted from large to small, or a short peptide whose signal value is more than 8 times the reference signal value among the top 100 short peptides when all first signal intensity values are sorted from large to small.
  • the comparison uses comparison software.
  • the comparison software is selected from PepSurf, Pep-3D-Search, and PepMapper.
  • the comparison software is PepSurf.
  • the alignment is looped no more than 100,000 times.
  • spatial alignments are reported as significant (P value < 0.05).
  • spatial alignment regions are generated including:
  • the strong signal short peptides in the set of epitope regions to be integrated are sorted by their corresponding first signal intensity values, and the spatial alignment records of each short peptide are sorted;
  • the amino acid site of the target antigen in the spatial alignment record that ranks first is used as the seed.
  • region growing operations are performed on the next spatial alignment record of the strong signal short peptide or on the next strong signal short peptide
  • for each sorted spatial alignment record corresponding to the strong signal short peptide, it is determined whether at least one of the spatial alignment records corresponding to the strong signal short peptide includes a site in the first existing region;
  • the key binding sites of the antigenic epitopes contained in the spatial alignment region are identified, including:
  • this amino acid position is a critical binding site for the epitope in the targeted protein.
  • the present disclosure provides a method of generating an amino acid identification sequence of a specific length, comprising:
  • an amino acid identification sequence corresponding to the amino acid sequence is generated; the following segmentation operation is performed to generate a first subsequence set: the amino acid identification sequence is divided into first subsequences containing a preset number of amino acid identifications, and the amino acid identifications in each first subsequence appear contiguously in the amino acid identification sequence;
  • the second subsequence is an amino acid identification subsequence containing a preset number of amino acid identifications, and the amino acid identifications in the second subsequence appear non-contiguously in the amino acid identification sequence
  • a null operation is performed and a second subsequence set is generated, including:
  • the first and last positions of each amino acid identification sequence in the amino acid identification sequence set are set to be non-blank, and it is determined whether the number of remaining amino acid identifications in the second subsequence generated after adding blanks is greater than or equal to the preset number of blanks, where the remaining amino acid identifications are the amino acid identifications excluding those at the first and last positions;
  • the null operation includes:
  • n is the preset number of empty spaces
  • the amino acid labeling operation includes:
  • L is the preset length of the second subsequence
  • n is the preset number of spaces.
  • the set of amino acid identifier sequences includes amino acid identifier sequences that are less than or equal to 13 amino acids in length.
  • the set of amino acid identifier sequences includes amino acid identifier sequences that are less than or equal to 6 amino acids in length.
  • the above method proposed by the present disclosure can further identify the key binding sites in the epitope that play a key role in the binding of antigens and antibodies on the basis of identifying the spatial epitope binding sites.
  • Example 1 Method for identifying key binding sites of spatial epitopes
  • the peptide chip is used to detect the signal intensity of monoclonal antibody samples.
  • the smallest loading unit of the chip is a slide, and each slide is enough to repeatedly display all the short peptides in the peptide chip.
  • a set of diluted lysates of the monoclonal antibody to be tested is added to each slide.
  • a blank group is reserved to extract the baseline signal value.
  • After the signal intensity value of each group is obtained, the difference between the logarithm of the monoclonal antibody group signal and the logarithm of the blank group signal (the reference signal value) is calculated, giving the first signal intensity value of each short peptide on the chip, which serves as the signal intensity of that short peptide. If no blank group is reserved, the generally stable background signal of the short peptides on the chip is used as the reference signal value.
  • the first 2000 short peptides with a first signal intensity value >1 are taken as short signal peptides, indicating that these short peptides on the peptide chip can bind to the monoclonal antibody and generate a signal; depending on the total number of short peptides on the peptide chip, the first 500 to 1000 short peptides with a first signal intensity value >1 may instead be taken as short signal peptides.
  • the larger the first signal intensity value the stronger the signal, indicating that the binding of the short peptide to the antibody is stronger.
  • Short peptides with a first signal intensity value >3 are further defined as strong signal short peptides.
  • to find out how each short peptide binds to the antibody, and through which amino acids and amino acid combinations, all amino acid identifier subsequences with a length <7 are generated from the signal short peptides (including first subsequences, which consist entirely of amino acids with no blanks, and second subsequences, which contain blanks marked by the designated placeholder "."); each subsequence may contain 1-6 amino acid (aa) identifiers.
  • the amino acid positions at the beginning and end of the subsequence are positions where spaces cannot be added.
  • the number of spaces that can be added to an amino acid identifier subsequence of length L is 1 to L-2.
  • when the number of blanks is n, the number of amino acids remaining after the blanks are added, L-2-n, is compared with the number of blanks n.
  • if the number of amino acids remaining after adding blanks is at least the number of blanks, the second subsequence is generated by adding blanks; if it is less than the number of blanks, the second subsequence is generated by adding amino acids.
  • the method of adding blanks means that, given a full amino acid identification string composed of amino acids, amino acids are replaced with the designated placeholder, such as ".", generating one by one the various forms of the second subsequence from 1 blank up to n blanks.
  • the method of adding amino acids means that, given an all-blank string composed of the designated placeholder, such as ".", the "." is replaced with amino acids, generating one by one the various forms of the second subsequence from 1 up to L-2-n added amino acids.
  • the Zscore value reflects whether the subsequence appears randomly in the signal peptide.
  • Zscore>10 and the second signal intensity value>3 are selected to define the binding amino acid identification sequence.
  • each short peptide usually has multiple alignment results.
  • select alignment records whose amino acid sequences in the alignment are supported by key amino acid identification sequences. For example, if the key amino acid identification sequence is RHS, and if xRHSxx exists in the alignment region, the alignment region is considered to be supported by the key amino acid identification sequence. After obtaining the alignment record supported by the key amino acid identification sequence, it is necessary to integrate the alignment records of different short peptides to form an epitope region.
  • short peptides and spatial comparison records are sorted in reverse order (from large to small) by multiple keywords according to the first signal intensity value of the short peptide and the spatial comparison score.
  • the strong signal short peptides are sorted in descending order of first signal intensity value, and the spatial alignment records of each strong signal short peptide are then sorted in descending order of spatial alignment score. The first short peptide in this order is selected, and its aligned sites are used as seeds. These seeds are defined as sites in an existing region; the alignment records of the other, unused strong signal short peptides are then traversed.
  • When the existing region can no longer be extended, the next unused strong signal short peptide in the above sorting order and its corresponding alignment record are taken as a new seed to form a new existing region, and the iteration continues until no new existing regions are generated. All existing regions that have completed expansion together constitute the spatial alignment region. Each region retains the short peptides and alignment records used to form it. Among the sites included in a region, any site that is supported by a key amino acid identification sequence in any alignment record is a key binding site. The cumulative score of a binding site is the sum of the first signal intensity values of the strong signal short peptides aligned to that site.
  • This example uses 17 purchased monoclonal antibodies with known epitopes (see Table 1 for product information) and HealthTell's V16 peptide chip (Product No.: V16_10296), and conducts antibody-chip binding experiments according to standard experimental procedures.
  • the peptide chip used in the embodiment of the present invention is the V16 version chip of HealthTell Company, which contains 3,218,577 short peptides.
  • the chip covers all 4aa combinations of 18 amino acids (excluding cysteine, Cys, C, and methionine, Met, M), and 99.97% of the 4aa identifier subsequences are covered by 10 or more short peptides.
  • Sample preparation: the antibody sample is diluted twice with 1% D-mannitol solution to obtain a 100 ng/mL sample plate for later use;
  • Blocking treatment of the chip: pipette 300 μL of the prepared blocking solution into the chip wells (the chip is placed in the Cassette), seal with film, mix at 600 rpm for 20 seconds, and then incubate in an incubator at 37°C for 1 hour;
  • Adding samples to the blocked Cassette (chip):
  • the minimum sample-loading unit of the chip is a slide, and each slide holds one sample.
  • a set of diluted monoclonal antibody solution is added to each slide: according to the well position information in the plate layout table, transfer 300 μL of the diluted antibody to the corresponding well of the Cassette;
  • Second incubation: seal the Cassette, place it on a thermostatic mixer, and incubate for 60 minutes;
  • Second washing: peel off the Cassette film and wash the plate in an automatic plate washer (microplate magnetic plate washer);
  • Imaging scanning: disassemble, clean, and dry the chip in the Cassette, assemble it into the Imaging Cassette, and scan it in Molecular Device's ImageXpress micro4 imager. Each test sample finally yields a TIFF image file, which is the raw data;
  • the GPR5 file contains all the information of a sample and the fluorescence intensity information of all characteristics.
  • for the data generated in this step (i.e., the signal values of the antibody bound to the peptide chip), the difference between the logarithm of the monoclonal antibody group signal and the logarithm of the blank group signal was taken to obtain the first signal intensity value of each short peptide on the chip; short signal peptides were then screened using a first signal intensity value >1 before proceeding to the subsequent epitope identification steps.
  • the spatial epitope of an antibody consists of a series of amino acid sites that can participate in antigen-antibody binding.
  • the binding sites identified by this method can cover up to 80% of the binding sites in each antibody's spatial epitope, and the sites supported by key amino acid identification sequences (i.e., key binding sites) can cover up to 60% of the known binding sites.
  • the linear epitope site of antibody 6A7 and the spatial epitope site of antibody Golimumab were better identified; while the e111 antibody targeting dengue virus was less effective in identifying binding sites.
  • the fluctuations in the identification results of different antibodies are mainly affected by the following factors: 1) In the antibody spatial epitope, the binding strengths of different binding sites are different. Different sites will form different forces such as van der Waals forces and hydrogen bonds with specific amino acids in the antibody, thereby forming binding. The exposed area of each site on the surface of the antigen spatial structure is different. The area covered in the binding area after the antigen-antibody is combined is also different. The type of force that the binding relies on is also different, so the intensity of the force formed is different; 2) The degree of aggregation of strong binding sites. Since the binding strengths of antibody spatial epitope sites are different, the spatial epitope identification method of the present invention mainly considers the substrings formed by various combinations of 4 amino acids within a length of 6 amino acids.
  • the spatial structures used in the present invention are all spatial structures predicted by alphafold2, and this structure may deviate from the real structure.
  • Conformational changes brought about by antigen-antibody binding: among the antigenic epitopes, some sites may have played a binding role before the structural change, but their role is relatively weakened after the change in spatial conformation caused by antigen-antibody binding. As a result, the binding sites we identify are those before the conformational change (assuming that binding of the short peptide to the antibody does not cause a conformational change of the antibody), rather than the binding sites after the conformational change determined by structural analysis.
  • the known spatial epitope of the 6A7_Bax antibody is composed of 7 binding sites, which is obtained by analyzing the spatial structure of the short peptide PTSSEQI and the 6A7 antibody binding complex composed of 7 sites.
  • the seventh amino acid, Ile, in PTSSEQI has a very small exposed surface area in the protein's spatial structure (as shown in Figure 5). As a result, this position is not aligned during spatial alignment, so this amino acid cannot be identified by the spatial epitope method.
  • This spatial epitope method only identified 6 binding sites (of which Q18 had the lowest relative score).
  • Golimumab's antibody spatial epitope consists of 33 amino acids on TNF. These 33 amino acids form two binding regions (as shown in Figure 6), of which 88 and 89 form one region and the other binding sites form the other.
  • the spatial epitope identification method identified both regions and most sites. Among these sites, E104 and E107 are believed to be able to form hydrogen bonds and salt bridges in the analyzed spatial structure, which are important sites that distinguish this antibody from other TNF monoclonal antibodies (Ono, M., et al., Structural basis for tumor necrosis factor blockade with the therapeutic antibody golimumab. Protein Science, 2018.27(6):p.1038-1046.). Both sites were accurately identified as critical binding sites in this method.
  • the spatial epitope of the e111 antibody consists of 73 scattered sites on the E protein of dengue virus. Alphafold2 has not yet published the spatial structure of this protein, and no independent spatial structure of this protein has been resolved. Only the spatial structure of the E protein 299-395 region and the antibody-binding complex has been published. Therefore, the spatial structure of protein E used in this disclosure is the structure of a partial region of protein E isolated from the complex. This is one of the main reasons for the low coverage of e111 binding site analysis. Another reason is that currently known epitopes basically cover the entire spatial structure, which poses a great challenge to current epitope identification algorithms.
  • K343 is considered able to form hydrogen bonds and salt bridges with the corresponding amino acids of the antibody, and has relatively the largest BSA.
  • K363 has relatively the smallest ΔG, can also form hydrogen bonds and salt bridges, and forms the key amino acid identification sequence PK with P364, all of which have been accurately identified. Therefore, it is believed that the spatial epitope identification of e111 is relatively reliable.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Immunology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Hematology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Urology & Nephrology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Genetics & Genomics (AREA)
  • General Physics & Mathematics (AREA)
  • Food Science & Technology (AREA)
  • Cell Biology (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Medicinal Chemistry (AREA)
  • Peptides Or Proteins (AREA)

Abstract

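The present disclosure relates to a method for identifying antigenic epitopes. The method comprises: obtaining the set of amino acid identification sequences in a polypeptide chip that bind to the antibody to be tested and generate a signal, and generating a set of key amino acid identification sequences based on that set; generating a spatial alignment region based on the positions, in the spatial structure of the target protein, of the short peptides in the polypeptide chip that bind to the antibody to be tested and generate a strong signal; and identifying, based on the spatial alignment region, the key binding sites of the antigenic epitope contained in the spatial alignment region. On the basis of identifying the binding sites of the spatial epitope of the antigen, the method can further screen out the key binding sites that play a key binding role in the spatial epitope.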

Description

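A high-throughput method for identifying antibody epitopes based on a polypeptide chip
Technical Field
The present invention belongs to the field of biotechnology, and specifically relates to a high-throughput method for identifying antibody epitopes based on a polypeptide chip.
Background Art
An antigenic epitope is a special chemical group within an antigen molecule that determines antigen specificity. Because of their relative structural complexity, protein antigens often contain several different epitopes. An epitope formed by a short peptide of contiguous, linearly arranged amino acid residues is a linear epitope; an epitope formed by amino acids that are not contiguous in sequence but form a specific conformation in space is called a spatial epitope. Epitopes are the basis of protein antigenicity, so in-depth study of protein epitopes is of great significance for disease diagnosis and prognosis, for site-directed modification of protein molecules to reduce the immunogenicity of protein drugs, for the design of artificial vaccines without toxic side effects, and for immune intervention therapy.
Current technical approaches to identifying antibody epitopes fall broadly into three categories: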
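(1) Antigen-antibody crystallization and structure determination. The antigen protein is synthesized first, usually by inserting the protein's DNA sequence into a plasmid vector and transfecting it into living cells such as E. coli for expression. After expression, the protein is separated by centrifugation and column chromatography. The heavy and light chains of the antibody are synthesized in the same way. The two are then mixed and incubated at a 1:1 molar ratio, and the antigen-antibody complex is isolated by size-exclusion chromatography for crystallization. X-ray irradiation then yields the diffraction-based spatial structure (this approach has contributed 90% of the structures in the PDB database). In the later-developed single-particle cryo-electron microscopy (Cryo-EM) method, the antigen-antibody complex is first frozen and fixed in vitreous ice and then imaged with an electron microscope; because no crystallization is required, it has become a main tool of structural biology today. Once the spatial structure is obtained, software such as COCOMAP, PYMOL, and PISA can infer bonds and forces between amino acids from their distances and physicochemical properties, and thereby determine the amino acids mainly bound in the antibody epitope region. However, techniques such as single-particle cryo-EM are complex and laborious and place high demands on the operators, so throughput is low and stability is poor.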
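(2) Display systems. Both phage display and bacterial display systems have been used in attempts to identify antibody epitopes. One approach is to synthesize the DNA sequence of the antigen protein, amplify it by PCR, break it into random fragments, and ligate the fragments into a display vector. The display vector may be a bacterial vector such as the staphylococcal display vector pSCEM1, or a phage such as pHORF. If a bacterial vector is used, antibody is added to the cell culture medium, cells bound to the antibody are isolated by flow cytometry and sequenced, and the sequencing data are linearly aligned to the antigen sequence to locate the epitope region. If a phage expression system is used, the antigen protein is expressed together with the coat protein and displayed on the phage surface as a fusion protein. The library is then bound to antibody coated on an ELISA plate, the phages that bind the antibody are isolated and used to infect host bacteria (such as E. coli ER2738) for amplification, and the next round of antibody binding, elution, and amplification follows. After 3-5 rounds, phages that bind the antibody are highly enriched, and the enriched phages are sequenced to obtain the epitope sequence. The other approach is to integrate a large number of random sequences into the expression system. For example, in the Ph.D-12 phage display system, DNA sequences encoding random 12-amino-acid peptides are integrated into the coat protein of the M13 phage; this library theoretically contains about 10^9 transformed sequences. VirScan integrates DNA sequences encoding 56 amino acids into a T7 bacteriophage display system; this library theoretically contains 10^8 sequences. Both approaches have been used in attempts to identify linear and spatial epitopes. These display systems have developed in three main directions in the antibody field: 1) tackling spatial epitopes, 2) increasing throughput, and 3) studying the virome of human populations. The third direction usually assumes that the identified epitopes are linear, and therefore directly searches for which peptides occur in which viral protein sequences. The first direction tests whether the approach can identify spatial epitopes. Johan Rockberg first showed in Nature Methods in 2008 that a bacterial display system can identify linear epitopes of monoclonal and polyclonal antibodies, and in 2012 published an article in Scientific Reports showing that both linear and spatial epitopes can be identified. Another team published an article in 2017 showing that structural epitopes can be identified by deriving motifs from the signal peptides obtained with the expression system and spatially aligning the motifs to the antigen protein. The second direction tests whether libraries of random sequences can identify epitopes at high throughput. Given that the antibodies studied were all raised against short peptides, the study found that 112 of 174 antibodies had signal peptides, and for 49 of those 112 antibodies the epitope could be located on the linear peptide used for induction. In that study the signal-peptide cutoff was a very sensitive parameter: at a cutoff of 3x, 66 antibodies had 1-20 binding peptides, and because the number of peptides was small, their commonality was studied mainly by checking whether a significant motif formed. The main problems of this technical approach are: 1) only peptides that bind the antibody strongly are enriched, so the final number of signal peptides is small and weaker binding signals cannot be detected; 2) the gap between the theoretical and the actual peptide space is unclear, which affects the stability of detection and makes the detection space unpredictable.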
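(3) Short-peptide chips. A short-peptide chip has a well-defined peptide space, and its experimental workflow is simple and stable. Many studies have shown that short-peptide chips can be used to identify linear epitopes. There are currently two approaches to identifying epitopes from chip peptides. In the first, the short peptides are linearly aligned to the antigen sequence, the aligned regions are evaluated by enrichment and scoring, and the significantly enriched region with the highest score is taken as the epitope region. In the second, commonalities among the short peptides are sought to form a motif, and the motif is then aligned to the linear protein sequence to obtain the epitope region. However, it is unclear whether this kind of method can identify spatial epitopes, how to identify them accurately, and what kinds of spatial epitopes can be identified.
However, the existing mature techniques tend to be expensive and low-throughput, and the potential high-throughput approaches have not been adequately validated and are therefore immature. With current techniques it is difficult to identify spatial epitopes, and known epitopes also contain some redundant or non-essential binding sites. The solution proposed by the present invention aims to address whether a polypeptide chip can be used to identify epitopes of all kinds, especially spatial epitopes, and, on the basis of identifying the binding sites of a spatial epitope, to further identify the key binding sites in the epitope that play a key role in antigen-antibody binding.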
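Summary of the Invention
To overcome the shortcomings of the prior art, the present disclosure provides a method that generates binding amino acid identification sequences based on a polypeptide chip in order to identify the spatial epitope and main binding sites of an antigen; it is intended for identifying the spatial epitope of the antigen bound by an antibody and screening out the main binding sites on the epitope.
In one aspect, the present disclosure provides a method for identifying antigenic epitopes, comprising:
obtaining the set of amino acid identification sequences in a polypeptide chip that bind to the antibody to be tested and generate a signal, and generating a set of key amino acid identification sequences based on the set of amino acid identification sequences;
generating a spatial alignment region based on the positions, in the spatial structure of the target protein, of the short peptides in the polypeptide chip that bind to the antibody to be tested and generate a strong signal;
identifying, based on the spatial alignment region, the key binding sites of the antigenic epitope contained in the spatial alignment region.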
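In another aspect, the present disclosure provides a method for generating amino acid identification sequences of a specific length, comprising:
for each amino acid identification sequence in the set of amino acid identification sequences, performing the following segmentation operation and generating a first subsequence set: dividing the amino acid identification sequence into first subsequences containing a preset number of amino acid identifications, where the amino acid identifications in a first subsequence appear contiguously in the amino acid identification sequence;
for each amino acid identification sequence in the set of amino acid identification sequences, performing an add-blank operation and generating a second subsequence set, where a second subsequence is an amino acid identification subsequence containing a preset number of amino acid identifications whose amino acid identifications appear non-contiguously in the amino acid identification sequence;
adding the first subsequence set and the second subsequence set to the set of amino acid identification subsequences, thereby generating the set of amino acid identification subsequences;
wherein performing the add-blank operation and generating the second subsequence set comprises:
setting the first and last positions of each amino acid identification sequence in the set to be non-blank, and determining whether the number of remaining amino acid identifications in the second subsequence generated after adding blanks is greater than or equal to the preset number of blanks, where the remaining amino acid identifications are those not at the first or last position;
if so, performing an add-blank operation on the full amino acid identification subsequence composed of amino acid identifications, the add-blank operation comprising:
replacing the amino acid identifications in the full amino acid identification subsequence with the designated placeholder one by one, from 1 up to n, to generate second subsequences with n blanks;
where n is the preset number of blanks;
if not, performing an add-amino-acid-identification operation on the all-blank subsequence composed of the designated placeholder, the operation comprising:
replacing the designated placeholders with amino acid identifications one by one, from 1 up to L-2-n, to generate second subsequences containing L-2-n remaining amino acid identifications;
where L is the preset length of the second subsequence and n is the preset number of blanks.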
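Brief Description of the Drawings
The accompanying drawings are incorporated into and form part of this specification; they illustrate embodiments consistent with this specification and, together with the description, serve to explain its principles.
Figure 1 shows the binding profiles of four antibodies: Campath-1H, 9E10, 6A7, and Nivolumab. The horizontal axis is the Zscore value for binding of the antibody to the polypeptide chip, and the vertical axis is the second signal intensity value for that binding. The color of each point indicates the result of linear alignment of the key amino acid identification sequence to the target antigen protein: black marks key amino acid identification sequences that can be linearly aligned to the antigen protein sequence, and gray marks those that cannot. After binding to the polypeptide chip, all four antibodies yielded 4aa key amino acid identification sequences that had both a large Zscore value and a large second signal intensity value and could be linearly aligned to the antigen protein.
Figure 2 shows the binding profiles of six antibodies: a.4.6.1, F11.2.32, e111, Golimumab, Canakinumab, and Bevacizumab. The horizontal axis is the Zscore value for binding of the antibody to the polypeptide chip, and the vertical axis is the second signal intensity value for that binding. The color of each point indicates the result of linear alignment of the key amino acid identification sequence to the target antigen protein: black marks key amino acid identification sequences that can be linearly aligned to the antigen protein sequence, and gray marks those that cannot. After binding to the polypeptide chip, these six antibodies yielded only 4aa key amino acid identification sequences with large Zscore and second signal intensity values, but none that could be linearly aligned to the antigen protein.
Figure 3 shows the binding profiles of seven poorly binding antibodies: S309, hu5c8-Ruplizumab, rhPM-1-Tocilizumab, hu1124-Efalizumab, Certolizumab, D2E7-Adalimumab, and CR3022. The horizontal axis is the Zscore value for binding of the antibody to the polypeptide chip, and the vertical axis is the second signal intensity value for that binding. The color of each point indicates the result of linear alignment of the key amino acid identification sequence to the target antigen protein: black marks key amino acid identification sequences that can be linearly aligned to the antigen protein sequence, and gray marks those that cannot. None of these seven antibodies has a 4aa key amino acid identification sequence with both a large Zscore value and a large second signal intensity value, and the number of 4aa key amino acid identification sequences is very small.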
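Figure 4 shows the coverage of the known sites by the sites identified with this method and by the sites supported by key amino acid identification sequences.
Figure 5 shows the spatial structure of the Bax protein predicted by alphafold2. The amino acids at positions 13-18 (gray) have a large exposed area and form a loop, whereas I19 (black) lies deep in a recess, with a very small area exposed on the surface of the intact protein.
Figure 6 shows (part of) the spatial structure of the TNFα protein predicted by alphafold2. The gray plus black region is the region formed by the known binding sites, and the black region is the region formed by the identified binding sites.
Figure 7 shows the relationship between binding-strength information for the Golimumab antibody binding sites and the site scores in the spatial epitope identification results. The positions shown here are the full-protein positions minus 76, so that the coordinates match those of free TNFα in the bound spatial structure. From the solved spatial structure, the buried surface area (BSA) and solvation energy effect (DeltaG) of each site can be obtained. The larger the BSA, the greater the site's contribution to the antigen-antibody binding force may be; the smaller the DeltaG, the stronger the binding force may be.
Figure 8 shows the solved spatial structure of segment 299-365 of the E protein, taken from the structure PDB: 4FFZ. Gray plus black marks the known spatial binding sites, which wrap the structured part almost completely. Black marks three sites that form hydrogen bonds and salt bridges.
Figure 9 shows the relationship between binding-strength information for the e111 antibody binding sites and the site scores in the spatial epitope identification results. From the solved spatial structure, the buried surface area (BSA) and solvation energy effect (DeltaG) of each site can be obtained.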
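Detailed Description
I. Definitions
In the present disclosure, unless otherwise stated, the scientific and technical terms used herein have the meanings commonly understood by those skilled in the art. The terms and laboratory procedures relating to protein and nucleic acid chemistry, molecular biology, cell and tissue culture, microbiology, and immunology used herein are terms and routine procedures widely used in the respective fields. For a better understanding of the present disclosure, definitions and explanations of relevant terms are provided below.
The term "and/or", when used to connect two or more options, should be understood to mean any one of the options or any two or more of the options.
As used herein, the terms "comprising" or "including" mean including the stated elements, integers, or steps, without excluding any other elements, integers, or steps. Herein, unless otherwise indicated, the terms "comprising" or "including" also cover the case consisting of the stated elements, integers, or steps.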
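As used herein, the term "antibody" is used in the broadest sense and specifically covers monoclonal antibodies (including full-length monoclonal antibodies), polyclonal antibodies, multispecific antibodies (e.g., bispecific antibodies), and antibody fragments or synthetic polypeptides that carry one or more CDRs or CDR-derived sequences, as long as these polypeptides exhibit the desired biological activity. Antibodies (Abs) and immunoglobulins (Igs) are glycoproteins with the same structural characteristics. "Antibody" may also refer to immunoglobulins and immunoglobulin fragments, whether natural or partly or wholly synthetically (e.g., recombinantly) produced, including any fragment thereof that comprises at least part of the variable region of an immunoglobulin molecule and retains the binding specificity of the full-length immunoglobulin. Antibodies therefore include any protein having a binding domain that is homologous or substantially homologous to an immunoglobulin antigen-binding domain (antibody binding site). Antibodies include antibody fragments, such as anti-tumor stem cell antibody fragments. As used herein, the term antibody therefore includes synthetic antibodies, recombinantly produced antibodies, multispecific antibodies (e.g., bispecific antibodies), human antibodies, non-human antibodies, humanized antibodies, chimeric antibodies, intrabodies, and antibody fragments, such as but not limited to Fab fragments, Fab' fragments, F(ab')2 fragments, Fv fragments, disulfide-linked Fv (dsFv), Fd fragments, Fd' fragments, single-chain Fv (scFv), single-chain Fab (scFab), diabodies, anti-idiotypic (anti-Id) antibodies, or antigen-binding fragments of any of the above. The antibodies provided herein include members of any immunoglobulin type (e.g., IgG, IgM, IgD, IgE, IgA, and IgY), any class (e.g., IgG1, IgG2, IgG3, IgG4, IgA1, and IgA2), or subclass (e.g., IgG2a and IgG2b) ("type" and "class", and "subtype" and "subclass", are used interchangeably herein). Natural or wild-type antibodies and immunoglobulins (i.e., those obtained from members of a population that has not been artificially manipulated) are usually heterotetrameric glycoproteins of about 150,000 daltons, composed of two identical light chains (L) and two identical heavy chains (H). Each heavy chain has a variable domain (VH) at one end, followed by a number of constant domains. Each light chain has a variable domain (VL) at one end and a constant domain at the other. "Not artificially manipulated" means not treated so as to contain or express a foreign antigen-binding molecule. Wild type may refer to the most prevalent allele or species found in a population, or to an antibody obtained from an unmanipulated animal, as compared with an allele or polymorphism, or with a variant or derivative obtained by some form of manipulation, such as mutagenesis or recombinant methods, that changes the amino acids of the antigen-binding molecule.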
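The term "monoclonal antibody" refers to a population of identical antibodies, meaning that each individual antibody molecule in the monoclonal antibody population is identical to the others. This property is in contrast to that of a polyclonal population of antibodies, which contains antibodies with many different sequences. Monoclonal antibodies can be prepared by many well-known methods; for example, they can be prepared by immortalizing B cells, such as by fusion with myeloma cells to produce hybridoma cell lines or by infecting B cells with a virus such as EBV. Recombinant techniques can also be used to prepare antibodies in vitro from a clonal population of host cells by transforming the host cells with a plasmid carrying an artificial nucleotide sequence encoding the antibody.
The term "epitope" or "antigenic epitope" includes any determinant capable of specific binding to an antigen-binding protein. Epitope determinants usually consist of chemically active surface groups of a molecule, including amino acids or sugars or glycosyl side chains, for example amino acids, sugar side chains, phosphoryl groups, or sulfonyl groups, and usually have specific three-dimensional structural characteristics as well as specific charge characteristics. An epitope may be "linear" or "spatial". In a linear epitope, all sites of interaction between the protein and the interacting molecule (e.g., an antibody) are arranged linearly along the primary amino acid sequence of the protein. In a spatial epitope, the interaction sites occur on amino acid residues of the protein that are separated from one another.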
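The term "amino acid identification sequence" herein refers to a character string that is derived from the primary amino acid sequence of a protein or peptide chain and represents the primary structure of the protein or peptide chain by means of specific identification characters; it may include both characters that denote amino acids and characters that denote amino acid positions. The term "identification (identity)" herein refers to characters that can represent an amino acid or an amino acid position in a short peptide, including letters, digits, operators, punctuation marks, and other symbols, as well as some functional symbols. The term "amino acid identification" herein refers to a character that can represent an amino acid, for example the common single-letter amino acid abbreviations; other machine-readable characters, such as Greek letters or digits, may also be used. For example, herein the amino acid identification sequence "RHS" contains three amino acid identifications and denotes the amino acid sequence consisting of the three amino acids arginine (Arg), histidine (His), and serine (Ser), whereas the amino acid identification sequence "R.S" contains two amino acid identifications and one placeholder (an identification character denoting an amino acid position) and denotes the amino acid sequence consisting of arginine (Arg), one arbitrary amino acid, and serine (Ser).
The term "subsequence" or "substring" herein refers to a string derived from an original string; such subsequences or substrings are composed of elements of the original string and/or a designated placeholder that is not contained in the original string. For example, the amino acid identification subsequences of the amino acid identification sequence "RHS" include one subsequence of 3 amino acid identifications, "RHS", and three subsequences of 2 amino acid identifications, "RH", "HS", and "R.S", where the 2-identification subsequence "R.S" additionally contains one designated placeholder.
The term "seed" refers to the starting point of the iteration in a region growing process. The seeds used herein are single amino acid sites that can be aligned to amino acid sites on the target protein. For example, when a short peptide is aligned to the spatial structure of the target protein and the output of the alignment software shows that the amino acid at a given position in the short peptide matches the amino acid at the corresponding position in the spatial structure of the target protein, that matched single amino acid is such a site, i.e., a seed.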
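II. Detailed Description of Specific Embodiments
1. Determination of signal short peptides
A peptide chip is used to measure the signal intensity of a monoclonal antibody (mAb) sample. The smallest loading unit of the chip is the slide, and each slide is sufficient to display all short peptides of the peptide chip in replicate. In a mAb experiment, a diluted solution of one mAb under test is added to each slide. To evaluate the signal produced by binding of the mAb to the short peptides on the chip, a blank group is reserved, from which a baseline signal value is extracted. After the signal intensity values of each group are obtained, a first signal intensity value is calculated for every short peptide on the chip; the larger the first signal intensity value, the stronger the signal, indicating stronger binding between the short peptide and the antibody. The short peptides on the chip are sorted by first signal intensity value in descending order, and short peptides whose first signal intensity value exceeds a preset threshold and falls within a preset rank range are defined as signal short peptides.
The first signal intensity value may be the difference between the logarithm of the mAb-group signal value and the logarithm of the blank-group signal value (the baseline signal intensity value), or it may be the ratio of the short peptide's signal intensity value to the baseline signal intensity value.
The baseline signal intensity value may be the signal intensity value of the reserved blank group. If no blank group is reserved, the generally stable background signal of the short peptides on the chip may be used as the baseline signal value; alternatively, depending on the use case, the baseline signal value may be set to a user-defined signal intensity value or to the signal intensity value of a reference material or control sample.
After the short peptides on the chip are sorted by first signal intensity value in descending order, the top 500 to 2000 short peptides may be taken as signal short peptides, depending on the total number of short peptides on the chip (for example, the top 500, the top 1000, or the top 2000 short peptides).
When determining signal short peptides, for example: (1) when the first signal intensity value is the difference between the logarithm of the mAb-group signal value and the logarithm of the blank-group signal value (the baseline signal intensity value), the top 2000 short peptides with a first signal intensity value > 1 are taken as signal short peptides, indicating that these short peptides on the chip can bind the mAb and produce a signal; (2) when the first signal intensity value is the ratio of the short peptide's signal intensity value to the baseline signal intensity value, the top 2000 short peptides with a first signal intensity value > 2 are taken as signal short peptides, indicating that these short peptides on the chip can bind the mAb and produce a signal. When the logarithm in (1) is taken to base 2, cases (1) and (2) are equivalent, because a base-2 log difference greater than 1 corresponds to a ratio greater than 2.
Further, when the first signal intensity value is the absolute value of the difference between the logarithm of the mAb-group signal value and the logarithm of the blank-group signal value (the baseline signal intensity value), short peptides with a first signal intensity value > 3 are defined as strong-signal short peptides; when the first signal intensity value is the ratio of the short peptide's signal intensity value to the baseline signal intensity value, short peptides with a first signal intensity value > 8 are defined as strong-signal short peptides.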
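The selection rule described above can be illustrated with a minimal Python sketch. It assumes the log-difference form of the first signal intensity value with base 2; the function names, the dictionary layout, and the intensity numbers are hypothetical and serve only as illustration.

```python
import math

def first_signal(sample, baseline, base=2.0):
    """Log-difference form: log(sample) - log(baseline), in the given base."""
    return math.log(sample, base) - math.log(baseline, base)

def select_peptides(raw, baseline, threshold=1.0, top_n=2000):
    """Return (peptide, first_signal) pairs above `threshold`, strongest first,
    truncated to the `top_n` best-ranked entries."""
    scored = [(pep, first_signal(val, baseline)) for pep, val in raw.items()]
    kept = [item for item in scored if item[1] > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_n]

# Made-up raw intensities for three peptides and a blank-group baseline of 100.
raw = {"AAAAA": 100.0, "RHSVV": 250.0, "GQFPE": 1000.0}
signal = select_peptides(raw, baseline=100.0)                            # log2 difference > 1
strong = select_peptides(raw, baseline=100.0, threshold=3.0, top_n=100)  # log2 difference > 3
```

With base 2, the thresholds 1 and 3 on the log difference correspond to the ratio thresholds 2 and 8 mentioned in the text.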
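2. Splitting into amino acid identifier subsequences
To find out how each short peptide binds the antibody, that is, through which amino acids and amino acid combinations, all amino acid identifier subsequences of length < 7 are generated from the signal short peptides (including ungapped first subsequences composed entirely of amino acids and second subsequences gapped with the designated placeholder "."); each subsequence may contain 1 to 6 amino acid (aa) identifiers. For example, all subsequences of the amino acid identifier sequence ABCD comprise four 1-aa subsequences (A, B, C, D), six 2-aa subsequences (AB, A.C, A..D, BC, B.D, CD), four 3-aa subsequences (ABC, AB.D, A.CD, BCD), and one 4-aa subsequence (ABCD). Of these, "AB", "ABC", and "BCD" are examples of ungapped first subsequences, and "A.C", "A..D", and "AB.D" are examples of gapped second subsequences.
The subsequence-splitting algorithm has two stages:
From a given sequence, all first subsequences of every length are cut out first, and gaps are then inserted exhaustively to generate all second subsequences; the first and last amino acid positions of a subsequence may not be gapped. Thus, for an amino acid identifier subsequence of length L (that is, comprising L amino acid identifiers), the number of gaps that can be inserted ranges from 1 to L-2. When the number of gaps is n, the number of interior amino acids remaining after gapping, L-2-n, is compared with n. If L-2-n ≥ n, that is, the remaining amino acids are at least as many as the gaps, the second subsequences are generated by the gap-adding method; if the remaining amino acids are fewer than the gaps, the second subsequences are generated by the amino-acid-adding method.
The gap-adding method starts from a full amino acid identifier string composed of amino acids and replaces amino acids with the designated placeholder, for example ".", generating one by one every form of second subsequence with 1 up to n gaps.
The amino-acid-adding method starts from an all-gap string composed of the designated placeholder, for example ".", and replaces "." with amino acids, generating one by one every form of second subsequence with 1 up to L-2-n amino acids added.
For example, to generate the subsequences of "RHSVV" with 2 gaps, the subsequence length is L = 5 and the number of gaps is n = 2; the number of gaps exceeds the number of remaining amino acids (L-2-n = 1), so the amino-acid-adding method is used, which ultimately generates every form of subsequence retaining 1 interior amino acid. For the all-gap interior string "...", the single amino acid can be placed at any of three positions, so three subsequences are finally generated: RH..V, R.S.V, and R..VV.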
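The two-stage splitting above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the applicant's implementation: the function names and the `max_span` parameter are hypothetical, and "length < 7" is read as a span of at most 6 positions including placeholders. Both branches enumerate the same set of strings; they differ only in whether gap positions or retained residues are chosen.

```python
from itertools import combinations

PLACEHOLDER = "."

def gapped_variants(window, n_gaps):
    """All forms of `window` with exactly `n_gaps` interior positions gapped."""
    length = len(window)
    interior = range(1, length - 1)      # first and last positions stay filled
    kept = length - 2 - n_gaps           # interior residues that remain
    if n_gaps < 1 or kept < 0:
        return []
    variants = []
    if kept >= n_gaps:
        # Gap-adding: start from the full string and blank out n_gaps positions.
        for gaps in combinations(interior, n_gaps):
            chars = list(window)
            for i in gaps:
                chars[i] = PLACEHOLDER
            variants.append("".join(chars))
    else:
        # Amino-acid-adding: start from an all-gap interior, restore `kept` residues.
        for keep in combinations(interior, kept):
            chars = [window[0]] + [PLACEHOLDER] * (length - 2) + [window[-1]]
            for i in keep:
                chars[i] = window[i]
            variants.append("".join(chars))
    return variants

def all_subsequences(sequence, max_span=6):
    """Ungapped (first) and gapped (second) subsequences of `sequence`."""
    found = set()
    for start in range(len(sequence)):
        for span in range(1, min(max_span, len(sequence) - start) + 1):
            window = sequence[start:start + span]
            found.add(window)                         # first subsequence
            for n_gaps in range(1, span - 1):         # 1 .. span-2 gaps
                found.update(gapped_variants(window, n_gaps))
    return found
```

Calling `gapped_variants("RHSVV", 2)` takes the amino-acid-adding branch and reproduces the three forms in the example above, and `all_subsequences("ABCD")` yields the 15 subsequences listed for ABCD.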
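3. Defining binding amino acid identifier sequences
(1) Calculating the second signal intensity value of an amino acid identifier subsequence: a binding amino acid identifier sequence corresponds to a unit through which a short peptide binds the antibody. First, the second signal intensity value of each amino acid identifier subsequence generated in the previous step is calculated as the mean of the first signal intensity values of the several short peptides that contain the subsequence and have the largest first signal intensity values, for example the mean of the first signal intensity values of the three top-ranked short peptides. The larger the second signal intensity value, the larger the first signal intensity value attainable by short peptides containing the subsequence; in other words, the subsequence enables the short peptides containing it to give a stronger signal and is more likely to be a binding unit. This indicator alone, however, cannot define binding amino acid identifier sequences, because every subsequence of a short peptide with a high first signal intensity value will have a relatively large second signal intensity value.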
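As a rough illustration of the second signal intensity value, the sketch below averages the three largest first signal intensity values among the peptides that contain a subsequence. It assumes that peptides and subsequences use single-letter codes, so that the placeholder "." can be treated directly as a regular-expression wildcard; the function names and the example scores are hypothetical.

```python
import re
from statistics import mean

def contains(peptide, subseq):
    """True if `subseq` occurs in `peptide`; '.' matches any single residue."""
    return re.search(subseq, peptide) is not None

def second_signal(subseq, first_signals, top_k=3):
    """Mean of the `top_k` largest first signal values among the peptides
    containing `subseq`; None if no peptide contains it."""
    hits = sorted(
        (score for pep, score in first_signals.items() if contains(pep, subseq)),
        reverse=True,
    )
    return mean(hits[:top_k]) if hits else None

# Made-up first signal intensity values for five signal short peptides.
first_signals = {"ARHSA": 4.0, "RGSAA": 3.0, "RKSRR": 2.0, "RHAAA": 1.0, "AAAAA": 5.0}
```

Here `second_signal("R.S", first_signals)` averages the values of ARHSA, RGSAA, and RKSRR, while `second_signal("RH", first_signals)` averages only ARHSA and RHAAA because fewer than three peptides contain "RH".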
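(2) Calculating the Zscore of an amino acid identifier subsequence: in view of the strengths and limitations of the second signal intensity value, a Zscore is additionally calculated for each amino acid identifier subsequence. Random sampling is performed a preset number of times from all short peptides covered by the chip, drawing a preset number of short peptides each time; all amino acid identifier subsequences of the drawn short peptides are generated, the distribution of each subsequence's occurrence count over the preset number of draws is obtained, and the mean (mean) and standard deviation (sd) of that occurrence count are calculated;
The sample under test is assayed on the peptide chip to obtain signal short peptides; the set of amino acid identifier subsequences is generated from these signal short peptides, and the number of short peptides containing a given amino acid identifier subsequence is counted and denoted X;
In the above procedure, the number of short peptides drawn at random from all short peptides covered by the chip should equal the number of signal short peptides obtained for the sample under test;
For example, the sample under test is assayed on the peptide chip, the top 2000 short peptides with a first signal intensity value > 1 are taken as signal short peptides, the set of amino acid identifier subsequences is generated from the signal short peptides, and the number X of short peptides containing a given amino acid identifier subsequence is counted; random sampling is performed 10000 times from all short peptides covered by the chip, drawing 2000 short peptides each time, all amino acid identifier subsequences of these short peptides are generated, and the number of short peptides containing the subsequence is counted, finally giving the distribution of each subsequence's occurrence count over the 10000 draws, from which the mean (mean) and standard deviation (sd) are calculated. The Zscore is calculated according to the following formula:
$$\mathrm{Zscore} = \frac{X - \mathrm{mean}}{\mathrm{sd}}$$
The Zscore reflects whether the subsequence occurs at random among the signal short peptides. The larger the absolute value of the Zscore, the stronger the non-randomness, that is, neither the presence nor the absence of the subsequence here is a chance event. When the Zscore is large, the subsequence occurs more often than expected among the signal short peptides overall, which in theory indicates that it independently contributes to antigen-antibody binding. However, for subsequences with more amino acids, especially 5 aa and 6 aa, the Zscore tends to be inflated because the chip covers only a limited number of short peptides, so it is less reliable.
A subsequence with a large Zscore contributes binding to the antibody, while a large second signal intensity value means the subsequence contributes relatively strong binding. Therefore, in practice, Zscore > 10 and second signal intensity value > 3 are used to define binding amino acid identifier sequences.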
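The permutation-style enrichment test described above can be sketched in Python as follows. The sketch assumes Zscore = (X - mean) / sd, uses the sample standard deviation, and again treats "." as a regular-expression wildcard; the function names, the fixed random seed, and the toy peptide lists are hypothetical.

```python
import random
import re
from statistics import mean, stdev

def count_containing(peptides, subseq):
    """Number of peptides in which `subseq` occurs ('.' matches any residue)."""
    pattern = re.compile(subseq)
    return sum(1 for pep in peptides if pattern.search(pep))

def zscore(subseq, signal_peptides, all_peptides, n_draws=10000, seed=0):
    """(X - mean) / sd, where mean and sd come from `n_draws` random draws of
    len(signal_peptides) peptides from the whole chip."""
    rng = random.Random(seed)
    k = len(signal_peptides)              # draw size equals the number of signal peptides
    counts = [
        count_containing(rng.sample(all_peptides, k), subseq)
        for _ in range(n_draws)
    ]
    mu, sd = mean(counts), stdev(counts)
    if sd == 0:
        return None                       # undefined when the background never varies
    x = count_containing(signal_peptides, subseq)
    return (x - mu) / sd

# Toy chip: 10 of 200 peptides carry "RHS"; every signal peptide carries it.
chip = ["RHSAA"] * 10 + ["AAAAA"] * 190
signal_peptides = ["RHSAA"] * 10
z = zscore("RHS", signal_peptides, chip, n_draws=300)
```

In this toy case the subsequence is strongly enriched among the signal peptides, so `z` is large and positive; a subsequence that is equally frequent in both sets would give a value near zero.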
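(3) Screening key amino acid identifier sequences: starting from 1-aa subsequences, subsequences with one more amino acid identifier are filtered recursively. For example, for the subsequence RHS, if the Zscore or the second signal intensity value of RHS is higher than that of RH, HS, and R.S, RHS is regarded as a key amino acid identifier sequence; if both its Zscore and its second signal intensity value are lower than those of some shorter subsequence, such as RH, RHS is regarded as RH plus an amino acid S that does not benefit antibody-epitope binding, and RHS is therefore not a key amino acid identifier sequence.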
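One possible reading of this filtering step is sketched below. It is an assumption-laden illustration: a candidate is rejected only when some one-residue-shorter subsequence derived from it, and present in the binding set, has both a higher Zscore and a higher second signal intensity value; 1-aa sequences are kept as starting points. The function names and the example statistics are hypothetical.

```python
def shorter_forms(subseq):
    """Subsequences obtained by dropping one amino acid from `subseq`:
    an interior residue becomes '.', an end residue is trimmed away."""
    forms = set()
    for i, ch in enumerate(subseq):
        if ch == ".":
            continue
        form = (subseq[:i] + "." + subseq[i + 1:]).strip(".")
        if form:
            forms.add(form)
    return forms

def key_sequences(stats):
    """stats maps each binding subsequence to (zscore, second_signal)."""
    keys = set()
    by_size = sorted(stats, key=lambda s: sum(c != "." for c in s))
    for subseq in by_size:
        z, sig = stats[subseq]
        dominated = any(
            form in stats and z < stats[form][0] and sig < stats[form][1]
            for form in shorter_forms(subseq)
        )
        if not dominated:
            keys.add(subseq)
    return keys

# Made-up statistics: RHS improves on RH, while RHK is worse than RH on both measures.
stats = {"RH": (20.0, 4.0), "HS": (12.0, 3.5), "RHS": (25.0, 3.8), "RHK": (15.0, 3.2)}
keys = key_sequences(stats)
```

For "RHS", `shorter_forms` returns "RH", "HS", and "R.S", which matches the comparison set named in the text.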
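4. Aligning strong-signal short peptides to the protein spatial structure
Given the limits of computing performance, only strong-signal short peptides are selected for spatial alignment here. The publicly available software surface is used to obtain the surface property information of the protein spatial structure, with the parameters "Van der Waals radii sets: Chothia, Probe radius in angstrom: 3.9, Calculation mode: Accessible and molecular surface areas". PepSurf is then used to align the strong-signal short peptides to the spatial structure of the target protein, with the maximum number of loop iterations limited to 100,000, and all significant (p < 0.05) alignment records are retained. Each short peptide usually has 3 to 5 significant alignment records, so the alignment records need to be screened. Besides PepSurf, other alignment software such as Pep-3D-Search or PepMapper may also be used for the spatial-structure alignment.
5. Screening spatial alignment results
After alignment, each short peptide usually has several alignment results. Here, alignment records in which the aligned amino acid identifier sequence is supported by a key amino acid identifier sequence are selected. For example, if RHS is a key amino acid identifier sequence and the aligned region contains xRHSxx, the aligned region is considered to be supported by the key amino acid identifier sequence. Once the alignment records supported by key amino acid identifier sequences have been obtained, the alignment records of different short peptides need to be integrated to form epitope regions.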
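The support check can be illustrated with a short Python sketch. It assumes, hypothetically, that each alignment record exposes the aligned peptide residues as a plain string in peptide order; the actual output format of PepSurf is not reproduced here, and the function names are illustrative only.

```python
import re

def is_supported(aligned_residues, key_seqs):
    """True if any key sequence occurs in the aligned residues ('.' = any residue)."""
    return any(re.search(key, aligned_residues) for key in key_seqs)

def supported_positions(aligned_residues, key_seqs):
    """Indices of aligned residues covered by a non-placeholder character of
    some key sequence match (overlapping matches included)."""
    positions = set()
    for key in key_seqs:
        # A zero-width lookahead finds every start position, even overlapping ones.
        for match in re.finditer(f"(?={key})", aligned_residues):
            start = match.start()
            positions.update(start + i for i, ch in enumerate(key) if ch != ".")
    return positions
```

For the xRHSxx case in the text, `is_supported("ARHSGG", {"RHS"})` holds and `supported_positions` returns the indices of R, H, and S; these indices could then be mapped to target-protein sites through the alignment record.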
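6. Generating spatial alignment regions and identifying the main binding sites
First, the short peptides and their spatial alignment records are sorted in descending order on multiple keys, namely the first signal intensity value of the short peptide and the spatial alignment score: the strong-signal short peptides are first sorted in descending order of first signal intensity value, and the spatial alignment records of each strong-signal short peptide are then sorted in descending order of spatial alignment score. The first short peptide in this order is selected, and the sites aligned in this short peptide are taken as seeds; these seeds define the sites of the existing region. The spatial alignment records of the other, unused strong-signal short peptides are then traversed. Whenever the existing region expands, all alignment records of all short peptides are traversed again, because the existing region now contains more sites; this includes the other spatial alignment records of the first short peptide. If, among the aligned sites of a spatial alignment record of a short peptide, the number of sites contained in the existing region exceeds the number of sites outside it, all sites in that alignment record are added to the existing region, thereby expanding it. The iteration continues until the sites in the existing region no longer increase. Some strong-signal short peptides will have no spatial alignment record containing any site of the existing region; from these, following the above order, an unused strong-signal short peptide and its seeds are taken to form a new existing region, and the process iterates until no new existing region arises. All fully expanded existing regions together constitute the spatial alignment regions. Each region has the short peptides and alignment records used to form it; among the sites in a region, those supported by a key amino acid identifier sequence in any alignment record are the key binding sites. The cumulative score of a binding site is the sum of the first signal intensity values of the strong-signal short peptides aligned to that site.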
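The region-growing procedure above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each alignment record is a hypothetical dictionary with the keys `peptide`, `signal` (first signal intensity value), `score` (spatial alignment score), and `sites` (a set of target-protein residue numbers), and a record is merged when strictly more of its sites lie inside the region than outside, as in the paragraph above.

```python
def grow_regions(records):
    """Group alignment records into regions by iterative region growing."""
    ordered = sorted(records, key=lambda r: (-r["signal"], -r["score"]))
    used = set()        # peptides already absorbed into some region
    regions = []
    for seed in ordered:
        pep = seed["peptide"]
        touches_existing = any(
            rec["sites"] & region
            for rec in ordered if rec["peptide"] == pep
            for region in regions
        )
        if pep in used or touches_existing:
            continue    # only peptides with no overlap at all may seed a new region
        region = set(seed["sites"])
        used.add(pep)
        grew = True
        while grew:     # re-traverse all records whenever the region has expanded
            grew = False
            for rec in ordered:
                inside = rec["sites"] & region
                outside = rec["sites"] - region
                if len(inside) > len(outside):
                    used.add(rec["peptide"])
                    if outside:
                        region |= outside
                        grew = True
        regions.append(region)
    return regions

# Made-up records: P3 can only join after P5 has extended the region.
records = [
    {"peptide": "P1", "signal": 5.0, "score": 10.0, "sites": {1, 2, 3, 4}},
    {"peptide": "P2", "signal": 4.0, "score": 9.0, "sites": {3, 4, 5}},
    {"peptide": "P3", "signal": 3.5, "score": 8.0, "sites": {5, 6, 7}},
    {"peptide": "P4", "signal": 3.2, "score": 7.0, "sites": {20, 21}},
    {"peptide": "P5", "signal": 3.1, "score": 6.0, "sites": {4, 5, 6}},
]
regions = grow_regions(records)
```

In this toy input the first region grows from sites 1 to 4 up to sites 1 to 7 over two passes, and P4, which shares no site with it, seeds a second region.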
在一方面,本公开提供了一种鉴定抗原表位的方法,包括:
获取多肽芯片中与待测抗体结合产生信号的氨基酸标识序列集合,并基于氨基酸标识序列集合生成关键氨基酸标识序列集合;
基于多肽芯片中与待测抗体结合产生强信号的短肽在靶向蛋白的空间结构中的位置,生成空间比对区域;
基于空间比对区域,鉴定空间比对区域中包含的抗原表位的关键结合位点。
在一些实施方案中,获取多肽芯片中与待测抗体结合产生信号的氨基酸标识序列集合,包括:
对于多肽芯片中的短肽,根据该短肽与待测抗体结合后产生的信号的信号强度值确定该短肽的第一信号强度值,其中,第一信号强度值在预设范围内;
基于多肽芯片中的短肽的第一信号强度值,在多肽芯片中的短肽中确定信号短肽集合;以及
基于各信号短肽的氨基酸序列生成氨基酸标识序列集合。
在一些优选的实施方案中,基于多肽芯片中的短肽的第一信号强度值,在多肽芯片中的短肽中确定信号短肽集合,包括:
将各短肽中第一信号强度值满足以下至少一个条件的短肽确定为信号短肽集合:
第一条件:第一信号强度值为短肽的信号强度值的对数值与基准信号强度值的对数值的差值,第一信号强度值>1;
第二条件:第一信号强度值为短肽的信号强度值与基准信号强度值的比值,第一信号强度值大于预设比例阈值;
第三条件:将各短肽的第一信号强度值从大到小排序,第一信号强度值排序位于预设排序范围;
在一些更优选的实施方案中,信号短肽为同时满足第一条件和第三条件的短肽。
在一些更优选的实施方案中,信号短肽为同时满足第二条件和第三条件的短肽。
在一些优选的实施方案中,信号短肽为第一信号强度值满足以下一个或多个条件的短肽:
(1)第一信号强度值为短肽的信号的对数值与基准信号的对数值的差值,第一信号强度值>1;
(2)第一信号强度值为短肽的信号值与基准信号值的比值,短肽的信号值为基准信号值的2倍以上;
(3)将短肽的第一信号强度值从大到小排序,第一信号强度值位于前500~2000位。
在一些优选的实施方案中,基准信号选自空白组信号或背景信号,其中,空白组信号为预留的未与待测抗体结合的短肽的信号,背景信号为短肽在使用中常见的信号。
在一些优选的实施方案中,信号短肽为将第一信号强度值>1的短肽从大到小排序后,其中第一信号强度值位于前2000位的短肽或将第一信号强度值从大到小排序后的前2000个短肽中,第一信号强度值>1的短肽。
在一些优选的实施方案中,信号短肽为将短肽的信号值为基准信号值的2倍以上的短肽从大到小排序后,其中第一信号值强度位于前2000位的短肽或将第一信号强度值从大到小排序后的前2000个短肽中,短肽的信号值为基准信号值的2倍以上的短肽。
在一些实施方案中,基于氨基酸标识序列集合生成关键氨基酸标识序列集合,包括:
根据氨基酸标识序列集合生成氨基酸标识子序列集合;
筛选氨基酸标识子序列集合并获得结合氨基酸标识序列集合;以及
筛选结合氨基酸标识序列集合并获得关键氨基酸标识序列集合。
在一些实施方案中,根据氨基酸标识序列集合生成氨基酸标识子序列集合,包括:
对于氨基酸标识序列集合中的各氨基酸标识序列,执行以下分割操作并生成第一子序列集合:将该氨基酸标识序列分割为包含预设氨基酸标识个数的第一子序列,第一子序列中的氨基酸标识在该氨基酸标识序列中是连续出现的;
对于氨基酸标识序列集合中的各氨基酸标识序列,执行加空操作并生成第二子序列集合,第二子序列为包含预设氨基酸标识个数的氨基酸标识子序列且该第二子序列中的氨基酸标识在该氨基酸标识序列中是非连续出现的;
将第一子序列集合和第二子序列集合添加到氨基酸标识子序列集合中,生成氨基酸标识子序列集合;
其中,执行加空操作并生成第二子序列集合,包括:
氨基酸标识序列集合中的各氨基酸标识序列的首尾位置设定不为空,计算加空后生成的第二子序列中剩余氨基酸标识的数量是否大于等于预设加空数量,其中剩余氨基酸标识的数量为不包括首尾氨基酸标识的其余氨基酸标识的数量;
若是,对由氨基酸标识构成的全氨基酸标识子序列进行加空操作,加空操作包括:
利用指定占位符从1个到n个逐个替换全氨基酸标识子序列中的氨基酸标识,生成加n个空的第二子序列;
其中,n为预设加空数量;
若否,对由指定占位符构成的全空子序列进行加氨基酸标识操作,加氨基酸标识操作包括:
利用氨基酸标识从1个到L-2-n个逐个替换指定占位符,生成包含L-2-n个剩余氨基酸标识的第二子序列;
其中,L为第二子序列的预设长度,n为预设加空数量。
在一些优选的实施方案中,氨基酸标识子序列集合包括长度小于等于13个氨基酸的氨基酸标识子序列。
在一些更优选的实施方案中,氨基酸标识子序列集合包括长度小于等于6个氨基酸的氨基酸标识子序列。
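In some embodiments, screening the set of amino acid identifier subsequences to obtain the set of binding amino acid identifier sequences comprises:
obtaining the second signal intensity value of each amino acid identifier subsequence in the set of amino acid identifier subsequences, the second signal intensity value being the mean of the signal intensity values that rank within a preset rank range when the signal intensity values of the short peptides corresponding to the amino acid identifier subsequence are sorted in descending order;
obtaining an enrichment analysis value for each amino acid identifier subsequence, the enrichment analysis value characterizing the randomness with which the amino acid identifier subsequence occurs among the signal short peptides;
determining whether the second signal intensity value and the enrichment analysis value of each amino acid identifier subsequence are greater than preset thresholds;
if so, the amino acid identifier subsequence is a binding amino acid identifier sequence;
if not, the amino acid identifier subsequence is not a binding amino acid identifier sequence.
In some preferred embodiments, the enrichment analysis value is a Zscore, and obtaining the Zscore comprises:
sampling at random from all short peptides on the peptide chip, drawing a preset number of short peptides each time;
repeating the random sampling a preset number of times;
after each draw, generating the set of amino acid identifier subsequences of the drawn short peptides;
counting, in each draw, the number of short peptides containing the amino acid identifier subsequence, thereby obtaining the distribution of that number over the preset number of draws, and obtaining its mean (mean) and standard deviation (sd);
assaying the sample under test on the peptide chip to obtain signal short peptides, generating the set of amino acid identifier subsequences from the signal short peptides, and counting the number of short peptides containing a given amino acid identifier subsequence, denoted X;
wherein the preset number of short peptides drawn each time should equal the number of signal short peptides obtained when the sample under test is assayed on the peptide chip;
calculating the Zscore according to the formula
$$\mathrm{Zscore} = \frac{X - \mathrm{mean}}{\mathrm{sd}}$$
In some more preferred embodiments, the preset rank range is the three top-ranked short peptides.
In some more preferred embodiments, the preset number of draws is 10000, and 2000 short peptides are drawn each time.
In some preferred embodiments, when the second signal intensity value of an amino acid identifier subsequence is > 3 and its enrichment analysis value is > 10, the amino acid identifier subsequence is a binding amino acid identifier sequence.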
在一些实施方案中,筛选所述结合氨基酸标识序列集合并获得关键氨基酸标识序列集合,包括:
对于所述结合氨基酸标识序列集合中的各个结合氨基酸标识序列,响应于该结合氨基酸标识序列中氨基酸标识的个数等于m,所述m为正整数,该结合氨基酸标识序列为第一结合氨基酸标识序列,并执行以下递归过滤操作,包括:
确定第二结合氨基酸标识序列的第二信号强度值或富集分析值是否大于该第一结合氨基酸标识序列,所述第二结合氨基酸标识序列为所述结合氨基酸标识序列集合中的结合氨基酸标识序列,所述第二结合氨基酸标识序列对应的氨基酸标识序列可以生成该第一结合氨基酸标识序列且所述第二结合氨基酸标识序列中氨基酸标识的个数为m+1;
若是,该第二结合氨基酸标识序列是关键氨基酸标识序列;
若否,该第二结合氨基酸标识序列不是关键氨基酸标识序列。
在一些优选的实施方案中,m为1。
在一些实施方案中,基于多肽芯片中与待测抗体结合产生强信号的短肽在靶向蛋白的空间结构中的位置,包括:
对于多肽芯片中的短肽,根据该短肽与待测抗体结合后产生的信号的信号强度值确定该短肽的第一信号强度值,其中,第一信号强度值在预设范围内;
基于多肽芯片中的短肽的第一信号强度值,在多肽芯片中的短肽中确定强信号短肽集合;
将强信号短肽集合中的各强信号短肽比对到靶向蛋白的空间结构,生成各强信号短肽对应的空间比对记录集合,其中,各强信号短肽对应至少一个空间比对记录;
筛选空间比对记录集合中的各空间比对记录,确定各空间比对记录对应的强信号短肽的氨基酸标识序列中是否包含关键氨基酸标识序列;
若是,将空间比对记录对应的强信号短肽添加入待整合表位区域集合;
若否,该空间比对记录对应的强信号短肽不添加入待整合表位区域集合。
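In some preferred embodiments, a strong-signal short peptide is a short peptide whose first signal intensity value satisfies one or more of the following conditions:
(1) the first signal intensity value is the difference between the logarithm of the short peptide's signal and the logarithm of the baseline signal, and the first signal intensity value is > 3;
(2) the first signal intensity value is the ratio of the short peptide's signal value to the baseline signal value, and the short peptide's signal value is at least 8 times the baseline signal value;
(3) when the first signal intensity values of the short peptides are sorted in descending order, the first signal intensity value ranks within the top 100.
In some more preferred embodiments, the strong-signal short peptides are the short peptides ranking in the top 100 after the short peptides with a first signal intensity value > 3 are sorted in descending order, or the short peptides with a first signal intensity value > 3 among the top 100 short peptides after sorting by first signal intensity value in descending order.
In some more preferred embodiments, the strong-signal short peptides are the short peptides ranking in the top 100 by first signal intensity value after the short peptides whose signal value is at least 8 times the baseline signal value are sorted in descending order, or the short peptides whose signal value is at least 8 times the baseline signal value among the top 100 short peptides after sorting by first signal intensity value in descending order.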
在一些优选的实施方案中,比对采用比对软件。
在一些优选的实施方案中,比对软件选自PepSurf、Pep-3D-Search、PepMapper。
在一些更优选的实施方案中,比对软件为PepSurf。
在一些更优选的实施方案中,比对的循环遍历次数不超过100000次。
在一些优选的实施方案中,空间比对记录为显著的(P值<0.05)。
在一些实施方案中,生成空间比对区域,包括:
待整合表位区域集合中的各强信号短肽对应有第一信号强度值;
对待整合表位区域集合中的各强信号短肽按照对应的第一信号强度值从大到小的顺序,对各强信号短肽进行排序;
对于排序后的各强信号短肽,按照该强信号短肽的排序顺序,执行以下区域生长操作:
对于该强信号短肽对应的空间比对记录集合中的各空间比对记录,按照各空间比对记录中的空间比对得分从大到小的顺序,对该短肽的各空间比对记录进行排序;
对于该强信号短肽对应的排序后的各空间比对记录,按照各空间比对记录的排序顺序,以排序顺序第一的空间比对记录中比对上靶向抗原的氨基酸位点为种子,将种子添加入既有区域,既有区域为完成区域生长操作前的中间状态区域;
基于既有区域,按照各强信号短肽的排序顺序从排序后的各强信号短肽和对应的排序后的各空间比对记录中选取一条强信号短肽及其对应的各空间比对记录,执行以下既有区域生长操作:
确定该空间比对记录中的位点是否包含既有区域中的位点,且包含在既有区域中的位点数量不少于不包含在既有区域内的位点数量;
若是,将该空间比对记录中不包含在既有区域中的位点加入既有区域;
若否,按照该强信号短肽对应的各空间比对记录的排序顺序或各强信号短肽的排序顺序,对该强信号短肽对应的下一条空间比对记录或下一条强信号短肽执行区域生长操作;
若遍历一次强信号短肽后,既有区域中的位点有增加,则再次遍历所有强信号短肽,直至既有区域中的位点不再增加,完成区域生长操作;
对于该强信号短肽对应的排序后的各空间比对记录,确定该强信号短肽对应的全部空间比对记录中是否存在至少一条空间比对记录包含第一既有区域中的位点;
若否,将该强信号短肽及其对应的排序顺序添加入候选种子集合;
对于候选种子集合中的各强信号短肽,按照各强信号短肽对应的排序顺序,执行既有区域生成操作和既有区域生长操作;
完成既有区域生长操作后,更新候选种子集合,直至再无法选出新种子;
获得全部完成生长的既有区域,将各既有区域中的位点添加入空间比对区域集合,生成空间比对区域。
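In some embodiments, identifying, based on the spatial alignment regions, the key binding sites of the epitope contained in the spatial alignment regions comprises:
for each amino acid site in the spatial alignment regions that is aligned to the target antigen, determining whether the amino acid site is contained in a key amino acid identifier sequence in any corresponding spatial alignment record;
if so, the amino acid site is a key binding site of the epitope in the target protein.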
另一方面,本公开提供了一种生成特定长度氨基酸标识序列的方法,包括:
生成氨基酸序列对应的氨基酸标识序列,执行以下分割操作并生成第一子序列集合:将该氨基酸标识序列分割为包含预设氨基酸标识个数的第一子序列,第一子序列中的氨基酸标识在该氨基酸标识序列中是连续出现的;
对于氨基酸标识序列集合中的各氨基酸标识序列,执行加空操作并生成第二子序列集合,第二子序列为包含预设氨基酸标识个数的氨基酸标识子序列且该第二子序列中的氨基酸标识在该氨基酸标识序列中是非连续出现的;
将第一子序列集合和第二子序列集合添加到氨基酸标识子序列集合中,生成氨基酸标识子序列集合;
其中,执行加空操作并生成第二子序列集合,包括:
氨基酸标识序列集合中的各氨基酸标识序列的首尾位置设定不为空,确定加空后生成的第二子序列中剩余氨基酸标识的数量是否大于等于预设加空数量,其中剩余氨基酸标识的数量为不包括首尾氨基酸标识的其余氨基酸标识的数量;
若是,对由氨基酸标识构成的全氨基酸标识子序列进行加空操作,加空操作包括:
利用指定占位符从1个到n个逐个替换全氨基酸标识子序列中的氨基酸标识,生成加n个空的第二子序列;
其中,n为预设加空数量;
若否,对由指定占位符构成的全空子序列进行加氨基酸标识操作,加氨基酸标识操作包括:
利用氨基酸标识从1个到L-2-n个逐个替换指定占位符,生成包含L-2-n个剩余氨基酸标识的第二子序列;
其中,L为第二子序列的预设长度,n为预设加空数量。
在一些优选的实施方案中,氨基酸标识子序列集合包括长度小于等于13个氨基酸的氨基酸标识子序列。
在一些更优选的实施方案中,氨基酸标识子序列集合包括长度小于等于6个氨基酸的氨基酸标识子序列。
本公开所提出的上述方法可以在鉴定得到空间表位结合位点的基础上,进一步鉴定出表位中对于抗原抗体结合发挥关键作用的关键结合位点。
为了达到清楚和简洁描述的目的,本文中作为相同的或分开的一些实施方案的一部分来描述特征,然而,将要理解的是,本公开的范围可包括具有所描述的所有或一些特征的组合的一些实施方案。
实施例
实施例1:鉴定空间表位关键结合位点的方法
1.信号短肽的确定
利用多肽芯片检测单抗样本的信号强度,芯片的最小加样单位是载片(slide),每个载片中足以重复展现多肽芯片中的全部短肽。在单抗实验中,每个载片添加一组待测单抗稀释后的溶解液。为了评估单抗与芯片中短肽结合所产生的信号,预留空白组从而提取基准信号值。获得各组的信号强度值之后,将单抗组信号的对数值与空白组信号(基准信号值)的对数值求差,获得芯片每条短肽的第一信号强度值作为短肽的信号强度。若未预留空白组,则以芯片上短肽大体稳定的背景信号作为基准信号值。
取第一信号强度值>1的前2000个短肽作为信号短肽,表示多肽芯片上的这些短肽可以与单抗结合产生信号;根据多肽芯片上短肽的总量,也可以取第一信号强度值>1的前500~1000个短肽作为信号短肽。第一信号强度值越大,信号越强,表明短肽与抗体的结合越强。进一步定义第一信号强度值>3的短肽为强信号短肽。
2.切分氨基酸标识子序列
为了寻找每个短肽是怎么同抗体结合的,是通过哪些氨基酸以及氨基酸组合来实现的,从信号短肽中生成长度<7的所有氨基酸标识子序列(包括全部由氨基酸构成的不留空的第一子序列和以指定占位符“.”留空的第二子序列),每个子序列可以包含1-6个氨基酸(aa)标识。
切分子序列的算法分成两个环节:
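From a given sequence, all first subsequences of every length are cut out first, and gaps are then inserted exhaustively to generate all second subsequences; the first and last amino acid positions of a subsequence may not be gapped. Thus, for an amino acid identifier subsequence of length L (that is, comprising L amino acid identifiers), the number of gaps that can be inserted ranges from 1 to L-2. When the number of gaps is n, the number of interior amino acids remaining after gapping, L-2-n, is compared with n. If L-2-n ≥ n, that is, the remaining amino acids are at least as many as the gaps, the second subsequences are generated by the gap-adding method; if the remaining amino acids are fewer than the gaps, the second subsequences are generated by the amino-acid-adding method.
The gap-adding method starts from a full amino acid identifier string composed of amino acids and replaces amino acids with the designated placeholder, for example ".", generating one by one every form of second subsequence with 1 up to n gaps.
The amino-acid-adding method starts from an all-gap string composed of the designated placeholder, for example ".", and replaces "." with amino acids, generating one by one every form of second subsequence with 1 up to L-2-n amino acids added.
For example, to generate the subsequences of "RHSVV" with 2 gaps, the subsequence length is L = 5 and the number of gaps is n = 2; the number of gaps exceeds the number of remaining amino acids (L-2-n = 1), so the amino-acid-adding method is used, which ultimately generates every form of subsequence retaining 1 interior amino acid. For the all-gap interior string "...", the single amino acid can be placed at any of three positions, so three subsequences are finally generated: RH..V, R.S.V, and R..VV.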
3.定义结合氨基酸标识序列
(1)计算氨基酸标识子序列的第二信号强度值:结合氨基酸标识序列对应的是短肽与抗体的结合单位。首先,计算上一步生成的每个氨基酸标识子序列第二信号强度值,即为包含该子序列的具有最大第一信号强度值的三个短肽的第一信号强度值的平均值。
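(2) Calculating the Zscore of an amino acid identifier subsequence: in view of the strengths and limitations of the second signal intensity value, a Zscore is additionally calculated for each amino acid identifier subsequence. For example, the sample under test is assayed on the peptide chip, the top 2000 short peptides with a first signal intensity value > 1 are taken as signal short peptides, the set of amino acid identifier subsequences is generated from the signal short peptides, and the number X of short peptides containing a given amino acid identifier subsequence is counted; random sampling is performed 10000 times from all short peptides covered by the chip, drawing 2000 short peptides each time, all amino acid substrings of these short peptides are generated, and the number of short peptides containing a given subsequence is counted, finally giving the distribution over the 10000 draws of the number of short peptides containing the subsequence, from which the mean (mean) and standard deviation (sd) are calculated. The Zscore is calculated according to the following formula:
$$\mathrm{Zscore} = \frac{X - \mathrm{mean}}{\mathrm{sd}}$$
The Zscore reflects whether the subsequence occurs at random among the signal short peptides.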
Zscore较大的子序列会带来短肽与抗体的结合,而第二信号强度值较大则意味着子序列会带来与抗体的较强结合。因此,在实际操作过程中,选用Zscore>10且第二信号强度值>3来定义结合氨基酸标识序列。
(3)筛选关键氨基酸标识序列:从1aa子序列为起始,递归过滤氨基酸标识数量加1的子序列。例如对于子序列RHS,若RHS的Zscore或者第二信号强度值高于RH、HS和R.S的Zscore或者第二信号强度值,则认为RHS是关键氨基酸标识序列;若Zscore和第二信号强度均小于某个氨基酸长度短于RHS的子序列如RH,则认为RHS是在RH基础上增加了一个对于抗体和表位结合无益的氨基酸S,故RHS不是关键氨基酸标识序列。
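4. Aligning strong-signal short peptides to the protein spatial structure
Given the limits of computing performance, only strong-signal short peptides with a first signal intensity value > 3 are selected for spatial alignment here. The publicly available software surface is used to obtain the surface property information of the protein spatial structure, with the parameters "Van der Waals radii sets: Chothia, Probe radius in angstrom: 3.9, Calculation mode: Accessible and molecular surface areas". PepSurf is then used to align the strong-signal short peptides to the spatial structure of the target protein, with the maximum number of loop iterations limited to 100,000, and all significant (p < 0.05) alignment records are retained. Each short peptide usually has 3 to 5 significant alignment records, so the alignment records need to be screened.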
5.空间比对结果筛选
比对完成后,每条短肽通常会有多个比对结果。这里选择比对上的氨基酸序列受关键氨基酸标识序列支持的比对记录。例如,存在关键氨基酸标识序列是RHS,若比对区域里面存在xRHSxx的情况,则认为该比对区域受关键氨基酸标识序列支持。获得了受关键氨基酸标识序列支持的比对记录后,需要整合不同短肽的比对记录,形成表位区域。
6.空间比对区域生成和鉴别主要结合位点
首先将短肽和空间比对记录按照短肽的第一信号强度值和空间比对得分进行多关键词排倒序(从大到小排序),先根据第一信号强度值对各强信号短肽排倒序,然后再按照各强信号短肽对应的各空间比对记录的空间比对得分排倒序。按排序顺序选择第一条短肽,以该短肽中比对上的位点为种子seed,这些seed定义为既有区域的位点;然后遍历其他未使用的强信号短肽的比对记录;在既有区域发生扩张后,由于既有区域包含的位点的增多,会再次遍历所有短肽的所有比对记录,包括第一条短肽对应的其他空间比对记录。若存在短肽的空间比对记录中比对上的位点中,既有区域包含的位点数量多于既有区域外的位点数量,则该比对记录中的位点都加入既有区域位点,从而实现既有区域的扩张,而这条短肽被标记为已使用。依次迭代,直至加入任何强信号短肽时,既有区域中的位点均不再增加。其中,会存在部分强信号短肽,其对应的所有空间比对记录均不包含既有区域中的位点,将这些强信号短肽按照上述排序顺序,更换未使用的强信号短肽及其seed,形成新的既有区域,并不断迭代,直到不再有新的既有区域产生。完成了扩张的所有既有区域共同构成空间比对区域,每个区域都有用于形成该区域的短肽及比对记录,选出区域所包含的位点中,在任意一条比对记录中受到关键氨基酸标识序列支持的位点为关键结合位点。每个结合位点在各强信号短肽的第一信号强度值的累加为该位点的累计得分。
实施例2:待测抗体与多肽芯片的结合
1.实施例所使用的试剂与芯片
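This example uses 17 purchased monoclonal antibodies with known epitopes (see Table 1 for product information) and the V16 peptide chip from HealthTell (catalog no. V16_10296); the antibody-chip binding experiment was carried out according to the standard protocol.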
表1所购抗体厂商编号和已知表位
2.抗体与芯片结合实验
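The peptide chip used in the examples of the present invention is the V16 chip from HealthTell, which contains 3,218,577 short peptides. The chip covers all 4-aa combinations of 18 amino acids (excluding cysteine, Cys, C, and methionine, Met, M), and 99.97% of the 4-aa identifier subsequences are covered by 10 or more short peptides.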
(1)样本制备:抗体样本用1%D-甘露醇(D-mannitol)溶液,经两次稀释,得到100ng/mL的待测样本板备用;
(2)封闭液配制:取7.2mL样本稀释液至15mL离心管,加入0.8ml 1%的干酪素Casein,振荡混匀;
(3)排板加样
a.芯片的封闭处理:将配好的封闭液分别移取300μL至芯片孔位中(芯片置于Cassette中),封膜,600rpm混匀20s,然后在恒温箱中37℃孵育1小时;
b.封闭处理Cassette(芯片)的加样:芯片的最小加样单位是slide,每个slide中一个样本,单抗实验中,每个slide添加一组单抗稀释后的溶解液,将稀释好的抗体根据排板表的孔位信息,转移300μL至Cassette对应的孔位中;
(4)第一次孵育:将Cassette封膜,放置于自动化仪器的孵育模块上,孵育1h;
(5)第一次洗板:将Cassette撕膜,置于自动洗板机(微孔板磁力洗板机)中洗板;
(6)加二抗:在避光条件下(关灯),取2μL的goat-mouse二抗加入到6mL的二抗稀释液中(3000倍稀释),振荡混匀,取6μL的goat-human二抗加入到9mL的二抗稀释液中(1500倍稀释),分别取550μL加入到Cassette的对应孔中;
(7)第二次孵育:将Cassette封膜,放置于恒温混匀器上孵育60min;
(8)第二次洗板:将Cassette撕膜,置于自动洗板机中,使用自动洗板机洗板(微孔板磁力洗板机);
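(9) Imaging: the chips in the Cassette are disassembled, washed, dried, and assembled into an Imaging Cassette, which is placed in an ImageXpress Micro 4 imager (Molecular Devices) for scanning. Each assayed sample finally yields one TIFF image file, which is the raw data;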
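(10) Data processing:
The sample values are log-transformed and normalized, and a normalized matrix is output, specifically:
a. The fluorescence intensity values of the features are extracted, and one GPR5 data file and one image alignment file are output. The GPR5 file contains all information on one sample and the fluorescence intensity information of all features.
b. The fluorescence intensity information of the features is extracted from the GPR5 data files of all samples to generate the raw fluorescence intensity (FG, foreground) data matrix. The data of each sample are then log-transformed to give the LFG (log-transformed foreground) data matrix and Zscore-normalized to give the NLFG (normalized and log-transformed foreground) data matrix. This step also generates a sample chip information file, which includes the array position of each sample, the chip number used, and similar information.
For the data produced in this step (that is, the signal values of antibody binding to the peptide chip), the difference between the logarithms of the mAb-group signal and the blank-group signal is taken to obtain the first signal intensity value of each short peptide on the chip; signal short peptides are then selected using the cutoff first signal intensity value > 1 and passed on to the subsequent epitope identification steps.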
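The per-sample log transformation and Zscore normalization in step b can be sketched as follows. The source does not state the logarithm base or the standard-deviation convention, so base 2 and the population standard deviation are assumptions here, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def log_transform(fg):
    """FG -> LFG: base-2 logarithm of each raw foreground intensity."""
    return [math.log2(value) for value in fg]

def zscore_normalize(lfg):
    """LFG -> NLFG: centre on the sample mean and scale by the sample spread."""
    mu, sd = mean(lfg), pstdev(lfg)
    return [(value - mu) / sd for value in lfg]

fg = [1.0, 2.0, 4.0]                 # made-up raw intensities for one sample
lfg = log_transform(fg)              # [0.0, 1.0, 2.0]
nlfg = zscore_normalize(lfg)         # symmetric around zero
```

Applying the two functions column by column to the FG matrix would give the LFG and NLFG matrices described above.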
实施例3:空间表位主要结合位点鉴定的准确度评估
所有预判结合较好的抗体,对其空间表位进行了鉴定。鉴定结果如表2和图4所示,其中“已知结合位点数”为从抗体-抗原或抗体-短肽复合物的空间解析结构中推算出来的结合位点的数量;“所有鉴定位点覆盖度”为使用本发明的空间表位方法鉴定出来的位点对已知结合位点的覆盖百分比;“子序列支持的鉴定位点覆盖度”为使用本发明的空间表位方法鉴定出来的位点中,受到关键氨基酸标识序列支持的位点,即被认为能关键结合的位点对已知结合位点的覆盖百分比。
抗体的空间表位,由一系列的能参与抗原-抗体结合的氨基酸位点构成。总体来说,本方法所有鉴定到的结合位点对各个抗体空间表位中的结合位点的覆盖率能够达到80%,而其中有关键氨基酸标识序列支持的位点(即关键结合位点)对已知结合位点的覆盖率能够达到60%。其中鉴定效果较好的如抗体6A7的线性表位位点,抗体Golimumab的空间表位位点;而结合位点鉴定效果较差的是靶向登革热病毒的e111抗体。
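The variation in identification performance among antibodies is mainly influenced by the following factors: 1) Within an antibody's spatial epitope, different binding sites bind with different strengths. Different sites form different forces, such as van der Waals forces and hydrogen bonds, with particular amino acids of the antibody, and thereby bind. Each site has a different exposed area on the surface of the antigen's spatial structure, is covered to a different extent in the binding region after antigen-antibody binding, and relies on different types of force, so the resulting forces differ in strength. 2) The degree of clustering of strong binding sites. Given that the binding strengths of the sites of a spatial epitope differ, and that the spatial epitope identification method of the present invention mainly considers substrings formed by the various combinations of 4 amino acids within a span of 6 amino acids, if strongly binding amino acids on the antigen surface are spaced beyond a 6-amino-acid span, the signal of their binding substring may not be observed. 3) Limitations of the spatial structures. The spatial structures used in the present invention are all structures predicted by AlphaFold2, which may deviate from the true structures. 4) Conformational change caused by antigen-antibody binding. In an epitope, some sites may contribute to binding before the structural change and contribute relatively less after antigen-antibody binding has altered the spatial conformation. As a result, the binding sites we identify are those before the conformational change (assuming that binding of a short peptide to the antibody does not alter the antibody's conformation), rather than the binding sites after the conformational change as resolved structurally.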
下面重点分析三个主要案例:
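The known spatial epitope of the 6A7_Bax antibody consists of 7 binding sites, obtained by resolving the spatial structure of the complex between the 7-residue short peptide PTSSEQI and the 6A7 antibody. When short peptides with strong antibody-binding signals are aligned to the AlphaFold2-predicted spatial structure of Bax, the seventh amino acid of PTSSEQI, Ile, has a very small area on the protein surface (as shown in Figure 5), so the spatial alignment does not consider this position alignable, and this amino acid is therefore not identified by the spatial epitope method. The present spatial epitope method identifies only 6 binding sites (of which Q18 has the lowest relative score). This also suggests that although a particular amino acid in a synthesized short peptide can bind the antibody and enhance the peptide's binding capacity, in the spatial structure, when the intact antigen protein binds the antibody, it may not participate directly in binding, that is, it is not a key binding site. There is of course also a possibility that a conformational change occurs when the antibody binds the intact Bax antigen protein, further exposing Ile at the surface so that it can take part in binding the antibody. In this method, 6 of these amino acid sites were identified as key binding sites of the spatial epitope.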
Golimumab的抗体空间表位由TNF上的33个氨基酸构成。这33个氨基酸形成两个结合区域(如图6所示),其中88、89形成一个区域,其他结合位点形成另一个。空间表位鉴定方法将两个区域和绝大多数位点都鉴定出来。这些位点中,E104和E107在解析的空间结构中认为能够形成氢键和盐桥,是该抗体区别于其他TNF单抗的重要位点(Ono,M.,et al.,Structural basis for tumor necrosis factor blockade with the therapeutic antibody golimumab.Protein Science,2018.27(6):p.1038-1046.)。这两个位点在本方法中均被准确鉴定为关键结合位点。另外,分析发现,以G24、Q25、F144和P139所形成的中段结合区域具有最高的结合得分,应该在抗原抗体结合过程中发挥了重要的作用。这些位点在结构解析研究中并未受到特别重视,这有可能是短肽-抗体结合与完整抗原-抗体结合的差异所造成,其中一个重要的差异在于抗原-抗体结合会带来构象的改变,而本发明的技术是依赖于短肽-抗体结合,发现的是构象改变前的结合位点。
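The spatial epitope of the e111 antibody consists of 73 scattered sites on the E protein of dengue virus. No AlphaFold2 spatial structure of this protein has been published, nor has an independent spatial structure of this protein been resolved; only the spatial structure of the complex of E protein region 299-395 with the antibody has been published. Therefore, the E protein spatial structure used in the present disclosure is the structure of the partial E protein region separated from that complex. This is one of the main reasons for the low coverage of e111 binding sites. Another reason is that the currently known epitope covers essentially the whole spatial structure, which poses a great challenge to the current epitope identification algorithm. S338, G344, and V345, which are reported to bind the antibody light and heavy chains, were all identified and form one epitope binding region with K343 (Austin, S.K., et al., Structural Basis of Differential Neutralization of DENV-1 Genotypes by an Antibody that Recognizes a Cryptic Epitope. PLoS Pathogens, 2012. 8(10): p. e1002930.). Of these, K343 is considered able to form hydrogen bonds and salt bridges with the corresponding amino acids of the antibody and has the relatively largest BSA. K363 has the relatively smallest ΔG, can also form hydrogen bonds and salt bridges, and forms the key amino acid identifier sequence PK with P364; all of these were accurately identified. The spatial epitope identification for e111 is therefore considered relatively reliable.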
表2本方法鉴定得到的位点对已知结合位点的覆盖度

Claims (10)

  1. 鉴定抗原表位的方法,包括:
    获取多肽芯片中与待测抗体结合产生信号的氨基酸标识序列集合,并基于所述氨基酸标识序列集合生成关键氨基酸标识序列集合;
    基于多肽芯片中与待测抗体结合产生强信号的短肽在靶向蛋白的空间结构中的位置,生成空间比对区域;
    基于所述空间比对区域,鉴定所述空间比对区域中包含的抗原表位的关键结合位点。
  2. 根据权利要求1所述的方法,其中,所述获取多肽芯片中与待测抗体结合产生信号的氨基酸标识序列集合,包括:
    对于所述多肽芯片中的短肽,根据该短肽与所述待测抗体结合后产生的信号的信号强度值确定该短肽的第一信号强度值,其中,第一信号强度值在预设范围内;
    基于所述多肽芯片中的短肽的第一信号强度值,在所述多肽芯片中的短肽中确定信号短肽集合;以及
    基于各所述信号短肽的氨基酸序列生成所述氨基酸标识序列集合;
    优选地,所述基于所述多肽芯片中的短肽的第一信号强度值,在所述多肽芯片中的短肽中确定信号短肽集合,包括:
    将各所述短肽中第一信号强度值满足以下至少一个条件的短肽确定为信号短肽集合:
    第一条件:所述第一信号强度值为所述短肽的信号强度值除以基准信号强度值的比值,所述第一信号强度值大于预设比例阈值;
    第二条件:将各所述短肽的第一信号强度值从大到小排序,所述第一信号强度值排序位于预设排序范围;
    更优选地,所述信号短肽为同时满足所述第一条件和所述第二条件的短肽。
  3. 根据权利要求1或2所述的方法,其中,所述基于所述氨基酸标识序列集合生成关键氨基酸标识序列集合,包括:
    根据所述氨基酸标识序列集合生成氨基酸标识子序列集合;
    筛选所述氨基酸标识子序列集合并获得结合氨基酸标识序列集合;以及
    筛选所述结合氨基酸标识序列集合并获得关键氨基酸标识序列集合。
  4. 根据权利要求3所述的方法,其中,所述根据所述氨基酸标识序列集合生成氨基酸标识子序列集合,包括:
    对于所述氨基酸标识序列集合中的各氨基酸标识序列,执行以下分割操作并生成第一子序列集合:将该氨基酸标识序列分割为包含预设氨基酸标识个数的第一子序列,所述第一子序列中的氨基酸标识在该氨基酸标识序列中是连续出现的;
    对于所述氨基酸标识序列集合中的各氨基酸标识序列,执行加空操作并生成第二子序列集合,所述第二子序列为包含预设氨基酸标识个数的氨基酸标识子序列且该第二子序列中的氨基酸标识在该氨基酸标识序列中是非连续出现的;
    将所述第一子序列集合和所述第二子序列集合添加到氨基酸标识子序列集合中,生成氨基酸标识子序列集合;
    其中,所述执行加空操作并生成第二子序列集合,包括:
    所述氨基酸标识序列集合中的各氨基酸标识序列的首尾位置设定不为空,确定加空后生成的第二子序列中剩余氨基酸标识的数量是否大于等于预设加空数量,所述剩余氨基酸标识的数量为不包括首尾氨基酸标识的其余氨基酸标识的数量;
    若是,对由氨基酸标识构成的全氨基酸标识子序列进行加空操作,所述加空操作包括:
    利用指定占位符从1个到n个逐个替换所述全氨基酸标识子序列中的氨基酸标识,生成加n个空的第二子序列;
    其中,n为预设加空数量;
    若否,对由指定占位符构成的全空子序列进行加氨基酸标识操作,所述加氨基酸标识操作包括:
    利用氨基酸标识从1个到L-2-n个逐个替换指定占位符,生成包含L-2-n个剩余氨基酸标识的第二子序列;
    其中,L为所述第二子序列的预设长度,n为预设加空数量。
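  5. The method according to claim 4, wherein said screening the set of amino acid identifier subsequences to obtain the set of binding amino acid identifier sequences comprises:
    obtaining the second signal intensity value of each said amino acid identifier subsequence in said set of amino acid identifier subsequences, said second signal intensity value being the mean of the signal intensity values that rank within a preset rank range when the signal intensity values of the short peptides corresponding to said amino acid identifier subsequence are sorted in descending order;
    obtaining an enrichment analysis value for each said amino acid identifier subsequence, said enrichment analysis value characterizing the randomness with which said amino acid identifier subsequence occurs among the signal short peptides;
    determining whether the second signal intensity value and the enrichment analysis value of each said amino acid identifier subsequence are greater than preset thresholds;
    if so, said amino acid identifier subsequence is a binding amino acid identifier sequence;
    preferably, said enrichment analysis value is a Zscore, wherein obtaining said Zscore comprises:
    sampling at random from all short peptides on said peptide chip, drawing a preset number of short peptides each time;
    repeating said random sampling a preset number of times;
    after each draw, generating the set of amino acid identifier subsequences of the drawn short peptides;
    counting, in each draw, the number of short peptides containing the amino acid identifier subsequence, thereby obtaining the distribution of that number over the preset number of draws, and obtaining its mean (mean) and standard deviation (sd);
    assaying the sample under test on the peptide chip to obtain said signal short peptides, generating the set of amino acid identifier subsequences from said signal short peptides, and counting the number of short peptides containing a given amino acid identifier subsequence, denoted X;
    wherein said preset number of short peptides drawn each time should equal the number of said signal short peptides obtained when the sample under test is assayed on the peptide chip;
    calculating the Zscore according to the following formula:
    $$\mathrm{Zscore} = \frac{X - \mathrm{mean}}{\mathrm{sd}}$$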
  6. 根据权利要求5所述的方法,其中,所述筛选所述结合氨基酸标识序列集合并获得关键氨基酸标识序列集合,包括:
    对于所述结合氨基酸标识序列集合中的各个结合氨基酸标识序列,响应于该结合氨基酸标识序列中氨基酸标识的个数等于m,所述m为正整数,该结合氨基酸标识序列为第一结合氨基酸标识序列,并执行以下递归过滤操作,包括:
    确定第二结合氨基酸标识序列的第二信号强度值或富集分析值是否大于该第一结合氨基酸标识序列,所述第二结合氨基酸标识序列为所述结合氨基酸标识序列集合中的结合氨基酸标识序列,所述第二结合氨基酸标识序列对应的氨基酸标识序列可以生成该第一结合氨基酸标识序列且所述第二结合氨基酸标识序列中氨基酸标识的个数为m+1;
    若是,该第二结合氨基酸标识序列是关键氨基酸标识序列。
  7. 根据权利要求6所述的方法,其中,所述基于多肽芯片中与待测抗体结合产生强信号的短肽在靶向蛋白的空间结构中的位置,包括:
    对于所述多肽芯片中的短肽,根据该短肽与所述待测抗体结合后产生的信号的信号强度值确定该短肽的第一信号强度值,其中,第一信号强度值在预设范围内;
    基于所述多肽芯片中的短肽的第一信号强度值,在所述多肽芯片中的短肽中确定强信号短肽集合;
    将所述强信号短肽集合中的各强信号短肽比对到靶向蛋白的空间结构,生成各所述强信号短肽对应的空间比对记录集合,其中,各所述强信号短肽对应至少一个空间比对记录;
    筛选所述空间比对记录集合中的各空间比对记录,确定各所述空间比对记录对应的强信号短肽的氨基酸标识序列中是否包含关键氨基酸标识序列;
    若是,将所述空间比对记录对应的强信号短肽添加入待整合表位区域集合;
    优选地,所述比对采用比对软件;优选地,所述比对软件选自PepSurf、Pep-3D-Search、PepMapper;更优选地,所述比对软件为PepSurf。
  8. 根据权利要求7所述的方法,其中,所述生成空间比对区域,包括:
    所述待整合表位区域集合中的各强信号短肽对应有第一信号强度值;
    对所述待整合表位区域集合中的各强信号短肽按照对应的第一信号强度值从大到小的顺序,对各所述强信号短肽进行排序;
    对于排序后的各所述强信号短肽,按照该强信号短肽的排序顺序,执行以下空间比对记录排序操作:
    对于该强信号短肽对应的空间比对记录集合中的各空间比对记录,按照各空间比对记录中的空间比对得分从大到小的顺序,对该短肽的各空间比对记录进行排序;
    对于排序后的各所述强信号短肽和对应的排序后的各所述空间比对记录,执行以下既有区域生成操作:
    以排序顺序第一的强信号短肽对应的排序第一的空间比对记录中比对上靶向抗原的氨基酸位点为种子,将所述种子添加入既有区域,所述既有区域为完成所述区域生长操作前的中间状态区域;
    基于所述既有区域,按照各所述强信号短肽的排序顺序从排序后的各所述强信号短肽和对应的排序后的各所述空间比对记录中选取一条强信号短肽及其对应的各空间比对记录,执行以下既有区域生长操作:
    确定该空间比对记录中的位点是否包含所述既有区域中的位点,且包含在既有区域中的位点数量不少于不包含在所述既有区域内的位点数量;
    若是,将该空间比对记录中不包含在所述既有区域中的位点加入所述既有区域;
    若否,按照该强信号短肽对应的各所述空间比对记录的排序顺序或各所述强信号短肽的排序顺序,对该强信号短肽对应的下一条空间比对记录或下一条强信号短肽执行所述区域生长操作;
    若遍历一次所述强信号短肽后,所述既有区域中的位点有增加,则再次遍历所有所述强信号短肽,直至所述既有区域中的位点不再增加,完成所述区域生长操作;
    对于该强信号短肽对应的排序后的各所述空间比对记录,确定该强信号短肽对应的全部空间比对记录中是否存在至少一条空间比对记录包含第一既有区域中的位点;
    若否,将该强信号短肽及其对应的排序顺序添加入候选种子集合;
    对于所述候选种子集合中的各强信号短肽,按照各所述强信号短肽对应的排序顺序,执行所述既有区域生成操作和所述既有区域生长操作;
    完成所述既有区域生长操作后,更新候选种子集合,直至再无法选出新种子;
    获得全部完成生长的既有区域,将各所述既有区域中的位点添加入空间比对区域集合,生成空间比对区域。
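  9. The method according to claim 8, wherein said identifying, based on said spatial alignment regions, the key binding sites of the epitope contained in said spatial alignment regions comprises:
    for each amino acid site in said spatial alignment regions that is aligned to the target antigen, determining whether said amino acid site is contained in a key amino acid identifier sequence in any corresponding spatial alignment record;
    if so, the amino acid site is a key binding site of the epitope in the target protein.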
  10. 生成特定长度氨基酸标识序列的方法,包括:
    生成氨基酸序列对应的氨基酸标识序列,执行以下分割操作并生成第一子序列集合:将该氨基酸标识序列分割为包含预设氨基酸标识个数的第一子序列,所述第一子序列中的氨基酸标识在该氨基酸标识序列中是连续出现的;
    对于所述氨基酸标识序列集合中的各氨基酸标识序列,执行加空操作并生成第二子序列集合,所述第二子序列为包含预设氨基酸标识个数的氨基酸标识子序列且该第二子序列中的氨基酸标识在该氨基酸标识序列中是非连续出现的;
    将所述第一子序列集合和所述第二子序列集合添加到氨基酸标识子序列集合中,生成氨基酸标识子序列集合;
    其中,所述执行加空操作并生成第二子序列集合,包括:
    所述氨基酸标识序列集合中的各氨基酸标识序列的首尾位置设定不为空,确定加空后生成的第二子序列中剩余氨基酸标识的数量是否大于等于预设加空数量,所述剩余氨基酸标识的数量为不包括首尾氨基酸标识的其余氨基酸标识的数量;
    若是,对由氨基酸标识构成的全氨基酸标识子序列进行加空操作,所述加空操作包括:
    利用指定占位符从1个到n个逐个替换所述全氨基酸标识子序列中的氨基酸标识,生成加n个空的第二子序列;
    其中,n为预设加空数量;
    若否,对由指定占位符构成的全空子序列进行加氨基酸标识操作,所述加氨基酸标识操作包括:
    利用氨基酸标识从1个到L-2-n个逐个替换指定占位符,生成包含L-2-n个剩余氨基酸标识的第二子序列;
    其中,L为所述第二子序列的预设长度,n为预设加空数量。
PCT/CN2023/114686 2022-09-01 2023-08-24 一种基于多肽芯片的抗体表位的高通量鉴定方法 WO2024046205A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211067234.9A CN117672369A (zh) 2022-09-01 2022-09-01 一种基于多肽芯片的抗体表位的高通量鉴定方法
CN202211067234.9 2022-09-01

Publications (1)

Publication Number Publication Date
WO2024046205A1 true WO2024046205A1 (zh) 2024-03-07

Family

ID=90083336

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/114686 WO2024046205A1 (zh) 2022-09-01 2023-08-24 一种基于多肽芯片的抗体表位的高通量鉴定方法

Country Status (2)

Country Link
CN (1) CN117672369A (zh)
WO (1) WO2024046205A1 (zh)

Also Published As

Publication number Publication date
CN117672369A (zh) 2024-03-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23859238

Country of ref document: EP

Kind code of ref document: A1