WO2024083187A1 - Method for distinguishing glycan structural isomers by replacing similar-mass isotopes through computer simulation - Google Patents

Method for distinguishing glycan structural isomers by replacing similar-mass isotopes through computer simulation

Info

Publication number
WO2024083187A1
WO2024083187A1 PCT/CN2023/125412 CN2023125412W WO2024083187A1 WO 2024083187 A1 WO2024083187 A1 WO 2024083187A1 CN 2023125412 W CN2023125412 W CN 2023125412W WO 2024083187 A1 WO2024083187 A1 WO 2024083187A1
Authority
WO
WIPO (PCT)
Prior art keywords
glycan
isomer
chemical formula
isomers
mass
Prior art date
Application number
PCT/CN2023/125412
Other languages
English (en)
French (fr)
Inventor
朱奕颖
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Publication of WO2024083187A1


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • The invention relates to the field of biotechnology, and in particular to a method for distinguishing glycan structural isomers by replacing similar-mass isotopes through computer simulation.
  • Protein glycosylation is one of the common post-translational modifications. About 50-70% of human proteins are glycosylated, including surface receptors, organelle-resident proteins, secretory proteins, and transport proteins. Protein glycosylation is a very important modification involved in many biological processes, such as mediating cell attachment, monitoring protein folding status and promoting protein delivery, stimulating signal transduction pathways, affecting protein-protein interactions, and changing protein solubility.
  • Glycans are composed of basic structural units, monosaccharides. The intramolecular hemiacetal group of one monosaccharide and a hydroxyl group of another monosaccharide can form a glycosidic bond.
  • Glucose (Glu/Glc), galactose (Gal), and mannose (Man) are stereoisomers, collectively called hexoses (Hex).
  • Deoxyhexoses (dHex), such as fucose (Fuc), are hexoses in which a hydroxyl group is replaced by a hydrogen atom.
  • N-acetylglucosamine (GlcNAc) and N-acetylgalactosamine (GalNAc) are both N-acetylhexosamines (HexNAc).
  • Sialic acid is a general term for substituted nine-carbon neuraminic acids, with N-acetylneuraminic acid (NeuAc) and N-glycolylneuraminic acid (NeuGc) being common in mammals.
  • NeuAc is widely found in human proteins, while NeuGc is a non-human sialic acid but has been found in apes.
  • Glycosylation can be divided into several subtypes according to the glycosidic linkage: N-linked glycosylation, O-linked glycosylation, glypiation, C-linked glycosylation, and phosphoglycosylation.
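  • The most common are N-linked glycans (N-glycan database), whose glycosidic bond arises from N-linked glycosylation, and O-linked glycans (O-glycan database), whose glycosidic bond arises from O-linked glycosylation.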
  • N-linked oligosaccharides are attached to the nitrogen atom of asparagine (Asn).
  • O-linked glycosylation is the attachment of sugars to oxygen atoms in serine, threonine, or tyrosine.
  • N-linked glycans have a common pentasaccharide core structure and are generally divided into three different subtypes: high mannose, complex, and hybrid.
  • Mass spectrometry analysis of glycosylation is more difficult than other post-translational modifications of proteins because of the wide variety of glycans and their complex structures.
  • Methods for analyzing protein glycosylation generally fall into two categories. One releases glycans from proteins by enzymatic digestion and then analyzes the free glycans or the peptides separately. The other analyzes glycopeptides directly, which retain information on the glycosylation site. Because glycan branches can be linked in different ways, structural isomers exist that show the same precursor ion mass in the mass spectrum, which makes analysis very difficult.
  • Taking peptide quantification as an example, methods have progressed from spectral counting and MS1 peak area to iTRAQ and TMT, and then to heavy-isotope-labeled peptides used as standards for accurate quantification, with the corresponding analysis software developing alongside.
  • Data-dependent acquisition (DDA) is the most widely used scan mode in mass spectrometry: the most abundant ions are selected for fragmentation and their MS/MS spectra are acquired.
  • Using the MS1 peak area or peak height of an ion in different samples as the basis for relative quantification is the simplest and most widely used approach.
  • For glycopeptides, however, quantification software such as Skyline has its own pain point: it was developed mainly for ordinary peptides carrying protein modifications of fixed mass and is not tailored to complex glycan modifications.
  • The information supplied to the software must include the identified peptide sequence and the modification mass.
  • The software uses only the modification mass as an identifier, so glycopeptides that carry different glycan isomers are treated as the same peptide.
  • The technical problem to be solved by the present invention is how to distinguish glycan structural isomers with mass spectrometry data quantification software, and/or how to quantify glycan structural isomers in mass spectrometry analysis, and/or how to quantify glycan structural isomers with mass spectrometry data quantification software.
  • The present invention first provides a method for quantitative analysis of glycan isomers based on mass spectrometry data.
  • The method may include the following steps: using similar-mass isotopes to replace, by computer simulation, isotopes in the structural isomers of the glycan isomers to be quantified, to obtain simulated glycan isomers with changed chemical formula and mass; and quantifying the simulated glycan isomers based on mass spectrometry data to obtain quantitative results for the different structural isomers.
  • The difference between the mass of the simulated glycan isomer and the mass of the glycan isomer to be quantified may be less than or equal to 0.2 Da.
  • The glycan isomers to be quantified may be isomers sharing the same molecular formula.
  • The similar-mass isotopes may be an isotope combination whose mass differs by no more than 0.05 Da. For example, the similar-mass isotopes of 14N may be 13C and 1H; those of 16O may be 15N and 1H; and those of 15N may be 12C and 1H.
  • In the above method, x is the rank of each structural isomer among the glycan isomers to be quantified after sorting by glycan ID number in ascending order; the rank is a natural number, and ranks run consecutively from 1 to n. The computer simulation is carried out according to the numbers of N and O in the chemical formula of the glycan isomer to be quantified and includes any of the following steps:
  • A1) The number m of N in the chemical formula is greater than or equal to the number of structural isomers of the glycan isomer minus 1 (i.e., n-1). For the structural isomer of rank x, x-1 14N may be removed from the chemical formula and x-1 13C and x-1 1H added to obtain the simulated glycan isomer. Compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1) × 0.008106 Da;
  • A2) The number m of N in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1 (i.e., n-1), but the sum m+k of the number m of N and the number k of O is greater than or equal to the number of structural isomers minus 1 (i.e., n-1). For the structural isomer of rank x, x-1 14N may be removed from the chemical formula and x-1 13C and x-1 1H added, until m 14N have been removed; thereafter, for the structural isomer of rank x, m 14N may be removed and m 13C and m 1H added, and x-m-1 16O may be removed and x-m-1 15N and x-m-1 1H added, to obtain the simulated glycan isomer. Compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1) × 0.008106 Da or by m × 0.008106 Da + (x-m-1) × 0.013019 Da;
  • A3) The sum m+k of the number m of N and the number k of O in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1 (i.e., n-1). For the structural isomer of rank x, x-1 14N may be removed from the chemical formula and x-1 13C and x-1 1H added, until m 14N have been removed; thereafter, for the structural isomer of rank x, m 14N may be removed and m 13C and m 1H added, and x-m-1 16O may be removed and x-m-1 15N and x-m-1 1H added, until k 16O have been removed; thereafter, for the structural isomer of rank x, m 14N may be removed and m 13C and m 1H added, x-m-1 16O may be removed and x-m-1 15N and x-m-1 1H added, and at the same time x-m-k-1 12C and x-m-k-1 1H may be removed and x-m-k-1 15N added, to obtain the simulated glycan isomer. Compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1) × 0.008106 Da, or by m × 0.008106 Da + (x-m-1) × 0.013019 Da, or by m × 0.008106 Da + k × 0.013019 Da + (x-m-k-1) × 0.0233 Da.
  • The glycan isomers to be quantified may include n structural isomers, where n is a natural number.
  • The number of N in the chemical formula is m, where m is a natural number.
  • The number of O in the chemical formula is k, where k is a natural number.
  • x may be the rank of each structural isomer among the glycan isomers to be quantified after sorting by glycan ID number in ascending order.
  • The rank is a natural number, and ranks run consecutively from 1 to n.
  • The glycan ID number may be taken from the GlycomeDB database (website: www.glycome-db.org).
  • the mass spectrometry data quantification software may be Skyline software.
  • The present invention also provides a method for quantitatively analyzing glycopeptides containing glycan isomers in mass spectrometry data.
  • The method may include the following steps: replacing, by computer simulation, the isotopes in the glycan isomers contained in the glycopeptide with isotopes of similar mass to obtain simulated glycan isomers with changed chemical formula and mass, and thereby glycopeptides containing simulated glycan isomers; and quantifying the glycopeptides containing the simulated glycan isomers, based on mass spectrometry data and using mass spectrometry data quantification software, to obtain quantitative results for the glycopeptides containing glycan isomers of different structures.
  • The difference between the mass of the simulated glycan isomer and the mass of the glycan isomer may be less than or equal to 0.2 Da.
  • The glycan isomers may be isomers sharing the same molecular formula.
  • The similar-mass isotopes may be an isotope combination whose mass differs by no more than 0.05 Da. For example, the similar-mass isotopes of 14N may be 13C and 1H; those of 16O may be 15N and 1H; and those of 15N may be 12C and 1H.
  • x can be the rank of each structural isomer among the glycan isomers to be quantified after sorting by glycan ID number in ascending order; the rank is a natural number, and ranks run consecutively from 1 to n. The computer simulation can be carried out according to the numbers of N and O in the chemical formula of the glycan isomer to be quantified, comprising the following steps:
  • When the number m of N in the chemical formula is greater than or equal to the number of structural isomers of the glycan isomer minus 1 (i.e., n-1): for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added to obtain the simulated glycan isomer. Compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1) × 0.008106 Da;
  • When the number m of N in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1 (i.e., n-1), but the sum m+k of the number m of N and the number k of O is greater than or equal to the number of structural isomers minus 1 (i.e., n-1): for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; thereafter, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, to obtain the simulated glycan isomer. Compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1) × 0.008106 Da or by m × 0.008106 Da + (x-m-1) × 0.013019 Da;
  • When the sum m+k of the number m of N and the number k of O in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1 (i.e., n-1): for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; thereafter, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, until k 16O have been removed; thereafter, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, and at the same time x-m-k-1 12C and x-m-k-1 1H are removed and x-m-k-1 15N are added, to obtain the simulated glycan isomer. Compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1) × 0.008106 Da, or by m × 0.008106 Da + (x-m-1) × 0.013019 Da, or by m × 0.008106 Da + k × 0.013019 Da + (x-m-k-1) × 0.0233 Da.
  • The glycan isomers include n structural isomers, where n is a natural number; m is a natural number, and k is a natural number.
  • x is the rank of each structural isomer among the glycan isomers after sorting by glycan ID number in ascending order; the rank is a natural number, and ranks run consecutively from 1 to n.
  • The glycan ID number may be taken from the GlycomeDB database (website: www.glycome-db.org).
  • the mass spectrometry data quantification software may be Skyline software.
  • The present invention also provides a device for quantitatively analyzing glycopeptides containing glycan isomers in mass spectrometry data.
  • The device may include the following modules:
  • a mass spectrometry data acquisition module, used to acquire mass spectrometry data of samples;
  • a glycopeptide identification module, used to identify the glycopeptides contained in the sample based on the mass spectrometry data;
  • a glycopeptide quantification module, used to quantify the glycopeptides.
  • The glycopeptide quantification module includes the following modules:
  • B3-1) a glycan isomer simulation module, used to perform computer simulation on the glycan isomers of different structures contained in the glycopeptide to obtain simulated glycan isomers, and thereby glycopeptides containing simulated glycan isomers;
  • a quantification module, used to quantify the glycopeptides containing the simulated glycan isomers using mass spectrometry data quantification software, to obtain quantitative results for the glycopeptides containing the glycan isomers.
  • x is the rank of each structural isomer among the glycan isomers to be quantified after sorting by glycan ID number in ascending order; the rank is a natural number, and ranks run consecutively from 1 to n.
  • The computer simulation is carried out according to the numbers of N and O in the chemical formula of the glycan isomer to be quantified, and is established by a method comprising the following steps:
  • When the number m of N in the chemical formula is greater than or equal to the number of structural isomers of the glycan isomer minus 1 (i.e., n-1): for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added to obtain the simulated glycan isomer. Compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1) × 0.008106 Da;
  • When the number m of N in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1 (i.e., n-1), but the sum m+k of the number m of N and the number k of O is greater than or equal to the number of structural isomers minus 1 (i.e., n-1): for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; thereafter, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, to obtain the simulated glycan isomer. Compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1) × 0.008106 Da or by m × 0.008106 Da + (x-m-1) × 0.013019 Da;
  • In the remaining case, where m+k is less than n-1, the mass of the simulated glycan isomer increases by (x-1) × 0.008106 Da, or by m × 0.008106 Da + (x-m-1) × 0.013019 Da, or by m × 0.008106 Da + k × 0.013019 Da + (x-m-k-1) × 0.0233 Da.
  • The glycan isomers may include n structural isomers, where n is a natural number; m is a natural number, and k is a natural number.
  • The rank is a natural number, and ranks run consecutively from 1 to n.
  • The glycan ID number is taken from the GlycomeDB database (website: www.glycome-db.org).
  • the mass spectrometry data quantification software may be Skyline software.
  • the present invention further provides a computer-readable storage medium storing a computer program, wherein the computer program enables a computer to execute the steps of any of the methods described above.
  • the present invention uses mass spectrometry to analyze sialic acid-containing glycopeptides in the serum of liver cancer patients and normal human serum, and a total of 1218 glycopeptides are identified by searching with pGlyco software.
  • The glycan isomers in the 1218 glycopeptides are distinguished by fine-tuning their mass, and all identified glycopeptides are then quantified using Skyline software. The results show no missing values for the glycopeptides, and 315 glycopeptides are found to change by more than 2.5-fold between liver cancer and normal human serum.
  • the present invention has the following beneficial effects:
  • the present invention uses computer simulation (in silico) to replace isotopes of similar mass to distinguish carbohydrate structural isomers, so that mass spectrometry data quantification software can distinguish isomers and quantify isomer molecules separately.
  • The sources of reagents and consumables in the embodiments of the present invention are as follows: 4-hydroxyethylpiperazineethanesulfonic acid (HEPES): Sigma-Aldrich 54457; Pierce BCA kit: ThermoFisher 23227; dithiothreitol: Invitrogen 15508013; iodoacetamide: Sigma-Aldrich H4034; trypsin: Promega V5113; formic acid: Fisher A117-50; solid-phase extraction C18 cartridge: CDS 4215SD; IMAC Fe-NTA: ThermoFisher A32992; C18 StageTip: CDS Empore 2215.
  • Example 1. Establishment of a method for distinguishing glycan structural isomers by replacing similar-mass isotopes through computer simulation
  • Serum samples from liver cancer patients and healthy subjects were dissolved in 4 volumes of lysis buffer (9 M urea, 20 mM 4-hydroxyethylpiperazineethanesulfonic acid) and centrifuged at 16,000 × g for 5 minutes, and the supernatant was retained as the dissolved serum protein solution.
  • The protein concentration in the serum protein solution of the two samples was determined using the Pierce BCA kit.
  • Fe-NTA IMAC beads were used to enrich sialic acid-containing glycopeptides. Following the kit instructions, 0.5 mg of purified serum protein digest was mixed with IMAC beads for one hour, eluted, dried by vacuum centrifugation, and resuspended in 0.1% formic acid; the sample was then desalted on a C18 StageTip, dried again, and redissolved in 50 μL of 0.1% formic acid.
  • LC-MS/MS: a Thermo Fisher U3000 nanoUPLC coupled to a Thermo Fisher Orbitrap Eclipse Tribrid mass spectrometer was used for detection, with a 50 cm analytical column (100 μm ID, 1.9 μm C18 packing). Solvent A was 0.1% formic acid in water, and solvent B was 80% acetonitrile with 0.1% formic acid in water. The injection volume was 4 μL, and each sample was run in duplicate. The gradient increased from 4% to 50% solvent B over 90 minutes at a flow rate of 0.3 μL/min.
  • Raw files were generated after mass spectrometry scanning. The raw files corresponding to liver cancer patient samples were named Cancer.raw, and the raw files corresponding to healthy human samples were named Normal.raw.
  • The pGlyco 2.0 software (download link: http://pfind.org/software/pGlyco/index.html) was used with default search parameters. The UniProt human protein sequence database and the human N-linked glycan database (N-glycan database) used by pGlyco in 2020, which contains a total of 8093 glycan IDs, were selected, and the total FDR was set to 1%.
  • the searched glycopeptide identification data were in txt files named Cancer.txt (corresponding to liver cancer patients) and Normal.txt (corresponding to healthy people).
  • Compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1) × 0.008106 Da, or by m × 0.008106 Da + (x-m-1) × 0.013019 Da, or by m × 0.008106 Da + k × 0.013019 Da + (x-m-k-1) × 0.0233 Da.
  • After the computer simulation changes the chemical formula, the mass of the glycan isomers changes slightly:
  • For the glycan of rank x = 3 (glycan ID 1269): after computer simulation, the chemical formula changed from C90H146N6O65 to C90H148N4O65C'2 (C' denoting 13C); that is, (3-1) 14N were removed from the chemical formula, and (3-1) 13C and (3-1) 1H were added. Compared with the mass of the original glycan isomer, the mass increased by (3-1) × 0.008106 Da;
  • According to the masses obtained after simulation, the glycan isomers can be distinguished in the subsequent analysis software.
  • The resulting file is saved as the new database file shifted glycans.txt.
  • The glycopeptide result file (txt format) obtained from the search was converted into a pepXML file.
  • the parameters of each glycopeptide in the pepXML file are set as follows:
  • analysis_summary represents the analysis summary
  • msms_run_summary represents the secondary mass spectrometry analysis summary
  • base_name represents the base name
  • raw_data represents the raw data name
  • raw_data_type represents the raw data type
  • fragment_mass_type represents the fragmentation mass type
  • precursor_mass_type represents the precursor mass type
  • search_engine represents the search software.
  • aminoacid_modification is the amino acid modification; aminoacid is the modified amino acid; massdiff is the mass of the modifying group; mass is massdiff plus the amino acid residue mass; variable indicates a variable modification; and description is the functional description.
  • mod_aminoacid_mass: position is the position of the modified amino acid on the peptide, and mass is the mass of the modified amino acid.
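The relation "mass is massdiff plus the amino acid residue mass" can be illustrated as follows. This is only a sketch: the function name `mod_aminoacid_mass_element`, the position, and the 1000.0 Da modification mass are hypothetical, and the asparagine residue mass is the standard monoisotopic value rather than a figure from the patent.

```python
import xml.etree.ElementTree as ET

ASN_RESIDUE_MASS = 114.042927  # monoisotopic residue mass of asparagine in Da (standard value)

def mod_aminoacid_mass_element(position, massdiff, residue_mass=ASN_RESIDUE_MASS):
    """Build a mod_aminoacid_mass element whose mass is massdiff + residue mass."""
    el = ET.Element("mod_aminoacid_mass")
    el.set("position", str(position))                    # 1-based position on the peptide
    el.set("mass", f"{residue_mass + massdiff:.6f}")     # modified amino acid mass
    return el

# Hypothetical glycan modification of 1000.0 Da on the residue at position 5.
element = mod_aminoacid_mass_element(position=5, massdiff=1000.0)
print(ET.tostring(element, encoding="unicode"))
```

Six decimals are kept here because the simulated shifts are on the order of 0.008 Da and would be lost with coarser rounding.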
  • the glycan modifications are added according to the sugar modification masses in the regular glycans.txt file or the shifted glycans.txt file obtained in step 2.3.1.
  • The report combines the pGlyco glycopeptide identification reports Cancer.txt/Normal.txt from step 2.2 with shifted glycans.txt from step 2.3.1 and contains the following columns: Gene name (gene name); Protein name (protein name); Accession (protein accession number in the database); kD (protein mass); Site (glycosylation site); GlyID (glycan ID); Glycan (glycan composition); Glymass (unshifted glycan mass); Calc.m/z (theoretical glycopeptide mass-to-charge ratio); PlausibleStruct (plausible glycan structure); Peptide (glycosylated peptide sequence); Charge (charge state); GlycoPeptide (modified peptide sequence: the modified asparagine that pGlyco writes as J in its search results is changed back to N, and the unshifted modification mass [+XXXX.XX] is appended after the modified amino acid, a modified-peptide format that Skyline accepts); Shift GlycoPeptide (same as GlycoPeptide, but with the glycan modification mass replaced by the shifted mass according to shifted glycans.txt); PPM (glycopeptide mass change); and Total area (Cancer, Normal), the glycopeptide peak areas for the liver cancer patient and the healthy control (this column is reserved for step 2.3.5 and is left blank for now).
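The GlycoPeptide and Shift GlycoPeptide columns described above can be built with a small helper. The sketch below assumes exactly one glycosylated site per peptide; the function name `skyline_sequence`, the `decimals` parameter, and the example peptide and masses are illustrative and do not come from the patent.

```python
def skyline_sequence(peptide, glycan_mass, decimals=2):
    """Change pGlyco's 'J' back to 'N' and append the modification mass in brackets."""
    if peptide.count("J") != 1:
        raise ValueError("expected exactly one glycosylated site marked 'J'")
    return peptide.replace("J", f"N[+{glycan_mass:.{decimals}f}]")

# Illustrative peptide and glycan mass.
regular = skyline_sequence("EEQYJSTYR", 1444.53)
# For a shifted glycan, more decimals are needed so that a 0.008106 Da shift survives rounding.
shifted = skyline_sequence("EEQYJSTYR", 1444.53 + 0.008106, decimals=6)
print(regular)
print(shifted)
```

With two decimals, a single 0.008106 Da shift would round away in many cases, which is why the shifted column is written with higher precision in this sketch.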
  • Peptide Settings: Digestion Enzyme: Trypsin. Transition Settings: Precursor Charges 2, 3, 4, 5, 6, 7; Ion Charges 1, 2, 3, 4, 5, 6; Ion Types y, b, p. Resolving Power (matching the mass spectrometer's MS1 settings): 120,000 at 200 m/z.
  • ppm specific values
  • glycopeptides with changed mass including glycan isomers and computer-simulated mass changes
  • Re-create the Skyline file, name it "shifted test", and repeat steps a-c (import the modification definition in step a, and use the "shifted glycans.txt" file obtained in step 2.3.1.2 and the pepXML file with changed mass obtained in step 2.3.2 when building the library in step c).
  • According to the report from step 2.3.4, divide the glycopeptides into two groups, 0-10 ppm and 10-50 ppm; adjust the mass accuracy under the Transition Settings - Full Scan tab and analyze the two groups separately. Make sure all peptides match the library spectra, import the raw files, adjust the peaks manually as usual, and export two Skyline glycopeptide peak area reports.
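The split into 0-10 ppm and 10-50 ppm groups can be sketched as below, assuming the PPM value is the simulated mass shift relative to the glycopeptide mass. The function names and the example numbers are hypothetical; because the charge cancels, the same ppm value results whether neutral mass or m/z is used.

```python
def ppm_shift(mass_shift_da, glycopeptide_mass_da):
    """Relative mass change of the glycopeptide in parts per million."""
    return mass_shift_da / glycopeptide_mass_da * 1e6

def ppm_group(ppm):
    """Assign a glycopeptide to one of the two analysis groups named in the text."""
    if 0 <= ppm <= 10:
        return "0-10ppm"
    if ppm <= 50:
        return "10-50ppm"
    raise ValueError("mass change outside the 0-50 ppm range")

# Hypothetical 3000 Da glycopeptide: two N substitutions versus a larger 0.06 Da shift.
small = ppm_shift(2 * 0.008106, 3000.0)
large = ppm_shift(0.06, 3000.0)
print(ppm_group(small), ppm_group(large))
```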
  • sialic acid-containing glycopeptides in the serum of liver cancer patients and normal human serum were analyzed by mass spectrometry, and a total of 1218 glycopeptides were identified by searching with pGlyco software.
  • the method of distinguishing glycan structural isomers by replacing similar mass isotopes by computer simulation established by the present invention was used. After fine-tuning the mass of the glycan isomers in the glycopeptides for differentiation, all the identified glycopeptides were quantified using Skyline software. The results showed that there were no missing values for glycopeptides, and it was finally found that 315 glycopeptides changed by more than 2.5 times in the serum of liver cancer patients and normal human serum.
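The 2.5-fold criterion can be expressed as a simple filter over the exported peak areas. The sketch assumes that a change "of more than 2.5 times" counts in either direction and that areas are positive; the function name `changed` and the peak-area values are made up for illustration.

```python
def changed(cancer_area, normal_area, fold=2.5):
    """True if the peak area differs by more than `fold` in either direction."""
    if cancer_area <= 0 or normal_area <= 0:
        raise ValueError("peak areas must be positive (no missing values)")
    ratio = cancer_area / normal_area
    return ratio > fold or ratio < 1 / fold

# Hypothetical (cancer, normal) total peak areas keyed by Shift GlycoPeptide.
areas = {"pepA": (3.0e6, 1.0e6), "pepB": (2.0e6, 1.0e6), "pepC": (1.0e5, 4.0e5)}
hits = [name for name, (c, n) in areas.items() if changed(c, n)]
print(hits)
```

Because the method reports no missing values, every glycopeptide has both areas available, so the ratio is always defined.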
  • The glycan isomers in the 1218 glycopeptides are distinguished by fine-tuning their mass, and all identified glycopeptides are then quantified using Skyline software. The results show no missing values for the glycopeptides, and 315 glycopeptides are found to change by more than 2.5-fold between liver cancer and normal human serum.
  • The method of the present invention can quantify and differentially analyze all identified glycopeptides carrying complex modifications by different glycans simultaneously, accurately, and without missing values, and can be applied to the preparation of products for distinguishing glycopeptides that carry different glycan isomers or of products for mass spectrometric analysis of glycosylation.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioethics (AREA)
  • Chemical & Material Sciences (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

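The invention discloses a method for distinguishing glycan structural isomers by replacing similar-mass isotopes through computer simulation. In the method, isotopes in the structural isomers of the glycan isomers to be quantified are replaced by computer simulation with isotopes of similar mass, yielding simulated glycan isomers whose chemical formula and mass are slightly altered (mass difference less than 0.2 Da), and the structural isomers obtained after simulation can then be quantified on the basis of mass spectrometry data. Experiments show that when the method was used to distinguish and quantify the structural isomers of the glycan isomers in 1218 glycopeptides identified in the serum of liver cancer patients and in normal human serum, the glycopeptides had no missing values and 315 glycopeptides were found to change by more than 2.5-fold between liver cancer and normal human serum. The method can therefore effectively distinguish glycopeptides carrying different glycan isomers, and can quantify and differentially analyze all identified glycopeptides accurately and without missing values.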

Description

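Method for distinguishing glycan structural isomers by replacing similar-mass isotopes through computer simulation
Technical Field
The present invention relates to the field of biotechnology, and in particular to a method for distinguishing glycan structural isomers by replacing similar-mass isotopes through computer simulation.
Background Art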
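Protein glycosylation is one of the common post-translational modifications. About 50-70% of human proteins are glycosylated, including surface receptors, organelle-resident proteins, secretory proteins, and transport proteins. Protein glycosylation is a very important modification involved in many biological processes, such as mediating cell attachment, monitoring protein folding status and promoting protein delivery, stimulating signal transduction pathways, affecting protein-protein interactions, and changing protein solubility. Glycans are composed of basic structural units, monosaccharides. The intramolecular hemiacetal group of one monosaccharide and a hydroxyl group of another monosaccharide can form a glycosidic bond. Glucose (Glu/Glc), galactose (Gal), and mannose (Man) are stereoisomers, collectively called hexoses (Hex). Deoxyhexoses (dHex), such as fucose (Fuc), are hexoses in which a hydroxyl group is replaced by a hydrogen atom. N-acetylglucosamine (GlcNAc) and N-acetylgalactosamine (GalNAc) are both N-acetylhexosamines (HexNAc). Sialic acid is a general term for substituted nine-carbon neuraminic acids; N-acetylneuraminic acid (NeuAc) and N-glycolylneuraminic acid (NeuGc) are common in mammals. NeuAc is widely found in human proteins, whereas NeuGc is a non-human sialic acid that has nevertheless been found in apes. Glycosylation can be divided into several subtypes according to the glycosidic linkage: N-linked glycosylation, O-linked glycosylation, glypiation, C-linked glycosylation, and phosphoglycosylation. The most common are N-linked glycans (N-glycan database), whose glycosidic bond arises from N-linked glycosylation, and O-linked glycans (O-glycan database), whose glycosidic bond arises from O-linked glycosylation. N-linked oligosaccharides are attached to the nitrogen atom of asparagine (Asn). O-linked glycosylation attaches sugars to oxygen atoms in serine, threonine, or tyrosine. N-linked glycans share a common pentasaccharide core structure and are generally divided into three subtypes: high-mannose, complex, and hybrid.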
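Mass spectrometric analysis of glycosylation is more difficult than that of other protein post-translational modifications because glycans are highly diverse and structurally complex. Methods for analyzing protein glycosylation generally fall into two categories: one releases glycans from proteins by enzymatic digestion and then analyzes the free glycans or the peptides separately; the other analyzes glycopeptides directly, which retain information on the glycosylation site. Because glycan branches can be linked in different ways, structural isomers exist that show the same precursor ion mass in the mass spectrum, which makes analysis very difficult. With the development of mass spectrometry, tandem and even multistage mass spectrometry can dissociate glycan molecules further and thereby resolve structural isomers, and many large-scale search tools for glycans or glycopeptides have appeared, such as pGlyco, ProteinProspector, and O-Pair. Glycoproteomics has now moved from qualitative to quantitative analysis: it is necessary not only to identify the types of glycosylation on proteins in different groups but also to quantify the different glycans. Methods for large-scale quantification of molecules identified by mass spectrometry are also being continually updated and optimized. Taking peptide quantification as an example, methods have progressed from spectral counting (the number of MS/MS spectra) and MS1 peak area to iTRAQ and TMT, and then to heavy-isotope-labeled peptides used as standards for accurate quantification, with the corresponding analysis software developing alongside. Data-dependent acquisition (DDA) is the most widely used scan mode in mass spectrometry: the most abundant ions are selected for fragmentation and their MS/MS spectra are acquired. Using the MS1 peak area or peak height of an ion in different samples as the basis for relative quantification is then the simplest and most widely used approach. Because of how DDA selects precursors, missing values arise: some peptides are selected for MS/MS in some samples but not in others, so that even though a peptide is present in a sample, the final results contain no MS/MS spectrum or quantitative data for it. To address this pain point, dedicated quantification software such as Progenesis and Skyline was developed. For example, match-between-runs algorithms, which extract peaks on the basis of other ion properties such as retention time, can greatly reduce missing values and improve reproducibility, and many tools also provide a visual interface for manual adjustment. For glycopeptides, however, quantification software such as Skyline has its own pain point: it was developed mainly for ordinary peptides carrying protein modifications of fixed mass and is not tailored to complex glycan modifications. The information supplied to the software must include the identified peptide sequence and the modification mass; the software uses only the modification mass as an identifier, so glycopeptides that carry different glycan isomers are treated as the same peptide.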
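Disclosure of the Invention
The technical problem to be solved by the present invention is how to distinguish glycan structural isomers with mass spectrometry data quantification software, and/or how to quantify glycan structural isomers in mass spectrometric analysis, and/or how to quantify glycan structural isomers with mass spectrometry data quantification software.
To solve the above technical problem, the present invention first provides a method for the quantitative analysis of glycan isomers based on mass spectrometry data. The method may comprise the following steps: replacing, in silico, isotopes in the structural isomers of the glycan isomers to be quantified with isotopes of similar mass, to obtain simulated glycan isomers whose chemical formula and mass are changed; and quantifying the simulated glycan isomers based on the mass spectrometry data, to obtain quantitative results for the different structural isomers.
The difference between the mass of a simulated glycan isomer and the mass of the glycan isomer to be quantified may be less than or equal to 0.2 Da.
The glycan isomers to be quantified may be isomers that share the same molecular formula.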
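The isotopes of similar mass may be isotope combinations whose masses differ by no more than 0.05 Da. For example, the similar-mass counterpart of 14N may be 13C plus 1H, that of 16O may be 15N plus 1H, and that of 15N may be 12C plus 1H.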
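The per-substitution increments quoted later in the text (0.008106 Da and 0.013019 Da) can be checked from monoisotopic masses. The sketch below is illustrative only: the mass table uses standard monoisotopic values rounded to seven decimals, and the helper name `swap_delta` is hypothetical. Only the first two pairings are checked.

```python
# Monoisotopic masses in Da (standard values, rounded to 7 decimals).
MASS = {
    "1H": 1.0078250,
    "13C": 13.0033548,
    "14N": 14.0030740,
    "15N": 15.0001089,
    "16O": 15.9949146,
}

def swap_delta(removed, added):
    """Mass change (Da) when the `removed` isotopes are replaced by `added`."""
    return sum(MASS[i] for i in added) - sum(MASS[i] for i in removed)

# 14N -> 13C + 1H and 16O -> 15N + 1H, the two substitutions used first.
n_to_ch = swap_delta(["14N"], ["13C", "1H"])
o_to_nh = swap_delta(["16O"], ["15N", "1H"])
print(f"{n_to_ch:.6f} {o_to_nh:.6f}")
```

Both increments are far below the 0.05 Da limit stated for a single substitution.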
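In the above method, x is the rank of each structural isomer of the glycan isomers to be quantified after sorting by glycan ID in ascending order; the rank is a natural number, counted consecutively from 1 to n. The in silico simulation is carried out according to the numbers of N and O atoms in the chemical formula of the glycan isomers to be quantified, and comprises any one of the following steps:
A1) when the number m of N atoms in the chemical formula is greater than or equal to the number of structural isomers of the glycan isomer minus 1 (that is, n-1): for the structural isomer of rank x, x-1 14N may be removed from the chemical formula and x-1 13C and x-1 1H added to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da;
A2) when the number m of N atoms in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1 (that is, n-1), but the sum m+k of the number m of N atoms and the number k of O atoms is greater than or equal to the number of structural isomers minus 1 (that is, n-1): for the structural isomer of rank x, x-1 14N may be removed from the chemical formula and x-1 13C and x-1 1H added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N may be removed from the chemical formula and m 13C and m 1H added, and x-m-1 16O may be removed and x-m-1 15N and x-m-1 1H added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da or by m×0.008106 Da+(x-m-1)×0.013019 Da;
A3) when the sum m+k of the number m of N atoms and the number k of O atoms in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1 (that is, n-1): for the structural isomer of rank x, x-1 14N may be removed from the chemical formula and x-1 13C and x-1 1H added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N may be removed and m 13C and m 1H added, and x-m-1 16O removed and x-m-1 15N and x-m-1 1H added, until k 16O have been removed; then, for the structural isomer of rank x, m 14N may be removed and m 13C and m 1H added, x-m-1 16O may be removed and x-m-1 15N and x-m-1 1H added, and at the same time x-m-k-1 12C and x-m-k-1 1H may be removed and x-m-k-1 15N added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da, or by m×0.008106 Da+(x-m-1)×0.013019 Da, or by m×0.008106 Da+k×0.013019 Da+(x-m-k-1)×0.0233 Da.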
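The glycan isomers to be quantified may comprise n structural isomers, where n is a natural number.
The number of N atoms in the chemical formula is m, where m is a natural number.
The number of O atoms in the chemical formula is k, where k is a natural number.
Here x may be the rank of each structural isomer of the glycan isomers to be quantified after sorting by glycan ID in ascending order. The rank is a natural number, counted consecutively from 1 to n.
The glycan ID may come from the GlycomeDB database (www.glycome-db.org).
The mass spectrometry data quantification software may be Skyline.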
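Rules A1 to A3 amount to a single piecewise rule: the x-1 substitutions are taken first from nitrogen, then from oxygen, then from the third tier. A minimal sketch under that reading follows; the function name `mass_shift` is hypothetical and the three increments are the values stated in the text, not recomputed here.

```python
D_N = 0.008106  # 14N -> 13C + 1H, Da (value stated in the text)
D_O = 0.013019  # 16O -> 15N + 1H, Da (value stated in the text)
D_C = 0.0233    # third-tier increment, Da (taken as stated in the text)

def mass_shift(x, m, k):
    """Mass increase (Da) for the isomer of rank x (1-based).

    m and k are the numbers of N and O atoms in the shared formula.
    """
    if x < 1:
        raise ValueError("ranks start at 1")
    steps = x - 1                      # rank 1 is left unchanged
    n_swaps = min(steps, m)            # tier 1: use up nitrogen first
    o_swaps = min(steps - n_swaps, k)  # tier 2: then oxygen
    c_swaps = steps - n_swaps - o_swaps  # tier 3: whatever remains
    return n_swaps * D_N + o_swaps * D_O + c_swaps * D_C

print(round(mass_shift(6, 6, 65), 6))  # five N swaps
```

With nitrogen swaps alone, 24 substitutions give 24 × 0.008106 = 0.194544 Da, which stays within the 0.2 Da limit given above.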
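To solve the above technical problem, the present invention further provides a method for the quantitative analysis of glycopeptides containing glycan isomers in mass spectrometry data. The method may comprise the following steps: replacing, in silico, isotopes in the glycan isomers contained in the glycopeptides with isotopes of similar mass, to obtain simulated glycan isomers whose chemical formula and mass are changed, and thereby glycopeptides containing the simulated glycan isomers; and quantifying, based on the mass spectrometry data and with mass spectrometry data quantification software, the glycopeptides containing the simulated glycan isomers, to obtain quantitative results for the glycopeptides containing glycan isomers of different structure.
The difference between the mass of a simulated glycan isomer and the mass of the glycan isomer may be less than or equal to 0.2 Da.
The glycan isomers may be isomers that share the same molecular formula.
The isotopes of similar mass may be isotope combinations whose masses differ by no more than 0.05 Da. For example, the similar-mass counterpart of 14N may be 13C plus 1H, that of 16O may be 15N plus 1H, and that of 15N may be 12C plus 1H.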
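In the above method, x may be the rank of each structural isomer of the glycan isomers to be quantified after sorting by glycan ID in ascending order; the rank is a natural number, counted consecutively from 1 to n. The in silico simulation may be carried out according to the numbers of N and O atoms in the chemical formula of the glycan isomers to be quantified, and comprises the following steps:
A1) when the number m of N atoms in the chemical formula is greater than or equal to the number of structural isomers of the glycan isomer minus 1 (that is, n-1): for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da;
A2) when the number m of N atoms in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1 (that is, n-1), but the sum m+k of the number m of N atoms and the number k of O atoms is greater than or equal to the number of structural isomers minus 1 (that is, n-1): for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N are removed from the chemical formula and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da or by m×0.008106 Da+(x-m-1)×0.013019 Da;
A3) when the sum m+k of the number m of N atoms and the number k of O atoms in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1 (that is, n-1): for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, until k 16O have been removed; then, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, and at the same time x-m-k-1 12C and x-m-k-1 1H are removed and x-m-k-1 15N are added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da, or by m×0.008106 Da+(x-m-1)×0.013019 Da, or by m×0.008106 Da+k×0.013019 Da+(x-m-k-1)×0.0233 Da. The glycan isomers comprise n structural isomers, where n is a natural number; m is a natural number; k is a natural number.
Here x is the rank of each structural isomer of the glycan isomers after sorting by glycan ID in ascending order; the rank is a natural number, counted consecutively from 1 to n.
The glycan ID may come from the GlycomeDB database (www.glycome-db.org).
In the above method, the mass spectrometry data quantification software may be Skyline.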
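To solve the above technical problem, the present invention further provides a device for the quantitative analysis of glycopeptides containing glycan isomers in mass spectrometry data. The device may comprise the following modules:
B1) a mass spectrometry data acquisition module, for acquiring mass spectrometry data of a sample;
B2) a glycopeptide identification module, for identifying the glycopeptides contained in the sample based on the mass spectrometry data;
B3) a glycopeptide quantification module, for quantifying the glycopeptides.
The glycopeptide quantification module of B3) comprises the following modules:
B3-1) a glycan isomer simulation module, for simulating in silico the glycan isomers of different structure contained in the glycopeptides to obtain simulated glycan isomers, and thereby glycopeptides containing the simulated glycan isomers;
B3-2) a glycopeptide quantification module, for quantifying the glycopeptides containing the simulated glycan isomers with mass spectrometry data quantification software, to obtain quantitative results for the glycopeptides containing glycan isomers.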
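In the above device, x is the rank of each structural isomer of the glycan isomers to be quantified after sorting by glycan ID in ascending order; the rank is a natural number, counted consecutively from 1 to n. The in silico simulation is carried out according to the numbers of N and O atoms in the chemical formula of the glycan isomers to be quantified, and is established by a method comprising the following steps:
C1) when the number m of N atoms in the chemical formula is greater than or equal to the number of structural isomers of the glycan isomer minus 1 (that is, n-1): for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da;
C2) when the number m of N atoms in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1 (that is, n-1), but the sum m+k of the number m of N atoms and the number k of O atoms is greater than or equal to the number of structural isomers minus 1 (that is, n-1): for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N are removed from the chemical formula and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da or by m×0.008106 Da+(x-m-1)×0.013019 Da;
C3) when the sum m+k of the number m of N atoms and the number k of O atoms in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1 (that is, n-1): for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, until k 16O have been removed; then, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, and at the same time x-m-k-1 12C and x-m-k-1 1H are removed and x-m-k-1 15N are added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da, or by m×0.008106 Da+(x-m-1)×0.013019 Da, or by m×0.008106 Da+k×0.013019 Da+(x-m-k-1)×0.0233 Da.
The glycan isomers may comprise n structural isomers, where n is a natural number; m is a natural number; k is a natural number.
The rank is a natural number, counted consecutively from 1 to n.
The glycan ID comes from the GlycomeDB database (www.glycome-db.org).
In the above device, the mass spectrometry data quantification software may be Skyline.
To solve the above technical problem, the present invention further provides a computer-readable storage medium storing a computer program, the computer program causing a computer to execute the steps of any of the methods described above.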
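In the present invention, sialylated glycopeptides in serum from a liver cancer patient and a healthy subject were analysed by mass spectrometry, and a pGlyco search identified 1218 glycopeptides in total. Using the method established by the present invention for distinguishing glycan structural isomers by in silico substitution of isotopes of similar mass, the glycan isomers in the 1218 glycopeptides were distinguished by slight mass adjustment, and all identified glycopeptides were then quantified with Skyline. No glycopeptide had a missing value, and 315 glycopeptides differed by more than 2.5-fold between liver cancer serum and normal serum. The experiments show that the method can effectively distinguish glycopeptides carrying different glycan isomers, and can quantify all identified glycopeptides accurately and without missing values for differential analysis.
Compared with the prior art, the beneficial effect of the present invention is as follows:
The present invention distinguishes glycan structural isomers by in silico substitution of isotopes of similar mass, so that mass spectrometry data quantification software can tell the isomers apart and quantify each isomer separately.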
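Best Mode for Carrying Out the Invention
The present invention is described in further detail below with reference to specific embodiments. The examples are given only to illustrate the invention and do not limit its scope. They may serve as a guide for further improvement by a person of ordinary skill in the art and do not limit the invention in any way.
Unless otherwise specified, the experimental methods in the following examples are conventional methods, and the materials, reagents and instruments used are commercially available. All quantitative experiments in the following examples were performed in duplicate and the results averaged.
The sources of the reagents and consumables used in the examples are as follows:
HEPES (4-(2-hydroxyethyl)piperazine-1-ethanesulfonic acid): Sigma-Aldrich 54457;
Pierce BCA kit: ThermoFisher 23227;
Dithiothreitol: Invitrogen 15508013;
Iodoacetamide: Sigma-Aldrich H4034;
Trypsin: Promega V5113;
Formic acid: Fisher A117-50;
C18 solid-phase extraction cartridge: CDS 4215SD;
IMAC Fe-NTA: ThermoFisher A32992;
C18 stagetip: CDS Empore 2215.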
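Example 1. Establishing the method for distinguishing glycan structural isomers by in silico substitution of isotopes of similar mass
1. Sample collection
One patient diagnosed with liver cancer at Shandong Provincial Hospital and one healthy subject were selected for sample collection. The study protocol was approved by the Ethics Committee of Shandong Provincial Hospital, and the study was conducted in accordance with the principles of the Declaration of Helsinki. Before enrolment, each participant or their legal representative signed a written informed consent form.
Whole blood was collected from the liver cancer patient and the healthy subject, and serum samples were obtained by centrifugation.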
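2. Mass spectrometric identification and quantification of sialylated N-glycopeptides
2.1 Sample preparation and mass spectrometry
2.1.1 Sample preparation
2.1.1.1 Digestion of serum proteins
The serum samples from the liver cancer patient and the healthy subject were each dissolved in 4 volumes of lysis buffer (9 M urea, 20 mM HEPES) and centrifuged at 16,000 × g for 5 minutes; the supernatant was kept as the solubilised serum protein solution, and the protein concentration of both solutions was measured with the Pierce BCA kit.
Then 1 mg of solubilised serum protein was taken; dithiothreitol was added to a final concentration of 4.5 mM and reacted at room temperature for 1 hour; iodoacetamide was then added to a final concentration of 10 mM and reacted at room temperature in the dark for half an hour; trypsin was added at an enzyme:protein mass ratio of 1:20 (w:w) and digestion proceeded overnight at room temperature to give the serum protein digest. Formic acid was added to the digest to a final concentration of 0.1%, and the digest was desalted on a C18 solid-phase extraction cartridge to give the purified serum protein digest for later use.
2.1.1.2 Enrichment of sialylated glycopeptides
Sialylated glycopeptides were enriched with Fe-NTA IMAC beads. Following the kit instructions, 0.5 mg of purified serum protein digest was mixed with the IMAC beads for one hour; the eluate was dried by vacuum centrifugation, resuspended in 0.1% formic acid, desalted with a C18 stagetip, dried again and redissolved in 50 μL of 0.1% formic acid.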
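2.1.2 Mass spectrometry
LC-MS/MS: detection was performed on a Thermo Scientific U3000 nanoflow ultra-high-performance liquid chromatograph (nanoUPLC) coupled to a Thermo Scientific Orbitrap Eclipse Tribrid mass spectrometer, with a 50 cm analytical column (100 μm ID, 1.9 μm C18 packing). Mobile phase A was 0.1% formic acid in water and mobile phase B was 80% acetonitrile with 0.1% formic acid in water. The injection volume was 4 μL, with two technical replicates per sample. The gradient ran from 4% to 50% B over 90 minutes at a flow rate of 0.3 μL/min.
Both MS1 and MS2 data were acquired with the high-mass-accuracy, high-sensitivity Orbitrap mass analyser: MS1 scan range (m/z) = 800–2000; resolution = 120,000; AGC = 200,000; maximum injection time = 100 ms; included charge states = 2-6; dynamic exclusion after n times, n = 1; dynamic exclusion duration = 15 s; fragmentation set to stepped HCD (NCE = 30% ± 10%); MS2 isolation window = 2; resolution = 15,000; AGC target = 500,000; maximum injection time = 250 ms. The scans were written to raw files: Cancer.raw for the liver cancer patient sample and Normal.raw for the healthy subject sample.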
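2.2 Glycopeptide identification
pGlyco 2.0 (download: http://pfind.org/software/pGlyco/index.html) was run with default search parameters against the UniProt human protein sequence database and the human N-linked glycan database (N-glycan database) used by pGlyco in 2020, which contains glycans with 8093 glycan IDs in total; Total FDR was set to 1%. The glycopeptide identification results were txt files, named Cancer.txt (liver cancer patient) and Normal.txt (healthy subject).
2.3 Glycopeptide quantification
2.3.1 Format conversion of the glycan database
The glycan database (N-glycan database) was converted into a format accepted by the peptide quantification software Skyline (download: https://skyline.ms/project/home/software/skyline/begin.view).
2.3.1.1 Glycan format conversion
The conversion is described using Glycan ID 127, from the glycan database (N-glycan database) identification results of step 2.2, as an example. In the original glycan database, the parameter of Glycan ID 127 is "kind=43100", meaning Hex=4, HexNAc=3, NeuAc=1, NeuGc=0, Fuc=0, with chemical formula C59H96N4O43. In the new format, the entry for Glycan ID 127 is <static_modification,aminoacid="N",explicit_decl="true",formula="C59H96N4O43",name="127"/>.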
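The Glycan ID 127 conversion can be sketched as follows. This is an illustration rather than the authors' converter: the helper names are hypothetical, the digit-per-monosaccharide reading of `kind` assumes every count is a single digit, and the output reproduces the comma-separated attribute layout exactly as the text prints it. The formula is the plain sum of the residue compositions, which reproduces C59H96N4O43 for the example.

```python
from collections import Counter

# Elemental compositions of glycosidically linked residues.
RESIDUES = {
    "Hex":    {"C": 6,  "H": 10, "O": 5},
    "HexNAc": {"C": 8,  "H": 13, "N": 1, "O": 5},
    "NeuAc":  {"C": 11, "H": 17, "N": 1, "O": 8},
    "NeuGc":  {"C": 11, "H": 17, "N": 1, "O": 9},
    "Fuc":    {"C": 6,  "H": 10, "O": 4},
}
# Order of the digits in the "kind" string, as given in the example.
ORDER = ["Hex", "HexNAc", "NeuAc", "NeuGc", "Fuc"]

def kind_to_formula(kind):
    """Turn a composition string such as '43100' into a formula string."""
    total = Counter()
    for name, digit in zip(ORDER, kind):
        for element, count in RESIDUES[name].items():
            total[element] += count * int(digit)
    return "".join(f"{e}{total[e]}" for e in "CHNO" if total[e])

def static_modification(glycan_id, kind):
    """Build the modification entry in the layout shown in the text."""
    formula = kind_to_formula(kind)
    return (f'<static_modification,aminoacid="N",explicit_decl="true",'
            f'formula="{formula}",name="{glycan_id}"/>')

print(static_modification(127, "43100"))
```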
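All glycans (both non-isomeric and isomeric) were converted in this way and the result was saved as the file regular glycans.txt.
2.3.1.2 Format conversion of isomeric glycans
The masses of glycan isomers that share the same chemical formula and mass were adjusted slightly: isotopes in the glycan isomers were replaced in silico with isotopes of similar mass, giving simulated glycan isomers with a changed chemical formula and mass. Changing the formula of a glycan isomer changes its mass (molecular weight) very slightly, which allows the downstream analysis software to tell the isomers apart. At the end of the analysis the same rule is used to recover the original glycan isomer, and the final output reports the original chemical formula and structure. The specific steps are as follows:
All glycan isomers with the same mass (n of them) are found and sorted by glycan ID in ascending order, the rank being denoted x (x is any natural number from 1 to n, counted consecutively). The chemical formula and mass of the isomers are then slightly adjusted in silico (mass change less than 0.2 Da) according to the following rules: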
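Ⅰ. Glycan isomers for which the number of N atoms in the chemical formula (m) is greater than or equal to the number of structural isomers minus 1 (that is, n-1): for the isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added; compared with the original glycan isomer, the mass increases by (x-1)×0.008106 Da;
Ⅱ. Glycan isomers for which the number of N atoms (m) is less than the number of structural isomers minus 1 (that is, n-1), but the sum (m+k) of the number of N atoms (m) and the number of O atoms (k) is greater than or equal to the number of structural isomers minus 1 (that is, n-1): for the isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N are removed from the chemical formula and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da or by m×0.008106 Da+(x-m-1)×0.013019 Da;
Ⅲ. Glycan isomers for which the sum (m+k) of the number of N atoms (m) and the number of O atoms (k) is less than the number of structural isomers minus 1 (that is, n-1): for the isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, until k 16O have been removed; then, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, and at the same time x-m-k-1 12C and x-m-k-1 1H are removed and x-m-k-1 15N are added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da, or by m×0.008106 Da+(x-m-1)×0.013019 Da, or by m×0.008106 Da+k×0.013019 Da+(x-m-k-1)×0.0233 Da.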
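Before any rule can be applied, each isomer group has to be found and its members ranked. A minimal sketch of that step is given below, assuming the database has already been loaded into a dictionary that maps glycan ID to chemical formula; the function name `rank_isomers` and the input layout are hypothetical, and grouping by formula is used here as a stand-in for grouping by identical mass.

```python
from collections import defaultdict

def rank_isomers(glycans):
    """Return {glycan_id: (x, n)} for a {glycan_id: formula} mapping.

    x is the 1-based rank within the group of glycans sharing a formula,
    sorted by ascending glycan ID; n is the size of that group.
    """
    groups = defaultdict(list)
    for glycan_id, formula in glycans.items():
        groups[formula].append(glycan_id)
    ranks = {}
    for ids in groups.values():
        ids.sort()
        for x, glycan_id in enumerate(ids, start=1):
            ranks[glycan_id] = (x, len(ids))
    return ranks

# IDs and formulas taken from the examples in the text.
example = {gid: "C90H146N6O65" for gid in (1273, 1266, 1270, 1267, 1272, 1269)}
example[127] = "C59H96N4O43"
print(rank_isomers(example)[1270])
```

A glycan with no isomers gets rank 1 in a group of size 1, so it is left unchanged by rules Ⅰ to Ⅲ.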
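For example, as shown in Table 1, the identified human serum glycopeptides include modifications by 6 glycan isomers (n=6) that share the chemical formula C90H146N6O65 and the mass 2350.83035 Da but differ in structure. The formula contains enough N: the number of N atoms (m=6) is greater than the number of structural isomers minus 1 (m>n-1=5). The glycan IDs lie between 1266 and 1273 (ranks 1-6 after sorting by ID). After the in silico change of formula, the masses of the glycan isomers differ very slightly:
For the glycan of rank x=1 (glycan ID 1266): the chemical formula and mass were not simulated and are unchanged;
For the glycan of rank x=2 (glycan ID 1267): after simulation the chemical formula changed from C90H146N6O65 to C90H147N5O65C'1, that is, (2-1) = 1 14N was removed and (2-1) = 1 13C and 1 1H were added; compared with the original glycan isomer, the mass increased by (2-1)×0.008106 Da;
For the glycan of rank x=3 (glycan ID 1269): after simulation the chemical formula changed from C90H146N6O65 to C90H148N4O65C'2, that is, (3-1) = 2 14N were removed and (3-1) = 2 13C and 2 1H were added; compared with the original glycan isomer, the mass increased by (3-1)×0.008106 Da;
For the glycan of rank x=4 (glycan ID 1270): after simulation the chemical formula changed from C90H146N6O65 to C90H149N3O65C'3, that is, (4-1) = 3 14N were removed and (4-1) = 3 13C and 3 1H were added; compared with the original glycan isomer, the mass increased by (4-1)×0.008106 Da;
For the glycan of rank x=5 (glycan ID 1272): after simulation the chemical formula changed from C90H146N6O65 to C90H150N2O65C'4, that is, (5-1) = 4 14N were removed and (5-1) = 4 13C and 4 1H were added; compared with the original glycan isomer, the mass increased by (5-1)×0.008106 Da;
For the glycan of rank x=6 (glycan ID 1273): after simulation the chemical formula changed from C90H146N6O65 to C90H151N1O65C'5, that is, (6-1) = 5 14N were removed and (6-1) = 5 13C and 5 1H were added; compared with the original glycan isomer, the mass increased by (6-1)×0.008106 Da.
The masses of the simulated glycan isomers allow the downstream analysis software to tell the glycan isomers apart.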
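Table 1. Original glycan isomers before and after in silico adjustment of the chemical formula

Note: C' denotes heavy-labelled 13C.
After all glycan isomers in the glycan database had been converted in silico, the new database was saved as the file shifted glycans.txt.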
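The six-isomer example can be reproduced with a short script. The sketch below covers rule Ⅰ only (nitrogen swaps) and is not the authors' implementation: `parse_formula` and `shifted_formula` are hypothetical helpers, and the output follows the notation used in the worked example, with the 13C count appended as C' and counts of 1 written out.

```python
import re

def parse_formula(formula):
    """Parse 'C90H146N6O65' into {'C': 90, 'H': 146, 'N': 6, 'O': 65}."""
    pairs = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return {element: int(count or 1) for element, count in pairs}

def shifted_formula(formula, x):
    """Rule I only: replace x-1 14N by 13C + 1H (13C written as C')."""
    atoms = parse_formula(formula)
    swaps = x - 1
    if swaps > atoms.get("N", 0):
        raise ValueError("rule I needs at least x-1 nitrogen atoms")
    atoms["N"] = atoms.get("N", 0) - swaps
    atoms["H"] = atoms.get("H", 0) + swaps
    text = "".join(f"{el}{atoms[el]}" for el in "CHNO" if atoms.get(el))
    return text + (f"C'{swaps}" if swaps else "")

BASE_FORMULA = "C90H146N6O65"
BASE_MASS = 2350.83035  # Da, as stated in the text
IDS = [1266, 1267, 1269, 1270, 1272, 1273]  # already sorted by glycan ID

for x, glycan_id in enumerate(IDS, start=1):
    mass = BASE_MASS + (x - 1) * 0.008106
    print(glycan_id, x, shifted_formula(BASE_FORMULA, x), f"{mass:.6f}")
```

The largest shift in this group is 5 × 0.008106 = 0.04053 Da, well within the 0.2 Da limit.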
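2.3.2 Glycopeptide format conversion
The glycopeptide result files from the pGlyco search (txt format) were converted into pepXML files. The parameters for each glycopeptide in the pepXML file were set as follows:
a. File information such as the name of the glycopeptide result file and the search software is extracted and placed at the start of the pepXML file, for example: </analysis_summary>,<msms_run_summary,base_name="Cancer",raw_data=".raw",raw_data_type=".raw">,<fragment_mass_type="monoisotopic",precursor_mass_type="monoisotopic",search_engine="pGlyco">. Here analysis_summary is the analysis summary, msms_run_summary is the MS/MS run summary, base_name is the base name, raw_data is the raw data name, raw_data_type is the raw data type, fragment_mass_type is the fragment mass type, precursor_mass_type is the precursor mass type, and search_engine is the search software.
b. Modifications are defined in the aminoacid_modification part of <analysis_summary>. All modifications in the pGlyco glycopeptide search result file are found and converted to the form <aminoacid_modification,aminoacid="X",description="XX",mass="XX.XXXXXXXX",massdiff="XX.XXXXXXXX",variable="Y/N"/>. Here aminoacid_modification is the amino acid modification, aminoacid is the modified amino acid, massdiff is the mass of the modifying group, mass is massdiff plus the mass of the amino acid residue, variable indicates whether the modification is variable, and description is a description of the modification.
Definitions of all common modifications, for example:
<aminoacid_modification,aminoacid="C",massdiff="57.02146374",mass="160.030648219",variable="N",description="Carbamidomethyl"/>.
Definitions of all glycans, for example:
<aminoacid_modification,aminoacid="N",description="GlycanID1270",mass="2464.873277",massdiff="2350.83035",variable="Y"/>.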
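The relation mass = massdiff + residue mass can be checked against the GlycanID1270 entry. The sketch below is illustrative: the function name is hypothetical, the residue masses are standard monoisotopic values for Cys and Asn residues, and the comma-separated attribute layout simply mirrors the text.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {"C": 103.0091848, "N": 114.0429274}

def aminoacid_modification(aminoacid, description, massdiff, variable):
    """Build a modification entry in the layout shown in the text."""
    mass = RESIDUE_MASS[aminoacid] + massdiff  # modified residue mass
    return (f'<aminoacid_modification,aminoacid="{aminoacid}",'
            f'description="{description}",mass="{mass:.6f}",'
            f'massdiff="{massdiff}",variable="{variable}"/>')

line = aminoacid_modification("N", "GlycanID1270", 2350.83035, "Y")
print(line)
```

For a shifted glycan, the same call would be made with the shifted massdiff, so that each isomer ends up with its own modification mass.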
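c. The glycopeptide search results are entered into the search summary section by MS/MS scan number (scan No.), including assumed_charge (assumed charge), precursor_neutral_mass (neutral mass of the precursor), scan, probability, calc_neutral_pep_mass (calculated neutral peptide mass), protein info (protein information) and so on.
For example:
Here, the modification part is taken from the modification column of the pGlyco glycopeptide search results: the position of each modification and the modification itself are found and converted to pepXML form, that is, mod_aminoacid_mass position (position of the amino acid in the peptide)="X", mass (mass of the modified amino acid)="XXXX.XXXXXXXX". For example, "1,Carbamidomethyl[C]" in the glycopeptide search results is written in the pepXML file as <mod_aminoacid_mass position="1",mass="160.030648219"/>.
Glycan modifications are added using the glycan modification masses in the regular glycans.txt or shifted glycans.txt file obtained in step 2.3.1.
d. The two converted files (for the liver cancer patient and the healthy subject, Cancer or Normal) are given the same file names as the corresponding glycopeptide search result files (Cancer or Normal), with the extension changed to pep.xml, and are placed in separate, labelled folders.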
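2.3.3 Format conversion of the mass spectrometry scan files
The raw files from step 2.1.2 (Cancer.raw and Normal.raw) were converted to mzXML format with MSconvert (download: https://proteowizard.sourceforge.io/).
2.3.4 Building the overall Excel report
A template is built to turn the pGlyco identification results into the final combined qualitative and quantitative glycopeptide report. The report draws on the pGlyco identification reports Cancer.txt/Normal.txt from step 2.2 and the shifted glycan masses in shifted glycans.txt from step 2.3.1, and contains the following columns: Gene name; Protein name; Accession (protein accession number in the database); kD (protein mass); Site (glycosylation site); GlyID (glycan ID); Glycan (glycan composition); Glymass (regular glycan mass); Calc.m/z (theoretical glycopeptide m/z); PlausibleStruct (plausible glycan structure); Peptide (sequence of the glycosylated peptide); Charge (charge state); GlycoPeptide (modified peptide sequence: the modified asparagine, written as J in the pGlyco results, is changed back to N, and the regular modification mass [+XXXX.XXX] is appended after the modified amino acid; Skyline accepts this format); Shift GlycoPeptide (as GlycoPeptide, but with the glycan mass replaced by the shifted mass according to the shifted glycans rules); PPM (mass change of the glycopeptide); Total area (Cancer, Normal) (glycopeptide peak areas for the liver cancer patient and the healthy control; reserved for step 2.3.5 and left empty at this stage).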
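The GlycoPeptide and Shift GlycoPeptide columns can be filled with a small string conversion. The sketch below is a simplified illustration: the function name and the peptide `AJGTK` are hypothetical, exactly one glycosylation site per peptide is assumed, and other modifications such as carbamidomethylation are not handled.

```python
def to_skyline_sequence(pglyco_peptide, glycan_mass, decimals=3):
    """Replace pGlyco's 'J' (glycosylated Asn) with 'N[+mass]'."""
    if pglyco_peptide.count("J") != 1:
        raise ValueError("expected exactly one site marked 'J'")
    tag = f"N[+{glycan_mass:.{decimals}f}]"
    return pglyco_peptide.replace("J", tag)

regular = to_skyline_sequence("AJGTK", 2350.83035)
# Rank 4 in the worked example: three N swaps of 0.008106 Da each.
shifted = to_skyline_sequence("AJGTK", 2350.83035 + 3 * 0.008106, decimals=6)
print(regular, shifted)
```

The shifted string is what lets the quantification software treat each isomer as a distinct peptide; more decimals are kept there so that the small shift is not rounded away.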
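2.3.5 Glycopeptide quantification
Glycopeptides were quantified with Skyline (MacCoss Lab) as follows:
a. Create a Skyline project and save it under the file name test; locate the three Skyline files with the extensions sky, sky.view and skyd. Right-click the test.sky file and open it with Notepad. Open the "regular glycans.txt" file from step 2.3.1.1, copy the complete glycan mass list and insert it into the static_modification section. This step imports the modification definitions.
b. The Skyline parameters are set as follows:
Peptide Settings: Digestion Enzyme, Trypsin; Transition Settings: Precursor Charges 2, 3, 4, 5, 6, 7; Ion Charges 1, 2, 3, 4, 5, 6; Ion Types y, b, p; Resolving Power (matching the MS1 setting of the instrument): 120,000 at 200 m/z.
c. Save the Skyline file test and double-click test.sky. Using the pep.xml files obtained in step 2.3.2, build a library in the usual way under the Peptide Settings - Library tab of test.sky (cutoff score 0).
d. Return to the report from step 2.3.4, list the mass change (ppm) of each glycopeptide and sort in ascending order. For glycopeptides with regular glycan mass (no glycan isomers, ppm=0), copy the glycopeptide list and paste it into the left panel of the test.sky main window, make sure all peptides match library spectra, import the raw files, adjust peaks manually as usual, and export the final report.
e. For glycopeptides with changed mass (containing glycan isomers whose mass was changed in silico), create a new Skyline file named shifted test and repeat a-c (importing the modification definitions in step a from the "shifted glycans.txt" file obtained in step 2.3.1.2, and building the library in step c from the shifted-mass pepXML files obtained in step 2.3.2). In the report from step 2.3.4, divide the glycopeptides into two lists, 0-10 ppm and 10-50 ppm, adjust the mass accuracy under the Transition Settings - Full Scan tab, and analyse the two lists separately. Make sure all peptides match library spectra, import the raw files, adjust peaks manually as usual, and export two Skyline glycopeptide peak-area reports.
f. Merge the peak-area quantification reports for glycopeptides of regular mass (ppm=0) and of changed mass (0<ppm<10 and 10<ppm<50). The mass-change thresholds can be adjusted to suit the project. Finally, look up each glycopeptide sequence and paste its peak area into the Total area column of the overall report from step 2.3.4, then delete the Shift GlycoPeptide and PPM columns.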
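The split into regular, 0-10 ppm and 10-50 ppm lists in steps d to f can be sketched as below. The text does not spell out how the PPM column is computed; this sketch assumes it is the glycan mass shift relative to the neutral glycopeptide mass. The function names, the 4000 Da glycopeptide mass and the handling of values exactly at 10 ppm are all assumptions.

```python
def shift_ppm(mass_shift_da, glycopeptide_mass_da):
    """Relative mass change in ppm (assumed definition of the PPM column)."""
    return mass_shift_da / glycopeptide_mass_da * 1e6

def ppm_bin(ppm):
    """Assign a glycopeptide to one of the lists analysed separately."""
    if ppm == 0:
        return "regular"
    if ppm < 10:
        return "0-10 ppm"
    if ppm <= 50:
        return "10-50 ppm"
    return "above 50 ppm"

# Hypothetical 4000 Da glycopeptide carrying isomers of rank 1, 2 and 6.
for rank in (1, 2, 6):
    ppm = shift_ppm((rank - 1) * 0.008106, 4000.0)
    print(rank, round(ppm, 2), ppm_bin(ppm))
```

Each list would then be analysed with a mass accuracy setting suited to its shift range.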
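In summary, sialylated glycopeptides in serum from a liver cancer patient and a healthy subject were analysed by mass spectrometry, and a pGlyco search identified 1218 glycopeptides in total. Because the glycan modifications on these glycopeptides include isomers, the identified glycopeptides could not all be quantified accurately and without missing values for differential analysis. Using the method established by the present invention for distinguishing glycan structural isomers by in silico substitution of isotopes of similar mass, the glycan isomers in the glycopeptides were distinguished by slight mass adjustment, and all identified glycopeptides were then quantified with Skyline. No glycopeptide had a missing value, and 315 glycopeptides were found to differ by more than 2.5-fold between liver cancer serum and normal serum.
The present invention has been described in detail above. A person skilled in the art can practise the invention over a wide range of equivalent parameters, concentrations and conditions without departing from its spirit and scope and without undue experimentation. Although specific embodiments are given, it should be understood that the invention can be further modified. In short, following the principles of the invention, this application is intended to cover any variations, uses or adaptations of the invention, including departures from the scope disclosed in this application that are made using conventional techniques known in the art. Some of the essential features may be applied within the scope of the appended claims.
Industrial Applicability
In the present invention, sialylated glycopeptides in serum from a liver cancer patient and a healthy subject were analysed by mass spectrometry, and a pGlyco search identified 1218 glycopeptides in total. Using the method established by the present invention, the glycan isomers in the 1218 glycopeptides were distinguished by slight mass adjustment, and all identified glycopeptides were then quantified with Skyline. No glycopeptide had a missing value, and 315 glycopeptides differed by more than 2.5-fold between liver cancer serum and normal serum. The method can quantify all identified glycopeptides carrying different complex glycan modifications accurately and without missing values for differential analysis, and can be applied in preparing products for distinguishing glycopeptides carrying different glycan isomers or products for mass spectrometric analysis of glycosylation.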

Claims (13)

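  1. A method for the quantitative analysis of glycan isomers based on mass spectrometry data, characterised in that the method comprises the following steps: replacing, in silico, isotopes in the structural isomers of the glycan isomers to be quantified with isotopes of similar mass, to obtain simulated glycan isomers whose chemical formula and mass are changed; and quantifying the simulated glycan isomers based on the mass spectrometry data, to obtain quantitative results for the different structural isomers;
    the difference between the mass of a simulated glycan isomer and the mass of the glycan isomer to be quantified being less than or equal to 0.2 Da.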
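  2. The method according to claim 1, characterised in that: x is the rank of each structural isomer of the glycan isomers to be quantified after sorting by glycan ID in ascending order, the rank being a natural number counted consecutively from 1 to n; the in silico simulation is carried out according to the numbers of N and O atoms in the chemical formula of the glycan isomers to be quantified, and comprises any one of the following steps:
    A1) when the number m of N atoms in the chemical formula is greater than or equal to the number of structural isomers of the glycan isomer minus 1: for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da;
    A2) when the number m of N atoms in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1, but the sum m+k of the number m of N atoms and the number k of O atoms is greater than or equal to the number of structural isomers minus 1: for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N are removed from the chemical formula and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da or by m×0.008106 Da+(x-m-1)×0.013019 Da;
    A3) when the sum m+k of the number m of N atoms and the number k of O atoms in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1: for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, until k 16O have been removed; then, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, and x-m-k-1 12C and x-m-k-1 1H are removed and x-m-k-1 15N are added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da, or by m×0.008106 Da+(x-m-1)×0.013019 Da, or by m×0.008106 Da+k×0.013019 Da+(x-m-k-1)×0.0233 Da;
    the glycan isomers to be quantified comprise n structural isomers, where n is a natural number; m is a natural number; k is a natural number.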
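  3. A method for the quantitative analysis of glycopeptides containing glycan isomers in mass spectrometry data, characterised in that the method comprises the following steps: replacing, in silico, isotopes in the glycan isomers contained in the glycopeptides with isotopes of similar mass, to obtain simulated glycan isomers whose chemical formula and mass are changed, and thereby glycopeptides containing the simulated glycan isomers; and quantifying, based on the mass spectrometry data and with mass spectrometry data quantification software, the glycopeptides containing the simulated glycan isomers, to obtain quantitative results for the glycopeptides containing glycan isomers of different structure;
    the difference between the mass of a simulated glycan isomer and the mass of the glycan isomer being less than or equal to 0.2 Da.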
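  4. The method according to claim 3, characterised in that:
    x is the rank of each structural isomer of the glycan isomers to be quantified after sorting by glycan ID in ascending order, the rank being a natural number counted consecutively from 1 to n; the in silico simulation is carried out according to the numbers of N and O atoms in the chemical formula of the glycan isomers to be quantified, and comprises any one of the following steps:
    A1) when the number m of N atoms in the chemical formula is greater than or equal to the number of structural isomers of the glycan isomer minus 1: for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da;
    A2) when the number m of N atoms in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1, but the sum m+k of the number m of N atoms and the number k of O atoms is greater than or equal to the number of structural isomers minus 1: for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N are removed from the chemical formula and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da or by m×0.008106 Da+(x-m-1)×0.013019 Da;
    A3) when the sum m+k of the number m of N atoms and the number k of O atoms in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1: for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, until k 16O have been removed; then, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, and at the same time x-m-k-1 12C and x-m-k-1 1H are removed and x-m-k-1 15N are added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da, or by m×0.008106 Da+(x-m-1)×0.013019 Da, or by m×0.008106 Da+k×0.013019 Da+(x-m-k-1)×0.0233 Da;
    the glycan isomers to be quantified comprise n structural isomers, where n is a natural number; m is a natural number; k is a natural number.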
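  5. The method according to claim 3, characterised in that the mass spectrometry data quantification software is Skyline.
  6. A device for the quantitative analysis of glycopeptides containing glycan isomers in mass spectrometry data, characterised in that the device comprises the following modules:
    B1) a mass spectrometry data acquisition module, for acquiring mass spectrometry data of a sample;
    B2) a glycopeptide identification module, for identifying the glycopeptides contained in the sample based on the mass spectrometry data;
    B3) a glycopeptide quantification module, for quantifying the glycopeptides;
    the glycopeptide quantification module of B3) comprises the following modules:
    B3-1) a glycan isomer simulation module, for simulating in silico the glycan isomers of different structure contained in the glycopeptides to obtain simulated glycan isomers, and thereby glycopeptides containing the simulated glycan isomers;
    B3-2) a glycopeptide quantification module, for quantifying the glycopeptides containing the simulated glycan isomers with mass spectrometry data quantification software, to obtain quantitative results for the glycopeptides containing glycan isomers.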
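  7. The device according to claim 6, characterised in that: x is the rank of each structural isomer of the glycan isomers to be quantified after sorting by glycan ID in ascending order, the rank being a natural number counted consecutively from 1 to n; the in silico simulation is carried out according to the numbers of N and O atoms in the chemical formula of the glycan isomers to be quantified, and comprises any one of the following steps:
    C1) when the number m of N atoms in the chemical formula is greater than or equal to the number of structural isomers of the glycan isomer minus 1: for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da;
    C2) when the number m of N atoms in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1, but the sum m+k of the number m of N atoms and the number k of O atoms is greater than or equal to the number of structural isomers minus 1: for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N are removed from the chemical formula and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da or by m×0.008106 Da+(x-m-1)×0.013019 Da;
    C3) when the sum m+k of the number m of N atoms and the number k of O atoms in the chemical formula is less than the number of structural isomers of the glycan isomer minus 1: for the structural isomer of rank x, x-1 14N are removed from the chemical formula and x-1 13C and x-1 1H are added, until m 14N have been removed; then, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, and x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, until k 16O have been removed; then, for the structural isomer of rank x, m 14N are removed and m 13C and m 1H are added, x-m-1 16O are removed and x-m-1 15N and x-m-1 1H are added, and at the same time x-m-k-1 12C and x-m-k-1 1H are removed and x-m-k-1 15N are added, to obtain the simulated glycan isomer; compared with the glycan isomer to be quantified, the mass of the simulated glycan isomer increases by (x-1)×0.008106 Da, or by m×0.008106 Da+(x-m-1)×0.013019 Da, or by m×0.008106 Da+k×0.013019 Da+(x-m-k-1)×0.0233 Da;
    the glycan isomers to be quantified comprise n structural isomers, where n is a natural number; m is a natural number; k is a natural number.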
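  8. The device according to claim 6, characterised in that the mass spectrometry data quantification software is Skyline.
  9. A computer-readable storage medium storing a computer program, characterised in that the computer program causes a computer to execute the steps of the method according to claim 1.
  10. A computer-readable storage medium storing a computer program, characterised in that the computer program causes a computer to execute the steps of the method according to claim 2.
  11. A computer-readable storage medium storing a computer program, the computer program causing a computer to execute the steps of the method according to claim 3.
  12. A computer-readable storage medium storing a computer program, characterised in that the computer program causes a computer to execute the steps of the method according to claim 4.
  13. A computer-readable storage medium storing a computer program, characterised in that the computer program causes a computer to execute the steps of the method according to claim 5.
PCT/CN2023/125412 2022-10-21 2023-10-19 Method for distinguishing glycan structural isomers by in silico substitution of isotopes of similar mass WO2024083187A1 (zh)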

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211293141.8A CN115662500B (zh) 2022-10-21 2022-10-21 Method for distinguishing glycan structural isomers by in silico substitution of isotopes of similar mass
CN202211293141.8 2022-10-21

Publications (1)

Publication Number Publication Date
WO2024083187A1 true WO2024083187A1 (zh) 2024-04-25

Family

ID=84990114

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/125412 WO2024083187A1 (zh) 2022-10-21 2023-10-19 Method for distinguishing glycan structural isomers by in silico substitution of isotopes of similar mass

Country Status (2)

Country Link
CN (1) CN115662500B (zh)
WO (1) WO2024083187A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115662500B (zh) * 2022-10-21 2023-06-20 Tsinghua University Method for distinguishing glycan structural isomers by in silico substitution of isotopes of similar mass

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130041592A1 (en) * 2010-04-01 2013-02-14 University Of Georgia Research Foundation, Inc. Method And System Using Computer Simulation For The Quantitative Analysis Of Glycan Biosynthesis
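CN104034792A (zh) * 2014-06-26 2014-09-10 Yunnan Minzu University Protein tandem mass spectrometry identification method based on mass-to-charge ratio error discrimination ability
CN107250797A (zh) * 2012-10-25 2017-10-13 Wisconsin Alumni Research Foundation Neutron-encoded mass tags for analyte quantification
CN110261500A (zh) * 2019-05-30 2019-09-20 Tongji University Mass spectrometry-based relative quantification method for intact N-glycopeptides
CN111220749A (zh) * 2018-11-25 2020-06-02 Dalian Institute of Chemical Physics, Chinese Academy of Sciences Method for analysing O-linked glycopeptides
CN112017734A (zh) * 2019-05-31 2020-12-01 Thermo Fisher Scientific (Bremen) GmbH Deconvolution of mass spectrometry data
CN115662500A (zh) * 2022-10-21 2023-01-31 Tsinghua University Method for distinguishing glycan structural isomers by in silico substitution of isotopes of similar mass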

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7402438B2 (en) * 2003-10-30 2008-07-22 Palo Alto Research Center Incorporated Automated identification of carbohydrates in mass spectra
US9410966B2 (en) * 2013-01-17 2016-08-09 The Regents Of The University Of California Isotopic recoding for targeted tandem mass spectrometry
WO2019046814A1 (en) * 2017-09-01 2019-03-07 Venn Biosciences Corporation IDENTIFICATION AND USE OF GLYCOPEPTIDES AS BIOMARKERS FOR THE DIAGNOSIS AND MONITORING OF TREATMENT
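CN109959699B (zh) * 2017-12-14 2021-08-03 Dalian Institute of Chemical Physics, Chinese Academy of Sciences Mass spectrometric detection method for intact glycosylated peptides based on pseudo-multistage spectra
JP7040635B2 (ja) * 2018-10-16 2022-03-23 Shimadzu Corporation Glycan structure analysis device and glycan structure analysis program
WO2022032002A1 (en) * 2020-08-05 2022-02-10 University Of Florida Research Foundation, Incorporated Mass spectrometry based systems and methods for implementing multistage ms/ms analysis
CN114624317A (zh) * 2020-12-10 2022-06-14 Dalian Institute of Chemical Physics, Chinese Academy of Sciences Qualitative and quantitative analysis method based on direct-infusion mass spectrometry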

Also Published As

Publication number Publication date
CN115662500B (zh) 2023-06-20
CN115662500A (zh) 2023-01-31
