CN116242903A - Glycoform identification method, device, equipment and medium - Google Patents

Publication number
CN116242903A
Authority
CN
China
Prior art keywords
glycoform
mass
charge ratio
single ion
free
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310123581.7A
Other languages
Chinese (zh)
Inventor
刘显硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cantonbio Co ltd
Foshan Pu Jin Bioisystech Co ltd
Foshan Hanteng Biotechnology Co ltd
Original Assignee
Cantonbio Co ltd
Foshan Pu Jin Bioisystech Co ltd
Foshan Hanteng Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cantonbio Co ltd, Foshan Pu Jin Bioisystech Co ltd, Foshan Hanteng Biotechnology Co ltd
Priority to CN202310123581.7A
Publication of CN116242903A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00: Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62: Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00: ICT programming tools or database systems specially adapted for bioinformatics
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Databases & Information Systems (AREA)
  • Electrochemistry (AREA)
  • General Physics & Mathematics (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Bioethics (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The application discloses a glycoform identification method, device, equipment and medium. The method acquires configuration parameters input by a user and associates them with a pre-established glycoform database; extracts a single ion mass-to-charge ratio array corresponding to a free glycan from the mass spectrum data of that free glycan; and searches the glycoform database according to the single ion mass-to-charge ratio array to match the glycoform corresponding to the free glycan, thereby determining the identification result of the glycoforms contained in the target sample. The method can improve identification efficiency and accuracy and helps reduce labor cost. The method and the device can be widely applied in the technical field of bioinformatics.

Description

Glycoform identification method, device, equipment and medium
Technical Field
The application relates to the technical field of bioinformatics, in particular to a glycoform identification method, device, equipment and medium.
Background
Glycosylation is the process in which a sugar is transferred to a protein under the action of a glycosyltransferase and forms a glycosidic bond with an amino acid residue on the protein, i.e., the attachment of a sugar to a protein or lipid. Proteins that undergo glycosylation form glycoproteins. Glycosylation is an important post-translational modification of proteins; it regulates protein function and assists protein folding. In the medical field, glycosylation has an important influence on the efficacy, stability and immunogenicity of protein drugs. As a critical quality attribute it is relevant throughout every stage of drug development, so detailed characterization is required during development.
In the related art, there is a need to identify glycoforms, and the main strategy at present is to determine the glycoforms present in a sample by mass spectrometry. In practice, however, it has been found that when glycoforms are identified by analyzing mass spectrum data, single or multiple adducts such as K+, NH4+ and Na+ are introduced while the free glycans are labeled and cause interference. After deconvolution, the adduct-free molecular weight still has to be calculated manually, and a large amount of manual calculation is needed to identify and exclude interference peaks. In addition, with the development of bioinformatics, products such as recombinant proteins, fusion proteins, tandem scFv or IgM antibodies can carry multi-site, multi-type, multi-antennary glycoforms with complex structures. The traditional workflow, in which mass spectrum data is resolved manually and a library is then searched online, is inefficient and prone to operating errors, and it requires extensive experience to tell correct assignments from false ones. As a result, the labor cost is high and the identification efficiency and accuracy are low.
Disclosure of Invention
The present application aims to solve at least one of the technical problems existing in the related art to a certain extent.
It is therefore an object of embodiments of the present application to provide a method, apparatus, device and medium for identifying glycoforms.
In order to achieve the technical purpose, the technical scheme adopted by the embodiment of the application comprises the following steps:
in one aspect, embodiments of the present application provide a method for identifying a glycoform, the method comprising:
acquiring configuration parameters input by a user, and associating the configuration parameters with a pre-established glycoform database; the configuration parameters include first type information of the labeling reagent, mass deviation data, and second type information of the adduct, and the glycoform database includes molecular weight information and glycoform information of a plurality of sugar residues;
extracting a single ion mass-to-charge ratio array corresponding to a free glycan according to mass spectrum data corresponding to the free glycan; the single ion mass-to-charge ratio array comprises a plurality of single ion mass-to-charge ratio values; the free glycan is a glycan released from a protein in a target sample; the mass spectrum data is obtained by mass spectrometry after the free glycan has been labeled with a labeling reagent and separated;
and searching the glycoform database according to the single ion mass-to-charge ratio array to match the glycoform corresponding to the free glycan, and determining the identification result of the glycoforms contained in the target sample.
In addition, the method for identifying a glycoform according to the above embodiment of the present application may further have the following additional technical features:
further, in one embodiment of the present application, the free glycans are released from the protein by an enzymatic or chemical method.
Further, in one embodiment of the present application, the free glycans in the target sample are separated by liquid chromatography (LC) or capillary electrophoresis (CE).
Further, in one embodiment of the present application, the labeling reagent is any one of 2-aminobenzamide (2-AB), 2-aminobenzoic acid (2-AA), RapiFluor-MS, InstantPC, and procainamide.
Further, in one embodiment of the present application, the adduct is any one of K+, NH4+ and Na+.
Further, in an embodiment of the present application, the extracting the single ion mass-to-charge ratio array corresponding to the free glycans includes:
acquiring a mass spectrogram corresponding to the free glycan according to the mass spectrum data;
selecting a plurality of single ion mass-to-charge ratio values from the mass spectrum according to abundance to obtain the single ion mass-to-charge ratio array; the single ion mass-to-charge ratio array comprises at least the single ion mass-to-charge ratio with the highest abundance and the mass-to-charge ratio of the isotope adjacent to it.
Further, in an embodiment of the present application, the extracting the single ion mass-to-charge ratio array corresponding to the free glycan further includes: enlarging the profile of the mass spectrum.
Further, in one embodiment of the present application, the method further comprises the steps of:
dividing the glycoforms in the glycoform database according to a preset dimension to obtain a plurality of glycoform data tables.
Further, in an embodiment of the present application, the searching and matching the glycoforms corresponding to the free glycans in the glycoform database according to the single ion mass-to-charge ratio array includes:
determining a target glycoform data table from the plurality of glycoform data tables according to the target sample type and the cell type expressing the target sample;
and searching the target glycoform data table according to the single ion mass-to-charge ratio array to match the glycoform corresponding to the free glycan.
Further, in one embodiment of the present application, the method further comprises the steps of:
if no matched identification result is found in the target glycoform data table, outputting prompt information;
the prompt information is used to remind the user to reselect the target glycoform data table or re-enter the configuration parameters.
In another aspect, embodiments of the present application provide a glycoform identification device comprising:
the configuration module is used for acquiring configuration parameters input by a user and associating the configuration parameters with a pre-established glycoform database; the configuration parameters include first type information of the labeling reagent, mass deviation data, and second type information of the adduct, and the glycoform database includes molecular weight information and glycoform information of a plurality of sugar residues;
the extraction module is used for extracting a single ion mass-to-charge ratio array corresponding to a free glycan according to mass spectrum data corresponding to the free glycan; the single ion mass-to-charge ratio array comprises a plurality of single ion mass-to-charge ratio values; the free glycan is a glycan released from a protein in a target sample; the mass spectrum data is obtained by mass spectrometry after the free glycan has been labeled with a labeling reagent and separated;
and the matching module is used for searching the glycoform database according to the single ion mass-to-charge ratio array to match the glycoform corresponding to the free glycan, and determining the identification result of the glycoforms contained in the target sample.
Further, in one embodiment of the present application, the apparatus further comprises:
and the analysis module is used for acquiring, by mass spectrometry, mass spectrum data corresponding to the free glycans in the target sample.
In another aspect, embodiments of the present application provide a computer device, including:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement a glycoform identification method as described above.
In another aspect, embodiments of the present application further provide a computer readable storage medium having stored therein a processor executable program for implementing a glycoform identification method as described above when executed by a processor.
The advantages and benefits of the present application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the present application.
The embodiment of the application discloses a glycoform identification method, which comprises the following steps: acquiring configuration parameters input by a user, and associating the configuration parameters with a pre-established glycoform database, where the configuration parameters include first type information of the labeling reagent, mass deviation data, and second type information of the adduct, and the glycoform database includes molecular weight information and glycoform information of a plurality of sugar residues; extracting a single ion mass-to-charge ratio array corresponding to a free glycan according to mass spectrum data corresponding to the free glycan, where the single ion mass-to-charge ratio array comprises a plurality of single ion mass-to-charge ratio values, the free glycan is a glycan released from a protein in the target sample, and the mass spectrum data is obtained by mass spectrometry after the free glycan has been labeled with a labeling reagent and separated; and searching the glycoform database according to the single ion mass-to-charge ratio array to match the glycoform corresponding to the free glycan, and determining the identification result of the glycoforms contained in the target sample. Based on an automated analysis tool, the method can complete the resolution of free glycan mass spectrum data, glycoform matching, and the translation of glycoform names and structures in one step. The method can automate the whole glycoform identification process, improve identification efficiency and accuracy, and markedly lower the experience threshold for free glycan analysis, thereby helping to reduce labor cost.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the accompanying drawings are briefly described below. It should be understood that the drawings described below are provided only to describe some embodiments of the technical solutions of the present application conveniently and clearly, and that those skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a method for identifying glycoforms provided in the embodiments of the present application;
FIG. 2 is a schematic diagram of data of a glycoform database provided in an embodiment of the present application;
FIG. 3 is a chromatogram from an LC-MS glycoform identification provided in an embodiment of the present application;
FIG. 4 is the mass spectrum corresponding to the response peak with a retention time of 5.27 min in FIG. 3, provided in an embodiment of the present application;
FIG. 5 is an enlarged view of the mass spectrum of FIG. 4, provided in an embodiment of the present application;
FIG. 6 is a diagram of a matching result of a glycoform database provided in an embodiment of the present application;
FIG. 7 is a chromatogram with completed glycoform labeling provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The present application is further described below with reference to the drawings and specific examples. The described embodiments should not be construed as limitations on the present application, and all other embodiments, which may be made by those of ordinary skill in the art without the exercise of inventive faculty, are intended to be within the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
Glycosylation is the process in which a sugar is transferred to a protein under the action of a glycosyltransferase and forms a glycosidic bond with an amino acid residue on the protein, i.e., the attachment of a sugar to a protein or lipid. Proteins that undergo glycosylation form glycoproteins. Glycosylation is an important post-translational modification of proteins; it regulates protein function and assists protein folding. In the medical field, glycosylation has an important influence on the efficacy, stability and immunogenicity of protein drugs. As a critical quality attribute it is relevant throughout every stage of drug development, so detailed characterization is required during development.
In the related art, there is a need to identify glycoforms, and the mainstream strategy at present is to use mass spectrometry and to analyze the mass spectrum data with mass spectrometry software to determine the glycoforms present in the sample. For example, in the mass spectrometry software that is mainstream in the industry, the commonly used MaxEnt3 or Bayesian algorithms can be used to deconvolute the free glycan peaks. However, this approach has several drawbacks. Because the free glycans go through a labeling process, single or multiple adducts such as K+, NH4+ and Na+ may be introduced; after deconvolution, the adduct-free molecular weight still has to be calculated manually, and a large amount of manual calculation is needed to identify and exclude interference peaks. Each mass spectrometry software package is proprietary, and its base package generally contains glycosylation solutions only for the vendor's own labeling reagents; it is not compatible with analyses that use other labeling reagents, so its generality is low. Conventional glycoform search software such as GlycoWorkbench, GlycoMod and GLYCAM can match glycoform names and structures according to the molecular weight of the labeled free glycans, but because the nomenclature and structural representations are complex and obscure, the glycoform structures often have to be translated and redrawn manually.
From the above analysis it can be seen that, when glycoforms are identified in the traditional way, the workflow of manually resolving the mass spectrum data and then searching a library online is inefficient and prone to operating errors, and it requires extensive experience to tell correct assignments from false ones. As a result, the labor cost is high and the identification efficiency and accuracy are low.
In view of this, embodiments of the present application provide a glycoform identification method that, based on an automated analysis tool, can complete the resolution of free glycan mass spectrum data, glycoform matching, and the translation of glycoform names and structures in one step. The method can automate the whole glycoform identification process, improve identification efficiency and accuracy, and markedly lower the experience threshold for free glycan analysis, thereby helping to reduce labor cost.
FIG. 1 is a flow chart of a glycoform identification method according to an embodiment of the present application. Referring to FIG. 1, the glycoform identification method includes, but is not limited to:
step 110, acquiring configuration parameters input by a user, and associating the configuration parameters with a pre-established glycoform database; the configuration parameters include first type information of the labeling reagent, mass deviation data, and second type information of the adduct, and the glycoform database includes molecular weight information and glycoform information of a plurality of sugar residues;
step 120, extracting a single ion mass-to-charge ratio array corresponding to a free glycan according to mass spectrum data corresponding to the free glycan; the single ion mass-to-charge ratio array comprises a plurality of single ion mass-to-charge ratio values; the free glycan is a glycan released from a protein in a target sample; the mass spectrum data is obtained by mass spectrometry after the free glycan has been labeled with a labeling reagent and separated;
step 130, searching the glycoform database according to the single ion mass-to-charge ratio array to match the glycoform corresponding to the free glycan, and determining the identification result of the glycoforms contained in the target sample.
In the embodiment of the application, a glycoform identification method is provided. The method may be implemented by designing and developing a corresponding glycoform database and a related matching algorithm on the Excel VBA platform; the glycoforms of the free glycans are determined by mass spectrometry, thereby identifying the glycoforms contained in the sample.
Specifically, in the embodiment of the present application, the sample to be identified may be denoted as the target sample. The basic principle of mass spectrometry is that each free glycan in the sample is ionized in an ion source to generate positively charged ions with different mass-to-charge ratios; these ions are formed into an ion beam by an accelerating electric field and enter a mass analyzer, where the molecular weight of the ions is determined. It will be appreciated that, because sugars themselves have no chromophore, they are often labeled or derivatized so that the sugar chains can be detected more effectively during separation, purification and structural identification. Thus, in embodiments of the present application, a labeling reagent may be used to label the free glycans in the target sample before mass spectrometry. In some embodiments, the labeling reagents may include, but are not limited to, 2-AB, RapiFluor-MS, InstantPC, etc. In actual use, the glycans are first released from the protein and enriched, and then any one labeling reagent is selected to label the free glycans. In some embodiments, the glycans are released by enzymatic cleavage, and the enzyme used to cleave the target sample includes any of the N-glycoprotein deglycosylases (e.g., PNGase A and PNGase F) and endoglycosidases (e.g., Endo S, Endo H, Endo F). The free glycans need to be separated before mass spectrometry; separation methods include, but are not limited to, liquid chromatography (LC), capillary electrophoresis (CE), and the like.
In the embodiment of the application, when the glycoforms of the target sample are identified, liquid chromatography-mass spectrometry (LC-MS) is adopted: the free glycans in the target sample are separated by liquid chromatography and then sent into a mass spectrometer for mass spectrometric identification. A mass spectrometer generally includes an ion source, a mass analyzer, etc. The present application does not limit the ionization method or the method for measuring the m/z of ions used in the mass spectrometry, which can be implemented with reference to the prior art. Through mass spectrometry, mass spectrum data corresponding to the target sample can be obtained. The mass spectrum data generally takes the form of a spectrum; in a typical mass spectrum, the abscissa represents m/z and the ordinate represents the relative intensity of each peak (as a percentage), with the strongest peak having a relative intensity of 100. Based on the mass spectrum data, the ion mass-to-charge ratio information of the various free glycans contained in the target sample can be determined, and the corresponding glycoform information can then be determined from the ion mass-to-charge ratio information.
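As a small illustration of the spectrum representation described above, the following Python sketch rescales absolute peak intensities so that the strongest peak has a relative intensity of 100. The function name and the (m/z, intensity) tuple layout are assumptions made for this example and are not taken from the application.

```python
def to_relative_intensity(peaks):
    """Rescale (m/z, absolute intensity) pairs so the base peak equals 100.

    `peaks` is a hypothetical list of (mz, intensity) tuples.
    """
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        raise ValueError("the spectrum has no positive intensity")
    return [(mz, 100.0 * intensity / base) for mz, intensity in peaks]


# Illustrative values only, not measured data.
peaks = [(450.2, 2000.0), (450.7, 1000.0), (451.2, 500.0)]
relative = to_relative_intensity(peaks)
```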
In addition, elements in nature have isotopes, and when glycoforms are identified, the isotope mass-to-charge ratio array of the same ion can be used to calculate the molecular weight of the glycan by deconvolution and thereby match the glycoform. The monoisotopic mass is the exact molecular weight formed by the most abundant isotopes of the elements in the analyte, such as 12C, 1H, 16O and 14N. Because the molecular weight of free glycans is generally less than 5000 Da, the resolution of the mass spectrometer is sufficient to distinguish the isotope peaks. Because the natural abundance ratio of 12C to 13C is about 99:1, if the molecular weight of the sample is large enough and it contains enough carbon atoms, its molecules will, with some probability, contain different numbers of 13C atoms, forming a series of isotopic species evenly spaced by 1 Da. After these isotopic species are charged during ionization, they appear in the mass spectrum as a series of consecutive m/z peaks. The charge state of the ion can be calculated from the spacing of the isotope m/z peaks, and the molecular weight m of the ion can then be obtained by deconvolution, so that the glycoform can be determined accurately.
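The charge-state reasoning above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the ion is assumed to be protonated (the text does not fix the charge carrier), the isotope spacing is taken as the 13C-12C mass difference of about 1.00335 Da rather than the nominal 1 Da, and the function names are hypothetical.

```python
C13_C12_DELTA = 1.00335  # approximate mass difference between 13C and 12C, in Da
PROTON_MASS = 1.007276   # approximate mass of a proton, in Da


def charge_from_isotope_spacing(mz_a, mz_b):
    """Estimate the charge state z from two adjacent isotope peaks.

    Adjacent isotope peaks of one ion are about C13_C12_DELTA / z apart in m/z.
    """
    spacing = abs(mz_b - mz_a)
    if spacing == 0:
        raise ValueError("the two peaks must have different m/z values")
    return max(1, round(C13_C12_DELTA / spacing))


def neutral_mass(mz, charge, carrier_mass=PROTON_MASS):
    """Deconvolute an m/z value to a neutral mass, assuming `charge` carriers."""
    return charge * (mz - carrier_mass)


# Illustrative values only: two peaks 0.5 m/z apart imply a doubly charged ion.
z = charge_from_isotope_spacing(451.0, 451.5)
mass = neutral_mass(451.0, z)
```

If the ion carries an adduct such as Na+ instead of a proton, the corresponding carrier mass would be passed as `carrier_mass`.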
It should also be added that, when the target sample is labeled, the labeling reagent itself changes the molecular weight of the ions corresponding to the free glycans, and in some cases adducts may also be produced; for example, the adduct may be K+, NH4+, Na+, etc. These all cause the mass of the ion corresponding to a free glycan to differ from the glycan's own molecular weight. Therefore, in the embodiment of the present application, in order to cope with this complicated matching situation, a glycoform database may be established in advance; the database may include molecular weight information of the sugar residues of various known sugars and the glycoform information corresponding to each sugar. Here, the glycoform information may include, but is not limited to, the glycoform name and specific structural information. The labeling reagents and adducts, which vary with the actual situation, can be selected by the user at each identification and entered into the identification system or software. That is, at each identification, configuration parameters input by the user may be acquired, and these configuration parameters may include type information of the labeling reagent, mass deviation data, and type information of the adduct, where the type information of the labeling reagent is recorded as first type information and the type information of the adduct is recorded as second type information.
After the configuration parameters input by the user are acquired, they can be associated with the pre-established glycoform database. Based on the configuration parameters input by the user, the glycoform database can then be adjusted and completed to obtain the exact molecular weight that each sugar actually corresponds to in the mass spectrometry analysis. Specifically, this value can be obtained as the sum of the molecular weights of the free glycan residues, H2O, the labeling reagent, and the adduct ion (if any) for each sugar.
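The sum described above (free glycan residues + H2O + labeling reagent + adduct ion) can be sketched in Python as follows. This is an illustrative sketch, not the application's Excel VBA implementation: the function name, the dictionary layout, the glycan entry "glycan_A" and the label mass of 120.0 are hypothetical placeholders, and the adduct masses are approximate monoisotopic values supplied for the example.

```python
WATER_MASS = 18.010565  # approximate monoisotopic mass of H2O, in Da

# Approximate monoisotopic masses of the adduct ions, in Da.
ADDUCT_MASS = {
    "none": 0.0,
    "K+": 38.963158,
    "NH4+": 18.033826,
    "Na+": 22.989221,
}


def adjusted_masses(residue_masses, label_mass, adducts):
    """Return {(glycoform, adduct): expected mass} for the chosen configuration.

    expected mass = residue mass + H2O + label mass + adduct mass (if any)
    """
    table = {}
    for name, residue_mass in residue_masses.items():
        for adduct in adducts:
            table[(name, adduct)] = (
                residue_mass + WATER_MASS + label_mass + ADDUCT_MASS[adduct]
            )
    return table


# Hypothetical database entry and label mass, for illustration only.
residues = {"glycan_A": 1000.0}
table = adjusted_masses(residues, label_mass=120.0, adducts=["none", "Na+"])
```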
In this embodiment of the present application, liquid chromatography coupled with mass spectrometry (LC-MS) may be used to obtain a chromatogram and mass spectrum data of the target sample. Each response peak in the chromatogram may represent one free glycan contained in the target sample, so the mass spectrum data of each response peak is the mass spectrum data of one free glycan in the target sample. The mass spectrum data of each response peak can be analyzed to determine the glycoform information corresponding to that response peak.
Specifically, in the embodiment of the present application, for each response peak, the corresponding single ion mass-to-charge ratio array may be extracted. Here, a single ion mass-to-charge ratio array includes several adjacent mass spectrum peaks within a specified range. In general, for a given response peak, the highest-abundance peak in the corresponding mass spectrum is unique and gives the most reliable glycoform assignment. However, for a glycoform with a large molecular weight, the strongest peak may not be identifiable because of the complexity of the mass spectrum data (the peaks of several charge states appear in the spectrum with varying intensities), and matching with a single peak may then produce a matching error. In addition, because of adduct ions and other factors, different molecules of the same sugar may be observed at different molecular weights, which causes interference. Taking a 900 Da component identified at a chromatographic peak as an example: because of adduct ions in mass spectrometry, the sugar component may appear in the mass spectrum in the standard 900 Da form and also as components such as 900+39 (+K+), 900+68 (+2K+) and 900+17 (+NH4+), so that multiple peaks are detected in a nearby region of the spectrum. It will be appreciated that the standard 900 Da component is a single ion, and 900+39 (+K+) is likewise a single ion. In addition, sugar components are often detected in forms produced by in-source fragmentation at the non-reducing end or by loss of sugar units, such as 900-203 (loss of N-acetylglucosamine) and 900-291 (loss of sialic acid), and may also show singly and doubly charged peaks, so that an actual 900 Da glycoform may be present in a wide variety of related forms.
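To make the variety of related forms concrete, the following Python sketch enumerates nominal masses for a component using shifts quoted in the text (+39 for K+, +17 for NH4+, -203 for loss of N-acetylglucosamine, -291 for loss of sialic acid). The dictionary names and the way adducts and losses are combined are assumptions made for this illustration; the application does not specify how such variants are enumerated.

```python
# Nominal mass shifts quoted in the text, in Da.
ADDUCT_SHIFTS = {"none": 0, "+K": 39, "+NH4": 17}
LOSS_SHIFTS = {"intact": 0, "-GlcNAc": -203, "-sialic acid": -291}


def related_forms(nominal_mass):
    """Return {(adduct, loss): nominal mass} for every adduct/loss combination."""
    forms = {}
    for adduct, adduct_shift in ADDUCT_SHIFTS.items():
        for loss, loss_shift in LOSS_SHIFTS.items():
            forms[(adduct, loss)] = nominal_mass + adduct_shift + loss_shift
    return forms


forms = related_forms(900)
```

Each of these masses could in turn appear at more than one charge state, which is why matching on a single peak is unreliable.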
Therefore, in the embodiment of the present application, for each response peak, the corresponding mass spectrum data may be analyzed to obtain a single ion mass-to-charge ratio array. The single ion mass-to-charge ratio array is used for calculation and matching analysis, so that the possibility of misjudgment can be remarkably reduced, and the accuracy of identification is improved. Here, each single ion mass-to-charge ratio array may include at least 2 adjacent single ion isotope mass-to-charge ratio values, i.e., the highest abundance single ion mass-to-charge ratio and the mass-to-charge ratios of isotopes adjacent thereto. The present application is not limited to a specific number thereof.
After the single ion mass-to-charge ratio array is obtained, it can be searched for and matched in the glycoform database. It will be appreciated that, once the glycoform database has been associated with the configuration parameters, the molecular weights of the various glycoforms, and of the variants that each glycoform may produce under the current identification conditions, are adjusted accordingly. By inputting the single ion mass-to-charge ratio array, the names of all glycoforms and variants that match it are calculated and compared comprehensively, and the glycoform information corresponding to each response peak is determined. Searching and matching all response peaks in this way yields the identification results of all glycoforms contained in the target sample.
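A minimal sketch of such a search is given below, assuming the database is a simple mapping from glycoform name to labeled molecular weight. The glycoform names, masses and adduct offsets are placeholders chosen for illustration, and `expand_variants` and `search` are hypothetical helper names.

```python
def expand_variants(database, adduct_offsets):
    """Build (name, adduct, mass) rows for every glycoform and adduct form."""
    rows = []
    for name, mass in database.items():
        for adduct, offset in adduct_offsets.items():
            rows.append((name, adduct, mass + offset))
    return rows


def search(rows, observed_mass, tolerance=0.1):
    """Return the (name, adduct) pairs whose mass lies within the tolerance."""
    return [
        (name, adduct)
        for name, adduct, mass in rows
        if abs(mass - observed_mass) <= tolerance
    ]


# Placeholder data, not real glycoform masses.
database = {"glycan_A": 1500.0, "glycan_B": 1700.0}
adduct_offsets = {"none": 0.0, "K+": 38.0}

rows = expand_variants(database, adduct_offsets)
print(search(rows, 1538.05))  # candidate: glycan_A carrying the K+ adduct
print(search(rows, 1600.0))   # no candidate within tolerance
```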
The implementation flow of the glycoform identification method provided in the present application will be described below with reference to a specific application example.
In the embodiment of the application, the related software and algorithm tools can, in practical applications, be developed and implemented on the Excel platform. Molecular weight information of the sugar residues and glycoform information of various known glycoforms are pre-loaded in the glycoform database. The data may include, but are not limited to, a library of 171 oligosaccharides, a library of 200 glycans with Oxford nomenclature, a library of 39 N-glycans common in human cells, a library of 182 N-glycans common in CHO cells, and a library of 16 common O-glycans. Molecular weight information of labeling reagents such as 2-AB, RapiFluor-MS and InstantPC, and of the K+, NH4+ and Na+ adducts, can also be recorded in the system.
In particular, to facilitate subsequent matching and make the identification result more scientific and reasonable, in some embodiments the glycoforms in the glycoform database can be divided according to a preset dimension to obtain a plurality of glycoform data tables. For example, in the embodiment of the present application, Sheet1 (Short name used IgG glycans), Sheet2 (Oligosaccharide composition), Sheet3 (N-glycon List), Sheet4 (Human host N-glycon), Sheet5 (CHO host N-glycon) and Sheet6 (O-glycon) may be edited in advance, for a total of 6 glycoform data tables. In some cases, if it is already known in advance which glycoform data table the saccharides in the target sample belong to, that table can be used directly to complete the matching. On the one hand, this improves matching efficiency and reduces the amount of data to be processed; on the other hand, it reduces misjudgment and makes the identification result more scientific and reasonable. This completes the preliminary data preparation.
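Dividing a flat database into data tables by a preset dimension can be sketched as a simple grouping step. The record layout, the field name `table` and the entries below are assumptions made for illustration; only the sheet names come from the description.

```python
from collections import defaultdict


def split_by_dimension(entries, key):
    """Group database entries into data tables keyed by the chosen dimension."""
    tables = defaultdict(list)
    for entry in entries:
        tables[entry[key]].append(entry)
    return dict(tables)


# Placeholder entries; the masses are not real glycoform masses.
entries = [
    {"name": "glycan_A", "mass": 1500.0, "table": "Short name used IgG glycans"},
    {"name": "glycan_B", "mass": 1700.0, "table": "Short name used IgG glycans"},
    {"name": "glycan_C", "mass": 900.0, "table": "O-glycon"},
]

tables = split_by_dimension(entries, "table")
for table_name, rows in tables.items():
    print(table_name, [row["name"] for row in rows])
```

Matching against a single table then only needs to scan that table's rows, which is the efficiency gain the paragraph above refers to.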
When the user performs glycoform identification, parameters can be configured in the system software: selecting an appropriate labeling reagent (for example, mainstream reagents such as 2-AB, RapiFluor-MS and InstantPC; the default is InstantPC); inputting the mass deviation data (mass tolerance, determined according to the mass accuracy and the instrument state during data acquisition; the default is ±0.1 Da); and selecting the adduct information (K+, NH4+, Na+ or none), which indicates the "glycoform + adduct" form considered during glycoform matching. The configuration parameters are obtained from the user's operations and are then associated with the aforementioned glycoform database, so that the glycoform database can call the configuration parameter information during subsequent matching. In the embodiment of the present application, taking Short name used IgG glycans as an example, after the configuration parameters are associated, referring to fig. 2, the table may record the name of each glycoform, the corresponding glycoform structure, and the molecular weight of the structure after labeling.
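The configuration parameters and their defaults can be sketched as a small data structure. The defaults (InstantPC, ±0.1 Da, no adduct) are taken from the description; the class name `MatchConfig`, the function `labeled_mass` and the mass increments in the usage example are hypothetical placeholders, not real reagent or adduct masses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MatchConfig:
    labeling_reagent: str = "InstantPC"  # default reagent per the description
    mass_tolerance_da: float = 0.1       # default mass deviation, +/- 0.1 Da
    adduct: Optional[str] = None         # "K+", "NH4+", "Na+", or None


def labeled_mass(glycan_mass, config, label_deltas, adduct_deltas):
    """Apply the configured label and adduct increments to a glycan mass.

    label_deltas and adduct_deltas map names to mass increments in Da and
    must be supplied by the caller from the system's recorded values.
    """
    mass = glycan_mass + label_deltas[config.labeling_reagent]
    if config.adduct is not None:
        mass += adduct_deltas[config.adduct]
    return mass


# Placeholder increments for illustration only.
label_deltas = {"InstantPC": 100.0, "2-AB": 50.0}
adduct_deltas = {"K+": 39.0}

print(labeled_mass(1000.0, MatchConfig(), label_deltas, adduct_deltas))
print(labeled_mass(1000.0, MatchConfig(adduct="K+"), label_deltas, adduct_deltas))
```

Keeping the increments outside the configuration object mirrors the description, in which the reagent and adduct masses are recorded in the system and the user only chooses which ones apply.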
It should be noted that, in some embodiments, the user may also actively choose a suitable glycoform database, that is, determine the target glycoform data table from the plurality of glycoform data tables. Specifically, the choice may be based on the type of the target sample and the type of cell expressing it. For example, if the target sample is an immunoglobulin expressed by CHO cells, the "Short name used IgG glycans" table is recommended as the target glycoform data table; the glycoforms common to each cell type can be accumulated empirically to support this selection. In addition, in some embodiments, the above selection can be performed by the system software itself without user operation, which increases the degree of automation and helps improve identification efficiency.
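An automatic selection of this kind can be sketched as a lookup keyed by sample type and cell type. Only the immunoglobulin/CHO entry comes from the description; the function name `recommend_table` and the normalization of the keys are assumptions.

```python
# The single mapping stated in the description; further entries would be
# accumulated empirically for other sample and cell types.
RECOMMENDED_TABLES = {
    ("immunoglobulin", "CHO"): "Short name used IgG glycans",
}


def recommend_table(sample_type, cell_type, default=None):
    """Return the recommended glycoform data table, or default if unknown."""
    key = (sample_type.strip().lower(), cell_type.strip().upper())
    return RECOMMENDED_TABLES.get(key, default)


print(recommend_table("Immunoglobulin", "cho"))
print(recommend_table("enzyme", "HEK293"))  # falls back to the default (None)
```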
Then, the target sample may be treated with the labeling reagent, the free glycans in the target sample may be separated by liquid chromatography, and mass spectrum data of each free glycan may be obtained by mass spectrometry. Referring to fig. 3, fig. 3 shows a chromatogram obtained when identifying glycoforms by LC-MS, in which the horizontal axis represents retention time; free glycans with the same retention time are generally of the same glycoform. Taking the response peak at 5.27 min as an example, and referring to fig. 4, which shows the mass spectrum corresponding to the response peak with a retention time of 5.27 min in fig. 3, it can be seen that the mass spectrum of this response peak contains several mass spectrum data peaks with m/z between 862 and 864. In this embodiment of the present application, the single ion mass-to-charge ratio array corresponding to the response peak can be extracted from these peaks.
Specifically, in some embodiments, extracting a corresponding single ion mass-to-charge ratio array at the response peak includes:
acquiring a mass spectrogram corresponding to the response peak according to the mass spectrum data;
selecting a plurality of numerical values of single ion mass-to-charge ratios from the mass spectrogram according to the abundance, and obtaining the single ion mass-to-charge ratio array; the single ion mass-to-charge ratio array at least comprises the single ion mass-to-charge ratio with the highest abundance and the value of the mass-to-charge ratio of the isotope adjacent to the single ion mass-to-charge ratio with the highest abundance.
In the embodiment of the present application, when the single ion mass-to-charge ratio array corresponding to a response peak is extracted, the mass spectrum may be enlarged so that each mass spectrum data peak can be determined conveniently. For example, referring to fig. 5, by enlarging the mass spectrum of fig. 4, the exact mass-to-charge ratio values of the isotopes of a single ion can be determined. Low peaks with very low relative abundance can be ignored here. The single ion mass-to-charge ratio array is then determined. Specifically, in some embodiments, the number of values in the single ion mass-to-charge ratio array may be preset, for example to 2; in fig. 5, for instance, the two values 862.84309 and 863.34801 may be selected as the single ion mass-to-charge ratio array. Of course, the number of values can be set flexibly as required, which is not limited in this application.
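The extraction step described above can be sketched as follows, assuming the mass spectrum is available as a list of (m/z, intensity) pairs. The function name, the 5% relative-abundance cut-off, the intensities and the third and fourth peaks in the usage example are assumptions; only the two m/z values 862.84309 and 863.34801 come from the description.

```python
def extract_single_ion_array(peaks, n_values=2, min_rel_abundance=0.05):
    """Pick the most abundant peak and its higher-m/z isotope neighbours.

    peaks is a list of (mz, intensity) pairs. Peaks whose intensity is below
    min_rel_abundance times the strongest intensity are ignored.
    """
    if not peaks:
        raise ValueError("no peaks supplied")
    base_mz, base_intensity = max(peaks, key=lambda peak: peak[1])
    threshold = min_rel_abundance * base_intensity
    kept_mz = sorted(mz for mz, intensity in peaks if intensity >= threshold)
    start = kept_mz.index(base_mz)
    return kept_mz[start:start + n_values]


# Intensities and the last two peaks are invented for illustration.
peaks = [
    (862.84309, 100.0),
    (863.34801, 80.0),
    (863.84950, 30.0),
    (862.50000, 1.0),
]
print(extract_single_ion_array(peaks))  # [862.84309, 863.34801]
```

Raising `n_values` returns more isotope peaks from the same cluster, matching the statement that the number of values can be set as required.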
Of course, in practical applications, for each response peak the user can enlarge the mass spectrum, input the mass-to-charge ratio of the single ion isotope with the highest abundance shown in the mass spectrum together with the adjacent mass-to-charge ratio to its right, let the system match the glycoform automatically, and check the matching result. The same input and check can then be repeated for the peak with the second highest abundance, the third highest abundance, and so on. It will be appreciated that the more mass spectrum data peaks are input, the higher the accuracy of the matching result. Referring to fig. 6, which shows a schematic diagram of a matching result, and taking as an example the input of two adjacent single ion mass-to-charge ratio values (input m/z1 and input m/z2) for glycoform matching, an identification result with higher accuracy can be generated automatically from the two isotope values input as m/z1 and m/z2.
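The description does not state how the software combines the two input values. One common way two adjacent isotope peaks help, shown here purely as an assumption, is that their spacing reveals the charge state, from which a neutral mass can be derived for matching. The constants are standard approximate values, and the sketch assumes protonated ions unless another charge carrier mass is passed in.

```python
ISOTOPE_SPACING_DA = 1.00335  # approximate 13C - 12C mass difference
PROTON_MASS_DA = 1.007276     # approximate proton mass


def infer_charge(mz1, mz2):
    """Estimate the charge state from the spacing of two adjacent isotope peaks."""
    spacing = abs(mz2 - mz1)
    if spacing == 0:
        raise ValueError("the two m/z values must differ")
    return round(ISOTOPE_SPACING_DA / spacing)


def neutral_mass(mz, charge, charge_carrier_mass=PROTON_MASS_DA):
    """Convert an m/z value to a neutral mass for the given charge state."""
    return (mz - charge_carrier_mass) * charge


# The two values read from fig. 5 in the description.
mz1, mz2 = 862.84309, 863.34801
charge = infer_charge(mz1, mz2)
print(charge)                     # 2, because the spacing is about 0.5
print(neutral_mass(mz1, charge))  # about 1723.6716
```

With a single input value, a peak at m/z 862.84 could equally be read as a singly or doubly charged ion; the second value removes that ambiguity before the database lookup.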
Here, as described above, when the glycoform corresponding to each strong peak is matched, the determined target glycoform data table may be used. In some embodiments, if no matching identification result is found in the target glycoform data table, "not found" may be displayed and corresponding prompt information may be output. The prompt information reminds the user to consider whether the glycoform is not recorded in the target glycoform data table, so that the table needs to be replaced, or whether the configuration parameters were set incorrectly, so that they need to be re-entered before matching again.
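The "not found" behaviour can be sketched as a matching function that returns either the hits or a prompt. The result layout, the wording of the prompt and the table contents are assumptions for illustration.

```python
NOT_FOUND_PROMPT = (
    "not found: select a different target glycoform data table, "
    "or re-enter the configuration parameters and match again"
)


def match_or_prompt(table, observed_mass, tolerance=0.1):
    """Match one observed mass against a data table, or return a prompt.

    table maps glycoform names to labeled masses in Da.
    """
    hits = [
        name
        for name, mass in table.items()
        if abs(mass - observed_mass) <= tolerance
    ]
    if hits:
        return {"status": "found", "glycoforms": hits}
    return {"status": "not found", "prompt": NOT_FOUND_PROMPT}


# Placeholder table; the masses are not real glycoform masses.
table = {"glycan_A": 1500.0, "glycan_B": 1700.0}
print(match_or_prompt(table, 1500.04))
print(match_or_prompt(table, 1600.0))
```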
In this embodiment, referring to fig. 7, after each response peak has been identified, the glycoform structure diagrams obtained by matching may be copied and pasted onto the corresponding peaks of the chromatogram, thereby completing the identification of the glycoforms.
The embodiment of the application also provides a glycoform identification device, which comprises:
the analysis module is used for acquiring mass spectrum data corresponding to the free glycan in the target sample through a mass spectrometry technique;
the configuration module is used for acquiring configuration parameters input by a user and associating the configuration parameters with a pre-established glycoform database; the configuration parameters include first type information of the labeling reagent, mass deviation data, and second type information of the adduct, and the glycoform database includes molecular weight information and glycoform information of a plurality of sugar residues;
the extraction module is used for extracting a single ion mass-to-charge ratio array corresponding to the free glycan according to mass spectrum data corresponding to the free glycan; the single ion mass-to-charge ratio array comprises a plurality of single ion mass-to-charge ratio values; the free glycan refers to a glycan released from a protein in the target sample; the mass spectrum data is obtained by a mass spectrometry technique after the free glycan has been labeled with the labeling reagent and separated;
and the matching module is used for searching and matching the glycoforms corresponding to the free glycans in the glycoform database according to the single ion mass-to-charge ratio array, and determining the identification result of the glycoforms contained in the target sample.
It will be appreciated that the content of one embodiment of the glycoform identification method shown in fig. 1 is applicable to the embodiment of the glycoform identification device, and the functions of the embodiment of the glycoform identification device are the same as those of the embodiment of the glycoform identification method shown in fig. 1, and the advantages achieved are the same as those achieved by the embodiment of the glycoform identification method shown in fig. 1.
Referring to fig. 8, an embodiment of the present application further discloses a computer device, including:
At least one processor 201;
at least one memory 202 for storing at least one program;
the at least one program, when executed by the at least one processor 201, causes the at least one processor 201 to implement an embodiment of a glycoform identification method as shown in fig. 1.
It will be appreciated that the content of one embodiment of the method for identifying a glycoform as shown in fig. 1 is applicable to the embodiment of the computer device, and the functions of the embodiment of the computer device are the same as those of the embodiment of the method for identifying a glycoform as shown in fig. 1, and the advantages achieved are the same as those achieved by the embodiment of the method for identifying a glycoform as shown in fig. 1.
The present application also discloses a computer-readable storage medium in which a processor-executable program is stored, which when executed by a processor is for implementing an embodiment of a glycoform identification method as shown in fig. 1.
It will be appreciated that the content of one embodiment of the method for identifying a glycoform as shown in fig. 1 is applicable to the embodiment of the computer-readable storage medium, and the functions of the embodiment of the computer-readable storage medium are the same as those of the embodiment of the method for identifying a glycoform as shown in fig. 1, and the advantages achieved are the same as those achieved by the embodiment of the method for identifying a glycoform as shown in fig. 1.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of this application are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the present application is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the functions and/or features may be integrated in a single physical system and/or software module or may be implemented in separate physical systems or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present application. Rather, the actual implementation of the various functional modules in the systems disclosed herein will be apparent to engineers in ordinary skill in view of their attributes, functions, and internal relationships. Thus, those of ordinary skill in the art will be able to implement the present application as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the application, which is to be defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any system that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system or apparatus.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic system) with one or more wires, a portable computer diskette (magnetic system), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber system, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
In the foregoing description of the present specification, descriptions of the terms "one embodiment/example", "another embodiment/example", "certain embodiments/examples", and the like, are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the principles and spirit of the application, the scope of which is defined by the claims and their equivalents.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the embodiments, and one skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and these equivalent modifications or substitutions are intended to be included in the scope of the present invention as defined by the appended claims.

Claims (10)

1. A method for glycoform identification, said method comprising:
acquiring configuration parameters input by a user, and associating the configuration parameters with a pre-established glycoform database; the configuration parameters include first type information of the labeling reagent, mass deviation data, and second type information of the adduct, and the glycoform database includes molecular weight information and glycoform information of a plurality of sugar residues;
extracting a single ion mass-to-charge ratio array corresponding to the free glycan according to mass spectrum data corresponding to the free glycan; the single ion mass-to-charge ratio array comprises a plurality of single ion mass-to-charge ratio values; the free glycan refers to a glycan released from a protein in the target sample; the mass spectrum data is obtained by a mass spectrometry technique after the free glycan has been labeled with the labeling reagent and separated;
and searching and matching the glycoforms corresponding to the free glycans in the glycoform database according to the single ion mass-to-charge ratio array, and determining the identification result of the glycoforms contained in the target sample.
2. The method according to claim 1, wherein the labeling reagent is any one of 2-AB, 2-AA, RapiFluor-MS, InstantPC, and procainamide.
3. The method according to claim 1, wherein the adduct is any one of K+, NH4+, and Na+.
4. The method according to claim 1, wherein the extracting a single ion mass-to-charge ratio array corresponding to the free glycans comprises:
Acquiring a mass spectrogram corresponding to the free glycan according to the mass spectrum data;
selecting a plurality of numerical values of single ion mass-to-charge ratios from the mass spectrogram according to the abundance, and obtaining the single ion mass-to-charge ratio array; the single ion mass-to-charge ratio array at least comprises the single ion mass-to-charge ratio with the highest abundance and the value of the mass-to-charge ratio of the isotope adjacent to the single ion mass-to-charge ratio with the highest abundance.
5. A method for identifying a glycoform according to any of claims 1-4, said method further comprising the steps of:
dividing each type of sugar in the sugar type database according to a preset dimension to obtain a plurality of sugar type data tables.
6. The method according to claim 5, wherein searching for and matching the glycoforms corresponding to the free glycans in the glycoform database according to the single ion mass to charge ratio array comprises:
determining a target glycoform data table from the plurality of glycoform data tables according to the target sample type and the cell type expressing the target sample;
and searching and matching the glycoform corresponding to the free glycan according to the single ion mass-to-charge ratio array in the target glycoform data table.
7. A method of glycoform identification according to claim 6, said method further comprising the steps of:
if no matched identification result is found in the target glycoform data table, outputting prompt information;
the prompt message is used for reminding the user to reselect the target sugar type data table or reenter the configuration parameters.
8. A glycoform identification device, said device comprising:
the configuration module is used for acquiring configuration parameters input by a user and associating the configuration parameters with a pre-established glycoform database; the configuration parameters include first type information of the labeling reagent, mass deviation data, and second type information of the adduct, and the glycoform database includes molecular weight information and glycoform information of a plurality of sugar residues;
the extraction module is used for extracting a single ion mass-to-charge ratio array corresponding to the free glycan according to mass spectrum data corresponding to the free glycan; the single ion mass-to-charge ratio array comprises a plurality of single ion mass-to-charge ratio values; the free glycan refers to a glycan released from a protein in the target sample; the mass spectrum data is obtained by a mass spectrometry technique after the free glycan has been labeled with the labeling reagent and separated;
and the matching module is used for searching and matching the glycoforms corresponding to the free glycans in the glycoform database according to the single ion mass-to-charge ratio array, and determining the identification result of the glycoforms contained in the target sample.
9. A computer device, comprising:
at least one processor;
at least one memory for storing at least one program;
when said at least one program is executed by said at least one processor, said at least one processor is caused to carry out a glycoform identification method according to any of claims 1-7.
10. A computer-readable storage medium having stored therein a program executable by a processor, characterized in that: the processor executable program when executed by a processor is for implementing a glycoform identification method according to any of claims 1-7.
CN202310123581.7A 2023-02-14 2023-02-14 Glycoform identification method, device, equipment and medium Pending CN116242903A (en)

Publications (1)

Publication Number Publication Date
CN116242903A true CN116242903A (en) 2023-06-09

Family

ID=86632580

Country Status (1)

Country Link
CN (1) CN116242903A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination