CN112748173B - Method for constructing mass spectrum model for diagnosing new coronavirus infection - Google Patents

Publication number
CN112748173B
Authority
CN
China
Prior art keywords
polypeptide
mass
characteristic
leu
charge ratio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110156544.7A
Other languages
Chinese (zh)
Other versions
CN112748173A (en)
Inventor
廖璞
孙巍
乔亮
吕倩
马庆伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Clin Bochuang Biotechnology Co Ltd
Original Assignee
Beijing Clin Bochuang Biotechnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Clin Bochuang Biotechnology Co Ltd
Publication of CN112748173A
Application granted
Publication of CN112748173B

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G01N27/626 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using heat to ionise a gas
    • G01N27/628 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using heat to ionise a gas and a beam of energy, e.g. laser enhanced ionisation
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/569 Immunoassay; Biospecific binding assay; Materials therefor for microorganisms, e.g. protozoa, bacteria, viruses
    • G01N33/56983 Viruses
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848 Methods of protein analysis involving mass spectrometry
    • G01N33/6851 Methods of protein analysis involving laser desorption ionisation mass spectrometry
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A50/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
    • Y02A50/30 Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Immunology (AREA)
  • Biomedical Technology (AREA)
  • Urology & Nephrology (AREA)
  • Hematology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Analytical Chemistry (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Optics & Photonics (AREA)
  • Food Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • Cell Biology (AREA)
  • Biotechnology (AREA)
  • Virology (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Electrochemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Tropical Medicine & Parasitology (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention provides a characteristic polypeptide composition for detecting new coronavirus infection. The composition comprises 25 characteristic polypeptides with specific mass-to-charge ratios, and whether a sample comes from a patient with new coronavirus infection can be judged by analyzing the expression of these characteristic polypeptides. The invention also provides applications of a mass spectrum model prepared from the characteristic polypeptide composition, products for diagnosing new coronavirus infection, and the like. For the first time, the invention provides several combinations of characteristic proteins that differ between new coronavirus infected patients and a control group consisting of normal persons, tuberculosis patients and persons with symptoms resembling new coronavirus infection. This breaks with the traditional approach of searching for characteristic polypeptides only between normal persons and new coronavirus infected patients, and effectively avoids false-positive results caused by symptoms similar to those of new coronavirus infection. The method is simple to operate, inexpensive and accurate, and is expected to be used for large-scale screening of new coronavirus infection.

Description

Method for constructing mass spectrum model for diagnosing new coronavirus infection
Technical Field
The invention belongs to the field of detection, and relates to a technology for rapidly detecting novel coronavirus infection by using a time-of-flight mass spectrometry technology.
Background
Coronaviruses are a class of pathogens that primarily cause respiratory and intestinal diseases. The surface of such virus particles carries many regularly arranged protrusions, so that the whole virus particle resembles an imperial crown, hence the name "coronavirus". In addition to humans, coronaviruses can infect a variety of mammals, such as pigs, cattle, cats, dogs, minks, camels, bats, mice and hedgehogs, as well as a variety of birds.
Six types of human coronaviruses are known to date. Four of these are relatively common in the population and are less pathogenic, generally causing only mild respiratory symptoms similar to the common cold. The other two, severe acute respiratory syndrome coronavirus and Middle East respiratory syndrome coronavirus (SARS coronavirus and MERS coronavirus for short), can cause severe respiratory disease.
The novel coronavirus (COVID-19) is a new coronavirus strain that had never been found in humans before. Its transmission pattern, infection mechanism, and evolution and variation patterns are still unclear, which makes it difficult to control.
In order to prevent the occurrence and spread of novel coronavirus (COVID-19) infection and to take rapid measures that effectively control the epidemic, rapid detection of novel coronavirus infection is particularly important. For a long time, coronaviruses have been identified by traditional microbiological detection methods, namely morphological, physiological and biochemical characterization and serological identification. These methods are highly accurate but take too long: even at their fastest they require tens of hours, so they can hardly meet the need for rapid detection. Nucleic acid detection based on multiplex PCR is of great significance for the early diagnosis of coronaviruses and the discovery of sources of infection.
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS for short) is a mass spectrometry technique that emerged in the 1980s and has developed rapidly since. Its mass analyzer is an ion drift tube: ions generated by the ion source are first collected, so that all ions in the collector have zero velocity; they are then accelerated by a pulsed electric field, enter the field-free drift tube, and fly to the ion receiver at constant velocity. The larger the ion mass, the longer the time needed to reach the receiver; the smaller the ion mass, the shorter the time. On this principle, ions of different masses are separated according to their mass-to-charge ratio, and the molecular mass and purity of biological macromolecules such as polypeptides, proteins, nucleic acids and polysaccharides can be accurately determined. The method therefore offers high accuracy, high flexibility, high throughput, a short detection cycle and good cost performance.
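The dependence of flight time on ion mass described above follows from energy conservation in the acceleration region: an ion of mass m and charge z·e accelerated through a potential U gains kinetic energy z·e·U = ½·m·v², and then crosses a field-free tube of length L in t = L / v. The following is a minimal Python sketch of that relation. The drift length (1 m) and accelerating voltage (20 kV) are illustrative assumptions and are not taken from this patent.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mz, drift_length_m=1.0, accel_voltage_v=20000.0):
    """Return the field-free drift time (seconds) for an ion of the given m/z.

    mz is the mass in daltons divided by the number of elementary charges.
    The default tube length and voltage are assumed values for illustration.
    """
    mass_per_charge_kg = mz * DALTON
    # z*e*U = (1/2)*m*v^2  =>  v = sqrt(2*e*U / (m/z))
    velocity = math.sqrt(2 * ELEMENTARY_CHARGE * accel_voltage_v / mass_per_charge_kg)
    return drift_length_m / velocity


# Heavier ions arrive later; flight time scales with the square root of m/z.
for mz in (5158, 28091):
    print(mz, flight_time(mz))
```

Because t is proportional to the square root of m/z, quadrupling the mass-to-charge ratio doubles the flight time under these idealized assumptions.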
In recent years, mass spectrometry techniques have emerged for detecting proteins or polypeptides characteristic of pathogenic microorganisms or viruses. For example, Chinese patent application CN102337223A, "penicillium chrysogenum antifungal protein Pc-Arctin and its preparation method", discloses a MALDI-TOF identification method for detecting the Penicillium chrysogenum antifungal protein Pc-Arctin. In that method, spores of Penicillium chrysogenum A096 are picked from a plate and inoculated into SGY liquid medium for culture; the crude protein solution obtained by pretreatment is separated and purified on a chromatographic column and then on a carboxymethyl cation exchange chromatographic column; the eluted fractions are collected and each is concentrated to the required volume by centrifugal ultrafiltration; Paecilomyces variotii is used as the sensitive indicator organism to track the antifungal active fractions, and the purity of the protein obtained is determined for the identified active fractions; finally, the single band on the SDS-PAGE electrophoretogram is excised and identified by MALDI-TOF. The method is suitable only for a specific microorganism and requires several protein purification steps before the characteristic polypeptide Pc-Arctin is finally identified by MALDI-TOF. The process is therefore complex, the range of application is narrow, and the purpose of detecting viruses by mass spectrometry cannot be achieved.
Chinese patent applications 201110154723, "method for MALDI TOF MS assisted identification of listeria monocytogenes", and 201110154469, "method for MALDI TOF MS assisted identification of vibrio cholerae", disclose methods for the assisted identification of bacteria using MALDI-TOF MS, comprising: pretreating bacterial cultures, collecting MALDI-TOF MS spectra of all strain samples, preparing a standard bacterial spectrum with software, acquiring spectra of the bacteria to be tested by the same method, comparing the spectra, and judging according to matching scores. These methods use conventional sample treatment (treatment with absolute ethanol, formic acid and acetonitrile, assisted by centrifugation, with the supernatant finally drawn off for detection). Although they can characterize the characteristic spectrum of the bacteria to a certain extent, the detected material contains proteins, lipids, lipopolysaccharides and lipo-oligosaccharides, DNA, polypeptides and other ionizable molecules, so the resulting spectrum is essentially a superposition of the spectra of all these molecules. The amount of spectral information that must be processed and compared is therefore too large, and because so many kinds of molecules are detected the spectrum has low specificity. The methods are suitable only for a specific bacterium and cannot be widely extended to the detection of viruses.
Chinese patent application 200880121570, entitled "methods and biomarkers for diagnosing and monitoring mental disorders", reports that nearly one hundred neuropeptides associated with mental disorders, including influenza virus, can be detected by MALDI-TOF mass spectrometry. However, that application only briefly summarizes various possible techniques; it reports neither a specific protocol nor a specific target for coronaviruses, and thus can hardly teach researchers how to detect influenza viruses by MALDI-TOF mass spectrometry.
Thus, there is a need for a novel characteristic polypeptide mass spectrometry model for detecting coronavirus infection by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS) and use thereof.
Disclosure of Invention
A first object of the present invention is to provide a characteristic polypeptide composition, based on the serum peptidome, with which novel coronavirus (COVID-19) infection can be detected by MALDI-TOF mass spectrometry, wherein the characteristic polypeptide composition comprises 25 characteristic polypeptides having the following mass-to-charge ratios: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z.
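In practice, a measured peak rarely falls on exactly the nominal value, so the 25 mass-to-charge ratios above have to be matched to observed peaks within some tolerance. The following Python sketch illustrates one way to do this. The function name `match_peaks` and the 200 ppm tolerance are hypothetical choices for illustration; the patent does not specify a matching window. A narrow window is used here because some panel members lie close together (for example 14095m/z and 14102m/z).

```python
import bisect

# The 25 characteristic mass-to-charge ratios listed in the text above.
PANEL_MZ = [
    5158, 5366, 5893, 6357, 6654, 6939, 7364, 7614, 8034, 8043,
    8226, 8425, 8560, 8986, 9626, 13719, 13765, 13886, 14049, 14095,
    14102, 15123, 15867, 28091, 28232,
]


def match_peaks(observed_mz, tolerance_ppm=200.0):
    """Map each panel m/z to the closest observed peak, or None if none is near.

    tolerance_ppm is an assumed window, not a value given in the patent.
    """
    observed = sorted(observed_mz)
    matches = {}
    for target in PANEL_MZ:
        window = target * tolerance_ppm / 1e6
        # The closest observed peak is one of the two neighbours of the
        # insertion point in the sorted list.
        i = bisect.bisect_left(observed, target)
        candidates = observed[max(i - 1, 0): i + 1]
        best = min(candidates, key=lambda x: abs(x - target), default=None)
        if best is not None and abs(best - target) <= window:
            matches[target] = best
        else:
            matches[target] = None
    return matches


found = match_peaks([5158.4, 9000.0, 14101.0])
print(found[5158], found[14095], found[14102])
```

A panel member with no observed peak inside the window is reported as `None` rather than being assigned the nearest distant peak.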
In one embodiment, the characteristic polypeptide composition comprises 15 characteristic polypeptides having the following mass to charge ratios and polypeptide sequences:
a characteristic polypeptide with a mass to charge ratio of 6939m/z, the polypeptide sequence of which is selected from the sequences shown in SEQ ID No. 1;
a polypeptide with a mass-to-charge ratio of 7614m/z, wherein the polypeptide sequence is selected from the sequences shown in SEQ ID No. 2;
a characteristic polypeptide with a mass to charge ratio of 8034m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 3;
a polypeptide with a mass-to-charge ratio of 8226m/z, wherein the polypeptide sequence is selected from the sequences shown in SEQ ID No. 4;
a characteristic polypeptide with a mass-to-charge ratio of 8986m/z, the polypeptide sequence of which is selected from the sequences shown in SEQ ID No. 5;
a characteristic polypeptide having a mass to charge ratio of 9626m/z, the polypeptide sequence being selected from the group consisting of the sequences shown in SEQ ID No. 6;
a characteristic polypeptide having a mass to charge ratio of 13719m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 7;
a characteristic polypeptide having a mass to charge ratio of 13765m/z, the polypeptide sequence being selected from the group consisting of the sequences shown in SEQ ID No. 8;
a polypeptide having a mass to charge ratio of 13886m/z, wherein the polypeptide sequence is selected from the group consisting of the sequences shown in SEQ ID No. 9;
a characteristic polypeptide having a mass to charge ratio of 14049m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 10;
a characteristic polypeptide having a mass to charge ratio of 14095m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 11;
a characteristic polypeptide having a mass to charge ratio of 14102m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 12;
a characteristic polypeptide with a mass to charge ratio of 15123m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 13;
a characteristic polypeptide having a mass to charge ratio of 15867m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 14;
a polypeptide having a mass to charge ratio of 28091m/z, wherein the polypeptide sequence is selected from the group consisting of the sequences shown in SEQ ID No. 15.
In one embodiment, when the peaks of the characteristic polypeptides 8986m/z, 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, 14102m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the patient is determined to be a new coronavirus infected patient, and the ten-fold cross-validation accuracy is about 91%. In a preferred embodiment, the characteristic polypeptide composition comprises only characteristic polypeptides having mass-to-charge ratios of 8986m/z, 28091m/z, and 6939m/z, 13886m/z, 14049m/z, 14102m/z, respectively.
In another embodiment, when the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the patient is a new coronavirus infected patient, and the ten-fold cross-validation accuracy is about 93.88%. In a preferred embodiment, the characteristic polypeptide composition comprises only characteristic polypeptides having mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z, and 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, respectively.
In other embodiments, a positive sample is indicated when the peak of the signature polypeptide 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z is up-regulated while the peak of the signature polypeptide 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z is down-regulated, i.e., the patient is a new coronavirus infected patient, with a ten fold cross-validation accuracy of about 97.96%.
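The three embodiments above share one pattern: a set of peaks that must be up-regulated and a set that must be down-regulated. The Python sketch below encodes the three panels as data and applies a simple rule. The comparison against a per-peak reference intensity and the requirement that every peak agree are assumptions made for illustration; the text does not state how up- or down-regulation is quantified or how the peaks are combined in the actual model.

```python
# Up- and down-regulated peaks for the three panels described in the text.
PANELS = {
    "6-peak": {
        "up": [8986, 28091],
        "down": [6939, 13886, 14049, 14102],
    },
    "15-peak": {
        "up": [7614, 8034, 8226, 8986, 9626, 15123, 15867, 28091],
        "down": [6939, 13719, 13765, 13886, 14049, 14095, 14102],
    },
    "25-peak": {
        "up": [5158, 5366, 5893, 7364, 7614, 8034, 8043, 8226, 8425, 8560,
               8986, 9626, 15123, 15867, 28091],
        "down": [6357, 6654, 6939, 13719, 13765, 13886, 14049, 14095,
                 14102, 28232],
    },
}


def is_positive(sample, reference, panel="6-peak"):
    """Return True when the sample shows the panel's up/down pattern.

    sample and reference map m/z -> normalized peak intensity. The reference
    (for example a control-group mean) and the all-peaks rule are
    hypothetical; they are not specified in the patent.
    """
    spec = PANELS[panel]
    up_ok = all(sample[mz] > reference[mz] for mz in spec["up"])
    down_ok = all(sample[mz] < reference[mz] for mz in spec["down"])
    return up_ok and down_ok
```

Keeping the panels as data rather than code makes it easy to check that the 6-peak and 15-peak panels are subsets of the 25-peak panel, with each shared peak regulated in the same direction.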
It is a second object of the present invention to provide a mass spectrometry model for detecting a novel coronavirus infection, which is prepared from a characteristic polypeptide composition having a mass-to-charge ratio peak of any one of the above schemes.
In one embodiment, the mass spectrometry model is prepared from the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z, wherein when the peak of the characteristic polypeptide 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z is up-regulated while the peak of the characteristic polypeptide 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z is down-regulated, the serum sample is a positive sample, i.e., the patient is a patient with a novel coronavirus infection, and the ten fold cross-validation accuracy is about 97.96%.
In another embodiment, the mass spectrometry model is prepared from only a composition of characteristic polypeptides having mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z, and 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, respectively, wherein when the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the patient is a new coronavirus infected patient, and the ten-fold cross-validation accuracy is about 93.88%.
In other embodiments, the mass spectrometry model is prepared from only a signature polypeptide composition having mass ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, 14102m/z, respectively, wherein when the peaks of signature polypeptides 8986m/z, 28091m/z are up-regulated while the peaks of signature polypeptides 6939m/z, 13886m/z, 14049m/z, 14102m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the patient is determined to be a new coronavirus infected patient, and the ten fold cross-validation accuracy is about 91%.
It is a third object of the present invention to provide a kit for detecting a novel coronavirus infection comprising the above-described characteristic polypeptide composition, or comprising the above-described mass spectrometry model.
In one embodiment, the polypeptide composition or mass spectrometry model is prepared from the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z, wherein when the peak of the characteristic polypeptide 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z is up-regulated while the peak of the characteristic polypeptide 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z is down-regulated, the serum sample is a positive sample, i.e., the patient is a patient with a novel coronavirus infection, and the ten fold cross-validation accuracy is about 97.96%.
In another embodiment, the polypeptide composition or mass spectrometry model is prepared from only the characteristic polypeptides having mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z, and 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, respectively, wherein when the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the patient is a new coronavirus infected patient, and the ten-fold cross-validation accuracy is about 93.88%.
In other embodiments, the polypeptide composition or mass spectrometry model is prepared from only the characteristic polypeptides having mass ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, 14102m/z, respectively, wherein when the peaks of the characteristic polypeptides 8986m/z, 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, 14102m/z are down-regulated, the serum sample is indicative of a positive sample, i.e., the patient is determined to be a new coronavirus infected patient, and the ten fold cross-validation accuracy is about 91%.
In one embodiment, the kit comprises a sample treatment fluid developed by Beijing-based New Boc Biotechnology Inc.
In another embodiment, the kit further comprises standard mass spectrum sample tubes for ensuring the accuracy of the molecular weights measured by the mass spectrometer. These may be several sample tubes each containing a single characteristic polypeptide, or one sample tube containing several characteristic polypeptides. The standards are subjected to mass spectrometry in parallel with the sample to be tested, so as to judge whether the molecular weight information obtained for the sample to be tested is accurate and reliable.
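The parallel run of standards described above amounts to a calibration check: the measured mass of each standard is compared with its expected mass, and the run is trusted only if the deviation is small. The following Python sketch expresses the deviation in parts per million. The 500 ppm acceptance limit and the function names are assumptions for illustration and do not come from the patent.

```python
def mass_error_ppm(measured_mz, expected_mz):
    """Return the signed mass error of a standard in parts per million."""
    return (measured_mz - expected_mz) / expected_mz * 1e6


def calibration_ok(standards, max_abs_ppm=500.0):
    """Return True if every standard lies within the acceptance limit.

    standards is a list of (expected_mz, measured_mz) pairs; max_abs_ppm is
    an assumed acceptance limit, not a value given in the text.
    """
    return all(
        abs(mass_error_ppm(measured, expected)) <= max_abs_ppm
        for expected, measured in standards
    )


# Two hypothetical standards measured alongside the sample.
print(calibration_ok([(8986, 8986.9), (28091, 28093.0)]))
```

If the check fails, the molecular weight information of the sample measured in the same run would be treated as unreliable and the instrument recalibrated before re-measurement.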
In another embodiment, the kit can contain software or a chip of the standard database of the characteristic polypeptides, and can be used for providing standard data or curve comparison when a sample to be tested is subjected to mass spectrometry so as to judge the expression condition of the characteristic polypeptides in the sample to be tested.
It is a fourth object of the present invention to provide the use of said characteristic polypeptide composition, or said mass spectrometry model, for the preparation of a product for diagnosing a novel coronavirus infection.
In one embodiment, the polypeptide composition or mass spectrometry model is prepared from the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z, wherein when the peak of the characteristic polypeptide 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z is up-regulated while the peak of the characteristic polypeptide 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z is down-regulated, the serum sample is a positive sample, i.e., the patient is a patient with a novel coronavirus infection, and the ten fold cross-validation accuracy is about 97.96%.
In another embodiment, the polypeptide composition or mass spectrometry model is prepared from only the characteristic polypeptides having mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z, and 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, respectively, wherein when the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the patient is a new coronavirus infected patient, and the ten-fold cross-validation accuracy is about 93.88%.
In other embodiments, the polypeptide composition or mass spectrometry model is prepared from only the characteristic polypeptides having mass ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, 14102m/z, respectively, wherein when the peaks of the characteristic polypeptides 8986m/z, 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, 14102m/z are down-regulated, the serum sample is indicative of a positive sample, i.e., the patient is determined to be a new coronavirus infected patient, and the ten fold cross-validation accuracy is about 91%.
In any of the above embodiments, the product for diagnosing a new coronavirus infection refers to any conventional product for diagnosing a new coronavirus infection, including: detection reagent, detection chip, detection carrier, detection kit, etc.
A fifth object of the present invention is to provide a method for constructing a mass spectrometry model, comprising:
1) Serum samples of a plurality of clinically confirmed new coronavirus infected patients and of non-infected controls (including tuberculosis patients, patients with similar symptoms such as fever and cough, and healthy people) are collected and frozen at low temperature for later use;
2) Carrying out mass spectrum pretreatment on the serum proteins;
3) Carrying out mass spectrometry detection and reading on the two groups of pretreated serum proteins to obtain serum polypeptide fingerprints for the two groups;
4) Standardizing the serum polypeptide fingerprints of all patients and normal people, and collecting the data;
5) Performing quality control treatment on the obtained data, and screening out characteristic polypeptides with the following mass-to-charge ratio peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z, and establishing a mass spectral model for detecting a new coronavirus infection from these mass-to-charge ratio peaks.
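Steps 4) and 5), together with the ten-fold cross-validation accuracies quoted for the models, can be illustrated with a small Python sketch. It normalizes each fingerprint to its total ion current, splits the samples into ten folds, and scores a nearest-centroid classifier on each held-out fold. All of these choices are assumptions for illustration: the patent does not name the standardization method, the classifier, or the software used, and the folds here are not stratified by class.

```python
import random


def tic_normalize(intensities):
    """Scale a fingerprint so that its intensities sum to 1 (assumed method)."""
    total = sum(intensities)
    return [x / total for x in intensities] if total > 0 else list(intensities)


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class mean is closest to x (hypothetical model)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def cross_val_accuracy(X, y, k=10, seed=0):
    """Return the fraction of samples classified correctly when held out."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct += sum(
            nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
            for i in fold
        )
    return correct / len(X)
```

Each sample is used for testing exactly once, so the returned value is the pooled accuracy over all ten folds. With real data, each row of `X` would hold the normalized intensities at the selected mass-to-charge ratios and `y` would mark infected patients versus controls.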
In one embodiment, the mass spectrometry model of step 5) is prepared from only the characteristic polypeptides having mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z, and 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, respectively, wherein when the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the patient is a new coronavirus infected patient, and the ten-fold cross-validation accuracy is about 93.88%.
In another embodiment, wherein the mass spectrometry model of step 5) is prepared from only the characteristic polypeptides having mass ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, 14102m/z, respectively, wherein when the peaks of the characteristic polypeptides 8986m/z, 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, 14102m/z are down-regulated, the serum sample is indicated as a positive sample, i.e., the patient is determined to be a new coronavirus infected patient, a ten fold cross-validation accuracy of about 91%.
In any of the above embodiments, the characteristic polypeptides are respectively:
a characteristic polypeptide with a mass to charge ratio of 6939m/z, the polypeptide sequence of which is selected from the sequences shown in SEQ ID No. 1;
a polypeptide with a mass-to-charge ratio of 7614m/z, wherein the polypeptide sequence is selected from the sequences shown in SEQ ID No. 2;
a characteristic polypeptide with a mass to charge ratio of 8034m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 3;
a polypeptide with a mass-to-charge ratio of 8226m/z, wherein the polypeptide sequence is selected from the sequences shown in SEQ ID No. 4;
a characteristic polypeptide with a mass-to-charge ratio of 8986m/z, the polypeptide sequence of which is selected from the sequences shown in SEQ ID No. 5;
a characteristic polypeptide having a mass to charge ratio of 9626m/z, the polypeptide sequence being selected from the group consisting of the sequences shown in SEQ ID No. 6;
a characteristic polypeptide having a mass to charge ratio of 13719m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 7;
a characteristic polypeptide having a mass to charge ratio of 13765m/z, the polypeptide sequence being selected from the group consisting of the sequences shown in SEQ ID No. 8;
a polypeptide having a mass to charge ratio of 13886m/z, wherein the polypeptide sequence is selected from the group consisting of the sequences shown in SEQ ID No. 9;
a characteristic polypeptide having a mass to charge ratio of 14049m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 10;
a characteristic polypeptide having a mass to charge ratio of 14095m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 11;
a characteristic polypeptide having a mass to charge ratio of 14102m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 12;
a characteristic polypeptide with a mass to charge ratio of 15123m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 13;
a characteristic polypeptide having a mass to charge ratio of 15867m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 14;
a polypeptide having a mass to charge ratio of 28091m/z, wherein the polypeptide sequence is selected from the group consisting of the sequences shown in SEQ ID No. 15.
In any of the above embodiments, wherein the method of step 2) of pre-treating comprises diluting the serum protein or polypeptide in the stabilized sample with a sample treatment fluid.
In any of the above embodiments, in the step 3), the polypeptide mass spectrometry universal pretreatment kit is used to dilute and read two groups of serum proteins, so as to obtain fingerprints of the two groups of serum polypeptides.
In any of the above embodiments, the quality control process described in step 5) uses the same mass spectrum parameters to detect the crystallization spot of the blank matrix, and if a distinct mass spectrum peak appears, the quality of the matrix solution is considered to be unacceptable.
In any one of the above embodiments, wherein the quality control processing in step 5) selects the following 8 characteristic peaks as quality control peaks: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z.
Furthermore, in any one of the above embodiments, the characteristic polypeptide composition, mass spectrometry model, detection product, use, construction method may involve only 15 characteristic polypeptides having the following mass-to-charge ratios and polypeptide sequences:
a characteristic polypeptide with a mass to charge ratio of 6939m/z, the polypeptide sequence of which is selected from the sequences shown in SEQ ID No. 1;
a polypeptide with a mass-to-charge ratio of 7614m/z, wherein the polypeptide sequence is selected from the sequences shown in SEQ ID No. 2;
a characteristic polypeptide with a mass to charge ratio of 8034m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 3;
a polypeptide with a mass-to-charge ratio of 8226m/z, wherein the polypeptide sequence is selected from the sequences shown in SEQ ID No. 4;
a characteristic polypeptide with a mass-to-charge ratio of 8986m/z, the polypeptide sequence of which is selected from the sequences shown in SEQ ID No. 5;
a characteristic polypeptide having a mass to charge ratio of 9626m/z, the polypeptide sequence being selected from the group consisting of the sequences shown in SEQ ID No. 6;
a characteristic polypeptide having a mass to charge ratio of 13719m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 7;
a characteristic polypeptide having a mass to charge ratio of 13765m/z, the polypeptide sequence being selected from the group consisting of the sequences shown in SEQ ID No. 8;
a polypeptide having a mass to charge ratio of 13886m/z, wherein the polypeptide sequence is selected from the group consisting of the sequences shown in SEQ ID No. 9;
a characteristic polypeptide having a mass to charge ratio of 14049m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 10;
a characteristic polypeptide having a mass to charge ratio of 14095m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 11;
a characteristic polypeptide having a mass to charge ratio of 14102m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 12;
a characteristic polypeptide with a mass to charge ratio of 15123m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 13;
a characteristic polypeptide having a mass to charge ratio of 15867m/z, the polypeptide sequence of which is selected from the group consisting of the sequences shown in SEQ ID No. 14;
a polypeptide having a mass to charge ratio of 28091m/z, wherein the polypeptide sequence is selected from the group consisting of the sequences shown in SEQ ID No. 15.
In the detection of biological samples by time-of-flight mass spectrometry, the quality of a mass spectrum is affected by many conditions, such as individual differences, sample quality, changes in ambient temperature and humidity, and the crystallization state of the sample and matrix. To keep abnormal spectra from affecting the analysis results, the above 8 characteristic peaks common to human serum were introduced as quality control peaks; the occurrence of these peaks is independent of whether the patient has a new coronavirus infection. Of the 843 spectra collected, all 8 quality control peaks were detected in 683 (81.0% of the total number of spectra), and 7 quality control peaks were detected in 156 (18.5% of the total number of spectra). The following spectrum quality control condition is set: the spectrum of a single sample passes quality control when 6-8 quality control peaks are detected and the deviation of the internal standard peak molecular weights is less than 0.002 (i.e., the deviation range does not exceed 2‰). Spectra that fail quality control must be re-acquired.
The invention uses bioinformatics methods to screen out the corresponding new coronavirus infection markers and to establish a detection model for analysis and detection. The bioinformatics methods comprise standardizing the fingerprint spectra, applying experimental quality control to the obtained data, screening the desired serum characteristic polypeptides and establishing a mass spectrum model, and optionally establishing and verifying the mass spectrum model with an LR algorithm. The experimental quality control retains the mass spectrum data in which no fewer than 6 internal standard peaks are detected, and performs a secondary calibration of the spectrum using the internal standard peaks.
Terminology and definitions
Ten-fold cross-validation (10-fold cross-validation) is a commonly used method for testing the accuracy of an algorithm. The data set is divided into ten parts; in turn, 9 parts are used as training data and 1 part as test data. Each round yields a corresponding accuracy (or error rate), and the average of the 10 results is taken as an estimate of the accuracy of the algorithm. Generally, 10-fold cross-validation is repeated several times (e.g., 10 times 10-fold cross-validation) and the average of the results is taken as the final estimate of the accuracy of the algorithm. It should be noted that ten-fold cross-validation accuracy is related to, but not equivalent to, the accuracy (or sensitivity) of the actual test. In evaluating the algorithm, if the ten-fold cross-validation accuracy falls within the confidence interval, varies in correlation with the number of characteristic polypeptides, and reaches a value that is feasible for clinical diagnosis, the mass spectrum model constructed from those polypeptides is considered to meet the requirements for clinical diagnosis.
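The ten-fold procedure defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nearest-centroid classifier and the two-feature toy data are hypothetical stand-ins and are not the LR model or the serum data of the invention; only the fold logic (9 parts for training, 1 part for testing, accuracies averaged over 10 rounds) follows the definition.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Toy classifier (an assumption, not the patent's model): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest to x (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_val_accuracy(X, y, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Hypothetical, well-separated toy data: 50 "negative" and 50 "positive" samples.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

cv_accuracy = cross_val_accuracy(X, y, k=10)
print(f"10-fold cross-validation accuracy: {cv_accuracy:.3f}")
```

Repeating `cross_val_accuracy` with different `seed` values and averaging the results would correspond to the repeated (e.g., 10 times 10-fold) variant mentioned in the definition.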
Technical effects
Compared with the prior art, the invention has the following advantages:
1. The invention tests serum samples using a combination of several characteristic proteins that differ between new coronavirus infected patients and normal persons, pulmonary tuberculosis patients and control patients with symptoms similar to new coronavirus infection, and processes the data by combining traditional statistics with modern bioinformatics methods, thereby obtaining a polypeptide fingerprint detection model that distinguishes new coronavirus infected patients from healthy persons and other control patients; the series of protein mass-to-charge ratio peaks discovered provides a basis and resource for finding new and more ideal markers.
2. Compared with the prior detection method, the method has higher sensitivity and specificity, simple operation, low detection cost and high flux, and is expected to be used for large-scale screening of new coronavirus infection.
3. The construction method of the model is reasonable and feasible in design, provides a new screening method for improving the clinical cure rate of new coronavirus infection, and offers a new approach to exploring the mechanism of occurrence and development of new coronavirus infection.
4. The invention provides, for the first time, combinations of characteristic proteins that differ among 146 patients with confirmed new coronavirus infection, 46 normal persons, 33 tuberculosis patients and 73 patients with symptoms similar to new coronavirus infection. This breaks with the conventional approach of searching for characteristic polypeptides only between normal persons and new coronavirus infected patients, and effectively avoids interference from false positive results caused by symptoms similar to new coronavirus infection.
5. The mass spectrum model of the invention has the detection accuracy reaching 99%, the sensitivity being 98% and the specificity being 100%, and the result shows that the serum peptide group characteristic polypeptide model of the invention can be rapidly used for screening new coronavirus infected patients in the crowd.
Drawings
Fig. 1: Comparison of the serum polypeptide fingerprints of the different groups (healthy group, tuberculosis group, similar-symptom group and COVID-19 patient group); from top to bottom: negative healthy persons, negative tuberculosis, negative similar symptoms, and positive COVID-19 patients.
Fig. 2-1: The 20 peaks with the highest repetition frequency in LASSO.
Fig. 2-2: The 20 peaks with the highest VIP importance in PLS-DA.
Fig. 2-3: The 10 peaks with the highest cross-validation accuracy in RFECV.
Fig. 3: The intensity of each characteristic peak in the training set, wherein the left columns are the negative control group and the right columns are the positive group.
Fig. 4-1: Comparison of the training set ROC curves of the various machine learning methods.
Fig. 4-2: Comparison of the test set ROC curves.
Fig. 5: Confusion matrix of the predicted results against the true groupings of the test set.
Fig. 6: The procedure for establishing a mass spectrometry model for rapidly screening patients with new coronavirus infection (COVID-19).
Fig. 7: Mass spectrum peak of the characteristic polypeptide m/z 5157.6, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 8: Mass spectrum peak of the characteristic polypeptide m/z 5366.2, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 9: Mass spectrum peak of the characteristic polypeptide m/z 5892.9, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 10: Mass spectrum peak of the characteristic polypeptide m/z 6357.4, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 11: Mass spectrum peak of the characteristic polypeptide m/z 6654.0, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 12: Mass spectrum peak of the characteristic polypeptide m/z 6939.1, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 13: Mass spectrum peak of the characteristic polypeptide m/z 7364.2, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 14: Mass spectrum peak of the characteristic polypeptide m/z 7614.2, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 15: Mass spectrum peak of the characteristic polypeptide m/z 8034.3, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 16: Mass spectrum peak of the characteristic polypeptide m/z 8042.7, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 17: Mass spectrum peak of the characteristic polypeptide m/z 8226.4, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 18: Mass spectrum peak of the characteristic polypeptide m/z 8424.9, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 19: Mass spectrum peak of the characteristic polypeptide m/z 8559.8, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 20: Mass spectrum peak of the characteristic polypeptide m/z 8986.1, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 21: Mass spectrum peak of the characteristic polypeptide m/z 9626.4, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 22: Mass spectrum peak of the characteristic polypeptide m/z 13719.2, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 23: Mass spectrum peak of the characteristic polypeptide m/z 13765.2, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 24: Mass spectrum peak of the characteristic polypeptide m/z 13886.1, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 25: Mass spectrum peak of the characteristic polypeptide m/z 14049.4, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 26: Mass spectrum peak of the characteristic polypeptide m/z 14094.7, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 27: Mass spectrum peak of the characteristic polypeptide m/z 14101.8, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 28: Mass spectrum peak of the characteristic polypeptide m/z 15123.4, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 29: Mass spectrum peak of the characteristic polypeptide m/z 15866.5, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 30: Mass spectrum peak of the characteristic polypeptide m/z 28091.4, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Fig. 31: Mass spectrum peak of the characteristic polypeptide m/z 28231.5, wherein the upper panel is the spectrum of a non-COVID-19 control and the lower panel is the spectrum of a COVID-19 patient.
Detailed Description
The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
Example 1 sample processing
Serum samples from 146 confirmed patients were obtained from a hospital in Chongqing in February 2020; all patients were positive on nucleic acid testing and were strictly classified according to the guidelines.
Classification is based on the following criteria:
(1) Mild: clinical symptoms are mild, with no signs of pneumonia on imaging;
(2) Common type: fever and respiratory symptoms, with pneumonia visible on imaging;
(3) Severe: dyspnea, respiratory rate not less than 30 breaths/min, oxygen saturation not more than 93% at rest, arterial partial pressure of oxygen (PaO2) / fraction of inspired oxygen (FiO2) not more than 300 mmHg;
(4) Critical: respiratory failure requiring a ventilator, shock, or other organ failure requiring transfer to the ICU for rescue.
The 152 serum samples without new coronavirus infection used as controls were obtained from a hospital in Chongqing in March 2020, and comprised 46 normal persons, 33 tuberculosis patient controls, and 73 controls with symptoms similar to new coronavirus infection.
All samples were drawn in the early morning from fasting subjects, collected into additive-free vacuum serum collection tubes, centrifuged at 2,264 g for 10 min and incubated at 56 ℃ for 30 min; the serum was then aliquoted and frozen at -80 ℃.
Mass spectrometry pretreatment of serum samples: before each mass spectrometry experiment, one tube of each aliquoted serum sample was taken from the low-temperature freezer, placed on wet ice and thawed for 60-90 min. 5 µL of serum was aspirated, 45 µL of sample treatment solution was added, and the mixture was vortexed at 1200 rpm for 30 s; 10 µL of the treated sample solution was then added to 10 µL of prepared matrix solution and vortexed at 1200 rpm for 30 s; 1 µL of the mixture was spotted onto the target plate, each sample was spotted in triplicate, and mass spectrometry detection was performed after the spots had air-dried.
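The two mixing steps of the pretreatment imply an overall serum dilution that is easy to work out. The short Python sketch below does the arithmetic from the volumes stated above; the function and variable names are illustrative only.

```python
def dilution_factor(sample_volume_ul, diluent_volume_ul):
    """Fold dilution obtained when a sample is mixed with a diluent."""
    return (sample_volume_ul + diluent_volume_ul) / sample_volume_ul


# Step 1: 5 uL serum + 45 uL sample treatment solution.
step1 = dilution_factor(5, 45)
# Step 2: 10 uL treated sample + 10 uL matrix solution.
step2 = dilution_factor(10, 10)

total_dilution = step1 * step2              # overall fold dilution of the serum
serum_per_spot_ul = 1 / total_dilution      # serum contained in a 1 uL spot

print(f"step 1: {step1:.0f}-fold, step 2: {step2:.0f}-fold, total: {total_dilution:.0f}-fold")
print(f"serum per 1 uL spot: {serum_per_spot_ul:.2f} uL")
```

Under these volumes the serum is diluted 20-fold overall, so each 1 µL spot carries about 0.05 µL of the original serum.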
Example 2 creation of a Mass Spectrometry model for MALDI-TOF-MS
(I) Sample preparation
5 µL of serum from each sample was diluted in 45 µL of sample treatment solution (Bioyong Technologies Inc.). Then 10 µL of the diluted serum was mixed with 10 µL of matrix solution (Bioyong Technologies Inc.).
2 µL of the mixture was spotted onto the stainless steel target plate. After drying at room temperature, the samples were loaded into a MALDI-TOF MS mass spectrometer (Clin-TOF-II; Bioyong Technologies Inc.). Each sample was tested 3 times in parallel.
The matrix-assisted laser desorption/ionization time-of-flight mass spectrometer Clin-TOF and the general-purpose polypeptide mass spectrometry pretreatment kit used in the experiments were developed by Bioyong (China). The data were preprocessed with the MALDIquant program: the data were square-root transformed, smoothed with a filter-fitting method, and baseline-corrected. The mass spectrometer was calibrated with a mixture of polypeptides and proteins of known molecular weight. The mass drift of the calibrant should be within 500 ppm. 500 spectra were acquired for each sample spot. The molecular weight acquisition range was m/z 3000-30000.
The mass spectra of the different sample groups are shown in Fig. 1 (comparison of the serum polypeptide fingerprints of the different groups); from top to bottom they are the spectra of negative healthy persons, negative tuberculosis patients, negative similar-symptom controls and positive COVID-19 patients. In the negative healthy person spectra, the peak intensities of 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 28091m/z are lower, while the peak intensities of 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z are higher. In the negative tuberculosis spectra, the peak intensities of 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z are lower, while the peak intensities of 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z are higher. In the negative similar-symptom group spectra, the peak intensities of 5158m/z, 5366m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 28091m/z are lower, while the peak intensities of 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z are higher. In the positive COVID-19 patient spectra, the peak intensities of 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z are higher, while the peak intensities of 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z are lower.
(II) Mass Spectrometry data acquisition
A Clin-TOF mass spectrometer was used. An appropriate laser energy was set to acquire data from a given position on the sample crystallization spot. For each sample spot, 50 laser positions were selected and each position was fired on 10 times, i.e., each sample crystallization spot received 500 laser shots, from which one spectrum was collected. Laser frequency: 30 Hz. Data acquisition range: 3-30 kDa. External calibration with the standard was performed before each sample crystallization spot was acquired, with an average molecular weight deviation of less than 500 ppm.
Experiment quality control:
(1) Blank matrix crystallization spots were measured with the same mass spectrometry parameters; if obvious mass spectrum peaks appeared, the matrix solution was considered unqualified and was replaced with fresh matrix.
(2) In external calibration with the standard, the mass deviation between different calibrant spots must not exceed 500 ppm, and all 5 calibrant peaks must meet this requirement simultaneously.
(3) Eight polypeptide peaks in serum were selected as internal standard quality control peaks. If 6-8 internal standard peaks are detected and the molecular weight deviation of the internal standard peaks does not exceed 2‰, the spectrum is considered qualified; otherwise the spectrum must be re-acquired. The internal standard peaks are: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z.
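The internal standard check in item (3) can be sketched in Python as follows. This is a minimal illustration under one assumed reading of the rule: an internal standard peak counts as detected only if some observed peak lies within a relative deviation of 0.002 (2‰) of its nominal m/z, and a spectrum is qualified when at least 6 of the 8 are detected. The function names and the example peak lists are hypothetical.

```python
# Internal standard quality control peaks listed in the text (m/z).
QC_PEAKS_MZ = [6426, 6623, 8753, 8785, 8904, 9118, 9409, 9700]
MAX_REL_DEV = 0.002   # 2 per mille relative mass deviation
MIN_QC_PEAKS = 6      # at least 6 of the 8 peaks must be detected


def match_qc_peaks(observed_mz, qc_peaks=QC_PEAKS_MZ, max_rel_dev=MAX_REL_DEV):
    """Map each QC peak to its closest observed peak if that peak is within tolerance."""
    matched = {}
    if not observed_mz:
        return matched
    for ref in qc_peaks:
        closest = min(observed_mz, key=lambda mz: abs(mz - ref))
        if abs(closest - ref) / ref <= max_rel_dev:
            matched[ref] = closest
    return matched


def spectrum_passes_qc(observed_mz):
    """A spectrum is qualified when enough QC peaks are detected within tolerance."""
    return len(match_qc_peaks(observed_mz)) >= MIN_QC_PEAKS


# Hypothetical peak lists: the first lacks only the 9409 m/z peak, the second lacks three.
good_spectrum = [6427.1, 6622.0, 8754.0, 8786.0, 8905.2, 9119.0, 9702.3]
poor_spectrum = [6427.1, 6622.0, 8754.0, 9119.0, 9702.3]

print(spectrum_passes_qc(good_spectrum))  # True: 7 QC peaks matched
print(spectrum_passes_qc(poor_spectrum))  # False: only 5 QC peaks matched
```

A spectrum that fails this check would be re-acquired, as the text requires.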
(III) raw data preprocessing
The MALDI-TOF raw data were subjected to secondary calibration against the internal standards using internal standard calibration software and saved as txt files. The internal standard peaks are: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z. The spectra were then processed with the MALDIquant program; the processing included smoothing, baseline correction and molecular weight calibration. Peak detection was performed with a signal-to-noise ratio of 3. Peaks were binned using the binPeaks command with a tolerance of 0.002. Peaks with a frequency of not less than 25% within a group were retained. Finally, the resulting matrix was used for the following analysis.
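The binning and frequency filtering steps can be illustrated with a simplified Python sketch. It is not the MALDIquant implementation: it uses a greedy pass over the pooled, sorted peaks with a relative tolerance of 0.002, and it assumes that a bin is kept when it occurs in at least 25% of the spectra of at least one group. The function names and the example data are hypothetical.

```python
def bin_peaks(spectra, tolerance=0.002):
    """Greedy binning sketch: pool all peaks, sort by m/z, and merge a peak into the
    current bin when it lies within the relative tolerance of the bin's mean m/z.
    Returns a list of (bin_center, set_of_spectrum_indices)."""
    pooled = sorted((mz, i) for i, peaks in enumerate(spectra) for mz in peaks)
    bins = []  # each entry: [list of member m/z values, set of spectrum indices]
    for mz, i in pooled:
        if bins:
            members, owners = bins[-1]
            center = sum(members) / len(members)
            if abs(mz - center) / center <= tolerance:
                members.append(mz)
                owners.add(i)
                continue
        bins.append([[mz], {i}])
    return [(sum(m) / len(m), owners) for m, owners in bins]


def filter_by_frequency(binned, groups, min_freq=0.25):
    """Keep bin centers observed in at least min_freq of the spectra of any one group."""
    kept = []
    for center, owners in binned:
        for g in set(groups):
            members = [i for i, lab in enumerate(groups) if lab == g]
            freq = sum(1 for i in members if i in owners) / len(members)
            if freq >= min_freq:
                kept.append(center)
                break
    return kept


# Hypothetical peak lists for five spectra from one group.
spectra = [
    [6939.0, 8986.5],
    [6940.2, 8985.0],
    [6938.1, 8987.2],
    [6939.5, 8986.0],
    [6939.9, 12000.0],
]
groups = ["control"] * 5

binned = bin_peaks(spectra)
kept = filter_by_frequency(binned, groups)
print([round(c, 1) for c in kept])  # the 12000 m/z bin (1 of 5 spectra) is dropped
```

Rows of the final intensity matrix would then correspond to spectra and columns to the retained bin centers.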
After log2 transformation, the peak intensity matrix was quantile-normalized with the R package limma. Missing values were filled with the minimum value across all samples. The patient and control sample data were randomly divided into a training set and a test set at a ratio of 2:1.
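A minimal Python sketch of the transformation, imputation and splitting steps is given below. It is an illustration only: the normalization performed with limma is omitted, missing intensities are represented by `None`, and the helper names are hypothetical. A plain random 2:1 split of 298 samples gives 199/99, which approximates but does not reproduce the 198/100 split reported in Example 3.

```python
import math
import random


def log2_and_impute(matrix):
    """Log2-transform the intensities (rows = samples, None = missing) and fill
    missing values with the minimum observed value over all samples."""
    logged = [[math.log2(v) if v is not None else None for v in row] for row in matrix]
    observed_min = min(v for row in logged for v in row if v is not None)
    return [[v if v is not None else observed_min for v in row] for row in logged]


def split_two_to_one(sample_ids, seed=0):
    """Randomly assign about two thirds of the samples to training and the rest to testing."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * 2 / 3)
    return ids[:n_train], ids[n_train:]


# Hypothetical 2 x 2 intensity matrix with one missing value.
filled = log2_and_impute([[4.0, None], [8.0, 16.0]])
print(filled)  # [[2.0, 2.0], [3.0, 4.0]]

train_ids, test_ids = split_two_to_one(range(298))
print(len(train_ids), len(test_ids))  # 199 99
```

In practice the split would usually be stratified by group so that each of the four groups contributes to both sets, as the per-group counts in Example 3 indicate.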
(IV) selection of characteristic proteins
After intensity normalization and missing-value imputation, the peaks of the training set were analyzed by the following three machine learning methods: the LASSO algorithm (LASSO), partial least squares discriminant analysis (PLS-DA) and recursive feature elimination with cross-validation (RFECV). LASSO, in full the least absolute shrinkage and selection operator, is a shrinkage estimator. It obtains a more parsimonious model by constructing a penalty function that shrinks the regression coefficients, i.e., it forces the sum of the absolute values of the coefficients to be smaller than a fixed value, while setting some regression coefficients to zero. It thus retains the advantage of subset shrinkage and is a biased estimator suited to data with complex collinearity.
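For reference, the constraint described above corresponds to the standard textbook form of the LASSO for linear regression, written out below. The symbols are not taken from the source: $y_i$ is the response of sample $i$, $x_{ij}$ the intensity of peak $j$ in sample $i$, $\beta_j$ the regression coefficients, and $t$ the fixed bound. For a two-class problem the squared error is commonly replaced by a logistic loss, which is an assumption here since the text does not specify the loss used.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
\quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t
```

As $t$ decreases, more coefficients $\beta_j$ are driven exactly to zero, which is what makes the method usable for selecting characteristic peaks.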
FIG. 2-1 shows the 20 peaks with the highest repetition frequency in LASSO, wherein the vertical axis is the mass-to-charge ratio of each selected characteristic peak. Partial least squares discriminant analysis (PLS-DA) is a multivariate statistical method for discriminant analysis. Discriminant analysis is a common statistical method that determines how a subject should be classified on the basis of observed or measured variable values. In principle, a model is trained on the characteristics of the different sample groups (e.g., observation samples and control samples) to produce a training set, and the reliability of the training set is then checked.
FIG. 2-2 shows the 20 peaks with the highest VIP importance in PLS-DA, wherein the vertical axis is the mass-to-charge ratio of each selected characteristic peak. RFECV refers to finding the optimal number of features through cross-validation. RFE (recursive feature elimination) ranks the importance of the features; CV (cross-validation) means that, after the features have been ranked, the best number of features is selected by cross-validation. FIG. 2-3 shows the 10 peaks with the highest cross-validation accuracy in RFECV, wherein the vertical axis is the mass-to-charge ratio of each selected characteristic peak.
After empirical inspection of the original spectra of the selected peaks, 25 peaks that passed quality control were selected as features. The intensity of each characteristic peak in the training set is shown in Fig. 3. Each row in the figure represents a characteristic peak, each column represents one spectrum, and the shading represents the peak intensity. The left columns are the negative control group and the right columns are the positive group. It can be seen that the peaks of the characteristic polypeptides 6939m/z, 13765m/z, 13886m/z, 6357m/z, 6654m/z, 14049m/z, 28232m/z, 13719m/z, 14095m/z, 14102m/z are generally more intense in the negative group than in the positive group, while the peaks of the characteristic polypeptides 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 7364m/z, 7614m/z, 28091m/z, 8034m/z, 8043m/z, 15123m/z, 15867m/z, 5893m/z, 5158m/z, 5366m/z are generally more intense in the positive group than in the negative group. The intensities of these peaks differed significantly between the COVID-19 and control groups.
(V) model algorithm
Eight machine learning methods were used to build models from the 25 characteristic peaks of the training set data, and the models were evaluated by cross-validation accuracy. The 8 machine learning methods were: logistic regression (LR), support vector machine (SVM), random forest (RF), naive Bayes (NB), gradient boosting decision tree (GBDT), K-nearest neighbors (KNN), decision tree (DT) and adaptive boosting (AdaBoost).
Fig. 4-1 and 4-2 show the model results for the training set and the test set, respectively, in the form of ROC curves. An ROC curve plots the true positive rate (sensitivity) on the ordinate against the false positive rate (1 - specificity) on the abscissa over a series of different cut-off values (decision thresholds). The area under the ROC curve (AUC) of each test is calculated for comparison; the test with the largest AUC has the best diagnostic value. In this study, the AUC was greater than 0.99 for all models on the training set, with an AUC of 1 for LR, SVM, RF, GBDT, DT and AdaBoost (Fig. 4-1). In the ROC curve analysis of the test set data, the AUC of all 8 models obtained by the 8 machine learning methods exceeded 0.92, with an AUC of 1 for the LR, SVM and NB models (Fig. 4-2). After evaluating the accuracy, recall, precision, F1, sensitivity and specificity of the 8 models, the LR model was found to have the best classification performance (AUC=1, sensitivity=98%, specificity=100%, accuracy=99%, precision=100%, recall=98%, F1=99%) and could be further applied to the detection of COVID-19.
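The construction of an ROC curve and its AUC, as described above, can be sketched in Python. The labels and scores below are hypothetical and are not the study data; the function names are illustrative. The curve is obtained by lowering the decision threshold step by step and recording the false positive rate and true positive rate, and the area is computed with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs obtained by sweeping the
    decision threshold from high to low. Labels: 1 = positive, 0 = negative."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples sharing the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical model scores for three positive and three negative samples.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]

example_auc = auc(roc_points(labels, scores))
print(f"AUC = {example_auc:.3f}")  # 8 of 9 positive/negative pairs are ranked correctly
```

An AUC of 1, as reported for several of the models, means that every positive sample receives a higher score than every negative sample.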
The confusion matrix of the LR model in the test set is shown in FIG. 5, wherein the vertical axis in the figure represents the real grouping situation of samples, the upper row represents the number of negative samples, and the lower row represents the number of positive samples; the horizontal axis represents the model prediction result, the left column represents the number of samples judged negative by the model, and the right column represents the number of samples judged positive by the model. Among the 51 negative samples, all the negative samples are judged to be negative, and the judgment accuracy (namely model specificity) of the negative samples is 100%; of the 49 positive samples, 1 was misjudged as negative, 48 were judged as positive, and the positive sample judgment accuracy (i.e., model sensitivity) was 98.0%.
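The performance figures quoted for the LR model follow directly from the confusion matrix counts given above (51 true negatives, 0 false positives, 1 false negative, 48 true positives). The short Python sketch below recomputes them; the function name is illustrative.

```python
def classification_metrics(tn, fp, fn, tp):
    """Standard binary classification metrics computed from confusion matrix counts."""
    sensitivity = tp / (tp + fn)            # also called recall
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Counts of the LR model on the test set, as stated in the text.
metrics = classification_metrics(tn=51, fp=0, fn=1, tp=48)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Rounded to whole percentages, these values agree with the reported sensitivity of 98%, specificity of 100%, accuracy of 99%, precision of 100% and F1 of 99%.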
TABLE 1 median of 25 characteristic polypeptides in training set in patient, healthy person
[Table 1 is presented as images (Figure SMS_1, Figure SMS_2) in the original document.]
The specific procedure for establishing a mass spectrometry model for rapid screening of patients with new coronavirus infection (COVID-19) is shown in FIG. 6. The procedure comprises the following steps: (1) enrolling new coronavirus infected patients and negative control subjects and collecting serum samples from each; (2) subjecting the serum samples to mass spectrometry pretreatment with the kit; (3) MALDI-TOF MS detection to obtain the spectra; (4) processing the spectra and obtaining a peak list; (5) bioinformatic analysis; (6) determining the mass spectrometry model.
EXAMPLE 3 construction of screening model for patients with New coronavirus infection
A total of 298 serum samples were collected: 146 from patients with confirmed new coronavirus infection, 46 from normal persons, 33 from tuberculosis patient controls, and 73 from controls with symptoms similar to those of new coronavirus infection (fever and cough). Of these, 198 were selected as training samples for model establishment: 97 from patients with new coronavirus infection, 34 from normal persons, 19 from tuberculosis patient controls, and 48 from controls with similar symptoms. All blood samples were drawn in the early morning from fasting subjects; the serum was separated, virus-inactivated, and stored in a -80 °C freezer.
The remaining 100 samples (49 patients with new coronavirus infection, 12 normal persons, 14 tuberculosis patients, and 25 persons with symptoms similar to those of new coronavirus infection) were used as validation samples for the blind test. They were processed in the same way as described above.
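The split between training and validation samples can be checked with simple arithmetic. The short sketch below uses only the group counts stated in this example; the dictionary layout and variable names are illustrative.

```python
# group: (total collected, used for training), as stated in Example 3
cohort = {
    "patient": (146, 97),
    "normal": (46, 34),
    "tuberculosis": (33, 19),
    "symptom-similar": (73, 48),
}

# Whatever is not used for training is left for the blind validation test.
validation = {group: total - train for group, (total, train) in cohort.items()}

n_total = sum(total for total, _ in cohort.values())
n_train = sum(train for _, train in cohort.values())
n_validation = sum(validation.values())

print(validation)
print(n_total, n_train, n_validation)
```

The per-group remainders match the validation set described above, and roughly two thirds of the samples (198 of 298) are used for training.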
A polypeptide mass spectrum model of new coronavirus infection was established from the characteristic serum polypeptide peaks of new coronavirus infected patients screened out in Examples 1-2. The model uses the following 25 characteristic peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z.
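To use these 25 peaks as model inputs, each measured peak list has to be mapped onto a fixed-length feature vector. The sketch below shows one plausible way to do this: each measured peak is assigned to the nearest characteristic m/z if it lies within a tolerance. The 500 ppm tolerance, the nearest-peak rule and the function name are assumptions for illustration; the source does not state how peaks were matched.

```python
FEATURE_MZ = [5158, 5366, 5893, 6357, 6654, 6939, 7364, 7614, 8034, 8043,
              8226, 8425, 8560, 8986, 9626, 13719, 13765, 13886, 14049,
              14095, 14102, 15123, 15867, 28091, 28232]


def feature_vector(peaks, tolerance_ppm=500.0):
    """Map a peak list [(m/z, intensity), ...] onto the 25 characteristic peaks.

    Each measured peak goes to the nearest characteristic m/z, so close
    neighbours such as 14095 and 14102 never share one measured peak.
    Characteristic peaks with no match keep an intensity of 0.
    """
    vector = [0.0] * len(FEATURE_MZ)
    for mz, intensity in peaks:
        idx = min(range(len(FEATURE_MZ)), key=lambda k: abs(FEATURE_MZ[k] - mz))
        window = FEATURE_MZ[idx] * tolerance_ppm / 1e6
        if abs(FEATURE_MZ[idx] - mz) <= window:
            vector[idx] = max(vector[idx], intensity)
    return vector


# Toy peak list (made-up values, NOT study data).
peaks = [(5158.2, 10.0), (6939.8, 120.0), (14096.0, 50.0),
         (14101.0, 35.0), (20000.0, 999.0)]
features = feature_vector(peaks)
```

The peak at 20000 m/z is far from every characteristic peak and is ignored, while the two peaks near 14100 m/z are assigned to 14095 and 14102 respectively.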
The characteristic mass spectrum peak spectrogram of the characteristic polypeptide is shown in figures 6-30.
The training set and validation set AUC for the LR model were both 1. The accuracy of the test set is 99%, the sensitivity is 98% and the specificity is 100%. The model has good prediction capability.
TABLE 2 Model training results
| Sample group | Number of cases | Predicted new coronavirus infection | Predicted non-new coronavirus infection | Prediction accuracy (%) |
| --- | --- | --- | --- | --- |
| Patient group | 97 | 97 | 0 | 100.00 |
| Normal group | 34 | 0 | 34 | 100.00 |
| Tuberculosis group | 19 | 0 | 19 | 100.00 |
| Symptom-similar group | 48 | 0 | 48 | 100.00 |
| Total | 198 | | | 100.00 |
Table 2 shows the results for the training set samples: 97 of the 97 patients were classified correctly (sensitivity 100.00%); 34 of the 34 normal persons, 19 of the 19 tuberculosis patients, and 48 of the 48 symptom-similar controls were classified correctly (specificity 100.00% in each group).
Example 4 identification of novel coronavirus infection characteristic Polypeptides
After the peaks to be identified were determined in Examples 2 and 3, 7 pretreated serum samples in which the peaks to be identified had different intensities were selected. The samples were reduced with DTT, and proteins with a molecular weight above 50 kDa were removed by ultrafiltration centrifugation. The small proteins/polypeptides in the filtrate were separated by tricine-SDS-PAGE. Each band was subjected to in-gel digestion and then identified by secondary mass spectrometry (MS/MS).
Polypeptide sequences were identified on a nano-LC-MS/MS platform consisting of a nanoflow HPLC system (Thermo Fisher Scientific, USA) and a Q-Exactive mass spectrometer (Thermo Fisher Scientific, USA). Data were acquired in positive ion mode over a scan range of 300-1400 m/z. The resolution was 70,000 for the primary (MS1) spectra and 17,500 for the secondary (MS2) spectra.
Liquid chromatography analytical column: model, Exsil Pure 120C18 (Dr. Maisch GmbH, USA); specification, 360 μm × 12 cm; inner diameter, 150 μm; particle size, 1.9 μm. Elution mode: linear gradient of mobile phase B (80% acetonitrile, 0.1% formic acid) from 7% B to 45% B. Flow rate: 600 nl/min. Total time: 38 minutes.
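The linear gradient can be expressed as a simple function of time. The sketch below assumes that the gradient from 7% B to 45% B spans the whole 38-minute run and that mobile phase A contains no acetonitrile; neither assumption is stated in the source, and the function names are hypothetical.

```python
def percent_b(t_min, start=7.0, end=45.0, duration_min=38.0):
    """Percentage of mobile phase B at time t for a linear gradient.

    Assumes the gradient occupies the full run (an assumption, see above).
    """
    if not 0.0 <= t_min <= duration_min:
        raise ValueError("time outside the gradient window")
    return start + (end - start) * t_min / duration_min


def acetonitrile_percent(t_min, acn_in_b=80.0):
    """Overall acetonitrile content, assuming phase A has no acetonitrile."""
    return percent_b(t_min) * acn_in_b / 100.0


midpoint_b = percent_b(19.0)
midpoint_acn = acetonitrile_percent(19.0)
```

Under these assumptions the proportion of B rises by 1 percentage point per minute, so the run midpoint corresponds to 26% B.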
The results of the identification are shown in tables 3 and 4.
TABLE 3 characterization of characteristic peak Polypeptides
Figure SMS_3
TABLE 4 polypeptide identification sequences
Figure SMS_4
Figure SMS_5
EXAMPLE 5 Blind screening test of screening model for patients with New coronavirus infection
After model training, one model using the 25 characteristic polypeptide peaks as input variables was established, and a second model using the 15 sequenced characteristic polypeptides as input variables was established.
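The idea of fitting one logistic regression (LR) model on all 25 peaks and a second one on the 15 sequenced peaks can be sketched as follows. This is a toy illustration under stated assumptions: the data are synthetic random numbers, not study data; the fitting procedure (plain batch gradient descent without regularisation) is an assumption, since the source does not describe how the LR model was fitted; and all function names are hypothetical. The 15 sequenced peaks are those listed with SEQ ID Nos. 1-15 in the claims.

```python
import math
import random

# The 25 characteristic peaks (m/z) and the 15 peaks that were sequenced.
ALL_PEAKS = [5158, 5366, 5893, 6357, 6654, 6939, 7364, 7614, 8034, 8043,
             8226, 8425, 8560, 8986, 9626, 13719, 13765, 13886, 14049,
             14095, 14102, 15123, 15867, 28091, 28232]
SEQUENCED = [6939, 7614, 8034, 8226, 8986, 9626, 13719, 13765, 13886,
             14049, 14095, 14102, 15123, 15867, 28091]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(X, y, rate=0.1, epochs=300):
    """Fit logistic regression by batch gradient descent (no regularisation)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errors = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t
                  for x, t in zip(X, y)]
        w = [wj - rate * sum(e * x[j] for e, x in zip(errors, X)) / n
             for j, wj in enumerate(w)]
        b -= rate * sum(errors) / n
    return w, b


def predict(model, x):
    w, b = model
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def select(X, peaks):
    """Keep only the columns that correspond to the given peaks."""
    cols = [ALL_PEAKS.index(p) for p in peaks]
    return [[row[c] for c in cols] for row in X]


# Synthetic, made-up intensities (NOT study data): every feature is shifted
# by +1 for label 1 and by -1 for label 0, plus unit Gaussian noise.
random.seed(0)
y = [i % 2 for i in range(40)]
X = [[random.gauss(1.0 if t else -1.0, 1.0) for _ in ALL_PEAKS] for t in y]

model_25 = train_lr(X, y)          # all 25 input variables
X15 = select(X, SEQUENCED)
model_15 = train_lr(X15, y)        # 15 sequenced input variables only

acc_25 = sum(predict(model_25, x) == t for x, t in zip(X, y)) / len(y)
acc_15 = sum(predict(model_15, x) == t for x, t in zip(X15, y)) / len(y)
```

In the synthetic data every feature moves in the same direction with the label, which is a simplification: in the real spectra some peaks are up-regulated and others down-regulated in infected patients, and the fitted weights would take the corresponding signs.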
According to the method of Example 3, samples from 49 patients with new coronavirus infection, 12 normal persons, 14 tuberculosis patients, and 25 symptom-similar controls were predicted blind with the two models above to determine the class of each sample; the procedure was the same as described in the preceding examples. The results are shown in Table 5-1 and Table 5-2, respectively.
TABLE 5-1 prediction of test samples by 25 variables
Figure SMS_6
Figure SMS_7
Table 5-1 shows the results for the test set samples: 12 of the 12 normal persons were classified correctly (specificity 100.00%); 48 of the 49 patients were classified correctly (sensitivity 97.96%); 14 of the 14 tuberculosis patients and 25 of the 25 symptom-similar controls were classified correctly (specificity 100.00% in each group).
TABLE 5-2 Prediction of test samples by 15 variables
| Sample group | Number of cases | Predicted new coronavirus infection | Predicted non-new coronavirus infection | Prediction accuracy (%) |
| --- | --- | --- | --- | --- |
| Patient group | 49 | 46 | 3 | 93.88 |
| Normal group | 12 | 1 | 11 | 91.67 |
| Tuberculosis group | 14 | 0 | 14 | 100.00 |
| Symptom-similar group | 25 | 1 | 24 | 96.00 |
| Total | 100 | | | 95.00 |
Table 5-2 shows the results for the test set samples: 46 of the 49 patients with new coronavirus infection were classified correctly (sensitivity 93.88%); 11 of the 12 normal persons were classified correctly (specificity 91.67%); 14 of the 14 tuberculosis patients were classified correctly (specificity 100.00%); and 24 of the 25 symptom-similar controls were classified correctly (specificity 96.00%). The model using the 15 characteristic polypeptides as input variables therefore has the same specificity for tuberculosis patients as the model using all 25 variables, with only a few misclassifications in the other three groups. The model meets the clinical requirement for rapid screening and diagnosis of patients.
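The per-group figures in Table 5-2 can be pooled into overall sensitivity, specificity and accuracy. The sketch below does this from the counts in the table; pooling the three control groups into a single specificity is an illustrative choice, not a figure reported in the source.

```python
# group: (number of cases, predicted infected, predicted not infected)
table_5_2 = {
    "patient": (49, 46, 3),
    "normal": (12, 1, 11),
    "tuberculosis": (14, 0, 14),
    "symptom-similar": (25, 1, 24),
}

tp, fn = table_5_2["patient"][1], table_5_2["patient"][2]
controls = [row for group, row in table_5_2.items() if group != "patient"]
fp = sum(pred_pos for _, pred_pos, _ in controls)
tn = sum(pred_neg for _, _, pred_neg in controls)

sensitivity = tp / (tp + fn)
pooled_specificity = tn / (tn + fp)
accuracy = (tp + tn) / sum(n for n, _, _ in table_5_2.values())

# Per-group accuracy: correct calls are "infected" for patients and
# "not infected" for every control group.
group_accuracy = {
    group: (pos if group == "patient" else neg) / n
    for group, (n, pos, neg) in table_5_2.items()
}
```

With the table's counts this gives a sensitivity of 46/49, a pooled specificity of 49/51 and an overall accuracy of 95/100, consistent with the 95.00% total in the table.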
In addition, Table 5-1 shows that the blind-test accuracy of the model using the 25 characteristic polypeptides for the new coronavirus infection group (97.96%) is essentially the same as in model training (100.00%), while the prediction accuracy for the non-new coronavirus infection groups reaches 100%. This indicates that, after model training and fine optimization, false positive results were completely excluded in this sample set, so that a positive result is reliable and missed diagnosis and/or misdiagnosis are avoided as far as possible, which is of positive significance.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the technical principle of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention.
Sequence listing
<110> Beijing Clin Bochuang Biotechnology Co., Ltd.
<120> method for constructing Mass Spectrometry model for diagnosing New coronavirus infection
<150> 202011107819X
<151> 2020-10-16
<160> 15
<170> SIPOSequenceListing 1.0
<210> 1
<211> 61
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 1
Thr Ile Thr Leu Glu Val Glu Pro Ser Asp Thr Ile Glu Asn Val Lys
1 5 10 15
Ala Lys Ile Gln Asp Lys Glu Gly Ile Pro Pro Asp Gln Gln Arg Leu
20 25 30
Ile Phe Ala Gly Lys Gln Leu Glu Asp Gly Arg Thr Leu Ser Asp Tyr
35 40 45
Asn Ile Gln Lys Glu Ser Thr Leu His Leu Val Leu Arg
50 55 60
<210> 2
<211> 68
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 2
Glu Glu Asp Gly Asp Leu Gln Cys Leu Cys Val Lys Thr Thr Ser Gln
1 5 10 15
Val Arg Pro Arg His Ile Thr Ser Leu Glu Val Ile Lys Ala Gly Pro
20 25 30
His Cys Pro Thr Ala Gln Leu Ile Ala Thr Leu Lys Asn Gly Arg Lys
35 40 45
Ile Cys Leu Asp Leu Gln Ala Leu Leu Tyr Lys Lys Ile Ile Lys Glu
50 55 60
His Leu Glu Ser
65
<210> 3
<211> 74
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 3
Met Glu Ala Pro Ala Gln Leu Leu Phe Leu Leu Leu Leu Trp Leu Pro
1 5 10 15
Asp Thr Thr Gly Glu Ile Val Met Thr Gln Ser Pro Ala Thr Leu Ser
20 25 30
Val Ser Pro Gly Glu Arg Ala Thr Leu Ser Cys Arg Ala Ser Gln Ser
35 40 45
Val Ser Ser Asn Leu Ala Trp Tyr Gln Gln Lys Pro Gly Gln Ala Pro
50 55 60
Arg Leu Leu Ile Tyr Gly Ala Ser Thr Arg
65 70
<210> 4
<211> 69
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 4
Met Lys Leu Leu His Val Phe Leu Leu Phe Leu Cys Phe His Leu Arg
1 5 10 15
Phe Cys Lys Val Thr Tyr Thr Ser Gln Glu Asp Leu Val Glu Lys Lys
20 25 30
Cys Leu Ala Lys Lys Tyr Thr His Leu Ser Cys Asp Lys Val Phe Cys
35 40 45
Gln Pro Trp Gln Arg Cys Ile Glu Gly Thr Cys Val Cys Lys Leu Pro
50 55 60
Tyr Gln Cys Pro Lys
65
<210> 5
<211> 79
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 5
Met Thr Ser Arg Lys Lys Val Leu Leu Lys Val Ile Ile Leu Gly Asp
1 5 10 15
Ser Gly Val Gly Lys Thr Ser Leu Met Asn Gln Tyr Val Asn Lys Lys
20 25 30
Phe Ser Asn Gln Tyr Lys Ala Thr Ile Gly Ala Asp Phe Leu Thr Lys
35 40 45
Glu Val Met Val Asp Asp Arg Leu Val Thr Met Gln Ile Trp Asp Thr
50 55 60
Ala Gly Gln Glu Arg Phe Gln Ser Leu Gly Val Ala Phe Tyr Arg
65 70 75
<210> 6
<211> 91
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 6
Met Thr Leu Gly Arg Arg Leu Ala Cys Leu Phe Leu Ala Cys Val Leu
1 5 10 15
Pro Ala Leu Leu Leu Gly Gly Thr Ala Leu Ala Ser Glu Ile Val Gly
20 25 30
Gly Arg Arg Ala Arg Pro His Ala Trp Pro Phe Met Val Ser Leu Gln
35 40 45
Leu Arg Gly Gly His Phe Cys Gly Ala Thr Leu Ile Ala Pro Asn Phe
50 55 60
Val Met Ser Ala Ala His Cys Val Ala Asn Val Asn Val Arg Ala Val
65 70 75 80
Arg Val Val Leu Gly Ala His Asn Leu Ser Arg
85 90
<210> 7
<211> 119
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 7
Met Ser Arg Ser Val Ala Leu Ala Val Leu Ala Leu Leu Ser Leu Ser
1 5 10 15
Gly Leu Glu Ala Ile Gln Arg Thr Pro Lys Ile Gln Val Tyr Ser Arg
20 25 30
His Pro Ala Glu Asn Gly Lys Ser Asn Phe Leu Asn Cys Tyr Val Ser
35 40 45
Gly Phe His Pro Ser Asp Ile Glu Val Asp Leu Leu Lys Asn Gly Glu
50 55 60
Arg Ile Glu Lys Val Glu His Ser Asp Leu Ser Phe Ser Lys Asp Trp
65 70 75 80
Ser Phe Tyr Leu Leu Tyr Tyr Thr Glu Phe Thr Pro Thr Glu Lys Asp
85 90 95
Glu Tyr Ala Cys Arg Val Asn His Val Thr Leu Ser Gln Pro Lys Ile
100 105 110
Val Lys Trp Asp Arg Asp Met
115
<210> 8
<211> 127
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 8
Gly Pro Thr Gly Thr Gly Glu Ser Lys Cys Pro Leu Met Val Lys Val
1 5 10 15
Leu Asp Ala Val Arg Gly Ser Pro Ala Ile Asn Val Ala Val His Val
20 25 30
Phe Arg Lys Ala Ala Asp Asp Thr Trp Glu Pro Phe Ala Ser Gly Lys
35 40 45
Thr Ser Glu Ser Gly Glu Leu His Gly Leu Thr Thr Glu Glu Glu Phe
50 55 60
Val Glu Gly Ile Tyr Lys Val Glu Ile Asp Thr Lys Ser Tyr Trp Lys
65 70 75 80
Ala Leu Gly Ile Ser Pro Phe His Glu His Ala Glu Val Val Phe Thr
85 90 95
Ala Asn Asp Ser Gly Pro Arg Arg Tyr Thr Ile Ala Ala Leu Leu Ser
100 105 110
Pro Tyr Ser Tyr Ser Thr Thr Ala Val Val Thr Asn Pro Lys Glu
115 120 125
<210> 9
<211> 128
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 9
Met Ser Leu Arg Leu Asp Thr Thr Pro Ser Cys Asn Ser Ala Arg Pro
1 5 10 15
Leu His Ala Leu Gln Val Leu Leu Leu Leu Ser Leu Leu Leu Thr Ala
20 25 30
Leu Ala Ser Ser Thr Lys Gly Gln Thr Lys Arg Asn Leu Ala Lys Gly
35 40 45
Lys Glu Glu Ser Leu Asp Ser Asp Leu Tyr Ala Glu Leu Arg Cys Met
50 55 60
Cys Ile Lys Thr Thr Ser Gly Ile His Pro Lys Asn Ile Gln Ser Leu
65 70 75 80
Glu Val Ile Gly Lys Gly Thr His Cys Asn Gln Val Glu Val Ile Ala
85 90 95
Thr Leu Lys Asp Gly Arg Lys Ile Cys Leu Asp Pro Asp Ala Pro Arg
100 105 110
Ile Lys Lys Ile Val Gln Lys Lys Leu Ala Gly Asp Glu Ser Ala Asp
115 120 125
<210> 10
<211> 123
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 10
Val Pro Leu Ala Asp Met Pro His Ala Pro Ile Gly Leu Tyr Phe Asp
1 5 10 15
Thr Val Ala Asp Lys Ile His Ser Val Ser Arg Lys His Gly Ala Thr
20 25 30
Leu Val His Cys Ala Ala Gly Val Ser Arg Ser Ala Thr Leu Cys Ile
35 40 45
Ala Tyr Leu Met Lys Phe His Asn Val Cys Leu Leu Glu Ala Tyr Asn
50 55 60
Trp Val Lys Ala Arg Arg Pro Val Ile Arg Pro Asn Val Gly Phe Trp
65 70 75 80
Arg Gln Leu Ile Asp Tyr Glu Arg Gln Leu Phe Gly Lys Ser Thr Val
85 90 95
Lys Met Val Gln Thr Pro Tyr Gly Ile Val Pro Asp Val Tyr Glu Lys
100 105 110
Glu Ser Arg His Leu Met Pro Tyr Trp Gly Ile
115 120
<210> 11
<211> 130
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 11
Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys
1 5 10 15
Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His
20 25 30
Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala
35 40 45
Pro Val Tyr Met Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu
50 55 60
Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile
65 70 75 80
Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys
85 90 95
Leu Leu Gly Lys Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile
100 105 110
Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys
115 120 125
Gly Lys
130
<210> 12
<211> 130
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 12
Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys
1 5 10 15
Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His
20 25 30
Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala
35 40 45
Pro Val Tyr Leu Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu
50 55 60
Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile
65 70 75 80
Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys
85 90 95
Leu Leu Gly Arg Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile
100 105 110
Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys
115 120 125
Gly Lys
130
<210> 13
<211> 141
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 13
Val Leu Ser Pro Ala Asp Lys Thr Asn Val Lys Ala Ala Trp Gly Lys
1 5 10 15
Val Gly Ala His Ala Gly Glu Tyr Gly Ala Glu Ala Leu Glu Arg Met
20 25 30
Phe Leu Ser Phe Pro Thr Thr Lys Thr Tyr Phe Pro His Phe Asp Leu
35 40 45
Ser His Gly Ser Ala Gln Val Lys Gly His Gly Lys Lys Val Ala Asp
50 55 60
Ala Leu Thr Asn Ala Val Ala His Val Asp Asp Met Pro Asn Ala Leu
65 70 75 80
Ser Ala Leu Ser Asp Leu His Ala His Lys Leu Arg Val Asp Pro Val
85 90 95
Asn Phe Lys Leu Leu Ser His Cys Leu Leu Val Thr Leu Ala Ala His
100 105 110
Leu Pro Ala Glu Phe Thr Pro Ala Val His Ala Ser Leu Asp Lys Phe
115 120 125
Leu Ala Ser Val Ser Thr Val Leu Thr Ser Lys Tyr Arg
130 135 140
<210> 14
<211> 146
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 14
Val His Leu Thr Pro Glu Glu Lys Ser Ala Val Thr Ala Leu Trp Gly
1 5 10 15
Lys Val Asn Val Asp Glu Val Gly Gly Glu Ala Leu Gly Arg Leu Leu
20 25 30
Val Val Tyr Pro Trp Thr Gln Arg Phe Phe Glu Ser Phe Gly Asp Leu
35 40 45
Ser Thr Pro Asp Ala Val Met Gly Asn Pro Lys Val Lys Ala His Gly
50 55 60
Lys Lys Val Leu Gly Ala Phe Ser Asp Gly Leu Ala His Leu Asp Asn
65 70 75 80
Leu Lys Gly Thr Phe Ala Thr Leu Ser Glu Leu His Cys Asp Lys Leu
85 90 95
His Val Asp Pro Glu Asn Phe Arg Leu Leu Gly Asn Val Leu Val Cys
100 105 110
Val Leu Ala His His Phe Gly Lys Glu Phe Thr Pro Pro Val Gln Ala
115 120 125
Ala Tyr Gln Lys Val Val Ala Gly Val Ala Asn Ala Leu Ala His Lys
130 135 140
Tyr His
145
<210> 15
<211> 257
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 15
Ile Leu Leu Tyr Ser Leu Asp Gly Arg Leu Leu Ser Thr Tyr Ser Ala
1 5 10 15
Tyr Glu Trp Ser Leu Gly Ile Lys Ser Val Ala Trp Ser Pro Ser Ser
20 25 30
Gln Phe Leu Ala Val Gly Ser Tyr Asp Gly Lys Val Arg Ile Leu Asn
35 40 45
His Val Thr Trp Lys Met Ile Thr Glu Phe Gly His Pro Ala Ala Ile
50 55 60
Asn Asp Pro Lys Ile Val Val Tyr Lys Glu Ala Glu Lys Ser Pro Gln
65 70 75 80
Leu Gly Leu Gly Cys Leu Ser Phe Pro Pro Pro Arg Ala Gly Ala Gly
85 90 95
Pro Leu Pro Ser Ser Glu Ser Lys Tyr Glu Ile Ala Ser Val Pro Val
100 105 110
Ser Leu Gln Thr Leu Lys Pro Val Thr Asp Arg Ala Asn Pro Lys Ile
115 120 125
Gly Ile Gly Met Leu Ala Phe Ser Pro Asp Ser Tyr Phe Leu Ala Thr
130 135 140
Arg Asn Asp Asn Ile Pro Asn Ala Val Trp Val Trp Asp Ile Gln Lys
145 150 155 160
Leu Arg Leu Phe Ala Val Leu Glu Gln Leu Ser Pro Val Arg Ala Phe
165 170 175
Gln Trp Asp Pro Gln Gln Pro Arg Leu Ala Ile Cys Thr Gly Gly Ser
180 185 190
Arg Leu Tyr Leu Trp Ser Pro Ala Gly Cys Met Ser Val Gln Val Pro
195 200 205
Gly Glu Gly Asp Phe Ala Val Leu Ser Leu Cys Trp His Leu Ser Gly
210 215 220
Asp Ser Met Ala Leu Leu Ser Lys Asp His Phe Cys Leu Cys Phe Leu
225 230 235 240
Glu Thr Glu Ala Val Val Gly Thr Ala Cys Arg Gln Leu Gly Gly His
245 250 255
Thr

Claims (7)

1. A method of constructing a mass spectrometry model for diagnosing a new coronavirus infection, the steps comprising:
1) Collecting serum samples of a plurality of clinically definite patients with new coronavirus infection and serum samples of tuberculosis patients, fever and cough symptom similar patients and healthy people which are used as non-new coronavirus infection control patients, and performing low-temperature refrigeration for later use;
2) Pretreatment before mass spectrometry detection is carried out on serum proteins of a serum sample;
3) Carrying out mass spectrometry detection and reading on the four groups of preprocessed serum proteins to obtain fingerprint patterns of four groups of serum polypeptides;
4) Carrying out standardized treatment on the fingerprint of serum polypeptide of serum samples of all diagnosed new coronavirus infected persons, tuberculosis patients of non-new coronavirus infected control persons, fever and cough symptom similar patients and healthy people, and collecting data;
5) Performing quality control treatment on the obtained data, and screening 25 characteristic polypeptides with the following mass-to-charge ratio peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z, and establishing a mass spectral model for detecting a new coronavirus infection from these mass-to-charge ratio peaks.
2. A method of constructing a mass spectrometry model for diagnosing a new coronavirus infection, the steps comprising:
1) Collecting serum samples of a plurality of clinically definite patients with new coronavirus infection and serum samples of tuberculosis patients, fever and cough symptom similar patients and healthy people which are used as non-new coronavirus infection control patients, and performing low-temperature refrigeration for later use;
2) Carrying out pretreatment before mass spectrometry on serum proteins of a serum sample;
3) Carrying out mass spectrometry detection and reading on the four groups of preprocessed serum proteins to obtain fingerprint patterns of four groups of serum polypeptides;
4) Carrying out standardized treatment on the fingerprint of serum polypeptide of serum samples of all diagnosed new coronavirus infected persons, tuberculosis patients of non-new coronavirus infected control persons, fever and cough symptom similar patients and healthy people, and collecting data;
5) Performing quality control treatment on the obtained data, screening out 15 characteristic polypeptides, performing secondary mass spectrum identification on the characteristic polypeptides, and establishing a mass spectrum model for detecting new coronavirus infection according to the mass-to-charge ratio peaks;
wherein the 15 characteristic polypeptides are respectively:
a characteristic polypeptide with a mass-to-charge ratio of 6939m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 1;
a characteristic polypeptide with a mass-to-charge ratio of 7614m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 2;
a characteristic polypeptide with a mass-to-charge ratio of 8034m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 3;
a characteristic polypeptide with a mass-to-charge ratio of 8226m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 4;
a characteristic polypeptide with a mass-to-charge ratio of 8986m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 5;
a characteristic polypeptide with a mass-to-charge ratio of 9626m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 6;
a characteristic polypeptide with a mass-to-charge ratio of 13719m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 7;
a characteristic polypeptide with a mass-to-charge ratio of 13765m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 8;
a characteristic polypeptide with a mass-to-charge ratio of 13886m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 9;
a characteristic polypeptide with a mass-to-charge ratio of 14049m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 10;
a characteristic polypeptide with a mass-to-charge ratio of 14095m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 11;
a characteristic polypeptide with a mass-to-charge ratio of 14102m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 12;
a characteristic polypeptide with a mass-to-charge ratio of 15123m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 13;
a characteristic polypeptide with a mass-to-charge ratio of 15867m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 14;
a characteristic polypeptide with a mass-to-charge ratio of 28091m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 15.
3. The construction method according to claim 2, wherein, when the peaks of the characteristic polypeptides at 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z and 28091m/z in a serum sample subjected to mass spectrometry are up-regulated while the peaks of the characteristic polypeptides at 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z are down-regulated, the serum sample is indicated as a positive sample, i.e. the provider of the serum sample is a new coronavirus infected patient, with a ten-fold cross-validation accuracy of about 93.88%.
4. The construction method according to any one of claims 1 to 3, wherein the pretreatment of step 2) comprises diluting the serum proteins or polypeptides in the stabilized sample with a sample treatment solution.
5. The method according to claim 4, wherein the step 3) uses a general pretreatment kit for polypeptide mass spectrometry to dilute and read four groups of serum proteins, and obtain fingerprint spectra of four groups of serum polypeptides.
6. The construction method according to claim 5, wherein the quality control process of step 5) comprises detecting a blank matrix crystal spot with the same mass spectrometry parameters, and the matrix solution is considered unqualified if a distinct mass spectrum peak appears.
7. The construction method according to claim 6, wherein the quality control process of step 5) selects the following 8 characteristic peaks as quality control peaks: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z.
CN202110156544.7A 2020-10-16 2021-02-04 Method for constructing mass spectrum model for diagnosing new coronavirus infection Active CN112748173B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011107819X 2020-10-16
CN202011107819 2020-10-16

Publications (2)

Publication Number Publication Date
CN112748173A CN112748173A (en) 2021-05-04
CN112748173B true CN112748173B (en) 2023-06-20

Family

ID=75653649

Family Applications (5)

Application Number Title Priority Date Filing Date
CN202110158952.6A Active CN112903802B (en) 2020-10-16 2021-02-04 Method for constructing mass spectrum model for diagnosing new coronavirus infection
CN202110156544.7A Active CN112748173B (en) 2020-10-16 2021-02-04 Method for constructing mass spectrum model for diagnosing new coronavirus infection
CN202110155258.9A Active CN112798679B (en) 2020-10-16 2021-02-04 Kit for diagnosing novel coronavirus infection
CN202110155492.1A Active CN112858454B (en) 2020-10-16 2021-02-04 Characteristic polypeptide composition for diagnosing new coronary pneumonia
CN202110154054.3A Active CN112946053B (en) 2020-10-16 2021-02-04 Characteristic polypeptide composition for preparing detection product for diagnosing new coronavirus infection

Country Status (1)

Country Link
CN (5) CN112903802B (en)

Also Published As

Publication number Publication date
CN112946053A (en) 2021-06-11
CN112858454B (en) 2022-09-30
CN112903802A (en) 2021-06-04
CN112798679A (en) 2021-05-14
CN112858454A (en) 2021-05-28
CN112798679B (en) 2023-06-20
CN112748173A (en) 2021-05-04
CN112903802B (en) 2023-06-27
CN112946053B (en) 2023-06-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant