CN114858903A - Characteristic polypeptide composition for diagnosing neocoronary pneumonia - Google Patents

Characteristic polypeptide composition for diagnosing neocoronary pneumonia

Info

Publication number
CN114858903A
Authority
CN
China
Prior art keywords
polypeptide
mass
ala
characteristic
leu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110154026.1A
Other languages
Chinese (zh)
Inventor
廖璞
孙巍
乔亮
吕倩
马庆伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Clin Bochuang Biotechnology Co Ltd
Original Assignee
Beijing Clin Bochuang Biotechnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Clin Bochuang Biotechnology Co Ltd filed Critical Beijing Clin Bochuang Biotechnology Co Ltd
Priority to CN202110154026.1A priority Critical patent/CN114858903A/en
Priority to PCT/CN2021/142821 priority patent/WO2022166486A1/en
Publication of CN114858903A publication Critical patent/CN114858903A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G01N27/626Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using heat to ionise a gas
    • G01N27/628Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using heat to ionise a gas and a beam of energy, e.g. laser enhanced ionisation
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G01N27/626Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using heat to ionise a gas
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G01N27/64Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using wave or particle radiation to ionise a gas, e.g. in an ionisation chamber
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/62Detectors specially adapted therefor
    • G01N30/72Mass spectrometers
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/88Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848Methods of protein analysis involving mass spectrometry
    • G01N33/6851Methods of protein analysis involving laser desorption ionisation mass spectrometry
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/72Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving blood pigments, e.g. haemoglobin, bilirubin or other porphyrins; involving occult blood
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/88Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86
    • G01N2030/8809Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample
    • G01N2030/8813Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample biological materials
    • G01N2030/8822Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample biological materials involving blood
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N30/00Investigating or analysing materials by separation into components using adsorption, absorption or similar phenomena or using ion-exchange, e.g. chromatography or field flow fractionation
    • G01N30/02Column chromatography
    • G01N30/88Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86
    • G01N2030/8809Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample
    • G01N2030/8813Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample biological materials
    • G01N2030/8831Integrated analysis systems specially adapted therefor, not covered by a single one of the groups G01N30/04 - G01N30/86 analysis specially adapted for the sample biological materials involving peptides or proteins
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2333/00Assays involving biological materials from specific organisms or of a specific nature
    • G01N2333/005Assays involving biological materials from specific organisms or of a specific nature from viruses
    • G01N2333/08RNA viruses
    • G01N2333/165Coronaviridae, e.g. avian infectious bronchitis virus

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Immunology (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Pathology (AREA)
  • Engineering & Computer Science (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Hematology (AREA)
  • Biomedical Technology (AREA)
  • Urology & Nephrology (AREA)
  • Food Science & Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Electrochemistry (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biotechnology (AREA)
  • Cell Biology (AREA)
  • Medicinal Chemistry (AREA)
  • Microbiology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Optics & Photonics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biophysics (AREA)
  • Toxicology (AREA)
  • Peptides Or Proteins (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention provides a characteristic polypeptide composition for detecting novel coronavirus pneumonia (COVID-19). The composition comprises 29 characteristic polypeptides with specific mass-to-charge ratios, and whether a sample comes from a COVID-19 patient can be judged by analyzing the expression of these characteristic polypeptides. The invention also provides a mass spectrum model prepared from the characteristic polypeptide composition, a product for diagnosing COVID-19, and related applications. The invention is the first to propose searching for combinations of differentially expressed characteristic proteins by comparing COVID-19 patients with healthy persons, tuberculosis patients, and controls with COVID-19-like symptoms. This breaks with the traditional approach of searching for characteristic polypeptides only between healthy persons and COVID-19 patients, and effectively avoids false-positive results caused by conditions whose symptoms resemble COVID-19. The method is simple to operate, inexpensive, and highly accurate, and is expected to be usable for large-scale COVID-19 screening.

Description

Characteristic polypeptide composition for diagnosing novel coronavirus pneumonia
Technical Field
The invention belongs to the field of detection and relates to a technique for rapidly detecting novel coronavirus pneumonia (COVID-19) by time-of-flight mass spectrometry.
Background
Coronaviruses are a group of pathogens that mainly cause respiratory and intestinal diseases. The surface of the virus particle carries many regularly arranged protrusions, so the whole particle resembles an emperor's crown, hence the name "coronavirus". Besides humans, coronaviruses can infect various mammals such as pigs, cows, cats, dogs, minks, camels, bats, mice, and hedgehogs, as well as various birds. The novel coronavirus that causes COVID-19 is a strain never before found in humans. Its transmission pattern, infection mechanism, evolution, and mutation behavior are still unclear, which makes prevention and treatment difficult.
To prevent the occurrence and spread of novel coronavirus pneumonia (COVID-19), to take measures quickly, and to control the development and spread of the epidemic effectively, rapid detection of the disease is particularly important. For a long time, coronaviruses have been identified by traditional microbiological methods, namely morphological, physiological, and biochemical characterization together with serological identification. Although these methods are accurate, they take more than ten hours even at their fastest and cannot meet the need for rapid detection. Nucleic acid detection based on multiplex PCR is important for the early diagnosis of coronavirus infection and for tracing the source of infection. Because multiplex PCR targets several genes, its false-negative rate is lower than that of single-target PCR. However, PCR detection is procedurally complicated, relatively costly, and limited in throughput.
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is a mass spectrometry technique that was introduced in the late 1980s and has developed rapidly since. Its mass analyzer is an ion drift tube. Ions generated by the ion source are first collected so that all ions in the collector have zero velocity; they are then accelerated by a pulsed electric field, enter the field-free drift tube, and fly at constant velocity to the ion detector. The larger the mass of an ion, the longer it takes to reach the detector; the smaller the mass, the shorter the time. On this principle, ions can be separated according to their mass-to-charge ratio, and the molecular mass and purity of biomacromolecules such as polypeptides, proteins, nucleic acids, and polysaccharides can be measured accurately. The technique offers high accuracy, great flexibility, high throughput, short turnaround, and good cost-effectiveness.
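The time-of-flight principle described above can be illustrated with a short calculation. The following is a minimal sketch of an idealized linear TOF analyzer, in which an ion accelerated through a potential U gains kinetic energy zeU and then drifts a length L at constant velocity. The function name, the accelerating voltage (20 kV), and the drift length (1 m) are illustrative assumptions and are not taken from the patent.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, in coulombs
DALTON = 1.66053906660e-27   # unified atomic mass unit, in kilograms


def flight_time(mz, accel_voltage=20_000.0, drift_length=1.0):
    """Ideal flight time in seconds for an ion of the given m/z (Da per charge).

    accel_voltage (volts) and drift_length (metres) are hypothetical
    instrument settings chosen only for illustration.
    """
    # Energy balance: z*e*U = (1/2)*m*v^2, so v = sqrt(2*e*U / (m/z)).
    velocity = math.sqrt(2.0 * E_CHARGE * accel_voltage / (mz * DALTON))
    return drift_length / velocity


# Heavier ions arrive later: flight time grows with the square root of m/z.
t_light = flight_time(5158)
t_heavy = flight_time(28091)
```

Because the flight time scales with the square root of m/z, an ion with four times the m/z of another takes twice as long to reach the detector under the same settings.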
In recent years, mass spectrometry techniques for detecting proteins or polypeptides characteristic of pathogenic microorganisms or viruses have emerged. For example, Chinese patent application CN102337223A, "Penicillium chrysogenum antifungal protein Pc-Arctin and a preparation method thereof", discloses a MALDI-TOF identification method for the Penicillium chrysogenum antifungal protein Pc-Arctin. Spores of Penicillium chrysogenum A096 are picked from a plate and inoculated into SGY liquid medium for culture; the crude protein solution obtained by pretreatment is separated and purified on a chromatographic column and then on a carboxymethyl cation-exchange column; each eluted fraction is collected and concentrated to the required volume by centrifugal ultrafiltration; the antifungal fractions are tracked using Paecilomyces variotii as the sensitive indicator organism; the purity of the protein in the active fractions is assessed; and a single band is cut from the SDS-PAGE gel for MALDI-TOF identification. The method applies only to a specific microorganism, requires multi-step protein purification before the characteristic polypeptide Pc-Arctin is finally identified by MALDI-TOF, is complicated and narrow in scope, and cannot be used to detect viruses by mass spectrometry.
Chinese patent applications 201110154723, "MALDI TOF MS assisted identification of Listeria monocytogenes", and 201110154469, "MALDI TOF MS assisted identification of Vibrio cholerae", disclose a method for assisted identification of bacteria by MALDI-TOF MS. The method comprises pretreating the bacterial culture, collecting MALDI-TOF MS spectra of all strain samples, building a standard bacterial spectrum with software, acquiring the spectrum of the bacterium under test in the same way, comparing the two spectra, and judging by the matching score. The method uses conventional sample treatment (absolute ethanol, formic acid, and acetonitrile, assisted by centrifugation, with the supernatant finally taken for detection). Although the resulting spectrum characterizes the bacterium to some extent, the analyte contains proteins, lipids, lipopolysaccharides, lipooligosaccharides, DNA, polypeptides, and other ionizable molecules, so the spectrum is essentially a superposition of the spectra of all these molecules. The amount of spectral information to be processed and compared is excessive, and the spectrum has low specificity because too many molecular species are detected. The method is therefore suitable only for a specific bacterium and cannot be extended to large-scale virus detection.
Chinese patent application 200880121570, entitled "Method and biomarker for diagnosing and monitoring psychiatric disorders", reports that nearly a hundred biological peptides related to psychiatric disorders, including influenza virus, can be detected by MALDI-TOF mass spectrometry. However, the application merely lists various possible techniques and reports neither specific protocols nor specific targets for coronaviruses, so it gives researchers little guidance for detecting influenza viruses by MALDI-TOF mass spectrometry.
Therefore, there is currently a need for a characteristic polypeptide mass spectrum model for detecting novel coronavirus pneumonia (COVID-19) by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), and for applications of such a model.
Disclosure of Invention
The first object of the present invention is to provide a composition of characteristic polypeptides, based on the serum peptidome, with which novel coronavirus pneumonia (COVID-19) can be detected by MALDI-TOF mass spectrometry, wherein the characteristic polypeptide composition comprises 25 characteristic polypeptides having the following mass-to-charge ratios: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z; or 29 characteristic polypeptides having the following mass-to-charge ratios: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z.
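For readers who want to handle the two panels as data, the lists above can be written down directly. The sketch below is an illustration only: the names PANEL_25, EXTRA_4, PANEL_29, and closest_pairs are hypothetical and do not come from the patent. It also checks which neighboring peaks lie close together, since those pairs place the highest demand on instrument resolution.

```python
# The 25-peak panel, m/z values as listed in the text.
PANEL_25 = (
    5158, 5366, 5893, 6357, 6654, 6939, 7364, 7614, 8034, 8043,
    8226, 8425, 8560, 8986, 9626, 13719, 13765, 13886, 14049, 14095,
    14102, 15123, 15867, 28091, 28232,
)

# The four additional peaks that extend the panel to 29 members.
EXTRA_4 = (11435, 11495, 11523, 11680)

PANEL_29 = tuple(sorted(PANEL_25 + EXTRA_4))


def closest_pairs(panel, min_gap):
    """Return adjacent panel peaks that are less than min_gap m/z units apart."""
    ordered = sorted(panel)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < min_gap]


# Pairs separated by fewer than 10 m/z units.
tight = closest_pairs(PANEL_29, 10)
```

With a gap limit of 10, the pairs returned are 8034/8043 and 14095/14102, so any peak-picking step would need to resolve peaks that are only 7 to 9 m/z units apart.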
In any of the above embodiments, when the peaks of characteristic polypeptides 8986m/z and 28091m/z are up-regulated and the peaks of characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, and 14102m/z are down-regulated, the serum sample is judged to be positive, i.e., the patient is judged to be a COVID-19 patient; the ten-fold cross-validation accuracy is about 91%. In a preferred embodiment, the characteristic polypeptide composition comprises only the characteristic polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, and 14102m/z.
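The six-peak rule above can be sketched as a simple check. This passage does not state how "up-regulated" and "down-regulated" are measured, so the sketch assumes, purely for illustration, that each peak intensity is compared with a reference intensity (for example, a control-group average) using a fold-change threshold. The function name, the dictionary layout, and the threshold of 1.5 are all hypothetical.

```python
UP_PEAKS = (8986, 28091)                  # expected to be up-regulated
DOWN_PEAKS = (6939, 13886, 14049, 14102)  # expected to be down-regulated


def is_positive(sample, reference, fold=1.5):
    """Apply the six-peak rule to one serum spectrum.

    sample and reference map m/z -> peak intensity. The fold-change
    threshold is an assumption, not a value given in the patent.
    """
    up_ok = all(sample[mz] >= fold * reference[mz] for mz in UP_PEAKS)
    down_ok = all(sample[mz] * fold <= reference[mz] for mz in DOWN_PEAKS)
    return up_ok and down_ok


# Toy intensities, invented for the example.
reference = {mz: 100.0 for mz in UP_PEAKS + DOWN_PEAKS}
patient_like = {8986: 210.0, 28091: 180.0,
                6939: 40.0, 13886: 55.0, 14049: 30.0, 14102: 60.0}
control_like = dict(reference)

result_patient = is_positive(patient_like, reference)
result_control = is_positive(control_like, reference)
```

In this toy data, the first spectrum satisfies all six conditions and the second satisfies none, so only the first would be flagged positive under the assumed threshold.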
In another embodiment, when the peaks of characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z are down-regulated, the serum sample is positive, i.e., the patient is a COVID-19 patient; the ten-fold cross-validation accuracy is about 93.31%. In a preferred embodiment, the characteristic polypeptide composition comprises only the characteristic polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z.
In other embodiments, when the peaks of characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is positive, i.e., the patient is a COVID-19 patient; the ten-fold cross-validation accuracy is about 98.69%.
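The accuracies quoted above are ten-fold cross-validation figures. As a reminder of what that means, the sketch below splits a labeled data set into ten folds, trains on nine, tests on the held-out fold, and pools the results. It is a generic illustration and not the patent's procedure: the trainer, the toy data, and all names are hypothetical.

```python
import random


def k_fold_accuracy(samples, labels, train_fn, k=10, seed=0):
    """Pooled accuracy over k held-out folds.

    train_fn(train_samples, train_labels) must return a predict(sample)
    function. Every sample is tested exactly once.
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        predict = train_fn([samples[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct += sum(predict(samples[i]) == labels[i] for i in held_out)
    return correct / len(samples)


def midpoint_trainer(train_samples, train_labels):
    """Hypothetical one-feature classifier: threshold at the class-mean midpoint."""
    pos = [x for x, y in zip(train_samples, train_labels) if y]
    neg = [x for x, y in zip(train_samples, train_labels) if not y]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0
    return lambda x: x > threshold


# Toy, well-separated intensities invented for the example.
toy_samples = list(range(10)) + list(range(100, 110))
toy_labels = [False] * 10 + [True] * 10
toy_accuracy = k_fold_accuracy(toy_samples, toy_labels, midpoint_trainer)
```

Each sample is scored by a model that never saw it during training, which is why cross-validated accuracy is a fairer estimate than accuracy measured on the training data itself.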
The second object of the invention is to provide a mass spectrum model for detecting novel coronavirus pneumonia (COVID-19), prepared from the characteristic polypeptide composition with the mass-to-charge ratio peaks of any of the above schemes.
In one embodiment, the mass spectrum model is prepared from characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, and 28232m/z, wherein when the peaks of characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is positive, i.e., the patient is a COVID-19 patient; the ten-fold cross-validation accuracy is about 97.96%.
Alternatively, in another embodiment, the mass spectrum model is prepared from characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, and 28232m/z, wherein when the peaks of characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is positive, i.e., the patient is a COVID-19 patient; the ten-fold cross-validation accuracy is about 98.69%.
In another embodiment, the mass spectrum model is prepared only from the characteristic polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z, wherein when the peaks of characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z are down-regulated, the serum sample is positive, i.e., the patient is a COVID-19 patient; the ten-fold cross-validation accuracy is about 93.31%.
In other embodiments, the mass spectrum model is prepared only from the characteristic polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, and 14102m/z, wherein when the peaks of characteristic polypeptides 8986m/z and 28091m/z are up-regulated and the peaks of characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, and 14102m/z are down-regulated, the serum sample is judged to be positive, i.e., the patient is judged to be a COVID-19 patient; the ten-fold cross-validation accuracy is about 91%.
The third object of the present invention is to provide a kit for detecting novel coronavirus pneumonia (COVID-19), which comprises the characteristic polypeptide composition or the mass spectrum model.
In one embodiment, the polypeptide composition or mass spectrum model is prepared from characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, and 28232m/z, wherein when the peaks of characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is positive, i.e., the patient is a COVID-19 patient; the ten-fold cross-validation accuracy is about 97.96%.
Alternatively, in another embodiment, the polypeptide composition or mass spectrum model is prepared from characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, and 28232m/z, wherein when the peaks of characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is positive, i.e., the patient is a COVID-19 patient; the ten-fold cross-validation accuracy is about 98.69%.
In another embodiment, the polypeptide composition or mass spectrum model is prepared only from the characteristic polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z, wherein when the peaks of characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z are down-regulated, the serum sample is positive, i.e., the patient is a COVID-19 patient; the ten-fold cross-validation accuracy is about 93.31%.
In other embodiments, the polypeptide composition or mass spectrum model is prepared only from the characteristic polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, and 14102m/z, wherein when the peaks of characteristic polypeptides 8986m/z and 28091m/z are up-regulated and the peaks of characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, and 14102m/z are down-regulated, the serum sample is judged to be positive, i.e., the patient is judged to be a COVID-19 patient; the ten-fold cross-validation accuracy is about 91%.
In one embodiment, the kit includes a sample processing solution developed by Beijing Clin Bochuang Biotechnology Co., Ltd.
In another embodiment, the kit further comprises standard mass spectrum sample tubes for ensuring that the molecular weights measured by the mass spectrometer are accurate. These may be several tubes each containing a single characteristic polypeptide, or one tube containing several characteristic polypeptides. The standard sample is run in parallel with the sample under test during mass spectrometry to judge whether the molecular weight information obtained for the sample under test is accurate and reliable.
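One way to use such a parallel standard run is to compare the measured mass of each standard with its expected mass and accept the run only if every deviation is small. The sketch below expresses the deviation in parts per million. The function names, the 500 ppm tolerance, and the measured values are hypothetical illustrations; the patent does not give an acceptance criterion in this passage.

```python
def mass_error_ppm(measured, expected):
    """Relative mass error in parts per million."""
    return (measured - expected) / expected * 1e6


def calibration_ok(measured_standards, tolerance_ppm=500.0):
    """Check a parallel run of standards.

    measured_standards maps expected m/z -> measured m/z. The tolerance
    is an assumed value, not one specified in the patent.
    """
    return all(abs(mass_error_ppm(measured, expected)) <= tolerance_ppm
               for expected, measured in measured_standards.items())


# Invented measurements for two standards.
good_run = {8986: 8987.0, 28091: 28093.0}
bad_run = {8986: 9000.0, 28091: 28093.0}

good_result = calibration_ok(good_run)
bad_result = calibration_ok(bad_run)
```

A run that fails this check would suggest recalibrating the instrument before interpreting the peaks of the sample under test.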
In another embodiment, the kit may contain software or a chip carrying the standard database of the characteristic polypeptides, which provides standard data or curves for comparison when a sample under test is analyzed by mass spectrometry, so as to determine the expression status of the characteristic polypeptides in that sample.
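Comparing a measured spectrum with a standard list of characteristic peaks usually starts by assigning each expected m/z to the nearest observed peak within a tolerance. The following minimal sketch shows that step. The function name, the input layout, and the tolerance of 3 m/z units are assumptions made for illustration and are not part of the patent.

```python
def match_peaks(observed, panel, tol=3.0):
    """Assign each panel m/z to the nearest observed peak within tol.

    observed is a list of (m/z, intensity) pairs. Returns a dict mapping
    each panel m/z to the matched intensity, or None if nothing is close.
    """
    matched = {}
    for target in panel:
        candidates = [(abs(mz - target), intensity)
                      for mz, intensity in observed
                      if abs(mz - target) <= tol]
        # min() picks the candidate with the smallest distance to the target.
        matched[target] = min(candidates)[1] if candidates else None
    return matched


# Invented peak list for the example.
observed_peaks = [(8985.2, 120.0), (8988.5, 30.0), (14101.0, 55.0)]
matches = match_peaks(observed_peaks, (8986, 14102, 28091))
```

Because two of the characteristic peaks, 14095m/z and 14102m/z, are only 7 m/z units apart, the tolerance is kept below half that spacing so that one observed peak cannot be assigned to both.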
The fourth invention of the invention is to provide the characteristic polypeptide composition or the mass spectrum model for use in preparing products for diagnosing new coronary pneumonia.
In one embodiment, the polypeptide composition or mass spectral model is prepared from signature polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z, wherein when signature polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 5893m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z, and when the peak of the characteristic polypeptide 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z is down-regulated, the serum sample is a positive sample, i.e., the patient is a new crown pneumonia patient, and the cross-over validation accuracy is about 97.96%.
Alternatively, in another embodiment, the polypeptide composition or mass spectral model is prepared from a signature polypeptide 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z, when the signature polypeptide 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, and when the peak of the signature polypeptide 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z is down-regulated, the serum sample is a positive sample, i.e., the patient is a new coronary pneumonia patient, and the cross-fold validation accuracy is about 98.69%.
In another embodiment, the polypeptide composition or mass spectrum model is prepared only from the characteristic polypeptides having mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, and 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, respectively, wherein when the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated and the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a novel coronavirus pneumonia patient, and the ten-fold cross-validation accuracy is about 93.31%.
In other embodiments, the polypeptide composition or mass spectrum model is prepared only from the characteristic polypeptides having mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, 14102m/z, respectively, wherein when the peaks of the characteristic polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z and 14102m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is determined to be a novel coronavirus pneumonia patient, with a ten-fold cross-validation accuracy of about 91%.
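As an illustration only, the direction pattern described for the six-peak panel can be written as a simple check. This is a minimal sketch under stated assumptions: the function name, the dictionary layout, and the idea of comparing each peak against a per-peak reference intensity (for example a control-group mean) are hypothetical, and the models described in this document are built with machine learning methods rather than with a fixed rule like this one.

```python
# Six-peak panel from the embodiment above (nominal m/z values).
UP_PEAKS = (8986, 28091)                  # expected to be up-regulated in positives
DOWN_PEAKS = (6939, 13886, 14049, 14102)  # expected to be down-regulated in positives


def matches_positive_pattern(sample, reference):
    """Return True when every UP peak exceeds its reference intensity and
    every DOWN peak is below it.

    sample, reference: dicts mapping nominal m/z to peak intensity.
    The reference values are a hypothetical baseline, not given in the source.
    """
    up_ok = all(sample[mz] > reference[mz] for mz in UP_PEAKS)
    down_ok = all(sample[mz] < reference[mz] for mz in DOWN_PEAKS)
    return up_ok and down_ok
```

A fixed threshold per peak is the simplest possible reading of "up-regulated" and "down-regulated"; a trained classifier weighs the peaks jointly instead.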
In any of the above embodiments, the product for diagnosing neocoronary pneumonia refers to any conventional product for diagnosing neocoronary pneumonia, including: detection reagent, detection chip, detection carrier, detection kit and the like.
The fifth object of the present invention is to provide a method for constructing a mass spectrum model, comprising:
1) collecting serum samples from a plurality of clinically confirmed novel coronavirus pneumonia patients and from non-novel-coronavirus-pneumonia controls (including tuberculosis patients, patients with similar fever and cough symptoms, and healthy people), and freezing them at low temperature for later use;
2) pretreating the serum proteins prior to mass spectrometry;
3) performing mass spectrometry detection on the two groups of pretreated serum proteins to obtain serum polypeptide fingerprints of the two groups;
4) standardizing the serum polypeptide fingerprints of all patients and normal persons, and collecting the data;
5) performing quality control on the obtained data, and screening out the characteristic polypeptides with the following mass-to-charge ratio peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z, performing secondary mass spectrometry identification on the characteristic polypeptides, and establishing a mass spectrum model for detecting novel coronavirus pneumonia from these mass-to-charge ratio peaks.
In one embodiment, step 5) performs quality control on the obtained data and screens out the characteristic polypeptides with the following mass-to-charge ratio peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z; secondary mass spectrometry identification is performed on the characteristic polypeptides, and a mass spectrum model for detecting novel coronavirus pneumonia is established from these mass-to-charge ratio peaks.
In a preferred embodiment, the mass spectrum model of step 5) is prepared only from characteristic polypeptides having mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, and 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, wherein when the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated and the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a novel coronavirus pneumonia patient, and the ten-fold cross-validation accuracy is about 93.31%.
In another embodiment, the mass spectrum model of step 5) is prepared only from characteristic polypeptides having mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, 14102m/z, respectively, wherein when the peaks of the characteristic polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a novel coronavirus pneumonia patient, and the ten-fold cross-validation accuracy is about 91%.
Furthermore, in any one of the embodiments of any one of the above objects, the signature polypeptide composition, the mass spectrometric model, the detection product, the use, the method of construction may involve a polypeptide comprising only 19 signature polypeptides having the following mass to charge ratios and polypeptide sequences:
a characteristic polypeptide with a mass-to-charge ratio of 6939m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 1;
a characteristic polypeptide with the mass-to-charge ratio of 7614m/z, wherein the polypeptide sequence is selected from the sequence shown in SEQ ID No. 2;
a characteristic polypeptide with the mass-to-charge ratio of 8034m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 3;
a characteristic polypeptide with the mass-to-charge ratio of 8226m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 4;
a characteristic polypeptide with the mass-to-charge ratio of 8986m/z, and the polypeptide sequence is selected from the sequence shown as SEQ ID No. 5;
a characteristic polypeptide with the mass-to-charge ratio of 9626m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 6;
a characteristic polypeptide with the mass-to-charge ratio of 13719m/z, and the polypeptide sequence thereof is selected from the sequence shown as SEQ ID No. 7;
a characteristic polypeptide with the mass-to-charge ratio of 13765m/z, and the polypeptide sequence thereof is selected from the sequence shown as SEQ ID No. 8;
a characteristic polypeptide with the mass-to-charge ratio of 13886m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 9;
a characteristic polypeptide with the mass-to-charge ratio of 14049m/z, and the polypeptide sequence thereof is selected from the sequence shown in SEQ ID No. 10;
a characteristic polypeptide with the mass-to-charge ratio of 14095m/z, and the polypeptide sequence thereof is selected from the sequence shown in SEQ ID No. 11;
a characteristic polypeptide with the mass-to-charge ratio of 14102m/z, and the polypeptide sequence is selected from the sequence shown as SEQ ID No. 12;
a characteristic polypeptide with the mass-to-charge ratio of 15123m/z, and the polypeptide sequence is selected from the sequence shown in SEQ ID No. 13;
a characteristic polypeptide with the mass-to-charge ratio of 15867m/z, and the polypeptide sequence is selected from the sequence shown as SEQ ID No. 14;
a characteristic polypeptide with a mass-to-charge ratio of 28091m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 15;
a characteristic polypeptide with the mass-to-charge ratio of 11435m/z, and the polypeptide sequence thereof is selected from the sequence shown as SEQ ID No. 16;
a characteristic polypeptide with the mass-to-charge ratio of 11495m/z, and the polypeptide sequence thereof is selected from the sequence shown in SEQ ID No. 17;
a characteristic polypeptide with the mass-to-charge ratio of 11523m/z, and the polypeptide sequence of the characteristic polypeptide is selected from the sequence shown as SEQ ID No. 18;
a characteristic polypeptide with the mass-to-charge ratio of 11680m/z, and the polypeptide sequence thereof is selected from the sequence shown in SEQ ID No. 19.
In any of the embodiments above, wherein the step 2) pre-treatment method comprises diluting the serum protein or polypeptide in the stable sample with the sample processing solution.
In any of the above embodiments, in the step 3), the polypeptide mass spectrum universal pretreatment kit is used to dilute and read two groups of serum proteins, so as to obtain fingerprint spectra of two groups of serum polypeptides.
In any of the above embodiments, in the quality control treatment in step 5), for a blank substrate, the crystallization point of the blank substrate is detected by using the same mass spectrometry parameters, and if a significant mass spectrometry peak occurs, the quality of the substrate solution is considered to be unqualified.
In any one of the above embodiments, in the quality control processing in step 5), the following 8 characteristic peaks are selected as quality control peaks: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700 m/z.
In time-of-flight mass spectrometry of biological samples, the quality of a mass spectrum is affected by many factors, such as individual differences, sample quality, changes in ambient temperature and humidity, and the crystallization state of the sample and matrix. To prevent abnormal spectra from influencing the analysis results, a total of 8 characteristic peaks in human serum are introduced as quality control peaks; the appearance of these peaks is independent of whether a patient has novel coronavirus pneumonia. Of the 843 collected spectra, all 8 quality control peaks were detected in 683 (81.0% of all spectra) and 7 quality control peaks were detected in 156 (18.5% of all spectra). The following quality control condition is set: the spectrum of a single sample is qualified when 6-8 quality control peaks are detected and the molecular weight shift of the internal standard peaks is less than 0.002 (i.e., the shift does not exceed 2 per thousand). Spectra that fail quality control must be acquired again.
The invention screens out the corresponding novel coronavirus pneumonia markers and establishes a detection model for analysis and detection in combination with bioinformatics methods. The bioinformatics methods comprise standardizing the fingerprints, applying experimental quality control to the obtained data, screening the desired serum characteristic polypeptides and establishing the mass spectrum model; optionally, the mass spectrum model is established and validated using an LR algorithm or the like. The experimental quality control retains only spectra in which at least 6 internal standard peaks are detected, and the spectra are then recalibrated using the internal standard peaks.
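The quality control condition above can be sketched as follows. This is a hedged illustration, not the software used in the invention: the function names are hypothetical, and a quality control peak is counted as detected when any observed peak lies within a relative deviation of 0.002 of its nominal m/z.

```python
# Nominal m/z of the 8 quality control peaks listed in the text.
QC_PEAKS = (6426, 6623, 8753, 8785, 8904, 9118, 9409, 9700)
REL_TOL = 0.002      # maximum relative mass shift (2 per thousand)
MIN_QC_PEAKS = 6     # a spectrum needs 6-8 detected quality control peaks


def count_qc_peaks(observed_mz):
    """Count how many quality control peaks have an observed peak within REL_TOL.

    Simplification: windows of neighbouring peaks (8753 and 8785) overlap
    slightly at this tolerance, so one observed peak could in principle be
    counted for both; a one-to-one assignment is not attempted here.
    """
    found = 0
    for target in QC_PEAKS:
        if any(abs(mz - target) / target <= REL_TOL for mz in observed_mz):
            found += 1
    return found


def spectrum_passes_qc(observed_mz):
    """Apply the rule 'at least 6 of the 8 quality control peaks detected'."""
    return count_qc_peaks(observed_mz) >= MIN_QC_PEAKS
```

Spectra for which `spectrum_passes_qc` returns False would be the ones that are acquired again.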
Terms and definitions
Ten-fold cross-validation (10-fold cross-validation) is a commonly used method for testing the accuracy of an algorithm. The data set is divided into ten parts; in turn, 9 parts are used as training data and 1 part as test data. Each run yields a corresponding accuracy (or error rate), and the average over the 10 runs is used as an estimate of the accuracy of the algorithm. Generally, 10-fold cross-validation is repeated several times (for example, 10 rounds of 10-fold cross-validation) and the results are averaged to obtain the estimate. It should be noted that the ten-fold cross-validation accuracy correlates with, but is not equivalent to, the actual detection accuracy (or sensitivity). When evaluating an algorithm, if the ten-fold cross-validation accuracy falls within the confidence interval, varies in a correlated manner with the number of characteristic polypeptides, and reaches a value feasible for clinical diagnosis, the mass spectrum model constructed from those polypeptides is considered to meet the requirements of clinical diagnosis.
SAA protein (serum amyloid A protein) is a family of acute-phase reactant proteins and belongs to a heterogeneous class of proteins within the apolipoprotein family. Humans have 4 serum amyloid A genes, SAA1-SAA4; the two acute-phase proteins, SAA1 and SAA2, are referred to as A-SAA.
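The procedure described in this definition can be sketched in a few lines. The sketch assumes a generic `fit`/`predict` pair supplied by the caller; the midpoint classifier below is a hypothetical toy used only to make the example self-contained and is not the classifier used in the invention. The data set must contain at least `k` samples.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=10, seed=0):
    """Train on k-1 folds, test on the remaining fold, and average the accuracy."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


def fit_midpoint(xs, ys):
    """Toy model: threshold halfway between the two class means of one feature."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_midpoint(threshold, x):
    """Predict class 1 when the feature value exceeds the threshold."""
    return 1 if x > threshold else 0
```

Repeating `cross_validate` with different `seed` values and averaging the results corresponds to the repeated 10-fold cross-validation mentioned above.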
Technical effects
Compared with the prior art, the invention has the following advantages:
1. the invention adopts a plurality of characteristic protein combinations which are different between a new coronary pneumonia patient and a normal person, a tuberculosis patient and a contrast patient with new coronary pneumonia type symptoms to detect a serum sample, and adopts a method of combining traditional statistics and a modern bioinformatics method to carry out data processing, thereby obtaining a polypeptide fingerprint spectrum detection model of the pneumonia patient, a healthy person and the contrast patient, and a series of discovered protein mass-to-charge ratio peaks provide basis and resources for searching for new more ideal markers.
2. Compared with the prior detection method, the method has higher sensitivity and specificity, simple operation, low detection cost and high flux, and is expected to be used for large-scale screening of the neocoronary pneumonia.
3. The construction method of the model is reasonable and feasible in design, provides a new screening method for improving the clinical cure rate of novel coronavirus pneumonia, and also provides a new idea for exploring the mechanism of its occurrence and development.
4. The invention is the first to search for combinations of characteristic proteins that differ among 146 patients with confirmed novel coronavirus pneumonia, 46 normal persons, 33 tuberculosis patients and 73 controls with symptoms similar to novel coronavirus pneumonia. This goes beyond the traditional approach of searching for characteristic polypeptides only between normal persons and novel coronavirus pneumonia patients, and effectively avoids false positive results caused by infections with similar symptoms.
5. The result shows that the serum peptide characteristic polypeptide model can be rapidly used for screening patients with new coronary pneumonia in crowds.
6. The 4 characteristic polypeptides (SEQ ID NO:16-19) newly added to the composition and mass spectrum model constructed from 25 characteristic polypeptides belong to the SAA protein marker family. SAA proteins can be used clinically as biomarkers to diagnose bacterial and viral infections by ELISA, immunoturbidimetry, colloidal gold, immunofluorescence chromatography and other methods. Building on the completed mass spectrum model of 25 characteristic polypeptides, however, the invention is the first to propose using SAA protein markers for virus detection by laser time-of-flight mass spectrometry and the first to accurately identify the specific SAA protein sequences (namely SEQ ID NO:16-19), so that clinical misdiagnosis of normal samples can be effectively avoided. The results show that the ten-fold cross-validation accuracy of the mass spectrum model of 25 characteristic polypeptides is about 97.96%, whereas that of the mass spectrum model of 29 characteristic polypeptides, which adds the 4 SAA polypeptide markers, is about 98.69%.
Drawings
FIG. 1: comparing serum polypeptide fingerprints of different groups (healthy people group, tuberculosis group, similar symptom group and new crown patient group), wherein negative healthy people fingerprint, negative tuberculosis fingerprint, negative similar symptom and positive new crown patient are respectively arranged from top to bottom
FIG. 2-1: the 20 peaks with the highest repetition frequency in LASSO. FIG. 2-2: the 20 peaks with the highest significance for VIP changes in PLS-DA.
FIG. 2-3: the 10 peaks with the highest cross-validation accuracy in RFECV.
FIG. 3: intensity of each characteristic peak; the left columns are the negative control group, and the right columns are the positive group.
FIG. 4-1: various machine learning methods, training set ROC curve comparison. FIG. 4-2: test set ROC curve comparisons.
FIG. 5: confusion matrix of true groups versus predicted results on the test set.
FIG. 6: workflow for establishing the characteristic polypeptide mass spectrum model for rapidly screening patients with novel coronavirus pneumonia (COVID-19).
FIG. 7: the mass spectrum peak map of the characteristic polypeptide m/z 5157.6 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 8: the mass spectrum peak map of the characteristic polypeptide m/z 5366.2 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 9: the mass spectrum peak map of the characteristic polypeptide m/z 5892.9 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 10: the mass spectrum peak map of the characteristic polypeptide m/z 6357.4 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 11: the mass spectrum peak map of the characteristic polypeptide m/z 6654.0 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 12: the mass spectrum peak map of the characteristic polypeptide m/z 6939.1 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 13: the mass spectrum peak map of the characteristic polypeptide m/z 7364.2 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 14: the mass spectrum peak map of the characteristic polypeptide m/z 7614.2 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 15: the mass spectrum peak map of the characteristic polypeptide m/z 8034.3 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 16: the mass spectrum peak map of the characteristic polypeptide m/z 8042.7 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 17: the mass spectrum peak map of the characteristic polypeptide m/z 8226.4 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 18: the mass spectrum peak map of the characteristic polypeptide m/z 8424.9 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 19: the mass spectrum peak map of the characteristic polypeptide m/z 8559.8 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 20: the mass spectrum peak map of the characteristic polypeptide m/z 8986.1 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 21: the mass spectrum peak map of the characteristic polypeptide m/z 9626.4 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 22: the mass spectrum peak map of the characteristic polypeptide m/z 13719.2 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 23: the mass spectrum peak map of the characteristic polypeptide m/z 13765.2 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 24: the mass spectrum peak map of the characteristic polypeptide m/z 13886.1 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 25: the mass spectrum peak map of the characteristic polypeptide m/z 14049.4 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 26: the mass spectrum peak map of the characteristic polypeptide m/z 14094.7 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 27: the mass spectrum peak map of the characteristic polypeptide m/z 14101.8 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 28: the mass spectrum peak map of the characteristic polypeptide m/z 15123.4 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 29: the mass spectrum peak map of the characteristic polypeptide m/z 15866.5 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 30: the mass spectrum peak map of the characteristic polypeptide m/z 28091.4 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 31: the mass spectrum peak map of the characteristic polypeptide m/z 28231.5 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 32: the mass spectrum peak map of the characteristic polypeptide m/z 11435.1 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 33: the mass spectrum peak map of the characteristic polypeptide m/z 11495.3 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 34: the mass spectrum peak map of the characteristic polypeptide m/z 11522.8 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
FIG. 35: the mass spectrum peak map of the characteristic polypeptide m/z 11680.3 is shown as the upper graph which is a non-new crown control mass spectrum map, and the lower graph is a COVID-19 mass spectrum map.
Detailed Description
The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
EXAMPLE 1 sample treatment
Serum samples were collected from 146 patients diagnosed at a hospital in Chongqing in February 2020; all patients tested positive by nucleic acid detection and were classified strictly according to the guideline criteria.
Classification was done according to the following criteria:
(1) mild type: clinical symptoms are mild, and imaging shows no signs of pneumonia;
(2) common type: fever and respiratory symptoms are present, and imaging shows signs of pneumonia;
(3) severe type: dyspnea, respiratory rate of at least 30 breaths/min, oxygen saturation at rest of at most 93%, and arterial partial pressure of oxygen (PaO2)/fraction of inspired oxygen (FiO2) of at most 300 mmHg;
(4) critical type: respiratory failure requiring a ventilator, shock, and failure of other organs, with the patient admitted to the ICU for rescue.
The 152 serum samples used as non-novel-coronavirus-pneumonia controls were collected at a hospital in Chongqing in March 2020 and comprised 46 normal persons, 33 tuberculosis patient controls, and 73 controls with symptoms similar to novel coronavirus pneumonia.
All blood samples were drawn in the morning from fasting subjects, collected into additive-free vacuum serum collection tubes, centrifuged at 2,264g for 10 min and incubated at 56 ℃ for 30 min, and the serum samples were stored frozen at -80 ℃.
Pretreatment of serum samples for mass spectrometry: before the mass spectrometry experiment, 1 tube of each aliquoted serum sample was taken from the low-temperature freezer, placed on wet ice and thawed for 60-90 min. 5uL of serum was mixed with 45uL of sample treatment solution and vortexed at 1200rpm for 30 s; 10uL of the treated sample solution was mixed with 10uL of the prepared matrix solution and vortexed at 1200rpm for 30 s; 1uL of the mixture was spotted onto the target plate, each sample was spotted in triplicate, and mass spectrometry was performed after the spots had dried naturally.
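The volumes in the pretreatment step imply a fixed overall dilution, which can be checked with a short calculation. This is only an arithmetic sketch; the function and variable names are hypothetical.

```python
from fractions import Fraction


def dilution_factor(sample_ul, diluent_ul):
    """Fold dilution when sample_ul of sample is mixed with diluent_ul of diluent."""
    return Fraction(sample_ul + diluent_ul, sample_ul)


# 5 uL serum + 45 uL sample treatment solution
serum_step = dilution_factor(5, 45)
# 10 uL treated sample + 10 uL matrix solution
matrix_step = dilution_factor(10, 10)
# Overall dilution of serum in the mixture spotted on the target plate
total_dilution = serum_step * matrix_step
# Serum volume contained in a 1 uL spot
serum_per_spot_ul = Fraction(1) / total_dilution
```

Under these volumes the serum is diluted 10-fold and then 2-fold, i.e. 20-fold overall, so a 1 uL spot carries the equivalent of 0.05 uL of serum.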
Example 2 establishment of Mass Spectrometry model for MALDI-TOF-MS
(I) sample preparation
5ul of serum from each sample was diluted in 45ul of sample treatment fluid (Bioyong Technologies Inc.). Then 10ul of diluted serum was removed and mixed with 10ul of matrix solution (Bioyong Technologies Inc.).
2ul of the mixture was taken out and dropped on a stainless steel target plate. After drying at room temperature, the sample was injected into a MALDI-TOF MS mass spectrometer (Clin-TOF-II; Bioyong Technologies Inc.). Each sample was tested in parallel 3 times.
The matrix-assisted laser desorption/ionization time-of-flight mass spectrometer Clin-TOF and the universal polypeptide mass spectrometry pretreatment kit used in the experiments were developed by Bioyong (China). The data were preprocessed with the MALDIquant program: the data were square-root transformed, smoothed by filter fitting, and baseline corrected. The mass spectrometer was calibrated with a mixture of polypeptides and proteins of known molecular weight. The mass drift of the calibrant should be within 500 ppm. 500 spectra were acquired for each sample spot. The molecular weight collection range is m/z 3000-30000.
The mass spectrum of different groups of samples is shown in figure 1 (figure 1: comparison of serum polypeptide fingerprints of different groups, wherein negative healthy people, negative tuberculosis, negative similar symptoms and positive new coronary patients are respectively shown from top to bottom). In a negative healthy human spectrum, the peak intensities of 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11523m/z, 15123m/z, 15867m/z, 28091m/z are low, while the peak intensities of 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z are high. In the negative tuberculosis spectrum, the peak intensities of 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are lower, while the peak intensities of 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are higher. In the similar negative symptom group spectra, the peak intensities of 5158m/z, 5366m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z were lower, while the peak intensities of 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z were higher. In the positive neocoronal patient spectra, the peak intensities were higher for 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, and lower for 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232 m/z.
(II) Mass Spectrometry data acquisition
A Clin-TOF mass spectrometer was used. An appropriate laser energy was set to acquire data from each sample crystallization spot. For each sample spot, 50 laser positions were selected and each position was shot 10 times, i.e., each sample crystallization spot received 500 laser shots, and the spectrum was collected. Laser frequency: 30 Hz. Data collection range: 3-30 kDa. Before each sample crystallization spot was acquired, external calibration was performed with standards, with an average molecular weight deviation of less than 500 ppm.
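The acquisition arithmetic and the 500 ppm calibration criterion can be illustrated as below. This is a sketch under assumptions: the function names are hypothetical, and the calibrant masses in the example are made-up placeholder values, because the actual calibrant masses are not listed in this section.

```python
POSITIONS_PER_SPOT = 50
SHOTS_PER_POSITION = 10
TOTAL_SHOTS = POSITIONS_PER_SPOT * SHOTS_PER_POSITION  # laser shots per spot


def ppm_deviation(observed_mz, reference_mz):
    """Absolute mass deviation in parts per million."""
    return abs(observed_mz - reference_mz) / reference_mz * 1e6


def mean_ppm_deviation(pairs):
    """Average ppm deviation over (observed, reference) calibrant pairs."""
    return sum(ppm_deviation(obs, ref) for obs, ref in pairs) / len(pairs)


def external_calibration_ok(pairs, max_ppm=500.0):
    """True when the average calibrant deviation is below max_ppm."""
    return mean_ppm_deviation(pairs) < max_ppm


# Hypothetical (observed, reference) calibrant masses, for illustration only.
example_pairs = [(5001.0, 5000.0), (10003.0, 10000.0), (20004.0, 20000.0)]
calibration_passes = external_calibration_ok(example_pairs)
```

At m/z 10000, a 500 ppm limit corresponds to a shift of 5 mass units, which gives a sense of how loose or tight the criterion is across the 3-30 kDa range.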
Experiment quality control:
(1) A blank matrix crystallization spot is measured using the same mass spectrometry parameters; if an obvious mass spectrum peak appears, the matrix solution is considered unqualified and must be replaced with fresh matrix.
(2) When the standard substance is used for external standard calibration, the mass deviation of different calibration substance points is required to be ensured not to exceed 500ppm, and 5 calibration substance peaks are required to meet the requirements at the same time.
(3) Eight endogenous polypeptide peaks in serum are selected as internal standard quality control peaks. If 6-8 internal standard peaks are detected and their molecular weight deviation does not exceed 2 per thousand, the spectrum is qualified; otherwise, the spectrum must be acquired again. The internal standard peaks are: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z.
(III) preprocessing of raw data
The MALDI-TOF raw data were recalibrated against the internal standards using internal standard calibration software and saved as txt files. The internal standard peaks used were: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z. The spectra were then processed with the MALDIquant program; processing included smoothing, baseline correction and molecular weight calibration. Peak detection was performed with a signal-to-noise ratio of 3. Peaks were binned using the binPeaks command with a tolerance of 0.002. Peaks occurring in at least 25% of the spectra of a group were retained. Finally, the resulting matrix was used for the following analysis.
After log2 transformation, the peak intensity matrix was quantile normalized using the R package limma. Missing values in all samples were filled with the minimum value. COVID-19 patient data and control sample data were randomly assigned to the training and test sets at a ratio of 2:1.
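The binning and frequency filtering steps can be illustrated with a simplified sketch. It is not the MALDIquant algorithm: it uses a plain greedy grouping of sorted m/z values around a running bin mean, and the function names and data layout are hypothetical. Only the two parameters, the relative tolerance of 0.002 and the 25% frequency threshold, come from the text.

```python
def bin_peaks(spectra, tolerance=0.002):
    """Group peaks from several spectra into bins of similar m/z.

    spectra: list of peak lists, one list of m/z values per spectrum.
    Returns a list of (bin_center, spectrum_indices) tuples, where
    spectrum_indices is the set of spectra contributing to the bin.
    """
    tagged = sorted((mz, idx) for idx, peaks in enumerate(spectra) for mz in peaks)
    bins, current = [], []
    for mz, idx in tagged:
        if current:
            center = sum(m for m, _ in current) / len(current)
            # Start a new bin when the peak is too far from the running mean.
            if abs(mz - center) / center > tolerance:
                bins.append(current)
                current = []
        current.append((mz, idx))
    if current:
        bins.append(current)
    return [(sum(m for m, _ in b) / len(b), {i for _, i in b}) for b in bins]


def frequent_bins(binned, n_spectra, min_fraction=0.25):
    """Keep bin centers found in at least min_fraction of the group's spectra."""
    return [center for center, members in binned
            if len(members) / n_spectra >= min_fraction]
```

Calling `frequent_bins` once per sample group mirrors the statement that the 25% threshold is applied within a group rather than over all spectra.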
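A plain-Python sketch of these three operations is given below. It is an illustration, not the limma implementation: ties are not averaged, and missing values are filled before quantile normalization so that the matrix is complete, which is a simplifying assumption about the order of the steps. All function names are hypothetical.

```python
import math
import random


def log2_fill_min(matrix):
    """Log2-transform intensities and replace missing values (None) with the
    smallest log2 value in the matrix. Rows are samples, columns are peaks."""
    logged = [[None if v is None else math.log2(v) for v in row] for row in matrix]
    minimum = min(v for row in logged for v in row if v is not None)
    return [[minimum if v is None else v for v in row] for row in logged]


def quantile_normalize(matrix):
    """Give every sample (row) the same intensity distribution by replacing
    each value with the mean, across samples, of the values at its rank."""
    n_peaks = len(matrix[0])
    sorted_rows = [sorted(row) for row in matrix]
    rank_means = [sum(r[k] for r in sorted_rows) / len(matrix) for k in range(n_peaks)]
    result = []
    for row in matrix:
        order = sorted(range(n_peaks), key=lambda j: row[j])
        normalized = [0.0] * n_peaks
        for rank, j in enumerate(order):
            normalized[j] = rank_means[rank]
        result.append(normalized)
    return result


def split_two_to_one(sample_ids, seed=0):
    """Randomly assign samples to training and test sets at a 2:1 ratio."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = (2 * len(ids)) // 3
    return ids[:cut], ids[cut:]
```

Applying `split_two_to_one` separately to the patient samples and to the control samples would keep the 2:1 ratio within each class; whether the original split was stratified in this way is not stated in the text.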
(IV) selection of characteristic proteins
After intensity normalization and deficiency value normalization, the peaks of the training set were analyzed by three machine learning methods: LASSO Algorithm (LASSO), partial least squares regression analysis (PLS-DA), and recursive feature elimination with cross validation (RFECV). LASSO is called the blast absolute shrinkage and selection operator, and is a compression estimation. It obtains a more refined model by constructing a penalty function, so that it compresses some regression coefficients, i.e. the sum of the absolute values of the forcing coefficients is less than a certain fixed value; while some regression coefficients are set to zero. The advantage of subset puncturing is thus retained, and is a way to process biased estimates of data with complex collinearity.
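For reference, the constraint described above corresponds to the textbook form of the LASSO estimate. The symbols are assumptions introduced here for illustration: y_i is the class label of spectrum i, x_ij is the intensity of peak j in spectrum i, and t is the fixed bound on the coefficients. The squared-error loss is shown as the standard form; the loss actually used for the binary outcome is not stated in this document.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
\quad \text{subject to} \quad
  \sum_{j=1}^{p} \lvert \beta_j \rvert \le t
```

Lowering t drives more coefficients exactly to zero, which is why the peaks that survive repeated LASSO fits can be treated as candidate features.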
FIG. 2-1 shows the 20 peaks selected most frequently by LASSO. The vertical axis is the mass-to-charge ratio of each preferred characteristic peak. Partial least squares discriminant analysis (PLS-DA) is a multivariate statistical method used for discriminant analysis. Discriminant analysis is a common statistical method for deciding how to classify a subject based on observed or measured values of variables. The principle is that the features of samples from different groups (such as an observation group and a control group) are trained separately to generate a training set, and the reliability of the training set is then checked.
FIG. 2-2 shows the 20 peaks with the highest variable importance in projection (VIP) in PLS-DA. The vertical axis is the mass-to-charge ratio of each preferred characteristic peak. RFECV finds the optimal number of features by cross-validation. RFE (recursive feature elimination) ranks the features by importance. CV (cross-validation) means that, after feature ranking, the optimal number of features is selected by cross-validation. FIG. 2-3 shows the 10 peaks with the highest cross-validation accuracy in RFECV. The vertical axis is the mass-to-charge ratio of each preferred characteristic peak.
After empirical inspection of the raw spectra of the selected peaks, 29 peaks that passed quality control were screened out as features. The intensities of the characteristic peaks are shown in FIG. 3. Each row in the graph represents a characteristic peak, each column represents one spectrum, and the shade of color represents the intensity of the peak. The columns on the left are the negative control group and the columns on the right are the positive group. It can be seen that the peaks of signature polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 28232m/z are generally more intense in the negative group than in the positive group, while the peaks of signature polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z are generally more intense in the positive group than in the negative group. The intensities of these peaks differed significantly between the COVID-19 group and the control group.
(V) Model algorithm
Models were built from the 29 characteristic peaks of the training set data using 8 machine learning methods, and the models were evaluated by cross-validation accuracy. The 8 machine learning methods analyzed were: logistic regression (LR), support vector machine (SVM), random forest (RF), naive Bayes (NB), gradient boosting decision tree (GBDT), K-nearest neighbor (KNN), decision tree (DT) and adaptive boosting (AdaBoost).
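The cross-validation used to compare the models can be sketched as a fold generator. This is a minimal illustration assuming ten folds (the fold count mentioned elsewhere in this document) and a plain shuffled split; the function name and seed are hypothetical, and fitting the classifiers themselves is left out.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 198 training samples, as in the training set described in this document.
splits = list(k_fold_indices(198))
print(len(splits), [len(test) for _, test in splits])
```

Each model is fitted on the training part of every split and scored on the held-out part; the mean of the ten scores is the cross-validation accuracy.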
FIGS. 4-1 and 4-2 show the model results for the training set and the test set, respectively, as ROC curves. An ROC curve is drawn over a series of different binary classification thresholds (cut-off values or decision thresholds), with the true positive rate (sensitivity) as the ordinate and the false positive rate (1-specificity) as the abscissa. The area under the ROC curve (AUC) is calculated for each test and compared; the test with the largest AUC has the best diagnostic value. In this study, the AUC of every model on the training set was greater than 0.99, and the AUC of LR, SVM, GBDT, DT and AdaBoost was 1 (FIG. 4-1). In the ROC curve analysis of the test set, the AUC of all 8 models obtained by the 8 machine learning methods exceeded 0.94, and the AUC of the LR model was 1 (FIG. 4-2). After the accuracy, recall, precision, F1, sensitivity and specificity of the 8 models were evaluated, the LR model was found to have the best classification performance (AUC 1, sensitivity 98%, specificity 100%, accuracy 99%, precision 99%, recall 99%, F1 99%) and was further applied to the detection of COVID-19.
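The AUC can be computed without drawing the curve, as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below uses this pairwise formulation; the function name and the example scores are illustrative and are not data from this study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores: one positive sample scores below one negative sample.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
# Perfect separation gives an AUC of 1.
print(roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.7, 0.9]))   # 1.0
```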
The confusion matrix of the LR model in the test set is shown in FIG. 5, in which the vertical axis represents the real grouping of samples, the upper row represents the number of negative samples, and the lower row represents the number of positive samples; the horizontal axis represents the model prediction result, the left column represents the number of samples determined to be negative by the model, and the right column represents the number of samples determined to be positive by the model. All of the 51 negative samples were judged to be negative, and the negative sample judgment accuracy (i.e., model specificity) was 100%; of the 49 positive samples, 1 was judged as negative by mistake, 48 were judged as positive, and the positive sample judgment accuracy (i.e., the model sensitivity) was 98.0%.
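The sensitivity, specificity and accuracy quoted above follow directly from the four cells of the confusion matrix. A minimal sketch using the counts given in the preceding paragraph (variable names are illustrative):

```python
# Counts from the LR confusion matrix on the test set.
tn, fp = 51, 0   # negative samples: judged negative / judged positive
fn, tp = 1, 48   # positive samples: judged negative / judged positive

sensitivity = tp / (tp + fn)                 # 48 / 49
specificity = tn / (tn + fp)                 # 51 / 51
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 99 / 100

print(f"sensitivity={sensitivity:.1%} specificity={specificity:.1%} accuracy={accuracy:.1%}")
```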
Table 1. Means and interquartile ranges of the 29 signature polypeptides in each group of the training set
[Table 1 is provided as an image in the original publication.]
The specific procedure for establishing the characteristic polypeptide mass spectrum model for rapid screening of new coronary pneumonia (COVID-19) patients is shown in FIG. 6. The procedure comprises: (1) collecting serum samples from new coronary pneumonia patients and from negative control populations; (2) pretreating the serum samples for mass spectrometry using the kit; (3) performing MALDI-TOF MS detection to obtain spectra; (4) processing the spectra to obtain peak lists; (5) bioinformatics analysis; and (6) determining the mass spectrum model.
Example 3 establishment of a model for screening patients with New coronary pneumonia
Of 298 serum samples (146 from patients diagnosed with new coronary pneumonia, 46 from normal persons, 33 from tuberculosis patient controls and 73 from controls with symptoms similar to new coronary pneumonia (fever and cough)), 198 were selected as training samples for model building: 97 from new coronary pneumonia patients, 34 from normal persons, 19 from tuberculosis patient controls and 48 from patients with symptoms similar to new coronary pneumonia. All blood samples were drawn in the early morning under fasting conditions; the serum was separated, virus-inactivated and stored in a -80 °C freezer.
The remaining samples (49 new coronary pneumonia patients, 12 normal persons, 14 tuberculosis patients and 25 patients with symptoms similar to new coronary pneumonia) were used as validation samples for the blind test. They were processed in the same way as above.
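The sample allocation can be checked with simple arithmetic. The sketch below only restates the counts given above; the dictionary keys are shorthand labels.

```python
total = {"COVID-19": 146, "normal": 46, "tuberculosis": 33, "similar symptoms": 73}
training = {"COVID-19": 97, "normal": 34, "tuberculosis": 19, "similar symptoms": 48}

# Whatever is not used for training is left for the blind test.
validation = {group: total[group] - training[group] for group in total}

print(sum(total.values()), sum(training.values()), sum(validation.values()))  # 298 198 100
print(validation)
```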
A polypeptide mass spectrum model of new coronary pneumonia was established using the serum characteristic polypeptide peaks of new coronary pneumonia patients screened in Examples 1-2. The model uses 29 characteristic peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z.
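To feed a new spectrum into such a model, its peak list has to be mapped onto the 29 characteristic peaks. The sketch below shows one possible mapping and is an assumption rather than the procedure used here: each observed peak is assigned to its nearest characteristic m/z if it lies within a relative tolerance (0.002 is borrowed from the binning step), and unmatched features are set to 0. Assigning each peak to its nearest feature matters because some characteristic peaks are close together (for example 8034 and 8043, or 14095 and 14102).

```python
# The 29 characteristic m/z values of the model (14102 as listed elsewhere in this document).
FEATURE_MZ = [
    5158, 5366, 5893, 6357, 6654, 6939, 7364, 7614, 8034, 8043,
    8226, 8425, 8560, 8986, 9626, 11435, 11495, 11523, 11680, 13719,
    13765, 13886, 14049, 14095, 14102, 15123, 15867, 28091, 28232,
]


def feature_vector(peaks, tolerance=0.002, missing=0.0):
    """peaks: list of (mz, intensity) pairs with positive intensities.
    Returns one intensity per characteristic peak, in FEATURE_MZ order."""
    vec = [missing] * len(FEATURE_MZ)
    for mz, intensity in peaks:
        # Index of the characteristic peak closest to this observed peak.
        i = min(range(len(FEATURE_MZ)), key=lambda j: abs(mz - FEATURE_MZ[j]))
        if abs(mz - FEATURE_MZ[i]) / FEATURE_MZ[i] <= tolerance:
            vec[i] = max(vec[i], intensity)
    return vec


# Hypothetical peak list; the 20000.0 peak matches no characteristic peak.
vec = feature_vector([(8035.0, 12.0), (8042.5, 7.0), (14101.0, 3.0), (20000.0, 99.0)])
print(vec[FEATURE_MZ.index(8034)], vec[FEATURE_MZ.index(8043)], vec[FEATURE_MZ.index(14102)])
```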
The characteristic mass spectrum peak spectrogram of the characteristic polypeptide is shown in figures 7-35.
The training and validation set AUC of the LR model were both greater than 0.99. The accuracy of the test set is 99%, the sensitivity is 98%, and the specificity is 100%. The model has good prediction capability.
Table 2. Model training results
Sample | Number of cases | Predicted as new coronary pneumonia | Predicted as non-new coronary pneumonia | Prediction accuracy (%)
--- | --- | --- | --- | ---
Patient group | 97 | 97 | 0 | 100.00
Normal group | 34 | 0 | 34 | 100.00
Pulmonary tuberculosis group | 19 | 0 | 19 | 100.00
Symptom-like group | 48 | 0 | 48 | 100.00
Total | 198 | | | 100.00
From the above table it can be seen that the results for the training set samples are: 34 of the 34 normal persons were judged correctly, a specificity of 100.00%; 97 of the 97 patients were judged correctly, a sensitivity of 100.00%; 19 of the 19 tuberculosis patients were judged correctly, a specificity of 100.00%; and 48 of the 48 patients with similar symptoms were judged correctly, a specificity of 100.00%.
Example 4 identification of novel coronary pneumonia signature Polypeptides
After the peaks to be identified were determined in Examples 2 and 3, seven serum samples differing in the intensities of those peaks were selected from the pretreated samples. After DTT reduction, proteins with a molecular weight above 50 kDa were removed by ultrafiltration centrifugation. The filtered small proteins/polypeptides were separated by tricine-SDS-PAGE, and each band was subjected to in-gel enzymatic digestion followed by tandem mass spectrometry (MS/MS) identification.
Polypeptide sequences were identified on a nano-LC-MS/MS platform comprising a nanoflow HPLC (Thermo Fisher Scientific, USA) and a Q Exactive mass spectrometer (Thermo Fisher Scientific, USA). Positive ion mode was used, with a scan range of 300-1400 m/z. The resolution was 70000 for the primary mass spectrum (MS1) and 17500 for the secondary mass spectrum (MS2).
Liquid-phase analytical column: model: exil Pure 120C18 (Dr. Maisch GmbH, USA); dimensions: 360 μm × 12 cm; inner diameter: 150 μm; particle size: 1.9 μm. Elution: linear gradient of the mobile phase from 7% solution B (80% acetonitrile, 0.1% formic acid) to 45% solution B. Flow rate: 600 nl/min; total time: 38 minutes. The results are shown in Tables 3 and 4.
Table 3. Identification of the characteristic peak polypeptides
m/z | Gene name | Protein name
--- | --- | ---
5158 | H2AJ | Histone H2A.J
6357 | S100A7 | Protein S100-A7
6654 | IGLL5 | Immunoglobulin lambda-like polypeptide 5
6939 | UBB | Polyubiquitin-B
7364 | IGKV3-7 | Probable non-functional immunoglobulin kappa variable 3-7
7614 | PF4V1 | Platelet factor 4 variant
8034 | IGKV3-15 | Immunoglobulin kappa variable 3-15
8226 | CFI | Complement factor I
8986 | RAB7A | Ras-related protein Rab-7a
9626 | ELANE | Neutrophil elastase
13719 | B2M | Beta-2-microglobulin
13765 | TTR | Transthyretin
13886 | PPBP | Platelet basic protein
14049 | DUSP14 | Dual specificity protein phosphatase 14
14095 | H2AC11 | Histone H2A type 1
14102 | H2AC6 | Histone H2A type 1-C
15123 | HBA1 | Hemoglobin subunit alpha
15867 | HBB | Hemoglobin subunit beta
28091 | WRAP73 | WD repeat-containing protein WRAP73
11435 | SAA1 | Serum amyloid A-1 protein
11495 | SAA2 | Serum amyloid A-2 protein
11523 | SAA1 | Serum amyloid A-1 protein
11680 | SAA1 | Serum amyloid A-1 protein
Table 4. Identified polypeptide sequences
[Table 4 is provided as an image in the original publication.]
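As a rough plausibility check on a peak assignment, the average mass of an identified sequence can be compared with the m/z of the peak it is assigned to. The sketch below does this for SEQ ID No. 1, which this document assigns to the 6939m/z peak. It assumes a singly protonated, unmodified polypeptide and uses standard average residue masses; the function name is illustrative.

```python
# Standard average residue masses (Da), keyed by three-letter code.
AVG_RESIDUE_MASS = {
    "Gly": 57.0513, "Ala": 71.0779, "Ser": 87.0773, "Pro": 97.1152,
    "Val": 99.1311, "Thr": 101.1039, "Cys": 103.1429, "Leu": 113.1576,
    "Ile": 113.1576, "Asn": 114.1026, "Asp": 115.0874, "Gln": 128.1292,
    "Lys": 128.1723, "Glu": 129.1140, "Met": 131.1961, "His": 137.1393,
    "Phe": 147.1739, "Arg": 156.1857, "Tyr": 163.1733, "Trp": 186.2099,
}
WATER = 18.0153   # one water for the free N- and C-termini
PROTON = 1.0073

# SEQ ID No. 1 as given in the sequence listing (61 residues).
SEQ_ID_1 = """
Thr Ile Thr Leu Glu Val Glu Pro Ser Asp Thr Ile Glu Asn Val Lys
Ala Lys Ile Gln Asp Lys Glu Gly Ile Pro Pro Asp Gln Gln Arg Leu
Ile Phe Ala Gly Lys Gln Leu Glu Asp Gly Arg Thr Leu Ser Asp Tyr
Asn Ile Gln Lys Glu Ser Thr Leu His Leu Val Leu Arg
"""


def average_mz(sequence, charge=1):
    """Average m/z of an unmodified polypeptide carrying `charge` protons."""
    residues = sequence.split()
    neutral = sum(AVG_RESIDUE_MASS[r] for r in residues) + WATER
    return (neutral + charge * PROTON) / charge


mz = average_mz(SEQ_ID_1)
print(len(SEQ_ID_1.split()), round(mz, 1), round(abs(mz - 6939) / 6939 * 1000, 2))
```

The last printed value is the relative deviation from 6939m/z in per mille, which can be compared with the 2‰ tolerance used for quality control.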
Example 5 Blind test of the new coronary pneumonia patient screening model
After model training was completed, three models were established: a model whose input variables are the 25 characteristic polypeptides, which include SEQ ID NO. 1-15; a model whose input variables are the 29 characteristic polypeptides, which include SEQ ID NO. 1-19; and a model whose input variables are only the 19 sequenced characteristic polypeptides (i.e., SEQ ID NO. 1-19).
According to the method of Example 3, samples from 49 new coronary pneumonia patients, 12 normal persons, 14 tuberculosis patients and 25 patients with similar symptoms were tested blind: each sample was predicted by the three models and its class was judged, using the same method as described in the above examples. The results are shown in Tables 5-1, 5-2 and 5-3, respectively.
Table 5-1. Prediction results for the test samples using 25 variables
[Table 5-1 is provided as an image in the original publication.]
From Table 5-1, it can be seen that the results for the test group samples are: 12 of the 12 normal persons were judged correctly, a specificity of 100.00%; 48 of the 49 patients were judged correctly, a sensitivity of 97.96%; 14 of the 14 tuberculosis patients were judged correctly, a specificity of 100.00%; and 25 of the 25 patients with similar symptoms were judged correctly, a specificity of 100.00%.
Table 5-2. Prediction results for the test samples using 29 variables
Sample | Number of cases | Predicted as new coronary pneumonia | Predicted as non-new coronary pneumonia | Prediction accuracy (%)
--- | --- | --- | --- | ---
Patient group | 49 | 48 | 1 | 97.96
Normal group | 12 | 0 | 12 | 100.00
Pulmonary tuberculosis group | 14 | 0 | 14 | 100.00
Symptom-like group | 25 | 0 | 25 | 100.00
Total | 100 | | | 99.00
From Table 5-2, it can be seen that the results for the test group samples are: 12 of the 12 normal persons were judged correctly, a specificity of 100.00%; 48 of the 49 patients were judged correctly, a sensitivity of 97.96%; 14 of the 14 tuberculosis patients were judged correctly, a specificity of 100.00%; and 25 of the 25 patients with similar symptoms were judged correctly, a specificity of 100.00%.
As is clear from Tables 5-1 and 5-2, the prediction accuracy of both models on the same 100 samples meets the criteria for clinical diagnosis. The two accuracies are identical, possibly because the number of patient cases available for testing in China is too small for a difference to appear. However, judging from the ten-fold cross-validation accuracy, the mass spectrum diagnosis model using 29 characteristic polypeptides can be expected to show higher accuracy as the number of patients tested increases.
Table 5-3. Prediction results for the test samples using 19 variables
Sample | Number of cases | Predicted as new coronary pneumonia | Predicted as non-new coronary pneumonia | Prediction accuracy (%)
--- | --- | --- | --- | ---
Patient group | 49 | 46 | 3 | 93.88
Normal group | 12 | 0 | 12 | 100.00
Pulmonary tuberculosis group | 14 | 0 | 14 | 100.00
Symptom-like group | 25 | 4 | 21 | 84.00
Total | 100 | | | 93.00
From Table 5-3, it can be seen that the results for the test group samples are: 46 of the 49 new coronary pneumonia patients were judged correctly, a sensitivity of 93.88%; 12 of the 12 normal persons were judged correctly, a specificity of 100.00%; 14 of the 14 tuberculosis patients were judged correctly, a specificity of 100.00%; and 21 of the 25 patients with similar symptoms were judged correctly, a specificity of 84.00%. This indicates that the model built from the 19 characteristic polypeptide input variables has the same specificity for healthy persons and tuberculosis patients as the model with the complete set of variables, and that the other two groups have only a few misjudgments. This model meets the need for rapid clinical screening of confirmed patients.
In addition, as can be seen from the above tables, the blind-test accuracy of the complete set of 29 characteristic polypeptide variables for the new coronary pneumonia group is basically the same as that obtained in model training, while the prediction accuracy for the non-new coronary pneumonia groups reaches 100%. This indicates that, after model training, an experimenter can completely eliminate false positive results through fine optimization, so that a positive result is reliable and missed diagnosis and/or misdiagnosis is avoided to the greatest extent, which is of positive significance.
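The per-group figures for the 19-variable model can be recomputed from the counts in Table 5-3. In the sketch below, the pooled specificity over all non-COVID-19 groups is an additional derived quantity that is not reported in the table; the variable names are illustrative.

```python
# (number of cases, predicted as COVID-19, predicted as non-COVID-19), from Table 5-3.
table_5_3 = {
    "patient": (49, 46, 3),
    "normal": (12, 0, 12),
    "tuberculosis": (14, 0, 14),
    "symptom-like": (25, 4, 21),
}

sensitivity = table_5_3["patient"][1] / table_5_3["patient"][0]

controls = [row for group, row in table_5_3.items() if group != "patient"]
true_negatives = sum(row[2] for row in controls)
pooled_specificity = true_negatives / sum(row[0] for row in controls)

correct = table_5_3["patient"][1] + true_negatives
overall_accuracy = correct / sum(row[0] for row in table_5_3.values())

print(f"{sensitivity:.2%} {pooled_specificity:.2%} {overall_accuracy:.2%}")
```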
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and improvements can be made without departing from the technical principle of the present invention, and these modifications and improvements should also be regarded as the protection scope of the present invention.
Sequence listing
Characteristic polypeptide composition for diagnosing new coronary pneumonia
<120> characteristic polypeptide composition for diagnosing neocoronary pneumonia
<160> 19
<170> SIPOSequenceListing 1.0
<210> 1
<211> 61
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 1
Thr Ile Thr Leu Glu Val Glu Pro Ser Asp Thr Ile Glu Asn Val Lys
1 5 10 15
Ala Lys Ile Gln Asp Lys Glu Gly Ile Pro Pro Asp Gln Gln Arg Leu
20 25 30
Ile Phe Ala Gly Lys Gln Leu Glu Asp Gly Arg Thr Leu Ser Asp Tyr
35 40 45
Asn Ile Gln Lys Glu Ser Thr Leu His Leu Val Leu Arg
50 55 60
<210> 2
<211> 68
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 2
Glu Glu Asp Gly Asp Leu Gln Cys Leu Cys Val Lys Thr Thr Ser Gln
1 5 10 15
Val Arg Pro Arg His Ile Thr Ser Leu Glu Val Ile Lys Ala Gly Pro
20 25 30
His Cys Pro Thr Ala Gln Leu Ile Ala Thr Leu Lys Asn Gly Arg Lys
35 40 45
Ile Cys Leu Asp Leu Gln Ala Leu Leu Tyr Lys Lys Ile Ile Lys Glu
50 55 60
His Leu Glu Ser
65
<210> 3
<211> 74
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 3
Met Glu Ala Pro Ala Gln Leu Leu Phe Leu Leu Leu Leu Trp Leu Pro
1 5 10 15
Asp Thr Thr Gly Glu Ile Val Met Thr Gln Ser Pro Ala Thr Leu Ser
20 25 30
Val Ser Pro Gly Glu Arg Ala Thr Leu Ser Cys Arg Ala Ser Gln Ser
35 40 45
Val Ser Ser Asn Leu Ala Trp Tyr Gln Gln Lys Pro Gly Gln Ala Pro
50 55 60
Arg Leu Leu Ile Tyr Gly Ala Ser Thr Arg
65 70
<210> 4
<211> 69
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 4
Met Lys Leu Leu His Val Phe Leu Leu Phe Leu Cys Phe His Leu Arg
1 5 10 15
Phe Cys Lys Val Thr Tyr Thr Ser Gln Glu Asp Leu Val Glu Lys Lys
20 25 30
Cys Leu Ala Lys Lys Tyr Thr His Leu Ser Cys Asp Lys Val Phe Cys
35 40 45
Gln Pro Trp Gln Arg Cys Ile Glu Gly Thr Cys Val Cys Lys Leu Pro
50 55 60
Tyr Gln Cys Pro Lys
65
<210> 5
<211> 79
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 5
Met Thr Ser Arg Lys Lys Val Leu Leu Lys Val Ile Ile Leu Gly Asp
1 5 10 15
Ser Gly Val Gly Lys Thr Ser Leu Met Asn Gln Tyr Val Asn Lys Lys
20 25 30
Phe Ser Asn Gln Tyr Lys Ala Thr Ile Gly Ala Asp Phe Leu Thr Lys
35 40 45
Glu Val Met Val Asp Asp Arg Leu Val Thr Met Gln Ile Trp Asp Thr
50 55 60
Ala Gly Gln Glu Arg Phe Gln Ser Leu Gly Val Ala Phe Tyr Arg
65 70 75
<210> 6
<211> 91
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 6
Met Thr Leu Gly Arg Arg Leu Ala Cys Leu Phe Leu Ala Cys Val Leu
1 5 10 15
Pro Ala Leu Leu Leu Gly Gly Thr Ala Leu Ala Ser Glu Ile Val Gly
20 25 30
Gly Arg Arg Ala Arg Pro His Ala Trp Pro Phe Met Val Ser Leu Gln
35 40 45
Leu Arg Gly Gly His Phe Cys Gly Ala Thr Leu Ile Ala Pro Asn Phe
50 55 60
Val Met Ser Ala Ala His Cys Val Ala Asn Val Asn Val Arg Ala Val
65 70 75 80
Arg Val Val Leu Gly Ala His Asn Leu Ser Arg
85 90
<210> 7
<211> 119
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 7
Met Ser Arg Ser Val Ala Leu Ala Val Leu Ala Leu Leu Ser Leu Ser
1 5 10 15
Gly Leu Glu Ala Ile Gln Arg Thr Pro Lys Ile Gln Val Tyr Ser Arg
20 25 30
His Pro Ala Glu Asn Gly Lys Ser Asn Phe Leu Asn Cys Tyr Val Ser
35 40 45
Gly Phe His Pro Ser Asp Ile Glu Val Asp Leu Leu Lys Asn Gly Glu
50 55 60
Arg Ile Glu Lys Val Glu His Ser Asp Leu Ser Phe Ser Lys Asp Trp
65 70 75 80
Ser Phe Tyr Leu Leu Tyr Tyr Thr Glu Phe Thr Pro Thr Glu Lys Asp
85 90 95
Glu Tyr Ala Cys Arg Val Asn His Val Thr Leu Ser Gln Pro Lys Ile
100 105 110
Val Lys Trp Asp Arg Asp Met
115
<210> 8
<211> 127
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 8
Gly Pro Thr Gly Thr Gly Glu Ser Lys Cys Pro Leu Met Val Lys Val
1 5 10 15
Leu Asp Ala Val Arg Gly Ser Pro Ala Ile Asn Val Ala Val His Val
20 25 30
Phe Arg Lys Ala Ala Asp Asp Thr Trp Glu Pro Phe Ala Ser Gly Lys
35 40 45
Thr Ser Glu Ser Gly Glu Leu His Gly Leu Thr Thr Glu Glu Glu Phe
50 55 60
Val Glu Gly Ile Tyr Lys Val Glu Ile Asp Thr Lys Ser Tyr Trp Lys
65 70 75 80
Ala Leu Gly Ile Ser Pro Phe His Glu His Ala Glu Val Val Phe Thr
85 90 95
Ala Asn Asp Ser Gly Pro Arg Arg Tyr Thr Ile Ala Ala Leu Leu Ser
100 105 110
Pro Tyr Ser Tyr Ser Thr Thr Ala Val Val Thr Asn Pro Lys Glu
115 120 125
<210> 9
<211> 128
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 9
Met Ser Leu Arg Leu Asp Thr Thr Pro Ser Cys Asn Ser Ala Arg Pro
1 5 10 15
Leu His Ala Leu Gln Val Leu Leu Leu Leu Ser Leu Leu Leu Thr Ala
20 25 30
Leu Ala Ser Ser Thr Lys Gly Gln Thr Lys Arg Asn Leu Ala Lys Gly
35 40 45
Lys Glu Glu Ser Leu Asp Ser Asp Leu Tyr Ala Glu Leu Arg Cys Met
50 55 60
Cys Ile Lys Thr Thr Ser Gly Ile His Pro Lys Asn Ile Gln Ser Leu
65 70 75 80
Glu Val Ile Gly Lys Gly Thr His Cys Asn Gln Val Glu Val Ile Ala
85 90 95
Thr Leu Lys Asp Gly Arg Lys Ile Cys Leu Asp Pro Asp Ala Pro Arg
100 105 110
Ile Lys Lys Ile Val Gln Lys Lys Leu Ala Gly Asp Glu Ser Ala Asp
115 120 125
<210> 10
<211> 123
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 10
Val Pro Leu Ala Asp Met Pro His Ala Pro Ile Gly Leu Tyr Phe Asp
1 5 10 15
Thr Val Ala Asp Lys Ile His Ser Val Ser Arg Lys His Gly Ala Thr
20 25 30
Leu Val His Cys Ala Ala Gly Val Ser Arg Ser Ala Thr Leu Cys Ile
35 40 45
Ala Tyr Leu Met Lys Phe His Asn Val Cys Leu Leu Glu Ala Tyr Asn
50 55 60
Trp Val Lys Ala Arg Arg Pro Val Ile Arg Pro Asn Val Gly Phe Trp
65 70 75 80
Arg Gln Leu Ile Asp Tyr Glu Arg Gln Leu Phe Gly Lys Ser Thr Val
85 90 95
Lys Met Val Gln Thr Pro Tyr Gly Ile Val Pro Asp Val Tyr Glu Lys
100 105 110
Glu Ser Arg His Leu Met Pro Tyr Trp Gly Ile
115 120
<210> 11
<211> 130
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 11
Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys
1 5 10 15
Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His
20 25 30
Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala
35 40 45
Pro Val Tyr Met Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu
50 55 60
Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile
65 70 75 80
Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys
85 90 95
Leu Leu Gly Lys Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile
100 105 110
Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys
115 120 125
Gly Lys
130
<210> 12
<211> 130
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 12
Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys
1 5 10 15
Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His
20 25 30
Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala
35 40 45
Pro Val Tyr Leu Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu
50 55 60
Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile
65 70 75 80
Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys
85 90 95
Leu Leu Gly Arg Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile
100 105 110
Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys
115 120 125
Gly Lys
130
<210> 13
<211> 141
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 13
Val Leu Ser Pro Ala Asp Lys Thr Asn Val Lys Ala Ala Trp Gly Lys
1 5 10 15
Val Gly Ala His Ala Gly Glu Tyr Gly Ala Glu Ala Leu Glu Arg Met
20 25 30
Phe Leu Ser Phe Pro Thr Thr Lys Thr Tyr Phe Pro His Phe Asp Leu
35 40 45
Ser His Gly Ser Ala Gln Val Lys Gly His Gly Lys Lys Val Ala Asp
50 55 60
Ala Leu Thr Asn Ala Val Ala His Val Asp Asp Met Pro Asn Ala Leu
65 70 75 80
Ser Ala Leu Ser Asp Leu His Ala His Lys Leu Arg Val Asp Pro Val
85 90 95
Asn Phe Lys Leu Leu Ser His Cys Leu Leu Val Thr Leu Ala Ala His
100 105 110
Leu Pro Ala Glu Phe Thr Pro Ala Val His Ala Ser Leu Asp Lys Phe
115 120 125
Leu Ala Ser Val Ser Thr Val Leu Thr Ser Lys Tyr Arg
130 135 140
<210> 14
<211> 146
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 14
Val His Leu Thr Pro Glu Glu Lys Ser Ala Val Thr Ala Leu Trp Gly
1 5 10 15
Lys Val Asn Val Asp Glu Val Gly Gly Glu Ala Leu Gly Arg Leu Leu
20 25 30
Val Val Tyr Pro Trp Thr Gln Arg Phe Phe Glu Ser Phe Gly Asp Leu
35 40 45
Ser Thr Pro Asp Ala Val Met Gly Asn Pro Lys Val Lys Ala His Gly
50 55 60
Lys Lys Val Leu Gly Ala Phe Ser Asp Gly Leu Ala His Leu Asp Asn
65 70 75 80
Leu Lys Gly Thr Phe Ala Thr Leu Ser Glu Leu His Cys Asp Lys Leu
85 90 95
His Val Asp Pro Glu Asn Phe Arg Leu Leu Gly Asn Val Leu Val Cys
100 105 110
Val Leu Ala His His Phe Gly Lys Glu Phe Thr Pro Pro Val Gln Ala
115 120 125
Ala Tyr Gln Lys Val Val Ala Gly Val Ala Asn Ala Leu Ala His Lys
130 135 140
Tyr His
145
<210> 15
<211> 257
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 15
Ile Leu Leu Tyr Ser Leu Asp Gly Arg Leu Leu Ser Thr Tyr Ser Ala
1 5 10 15
Tyr Glu Trp Ser Leu Gly Ile Lys Ser Val Ala Trp Ser Pro Ser Ser
20 25 30
Gln Phe Leu Ala Val Gly Ser Tyr Asp Gly Lys Val Arg Ile Leu Asn
35 40 45
His Val Thr Trp Lys Met Ile Thr Glu Phe Gly His Pro Ala Ala Ile
50 55 60
Asn Asp Pro Lys Ile Val Val Tyr Lys Glu Ala Glu Lys Ser Pro Gln
65 70 75 80
Leu Gly Leu Gly Cys Leu Ser Phe Pro Pro Pro Arg Ala Gly Ala Gly
85 90 95
Pro Leu Pro Ser Ser Glu Ser Lys Tyr Glu Ile Ala Ser Val Pro Val
100 105 110
Ser Leu Gln Thr Leu Lys Pro Val Thr Asp Arg Ala Asn Pro Lys Ile
115 120 125
Gly Ile Gly Met Leu Ala Phe Ser Pro Asp Ser Tyr Phe Leu Ala Thr
130 135 140
Arg Asn Asp Asn Ile Pro Asn Ala Val Trp Val Trp Asp Ile Gln Lys
145 150 155 160
Leu Arg Leu Phe Ala Val Leu Glu Gln Leu Ser Pro Val Arg Ala Phe
165 170 175
Gln Trp Asp Pro Gln Gln Pro Arg Leu Ala Ile Cys Thr Gly Gly Ser
180 185 190
Arg Leu Tyr Leu Trp Ser Pro Ala Gly Cys Met Ser Val Gln Val Pro
195 200 205
Gly Glu Gly Asp Phe Ala Val Leu Ser Leu Cys Trp His Leu Ser Gly
210 215 220
Asp Ser Met Ala Leu Leu Ser Lys Asp His Phe Cys Leu Cys Phe Leu
225 230 235 240
Glu Thr Glu Ala Val Val Gly Thr Ala Cys Arg Gln Leu Gly Gly His
245 250 255
Thr
<210> 16
<211> 102
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 16
Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp Met Trp
1 5 10 15
Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser Asp Lys
20 25 30
Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly Pro Gly
35 40 45
Gly Val Trp Ala Ala Glu Ala Ile Ser Asp Ala Arg Glu Asn Ile Gln
50 55 60
Arg Phe Phe Gly His Gly Ala Glu Asp Ser Leu Ala Asp Gln Ala Ala
65 70 75 80
Asn Glu Trp Gly Arg Ser Gly Lys Asp Pro Asn His Phe Arg Pro Ala
85 90 95
Gly Leu Pro Glu Lys Tyr
100
<210> 17
<211> 103
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 17
Ser Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp Met
1 5 10 15
Trp Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser Asp
20 25 30
Lys Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly Pro
35 40 45
Gly Gly Ala Trp Ala Ala Glu Val Ile Ser Asn Ala Arg Glu Asn Ile
50 55 60
Gln Arg Leu Thr Gly Arg Gly Ala Glu Asp Ser Leu Ala Asp Gln Ala
65 70 75 80
Ala Asn Lys Trp Gly Arg Ser Gly Arg Asp Pro Asn His Phe Arg Pro
85 90 95
Ala Gly Leu Pro Glu Lys Tyr
100
<210> 18
<211> 103
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 18
Ser Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp Met
1 5 10 15
Trp Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser Asp
20 25 30
Lys Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly Pro
35 40 45
Gly Gly Val Trp Ala Ala Glu Ala Ile Ser Asp Ala Arg Glu Asn Ile
50 55 60
Gln Arg Phe Phe Gly His Gly Ala Glu Asp Ser Leu Ala Asp Gln Ala
65 70 75 80
Ala Asn Glu Trp Gly Arg Ser Gly Lys Asp Pro Asn His Phe Arg Pro
85 90 95
Ala Gly Leu Pro Glu Lys Tyr
100
<210> 19
<211> 104
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 19
Arg Ser Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp
1 5 10 15
Met Trp Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser
20 25 30
Asp Lys Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly
35 40 45
Pro Gly Gly Val Trp Ala Ala Glu Ala Ile Ser Asp Ala Arg Glu Asn
50 55 60
Ile Gln Arg Phe Phe Gly His Gly Ala Glu Asp Ser Leu Ala Asp Gln
65 70 75 80
Ala Ala Asn Glu Trp Gly Arg Ser Gly Lys Asp Pro Asn His Phe Arg
85 90 95
Pro Ala Gly Leu Pro Glu Lys Tyr
100

Claims (7)

1. A signature polypeptide composition for diagnosing neocoronary pneumonia, the signature polypeptide composition comprising 25 signature polypeptides having the following mass-to-charge ratios: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z, or 29 signature polypeptides having the following mass-to-charge ratios: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z.
2. The composition of claim 1, wherein the composition comprises 19 signature polypeptides having the following mass-to-charge ratios and polypeptide sequences:
a characteristic polypeptide with a mass-to-charge ratio of 6939m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 1;
a characteristic polypeptide with the mass-to-charge ratio of 7614m/z, wherein the polypeptide sequence is selected from the sequence shown in SEQ ID No. 2;
a characteristic polypeptide with the mass-to-charge ratio of 8034m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 3;
a characteristic polypeptide with the mass-to-charge ratio of 8226m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 4;
a characteristic polypeptide with the mass-to-charge ratio of 8986m/z, and the polypeptide sequence is selected from the sequence shown as SEQ ID No. 5;
a characteristic polypeptide with the mass-to-charge ratio of 9626m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 6;
a characteristic polypeptide with the mass-to-charge ratio of 13719m/z, and the polypeptide sequence thereof is selected from the sequence shown as SEQ ID No. 7;
a characteristic polypeptide with the mass-to-charge ratio of 13765m/z, and the polypeptide sequence thereof is selected from the sequence shown as SEQ ID No. 8;
a characteristic polypeptide with the mass-to-charge ratio of 13886m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 9;
a characteristic polypeptide with the mass-to-charge ratio of 14049m/z, and the polypeptide sequence thereof is selected from the sequence shown in SEQ ID No. 10;
a characteristic polypeptide with the mass-to-charge ratio of 14095m/z, and the polypeptide sequence thereof is selected from the sequence shown in SEQ ID No. 11;
a characteristic polypeptide with the mass-to-charge ratio of 14102m/z, and the polypeptide sequence is selected from the sequence shown as SEQ ID No. 12;
a characteristic polypeptide with the mass-to-charge ratio of 15123m/z, and the polypeptide sequence is selected from the sequence shown in SEQ ID No. 13;
a characteristic polypeptide with the mass-to-charge ratio of 15867m/z, and the polypeptide sequence is selected from the sequence shown as SEQ ID No. 14;
a characteristic polypeptide with a mass-to-charge ratio of 28091m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 15;
a characteristic polypeptide with the mass-to-charge ratio of 11435m/z, and the polypeptide sequence thereof is selected from the sequence shown as SEQ ID No. 16;
a characteristic polypeptide with the mass-to-charge ratio of 11495m/z, and the polypeptide sequence thereof is selected from the sequence shown in SEQ ID No. 17;
a characteristic polypeptide with the mass-to-charge ratio of 11523m/z, and the polypeptide sequence of the characteristic polypeptide is selected from the sequence shown as SEQ ID No. 18;
a characteristic polypeptide with the mass-to-charge ratio of 11680m/z, and the polypeptide sequence thereof is selected from the sequence shown in SEQ ID No. 19.
3. The composition of claim 2, wherein when the peaks of signature polypeptides 8986m/z, 28091m/z are up-regulated and the peaks of signature polypeptides 6939m/z, 13886m/z, 14049m/z, 14102m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a new coronary pneumonia patient with a cross-validation accuracy of about 91% in ten folds.
4. The composition of claim 3, wherein the composition of signature polypeptides comprises only signature polypeptides in a mass ratio of 8986m/z, 28091m/z, and 6939m/z, 13886m/z, 14049m/z, 14102m/z, respectively.
5. The composition of claim 2, wherein when the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated and the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a novel coronavirus pneumonia (COVID-19) patient, with a ten-fold cross-validation accuracy of about 93.31%.
6. The composition of claim 5, wherein the characteristic polypeptide composition comprises only the characteristic polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z.
7. The composition of claim 1, wherein when the peaks of the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated and the peaks of the characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a novel coronavirus pneumonia (COVID-19) patient, with a ten-fold cross-validation accuracy of about 98.69%.
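The up-/down-regulation pattern recited in claim 3 can be illustrated with a minimal Python sketch. It is not the applicant's classifier: the claims name the peaks and their direction of change but do not state how regulation is quantified. The reference intensities, the m/z matching tolerance `tol`, and the function names `find_intensity` and `is_positive` are all assumptions introduced here for illustration only.

```python
# Hypothetical sketch of the claim 3 peak pattern. The claims give the peaks and
# their direction of change; the tolerance and reference values are assumptions.

UP_PEAKS = (8986, 28091)                  # claim 3: up-regulated in positive samples
DOWN_PEAKS = (6939, 13886, 14049, 14102)  # claim 3: down-regulated in positive samples


def find_intensity(spectrum, target_mz, tol=3.0):
    """Return the highest intensity within +/- tol of target_mz, or 0.0 if none."""
    hits = [inten for mz, inten in spectrum if abs(mz - target_mz) <= tol]
    return max(hits, default=0.0)


def is_positive(spectrum, reference, tol=3.0):
    """Check the claim 3 pattern against assumed reference intensities.

    spectrum  -- list of (m/z, intensity) pairs for one serum sample
    reference -- dict mapping each panel m/z to an assumed control intensity
    """
    # Every up-panel peak must exceed its reference intensity.
    up_ok = all(find_intensity(spectrum, mz, tol) > reference[mz] for mz in UP_PEAKS)
    # Every down-panel peak must fall below its reference intensity
    # (a missing peak counts as intensity 0.0, i.e. down-regulated).
    down_ok = all(find_intensity(spectrum, mz, tol) < reference[mz] for mz in DOWN_PEAKS)
    return up_ok and down_ok
```

A strict all-peaks rule like this is only one possible reading. The accuracies quoted in the claims come from ten-fold cross-validation, which suggests a trained model evaluated on held-out folds, and such a model would weigh the peaks rather than require every one to move.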
CN202110154026.1A 2021-02-04 2021-02-04 Characteristic polypeptide composition for diagnosing neocoronary pneumonia Pending CN114858903A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110154026.1A CN114858903A (en) 2021-02-04 2021-02-04 Characteristic polypeptide composition for diagnosing neocoronary pneumonia
PCT/CN2021/142821 WO2022166486A1 (en) 2021-02-04 2021-12-30 Characteristic polypeptide composition for diagnosing covid-19

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110154026.1A CN114858903A (en) 2021-02-04 2021-02-04 Characteristic polypeptide composition for diagnosing neocoronary pneumonia

Publications (1)

Publication Number Publication Date
CN114858903A true CN114858903A (en) 2022-08-05

Family

ID=82623162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110154026.1A Pending CN114858903A (en) 2021-02-04 2021-02-04 Characteristic polypeptide composition for diagnosing neocoronary pneumonia

Country Status (2)

Country Link
CN (1) CN114858903A (en)
WO (1) WO2022166486A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116087482A (en) * 2023-02-24 2023-05-09 广州国家实验室 Biomarkers for novel patient progression severity typing for coronavirus infection
WO2023125749A1 (en) * 2021-12-30 2023-07-06 北京毅新博创生物科技有限公司 Method for evaluating whether individual completes vaccination or individual immune change

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004092208A2 (en) * 2003-04-15 2004-10-28 Her Majesty The Queen In Right Of Canada As Represented By The Minister Of Health Sars-related proteins
CN101196526A (en) * 2006-12-06 2008-06-11 许洋 Mass spectrometry reagent kit and method for rapid tuberculosis diagnosis
CN110632326A (en) * 2019-10-01 2019-12-31 北京毅新博创生物科技有限公司 Characteristic protein marker composition for mass spectrometric diagnosis of thalassemia and diagnostic product thereof
CN111455062A (en) * 2020-04-01 2020-07-28 中国人民解放军总医院 Kit and platform for detecting susceptibility genes of novel coronavirus
CN111733228A (en) * 2020-05-29 2020-10-02 武汉市金银潭医院 Marker, reagent or kit for detecting whether novel coronavirus diseases are cured
CN111830120A (en) * 2020-06-10 2020-10-27 北京东西分析仪器有限公司 Kit for identifying new coronavirus by using mass spectrometry system and use method thereof
CN111876524A (en) * 2020-06-22 2020-11-03 江苏康为世纪生物科技有限公司 Primer, probe combination and kit for detecting 34 respiratory pathogenic microorganisms based on multiple PCR-time-of-flight mass spectrometry

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105572355B (en) * 2015-03-13 2018-04-13 中国医学科学院肿瘤医院 Detect the biomarker of the cancer of the esophagus
CN110658252A (en) * 2019-10-01 2020-01-07 长沙湘华质谱医学科技有限公司 Characteristic protein spectrum model for mass spectrum diagnosis of thalassemia and application thereof
CN111366735B (en) * 2020-03-20 2021-07-13 广州市康润生物科技有限公司 Novel early stage coronavirus screening method
CN111394513B (en) * 2020-03-24 2023-05-12 中国医学科学院病原生物学研究所 Novel coronavirus SARS-CoV-2 fluorescent quantitative PCR detection method and application thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"东西分析Ebio Reader 3700飞行时间质谱系统成功创建新冠病毒(COVID-19)肺炎蛋白指纹图谱", 中国仪器仪表, no. 04, 25 April 2020 (2020-04-25), pages 21 *
ROSA M. GOMILA ET AL.: "Rapid classification and prediction of COVID-19 severity by MALDI-TOF mass spectrometry analysis of serum peptidome", MEDRXIV, 3 November 2020 (2020-11-03), pages 1 - 24, XP055956132, DOI: 10.1101/2020.10.30.20223057 *

Also Published As

Publication number Publication date
WO2022166486A1 (en) 2022-08-11

Similar Documents

Publication Publication Date Title
CN112858454B (en) Characteristic polypeptide composition for diagnosing new coronary pneumonia
Chalupová et al. Identification of fungal microorganisms by MALDI-TOF mass spectrometry
Croxatto et al. Applications of MALDI-TOF mass spectrometry in clinical diagnostic microbiology
Addis et al. Proteomics and pathway analyses of the milk fat globule in sheep naturally infected by Mycoplasma agalactiae provide indications of the in vivo response of the mammary epithelium to bacterial infection
WO2022166486A1 (en) Characteristic polypeptide composition for diagnosing covid-19
WO2022166485A1 (en) Kit for diagnosing covid-19
CN111289736A (en) Slow obstructive pulmonary early diagnosis marker based on metabonomics and application thereof
CN101403740B (en) Mass spectrum model used for detecting liver cancer characteristic protein and preparation method thereof
CN103308696A (en) Brucella rapid detection kit based on mass-spectrometric technique
WO2009076425A2 (en) Methods of analyzing wound samples
CN105572355A (en) Biomarker for detecting esophagus cancer
US8412464B1 (en) Methods for detection and identification of cell type
CN111307926B (en) Rapid detection method for brucella vaccine strain infection based on serum
WO2022166494A1 (en) Construction method for mass spectrum model for diagnosing covid-19
WO2022166487A1 (en) Use of characteristic polypeptide composition and mass spectrometry model for preparing covid-19 detection product
WO2022166493A1 (en) Mass spectrometry model comprising marker polypeptides for diagnosing covid-19 pneumonia
US9995751B2 (en) Method for detecting at least one mechanism of resistance to glycopeptides by mass spectrometry
Velichko et al. Classification and identification tasks in microbiology: Mass spectrometric methods coming to the aid
TW202208843A (en) Method of identification of methicillin-resistant staphylococcus aureus
CN113433253A (en) Novel method for detecting Enterobacter sakazakii, application and detection kit
CN116337986B (en) Quick identification method of salmonella kentucky based on MALDI-TOF MS
CN112180013A (en) Intestinal microbial metabolism marker composition for myocardial infarction diagnosis and detection method and application thereof
US11352655B2 (en) Method of identification of methicillin-resistant Staphylococcus aureus
CN114354946A (en) Method for establishing regional human pathogenic bacteria polypeptide quality reference spectrum library
CN116298280A (en) Application of liquid biopsy index in preparation of endometriosis screening and diagnosis products

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination