CN112858454B - Characteristic polypeptide composition for diagnosing new coronary pneumonia - Google Patents

Characteristic polypeptide composition for diagnosing new coronary pneumonia

Info

Publication number
CN112858454B
Authority
CN
China
Prior art keywords
polypeptide
characteristic
mass
leu
seq
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110155492.1A
Other languages
Chinese (zh)
Other versions
CN112858454A (en)
Inventor
廖璞
孙巍
乔亮
吕倩
马庆伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Clin Bochuang Biotechnology Co Ltd
Original Assignee
Beijing Clin Bochuang Biotechnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Clin Bochuang Biotechnology Co Ltd
Publication of CN112858454A
Application granted
Publication of CN112858454B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/62 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode
    • G01N27/626 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using heat to ionise a gas
    • G01N27/628 Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating the ionisation of gases, e.g. aerosols; by investigating electric discharges, e.g. emission of cathode using heat to ionise a gas and a beam of energy, e.g. laser enhanced ionisation
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/569 Immunoassay; Biospecific binding assay; Materials therefor for microorganisms, e.g. protozoa, bacteria, viruses
    • G01N33/56983 Viruses
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6848 Methods of protein analysis involving mass spectrometry
    • G01N33/6851 Methods of protein analysis involving laser desorption ionisation mass spectrometry
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A50/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE in human health protection, e.g. against extreme weather
    • Y02A50/30 Against vector-borne diseases, e.g. mosquito-borne, fly-borne, tick-borne or waterborne diseases whose impact is exacerbated by climate change

Abstract

The invention provides a characteristic polypeptide composition for detecting novel coronavirus pneumonia (COVID-19). The composition comprises 25 characteristic polypeptides with specific mass-to-charge ratios, and whether a sample comes from a patient with novel coronavirus pneumonia can be judged by analyzing the expression of these characteristic polypeptides. The invention also provides a mass spectrum model prepared from the characteristic polypeptide composition, a product for diagnosing novel coronavirus pneumonia, and related applications. The invention proposes, for the first time, searching for combinations of characteristic proteins that differ between patients with novel coronavirus pneumonia and three control groups: normal persons, tuberculosis patients, and patients with symptoms resembling novel coronavirus pneumonia. This breaks with the traditional approach of searching for characteristic polypeptides only between normal persons and patients, and effectively avoids false-positive results caused by infections whose symptoms resemble those of novel coronavirus pneumonia. The method is simple to operate, has a low detection cost and high accuracy, and is expected to be usable for large-scale screening of novel coronavirus pneumonia.

Description

Characteristic polypeptide composition for diagnosing new coronary pneumonia
Technical Field
The invention belongs to the field of detection, and relates to a technology for rapidly detecting novel coronavirus pneumonia by using a time-of-flight mass spectrometry technology.
Background
Coronaviruses are a group of pathogens that mainly cause respiratory and intestinal diseases. The surface of the virus particle carries many regularly arranged protrusions, so that the whole particle resembles a royal crown, hence the name "coronavirus". Besides humans, coronaviruses can infect many mammals, such as pigs, cows, cats, dogs, minks, camels, bats, mice, and hedgehogs, as well as various birds. The novel coronavirus (COVID-19) is a coronavirus strain that had never before been found in humans. Its transmission pattern, infection mechanism, evolution, and mutation behavior are still unclear, which makes prevention and treatment difficult.
To prevent the occurrence and spread of novel coronavirus (COVID-19) pneumonia, to take measures quickly, and to effectively control the development and spread of the epidemic, rapid detection of novel coronavirus pneumonia is particularly important. For a long time, coronaviruses have been identified by traditional microbiological methods, namely morphological, physiological, and biochemical characterization together with serological identification. Although these methods are highly accurate, they take too long (more than ten hours even at the fastest) and cannot meet the need for rapid detection. Nucleic acid detection based on multiplex PCR is of great significance for the early diagnosis of coronavirus infection and for finding the source of infection. Because multiplex PCR targets several genes, its false-negative rate is lower than that of single PCR. However, PCR detection has a complicated workflow, a relatively high cost, and limited throughput.
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is a mass spectrometry technique that was invented in the late 1980s and has developed rapidly since. Its mass analyzer is an ion drift tube. Ions generated by the ion source are first collected, so that all ions in the collector have zero velocity. They are then accelerated by a pulsed electric field, enter the field-free drift tube, and fly toward the ion receiver at constant velocity. The larger the mass of an ion, the longer it takes to reach the receiver; the smaller the mass, the shorter the time. On this principle, ions of different masses can be separated according to their mass-to-charge ratio, and the molecular mass and purity of biomacromolecules such as polypeptides, proteins, nucleic acids, and polysaccharides can be measured accurately. The technique offers high accuracy, great flexibility, high throughput, a short detection cycle, and good cost-effectiveness.
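The time-of-flight principle described above can be illustrated with a short calculation. The sketch below assumes an idealized linear analyzer in which an ion of charge z starts at rest, gains kinetic energy z·e·U from the accelerating voltage U, and then drifts a length L at constant velocity, so that t = L·sqrt(m / (2·z·e·U)). The accelerating voltage (20 kV) and tube length (1 m) are illustrative assumptions, not values taken from this document.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da, charge=1, accel_voltage=20_000.0, tube_length=1.0):
    """Drift time in seconds for an ion in an idealized linear TOF analyzer.

    ASSUMPTIONS: the ion starts at rest, all of the energy z*e*U becomes
    kinetic energy, and the drift tube is field-free. The default voltage
    and tube length are hypothetical.
    """
    mass_kg = mass_da * DALTON
    kinetic_energy = charge * ELEMENTARY_CHARGE * accel_voltage
    velocity = math.sqrt(2.0 * kinetic_energy / mass_kg)
    return tube_length / velocity


# Heavier singly charged ions arrive later than lighter ones.
for mz in (5158, 14102, 28232):
    print(mz, f"{flight_time(mz) * 1e6:.1f} microseconds")
```

Because the flight time scales with the square root of the mass-to-charge ratio, quadrupling the mass doubles the drift time in this idealized model.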
In recent years, mass spectrometry techniques have emerged for detecting polypeptides characteristic of pathogenic microorganisms or viruses. For example, Chinese patent application CN102337223A, "penicillium chrysogenum antifungal protein Pc-Arctin and its preparation method", discloses a MALDI-TOF identification method for the Penicillium chrysogenum antifungal protein Pc-Arctin. Spores of Penicillium chrysogenum A096 are picked from a plate and inoculated into SGY broth for culture, and the culture is pretreated to obtain a crude protein solution. The solution is separated and purified on a chromatographic column and then on a carboxymethyl cation-exchange column, and each eluted fraction is collected, ultrafiltered, and concentrated to the required volume. Paecilomyces variotii is used as the sensitive indicator organism to track the antifungal active fractions, and the purity of the protein in the active fraction is assessed. A single band is cut from the SDS-PAGE gel and identified by MALDI-TOF. This method is suitable only for a specific microorganism, requires several protein purification steps before the characteristic polypeptide Pc-Arctin is finally identified by MALDI-TOF, is complicated, has a narrow range of application, and cannot be used to detect viruses by mass spectrometry.
Chinese patent applications 201110154723, "MALDI TOF MS assisted identification Listeria monocytogenes", and 201110154469, "MALDI TOF MS assisted identification Vibrio cholerae", disclose methods for the assisted identification of bacteria by MALDI-TOF MS. The methods comprise pretreating the bacterial culture, collecting MALDI-TOF MS spectra of all strain samples, preparing standard bacterial spectra with software, acquiring the spectrum of the bacterium to be tested in the same way, comparing the two spectra, and judging by the matching score. These methods use conventional sample treatment (treatment with absolute ethanol, formic acid, and acetonitrile, assisted by centrifugation, with the supernatant finally drawn off for detection). Although the resulting spectrum characterizes the bacterium to some extent, the analyte contains proteins, lipids, lipopolysaccharides, lipooligosaccharides, DNA, polypeptides, and other ionizable molecules, so the spectrum obtained is essentially a superposition of the spectra of all these molecules. The amount of spectral information to be processed and compared is therefore excessive, and because so many kinds of molecules are measured, the spectrum is not very characteristic. The methods are thus suitable only for specific bacteria and cannot be extended to large-scale virus detection.
Chinese patent application 200880121570, entitled "method and biomarker for diagnosing and monitoring psychiatric disorders", reports that nearly a hundred biological peptides related to psychiatric disorders, including influenza virus, can be detected by MALDI-TOF mass spectrometry. However, the application merely lists various possible techniques and reports neither a specific protocol nor a specific coronavirus target, so it gives researchers little practical guidance for detecting influenza by MALDI-TOF mass spectrometry.
Therefore, there is currently a need for a characteristic polypeptide mass spectrum model for detecting novel coronavirus pneumonia by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), and for applications of such a model.
Disclosure of Invention
The first object of the present invention is to provide a composition of characteristic polypeptides, based on the serum peptidome, with which the novel coronavirus (COVID-19) can be detected by MALDI-TOF mass spectrometry. The characteristic polypeptide composition comprises 25 characteristic polypeptides having the following mass-to-charge ratios: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, and 28232m/z.
In one embodiment, when the peaks of characteristic polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, and 14102m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is determined to be a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 91%. In a preferred embodiment, the characteristic polypeptide composition comprises only the characteristic polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, and 14102m/z.
In another embodiment, when the peaks of characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 93.88%. In a preferred embodiment, the characteristic polypeptide composition comprises only the characteristic polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z.
In other embodiments, when the peaks of characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 97.96%.
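The three up/down rules in the embodiments above can be written down as data plus one decision function. The following is only a sketch: the peak lists are copied from the text, but the document does not say how "up-regulated" and "down-regulated" are quantified, so the code assumes a simple comparison against a reference intensity for each peak. The names `PANELS`, `is_positive`, `sample`, and `reference` are hypothetical.

```python
# Peak panels copied from the embodiments above; values are nominal m/z.
PANELS = {
    "6-peak": {
        "up": (8986, 28091),
        "down": (6939, 13886, 14049, 14102),
    },
    "15-peak": {
        "up": (7614, 8034, 8226, 8986, 9626, 15123, 15867, 28091),
        "down": (6939, 13719, 13765, 13886, 14049, 14095, 14102),
    },
    "25-peak": {
        "up": (5158, 5366, 5893, 7364, 7614, 8034, 8043, 8226, 8425, 8560,
               8986, 9626, 15123, 15867, 28091),
        "down": (6357, 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102,
                 28232),
    },
}


def is_positive(sample, reference, panel_name):
    """Apply the up/down rule of one panel to a dict of peak intensities.

    ASSUMPTION: "up-regulated" means sample intensity > reference intensity
    and "down-regulated" means sample intensity < reference intensity.
    The document does not define the comparison; this is illustrative.
    """
    panel = PANELS[panel_name]
    up_ok = all(sample[mz] > reference[mz] for mz in panel["up"])
    down_ok = all(sample[mz] < reference[mz] for mz in panel["down"])
    return up_ok and down_ok


# Hypothetical normalized intensities, for illustration only.
reference = {mz: 1.0 for p in PANELS.values() for mz in p["up"] + p["down"]}
sample = dict(reference)
for mz in PANELS["6-peak"]["up"]:
    sample[mz] = 1.8
for mz in PANELS["6-peak"]["down"]:
    sample[mz] = 0.4
print(is_positive(sample, reference, "6-peak"))
```

Keeping the panels as data makes it easy to check that they contain 6, 15, and 25 peaks respectively and that the smaller panels are subsets of the 25-peak panel.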
The second object of the invention is to provide a mass spectrum model for detecting novel coronavirus pneumonia, which is prepared from the characteristic polypeptide composition with the mass-to-charge ratio peaks of any of the above schemes.
In one embodiment, the mass spectrum model is prepared from characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, and 28232m/z. When the peaks of characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 97.96%.
In another embodiment, the mass spectrum model is prepared only from the characteristic polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z. When the peaks of characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 93.88%.
In other embodiments, the mass spectrum model is prepared only from the characteristic polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, and 14102m/z. When the peaks of characteristic polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 91%.
The third object of the invention is to provide a kit for detecting novel coronavirus pneumonia, which comprises the above characteristic polypeptide composition or the above mass spectrum model.
In one embodiment, the polypeptide composition or mass spectrum model is prepared from characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, and 28232m/z. When the peaks of characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 97.96%.
In another embodiment, the polypeptide composition or mass spectrum model is prepared only from the characteristic polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z. When the peaks of characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 93.88%.
In other embodiments, the polypeptide composition or mass spectrum model is prepared only from the characteristic polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, and 14102m/z. When the peaks of characteristic polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 91%.
In one embodiment, the kit includes a sample processing solution developed by Xinbo Biotech limited, Byjest.
In another embodiment, the kit further comprises standard mass spectrum sample tubes for verifying that the molecular weights measured by the mass spectrometer are accurate. These may be several tubes each containing a single characteristic polypeptide, or one tube containing several characteristic polypeptides. During mass spectrometry, the standard is tested in parallel with the sample to be tested in order to judge whether the molecular weight information obtained for the sample is accurate and reliable.
In another embodiment, the kit may contain software or a chip holding a standard database of the characteristic polypeptides. When a sample to be tested is subjected to mass spectrometry, it provides standard data or curves for comparison, so that the expression status of the characteristic polypeptides in the sample can be determined.
The fourth object of the invention is to provide use of the characteristic polypeptide composition or the mass spectrum model in preparing products for diagnosing novel coronavirus pneumonia.
In one embodiment, the polypeptide composition or mass spectrum model is prepared from characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, and 28232m/z. When the peaks of characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 97.96%.
In another embodiment, the polypeptide composition or mass spectrum model is prepared only from the characteristic polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z. When the peaks of characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 93.88%.
In other embodiments, the polypeptide composition or mass spectrum model is prepared only from the characteristic polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, and 14102m/z. When the peaks of characteristic polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 91%.
In any of the above embodiments, the product for diagnosing novel coronavirus pneumonia refers to any conventional product for that purpose, including detection reagents, detection chips, detection carriers, detection kits, and the like.
The fifth object of the present invention is to provide a method for constructing a mass spectrum model, comprising:
1) collecting serum samples from multiple clinically confirmed patients with novel coronavirus pneumonia and from control persons without novel coronavirus pneumonia (including tuberculosis patients, patients with similar symptoms such as fever and cough, and healthy people), and freezing them at low temperature for later use;
2) pretreating the serum proteins before mass spectrometry;
3) performing mass spectrometric detection on the two groups of pretreated serum proteins to obtain fingerprint spectra of the two groups of serum polypeptides;
4) standardizing the serum polypeptide fingerprint spectra of all patients and normal persons, and collecting the data;
5) performing quality control on the data obtained, screening out the characteristic polypeptides with the following mass-to-charge ratio peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, and 28232m/z, identifying the characteristic polypeptides by secondary mass spectrometry, and establishing a mass spectrum model for detecting novel coronavirus pneumonia based on these mass-to-charge ratio peaks.
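Steps 4) and 5) above can be pictured with a small sketch. The document does not state which standardization is applied to the fingerprint spectra or how an observed peak is assigned to a nominal mass-to-charge ratio, so the code below makes two loudly labeled assumptions: intensities are normalized to the total intensity of the spectrum, and a peak is matched to a target when it lies within a relative tolerance of 0.002 (borrowed from the quality control shift limit stated elsewhere in this description). The function names `normalize` and `match_peak` are hypothetical.

```python
def normalize(peaks):
    """Scale intensities so that they sum to 1.

    ASSUMPTION: total-intensity normalization; the document only says the
    fingerprint spectra are "standardized".
    `peaks` is a list of (mz, intensity) tuples.
    """
    total = sum(intensity for _, intensity in peaks)
    if total <= 0:
        raise ValueError("spectrum has no signal")
    return [(mz, intensity / total) for mz, intensity in peaks]


def match_peak(peaks, target_mz, rel_tol=0.002):
    """Return the most intense (mz, intensity) peak near target_mz, or None.

    ASSUMPTION: a peak matches when |mz - target| <= rel_tol * target.
    """
    window = rel_tol * target_mz
    candidates = [p for p in peaks if abs(p[0] - target_mz) <= window]
    return max(candidates, key=lambda p: p[1], default=None)


# Hypothetical peak list, for illustration only.
raw = [(8980.0, 50.0), (8990.0, 30.0), (14102.0, 20.0)]
spectrum = normalize(raw)
print(match_peak(spectrum, 8986))
print(match_peak(spectrum, 28091))
```

Matching by relative rather than absolute tolerance keeps the window proportional to the mass, which is one common convention; an actual implementation may use a different rule.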
In one embodiment, the mass spectrum model of step 5) is prepared only from the characteristic polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z. When the peaks of characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 93.88%.
In another embodiment, the mass spectrum model of step 5) is prepared only from the characteristic polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, and 14102m/z. When the peaks of characteristic polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a patient with novel coronavirus pneumonia; the ten-fold cross-validation accuracy is about 91%.
In any of the above embodiments, the pretreatment in step 2) comprises diluting the sample with the sample processing solution so that the serum proteins or polypeptides in it are stabilized.
In any of the above embodiments, in step 3), a universal polypeptide mass spectrometry pretreatment kit is used to dilute the two groups of serum proteins before reading, so as to obtain fingerprint spectra of the two groups of serum polypeptides.
In any of the above embodiments, the quality control in step 5) includes a blank matrix check: the crystallized spot of the blank matrix is measured with the same mass spectrometry parameters, and if a significant mass spectrum peak appears, the quality of the matrix solution is considered unqualified.
In any of the above embodiments, the following 8 characteristic peaks are selected as quality control peaks in the quality control of step 5): 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, and 9700m/z.
Furthermore, in any embodiment of any of the above objects, the characteristic polypeptide composition, the mass spectrum model, the detection product, the use, and the construction method may involve a composition comprising only the 15 characteristic polypeptides having the following mass-to-charge ratios and polypeptide sequences:
a characteristic polypeptide with a mass-to-charge ratio of 6939m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 1;
a characteristic polypeptide with a mass-to-charge ratio of 7614m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 2;
a characteristic polypeptide with a mass-to-charge ratio of 8034m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 3;
a characteristic polypeptide with a mass-to-charge ratio of 8226m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 4;
a characteristic polypeptide with a mass-to-charge ratio of 8986m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 5;
a characteristic polypeptide with a mass-to-charge ratio of 9626m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 6;
a characteristic polypeptide with a mass-to-charge ratio of 13719m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 7;
a characteristic polypeptide with a mass-to-charge ratio of 13765m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 8;
a characteristic polypeptide with a mass-to-charge ratio of 13886m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 9;
a characteristic polypeptide with a mass-to-charge ratio of 14049m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 10;
a characteristic polypeptide with a mass-to-charge ratio of 14095m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 11;
a characteristic polypeptide with a mass-to-charge ratio of 14102m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 12;
a characteristic polypeptide with a mass-to-charge ratio of 15123m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 13;
a characteristic polypeptide with a mass-to-charge ratio of 15867m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 14;
a characteristic polypeptide with a mass-to-charge ratio of 28091m/z, whose polypeptide sequence is selected from the sequence shown in SEQ ID No. 15.
When a biological sample is analyzed by time-of-flight mass spectrometry, the quality of the mass spectrum is affected by many factors, such as individual differences, sample quality, changes in ambient temperature and humidity, and the crystallization state of the sample and matrix. To keep abnormal spectra from influencing the analysis, the invention introduces 8 characteristic peaks common in human serum as quality control peaks; their appearance is unrelated to whether the patient has novel coronavirus pneumonia. Of the 843 spectra collected, 683 contained all 8 quality control peaks (81.0% of all spectra) and 156 contained 7 quality control peaks (18.5% of all spectra). The spectrum quality control condition is set as follows: in the spectrum of a single sample, quality control is passed when 6 to 8 quality control peaks are detected and the relative molecular weight shift of the internal standard peaks is less than 0.002 (i.e., the shift does not exceed 2 per thousand). Spectra that fail must be re-acquired.
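The quality control condition can be expressed as a short check. The sketch below is one possible reading, not the authors' implementation: it assumes that a quality control peak counts as detected when the nearest observed peak lies within a relative shift of 0.002 of its nominal mass-to-charge ratio, and that a spectrum passes when at least 6 of the 8 peaks are detected. The names `QC_PEAKS`, `qc_peaks_detected`, and `passes_qc` are hypothetical.

```python
# The 8 quality control peaks listed in this description (nominal m/z).
QC_PEAKS = (6426, 6623, 8753, 8785, 8904, 9118, 9409, 9700)


def qc_peaks_detected(observed_mz, max_rel_shift=0.002):
    """Return the nominal QC peaks that have a nearby observed peak.

    ASSUMPTION: "detected" means the nearest observed m/z deviates from
    the nominal value by a relative shift below max_rel_shift.
    """
    detected = []
    for nominal in QC_PEAKS:
        nearest = min(observed_mz, key=lambda mz: abs(mz - nominal),
                      default=None)
        if nearest is None:
            continue
        if abs(nearest - nominal) / nominal < max_rel_shift:
            detected.append(nominal)
    return detected


def passes_qc(observed_mz, min_peaks=6):
    """A spectrum passes when at least min_peaks QC peaks are detected."""
    return len(qc_peaks_detected(observed_mz)) >= min_peaks


# Hypothetical observed peak positions, for illustration only.
good = [6427.0, 6622.0, 8754.0, 8786.0, 8905.0, 9119.0]
poor = [6427.0, 6622.0, 8754.0, 9500.0]
print(passes_qc(good), passes_qc(poor))
```

Note that 8753m/z and 8785m/z are close enough that their tolerance windows overlap slightly, so a production check would probably assign each observed peak to at most one quality control peak.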
Ten-fold cross-validation (10-fold cross-validation) is a commonly used method for testing the accuracy of an algorithm. The data set is divided into ten parts; in turn, 9 parts are used as training data and 1 part as test data. Each trial yields an accuracy (or error rate), and the average of the 10 results is used as an estimate of the accuracy of the algorithm. In general, 10-fold cross-validation is repeated several times (for example, 10 rounds of 10-fold cross-validation) and the results are averaged to give the estimate. It should be noted that the ten-fold cross-validation accuracy correlates with, but is not equivalent to, the actual detection accuracy (or sensitivity). When the algorithm is evaluated, if the ten-fold cross-validation accuracy falls within the confidence interval, changes in a correlated way with the number of characteristic polypeptides, and reaches a value that is feasible for clinical diagnosis, the mass spectrum model constructed from those polypeptides is taken to meet the requirements of clinical diagnosis.
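The procedure just described can be sketched in plain Python. The splitting and averaging follow the text; everything else is an assumption made for illustration, in particular the toy one-feature threshold classifier (`fit_threshold`, `predict_threshold`) and the synthetic data, neither of which comes from this document.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Average the per-fold accuracy over k train/test splits."""
    accuracies = []
    for train_idx, test_idx in k_fold_splits(len(labels), k, seed):
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Toy classifier (hypothetical): threshold halfway between class means.
def fit_threshold(xs, ys):
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2.0


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Synthetic, perfectly separable data, for illustration only.
features = [float(i) for i in range(20)] + [float(i) for i in range(100, 120)]
labels = [0] * 20 + [1] * 20
print(cross_validated_accuracy(features, labels, fit_threshold,
                               predict_threshold))
```

Repeating the call with several different `seed` values and averaging the results corresponds to the repeated 10-fold cross-validation mentioned in the text.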
The invention screens out the corresponding new coronary pneumonia markers and, in combination with a bioinformatics method, establishes a detection model for analysis and detection. The bioinformatics method comprises standardizing the fingerprint spectra, applying experimental quality control to the acquired data, screening the expected serum characteristic polypeptides and establishing a mass spectrum model; optionally, the mass spectrum model is established and verified using an algorithm such as LR. Experimental quality control means retaining only mass spectra in which no fewer than 6 internal standard peaks are detected and performing secondary calibration of the spectra with the internal standard peaks.
Technical effects
1. The invention detects serum samples using a combination of multiple characteristic proteins that differ between new coronary pneumonia patients and normal persons, tuberculosis patients, and control patients with symptoms resembling new coronary pneumonia, and processes the data by combining traditional statistics with modern bioinformatics methods. This yields polypeptide fingerprint detection models that distinguish new coronary pneumonia patients from healthy persons and control patients, and the series of protein mass-to-charge ratio peaks discovered provides a basis and resource for finding new and better markers.
2. Compared with existing detection methods, the method has higher sensitivity and specificity, simple operation, low detection cost and high throughput, and is expected to be used for large-scale screening of new coronary pneumonia.
3. The construction method of the model is reasonable and feasible in design, provides a new screening method for improving the clinical cure rate of new coronary pneumonia, and also provides a new idea for exploring the mechanism of the occurrence and development of new coronary pneumonia.
4. The invention is the first to search for combinations of characteristic proteins that differ among 146 patients with confirmed new coronary pneumonia, 46 normal persons, 33 tuberculosis patients and 73 controls with symptoms resembling new coronary pneumonia. This breaks with the traditional approach of searching for characteristic polypeptides only between normal persons and new coronary pneumonia patients, and effectively avoids false positive results from infections whose symptoms resemble those of new coronary pneumonia.
5. The results show that the serum peptidomic characteristic polypeptide model can be used to rapidly screen for patients with new coronary pneumonia in the population.
Drawings
FIG. 1: Comparison of serum polypeptide fingerprints of different groups (healthy group, tuberculosis group, similar-symptom group and new coronary pneumonia patient group); from top to bottom: negative healthy person, negative tuberculosis, negative similar symptoms and positive new coronary pneumonia patient.
FIG. 2-1: The 20 peaks with the highest repetition frequency in LASSO. FIG. 2-2: The 20 peaks with the highest VIP (variable importance in projection) values in PLS-DA.
FIG. 2-3: The 10 peaks with the highest cross-validation accuracy in RFECV.
FIG. 3: Intensity of each characteristic peak in the training set; the left columns are the negative control group and the right columns are the positive group.
FIG. 4-1: Comparison of training set ROC curves for the various machine learning methods. FIG. 4-2: Comparison of test set ROC curves.
FIG. 5: Confusion matrix of the test set, comparing the true grouping with the predicted results.
FIG. 6: Workflow for establishing the characteristic polypeptide mass spectrum model for rapid screening of patients with new coronary pneumonia (COVID-19).
FIG. 7: Mass spectrum peak of the characteristic polypeptide m/z 5157.6; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 8: Mass spectrum peak of the characteristic polypeptide m/z 5366.2; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 9: Mass spectrum peak of the characteristic polypeptide m/z 5892.9; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 10: Mass spectrum peak of the characteristic polypeptide m/z 6357.4; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 11: Mass spectrum peak of the characteristic polypeptide m/z 6654.0; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 12: Mass spectrum peak of the characteristic polypeptide m/z 6939.1; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 13: Mass spectrum peak of the characteristic polypeptide m/z 7364.2; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 14: Mass spectrum peak of the characteristic polypeptide m/z 7614.2; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 15: Mass spectrum peak of the characteristic polypeptide m/z 8034.3; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 16: Mass spectrum peak of the characteristic polypeptide m/z 8042.7; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 17: Mass spectrum peak of the characteristic polypeptide m/z 8226.4; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 18: Mass spectrum peak of the characteristic polypeptide m/z 8424.9; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 19: Mass spectrum peak of the characteristic polypeptide m/z 8559.8; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 20: Mass spectrum peak of the characteristic polypeptide m/z 8986.1; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 21: Mass spectrum peak of the characteristic polypeptide m/z 9626.4; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 22: Mass spectrum peak of the characteristic polypeptide m/z 13719.2; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 23: Mass spectrum peak of the characteristic polypeptide m/z 13765.2; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 24: Mass spectrum peak of the characteristic polypeptide m/z 13886.1; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 25: Mass spectrum peak of the characteristic polypeptide m/z 14049.4; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 26: Mass spectrum peak of the characteristic polypeptide m/z 14094.7; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 27: Mass spectrum peak of the characteristic polypeptide m/z 14101.8; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 28: Mass spectrum peak of the characteristic polypeptide m/z 15123.4; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 29: Mass spectrum peak of the characteristic polypeptide m/z 15866.5; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 30: Mass spectrum peak of the characteristic polypeptide m/z 28091.4; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
FIG. 31: Mass spectrum peak of the characteristic polypeptide m/z 28231.5; the upper panel is the non-COVID-19 control spectrum and the lower panel is the COVID-19 spectrum.
Detailed Description
The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
EXAMPLE 1 sample treatment
Serum samples were collected from 146 patients diagnosed with new coronary pneumonia at a hospital in Chongqing in February 2020. All patients tested positive by nucleic acid detection and were classified strictly according to the guidelines.
Classification followed the criteria below:
(1) Mild type: clinical symptoms are slight and imaging shows no signs of pneumonia;
(2) Common type: fever and respiratory symptoms are present and imaging shows signs of pneumonia;
(3) Severe type: dyspnea; respiratory rate ≥ 30 breaths/min; resting oxygen saturation ≤ 93%; arterial partial pressure of oxygen (PaO2)/fraction of inspired oxygen (FiO2) ≤ 300 mmHg;
(4) Critical type: respiratory failure requiring mechanical ventilation; shock; failure of other organs requiring treatment in the ICU.
The 152 serum samples without new coronary pneumonia used as controls were collected at a hospital in Chongqing in March 2020 and comprised 46 normal persons, 33 tuberculosis patient controls and 73 controls with symptoms resembling new coronary pneumonia.
All blood samples were drawn in the morning in the fasting state into additive-free vacuum serum collection tubes, centrifuged at 2,264 g for 10 min and incubated at 56 ℃ for 30 min, and the serum samples were frozen at -80 ℃.
Mass spectrometry pretreatment of serum samples: before the mass spectrometric detection experiment, one tube of each aliquoted serum sample was taken from the low-temperature freezer and thawed on wet ice for 60-90 min. 5 μL of serum was mixed with 45 μL of sample treatment solution and vortexed at 1200 rpm for 30 s; 10 μL of the treated sample solution was mixed with 10 μL of the prepared matrix solution and vortexed at 1200 rpm for 30 s; 1 μL of the mixture was spotted onto the target plate, with three replicates per sample, and analyzed by mass spectrometry after air drying.
Example 2 establishment of Mass Spectrometry model for MALDI-TOF-MS
(I) sample preparation
5 μL of serum from each sample was diluted in 45 μL of sample treatment solution (Bioyong Technologies Inc.). Then 10 μL of the diluted serum was removed and mixed with 10 μL of matrix solution (Bioyong Technologies Inc.).
2 μL of the mixture was spotted onto a stainless steel target plate. After drying at room temperature, the target plate was loaded into a MALDI-TOF MS mass spectrometer (Clin-TOF-II; Bioyong Technologies Inc.). Each sample was tested in parallel 3 times.
The matrix-assisted laser desorption/ionization time-of-flight mass spectrometer Clin-TOF and the universal pretreatment kit for polypeptide mass spectrometry used in the experiments were developed by Bioyong (China). The data were preprocessed with the MALDIquant program: the data were square-root transformed, smoothed by filter fitting, and baseline corrected. The mass spectrometer was calibrated with a mixture of polypeptides and proteins of known molecular weight; the mass drift of the calibrant should be within 500 ppm. 500 spectra were acquired for each sample spot. The molecular weight acquisition range was m/z 3000-30000.
The mass spectra of the different sample groups are shown in FIG. 1 (comparison of serum polypeptide fingerprints of different groups; from top to bottom: negative healthy person, negative tuberculosis, negative similar symptoms and positive new coronary pneumonia patient). In the negative healthy person spectrum, the peak intensities at 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z and 28091m/z are lower, while the peak intensities at 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are higher. In the negative tuberculosis spectrum, the peak intensities at 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z and 28091m/z are lower, while the peak intensities at 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are higher. In the negative similar-symptom spectrum, the peak intensities at 5158m/z, 5366m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z and 28091m/z are lower, while the peak intensities at 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are higher. In the positive new coronary pneumonia patient spectrum, the peak intensities at 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z and 28091m/z are higher, while the peak intensities at 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are lower.
(II) Mass Spectrometry data acquisition
A Clin-TOF mass spectrometer was used. The laser energy was set appropriately and data were acquired from the sample crystallization spot. For each sample spot, 50 laser positions were selected and each position was fired 10 times, so that each sample crystallization spot received 500 laser shots, and the spectrum was collected. Laser frequency: 30 Hz. Data acquisition range: 3-30 kDa. Before each sample crystallization spot was acquired, external calibration was performed with standards, with an average molecular weight deviation of less than 500 ppm.
Experiment quality control:
(1) A blank matrix crystallization spot is measured with the same mass spectrometry parameters; if an obvious mass spectrum peak appears, the matrix solution is considered unqualified and must be replaced with fresh matrix.
(2) During external calibration with standards, the mass deviation at different calibrant spots must not exceed 500 ppm, and 5 calibrant peaks must meet this requirement simultaneously.
(3) Eight polypeptide peaks native to serum are selected as internal standard quality control peaks. If 6-8 internal standard peaks are detected and their molecular weight deviation does not exceed 2‰, the spectrum is qualified; otherwise the spectrum must be acquired again. The internal standard peaks are: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z.
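The quality control rules in items (2) and (3) can be sketched in code. This is a minimal sketch under stated assumptions: the function names are hypothetical, a standard counts as "detected" when any peak lies within a relative deviation of 0.002 of its nominal m/z, and the example peak list is invented for illustration.

```python
# Internal standard peaks (m/z) listed in item (3).
INTERNAL_STANDARD_MZ = [6426, 6623, 8753, 8785, 8904, 9118, 9409, 9700]

def ppm_error(observed_mz, expected_mz):
    """Mass deviation of a calibrant peak in parts per million."""
    return (observed_mz - expected_mz) / expected_mz * 1e6

def matched_standards(peak_mz, standards=INTERNAL_STANDARD_MZ, rel_tol=0.002):
    """Internal standards that have a detected peak within rel_tol (2 per thousand)."""
    return [
        ref for ref in standards
        if any(abs(mz - ref) / ref <= rel_tol for mz in peak_mz)
    ]

def spectrum_passes_qc(peak_mz, min_matched=6):
    """A spectrum is qualified when at least 6 of the 8 standards are found."""
    return len(matched_standards(peak_mz)) >= min_matched

# Invented peak list: six standards are present, 9409 and 9700 are missing.
peaks = [6427.1, 6620.0, 8750.2, 8786.0, 8905.5, 9120.0, 9500.0]
print(matched_standards(peaks))
print(spectrum_passes_qc(peaks))       # six standards matched, so qualified
print(spectrum_passes_qc(peaks[:5]))   # only five matched, so re-acquire
```

A relative tolerance is used rather than an absolute one because the allowed deviation is stated per thousand of the molecular weight, so it grows with m/z.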
(III) preprocessing of raw data
Internal standard secondary calibration was performed on the MALDI-TOF raw data using internal standard calibration software, and the calibrated data were saved as txt files. The internal standard peaks used were: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z. The spectra were then processed with the MALDIquant program; processing included smoothing, baseline correction and molecular weight calibration. Peak detection was performed with a signal-to-noise ratio of 3. Peaks were binned with the binPeaks command using a tolerance of 0.002. Peaks present in at least 25% of the spectra within a group were retained. The resulting peak intensity matrix was used for the following analysis.
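The binning and frequency filtering steps can be illustrated with a simplified sketch. It is not the binPeaks algorithm of MALDIquant: the greedy grouping below, the function names and the toy peak lists are assumptions chosen only to show the idea of merging peaks within a relative tolerance of 0.002 and keeping bins seen in at least 25% of spectra.

```python
from statistics import mean

def bin_peaks(spectra, rel_tol=0.002):
    """Greedy binning: walk all m/z values in ascending order and start a new
    bin whenever a value lies more than rel_tol from the current bin mean."""
    tagged = sorted((mz, idx) for idx, peaks in enumerate(spectra) for mz in peaks)
    bins = []  # each bin is a list of (mz, spectrum_index) pairs
    for mz, idx in tagged:
        if bins:
            centre = mean(m for m, _ in bins[-1])
            if abs(mz - centre) / centre <= rel_tol:
                bins[-1].append((mz, idx))
                continue
        bins.append([(mz, idx)])
    return bins

def frequent_bins(spectra, min_frequency=0.25, rel_tol=0.002):
    """Centres of the bins found in at least min_frequency of the spectra."""
    kept = []
    for members in bin_peaks(spectra, rel_tol):
        n_spectra = len({idx for _, idx in members})
        if n_spectra / len(spectra) >= min_frequency:
            kept.append(round(mean(m for m, _ in members), 1))
    return kept

# Toy peak lists for five spectra (invented values, for illustration only).
spectra = [
    [5157.2, 6357.9, 8043.0],
    [5158.1, 6356.8],
    [5157.9, 6357.1, 9626.0],
    [5158.4, 6358.2],
    [5157.6],
]
print(frequent_bins(spectra))
```

In this toy input the peaks near 8043 and 9626 each occur in one of five spectra (20%), so they fall below the 25% threshold and are dropped.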
After log2 transformation, the peak intensity matrix was quantile normalized using the R package limma. In all samples, missing values were filled with the minimum value. COVID-19 patient data and control sample data were randomly assigned to the training and test groups at a ratio of 2:1.
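A minimal sketch of the transformation, missing value filling and 2:1 split follows. It rests on several assumptions: quantile normalization is omitted for brevity, "the minimum value" is taken to mean the smallest value in the whole matrix, missing entries are represented as `None`, and the function names and sample identifiers are hypothetical.

```python
import math
import random

def log2_and_fill(matrix):
    """log2-transform observed intensities, then replace missing entries
    (None) with the smallest transformed value in the matrix."""
    logged = [[math.log2(v) if v is not None else None for v in row] for row in matrix]
    floor = min(v for row in logged for v in row if v is not None)
    return [[floor if v is None else v for v in row] for row in logged]

def split_two_to_one(sample_ids, seed=0):
    """Randomly assign about two thirds of the samples to the training group."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * 2 / 3)
    return ids[:n_train], ids[n_train:]

# Toy matrix: rows are spectra, columns are peaks, None marks a missing peak.
print(log2_and_fill([[8.0, None], [2.0, 4.0]]))

# Splitting patients and controls separately keeps the 2:1 ratio in each class.
patients = [f"patient_{i}" for i in range(146)]   # hypothetical identifiers
controls = [f"control_{i}" for i in range(152)]
train_p, test_p = split_two_to_one(patients)
train_c, test_c = split_two_to_one(controls)
print(len(train_p), len(test_p), len(train_c), len(test_c))
```

With 146 patients and 152 controls, rounding two thirds gives 97 and 101 training samples, which agrees with the 198 training and 100 test samples reported in Example 3.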
(IV) selection of characteristic proteins
After intensity normalization and missing value imputation, the peaks of the training set were analyzed by three machine learning methods: the LASSO algorithm (LASSO), partial least squares discriminant analysis (PLS-DA), and recursive feature elimination with cross-validation (RFECV). LASSO stands for least absolute shrinkage and selection operator and is a shrinkage estimator. It obtains a more parsimonious model by constructing a penalty function that shrinks the regression coefficients, that is, it forces the sum of the absolute values of the coefficients to be less than a fixed value, and sets some regression coefficients to zero. It thus retains the advantage of subset selection and is a biased estimation method for handling data with multicollinearity.
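For reference, the constraint described above corresponds to the standard textbook formulation of LASSO. The symbols are assumptions introduced here for illustration: $y_i$ is the class label of spectrum $i$, $x_{ij}$ is the intensity of peak $j$ in spectrum $i$, $\beta_j$ are the regression coefficients, and $t$ is the fixed bound on their absolute sum.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
\quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t
```

Equivalently, the constraint can be moved into the objective as a penalty term $\lambda \sum_{j} \lvert \beta_j \rvert$; a smaller $t$ (larger $\lambda$) drives more coefficients exactly to zero, which is what makes the method usable for peak selection.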
FIG. 2-1 shows the 20 peaks with the highest repetition frequency in LASSO; the vertical axis is the mass-to-charge ratio of each selected characteristic peak. Partial least squares discriminant analysis (PLS-DA) is a multivariate statistical method used for discriminant analysis. Discriminant analysis is a common statistical method that determines how a study object should be classified based on the observed or measured values of several variables. In principle, a model is trained on the features of samples from different groups (such as observation samples and control samples), and its reliability is then checked.
FIG. 2-2 shows the 20 peaks with the highest VIP (variable importance in projection) values in PLS-DA; the vertical axis is the mass-to-charge ratio of each selected characteristic peak. RFECV finds the optimal number of features by cross-validation: RFE (recursive feature elimination) ranks the importance of the features, and CV (cross-validation) selects the optimal number of features after the ranking. FIG. 2-3 shows the 10 peaks with the highest cross-validation accuracy in RFECV; the vertical axis is the mass-to-charge ratio of each selected characteristic peak.
After manual inspection of the original spectra of the selected peaks, 25 peaks that passed quality control were chosen as features. The intensities of the characteristic peaks in the training set are shown in FIG. 3. Each row in the figure represents a characteristic peak, each column represents one spectrum, and the shade of color represents the intensity of the peak. The left columns are the negative control group and the right columns are the positive group. It can be seen that the peaks of characteristic polypeptides 6939m/z, 13765m/z, 13886m/z, 6357m/z, 6654m/z, 14049m/z, 28232m/z, 13719m/z, 14095m/z and 14102m/z are generally more intense in the negative group than in the positive group, while the peaks of characteristic polypeptides 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 7364m/z, 7614m/z, 28091m/z, 8034m/z, 8043m/z, 15123m/z, 15867m/z, 5893m/z, 5158m/z and 5366m/z are generally more intense in the positive group than in the negative group. The intensity of these peaks differed significantly between the COVID-19 group and the control group.
(V) model Algorithm
We built models from the 25 characteristic peaks of the training set data using 8 machine learning methods, and evaluated the models by cross-validation accuracy. The 8 machine learning methods were: logistic regression (LR), support vector machine (SVM), random forest (RF), naive Bayes (NB), gradient boosting decision tree (GBDT), K-nearest neighbors (KNN), decision tree (DT) and adaptive boosting (AdaBoost).
FIGS. 4-1 and 4-2 show the model results for the training and test sets, respectively, as ROC curves. An ROC curve is drawn over a series of binary classification cut-off values (decision thresholds), with the true positive rate (sensitivity) as the ordinate and the false positive rate (1 - specificity) as the abscissa. The area under the ROC curve (AUC) of each model is calculated for comparison; the model with the largest AUC has the best diagnostic value. In this study, the AUC of every model on the training set was greater than 0.99, and the AUC of LR, SVM, RF, GBDT, DT and AdaBoost was 1 (FIG. 4-1). In the ROC analysis of the test set data, the AUC of all 8 models exceeded 0.92, and the AUC of the LR, SVM and NB models was 1 (FIG. 4-2). After the accuracy, recall, precision, F1, sensitivity and specificity of the 8 models were evaluated, the LR model was found to have the best classification performance (AUC 1, sensitivity 98%, specificity 100%, accuracy 99%, precision 100%, recall 98%, F1 99%) and was further applied to the detection of COVID-19.
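How an ROC curve and its AUC follow from a set of classifier scores can be sketched as below. The labels and scores are invented for illustration and the function names are hypothetical; the study itself does not state which software computed its ROC curves.

```python
def roc_points(labels, scores):
    """(false positive rate, true positive rate) at each distinct cut-off,
    calling a sample positive when its score is >= the cut-off."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
        fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC points by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Invented example: one negative sample scores above one positive sample.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]
print(auc(roc_points(labels, scores)))   # 0.9375

# Perfect separation of the two classes gives an AUC of exactly 1.
print(auc(roc_points([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])))   # 1.0
```

An AUC of 1 therefore means that every positive sample received a higher score than every negative sample, although the chosen decision threshold can still misclassify individual samples.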
The confusion matrix of the LR model in the test set is shown in FIG. 5, wherein the vertical axis represents the real grouping situation of the samples, the upper row represents the number of negative samples, and the lower row represents the number of positive samples; the horizontal axis represents the model prediction result, the left column represents the number of samples determined to be negative by the model, and the right column represents the number of samples determined to be positive by the model. All 51 negative samples were judged to be negative, and the negative sample judgment accuracy (i.e., model specificity) was 100%; of the 49 positive samples, 1 was judged as negative by mistake, 48 were judged as positive, and the positive sample judgment accuracy (i.e., the model sensitivity) was 98.0%.
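The reported performance figures can be recomputed from the four cells of this confusion matrix. The sketch below uses the counts stated above (48 true positives, 1 false negative, 51 true negatives, 0 false positives); the function name is a hypothetical helper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion matrix counts."""
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "F1": f1,
    }

metrics = binary_metrics(tp=48, fn=1, tn=51, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The results (sensitivity 98.0%, specificity 100.0%, precision 100.0%, accuracy 99.0%, F1 99.0%) agree with the values quoted for the LR model.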
TABLE 1 Medians of the 25 characteristic polypeptides in the training set in patients and healthy people
[Table 1 is provided as an image in the original publication.]
The specific process for establishing the characteristic polypeptide mass spectrum model for rapid screening of patients with new coronary pneumonia (COVID-19) is shown in FIG. 6. The process comprises: (1) recruiting new coronary pneumonia patients and negative control populations and collecting serum samples; (2) performing mass spectrometry pretreatment of the serum samples with the kit; (3) performing MALDI-TOF MS detection and acquiring the spectra; (4) processing the spectra and obtaining a peak list; (5) bioinformatics analysis; (6) determining the mass spectrum model.
Example 3 establishment of a New coronary pneumonia patient screening model
Of the 298 serum samples (146 from patients with confirmed new coronary pneumonia, 46 from normal persons, 33 from tuberculosis patient controls and 73 from controls with symptoms resembling new coronary pneumonia (fever, cough)), 198 were selected as training samples for model building: 97 from new coronary pneumonia patients, 34 from normal persons, 19 from tuberculosis patient controls and 48 from patients with symptoms resembling new coronary pneumonia. All blood samples were drawn in the early morning in the fasting state; the serum was separated, virus-inactivated and stored in a -80 ℃ freezer.
The remaining samples (49 new coronary pneumonia patients, 12 normal persons, 14 tuberculosis patients and 25 patients with symptoms resembling new coronary pneumonia) were used as validation samples for the blind test. The processing method was the same as above.
A mass spectrum model of the new coronary pneumonia polypeptides was established using the serum characteristic polypeptide peaks of new coronary pneumonia patients screened in Examples 1-2. The model uses 25 characteristic peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z.
The characteristic mass spectrum peaks of the characteristic polypeptides are shown in FIGS. 7-31.
The AUC of the LR model was 1 on both the training and validation sets. On the test set, the accuracy was 99%, the sensitivity 98% and the specificity 100%. The model has good predictive ability.
TABLE 2 model training results
Sample | Number of cases | Predicted new coronary pneumonia | Predicted non-new coronary pneumonia | Prediction accuracy (%)
Patient group | 97 | 97 | 0 | 100.00
Normal group | 34 | 0 | 34 | 100.00
Pulmonary tuberculosis group | 19 | 0 | 19 | 100.00
Similar-symptom group | 48 | 0 | 48 | 100.00
Total | 198 | - | - | 100.00
From the above table, the results for the training set samples are: 34 of the 34 normal persons were judged correctly, a specificity of 100.00%; 97 of the 97 patients were judged correctly, a sensitivity of 100.00%; 19 of the 19 tuberculosis patients were judged correctly, a specificity of 100.00%; 48 of the 48 patients with similar symptoms were judged correctly, a specificity of 100.00%.
Example 4 identification of novel coronary pneumonia signature Polypeptides
After the peaks to be identified were determined in Examples 2 and 3, 7 serum samples in which these peaks differed in intensity were selected from the pretreated samples. After the samples were reduced with DTT, proteins with a molecular weight above 50 kDa were removed by ultrafiltration centrifugation. The filtered small proteins/polypeptides were separated by tricine-SDS-PAGE. Each band was digested in-gel and then identified by tandem mass spectrometry.
Polypeptide sequences were identified on a nano-LC-MS/MS platform consisting of a nanoflow HPLC (Thermo Fisher Scientific, USA) and a Q Exactive mass spectrometer (Thermo Fisher Scientific, USA). The instrument was operated in positive ion mode with a scan range of 300-1400 m/z. The resolution of the primary mass spectrum was 70000 and that of the secondary mass spectrum was 17500.
Liquid chromatography analytical column: model: ReproSil-Pur 120 C18 (Dr. Maisch GmbH, USA); specification: 360 μm × 12 cm; inner diameter: 150 μm; particle size: 1.9 μm. Elution mode: linear gradient of the mobile phase from 7% solution B (80% acetonitrile, 0.1% formic acid) to 45% solution B. Flow rate: 600 nL/min; total time: 38 minutes.
The results are shown in tables 3 and 4.
TABLE 3 Identification of the characteristic peak polypeptides
[Table 3 is provided as an image in the original publication.]
TABLE 4 Identified polypeptide sequences
[Table 4 is provided as an image in the original publication.]
Example 5 Blind selection test of New coronary pneumonia patient screening model
After model training was completed, two models were established: one whose input variables are the 25 characteristic polypeptide fragments, and one whose input variables are the 15 sequenced characteristic polypeptide fragments.
Following the method of Example 3, the samples of 49 new coronary pneumonia patients, 12 normal persons, 14 tuberculosis patients and 25 patients with similar symptoms were predicted blind with the two models above to determine the sample types, in the same manner as described in the above example. The results are shown in Tables 5-1 and 5-2, respectively.
TABLE 5-1 prediction of test sample results by 25 variables
Sample | Number of cases | Predicted new coronary pneumonia | Predicted non-new coronary pneumonia | Prediction accuracy (%)
Patient group | 49 | 48 | 1 | 97.96
Normal group | 12 | 0 | 12 | 100.00
Pulmonary tuberculosis group | 14 | 0 | 14 | 100.00
Similar-symptom group | 25 | 0 | 25 | 100.00
Total | 100 | - | - | 99.00
From Table 5-1, the results for the test group samples are: 12 of the 12 normal persons were judged correctly, a specificity of 100.00%; 48 of the 49 patients were judged correctly, a sensitivity of 97.96%; 14 of the 14 tuberculosis patients were judged correctly, a specificity of 100.00%; 25 of the 25 patients with similar symptoms were judged correctly, a specificity of 100.00%.
TABLE 5-2 prediction of test samples by 15 variables
Sample | Number of cases | Predicted new coronary pneumonia | Predicted non-new coronary pneumonia | Prediction accuracy (%)
Patient group | 49 | 46 | 3 | 93.88
Normal group | 12 | 1 | 11 | 91.67
Pulmonary tuberculosis group | 14 | 0 | 14 | 100.00
Similar-symptom group | 25 | 1 | 24 | 96.00
Total | 100 | - | - | 95.00
From Table 5-2, the results for the test group samples are: 46 of the 49 new coronary pneumonia patients were judged correctly, a sensitivity of 93.88%; 11 of the 12 normal persons were judged correctly, a specificity of 91.67%; 14 of the 14 tuberculosis patients were judged correctly, a specificity of 100.00%; 24 of the 25 patients with similar symptoms were judged correctly, a specificity of 96.00%. This indicates that the model built from the 15 characteristic polypeptide input variables has the same specificity for tuberculosis patients as the model with the complete set of variables, and produces few misjudgments in the other three groups. This model meets the need for rapid clinical screening of confirmed patients.
In addition, the tables above show that the blind test accuracy for the new coronary pneumonia group using the complete set of 25 characteristic polypeptide variables is essentially the same as in model training, while the prediction accuracy for the groups without new coronary pneumonia reaches 100%. Thus, with the trained model, false positive results can be eliminated through fine optimization; a positive result for new coronary pneumonia is therefore real and reliable, and missed diagnosis and/or misdiagnosis is avoided to the greatest extent, which is of positive significance.
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and improvements without departing from the technical principle of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.
Sequence listing
Characteristic polypeptide composition for diagnosing new coronary pneumonia
<120> characteristic polypeptide composition for diagnosing neocoronary pneumonia
<150> 202011107819X
<151> 2020-10-16
<160> 15
<170> SIPOSequenceListing 1.0
<210> 1
<211> 61
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 1
Thr Ile Thr Leu Glu Val Glu Pro Ser Asp Thr Ile Glu Asn Val Lys
1 5 10 15
Ala Lys Ile Gln Asp Lys Glu Gly Ile Pro Pro Asp Gln Gln Arg Leu
20 25 30
Ile Phe Ala Gly Lys Gln Leu Glu Asp Gly Arg Thr Leu Ser Asp Tyr
35 40 45
Asn Ile Gln Lys Glu Ser Thr Leu His Leu Val Leu Arg
50 55 60
<210> 2
<211> 68
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 2
Glu Glu Asp Gly Asp Leu Gln Cys Leu Cys Val Lys Thr Thr Ser Gln
1 5 10 15
Val Arg Pro Arg His Ile Thr Ser Leu Glu Val Ile Lys Ala Gly Pro
20 25 30
His Cys Pro Thr Ala Gln Leu Ile Ala Thr Leu Lys Asn Gly Arg Lys
35 40 45
Ile Cys Leu Asp Leu Gln Ala Leu Leu Tyr Lys Lys Ile Ile Lys Glu
50 55 60
His Leu Glu Ser
65
<210> 3
<211> 74
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 3
Met Glu Ala Pro Ala Gln Leu Leu Phe Leu Leu Leu Leu Trp Leu Pro
1 5 10 15
Asp Thr Thr Gly Glu Ile Val Met Thr Gln Ser Pro Ala Thr Leu Ser
20 25 30
Val Ser Pro Gly Glu Arg Ala Thr Leu Ser Cys Arg Ala Ser Gln Ser
35 40 45
Val Ser Ser Asn Leu Ala Trp Tyr Gln Gln Lys Pro Gly Gln Ala Pro
50 55 60
Arg Leu Leu Ile Tyr Gly Ala Ser Thr Arg
65 70
<210> 4
<211> 69
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 4
Met Lys Leu Leu His Val Phe Leu Leu Phe Leu Cys Phe His Leu Arg
1 5 10 15
Phe Cys Lys Val Thr Tyr Thr Ser Gln Glu Asp Leu Val Glu Lys Lys
20 25 30
Cys Leu Ala Lys Lys Tyr Thr His Leu Ser Cys Asp Lys Val Phe Cys
35 40 45
Gln Pro Trp Gln Arg Cys Ile Glu Gly Thr Cys Val Cys Lys Leu Pro
50 55 60
Tyr Gln Cys Pro Lys
65
<210> 5
<211> 79
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 5
Met Thr Ser Arg Lys Lys Val Leu Leu Lys Val Ile Ile Leu Gly Asp
1 5 10 15
Ser Gly Val Gly Lys Thr Ser Leu Met Asn Gln Tyr Val Asn Lys Lys
20 25 30
Phe Ser Asn Gln Tyr Lys Ala Thr Ile Gly Ala Asp Phe Leu Thr Lys
35 40 45
Glu Val Met Val Asp Asp Arg Leu Val Thr Met Gln Ile Trp Asp Thr
50 55 60
Ala Gly Gln Glu Arg Phe Gln Ser Leu Gly Val Ala Phe Tyr Arg
65 70 75
<210> 6
<211> 91
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 6
Met Thr Leu Gly Arg Arg Leu Ala Cys Leu Phe Leu Ala Cys Val Leu
1 5 10 15
Pro Ala Leu Leu Leu Gly Gly Thr Ala Leu Ala Ser Glu Ile Val Gly
20 25 30
Gly Arg Arg Ala Arg Pro His Ala Trp Pro Phe Met Val Ser Leu Gln
35 40 45
Leu Arg Gly Gly His Phe Cys Gly Ala Thr Leu Ile Ala Pro Asn Phe
50 55 60
Val Met Ser Ala Ala His Cys Val Ala Asn Val Asn Val Arg Ala Val
65 70 75 80
Arg Val Val Leu Gly Ala His Asn Leu Ser Arg
85 90
<210> 7
<211> 119
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 7
Met Ser Arg Ser Val Ala Leu Ala Val Leu Ala Leu Leu Ser Leu Ser
1 5 10 15
Gly Leu Glu Ala Ile Gln Arg Thr Pro Lys Ile Gln Val Tyr Ser Arg
20 25 30
His Pro Ala Glu Asn Gly Lys Ser Asn Phe Leu Asn Cys Tyr Val Ser
35 40 45
Gly Phe His Pro Ser Asp Ile Glu Val Asp Leu Leu Lys Asn Gly Glu
50 55 60
Arg Ile Glu Lys Val Glu His Ser Asp Leu Ser Phe Ser Lys Asp Trp
65 70 75 80
Ser Phe Tyr Leu Leu Tyr Tyr Thr Glu Phe Thr Pro Thr Glu Lys Asp
85 90 95
Glu Tyr Ala Cys Arg Val Asn His Val Thr Leu Ser Gln Pro Lys Ile
100 105 110
Val Lys Trp Asp Arg Asp Met
115
<210> 8
<211> 127
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 8
Gly Pro Thr Gly Thr Gly Glu Ser Lys Cys Pro Leu Met Val Lys Val
1 5 10 15
Leu Asp Ala Val Arg Gly Ser Pro Ala Ile Asn Val Ala Val His Val
20 25 30
Phe Arg Lys Ala Ala Asp Asp Thr Trp Glu Pro Phe Ala Ser Gly Lys
35 40 45
Thr Ser Glu Ser Gly Glu Leu His Gly Leu Thr Thr Glu Glu Glu Phe
50 55 60
Val Glu Gly Ile Tyr Lys Val Glu Ile Asp Thr Lys Ser Tyr Trp Lys
65 70 75 80
Ala Leu Gly Ile Ser Pro Phe His Glu His Ala Glu Val Val Phe Thr
85 90 95
Ala Asn Asp Ser Gly Pro Arg Arg Tyr Thr Ile Ala Ala Leu Leu Ser
100 105 110
Pro Tyr Ser Tyr Ser Thr Thr Ala Val Val Thr Asn Pro Lys Glu
115 120 125
<210> 9
<211> 128
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 9
Met Ser Leu Arg Leu Asp Thr Thr Pro Ser Cys Asn Ser Ala Arg Pro
1 5 10 15
Leu His Ala Leu Gln Val Leu Leu Leu Leu Ser Leu Leu Leu Thr Ala
20 25 30
Leu Ala Ser Ser Thr Lys Gly Gln Thr Lys Arg Asn Leu Ala Lys Gly
35 40 45
Lys Glu Glu Ser Leu Asp Ser Asp Leu Tyr Ala Glu Leu Arg Cys Met
50 55 60
Cys Ile Lys Thr Thr Ser Gly Ile His Pro Lys Asn Ile Gln Ser Leu
65 70 75 80
Glu Val Ile Gly Lys Gly Thr His Cys Asn Gln Val Glu Val Ile Ala
85 90 95
Thr Leu Lys Asp Gly Arg Lys Ile Cys Leu Asp Pro Asp Ala Pro Arg
100 105 110
Ile Lys Lys Ile Val Gln Lys Lys Leu Ala Gly Asp Glu Ser Ala Asp
115 120 125
<210> 10
<211> 123
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 10
Val Pro Leu Ala Asp Met Pro His Ala Pro Ile Gly Leu Tyr Phe Asp
1 5 10 15
Thr Val Ala Asp Lys Ile His Ser Val Ser Arg Lys His Gly Ala Thr
20 25 30
Leu Val His Cys Ala Ala Gly Val Ser Arg Ser Ala Thr Leu Cys Ile
35 40 45
Ala Tyr Leu Met Lys Phe His Asn Val Cys Leu Leu Glu Ala Tyr Asn
50 55 60
Trp Val Lys Ala Arg Arg Pro Val Ile Arg Pro Asn Val Gly Phe Trp
65 70 75 80
Arg Gln Leu Ile Asp Tyr Glu Arg Gln Leu Phe Gly Lys Ser Thr Val
85 90 95
Lys Met Val Gln Thr Pro Tyr Gly Ile Val Pro Asp Val Tyr Glu Lys
100 105 110
Glu Ser Arg His Leu Met Pro Tyr Trp Gly Ile
115 120
<210> 11
<211> 130
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 11
Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys
1 5 10 15
Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His
20 25 30
Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala
35 40 45
Pro Val Tyr Met Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu
50 55 60
Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile
65 70 75 80
Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys
85 90 95
Leu Leu Gly Lys Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile
100 105 110
Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys
115 120 125
Gly Lys
130
<210> 12
<211> 130
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 12
Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys
1 5 10 15
Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His
20 25 30
Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala
35 40 45
Pro Val Tyr Leu Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu
50 55 60
Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile
65 70 75 80
Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys
85 90 95
Leu Leu Gly Arg Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile
100 105 110
Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys
115 120 125
Gly Lys
130
<210> 13
<211> 141
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 13
Val Leu Ser Pro Ala Asp Lys Thr Asn Val Lys Ala Ala Trp Gly Lys
1 5 10 15
Val Gly Ala His Ala Gly Glu Tyr Gly Ala Glu Ala Leu Glu Arg Met
20 25 30
Phe Leu Ser Phe Pro Thr Thr Lys Thr Tyr Phe Pro His Phe Asp Leu
35 40 45
Ser His Gly Ser Ala Gln Val Lys Gly His Gly Lys Lys Val Ala Asp
50 55 60
Ala Leu Thr Asn Ala Val Ala His Val Asp Asp Met Pro Asn Ala Leu
65 70 75 80
Ser Ala Leu Ser Asp Leu His Ala His Lys Leu Arg Val Asp Pro Val
85 90 95
Asn Phe Lys Leu Leu Ser His Cys Leu Leu Val Thr Leu Ala Ala His
100 105 110
Leu Pro Ala Glu Phe Thr Pro Ala Val His Ala Ser Leu Asp Lys Phe
115 120 125
Leu Ala Ser Val Ser Thr Val Leu Thr Ser Lys Tyr Arg
130 135 140
<210> 14
<211> 146
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 14
Val His Leu Thr Pro Glu Glu Lys Ser Ala Val Thr Ala Leu Trp Gly
1 5 10 15
Lys Val Asn Val Asp Glu Val Gly Gly Glu Ala Leu Gly Arg Leu Leu
20 25 30
Val Val Tyr Pro Trp Thr Gln Arg Phe Phe Glu Ser Phe Gly Asp Leu
35 40 45
Ser Thr Pro Asp Ala Val Met Gly Asn Pro Lys Val Lys Ala His Gly
50 55 60
Lys Lys Val Leu Gly Ala Phe Ser Asp Gly Leu Ala His Leu Asp Asn
65 70 75 80
Leu Lys Gly Thr Phe Ala Thr Leu Ser Glu Leu His Cys Asp Lys Leu
85 90 95
His Val Asp Pro Glu Asn Phe Arg Leu Leu Gly Asn Val Leu Val Cys
100 105 110
Val Leu Ala His His Phe Gly Lys Glu Phe Thr Pro Pro Val Gln Ala
115 120 125
Ala Tyr Gln Lys Val Val Ala Gly Val Ala Asn Ala Leu Ala His Lys
130 135 140
Tyr His
145
<210> 15
<211> 257
<212> PRT
<213> 2 Ambystoma laterale x Ambystoma jeffersonianum
<400> 15
Ile Leu Leu Tyr Ser Leu Asp Gly Arg Leu Leu Ser Thr Tyr Ser Ala
1 5 10 15
Tyr Glu Trp Ser Leu Gly Ile Lys Ser Val Ala Trp Ser Pro Ser Ser
20 25 30
Gln Phe Leu Ala Val Gly Ser Tyr Asp Gly Lys Val Arg Ile Leu Asn
35 40 45
His Val Thr Trp Lys Met Ile Thr Glu Phe Gly His Pro Ala Ala Ile
50 55 60
Asn Asp Pro Lys Ile Val Val Tyr Lys Glu Ala Glu Lys Ser Pro Gln
65 70 75 80
Leu Gly Leu Gly Cys Leu Ser Phe Pro Pro Pro Arg Ala Gly Ala Gly
85 90 95
Pro Leu Pro Ser Ser Glu Ser Lys Tyr Glu Ile Ala Ser Val Pro Val
100 105 110
Ser Leu Gln Thr Leu Lys Pro Val Thr Asp Arg Ala Asn Pro Lys Ile
115 120 125
Gly Ile Gly Met Leu Ala Phe Ser Pro Asp Ser Tyr Phe Leu Ala Thr
130 135 140
Arg Asn Asp Asn Ile Pro Asn Ala Val Trp Val Trp Asp Ile Gln Lys
145 150 155 160
Leu Arg Leu Phe Ala Val Leu Glu Gln Leu Ser Pro Val Arg Ala Phe
165 170 175
Gln Trp Asp Pro Gln Gln Pro Arg Leu Ala Ile Cys Thr Gly Gly Ser
180 185 190
Arg Leu Tyr Leu Trp Ser Pro Ala Gly Cys Met Ser Val Gln Val Pro
195 200 205
Gly Glu Gly Asp Phe Ala Val Leu Ser Leu Cys Trp His Leu Ser Gly
210 215 220
Asp Ser Met Ala Leu Leu Ser Lys Asp His Phe Cys Leu Cys Phe Leu
225 230 235 240
Glu Thr Glu Ala Val Val Gly Thr Ala Cys Arg Gln Leu Gly Gly His
245 250 255
Thr
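The listing above gives each sequence in three-letter residue codes, with the declared length in the `<211>` field. The following sketch, assuming the residue rows of SEQ ID No. 1 are copied verbatim from the listing, converts them to one-letter codes and checks the residue count against the declared length of 61. The names `THREE_TO_ONE`, `SEQ_ID_1_ROWS` and `to_one_letter` are hypothetical helpers introduced only for illustration.

```python
# Standard three-letter to one-letter amino acid codes (20 common residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Residue rows of SEQ ID No. 1, copied from the listing (position rows omitted).
SEQ_ID_1_ROWS = [
    "Thr Ile Thr Leu Glu Val Glu Pro Ser Asp Thr Ile Glu Asn Val Lys",
    "Ala Lys Ile Gln Asp Lys Glu Gly Ile Pro Pro Asp Gln Gln Arg Leu",
    "Ile Phe Ala Gly Lys Gln Leu Glu Asp Gly Arg Thr Leu Ser Asp Tyr",
    "Asn Ile Gln Lys Glu Ser Thr Leu His Leu Val Leu Arg",
]

DECLARED_LENGTH = 61  # value of the <211> field for SEQ ID No. 1


def to_one_letter(rows):
    """Join rows of three-letter residue codes into a one-letter string."""
    return "".join(THREE_TO_ONE[res] for row in rows for res in row.split())


seq1 = to_one_letter(SEQ_ID_1_ROWS)
length_matches = len(seq1) == DECLARED_LENGTH
print(seq1, len(seq1), length_matches)
```

The same check can be repeated for the other entries by pasting their residue rows and `<211>` values.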

Claims (8)

1. Use of a characteristic polypeptide composition in the preparation of a reagent for diagnosing COVID-19, wherein the composition comprises 25 characteristic polypeptides with the following mass-to-charge ratios: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z and 28232m/z.
2. The use according to claim 1, wherein 15 of the characteristic polypeptides in the composition have the following polypeptide sequences:
a characteristic polypeptide with a mass-to-charge ratio of 6939m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 1;
a characteristic polypeptide with a mass-to-charge ratio of 7614m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 2;
a characteristic polypeptide with a mass-to-charge ratio of 8034m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 3;
a characteristic polypeptide with a mass-to-charge ratio of 8226m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 4;
a characteristic polypeptide with a mass-to-charge ratio of 8986m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 5;
a characteristic polypeptide with a mass-to-charge ratio of 9626m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 6;
a characteristic polypeptide with a mass-to-charge ratio of 13719m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 7;
a characteristic polypeptide with a mass-to-charge ratio of 13765m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 8;
a characteristic polypeptide with a mass-to-charge ratio of 13886m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 9;
a characteristic polypeptide with a mass-to-charge ratio of 14049m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 10;
a characteristic polypeptide with a mass-to-charge ratio of 14095m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 11;
a characteristic polypeptide with a mass-to-charge ratio of 14102m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 12;
a characteristic polypeptide with a mass-to-charge ratio of 15123m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 13;
a characteristic polypeptide with a mass-to-charge ratio of 15867m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 14;
a characteristic polypeptide with a mass-to-charge ratio of 28091m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 15.
3. The use according to claim 2, wherein when, in the serum sample to be tested, the peaks of the characteristic polypeptides 8986m/z and 28091m/z are up-regulated and the peaks of the characteristic polypeptides 6939m/z, 13886m/z, 14049m/z and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, that is, a sample from a COVID-19 patient, with a ten-fold cross-validation accuracy of 91%.
4. The use according to claim 2, wherein when, in the serum sample to be tested, the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated and the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, that is, a sample from a COVID-19 patient, with a ten-fold cross-validation accuracy of 93.88%.
5. The use according to claim 1, wherein when, in the serum sample to be tested, the peaks of the characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated and the peaks of the characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are down-regulated, the serum sample is determined to be a positive sample, that is, a sample from a COVID-19 patient, with a ten-fold cross-validation accuracy of 97.96%.
6. Use of a characteristic polypeptide composition in the preparation of a reagent for diagnosing COVID-19, wherein the composition comprises 15 characteristic polypeptides with the following mass-to-charge ratios: 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z and 28091m/z, and 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z.
7. The use according to claim 6, wherein the 15 characteristic polypeptides have the following polypeptide sequences:
a characteristic polypeptide with a mass-to-charge ratio of 6939m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 1;
a characteristic polypeptide with a mass-to-charge ratio of 7614m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 2;
a characteristic polypeptide with a mass-to-charge ratio of 8034m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 3;
a characteristic polypeptide with a mass-to-charge ratio of 8226m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 4;
a characteristic polypeptide with a mass-to-charge ratio of 8986m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 5;
a characteristic polypeptide with a mass-to-charge ratio of 9626m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 6;
a characteristic polypeptide with a mass-to-charge ratio of 13719m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 7;
a characteristic polypeptide with a mass-to-charge ratio of 13765m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 8;
a characteristic polypeptide with a mass-to-charge ratio of 13886m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 9;
a characteristic polypeptide with a mass-to-charge ratio of 14049m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 10;
a characteristic polypeptide with a mass-to-charge ratio of 14095m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 11;
a characteristic polypeptide with a mass-to-charge ratio of 14102m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 12;
a characteristic polypeptide with a mass-to-charge ratio of 15123m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 13;
a characteristic polypeptide with a mass-to-charge ratio of 15867m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 14;
a characteristic polypeptide with a mass-to-charge ratio of 28091m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 15.
8. The use according to claim 7, wherein when, in the serum sample to be tested, the peaks of the characteristic polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated and the peaks of the characteristic polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, that is, a sample from a COVID-19 patient, with a ten-fold cross-validation accuracy of 93.88%.
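The decision rule stated in the claims for the 15-polypeptide panel can be sketched as a simple all-up/all-down check. This is only a literal reading of the claim wording, not the trained mass spectrum model. How up- or down-regulation is quantified is not stated in this section, so the signed-change input, the zero threshold and the names `UP_15`, `DOWN_15` and `is_positive` are assumptions made for illustration.

```python
# Peaks (m/z) that the claims require to be up-regulated in a positive sample.
UP_15 = frozenset({7614, 8034, 8226, 8986, 9626, 15123, 15867, 28091})
# Peaks (m/z) that the claims require to be down-regulated in a positive sample.
DOWN_15 = frozenset({6939, 13719, 13765, 13886, 14049, 14095, 14102})


def is_positive(changes, up=UP_15, down=DOWN_15):
    """Return True when every 'up' peak rose and every 'down' peak fell.

    `changes` maps m/z -> signed intensity change relative to a reference
    (hypothetical input; > 0 means up-regulated, < 0 down-regulated).
    A missing peak is treated as unchanged, so the sample is not positive.
    """
    all_up = all(changes.get(mz, 0.0) > 0 for mz in up)
    all_down = all(changes.get(mz, 0.0) < 0 for mz in down)
    return all_up and all_down


# Made-up example values, for illustration only (not measured data).
example = {mz: 1.0 for mz in UP_15}
example.update({mz: -1.0 for mz in DOWN_15})
matching = is_positive(example)

flipped = dict(example)
flipped[8986] = -1.0  # one 'up' peak goes down, so the rule no longer holds
not_matching = is_positive(flipped)
print(matching, not_matching)
```

Passing other `up` and `down` sets lets the same function express the smaller and larger panels described in the other claims.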
CN202110155492.1A 2020-10-16 2021-02-04 Characteristic polypeptide composition for diagnosing new coronary pneumonia Active CN112858454B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011107819X 2020-10-16
CN202011107819 2020-10-16

Publications (2)

Publication Number Publication Date
CN112858454A CN112858454A (en) 2021-05-28
CN112858454B CN112858454B (en) 2022-09-30

Family

ID=75653649

Family Applications (5)

Application Number Title Priority Date Filing Date
CN202110155258.9A Active CN112798679B (en) 2020-10-16 2021-02-04 Kit for diagnosing novel coronavirus infection
CN202110156544.7A Active CN112748173B (en) 2020-10-16 2021-02-04 Method for constructing mass spectrum model for diagnosing new coronavirus infection
CN202110154054.3A Active CN112946053B (en) 2020-10-16 2021-02-04 Characteristic polypeptide composition for preparing detection product for diagnosing new coronavirus infection
CN202110155492.1A Active CN112858454B (en) 2020-10-16 2021-02-04 Characteristic polypeptide composition for diagnosing new coronary pneumonia
CN202110158952.6A Active CN112903802B (en) 2020-10-16 2021-02-04 Method for constructing mass spectrum model for diagnosing new coronavirus infection

Family Applications Before (3)

Application Number Title Priority Date Filing Date
CN202110155258.9A Active CN112798679B (en) 2020-10-16 2021-02-04 Kit for diagnosing novel coronavirus infection
CN202110156544.7A Active CN112748173B (en) 2020-10-16 2021-02-04 Method for constructing mass spectrum model for diagnosing new coronavirus infection
CN202110154054.3A Active CN112946053B (en) 2020-10-16 2021-02-04 Characteristic polypeptide composition for preparing detection product for diagnosing new coronavirus infection

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202110158952.6A Active CN112903802B (en) 2020-10-16 2021-02-04 Method for constructing mass spectrum model for diagnosing new coronavirus infection

Country Status (1)

Country Link
CN (5) CN112798679B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114858906A (en) * 2021-02-04 2022-08-05 北京毅新博创生物科技有限公司 Kit for diagnosing neocoronary pneumonia
CN114858904A (en) * 2021-02-04 2022-08-05 北京毅新博创生物科技有限公司 Mass spectrometry model comprising characteristic polypeptides for diagnosing neocoronary pneumonia
CN114858907A (en) * 2021-02-04 2022-08-05 北京毅新博创生物科技有限公司 Construction method of mass spectrum model for diagnosing new coronary pneumonia
CN113555118B (en) * 2021-07-26 2023-03-31 内蒙古自治区人民医院 Method and device for predicting disease degree, electronic equipment and storage medium
WO2023123175A1 (en) * 2021-12-30 2023-07-06 北京毅新博创生物科技有限公司 Method for evaluating whether individual completes vaccination or individual immune changes

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111337673A (en) * 2020-05-18 2020-06-26 博奥赛斯(天津)生物科技有限公司 Synthetic polypeptide composition for novel coronavirus immunodetection and application
CN111504886A (en) * 2020-05-06 2020-08-07 西安交通大学 Application of a group of molecules in preparation of auxiliary diagnosis reagent or kit for new coronary pneumonia

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE602004031289D1 (en) * 2003-04-15 2011-03-17 Canada Natural Resources SARS-RELATED PROTEINS
US8057993B2 (en) * 2003-04-26 2011-11-15 Ibis Biosciences, Inc. Methods for identification of coronaviruses
US20060257861A1 (en) * 2005-05-12 2006-11-16 Wright State University Screening assay for inhibitors of severe acute respiratory syndrome (SARS) using SELDI-TOF Mass Spectrometry
US7714276B2 (en) * 2005-09-30 2010-05-11 New York University Methods for direct biomolecule identification by matrix-assisted laser desorption ionization (MALDI) mass spectrometry
CN101093215A (en) * 2007-04-03 2007-12-26 许洋 Mass spectrum kit and method for evaluating prognosis from screening lung cancer
AU2008298888A1 (en) * 2007-09-11 2009-03-19 Cancer Prevention And Cure, Ltd. Identification of proteins in human serum indicative of pathologies of human lung tissues
CN101424661B (en) * 2008-07-23 2012-05-30 中国人民解放军总医院第二附属医院 Serodiagnosis model establishing method for active tuberculosis disease
CN102323246B (en) * 2011-07-29 2016-08-03 北京毅新博创生物科技有限公司 One group for detecting the characteristic protein of pulmonary carcinoma
CN102661884B (en) * 2012-05-03 2015-04-15 浙江大学 Sample containing tuberculosis serum characterized protein and preparation method thereof
GB201807380D0 (en) * 2018-05-04 2018-06-20 Karlsson Roger Biomarkers for detecting microbial infection
CN111366734B (en) * 2020-03-20 2021-07-13 广州市康润生物科技有限公司 Method for screening new coronavirus through double indexes and predicting severe pneumonia
CN111323511B (en) * 2020-03-26 2022-04-29 浙江大学医学院附属第四医院(浙江省义乌医院、浙江大学医学院附属第四医院医共体) Rapid detection kit and method for inactivating new coronavirus
CN111455062B (en) * 2020-04-01 2022-02-11 中国人民解放军总医院 Kit and platform for detecting susceptibility genes of novel coronavirus

Also Published As

Publication number Publication date
CN112903802B (en) 2023-06-27
CN112748173B (en) 2023-06-20
CN112858454A (en) 2021-05-28
CN112946053A (en) 2021-06-11
CN112946053B (en) 2023-06-27
CN112798679A (en) 2021-05-14
CN112748173A (en) 2021-05-04
CN112798679B (en) 2023-06-20
CN112903802A (en) 2021-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant