CN114858903A

CN114858903A - Characteristic polypeptide composition for diagnosing neocoronary pneumonia

Info

Publication number: CN114858903A
Application number: CN202110154026.1A
Authority: CN
Inventors: 廖璞; 孙巍; 乔亮; 吕倩; 马庆伟
Original assignee: Beijing Clin Bochuang Biotechnology Co Ltd
Current assignee: Beijing Clin Bochuang Biotechnology Co Ltd
Priority date: 2021-02-04
Filing date: 2021-02-04
Publication date: 2022-08-05
Also published as: WO2022166486A1

Abstract

The invention provides a characteristic polypeptide composition for detecting COVID-19, comprising 29 characteristic polypeptides with specific mass-to-charge ratios; whether a sample comes from a COVID-19 patient can be judged by analyzing the expression of these characteristic polypeptides. The invention also provides a mass spectrum model prepared from the characteristic polypeptide composition, products for diagnosing COVID-19, and related applications. The invention is the first to search for combinations of differentially expressed characteristic proteins by comparing COVID-19 patients against normal persons, tuberculosis patients, and controls with COVID-19-like symptoms. This breaks with the traditional approach of searching for characteristic polypeptides only between normal persons and COVID-19 patients, and effectively avoids false-positive results caused by conditions whose symptoms resemble COVID-19. The method is simple to operate, low in cost, and highly accurate, and is expected to be usable for large-scale COVID-19 screening.

Description

Characteristic polypeptide composition for diagnosing new coronary pneumonia

Technical Field

The invention belongs to the field of detection, and relates to a technology for rapidly detecting novel coronavirus pneumonia by using a time-of-flight mass spectrometry technology.

Background

Coronaviruses are a group of pathogens that mainly cause respiratory and intestinal diseases. The surface of the virus particle carries many regularly arranged protrusions, so that the whole particle resembles an emperor's crown, hence the name "coronavirus". Besides humans, coronaviruses can infect various mammals such as pigs, cows, cats, dogs, minks, camels, bats, mice, and hedgehogs, as well as various birds. The novel coronavirus (COVID-19) is a coronavirus strain never previously found in humans; its transmission pattern, infection mechanism, evolution, and mutation behavior remain unclear, which makes prevention and treatment difficult.

To prevent the occurrence and spread of novel coronavirus (COVID-19) pneumonia, take measures quickly, and effectively control the development of the epidemic, rapid detection of novel coronavirus pneumonia is particularly important. For a long time, coronaviruses have been identified by traditional microbiological methods, namely morphological, physiological, and biochemical characterization together with serological identification. Although these methods are accurate, they are slow, taking more than ten hours even at their fastest, and so cannot meet the need for rapid detection. Nucleic acid detection based on multiplex PCR is important for early diagnosis of coronavirus infection and for tracing the source of infection. Because multiplex PCR targets several genes, its false-negative rate is lower than that of single-target PCR; however, PCR detection is procedurally complex, relatively costly, and limited in throughput.

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is a mass spectrometry technique that emerged in the late 1980s and has developed rapidly since. Its mass analyzer is an ion drift tube. Ions generated by the ion source are first collected, so that all ions in the collector start with a velocity of 0; they are then accelerated by a pulsed electric field, enter the field-free drift tube, and fly to the ion receiver at constant velocity. The larger the mass of an ion, the longer it takes to reach the receiver; the smaller the mass, the shorter the time. On this principle, ions of different masses can be separated according to their mass-to-charge ratio, and the molecular mass and purity of biomacromolecules such as polypeptides, proteins, nucleic acids, and polysaccharides can be measured accurately. The method offers high accuracy, flexibility, high throughput, a short detection cycle, and good cost-effectiveness.
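The dependence of flight time on mass described above follows from energy conservation: an ion of charge ze accelerated through a potential U gains kinetic energy zeU = ½mv², so its flight time over a field-free drift length L is t = L·sqrt(m / (2zeU)). The short Python sketch below evaluates this ideal relation for a few of the m/z values named in this document. The 20 kV accelerating voltage and 1 m drift length are illustrative assumptions only and are not taken from the patent; real instruments add delayed extraction and other corrections that are ignored here.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # unified atomic mass unit in kilograms

def flight_time_us(mz, voltage_v=20_000.0, drift_m=1.0):
    """Ideal linear-TOF flight time (microseconds) for an ion of given m/z.

    voltage_v and drift_m are hypothetical instrument parameters.
    """
    mass_per_charge = mz * DALTON / E_CHARGE          # kg per coulomb
    velocity = math.sqrt(2.0 * voltage_v / mass_per_charge)
    return drift_m / velocity * 1e6

if __name__ == "__main__":
    for mz in (5158, 8986, 28091):
        print(f"{mz} m/z -> {flight_time_us(mz):.1f} us")
```

Because flight time scales with the square root of m/z, heavier ions arrive later, which is what allows the peaks to be separated.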

In recent years, mass spectrometry techniques have emerged for detecting proteins or polypeptides characteristic of pathogenic microorganisms or viruses. For example, Chinese patent application CN102337223A, "penicillium chrysogenum antifungal protein Pc-Arctin and a preparation method thereof", discloses a MALDI-TOF identification method for the Penicillium chrysogenum antifungal protein Pc-Arctin. Spores of Penicillium chrysogenum A096 are picked from a plate and inoculated into SGY liquid medium for culture; the crude protein solution obtained by pretreatment is separated and purified on a chromatographic column and then on a carboxymethyl cation-exchange column; each eluted fraction is collected and concentrated to the required volume by centrifugal ultrafiltration; antifungal activity is tracked using Paecilomyces variotii as the sensitive indicator organism; the active fractions so identified are used to judge the purity of the obtained protein; and a single band is cut from the SDS-PAGE gel for MALDI-TOF identification. The method is suitable only for a specific microorganism, requires multiple protein purification steps before the characteristic polypeptide Pc-Arctin is finally identified by MALDI-TOF, is cumbersome and narrow in scope, and cannot be used to detect viruses by mass spectrometry.

Chinese patent applications 201110154723, "MALDI TOF MS assisted identification Listeria monocytogenes", and 201110154469, "MALDI TOF MS assisted identification Vibrio cholerae", disclose a method for assisted identification of bacteria by MALDI-TOF MS, comprising: pretreating the bacterial culture, collecting MALDI-TOF MS spectra of all bacterial strain samples, preparing standard bacterial spectra with software, acquiring the spectrum of the bacterium to be tested by the same method, comparing the two spectra, and judging according to the matching score. Because the method uses conventional sample treatment (absolute ethanol, formic acid, and acetonitrile treatment with centrifugation, after which the supernatant is taken for detection), it can characterize the bacterial fingerprint to some extent. However, the analyte contains proteins, lipids, lipopolysaccharides, lipooligosaccharides, DNA, polypeptides, and other ionizable molecules, so the resulting spectrum is essentially a superposition of the spectra of all these molecules. The amount of spectral information to be processed and compared is excessive, and the spectrum has low specificity because too many kinds of molecules are detected. The method is therefore suitable only for a specific bacterium and cannot be extended to large-scale virus detection.

Chinese patent application 200880121570, entitled "method and biomarker for diagnosing and monitoring psychiatric disorders", reports that nearly a hundred biological peptides related to psychiatric disorders, including influenza virus, can be detected by MALDI-TOF mass spectrometry. However, that application merely summarizes various possible techniques and reports neither specific protocols nor specific targets for coronaviruses, so it gives researchers little guidance for detecting influenza viruses by MALDI-TOF mass spectrometry.

Therefore, there is currently a need for a characteristic polypeptide mass spectrum model for detecting COVID-19 by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), and for applications thereof.

Disclosure of Invention

The first object of the present invention is to provide a composition of signature polypeptides, based on the serum peptidome, that can detect the novel coronavirus (COVID-19) by MALDI-TOF mass spectrometry, wherein the signature polypeptide composition comprises 25 signature polypeptides having the following mass-to-charge ratios: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z, or 29 signature polypeptides having the following mass-to-charge ratios: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z.

In any of the above embodiments, when the peaks of signature polypeptides 8986m/z and 28091m/z are up-regulated and the peaks of signature polypeptides 6939m/z, 13886m/z, 14049m/z, and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a COVID-19 patient, with a ten-fold cross-validation accuracy of about 91%. In a preferred embodiment, the signature polypeptide composition comprises only the signature polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, and 14102m/z.

In another embodiment, when the peaks of signature polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated while the peaks of signature polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a COVID-19 patient, with a ten-fold cross-validation accuracy of about 93.31%. In a preferred embodiment, the signature polypeptide composition comprises only the signature polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z.

In other embodiments, when the peaks of signature polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated while the peaks of signature polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a COVID-19 patient, with a ten-fold cross-validation accuracy of about 98.69%.
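As a hedged illustration of the up/down-regulation criterion, the following Python sketch checks the minimal six-peak panel (8986m/z and 28091m/z up; 6939m/z, 13886m/z, 14049m/z, and 14102m/z down). The text does not state how "up-regulated" or "down-regulated" is quantified, so comparing each peak intensity against a reference (control) intensity is an assumption made purely for illustration, as are the function name and the intensity values.

```python
# Peak directions for the six-peak panel, taken from the text above.
UP_PEAKS = (8986, 28091)
DOWN_PEAKS = (6939, 13886, 14049, 14102)

def matches_six_peak_pattern(sample, reference):
    """Return True if the sample shows the positive pattern.

    sample and reference map m/z -> normalized peak intensity.
    Assumption: 'up' means strictly above the reference intensity and
    'down' means strictly below it; the patent gives no threshold.
    """
    up_ok = all(sample[mz] > reference[mz] for mz in UP_PEAKS)
    down_ok = all(sample[mz] < reference[mz] for mz in DOWN_PEAKS)
    return up_ok and down_ok

if __name__ == "__main__":
    # Hypothetical intensities, for demonstration only.
    reference = {mz: 1.0 for mz in UP_PEAKS + DOWN_PEAKS}
    sample = {8986: 1.8, 28091: 1.4, 6939: 0.6,
              13886: 0.5, 14049: 0.7, 14102: 0.4}
    print(matches_six_peak_pattern(sample, reference))
```

A statistical model fitted on the peak intensities, such as the LR algorithm mentioned later in this document, would replace this all-or-nothing rule in practice.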

The second object of the invention is to provide a mass spectrum model for detecting COVID-19, prepared from the characteristic polypeptide composition having the mass-to-charge ratio peaks of any of the above schemes.

In one embodiment, the mass spectral model is prepared from signature polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, and 28232m/z, wherein when the peaks of signature polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of signature polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a COVID-19 patient, with a ten-fold cross-validation accuracy of about 97.96%.

Alternatively, in another embodiment, the mass spectral model is prepared from signature polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, and 28232m/z, wherein when the peaks of signature polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of signature polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a COVID-19 patient, with a ten-fold cross-validation accuracy of about 98.69%.

In another embodiment, the mass spectral model is prepared only from the signature polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z, wherein when the peaks of signature polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of signature polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a COVID-19 patient, with a ten-fold cross-validation accuracy of about 93.31%.

In other embodiments, the mass spectral model is prepared only from the signature polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, and 14102m/z, wherein when the peaks of signature polypeptides 8986m/z and 28091m/z are up-regulated and the peaks of signature polypeptides 6939m/z, 13886m/z, 14049m/z, and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a COVID-19 patient, with a ten-fold cross-validation accuracy of about 91%.

The third object of the present invention is to provide a kit for detecting COVID-19, which comprises the characteristic polypeptide composition or the mass spectrum model.

In one embodiment, the polypeptide composition or mass spectral model is prepared from signature polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, and 28232m/z, wherein when the peaks of signature polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of signature polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a COVID-19 patient, with a ten-fold cross-validation accuracy of about 97.96%.

Alternatively, in another embodiment, the polypeptide composition or mass spectral model is prepared from signature polypeptides 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, and 28232m/z, wherein when the peaks of signature polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of signature polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a COVID-19 patient, with a ten-fold cross-validation accuracy of about 98.69%.

In another embodiment, the polypeptide composition or mass spectral model is prepared only from the signature polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z, wherein when the peaks of signature polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of signature polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a COVID-19 patient, with a ten-fold cross-validation accuracy of about 93.31%.

In other embodiments, the polypeptide composition or mass spectral model is prepared only from the signature polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, and 14102m/z, wherein when the peaks of signature polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of signature polypeptides 6939m/z, 13886m/z, 14049m/z, and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is determined to be a COVID-19 patient, with a ten-fold cross-validation accuracy of about 91%.

In one embodiment, the kit includes a sample processing solution developed by Beijing Clin Bochuang Biotechnology Co Ltd.

In another embodiment, the kit further comprises a standard mass spectrum sample tube for ensuring that the molecular weight measured by the mass spectrometer is accurate, the sample tube can be a plurality of sample tubes containing single characteristic polypeptide, or a sample tube containing a plurality of characteristic polypeptides, and a sample in the standard sample tube is used for performing parallel mass spectrum test with a sample to be tested when performing mass spectrum so as to judge whether the molecular weight information of the sample to be tested is accurate and reliable.

In another embodiment, the kit may contain software or a chip of the standard database of the characteristic polypeptide, and may be used to provide a comparison of standard data or curves when a sample to be tested is subjected to mass spectrometry so as to determine the expression status of the characteristic polypeptide in the sample to be tested.

The fourth object of the invention is to provide use of the characteristic polypeptide composition or the mass spectrum model in preparing products for diagnosing COVID-19.

In another embodiment, the polypeptide composition or mass spectral model is prepared only from the signature polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z, wherein when the peaks of signature polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of signature polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is a COVID-19 patient, with a ten-fold cross-validation accuracy of about 93.31%.

In other embodiments, the polypeptide composition or mass spectrometry model is prepared only from the signature polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, and 14102m/z, wherein when the peaks of signature polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of signature polypeptides 6939m/z, 13886m/z, 14049m/z, and 14102m/z are down-regulated, the serum sample is indicated to be a positive sample, i.e., the patient is determined to be a COVID-19 patient, with a ten-fold cross-validation accuracy of about 91%.

In any of the above embodiments, the product for diagnosing neocoronary pneumonia refers to any conventional product for diagnosing neocoronary pneumonia, including: detection reagent, detection chip, detection carrier, detection kit and the like.

The fifth object of the present invention is to provide a method for constructing the mass spectrum model, comprising:

1) collecting serum samples from multiple clinically confirmed COVID-19 patients and from non-COVID-19 controls (including tuberculosis patients, patients with similar symptoms such as fever and cough, and healthy persons), and storing them frozen at low temperature until use;

2) pretreating the serum proteins prior to mass spectrometry;

3) performing mass spectrometric detection on the two groups of pretreated serum proteins to obtain fingerprint spectra of the two groups of serum polypeptides;

4) standardizing the serum polypeptide fingerprint spectra of all patients and normal persons, and collecting the data;

5) performing quality control on the obtained data, and screening out the characteristic polypeptides with the following mass-to-charge ratio peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z; performing secondary mass spectrometric identification of the characteristic polypeptides; and establishing a mass spectrum model for detecting COVID-19 according to these mass-to-charge ratio peaks.

In one embodiment, step 5) performs quality control on the obtained data and screens out the characteristic polypeptides with the following mass-to-charge ratio peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z; secondary mass spectrometric identification of the signature polypeptides is then performed, and a mass spectrum model for detecting COVID-19 is established based on these mass-to-charge ratio peaks.

In a preferred embodiment, the mass spectral model of step 5) is prepared only from the signature polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z, wherein when the peaks of signature polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are up-regulated and the peaks of signature polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, and 14102m/z are down-regulated, the serum sample is a positive sample, i.e., the patient is a COVID-19 patient, with a ten-fold cross-validation accuracy of about 93.31%.

In another embodiment, the mass spectrometric model of step 5) is prepared only from the characteristic polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z, and 14102m/z, wherein when the peaks of characteristic polypeptides 8986m/z and 28091m/z are up-regulated while the peaks of characteristic polypeptides 6939m/z, 13886m/z, 14049m/z, and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a COVID-19 patient, with a ten-fold cross-validation accuracy of about 91%.

Furthermore, in any embodiment of any of the above objects, the signature polypeptide composition, mass spectrum model, detection product, use, or construction method may involve only the 19 signature polypeptides having the following mass-to-charge ratios and polypeptide sequences:

a characteristic polypeptide with a mass-to-charge ratio of 6939m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 1;

a characteristic polypeptide with the mass-to-charge ratio of 7614m/z, wherein the polypeptide sequence is selected from the sequence shown in SEQ ID No. 2;

a characteristic polypeptide with the mass-to-charge ratio of 8034m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 3;

a characteristic polypeptide with the mass-to-charge ratio of 8226m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 4;

a characteristic polypeptide with the mass-to-charge ratio of 8986m/z, and the polypeptide sequence is selected from the sequence shown as SEQ ID No. 5;

a characteristic polypeptide with the mass-to-charge ratio of 9626m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 6;

a characteristic polypeptide with the mass-to-charge ratio of 13719m/z, and the polypeptide sequence thereof is selected from the sequence shown as SEQ ID No. 7;

a characteristic polypeptide with the mass-to-charge ratio of 13765m/z, and the polypeptide sequence thereof is selected from the sequence shown as SEQ ID No. 8;

a characteristic polypeptide with the mass-to-charge ratio of 13886m/z, wherein the polypeptide sequence is selected from a sequence shown as SEQ ID No. 9;

a characteristic polypeptide with the mass-to-charge ratio of 14049m/z, and the polypeptide sequence thereof is selected from the sequence shown in SEQ ID No. 10;

a characteristic polypeptide with the mass-to-charge ratio of 14095m/z, and the polypeptide sequence thereof is selected from the sequence shown in SEQ ID No. 11;

a characteristic polypeptide with the mass-to-charge ratio of 14102m/z, and the polypeptide sequence is selected from the sequence shown as SEQ ID No. 12;

a characteristic polypeptide with the mass-to-charge ratio of 15123m/z, and the polypeptide sequence is selected from the sequence shown in SEQ ID No. 13;

a characteristic polypeptide with the mass-to-charge ratio of 15867m/z, and the polypeptide sequence is selected from the sequence shown as SEQ ID No. 14;

a characteristic polypeptide with a mass-to-charge ratio of 28091m/z, the polypeptide sequence of which is selected from the sequence shown in SEQ ID No. 15;

a characteristic polypeptide with the mass-to-charge ratio of 11435m/z, and the polypeptide sequence thereof is selected from the sequence shown as SEQ ID No. 16;

a characteristic polypeptide with the mass-to-charge ratio of 11495m/z, and the polypeptide sequence thereof is selected from the sequence shown in SEQ ID No. 17;

a characteristic polypeptide with the mass-to-charge ratio of 11523m/z, and the polypeptide sequence of the characteristic polypeptide is selected from the sequence shown as SEQ ID No. 18;

a characteristic polypeptide with the mass-to-charge ratio of 11680m/z, and the polypeptide sequence thereof is selected from the sequence shown in SEQ ID No. 19.

In any of the embodiments above, wherein the step 2) pre-treatment method comprises diluting the serum protein or polypeptide in the stable sample with the sample processing solution.

In any of the above embodiments, in the step 3), the polypeptide mass spectrum universal pretreatment kit is used to dilute and read two groups of serum proteins, so as to obtain fingerprint spectra of two groups of serum polypeptides.

In any of the above embodiments, in the quality control treatment in step 5), for a blank substrate, the crystallization point of the blank substrate is detected by using the same mass spectrometry parameters, and if a significant mass spectrometry peak occurs, the quality of the substrate solution is considered to be unqualified.

In any one of the above embodiments, in the quality control processing in step 5), the following 8 characteristic peaks are selected as quality control peaks: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700 m/z.

During time-of-flight mass spectrometric detection of a biological sample, the quality of the mass spectrum is affected by many factors, such as individual differences, sample quality, changes in ambient temperature and humidity, and the crystallization state of the sample and matrix. To prevent abnormal spectra from affecting the analysis, a total of 8 characteristic peaks in human serum are introduced as quality control peaks; the appearance of these peaks is independent of whether the patient has novel coronavirus pneumonia. Of the 843 spectra collected, 683 show all 8 quality control peaks (81.0% of the total) and 156 show 7 quality control peaks (18.5% of the total). The following quality control condition is set: the spectrum of a single sample passes quality control when 6 to 8 quality control peaks are detected and the relative molecular-weight shift of the internal standard peaks is less than 0.002 (i.e., the shift does not exceed 2 per thousand). A spectrum that fails must be re-acquired.
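The quality control condition above can be sketched in Python as follows. This is one possible reading rather than the inventors' implementation: a quality control peak is counted as detected when the nearest observed peak lies within a relative shift of 0.002 of its nominal m/z, and a spectrum passes when at least 6 of the 8 peaks are detected. The function name and the representation of a spectrum as a plain list of picked peak positions are assumptions.

```python
# Quality control peaks (m/z) and limits as stated in the text.
QC_PEAKS = (6426, 6623, 8753, 8785, 8904, 9118, 9409, 9700)
MAX_REL_SHIFT = 0.002   # relative mass shift limit (2 per thousand)
MIN_QC_PEAKS = 6        # a spectrum needs 6 to 8 detected QC peaks

def qc_pass(observed_mz):
    """Return (passed, detected_count) for a list of observed peak m/z values.

    Assumption: a QC peak counts as detected if the nearest observed peak
    deviates from its nominal m/z by a relative shift below MAX_REL_SHIFT.
    """
    detected = 0
    for expected in QC_PEAKS:
        nearest = min(observed_mz, key=lambda mz: abs(mz - expected), default=None)
        if nearest is not None and abs(nearest - expected) / expected < MAX_REL_SHIFT:
            detected += 1
    return detected >= MIN_QC_PEAKS, detected

if __name__ == "__main__":
    # Hypothetical peak list with small mass shifts.
    print(qc_pass([6427, 6620, 8750, 8790, 8900, 9120, 9410, 9705]))
```

Spectra for which the function reports a failure would be re-acquired, as the text requires.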

By combining with bioinformatics methods, the invention screens out the corresponding COVID-19 markers and establishes a detection model for analysis and detection. The bioinformatics method comprises standardizing the fingerprint spectra, applying experimental quality control to the acquired data, screening the expected serum characteristic polypeptides, and establishing a mass spectrum model, optionally building and validating the model with an LR algorithm or the like. The experimental quality control retains only spectra in which at least 6 internal standard peaks are detected, and uses the internal standard peaks to perform a secondary calibration of each spectrum.
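The text does not specify how the secondary calibration with internal standard peaks is carried out. One common approach, shown here only as an assumption, is to fit a straight line mapping the observed positions of the internal standard peaks to their nominal m/z values by least squares and then apply that correction to every peak in the spectrum. The function names below are hypothetical.

```python
def fit_linear_calibration(observed, expected):
    """Least-squares fit of expected ~= a * observed + b.

    observed: measured m/z of the internal standard peaks found in a spectrum
    expected: their nominal m/z values, in the same order
    """
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(expected) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, expected))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def recalibrate(mz_values, a, b):
    """Apply the fitted linear correction to a list of m/z values."""
    return [a * mz + b for mz in mz_values]

if __name__ == "__main__":
    nominal = [6426, 6623, 8753, 8785, 8904, 9118, 9409, 9700]
    # Hypothetical spectrum with a small systematic mass error.
    measured = [mz * 1.001 + 2.0 for mz in nominal]
    a, b = fit_linear_calibration(measured, nominal)
    print([round(mz, 2) for mz in recalibrate(measured, a, b)])
```

At least two distinct internal standard peaks are needed for the fit; the rule of retaining only spectra with 6 or more such peaks comfortably satisfies this.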

Terms and definitions

Ten-fold cross-validation (10-fold cross-validation) is a commonly used method for testing the accuracy of an algorithm. The data set is divided into ten parts; in turn, 9 parts are used as training data and 1 part as test data. Each trial yields a corresponding accuracy (or error rate). The average of the accuracies (or error rates) of the 10 trials is used as an estimate of the accuracy of the algorithm. In general, 10-fold cross-validation is repeated several times (for example, 10 runs of 10-fold cross-validation) and the results are averaged to give the final estimate. It should be noted that the ten-fold cross-validation accuracy correlates with, but is not equivalent to, the actual detection accuracy (or sensitivity). When evaluating an algorithm, if the ten-fold cross-validation accuracy falls within the confidence interval, varies in a correlated way with the number of characteristic polypeptides, and reaches a value feasible for clinical diagnosis, the mass spectrum model constructed from those polypeptides is considered to meet the requirements of clinical diagnosis.
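The procedure defined above can be sketched in plain Python as follows. The `train` and `predict` callables are placeholders for whatever model is used; the single-feature threshold classifier and the function names are hypothetical and serve only to make the example runnable. They are not the model described in this document.

```python
import random

def ten_fold_accuracy(samples, labels, train, predict, folds=10, seed=0):
    """Mean accuracy over `folds` train/test splits of the data."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    accuracies = []
    for k in range(folds):
        test_idx = indices[k::folds]              # every folds-th sample
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        model = train([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / folds

# Toy classifier: threshold halfway between the two class means.
def train_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict_threshold(threshold, x):
    return 1 if x > threshold else 0

if __name__ == "__main__":
    # Hypothetical, well-separated single-feature data.
    xs = list(range(50)) + list(range(100, 150))
    ys = [0] * 50 + [1] * 50
    print(ten_fold_accuracy(xs, ys, train_threshold, predict_threshold))
```

Repeating the call with different `seed` values and averaging the results corresponds to the repeated 10-fold cross-validation mentioned above.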

SAA protein (serum amyloid A protein) is a family of acute phase reaction proteins and belongs to a heterogeneous class of proteins in the apolipoprotein family. Humans have 4 serum amyloid A genes, SAA1 to SAA4; the two acute-phase proteins, SAA1 and SAA2, are referred to as A-SAA.

Technical effects

Compared with the prior art, the invention has the following advantages:

1. The invention tests serum samples using a combination of multiple characteristic proteins that differ between new coronary pneumonia patients and normal persons, tuberculosis patients, and control patients with symptoms similar to new coronary pneumonia, and processes the data by combining traditional statistics with modern bioinformatics methods. This yields a polypeptide fingerprint detection model for pneumonia patients, healthy persons, and control patients. The series of protein mass-to-charge ratio peaks discovered also provides a basis and resource for finding new, more ideal markers.

2. Compared with existing detection methods, the method has higher sensitivity and specificity, simple operation, low detection cost, and high throughput, and is expected to be used for large-scale screening of new coronary pneumonia.

3. The model construction method is reasonable and feasible in design, provides a new screening method for improving the clinical cure rate of new coronary pneumonia, and offers a new approach to exploring the mechanism of its occurrence and development.

4. The invention is the first to propose searching for combinations of multiple characteristic proteins that differ among 146 patients with confirmed new coronary pneumonia, 46 normal persons, 33 tuberculosis patients, and 73 controls with symptoms similar to new coronary pneumonia. This breaks with the traditional research approach of searching for characteristic polypeptides only between normal persons and new coronary pneumonia patients, and effectively avoids false positive results caused by conditions with symptoms similar to new coronary pneumonia.

5. The results show that the serum characteristic polypeptide model can be used to rapidly screen for patients with new coronary pneumonia in the population.

6. Compared with the composition and mass spectrum model constructed from 25 characteristic polypeptides, the 4 newly introduced characteristic polypeptides (SEQ ID NO:16-19) belong to the SAA protein marker family. SAA is used clinically as a biomarker for diagnosing bacterial and viral infections by ELISA, immunoturbidimetry, colloidal gold, immunofluorescence chromatography, and other methods. Building on the completed mass spectrum model of 25 characteristic polypeptides, the invention is the first to propose using SAA protein markers for virus detection by laser time-of-flight mass spectrometry, and the first to accurately identify the specific SAA protein sequences (namely SEQ ID NO:16-19), so that clinical misdiagnosis of normal samples can be effectively avoided. The results show that the ten-fold cross-validation accuracy of the 25-polypeptide mass spectrum model is about 97.96%, while that of the 29-polypeptide mass spectrum model incorporating the 4 SAA polypeptide markers is about 98.69%.

Drawings

FIG. 1: comparison of serum polypeptide fingerprints of the different groups (healthy group, tuberculosis group, similar-symptom group, and new coronary patient group); from top to bottom: negative healthy persons, negative tuberculosis, negative similar symptoms, and positive new coronary patients.

FIG. 2-1: the 20 peaks with the highest repetition frequency in LASSO. FIG. 2-2: the 20 peaks with the most significant VIP (variable importance in projection) values in PLS-DA.

FIG. 2-3: the 10 peaks with the highest cross-validation accuracy in RFECV.

FIG. 3: intensity of each characteristic peak; the left columns are the negative control group and the right columns are the positive group.

FIG. 4-1: comparison of training set ROC curves for the various machine learning methods. FIG. 4-2: comparison of test set ROC curves.

FIG. 5: confusion matrix of the true grouping versus the predicted results for the test set.

FIG. 6: workflow for establishing the characteristic polypeptide mass spectrum model for rapid screening of patients with new coronary pneumonia (COVID-19).

FIG. 7: mass spectrum peak of the characteristic polypeptide at m/z 5157.6; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 8: mass spectrum peak of the characteristic polypeptide at m/z 5366.2; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 9: mass spectrum peak of the characteristic polypeptide at m/z 5892.9; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 10: mass spectrum peak of the characteristic polypeptide at m/z 6357.4; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 11: mass spectrum peak of the characteristic polypeptide at m/z 6654.0; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 12: mass spectrum peak of the characteristic polypeptide at m/z 6939.1; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 13: mass spectrum peak of the characteristic polypeptide at m/z 7364.2; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 14: mass spectrum peak of the characteristic polypeptide at m/z 7614.2; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 15: mass spectrum peak of the characteristic polypeptide at m/z 8034.3; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 16: mass spectrum peak of the characteristic polypeptide at m/z 8042.7; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 17: mass spectrum peak of the characteristic polypeptide at m/z 8226.4; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 18: mass spectrum peak of the characteristic polypeptide at m/z 8424.9; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 19: mass spectrum peak of the characteristic polypeptide at m/z 8559.8; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 20: mass spectrum peak of the characteristic polypeptide at m/z 8986.1; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 21: mass spectrum peak of the characteristic polypeptide at m/z 9626.4; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 22: mass spectrum peak of the characteristic polypeptide at m/z 13719.2; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 23: mass spectrum peak of the characteristic polypeptide at m/z 13765.2; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 24: mass spectrum peak of the characteristic polypeptide at m/z 13886.1; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 25: mass spectrum peak of the characteristic polypeptide at m/z 14049.4; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 26: mass spectrum peak of the characteristic polypeptide at m/z 14094.7; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 27: mass spectrum peak of the characteristic polypeptide at m/z 14101.8; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 28: mass spectrum peak of the characteristic polypeptide at m/z 15123.4; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 29: mass spectrum peak of the characteristic polypeptide at m/z 15866.5; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 30: mass spectrum peak of the characteristic polypeptide at m/z 28091.4; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 31: mass spectrum peak of the characteristic polypeptide at m/z 28231.5; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 32: mass spectrum peak of the characteristic polypeptide at m/z 11435.1; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 33: mass spectrum peak of the characteristic polypeptide at m/z 11495.3; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 34: mass spectrum peak of the characteristic polypeptide at m/z 11522.8; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

FIG. 35: mass spectrum peak of the characteristic polypeptide at m/z 11680.3; the upper panel is a non-COVID-19 control spectrum and the lower panel is a COVID-19 spectrum.

Detailed Description

The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.

EXAMPLE 1 sample treatment

Serum samples were collected from 146 patients diagnosed at a hospital in Chongqing in February 2020. All patients tested positive by nucleic acid detection and were classified strictly according to the guideline criteria.

Classification was done according to the following criteria:

(1) Mild type: the clinical symptoms are slight, and imaging shows no signs of pneumonia;

(2) Common type: fever and respiratory symptoms are present, and imaging shows signs of pneumonia;

(3) Severe type: dyspnea, respiration rate ≥ 30 breaths/min, resting oxygen saturation ≤ 93%, and arterial partial pressure of oxygen (PaO2)/fraction of inspired oxygen (FiO2) ≤ 300 mmHg;

(4) Critical type: respiratory failure requiring a ventilator, shock, and failure of other organs, with the patient sent to the ICU for rescue.

The 152 non-new coronary pneumonia serum samples used as controls were collected at a hospital in Chongqing in March 2020, and included 46 normal persons, 33 tuberculosis patient controls, and 73 controls with symptoms similar to new coronary pneumonia.

All blood samples were drawn in the morning under fasting conditions into additive-free vacuum serum collection tubes, centrifuged at 2,264 g for 10 min, and incubated at 56 ℃ for 30 min; the serum samples were then frozen at -80 ℃.

Mass spectrometry pretreatment of serum samples: before the mass spectrometric detection experiment, 1 tube of each aliquoted serum sample was taken from the low-temperature freezer and thawed on wet ice for 60-90 min. 5 μL of serum was mixed with 45 μL of sample treatment solution and vortexed at 1200 rpm for 30 s; 10 μL of the treated sample solution was mixed with 10 μL of the prepared matrix solution and vortexed at 1200 rpm for 30 s; 1 μL of the mixture was spotted on the target plate, with three replicate spots per sample as required, and the spots were allowed to dry naturally before mass spectrometry.

Example 2 establishment of Mass Spectrometry model for MALDI-TOF-MS

(I) sample preparation

5 μL of serum from each sample was diluted in 45 μL of sample treatment solution (Bioyong Technologies Inc.). Then 10 μL of the diluted serum was mixed with 10 μL of matrix solution (Bioyong Technologies Inc.).

2 μL of the mixture was spotted on a stainless steel target plate. After drying at room temperature, the target plate was loaded into a MALDI-TOF MS mass spectrometer (Clin-TOF-II; Bioyong Technologies Inc.). Each sample was tested 3 times in parallel.

The matrix-assisted laser desorption time-of-flight mass spectrometer Clin-TOF and the universal polypeptide mass spectrometry pretreatment kit used in the experiments were developed by Bioyong (China). The data were preprocessed using the MALDIquant program: the intensities were square-root transformed, smoothed by filter fitting, and baseline corrected. The mass spectrometer was calibrated with a mixture of polypeptides and proteins of known molecular weight; the mass drift of the calibrants should be within 500 ppm. 500 spectra were acquired for each sample spot. The molecular weight acquisition range was m/z 3000-30000.

The mass spectra of the different sample groups are shown in FIG. 1 (from top to bottom: negative healthy persons, negative tuberculosis, negative similar symptoms, and positive new coronary patients).

In the negative healthy person spectrum, the peaks at m/z 5158, 5366, 5893, 7364, 7614, 8034, 8043, 8425, 8560, 8986, 9626, 11435, 11523, 15123, 15867, and 28091 have low intensity, while the peaks at m/z 6357, 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102, and 28232 have high intensity.

In the negative tuberculosis spectrum, the peaks at m/z 5158, 5366, 5893, 7364, 7614, 8034, 8043, 8226, 8425, 8560, 8986, 9626, 11435, 11495, 11523, 11680, 15123, 15867, and 28091 have lower intensity, while the peaks at m/z 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102, and 28232 have higher intensity.

In the negative similar-symptom group spectrum, the peaks at m/z 5158, 5366, 7364, 7614, 8034, 8043, 8425, 8560, 8986, 9626, 15123, 15867, and 28091 have lower intensity, while the peaks at m/z 6357, 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102, and 28232 have higher intensity.

In the positive new coronary patient spectrum, the peaks at m/z 5158, 5366, 5893, 7364, 7614, 8034, 8043, 8226, 8425, 8560, 8986, 9626, 11435, 11495, 11523, 11680, 15123, 15867, and 28091 have higher intensity, while the peaks at m/z 6357, 6654, 6939, 13719, 13765, 13886, 14049, 14095, 14102, and 28232 have lower intensity.

(II) Mass Spectrometry data acquisition

A Clin-TOF mass spectrometer was used. The laser energy was set to an appropriate level, and spectra were collected from the sample crystallization spot. For each sample spot, 50 laser bombardment positions were selected and each position was bombarded 10 times, so that each sample crystallization spot received 500 laser shots, from which the spectrum was collected. Laser frequency: 30 Hz. Data collection range: 3-30 kDa. External standard calibration with standards was performed before each sample crystallization spot was measured, with an average molecular weight deviation of less than 500 ppm.

Experimental quality control:

(1) A blank matrix crystallization spot is measured with the same mass spectrometry parameters; if an obvious mass spectrum peak appears, the matrix solution is considered unqualified and is replaced with fresh matrix.

(2) When the standard is used for external standard calibration, the mass deviation at different calibrant spots must not exceed 500 ppm, and 5 calibrant peaks must meet this requirement simultaneously.

(3) Eight polypeptide peaks native to serum are selected as internal standard quality control peaks. If 6-8 internal standard peaks are detected and their molecular weight deviation does not exceed 2‰, the spectrum is considered qualified; otherwise, the spectrum must be collected again. The internal standard peaks are: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z.
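The numeric acceptance rules in items (2) and (3) can be illustrated with a short Python sketch. The function names and the nearest-peak matching strategy are assumptions made for illustration only; the text does not describe how observed peaks are assigned to internal standards.

```python
# Internal standard peaks (m/z) listed in quality control item (3).
INTERNAL_STANDARDS = [6426, 6623, 8753, 8785, 8904, 9118, 9409, 9700]

def ppm_deviation(observed_mz, reference_mz):
    """Relative mass deviation in parts per million."""
    return abs(observed_mz - reference_mz) / reference_mz * 1e6

def matched_internal_standards(observed_mz, rel_tol=0.002):
    """Internal standards detected within rel_tol (2 per mille) of an observed peak.

    Assumption: each observed peak is assigned to its nearest internal
    standard, so one observed peak can match at most one standard.
    """
    matched = set()
    for mz in observed_mz:
        ref = min(INTERNAL_STANDARDS, key=lambda r: abs(mz - r))
        if abs(mz - ref) / ref <= rel_tol:
            matched.add(ref)
    return sorted(matched)

def spectrum_passes_qc(observed_mz, min_matched=6, rel_tol=0.002):
    """A spectrum qualifies when at least 6 of the 8 internal standards are found."""
    return len(matched_internal_standards(observed_mz, rel_tol)) >= min_matched
```

Nearest-peak assignment is used because the tolerance windows of 8753 and 8785 overlap slightly at 2‰, so a naive window test could count one observed peak twice.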

(III) preprocessing of raw data

Internal standard secondary calibration was performed on the raw MALDI-TOF data using internal standard calibration software, and the calibrated data were saved as txt files. The internal standard peaks used were: 6426m/z, 6623m/z, 8753m/z, 8785m/z, 8904m/z, 9118m/z, 9409m/z, 9700m/z. The spectra were then processed using the MALDIquant program; processing included smoothing, baseline correction, and molecular weight calibration. Peak detection was performed with a signal-to-noise ratio of 3. Peaks were binned using the binPeaks command with a tolerance of 0.002. Peaks with a detection frequency of not less than 25% within a group were retained. Finally, the resulting peak intensity matrix was used for the following analysis.
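The 25% frequency filter can be sketched as follows. This is a simplified Python illustration, not the MALDIquant workflow itself: it assumes peaks have already been binned to shared m/z labels, and it interprets "within a group" as "in at least one group", which the text does not state explicitly. The function name and toy data are hypothetical.

```python
from collections import defaultdict

def filter_peaks_by_frequency(spectra, groups, min_freq=0.25):
    """Keep binned peaks detected in >= min_freq of the spectra of at least one group.

    spectra: list of sets of binned m/z labels, one set per spectrum.
    groups:  list of group labels, aligned with `spectra`.
    """
    group_sizes = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for peaks, group in zip(spectra, groups):
        group_sizes[group] += 1
        for mz in set(peaks):
            counts[mz][group] += 1
    kept = [
        mz for mz, per_group in counts.items()
        if any(n / group_sizes[g] >= min_freq for g, n in per_group.items())
    ]
    return sorted(kept)

# Toy example (values are illustrative only).
toy_spectra = [
    {5158, 6357}, {5158}, {5158, 9999}, {5158}, {5158},    # "covid" group
    {6357}, {6357}, {6357}, {6357, 7000}, {6357, 7000},    # "control" group
]
toy_groups = ["covid"] * 5 + ["control"] * 5
kept_peaks = filter_peaks_by_frequency(toy_spectra, toy_groups)
```

In the toy data the peak 9999 appears in only 1 of 5 spectra of its group (20%) and is dropped, while 7000 appears in 2 of 5 control spectra (40%) and is kept.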

After log2 transformation, the peak intensity matrix was quantile normalized using the R package limma. Missing values in all samples were filled with the minimum value. COVID-19 patient data and control sample data were randomly assigned to the training and test sets at a ratio of 2:1.
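A minimal Python sketch of the transformation, imputation, and 2:1 split is given below. It omits the quantile normalization step (performed with limma in R) and makes two assumptions that the text leaves open: missing values are represented as None and filled with the global minimum of the log2 matrix, and the training set size is the sample count times 2/3, rounded. The function names are hypothetical.

```python
import math
import random

def log2_and_fill_min(matrix):
    """log2-transform intensities and replace missing values (None) with the matrix minimum."""
    logged = [[None if v is None else math.log2(v) for v in row] for row in matrix]
    floor = min(v for row in logged for v in row if v is not None)
    return [[floor if v is None else v for v in row] for row in logged]

def split_two_to_one(sample_ids, seed=0):
    """Randomly assign samples to training and test sets at a ratio of about 2:1."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * 2 / 3)
    return ids[:n_train], ids[n_train:]

# Splitting the two cohorts separately, as described: 146 patients and 152 controls.
covid_train, covid_test = split_two_to_one(range(146))
control_train, control_test = split_two_to_one(range(152), seed=1)
```

Under these assumptions the split sizes are 97/49 for patients and 101/51 for controls, which agrees with the 198 training and 100 test samples reported in Example 3.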

(IV) selection of characteristic proteins

After intensity normalization and missing-value imputation, the peaks of the training set were analyzed by three machine learning methods: the LASSO algorithm (LASSO), partial least squares discriminant analysis (PLS-DA), and recursive feature elimination with cross-validation (RFECV). LASSO stands for least absolute shrinkage and selection operator and is a shrinkage estimation method. It obtains a more parsimonious model by constructing a penalty function that shrinks some regression coefficients, i.e., constrains the sum of the absolute values of the coefficients to be less than a fixed value, while setting other regression coefficients to zero. It thus retains the advantage of subset selection, and is a biased estimation method for handling data with complex collinearity.
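The constraint described above can be written compactly. The following is the standard least-squares form of LASSO, given only as an illustration: the symbols (responses y_i, peak intensities x_ij, coefficients β_j, bound t, n samples, p peaks) are not from the original text, and the text does not state which loss function was combined with the penalty.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
\quad \text{subject to} \quad
  \sum_{j=1}^{p} \lvert \beta_j \rvert \le t
```

As the bound t is reduced, more coefficients β_j are driven exactly to zero, which is what allows the method to act as a feature selector for peaks.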

FIG. 2-1 shows the 20 peaks with the highest repetition frequency in LASSO, where the vertical axis is the mass-to-charge ratio of each preferred characteristic peak. Partial least squares discriminant analysis (PLS-DA) is a multivariate statistical analysis method used for discriminant analysis. Discriminant analysis is a common statistical method for deciding how to classify a subject based on the observed or measured values of several variables. The principle is to train on the characteristics of the different sample classes (such as observation samples and control samples) to produce a trained model, and then to check its reliability.

FIG. 2-2 shows the 20 peaks with the most significant VIP (variable importance in projection) values in PLS-DA, where the vertical axis is the mass-to-charge ratio of each preferred characteristic peak. RFECV finds the optimal number of features by cross-validation: RFE (recursive feature elimination) ranks the importance of the features, and CV (cross-validation) selects the optimal number of features by cross-validation after the features have been ranked. FIG. 2-3 shows the 10 peaks with the highest cross-validation accuracy in RFECV, where the vertical axis is the mass-to-charge ratio of each preferred characteristic peak.

After empirical inspection of the original spectra of the selected peaks, 29 peaks that passed quality control were selected as features. The intensities of the characteristic peaks are shown in FIG. 3. Each row in the figure represents a characteristic peak, each column represents one spectrum, and the shade of color represents the intensity of the peak. The left columns are the negative control group and the right columns are the positive group. It can be seen that the peaks of characteristic polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, and 28232m/z are generally more strongly expressed in the negative group than in the positive group, while the peaks of characteristic polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, and 28091m/z are generally more strongly expressed in the positive group than in the negative group. The intensities of these peaks differed significantly between the COVID-19 group and the control group.

(V) model Algorithm

Using the 29 characteristic peaks of the training set data, models were built with 8 machine learning methods and evaluated by cross-validation accuracy. The 8 machine learning methods analyzed were: logistic regression (LR), support vector machine (SVM), random forest (RF), naive Bayes (NB), gradient boosting decision tree (GBDT), K-nearest neighbors (KNN), decision tree (DT), and adaptive boosting (AdaBoost).

FIGS. 4-1 and 4-2 show the model results for the training and test sets, respectively, as ROC curves. The ROC curve is drawn over a series of different binary classification thresholds (cut-off values or decision thresholds), with the true positive rate (sensitivity) as the ordinate and the false positive rate (1 - specificity) as the abscissa. The area under the ROC curve (AUC) of each test is calculated and compared; the test with the largest AUC has the best diagnostic value. In this study, the AUC of all models in the training set was greater than 0.99, and the AUC of LR, SVM, GBDT, DT, and AdaBoost was 1 (FIG. 4-1). In the ROC curve analysis of the test set data, the AUC of all 8 models obtained by the 8 machine learning methods exceeded 0.94, and the AUC of the LR model was 1 (FIG. 4-2). After evaluating the accuracy, recall, precision, F1, sensitivity, and specificity of the 8 models, the LR model was found to have the best classification performance (AUC 1, sensitivity 98%, specificity 100%, accuracy 99%, precision 99%, recall 99%, F1 99%), and was further applied to the detection of COVID-19.
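For readers unfamiliar with the AUC, the following Python sketch computes it directly from labels and scores using the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve. The function name and the toy scores are illustrative and are not data from this study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive scores higher than a random negative.

    Ties between a positive and a negative score count as half a win.
    labels: 1 for positive (COVID-19), 0 for negative (control).
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy scores (not from the patent): one positive is ranked below one negative.
auc_imperfect = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# Perfectly separated scores give an AUC of 1.
auc_perfect = roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```

An AUC of 1 therefore means that every positive sample received a higher score than every negative sample, although a fixed decision threshold can still misclassify individual samples.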

The confusion matrix of the LR model in the test set is shown in FIG. 5, in which the vertical axis represents the real grouping of samples, the upper row represents the number of negative samples, and the lower row represents the number of positive samples; the horizontal axis represents the model prediction result, the left column represents the number of samples determined to be negative by the model, and the right column represents the number of samples determined to be positive by the model. All of the 51 negative samples were judged to be negative, and the negative sample judgment accuracy (i.e., model specificity) was 100%; of the 49 positive samples, 1 was judged as negative by mistake, 48 were judged as positive, and the positive sample judgment accuracy (i.e., the model sensitivity) was 98.0%.
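The sensitivity, specificity, and accuracy quoted above follow directly from the four confusion-matrix counts. The short Python sketch below reproduces the arithmetic; the function name is hypothetical, and the counts are those stated in the preceding paragraph.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # correctly judged positives / all positives
        "specificity": tn / (tn + fp),   # correctly judged negatives / all negatives
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# LR model on the test set: 49 positives (48 judged positive, 1 judged negative)
# and 51 negatives (all judged negative).
lr_test = binary_metrics(tp=48, fn=1, tn=51, fp=0)
```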

TABLE 1 Mean and interquartile range of the 29 characteristic polypeptides in each group of the training set

The specific process for establishing the characteristic polypeptide mass spectrum model for rapid screening of patients with new coronary pneumonia (COVID-19) is shown in FIG. 6. The process comprises the following steps: (1) recruiting new coronary pneumonia patients and negative control populations and collecting their serum samples; (2) performing mass spectrometry pretreatment of the serum samples using the kit; (3) performing MALDI-TOF MS detection to obtain spectral information; (4) processing the spectra and obtaining a peak list; (5) bioinformatics analysis; and (6) determining the mass spectrum model.

Example 3 establishment of a model for screening patients with New coronary pneumonia

Of the 298 serum samples (146 from patients with confirmed new coronary pneumonia, 46 from normal persons, 33 from tuberculosis patient controls, and 73 from controls with symptoms similar to new coronary pneumonia (fever and cough)), 198 were selected as training samples for model building: 97 from new coronary pneumonia patients, 34 from normal persons, 19 from tuberculosis patient controls, and 48 from patients with symptoms similar to new coronary pneumonia. All blood samples were drawn in the early morning under fasting conditions; the serum was separated, virus-inactivated, and stored in a -80 ℃ freezer.

The remaining samples (49 from new coronary pneumonia patients, 12 from normal persons, 14 from tuberculosis patients, and 25 from patients with symptoms similar to new coronary pneumonia) were used as validation samples for the blind test. The processing method was the same as above.
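As a quick consistency check, the training and validation counts given above can be tallied against the cohort sizes. The dictionary keys below are illustrative labels, not terms from the original text.

```python
# Sample counts stated for the training and validation (blind test) sets.
training = {"covid": 97, "normal": 34, "tuberculosis": 19, "similar_symptoms": 48}
validation = {"covid": 49, "normal": 12, "tuberculosis": 14, "similar_symptoms": 25}

# Per-group cohort sizes obtained by adding the two sets.
cohort = {group: training[group] + validation[group] for group in training}

n_training = sum(training.values())
n_validation = sum(validation.values())
n_total = sum(cohort.values())
```

The per-group sums reproduce the 146, 46, 33, and 73 samples of the full cohort, and the set sizes are 198 and 100, for 298 samples in total.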

A mass spectrum model of new coronary pneumonia polypeptides was established using the serum characteristic polypeptide peaks of new coronary pneumonia patients screened in Examples 1 and 2. The model uses 29 characteristic peaks: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z.

The characteristic mass spectrum peaks of the characteristic polypeptides are shown in FIGS. 7-35.

The training and validation set AUC of the LR model were both greater than 0.99. The accuracy of the test set is 99%, the sensitivity is 98%, and the specificity is 100%. The model has good prediction capability.

TABLE 2 model training results

Sample	Number of cases	Predicted as new coronary pneumonia	Predicted as non-new coronary pneumonia	Prediction accuracy (%)
Patient group	97	97	0	100.00
Normal group	34	0	34	100.00
Pulmonary tuberculosis group	19	0	19	100.00
Similar-symptom group	48	0	48	100.00
Total	198			100.00

From the above table it can be seen that the results for the training set samples are: 34 of the 34 samples in the normal group were judged correctly, a specificity of 100.00%; 97 of the 97 patients were judged correctly, a sensitivity of 100.00%; 19 of the 19 tuberculosis patients were judged correctly, a specificity of 100.00%; and 48 of the 48 patients with similar symptoms were judged correctly, a specificity of 100.00%.

Example 4 identification of novel coronary pneumonia signature Polypeptides

After the peaks to be identified were determined in Examples 2 and 3, 7 serum samples in which the intensities of the peaks to be identified differed were selected from the pretreated samples. After DTT reduction, proteins with a molecular weight above 50 kDa were removed by ultrafiltration centrifugation. The filtered small proteins/polypeptides were separated by tricine-SDS-PAGE. Each band was subjected to in-gel enzymatic digestion and then identified by secondary (tandem) mass spectrometry.

Polypeptide sequence identification was performed on a nano-LC-MS/MS platform comprising a nanoflow HPLC (Thermo Fisher Scientific, USA) and a Q Exactive mass spectrometer (Thermo Fisher Scientific, USA). The ion mode was positive ion mode, and the scan range was 300-1400 m/z. The resolution of the primary mass spectrum was 70000, and the resolution of the secondary mass spectrum was 17500.

Liquid chromatography analytical column: model: ReproSil-Pur 120 C18 (Dr. Maisch GmbH, USA); specification: 360 μm × 12 cm; inner diameter: 150 μm; particle size: 1.9 μm. Elution mode: linear gradient of the mobile phase from 7% solution B (80% acetonitrile, 0.1% formic acid) to 45% solution B. Flow rate: 600 nL/min; total time: 38 minutes. The results are shown in Tables 3 and 4.

TABLE 3 Identification of the characteristic peak polypeptides

m/z	Gene name	Protein name
5158	H2AJ	Histone H2A.J
6357	S100A7	Protein S100-A7
6654	IGLL5	Immunoglobulin lambda-like polypeptide 5
6939	UBB	Polyubiquitin-B
7364	IGKV3-7	Probable non-functional immunoglobulin kappa variable 3-7
7614	PF4V1	Platelet factor 4 variant
8034	IGKV3-15	Immunoglobulin kappa variable 3-15
8226	CFI	Complement factor I
8986	RAB7A	Ras-related protein Rab-7a
9626	ELANE	Neutrophil elastase
13719	B2M	Beta-2-microglobulin
13765	TTR	Transthyretin
13886	PPBP	Platelet basic protein
14049	DUSP14	Dual specificity protein phosphatase 14
14095	H2AC11	Histone H2A type 1
14102	H2AC6	Histone H2A type 1-C
15123	HBA1	Hemoglobin subunit alpha
15867	HBB	Hemoglobin subunit beta
28091	WRAP73	WD repeat-containing protein WRAP73
11435	SAA1	Serum amyloid A-1 protein
11495	SAA2	Serum amyloid A-2 protein
11523	SAA1	Serum amyloid A-1 protein
11680	SAA1	Serum amyloid A-1 protein

TABLE 4 polypeptide identification sequences

Example 5 Blind selection test of New coronary pneumonia patient screening model

After model training was completed, three models were established: a model whose input variables are the 25 characteristic polypeptide fragments related to SEQ ID NO. 1-15, a model whose input variables are the 29 characteristic polypeptide fragments related to SEQ ID NO. 1-19, and a model whose input variables are the 19 characteristic polypeptide fragments that have been sequenced (i.e., SEQ ID NO. 1-19).

According to the method of Example 3, samples from 49 new coronary pneumonia patients, 12 normal persons, 14 tuberculosis patients, and 25 patients with similar symptoms were tested blind and predicted with the three models to determine the class of each sample; the method was the same as described in the above examples. The results are shown in Tables 5-1, 5-2, and 5-3, respectively.

TABLE 5-1 prediction of test sample results by 25 variables

From Table 5-1, it can be seen that the results for the test group samples are: 12 of the 12 samples in the normal group were judged correctly, a specificity of 100.00%; 48 of the 49 patients were judged correctly, a sensitivity of 97.96%; 14 of the 14 tuberculosis patients were judged correctly, a specificity of 100.00%; and 25 of the 25 patients with similar symptoms were judged correctly, a specificity of 100.00%.

TABLE 5-2 prediction of test samples by 29 variables

Sample	Number of cases	Predicted as new coronary pneumonia	Predicted as non-new coronary pneumonia	Prediction accuracy (%)
Patient group	49	48	1	97.96
Normal group	12	0	12	100.00
Pulmonary tuberculosis group	14	0	14	100.00
Similar-symptom group	25	0	25	100.00
Total	100			99.00

As Table 5-2 shows, the results for the test samples are: 12 of 12 in the normal group were judged correctly, a specificity of 100.00%; 48 of 49 patients were judged correctly, a sensitivity of 97.96%; 14 of 14 tuberculosis patients were judged correctly, a specificity of 100.00%; and 25 of 25 persons with similar symptoms were judged correctly, a specificity of 100.00%.
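
The per-group percentages and the overall accuracy in Table 5-2 follow directly from the tabulated counts. The minimal Python sketch below recomputes them; the names `rows` and `group_accuracy` are illustrative only and are not part of the invention.

```python
# Counts from Table 5-2:
# (group, cases, predicted positive, predicted negative, group is truly positive)
rows = [
    ("Patient group", 49, 48, 1, True),
    ("Normal group", 12, 0, 12, False),
    ("Pulmonary tuberculosis group", 14, 0, 14, False),
    ("Symptom-like group", 25, 0, 25, False),
]


def group_accuracy(pred_pos, pred_neg, is_patient):
    """Percent judged correctly: sensitivity for patients, specificity otherwise."""
    correct = pred_pos if is_patient else pred_neg
    return 100.0 * correct / (pred_pos + pred_neg)


per_group = {r[0]: round(group_accuracy(r[2], r[3], r[4]), 2) for r in rows}
total_cases = sum(r[1] for r in rows)
total_correct = sum(r[2] if r[4] else r[3] for r in rows)
overall = round(100.0 * total_correct / total_cases, 2)

print(per_group)
print(total_cases, overall)
```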

As Tables 5-1 and 5-2 show, both models reach a prediction accuracy on the same 100 samples that meets the criteria for clinical diagnosis. The two accuracy rates are identical, possibly because the number of patient cases available for testing in China is too small for a difference between the models to show. However, judging from the ten-fold cross-validation accuracy, the mass spectrum diagnosis model using 29 characteristic polypeptides can be expected to show higher accuracy as the number of tested patients increases.
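
The remark about the small number of cases can be made concrete with a confidence interval on the observed sensitivity. The sketch below uses the Wilson score interval, a standard statistical technique that is not part of the method described here; the function name `wilson_interval` and the choice of z = 1.96 (about 95% confidence) are assumptions made for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# 48 of 49 patients were judged correctly by both the 25- and 29-variable models.
low, high = wilson_interval(48, 49)
print(f"sensitivity 48/49: {100 * low:.1f}% to {100 * high:.1f}%")
```

With only 49 patients the interval is about ten percentage points wide, so a difference of a few points between two models cannot be resolved from this test set alone.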

TABLE 5-3 prediction of test samples by 19 variables

Sample group	Number of cases	Predicted new coronary pneumonia	Predicted non-new coronary pneumonia	Prediction accuracy (%)
Patient group	49	46	3	93.88
Normal group	12	0	12	100.00
Pulmonary tuberculosis group	14	0	14	100.00
Symptom-like group	25	4	21	84.00
Total	100			93.00

As Table 5-3 shows, the results for the test samples are: 46 of 49 new coronary pneumonia patients were judged correctly, a sensitivity of 93.88%; 12 of 12 in the normal group were judged correctly, a specificity of 100.00%; 14 of 14 tuberculosis patients were judged correctly, a specificity of 100.00%; and 21 of 25 persons with similar symptoms were judged correctly, a specificity of 84.00%. This indicates that the model built from the 19 characteristic-polypeptide input variables has the same specificity for healthy persons and tuberculosis patients as the models using the complete variable set, with only a few misjudgments in the other two groups. This model already meets the need for rapid clinical screening of patients with a confirmed diagnosis.

In addition, the tables above show that the blind-test accuracy of the complete 29-variable model for the new coronary pneumonia group is essentially the same as in model training, while its prediction accuracy for the non-new coronary pneumonia groups reaches 100%. This indicates that, starting from the trained model, an experimenter can completely eliminate false positive results through fine optimization, so that a positive result is real and credible and missed diagnosis and/or misdiagnosis is avoided to the greatest extent, which is of positive significance.
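
The Table 5-3 counts can also be pooled into a single binary confusion matrix, treating the three non-patient groups together. The Python sketch below does this as an illustration; the pooled specificity and the predictive values are derived here from the tabulated counts and are not figures reported in this document, and the predictive values depend on the proportion of patients in this particular test set.

```python
# Table 5-3 counts pooled into one binary confusion matrix.
tp, fn = 46, 3        # patient group: predicted positive / predicted negative
fp = 0 + 0 + 4        # normal, tuberculosis, symptom-like groups predicted positive
tn = 12 + 14 + 21     # the same three groups predicted negative

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)                 # pooled over all non-patient groups
ppv = tp / (tp + fp)                         # positive predictive value
npv = tn / (tn + fn)                         # negative predictive value
accuracy = (tp + tn) / (tp + fn + fp + tn)

for name, value in [("sensitivity", sensitivity), ("specificity", specificity),
                    ("PPV", ppv), ("NPV", npv), ("accuracy", accuracy)]:
    print(f"{name}: {100 * value:.2f}%")
```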

The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and improvements without departing from the technical principle of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Sequence listing

Characteristic polypeptide composition for diagnosing new coronary pneumonia

<120> characteristic polypeptide composition for diagnosing neocoronary pneumonia

<160> 19

<170> SIPOSequenceListing 1.0

<210> 1

<211> 61

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 1

Thr Ile Thr Leu Glu Val Glu Pro Ser Asp Thr Ile Glu Asn Val Lys

1 5 10 15

Ala Lys Ile Gln Asp Lys Glu Gly Ile Pro Pro Asp Gln Gln Arg Leu

20 25 30

Ile Phe Ala Gly Lys Gln Leu Glu Asp Gly Arg Thr Leu Ser Asp Tyr

35 40 45

Asn Ile Gln Lys Glu Ser Thr Leu His Leu Val Leu Arg

50 55 60

<210> 2

<211> 68

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 2

Glu Glu Asp Gly Asp Leu Gln Cys Leu Cys Val Lys Thr Thr Ser Gln

1 5 10 15

Val Arg Pro Arg His Ile Thr Ser Leu Glu Val Ile Lys Ala Gly Pro

20 25 30

His Cys Pro Thr Ala Gln Leu Ile Ala Thr Leu Lys Asn Gly Arg Lys

35 40 45

Ile Cys Leu Asp Leu Gln Ala Leu Leu Tyr Lys Lys Ile Ile Lys Glu

50 55 60

His Leu Glu Ser

65

<210> 3

<211> 74

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 3

Met Glu Ala Pro Ala Gln Leu Leu Phe Leu Leu Leu Leu Trp Leu Pro

1 5 10 15

Asp Thr Thr Gly Glu Ile Val Met Thr Gln Ser Pro Ala Thr Leu Ser

20 25 30

Val Ser Pro Gly Glu Arg Ala Thr Leu Ser Cys Arg Ala Ser Gln Ser

35 40 45

Val Ser Ser Asn Leu Ala Trp Tyr Gln Gln Lys Pro Gly Gln Ala Pro

50 55 60

Arg Leu Leu Ile Tyr Gly Ala Ser Thr Arg

65 70

<210> 4

<211> 69

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 4

Met Lys Leu Leu His Val Phe Leu Leu Phe Leu Cys Phe His Leu Arg

1 5 10 15

Phe Cys Lys Val Thr Tyr Thr Ser Gln Glu Asp Leu Val Glu Lys Lys

20 25 30

Cys Leu Ala Lys Lys Tyr Thr His Leu Ser Cys Asp Lys Val Phe Cys

35 40 45

Gln Pro Trp Gln Arg Cys Ile Glu Gly Thr Cys Val Cys Lys Leu Pro

50 55 60

Tyr Gln Cys Pro Lys

65

<210> 5

<211> 79

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 5

Met Thr Ser Arg Lys Lys Val Leu Leu Lys Val Ile Ile Leu Gly Asp

1 5 10 15

Ser Gly Val Gly Lys Thr Ser Leu Met Asn Gln Tyr Val Asn Lys Lys

20 25 30

Phe Ser Asn Gln Tyr Lys Ala Thr Ile Gly Ala Asp Phe Leu Thr Lys

35 40 45

Glu Val Met Val Asp Asp Arg Leu Val Thr Met Gln Ile Trp Asp Thr

50 55 60

Ala Gly Gln Glu Arg Phe Gln Ser Leu Gly Val Ala Phe Tyr Arg

65 70 75

<210> 6

<211> 91

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 6

Met Thr Leu Gly Arg Arg Leu Ala Cys Leu Phe Leu Ala Cys Val Leu

1 5 10 15

Pro Ala Leu Leu Leu Gly Gly Thr Ala Leu Ala Ser Glu Ile Val Gly

20 25 30

Gly Arg Arg Ala Arg Pro His Ala Trp Pro Phe Met Val Ser Leu Gln

35 40 45

Leu Arg Gly Gly His Phe Cys Gly Ala Thr Leu Ile Ala Pro Asn Phe

50 55 60

Val Met Ser Ala Ala His Cys Val Ala Asn Val Asn Val Arg Ala Val

65 70 75 80

Arg Val Val Leu Gly Ala His Asn Leu Ser Arg

85 90

<210> 7

<211> 119

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 7

Met Ser Arg Ser Val Ala Leu Ala Val Leu Ala Leu Leu Ser Leu Ser

1 5 10 15

Gly Leu Glu Ala Ile Gln Arg Thr Pro Lys Ile Gln Val Tyr Ser Arg

20 25 30

His Pro Ala Glu Asn Gly Lys Ser Asn Phe Leu Asn Cys Tyr Val Ser

35 40 45

Gly Phe His Pro Ser Asp Ile Glu Val Asp Leu Leu Lys Asn Gly Glu

50 55 60

Arg Ile Glu Lys Val Glu His Ser Asp Leu Ser Phe Ser Lys Asp Trp

65 70 75 80

Ser Phe Tyr Leu Leu Tyr Tyr Thr Glu Phe Thr Pro Thr Glu Lys Asp

85 90 95

Glu Tyr Ala Cys Arg Val Asn His Val Thr Leu Ser Gln Pro Lys Ile

100 105 110

Val Lys Trp Asp Arg Asp Met

115

<210> 8

<211> 127

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 8

Gly Pro Thr Gly Thr Gly Glu Ser Lys Cys Pro Leu Met Val Lys Val

1 5 10 15

Leu Asp Ala Val Arg Gly Ser Pro Ala Ile Asn Val Ala Val His Val

20 25 30

Phe Arg Lys Ala Ala Asp Asp Thr Trp Glu Pro Phe Ala Ser Gly Lys

35 40 45

Thr Ser Glu Ser Gly Glu Leu His Gly Leu Thr Thr Glu Glu Glu Phe

50 55 60

Val Glu Gly Ile Tyr Lys Val Glu Ile Asp Thr Lys Ser Tyr Trp Lys

65 70 75 80

Ala Leu Gly Ile Ser Pro Phe His Glu His Ala Glu Val Val Phe Thr

85 90 95

Ala Asn Asp Ser Gly Pro Arg Arg Tyr Thr Ile Ala Ala Leu Leu Ser

100 105 110

Pro Tyr Ser Tyr Ser Thr Thr Ala Val Val Thr Asn Pro Lys Glu

115 120 125

<210> 9

<211> 128

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 9

Met Ser Leu Arg Leu Asp Thr Thr Pro Ser Cys Asn Ser Ala Arg Pro

1 5 10 15

Leu His Ala Leu Gln Val Leu Leu Leu Leu Ser Leu Leu Leu Thr Ala

20 25 30

Leu Ala Ser Ser Thr Lys Gly Gln Thr Lys Arg Asn Leu Ala Lys Gly

35 40 45

Lys Glu Glu Ser Leu Asp Ser Asp Leu Tyr Ala Glu Leu Arg Cys Met

50 55 60

Cys Ile Lys Thr Thr Ser Gly Ile His Pro Lys Asn Ile Gln Ser Leu

65 70 75 80

Glu Val Ile Gly Lys Gly Thr His Cys Asn Gln Val Glu Val Ile Ala

85 90 95

Thr Leu Lys Asp Gly Arg Lys Ile Cys Leu Asp Pro Asp Ala Pro Arg

100 105 110

Ile Lys Lys Ile Val Gln Lys Lys Leu Ala Gly Asp Glu Ser Ala Asp

115 120 125

<210> 10

<211> 123

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 10

Val Pro Leu Ala Asp Met Pro His Ala Pro Ile Gly Leu Tyr Phe Asp

1 5 10 15

Thr Val Ala Asp Lys Ile His Ser Val Ser Arg Lys His Gly Ala Thr

20 25 30

Leu Val His Cys Ala Ala Gly Val Ser Arg Ser Ala Thr Leu Cys Ile

35 40 45

Ala Tyr Leu Met Lys Phe His Asn Val Cys Leu Leu Glu Ala Tyr Asn

50 55 60

Trp Val Lys Ala Arg Arg Pro Val Ile Arg Pro Asn Val Gly Phe Trp

65 70 75 80

Arg Gln Leu Ile Asp Tyr Glu Arg Gln Leu Phe Gly Lys Ser Thr Val

85 90 95

Lys Met Val Gln Thr Pro Tyr Gly Ile Val Pro Asp Val Tyr Glu Lys

100 105 110

Glu Ser Arg His Leu Met Pro Tyr Trp Gly Ile

115 120

<210> 11

<211> 130

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 11

Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys

1 5 10 15

Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His

20 25 30

Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala

35 40 45

Pro Val Tyr Met Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu

50 55 60

Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile

65 70 75 80

Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys

85 90 95

Leu Leu Gly Lys Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile

100 105 110

Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys

115 120 125

Gly Lys

130

<210> 12

<211> 130

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 12

Met Ser Gly Arg Gly Lys Gln Gly Gly Lys Ala Arg Ala Lys Ala Lys

1 5 10 15

Ser Arg Ser Ser Arg Ala Gly Leu Gln Phe Pro Val Gly Arg Val His

20 25 30

Arg Leu Leu Arg Lys Gly Asn Tyr Ala Glu Arg Val Gly Ala Gly Ala

35 40 45

Pro Val Tyr Leu Ala Ala Val Leu Glu Tyr Leu Thr Ala Glu Ile Leu

50 55 60

Glu Leu Ala Gly Asn Ala Ala Arg Asp Asn Lys Lys Thr Arg Ile Ile

65 70 75 80

Pro Arg His Leu Gln Leu Ala Ile Arg Asn Asp Glu Glu Leu Asn Lys

85 90 95

Leu Leu Gly Arg Val Thr Ile Ala Gln Gly Gly Val Leu Pro Asn Ile

100 105 110

Gln Ala Val Leu Leu Pro Lys Lys Thr Glu Ser His His Lys Ala Lys

115 120 125

Gly Lys

130

<210> 13

<211> 141

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 13

Val Leu Ser Pro Ala Asp Lys Thr Asn Val Lys Ala Ala Trp Gly Lys

1 5 10 15

Val Gly Ala His Ala Gly Glu Tyr Gly Ala Glu Ala Leu Glu Arg Met

20 25 30

Phe Leu Ser Phe Pro Thr Thr Lys Thr Tyr Phe Pro His Phe Asp Leu

35 40 45

Ser His Gly Ser Ala Gln Val Lys Gly His Gly Lys Lys Val Ala Asp

50 55 60

Ala Leu Thr Asn Ala Val Ala His Val Asp Asp Met Pro Asn Ala Leu

65 70 75 80

Ser Ala Leu Ser Asp Leu His Ala His Lys Leu Arg Val Asp Pro Val

85 90 95

Asn Phe Lys Leu Leu Ser His Cys Leu Leu Val Thr Leu Ala Ala His

100 105 110

Leu Pro Ala Glu Phe Thr Pro Ala Val His Ala Ser Leu Asp Lys Phe

115 120 125

Leu Ala Ser Val Ser Thr Val Leu Thr Ser Lys Tyr Arg

130 135 140

<210> 14

<211> 146

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 14

Val His Leu Thr Pro Glu Glu Lys Ser Ala Val Thr Ala Leu Trp Gly

1 5 10 15

Lys Val Asn Val Asp Glu Val Gly Gly Glu Ala Leu Gly Arg Leu Leu

20 25 30

Val Val Tyr Pro Trp Thr Gln Arg Phe Phe Glu Ser Phe Gly Asp Leu

35 40 45

Ser Thr Pro Asp Ala Val Met Gly Asn Pro Lys Val Lys Ala His Gly

50 55 60

Lys Lys Val Leu Gly Ala Phe Ser Asp Gly Leu Ala His Leu Asp Asn

65 70 75 80

Leu Lys Gly Thr Phe Ala Thr Leu Ser Glu Leu His Cys Asp Lys Leu

85 90 95

His Val Asp Pro Glu Asn Phe Arg Leu Leu Gly Asn Val Leu Val Cys

100 105 110

Val Leu Ala His His Phe Gly Lys Glu Phe Thr Pro Pro Val Gln Ala

115 120 125

Ala Tyr Gln Lys Val Val Ala Gly Val Ala Asn Ala Leu Ala His Lys

130 135 140

Tyr His

145

<210> 15

<211> 257

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 15

Ile Leu Leu Tyr Ser Leu Asp Gly Arg Leu Leu Ser Thr Tyr Ser Ala

1 5 10 15

Tyr Glu Trp Ser Leu Gly Ile Lys Ser Val Ala Trp Ser Pro Ser Ser

20 25 30

Gln Phe Leu Ala Val Gly Ser Tyr Asp Gly Lys Val Arg Ile Leu Asn

35 40 45

His Val Thr Trp Lys Met Ile Thr Glu Phe Gly His Pro Ala Ala Ile

50 55 60

Asn Asp Pro Lys Ile Val Val Tyr Lys Glu Ala Glu Lys Ser Pro Gln

65 70 75 80

Leu Gly Leu Gly Cys Leu Ser Phe Pro Pro Pro Arg Ala Gly Ala Gly

85 90 95

Pro Leu Pro Ser Ser Glu Ser Lys Tyr Glu Ile Ala Ser Val Pro Val

100 105 110

Ser Leu Gln Thr Leu Lys Pro Val Thr Asp Arg Ala Asn Pro Lys Ile

115 120 125

Gly Ile Gly Met Leu Ala Phe Ser Pro Asp Ser Tyr Phe Leu Ala Thr

130 135 140

Arg Asn Asp Asn Ile Pro Asn Ala Val Trp Val Trp Asp Ile Gln Lys

145 150 155 160

Leu Arg Leu Phe Ala Val Leu Glu Gln Leu Ser Pro Val Arg Ala Phe

165 170 175

Gln Trp Asp Pro Gln Gln Pro Arg Leu Ala Ile Cys Thr Gly Gly Ser

180 185 190

Arg Leu Tyr Leu Trp Ser Pro Ala Gly Cys Met Ser Val Gln Val Pro

195 200 205

Gly Glu Gly Asp Phe Ala Val Leu Ser Leu Cys Trp His Leu Ser Gly

210 215 220

Asp Ser Met Ala Leu Leu Ser Lys Asp His Phe Cys Leu Cys Phe Leu

225 230 235 240

Glu Thr Glu Ala Val Val Gly Thr Ala Cys Arg Gln Leu Gly Gly His

245 250 255

Thr

<210> 16

<211> 102

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 16

Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp Met Trp

1 5 10 15

Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser Asp Lys

20 25 30

Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly Pro Gly

35 40 45

Gly Val Trp Ala Ala Glu Ala Ile Ser Asp Ala Arg Glu Asn Ile Gln

50 55 60

Arg Phe Phe Gly His Gly Ala Glu Asp Ser Leu Ala Asp Gln Ala Ala

65 70 75 80

Asn Glu Trp Gly Arg Ser Gly Lys Asp Pro Asn His Phe Arg Pro Ala

85 90 95

Gly Leu Pro Glu Lys Tyr

100

<210> 17

<211> 103

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 17

Ser Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp Met

1 5 10 15

Trp Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser Asp

20 25 30

Lys Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly Pro

35 40 45

Gly Gly Ala Trp Ala Ala Glu Val Ile Ser Asn Ala Arg Glu Asn Ile

50 55 60

Gln Arg Leu Thr Gly Arg Gly Ala Glu Asp Ser Leu Ala Asp Gln Ala

65 70 75 80

Ala Asn Lys Trp Gly Arg Ser Gly Arg Asp Pro Asn His Phe Arg Pro

85 90 95

Ala Gly Leu Pro Glu Lys Tyr

100

<210> 18

<211> 103

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 18

Ser Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp Met

1 5 10 15

Trp Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser Asp

20 25 30

Lys Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly Pro

35 40 45

Gly Gly Val Trp Ala Ala Glu Ala Ile Ser Asp Ala Arg Glu Asn Ile

50 55 60

Gln Arg Phe Phe Gly His Gly Ala Glu Asp Ser Leu Ala Asp Gln Ala

65 70 75 80

Ala Asn Glu Trp Gly Arg Ser Gly Lys Asp Pro Asn His Phe Arg Pro

85 90 95

Ala Gly Leu Pro Glu Lys Tyr

100

<210> 19

<211> 104

<212> PRT

<213> 2 Ambystoma laterale x Ambystoma jeffersonianum

<400> 19

Arg Ser Phe Phe Ser Phe Leu Gly Glu Ala Phe Asp Gly Ala Arg Asp

1 5 10 15

Met Trp Arg Ala Tyr Ser Asp Met Arg Glu Ala Asn Tyr Ile Gly Ser

20 25 30

Asp Lys Tyr Phe His Ala Arg Gly Asn Tyr Asp Ala Ala Lys Arg Gly

35 40 45

Pro Gly Gly Val Trp Ala Ala Glu Ala Ile Ser Asp Ala Arg Glu Asn

50 55 60

Ile Gln Arg Phe Phe Gly His Gly Ala Glu Asp Ser Leu Ala Asp Gln

65 70 75 80

Ala Ala Asn Glu Trp Gly Arg Ser Gly Lys Asp Pro Asn His Phe Arg

85 90 95

Pro Ala Gly Leu Pro Glu Lys Tyr

100

Claims

1. A signature polypeptide composition for diagnosing new coronary pneumonia, the signature polypeptide composition comprising 25 signature polypeptides having the following mass-to-charge ratios: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z; or 29 signature polypeptides having the following mass-to-charge ratios: 5158m/z, 5366m/z, 5893m/z, 6357m/z, 6654m/z, 6939m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z, 15123m/z, 15867m/z, 28091m/z, 28232m/z.

2. The composition of claim 1, wherein the composition comprises 19 signature polypeptides having the following mass-to-charge ratios and polypeptide sequences:

3. The composition of claim 2, wherein when the peaks of signature polypeptides 8986m/z and 28091m/z are up-regulated and the peaks of signature polypeptides 6939m/z, 13886m/z, 14049m/z and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a new coronary pneumonia patient, with a ten-fold cross-validation accuracy of about 91%.

4. The composition of claim 3, wherein the signature polypeptide composition comprises only the signature polypeptides with mass-to-charge ratios of 8986m/z, 28091m/z, 6939m/z, 13886m/z, 14049m/z and 14102m/z.

5. The composition of claim 2, wherein when the peaks of signature polypeptides 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated and the peaks of signature polypeptides 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a new coronary pneumonia patient, with a ten-fold cross-validation accuracy of about 93.31%.

6. The composition of claim 5, wherein the signature polypeptide composition comprises only the signature polypeptides with mass-to-charge ratios of 7614m/z, 8034m/z, 8226m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z, 28091m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z and 14102m/z.

7. The composition of claim 1, wherein when the peaks of signature polypeptides 5158m/z, 5366m/z, 5893m/z, 7364m/z, 7614m/z, 8034m/z, 8043m/z, 8226m/z, 8425m/z, 8560m/z, 8986m/z, 9626m/z, 11435m/z, 11495m/z, 11523m/z, 11680m/z, 15123m/z, 15867m/z and 28091m/z are up-regulated and the peaks of signature polypeptides 6357m/z, 6654m/z, 6939m/z, 13719m/z, 13765m/z, 13886m/z, 14049m/z, 14095m/z, 14102m/z and 28232m/z are down-regulated, the serum sample is determined to be a positive sample, i.e., the patient is determined to be a new coronary pneumonia patient, with a ten-fold cross-validation accuracy of about 98.69%.