WO2021180182A1 - Method, device, and storage medium for classifying samples based on immune characterization technology
- Publication number
- WO2021180182A1 (application PCT/CN2021/080279)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- category
- target
- sample
- infected
- coronavirus
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/10—Signal processing, e.g. from mass spectrometry [MS] or from PCR
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
Definitions
- the embodiments of the present application relate to the field of communications, and in particular, to a method, device, storage medium, and electronic device for classifying samples based on immune characterization technology.
- nucleic acid detection kits for the novel coronavirus use nucleic acid testing to quickly and effectively confirm the diagnosis of infected patients in China.
- the principle of nucleic acid detection is to design primers based on the gene sequence of the virus, amplify the target by PCR with fluorescent probe labeling added during amplification, and detect the resulting fluorescent signal, which indicates whether viral nucleic acid is present in the sample.
- Nucleic acid detection offers high throughput, ease of development, and quantification.
- nucleic acid testing is the current diagnostic indicator of novel coronavirus pneumonia. However, as nucleic acid testing has been rolled out, multiple results indicate that it has a high false negative rate, with a detection rate of only 30%-50%, which is related to the sampling site used for nucleic acid testing.
- the sampling process of nucleic acid detection poses a high risk to medical staff, and the test can only detect the presence of viral nucleic acid; it cannot determine whether the virus is live. It is therefore particularly important to develop a detection method with low sampling requirements (for example, a universal method that needs only a blood draw) that is also more specific and more sensitive. In the related art, there is no clear technique that uses peptides to detect whether a sample contains the novel coronavirus.
- the embodiments of this application provide a method, device, storage medium, and electronic device for classifying samples based on immunocharacterization technology, to at least solve the problem in the related art that there is no clear technique that uses peptides to detect whether a sample is infected.
- a method for classifying samples based on immunocharacterization technology includes: using the immunocharacterization technology to detect the differential response signal of the differential peptide between the target sample to be tested and the control sample, to obtain a second differential response signal, wherein the differential peptide is a peptide, screened in advance using the immunocharacterization technology, that has a first differential response signal between a positive sample infected with the target coronavirus and the control sample,
- the control sample includes a negative control sample and/or a sample in another state, the sample in the other state includes a sample infected by a pathogen other than the target coronavirus, and the sample is a serum sample or a plasma sample
- Each set of data includes: the differential response signal and the category to which the sample to be tested corresponding to that differential response signal belongs
- the method further includes: using the immunocharacterization technology to screen out peptides that have the first differential response signal between the positive sample infected with the target coronavirus and the control sample, and determining the screened peptides as the differential peptides.
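- The screening step can be illustrated with a minimal sketch. The source does not state which statistic or cutoff is used here, so Welch's t statistic, the threshold of 3.0, the function names, and the toy fluorescence values are all assumptions made for illustration. The sketch also assumes each peptide has non-zero variance in at least one group.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent groups of values."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def screen_differential_peptides(positive, control, t_threshold=3.0):
    """Return the indices of peptides whose signal differs between groups.

    positive, control: lists of samples; each sample is a list holding one
    fluorescence value per peptide on the chip.
    t_threshold: illustrative cutoff (an assumption, not from the source).
    """
    selected = []
    for j in range(len(positive[0])):
        pos_vals = [sample[j] for sample in positive]
        ctl_vals = [sample[j] for sample in control]
        if abs(welch_t(pos_vals, ctl_vals)) >= t_threshold:
            selected.append(j)
    return selected


# Toy data: three peptides, three samples per group (hypothetical values).
positive = [[9.1, 2.0, 5.2], [8.7, 2.2, 4.9], [9.4, 1.9, 5.1]]
control = [[3.0, 2.1, 5.0], [3.4, 2.0, 5.3], [2.8, 2.3, 4.8]]
differential = screen_differential_peptides(positive, control)
```

Only the first peptide separates the two groups in this toy data, so it is the only index retained as a differential peptide.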
- the method further includes: training an initial model through machine learning using the multiple sets of data to obtain the target model, wherein the target model includes a first model or a second model; the first model is used to output, for the input signal, a label identifying one of the following results: not infected by the target coronavirus, or infected by the target coronavirus; the second model is used to output, for the input signal, a label identifying one of the following results: not infected by the target coronavirus with the non-infection category being the first category; not infected by the target coronavirus with the non-infection category being the second category; infected by the target coronavirus with the infection category being the third category; infected by the target coronavirus with the infection category being the fourth category; infected by the target coronavirus with the infection category being the fifth category; and infected by the target coronavirus with the infection category being the sixth category, wherein the degree of infection corresponding to the third, fourth, fifth, and sixth categories increases in turn.
- the method further includes: when it is determined that the output category of the target sample to be tested is not infected by the target coronavirus, using a third model to analyze the second differential response signal to determine the non-infection category to which the target sample to be tested belongs, wherein the third model is trained through machine learning using multiple sets of data, and each set of data in the multiple sets of data includes: a differential response signal and the non-infection category to which the sample to be tested corresponding to that differential response signal belongs; the non-infection category is one of the following: not infected by the target coronavirus with the non-infection category being the first category, or not infected by the target coronavirus with the non-infection category being the second category; and outputting the non-infection category to which the target sample to be tested belongs.
- the method further includes: when it is determined that the output symptom category of the target sample to be tested is infected by the target coronavirus, using a fourth model to analyze the second differential response signal to determine the infection category to which the target sample to be tested belongs, wherein the fourth model is trained through machine learning using multiple sets of data, and each set of data in the multiple sets of data includes: a differential response signal and the infection category to which the sample to be tested corresponding to that differential response signal belongs; the infection category is one of the following: infected by the target coronavirus with the infection category being the third category, the fourth category, the fifth category, or the sixth category.
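- The two-stage arrangement described above (a first model deciding infected or not infected, then a third or fourth model refining the category) can be sketched as a simple dispatch. Everything below is hypothetical: the function names, the label strings, and the threshold-based stand-in models are placeholders for illustration and are not the trained models of the application.

```python
from statistics import mean


def classify_cascade(signal, first_model, third_model, fourth_model):
    """Route a second differential response signal through the cascade.

    first_model decides infected / not infected; the third model refines
    the not-infected case and the fourth model refines the infected case.
    """
    top_level = first_model(signal)
    if top_level == "not_infected":
        return top_level, third_model(signal)
    return top_level, fourth_model(signal)


# Placeholder models: simple thresholds on the mean signal (assumptions).
def first_model(signal):
    return "infected" if mean(signal) > 1.0 else "not_infected"


def third_model(signal):
    return "category_1" if mean(signal) < 0.0 else "category_2"


def fourth_model(signal):
    m = mean(signal)
    for label, upper in (("category_3", 2.0), ("category_4", 3.0), ("category_5", 4.0)):
        if m < upper:
            return label
    return "category_6"


low = classify_cascade([0.2, 0.4], first_model, third_model, fourth_model)
high = classify_cascade([3.5, 3.7], first_model, third_model, fourth_model)
```

Keeping the stages as separate callables means each one can be retrained or replaced without touching the dispatch logic.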
- the target model includes a first linear kernel support vector machine SVM.
- the third model includes a second linear kernel support vector machine SVM.
- the fourth model includes a third linear kernel support vector machine SVM.
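- As an illustration of what a linear-kernel SVM does with differential response signals, the following is a minimal sketch of a binary linear SVM trained by batch sub-gradient descent on the regularized hinge loss. The source does not describe the training procedure, so the optimizer, the hyperparameters (`lam`, `lr`, `epochs`), the function names, and the two-feature toy data are assumptions; labels +1 and -1 stand for infected and not infected.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch sub-gradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # inside the margin or misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 (infected) or -1 (not infected)."""
    return 1 if dot(w, x) + b >= 0 else -1


# Toy differential response signals for two differential peptides.
X = [[2.0, 1.5], [2.5, 1.0], [3.0, 2.0], [-2.0, -1.0], [-1.5, -2.0], [-2.5, -1.5]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

A multi-category model such as the second model would need an extension of this binary sketch, for example one classifier per category; the source does not say which scheme is used.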
- an apparatus for classifying samples based on immunocharacterization technology includes: a detection module configured to use the immunocharacterization technology to detect the differential response signal of the differential peptide between the target sample to be tested and the control sample, to obtain a second differential response signal, wherein the differential peptide is a peptide, screened in advance using the immunocharacterization technology, that has a first differential response signal between a positive sample infected with the target coronavirus and the control sample,
- the control sample includes a negative control sample and/or a sample in another state, the sample in the other state includes a sample infected by a pathogen other than the target coronavirus, and the sample is a serum sample or a plasma sample;
- the first analysis module is configured to use a target model to analyze the second differential response signal to determine the category to which the target sample to be tested belongs, wherein the target model is trained through machine learning using multiple sets of data, and each set of data in the multiple sets of data includes: a differential response signal and the category to which the sample to be tested corresponding to that differential response signal belongs;
- the first output module is configured to output the category to which the target sample to be tested belongs.
- the device further includes: a screening module configured to, before the immunocharacterization technology is used to detect the differential response signal of the differential peptide between the target sample to be tested and the control sample to obtain the second differential response signal, use the immunocharacterization technology to screen out the peptides that have the first differential response signal between the positive sample infected with the target coronavirus and the control sample, and determine the screened peptides as the differential peptides.
- the device further includes: a training module configured to, before the target model is used to analyze the second differential response signal to determine the symptom category to which the target sample to be tested belongs, train an initial model through machine learning using the multiple sets of data to obtain the target model, wherein the target model includes a first model or a second model; the first model is used to output, for the input signal, a label identifying one of the following results: not infected by the target coronavirus, or infected by the target coronavirus; the second model is used to output, for the input signal, a label identifying one of the following results: not infected by the target coronavirus with the non-infection category being the first category; not infected by the target coronavirus with the non-infection category being the second category; infected by the target coronavirus with the infection category being the third category; infected by the target coronavirus with the infection category being the fourth category; infected by the target coronavirus with the infection category being the fifth category; and infected by the target coronavirus with the infection category being the sixth category, wherein the degree of infection corresponding to the third, fourth, fifth, and sixth categories increases in turn.
- the device further includes: a second analysis module configured to, when the target model includes the first model, after the category to which the target sample to be tested belongs has been output and it has been determined that the output category is not infected by the target coronavirus, use a third model to analyze the second differential response signal to determine the non-infection category to which the target sample to be tested belongs, wherein the third model is trained through machine learning using multiple sets of data, and each set of data in the multiple sets of data includes: a differential response signal and the non-infection category to which the sample to be tested corresponding to that differential response signal belongs; the non-infection category is one of the following: not infected by the target coronavirus with the non-infection category being the first category, or not infected by the target coronavirus with the non-infection category being the second category;
- the second output module is configured to output the non-infection category to which the target sample to be tested belongs.
- the device further includes: a third analysis module configured to, in a case where the target model includes the second model, after the symptom category to which the target sample to be tested belongs has been output and it has been determined that the output category is infected by the target coronavirus, use the fourth model to analyze the second differential response signal to determine the infection category to which the target sample to be tested belongs, wherein the fourth model is trained through machine learning using multiple sets of data, and each set of data in the multiple sets of data includes: a differential response signal and the infection category to which the sample to be tested corresponding to that differential response signal belongs; the infection category is one of the following: infected by the target coronavirus with the infection category being the third category, the fourth category, the fifth category, or the sixth category.
- a detection method of coronavirus infection includes: using immunocharacterization technology to screen out peptides that have a first differential response signal between a positive sample infected with the target coronavirus and a control sample, and recording them as differential peptides, wherein the sample is a serum sample or a plasma sample; using the first differential response signal of the differential peptides as features, constructing a classification model for the positive sample and the control sample with a support vector machine method to obtain a sample classification model; using the immunocharacterization technology to detect the differential response signal of the differential peptides between the sample to be tested and the control sample, and recording it as the second differential response signal;
- the second differential response signal is input into the sample classification model for classification, thereby obtaining the symptom category of the sample to be tested;
- the control sample includes a negative control sample and control samples of other lung diseases, where the other lung diseases are lung diseases not caused by the target coronavirus.
- a detection device for coronavirus infection includes: a differential peptide screening module configured to use immunocharacterization technology to screen out peptides that have a first differential response signal between a positive sample infected with the target coronavirus and a control sample, and to record them as differential peptides, wherein the sample is a serum sample or a plasma sample;
- the model building module is configured to use the first differential response signal of the differential peptides as features and construct a classification model for the positive sample and the control sample with a support vector machine method to obtain a sample classification model;
- the response signal detection module is configured to use the immunocharacterization technology to detect the differential response signal of the differential peptides between the sample to be tested and the control sample, and to record it as the second differential response signal;
- the classification detection module is configured to input the second differential response signal into the sample classification model for classification, thereby obtaining the symptom category of the sample to be tested;
- the control sample includes a negative control sample and control samples of other lung diseases.
- a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute, when run, the steps in any of the above method embodiments.
- an electronic device including a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to run the computer program to execute the steps in any of the above method embodiments.
- the differential response signals of the differential peptides between the target sample to be tested and the control sample can be detected based on the neural network, and the category to which the sample belongs can then be determined. This solves the technical problem in the related art that there is no clear technique that uses peptides to detect whether a sample is infected, realizes such a technique, and improves the accuracy with which the category of the sample is detected.
- FIG. 1 is a hardware structural block diagram of a mobile terminal based on a method for classifying samples based on immune characterization technology according to an embodiment of the present application;
- FIG. 2 is a flowchart of a method for classifying samples based on immune characterization technology according to an embodiment of the present application
- LEO Leave-one-out
- Fig. 5 is a first diagram of verification results according to an embodiment of the present application.
- Fig. 6 is a second diagram of verification results according to an embodiment of the present application.
- Fig. 7 is a third diagram of verification results according to an embodiment of the present application.
- Fig. 8 is a diagram of an apparatus for classifying samples based on immune characterization technology according to an embodiment of the present application.
- ImmuneSignature technology: an immune characterization technology in which a chip carrying high-density random peptides (for example, 130,000 peptides) binds antibodies in the blood; after incubation with a fluorescently labeled secondary antibody, the fluorescence value is read in a microplate reader to reflect the antibodies in the blood. This method can identify antibodies that are differentially expressed between individuals.
- Polypeptide: in this application, any peptide that is predicted or screened to specifically bind to an antibody.
- Antigen: any substance that can induce an immune response in the body. That is, a substance that can be specifically bound by antigen receptors (TCR/BCR) on the surface of T/B lymphocytes, activates T/B cells so that they proliferate and differentiate and produce immune response products (sensitized lymphocytes or antibodies), and can specifically bind to those products in vivo and in vitro. Antigens therefore have two important characteristics: immunogenicity and immunoreactivity.
- the antigen in this application refers to a complete antigen with immunogenicity formed after a polypeptide hapten is coupled with a carrier protein. It can be a polypeptide-carrier protein conjugate formed by coupling a polypeptide of a single amino acid sequence with a carrier protein, or a polypeptide-carrier protein conjugate composition formed by coupling polypeptides with multiple different amino acid sequences with a carrier protein.
- ROC curve: a curve reflecting the relationship between sensitivity and specificity.
- the abscissa (X-axis) is 1 - specificity, which is also called the false positive rate.
- AUC: Area Under the Curve.
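- To make the ROC and AUC definitions concrete, here is a minimal sketch that computes the AUC as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, counting ties as half. This pairwise form equals the area under the ROC curve. The function name and the toy labels and scores are assumptions for illustration only.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from binary labels (1/0) and scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Count pairs where the positive outranks the negative; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for three infected and three control samples.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

In this toy data, eight of the nine positive-negative pairs are ranked correctly, so the AUC is 8/9.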
- FIG. 1 is a hardware structural block diagram of a mobile terminal based on a method for classifying samples based on immune characterization technology in an embodiment of the present application.
- the mobile terminal may include one or more processors 102 (only one is shown in FIG. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 104 configured to store data. The above-mentioned mobile terminal may also include a transmission device 106 for communication functions and an input/output device 108.
- FIG. 1 is only for illustration, and does not limit the structure of the above-mentioned mobile terminal.
- the mobile terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
- the memory 104 may be configured to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the method for classifying samples based on immune characterization technology in the embodiment of the present application. The processor 102 runs the computer program stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above-mentioned method.
- the memory 104 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
- the memory 104 may further include a memory remotely provided with respect to the processor 102, and these remote memories may be connected to the mobile terminal through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
- the transmission device 106 is configured to receive or transmit data via a network.
- specific examples of the above-mentioned network may include a wireless network provided by the communication provider of the mobile terminal.
- the transmission device 106 includes a network adapter (Network Interface Controller, NIC for short), which can be connected to other network devices through a base station to communicate with the Internet.
- the transmission device 106 may be a radio frequency (Radio Frequency, referred to as RF) module, which is configured to communicate with the Internet in a wireless manner.
- a method for classifying samples based on immune characterization technology is provided.
- This embodiment uses differentially expressed antibody feature data from a large number of samples (serum samples or plasma samples) from healthy people, patients with other lung diseases, and patients with novel coronavirus pneumonia to establish, based on artificial intelligence methods, a classification model for determining the category of a sample. The sensitivity and specificity of the classification model were then tested and verified on known samples. The results show that the classification model has high classification accuracy, and the sensitivity and specificity obtained when the model classifies the object to be tested show that the method can effectively and accurately determine the category of the sample.
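- The verification on known samples can be sketched as a sensitivity and specificity calculation from true and predicted labels. The function name, the label strings, and the toy predictions are assumptions for illustration; they are not results from the application.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="infected"):
    """Return (sensitivity, specificity) for a binary classification.

    sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical known samples and model outputs.
truth = ["infected"] * 4 + ["not_infected"] * 4
predicted = ["infected", "infected", "infected", "not_infected",
             "not_infected", "not_infected", "not_infected", "infected"]
sens, spec = sensitivity_specificity(truth, predicted)
```

Here one infected sample is missed and one control sample is flagged, giving a sensitivity and a specificity of 0.75 each.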
- Fig. 2 is a flowchart of a method for classifying samples based on immune characterization technology according to an embodiment of the present application. As shown in Fig. 2, the process includes the following steps:
- S202 Using the immune characterization technology, detect the differential response signals of the differential peptides in the target sample to be tested and the control sample to obtain a second differential response signal, wherein the differential peptides are peptides, screened in advance using the immune characterization technology, that show a first differential response signal between the positive sample infected with the target coronavirus and the control sample; the control sample includes a negative control sample and/or samples in other states, the samples in other states include samples infected with a pathogen other than the target coronavirus, and the sample is a serum sample or a plasma sample;
- S204 Use a target model to analyze the second differential response signal and determine the category to which the target sample to be tested belongs, where the target model is trained through machine learning using multiple sets of data, and each set of data includes: a differential response signal and the category to which the sample to be tested corresponding to that differential response signal belongs;
- the category to which the sample to be tested belongs can be obtained by the above method, where different categories can be used to indicate whether the sample to be tested is infected with the target coronavirus, and/or the specific degree of non-infection, and/or the specific degree of infection.
- the above-mentioned other pathogens include viruses or bacteria that cause other lung diseases, and other lung diseases refer to lung diseases caused by infections other than the target coronavirus.
- when the target sample to be tested is a serum sample, the control sample is also a serum sample; when the target sample to be tested is a plasma sample, the control sample is also a plasma sample. That is, there is a one-to-one correspondence between the type of the target sample to be tested and the type of the control sample.
- the differential response signals of the differential peptides in the target sample to be tested and the control sample can be detected, and the category to which the sample belongs can then be determined based on the machine learning model. This solves the problem in the related art that there is no clear technique for detecting, on the basis of peptides, whether a sample is infected, and achieves the effect of providing such a technique and improving the detection accuracy of the category of the sample.
- the corresponding differentially expressed antibody features can be screened separately and then used for training, so that classification and screening models for different categories can be obtained and the different categories of samples can be accurately confirmed.
- the samples consisted of 3 groups, namely healthy (denoted as H) plasma samples, other lung disease (denoted as T, mainly tuberculosis) plasma samples, and novel coronavirus pneumonia (denoted as F) plasma samples.
- the number of samples mentioned above is only an example. In practical applications, other numbers of samples can be used, for example, 200 samples, 500 samples, etc., and the larger the number of samples, the more accurate the final confirmation result.
- the method further includes: using the immune characterization technology to screen out the peptides that show the first differential response signal between the positive sample infected with the target coronavirus and the control sample, and determining the screened peptides as the differential peptides.
- the first step is to compare F and H to screen out the HT peptide features that are significantly increased in plasma samples infected by pathogens that can cause lung diseases. Such features correspond to a rise in antibody concentration caused by pathogen infection of the plasma sample, but the antibodies found are not necessarily specific for the new coronavirus (corresponding to the above-mentioned target coronavirus); they may also be caused by infection with other pathogens that cause lung disease, or by other factors.
- the second step is to compare F and T to find antibodies that are specific for the new coronavirus relative to other lung diseases.
- when new coronavirus infection is compared only with other lung diseases, some non-specific HT peptides may easily be picked up by mistake; therefore, to obtain the new coronavirus-specific peptides more accurately, the intersection of the characteristic peptides found in the first and second steps is taken, which yields new coronavirus-specific peptides with high accuracy.
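- The two screening steps and the final intersection can be sketched as follows. This is a minimal illustration only: the peptide identifiers and the function name `specific_peptides` are hypothetical, and the per-step screening itself is taken as given.

```python
def specific_peptides(f_vs_h, f_vs_t):
    """Return the peptides flagged in both comparisons (F vs H and F vs T)."""
    return sorted(set(f_vs_h) & set(f_vs_t))


# Hypothetical peptide identifiers, for illustration only.
step1 = ["pep_001", "pep_002", "pep_003"]  # raised in F relative to H
step2 = ["pep_002", "pep_003", "pep_004"]  # raised in F relative to T

# Only peptides found in both steps are kept as candidates.
candidates = specific_peptides(step1, step2)
print(candidates)
```

- Taking the intersection trades some sensitivity for specificity: a peptide raised by any lung infection passes the first step but not the second, so it is dropped.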
- the method further includes: using the multiple sets of data to train an initial model through machine learning to obtain the target model, wherein the target model includes a first model or a second model; the first model is used to output, for the input signal, a label identifying one of the following results: not infected with the target coronavirus, or infected with the target coronavirus; the second model is used to output, for the input signal, a label identifying one of the following results: not infected with the target coronavirus, with the non-infection category being the first category; not infected with the target coronavirus, with the non-infection category being the second category; infected with the target coronavirus, with the infection category being the third category; infected with the target coronavirus, with the infection category being the fourth category; infected with the target coronavirus, with the infection category being the fifth category; infected with the target coronavirus, with the infection category being the sixth category.
- the data contains three categories: data of uninfected plasma samples (denoted as H), data of plasma samples of other lung diseases (mainly tuberculosis, denoted as T), and data of plasma samples infected by the new coronavirus (denoted as F).
- the support vector machine classifier is used to construct the classification model.
- the model kernel function uses a linear kernel.
- the category weights of the loss function are the inverse ratio of the category sizes in the training set (F is category 1, non-F is category 0). It should be noted that the samples in this embodiment come from Shenzhen Third People's Hospital.
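- The category weighting described above can be sketched as follows. The exact formula is not given in the text; this sketch assumes the common "balanced" heuristic, weight = N / (K × n_c), where N is the number of training samples, K the number of categories and n_c the size of category c. The sample counts below are hypothetical.

```python
from collections import Counter


def inverse_ratio_class_weights(labels):
    """Weight each category inversely to its share of the training set.

    Assumed formula: total / (number_of_categories * category_count).
    """
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}


# Hypothetical training labels: 20 F samples (category 1), 60 non-F (category 0).
labels = [1] * 20 + [0] * 60
weights = inverse_ratio_class_weights(labels)
print(weights)
```

- With these counts the minority category (F) receives three times the weight of the majority category, so errors on F samples are penalized more heavily during training.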
- a better way is to construct a model (that is, the above-mentioned target model) that predicts, from the input data features, whether the sample is infected by the new coronavirus.
- a linear kernel support vector machine (SVM) is used for classification (the embodiment of this application takes the linear kernel SVM as an example; of course, it is also feasible to choose other models, such as neural network models); the error penalty weight used is 1.0, and the category weights of the loss function are the inverse ratio of the category sizes in the training set.
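- For reference, the settings above correspond to the textbook soft-margin linear SVM with a category-weighted hinge loss. The objective below is a sketch under that assumption; the text itself does not state the objective, and all symbols are introduced here for illustration: x_i is the feature vector of differential response signals for sample i, y_i ∈ {+1, −1} encodes F versus non-F, c_{y_i} is the category weight, and C is the error penalty weight.

```latex
\min_{w,\,b}\quad \frac{1}{2}\lVert w\rVert^{2}
  + C \sum_{i=1}^{n} c_{y_i}\,\max\bigl(0,\; 1 - y_i\,(w^{\top}x_i + b)\bigr),
  \qquad C = 1.0
```

- A new sample x is then assigned to the infected category when w^{\top}x + b > 0.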
- the classification granularity of the above-mentioned trained model is adjustable.
- the classification granularity of the above-mentioned trained model can be adjusted to only determine whether the sample is not infected by the target coronavirus or has been infected.
- the classification granularity of the above-mentioned trained model can also be adjusted more finely.
- the classification granularity of the above-mentioned trained model can be further adjusted to distinguish the following categories: not infected with the target coronavirus, with the non-infection category being the first category; not infected with the target coronavirus, with the non-infection category being the second category; infected with the target coronavirus, with the infection category being the third category; infected with the target coronavirus, with the infection category being the fourth category; infected with the target coronavirus, with the infection category being the fifth category; infected with the target coronavirus, with the infection category being the sixth category, where the degree of infection corresponding to the third, fourth, fifth and sixth categories increases in turn (it should be noted that the above division into the third category, the fourth category, the fifth category and the sixth category is only an optional division method).
- the above-mentioned first category may be the category that is not currently infected by the new coronavirus but has antibodies to the new coronavirus (that is, the corresponding sample has been infected with the new coronavirus before), and the above-mentioned second category may be the category that has never been infected by the new coronavirus.
- when the classification granularity of the above-trained model only determines whether the sample is not infected or has been infected by the target coronavirus, and a more detailed category needs to be determined, other models can be introduced for discrimination, such as:
- the method further includes: in the case where the category to which the target sample to be tested belongs is determined to be not infected with the target coronavirus, using a third model to analyze the second differential response signal and determine the not-infected category to which the target sample to be tested belongs, wherein the third model is trained through machine learning using multiple sets of data, and each set of data includes: a differential response signal and the not-infected category to which the sample to be tested corresponding to that differential response signal belongs; the category of not being infected with the target coronavirus includes one of the following: not infected with the target coronavirus, with the non-infection category being the first category; not infected with the target coronavirus, with the non-infection category being the second category; and outputting the not-infected category to which the target sample to be tested belongs.
- the method further includes: in the case where the output symptom category to which the target sample to be tested belongs is determined to be infected with the target coronavirus, using a fourth model to analyze the second differential response signal and determine the infected category to which the target sample to be tested belongs, where the fourth model is trained through machine learning using multiple sets of data, and each set of data includes: a differential response signal and the infected category to which the sample to be tested corresponding to that differential response signal belongs; the category of being infected with the target coronavirus includes one of the following: infected with the target coronavirus, with the infection category being the third category; infected with the target coronavirus, with the infection category being the fourth category; infected with the target coronavirus, with the infection category being the fifth category; infected with the target coronavirus, with the infection category being the sixth category.
- the above-mentioned third model may also be a linear kernel SVM
- the above-mentioned fourth model may also be a linear kernel SVM
- the above model type is only an exemplary description.
- Other types of models can be trained to obtain the above-mentioned third model and/or fourth model.
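- The staged use of the first, third and fourth models can be sketched as follows. This is an illustrative outline only: the three stand-in models are simple threshold rules invented for the example (the text describes linear kernel SVMs), and the function and label names are hypothetical.

```python
def classify_cascade(signal, first_model, third_model, fourth_model):
    """Coarse decision first, then a finer category from the matching model."""
    coarse = first_model(signal)
    if coarse == "not_infected":
        return coarse, third_model(signal)   # first or second category
    return coarse, fourth_model(signal)      # third to sixth category


# Stand-in models (hypothetical rules on the mean or max signal value).
def first(signal):
    return "infected" if sum(signal) / len(signal) > 0.5 else "not_infected"


def third(signal):
    return 1 if max(signal) > 0.4 else 2


def fourth(signal):
    mean = sum(signal) / len(signal)
    return 3 + min(3, int((mean - 0.5) * 8))


print(classify_cascade([0.1, 0.2], first, third, fourth))
print(classify_cascade([0.9, 0.9], first, third, fourth))
```

- Only one of the finer models runs for any given sample, so each can be trained solely on data from its own branch.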
- one of the data set 1 and data set 2 is used as the training set, and the other data set is used as the test set for performance testing.
- after the model is trained on the feature data of the above 864 differential peptides, it can be used to predict whether the feature data of new differential peptides correspond to a sample infected by the new coronavirus.
- the specific method of use is: the response signal values of the same 864 differential peptides are detected for the new sample and, after necessary pre-processing and correction, the feature data of the selected 864 peptides are input into the model, and whether the sample is infected by the new coronavirus is judged according to the prediction result output by the model.
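- The use of a trained model on a new sample can be sketched as follows. Several details are assumptions made for illustration: the pre-processing is taken to be a log10 transform with a floor of 1.0, the model is represented by a hypothetical weight vector and bias of a linear classifier, and two peptides stand in for the 864 selected ones.

```python
import math


def prepare_features(raw_signals, selected_peptides, floor=1.0):
    """Pick the selected peptides in a fixed order and log10-transform them."""
    return [math.log10(max(raw_signals[p], floor)) for p in selected_peptides]


def predict_infected(features, weights, bias):
    """Linear decision function: a positive score is read as 'infected'."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return score > 0.0, score


# Hypothetical values; a real model would use the 864 selected peptides.
selected = ["pep_a", "pep_b"]
raw = {"pep_a": 1000.0, "pep_b": 10.0, "pep_c": 5.0}  # pep_c is not selected
features = prepare_features(raw, selected)
infected, score = predict_infected(features, weights=[1.0, -0.5], bias=-2.0)
print(infected, round(score, 3))
```

- Keeping the peptide order fixed matters: the model's weights are tied to feature positions, so the new sample must be laid out exactly as the training data were.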
- sample data from Hefei and Wuhan were mainly collected.
- the samples were divided into four categories according to the severity of the new coronavirus infection, namely the suspected type, the mild (Mild) type, the ordinary (Regular) type and the severe (Severe) type; see Table 1 for details (the Wuhan data may contain a large number of false positive diagnoses, so an N protein test was subsequently carried out; here 21* is the data after removing the N protein positives):
- This verification method can verify the reliability and universality of the modeling method under different data and level ratios.
- this verification operation uses the leave-one-out method to verify model performance, and evaluates model performance based on AUC.
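- Leave-one-out verification with AUC scoring can be sketched as follows. The classifier here is a stand-in (nearest category mean on a single feature) chosen to keep the example self-contained; the text uses a linear kernel SVM. The data values are hypothetical.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_out_scores(X, y, fit, score):
    """Hold out each sample once, fit on the rest, score the held-out sample."""
    out = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        out.append(score(model, X[i]))
    return out


def fit_means(X, y):
    """Stand-in model: the mean feature value of each category."""
    def mean(c):
        values = [x for x, t in zip(X, y) if t == c]
        return sum(values) / len(values)
    return mean(0), mean(1)


def score_means(model, x):
    m0, m1 = model
    return abs(x - m0) - abs(x - m1)  # larger means closer to category 1


X = [0.1, 0.3, 0.2, 0.9, 0.8, 0.6]   # hypothetical single-feature data
y = [0, 0, 0, 1, 1, 1]
loo_scores = leave_one_out_scores(X, y, fit_means, score_means)
print(auc(y, loo_scores))
```

- Because every score comes from a model that never saw the corresponding sample, the resulting AUC estimates performance on unseen data even when the sample count is small.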
- the model is used to detect the sample data of Hefei and Wuhan respectively, and the results obtained are shown in Figure 5 and Figure 6. For the Hefei data, the detection model constructed by this method is used with a model prediction threshold of 0.5.
- the method according to the above embodiment can be implemented by means of software plus the necessary general hardware platform; of course, it can also be implemented by hardware, but in many cases the former is the better implementation.
- the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product.
- the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes several instructions to make a computing device execute the method described in each embodiment of the present application, or make a processor execute the method described in each embodiment of the present application.
- This embodiment also provides a device for classifying samples based on immune characterization technology. As shown in FIG. 8, the device includes:
- the detection module 82 is configured to use the immune characterization technology to detect the differential response signals of the differential peptides in the target sample to be tested and the control sample to obtain a second differential response signal, wherein the differential peptides are peptides, screened in advance using the immune characterization technology, that show a first differential response signal between the positive sample infected with the target coronavirus and the control sample; the control sample includes a negative control sample and/or samples in other states, the samples in other states include samples infected by pathogens other than the target coronavirus, and the samples are serum samples or plasma samples;
- the first analysis module 84 is configured to analyze the second differential response signal using a target model to determine the category to which the target sample to be tested belongs, wherein the target model is trained through machine learning using multiple sets of data, and each of the multiple sets of data includes: a differential response signal and the category to which the sample to be tested corresponding to that differential response signal belongs;
- the first output module 86 is configured to output the category to which the target sample to be tested belongs.
- the device further includes: a screening module, which is configured to, before the immune characterization technology is used to detect the differential response signals of the differential peptides in the target sample to be tested and the control sample to obtain the second differential response signal, use the immune characterization technology to screen out the peptides that show the first differential response signal between the positive sample infected with the target coronavirus and the control sample, and determine the screened peptides as the differential peptides.
- the device further includes: a training module, configured to, before the second differential response signal is analyzed using the target model to determine the symptom category to which the target sample to be tested belongs, use the multiple sets of data to train an initial model through machine learning to obtain the target model, wherein the target model includes a first model or a second model; the first model is used to output, for the input signal, a label identifying one of the following results: not infected with the target coronavirus, or infected with the target coronavirus; the second model is used to output, for the input signal, a label identifying one of the following results: not infected with the target coronavirus, with the non-infection category being the first category; not infected with the target coronavirus, with the non-infection category being the second category; infected with the target coronavirus, with the infection category being the third category; infected with the target coronavirus, with the infection category being the fourth category; infected with the target coronavirus, with the infection category being the fifth category; infected with the target coronavirus, with the infection category being the sixth category, where the degree of infection corresponding to the third category, the fourth category, the fifth category and the sixth category increases in turn.
- the device further includes: a second analysis module, configured to, in the case where the target model includes the first model, after the category to which the target sample to be tested belongs is output and when the output category is determined to be not infected with the target coronavirus, use a third model to analyze the second differential response signal and determine the not-infected category to which the target sample to be tested belongs, where the third model is trained through machine learning using multiple sets of data, and each set of data includes: a differential response signal and the not-infected category to which the sample to be tested corresponding to that differential response signal belongs; the category of not being infected with the target coronavirus includes one of the following: not infected with the target coronavirus, with the non-infection category being the first category; not infected with the target coronavirus, with the non-infection category being the second category; and a second output module, configured to output the not-infected category to which the target sample to be tested belongs.
- the device further includes: a third analysis module, configured to, in the case where the target model includes the second model, after the symptom category to which the target sample to be tested belongs is output and when the output symptom category is determined to be infected with the target coronavirus, use a fourth model to analyze the second differential response signal and determine the infected category to which the target sample to be tested belongs, wherein the fourth model is trained through machine learning using multiple sets of data, and each set of data includes: a differential response signal and the infected category to which the sample to be tested corresponding to that differential response signal belongs; the category of being infected with the target coronavirus includes one of the following: infected with the target coronavirus, with the infection category being the third category; infected with the target coronavirus, with the infection category being the fourth category; infected with the target coronavirus, with the infection category being the fifth category; infected with the target coronavirus, with the infection category being the sixth category.
- the target model includes a first linear kernel support vector machine SVM.
- the third model includes a second linear kernel support vector machine SVM.
- the fourth model includes a third linear kernel support vector machine SVM.
- a method for detecting coronavirus infection includes: using the immune characterization technology to screen out peptides that show a first differential response signal between a positive sample infected by the target coronavirus and a control sample, and recording them as differential peptides, where the samples are serum samples or plasma samples; taking the first differential response signals of the differential peptides as features, and using the support vector machine method to construct a classification model for the positive samples and the control samples, so as to obtain a sample classification model; using the immune characterization technology to detect the differential response signals of the differential peptides in the sample to be tested and the control sample, recorded as the second differential response signal; and inputting the second differential response signal into the sample classification model for classification, thereby obtaining the symptom category of the sample to be tested; wherein the control sample includes a negative control sample and samples of other lung diseases, and other lung diseases refer to lung diseases caused by infections other than the target coronavirus.
- the target coronavirus is SARS-CoV-2.
- using the immune characterization technology to screen out the peptides that show the first differential response signal between the positive sample infected with the target coronavirus and the control sample, and recording them as differential peptides, includes: selecting positive samples of target coronavirus infection, negative control samples, and control samples of other lung diseases, where other lung diseases refer to lung diseases caused by infection with viruses other than the target coronavirus; using the immune characterization technology to bind the positive samples, negative control samples and other lung disease control samples to a peptide array chip, so as to obtain the signal values of the bound peptide responses; for each bound peptide, calculating the p-value for the difference between the signal values of the positive samples and those of the negative control samples, recorded as the first p-value, and also calculating the p-value for the difference between the signal values of the positive samples and those of the other lung disease control samples, recorded as the second p-value; and retaining all bound peptides whose first p-value and second p-value both meet the third threshold, so as to obtain the differential peptides; preferably the third threshold is ≤ 0.05.
- log10 conversion is performed on the signal values of the bound peptides, and the converted log values are used as features; for each feature, the p-value for the difference between the positive samples and the negative control samples is calculated through a one-tailed t-test and corrected for multiple hypothesis testing to obtain the first p-value; likewise, the p-value for the difference of the corresponding feature between the positive samples and the other lung disease control samples is calculated, corrected for multiple hypothesis testing, and recorded as the second p-value; the bound peptides whose first p-value and second p-value are both less than the third threshold are selected, so as to obtain the differential peptides.
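- The correction and filtering stage of this screening can be sketched as follows. The text does not name the multiple hypothesis testing correction; the Benjamini-Hochberg procedure is assumed here purely for illustration. The raw p-values are taken as inputs (the one-tailed t-test on log10 signal values is not reproduced), and the peptide names and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):          # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_differential(peptides, p_first, p_second, threshold=0.05):
    """Keep peptides whose corrected first and second p-values are both below the threshold."""
    q1 = benjamini_hochberg(p_first)
    q2 = benjamini_hochberg(p_second)
    return [p for p, a, b in zip(peptides, q1, q2) if a < threshold and b < threshold]


peptides = ["pep_1", "pep_2", "pep_3", "pep_4"]
p_first = [0.01, 0.04, 0.03, 0.005]    # positive vs negative control
p_second = [0.001, 0.2, 0.01, 0.6]     # positive vs other lung disease control
selected = select_differential(peptides, p_first, p_second)
print(selected)
```

- Requiring both corrected p-values to pass mirrors the intersection of the two comparisons: a peptide must separate positive samples from healthy controls and from other lung disease controls.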
- a detection device for coronavirus infection includes: a differential peptide screening module, configured to use the immune characterization technology to screen out the peptides that show a first differential response signal between the positive sample infected with the target coronavirus and the control sample, recorded as differential peptides, where the samples are serum samples or plasma samples; a model building module, configured to take the first differential response signals of the differential peptides as features and use the support vector machine method to construct a classification model for the positive samples and the control samples, so as to obtain a sample classification model; a response signal detection module, configured to use the immune characterization technology to detect the differential response signals of the differential peptides in the sample to be tested and the control sample, recorded as the second differential response signal; and a classification detection module, configured to input the second differential response signal into the sample classification model for classification, thereby obtaining the symptom category of the sample to be tested; wherein the control sample includes a negative control sample and other lung disease samples, other lung diseases refer to lung diseases caused by infections other than the target coronavirus, and the target coronavirus is preferably SARS-CoV-2.
- the differential peptide screening module includes: a sample selection unit, configured to select positive samples of target coronavirus infection, negative control samples and other lung disease control samples, where other lung diseases are caused by infection with viruses other than the target coronavirus; a signal acquisition unit, configured to use the immune characterization technology to bind the positive samples, negative control samples and other lung disease control samples to the peptide array chip to obtain the signal values of the bound peptide responses; and a differential peptide screening unit, configured to calculate, for each bound peptide, the p-value for the difference between the signal values of the positive samples and those of the negative control samples, recorded as the first p-value, and also the p-value for the difference between the signal values of the positive samples and those of the other lung disease control samples, recorded as the second p-value, and to retain all bound peptides whose first p-value and second p-value both meet the third threshold, so as to obtain the differential peptides; preferably the third threshold is ≤ 0.05.
- the differential peptide screening unit includes: a signal conversion subunit, configured to perform log10 conversion on the signal values of the bound peptides; and further subunits configured to use the converted log values as features and, through a one-tailed t-test, calculate for each feature the p-value for the difference between the positive samples and the negative control samples, corrected for multiple hypothesis testing, to obtain the first p-value; to calculate likewise the p-value for the difference of the corresponding feature between the positive samples and the other lung disease control samples, corrected for multiple hypothesis testing and recorded as the second p-value; and to select the bound peptides whose first p-value and second p-value are both less than the third threshold, so as to obtain the differential peptides.
- the data processing part of the technical solution of the present application can be embodied in the form of a software product, and the computer software product can be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute the methods of the various embodiments of the application or of some parts of the embodiments.
- This application can be used in many general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multi-processor systems, microprocessor-based systems, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, etc.
- the embodiment of the present application also provides a computer-readable storage medium in which a computer program is stored, wherein the computer program is configured to execute the steps in any one of the foregoing method embodiments when running.
- the foregoing computer-readable storage medium may include, but is not limited to: a U disk, Read-Only Memory (ROM for short), Random Access Memory (RAM for short), a mobile hard drive, a magnetic disk, an optical disk, and other media that can store computer programs.
- An embodiment of the present application also provides an electronic device, including a memory and a processor, the memory stores a computer program, and the processor is configured to run the computer program to execute the steps in any one of the foregoing method embodiments.
- the aforementioned electronic device may further include a transmission device and an input-output device, wherein the transmission device is connected to the aforementioned processor, and the input-output device is connected to the aforementioned processor.
- modules or steps in the above embodiments of the present application can be implemented by a general computing device; they can be concentrated on a single computing device or distributed across a network composed of multiple computing devices. They can be implemented by program code executable by the computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described can be executed in a different order than described here, or they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.
- the method, device, and storage medium for classifying samples based on immune characterization technology provided by the embodiments of the present application have the following beneficial effects: they solve the problem in the related art that there is no clear technique for detecting, on the basis of peptides, whether a sample is infected, and achieve the effect of providing such a technique and improving the detection accuracy of the category of the sample.
Abstract
Description
Type | Hefei data | Wuhan data |
Suspected | 15 | 21* |
Mild | 18 | - |
Regular | 38 | 40 |
Severe | 23 | 32 |
Claims (12)
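- A method for classifying samples based on immune characterization technology, the method comprising: using immune characterization technology to detect the differential response signals of differential peptides in a target sample to be tested and a control sample, so as to obtain a second differential response signal, wherein the differential peptides are peptides, screened in advance using the immune characterization technology, that show a first differential response signal between a positive sample infected with the target coronavirus and the control sample, the control sample includes a negative control sample and/or samples in other states, the samples in other states include samples infected with pathogens other than the target coronavirus, and the samples are serum samples or plasma samples; analyzing the second differential response signal using a target model to determine the category to which the target sample to be tested belongs, wherein the target model is trained through machine learning using multiple sets of data, and each set of data in the multiple sets of data includes: a differential response signal and the category to which the sample to be tested corresponding to the differential response signal belongs; and outputting the category to which the target sample to be tested belongs.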
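- The method according to claim 1, wherein, before using the immune characterization technology to detect the differential response signals of the differential peptides between the target test sample and the control sample so as to obtain the second differential response signal, the method further comprises: using the immune characterization technology to screen out peptides for which the first differential response signal exists between the positive samples infected with the target coronavirus and the control sample, and determining the screened peptides as the differential peptides.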
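- The method according to claim 1, wherein, before analyzing the second differential response signal using the target model to determine the symptom category to which the target test sample belongs, the method further comprises: training an initial model by machine learning using the multiple sets of data to obtain the target model, wherein the target model comprises a first model or a second model; the first model is configured to output, for an input signal, a label identifying one of the following results: not infected with the target coronavirus; infected with the target coronavirus; the second model is configured to output, for an input signal, a label identifying one of the following results: not infected with the target coronavirus, with the non-infected category being a first category; not infected with the target coronavirus, with the non-infected category being a second category; infected with the target coronavirus, with the infection category being a third category; infected with the target coronavirus, with the infection category being a fourth category; infected with the target coronavirus, with the infection category being a fifth category; infected with the target coronavirus, with the infection category being a sixth category, wherein the degrees of infection corresponding to the third category, the fourth category, the fifth category, and the sixth category increase in severity in that order.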
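- The method according to claim 3, wherein, in the case where the target model comprises the first model, after outputting the category to which the target test sample belongs, the method further comprises: in the case where the output category of the target test sample is determined to be not infected with the target coronavirus, analyzing the second differential response signal using a third model to determine the not-infected category to which the target test sample belongs, wherein the third model is trained by machine learning using multiple sets of data, each set of data comprising: a differential response signal and the not-infected category to which the test sample corresponding to that differential response signal belongs, the not-infected category comprising one of the following: not infected with the target coronavirus, with the non-infected category being the first category; not infected with the target coronavirus, with the non-infected category being the second category; and outputting the not-infected category to which the target test sample belongs; in the case where the target model comprises the second model, after outputting the symptom category to which the target test sample belongs, the method further comprises: in the case where the output symptom category of the target test sample is determined to be infected with the target coronavirus, analyzing the second differential response signal using a fourth model to determine the infected category to which the target test sample belongs, wherein the fourth model is trained by machine learning using multiple sets of data, each set of data comprising: a differential response signal and the infected category to which the test sample corresponding to that differential response signal belongs, the infected category comprising one of the following: infected with the target coronavirus, with the infection category being the third category; infected with the target coronavirus, with the infection category being the fourth category; infected with the target coronavirus, with the infection category being the fifth category; infected with the target coronavirus, with the infection category being the sixth category; and outputting the infected category to which the target test sample belongs.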
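- A device for classifying samples based on immune characterization technology, the device comprising: a detection module, configured to use the immune characterization technology to detect the differential response signals of differential peptides between a target test sample and a control sample, so as to obtain a second differential response signal, wherein the differential peptides are peptides, screened in advance using the immune characterization technology, for which a first differential response signal exists between positive samples infected with a target coronavirus and the control sample, the control sample comprises a negative control sample and/or samples in other states, the samples in other states comprise samples infected with pathogens other than the target coronavirus, and the samples are serum samples or plasma samples; a first analysis module, configured to analyze the second differential response signal using a target model to determine the category to which the target test sample belongs, wherein the target model is trained by machine learning using multiple sets of data, each set of data comprising: a differential response signal and the category to which the test sample corresponding to that differential response signal belongs; and a first output module, configured to output the category to which the target test sample belongs.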
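- The device according to claim 5, wherein the device further comprises: a screening module, configured to, before the immune characterization technology is used to detect the differential response signals of the differential peptides between the target test sample and the control sample so as to obtain the second differential response signal, use the immune characterization technology to screen out peptides for which the first differential response signal exists between the positive samples infected with the target coronavirus and the control sample, and to determine the screened peptides as the differential peptides.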
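- The device according to claim 5, wherein the device further comprises: a training module, configured to, before the second differential response signal is analyzed using the target model to determine the symptom category to which the target test sample belongs, train an initial model by machine learning using the multiple sets of data to obtain the target model, wherein the target model comprises a first model or a second model; the first model is configured to output, for an input signal, a label identifying one of the following results: not infected with the target coronavirus; infected with the target coronavirus; the second model is configured to output, for an input signal, a label identifying one of the following results: not infected with the target coronavirus, with the non-infected category being a first category; not infected with the target coronavirus, with the non-infected category being a second category; infected with the target coronavirus, with the infection category being a third category; infected with the target coronavirus, with the infection category being a fourth category; infected with the target coronavirus, with the infection category being a fifth category; infected with the target coronavirus, with the infection category being a sixth category, wherein the degrees of infection corresponding to the third category, the fourth category, the fifth category, and the sixth category increase in severity in that order.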
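- The device according to claim 7, wherein the device further comprises: a second analysis module, configured to, in the case where the target model comprises the first model, after the category to which the target test sample belongs is output, and in the case where the output category of the target test sample is determined to be not infected with the target coronavirus, analyze the second differential response signal using a third model to determine the not-infected category to which the target test sample belongs, wherein the third model is trained by machine learning using multiple sets of data, each set of data comprising: a differential response signal and the not-infected category to which the test sample corresponding to that differential response signal belongs, the not-infected category comprising one of the following: not infected with the target coronavirus, with the non-infected category being the first category; not infected with the target coronavirus, with the non-infected category being the second category; and a second output module, configured to output the not-infected category to which the target test sample belongs; or, the device further comprises: a third analysis module, configured to, in the case where the target model comprises the second model, after the symptom category to which the target test sample belongs is output, and in the case where the output symptom category of the target test sample is determined to be infected with the target coronavirus, analyze the second differential response signal using a fourth model to determine the infected category to which the target test sample belongs, wherein the fourth model is trained by machine learning using multiple sets of data, each set of data comprising: a differential response signal and the infected category to which the test sample corresponding to that differential response signal belongs, the infected category comprising one of the following: infected with the target coronavirus, with the infection category being the third category; infected with the target coronavirus, with the infection category being the fourth category; infected with the target coronavirus, with the infection category being the fifth category; infected with the target coronavirus, with the infection category being the sixth category; and a third output module, configured to output the infected category to which the target test sample belongs.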
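- A method for detecting coronavirus infection, the detection method comprising: using immune characterization technology to screen out peptides for which a first differential response signal exists between positive samples infected with a target coronavirus and control samples, denoted as differential peptides, the samples being serum samples or plasma samples; taking the first differential response signals of the differential peptides as features, constructing a classification model for the positive samples and the control samples by a support vector machine method, to obtain a sample classification model; using the immune characterization technology to detect the differential response signals of the differential peptides between a test sample and the control samples, denoted as second differential response signals; and inputting the second differential response signals into the sample classification model for classification, thereby obtaining the symptom category to which the test sample belongs; wherein the control samples comprise negative control samples and control samples of other lung diseases, the other lung diseases being lung diseases not caused by infection with the target coronavirus, and the target coronavirus is preferably SARS-CoV-2.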
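- A device for detecting coronavirus infection, the detection device comprising: a differential peptide screening module, configured to use immune characterization technology to screen out peptides for which a first differential response signal exists between positive samples infected with a target coronavirus and control samples, denoted as differential peptides, the samples being serum samples or plasma samples; a model building module, configured to take the first differential response signals of the differential peptides as features and construct a classification model for the positive samples and the control samples by a support vector machine method, to obtain a sample classification model; a response signal detection module, configured to use the immune characterization technology to detect the differential response signals of the differential peptides between a test sample and the control samples, denoted as second differential response signals; and a classification detection module, configured to input the second differential response signals into the sample classification model for classification, thereby obtaining the symptom category to which the test sample belongs; wherein the control samples comprise negative control samples and control samples of other lung diseases, the other lung diseases being lung diseases not caused by infection with the target coronavirus, and the target coronavirus is preferably SARS-CoV-2.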
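- A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4, or the steps of the method according to claim 9.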
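- An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 4, or the steps of the method according to claim 9.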
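The detection method of claim 9 (screen differential peptides, build a support vector machine classification model on their differential response signals, then classify a test sample) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: the claims do not state a screening criterion, so a Welch-style t score with an arbitrary cutoff is assumed; the differential response signal is assumed to be a sample's signal minus the mean control signal; the linear SVM is a small hand-written sub-gradient trainer rather than any particular library; and all function names and numbers are hypothetical.

```python
from statistics import mean, stdev


def screen_differential_peptides(positive, control, min_score=2.0):
    """Return indices of peptides whose signals differ between the two groups.

    The Welch-style t score and the min_score cutoff are assumptions made
    for illustration; the claims do not specify a screening criterion.
    """
    selected = []
    for j in range(len(positive[0])):
        pos = [sample[j] for sample in positive]
        ctl = [sample[j] for sample in control]
        std_err = (stdev(pos) ** 2 / len(pos) + stdev(ctl) ** 2 / len(ctl)) ** 0.5
        if std_err > 0 and abs(mean(pos) - mean(ctl)) / std_err >= min_score:
            selected.append(j)
    return selected


def differential_signal(sample, control, selected):
    """Signal of each selected peptide minus its mean signal in the controls."""
    return [sample[j] - mean([c[j] for c in control]) for j in selected]


def train_linear_svm(X, y, lr=0.1, lam=0.01, epochs=100):
    """Fit a linear SVM by sub-gradient descent on the L2-regularised hinge loss.

    y holds +1 for positive samples and -1 for control samples.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            w = [(1 - lr * lam) * wk for wk in w]  # shrinkage from the L2 penalty
            if margin < 1:  # sample violates the margin: step towards it
                w = [wk + lr * yi * xk for wk, xk in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(model, x):
    """Return +1 (classified as infected) or -1 (classified as control)."""
    w, b = model
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1


# Toy signals for three peptides (made-up numbers, not data from the application).
positive = [[5.0, 1.0, 4.2], [5.5, 1.2, 4.0], [4.8, 0.9, 4.5], [5.2, 1.1, 3.9]]
control = [[1.0, 1.1, 1.2], [1.3, 0.9, 1.0], [0.8, 1.0, 1.4], [1.1, 1.2, 0.9]]

selected = screen_differential_peptides(positive, control)
X = [differential_signal(s, control, selected) for s in positive + control]
y = [1] * len(positive) + [-1] * len(control)
model = train_linear_svm(X, y)

test_sample = [5.1, 1.0, 4.1]
label = predict(model, differential_signal(test_sample, control, selected))
```

In this toy data the second peptide has the same mean signal in both groups, so only the first and third peptides are kept as features. A multi-class variant along the lines of claims 3 and 4 would route the output of a first model to a second-stage model, which this sketch does not attempt.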
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010176984 | 2020-03-13 | ||
CN202010176984.4 | 2020-03-13 | ||
CN202010923587.9 | 2020-09-04 | ||
CN202010923587.9A CN113393902A (zh) | 2020-03-13 | 2020-09-04 | Method, device, and storage medium for classifying samples based on immune characterization technology |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021180182A1 (zh) | 2021-09-16 |
Family
ID=77616460
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/080279 WO2021180182A1 (zh) | 2020-03-13 | 2021-03-11 | Method, device, and storage medium for classifying samples based on immune characterization technology |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113393902A (zh) |
WO (1) | WO2021180182A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
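CN113888636A (zh) * | 2021-09-29 | 2022-01-04 | Shandong University | Protein subcellular localization method based on multi-scale deep features |
CN113903400A (zh) * | 2021-10-29 | 2022-01-07 | Huashan Hospital, Fudan University | Classification method and system for molecular typing of immune-related diseases and subtype classifiers |
CN116564416A (zh) * | 2023-07-12 | 2023-08-08 | Institute of Apicultural Research, Chinese Academy of Agricultural Sciences | ACE-inhibitory small peptide screening method based on segment fusion and application thereof |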
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
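CN103336915A (zh) * | 2013-05-31 | 2013-10-02 | National University of Defense Technology of the Chinese People's Liberation Army | Method and device for obtaining biomarkers based on mass spectrometry data |
US20170073769A1 (en) * | 2015-09-16 | 2017-03-16 | Innomedicine, LLC | Chemotherapy regimen selection |
CN108491690A (zh) * | 2018-03-16 | 2018-09-04 | Academy of Mathematics and Systems Science, Chinese Academy of Sciences | Method for predicting the quantification efficiency of peptides in proteomics |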
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2618939A1 (en) * | 2004-08-13 | 2006-04-27 | Jaguar Bioscience Inc. | Systems and methods for identifying diagnostic indicators |
GB0510511D0 (en) * | 2005-05-23 | 2005-06-29 | St Georges Entpr Ltd | Diagnosis of tuberculosis |
- 2020-09-04: CN, CN202010923587.9A (patent/CN113393902A/zh), active, Pending
- 2021-03-11: WO, PCT/CN2021/080279 (patent/WO2021180182A1/zh), active, Application Filing
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
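CN113888636A (zh) * | 2021-09-29 | 2022-01-04 | Shandong University | Protein subcellular localization method based on multi-scale deep features |
CN113903400A (zh) * | 2021-10-29 | 2022-01-07 | Huashan Hospital, Fudan University | Classification method and system for molecular typing of immune-related diseases and subtype classifiers |
CN116564416A (zh) * | 2023-07-12 | 2023-08-08 | Institute of Apicultural Research, Chinese Academy of Agricultural Sciences | ACE-inhibitory small peptide screening method based on segment fusion and application thereof |
CN116564416B (zh) * | 2023-07-12 | 2023-09-15 | Institute of Apicultural Research, Chinese Academy of Agricultural Sciences | ACE-inhibitory small peptide screening method based on segment fusion and application thereof |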
Also Published As
Publication number | Publication date |
---|---|
CN113393902A (zh) | 2021-09-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021180182A1 (zh) | Method, device, and storage medium for classifying samples based on immune characterization technology | |
Lei et al. | Antibody dynamics to SARS‐CoV‐2 in asymptomatic COVID‐19 infections | |
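JP2020113285A (ja) | Computer analysis of biological data using manifolds and hyperplanes | |
CN107209184B (zh) | Marker combinations for diagnosing multiple infections and methods of use thereof | |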
Fu et al. | Dynamics and correlation among viral positivity, seroconversion, and disease severity in COVID-19: a retrospective study | |
Coghill et al. | Epstein–Barr virus serology as a potential screening marker for nasopharyngeal carcinoma among high-risk individuals from multiplex families in Taiwan | |
Wang et al. | Screening and identification of a six-cytokine biosignature for detecting TB infection and discriminating active from latent TB | |
Dumollard et al. | Prospective evaluation of a new Aspergillus IgG enzyme immunoassay kit for diagnosis of chronic and allergic pulmonary aspergillosis | |
Yang et al. | Identification of eight-protein biosignature for diagnosis of tuberculosis | |
US11360086B2 (en) | Diagnostic to distinguish bacterial infections | |
Wielders et al. | High Coxiella burnetii DNA load in serum during acute Q fever is associated with progression to a serologic profile indicative of chronic Q fever | |
Li et al. | Novel serological biomarker panel using protein microarray can distinguish active TB from latent TB infection | |
Sinha et al. | Utility of Epstein-Barr virus (EBV) antibodies as screening markers for nasopharyngeal carcinoma: A narrative review | |
Liu et al. | Multilaboratory assessment of Epstein-Barr virus serologic assays: the case for standardization | |
Li et al. | Microarray-based selection of a serum biomarker panel that can discriminate between latent and active pulmonary TB | |
Rajam et al. | Development and validation of a sensitive and robust multiplex antigen capture assay to quantify streptococcus pneumoniae serotype-specific capsular polysaccharides in urine | |
Tuite et al. | Estimating SARS-CoV-2 seroprevalence in Canadian blood donors, April 2020 to March 2021: improving accuracy with multiple assays | |
CN106950365A (zh) | ACPA-negative RA diagnostic marker and application thereof | |
Byrum et al. | multiSero: open multiplex-ELISA platform for analyzing antibody responses to SARS-CoV-2 infection | |
Chaillon et al. | Decreased specificity of an assay for recent infection in HIV-1-infected patients on highly active antiretroviral treatment: implications for incidence estimates | |
Ravindran et al. | Validation of multiplex microbead immunoassay for simultaneous serodetection of multiple infectious agents in laboratory mouse | |
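CN106950366A (zh) | ACPA-negative RA diagnostic marker and application thereof | |
CN106918697A (zh) | Diagnostic marker for predicting RA drug efficacy and application thereof | |
CN104292322A (zh) | Primary biliary cirrhosis-specific autoantigen and application thereof | |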
Chambliss et al. | Immune biomarkers associated with COVID-19 disease severity in an urban, hospitalized population |
Legal Events
Code | Title | Description |
---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21768970; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 21768970; Country of ref document: EP; Kind code of ref document: A1 |
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16/02/2023) |
122 | Ep: pct application non-entry in european phase | Ref document number: 21768970; Country of ref document: EP; Kind code of ref document: A1 |