Exhaled air detection device and method for establishing exhaled air marker thereof
Technical Field
The invention relates to an exhaled breath detection device and an establishment method of an exhaled breath marker thereof, which provide reference for establishment of the exhaled breath marker of a CRDS lung cancer molecular marker detection device.
Background
The invention builds on research into the CRDS lung cancer molecular marker detection device disclosed in Chinese patent 2019207908415.
According to data published by the World Health Organization, lung cancer has become the malignancy with the highest morbidity and mortality worldwide. Lung cancer also ranks first in morbidity and mortality among malignant tumors in China, with 78.7 thousand new cases and 63.1 thousand deaths in 2015. Factors such as the natural environment, an aging society and urbanization are further aggravating the burden of lung cancer, making it one of the most serious public health problems in China. Clinical research data show that the cure rate of carcinoma in situ is close to 100% and that the 5-year survival rate of stage I lung cancer patients reaches 60%-90%, whereas that of stage IIIb and IV patients is only 5%-20%. At present, however, the overall 5-year survival rate of lung cancer is only about 18%, and one of the main reasons is the lack of effective screening and early diagnosis technology. The common clinical diagnostic methods for lung cancer currently include imaging examination, such as X-ray and low-dose computed tomography (LDCT), blood tests and endoscopy. Imaging relies on tracking and follow-up of tumor size and cannot distinguish benign from malignant disease, blood testing is invasive, and endoscopy is painful. None of these methods is sensitive enough for early lung cancer, so patients are often diagnosed only at an advanced stage, which makes treatment more difficult and worsens the prognosis.
Tumor cells are highly metabolic and continuously and abnormally synthesize and secrete various specific substances, which can be detected at an early stage in human respiratory gas. Pathological research shows that when internal organs or tissues of the human body are damaged or diseased, the change in their function causes a corresponding change in metabolites; these metabolites enter the blood and raise the relative content of certain metabolites, so the degree of organ damage can be diagnosed by detecting metabolite concentrations. Diagnosis of disease by analysis of specific components in blood has therefore become a routine means of modern medical diagnosis. Metabolites in blood can also enter the lung across the blood-gas barrier and change the composition of exhaled gas, increasing the concentration of certain specific gases discharged from the body. Because it resembles blood analysis while being non-invasive, breath analysis has very high application prospects and research significance. In the last 10 years, with the development of modern breath analysis technology and the cross-fusion of multiple disciplines, exhaled breath biomarkers for diagnosing diseases have developed rapidly. Breath tests currently approved by the U.S. Food and Drug Administration (FDA) include: detection of ethanol concentration in exhaled air (for law enforcement), detection of hydrogen in exhaled air (for analysis of carbohydrate metabolism in humans), detection of nitric oxide concentration in exhaled air (for diagnosis of asthma), and the 13C-urea breath test (for clinical diagnosis of H. pylori infection).
In particular, for the diagnosis of Helicobacter pylori, the 13C/14C urea breath test has become the gold standard. Meanwhile, more than 30 VOCs, including acetone, hydrogen sulfide, ammonia and butane, have been confirmed as breath biomarkers related to metabolic disorders or diseases of the human body, and research on lung cancer breath biomarkers is considered one of the directions most likely to reach practical clinical application. Numerous studies have shown the feasibility of breath analysis for early diagnosis of lung cancer (most methods have a detection sensitivity and specificity above 70%, and some even above 90%), but no consistent set of markers has been established because breath markers lack reproducibility between studies. The main reason is that, given the diversity and complexity of the physiological and pathological states of the human body, statistical analysis of small data sets cannot achieve the desired effect (the largest lung cancer group reported in the literature so far comprises 193 cases). To address this, the invention takes artificial intelligence as the breakthrough point and untargeted exhaled breath metabolomics as the technical basis, establishes a large-sample exhaled breath database, and on that basis mines target exhaled breath markers in depth and establishes an analysis model so as to eliminate the influence of individual differences.
In recent years, proton transfer reaction-mass spectrometry (PTR-MS) has been widely used for trace gas analysis. It is an analysis technique based on proton transfer reactions between different precursor ions (H3O+, NO+ and O2+) and the substance under test. The main advantages of PTR-MS are as follows: it saves sample pretreatment time; it can measure frequently and quickly and can be used for real-time online breath analysis; it eliminates sample pretreatment, the main source of error in breath analysis; it has a high detection sensitivity, reaching the 10^-12 level; it has high throughput and a detectable molecular mass range of 1-10000 amu; and it can distinguish substances of the same relative molecular mass by using different precursor ion modes. The commercial PTR-MS online monitoring mass spectrometer provides users with PTR-MS data simulation software; running this software to call up exhaled breath component and concentration data for research and experiments provides a guarantee for establishing the target exhaled breath marker analysis model and the exhaled breath compound fingerprint.
Disclosure of Invention
The invention aims to provide an exhaled breath detection device and an establishment method of an exhaled breath marker thereof, and provides a new scientific basis for establishment of the exhaled breath marker of the CRDS lung cancer molecular marker detection device.
Another object of the present invention is to provide a method for building a combination of volatile organic compounds and an analysis model for exhaled breath markers.
In order to achieve the purpose, the invention adopts the following technical scheme:
The invention relates to a volatile organic compound combination and an analysis model for an exhaled breath marker. A database of the components and concentrations of multiple trace volatile organic compounds in the exhaled breath of general bodies and target bodies of a large sample is established using a commercial real-time online high-throughput PTR-MS (proton transfer reaction-mass spectrometry) instrument. A characteristic variable screening method is used to screen out the volatile organic compounds whose concentrations differ significantly between the exhaled breath of the general bodies and the target bodies, and these compounds are taken as the target exhaled breath marker combination. A machine learning algorithm is then trained with the screened characteristic variables (the target exhaled breath marker combination) as input, and the accuracy, sensitivity and specificity of the analysis models trained on the outputs of the three characteristic variable screening methods are compared to determine the optimal first analysis model. This method is applicable to data from exhaled breath detection methods that can only detect selected components qualitatively and quantitatively (such as some mass spectrometers, spectroscopy and electrochemical methods).
In another establishment method of the volatile organic compound combination and analysis model for an exhaled breath marker, a second analysis model is trained by a machine learning algorithm whose input is the exhaled breath components and concentrations of all general bodies and target bodies in the database of components and concentrations of multiple trace volatile organic compounds established using real-time online high-throughput PTR-MS. This method is applicable to data from high-throughput exhaled breath detection methods (for example PTR-MS) that can detect at least the ten most important VOCs.
The volatile organic compound combination and analysis model for the exhaled breath marker provided by the invention can be used to establish the exhaled breath marker of the CRDS lung cancer molecular marker detection device: the concentrations of the target exhaled breath markers in exhaled air are input into the first analysis model for judgment, or the concentrations of the ten most important VOCs or more in the exhaled breath are input into the second analysis model for judgment.
The volatile organic compound combination and analysis model for the exhaled breath marker can be used with a CRDS lung cancer molecular marker detection device and enables real-time online detection of various volatile organic compounds, including alkanes, aldehydes, ketones, alcohols and benzene derivatives. One modeling method screens out, with a characteristic variable screening method, the combination of volatile organic compounds (marker combination) that differs between the target bodies and the general bodies, and trains a machine learning algorithm on the training set data to establish the corresponding first analysis model. The other modeling method trains a machine learning algorithm with the exhaled breath components and concentrations of all target bodies and general bodies in the training set as input; these data are drawn from the database of the types and concentrations of multiple trace volatile organic compounds in the exhaled breath of large-sample target bodies and general bodies established using real-time online high-throughput PTR-MS.
The invention further provides an exhaled breath detection device based on Chinese patent 2019207908415, comprising a base, a lifting device and a CRDS detection instrument, wherein the CRDS detection instrument is used for identifying and detecting exhaled breath markers, the exhaled breath markers being pluralities of volatile organic compounds (VOCs) screened respectively by three characteristic variable screening methods: uninformative variable elimination (UVE), the successive projections algorithm (SPA) and competitive adaptive reweighted sampling (CARS);
Further, the exhaled breath markers comprise 9, 14 and 7 volatile organic compounds (VOCs), respectively. VOCs screened by the UVE characteristic variable screening method: C5H8, C4H4O, C10H16, C7H8N2O, C5H12O, C4H8O2, C4H8S, C3H4O3, C4H5N; VOCs screened by the CARS characteristic variable screening method: CH4O, C2H3N, CH4S, C3H6O, C2H6S, C2H2F2, C4H4O, C5H8, C6H6O, C7H10, C7H6O, C8H10, C7H8N2O, C10H16; VOCs screened by the SPA characteristic variable screening method: C7H10, C2H2F2, C10H16, C6H6, C6H12, C5H10O, C3H6O.
Further, the exhaled breath markers comprise 9, 8 and 7 volatile organic compounds (VOCs), respectively. VOCs screened by the UVE characteristic variable screening method: C4H5N, CH2O, C5H12O, C4H8O2, C4H8S, C3H4O3, C10H16, C7H8N2O, C5H8; VOCs screened by the CARS characteristic variable screening method: CH4O, C2H3N, CH4S, C3H6O, C2H6S, C2H2F2, C6H6O, C7H10; VOCs screened by the SPA characteristic variable screening method: C4H6O2, C7H10, C2H2F2, C6H12, C7H8N2O, C5H8, C2H6S.
Further, the establishing method of the present invention specifically includes the following steps:
a) calling exhaled air composition and concentration data in PTR-MS data simulation software, storing and establishing a biological information database;
b) screening out Volatile Organic Compounds (VOCs) with concentration difference in exhaled air of a general body and a target body by adopting a characteristic variable screening method, and taking the VOCs as potential target exhaled air markers;
c) based on a potential target exhaled breath marker, a first analysis model is preliminarily established by adopting a machine learning algorithm to obtain an exhaled breath compound fingerprint;
d) comparing the influence of the different characteristic variable screening methods (UVE, CARS and SPA) on the first analysis model, thereby determining the optimal first analysis model.
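The comparison in step d) rests on accuracy, sensitivity and specificity. A minimal sketch of that comparison follows, assuming binary labels (1 = target body, 0 = general body). The function names, the neutral method labels `screen_A` to `screen_C`, the toy predictions, and the choice of accuracy as the selection criterion are all illustrative assumptions, not data or rules from the invention.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels: 1 = target body, 0 = general body."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def best_model(y_true, predictions_by_method):
    """Score each candidate model and pick the one with the highest accuracy (assumed criterion)."""
    scores = {name: evaluate(y_true, pred) for name, pred in predictions_by_method.items()}
    best = max(scores, key=lambda name: scores[name]["accuracy"])
    return best, scores


# Toy labels and predictions, made up purely to exercise the functions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_predictions = {
    "screen_A": [1, 1, 1, 0, 0, 0, 0, 1],
    "screen_B": [1, 1, 1, 1, 0, 0, 0, 1],
    "screen_C": [1, 1, 0, 0, 0, 0, 1, 1],
}
best, scores = best_model(y_true, toy_predictions)
print(best, scores[best])
```

In practice the predictions would come from the first analysis model trained on each screened marker combination and evaluated on held-out data; a different criterion (for example, prioritising sensitivity for screening) could replace accuracy in `best_model`.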
Another modeling method specifically includes the steps of:
a) calling exhaled air composition and concentration data in PTR-MS data simulation software, storing and establishing a biological information database;
b) taking the data of all the components and the concentrations in the exhaled breath as input, and establishing a second analysis model by adopting a machine learning algorithm to obtain an exhaled breath compound fingerprint;
c) selecting the ten most important VOCs of the second analysis model by using a feature classification method.
Three feature classification methods are selected and compared:
1. the weight of each feature after SVM training;
2. the score of each feature under univariate analysis;
3. importance degree of each feature after random forest training;
The ten most important substances of the second analysis model under each method are then selected and listed, as shown in FIG. 4.
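The ranking step can be sketched as follows for the second criterion in the list above, the univariate score. The text does not specify which univariate statistic is used, so this sketch assumes a Fisher-style score (squared difference of group means divided by the sum of group variances). The function names, the feature names `voc_a` to `voc_c` and the toy values are hypothetical. SVM weights or random forest importances, once obtained from a fitted model, could be passed to the same `top_k` helper as a feature-to-score mapping.

```python
from statistics import mean, pvariance


def univariate_scores(samples, labels):
    """Score each feature by how well it separates the two groups.

    samples: list of dicts {feature_name: concentration}
    labels:  1 = target body, 0 = general body
    The Fisher-style score used here is an assumed stand-in statistic.
    """
    scores = {}
    for feature in sorted(samples[0]):
        target = [s[feature] for s, y in zip(samples, labels) if y == 1]
        general = [s[feature] for s, y in zip(samples, labels) if y == 0]
        gap = (mean(target) - mean(general)) ** 2
        spread = pvariance(target) + pvariance(general)
        if spread > 0:
            scores[feature] = gap / spread
        elif gap > 0:
            scores[feature] = float("inf")  # perfectly separated, zero variance
        else:
            scores[feature] = 0.0
    return scores


def top_k(scores, k=10):
    """Return the k highest-scoring feature names (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Toy concentrations, made up purely to exercise the functions.
toy_samples = [
    {"voc_a": 1.0, "voc_b": 5.0, "voc_c": 2.0},
    {"voc_a": 1.2, "voc_b": 5.1, "voc_c": 9.0},
    {"voc_a": 3.0, "voc_b": 5.0, "voc_c": 4.0},
    {"voc_a": 3.2, "voc_b": 5.1, "voc_c": 9.0},
]
toy_labels = [0, 0, 1, 1]
toy_scores = univariate_scores(toy_samples, toy_labels)
print(top_k(toy_scores, k=2))
```

With the full database, `top_k(scores, k=10)` would give the ten most important VOCs under this criterion, which can then be compared with the lists from the other two criteria.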
The method adopted by the invention is suitable for analyzing and detecting large-scale data and is simple and quick to operate. The target body data come from highly credible and abundant sources. The method is applicable to target bodies of any age and to conditions including but not limited to lung cancer, gastric cancer, pancreatic cancer, intestinal cancer and diabetes, and the training methods used for modeling include but are not limited to characteristic variable screening methods (UVE, CARS and SPA), machine learning algorithms and feature classification methods.
The invention remedies the shortcomings of the current CRDS lung cancer molecular marker detection device: it finds the specific combination of volatile organic compounds in the exhaled air of lung cancer patients and establishes a corresponding analysis model for establishing the breath marker. In the other modeling method, the exhaled breath components and concentrations of all target bodies and general bodies are used as input and a second analysis model is established with a machine learning algorithm to obtain an exhaled breath compound fingerprint; nine machine learning algorithms are used to establish the second analysis model.
Description of the Drawings
FIG. 1 illustrates the establishment of a first analytical model in accordance with an embodiment of the present invention;
FIG. 2 illustrates the establishment of a first analytical model in accordance with an embodiment of the present invention;
FIG. 3 illustrates the establishment of a second analytical model in accordance with an embodiment of the present invention;
FIG. 4 shows the first ten important substances of the analytical model in an embodiment of the present invention.
Detailed Description
The present invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention.
Commercial software: PTR-TOF-MS1000 System (Ionicon Analytik GmbH, Innsbruck, Austria) simulation software.
The experimental steps are as follows: the exhaled breath component and concentration data are called up in the PTR-MS data simulation software and stored to establish a biological information database.
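One way to picture the database step is the sketch below, which stores one row per sample and compound and then pivots the rows into a feature matrix for modeling. The schema, the table and column names, the use of SQLite, and the choice to fill unmeasured compounds with 0.0 are all assumptions made for illustration; the source does not describe the database layout or the export format of the simulation software, and the toy values are not measurements.

```python
import sqlite3


def build_database(records):
    """Store (sample_id, group, formula, concentration) rows in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE breath ("
        " sample_id TEXT, grp TEXT, formula TEXT, concentration REAL,"
        " PRIMARY KEY (sample_id, formula))"
    )
    conn.executemany("INSERT INTO breath VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return conn


def feature_matrix(conn):
    """Pivot the long table into (formulas, X, y); y is 1 for target, 0 for general."""
    formulas = [row[0] for row in conn.execute(
        "SELECT DISTINCT formula FROM breath ORDER BY formula")]
    samples = conn.execute(
        "SELECT DISTINCT sample_id, grp FROM breath ORDER BY sample_id").fetchall()
    X, y = [], []
    for sample_id, grp in samples:
        measured = dict(conn.execute(
            "SELECT formula, concentration FROM breath WHERE sample_id = ?",
            (sample_id,)))
        # Assumption: a compound with no stored value is treated as 0.0.
        X.append([measured.get(f, 0.0) for f in formulas])
        y.append(1 if grp == "target" else 0)
    return formulas, X, y


# Toy rows, made up purely to exercise the functions.
toy_records = [
    ("s1", "general", "C3H6O", 1.0),
    ("s1", "general", "C5H8", 2.0),
    ("s2", "target", "C3H6O", 3.0),
]
conn = build_database(toy_records)
formulas, X, y = feature_matrix(conn)
print(formulas, X, y)
```

The long (one row per measurement) layout is chosen here because different samples may report different compound sets; a wide table with one column per compound would work equally well when the compound list is fixed.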
One modeling method is as follows: to reduce the data volume and improve the operation speed, three data dimension reduction methods (UVE, SPA and CARS) are used to extract characteristic variables, so that a small number of variables replaces the full set of experimental data. Machine learning is then performed on the training set data using a BPNN (back propagation neural network) to determine the analysis model.
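Of the three dimension reduction methods, SPA (successive projections algorithm) is the most compact to illustrate. The sketch below implements only its projection phase: starting from one variable, it repeatedly projects the remaining variables onto the subspace orthogonal to the last selected one and picks the variable with the largest remaining norm, which favours variables with low collinearity. The full method also chooses the starting variable and the number of variables by validation, which is omitted here. The function names and toy matrix are illustrative, and this is not presented as the implementation used in the invention.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spa_select(X, n_select, start=0):
    """Projection phase of SPA on a samples-by-variables matrix X (list of rows).

    Returns the indices of n_select variables, beginning with `start`.
    """
    n_rows, n_vars = len(X), len(X[0])
    if not 1 <= n_select <= n_vars:
        raise ValueError("n_select must be between 1 and the number of variables")
    # Work on mean-centred columns so that selection reflects variation, not offset.
    cols = []
    for j in range(n_vars):
        col = [row[j] for row in X]
        col_mean = sum(col) / n_rows
        cols.append([v - col_mean for v in col])
    selected = [start]
    while len(selected) < n_select:
        ref = cols[selected[-1]]
        ref_norm2 = _dot(ref, ref)
        remaining = [j for j in range(n_vars) if j not in selected]
        if ref_norm2 > 0:
            for j in remaining:
                # Remove from column j its component along the last selected column.
                coef = _dot(cols[j], ref) / ref_norm2
                cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
        # Keep the variable carrying the most information not yet explained.
        selected.append(max(remaining, key=lambda j: _dot(cols[j], cols[j])))
    return selected


# Toy matrix: column 1 is exactly twice column 0, column 2 is independent.
toy_X = [
    [1.0, 2.0, 1.0],
    [2.0, 4.0, -1.0],
    [3.0, 6.0, 1.0],
    [4.0, 8.0, -1.0],
]
print(spa_select(toy_X, n_select=2, start=0))
```

In the toy matrix the collinear column is skipped in favour of the independent one, which is the behaviour that lets a small set of variables stand in for the full data set.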
The other modeling method takes all exhaled breath component types and concentrations as input, establishes another analysis model with a machine learning algorithm to obtain an exhaled breath compound fingerprint, and then selects the ten most important VOCs of that analysis model using a feature classification method.
Results: (1) The exhaled breath markers screened by the three characteristic variable screening methods UVE, CARS and SPA comprise 9, 14 and 7 breath markers, respectively: UVE (C5H8, C4H4O, C10H16, C7H8N2O, C5H12O, C4H8O2, C4H8S, C3H4O3, C4H5N), CARS (CH4O, C2H3N, CH4S, C3H6O, C2H6S, C2H2F2, C4H4O, C5H8, C6H6O, C7H10, C7H6O, C8H10, C7H8N2O, C10H16), SPA (C7H10, C2H2F2, C10H16, C6H6, C6H12, C5H10O, C3H6O).
(2) The exhaled breath markers for early cancer diagnosis/early warning screened by the three characteristic variable screening methods UVE, CARS and SPA comprise 9, 8 and 7 VOCs, respectively: UVE (C4H5N, CH2O, C5H12O, C4H8O2, C4H8S, C3H4O3, C10H16, C7H8N2O, C5H8), CARS (CH4O, C2H3N, CH4S, C3H6O, C2H6S, C2H2F2, C6H6O, C7H10), SPA (C4H6O2, C7H10, C2H2F2, C6H12, C7H8N2O, C5H8, C2H6S).
(3) The training set data are then used to train a back propagation neural network (BPNN), a machine learning algorithm, to establish the first analysis model and obtain the exhaled breath compound fingerprint.
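As an illustration of the BPNN training step, the sketch below trains a network with one hidden layer by back propagation using only the Python standard library. The architecture (four sigmoid hidden units, one sigmoid output), the learning rate, the number of epochs, the per-sample update scheme and the toy inputs are all assumptions; the source does not specify the network configuration. Inputs are assumed to be standardized marker concentrations, and labels are 1 for a target body and 0 for a general body.

```python
import math
import random


def _sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so math.exp stays in range
    return 1.0 / (1.0 + math.exp(-z))


def train_bpnn(X, y, hidden=4, lr=0.5, epochs=500, seed=0):
    """Train a one-hidden-layer network by back propagation; return a predict function."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [_sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
             for j in range(hidden)]
        out = _sigmoid(sum(W2[j] * h[j] for j in range(hidden)) + b2)
        return h, out

    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward(x)
            # Output error for a sigmoid unit with cross-entropy loss.
            d_out = out - t
            # Propagate the error back through the hidden layer (uses W2 before its update).
            d_hid = [d_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                W2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hid[j]
                for i in range(n_in):
                    W1[j][i] -= lr * d_hid[j] * x[i]
            b2 -= lr * d_out

    return lambda x: forward(x)[1]


# Toy standardized inputs, made up purely to exercise the training loop.
toy_X = [[-1.0, -1.0], [-0.8, -1.2], [1.0, 1.0], [1.2, 0.8]]
toy_y = [0, 0, 1, 1]
predict = train_bpnn(toy_X, toy_y)
print(round(predict([1.0, 1.0]), 3), round(predict([-1.0, -1.0]), 3))
```

The returned function gives a score between 0 and 1; a score above a chosen threshold would be read as the target class. For real data a held-out test set is needed to estimate accuracy, sensitivity and specificity.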
The other modeling method takes all exhaled breath components and concentrations as input, establishes a second analysis model with a machine learning algorithm to obtain an exhaled breath compound fingerprint, and then selects the ten most important VOCs of the second analysis model using a feature classification method. With the CRDS lung cancer molecular marker detection device taken as the prior art, the concentrations of the exhaled breath markers are input into the first analysis model of the invention for judgment, or the concentrations of the ten most important VOCs or more in exhaled breath are input into the second analysis model of the invention for judgment, so that the data detected by the device can be analyzed quickly.
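The judgment step can be sketched as follows: concentrations reported by the detection device are arranged in the fixed order of a marker panel and passed to a trained model. The panel shown is the nine-compound UVE combination listed in result (1). Everything else is hypothetical: the function names, the form of the device reading (a mapping from formula to concentration), the model interface (a callable returning a score between 0 and 1), the 0.5 threshold, and the stub model used in place of a trained first analysis model.

```python
# UVE marker combination from result (1), in a fixed input order.
UVE_PANEL = ["C5H8", "C4H4O", "C10H16", "C7H8N2O", "C5H12O",
             "C4H8O2", "C4H8S", "C3H4O3", "C4H5N"]


def panel_vector(reading, panel):
    """Arrange a {formula: concentration} reading in panel order; fail if a marker is absent."""
    missing = [formula for formula in panel if formula not in reading]
    if missing:
        raise ValueError(f"missing marker concentrations: {missing}")
    return [float(reading[formula]) for formula in panel]


def judge(reading, panel, model, threshold=0.5):
    """Return 'target' or 'general' from the model score (threshold is an assumed value)."""
    score = model(panel_vector(reading, panel))
    return "target" if score >= threshold else "general"


# Placeholder model and reading, made up purely to exercise the functions.
def stub_model(vector):
    return 0.9 if sum(vector) > 5.0 else 0.1


toy_reading = {formula: 1.0 for formula in UVE_PANEL}
print(judge(toy_reading, UVE_PANEL, stub_model))
```

Raising an error on a missing marker, rather than silently filling a default, keeps an incomplete measurement from being judged as if it were complete. For the second analysis model the same helper could be used with a panel holding the ten most important VOCs.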
The invention is not to be limited by the specific embodiments which are intended as single illustrations of individual aspects of the invention, which disclosure also includes functionally equivalent methods and components. Indeed, various modifications of the invention in addition to those described herein will become apparent to those skilled in the art from the foregoing description and accompanying drawings. Such modifications are intended to fall within the scope of the appended claims.