CN111259909A - LC-MS data high-sensitivity characteristic detection method - Google Patents
- Publication number
- CN111259909A (application CN201811453137.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- network
- target
- detection
- image blocks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The invention discloses a high-sensitivity feature detection method for LC-MS data, comprising the following steps. First, liquid chromatography-mass spectrometry (LC-MS) data are acquired with an LC-MS instrument, a series of image blocks is extracted with a sliding window at a given step length, and the positions of target ions in the image blocks are manually marked with target frames to form a training sample set. Next, a deep convolutional neural network is designed that learns the distribution characteristics of target ions from the input training samples during training, which avoids the difficulty that conventional predefined-shape methods have in recognizing complex scenes. Finally, in the testing stage, different samples are used to verify that the method achieves high-sensitivity feature detection with probability output for LC-MS data.
Description
Technical Field
The application relates to the technical field of biochemical image processing, and in particular to a method for high-sensitivity feature detection in mass spectrum images based on deep neural network learning.
Background
Liquid chromatography-mass spectrometry (LC-MS) has been widely used to study complex biological samples in metabolomics, proteomics, and genomics. As LC-MS sensitivity, chromatographic resolution, and mass measurement accuracy continue to improve, more and more biomolecules can be detected. However, raw LC-MS data are heavily contaminated by noise and are therefore challenging to analyze, so efficient preprocessing techniques are needed to reduce the complexity of the data set. Peak detection (feature detection) is a key step in LC-MS data preprocessing.
Currently, various peak detection algorithms are integrated into public software such as XCMS, MZmine2, MaxQuant, and OpenMS. They fall into two main categories: peak detection based on extracted ion chromatograms (EIC), and two-dimensional (2D) detection based on predefined shape matching. EIC-based methods perform feature detection by processing the retention time and m/z dimensions separately. Treating the two dimensions separately, however, ignores the overall distribution characteristics of the eluting compound (including isotopes, charge state distribution, and LC elution characteristics). Compared with such 1D feature detection, 2D detection based on predefined shape matching can make full use of the combined ion characteristics of metabolites and is better suited to processing LC-MS data. Methods that match a predefined shape (e.g., Gaussian) detect ions effectively when their distribution is ideal, that is, when it conforms to a Gaussian distribution. In practice, instrument noise, the high complexity of the sample itself, and differences in experimental parameter settings between instruments mean that the distributions of many ions do not fully conform to a Gaussian distribution. In these cases, the sensitivity of feature detection based on predefined shape matching is greatly reduced. In addition, existing methods almost always use predefined thresholds to reduce noise. Removing noise with a threshold also discards target ions below the threshold, which ultimately raises the false negative rate. Moreover, low-abundance ions often have high biological significance, and losing them to a threshold can have irreversible effects on biomarker identification.
Therefore, a new LC-MS feature detection method with high sensitivity and specificity is urgently needed to detect low-intensity features and features with non-ideal shapes.
Disclosure of Invention
The invention aims to overcome the defects of the existing feature detection technology, provides a high-sensitivity feature detection method for LC-MS data, and particularly provides a target detection method based on deep learning to realize high-sensitivity and high-specificity detection of features.
The invention provides a high-sensitivity characteristic detection method for LC-MS data, which comprises the following steps:
Step one: preparing a training sample data set from LC-MS data, comprising the following steps:
1a) acquiring LC-MS data of a biological sample; converting the acquired LC-MS data into a plurality of image blocks of different sizes to be labeled, using different windows and step lengths, wherein the X axis of each image block is the mass-to-charge ratio M/Z and the Y axis is the retention time RT;
2a) labeling the image blocks obtained in step 1a), wherein the labeling process is as follows: for each target object with distribution characteristics in an image block, a target frame is used to define the position and category of the target object, and the labeling information of each image block is expressed as {x, y, w, h, C, P}, wherein x, y represent the center position of the target frame, w, h represent the width and length of the target frame, C represents the confidence that the target falls within the target frame, and P represents the category (target or background noise); the distribution characteristics mean that the compound appears more than twice on the Y axis and that each eluting compound generates more than two mass signals on the X axis;
3a) taking the image blocks as network input and the corresponding labeling information {x, y, w, h, C, P} as the network ground truth, to obtain a large training sample set;
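By way of non-limiting illustration, the sliding-window conversion of step 1a) can be sketched as follows. The sketch assumes the LC-MS data have already been binned into a 2D intensity matrix (rows as RT scans, columns as m/z bins); the function name `extract_image_blocks` and the window and step values are hypothetical and are not specified by this application.

```python
import numpy as np

def extract_image_blocks(intensity, window, step):
    """Cut a 2D LC-MS intensity matrix into image blocks.

    intensity: 2D array, rows = retention time (Y axis), columns = m/z (X axis).
    window:    (height, width) of each block in pixels (assumed values).
    step:      (step_y, step_x) sliding step in pixels (assumed values).
    Returns a list of (top, left, block) tuples.
    """
    win_h, win_w = window
    step_y, step_x = step
    n_rt, n_mz = intensity.shape
    blocks = []
    for top in range(0, n_rt - win_h + 1, step_y):
        for left in range(0, n_mz - win_w + 1, step_x):
            block = intensity[top:top + win_h, left:left + win_w]
            blocks.append((top, left, block))
    return blocks

# Toy matrix standing in for real LC-MS data.
rng = np.random.default_rng(0)
lcms = rng.random((8, 10))
blocks = extract_image_blocks(lcms, window=(4, 4), step=(2, 3))
```

Keeping the `(top, left)` offset with each block makes it possible to map labels or detections from block coordinates back to the full data matrix.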
Step two: designing a deep convolutional neural network for LC-MS data detection, comprising the following steps:
1b) designing a deep convolutional network structure comprising a feature extraction layer, a multi-scale structure and a detection layer, wherein the feature extraction layer consists of a plurality of convolutional layers (CONV) and excitation layers;
2b) performing network training, wherein the network training comprises a forward propagation process and a back propagation process, and the forward propagation process of the network conforms to formula I:
$a^{(l+1)} = f\left(W^{(l)} a^{(l)} + b^{(l)}\right)$ (I)
wherein $f(\cdot)$, $W$ and $b$ respectively represent the activation function, the weight matrix and the bias parameters of each layer; the network loss function is expressed as formula II, and the network back propagation process is the process of minimizing the loss function,
wherein the first term, the second term and the third term respectively represent a coordinate error, a confidence error and a classification error, respectively represented as:
wherein $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$ is obtained through the forward propagation of formula I;
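As a non-limiting illustration of formula I, the following sketch applies the recursion layer by layer to a small fully connected network. The weights and biases are arbitrary toy values, and the activation f(z)=max(0,z) is applied to every layer purely for simplicity; that choice is an assumption of this sketch, not a statement of the network actually used.

```python
import numpy as np

def relu(z):
    # f(z) = max(0, z), applied element-wise
    return np.maximum(0.0, z)

def forward(a0, weights, biases, activation=relu):
    """Apply a(l+1) = f(W(l) a(l) + b(l)) for each layer in turn."""
    a = a0
    for W, b in zip(weights, biases):
        a = activation(W @ a + b)
    return a

# Toy two-layer network (illustrative values only).
W1 = np.array([[1.0, -1.0], [0.5, 0.5]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[2.0, 1.0]])
b2 = np.array([0.5])
out = forward(np.array([3.0, 1.0]), [W1, W2], [b1, b2])
```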
step three: network testing
Inputting LC-MS data of a sample to be detected, sliding a window of a given size over the input LC-MS data, and carrying out target detection within the window, wherein the target detection result can be expressed as $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$.
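The testing stage can be sketched, without limitation, as a loop that applies a detector to each window and shifts the resulting boxes into the coordinates of the full data matrix. The `stub_detector` below is a hypothetical placeholder for the trained network (it simply boxes the strongest pixel of a non-empty window); the window and step sizes are likewise assumed.

```python
import numpy as np

def stub_detector(block):
    """Hypothetical stand-in for the trained network.

    Returns a list of (x, y, w, h, C, P) boxes in block coordinates.
    """
    if block.max() <= 0:
        return []  # window contains no signal at all
    row, col = np.unravel_index(np.argmax(block), block.shape)
    return [(float(col), float(row), 3.0, 3.0, 1.0, 1.0)]

def detect_over_windows(intensity, window, step, detect_fn):
    """Slide a window over the data and collect boxes in global pixel coordinates."""
    win_h, win_w = window
    step_y, step_x = step
    n_rt, n_mz = intensity.shape
    detections = []
    for top in range(0, n_rt - win_h + 1, step_y):
        for left in range(0, n_mz - win_w + 1, step_x):
            block = intensity[top:top + win_h, left:left + win_w]
            for x, y, w, h, conf, prob in detect_fn(block):
                # Shift the box center from block to global coordinates.
                detections.append((x + left, y + top, w, h, conf, prob))
    return detections

lcms = np.zeros((6, 6))
lcms[4, 5] = 1.0  # single toy signal at RT row 4, m/z column 5
detections = detect_over_windows(lcms, window=(3, 3), step=(3, 3),
                                 detect_fn=stub_detector)
```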
In a preferred embodiment, the mass signal comprises multiple charges or isotopes.
In a preferred embodiment, the deep convolutional network structure further comprises first j (j < l-1) excitation layers and a last layer.
The first j (j < l-1) excitation layers use formula VI as the excitation function:
$f(z) = \max(0, z)$ (VI)
The last layer uses formula VII as the excitation function.
In a preferred embodiment, the labeling information $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$ includes the position information and the category information of the sample to be detected.
In a preferred embodiment, the image blocks are resized to m × n pixels (length × width), and the label data are scaled by the same ratios, where m and n are each an integer between 224 and 448.
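The proportional scaling of the label data can be sketched as follows. This is an illustration only: the function name `scale_label` is hypothetical, and treating the first size component as height (Y axis) and the second as width (X axis) is an assumption of the sketch.

```python
def scale_label(label, src_size, dst_size):
    """Scale a {x, y, w, h, C, P} label when its image block is resized.

    label:    (x, y, w, h, C, P) in pixels of the source block.
    src_size: (height, width) of the source block.
    dst_size: (height, width) of the resized block.
    """
    x, y, w, h, conf, prob = label
    src_h, src_w = src_size
    dst_h, dst_w = dst_size
    sx = dst_w / src_w  # scale factor along the X axis (m/z)
    sy = dst_h / src_h  # scale factor along the Y axis (RT)
    # Position and size scale with the image; C and P are unchanged.
    return (x * sx, y * sy, w * sx, h * sy, conf, prob)

scaled = scale_label((10.0, 20.0, 4.0, 6.0, 1.0, 1),
                     src_size=(112, 112), dst_size=(224, 448))
```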
The beneficial effects that this application can produce include:
Owing to the strong feature extraction capability of the deep convolutional neural network designed in the invention, the two-dimensional distribution characteristics of sample ions can be captured effectively, which enables target ion detection in complex scenes. The multi-scale structure allows targets of different sizes to be detected, which further improves robustness to different elution conditions and instrument platforms. The target detection and probability output improve the interpretability of the detection result.
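To illustrate how a multi-scale structure yields layers of different resolution, the following sketch derives three scales from one intermediate feature map by down-sampling and up-sampling. The specific operators (2×2 max pooling and nearest-neighbour repetition) are assumptions chosen for brevity; this application does not state which operators are used.

```python
import numpy as np

def downsample2x(fm):
    """Halve each dimension with 2x2 max pooling (assumed operator)."""
    h, w = fm.shape
    trimmed = fm[:h - h % 2, :w - w % 2]  # drop an odd trailing row/column
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x(fm):
    """Double each dimension by nearest-neighbour repetition (assumed operator)."""
    return np.repeat(np.repeat(fm, 2, axis=0), 2, axis=1)

# Toy intermediate feature map.
fm = np.arange(16, dtype=float).reshape(4, 4)
pyramid = [upsample2x(fm), fm, downsample2x(fm)]
```

The finer scale retains detail for small targets, while the coarser scale summarizes larger regions for large targets.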
Drawings
FIG. 1 shows the basic structure of the process of the present invention, in which M/Z represents the mass-to-charge ratio and RT represents the retention time;
FIG. 2 shows the detection result of the method in a complex scene, verifying that the method can detect targets of different sizes. In the figure, M/Z represents the mass-to-charge ratio and RT represents the retention time.
Detailed Description
The present application will be described in detail with reference to examples, but the present application is not limited to these examples.
Step one: preparing a training sample data set from LC-MS data, comprising the following steps:
1a) acquiring LC-MS data of a biological sample; converting the acquired LC-MS data into a plurality of image blocks of different sizes to be labeled, using different windows and step lengths, wherein the X axis of each image block is the mass-to-charge ratio M/Z and the Y axis is the retention time RT;
2a) labeling the image blocks obtained in step 1a), wherein the labeling process is as follows: for each target object with distribution characteristics in an image block, a target frame is used to define the position and category of the target object, and the labeling information of each image block is expressed as {x, y, w, h, C, P}, wherein x, y represent the center position of the target frame, w, h represent the width and length of the target frame, C represents the confidence that the target falls within the target frame, and P represents the category (target or background noise); the distribution characteristics mean that the compound appears more than twice on the Y axis and that each eluting compound generates more than two mass signals on the X axis;
3a) taking the image blocks as network input and the corresponding labeling information {x, y, w, h, C, P} as the network ground truth, to obtain a large training sample set;
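The labeling information and the distribution-characteristics criterion of step 2a) can be sketched as below. The `BoxLabel` class and both helper functions are hypothetical. The check reads "more than twice" and "more than two" literally as strict inequalities and counts any non-zero pixel as a signal; both are assumptions of this sketch.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class BoxLabel:
    x: float  # center of the target frame on the X axis (m/z), in pixels
    y: float  # center of the target frame on the Y axis (RT), in pixels
    w: float  # width of the target frame
    h: float  # length (height) of the target frame
    C: float  # confidence that the target falls within the frame
    P: int    # category: 1 = target, 0 = background noise (assumed encoding)

def crop_box(block, label):
    """Return the pixels of an image block covered by a target frame."""
    left = max(int(round(label.x - label.w / 2)), 0)
    top = max(int(round(label.y - label.h / 2)), 0)
    return block[top:top + int(round(label.h)), left:left + int(round(label.w))]

def has_distribution_characteristics(region):
    """Signal in more than two RT rows and more than two m/z columns."""
    rt_hits = int(np.count_nonzero(region.any(axis=1)))
    mz_hits = int(np.count_nonzero(region.any(axis=0)))
    return rt_hits > 2 and mz_hits > 2

block = np.zeros((6, 6))
block[1:4, 2:5] = 1.0  # toy ion pattern: 3 RT rows x 3 m/z columns
label = BoxLabel(x=3.5, y=2.5, w=3.0, h=3.0, C=1.0, P=1)
is_target = has_distribution_characteristics(crop_box(block, label))
```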
Step two: designing a deep convolutional neural network for LC-MS data detection, comprising the following steps:
1b) designing a deep convolutional network structure comprising a feature extraction layer, a multi-scale structure (a series of network layers with different scales obtained by up-sampling or down-sampling intermediate layers of the network) and a detection layer, wherein the feature extraction layer consists of a plurality of convolutional layers (CONV) and excitation layers;
2b) performing network training, wherein the network training comprises a forward propagation process and a back propagation process, and the forward propagation process of the network conforms to formula I:
$a^{(l+1)} = f\left(W^{(l)} a^{(l)} + b^{(l)}\right)$ (I)
wherein $f(\cdot)$, $W$ and $b$ respectively represent the activation function, the weight matrix and the bias parameters of each layer; the network loss function is expressed as formula II, and the network back propagation process is the process of minimizing the loss function; the back propagation calculation can be realized by a conventional gradient descent method, which minimizes the error between the estimated value and the actual value and thereby trains the network,
wherein the first term, the second term and the third term respectively represent a coordinate error, a confidence error and a classification error, respectively represented as:
wherein $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$ is obtained through the forward propagation of formula I;
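The idea of minimizing a loss by conventional gradient descent can be illustrated with a deliberately small example. The sketch below fits a single linear layer to toy data under a mean squared error loss; the loss, learning rate, and epoch count are assumptions for illustration and do not reproduce formula II or the actual network.

```python
import numpy as np

def train_linear(inputs, targets, lr=0.1, epochs=200):
    """Fit pred = inputs @ W + b by gradient descent on mean squared error."""
    W = np.zeros(inputs.shape[1])
    b = 0.0
    losses = []
    for _ in range(epochs):
        pred = inputs @ W + b          # forward propagation
        err = pred - targets           # estimated value minus actual value
        losses.append(float(np.mean(err ** 2)))
        grad_W = 2.0 * inputs.T @ err / len(targets)  # dLoss/dW
        grad_b = 2.0 * float(np.mean(err))            # dLoss/db
        W -= lr * grad_W               # gradient descent update
        b -= lr * grad_b
    return W, b, losses

# Toy data generated from y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0
W, b, losses = train_linear(X, y)
```

In a multi-layer network the same update is applied to every layer, with the gradients obtained by propagating the error backwards through the layers.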
step three: network testing
Inputting LC-MS data of a sample to be detected, sliding a window of a given size over the input LC-MS data, and carrying out target detection within the window, wherein the target detection result can be expressed as $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$.
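For downstream use, a detected target frame in pixel coordinates may be converted into m/z and retention time ranges. The following sketch assumes a linear mapping between pixels and physical units; the function name `box_to_ranges` and all calibration values (`mz_start`, `mz_per_pixel`, `rt_start`, `rt_per_pixel`) are hypothetical and depend on how the data were binned into images.

```python
def box_to_ranges(box, mz_start, mz_per_pixel, rt_start, rt_per_pixel):
    """Convert an (x, y, w, h, C, P) box from pixels to m/z and RT ranges.

    Assumes the X axis maps linearly to m/z and the Y axis linearly to RT.
    """
    x, y, w, h, conf, prob = box
    mz_min = mz_start + (x - w / 2) * mz_per_pixel
    mz_max = mz_start + (x + w / 2) * mz_per_pixel
    rt_min = rt_start + (y - h / 2) * rt_per_pixel
    rt_max = rt_start + (y + h / 2) * rt_per_pixel
    return {
        "mz": (mz_min, mz_max),
        "rt": (rt_min, rt_max),
        "confidence": conf,
        "class_prob": prob,
    }

# Illustrative calibration: 0.5 m/z units and 2 s of RT per pixel.
feature = box_to_ranges((10.0, 20.0, 4.0, 8.0, 0.9, 0.95),
                        mz_start=400.0, mz_per_pixel=0.5,
                        rt_start=0.0, rt_per_pixel=2.0)
```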
Table 1 compares the results of the inventive method with those of other detection methods. The comparison shows that the method improves the detection result and achieves higher detection precision.
TABLE 1
Although the present application has been described with reference to a few embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the application as defined by the appended claims.
Claims (5)
1. A high-sensitivity characteristic detection method for LC-MS data is characterized by comprising the following steps:
Step one: preparing a training sample data set from LC-MS data, comprising the following steps:
1a) acquiring LC-MS data of a biological sample; converting the acquired LC-MS data into a plurality of image blocks of different sizes to be labeled, using different windows and step lengths, wherein the X axis of each image block is the mass-to-charge ratio M/Z and the Y axis is the retention time RT;
2a) labeling the image blocks obtained in step 1a), wherein the labeling process is as follows: for each target object with distribution characteristics in an image block, a target frame is used to define the position and category of the target object, and the labeling information of each image block is expressed as {x, y, w, h, C, P}, wherein x, y represent the center position of the target frame, w, h represent the width and length of the target frame, C represents the confidence that the target falls within the target frame, and P represents the category (target or background noise); the distribution characteristics mean that the compound appears more than twice on the Y axis and that each eluting compound generates more than two mass signals on the X axis;
3a) taking the image blocks as network input and the corresponding labeling information {x, y, w, h, C, P} as the network ground truth, to obtain a large training sample set;
Step two: designing a deep convolutional neural network for LC-MS data detection, comprising the following steps:
1b) designing a deep convolutional network structure comprising a feature extraction layer, a multi-scale structure and a detection layer, wherein the feature extraction layer consists of a plurality of convolutional layers (CONV) and excitation layers;
2b) performing network training, wherein the network training comprises a forward propagation process and a back propagation process, and the forward propagation process of the network conforms to formula I:
$a^{(l+1)} = f\left(W^{(l)} a^{(l)} + b^{(l)}\right)$ (I)
wherein $f(\cdot)$, $W$ and $b$ respectively represent the activation function, the weight matrix and the bias parameters of each layer; the network loss function is expressed as formula II, and the network back propagation process is the process of minimizing the loss function,
wherein the first term, the second term and the third term respectively represent a coordinate error, a confidence error and a classification error, respectively represented as:
wherein $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$ is obtained through the forward propagation of formula I;
step three: network testing
Inputting LC-MS data of a sample to be detected, sliding a window of a given size over the input LC-MS data, and carrying out target detection within the window, wherein the target detection result can be expressed as $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$.
2. The detection method of claim 1, wherein the mass signal comprises multiple charges or isotopes.
3. The detection method according to claim 1, wherein the deep convolutional network structure further comprises first j (j < l-1) excitation layers and a last layer;
the first j (j < l-1) excitation layers use formula VI as the excitation function:
$f(z) = \max(0, z)$ (VI)
the last layer uses formula VII as the excitation function.
4. The detection method according to claim 1, wherein the labeling information $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$ includes the position information and the category information of the sample to be detected.
5. The detection method according to claim 1, wherein the image blocks are resized to m × n pixels (length × width), and the label data are scaled by the same ratios, where m and n are each an integer between 224 and 448.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811453137.7A CN111259909A (en) | 2018-11-30 | 2018-11-30 | LC-MS data high-sensitivity characteristic detection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111259909A true CN111259909A (en) | 2020-06-09 |
Family
ID=70951903
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811453137.7A Pending CN111259909A (en) | 2018-11-30 | 2018-11-30 | LC-MS data high-sensitivity characteristic detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111259909A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113504528A (en) * | 2021-07-05 | 2021-10-15 | 武汉大学 | Atmospheric level detection method based on multi-scale hypothesis test |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050226536A1 (en) * | 2004-01-30 | 2005-10-13 | Applera Corporation | Systems and methods for aligning multiple point sets |
CN104713971A (en) * | 2015-04-01 | 2015-06-17 | 山东省肿瘤医院 | Method for analyzing serum metabolomics on basis of LC-MS (liquid chromatogram-mass spectrograph) serum metabolomics technology |
CN105572212A (en) * | 2014-10-14 | 2016-05-11 | 中国科学院大连化学物理研究所 | Visual mass spectrometry information-based sun-dried ginseng and red ginseng rapid identification method |
CN105574474A (en) * | 2014-10-14 | 2016-05-11 | 中国科学院大连化学物理研究所 | Mass spectrometry information-based biological characteristic image identification method |
CN108062744A (en) * | 2017-12-13 | 2018-05-22 | 中国科学院大连化学物理研究所 | A kind of mass spectrum image super-resolution rebuilding method based on deep learning |
CN108152434A (en) * | 2016-12-02 | 2018-06-12 | 中国科学院大连化学物理研究所 | A kind of lookup method of the Chinese medicine specific component based on visualization Information in Mass Spectra |
CN108509860A (en) * | 2018-03-09 | 2018-09-07 | 西安电子科技大学 | HOh Xil Tibetan antelope detection method based on convolutional neural networks |
-
2018
- 2018-11-30 CN CN201811453137.7A patent/CN111259909A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||