CN111259909A - LC-MS data high-sensitivity characteristic detection method - Google Patents


Info

Publication number
CN111259909A
Authority
CN
China
Prior art keywords
data
network
target
detection
image blocks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811453137.7A
Other languages
Chinese (zh)
Inventor
张晓哲
赵凡
黄帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Institute of Chemical Physics of CAS
Original Assignee
Dalian Institute of Chemical Physics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-11-30
Filing date
2018-11-30
Publication date
2020-06-09
Application filed by Dalian Institute of Chemical Physics of CAS
Priority to CN201811453137.7A
Publication of CN111259909A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention discloses a high-sensitivity feature detection method for LC-MS data, comprising the following steps. First, liquid chromatography-mass spectrometry (LC-MS) data are acquired with an LC-MS instrument, a series of image blocks is obtained by sliding a window over the data at a given step length, and the positions of target ions in the image blocks are manually marked with target frames; the marked blocks serve as the training sample set. Next, a deep convolutional neural network is designed that learns the distribution characteristics of target ions from the input training samples, which avoids the difficulty that conventional predefined-shape methods have in recognizing complex scenes. Finally, in the testing stage, different samples are used to verify that the method achieves high-sensitivity feature detection with probability output for LC-MS data.

Description

LC-MS data high-sensitivity characteristic detection method
Technical Field
The application relates to the technical field of biochemical image processing, and in particular to a method for high-sensitivity feature detection in mass spectrum images based on deep learning with neural networks.
Background
Liquid chromatography-mass spectrometry (LC-MS) has been widely used to study complex biological samples in metabolomics, proteomics, and genomics. As LC-MS sensitivity, chromatographic resolution, and mass measurement accuracy continue to improve, more and more biomolecules can be detected. However, raw LC-MS data are heavily contaminated by noise and therefore challenging to analyze, so efficient preprocessing techniques are needed to reduce the complexity of the data set. Peak detection (feature detection) is a key step in LC-MS data preprocessing.
Currently, various peak detection algorithms are integrated into public software such as XCMS, MZmine2, MaxQuant, and OpenMS. They fall into two main categories: peak detection based on extracted ion chromatograms (EIC) and two-dimensional (2D) detection based on predefined shape matching. EIC-based methods detect features by processing the retention time and m/z dimensions separately. Treating the two dimensions separately, however, ignores the overall distribution characteristics of the eluting compound (including isotopes, charge state distribution, and LC elution characteristics). Compared with 1D feature detection, 2D detection based on predefined shape matching makes fuller use of the combined ion characteristics of metabolites and is better suited to LC-MS data. Methods that match a predefined shape (e.g., Gaussian) detect ions effectively when their distribution conforms to the ideal (Gaussian) shape. In practice, however, instrument noise, the high complexity of the sample itself, and differing experimental parameter settings across instruments mean that the distribution of many ions is not strictly Gaussian. In such cases the sensitivity of feature detection based on predefined shape matching drops sharply. In addition, existing methods almost always use a predefined threshold to reduce noise. Thresholding discards target ions whose intensity falls below the threshold, which raises the false negative rate. Low-abundance ions are often of high biological significance, so their loss through threshold setting can irreversibly affect biomarker identification.
Therefore, a new LC-MS feature detection method with high sensitivity and specificity is urgently needed to detect features of low intensity and non-ideal shape.
Disclosure of Invention
The invention aims to overcome the shortcomings of existing feature detection techniques by providing a high-sensitivity feature detection method for LC-MS data, specifically a deep-learning-based target detection method that detects features with high sensitivity and high specificity.
The invention provides a high-sensitivity feature detection method for LC-MS data, comprising the following steps:
Step one: preparing a training sample data set from LC-MS data, comprising:
1a) acquiring LC-MS data of a biological sample; converting the acquired LC-MS data into a plurality of image blocks of different sizes to be labeled, using different windows and step lengths, wherein the X axis of each image block is m/z and the Y axis is retention time (RT);
2a) labeling the image blocks obtained in step 1a) as follows: for each target object with distribution characteristics in an image block, a target frame defines the position and category of the target object; the label information of each image block is expressed as {x, y, w, h, C, P}, where x, y denote the center position of the target frame, w, h denote the width and length of the target frame, C denotes the confidence that the target falls within the target frame, and P denotes the category (target or background noise); "distribution characteristics" means that the compound appears more than twice along the Y axis and that each eluting compound generates more than two mass signals along the X axis;
3a) taking the image blocks as the network input and the corresponding label information {x, y, w, h, C, P} as the network ground truth, thereby obtaining a large training sample set;
Step two: designing a deep convolutional neural network for LC-MS data detection, comprising:
1b) designing a deep convolutional network structure comprising a feature extraction layer, a multi-scale structure, and a detection layer, wherein the feature extraction layer consists of a plurality of convolutional layers (CONV) and an excitation layer;
2b) training the network, wherein training comprises a forward propagation process and a backward propagation process, and the forward propagation process conforms to formula I:
$a^{(l+1)} = f\left(W^{(l)} a^{(l)} + b^{(l)}\right)$  (I)
where $f(\cdot)$, $W$, and $b$ respectively denote the activation function, the weight matrix, and the bias parameters of each layer; the network loss function is expressed as formula II, and backward propagation is the process of minimizing this loss function,
[Formula II: equation image not reproduced in this text]
where the first, second, and third terms respectively represent the coordinate error, the confidence error, and the classification error, given by:
[Formula III: equation image not reproduced in this text]
[Formula IV: equation image not reproduced in this text]
[Formula V: equation image not reproduced in this text]
where $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$ are obtained through the forward propagation of formula I;
Step three: network testing
LC-MS data of a sample to be detected are input; a window of a given size is slid over the input LC-MS data and target detection is performed within each window; the detection result can be expressed as $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$.
In a preferred embodiment, the mass signals comprise multiple charge states or isotopes.
In a preferred embodiment, the deep convolutional network structure further comprises the first j (j < l-1) excitation layers and a last layer.
The first j (j < l-1) excitation layers use formula VI as the excitation function, and the last layer uses formula VII as the excitation function:
$f(z) = \max(0, z)$  (VI)
[Formula VII: equation image not reproduced in this text]
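The layer-by-layer recurrence of formula I, with formula VI as the excitation function of the earlier layers, can be sketched as follows. This is a minimal illustration in fully connected form, not the network of the invention: the function names, the toy weights, and the identity activation on the last layer are all assumptions, since formula VII is not reproduced in this text.

```python
import numpy as np

def relu(z):
    """Formula VI: f(z) = max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def forward(a0, weights, biases, last_activation=lambda z: z):
    """Apply a^(l+1) = f(W^(l) a^(l) + b^(l)) layer by layer.

    ReLU is used on every layer except the last. The last-layer
    activation defaults to the identity, which is an assumption
    made only for this sketch.
    """
    a = a0
    last = len(weights) - 1
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = W @ a + b
        a = last_activation(z) if l == last else relu(z)
    return a

# Toy two-layer example with hypothetical weights and biases.
weights = [np.array([[1.0, -1.0], [0.5, 0.5]]), np.array([[1.0, 2.0]])]
biases = [np.array([0.0, -1.0]), np.array([0.5])]
output = forward(np.array([1.0, 2.0]), weights, biases)
```

In a convolutional layer the product W a becomes a convolution, but the recurrence has the same form.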
In a preferred embodiment, the label information $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$ includes the position information and the category information of the sample to be detected.
In a preferred embodiment, each image block is m × n pixels in size (length × width), and the label data are scaled by the same ratio to match, where m and n are each integers between 224 and 448.
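Scaling the label data together with the image block could look like the sketch below. It assumes target frames are stored in pixel units as (x, y, w, h); the helper name `scale_box` and the 416 × 416 output size (one choice within the 224 to 448 range) are hypothetical.

```python
def scale_box(box, src_size, dst_size):
    """Rescale a target frame (x, y, w, h) in pixels from src to dst size.

    Sizes are (width, height) pairs; x and w follow the m/z (X) axis,
    y and h follow the retention-time (Y) axis.
    """
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x, y, w, h = box
    return (x * sx, y * sy, w * sx, h * sy)

# A 200 x 100 pixel block resized to 416 x 416: the frame scales with it.
scaled = scale_box((100.0, 50.0, 20.0, 10.0), src_size=(200, 100), dst_size=(416, 416))
```

The confidence C and the category P are unaffected by resizing, so only the four geometric values need to change.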
The beneficial effects of the application include:
Owing to the strong feature extraction capability of the deep convolutional neural network designed in the invention, the two-dimensional distribution characteristics of sample ions can be captured effectively, which enables target ion detection in complex scenes. The multi-scale structure allows targets of different sizes to be detected, which improves robustness to different elution conditions and instrument platforms. Target detection with probability output improves the interpretability of the detection results.
Drawings
FIG. 1 shows the basic structure of the method of the invention, where M/Z denotes the mass-to-charge ratio and RT denotes the retention time;
FIG. 2 shows detection results of the method in a complex scene, verifying that the method can detect targets of different sizes. In the figure, M/Z denotes the mass-to-charge ratio and RT denotes the retention time.
Detailed Description
The present application will be described in detail with reference to examples, but the present application is not limited to these examples.
Step one: preparing a training sample data set from LC-MS data, comprising:
1a) acquiring LC-MS data of a biological sample; converting the acquired LC-MS data into a plurality of image blocks of different sizes to be labeled, using different windows and step lengths, wherein the X axis of each image block is m/z and the Y axis is retention time (RT);
2a) labeling the image blocks obtained in step 1a) as follows: for each target object with distribution characteristics in an image block, a target frame defines the position and category of the target object; the label information of each image block is expressed as {x, y, w, h, C, P}, where x, y denote the center position of the target frame, w, h denote the width and length of the target frame, C denotes the confidence that the target falls within the target frame, and P denotes the category (target or background noise); "distribution characteristics" means that the compound appears more than twice along the Y axis and that each eluting compound generates more than two mass signals along the X axis;
3a) taking the image blocks as the network input and the corresponding label information {x, y, w, h, C, P} as the network ground truth, thereby obtaining a large training sample set;
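The block extraction of step 1a) can be illustrated with a short sketch. It assumes the LC-MS data have already been binned onto a regular intensity grid with retention-time scans as rows (Y axis) and m/z bins as columns (X axis); the binning itself, the function name `sliding_blocks`, and the window and step values are assumptions, not details given in the description.

```python
import numpy as np

def sliding_blocks(intensity, window, step):
    """Yield (rt_start, mz_start, block) for each window position.

    intensity: 2D array, rows = retention-time scans (Y), columns = m/z bins (X).
    window, step: (rows, cols) pairs giving the window size and step length.
    """
    n_rt, n_mz = intensity.shape
    win_rt, win_mz = window
    step_rt, step_mz = step
    for r in range(0, n_rt - win_rt + 1, step_rt):
        for c in range(0, n_mz - win_mz + 1, step_mz):
            yield r, c, intensity[r:r + win_rt, c:c + win_mz]

# Toy 6 x 8 intensity grid cut into 4 x 4 blocks with a step of 2.
grid = np.arange(6 * 8, dtype=float).reshape(6, 8)
blocks = list(sliding_blocks(grid, window=(4, 4), step=(2, 2)))
```

Calling the generator several times with different `window` and `step` values yields blocks of different sizes, as step 1a) describes; keeping the start offsets makes it possible to map a target frame back to its position in the full data.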
Step two: designing a deep convolutional neural network for LC-MS data detection, comprising:
1b) designing a deep convolutional network structure comprising a feature extraction layer, a multi-scale structure (a series of network layers at different scales obtained by up-sampling or down-sampling intermediate network layers), and a detection layer, wherein the feature extraction layer consists of a plurality of convolutional layers (CONV) and an excitation layer;
2b) training the network, wherein training comprises a forward propagation process and a backward propagation process, and the forward propagation process conforms to formula I:
$a^{(l+1)} = f\left(W^{(l)} a^{(l)} + b^{(l)}\right)$  (I)
where $f(\cdot)$, $W$, and $b$ respectively denote the activation function, the weight matrix, and the bias parameters of each layer; the network loss function is expressed as formula II. Backward propagation is the process of minimizing this loss function and can be carried out with a conventional gradient descent method, so that the error between the network's estimate and the true value is minimized and the network is thereby trained.
[Formula II: equation image not reproduced in this text]
where the first, second, and third terms respectively represent the coordinate error, the confidence error, and the classification error, given by:
[Formula III: equation image not reproduced in this text]
[Formula IV: equation image not reproduced in this text]
[Formula V: equation image not reproduced in this text]
where $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$ are obtained through the forward propagation of formula I;
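Because the loss formulas appear only as image references in this text, their exact form cannot be restated here. The sketch below shows one common way to build a three-term detection loss of the kind described (coordinate, confidence, and category terms), using simple sum-squared errors. The weighting factors `lambda_coord` and `lambda_noobj`, the array layout, and the function name are assumptions and should not be read as the patent's own formulas.

```python
import numpy as np

def detection_loss(pred, truth, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared detection loss with coordinate, confidence and class terms.

    pred, truth: arrays of shape (N, 6) holding (x, y, w, h, C, P) per box.
    truth[:, 4] is 1 where a target is present and 0 for background.
    The lambda weights are hypothetical values chosen for illustration.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    obj = truth[:, 4] == 1.0
    # Coordinate error: only boxes that contain a target contribute.
    coord = np.sum((pred[obj, :4] - truth[obj, :4]) ** 2)
    # Confidence error: background boxes are down-weighted.
    conf_err = (pred[:, 4] - truth[:, 4]) ** 2
    conf = np.sum(conf_err[obj]) + lambda_noobj * np.sum(conf_err[~obj])
    # Category error: only boxes that contain a target contribute.
    cls = np.sum((pred[obj, 5] - truth[obj, 5]) ** 2)
    return lambda_coord * coord + conf + cls

truth = np.array([[0.5, 0.5, 0.2, 0.2, 1.0, 1.0],
                  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
pred = np.array([[0.6, 0.5, 0.2, 0.2, 0.8, 1.0],
                 [0.0, 0.0, 0.0, 0.0, 0.2, 0.0]])
loss = detection_loss(pred, truth)
```

Gradient descent on such a loss with respect to the weights and biases of formula I is what the backward propagation step carries out.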
Step three: network testing
LC-MS data of a sample to be detected are input; a window of a given size is slid over the input LC-MS data and target detection is performed within each window; the detection result can be expressed as $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$.
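The testing step can be sketched as a loop that runs a detector on each window and shifts the resulting target frames back into the coordinates of the full data. The detector here is a stand-in that returns a fixed frame; the names `to_global`, `detect_all`, and `fake_detector`, and the optional `min_confidence` filter, are hypothetical and only illustrate the bookkeeping.

```python
def to_global(detections, rt_start, mz_start):
    """Shift window-local boxes (x, y, w, h, C, P) into full-map coordinates."""
    return [(x + mz_start, y + rt_start, w, h, C, P)
            for (x, y, w, h, C, P) in detections]

def detect_all(windows, detector, min_confidence=0.0):
    """Run a detector on each window and collect boxes in global coordinates.

    windows: iterable of (rt_start, mz_start, block).
    detector: hypothetical callable returning window-local boxes for a block.
    """
    results = []
    for rt_start, mz_start, block in windows:
        local = [d for d in detector(block) if d[4] >= min_confidence]
        results.extend(to_global(local, rt_start, mz_start))
    return results

# Stand-in detector for illustration only: one fixed box per window.
def fake_detector(block):
    return [(2.0, 1.0, 3.0, 2.0, 0.9, 1.0)]

found = detect_all([(0, 0, None), (10, 20, None)], fake_detector)
```

With the default `min_confidence` of 0.0 every detection is kept along with its probability, so low-intensity targets are not dropped by a fixed cutoff; overlapping windows may report the same target more than once, and merging such duplicates is left out of this sketch.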
Table 1 compares the results of the method of the invention with those of other detection methods. The comparison shows that the method improves the detection results and achieves higher detection accuracy.
TABLE 1
[Table 1: image not reproduced in this text]
Although the present application has been described with reference to a few embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the application as defined by the appended claims.

Claims (5)

1. A high-sensitivity feature detection method for LC-MS data, characterized by comprising the following steps:
Step one: preparing a training sample data set from LC-MS data, comprising:
1a) acquiring LC-MS data of a biological sample; converting the acquired LC-MS data into a plurality of image blocks of different sizes to be labeled, using different windows and step lengths, wherein the X axis of each image block is m/z and the Y axis is retention time (RT);
2a) labeling the image blocks obtained in step 1a) as follows: for each target object with distribution characteristics in an image block, a target frame defines the position and category of the target object; the label information of each image block is expressed as {x, y, w, h, C, P}, where x, y denote the center position of the target frame, w, h denote the width and length of the target frame, C denotes the confidence that the target falls within the target frame, and P denotes the category (target or background noise); "distribution characteristics" means that the compound appears more than twice along the Y axis and that each eluting compound generates more than two mass signals along the X axis;
3a) taking the image blocks as the network input and the corresponding label information {x, y, w, h, C, P} as the network ground truth, thereby obtaining a large training sample set;
Step two: designing a deep convolutional neural network for LC-MS data detection, comprising:
1b) designing a deep convolutional network structure comprising a feature extraction layer, a multi-scale structure, and a detection layer, wherein the feature extraction layer consists of a plurality of convolutional layers (CONV) and an excitation layer;
2b) training the network, wherein training comprises a forward propagation process and a backward propagation process, and the forward propagation process conforms to formula I:
$a^{(l+1)} = f\left(W^{(l)} a^{(l)} + b^{(l)}\right)$  (I)
where $f(\cdot)$, $W$, and $b$ respectively denote the activation function, the weight matrix, and the bias parameters of each layer; the network loss function is expressed as formula II, and backward propagation is the process of minimizing this loss function,
[Formula II: equation image not reproduced in this text]
where the first, second, and third terms respectively represent the coordinate error, the confidence error, and the classification error, given by:
[Formula III: equation image not reproduced in this text]
[Formula IV: equation image not reproduced in this text]
[Formula V: equation image not reproduced in this text]
where $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$ are obtained through the forward propagation of formula I;
Step three: network testing
LC-MS data of a sample to be detected are input; a window of a given size is slid over the input LC-MS data and target detection is performed within each window; the detection result can be expressed as $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$.
2. The detection method according to claim 1, wherein the mass signals comprise multiple charge states or isotopes.
3. The detection method according to claim 1, wherein the deep convolutional network structure further comprises the first j (j < l-1) excitation layers and a last layer;
the first j (j < l-1) excitation layers use formula VI as the excitation function;
the last layer uses formula VII as the excitation function,
$f(z) = \max(0, z)$  (VI)
[Formula VII: equation image not reproduced in this text]
4. The detection method according to claim 1, wherein the label information $\{x_i, y_i, w_i, h_i, C_i, P_i(c)\}$ includes the position information and the category information of the sample to be detected.
5. The detection method according to claim 1, wherein each image block is m × n pixels in size (length × width), and the label data are scaled by the same ratio to match, where m and n are each integers between 224 and 448.
CN201811453137.7A 2018-11-30 2018-11-30 LC-MS data high-sensitivity characteristic detection method Pending CN111259909A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811453137.7A CN111259909A (en) 2018-11-30 2018-11-30 LC-MS data high-sensitivity characteristic detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811453137.7A CN111259909A (en) 2018-11-30 2018-11-30 LC-MS data high-sensitivity characteristic detection method

Publications (1)

Publication Number Publication Date
CN111259909A true CN111259909A (en) 2020-06-09

Family

ID=70951903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811453137.7A Pending CN111259909A (en) 2018-11-30 2018-11-30 LC-MS data high-sensitivity characteristic detection method

Country Status (1)

Country Link
CN (1) CN111259909A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113504528A (en) * 2021-07-05 2021-10-15 武汉大学 Atmospheric level detection method based on multi-scale hypothesis test

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050226536A1 (en) * 2004-01-30 2005-10-13 Applera Corporation Systems and methods for aligning multiple point sets
CN104713971A (en) * 2015-04-01 2015-06-17 山东省肿瘤医院 Method for analyzing serum metabolomics on basis of LC-MS (liquid chromatogram-mass spectrograph) serum metabolomics technology
CN105572212A (en) * 2014-10-14 2016-05-11 中国科学院大连化学物理研究所 Visual mass spectrometry information-based sun-dried ginseng and red ginseng rapid identification method
CN105574474A (en) * 2014-10-14 2016-05-11 中国科学院大连化学物理研究所 Mass spectrometry information-based biological characteristic image identification method
CN108062744A (en) * 2017-12-13 2018-05-22 中国科学院大连化学物理研究所 A kind of mass spectrum image super-resolution rebuilding method based on deep learning
CN108152434A (en) * 2016-12-02 2018-06-12 中国科学院大连化学物理研究所 A kind of lookup method of the Chinese medicine specific component based on visualization Information in Mass Spectra
CN108509860A (en) * 2018-03-09 2018-09-07 西安电子科技大学 HOh Xil Tibetan antelope detection method based on convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination