CN111259909A

CN111259909A - LC-MS data high-sensitivity characteristic detection method

Info

Publication number: CN111259909A
Application number: CN201811453137.7A
Authority: CN
Inventors: 张晓哲; 赵凡; 黄帅
Original assignee: Dalian Institute of Chemical Physics of CAS
Current assignee: Dalian Institute of Chemical Physics of CAS
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2020-06-09

Abstract

The invention discloses a high-sensitivity feature detection method for LC-MS data, comprising the following steps. First, liquid chromatography-mass spectrometry (LC-MS) data are acquired with an LC-MS instrument, a series of image blocks is extracted with a sliding window at a given step length, and the positions of target ions in the image blocks are manually marked with target frames (bounding boxes) to form the training sample set. Second, a deep convolutional neural network is designed that learns the distribution characteristics of target ions from the input training samples during training, which overcomes the difficulty that conventional predefined-shape methods have in recognizing complex scenes. Finally, in the testing stage, different samples are used to verify that the method achieves high-sensitivity feature detection and probability output for LC-MS data.

Description

LC-MS data high-sensitivity characteristic detection method

Technical Field

The application relates to the technical field of biochemical image processing, and in particular to a method for high-sensitivity feature detection in mass spectrometry images based on deep learning with neural networks.

Background

Liquid chromatography-mass spectrometry (LC-MS) is widely used to study complex biological samples in metabolomics, proteomics and genomics. As LC-MS sensitivity, chromatographic resolution and mass measurement accuracy continue to improve, more and more biomolecules can be detected. However, raw LC-MS data are heavily contaminated by noise and are challenging to analyze, so efficient preprocessing techniques are needed to reduce the complexity of the data set. Peak detection (feature detection) is a key step in preprocessing LC-MS data.

Currently, various peak detection algorithms are integrated into public software such as XCMS, MZmine2, MaxQuant and OpenMS. They fall into two main categories: peak detection methods based on the EIC (extracted ion chromatogram) and 2D (two-dimensional) detection methods based on predefined shape matching. EIC-based methods achieve feature detection by processing the retention time and m/z dimensions separately. Treating the two dimensions separately, however, ignores the overall distribution characteristics of the eluting compound (including isotopes, charge state distribution and LC elution characteristics). Compared with such 1D feature detection methods, 2D detection methods based on predefined shape matching can make full use of the combined ion features of the metabolites and are better suited to processing LC-MS data. Methods that match a predefined shape (e.g., Gaussian) can effectively detect ions whose distribution conforms to the ideal state (a Gaussian distribution). However, because of instrument noise, the high complexity of the samples themselves and the different experimental parameter settings of different instruments, the distributions of many ions do not fully conform to a Gaussian distribution. In this case, the detection sensitivity of methods based on predefined shape matching is greatly reduced. In addition, existing methods almost always use predefined thresholds to reduce noise. Removing noise with a threshold loses target ions below the threshold, which ultimately results in a higher false negative rate. Moreover, low-abundance ions often have high biological significance, and rejecting them through threshold setting can have irreversible effects on biomarker identification.
Therefore, a new LC-MS feature detection method with high sensitivity and specificity is urgently needed to detect low-intensity features and features with non-ideal shapes.

Disclosure of Invention

The invention aims to overcome the shortcomings of existing feature detection techniques by providing a high-sensitivity feature detection method for LC-MS data, specifically a target detection method based on deep learning that achieves high-sensitivity and high-specificity detection of features.

The invention provides a high-sensitivity characteristic detection method for LC-MS data, which comprises the following steps:

step one: preparing the training sample data set of the LC-MS data, comprising the following steps:

1a) acquiring LC-MS data of a biological sample; converting the acquired LC-MS data into a plurality of image blocks of different sizes, to be marked, by using different windows and step lengths, wherein the X axis of each image block is the mass-to-charge ratio M/Z and the Y axis is the retention time RT;

2a) marking the plurality of image blocks obtained in step 1a), wherein the marking process is as follows: for each target object with distribution characteristics in an image block, a target frame is used to define the position and category of the target object, and the labeling information of each image block is expressed as { x, y, w, h, C, P }, wherein x, y represent the center position of the target frame, w, h represent the width and length of the target frame, C represents the confidence that the target falls into the target frame, and P represents the category, either target or background noise; having distribution characteristics means that the compound appears more than twice on the Y axis and that each eluting compound generates more than two mass signals on the X axis;

3a) taking the image blocks as the network input and the corresponding labeling information { x, y, w, h, C, P } as the network ground truth, to obtain a large training sample set;
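Steps 1a) to 3a) can be illustrated with a minimal Python sketch. It assumes the LC-MS data have already been rasterised into a 2-D intensity grid (rows as RT, columns as M/Z); the names `BoxLabel`, `window_origins` and `extract_blocks`, the grid values and the window and step sizes are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class BoxLabel:
    """One target frame, following the { x, y, w, h, C, P } convention of step 2a)."""
    x: float  # box centre along the M/Z axis (pixels)
    y: float  # box centre along the RT axis (pixels)
    w: float  # box width (pixels)
    h: float  # box length (pixels)
    C: float  # confidence that a target falls inside the box
    P: str    # category: "target" or "background"


def window_origins(size, window, step):
    """Start indices of windows of length `window` moved by `step` over `size` cells."""
    if window > size:
        return [0]
    return list(range(0, size - window + 1, step))


def extract_blocks(image, window, step):
    """Cut a 2-D grid (rows = RT, columns = M/Z) into image blocks.

    `window` and `step` are (rows, columns) pairs.
    Returns a list of (row_origin, col_origin, block) tuples.
    """
    n_rows, n_cols = len(image), len(image[0])
    blocks = []
    for r0 in window_origins(n_rows, window[0], step[0]):
        for c0 in window_origins(n_cols, window[1], step[1]):
            block = [row[c0:c0 + window[1]] for row in image[r0:r0 + window[0]]]
            blocks.append((r0, c0, block))
    return blocks


# Toy 4 x 6 grid standing in for rasterised LC-MS intensities (assumed data).
grid = [[r * 10 + c for c in range(6)] for r in range(4)]
blocks = extract_blocks(grid, window=(2, 3), step=(2, 3))

# A manually assigned label for the first block (illustrative values only).
label = BoxLabel(x=1.5, y=1.0, w=2.0, h=1.0, C=1.0, P="target")
```

Each `(block, label)` pair would then form one training sample; calling `extract_blocks` with several window and step settings yields blocks of different sizes, as step 1a) describes.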

step two: designing a deep convolutional neural network for LC-MS data detection, wherein the method comprises the following steps:

1b) designing a deep convolutional network structure to enable the deep convolutional network structure to comprise a feature extraction layer, a multi-scale structure and a detection layer, wherein the feature extraction layer consists of a plurality of convolutional layers CONV and an excitation layer;

2b) performing network training, wherein the network training comprises a forward propagation process and a backward propagation process, and the forward propagation process of the network conforms to formula I:

a^(l+1) = f(W^(l) a^(l) + b^(l))    (I)

wherein f(·), W and b respectively represent the activation function, the weight matrix and the bias parameters of each layer; the network loss function is expressed as formula II, and the network back-propagation process is the process of minimizing the loss function,

wherein the first term, the second term and the third term of the loss function respectively represent the coordinate error, the confidence error and the classification error, respectively represented as:

wherein { x_i, y_i, w_i, h_i, C_i, P_i(c) } are obtained through the forward propagation formula I;
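The forward propagation of formula I can be sketched in a few lines of Python. This is a simplified illustration that assumes fully connected layers and a rectified linear activation; the source network uses convolutional layers, and the weights, biases and input below are made-up values.

```python
def relu(z):
    """Element-wise max(0, z), used here as the activation f."""
    return [max(0.0, v) for v in z]


def layer_forward(W, a, b, f):
    """One application of formula I: a_next = f(W a + b)."""
    z = [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
         for row, b_i in zip(W, b)]
    return f(z)


def forward(layers, a):
    """Propagate an input through a list of (W, b, f) layers."""
    for W, b, f in layers:
        a = layer_forward(W, a, b, f)
    return a


# Two toy layers with hypothetical parameters.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),
    ([[2.0, 1.0]], [0.5], relu),
]
output = forward(layers, [3.0, 1.0])
```

With these values the first layer produces [2.0, 1.0] and the second produces [5.5].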

step three: network testing

Inputting LC-MS data of a sample to be detected, sliding a window of a given size over the input LC-MS data, and carrying out target detection within each window, wherein the target detection result can be expressed as { x_i, y_i, w_i, h_i, C_i, P_i(c) }.
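The testing step can be sketched as follows, under the assumption that each window is processed independently and that window-local boxes are shifted back into the coordinates of the full image. `toy_detector` is a hypothetical placeholder for the trained network, and all names and values are illustrative only.

```python
def to_global(box, row_origin, col_origin):
    """Shift a window-local { x, y, w, h, C, P } box into full-image pixel coordinates."""
    x, y, w, h, C, P = box
    return (x + col_origin, y + row_origin, w, h, C, P)


def detect_all(windows, detector):
    """Run `detector` on every window and collect boxes in global coordinates.

    `windows` is an iterable of (row_origin, col_origin, block) tuples.
    """
    results = []
    for row_origin, col_origin, block in windows:
        for box in detector(block):
            results.append(to_global(box, row_origin, col_origin))
    return results


def toy_detector(block):
    """Stand-in for the network: one unit box centred on the brightest pixel."""
    peak, px, py = max((v, c, r)
                       for r, row in enumerate(block)
                       for c, v in enumerate(row))
    if peak <= 0:
        return []  # nothing detected in an empty window
    return [(float(px), float(py), 1.0, 1.0, 1.0, "target")]


# Three 2 x 2 windows with their (row, column) origins in the full image.
windows = [
    (0, 0, [[0, 0], [0, 5]]),
    (0, 2, [[0, 0], [0, 0]]),
    (2, 0, [[7, 0], [0, 0]]),
]
detections = detect_all(windows, toy_detector)
```

Here the first and third windows each yield one box, and the empty second window yields none.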

In a preferred embodiment, the mass signal comprises multiple charges or isotopes.

In a preferred embodiment, the deep convolutional network structure further comprises the first j (j < l-1) excitation layers and a last layer.

The first j (j < l-1) excitation layers use formula VI as the excitation function,

the last layer uses formula VII as the excitation function,

f(z) = max(0, z)    (VI)
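As a worked note on formula VI, the rectified linear function can be written piecewise together with its derivative, which is the quantity used when gradients are propagated backward through these layers. The value assigned to the derivative at z = 0 is a common convention assumed here, not something stated in the source.

```latex
f(z) = \max(0, z) =
\begin{cases}
z, & z > 0 \\
0, & z \le 0
\end{cases}
\qquad
f'(z) =
\begin{cases}
1, & z > 0 \\
0, & z \le 0
\end{cases}
```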

In a preferred embodiment, the label information { x_i, y_i, w_i, h_i, C_i, P_i(c) } includes the position information and category information of the sample to be detected.

In a preferred embodiment, the image blocks have a size of m × n pixels (length × width), and the label data are scaled to the corresponding size by the same ratio, where m is an integer between 224 and 448 and n is an integer between 224 and 448.
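The label scaling described here amounts to multiplying box coordinates by the per-axis resize ratio. The sketch below assumes sizes are given as (width, height) pairs in pixels; the function name `scale_label` and the example numbers are hypothetical.

```python
def scale_label(label, src_size, dst_size):
    """Rescale { x, y, w, h, C, P } from a src (width, height) block to dst (width, height)."""
    x, y, w, h, C, P = label
    sx = dst_size[0] / src_size[0]  # horizontal (M/Z axis) ratio
    sy = dst_size[1] / src_size[1]  # vertical (RT axis) ratio
    # Confidence and category do not depend on the image size.
    return (x * sx, y * sy, w * sx, h * sy, C, P)


# A 112 x 56 block resized to 448 x 224: both axes grow by a factor of 4.
scaled = scale_label((50.0, 20.0, 10.0, 8.0, 1.0, "target"),
                     src_size=(112, 56), dst_size=(448, 224))
```

In this example the box centre moves from (50, 20) to (200, 80) and the box size from 10 × 8 to 40 × 32, while C and P are unchanged.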

The beneficial effects that this application can produce include:

Owing to the strong feature extraction capability of the deep convolutional neural network designed in the invention, the two-dimensional distribution features of sample ions can be captured effectively, which enables target ion detection in complex scenes. The multi-scale structure allows targets of different sizes to be detected, which further improves robustness to different elution conditions and instrument platforms. The target detection and probability output improve the interpretability of the detection results.

Drawings

FIG. 1 shows the basic structure of the process of the present invention, in which M/Z represents the mass-to-charge ratio and RT represents the retention time;

FIG. 2 shows the detection result of the method in a complex scene, verifying that the method can detect targets of different sizes. In the figure, M/Z represents the mass-to-charge ratio and RT represents the retention time.

Detailed Description

The present application will be described in detail with reference to examples, but the present application is not limited to these examples.

1b) designing a deep convolutional network structure to comprise a feature extraction layer, a multi-scale structure (a series of network layers with different scales obtained by up-sampling or down-sampling network intermediate layers) and a detection layer, wherein the feature extraction layer consists of a plurality of convolutional layers CONV and an excitation layer;

a^(l+1) = f(W^(l) a^(l) + b^(l))    (I)

wherein f(·), W and b respectively represent the activation function, the weight matrix and the bias parameters of each layer; the network loss function is expressed as formula II, and the network back-propagation process is the process of minimizing the loss function. The back-propagation calculation can be implemented by a conventional gradient descent method, which minimizes the error between the estimated value and the actual value and thereby trains the network.
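The idea of minimizing a loss by gradient descent can be shown on a deliberately small example. The sketch below fits a single linear unit with a mean squared error loss; it is not the loss of formula II, and the learning rate, epoch count and data are assumed values chosen only for illustration.

```python
def train_linear_unit(samples, lr=0.1, epochs=500):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, target in samples:
            err = (w * x + b) - target   # forward pass, then error
            grad_w += 2 * err * x / n    # d(loss)/dw
            grad_b += 2 * err / n        # d(loss)/db
        w -= lr * grad_w                 # step against the gradient
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train_linear_unit(samples)
```

After training, `w` and `b` approach 2 and 1, the parameters that generated the toy data. In a deep network the same update rule is applied to every weight and bias, with the gradients obtained by back-propagation.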

step three: network testing

Table 1 compares the results of the inventive method with those of other detection methods. The comparison shows that the method improves the detection result and achieves higher detection precision.

TABLE 1

Although the present application has been described with reference to a few embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the application as defined by the appended claims.

Claims

1. A high-sensitivity characteristic detection method for LC-MS data is characterized by comprising the following steps:

2a) marking the plurality of image blocks obtained in step 1a), wherein the marking process is as follows: for each target object with distribution characteristics in an image block, a target frame is used to define the position and category of the target object, and the labeling information of each image block is expressed as { x, y, w, h, C, P }, wherein x, y represent the center position of the target frame, w, h represent the width and length of the target frame, C represents the confidence that the target falls into the target frame, and P represents the category, either target or background noise; having distribution characteristics means that the compound appears more than twice on the Y axis and that each eluting compound generates more than two mass signals on the X axis;

a^(l+1) = f(W^(l) a^(l) + b^(l))    (I)

wherein f(·), W and b respectively represent the activation function, the weight matrix and the bias parameters of each layer; the network loss function is expressed as formula II, and the network back-propagation process is the process of minimizing the loss function,

step three: network testing

2. The detection method of claim 1, wherein the mass signal comprises multiple charges or isotopes.

3. The detection method according to claim 1, wherein the deep convolutional network structure further comprises the first j (j < l-1) excitation layers and a last layer;

the first j (j < l-1) excitation layers use formula VI as the excitation function;

the last layer uses formula VII as the excitation function,

f(z) = max(0, z)    (VI)

4. The detection method according to claim 1, wherein the label information { x_i, y_i, w_i, h_i, C_i, P_i(c) } includes the position information and category information of the sample to be detected.

5. The detection method according to claim 1, wherein the plurality of image blocks have a size of m × n pixels (length × width), and the label data are scaled to the corresponding size by the same ratio, where m is an integer between 224 and 448 and n is an integer between 224 and 448.