CN111192334A - Trainable compressed sensing module and image segmentation method - Google Patents
- Publication number
- CN111192334A (application number CN202010002908.1A)
- Authority
- CN
- China
- Prior art keywords
- trainable
- sensing module
- network
- network model
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T9/002 — Image coding using neural networks
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06T7/11 — Region-based segmentation
- G06T9/004 — Predictors, e.g. intraframe, interframe coding
- G06T2207/10104 — Positron emission tomography [PET]
- G06T2207/20081 — Training; Learning
- G06T2207/30061 — Lung
- G06T2207/30096 — Tumor; Lesion
- Y02T10/40 — Engine management systems
Abstract
The invention discloses a trainable compressed sensing module and an image segmentation method, comprising the following steps: constructing a network model containing the trainable compressed sensing module; and feeding a training data set of three-dimensional PET images with non-small cell lung tumors into the network model to train the network, obtaining the trained network. The method compresses information during training, removes redundant feature maps and enhances effective feature maps, and achieves more accurate segmentation results at higher segmentation speed with fewer parameters.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a trainable compressed sensing module and an image segmentation method.
Background
Over the past 50 years, lung cancer has become the most common malignant tumor worldwide, ranking first among malignant tumors in both incidence and mortality. According to statistics, the incidence of lung tumors ranks first among all tumors in men. PET (positron emission tomography) can detect the metabolic activity and characteristics of biological tissues at the molecular and cellular level; tissues with vigorous metabolism (e.g., tumors, heart, liver) appear brighter on PET images. PET images are therefore widely used in clinical lung tumor diagnosis. Accurate automatic segmentation can provide important assistance to the physician's diagnosis. Current lung tumor segmentation methods can be divided into non-learning-based and learning-based methods. Non-learning-based methods typically rely on statistical distributions of intensities.
However, non-learning-based segmentation methods have limited ability to cope with variability in tumor shape. In recent years, convolutional neural networks have rapidly proven to be the state-of-the-art tool for processing various medical images. UNet and UNet-based improved networks perform well on pixel-level medical image segmentation tasks. However, when the segmentation target is very similar in brightness to the surrounding tissue structures, the segmentation performance of a convolutional neural network degrades. Since lung tumors in PET images are very bright and other organs (e.g., heart, spine and liver) have brightness similar to the tumor, convolutional neural network segmentation produces a high false-positive rate.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a trainable compressed sensing module and an image segmentation method, which can realize information compression in the training process, remove redundant feature maps and enhance effective feature maps.
In order to solve the above technical problem, the present invention provides a trainable compressed sensing module, including the following steps:
obtaining the three-dimensional matrix input F ∈ R^(H×W×C) of the compressed sensing module, where the input comes from the output of the previous convolutional layer, H is the height of the feature map, W is the width of the feature map, and C is the number of feature channels of the current layer;

reshaping the three-dimensional input F of the compressed sensing module into two dimensions, obtaining the deformed two-dimensional matrix X ∈ R^(N×C), where N = H × W;

decomposing the two-dimensional matrix X with a singular value decomposition algorithm into X = UΣVᵀ, where U ∈ R^(N×N) is a left singular orthogonal vector matrix, V ∈ R^(C×C) is a right singular orthogonal vector matrix, and Σ ∈ R^(N×C) is the singular value matrix;

setting a diagonal weight matrix Λ ∈ R^(C×C) whose diagonal elements are trainable and updated while all off-diagonal elements are zero, and obtaining the new matrix Y = UΣΛ. The set weight matrix Λ selects (re-weights) the singular values of Σ, realizing information compression of the two-dimensional matrix X = UΣVᵀ, removing redundant features and enhancing the effective information.
The invention discloses an image segmentation method based on the trainable compressed sensing module, which comprises the following steps:
constructing a network model with the trainable compression sensing module;
acquiring a three-dimensional PET image training data set with non-small cell lung tumor, bringing the training data set into a network model to train a network, and acquiring the trained network.
Preferably, the constructing of the trainable compressed sensing module network model specifically includes:
the method comprises the steps of utilizing a full convolutional network as a basic framework of a network model, wherein the full convolutional network comprises three convolutional layers, three trainable compression sensing modules, three times of downsampling and three times of upsampling, and each convolutional layer of the full convolutional network comprises a normalization function, a convolution function and an activation function.
Preferably, the method further comprises a deep supervision mechanism, which specifically comprises the following steps:

adding an auxiliary branch after each trainable compressed sensing module, the auxiliary branch upsampling some low- and mid-level feature maps by deconvolution;

applying a softmax activation function to the upsampled feature maps, now restored to the full input size, to obtain an additional dense prediction map;

for each auxiliary-branch prediction map and the corresponding manual annotation map, computing the segmentation error between them with a cross-entropy cost function;

combining the cross-entropy losses of all auxiliary branches with the cross-entropy loss of the last layer to encourage gradient back-propagation and update the parameters more efficiently in each iteration.
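As a rough illustration of the per-branch computation described above (a sketch, not the patent's implementation: `branch_cross_entropy` is an illustrative name, and a two-class dense prediction map is assumed), the softmax followed by pixel-wise cross entropy against the annotation map can be written as:

```python
import numpy as np

def softmax(logits):
    # Softmax over the class axis (last axis), numerically stabilized.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def branch_cross_entropy(logits, labels):
    """Mean pixel-wise cross entropy between a branch's dense prediction
    map and the manual annotation map.
    logits: (H, W, K) raw scores; labels: (H, W) integer class ids."""
    probs = softmax(logits)
    h, w = labels.shape
    # Pick the predicted probability of the true class at every pixel.
    picked = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return float(-np.log(picked + 1e-12).mean())
```

A perfect prediction drives the branch loss toward zero, while an uninformative (uniform) prediction yields log K for K classes.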
Preferably, the method further comprises the following steps:

let Φ^(l) denote the weight parameters of the l-th trainable compressed sensing module of the network model, where l = 1, 2, …, L, and let Φ = {Φ^(1), …, Φ^(L)};

with p(y_i | x_i; Φ) denoting the predicted probability of pixel x_i after the output of the last layer passes through the softmax function, the cross-entropy loss can be expressed as:

L(Φ) = −Σ_{(x_i, y_i)∈S} log p(y_i | x_i; Φ) + λ‖Φ‖²

where S represents the training data set, y_i is the segmentation label of the pixel corresponding to x_i, the second term of the formula is a weight regularization, and λ is a tunable hyper-parameter.
Preferably, in the deep supervision mechanism,

Φ_d is used to represent the weights of the first d trainable compressed sensing modules of the main network, and p(y_i | x_i; Φ_d) is the predicted probability of pixel x_i given by the output of the d-th branch after the softmax function; the branch loss is

L_d(Φ_d) = −Σ_{(x_i, y_i)∈S} log p(y_i | x_i; Φ_d)

and optimizing this objective yields the optimal weights Φ, thereby supervising and training the compressed sensing modules to extract the key effective feature maps.
Preferably, the output of the last layer of the network model is supervised using Focal Loss.
Preferably, "supervising the output of the last layer of the network model with Focal Loss" specifically includes:

the network loss function L_FL = −α (1 − p_t)^γ log(p_t), where p_t is the predicted probability of the true class of pixel x_i (p_t = p(x_i) for positive pixels and 1 − p(x_i) for negative pixels), α is a balance factor used to balance positive and negative samples, and γ > 0 is a weighting factor that reduces the loss of easily classified samples;

the total loss function L_total = L_ce + Σ_{d=1}^{D} η_d L_d + L_FL, where L_ce is the cross-entropy loss of the last-layer output of the network model, L_d is the cross-entropy loss of the d-th intermediate auxiliary branch, η_d is its balancing weight, and D is the number of deeply supervised branches.
Preferably, the "acquiring a three-dimensional PET image training data set with non-small cell lung tumors" specifically includes: cutting each three-dimensional PET image into two-dimensional slice images along the Z-axis direction as the input of the convolutional neural network, and performing data augmentation using flipping, rotation, and translation in the length and width directions.
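The augmentation step can be sketched as follows (a hedged example: the function name `augment_slice`, the 90-degree rotations and the circular shifts are illustrative simplifications — the pipeline described later uses a Keras generator with continuous rotations and shifts; the key point shown is that the image and its label mask must receive identical transforms):

```python
import numpy as np

def augment_slice(img, mask, rng):
    """Apply the same random flip / 90-degree rotation / small integer
    translation to a 2D slice and its annotation mask."""
    if rng.random() < 0.5:                       # horizontal flip
        img, mask = np.flip(img, 1), np.flip(mask, 1)
    k = int(rng.integers(0, 4))                  # rotation by k * 90 degrees
    img, mask = np.rot90(img, k), np.rot90(mask, k)
    dy, dx = rng.integers(-2, 3, size=2)         # translation (circular here)
    img = np.roll(img, (dy, dx), (0, 1))
    mask = np.roll(mask, (dy, dx), (0, 1))
    return img.copy(), mask.copy()
```

Because every operation only permutes pixels, intensity statistics and the binary nature of the mask are preserved.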
The invention has the beneficial effects that:
1. The invention provides a trainable compressed sensing module (CSM) that compresses information during training, removes redundant feature maps and enhances effective feature maps;

2. The proposed full convolutional neural network extracts key features through the trainable CSM and further strengthens them by convolution, thereby obtaining a better segmentation result;

3. The invention provides a deep supervision mechanism for supervising the weight parameters in the trainable CSM, which guides the CSM to extract features during training;

4. Compared with existing methods, the proposed method yields more accurate segmentation results, higher segmentation speed and fewer parameters.
Drawings
FIG. 1 is a block diagram of a trainable compressed sensing module CSM of the present invention;
FIG. 2 is an effect comparison, where (a) is the feature visualization without the CSM module and (b) is the feature visualization after adding the CSM module;
FIG. 3 is a block diagram of the overall network architecture of the present invention;
FIG. 4 shows PET images of a human body from various angles, where (a) is a sectional view of the coronal plane of a three-dimensional PET image, (b) is a sectional view of the transverse (horizontal) plane, and (c) is a sectional view of the sagittal plane;
FIG. 5 is a comparison graph of different segmentation methods.
Detailed Description
The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.
Example one
Referring to fig. 1, the invention proposes a trainable compressed sensing module CSM based on the idea of principal component analysis PCA, comprising the following steps:
Step one, acquiring the three-dimensional matrix input F ∈ R^(H×W×C) of the compressed sensing module, where the input comes from the output of the previous convolutional layer, H is the height of the feature map, W its width, and C the number of feature channels;

Step two, reshaping the three-dimensional input F into a two-dimensional matrix, obtaining the deformed matrix X ∈ R^(N×C), where N = H × W is the total number of pixels on each feature map;

Step three, decomposing the two-dimensional matrix X with a singular value decomposition algorithm into X = UΣVᵀ, where U ∈ R^(N×N) is a left singular orthogonal vector matrix, V ∈ R^(C×C) is a right singular orthogonal vector matrix, and Σ ∈ R^(N×C) is a singular value matrix whose diagonal elements are all non-negative real numbers and whose off-diagonal elements are all zero. The diagonal elements satisfy λ_1 ≥ λ_2 ≥ … ≥ λ_C, where λ_i denotes the i-th singular value; the C singular values are arranged on the diagonal in descending order. The covariance matrix can be written as:
XXᵀ = UΣVᵀVΣᵀUᵀ = UΣΣᵀUᵀ   (1)
where T denotes the matrix transpose and ΣΣᵀ is a diagonal matrix. Thus the singular value decomposition of X is equivalent to the eigendecomposition of XXᵀ: the singular values of X are the square roots of the eigenvalues of XXᵀ, and the left singular vectors of X are the eigenvectors of XXᵀ.
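This equivalence is easy to verify numerically (a small NumPy check, not part of the patent): the singular values of X match the square roots of the eigenvalues of XXᵀ.

```python
import numpy as np

# Numerical check of identity (1): XX^T = U (ΣΣ^T) U^T, so the singular
# values of X are the square roots of the eigenvalues of XX^T.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))          # N = 6 "pixels", C = 4 channels
U, s, Vt = np.linalg.svd(X, full_matrices=False)
eigvals = np.linalg.eigvalsh(X @ X.T)    # ascending order, last two ~ 0
top = np.sqrt(np.clip(eigvals, 0, None))[::-1][:4]
assert np.allclose(np.sort(s)[::-1], top)
```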
Step four, using the matrix Y_r = UΣ_r to reduce the dimensionality of X = UΣVᵀ, where Y_r is the two-dimensional matrix obtained after the singular value decomposition, r denotes the first r singular values, and Σ_r denotes the first r columns of the singular value matrix;
Since the singular values obtained by the decomposition are arranged in descending order, a diagonal weight matrix Λ ∈ R^(C×C) is set whose diagonal elements are trainable and updated while all off-diagonal elements are zero. The new matrix Y can thus be defined as Y = UΣΛ.

Using the defined weight matrix Λ, the CSM module compresses information during training, removing redundant features and enhancing effective information. After network training, although the number of convolutional feature maps in each layer is unchanged, the redundant information in each layer is removed: the feature maps of each layer become sparse and mutually uncorrelated.
Each CSM module can be understood as an information compression algorithm that transforms the originally correlated feature maps into a set of linearly uncorrelated feature maps, called principal components; this is consistent with the goal of obtaining feature maps that are uncorrelated with one another. The feature extraction effect of the proposed CSM is shown in FIG. 2.
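A minimal NumPy sketch of the CSM forward pass described above (assuming a thin SVD; `csm_forward` and the diagonal weight vector `w` are illustrative names, and in the actual network the diagonal entries would be updated by backpropagation rather than fixed):

```python
import numpy as np

def csm_forward(F, w):
    """Trainable compressed sensing module (CSM) forward pass sketch.
    F: (H, W, C) feature maps from the previous convolution layer.
    w: (C,) trainable diagonal of the weight matrix.
    Returns the re-weighted feature maps, same shape as F."""
    H, W_, C = F.shape
    X = F.reshape(H * W_, C)                     # N x C, N = H * W
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Y = U Σ Λ: re-weight (select or suppress) each singular component.
    Y = U @ np.diag(s * w)
    return Y.reshape(H, W_, C)
```

With all weights equal to one, the output keeps the full energy of the input (‖UΣ‖_F = ‖X‖_F); zeroing one diagonal entry removes exactly one principal component and reduces the rank of the output by one.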
Example two
Referring to fig. 3, the invention discloses an image segmentation method, based on the trainable compressed sensing module, including the following steps:
Step one, constructing a network model with the trainable compressed sensing module, which specifically comprises: using a full convolutional network (FCN) as the basic framework of the network model, where the full convolutional network comprises three convolutional layers, three trainable compressed sensing modules (CSM), three downsamplings and three upsamplings, and each convolutional layer comprises batch normalization, a convolution (kernel size 3 × 3, stride 1) and an activation function (ReLU). Downsampling in the full convolutional network is performed by convolution with stride 2 and kernel size 3 × 3. A trainable compressed sensing module (CSM) is added after each convolutional layer to remove redundant features and extract effective features.
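The spatial bookkeeping of this backbone (three stride-2 downsamplings mirrored by three upsamplings) can be sanity-checked in a few lines — a sketch under the assumption that each stride-2 convolution exactly halves the spatial size and each deconvolution doubles it:

```python
def fcn_shapes(h, w, n_scales=3):
    """Track feature-map sizes through the three stride-2 downsamplings
    and the three matching upsamplings of the FCN backbone."""
    down = [(h, w)]
    for _ in range(n_scales):
        h, w = h // 2, w // 2                # stride-2 convolution
        down.append((h, w))
    up = [down[-1]]
    for _ in range(n_scales):
        h, w = h * 2, w * 2                  # deconvolution / upsampling
        up.append((h, w))
    return down, up
```

For a 512 × 512 slice this gives the pyramid 512 → 256 → 128 → 64 and back, so the final prediction map matches the input resolution.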
The invention also comprises a deep supervision mechanism, which specifically comprises: auxiliary branches are added after the trainable compressed sensing modules, and each auxiliary branch upsamples some low- and mid-level feature maps by deconvolution (kernel size 3 × 3, with strides of 2, 4 and 8 respectively); a softmax activation function is applied to the upsampled feature maps, now restored to the full input size, to obtain additional dense prediction maps; for each auxiliary-branch prediction map and the corresponding manual annotation map, the segmentation error between them is computed with a cross-entropy cost function; the cross-entropy losses of all auxiliary branches are combined with the cross-entropy loss of the last layer to encourage gradient back-propagation and update the parameters more efficiently in each iteration.
Let Φ^(l) denote the weight parameters of the l-th trainable compressed sensing module of the network model, where l = 1, 2, …, L, and collect the weights of all trainable compressed sensing modules as Φ = {Φ^(1), …, Φ^(L)}. With p(y_i | x_i; Φ) denoting the predicted probability of pixel x_i after the output of the last layer passes through the softmax function, the cross-entropy loss can be expressed as:

L(Φ) = −Σ_{(x_i, y_i)∈S} log p(y_i | x_i; Φ) + λ‖Φ‖²

where S represents the training data set, y_i is the segmentation label of the pixel corresponding to x_i, the second term of the formula is a weight regularization, and λ is a tunable hyper-parameter.
On the other hand, in the deep supervision mechanism, Φ_d represents the weights of the first d trainable compressed sensing modules of the main network, and p(y_i | x_i; Φ_d) is the predicted probability of pixel x_i given by the output of the d-th branch after the softmax function; the branch loss is

L_d(Φ_d) = −Σ_{(x_i, y_i)∈S} log p(y_i | x_i; Φ_d)

By optimizing this objective, the optimal weights Φ can be obtained, thereby supervising the proposed trainable compressed sensing module CSM to extract the key effective feature maps.
Since the segmentation target occupies only a small fraction of each slice, Focal Loss is used to supervise the output of the last layer of the network, as shown in equation (5); better performance is obtained by tuning the hyper-parameters α and γ:

L_FL = −α (1 − p_t)^γ log(p_t)   (5)

where p_t is the predicted probability of the true class of pixel x_i (p_t = p(x_i) for positive pixels and 1 − p(x_i) for negative pixels), α is a balance factor used to balance positive and negative samples, and γ > 0 is a weighting factor that reduces the loss of easily classified samples.
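A hedged NumPy sketch of the binary focal loss in equation (5) (the defaults α = 0.25 and γ = 2 follow the original focal loss paper, not values stated in this patent):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss sketch. p: predicted foreground probability per
    pixel, y: 0/1 label. p_t is p for positives and 1 - p for negatives."""
    p_t = np.where(y == 1, p, 1.0 - p)
    a_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float((-a_t * (1.0 - p_t) ** gamma * np.log(p_t + 1e-12)).mean())
```

The (1 − p_t)^γ factor is what down-weights easily classified pixels: setting γ = 0 recovers the plain (α-weighted) cross entropy, which is strictly larger on well-classified samples.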
The overall loss function consists of three parts: the cross-entropy loss of the last-layer output, the cross-entropy losses of the intermediate auxiliary branches, and the Focal Loss. The total loss function is shown in equation (6):

L_total = L_ce + Σ_{d=1}^{D} η_d L_d + L_FL   (6)

where L_ce is the cross-entropy loss of the last-layer output of the network model, L_d is the cross-entropy loss of the d-th intermediate auxiliary branch, η_d is its balancing weight, and D is the number of deeply supervised branches.
And step two, acquiring a three-dimensional PET image training data set with the non-small cell lung tumor, and bringing the training data set into a network model to train a network to obtain the trained network. As shown in fig. 4, PET images are taken at various angles of the human body.
The data used in the present invention were collected from patients with non-small cell lung tumors (NSCLC). The data set consists of 54 three-dimensional (3D) PET images with physician labeling. Each 3D PET image has a size of 512 × 512 × 60 voxels, with a voxel size of 0.234 × 0.234 × 1 mm³. In the present invention, 49 images were used for training and 5 for testing, and each 3D PET image was cut into two-dimensional (2D) slice images along the Z-axis direction as input to the convolutional neural network. Taking data balance into account, data augmentation was performed using flipping, rotation, and translation in the length and width directions. Finally, 13-fold cross-validation was used for verification.
The invention uses the Keras ImageDataGenerator for data augmentation, with the flip, rotation and width/height shift parameters set to (True, 0.2, 0.2). The batch size was set to 4, and the network was trained with stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 0.0001. A "poly" learning-rate policy was used, in which the learning rate is multiplied by (1 − iter/max_iter)^power, with power = 0.9 and an initial learning rate of 4e−3.
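The "poly" learning-rate policy quoted above can be written directly (`poly_lr` is an illustrative helper name):

```python
def poly_lr(base_lr, it, max_it, power=0.9):
    """'Poly' learning-rate policy: lr = base_lr * (1 - it/max_it) ** power."""
    return base_lr * (1.0 - it / max_it) ** power
```

The rate starts at `base_lr` (here 4e-3), decays monotonically, and reaches zero exactly at the last iteration.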
The experimental results of the invention are as follows:
In order to quantitatively evaluate the performance of the proposed method, the segmentation results are compared with the manual annotation maps using four indices: the Dice similarity coefficient (DSC), precision, true positive fraction (TPF) and false positive fraction (FPF). DSC measures the overlap between the segmentation result and the manual annotation map and is defined as:

DSC = 2TP / (2TP + FP + FN)

where TP is the number of true-positive segmented pixels, FP the number of false-positive segmented pixels, and FN the number of false-negative segmented pixels. The TPF, FPF and precision indices are computed as:

TPF = TP / (TP + FN), FPF = FP / (FP + TN), Precision = TP / (TP + FP)

where TN is the number of true-negative pixels.
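These four indices can be computed from binary masks in a few lines (a sketch; `seg_metrics` is an illustrative name):

```python
import numpy as np

def seg_metrics(pred, gt):
    """DSC, precision, TPF and FPF from binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)      # true positives
    fp = np.sum(pred & ~gt)     # false positives
    fn = np.sum(~pred & gt)     # false negatives
    tn = np.sum(~pred & ~gt)    # true negatives
    dsc = 2 * tp / (2 * tp + fp + fn)
    precision = tp / (tp + fp)
    tpf = tp / (tp + fn)
    fpf = fp / (fp + tn)
    return dsc, precision, tpf, fpf
```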
the present invention compares the segmentation results with others' methods, as shown in table 1, where Param represents the neural network parameters and M represents the megabits.
As shown in Table 1, the results of the present invention are compared with the results of the prior art methods.
TABLE 1

| Methods | DSC (%) | Precision (%) | TPF (%) | FPF (%) | Param |
|---------|---------|---------------|---------|---------|-------|
| DenseNet | 57.77±17.31 | 57.59±26.55 | 87.47±15.65 | 0.29±0.24 | 2.4M |
| CGAN | 54.10±10.01 | 75.40±14.58 | 51.57±16.08 | 5.66±13.39 | 142M |
| SegCaps | 58.98±23.21 | 50.03±23.43 | 97.89±3.01 | 3.29±3.46 | 1.4M |
| UNet | 58.04±18.56 | 60.10±20.08 | 90.65±10.71 | 0.12±0.11 | 31M |
| SFC-FCN | 79.63±7.99 | 86.83±7.14 | 92.05±5.81 | 0.02±0.01 | 1M |
Fig. 5 shows the segmentation results of the different methods. The first column is the original PET image and the second column is the corresponding manual annotation; the following columns show the segmentation results of CGAN, DenseNet, SegCaps and UNet, and the last column shows the segmentation results of the proposed method.
The invention has the following beneficial effects: (1) a trainable compressed sensing module (CSM) is proposed, which compresses information during training, removes redundant feature maps and enhances effective feature maps; (2) the proposed full convolutional neural network extracts key features through the trainable CSM and further strengthens them by convolution, obtaining a better segmentation result; (3) a deep supervision mechanism is proposed to supervise the weight parameters in the trainable CSM and guide its feature extraction during training; (4) compared with existing methods, the proposed method yields more accurate segmentation results, higher segmentation speed and fewer parameters.
The above-mentioned embodiments are merely preferred embodiments for fully illustrating the present invention, and the scope of the present invention is not limited thereto. The equivalent substitution or change made by the technical personnel in the technical field on the basis of the invention is all within the protection scope of the invention. The protection scope of the invention is subject to the claims.
Claims (10)
1. A trainable compressed sensing module comprising the steps of:
obtaining the three-dimensional matrix input F ∈ R^(H×W×C) of the compressed sensing module, where the input comes from the output of the previous convolutional layer, H is the height of the feature map, W the width of the feature map, and C the number of feature channels;

reshaping the three-dimensional input F of the compressed sensing module into two dimensions, obtaining the deformed two-dimensional matrix X ∈ R^(N×C), where N = H × W is the total number of pixels on each feature map;

decomposing the two-dimensional matrix X with a singular value decomposition algorithm into X = UΣVᵀ, where U ∈ R^(N×N) is a left singular orthogonal vector matrix, V ∈ R^(C×C) is a right singular orthogonal vector matrix, and Σ ∈ R^(N×C) is a singular value matrix;

setting a diagonal weight matrix Λ ∈ R^(C×C) whose diagonal elements are trainable and updated while all off-diagonal elements are zero, and obtaining the new matrix Y = UΣΛ; the set weight matrix Λ selects the singular values of Σ, realizing information compression of the two-dimensional matrix X = UΣVᵀ, removing redundant features and enhancing the effective information.
2. An image segmentation method based on reconstructing a full convolutional network with sparse feature maps, comprising the trainable compressed sensing module of claim 1, and comprising the following steps:
constructing a network model with the trainable compression sensing module;
acquiring a three-dimensional PET image training data set with non-small cell lung tumor, bringing the training data set into a network model to train a network, and acquiring the trained network.
3. The image segmentation method according to claim 2, wherein constructing the trainable compressed sensing module network model specifically comprises:
the method comprises the steps of utilizing a full convolutional network as a basic framework of a network model, wherein the full convolutional network comprises three convolutional layers, three trainable compression sensing modules, three times of downsampling and three times of upsampling, and each convolutional layer of the full convolutional network comprises a normalization function, a convolution function and an activation function.
4. The image segmentation method according to claim 3, further comprising a deep supervision mechanism, the deep supervision mechanism comprising in particular the steps of:
adding an auxiliary branch after the trainable compression sensing module, wherein the auxiliary branch amplifies some lower-level and medium-level feature vectors by using deconvolution;
applying a softmax activation function to the amplified feature vectors, which now have the same size as the full-resolution output, to obtain an additional dense prediction map;
for the prediction map of each auxiliary branch and the corresponding manual annotation map, calculating the segmentation error between them by using a cross entropy cost function;
the cross-entropy losses of all the auxiliary branches and the cross-entropy loss of the last layer are combined to encourage gradient back-propagation and more efficient updating of parameters in each iteration.
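The loss combination described in the steps above can be sketched as follows; the branch weight `eta` is an assumption, since the claim only states that the branch losses and the last-layer loss are combined:

```python
import numpy as np

def cross_entropy(pred_probs, labels):
    """Mean cross entropy.
    pred_probs: (N, K) softmax outputs; labels: (N,) integer class ids."""
    n = labels.shape[0]
    return -np.mean(np.log(pred_probs[np.arange(n), labels] + 1e-12))

def deep_supervision_loss(main_probs, branch_probs_list, labels, eta=0.3):
    """Combine the last-layer loss with the auxiliary-branch losses so
    that gradients reach the early layers in every iteration."""
    loss = cross_entropy(main_probs, labels)
    for branch_probs in branch_probs_list:
        loss += eta * cross_entropy(branch_probs, labels)
    return loss
```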
5. The image segmentation method of claim 4, further comprising:
let φ_l denote the weight parameters of the l-th trainable compressed sensing module of the network model, wherein l = 1, 2, …, L, and let Φ = (φ_1, φ_2, …, φ_L);
with p(x_i; Φ) representing the predicted probability of pixel x_i after the output of the last layer passes through the softmax function, the cross entropy loss can then be expressed as L(Φ) = −Σ_i log p(x_i; Φ).
6. The image segmentation method according to claim 4, wherein in the deep supervision mechanism,
using Φ_d = (φ_1, φ_2, …, φ_d) to represent the weights of the first d trainable compressed sensing modules of the main network, and p(x_i; Φ_d) to represent the output of the d-th branch, that is, the predicted probability of pixel x_i after the softmax function;
and optimizing the combined loss L(Φ) + Σ_d L_d(Φ_d), wherein L_d(Φ_d) = −Σ_i log p(x_i; Φ_d) is the cross entropy loss of the d-th auxiliary branch, to obtain the optimal weights Φ, thereby supervising the training of the compressed sensing modules to extract key effective feature maps.
7. The image segmentation method of claim 4, wherein the Focal loss is used to supervise the output of the last layer of the network model.
8. The image segmentation method according to claim 7, wherein "using the Focal loss to supervise the output of the last layer of the network model" specifically comprises:
the network loss function is FL(p_k) = −α(1 − p_k)^γ log(p_k), wherein p_k = y_i p(x_i) + (1 − y_i)(1 − p(x_i)) is the prediction probability of the kth class, α is a balance factor used to balance positive and negative samples, and γ is a weighting factor, where γ > 0 reduces the loss of easily-classified samples;
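A sketch of this loss for the predicted probability of the true class; the default values of `alpha` and `gamma` are common choices, not values stated in the claim:

```python
import numpy as np

def focal_loss(p_true, alpha=0.25, gamma=2.0):
    """Focal loss FL(p) = -alpha * (1 - p)**gamma * log(p) for the
    predicted probability p of the true class. alpha balances positive
    and negative samples; gamma > 0 down-weights easy examples."""
    p = np.asarray(p_true, dtype=float)
    return -alpha * (1.0 - p) ** gamma * np.log(p + 1e-12)
```

Because of the modulating factor (1 − p)^γ, a confidently correct prediction (p close to 1) contributes far less loss than a hard one, and less than plain cross entropy would.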
9. The image segmentation method as set forth in claim 2, wherein "acquiring a three-dimensional PET image training dataset with a non-small cell lung tumor" specifically comprises: cutting each three-dimensional PET image into two-dimensional slice images along the Z-axis direction as the input of the convolutional neural network, and performing data amplification by flipping, rotating and translating in the length and width directions.
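The slicing and amplification step can be sketched as follows; the specific flips, rotation, and shift amounts are illustrative assumptions, since the claim only names the families of operations:

```python
import numpy as np

def augment_slices(volume):
    """Cut a 3-D PET volume (H, W, Z) into 2-D slices along the Z axis
    and amplify each slice with flips, a rotation, and translations
    along the length and width directions."""
    slices = [volume[..., z] for z in range(volume.shape[-1])]
    augmented = []
    for s in slices:
        augmented.append(s)                          # original slice
        augmented.append(np.fliplr(s))               # horizontal flip
        augmented.append(np.flipud(s))               # vertical flip
        augmented.append(np.rot90(s))                # 90-degree rotation
        augmented.append(np.roll(s, 2, axis=0))      # translate along height
        augmented.append(np.roll(s, 2, axis=1))      # translate along width
    return augmented
```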
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any of claims 1 to 9 are implemented when the program is executed by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010002908.1A CN111192334B (en) | 2020-01-02 | 2020-01-02 | Trainable compressed sensing module and image segmentation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111192334A true CN111192334A (en) | 2020-05-22 |
CN111192334B CN111192334B (en) | 2023-06-06 |
Family
ID=70710647
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010002908.1A Active CN111192334B (en) | 2020-01-02 | 2020-01-02 | Trainable compressed sensing module and image segmentation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111192334B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112734762A (en) * | 2020-12-31 | 2021-04-30 | 西华师范大学 | Dual-path UNet network tumor segmentation method based on covariance self-attention mechanism |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103596010A (en) * | 2013-11-20 | 2014-02-19 | 天津大学 | Video coding and decoding system based on dictionary learning and compressed sensing |
WO2018120740A1 (en) * | 2016-12-29 | 2018-07-05 | 深圳光启合众科技有限公司 | Picture classification method, device and robot |
CN108256544A (en) * | 2016-12-29 | 2018-07-06 | 深圳光启合众科技有限公司 | Picture classification method and device, robot |
CN107610192A (en) * | 2017-09-30 | 2018-01-19 | 西安电子科技大学 | Adaptive observation compressed sensing image reconstructing method based on deep learning |
CN108765255A (en) * | 2018-05-31 | 2018-11-06 | 东南大学 | Angular quantification index modulation image watermark System and method for based on compressed sensing technology |
CN109168002A (en) * | 2018-07-26 | 2019-01-08 | 西安电子科技大学 | Vision signal measurement field estimation method based on compressed sensing and convolutional neural networks |
Also Published As
Publication number | Publication date |
---|---|
CN111192334B (en) | 2023-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yu et al. | Tensorizing GAN with high-order pooling for Alzheimer’s disease assessment | |
Li et al. | Deep convolutional neural networks for imaging data based survival analysis of rectal cancer | |
Ma et al. | Thyroid diagnosis from SPECT images using convolutional neural network with optimization | |
US11170502B2 (en) | Method based on deep neural network to extract appearance and geometry features for pulmonary textures classification | |
US10366491B2 (en) | Deep image-to-image recurrent network with shape basis for automatic vertebra labeling in large-scale 3D CT volumes | |
CN112270666A (en) | Non-small cell lung cancer pathological section identification method based on deep convolutional neural network | |
Zhou et al. | Multi-classification of skin diseases for dermoscopy images using deep learning | |
CN113436173B (en) | Abdominal multi-organ segmentation modeling and segmentation method and system based on edge perception | |
CN115147600A (en) | GBM multi-mode MR image segmentation method based on classifier weight converter | |
Li et al. | Brain tumor segmentation using 3D generative adversarial networks | |
Ristea et al. | CyTran: a cycle-consistent transformer with multi-level consistency for non-contrast to contrast CT translation | |
Li et al. | Double attention U-Net for brain tumor MR image segmentation | |
Agarwal et al. | Weakly supervised lesion co-segmentation on CT scans | |
CN114708212A (en) | Heart image segmentation method based on SEA-Unet | |
CN114820450A (en) | CT angiography image classification method suitable for Li's artificial liver treatment | |
CN114663445A (en) | Three-dimensional heart image segmentation method based on multi-scale edge perception | |
Gao et al. | Transformer based tooth classification from cone-beam computed tomography for dental charting | |
Zhang et al. | MLP-based classification of COVID-19 and skin diseases | |
Sengun et al. | Automatic liver segmentation from CT images using deep learning algorithms: a comparative study | |
CN111192334B (en) | Trainable compressed sensing module and image segmentation method | |
CN112465118A (en) | Low-rank generation type countermeasure network construction method for medical image generation | |
Wu et al. | Human identification with dental panoramic images based on deep learning | |
Barhoumi et al. | Efficient scopeformer: Towards scalable and rich feature extraction for intracranial hemorrhage detection | |
CN115100155A (en) | Method and system for establishing radiation pneumonitis prediction model | |
Wang et al. | Renal lesion classification in kidney CT images by seven-layer convolution neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||