CN113362295A - Liver tumor identification method based on self-supervision dense convolutional neural network - Google Patents
- Publication number
- CN113362295A (application CN202110593311.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- network
- neural network
- data set
- convolutional neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/0012 — Biomedical image inspection
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415 — Classification techniques based on parametric or probabilistic models
- G06N3/045 — Combinations of networks
- G06N3/047 — Probabilistic or stochastic networks
- G06N3/048 — Activation functions
- G06N3/08 — Learning methods
- G06T7/11 — Region-based segmentation
- G06T2207/10088 — Magnetic resonance imaging [MRI]
- G06T2207/20076 — Probabilistic image processing
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30056 — Liver; Hepatic
- G06T2207/30096 — Tumor; Lesion
Abstract
The invention discloses a liver tumor recognition method based on a self-supervised dense convolutional neural network. A liver slice data set is obtained from magnetic resonance images of patients. Each slice is divided into blocks that are used to train a constructed dense convolutional (DenseNet) network, and the trained network then serves as the encoding module of a self-supervised learning network whose encoding and decoding modules are densely connected. Tumor regions in part of the images in the slice data set are manually annotated, the whole data set is divided into blocks, and the blocks are used to train the self-supervised learning network, which afterwards recognizes tumors in images automatically. By setting an image jigsaw task as the self-supervised upstream training task, the network learns useful representations from a large number of unannotated medical images and transfers them to the learning and training of the downstream target task, thereby automatically expanding the training samples, reducing the dependence on expert experience and historical data, and improving the accuracy of liver lesion region identification.
Description
Technical Field
The invention belongs to the technical field of medical image recognition, and particularly relates to a liver tumor recognition method based on a self-supervised dense convolutional neural network.
Background
Liver cancer, chiefly hepatocellular carcinoma (HCC), is among the malignant tumors with the highest mortality and morbidity worldwide. Although surgical resection is the first-choice treatment for liver cancer, early HCC is usually clinically silent, so most cases are discovered only in the middle or late stage; owing to constraints such as severe cirrhosis and abnormal liver function, fewer than 25% of clinically discovered HCC patients have the chance of surgical resection. For patients who cannot undergo resection, transcatheter arterial chemoembolization (TACE) is the primary option. The theoretical basis of TACE for HCC is that 70%-75% of the blood supply of normal liver tissue comes from the portal vein and only 25%-30% from the hepatic artery, whereas 95%-99% of the blood supply of HCC comes from the hepatic artery. After ligation or embolization of the hepatic artery, the tumor's blood supply is reduced by about 90% while the blood flow of normal liver tissue decreases by only 35%-40%, so the normal tissue is largely unaffected. Tumor tissue is highly sensitive to ischemia and hypoxia, and because tumor blood vessels are much wider than normal vessels, blood flow in them is slow and a siphon effect exists, so most of the chemotherapeutic drug and embolic agent is retained in the vasculature of the liver cancer. To judge the therapeutic effect of the drug on the cancer cells, the lesion region must be identified on medical images for before-and-after comparison.
At present, B-mode ultrasound, CT and MRI are the main means of clinical liver cancer diagnosis. Multi-slice spiral CT and multi-phase enhanced MRI are especially prominent in this role, and multi-phase enhanced MRI in particular shows its advantage for liver cancers less than 3 cm in diameter. The lesion area of the liver can be found through MRI examination, and the effect of TACE treatment on the lesion can be assessed by before-and-after comparison to determine the next step of treatment. However, this requires manual identification of the lesion area on MRI images, the comparison is time-consuming and labor-intensive, and even the same doctor may give different results for the same MRI image, which is unfavorable for subsequent medical analysis.
In view of the disadvantages of manual segmentation, an intelligent image recognition method is urgently needed to replace manual identification. Deep learning plays an important role in the field of medical images, where image segmentation is generally defined as identifying the edge or interior pixels of a region of interest so that morphological parameters of organs, tissues or lesion regions can be measured quantitatively. U-net, based on convolutional neural network technology, is currently a well-known neural network framework in the medical field; it combines equal numbers of up-sampling and down-sampling layers and can generate a segmentation image for the whole input in a single forward pass while preserving high-resolution information. However, manually annotating large amounts of data to a sufficiently high accuracy is difficult to achieve, so deep learning in the medical image field still faces the problem of insufficient labeled training samples.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a liver tumor identification method based on a self-supervised dense convolutional neural network, which uses self-supervised learning to strengthen training: useful representations can be learned from unannotated data, improving the accuracy of the model in identifying lesions in liver MRI images.
The invention is realized by the following technical scheme:
a liver tumor identification method based on an automatic supervision dense convolutional neural network comprises the following steps:
step 1, slicing the magnetic resonance images of a patient from a plurality of intersecting directions to obtain a slice data set { A };
step 2, preprocessing the slice data set { A } and performing lesion annotation on part of the pictures in the slice data set to obtain a labeled data set { X } and an unlabeled data set { Y };
step 3, constructing and training a DenseNet network, which comprises a feature extraction module, a fully connected layer and a Softmax function connected in sequence; each picture in the slice data set { A } is divided into several picture blocks, each block is labeled with a serial number, the divided blocks are input to the DenseNet in shuffled order for training, and the DenseNet outputs the position of each block in the original image;
step 4, constructing a dense convolutional neural network comprising an encoding module and a decoding module: the trained DenseNet network serves as the encoding module for feature extraction; the decoding module is a triple stack of an Up-sampling module and a Denseblock module, densely connected to the feature extraction part of the encoding module, and the last Denseblock module is followed by a convolutional layer and a Softmax function;
and step 5, dividing each image in the labeled data set { X } into several image blocks and using them as input to train the dense convolutional neural network: the encoding module classifies each block as a whole; blocks judged to contain labeled lesion pixels are processed by the decoding module, where pixel-level binary classification with the Softmax function realizes the semantic segmentation of the block; blocks without lesion pixels skip the decoding computation and are output directly; finally the per-block outputs are reassembled according to the division order to obtain the original image with the lesion position region marked.
Preferably, the slicing in step 1 is performed in two intersecting directions that are perpendicular to each other.
Preferably, the preprocessing of the slice data set { A } in step 2 is an image normalization performed as:

P_{i,j,k} = (P'_{i,j,k} - MinP) / (MaxP - MinP)

where P'_{i,j,k} denotes the image data matrices of the slice data set before normalization, subscripts i and j index the rows and columns of an image data matrix, subscript k indexes the different images, I and J denote the numbers of pixels in the rows and columns, K denotes the total number of images, MaxP denotes the maximum and MinP the minimum value over all pixels, and P_{i,j,k} denotes the normalized image data matrices of the slice data set.
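For concreteness, the normalization above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than part of the patent; the function name and the (K, I, J) array layout are assumptions.

```python
import numpy as np

def normalize_slices(images):
    """Min-max normalize a stack of grayscale MRI slices to [0, 1].

    `images` has shape (K, I, J): K slices of I x J pixels.
    MaxP and MinP are taken over the whole data set, as described above.
    """
    images = images.astype(np.float64)
    min_p = images.min()   # MinP: minimum over all pixels of all images
    max_p = images.max()   # MaxP: maximum over all pixels of all images
    return (images - min_p) / (max_p - min_p)
```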
Preferably, the feature extraction module in step 3 comprises a convolutional layer followed by alternately connected Denseblock and Transition modules, where the convolutional layer is connected to the first Denseblock module and the last Denseblock module is followed by another convolutional layer.
Preferably, the DenseNet network in step 3 adopts the SGD gradient descent optimization algorithm, computes the loss with a cross-entropy function, and adopts a strategy of adaptively updating the learning rate during training.
Preferably, the loss function is calculated as:

L = -(1/m) Σ_{j=1}^{m} Σ_{i=1}^{n} label_{i,j} · log(predict_{i,j})

where m is the number of image blocks, label_{i,j} indicates whether image block j belongs to class i, and predict_{i,j} is the predicted probability that image block j belongs to class i.
Preferably, the dense convolutional neural network uses the SGD gradient descent optimization algorithm, computes the loss with a cross-entropy function, and adaptively updates the learning rate during training; the loss function of the self-supervised learning network comprises the classification loss L1 of the whole-block judgment and the pixel-level loss L2, and the change of both is monitored during training.
Preferably, the classification loss L1 of the whole-block judgment is calculated as:

L1 = -[label · log(predict) + (1 - label) · log(1 - predict)]

where predict is the predicted probability that the image contains a labeled lesion region and label is the sample label;
The pixel-level loss L2 of each image block is the sum of the losses of the individual pixels:

L2 = -Σ_{i=1}^{n} [label_i · log(predict_i) + (1 - label_i) · log(1 - predict_i)]

where n is the total number of pixels of the image block, predict_i is the model's predicted probability that the i-th pixel belongs to a cancer cell, and label_i is the sample label of the i-th pixel.
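The pixel-level loss L2 described above can be sketched as follows; this is a NumPy illustration under the stated per-pixel summation, and the function name and flat array layout are assumptions.

```python
import numpy as np

def pixel_loss_l2(labels, predicts):
    """Pixel-level loss L2 of one image block: the sum over its n pixels
    of the binary cross-entropy between label_i and predict_i.

    labels:   flat array of 0/1 ground-truth pixel labels
    predicts: flat array of predicted cancer-cell probabilities in (0, 1)
    """
    labels = np.asarray(labels, dtype=np.float64)
    predicts = np.asarray(predicts, dtype=np.float64)
    per_pixel = -(labels * np.log(predicts)
                  + (1.0 - labels) * np.log(1.0 - predicts))
    return per_pixel.sum()
```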
Compared with the prior art, the invention has the following beneficial technical effects:
the invention provides a liver tumor identification method based on an automatic supervision dense convolution neural network, which is characterized in that a medical image of a patient is sliced for a plurality of times from two crossed directions, the direction of a conventional medical slice is taken as a main direction, one direction perpendicular to the main direction is selected as an auxiliary direction, slices are made from the two perpendicular directions of the main direction and the auxiliary direction, compared with the slice in a single direction, the problem that local cancer cells are smaller and neglected in the direction can exist, the positions of the cancer cells are cooperatively positioned from the two directions, secondly, a data set used by the neural network training not only comprises an MRI image of a focus area containing a mark, but also comprises a large number of MRI images which are not marked, compared with the traditional method that only the marked image is used for learning and training of the neural network, and the method does not depend on medical knowledge, expert experience and historical data too much. An automatic supervision learning method is introduced, and a jigsaw upstream training task is set for all MRI images, so that a neural network can learn useful expressions of a large amount of unlabeled data, and the accuracy of a liver cancer identification task can be effectively improved. 
Both the encoding and decoding modules of the self-supervised learning network draw on the DenseNet concept, and the feature maps of the encoding and decoding parts are densely connected, so that features of multiple scales are fused and the picture recognition effect improves. Finally, a fully connected layer and a softmax activation function are placed at the end of the encoding part to judge whether each partial image contains cancer cells; only the parts judged to contain cancer cells participate in the computation of the decoding part, which effectively reduces the amount of calculation and improves computational efficiency.
Drawings
FIG. 1 is a schematic block diagram of the liver tumor identification method based on a self-supervised dense convolutional neural network according to the present invention;
FIG. 2 is a schematic block diagram of the structure of the dense convolutional neural network constructed in the identification method of the present invention;
FIG. 3 is a schematic structural diagram of the dense convolution block (Denseblock) in the network;
FIG. 4 is a diagram illustrating the structure of a convolution block within a Denseblock;
FIG. 5 is a diagram illustrating the structure of the Transition block in the network.
Detailed Description
The present invention will now be described in further detail with reference to the attached drawings, which are illustrative, but not limiting, of the present invention.
Referring to fig. 1, a liver tumor identification method based on a self-supervised dense convolutional neural network includes the following steps:
Step 1, slices are made in two intersecting directions from the magnetic resonance images of a patient to obtain a liver slice data set { A }.
Specifically, medical images are acquired using liver magnetic resonance imaging (MRI). Slices are made in two intersecting directions: M slices are cut along the conventional main slicing direction, and N slices are cut along an auxiliary direction perpendicular to it, so M + N slices are obtained per patient (here M and N are both taken as 10). MRI slice images of multiple patients are collected to form the required liver slice data set { A }.
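The two-direction slicing of step 1 can be sketched in NumPy. Which anatomical axes correspond to the main and auxiliary directions is an assumption here; only their perpendicularity matters for the sketch.

```python
import numpy as np

def slice_two_directions(volume, m=10, n=10):
    """Cut M main-direction and N auxiliary-direction slices from a 3-D
    MRI volume of shape (D, H, W), as in step 1."""
    d, _, w = volume.shape
    main_idx = np.linspace(0, d - 1, m).astype(int)  # M evenly spaced main-direction slices
    aux_idx = np.linspace(0, w - 1, n).astype(int)   # N slices along a perpendicular axis
    slices = [volume[i, :, :] for i in main_idx]
    slices += [volume[:, :, j] for j in aux_idx]
    return slices                                    # M + N slices per patient
```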
Step 2, the liver slice data set { A } is preprocessed, and qualified medical experts then annotate cancer cells in part of the pictures, so that the slice data set { A } is divided into a data set { X } with cancer-cell annotations and an unannotated data set { Y }.
Specifically, the data in the liver slice data set { A } are normalized. Since an MRI image is a grayscale image with a single channel, the normalization is performed as:

P_{i,j,k} = (P'_{i,j,k} - MinP) / (MaxP - MinP)

where P'_{i,j,k} denotes the image data matrices of the liver slice data set { A } before normalization, subscripts i and j index the rows and columns of an image data matrix, subscript k indexes the different images, I and J denote the numbers of pixels in the rows and columns, K denotes the total number of images, MaxP denotes the maximum and MinP the minimum value over all pixels, and P_{i,j,k} denotes the image data matrices of the normalized data set { A }.
The annotated images of the data set { X } are normalized with the same method.
Step 3, a DenseNet network is constructed and trained. It comprises a feature extraction module, a fully connected layer and a Softmax function connected in sequence; each picture in the slice data set { A } is divided into several picture blocks, the divided blocks are input to the DenseNet in shuffled order for training, and the DenseNet outputs the position of each block in the whole image.
Specifically, an upstream learning task of the self-supervised learning is constructed for the liver slice data set { A }, and a neural network is built to complete it: a "jigsaw puzzle" task is set as the upstream task. The slice data set { A } is split into training and validation sets at a ratio of 4:1. Each picture in { A } is divided into 9 blocks, numbered 1-9 from the top-left to the bottom-right corner of the picture; these numbers are the artificial labels set by the self-supervised learning and are represented as 1 x 9 one-hot vectors, e.g. for number 3 the third element is 1 and the rest are filled with 0, denoted label(3).
label(1)=[1,0,0,0,0,0,0,0,0]
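Constructing such a one-hot label is straightforward; a minimal hypothetical helper (the function name is illustrative, not from the patent):

```python
def jigsaw_label(position, n_blocks=9):
    """1 x 9 one-hot artificial label for a block's true position (1-9),
    matching label(1) = [1,0,0,0,0,0,0,0,0] above."""
    vec = [0] * n_blocks
    vec[position - 1] = 1
    return vec
```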
Referring to fig. 2, the structure of the DenseNet network is: first a 3x3 convolutional layer, then three stacked pairs of a Denseblock module and a Transition module, followed by another 3x3 convolutional layer; these layers form the feature extraction network, after which a fully connected layer FC1 and a Softmax function are attached for classification prediction.
As shown in FIG. 3, a Denseblock module contains n densely connected convolution blocks, and the input Input_n of the n-th convolution block is:

Input_n = [X_0, X_1, ..., X_{n-1}] (channel-wise concatenation)

where X_0 represents the original input and X_i (i = 1 to n-1) represents the output of the i-th convolution block.
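The channel-wise concatenation that feeds each convolution block can be sketched as follows (a NumPy illustration; the channel-first layout is an assumption):

```python
import numpy as np

def dense_block_input(features):
    """Input of the n-th convolution block in a Denseblock: the channel-wise
    concatenation of the original input X0 and the outputs X1..X_{n-1} of
    all previous convolution blocks, each of shape (C_i, H, W)."""
    return np.concatenate(features, axis=0)
```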
Referring to fig. 4, a convolution block consists of a 1x1 and a 3x3 convolution; a BatchNorm layer and a ReLU activation layer are introduced before each convolution to regularize the data distribution, accelerate network convergence and mitigate the exploding-gradient problem.
Here the 1x1 convolution has an output channel depth of 48 and the 3x3 convolution an output channel depth of 12.
The number of convolution blocks in a Denseblock can be set freely; here Denseblock_1-3 each contain 16 convolution blocks.
Referring to fig. 5, the Transition module consists of a BatchNorm layer, a ReLU layer, a 1x1 convolution layer and a 2x2 average pooling layer, which down-samples the picture by half.
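The down-sampling step of the Transition module, 2x2 average pooling with stride 2, can be sketched directly in NumPy (an illustrative sketch; real frameworks provide this as a built-in layer):

```python
import numpy as np

def avg_pool_2x2(x):
    """2x2 average pooling with stride 2, halving each spatial dimension.

    x: array of shape (C, H, W) with even H and W.
    """
    c, h, w = x.shape
    # Group pixels into 2x2 tiles and average within each tile.
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
```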
The DenseNet network is trained as follows: an input image is cut into 9 blocks carrying the artificial labels 1-9, the order of the 9 blocks is randomly shuffled, and the blocks are input into the constructed DenseNet for learning and training. For each block the network outputs a predicted position value; for example, for a block whose actual label is 8, the network outputs 8 when it predicts correctly and another value otherwise.
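Preparing one training sample for this jigsaw task can be sketched as follows; this minimal illustration covers only the 3x3 split and shuffling, not the training loop, and the function name is an assumption:

```python
import numpy as np

def make_jigsaw_sample(image, rng=None):
    """Split a square image into a 3 x 3 grid of blocks labelled 1-9
    (top-left to bottom-right), then shuffle them.
    Returns (shuffled blocks, their true position labels)."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = image.shape
    ph, pw = h // 3, w // 3
    blocks, labels = [], []
    for idx in range(9):
        r, c = divmod(idx, 3)
        blocks.append(image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw])
        labels.append(idx + 1)          # artificial label 1-9
    order = rng.permutation(9)          # random shuffle of the 9 blocks
    return [blocks[i] for i in order], [labels[i] for i in order]
```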
The DenseNet network training adopts the SGD gradient descent optimization algorithm, computes the loss with a cross-entropy function, and adopts a strategy of adaptively updating the learning rate during training: when the training error shows no obvious improvement after repeated iterations, the learning rate is reduced.
The "jigsaw task" is equivalent to a nine-class learning task, and the loss for one input MRI image is calculated as:

L = -(1/m) Σ_{j=1}^{m} Σ_{i=1}^{n} label_{i,j} · log(predict_{i,j})

where m is the number of image blocks (here m = 9), n is the number of classes (n = 9), label_{i,j} indicates whether image block j belongs to class i, and predict_{i,j} is the network's predicted probability that image block j belongs to class i.
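The nine-class cross-entropy described above can be sketched in NumPy; one-hot labels and averaging over the m blocks are assumed here, matching the definitions just given:

```python
import numpy as np

def jigsaw_cross_entropy(labels, predicts):
    """Cross-entropy of the jigsaw task, averaged over the m image blocks.

    labels:   (m, n) one-hot matrix, labels[j, i] = 1 if block j is position i
    predicts: (m, n) softmax probabilities output by the network
    """
    m = labels.shape[0]
    return -np.sum(labels * np.log(predicts)) / m
```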
The DenseNet is trained on the self-supervised upstream task, i.e. the jigsaw puzzle task, and the feature extraction part of the trained network is then used as the feature extractor for liver cancer identification.
Step 4, a dense convolutional neural network is constructed, comprising an encoding module and a decoding module; the encoding part is the feature extraction network, with the trained DenseNet serving as the encoding module for feature extraction.
The decoding module is formed by stacking the Up-sampling module and the Denseblock module three times, and the end of the decoder is connected to a 1x1 convolution layer and a Softmax function for semantic segmentation of the picture.
A fully connected layer FC2 and a Softmax function are attached to the end of the encoding module to make an overall judgment on the picture. Denseblock_1-3 of the encoding part and Denseblock_4-6 of the decoding part are densely connected: the outputs of Denseblock_1-3 are scaled to the same size and fed into Denseblock_4-6. This dense connection between the encoding and decoding modules supplements information, adding semantic information at the high layers and refining the segmentation contour at the low layers, which improves the final semantic-segmentation result.
The Up-sampling layer of the decoding module enlarges the picture and reduces the number of channels; after the last Up-sampling layer the picture has the same size as the input picture. The number of convolution blocks in Denseblock_4-6 is 6.
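What each Up-sampling layer does to the picture size can be illustrated with a nearest-neighbour upsampling sketch (illustrative only; the patent does not specify the interpolation method, and the channel reduction would be handled by a following convolution):

```python
import numpy as np

def upsample_nearest(x, scale=2):
    """Nearest-neighbour up-sampling of an (H, W, C) feature map,
    doubling the spatial size by default - a sketch of the size
    enlargement performed by each Up-sampling layer."""
    return np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)
```

Three such doublings restore an 8x-downsampled feature map to the input picture's size.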
Step 5: divide the data set {X} with cancer-cell labels into a training set and a verification set in proportion, cut each picture in {X} into several picture blocks, and train the dense convolutional neural network with the picture blocks of the training set as its input.
Specifically, the data set {X} with liver cancer labels is used for the downstream task of self-supervised learning. {X} is divided into a training set and a verification set at a ratio of 4:1, and each image in the training set is cut into 9 blocks, which serve as the input of the encoding module. At the end of the encoding part each image block is classified as a whole: a block that contains labeled cancer-cell pixels is judged as 1, otherwise it is judged as 0. A block judged as 1 is passed to the decoding module, where a pixel-level binary judgment through the Softmax function realizes semantic segmentation and yields the segmentation result of that block; a block judged as 0 skips the decoding computation and is output directly. Finally, the decoding module reassembles the outputs of the 9 blocks in their original cutting order to obtain the semantic-segmentation result of the whole original image, in which the position areas of liver cancer cells are marked, giving the liver-cancer identification result for the whole MRI image.
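The final reassembly of the 9 block outputs into one full-image result can be sketched as follows (illustrative; `assemble_segmentation` is a hypothetical helper, and blocks judged 0 by the encoder would simply contribute all-zero masks):

```python
import numpy as np

def assemble_segmentation(tile_results, grid=3):
    """Stitch per-tile segmentation masks back together in their
    original cutting order (row-major) to recover the whole-image
    result - a sketch of the decoder's reassembly step."""
    rows = [np.concatenate(tile_results[r*grid:(r+1)*grid], axis=1)
            for r in range(grid)]
    return np.concatenate(rows, axis=0)
```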
Network training uses the SGD gradient-descent optimization algorithm with a cross-entropy loss function, together with an adaptive learning-rate strategy: when the training error shows no obvious improvement after repeated iterations, the learning rate is reduced. The loss function of this network consists of two parts, the classification loss L1 of the overall judgment and the pixel-judgment loss L2, and the change of both is monitored during training.
The overall classification is a binary classification problem; for each image block obtained after cutting, the loss function is calculated as follows:
L1=-[label·log(predict)+(1-label)·log(1-predict)]
where predict is the probability, predicted by the model, that the image contains labeled cancer cells, and label is the sample label: 1 if the image contains cancer cells and 0 otherwise.
The pixel judgment is equivalent to a binary classification problem for each pixel, so the loss function of each small image is the sum of the loss functions of the individual pixels:

L2 = -∑_{i=1}^{n} [label_i·log(predict_i) + (1-label_i)·log(1-predict_i)]

where n represents the total number of pixels of the image, predict_i is the probability, predicted by the model, that the i-th pixel belongs to a cancer cell, and label_i is the sample label of the i-th pixel: 1 if the pixel belongs to a cancer cell and 0 otherwise.
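The two loss terms L1 and L2 can be written directly in NumPy (a sketch mirroring the formulas above; the function names and the epsilon guard against log(0) are additions, not the patent's code):

```python
import numpy as np

def l1_classification_loss(label, predict):
    """Binary cross-entropy for the whole-block judgment L1.
    `predict` is the predicted probability that the block contains
    labeled cancer cells; `label` is 1 or 0."""
    eps = 1e-12                                  # avoid log(0)
    return -(label * np.log(predict + eps)
             + (1 - label) * np.log(1 - predict + eps))

def l2_pixel_loss(labels, predicts):
    """Pixel-judgment loss L2: the sum of per-pixel binary
    cross-entropies over all n pixels of a block."""
    eps = 1e-12
    return float(np.sum(-(labels * np.log(predicts + eps)
                          + (1 - labels) * np.log(1 - predicts + eps))))
```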
After the dense convolutional neural network is trained, a neural-network model for liver tumor identification is obtained. The network can then automatically identify and delineate the liver-cancer region in unlabeled MRI images, and the segmentation result can be used for post-processing in medical software.
Self-supervised learning presets a training task constructed from the characteristics of the data samples themselves, so that the neural network can learn useful representations of unlabeled data; the target task is then trained on top of these representations, which effectively strengthens learning.
In conclusion, the invention provides a self-supervision-based dense convolutional neural network for liver tumor identification. A "jigsaw" task is set as the upstream self-supervised training task so that useful representations are learned from a large number of images without medical labels, and these representations are then used for learning and training the downstream target task. This automatically expands the training data samples, reduces the dependence on expert experience and historical data, and improves the identification accuracy of liver lesion areas.
The above content only illustrates the technical idea of the present invention and does not limit its protection scope; any modification made on the basis of the technical idea of the present invention falls within the protection scope of the claims of the present invention.
Claims (8)
1. A liver tumor identification method based on a self-supervised dense convolutional neural network, characterized by comprising the following steps:
step 1, slicing the magnetic resonance image of a patient along a plurality of cross-sectional directions to obtain a slice data set {A};
step 2, preprocessing the slice data set {A} and performing lesion labeling on part of the pictures in the slice data set to obtain a labeled data set {X} and an unlabeled data set {Y};
step 3, constructing and training a DenseNet network, wherein the DenseNet network comprises a feature extraction module, a fully connected layer and a Softmax function connected in sequence; each picture in the slice data set {A} is cut into a plurality of picture blocks, each picture block is labeled with a serial number, the cut picture blocks are input to the DenseNet network in shuffled order for training, and the DenseNet network outputs the position information of each picture block in the original image;
step 4, constructing a dense convolutional neural network comprising an encoding module and a decoding module, wherein the trained DenseNet network is used as the encoding module for feature extraction; the decoding module is a triple stack of an Up-sampling module and a Denseblock module, densely connected with the feature-extraction part of the encoding module, and the last Denseblock module is connected with a convolutional layer and a Softmax function;
and 5, dividing each image in the marked data set { X } into a plurality of image blocks, using the image blocks as the input of a dense convolutional neural network, training the dense convolutional neural network, integrally classifying the image blocks by using an encoding module, calculating the image blocks containing focus pixels by using a decoding module, performing pixel-level binary classification judgment by using a Softmax function to realize semantic division of the image, obtaining a semantic division result of the image blocks, directly outputting the image blocks without decoding and calculating the image blocks containing the marked focus pixels, and finally restoring the output results of the image blocks by using the decoding module according to the division sequence to obtain the original image marked with the focus position area.
2. The liver tumor identification method based on the self-supervision dense convolutional neural network of claim 1, wherein in step 1, the slice is performed in two cross directions, and the two directions are perpendicular to each other.
3. The liver tumor identification method based on the self-supervision dense convolutional neural network as claimed in claim 1, wherein the preprocessing of the slice data set {A} in step 2 is an image normalization, performed as follows:

P_{i,j,k} = (P̄_{i,j,k} - MinP) / (MaxP - MinP)

wherein P̄_{i,j,k} represents the image data matrices of the slice data set before normalization, subscripts i and j represent the rows and columns of an image data matrix, subscript k represents different images, I and J represent the number of pixels in the rows and columns of a matrix, K represents the total number of images, MaxP represents the maximum value over all pixels, MinP represents the minimum value over all pixels, and P_{i,j,k} represents the image data matrices of the slice data set after normalization.
4. The liver tumor identification method based on the self-supervision dense convolutional neural network as claimed in claim 1, wherein the feature extraction module in step 3 comprises a convolutional layer and a plurality of alternately connected Denseblock modules and Transition modules, the convolutional layer is connected with the first Denseblock module, and the last Denseblock module is connected with a convolutional layer.
5. The liver tumor identification method based on the self-supervision dense convolutional neural network as claimed in claim 4, wherein the DenseNet network in step 3 adopts the SGD gradient descent optimization algorithm, uses a cross entropy function to calculate the loss function, and adopts a strategy of adaptively updating the network learning rate in the training process.
6. The liver tumor identification method based on the self-supervision dense convolutional neural network of claim 5, wherein the loss function is calculated as follows:

L = -∑_{j=1}^{m} ∑_{i=1}^{n} label_{i,j}·log(predict_{i,j})

where m is the number of image blocks, n is the number of classes, label_{i,j} indicates whether image block j belongs to class i, and predict_{i,j} represents the predicted probability that image block j belongs to class i.
7. The liver tumor identification method based on the self-supervision dense convolutional neural network as claimed in claim 1, wherein the dense convolutional neural network uses the SGD gradient descent optimization algorithm, uses a cross entropy function to calculate the loss function, and adopts a strategy of adaptively updating the network learning rate in the training process; the loss function of the self-supervised learning network comprises a classification loss function L1 of the overall judgment and a loss function L2 of the pixel judgment, and the change of both is monitored during training.
8. The liver tumor identification method based on the self-supervision dense convolutional neural network as claimed in claim 1, wherein the classification loss function L1 of the overall judgment is calculated as follows:

L1 = -[label·log(predict) + (1-label)·log(1-predict)]

wherein predict is the predicted probability that the image contains a labeled lesion area, and label is the sample label;

the pixel-judgment loss function L2 of each image block is the sum of the loss functions of the individual pixels:

L2 = -∑_{i=1}^{n} [label_i·log(predict_i) + (1-label_i)·log(1-predict_i)]

wherein n represents the total number of pixels of the image, predict_i is the predicted probability that the i-th pixel belongs to a cancer cell, and label_i is the sample label of the i-th pixel.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110593311.3A CN113362295A (en) | 2021-05-28 | 2021-05-28 | Liver tumor identification method based on self-supervision dense convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113362295A true CN113362295A (en) | 2021-09-07 |
Family
ID=77528097
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110593311.3A Pending CN113362295A (en) | 2021-05-28 | 2021-05-28 | Liver tumor identification method based on self-supervision dense convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113362295A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114145745A (en) * | 2021-12-15 | 2022-03-08 | 西安电子科技大学 | Multi-task self-supervision emotion recognition method based on graph |
CN114419000A (en) * | 2022-01-20 | 2022-04-29 | 哈尔滨工业大学 | Femoral head necrosis index prediction system based on multi-scale geometric embedded convolutional neural network |
CN114565761A (en) * | 2022-02-25 | 2022-05-31 | 无锡市第二人民医院 | Deep learning-based method for segmenting tumor region of renal clear cell carcinoma pathological image |
CN115082503A (en) * | 2022-07-02 | 2022-09-20 | 哈尔滨理工大学 | Method and device for segmenting pathological image of stomach |
CN115661096A (en) * | 2022-11-02 | 2023-01-31 | 中国人民解放军海军军医大学第一附属医院 | Image judgment method for length of invasion of lower section of esophagus cancer at gastroesophageal junction |
WO2023060944A1 (en) * | 2021-10-11 | 2023-04-20 | 浙江大学 | Liver ct image segmentation system and algorithm based on hybrid supervised learning |
CN115578589B (en) * | 2022-10-12 | 2023-08-18 | 江苏瑞康成医疗科技有限公司 | Unsupervised echocardiography section identification method |
CN116884561A (en) * | 2023-09-08 | 2023-10-13 | 紫东信息科技(苏州)有限公司 | Stomach diagnostic report generation system based on self-supervision joint learning |
CN117671395A (en) * | 2024-02-02 | 2024-03-08 | 南昌康德莱医疗科技有限公司 | Cancer cell type recognition device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||