CN111275720B - Full end-to-end small organ image identification method based on deep learning - Google Patents

Full end-to-end small organ image identification method based on deep learning

Info

Publication number
CN111275720B
CN111275720B (application CN202010066775.4A)
Authority
CN
China
Prior art keywords
block
convolution
layer
conv
discarding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010066775.4A
Other languages
Chinese (zh)
Other versions
CN111275720A (en)
Inventor
龚薇 (Gong Wei)
斯科 (Si Ke)
薛颖 (Xue Ying)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202010066775.4A priority Critical patent/CN111275720B/en
Publication of CN111275720A publication Critical patent/CN111275720A/en
Application granted Critical
Publication of CN111275720B publication Critical patent/CN111275720B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing

Abstract

The invention discloses a full end-to-end small organ image identification method based on deep learning. The method comprises the steps of establishing a medical image data set, screening effective images, establishing a neural network, training the neural network and fusing images. The neural network comprises two sub-networks connected in series, an organ screening network and an organ segmentation network, which together form a serial multistage convolutional neural network. The method realizes accurate identification of small organs simply and efficiently, has a wide application range and reduces manual operation.

Description

Full end-to-end small organ image identification method based on deep learning
Technical Field
The invention relates to a medical image identification method in the technical field of image identification, in particular to a full end-to-end small organ image identification method based on deep learning.
Background
With the establishment of big data and the remarkable growth of computing power, deep learning has achieved great success in the field of organ image recognition in recent years. However, faced with complicated medical image data and the different kinds of organs to be identified, existing organ image identification methods have two defects:
1. Identifying small organs is very difficult. Compared with large organs such as the lung and liver, typical small organs such as the pancreas, intestinal tract, spleen, stomach and kidney occupy only a small area in a single medical image and have inconspicuous features, so it is difficult for an algorithm to extract information specific to the small organ.
2. Clinical medical images contain different angiographic phases, imaging planes and interference images. Existing organ identification methods therefore require manual preprocessing to screen out the effective images before they are input into a deep learning model for identification, which is labor-intensive and inefficient.
Disclosure of Invention
In order to solve the problems in the background art, the invention aims to provide a full end-to-end small organ image identification method based on deep learning, so that clinical medical images can be directly analyzed, accurate identification of small organs can be simply and efficiently realized, the application range is wide, and manual operation is reduced.
In order to achieve the purpose, the technical scheme of the invention comprises the following steps:
The invention comprises the following steps performed in sequence: establishing a medical image data set, screening effective images, establishing a neural network, training the neural network and fusing images. The neural network comprises two sub-networks connected in series, an organ screening network and an organ segmentation network, which together form a serial multistage convolutional neural network.
The organ screening network comprises a convolution layer Conv, a linear rectification activation function ReLU, a maximum pooling layer MaxPool, an average pooling layer AvgPool, a batch normalization layer BN, a discarding layer Dropout, a fully connected layer FC, an identity-mapping residual block ID Block and a downsampling residual block DS Block. The result of effective image screening is a transverse-plane arterial phase image; a convolution pooling module, a first identity-mapping residual block ID Block, a second identity-mapping residual block ID Block, a mixed residual block module and a pooling module are connected in sequence from input to output. The image obtained after effective image screening is input into the convolution pooling module; its output passes sequentially through the first identity-mapping residual block ID Block, the second identity-mapping residual block ID Block, the mixed residual block module and the pooling module, which outputs whether the image contains a small organ; images containing the small organ then serve as the input of the subsequent organ segmentation network. The convolution pooling module is formed by sequentially connecting a convolution layer Conv, a linear rectification activation function ReLU and a maximum pooling layer MaxPool. The mixed residual block module is formed by sequentially connecting a first downsampling residual block DS Block, a third identity-mapping residual block ID Block, a second downsampling residual block DS Block, a fourth identity-mapping residual block ID Block, a third downsampling residual block DS Block and a fifth identity-mapping residual block ID Block. The first to fifth identity-mapping residual blocks ID Block have the same structure: a first batch normalization layer BN, a first linear rectification activation function ReLU, a first convolution layer Conv, a discarding layer Dropout, a second batch normalization layer BN, a second linear rectification activation function ReLU and a second convolution layer Conv connected in sequence, with the input of the first batch normalization layer BN merged into the output of the second convolution layer Conv. The first to third downsampling residual blocks DS Block have the same structure, with the same sequence of layers as the ID Block, except that the input of the first batch normalization layer BN is merged into the output of the second convolution layer Conv through a third convolution layer Conv; this third convolution layer in the skip path is the only difference between the DS Block and the ID Block. The pooling module is formed by sequentially connecting an average pooling layer AvgPool and a fully connected layer FC.
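To make the two residual block types concrete, the following is a minimal PyTorch sketch; the class names, the stride-2 downsampling and the default dropout rate are illustrative assumptions (the embodiment below uses a Dropout parameter of 0.4), not fixed by this description:

```python
import torch
import torch.nn as nn

class IDBlock(nn.Module):
    """Identity-mapping residual block (ID Block): BN-ReLU-Conv-Dropout-BN-ReLU-Conv,
    with the block input merged (added) into the second convolution's output."""
    def __init__(self, channels, drop=0.4):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Dropout(drop),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class DSBlock(nn.Module):
    """Downsampling residual block (DS Block): same trunk as the ID Block, but the
    skip path passes through a third convolution so channels and resolution can change."""
    def __init__(self, in_ch, out_ch, stride=2, drop=0.4):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            nn.Dropout(drop),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)

    def forward(self, x):
        return self.skip(x) + self.body(x)
```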
The organ segmentation network comprises a convolution block Conv Block, a convolution layer Conv, a deconvolution layer Trans Conv, a linear rectification activation function ReLU, a maximum pooling layer MaxPool, a batch normalization layer BN and a discarding layer Dropout. The image output by the organ screening network is input into a convolution pooling discarding module, whose output passes sequentially through a fifth convolution block Conv Block, a deconvolution discarding module and a convolution module before outputting a binarized image of the small organ region. The convolution pooling discarding module is formed by sequentially connecting a first convolution block Conv Block, a maximum pooling layer MaxPool, a discarding layer Dropout, a second convolution block Conv Block, a maximum pooling layer MaxPool, a discarding layer Dropout, a third convolution block Conv Block, a maximum pooling layer MaxPool, a discarding layer Dropout, a fourth convolution block Conv Block, a maximum pooling layer MaxPool and a discarding layer Dropout. The deconvolution discarding module is formed mainly by sequentially connecting a first deconvolution layer Trans Conv, a discarding layer Dropout, a sixth convolution block Conv Block, a second deconvolution layer Trans Conv, a discarding layer Dropout, a seventh convolution block Conv Block, a third deconvolution layer Trans Conv, a discarding layer Dropout, an eighth convolution block Conv Block, a fourth deconvolution layer Trans Conv, a discarding layer Dropout and a ninth convolution block Conv Block. Meanwhile, the output of the first convolution block Conv Block of the convolution pooling discarding module and the output of the first discarding layer Dropout of the deconvolution discarding module are superposed at pixel level and input into the sixth convolution block Conv Block of the deconvolution discarding module; the output of the second convolution block and the output of the second discarding layer Dropout are superposed at pixel level and input into the seventh convolution block; the output of the third convolution block and the output of the third discarding layer Dropout are superposed at pixel level and input into the eighth convolution block; the output of the fourth convolution block and the output of the fourth discarding layer Dropout are superposed at pixel level and input into the ninth convolution block. The first to ninth convolution blocks Conv Block have the same structure except for the number of convolution kernels in their convolution layers: each is formed by three consecutive convolution activation units, where each unit is one convolution layer Conv, one batch normalization layer BN and one linear rectification activation function ReLU connected in sequence. The first to fourth deconvolution layers Trans Conv likewise differ only in their number of convolution kernels. The convolution module is formed by sequentially connecting a convolution layer Conv and a batch normalization layer BN.
Image fusion superimposes the original medical image and the binarized small organ region image output by the organ segmentation network pixel by pixel to generate the final small organ identification image.
The organ screening network is used for accurately identifying the position of a diagnosed organ in a medical image and screening out a medical image containing the diagnosed organ.
The organ segmentation network is used for extracting medical images of the positions of the diagnosed organs and eliminating interference information of other organs.
Effective image screening specifically comprises selecting transverse (cross-sectional) images from the collected images, removing coronal-plane and sagittal-plane images and text information report images, and then screening arterial phase images from the transverse images.
Effective image screening first eliminates interference images that are useless for diagnosis, such as text information reports; for medical images that can only be captured after injection of a contrast agent, arterial phase screening is further required to select the arterial phase images.
The neural network training comprises the following steps:
constructing an optimizer;
constructing a loss function;
setting hyper-parameters such as batch size and learning rate, and training with known image data and labels.
Neural network training takes real data as labels and uses the back propagation algorithm to train the convolutional neural network, establishing the relationship between medical images and small organ regions, as sketched below.
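A minimal sketch of this training procedure, assuming a PyTorch model and data loader (all names here are placeholders, not from the patent):

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=100, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # constructed optimizer
    criterion = nn.CrossEntropyLoss()                        # constructed loss function
    for _ in range(epochs):
        for images, labels in loader:   # real data serve as the labels
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()             # back propagation
            optimizer.step()            # update neuron weights and biases
```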
The medical image data set contains medical images including, but not limited to, X-ray, CT, MRI and PET.
The small organs include, but are not limited to, the pancreas, intestinal tract, spleen, stomach and kidney.
Large organs have obvious, easily distinguishable features in medical images and are well suited as input to a convolutional neural network. Small organs, by contrast, have inconspicuous features, so an ordinary convolutional neural network has difficulty extracting their characteristics and identifying them. Since the present method can identify small organs, it can also be adapted to large organs such as the liver and lung.
The invention realizes full end-to-end recognition: the trained algorithm produces its output directly, without manual image processing or screening.
The invention has the advantages that:
the invention can identify and segment not only large organs that are easy to distinguish in medical images, but also small organs that are hard to process and have inconspicuous features;
the method outputs small organ identification and segmentation results directly from the input medical images, without manual image processing or screening.
Drawings
FIG. 1 is a flowchart of the present invention for pancreatic image recognition;
FIG. 2 is a diagram of the pancreas screening network structure and training results of the present invention;
FIG. 3 is a diagram of the pancreas segmentation network structure and training results of the present invention.
Detailed Description
The invention is explained in more detail below with reference to a pancreas image recognition example, without restricting the invention in any way.
As shown in fig. 1, the embodiment of the present invention and the implementation process thereof specifically include:
1. establishing an abdominal cavity CT image data set;
Clinical abdominal CT plain-scan data were collected from a hospital, about 90,000 original 8-bit CT images from more than 300 patients in total. All images were annotated both with whether they contain the pancreas and with the pancreatic region, serving respectively as labels for the pancreas screening network and the pancreas segmentation network. Whether an image contains the pancreas is labeled as follows: "yes" is represented by the two-dimensional vector [0,1] and "no" by the two-dimensional vector [1,0]. The pancreatic region is labeled as follows: each pixel is marked as belonging to the pancreatic region or not, with pixels in the region labeled by the two-dimensional vector [0,1] and pixels outside it by [1,0]. The abdominal cavity CT image data set is thus established.
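A small NumPy sketch of this label encoding (the helper name is hypothetical):

```python
import numpy as np

# Screening label: does the image contain the pancreas?
LABEL_YES = np.array([0, 1])  # image contains the pancreas
LABEL_NO = np.array([1, 0])   # image does not contain the pancreas

def encode_region(mask):
    """Per-pixel segmentation label for an (H, W) binary mask
    (1 = pixel belongs to the pancreatic region)."""
    return np.stack([1 - mask, mask], axis=-1)  # (H, W, 2); [0, 1] marks pancreas pixels
```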
2. Effective image screening:
The established abdominal cavity CT image data set covers complex images, including: (1) different imaging planes, such as the transverse, coronal and sagittal planes; (2) different angiographic phases, such as the non-contrast, arterial and venous phases; (3) interference images, such as patient reports. Since the transverse plane contains most of the pancreatic image information and the arterial phase shows richer details of blood vessels, tissues and organs, the invention selects transverse-plane images in the arterial phase and removes images of other imaging planes and angiographic phases as well as text information report images.
Images in the abdominal cavity CT image data set carry personal information attributes such as patient name, age and sex, and image attributes such as image size, shooting sequence time and shooting sequence description. The specific steps of effective image screening are: first, transverse images are selected by requiring an "image size" of 512 × 512, which removes other imaging plane images and text information report images; then, arterial images are selected by the "shooting sequence description" attribute containing "artery". For images whose "shooting sequence description" attribute is empty, the images are sorted by "shooting sequence time" and the series in the third phase is taken as the arterial phase images.
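These rules can be sketched with pydicom as below; mapping "image size" to Rows/Columns and "shooting sequence description" to SeriesDescription is an assumption about the source DICOM data, and the fallback for empty descriptions (sorting by SeriesTime and taking the third series) is omitted for brevity:

```python
import pydicom

def is_valid_slice(path):
    """Return True if the file looks like a transverse arterial-phase slice."""
    ds = pydicom.dcmread(path)
    # Rule 1: transverse slices are 512 x 512; other planes and
    # text report images have different sizes.
    if (ds.Rows, ds.Columns) != (512, 512):
        return False
    # Rule 2: keep series whose description marks the arterial phase.
    description = getattr(ds, "SeriesDescription", "")
    return "artery" in description.lower()
```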
After the effective images are screened out, the gray values of the images are converted into Hounsfield unit (HU) values to improve image contrast and make the pancreatic details more obvious. Specifically: first, the RescaleSlope and RescaleIntercept values of an image are read, and the gray value of each pixel is converted into an HU value according to the formula HU = pixel × RescaleSlope + RescaleIntercept; then, HU values below 0 are set to 0 and HU values above 150 are set to 150; finally, the HU value of each pixel is normalized to the range 0-255 according to the formula pixel = HU / 150 × 255.
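The conversion and normalization can be sketched as follows (the function name is hypothetical, and the final rescaling expression is reconstructed from the 0-150 clipping window above):

```python
import numpy as np

def to_normalized_hu(pixel, slope, intercept):
    hu = pixel * slope + intercept  # HU = pixel x RescaleSlope + RescaleIntercept
    hu = np.clip(hu, 0, 150)        # HU < 0 -> 0, HU > 150 -> 150
    return hu / 150.0 * 255.0       # normalize to the range 0-255
```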
3. Pancreas screening:
The pancreas screening network is used to screen out images containing the pancreas. An 18-layer pancreas screening network is constructed and trained; its specific structure is shown in fig. 2(a-b). It comprises a convolution layer Conv, a linear rectification activation function ReLU, a maximum pooling layer MaxPool, an average pooling layer AvgPool, a batch normalization layer BN, a discarding layer Dropout, a fully connected layer FC, an identity-mapping residual block ID Block and a downsampling residual block DS Block;
the 512 × 512 × 1 images obtained after effective image screening are input into the convolution pooling module; its output passes sequentially through the first identity-mapping residual block ID Block, the second identity-mapping residual block ID Block, the mixed residual block module and the pooling module, which outputs a two-dimensional probability vector [m, n]. If n > m, the image contains the pancreas, and images containing the pancreas serve as the input of the subsequent pancreas segmentation network;
the convolution pooling module is formed by sequentially connecting one convolution layer Conv (64 kernels of size 7 × 7, with padding), a linear rectification activation function ReLU and a 3 × 3 maximum pooling layer MaxPool. The mixed residual block module is formed by sequentially connecting the first downsampling residual block DS Block, the third identity-mapping residual block ID Block, the second downsampling residual block DS Block, the fourth identity-mapping residual block ID Block, the third downsampling residual block DS Block and the fifth identity-mapping residual block ID Block;
the first and second identity-mapping residual blocks ID Block have the same structure, each formed by sequentially connecting a first batch normalization layer BN, a first linear rectification activation function ReLU, a first convolution layer Conv (64 kernels of size 3 × 3, with padding), a discarding layer Dropout with parameter 0.4, a second batch normalization layer BN, a second linear rectification activation function ReLU and a second convolution layer Conv (64 kernels of size 3 × 3, with padding); the input of the first batch normalization layer BN is merged into the output of the second convolution layer Conv;
the third identity-mapping residual block ID Block has the same structure, with 128 kernels of size 3 × 3 (with padding) in each convolution layer;
the fourth identity-mapping residual block ID Block has the same structure, with 256 kernels of size 3 × 3 (with padding) in each convolution layer;
the fifth identity-mapping residual block ID Block has the same structure, with 512 kernels of size 3 × 3 (with padding) in each convolution layer.
The five identity mapping residual blocks ID Block are identical except for the number of convolution kernels in the first and second convolution layers.
The first downsampling residual block DS Block is formed by sequentially connecting a first batch normalization layer BN, a first linear rectification activation function ReLU, a first convolution layer Conv (128 kernels of size 3 × 3, with padding), a discarding layer Dropout with parameter 0.4, a second batch normalization layer BN, a second linear rectification activation function ReLU and a second convolution layer Conv (128 kernels of size 3 × 3, with padding); meanwhile, the input of the first batch normalization layer BN is merged into the output of the second convolution layer Conv through a third convolution layer Conv (128 kernels of size 3 × 3, with padding);
the second downsampling residual block DS Block has the same structure, with 256 kernels of size 3 × 3 (with padding) in each of its three convolution layers;
the third downsampling residual block DS Block has the same structure, with 512 kernels of size 3 × 3 (with padding) in each of its three convolution layers.
The three downsampled residual blocks differ in the number of convolution kernels of the convolutional layers.
The pooling module is formed by sequentially connecting 7 × 7 average pooling layers AvgPool and full connection layers FC with the number of output neurons being 2;
designing the hyper-parameters: batch size 32; initial learning rate 1 × 10⁻³, decayed to 1/20 of its value every 50 iterations;
designing an optimizer: adam;
designing a loss function: cross entropy;
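These choices correspond to the following PyTorch configuration sketch; the stand-in model and the reading of "every 50 iterations" as scheduler steps are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the 18-layer pancreas screening network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, initial lr 1e-3
scheduler = torch.optim.lr_scheduler.StepLR(
    optimizer, step_size=50, gamma=1 / 20)  # lr x 1/20 every 50 steps
criterion = nn.CrossEntropyLoss()  # cross-entropy loss; batch size 32 is set on the DataLoader
```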
The pancreas screening network is trained with the screened effective images and their pancreas-presence labels via the back propagation algorithm, continuously optimizing neuron weights and biases until the network loss converges to a minimum; the training result is shown in fig. 2(c). The training curves of the pancreas screening network are drawn: line ① represents the loss value (left Y-axis); lines ② and ③ represent the training and validation accuracy respectively (right Y-axis). These indexes fluctuate strongly in the early stage; after 100 iterations, the loss value decreases and finally stabilizes at 4.93 × 10⁻⁶, while the training and validation accuracy rise to 0.996 and 0.966 respectively.
4. Pancreas segmentation:
The pancreas segmentation network is used to segment and extract pancreas images. A 32-layer pancreas segmentation network is constructed and trained; its specific structure is shown in fig. 3(a-b). It comprises a convolution block Conv Block, a convolution layer Conv, a deconvolution layer Trans Conv, a linear rectification activation function ReLU, a maximum pooling layer MaxPool, a batch normalization layer BN and a discarding layer Dropout;
the 512 × 512 × 1 images output by the pancreas screening network are input into the convolution pooling discarding module; its output passes sequentially through the fifth convolution block Conv Block, the deconvolution discarding module and the convolution module, predicting whether each pixel belongs to the pancreatic region and outputting a binarized image of the pancreatic region;
the convolution pooling discarding module is formed by sequentially connecting a first convolution block Conv Block, a maximum pooling layer MaxPool, a discarding layer Dropout, a second convolution block Conv Block, a maximum pooling layer MaxPool, a discarding layer Dropout, a third convolution block Conv Block, a maximum pooling layer MaxPool, a discarding layer Dropout, a fourth convolution block Conv Block, a maximum pooling layer MaxPool and a discarding layer Dropout. The deconvolution discarding module is formed by sequentially connecting a first deconvolution layer Trans Conv, a discarding layer Dropout, a sixth convolution block Conv Block, a second deconvolution layer Trans Conv, a discarding layer Dropout, a seventh convolution block Conv Block, a third deconvolution layer Trans Conv, a discarding layer Dropout, an eighth convolution block Conv Block, a fourth deconvolution layer Trans Conv, a discarding layer Dropout and a ninth convolution block Conv Block. Meanwhile, the output of the first convolution block of the convolution pooling discarding module and the output of the first discarding layer Dropout of the deconvolution discarding module are superposed at pixel level and input into the sixth convolution block; the output of the second convolution block and the output of the second discarding layer Dropout are superposed at pixel level and input into the seventh convolution block; the output of the third convolution block and the output of the third discarding layer Dropout are superposed at pixel level and input into the eighth convolution block; the output of the fourth convolution block and the output of the fourth discarding layer Dropout are superposed at pixel level and input into the ninth convolution block;
the first and ninth convolution blocks Conv Block have the same structure, each formed by three consecutive convolution activation units, where each unit is one convolution layer Conv (16 kernels of size 3 × 3, with padding), one batch normalization layer BN and one linear rectification activation function ReLU connected in sequence;
the second and eighth convolution blocks Conv Block have the same structure, with 32 kernels of size 3 × 3 (with padding) per convolution layer;
the third and seventh convolution blocks Conv Block have the same structure, with 64 kernels of size 3 × 3 (with padding) per convolution layer;
the fourth and sixth convolution blocks Conv Block have the same structure, with 128 kernels of size 3 × 3 (with padding) per convolution layer;
the fifth convolution block Conv Block is formed by three consecutive convolution activation units with 256 kernels of size 3 × 3 (with padding) per convolution layer;
the first deconvolution layer Trans Conv has 256 kernels of size 2 × 2, without padding;
the second deconvolution layer Trans Conv has 128 kernels of size 2 × 2, without padding;
the third deconvolution layer Trans Conv has 64 kernels of size 2 × 2, without padding;
the fourth deconvolution layer Trans Conv has 32 kernels of size 2 × 2, without padding;
the kernel of the maximum pooling layer MaxPool is 2 × 2, and the parameter of the discarding layer Dropout is 0.4;
the convolution module is formed by sequentially connecting one convolution layer Conv (2 kernels of size 1 × 1, without padding) and a batch normalization layer BN;
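The overall encoder-decoder wiring can be illustrated with the simplified PyTorch sketch below. It uses only two resolution levels, and its channel counts are chosen so that the pixel-level superposition (implemented here as element-wise addition) type-checks; it is an assumption-laden illustration, not the exact 32-layer network described above:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Conv Block: three Conv-BN-ReLU units, 3x3 kernels with padding."""
    layers = []
    for i in range(3):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class TinySegNet(nn.Module):
    """Two-level encoder-decoder with additive skip connections."""
    def __init__(self, drop=0.4):
        super().__init__()
        self.enc1 = conv_block(1, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)          # 2x2 maximum pooling
        self.drop = nn.Dropout2d(drop)       # discarding layer, parameter 0.4
        self.mid = conv_block(32, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)  # Trans Conv, 2x2 kernels
        self.dec2 = conv_block(32, 32)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = conv_block(16, 16)
        self.head = nn.Sequential(nn.Conv2d(16, 2, 1), nn.BatchNorm2d(2))  # 1x1 conv module

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.drop(self.pool(e1)))
        m = self.mid(self.drop(self.pool(e2)))
        d2 = self.dec2(self.drop(self.up2(m)) + e2)  # pixel-level superposition
        d1 = self.dec1(self.drop(self.up1(d2)) + e1)
        return self.head(d1)                         # 2-channel per-pixel logits
```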
designing the hyper-parameters: batch size 2; initial learning rate 1 × 10⁻³, decayed to 1/10 of its value every 20 iterations;
designing an optimizer: adam;
designing a loss function: cross entropy;
The pancreas segmentation network is trained with the images output by the pancreas screening network and their corresponding pancreas region labels via the back propagation algorithm, continuously optimizing neuron weights and biases until the network loss converges to a minimum; the training result is shown in fig. 3(c). The training curves of the pancreas segmentation network are drawn: line ① represents the loss value (left Y-axis), which decreases and finally stabilizes at 1.54 × 10⁻³ after 50 iterations. The accuracy of pancreas segmentation is evaluated quantitatively with the mean Intersection over Union (MIoU) and the Dice coefficient (right Y-axis): lines ② and ④ show that after 50 iterations the training and validation MIoU rise to 0.969 and 0.882 respectively; lines ③ and ⑤ show that the training and validation Dice coefficients rise to 0.968 and 0.837 respectively. Three pancreas-screened images were randomly selected to visualize the segmentation results: the three columns in fig. 3(d), from left to right, are the original abdominal CT image, the ground-truth binarized pancreatic region image, and the binarized pancreatic region image segmented by the pancreas segmentation network. The images segmented by the network maintain high similarity with the ground-truth annotations, demonstrating a good segmentation effect.
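The two evaluation metrics can be computed as in this NumPy sketch (function names are hypothetical):

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

def miou(pred, gt):
    """Mean IoU over the two classes (background, pancreas)."""
    ious = []
    for cls in (0, 1):
        p, g = pred == cls, gt == cls
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        ious.append(inter / (union + 1e-8))
    return float(np.mean(ious))
```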
5. Image fusion:
and carrying out image fusion processing after pancreas segmentation, and carrying out pixel-by-pixel superposition on the original abdominal cavity CT image and the pancreas region binary image segmented by the pancreas segmentation network to generate a pancreas region image with pancreas texture information as a final pancreas identification image.
In specific implementation, the fused result can further be used for medical analysis to judge whether pancreatic lesions exist; for example, the fused image can be input into a pancreatic lesion prediction network to output whether the pancreas is diseased.
the results show that the method can realize accurate identification and segmentation of the pancreas, improve the efficiency of image processing, reduce manual operation and provide accurate basis for assisting scientific judgment.
Various other modifications and changes may be made by those skilled in the art based on the above-described technical solutions and concepts, and all such modifications and changes should fall within the scope of the claims of the present invention.

Claims (6)

1. A full end-to-end small organ image recognition method based on deep learning is characterized in that:
comprises the following steps performed in sequence: establishing a medical image data set, screening effective images, establishing a neural network, training the neural network and fusing images; the neural network comprises two sub-networks connected in series, an organ screening network and an organ segmentation network, which form a serial multistage convolutional neural network;
the organ screening network comprises a convolution layer Conv, a linear rectification activation function ReLU, a maximum pooling layer MaxPool, an average pooling layer AvgPool, a batch normalization layer BN, a discarding layer Dropout, a fully connected layer FC, an identity-mapping residual block ID Block and a downsampling residual block DS Block; the image obtained after effective image screening is input into a convolution pooling module, whose output passes sequentially through a first identity-mapping residual block ID Block, a second identity-mapping residual block ID Block, a mixed residual block module and a pooling module, outputting whether the image contains a small organ; images containing the small organ then serve as the input of the subsequent organ segmentation network; the convolution pooling module is formed by sequentially connecting a convolution layer Conv, a linear rectification activation function ReLU and a maximum pooling layer MaxPool; the mixed residual block module is formed by sequentially connecting a first downsampling residual block DS Block, a third identity-mapping residual block ID Block, a second downsampling residual block DS Block, a fourth identity-mapping residual block ID Block, a third downsampling residual block DS Block and a fifth identity-mapping residual block ID Block; the first to fifth identity-mapping residual blocks ID Block have the same structure, each formed by sequentially connecting a first batch normalization layer BN, a first linear rectification activation function ReLU, a first convolution layer Conv, a discarding layer Dropout, a second batch normalization layer BN, a second linear rectification activation function ReLU and a second convolution layer Conv, with the input of the first batch normalization layer BN merged into the output of the second convolution layer Conv; the first to third downsampling residual blocks DS Block have the same structure, each formed by sequentially connecting a first batch normalization layer BN, a first linear rectification activation function ReLU, a first convolution layer Conv, a discarding layer Dropout, a second batch normalization layer BN, a second linear rectification activation function ReLU and a second convolution layer Conv, with the input of the first batch normalization layer BN merged into the output of the second convolution layer Conv through a third convolution layer Conv; the pooling module is formed by sequentially connecting an average pooling layer AvgPool and a fully connected layer FC;
the organ segmentation network comprises a convolution block Conv Block, a convolution layer Conv, a deconvolution layer Trans Conv, a linear rectification activation function ReLU, a maximum pooling layer MaxPool, a batch normalization layer BN and a discarding layer Dropout; the image output by the organ screening network is input into a convolution pooling discarding module, whose output passes sequentially through a fifth convolution block Conv Block, a deconvolution discarding module and a convolution module before outputting a binarized image of the small organ region; the convolution pooling discarding module is formed by sequentially connecting a first convolution block Conv Block, a maximum pooling layer MaxPool, a discarding layer Dropout, a second convolution block Conv Block, a maximum pooling layer MaxPool, a discarding layer Dropout, a third convolution block Conv Block, a maximum pooling layer MaxPool, a discarding layer Dropout, a fourth convolution block Conv Block, a maximum pooling layer MaxPool and a discarding layer Dropout; the deconvolution discarding module is formed mainly by sequentially connecting a first deconvolution layer Trans Conv, a discarding layer Dropout, a sixth convolution block Conv Block, a second deconvolution layer Trans Conv, a discarding layer Dropout, a seventh convolution block Conv Block, a third deconvolution layer Trans Conv, a discarding layer Dropout, an eighth convolution block Conv Block, a fourth deconvolution layer Trans Conv, a discarding layer Dropout and a ninth convolution block Conv Block; meanwhile, the output of the first convolution block of the convolution pooling discarding module and the output of the first discarding layer Dropout of the deconvolution discarding module are superposed at pixel level and input into the sixth convolution block; the output of the second convolution block and the output of the second discarding layer Dropout are superposed at pixel level and input into the seventh convolution block; the output of the third convolution block and the output of the third discarding layer Dropout are superposed at pixel level and input into the eighth convolution block; the output of the fourth convolution block and the output of the fourth discarding layer Dropout are superposed at pixel level and input into the ninth convolution block; the first to ninth convolution blocks Conv Block have the same structure except for the number of convolution kernels in their convolution layers, each formed by three consecutive convolution activation units, where each unit is one convolution layer Conv, one batch normalization layer BN and one linear rectification activation function ReLU connected in sequence; the first to fourth deconvolution layers Trans Conv likewise differ only in their number of convolution kernels; the convolution module is formed by sequentially connecting a convolution layer Conv and a batch normalization layer BN.
2. The deep learning-based full-end-to-end small organ image recognition method according to claim 1, characterized in that: the image fusion superimposes the original medical image and the binarized small organ region image output by the organ segmentation network pixel by pixel to generate the final small organ identification image.
3. The deep learning-based full-end-to-end small organ image recognition method according to claim 1, characterized in that: the effective image screening specifically comprises selecting transverse (cross-sectional) images from the collected images, removing coronal-plane and sagittal-plane images and text information report images, and screening arterial phase images from the transverse images.
4. The deep learning-based full-end-to-end small organ image recognition method according to claim 1, characterized in that: the neural network training comprises the following steps:
constructing an optimizer;
constructing a loss function;
and setting the hyper-parameters and training.
5. The deep learning-based full-end-to-end small organ image recognition method according to claim 1, characterized in that: the medical image data set includes, but is not limited to, X-ray, CT, MRI and PET medical images.
6. The deep learning-based full-end-to-end small organ image recognition method according to claim 1, characterized in that: the small organs comprise pancreas, intestinal tract, spleen, stomach and kidney.
CN202010066775.4A 2020-01-20 2020-01-20 Full end-to-end small organ image identification method based on deep learning Active CN111275720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010066775.4A CN111275720B (en) 2020-01-20 2020-01-20 Full end-to-end small organ image identification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010066775.4A CN111275720B (en) 2020-01-20 2020-01-20 Full end-to-end small organ image identification method based on deep learning

Publications (2)

Publication Number Publication Date
CN111275720A CN111275720A (en) 2020-06-12
CN111275720B true CN111275720B (en) 2022-05-17

Family

ID=70998980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010066775.4A Active CN111275720B (en) 2020-01-20 2020-01-20 Full end-to-end small organ image identification method based on deep learning

Country Status (1)

Country Link
CN (1) CN111275720B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114241247B (en) * 2021-12-28 2023-03-07 国网浙江省电力有限公司电力科学研究院 Transformer substation safety helmet identification method and system based on deep residual error network


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106952273A (en) * 2017-03-09 2017-07-14 上海联影医疗科技有限公司 The dividing method and device of pancreas in medical image
CN108596884A (en) * 2018-04-15 2018-09-28 桂林电子科技大学 A kind of cancer of the esophagus dividing method in chest CT image
US10304193B1 (en) * 2018-08-17 2019-05-28 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network
CN109146899A (en) * 2018-08-28 2019-01-04 众安信息技术服务有限公司 CT image jeopardizes organ segmentation method and device
CN110322435A (en) * 2019-01-20 2019-10-11 北京工业大学 A kind of gastric cancer pathological image cancerous region dividing method based on deep learning
CN110163877A (en) * 2019-05-27 2019-08-23 济南大学 A kind of method and system of MRI ventricular structure segmentation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Multi-Scale Coarse-to-Fine Segmentation for";Zhuotun Zhu.et al;《arXiv:1807.02941v2》;20190809;全文 *
"Improving Deep Pancreas Segmentation in CT";Jinzheng Cai.et al;《arXiv:1707.04912v2》;20170718;全文 *
"一种强鲁棒性的实时图像增强算法";龚薇等;《传感技术学报》;20070930;全文 *
基于带孔U-net神经网络的肺癌危及器官并行分割方法;周正东等;《东南大学学报(自然科学版)》;20190331;全文 *

Also Published As

Publication number Publication date
CN111275720A (en) 2020-06-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant