CN113724263A - Full convolution neural network model, image segmentation method and device


Info

Publication number
CN113724263A
Authority
CN
China
Prior art keywords
neural network
layer
image
convolutional
cascaded
Prior art date
Legal status
Pending
Application number
CN202010456769.XA
Other languages
Chinese (zh)
Inventor
陈俊强
杨溪
吕文尔
Current Assignee
Shanghai Weiwei Medical Technology Co ltd
Original Assignee
Shanghai Weiwei Medical Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Weiwei Medical Technology Co., Ltd.
Priority: CN202010456769.XA
Publication: CN113724263A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30048 Heart; Cardiac

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a full convolution neural network model, an image segmentation method and apparatus, an electronic device and a readable storage medium. After a target image to be segmented is acquired, the target image is segmented with a pre-trained full convolution neural network model to obtain a segmented image. Using the full convolution neural network model for image segmentation improves segmentation accuracy while reducing tedious human-computer interaction; the segmentation algorithm generalizes well, runs end to end, and can better assist doctors in improving diagnostic accuracy.

Description

Full convolution neural network model, image segmentation method and device
Technical Field
The invention relates to the technical field of image processing, and in particular to a full convolution neural network model, an image segmentation method, an image segmentation apparatus, an electronic device and a readable storage medium.
Background
Cardiovascular disease has the highest mortality of any disease worldwide, and its incidence and mortality rise year by year, seriously threatening human life and health. Because the heart has a complex structure and its different parts have different characteristics, it is difficult for a doctor to analyse the cardiac structure from a patient's medical images. Accurate segmentation of cardiac medical images is therefore important: it provides doctors with high-quality information about the cardiac structure and helps them reach a diagnosis quickly.
Current cardiac image segmentation methods include: (1) manual segmentation by experienced medical experts; (2) methods based on traditional model matching; (3) methods based on images or deformable models; (4) segmentation methods based on deep learning.
However, the existing cardiac image segmentation methods have the following disadvantages:
(1) the results of manual cardiac segmentation vary greatly between operators, and the process requires a great deal of time and effort;
(2) because different parts of the heart have different characteristics, methods based on model matching achieve only low segmentation accuracy;
(3) methods based on deformable models require user interaction to complete the segmentation, and have poor robustness and low segmentation accuracy;
(4) segmentation methods based on deep learning improve the accuracy somewhat compared with traditional methods, but in most network models the convolution modules are all identical, which limits further improvement of the segmentation accuracy.
Disclosure of Invention
The invention aims to provide a full convolution neural network model, an image segmentation method, an image segmentation apparatus, an electronic device and a readable storage medium that improve the accuracy of the overall segmentation algorithm and effectively reduce tedious human-computer interaction.
To achieve the above object, the present invention provides an image segmentation method, comprising:
acquiring a target image to be segmented;
segmenting the target image by adopting a pre-trained full convolution neural network model to obtain a segmented image; the full convolution neural network model comprises a depth multi-scale fusion module, and the depth multi-scale fusion module is used for extracting and fusing features of different depths and scales of the input image.
Optionally, the depth multi-scale fusion module includes at least two parallel branches and a series branch, where convolution kernels of the parallel branches are different from each other in size, and the parallel branches are connected in parallel and then connected in series with the series branch.
Optionally, each parallel branch includes at least one convolutional layer and/or at least one convolutional module, and the convolutional module is formed by connecting a plurality of convolutional layers in parallel and/or in series.
Optionally, the number of the parallel branches is 3.
Optionally, the series branch comprises a convolutional layer with a size of 1 × 1 × 1.
Optionally, the full convolutional neural network model includes an encoding network and a decoding network;
the coding network comprises an input layer and n+1 cascaded first neural network groups, the first n first neural network groups each comprise cascaded convolutional layers, a depth multi-scale fusion module and a pooling layer, and the (n+1)-th first neural network group comprises a plurality of cascaded convolutional layers;
the decoding network comprises n cascaded second neural network groups, a convolutional layer and an output layer, wherein the 1st second neural network group comprises a cascaded deconvolution layer, a merging layer and a convolutional layer, and the last n-1 second neural network groups each comprise a cascaded deconvolution layer, a merging layer, a convolutional layer and a depth multi-scale fusion module;
the merging layer is used for linearly adding and merging the output image of the deconvolution layer and the output image of the corresponding convolutional layer in the coding network.
Optionally, the encoding network includes n+1 cascaded first residual connections, and the decoding network includes n cascaded second residual connections.
Optionally, the step of acquiring a target image to be segmented includes:
acquiring an original image to be segmented, and preprocessing the original image to remove noise in the original image, thereby obtaining the target image.
In order to achieve the above object, the present invention further provides a full convolution neural network model, which includes a depth multi-scale fusion module, and the depth multi-scale fusion module is used for extracting and fusing features of different depths and scales of an input image.
Optionally, the depth multi-scale fusion module includes at least two parallel branches and a series branch, where convolution kernels of the parallel branches are different from each other in size, and the parallel branches are connected in parallel and then connected in series with the series branch.
Optionally, each parallel branch includes at least one convolutional layer and/or at least one convolutional module, and the convolutional module is formed by connecting a plurality of convolutional layers in parallel and/or in series.
Optionally, the number of the parallel branches is 3.
Optionally, the series branch comprises a convolutional layer with a size of 1 × 1 × 1.
Optionally, the full convolutional neural network model includes an encoding network and a decoding network;
the coding network comprises an input layer and n+1 cascaded first neural network groups, the first n first neural network groups each comprise cascaded convolutional layers, a depth multi-scale fusion module and a pooling layer, and the (n+1)-th first neural network group comprises a plurality of cascaded convolutional layers;
the decoding network comprises n cascaded second neural network groups, a convolutional layer and an output layer, wherein the 1st second neural network group comprises a cascaded deconvolution layer, a merging layer and a convolutional layer, and the last n-1 second neural network groups each comprise a cascaded deconvolution layer, a merging layer, a convolutional layer and a depth multi-scale fusion module;
the merging layer is used for linearly adding and merging the output image of the deconvolution layer and the output image of the corresponding convolutional layer in the coding network.
Optionally, the encoding network includes n+1 cascaded first residual connections, and the decoding network includes n cascaded second residual connections.
To achieve the above object, the present invention further provides an image segmentation apparatus, comprising:
the acquisition module is used for acquiring a target image to be segmented;
the segmentation module is used for segmenting the target image by adopting a pre-trained full convolution neural network model to obtain a segmented image;
the full convolution neural network model comprises a depth multi-scale fusion module, and the depth multi-scale fusion module is used for extracting and fusing features of different depths and scales of the input image.
Optionally, the depth multi-scale fusion module includes at least two parallel branches and a series branch, where convolution sizes of the parallel branches are different from each other, and the parallel branches are connected in parallel and then connected in series with the series branch.
Optionally, each parallel branch includes at least one convolutional layer and/or at least one convolutional module, and the convolutional module is formed by connecting a plurality of convolutional layers in parallel and/or in series.
Optionally, the number of the parallel branches is 3.
Optionally, the series branch comprises a convolutional layer with a size of 1 × 1 × 1.
Optionally, the full convolutional neural network model includes an encoding network and a decoding network;
the coding network comprises an input layer and n+1 cascaded first neural network groups, the first n first neural network groups each comprise cascaded convolutional layers, a depth multi-scale fusion module and a pooling layer, and the (n+1)-th first neural network group comprises a plurality of cascaded convolutional layers;
the decoding network comprises n cascaded second neural network groups, a convolutional layer and an output layer, wherein the 1st second neural network group comprises a cascaded deconvolution layer, a merging layer and a convolutional layer, and the last n-1 second neural network groups each comprise a cascaded deconvolution layer, a merging layer, a convolutional layer and a depth multi-scale fusion module;
the merging layer is used for linearly adding and merging the output image of the deconvolution layer and the output image of the corresponding convolutional layer in the coding network.
Optionally, the encoding network includes n+1 cascaded first residual connections, and the decoding network includes n cascaded second residual connections.
Optionally, the obtaining module is configured to: acquire an original image to be segmented, and preprocess the original image to remove noise in the original image, thereby obtaining the target image.
To achieve the above object, the present invention further provides an electronic device, which includes a processor and a memory, where the memory stores a computer program, and the computer program, when executed by the processor, implements the image segmentation method described above.
To achieve the above object, the present invention further provides a readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the image segmentation method described above.
Compared with the prior art, the full convolution neural network model provided by the invention has the following advantage: because the model contains a depth multi-scale fusion module that extracts and fuses features of the input image at different depths and scales, the output feature results are more varied and the feature patterns richer, which effectively improves the accuracy of the overall segmentation algorithm.
The image segmentation method and apparatus, the electronic device and the readable storage medium have the following advantages: a target image to be segmented is acquired and then segmented with a pre-trained full convolution neural network model to obtain a segmented image. Because the model contains a depth multi-scale fusion module that extracts and fuses features of the input image at different depths and scales, the output feature results are more varied, the feature patterns richer, and the accuracy of the overall segmentation algorithm is effectively improved. Using the full convolution neural network model for image segmentation improves segmentation accuracy while reducing tedious human-computer interaction; the segmentation algorithm generalizes well, runs end to end, and can better assist doctors in improving diagnostic accuracy.
Drawings
FIG. 1 is a schematic structural diagram of a depth multi-scale fusion module in a full convolution neural network model according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a full convolution neural network model according to an embodiment of the present invention;
FIG. 3 is a flowchart of an image segmentation method according to an embodiment of the present invention;
FIG. 4 is a specific example of a cardiac image to be segmented;
FIG. 5a is a segmentation result obtained by segmenting the cardiac image of FIG. 4 using the image segmentation method of the present invention;
FIG. 5b is a segmentation result obtained by segmenting the cardiac image of FIG. 4 using a prior-art full convolution neural network model;
FIG. 6 is a schematic structural diagram of an image segmentation apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The full convolution neural network model, image segmentation method and apparatus, electronic device and readable storage medium provided by the present invention are described in detail below with reference to the accompanying drawings. The advantages and features of the present invention will become more apparent from the following description. It should be noted that the drawings are in a very simplified form and are not drawn to precise scale; they serve only to assist in conveniently and clearly explaining the embodiments of the present invention. The structures, proportions and sizes shown in the drawings and described in the specification are intended only to be read together with the disclosure so that those skilled in the art can understand it; they do not limit the conditions under which the present invention can be implemented and therefore have no substantive technical significance. Any structural modification, change in proportion or adjustment of size that does not affect the effects and objects achievable by the present invention shall still fall within the scope of the technical content disclosed by the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The core idea of the invention is to provide a full convolution neural network model, an image segmentation method, a device, an electronic device and a readable storage medium, which can not only improve the precision of the whole segmentation algorithm, but also effectively reduce the complicated operation of man-machine interaction.
In the embodiment of the present invention, the image segmentation method and apparatus provided by the present invention are described by taking an organ image, for example, a heart image as an example, and the method and apparatus are not limited to segmentation of an organ image, but may also be applied to segmentation of other images. The image segmentation method of the embodiment of the present invention is applicable to the image segmentation apparatus of the embodiment of the present invention, which can be configured on an electronic device, wherein the electronic device can be a personal computer, a mobile terminal, and the like, and the mobile terminal can be a hardware device with various operating systems, such as a mobile phone, a tablet computer, and the like.
First, a full convolution neural network model provided by the present invention is introduced below. The full convolution neural network model comprises at least one depth multi-scale fusion module, and the depth multi-scale fusion module is used for extracting and fusing features of different depths and scales of an input image. The depth multi-scale fusion module extracts and fuses features of different depths and scales from the input image, so that the feature results output by the full convolution neural network model are more various, the feature modes are richer, and the image segmentation precision is effectively improved.
Preferably, the depth multi-scale fusion module includes at least two parallel branches and a series branch, wherein the convolution sizes of the parallel branches are different, and the parallel branches are connected in parallel and then connected in series with the series branch. In the invention, the convolution sizes of all parallel branches in the depth multi-scale fusion module are different, so that the depth multi-scale fusion module can extract image characteristic information by adopting the parallel branches with different depths and sizes, and then the characteristics with different scale depths are fused through the series branches, finally, the output characteristic structures are more various, the characteristic modes are more abundant, and the image segmentation precision is favorably improved.
Specifically, each parallel branch may include at least one convolutional layer and/or at least one convolution module, where the convolution module is formed by connecting a plurality of convolutional layers in parallel and/or in series. That is, any parallel branch may be composed of one or more convolution layers, or one or more convolution modules, or one or more convolution layers and one or more convolution modules, and the present invention is not limited to the specific configuration of each parallel branch as long as the convolution sizes of the parallel branches are different from each other.
FIG. 1 schematically shows the structure of a depth multi-scale fusion module in a full convolution neural network model according to an embodiment of the present invention. As shown in FIG. 1, in this embodiment the output image of the depth multi-scale fusion module is the sum of the output of the series branch and the input image (i.e., the target image to be segmented). The depth multi-scale fusion module comprises three parallel branches (branches 1, 2 and 3) and one series branch (branch 4). Branch 1 consists of convolutional layer 1 and branch 3 consists of convolutional layer 3. Branch 2 is formed by connecting two convolution modules in series, and each convolution module is formed by connecting two convolutional layers in series and placing them in parallel with a third convolutional layer: one convolution module connects convolutional layer 2_2 and convolutional layer 2_3 in series and then in parallel with convolutional layer 2_1, and the other connects convolutional layer 2_5 and convolutional layer 2_6 in series and then in parallel with convolutional layer 2_4. Branch 4 consists of convolutional layer 4; the outputs of branches 1, 2 and 3 are summed and then fed into convolutional layer 4 of branch 4. Each convolutional layer in the depth multi-scale fusion module is used to learn and express useful information in the image.
In this embodiment, branches 1, 2 and 3 perform feature extraction on the input image at different scales. The convolution kernel sizes of branch 1 and branch 3 differ, and the combination of convolution kernels in branch 2 is in effect equivalent to a larger convolution kernel, so those skilled in the art will understand that the neural network model of this embodiment saves computing resources. Different scales here refer to different image sizes: because each branch has a different convolution kernel size, each branch extracts features of the input image at a different image size, i.e., different scale features of the input image.
Branch 4 adds the results of branches 1, 2 and 3, fuses the feature-image results of these three branches with a 1 × 1 × 1 convolution kernel, and then adds the obtained result to the input feature image to give the final fused output feature image. Preferably, in FIG. 1, the size of convolutional layer 1 is 3 × 3 × 3, the sizes of convolutional layers 2_1 to 2_6 are all 3 × 3 × 3, the size of convolutional layer 3 is 5 × 5 × 5, and the size of convolutional layer 4 is 1 × 1 × 1.
The summation operations in this embodiment adopt the residual network connection used in deep learning. This step is necessary: it allows the original feature image to be passed on to subsequent network layers for feature extraction, avoids loss of features, and improves the accuracy of feature extraction.
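By way of illustration only, the module of FIG. 1 can be sketched in Python (PyTorch). This is a minimal sketch under stated assumptions: it assumes 3D feature maps with equal input and output channel counts, ReLU activations and "same" padding; the names conv3d, ConvModule and DepthMultiScaleFusion are illustrative and do not appear in the patent.

import torch
import torch.nn as nn

def conv3d(c_in, c_out, k):
    # 3D convolution with "same" padding, followed by a ReLU activation
    return nn.Sequential(nn.Conv3d(c_in, c_out, k, padding=k // 2),
                         nn.ReLU(inplace=True))

class ConvModule(nn.Module):
    # Branch-2 building block of FIG. 1: two 3x3x3 layers in series, in parallel with one 3x3x3 layer
    def __init__(self, c):
        super().__init__()
        self.serial = nn.Sequential(conv3d(c, c, 3), conv3d(c, c, 3))  # e.g. conv 2_2 -> conv 2_3
        self.parallel = conv3d(c, c, 3)                                # e.g. conv 2_1

    def forward(self, x):
        return self.serial(x) + self.parallel(x)

class DepthMultiScaleFusion(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.branch1 = conv3d(c, c, 3)                              # branch 1: 3x3x3
        self.branch2 = nn.Sequential(ConvModule(c), ConvModule(c))  # branch 2: two convolution modules in series
        self.branch3 = conv3d(c, c, 5)                              # branch 3: 5x5x5
        self.branch4 = nn.Conv3d(c, c, 1)                           # branch 4: 1x1x1 fusion convolution (series branch)

    def forward(self, x):
        summed = self.branch1(x) + self.branch2(x) + self.branch3(x)  # sum the three parallel branches
        return self.branch4(summed) + x                               # fuse, then add the input (residual connection)

For instance, DepthMultiScaleFusion(16) applied to a tensor of shape (1, 16, 32, 32, 32) returns a tensor of the same shape, so its output can be added element-wise to its input as FIG. 1 requires.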
It should be noted that, the depth multi-scale fusion module shown in fig. 1 preferably includes 3 parallel branches, but the number of the parallel branches is not limited thereto, and the number of the parallel branches may be set according to actual situations in application. Next, the number and size of the convolutional layers included in the three parallel branches in fig. 1 are examples, and should not be construed as limiting the embodiments of the present invention. While branch 2 in fig. 1 includes two identical convolution modules, in other embodiments, branch 2 may also include two different convolution modules, which is not limited in the present invention.
Referring to fig. 2, a schematic structural diagram of a full convolutional neural network model according to an embodiment of the present invention is schematically shown, as shown in fig. 2, the full convolutional neural network model includes an encoding network and a decoding network;
the coding network comprises an input layer and n+1 cascaded first neural network groups, the first n first neural network groups each comprise cascaded convolutional layers, a depth multi-scale fusion module and a pooling layer, and the (n+1)-th first neural network group comprises a plurality of cascaded convolutional layers;
the decoding network comprises n cascaded second neural network groups, a convolutional layer and an output layer, wherein the 1st second neural network group comprises a cascaded deconvolution layer, a merging layer and a convolutional layer, and the last n-1 second neural network groups each comprise a cascaded deconvolution layer, a merging layer, a convolutional layer and a depth multi-scale fusion module;
the merging layer is used for linearly adding and merging the output image of the deconvolution layer and the output image of the corresponding convolutional layer in the coding network.
The coding network is used to learn useful feature information (such as the features of the blood pool and myocardial regions of the heart) from the image to be segmented, and the decoding network is used to locate the regions where that feature information lies, based on the learned features. Referring to FIGS. 5a and 5b, after a convolution kernel of the deep learning network is convolved with the image, local point and line features of the boundary between the blood pool and the myocardium are obtained; as the pooling layers shrink the image, further convolutions extract more boundary contour features of the blood pool and the myocardium.
As shown in fig. 2, in the present embodiment, the encoding network includes five cascaded first neural network groups a to E, and the decoding network includes four cascaded second neural network groups a to d, and the first neural network group E is connected to the second neural network group a.
As shown in fig. 2, the first neural network group a includes cascaded convolutional layer a1, depth multi-scale fusion module a2, and pooling layer A3; the first neural network group B comprises cascaded convolutional layers B11, a depth multi-scale fusion module B2, convolutional layers B12 and pooling layers B3; the first neural network group C comprises cascaded convolutional layers C11, C12, a depth multi-scale fusion module C2, a convolutional layer C13 and a pooling layer C3; the first neural network group D comprises cascaded convolutional layers D11, D12, a depth multi-scale fusion module D2, a convolutional layer D13 and a pooling layer D3; the first neural network group E comprises four concatenated convolutional layers E11, E12, E13, E14.
The first neural network group a is used for extracting feature information of an input target image, for example, feature information of a heart. Specifically, the convolutional layer a1 is used for performing convolution processing on a target image, the depth multi-scale fusion module a2 is used for performing feature extraction and fusion of different depths and scales on the convolved image, and the pooling layer A3 is used for performing pooling operation on the fused image.
The first neural network group B is used for extracting feature information of the image pooled by the pooling layer a3, for example, feature information of the heart, and the specific process is similar to that in the first neural network group a and is not described herein again.
The first neural network group C is used for extracting feature information of the image pooled by the pooling layer B3, for example, extracting feature information of the heart, and the specific process is similar to that in the first neural network group a and is not described herein again.
The first neural network group D is used for extracting feature information of the image pooled by the pooling layer C3, such as feature information of the heart, and the specific process is similar to that in the first neural network group a and is not described herein again.
Convolutional layer E11 in the first neural network group E is used to convolve the image pooled by pooling layer D3, convolutional layer E12 is used to continue to convolve the image convolved by convolutional layer E11, convolutional layer E13 is used to continue to convolve the image convolved by convolutional layer E12, and convolutional layer E14 is used to continue to convolve the image convolved by convolutional layer E13.
The second neural network group a comprises a cascaded deconvolution layer a1, merging layer a2 and convolutional layers a31, a32, a33 and a34; the second neural network group b comprises a cascaded deconvolution layer b1, merging layer b2, convolutional layer b31, depth multi-scale fusion module b4 and convolutional layers b32 and b33; the second neural network group c comprises a cascaded deconvolution layer c1, merging layer c2, convolutional layer c31, depth multi-scale fusion module c4 and convolutional layers c32 and c33; the second neural network group d comprises a cascaded deconvolution layer d1, merging layer d2, convolutional layer d31, depth multi-scale fusion module d4 and convolutional layer d32. A convolutional layer e1, which does not belong to any second neural network group, is provided between the convolutional layer d32 of the second neural network group d and the output layer; the convolutional layer e1 is used to perform the logistic regression of the image.
In this embodiment, the second neural network group a is used to restore the feature information of the image, for example the cardiac feature information, to the corresponding positions of the image pooled by the pooling layer C3. In particular, the deconvolution layer a1 reverses the operation of the pooling layer D3 so as to restore the image to the corresponding positions it had before the pooling layer D3. The merging layer a2 is used to recover the feature information of the image, for example the cardiac feature information: the output image of the convolutional layer D13 in the coding network is linearly added to and merged with the output of the deconvolution layer a1 by the merging layer a2 and then serves as the input of the convolutional layer a31. The convolutional layers a31, a32, a33 and a34 are used to recover the image feature information lost when the image was pooled by the max pooling layer D3.
Similar to the second neural network group a, the second neural network groups b to d are also used for recovering information of the image, and finally, the convolutional layer d32 in the second neural network group d outputs corresponding positions of all feature information (e.g., cardiac feature information) in the finally recovered image, and finally, the corresponding positions are subjected to logistic regression through the convolutional layer e1, so as to obtain an image segmentation result, e.g., a cardiac image segmentation result.
Similar to the input of convolutional layer a31, the output image of convolutional layer C13 in the coding network is linearly added to the output of deconvolution layer b1 through merging layer b2 to serve as the input of convolutional layer b31; the output image of convolutional layer B12 in the coding network is linearly added to the output of deconvolution layer c1 through merging layer c2 to serve as the input of convolutional layer c31; and the output image of depth multi-scale fusion module A2 in the coding network is linearly added to the output of deconvolution layer d1 through merging layer d2 to serve as the input of convolutional layer d31.
Preferably, the encoding network further comprises n+1 cascaded first residual connections, and the decoding network further comprises n cascaded second residual connections. Setting a first residual connection for each first neural network group of the coding network and a second residual connection for each second neural network group of the decoding network effectively alleviates the vanishing-gradient and exploding-gradient problems that tend to occur as the number of network layers grows, ensures that effective features are passed on, facilitates image restoration, and improves the accuracy of image segmentation.
As shown in FIG. 2, in this embodiment the encoding network includes five cascaded first residual connections F1 through F5, and the decoding network includes four cascaded second residual connections f1 through f4. In the coding network, the output of the input layer and the output of the depth multi-scale fusion module A2 can be added by the first residual connection F1 as the input of the pooling layer A3; the output of convolutional layer B11 can be added to the output of convolutional layer B12 by the first residual connection F2 as the input of the pooling layer B3; the output of convolutional layer C11 and the output of convolutional layer C13 can be added by the first residual connection F3 as the input of the pooling layer C3; the output of convolutional layer D11 can be added to the output of convolutional layer D13 by the first residual connection F4 as the input of the pooling layer D3; and the output of convolutional layer E11 and the output of convolutional layer E14 can be added by the first residual connection F5 as the input of the deconvolution layer a1. In the decoding network, the output of deconvolution layer a1 can be added to the output of convolutional layer a34 by the second residual connection f1 as the input of deconvolution layer b1; the output of deconvolution layer b1 and the output of convolutional layer b33 can be added by the second residual connection f2 as the input of deconvolution layer c1; the output of deconvolution layer c1 and the output of convolutional layer c33 can be added by the second residual connection f3 as the input of deconvolution layer d1; and the output of deconvolution layer d1 and the output of convolutional layer d32 can be added by the second residual connection f4 as the input of convolutional layer e1.
In the full convolution neural network model shown in fig. 2, the number of the first neural network groups included in the coding network and the number of the second neural network groups included in the decoding network are examples, and should not be construed as limiting the embodiments of the present application. The number of the first neural network groups included in the encoding network and the number of the second neural network groups included in the decoding network can be set according to specific needs. In the full convolution neural network model provided in the embodiment of the present application, the number of first neural network groups included in the encoding network is one more than the number of second neural network groups included in the decoding network, because encoding and decoding have a one-to-one correspondence relationship. In addition, the number of convolutional layers included in the first neural network group a is not limited to 1, and may be 2 or more; the number of convolutional layers included in the first neural network group B is not limited to 2, and may be 1, 3, or 3 or more; the number of convolutional layers included in the first neural network groups C and D is not limited to 3, and may be 1, 2, or 3 or more; the number of convolutional layers included in the first neural network group E is not limited to 4, and may be 1, 2, 3, or 4 or more; the number of convolutional layers included in the second neural network group a is not limited to 4, and may be 1, 2, 3, or 4 or more; the number of convolutional layers included in the second neural network groups b and c is not limited to 3, and may be 1, 2 or 3 or more; the number of convolutional layers included in the second neural network group d is not limited to 2, and may be 1, 3, or 3 or more, and the present invention is not limited thereto.
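Continuing the sketch above, one encoder group and one decoder group of FIG. 2 might be assembled as follows. The channel counts, the single convolutional layer per group, max pooling with stride 2 and a transposed convolution for up-sampling are illustrative assumptions, and the first and second residual connections described above are omitted for brevity.

class EncoderGroup(nn.Module):
    # One of the first n "first neural network groups": convolution -> depth multi-scale fusion -> pooling
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = conv3d(c_in, c_out, 3)
        self.dmsf = DepthMultiScaleFusion(c_out)
        self.pool = nn.MaxPool3d(2)

    def forward(self, x):
        skip = self.dmsf(self.conv(x))   # feature map kept for the merging layer of the decoder
        return self.pool(skip), skip

class DecoderGroup(nn.Module):
    # One of the last n-1 "second neural network groups": deconvolution -> merging -> convolution -> fusion
    def __init__(self, c_in, c_out):
        super().__init__()
        self.deconv = nn.ConvTranspose3d(c_in, c_out, kernel_size=2, stride=2)
        self.conv = conv3d(c_out, c_out, 3)
        self.dmsf = DepthMultiScaleFusion(c_out)

    def forward(self, x, skip):
        merged = self.deconv(x) + skip   # merging layer: linear addition with the corresponding encoder output
        return self.dmsf(self.conv(merged))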
Preferably, the full convolution neural network model is obtained by training through the following steps:
obtaining an original training sample, wherein the original training sample comprises an original training image and a label image corresponding to the original training image;
expanding the original training sample to obtain an expanded training sample, wherein the expanded training sample comprises an expanded training image and a label image corresponding to the expanded training image;
setting initial values of model parameters of the full convolution neural network model; and
training a pre-built full convolution neural network model according to the expanded training samples and the initial values of the model parameters until a preset training end condition is met.
The label image may be a gold-standard segmentation result obtained by segmenting the original training image with an existing segmentation method. Because the data in the original training samples are limited, while deep learning requires a certain amount of data to be robust, a data-augmentation operation is performed to increase robustness and improve the generalization ability of the full convolution neural network model. Specifically, the same random rigid transformation may be applied to the original training image and its corresponding label image, including rotation, scaling, translation, flipping and grey-level transformation. More specifically, the original training image and the corresponding label image may each be translated by -20 to 20 pixels up and down, translated by -20 to 20 pixels left and right, rotated by -20° to 20°, flipped horizontally, flipped vertically, mirrored up and down, mirrored left and right, scaled by a factor of 0.8 to 1.2, and grey-level transformed to complete the data augmentation. Through these transformations, the original 20 cases of images can be expanded to 2000 cases; one part can be used for model training and the rest for model testing. In deep learning, typically 80% of the data is selected for training and 20% for testing. Specifically, the 20 labelled images can be augmented to 2000 cases, of which 1600 can serve as training samples and the remaining 400 as test samples.
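A minimal sketch of the augmentation transformations described above, using SciPy, is given below. The parameter ranges follow the text (±20 pixel shifts, ±20° rotations, 0.8 to 1.2 scaling, flips and a grey-level change), while the exact form of the grey-level transform (simple intensity scaling) and the random-number handling are assumptions.

import numpy as np
from scipy import ndimage

def augment(image, label, rng=None):
    # Apply the same random rigid transformation to a 3D training image and its label image
    if rng is None:
        rng = np.random.default_rng()
    shift = rng.uniform(-20, 20, size=2)   # -20..20 pixel translation within each slice
    angle = rng.uniform(-20, 20)           # -20..20 degree in-plane rotation
    zoom = rng.uniform(0.8, 1.2)           # 0.8..1.2 scaling factor

    def rigid(vol, order):
        vol = ndimage.shift(vol, (0.0, shift[0], shift[1]), order=order)
        vol = ndimage.rotate(vol, angle, axes=(1, 2), reshape=False, order=order)
        return ndimage.zoom(vol, zoom, order=order)

    image, label = rigid(image, order=1), rigid(label, order=0)  # nearest-neighbour for the label image
    if rng.random() < 0.5:                                       # random left-right flip
        image, label = image[:, :, ::-1], label[:, :, ::-1]
    if rng.random() < 0.5:                                       # random up-down flip
        image, label = image[:, ::-1, :], label[:, ::-1, :]
    image = image * rng.uniform(0.9, 1.1)                        # simple grey-level transform (assumed form)
    return image.copy(), label.copy()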
Preferably, in order to improve the accuracy of the model, after the extended training sample is generated and before the model training is performed, the training images in the extended training sample may be preprocessed to remove noise in the images and improve the image quality of the training samples.
The model parameters of the full convolution neural network model include two types: characteristic parameters and hyper-parameters. The feature parameters are parameters for learning the image features, and include a weight parameter and a bias parameter. The hyper-parameters are parameters manually set during training, and the characteristic parameters can be learned from the sample only by setting the proper hyper-parameters. The hyper-parameters may include a learning rate, a number of hidden layers, a convolution kernel size, a number of training iterations, and a batch size per iteration. The learning rate can be considered as a step size.
For example, the learning rate is preferably set to 0.001, the numbers of hidden layers are 16, 32, 64, 128 and 256 respectively, the convolution kernel size is 3 × 3 × 3, the number of training iterations is 30000, and the batch size of each iteration is 1.
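For reference, these example hyper-parameters can be collected into a configuration sketch; the key names are illustrative, only the values are taken from the example above, and the reading of the hidden-layer numbers as per-stage feature-channel counts is an assumption.

hyper_params = {
    "learning_rate": 0.001,                     # gradient-descent step size
    "hidden_channels": [16, 32, 64, 128, 256],  # assumed meaning: feature channels of the five encoder stages
    "kernel_size": (3, 3, 3),
    "training_iterations": 30000,
    "batch_size": 1,
}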
Preferably, the preset training end condition is that a prediction result of a training image in the expanded training sample and an error value of a corresponding label image converge to a preset error value. The training purpose of the full convolution neural network model is to make the image segmentation result obtained by the model close to the real and accurate image segmentation result, that is, the error between the two is reduced to a certain range, so that the preset training end condition can be that the error value between the prediction result of the training image in the expanded training sample and the corresponding label image converges to a preset error value. In addition, the training process of the full convolution neural network model is a multiple-cycle iteration process, so that the training can be finished by setting the iteration number, that is, the preset training end condition can be that the iteration number reaches the preset iteration number.
Preferably, the step of training the pre-built full convolution neural network model according to the expanded training samples and the initial values of the model parameters includes: training the pre-built full convolution neural network model with a stochastic gradient descent method according to the expanded training samples and the initial values of the model parameters. Model training is in essence the minimization of a loss function, and taking derivatives achieves this quickly and simply; the derivative-based method used is gradient descent. Training the full convolution neural network model with gradient descent therefore allows the training to be carried out quickly and simply.
In the deep learning of this embodiment of the invention, the gradient descent method is mainly used to train the model, and the back-propagation algorithm is then used to update the weight and bias parameters of the network model being optimized. Specifically, gradient descent takes the direction of steepest slope as the direction that reaches the optimum fastest, back-propagation solves the partial derivatives with the chain rule to update the weights, and the parameters are updated through continuous iterative training to learn the image. The back-propagation algorithm updates the weight and bias parameters as follows:
1. First, forward propagation is carried out: the parameters are updated through continuous iterative training to learn the image, and the activation values of all layers (convolutional layers and deconvolution layers) are computed, i.e., the activation images obtained after the image has been convolved.

2. For the output layer (the $n_l$-th layer), the sensitivity value is computed as

$$\delta^{(n_l)} = -\left(y - \hat{y}\right) \odot f'\!\left(z^{(n_l)}\right)$$

where $y$ is the true value of the sample, $\hat{y}$ is the prediction value of the output layer, and $f'(z^{(n_l)})$ denotes the partial derivative of the output layer.

3. For each layer $l = n_l - 1, n_l - 2, \dots$, the sensitivity value is computed as

$$\delta^{(l)} = \left(\left(W^{(l)}\right)^{T} \delta^{(l+1)}\right) \odot f'\!\left(z^{(l)}\right)$$

where $W^{(l)}$ represents the weight parameter of the $l$-th layer, $\delta^{(l+1)}$ represents the sensitivity value of the $(l+1)$-th layer, and $f'(z^{(l)})$ represents the partial derivative of the $l$-th layer.

4. The weight parameter and bias parameter of each layer are updated:

$$W^{(l)} \leftarrow W^{(l)} - \alpha\, \delta^{(l+1)} \left(a^{(l)}\right)^{T}, \qquad b^{(l)} \leftarrow b^{(l)} - \alpha\, \delta^{(l+1)}$$

where $W^{(l)}$ and $b^{(l)}$ respectively represent the weight and bias parameters of the $l$-th layer, $\alpha$ is the learning rate, $a^{(l)}$ represents the output value of the $l$-th layer, and $\delta^{(l+1)}$ represents the sensitivity value of the $(l+1)$-th layer.
Preferably, the step of training the pre-built full convolution neural network model with the stochastic gradient descent method according to the expanded training samples and the initial values of the model parameters includes:
Step A: taking the expanded training image as the input of the full convolution neural network model, and obtaining the prediction result of the expanded training image according to the initial values of the model parameters;
Step B: calculating a loss function value according to the prediction result and the label image corresponding to the expanded training image; and
Step C: judging whether the loss function value converges to a preset value; if so, the training is finished; if not, the model parameters are adjusted, the initial values of the model parameters are updated to the adjusted model parameters, and step A is executed again.
When the loss function value does not converge to the preset value, the full convolution neural network model is not yet accurate and must continue to be trained; in that case the model parameters are adjusted, the initial values of the model parameters are updated to the adjusted parameters, and step A is executed again to start the next iteration.
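A minimal sketch of this A-B-C loop with stochastic gradient descent, continuing the PyTorch assumptions of the earlier sketches, is given below; the data loader, the loss function and the convergence threshold eps are placeholders.

def train(model, loader, loss_fn, lr=0.001, max_iters=30000, eps=1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for step, (image, label) in enumerate(loader):
        prediction = model(image)            # step A: forward pass with the current parameter values
        loss = loss_fn(prediction, label)    # step B: loss between the prediction and the label image
        if loss.item() < eps or step >= max_iters:
            break                            # step C: stop once the loss value converges
        optimizer.zero_grad()
        loss.backward()                      # back-propagation computes the gradients (sensitivities)
        optimizer.step()                     # update the weight and bias parameters
    return model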
The loss function L(W, b) in the present invention is expressed as:

[the expression for L(W, b) is given as an equation image in the original publication]

where W and b denote the weight and bias parameters of the full convolution network, m is the number of training samples (a positive integer), x_i denotes the i-th input training sample, f_{W,b}(x_i) denotes the prediction result of the i-th training sample, y_i denotes the corresponding label, and K is a smoothing parameter that prevents the denominator from becoming zero and making the expression incomputable.
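Since the exact expression is not reproduced above, the following is only an assumption for illustration: a smoothed Dice-style loss is one common segmentation loss that matches the description of a smoothing parameter K keeping the denominator from becoming zero, and it is sketched here; it is not asserted to be the patent's formula.

def smoothed_dice_loss(prediction, label, K=1.0):
    # prediction and label: tensors of the same shape with values in [0, 1]
    intersection = (prediction * label).sum()
    return 1.0 - (2.0 * intersection + K) / (prediction.sum() + label.sum() + K)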
To achieve the above idea, the present invention provides an image segmentation method, please refer to fig. 3, which schematically shows a flowchart of an image segmentation method according to an embodiment of the present invention, as shown in fig. 3, the image segmentation method includes the following steps:
step S100: and acquiring a target image to be segmented.
In the present invention, the target image to be segmented may be an image of the heart, an image of another tissue or organ (such as a blood vessel) or a non-organ image; the present invention is not limited in this respect. The target image to be segmented can be obtained by scanning and acquisition with various imaging systems, or can be transmitted from an internal or external storage system such as a picture archiving and communication system (PACS). The imaging system includes, but is not limited to, one or a combination of Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET) and the like. It should be noted that the size of the target image to be segmented may be set according to the specific situation and is not limited by the present invention; for example, the size of the target image may be 128 × 128 × 128 pixels.
In one implementation, the original image acquired by the imaging system may be directly subjected to image segmentation, that is, the original image acquired by the imaging system is taken as a target image to be segmented. Preferably, after obtaining an original image to be segmented, the original image may be preprocessed to remove noise in the original image, so as to obtain the target image. Therefore, by preprocessing the original image to be segmented, the noise information in the original image can be effectively filtered, so that the image quality of the image to be segmented is effectively improved, the target image with the noise removed is segmented, and the quality of the segmented image can be improved.
Specifically, the original image to be segmented may be preprocessed, for example filtered, with a three-dimensional Gaussian filter to remove noise in the original image; the Gaussian kernel parameter of the three-dimensional Gaussian filter may be set to 3. FIG. 4 shows a cardiac image after denoising pre-processing, in which region 10 is the myocardial region of the heart and region 20 is the blood pool region of the heart. In other embodiments, the original image to be segmented may instead be preprocessed with other commonly used filters to remove its noise, which is not limited by the present invention.
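A sketch of this denoising step with SciPy is given below, using the Gaussian kernel parameter of 3 mentioned above; reading that parameter as the standard deviation sigma of the filter is an assumption.

from scipy.ndimage import gaussian_filter

def preprocess(original_image, sigma=3):
    # Denoise a 3D original image with a three-dimensional Gaussian filter
    return gaussian_filter(original_image.astype("float32"), sigma=sigma)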
Step S200: and segmenting the target image by adopting a pre-trained full convolution neural network model to obtain a segmented image.
The full convolution neural network model comprises at least one depth multi-scale fusion module, and the depth multi-scale fusion module is used for extracting and fusing features of different depths and scales of the input image. For a detailed description of the full convolution neural network model, please refer to the foregoing, which is not described herein.
Referring to FIG. 5a, FIG. 5a is the image obtained by segmenting the target image to be segmented (i.e., the cardiac image) shown in FIG. 4 with the method of the present invention. Referring to FIG. 5b, FIG. 5b is the image obtained by segmenting the cardiac image shown in FIG. 4 with a conventional full convolution neural network model. Comparing the segmentation results of FIGS. 5a and 5b shows that the myocardial and blood pool regions are segmented more accurately in FIG. 5a. Compared with the prior art, the full convolution neural network model adopted by the invention contains a depth multi-scale fusion module, so features of the input image at different scales and depths can be extracted and fused by this module (image depth is related to image scale: the larger the image scale, the deeper the image depth), the output feature results are more varied, the feature patterns richer, and the accuracy of the overall segmentation algorithm is effectively improved. Using the full convolution neural network model for image segmentation improves segmentation accuracy while reducing tedious human-computer interaction; the segmentation algorithm generalizes well, runs end to end, and can better assist doctors in improving diagnostic accuracy.
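Put end to end, steps S100 and S200 can be sketched as follows, continuing the earlier sketches; the trained model object and the class layout of its output (per-voxel class scores along the channel dimension) are assumptions for illustration.

def segment(original_image, model):
    target = preprocess(original_image)                      # step S100: denoised target image
    x = torch.from_numpy(target).unsqueeze(0).unsqueeze(0)   # add batch and channel dimensions
    with torch.no_grad():
        logits = model(x)                                    # step S200: forward pass of the trained model
    return logits.argmax(dim=1).squeeze(0).numpy()           # per-voxel labels (e.g. background, myocardium, blood pool)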
Corresponding to the image segmentation method, the present invention further provides an image segmentation apparatus, referring to fig. 6, which schematically shows a block diagram of an image segmentation apparatus according to an embodiment of the present invention, as shown in fig. 6, the image segmentation apparatus includes:
an obtaining module 201, configured to obtain a target image to be segmented;
a segmentation module 202, configured to segment the target image by using a pre-trained full-convolution neural network model to obtain a segmented image;
the full convolution neural network model comprises a depth multi-scale fusion module, and the depth multi-scale fusion module is used for extracting and fusing features of different depths and scales of the input image.
Preferably, the depth multi-scale fusion module includes at least two parallel branches and a series branch, where convolution kernels of the parallel branches are different from each other in size, and the parallel branches are connected in parallel and then connected in series with the series branch.
Preferably, each parallel branch comprises at least one convolutional layer and/or at least one convolution module, and the convolution module is formed by connecting a plurality of convolutional layers in parallel and/or in series.
Preferably, the number of the parallel branches is 3.
Preferably, the series branch comprises a convolutional layer of size 1 × 1 × 1.
Preferably, the full convolution neural network model comprises an encoding network and a decoding network;
the coding network comprises an input layer and n+1 cascaded first neural network groups, the first n first neural network groups each comprise cascaded convolutional layers, a depth multi-scale fusion module and a pooling layer, and the (n+1)-th first neural network group comprises a plurality of cascaded convolutional layers;
the decoding network comprises n cascaded second neural network groups, a convolutional layer and an output layer, wherein the 1st second neural network group comprises a cascaded deconvolution layer, a merging layer and a convolutional layer, and the last n-1 second neural network groups each comprise a cascaded deconvolution layer, a merging layer, a convolutional layer and a depth multi-scale fusion module;
the merging layer is used for linearly adding and merging the output image of the deconvolution layer and the output image of the corresponding convolutional layer in the coding network.
Preferably, the encoding network comprises n+1 cascaded first residual connections and the decoding network comprises n cascaded second residual connections.
Preferably, the obtaining module 201 is configured to: acquire an original image to be segmented, and preprocess the original image to remove noise in the original image, thereby obtaining the target image.
Based on the above inventive concept, the present invention further provides an electronic device, please refer to fig. 7, which schematically shows a structural diagram of the electronic device according to an embodiment of the present invention. As shown in fig. 7, the electronic device comprises a processor 301 and a memory 303, the memory 303 having stored thereon a computer program, which when executed by the processor 301, implements the image segmentation method described above.
As shown in fig. 7, the electronic device further includes a communication interface 302 and a communication bus 304, and the processor 301, the communication interface 302 and the memory 303 communicate with one another via the communication bus 304. The communication bus 304 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface 302 is used for communication between the electronic device and other devices.
The processor 301 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 301 is the control center of the electronic device and connects the various parts of the entire electronic device through various interfaces and lines.
The memory 303 is used for storing the computer program, and the processor 301 implements various functions of the electronic device by running or executing the computer program stored in the memory 303 and calling data stored in the memory 303.
The memory 303 comprises non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
The present invention also provides a readable storage medium having stored therein a computer program which, when executed by a processor, may implement the image segmentation method described above.
The readable storage media of embodiments of the invention may take any combination of one or more computer-readable media. The readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this context, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It should be noted that the apparatuses and methods disclosed in the embodiments herein can be implemented in other ways, and the apparatus embodiments described above are merely illustrative. For example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments herein. In this regard, each block in the flowchart or block diagrams may represent a module, a program, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, the functional modules in the embodiments herein may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The above description is only for the purpose of describing the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention, and any variations and modifications made by those skilled in the art based on the above disclosure are within the scope of the appended claims. It will be apparent to those skilled in the art that various changes and modifications may be made in the invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (18)

1. An image segmentation apparatus, comprising:
the acquisition module is used for acquiring a target image to be segmented;
the segmentation module is used for segmenting the target image by adopting a pre-trained full convolution neural network model to obtain a segmented image;
the full convolution neural network model comprises a depth multi-scale fusion module, and the depth multi-scale fusion module is used for extracting and fusing features of different depths and scales of the input image.
2. The image segmentation device according to claim 1, wherein the depth multi-scale fusion module comprises at least two parallel branches and a serial branch, wherein the convolution kernels of the parallel branches are different from each other in size, and the parallel branches are connected in parallel and then connected in series with the serial branch.
3. The image segmentation device according to claim 2, wherein each of the parallel branches comprises at least one convolutional layer and/or at least one convolutional module, and the convolutional module is formed by connecting a plurality of convolutional layers in parallel and/or in series.
4. The image segmentation device according to claim 2, characterized in that the number of parallel branches is 3.
5. The image segmentation device of claim 2, wherein the serial branch comprises a convolutional layer of size 1 × 1 × 1.
6. The image segmentation apparatus according to claim 1, wherein the full convolutional neural network model includes an encoding network and a decoding network;
the encoding network comprises an input layer and n+1 cascaded first neural network groups, wherein each of the first n first neural network groups comprises a convolutional layer, a depth multi-scale fusion module and a pooling layer which are cascaded, and the (n+1)th first neural network group comprises a plurality of cascaded convolutional layers;
the decoding network comprises n cascaded second neural network groups, a convolutional layer and an output layer, wherein the 1st second neural network group comprises a deconvolution layer, a merging layer and a convolutional layer which are cascaded, and each of the last n-1 second neural network groups comprises a deconvolution layer, a merging layer, a convolutional layer and a depth multi-scale fusion module which are cascaded;
the merging layer is used for merging, by linear addition, the output image of the deconvolution layer with the output image of the corresponding convolutional layer in the encoding network.
7. The image segmentation apparatus as claimed in claim 6, wherein the encoding network comprises n+1 cascaded first residual connections and the decoding network comprises n cascaded second residual connections.
8. The image segmentation apparatus according to claim 1, wherein the step of acquiring the target image to be segmented by the acquisition module comprises:
and acquiring an original image to be segmented, and preprocessing the original image to remove noise in the original image to obtain the target image.
9. A full convolution neural network model is characterized by comprising a depth multi-scale fusion module, wherein the depth multi-scale fusion module is used for extracting and fusing features of different depths and scales of an input image.
10. The full convolutional neural network model of claim 9, wherein the depth multi-scale fusion module comprises at least two parallel branches and a series branch, wherein the convolution kernels of the parallel branches are different from each other in size, and the parallel branches are connected in parallel and then connected in series with the series branch.
11. The full convolutional neural network model of claim 10, wherein each of the parallel branches comprises at least one convolutional layer and/or at least one convolutional module, the convolutional module being formed by a plurality of convolutional layers connected in parallel and/or in series.
12. The full convolution neural network model of claim 10, wherein the number of parallel branches is 3.
13. The full convolutional neural network model of claim 10, wherein the series branch comprises a convolutional layer of size 1 × 1 × 1.
14. The full convolutional neural network model of claim 9, wherein the full convolutional neural network model comprises an encoding network and a decoding network;
the encoding network comprises an input layer and n+1 cascaded first neural network groups, wherein each of the first n first neural network groups comprises a convolutional layer, a depth multi-scale fusion module and a pooling layer which are cascaded, and the (n+1)th first neural network group comprises a plurality of cascaded convolutional layers;
the decoding network comprises n cascaded second neural network groups, a convolutional layer and an output layer, wherein the 1st second neural network group comprises a deconvolution layer, a merging layer and a convolutional layer which are cascaded, and each of the last n-1 second neural network groups comprises a deconvolution layer, a merging layer, a convolutional layer and a depth multi-scale fusion module which are cascaded;
the merging layer is used for merging, by linear addition, the output image of the deconvolution layer with the output image of the corresponding convolutional layer in the encoding network.
15. The full convolutional neural network model of claim 14, wherein the encoding network comprises n+1 cascaded first residual connections and the decoding network comprises n cascaded second residual connections.
16. An image segmentation method, comprising:
acquiring a target image to be segmented;
segmenting the target image by adopting a pre-trained full convolution neural network model to obtain a segmented image; the full convolution neural network model comprises a depth multi-scale fusion module, and the depth multi-scale fusion module is used for extracting and fusing image features of the target image.
17. A readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of claim 16.
18. An electronic device comprising a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements the method of claim 16.
CN202010456769.XA 2020-05-26 2020-05-26 Full convolution neural network model, image segmentation method and device Pending CN113724263A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010456769.XA CN113724263A (en) 2020-05-26 2020-05-26 Full convolution neural network model, image segmentation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010456769.XA CN113724263A (en) 2020-05-26 2020-05-26 Full convolution neural network model, image segmentation method and device

Publications (1)

Publication Number Publication Date
CN113724263A (en) 2021-11-30

Family

ID=78672046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010456769.XA Pending CN113724263A (en) 2020-05-26 2020-05-26 Full convolution neural network model, image segmentation method and device

Country Status (1)

Country Link
CN (1) CN113724263A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232394A (en) * 2018-03-06 2019-09-13 华南理工大学 A kind of multi-scale image semantic segmentation method
CN109886971A (en) * 2019-01-24 2019-06-14 西安交通大学 A kind of image partition method and system based on convolutional neural networks
CN110782462A (en) * 2019-10-30 2020-02-11 浙江科技学院 Semantic segmentation method based on double-flow feature fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
双锴: "计算机视觉", 北京邮电大学出版社, pages: 96 - 101 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115841587A (en) * 2022-10-24 2023-03-24 智慧眼科技股份有限公司 Feature extraction method, device and equipment for image classification task and storage medium
CN115841587B (en) * 2022-10-24 2023-11-24 智慧眼科技股份有限公司 Feature extraction method, device, equipment and storage medium for image classification task

Similar Documents

Publication Publication Date Title
US11488021B2 (en) Systems and methods for image segmentation
CN110706246B (en) Blood vessel image segmentation method and device, electronic equipment and storage medium
CN110599505A (en) Organ image segmentation method and device, electronic equipment and storage medium
US10896508B2 (en) System for segmentation of anatomical structures in cardiac CTA using fully convolutional neural networks
US10817758B2 (en) Framework for integrating deformable modeling with 3D deep neural network segmentation
CN113516659B (en) Medical image automatic segmentation method based on deep learning
JP6505124B2 (en) Automatic contour extraction system and method in adaptive radiation therapy
CN110598714B (en) Cartilage image segmentation method and device, readable storage medium and terminal equipment
CN110136135B (en) Segmentation method, device, equipment and storage medium
WO2021017006A1 (en) Image processing method and apparatus, neural network and training method, and storage medium
CN110570394A (en) medical image segmentation method, device, equipment and storage medium
Zhang et al. A novel denoising method for CT images based on U-net and multi-attention
CN113539402B (en) Multi-mode image automatic sketching model migration method
CN113744171B (en) Vascular calcification image segmentation method, system and readable storage medium
CN113724263A (en) Full convolution neural network model, image segmentation method and device
US11693919B2 (en) Anatomy-aware motion estimation
CN115471508A (en) Medical image segmentation method, electronic device, and storage medium
Desiani et al. A Novelty Patching of Circular Random and Ordered Techniques on Retinal Image to Improve CNN U-Net Performance.
CN115861150A (en) Segmentation model training method, medical image segmentation method, electronic device, and medium
CN115761358A (en) Method for classifying myocardial fibrosis based on residual capsule network
CN116091458A (en) Pancreas image segmentation method based on complementary attention
CN113902689A (en) Blood vessel center line extraction method, system, terminal and storage medium
CN113139970A (en) Multi-modal image segmentation method and device, electronic equipment and storage medium
CN115861149A (en) Medical image segmentation method, electronic device, and storage medium
CN112651960A (en) Image processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination