Disclosure of Invention
In order to overcome the defects of the prior art, the present invention provides a 3D medical image segmentation method, device and storage medium based on layered perception fusion. The method divides a 3D medical image into three sets of 2D images along the H, W and C directions together with a plurality of small 3D images, and fuses a 2D channel-sequence relation model with a 3D model, thereby solving the problem that a single model is underutilized and predicts inaccurately. A voting mechanism built on the multi-model fusion achieves efficient and accurate 3D medical image segmentation.
In order to achieve the above and other objects, the present invention provides a 3D medical image segmentation method based on layered perception fusion, comprising the following steps:
step S1, acquiring a 3D medical image for preprocessing, and slicing the preprocessed 3D medical image to obtain a plurality of slice images;
step S2, performing convolution calculation on each slice image through a convolutional neural network semantic segmentation algorithm to obtain the semantic segmentation result of each slice image;
step S3, fusing the prediction results of the slice images and outputting the final medical image segmentation result.
Preferably, the step S1 further includes:
step S100, acquiring a 3D medical image, marking the 3D medical image, and dividing it into a target portion and a background portion;
step S101, after marking is completed, checking whether the data of the 3D medical image is correct;
step S102, slicing the 3D medical image along the H, W and C directions to obtain three sets of 2D images, and decomposing the 3D medical image into a plurality of small 3D images;
step S103, data enhancement is performed on each sliced image.
Preferably, the target represents a target region, i.e., an organ or pathological-tissue image region, and the background represents the non-organ portion.
Preferably, after the 3D medical image is marked, the background is assigned a pixel value of 0 and the target a pixel value of 1; if there are a plurality of pathological tissues, different pixel values are used to distinguish them.
Preferably, in step S2, the convolutional neural network is an RFEUnet network structure including a receptive field enhancement module RFEM.
Preferably, the RFEUnet network structure halves the channel configuration of the existing Unet network, adds the receptive field enhancement module RFEM to the tail of the encoding structure of the Unet network, and enlarges the perception performance of the network model through the RFEM by using Maxpooling, the mish activation function and a dilated (atrous) convolution structure.
Preferably, in step S3, the slice images in the H, W and C directions and the small 3D images enter four RFEUnet network structures to form four segmentation results, and the four output results are averaged to obtain the final image segmentation result.
In order to achieve the above object, the present invention further provides a 3D medical image segmentation apparatus based on layered perception fusion, including:
the preprocessing module is used for acquiring a 3D medical image for preprocessing, and slicing the preprocessed 3D medical image to acquire a plurality of slice images;
the segmentation module is used for performing convolution calculation on each slice image through a convolutional neural network semantic segmentation algorithm to obtain the semantic segmentation result of each slice image;
and the fusion module is used for fusing the prediction results of all the slice images and outputting the final medical image segmentation result.
Preferably, the segmentation module adopts, as the convolutional neural network, an RFEUnet network structure including a receptive field enhancement module RFEM. The RFEUnet network structure halves the channel configuration of the existing Unet network, adds the receptive field enhancement module RFEM to the tail of the encoding structure of the Unet network, and enlarges the perception performance of the network model through the RFEM by using Maxpooling, the mish activation function and a dilated convolution structure.
To achieve the above object, the present invention also provides a computer-readable storage medium for storing program code for executing the above 3D medical image segmentation method.
Compared with the prior art, the 3D medical image segmentation method, device and storage medium based on layered perception fusion segment the 3D medical image into three sets of 2D images along the H, W and C directions and a plurality of small 3D images, solve the problems that a single model is underutilized and predicts inaccurately by fusing a 2D channel-sequence relation model with the 3D model, and establish a voting mechanism based on the multi-model fusion, thereby achieving efficient and accurate 3D medical image segmentation.
Detailed Description
Other advantages and capabilities of the present invention will be readily apparent to those skilled in the art from this disclosure, which describes embodiments of the present invention in conjunction with the accompanying drawings. The invention is capable of other and different embodiments, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Fig. 1 is a flow chart of steps of a 3D medical image segmentation method based on layered perception fusion according to the present invention. As shown in FIG. 1, the invention relates to a 3D medical image segmentation method based on layered perception fusion, which comprises the following steps:
step S1, acquiring a 3D medical image, preprocessing the 3D medical image, and performing slice processing on the preprocessed image to obtain a plurality of slice images after slice processing.
Specifically, step S1 further includes:
Step S100, acquiring a 3D medical image, marking the 3D medical image, and dividing it into a target portion and a background portion.
In a specific embodiment of the invention, after the 3D medical image is obtained, it is marked with marking software and divided into a target portion and a background portion, where the target represents a target region, i.e., an organ or pathological-tissue image region, and the background represents the non-organ portion.
Step S101, after marking is completed, the data of the 3D medical image is checked for correctness.
After marking is completed, the image data is checked for correctness. Because the pixel values assigned by the marking software may not be uniform, the image is normalized so that the background has a pixel value of 0 and the target a pixel value of 1; if there are a plurality of pathological tissues, different pixel values can be used to distinguish them, for example pixel value 1 for tumor tissue and pixel value 2 for the tumor periphery, and so on.
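A minimal sketch of such a correctness check, assuming the hypothetical pixel-value convention above (0 for background, 1, 2, … for tissue classes) and using NumPy:

```python
import numpy as np

def check_labels(mask, expected_values=(0, 1, 2)):
    """Verify that a label volume contains only the expected pixel values."""
    found = np.unique(mask)
    return set(found.tolist()).issubset(set(expected_values))

# Example: a tiny 3D label volume with background (0), tumor (1), periphery (2)
mask = np.zeros((4, 4, 4), dtype=np.uint8)
mask[1:3, 1:3, 1:3] = 1   # tumor tissue
mask[0, 0, 0] = 2         # tumor periphery
print(check_labels(mask))       # True
print(check_labels(mask + 3))   # False: values shifted out of range
```

Running such a check before training catches labeling-tool inconsistencies before they affect the encoding.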
Step S102, segmenting the 3D medical image to obtain three 2D images of the 3D medical image according to H, W, C slices and decomposing the 3D medical image into a plurality of small 3D images.
That is, in the present invention there are two slicing methods for the 3D medical image: the first slices the 3D image into 2D images along each of the H, W and C directions; the second decomposes the 3D image into several small 3D images.
In medical image segmentation, feeding a 3D medical image directly into a convolutional neural network that outputs a 3D result requires a large amount of calculation and runs slowly; moreover, medical image data sets are very small, so directly using 3D images performs poorly. The present invention therefore slices the 3D medical image, i.e., feeds 2D images sliced along H, W and C into 2D networks, to reduce the computational burden. However, a 2D network ignores the correlation between adjacent 2D slices, so the invention forms four 3D segmentation results by passing the 2D images sliced along the three H, W and C directions and the small 3D images decomposed from the 3D image through four convolutional neural networks respectively. Processing in this way both exploits the inter-layer (hierarchical) relation and resists the overfitting caused by small data sets.
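The two decompositions above can be sketched with NumPy; the patch size (32³) and the non-overlapping patch layout are illustrative assumptions, not specified by the description:

```python
import numpy as np

def slice_volume(volume, patch=(32, 32, 32)):
    """Split a 3D volume into 2D slice stacks along each axis plus small 3D patches."""
    h_slices = [volume[i, :, :] for i in range(volume.shape[0])]  # H-direction
    w_slices = [volume[:, j, :] for j in range(volume.shape[1])]  # W-direction
    c_slices = [volume[:, :, k] for k in range(volume.shape[2])]  # C-direction
    ph, pw, pc = patch
    patches = [volume[i:i+ph, j:j+pw, k:k+pc]          # non-overlapping 3D blocks
               for i in range(0, volume.shape[0], ph)
               for j in range(0, volume.shape[1], pw)
               for k in range(0, volume.shape[2], pc)]
    return h_slices, w_slices, c_slices, patches

vol = np.zeros((64, 64, 64), dtype=np.float32)
h, w, c, p = slice_volume(vol)
print(len(h), len(w), len(c), len(p))  # 64 64 64 8
```

The three slice stacks feed the three 2D networks and the patches feed the 3D network, matching the four-branch structure described above.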
Step S103, data enhancement is performed on each sliced image.
In the embodiment of the present invention, the data enhancement mainly comprises rotation, cropping and scaling, which prevent overfitting and increase robustness; that is, rotation, cropping, scaling and similar processing are applied to each sliced image after segmentation.
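A minimal sketch of this augmentation for a single 2D slice; the 90-degree rotation steps, the 90% crop ratio and the nearest-neighbour rescaling are illustrative assumptions:

```python
import numpy as np

def augment(img, rng):
    """Randomly rotate (by 90-degree steps), crop and rescale a 2D slice."""
    img = np.rot90(img, k=rng.integers(0, 4))            # rotation
    h, w = img.shape
    ch, cw = int(h * 0.9), int(w * 0.9)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    img = img[top:top + ch, left:left + cw]              # cropping
    # nearest-neighbour scaling back to the original size
    ys = (np.arange(h) * ch // h).clip(0, ch - 1)
    xs = (np.arange(w) * cw // w).clip(0, cw - 1)
    return img[np.ix_(ys, xs)]                           # scaling

rng = np.random.default_rng(0)
out = augment(np.ones((64, 64), dtype=np.float32), rng)
print(out.shape)  # (64, 64)
```

In practice a library such as torchvision or albumentations would typically supply these transforms; the sketch only shows the three operations named in the text.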
And step S2, performing convolution calculation on each slice image through a convolution neural network semantic segmentation algorithm to obtain a result of semantic segmentation of each slice image.
In a specific embodiment of the present invention, the convolutional neural network extracts the high-level features of each slice image mainly through five dilated convolution structures, all of which form a hierarchy. Specifically, the invention provides RFEUnet (receptive field enhancement Unet), motivated by the fact that medical images require global judgment. The network structure is shown in fig. 2, where 1/n denotes the downsampling factor of the image, C denotes the number of output channels, and ×2 indicates two consecutive convolution operations; downsampling uses Maxpooling, upsampling uses bilinear interpolation, and feature fusion uses channel concatenation.
Compared with the original Unet network structure, the RFEUnet network structure provided by the invention makes the following three main improvements:
1. RFEUnet halves the channel configuration of Unet for faster inference; for example, the channel numbers [64, 128, 256, 512, 1024, 512, 256, 64, 2] become [32, 64, 128, 256, 512, 256, 128, 32, 2]. This configuration reduces the network parameters and improves computational efficiency.
2. A receptive field enhancement module RFEM is built and added to the tail of the encoding structure; it effectively enlarges the perception performance of the model by using Maxpooling, the mish activation function and dilated convolution structures. Specifically, the RFEM extracts high-level features with five hierarchical dilated convolution structures, as shown in fig. 3, where C denotes the number of output channels, the dilation rate denotes the dilation of the dilated convolution, and 1/16 indicates the image is downsampled by a factor of 16. The dilation rates of the dilated convolutions are [1, 3, 6, 12, 18]; the top-left branch is a Maxpooling operation, and each dilated convolution is followed by BN (Batch Normalization) and the mish activation function.
3. The original Unet skip-connection structure has no residual module, so overfitting can occur as the network deepens. Therefore, the residual idea and receptive field enhancement are combined at the bottom layer of Unet to form the RFEM (receptive field enhancement module). As shown in fig. 3, this module effectively enlarges the receptive field of the bottom-layer features during convolution without causing overfitting as the network deepens.
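The parameter saving from improvement 1 can be estimated from the 3×3-convolution parameter count C_in × C_out × 3 × 3. A rough sketch, treating each channel list as a simple chain of convolutions and ignoring biases and the exact layer wiring (which fig. 2 defines):

```python
def conv_params(channels, kernel=3):
    """Approximate parameter count of a chain of 3x3 convolutions over a channel list."""
    return sum(c_in * c_out * kernel * kernel
               for c_in, c_out in zip(channels[:-1], channels[1:]))

unet = [64, 128, 256, 512, 1024, 512, 256, 64, 2]
rfeunet = [32, 64, 128, 256, 512, 256, 128, 32, 2]
ratio = conv_params(rfeunet) / conv_params(unet)
print(ratio)  # ~0.25: halving every channel quarters each C_in * C_out product
```

This matches the roughly 3× parameter reduction (31M → 11M) reported in table 1, with the difference explained by the layers the sketch ignores.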
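The mish activation used in the RFEM is defined as x·tanh(softplus(x)). A minimal NumPy sketch, with the dilation rates of the five branches shown only as data:

```python
import numpy as np

def mish(x):
    """mish(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)."""
    return x * np.tanh(np.log1p(np.exp(x)))

# Dilation rates of the five RFEM branches, per the description above
dilation_rates = [1, 3, 6, 12, 18]

print(mish(0.0))                    # 0.0: mish passes zero through
print(mish(np.array([-5.0, 5.0])))  # near 0 for large negatives, ~x for large positives
```

Unlike ReLU, mish is smooth and lets small negative values pass through attenuated, which is why it is often paired with deeper encoder tails.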
Step S3, fusing the prediction results of the slice images and outputting the final medical image segmentation result.
In the present invention, the slice images are the H-, W- and C-direction slice images and the plurality of small 3D images; they enter four convolutional neural networks to form four 3D segmentation results, and the final result is obtained by averaging the four outputs. That is, the four inputs pass through four RFEUnet networks with different weights and output four results F_A(X), F_B(X), F_C(X), F_D(X). The RFEUnet network structure outputs a probability value for each pixel of each input image; the four output probabilities are averaged, and then the index of the maximum over categories is taken:
F_O(X) = argmax((F_A(X) + F_B(X) + F_C(X) + F_D(X)) / 4)
In brief, the neural network outputs are numbers between 0 and 1 and can be understood as probability values. The output of the invention is four 3D images, so the average is taken per channel. Each pixel has two classes, lesion and non-lesion, and the lesion and non-lesion probabilities are compared to take the index of the maximum: if the lesion probability is larger the pixel is assigned 1, otherwise 0. This is the effect of the argmax function, from which the lesion region is obtained. Multiplying the image by 255 renders the lesion white and the non-lesion black, so the output is the 3D medical image segmentation result, i.e., the lesion region in the 3D image is identified.
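A minimal NumPy sketch of this voting/fusion step, assuming each of the four networks outputs a per-pixel lesion probability map of the same shape (the example maps are made up for illustration):

```python
import numpy as np

def fuse(f_a, f_b, f_c, f_d):
    """Average four per-pixel lesion probability maps, then decide via argmax.

    For two classes, argmax over {non-lesion, lesion} is equivalent to
    comparing the averaged lesion probability with 0.5.
    """
    avg = (f_a + f_b + f_c + f_d) / 4.0
    probs = np.stack([1.0 - avg, avg])            # [non-lesion, lesion] per pixel
    return np.argmax(probs, axis=0).astype(np.uint8)

# Four hypothetical 2x2 probability maps from the four networks
f_a = np.array([[0.9, 0.2], [0.6, 0.1]])
f_b = np.array([[0.8, 0.3], [0.7, 0.2]])
f_c = np.array([[0.7, 0.1], [0.5, 0.1]])
f_d = np.array([[0.9, 0.2], [0.6, 0.2]])
fused = fuse(f_a, f_b, f_c, f_d)
print(fused)        # 1 where the averaged lesion probability exceeds 0.5
print(fused * 255)  # lesion rendered white, non-lesion black
```

Multiplying by 255, as the text describes, turns the binary mask into a displayable image.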
Therefore, the 3D medical image is sliced into three sets of 2D images along H, W and C, and this multidirectional information is fed into the 2D network structure, so that the segmentation exploits directional information that a single 2D model cannot capture. Meanwhile, predicting the 3D medical image as a plurality of small blocks and splicing the 3D predictions exploits the inter-layer relation. The hierarchical relation of the images is thus utilized, and the overfitting caused by small medical data sets is resisted.
Fig. 4 is a system structure diagram of a 3D medical image segmentation apparatus based on layered perception fusion according to the present invention. As shown in fig. 4, the 3D medical image segmentation apparatus based on layered perception fusion of the present invention includes:
The preprocessing module 10 is configured to acquire a 3D medical image, preprocess it, and slice the preprocessed image to obtain a plurality of slice images.
In the present invention, the preprocessing module 10 is specifically configured to:
acquiring a 3D medical image, marking the 3D medical image, and dividing the 3D medical image into a target part and a background part.
In a specific embodiment of the invention, after the 3D medical image is obtained, it is marked with marking software and divided into a target portion and a background portion, where the target represents a target region, i.e., an organ or pathological-tissue image region, and the background represents the non-organ portion.
After the marking is completed, the data of the 3D medical image is checked for correctness.
After marking is completed, the image data is checked for correctness. Because the pixel values assigned by the marking software may not be uniform, the image is normalized so that the background has a pixel value of 0 and the target a pixel value of 1; if there are a plurality of pathological tissues, different pixel values can be used to distinguish them, for example pixel value 1 for tumor tissue and pixel value 2 for the tumor periphery, and so on.
The 3D medical image is segmented: three sets of 2D images are obtained by slicing along the H, W and C directions, and the 3D medical image is decomposed into several small 3D images.
That is, in the present invention there are two slicing methods for the 3D medical image: the first slices the 3D image into 2D images along each of the H, W and C directions; the second decomposes the 3D image into several small 3D images.
In medical image segmentation, feeding a 3D medical image directly into a convolutional neural network that outputs a 3D result requires a large amount of calculation and runs slowly; moreover, medical image data sets are very small, so directly using 3D images performs poorly. The present invention therefore slices the 3D medical image, i.e., feeds 2D images sliced along H, W and C into 2D networks, to reduce the computational burden. However, a 2D network ignores the correlation between adjacent 2D slices, so the invention forms four 3D segmentation results by passing the 2D images sliced along the three H, W and C directions and the small 3D images decomposed from the 3D image through four convolutional neural networks respectively. Processing in this way both exploits the inter-layer (hierarchical) relation and resists the overfitting caused by small data sets.
And performing data enhancement on each sliced image after segmentation.
In the embodiment of the present invention, the data enhancement mainly comprises rotation, cropping and scaling, which prevent overfitting and increase robustness; that is, rotation, cropping, scaling and similar processing are applied to each sliced image after segmentation.
The segmentation module 20 is configured to perform convolution calculation on each slice image through a convolutional neural network semantic segmentation algorithm to obtain the semantic segmentation result of each slice image.
In a specific embodiment of the present invention, the convolutional neural network extracts the high-level features of each slice image mainly through five dilated convolution structures, all of which form a hierarchy. Specifically, the invention provides RFEUnet (receptive field enhancement Unet), motivated by the fact that medical images require global judgment. The network structure is shown in fig. 2, where 1/n denotes the downsampling factor of the image, C denotes the number of output channels, and ×2 indicates two consecutive convolution operations; downsampling uses Maxpooling, upsampling uses bilinear interpolation, and feature fusion uses channel concatenation.
Compared with the original Unet network structure, the RFEUnet network structure provided by the invention makes the following three main improvements:
1. RFEUnet halves the channel configuration of Unet for faster inference; for example, the channel numbers [64, 128, 256, 512, 1024, 512, 256, 64, 2] become [32, 64, 128, 256, 512, 256, 128, 32, 2]. This configuration reduces the network parameters and improves computational efficiency.
2. A receptive field enhancement module RFEM is built and added to the tail of the encoding structure; it effectively enlarges the perception performance of the model by using Maxpooling, the mish activation function and dilated convolution structures. Specifically, the RFEM extracts high-level features with five hierarchical dilated convolution structures, as shown in fig. 3, where C denotes the number of output channels, the dilation rate denotes the dilation of the dilated convolution, and 1/16 indicates the image is downsampled by a factor of 16. The dilation rates of the dilated convolutions are [1, 3, 6, 12, 18]; the top-left branch is a Maxpooling operation, and each dilated convolution is followed by BN (Batch Normalization) and the mish activation function.
3. The original Unet skip-connection structure has no residual module, so overfitting can occur as the network deepens. Therefore, the residual idea and receptive field enhancement are combined at the bottom layer of Unet to form the RFEM (receptive field enhancement module). As shown in fig. 3, this module effectively enlarges the receptive field of the bottom-layer features during convolution without causing overfitting as the network deepens.
And the fusion module 30 is configured to fuse the prediction results of the slice images and output a final medical image segmentation result.
In the present invention, the slice images are the H-, W- and C-direction slice images and the plurality of small 3D images, which enter four convolutional neural networks to form four 3D segmentation results; the fusion module 30 averages the four outputs to obtain the final result. That is, the four inputs pass through four RFEUnet networks with different weights and output four results F_A(X), F_B(X), F_C(X), F_D(X). The RFEUnet network structure outputs a probability value for each pixel of each input image; the four output probabilities are averaged, and then the index of the maximum over categories is taken:
F_O(X) = argmax((F_A(X) + F_B(X) + F_C(X) + F_D(X)) / 4)
In brief, the neural network outputs are numbers between 0 and 1 and can be understood as probability values. The output of the invention is four 3D images, so the average is taken per channel. Each pixel has two classes, lesion and non-lesion, and the lesion and non-lesion probabilities are compared to take the index of the maximum: if the lesion probability is larger the pixel is assigned 1, otherwise 0. This is the effect of the argmax function, from which the lesion region is obtained. Multiplying the image by 255 renders the lesion white and the non-lesion black, so the output is the 3D medical image segmentation result, i.e., the lesion region in the 3D image is identified.
The present invention also provides a computer-readable storage medium for storing program code for performing the 3D medical image segmentation method provided by the above embodiments.
Examples
Fig. 5 is a flowchart of 3D medical image segmentation based on layered perceptual fusion according to an embodiment of the present invention. In an embodiment of the present invention, a 3D medical image segmentation method based on layered perception fusion includes:
step one, data making, slicing and enhancing
Step 1.1, marking the 3D medical image with marking software; the marking is divided into a target portion and a background portion, where the target portion represents a target region, i.e., an organ or pathological-tissue image region, and the background portion represents the non-organ part. If there are a plurality of pathological tissues, they can be distinguished by different pixel values, for example pixel value 1 for tumor tissue and pixel value 2 for the tumor periphery, and so on.
Step 1.2, after marking is completed, checking whether the data is correct. For example, check by program whether the pixel-value distribution of the background and lesion areas is as expected — background: 0, lesion area 1: 1, lesion area 2: 2, and so on. This is a simple pixel-value check so that subsequent encoding is not affected, and is not described further here.
Step 1.3, slicing the 3D medical image. There are two slicing modes, corresponding to the 2D and 3D models: the first slices the 3D image into 2D images along each of the H, W and C directions; the second decomposes the 3D image into several small 3D images.
Step 1.4, performing data enhancement on each sliced image. Data enhancement, mainly rotation, cropping and scaling, is important for small data sets and serves to prevent overfitting and increase robustness.
Step two, according to the fact that medical images require global judgment, an RFEUnet (receptive field enhancement Unet) network structure is provided, and each slice image is input into an RFEUnet network respectively to obtain four semantic segmentation results. The network structure halves the channel configuration of the original Unet and adds the receptive field enhancement module RFEM (which enlarges the perception performance of the network model by using Maxpooling, the mish activation function and a dilated convolution structure), so that the network model has a larger perception range for large-area image segmentation.
Step three, a voting mechanism: the output results of the H-direction, W-direction and C-direction slice prediction network models and the 3D network model are fused to obtain the fusion result of the 3D medical image. The four network models produce four outputs A, B, C and D; each model outputs a probability value for each pixel of each input image, the four output probabilities are averaged, and the index of the maximum over categories is the final image segmentation result:
F_O(X) = argmax((F_A(X) + F_B(X) + F_C(X) + F_D(X)) / 4)
Fig. 6 compares the results of segmenting a 3D medical image with the original Unet network model, the RFEUnet network model, and multi-model fusion in the embodiment of the present invention, and table 1 below compares the prediction capabilities of the Unet network model, the RFEUnet network model, and multi-model fusion. As can be seen from fig. 6 and table 1, the single-model prediction capability of the RFEUnet network model is better than that of the prior-art Unet, and multi-model fusion performs better still because it exploits the direction- and channel-related characteristics.
Table 1. Comparison of the Unet model, the RFEUnet model and multi-model fusion

Network          | Unet  | RFEMUnet | Multi-model fusion
Pixel accuracy   | 90.2% | 98.6%    | 99.8%
Parameter count  | 31M   | 11M      | 40M
Inference speed  | 0.08s | 0.04s    | 0.16s
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Modifications and variations can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present invention. Therefore, the scope of the invention should be determined from the following claims.