Disclosure of Invention
In order to overcome the defects of the prior art, the present invention provides a 3D medical image segmentation method, device and storage medium based on layered perception fusion. The method divides a 3D medical image into three sets of 2D images along the H, W and C directions together with a plurality of small 3D images, and fuses a 2D channel-sequence relation model with a 3D model, thereby solving the problem that a single model is underutilized and predicts inaccurately. A voting mechanism built on the multi-model fusion achieves efficient and accurate 3D medical image segmentation.
In order to achieve the above and other objects, the present invention provides a 3D medical image segmentation method based on layered perception fusion, comprising the following steps:
step S1, acquiring a 3D medical image for preprocessing, and slicing the preprocessed 3D medical image to obtain a plurality of slice images;
step S2, performing convolution calculation on each slice image through a convolutional neural network semantic segmentation algorithm to obtain the semantic segmentation result of each slice image;
step S3, fusing the prediction results of the slice images and outputting the final medical image segmentation result.
Preferably, the step S1 further includes:
step S100, acquiring a 3D medical image, marking the 3D medical image, and dividing it into a target portion and a background portion;
step S101, after marking is completed, checking whether the data of the 3D medical image is correct;
step S102, slicing the 3D medical image along the H, W and C directions to obtain three sets of 2D images, and decomposing the 3D medical image into a plurality of small 3D images;
step S103, data enhancement is performed on each sliced image.
Preferably, the target represents a target region, i.e., an organ or pathological-tissue image region, and the background represents the non-organ portion.
Preferably, after the 3D medical image is marked, the background is assigned a pixel value of 0 and the target a pixel value of 1; if there are a plurality of pathological tissues, different pixel values are used to distinguish them.
Preferably, in step S2, the convolutional neural network is an RFEUnet network structure including a receptive field enhancement module RFEM.
Preferably, the RFEUnet network structure halves the channel configuration of the existing Unet network, adds the receptive field enhancement module RFEM to the tail of the encoding structure of the Unet network, and enlarges the perception performance of the network model through the RFEM by using Maxpooling, the mish activation function and a dilated (atrous) convolution structure.
Preferably, in step S3, the slice images in the H, W and C directions and the small 3D images enter four RFEUnet network structures to form four segmentation results, and the four output results are averaged to obtain the final image segmentation result.
In order to achieve the above object, the present invention further provides a 3D medical image segmentation apparatus based on layered perception fusion, including:
the preprocessing module is used for acquiring a 3D medical image for preprocessing, and slicing the preprocessed 3D medical image to acquire a plurality of slice images;
the segmentation module is used for performing convolution calculation on each slice image through a convolutional neural network semantic segmentation algorithm to obtain the semantic segmentation result of each slice image;
and the fusion module is used for fusing the prediction results of all the slice images and outputting the final medical image segmentation result.
Preferably, the segmentation module adopts, as the convolutional neural network, an RFEUnet network structure including a receptive field enhancement module RFEM. The RFEUnet network structure halves the channel configuration of the existing Unet network, adds the receptive field enhancement module RFEM to the tail of the encoding structure of the Unet network, and enlarges the perception performance of the network model through the RFEM by using Maxpooling, the mish activation function and a dilated convolution structure.
To achieve the above object, the present invention also provides a computer-readable storage medium for storing program code for executing the above 3D medical image segmentation method.
Compared with the prior art, the 3D medical image segmentation method, device and storage medium based on layered perception fusion segment the 3D medical image into three sets of 2D images along the H, W and C directions and a plurality of small 3D images, solve the problems that a single model is underutilized and predicts inaccurately by fusing a 2D channel-sequence relation model with the 3D model, and establish a voting mechanism based on the multi-model fusion, thereby achieving efficient and accurate 3D medical image segmentation.
Detailed Description
Other advantages and capabilities of the present invention will be readily apparent to those skilled in the art from this disclosure, which describes embodiments of the present invention in conjunction with the accompanying drawings. The invention is capable of other and different embodiments, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Fig. 1 is a flow chart of steps of a 3D medical image segmentation method based on layered perception fusion according to the present invention. As shown in FIG. 1, the invention relates to a 3D medical image segmentation method based on layered perception fusion, which comprises the following steps:
step S1, acquiring a 3D medical image, preprocessing the 3D medical image, and performing slice processing on the preprocessed image to obtain a plurality of slice images after slice processing.
Specifically, step S1 further includes:
Step S100, acquiring a 3D medical image, marking the 3D medical image, and dividing it into a target portion and a background portion.
In a specific embodiment of the invention, after the 3D medical image is obtained, it is marked with marking software and divided into a target portion and a background portion, where the target represents a target region, i.e., an organ or pathological-tissue image region, and the background represents the non-organ portion.
Step S101, after marking is completed, the data of the 3D medical image is checked for correctness.
After marking is completed, the image data is checked for correctness. Because the pixel values assigned by the marking software may not be uniform, the image is normalized so that the background has a pixel value of 0 and the target a pixel value of 1; if there are a plurality of pathological tissues, different pixel values can be used to distinguish them, for example pixel value 1 for tumor tissue and pixel value 2 for the tumor periphery, and so on.
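A minimal sketch of such a correctness check, assuming the hypothetical pixel-value convention above (0 for background, 1, 2, … for tissue classes) and using NumPy:

```python
import numpy as np

def check_labels(mask, expected_values=(0, 1, 2)):
    """Verify that a label volume contains only the expected pixel values."""
    found = np.unique(mask)
    return set(found.tolist()).issubset(set(expected_values))

# Example: a tiny 3D label volume with background (0), tumor (1), periphery (2)
mask = np.zeros((4, 4, 4), dtype=np.uint8)
mask[1:3, 1:3, 1:3] = 1   # tumor tissue
mask[0, 0, 0] = 2         # tumor periphery
print(check_labels(mask))       # True
print(check_labels(mask + 3))   # False: values shifted out of range
```

Running such a check before training catches labeling-tool inconsistencies before they affect the encoding.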
Step S102, segmenting the 3D medical image to obtain three 2D images of the 3D medical image according to H, W, C slices and decomposing the 3D medical image into a plurality of small 3D images.
That is, in the present invention there are two slicing methods for the 3D medical image: the first slices the 3D image into 2D images along each of the H, W and C directions; the second decomposes the 3D image into several small 3D images.
In medical image segmentation, feeding a 3D medical image directly into a convolutional neural network that outputs a 3D result requires a large amount of calculation and runs slowly; moreover, medical image data sets are very small, so directly using 3D images performs poorly. The present invention therefore slices the 3D medical image, i.e., feeds 2D images sliced along H, W and C into 2D networks, to reduce the computational burden. However, a 2D network ignores the correlation between adjacent 2D slices, so the invention forms four 3D segmentation results by passing the 2D images sliced along the three H, W and C directions and the small 3D images decomposed from the 3D image through four convolutional neural networks respectively. Processing in this way both exploits the inter-layer (hierarchical) relation and resists the overfitting caused by small data sets.
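The two decompositions above can be sketched with NumPy; the patch size (32³) and the non-overlapping patch layout are illustrative assumptions, not specified by the description:

```python
import numpy as np

def slice_volume(volume, patch=(32, 32, 32)):
    """Split a 3D volume into 2D slice stacks along each axis plus small 3D patches."""
    h_slices = [volume[i, :, :] for i in range(volume.shape[0])]  # H-direction
    w_slices = [volume[:, j, :] for j in range(volume.shape[1])]  # W-direction
    c_slices = [volume[:, :, k] for k in range(volume.shape[2])]  # C-direction
    ph, pw, pc = patch
    patches = [volume[i:i+ph, j:j+pw, k:k+pc]          # non-overlapping 3D blocks
               for i in range(0, volume.shape[0], ph)
               for j in range(0, volume.shape[1], pw)
               for k in range(0, volume.shape[2], pc)]
    return h_slices, w_slices, c_slices, patches

vol = np.zeros((64, 64, 64), dtype=np.float32)
h, w, c, p = slice_volume(vol)
print(len(h), len(w), len(c), len(p))  # 64 64 64 8
```

The three slice stacks feed the three 2D networks and the patches feed the 3D network, matching the four-branch structure described above.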
Step S103, data enhancement is performed on each sliced image.
In the embodiment of the present invention, the data enhancement mainly comprises rotation, cropping and scaling, which prevent overfitting and increase robustness; that is, rotation, cropping, scaling and similar processing are applied to each sliced image after segmentation.
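A minimal sketch of this augmentation for a single 2D slice; the 90-degree rotation steps, the 90% crop ratio and the nearest-neighbour rescaling are illustrative assumptions:

```python
import numpy as np

def augment(img, rng):
    """Randomly rotate (by 90-degree steps), crop and rescale a 2D slice."""
    img = np.rot90(img, k=rng.integers(0, 4))            # rotation
    h, w = img.shape
    ch, cw = int(h * 0.9), int(w * 0.9)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    img = img[top:top + ch, left:left + cw]              # cropping
    # nearest-neighbour scaling back to the original size
    ys = (np.arange(h) * ch // h).clip(0, ch - 1)
    xs = (np.arange(w) * cw // w).clip(0, cw - 1)
    return img[np.ix_(ys, xs)]                           # scaling

rng = np.random.default_rng(0)
out = augment(np.ones((64, 64), dtype=np.float32), rng)
print(out.shape)  # (64, 64)
```

In practice a library such as torchvision or albumentations would typically supply these transforms; the sketch only shows the three operations named in the text.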
And step S2, performing convolution calculation on each slice image through a convolution neural network semantic segmentation algorithm to obtain a result of semantic segmentation of each slice image.
In a specific embodiment of the present invention, the convolutional neural network extracts the high-level features of each slice image mainly through five dilated convolution structures, all of which form a hierarchy. Specifically, the invention provides RFEUnet (receptive field enhancement Unet), motivated by the fact that medical images require global judgment. The network structure is shown in fig. 2, where 1/n denotes the downsampling factor of the image, C denotes the number of output channels, and ×2 indicates two consecutive convolution operations; downsampling uses Maxpooling, upsampling uses bilinear interpolation, and feature fusion uses channel concatenation.
Compared with the original Unet network structure, the RFEUnet network structure provided by the invention makes the following three main improvements:
1. RFEUnet halves the channel configuration of Unet for faster inference; for example, the channel numbers [64, 128, 256, 512, 1024, 512, 256, 64, 2] become [32, 64, 128, 256, 512, 256, 128, 32, 2]. This configuration reduces the network parameters and improves computational efficiency.
2. A receptive field enhancement module RFEM is built and added to the tail of the encoding structure; it effectively enlarges the perception performance of the model by using Maxpooling, the mish activation function and dilated convolution structures. Specifically, the RFEM extracts high-level features with five hierarchical dilated convolution structures, as shown in fig. 3, where C denotes the number of output channels, the dilation rate denotes the dilation of the dilated convolution, and 1/16 indicates the image is downsampled by a factor of 16. The dilation rates of the dilated convolutions are [1, 3, 6, 12, 18]; the top-left branch is a Maxpooling operation, and each dilated convolution is followed by BN (Batch Normalization) and the mish activation function.
3. The original Unet skip-connection structure has no residual module, so overfitting can occur as the network deepens. Therefore, the residual idea and receptive field enhancement are combined at the bottom layer of Unet to form the RFEM (receptive field enhancement module). As shown in fig. 3, this module effectively enlarges the receptive field of the bottom-layer features during convolution without causing overfitting as the network deepens.
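The parameter saving from improvement 1 can be estimated from the 3×3-convolution parameter count C_in × C_out × 3 × 3. A rough sketch, treating each channel list as a simple chain of convolutions and ignoring biases and the exact layer wiring (which fig. 2 defines):

```python
def conv_params(channels, kernel=3):
    """Approximate parameter count of a chain of 3x3 convolutions over a channel list."""
    return sum(c_in * c_out * kernel * kernel
               for c_in, c_out in zip(channels[:-1], channels[1:]))

unet = [64, 128, 256, 512, 1024, 512, 256, 64, 2]
rfeunet = [32, 64, 128, 256, 512, 256, 128, 32, 2]
ratio = conv_params(rfeunet) / conv_params(unet)
print(ratio)  # ~0.25: halving every channel quarters each C_in * C_out product
```

This matches the roughly 3× parameter reduction (31M → 11M) reported in table 1, with the difference explained by the layers the sketch ignores.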
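The mish activation used in the RFEM is defined as x·tanh(softplus(x)). A minimal NumPy sketch, with the dilation rates of the five branches shown only as data:

```python
import numpy as np

def mish(x):
    """mish(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)."""
    return x * np.tanh(np.log1p(np.exp(x)))

# Dilation rates of the five RFEM branches, per the description above
dilation_rates = [1, 3, 6, 12, 18]

print(mish(0.0))                    # 0.0: mish passes zero through
print(mish(np.array([-5.0, 5.0])))  # near 0 for large negatives, ~x for large positives
```

Unlike ReLU, mish is smooth and lets small negative values pass through attenuated, which is why it is often paired with deeper encoder tails.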
Step S3, fusing the prediction results of the slice images and outputting the final medical image segmentation result.
In the present invention, the slice images are the H-, W- and C-direction slice images and the plurality of small 3D images; they enter four convolutional neural networks to form four 3D segmentation results, and the final result is obtained by averaging the four outputs. That is, the four inputs pass through four RFEUnet networks with different weights and output four results F_A(X), F_B(X), F_C(X), F_D(X). The RFEUnet network structure outputs a probability value for each pixel of each input image; the four output probabilities are averaged, and then the index of the maximum over categories is taken:
F_O(X) = argmax((F_A(X) + F_B(X) + F_C(X) + F_D(X)) / 4)
In brief, the neural network outputs are numbers between 0 and 1 and can be understood as probability values. The output of the invention is four 3D images, so the average is taken per channel. Each pixel has two classes, lesion and non-lesion, and the lesion and non-lesion probabilities are compared to take the index of the maximum: if the lesion probability is larger the pixel is assigned 1, otherwise 0. This is the effect of the argmax function, from which the lesion region is obtained. Multiplying the image by 255 renders the lesion white and the non-lesion black, so the output is the 3D medical image segmentation result, i.e., the lesion region in the 3D image is identified.
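A minimal NumPy sketch of this voting/fusion step, assuming each of the four networks outputs a per-pixel lesion probability map of the same shape (the example maps are made up for illustration):

```python
import numpy as np

def fuse(f_a, f_b, f_c, f_d):
    """Average four per-pixel lesion probability maps, then decide via argmax.

    For two classes, argmax over {non-lesion, lesion} is equivalent to
    comparing the averaged lesion probability with 0.5.
    """
    avg = (f_a + f_b + f_c + f_d) / 4.0
    probs = np.stack([1.0 - avg, avg])            # [non-lesion, lesion] per pixel
    return np.argmax(probs, axis=0).astype(np.uint8)

# Four hypothetical 2x2 probability maps from the four networks
f_a = np.array([[0.9, 0.2], [0.6, 0.1]])
f_b = np.array([[0.8, 0.3], [0.7, 0.2]])
f_c = np.array([[0.7, 0.1], [0.5, 0.1]])
f_d = np.array([[0.9, 0.2], [0.6, 0.2]])
fused = fuse(f_a, f_b, f_c, f_d)
print(fused)        # 1 where the averaged lesion probability exceeds 0.5
print(fused * 255)  # lesion rendered white, non-lesion black
```

Multiplying by 255, as the text describes, turns the binary mask into a displayable image.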
Therefore, the 3D medical image is sliced into three sets of 2D images along H, W and C, and this multidirectional information is fed into the 2D network structure, so that the segmentation exploits directional information that a single 2D model cannot capture. Meanwhile, predicting the 3D medical image as a plurality of small blocks and splicing the 3D predictions exploits the inter-layer relation. The hierarchical relation of the images is thus utilized, and the overfitting caused by small medical data sets is resisted.
Fig. 4 is a system structure diagram of a 3D medical image segmentation apparatus based on layered perception fusion according to the present invention. As shown in fig. 4, the 3D medical image segmentation apparatus based on layered perception fusion of the present invention includes:
The preprocessing module 10 is configured to acquire a 3D medical image, preprocess it, and slice the preprocessed image to obtain a plurality of slice images.
In the present invention, the preprocessing module 10 is specifically configured to:
acquiring a 3D medical image, marking the 3D medical image, and dividing the 3D medical image into a target part and a background part.
In a specific embodiment of the invention, after the 3D medical image is obtained, it is marked with marking software and divided into a target portion and a background portion, where the target represents a target region, i.e., an organ or pathological-tissue image region, and the background represents the non-organ portion.
After the marking is completed, the data of the 3D medical image is checked for correctness.
After marking is completed, the image data is checked for correctness. Because the pixel values assigned by the marking software may not be uniform, the image is normalized so that the background has a pixel value of 0 and the target a pixel value of 1; if there are a plurality of pathological tissues, different pixel values can be used to distinguish them, for example pixel value 1 for tumor tissue and pixel value 2 for the tumor periphery, and so on.
The 3D medical image is segmented: three sets of 2D images are obtained by slicing along the H, W and C directions, and the 3D medical image is decomposed into several small 3D images.
That is, in the present invention there are two slicing methods for the 3D medical image: the first slices the 3D image into 2D images along each of the H, W and C directions; the second decomposes the 3D image into several small 3D images.
In medical image segmentation, feeding a 3D medical image directly into a convolutional neural network that outputs a 3D result requires a large amount of calculation and runs slowly; moreover, medical image data sets are very small, so directly using 3D images performs poorly. The present invention therefore slices the 3D medical image, i.e., feeds 2D images sliced along H, W and C into 2D networks, to reduce the computational burden. However, a 2D network ignores the correlation between adjacent 2D slices, so the invention forms four 3D segmentation results by passing the 2D images sliced along the three H, W and C directions and the small 3D images decomposed from the 3D image through four convolutional neural networks respectively. Processing in this way both exploits the inter-layer (hierarchical) relation and resists the overfitting caused by small data sets.
And performing data enhancement on each sliced image after segmentation.
In the embodiment of the present invention, the data enhancement mainly comprises rotation, cropping and scaling, which prevent overfitting and increase robustness; that is, rotation, cropping, scaling and similar processing are applied to each sliced image after segmentation.
The segmentation module 20 is configured to perform convolution calculation on each slice image through a convolutional neural network semantic segmentation algorithm to obtain the semantic segmentation result of each slice image.
In a specific embodiment of the present invention, the convolutional neural network extracts the high-level features of each slice image mainly through five dilated convolution structures, all of which form a hierarchy. Specifically, the invention provides RFEUnet (receptive field enhancement Unet), motivated by the fact that medical images require global judgment. The network structure is shown in fig. 2, where 1/n denotes the downsampling factor of the image, C denotes the number of output channels, and ×2 indicates two consecutive convolution operations; downsampling uses Maxpooling, upsampling uses bilinear interpolation, and feature fusion uses channel concatenation.
Compared with the original Unet network structure, the RFEUnet network structure provided by the invention makes the following three main improvements:
1. RFEUnet halves the channel configuration of Unet for faster inference; for example, the channel numbers [64, 128, 256, 512, 1024, 512, 256, 64, 2] become [32, 64, 128, 256, 512, 256, 128, 32, 2]. This configuration reduces the network parameters and improves computational efficiency.
2. A receptive field enhancement module RFEM is built and added to the tail of the encoding structure; it effectively enlarges the perception performance of the model by using Maxpooling, the mish activation function and dilated convolution structures. Specifically, the RFEM extracts high-level features with five hierarchical dilated convolution structures, as shown in fig. 3, where C denotes the number of output channels, the dilation rate denotes the dilation of the dilated convolution, and 1/16 indicates the image is downsampled by a factor of 16. The dilation rates of the dilated convolutions are [1, 3, 6, 12, 18]; the top-left branch is a Maxpooling operation, and each dilated convolution is followed by BN (Batch Normalization) and the mish activation function.
3. The original Unet skip-connection structure has no residual module, so overfitting can occur as the network deepens. Therefore, the residual idea and receptive field enhancement are combined at the bottom layer of Unet to form the RFEM (receptive field enhancement module). As shown in fig. 3, this module effectively enlarges the receptive field of the bottom-layer features during convolution without causing overfitting as the network deepens.
And the fusion module 30 is configured to fuse the prediction results of the slice images and output a final medical image segmentation result.
In the present invention, the slice images are the H-, W- and C-direction slice images and the plurality of small 3D images, which enter four convolutional neural networks to form four 3D segmentation results; the fusion module 30 averages the four outputs to obtain the final result. That is, the four inputs pass through four RFEUnet networks with different weights and output four results F_A(X), F_B(X), F_C(X), F_D(X). The RFEUnet network structure outputs a probability value for each pixel of each input image; the four output probabilities are averaged, and then the index of the maximum over categories is taken:
F_O(X) = argmax((F_A(X) + F_B(X) + F_C(X) + F_D(X)) / 4)
In brief, the neural network outputs are numbers between 0 and 1 and can be understood as probability values. The output of the invention is four 3D images, so the average is taken per channel. Each pixel has two classes, lesion and non-lesion, and the lesion and non-lesion probabilities are compared to take the index of the maximum: if the lesion probability is larger the pixel is assigned 1, otherwise 0. This is the effect of the argmax function, from which the lesion region is obtained. Multiplying the image by 255 renders the lesion white and the non-lesion black, so the output is the 3D medical image segmentation result, i.e., the lesion region in the 3D image is identified.
The present invention also provides a computer-readable storage medium for storing program code for performing the 3D medical image segmentation method provided by the above embodiments.
Examples
Fig. 5 is a flowchart of 3D medical image segmentation based on layered perceptual fusion according to an embodiment of the present invention. In an embodiment of the present invention, a 3D medical image segmentation method based on layered perception fusion includes:
step one, data making, slicing and enhancing
Step 1.1, marking the 3D medical image with marking software; the marking is divided into a target portion and a background portion, where the target portion represents a target region, i.e., an organ or pathological-tissue image region, and the background portion represents the non-organ part. If there are a plurality of pathological tissues, they can be distinguished by different pixel values, for example pixel value 1 for tumor tissue and pixel value 2 for the tumor periphery, and so on.
Step 1.2, after marking is completed, checking whether the data is correct. For example, check by program whether the pixel-value distribution of the background and lesion areas is as expected — background: 0, lesion area 1: 1, lesion area 2: 2, and so on. This is a simple pixel-value check so that subsequent encoding is not affected, and is not described further here.
Step 1.3, slicing the 3D medical image. There are two slicing modes, corresponding to the 2D and 3D models: the first slices the 3D image into 2D images along each of the H, W and C directions; the second decomposes the 3D image into several small 3D images.
Step 1.4, performing data enhancement on each sliced image. Data enhancement, mainly rotation, cropping and scaling, is important for small data sets and serves to prevent overfitting and increase robustness.
Step two, according to the fact that medical images require global judgment, an RFEUnet (receptive field enhancement Unet) network structure is provided, and each slice image is input into an RFEUnet network respectively to obtain four semantic segmentation results. The network structure halves the channel configuration of the original Unet and adds the receptive field enhancement module RFEM (which enlarges the perception performance of the network model by using Maxpooling, the mish activation function and a dilated convolution structure), so that the network model has a larger perception range for large-area image segmentation.
Step three, a voting mechanism: the output results of the H-direction, W-direction and C-direction slice prediction network models and the 3D network model are fused to obtain the fusion result of the 3D medical image. The four network models produce four outputs A, B, C and D; each model outputs a probability value for each pixel of each input image, the four output probabilities are averaged, and the index of the maximum over categories is the final image segmentation result:
F_O(X) = argmax((F_A(X) + F_B(X) + F_C(X) + F_D(X)) / 4)
Fig. 6 compares the results of segmenting a 3D medical image with the original Unet network model, the RFEUnet network model, and multi-model fusion in the embodiment of the present invention, and table 1 below compares the prediction capabilities of the Unet network model, the RFEUnet network model, and multi-model fusion. As can be seen from fig. 6 and table 1, the single-model prediction capability of the RFEUnet network model is better than that of the prior-art Unet, and multi-model fusion performs better still because it exploits the direction- and channel-related characteristics.
Table 1. Comparison of the Unet model, the RFEUnet model and multi-model fusion

Network          | Unet  | RFEMUnet | Multi-model fusion
Pixel accuracy   | 90.2% | 98.6%    | 99.8%
Parameter count  | 31M   | 11M      | 40M
Inference speed  | 0.08s | 0.04s    | 0.16s
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Modifications and variations can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present invention. Therefore, the scope of the invention should be determined from the following claims.