CN112967300A - Three-dimensional ultrasonic thyroid segmentation method and device based on multi-scale fusion network - Google Patents
- Publication number
- CN112967300A (application number CN202110202637.9A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- layer
- network
- map
- distance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/12 — Edge-based segmentation
- G06T7/181 — Segmentation; edge detection involving edge growing or edge linking
- G06N3/045 — Neural network architectures; combinations of networks
- G06N3/08 — Neural network learning methods
- G06T3/4007 — Interpolation-based scaling (e.g. bilinear interpolation)
- G06T3/4038 — Scaling for image mosaicing
- G06T5/50 — Image enhancement or restoration using more than one image
- G06T2200/32 — Indexing scheme involving image mosaicing
- G06T2207/10136 — 3D ultrasound image
- G06T2207/20076 — Probabilistic image processing
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20221 — Image fusion; image merging
Abstract
The three-dimensional ultrasonic thyroid segmentation method and device based on a multi-scale fusion network can accurately segment the edges of heterogeneous organ structures, keep the final segmentation result from depending on the intermediate result of distance-map prediction, exploit multi-level semantic information while keeping the computational cost small, and accurately segment the three-dimensional thyroid with very good results. The method comprises the following steps: (1) training the network to predict a boundary distance map so as to segment the edges of heterogeneous organ structures; (2) adding constraints to guide training in a deep-supervision manner, so that the final segmentation result does not depend on the intermediate result of distance-map prediction; (3) using a CBAM attention module at multiple scales to focus on edge distance information; (4) fusing the probability maps of all levels with a dense fusion module based on dilated (atrous) convolution, progressively refining the output probability maps to produce the final result for each level.
Description
Technical Field
The invention relates to the technical field of medical image processing, in particular to a three-dimensional ultrasonic thyroid gland segmentation method based on a multi-scale fusion network and a three-dimensional ultrasonic thyroid gland segmentation device based on the multi-scale fusion network.
Background
Ultrasound imaging relies on the energy of echoes from sound penetrating tissue, which is partially attenuated, absorbed, and reflected according to tissue properties. The returned signals are collected by an ultrasonic probe and rendered into an image. These imaging characteristics produce random speckle noise, low contrast, and blurred organ edges in the image. The anisotropy of tissue in ultrasound images and the differing echogenicity of adjacent tissues can cause the same tissue to present several different appearances. In addition, ultrasound imaging is often used for real-time clinical examination, and three-dimensional ultrasound volume data are usually acquired by sliding a handheld two-dimensional probe and reconstructing the resulting frames, which introduces inter-frame displacement and blurring. For these reasons, organ segmentation in three-dimensional ultrasound is a challenging problem.
Many two-dimensional ultrasound segmentation methods exist. Some are region-based active-contour models that require an initialized mask. Some use fuzzy C-means clustering, histogram clustering, region growing, or random walks. Some use active contours without edges, region-based active contours, or distance-regularized level sets. Others use classical algorithms such as thresholding, region splitting and merging, watershed, or graph cuts. There are also multi-organ segmentation methods that exploit speckle-correlated pixels and image artifacts. All of these rely on hand-crafted features and are mostly semi-automatic.
Many methods have also been developed for segmenting three-dimensional thyroid images. Some use a 3D spring-deformation model to segment the thyroid cartilage, but on CT images. Some use a geodesic active-contour model to obtain the thyroid contour in a three-dimensional image, but remain semi-automatic. Some apply radial basis functions directly to blocks of ultrasound data. Others compare level sets, graph cuts, and pixel classifiers and show the effectiveness of learning-based methods, segment the volume data frame by frame, or attempt direct thyroid segmentation with a 3D U-Net. Some perform frame-by-frame three-dimensional thyroid segmentation with a dedicated sub-network.
Recently, fully convolutional neural networks have increasingly been applied to ultrasound image segmentation, for example thyroid segmentation with a feed-forward neural network. IVUS-Net uses a multi-path, multi-scale convolution-kernel segmentation network for two-dimensional intravascular ultrasound. Some deeply supervised DeepLab-based networks using boundary distance maps have been applied to two-dimensional ultrasound kidney segmentation. Some deep-supervision methods with multi-channel fusion improve the accuracy of the segmentation boundary of the prostate in ultrasound. These methods are mainly used for two-dimensional segmentation, while three-dimensional segmentation networks are usually applied to images acquired directly by a three-dimensional ultrasound probe.
For three-dimensional ultrasound volumes reconstructed from two-dimensional acquisition, the prior art includes many attempts at frame-by-frame segmentation as well as direct three-dimensional segmentation. It has been noted that three-dimensional segmentation networks are difficult to train and that two-dimensional networks focus better on organ edges; experiments likewise show that segmenting with a two-dimensional network and then reconstructing performs better than a three-dimensional segmentation network. However, previous methods were only applied to healthy thyroid data, whereas nodules in clinical thyroid images often contrast strongly with the gland and are difficult to segment as part of the whole; when a nodule lies at the organ edge, it degrades edge segmentation.
Many approaches exist for improving edge-segmentation accuracy. Some encoder-decoder structures adopt a mesh of paths to better exploit multi-layer features, but in practice training is prone to instability. Some attention modules use two branches to attend simultaneously to position and channel information, but the computation becomes enormous on higher-resolution feature maps. Some use high-level features of adjacent levels to guide feature fusion, but do not exploit information from more scales. Some employ a multi-scale channel attention module but do not use positional feature information effectively, learning weights only over the channels at each position of the feature map and not over the positions themselves.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a three-dimensional ultrasonic thyroid segmentation method based on a multi-scale fusion network that can accurately segment the edges of heterogeneous organ structures, keeps the final segmentation result from depending on the intermediate result of distance-map prediction, exploits multi-level semantic information while keeping the computational cost small, and can accurately segment the three-dimensional thyroid with very good results.
The technical scheme of the invention is as follows. The three-dimensional ultrasonic thyroid segmentation method based on the multi-scale fusion network comprises the following steps:
(1) training the network to predict a boundary distance map so as to segment the edges of heterogeneous organ structures;
(2) adding constraints to guide training in a deep-supervision manner, so that the final segmentation result does not depend on the intermediate result of distance-map prediction;
(3) using a CBAM attention module at multiple scales to focus on edge distance information;
(4) fusing the probability maps of all levels with a dense fusion module based on dilated convolution, progressively refining the output probability maps to produce the final result for each level.
The invention trains the network to predict the boundary distance map and can accurately segment the edges of heterogeneous organ structures. Unlike cascaded designs, the method adds constraints to guide training in a deep-supervision manner, so that the final segmentation result does not depend on the intermediate result of distance-map prediction. To exploit multi-level semantic information while keeping the computational cost small, the CBAM attention module is used at multiple scales to better focus on edge distance information. Finally, a dense fusion module with dilated convolution fuses the probability maps of all levels and progressively refines the output probability maps to produce the final result for each level. The method can therefore accurately segment the edges of heterogeneous organ structures, avoid dependence of the final result on the intermediate distance-map prediction, exploit multi-level semantic information at a small computational cost, and accurately segment the three-dimensional thyroid with good results.
Also provided is a three-dimensional ultrasonic thyroid segmentation device based on a multi-scale fusion network, comprising:
a segmentation module configured to train the network to predict a boundary distance map in order to segment the edges of heterogeneous organ structures;
a boundary distance map module configured to add constraints guiding training in a deep-supervision manner, so that the final segmentation result does not depend on intermediate results of distance-map prediction;
an attention module configured to focus on edge distance information using the CBAM attention module at multiple scales;
and a dilated dense fusion module configured to fuse the probability maps of all levels with dilated convolution, progressively refining the output probability maps to produce the final result for each level.
Drawings
Fig. 1 is an example of generating a potential distance map from a thyroid segmentation mask: (a) original image, (b) thyroid mask, (c) normalized distance map.
Fig. 2 is a flowchart of a multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to the present invention.
Detailed Description
As shown in fig. 2, the three-dimensional ultrasonic thyroid segmentation method based on the multi-scale fusion network comprises the following steps:
(1) training the network to predict a boundary distance map so as to segment the edges of heterogeneous organ structures;
(2) adding constraints to guide training in a deep-supervision manner, so that the final segmentation result does not depend on the intermediate result of distance-map prediction;
(3) using a CBAM attention module at multiple scales to focus on edge distance information;
(4) fusing the probability maps of all levels with a dense fusion module based on dilated convolution, progressively refining the output probability maps to produce the final result for each level.
The invention trains the network to predict the boundary distance map and can accurately segment the edges of heterogeneous organ structures. Unlike cascaded designs, the method adds constraints to guide training in a deep-supervision manner, so that the final segmentation result does not depend on the intermediate result of distance-map prediction. To exploit multi-level semantic information while keeping the computational cost small, the CBAM attention module is used at multiple scales to better focus on edge distance information. Finally, a dense fusion module with dilated convolution fuses the probability maps of all levels and progressively refines the output probability maps to produce the final result for each level. The method can therefore accurately segment the edges of heterogeneous organ structures, avoid dependence of the final result on the intermediate distance-map prediction, exploit multi-level semantic information at a small computational cost, and accurately segment the three-dimensional thyroid with good results.
Preferably, in step (1), each stage of the encoder uses the same number of convolution blocks, and the max-pooling layer is replaced by a strided convolution with stride 2; each convolution layer is built with the GN-PReLU-Conv sequence, i.e. with the normalization layer placed first; all convolution layers use grouped convolution with 4 groups; dilated convolution with dilation rate 2 is used in the third and fourth encoder stages to enlarge the receptive field and obtain more instructive semantic information.
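One such encoder unit can be sketched in PyTorch as follows (a minimal illustration, assuming 2D convolutions; the channel counts, input size, and number of GN groups are illustrative choices, not taken from the patent):

```python
import torch
import torch.nn as nn

class GNConvBlock(nn.Module):
    """GN-PReLU-Conv unit with grouped convolution (pre-normalization form)."""
    def __init__(self, in_ch, out_ch, stride=1, dilation=1, groups=4, gn_groups=4):
        super().__init__()
        self.norm = nn.GroupNorm(gn_groups, in_ch)   # GN rather than BN
        self.act = nn.PReLU(in_ch)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride,
                              padding=dilation, dilation=dilation, groups=groups)

    def forward(self, x):
        return self.conv(self.act(self.norm(x)))

# Stride-2 convolution replaces max pooling; dilation rate 2 mimics stages 3-4.
down = GNConvBlock(8, 16, stride=2)
dil = GNConvBlock(16, 16, dilation=2)
y = dil(down(torch.randn(1, 8, 64, 64)))   # halved resolution, enlarged field
```

The stride-2 convolution halves the spatial resolution like pooling would, but with learnable parameters, as the text motivates.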
Preferably, in step (2), the feature maps of the later levels are resized to the feature-map size of the first stage by bilinear interpolation, and the edge distance map computed from the mask is used as a constraint for deep supervision.
Preferably, in step (2), given the thyroid mask and the edge mask derived from it, for each pixel P_i the distance D_i to the thyroid edge is computed, and the normalized distance map d is obtained by formula (1):
d(P_i) = exp(−λ D_i)   (1)
where D_i = min_{b_j ∈ b} dist(P_i, b_j) is the distance from pixel P_i to the nearest boundary pixel in b = {b_j}_{j∈J}, and λ is a parameter controlling the normalization effect.
the normalized distance map deep supervised loss function is of the form:
Preferably, in step (2), λ is set to 0.01.
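For illustration, formula (1) can be computed from a binary mask with SciPy's Euclidean distance transform. This is a sketch under assumptions: the patent does not specify how the edge mask is extracted, so the erosion-based boundary ring below is a hypothetical choice.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def normalized_distance_map(mask, lam=0.01):
    """d(P_i) = exp(-lam * D_i), with D_i the distance to the nearest edge pixel."""
    edge = mask & ~binary_erosion(mask)   # boundary ring of the mask (assumed)
    D = distance_transform_edt(~edge)     # distance of every pixel to the edge
    return np.exp(-lam * D)

mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True                 # toy stand-in for a thyroid mask
d = normalized_distance_map(mask)         # d == 1 exactly on the boundary
```

With λ = 0.01 the map decays gently away from the edge, matching the smooth gradient visible in figure 1(c).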
Preferably, in step (3), the per-level feature maps slf of the encoder are brought to a common resolution by bilinear interpolation, using the resolution of level 2 as the reference; after concatenation, a multi-level convolution feature mlf is obtained by a convolution operation; slf and mlf are concatenated and fed into an attention module, which learns information-fusion weights guided by the multi-scale features; the weights are applied to mlf to extract the information needed to refine that level's slf, which is then combined with slf to produce the output feature map.
Preferably, in step (3), the channel attention weight of the attention module is
M_c(F) = σ(W_1 W_0(AvgPool(F)) + W_1 W_0(MaxPool(F)))   (3)
where σ denotes the sigmoid function, AvgPool and MaxPool are average-pooling and max-pooling operations respectively, and W_0 ∈ R^{C/r×C}, W_1 ∈ R^{C×C/r} are the two convolutions for channel compression and recovery, each followed by a PReLU activation function;
spatial attention weight of
MS(F)=σ(f7×7([AvgPool(F);MaxPool(F)])) (4)
Wherein σ represents sigmoid function, AvgPool and MaxPool are average pooling and maximum pooling operations respectively, and f7×7A convolution operation representing a convolution kernel of 7 × 7;
the channel and space attention modules are connected in series and constitute a residual block, the convolution in the block adopts packet convolution, the normalization layer adopts a GN layer, and the activation function adopts a PReLU.
Preferably, in step (4), the feature maps of the several levels are fed in successively and fused in a densely connected manner: the fourth-level feature map, carrying high-level semantic information, is input first, and the convolution results are concatenated one by one with the feature maps of the other levels, with iterative convolution operations in between. As the receptive field keeps growing and new level information is integrated, the prediction is progressively refined. The receptive field is computed as
R_cur = R_pre + S_pre × (K_cur − 1) × rate   (5)
where R_cur is the receptive field of the current layer, R_pre the receptive field of the previous layer, S_pre the stride of the previous layer, K_cur the convolution kernel size of the current layer, and rate the current dilation rate. For a convolution with kernel size 3 and dilation rate 2, assuming the previous layer has stride 1, the receptive field after n such convolutions is R_n = R_pre + 4 × n. Because of the way mlf is combined with each slf, the receptive fields of all levels are virtually identical, and after the attention module with global pooling they quickly grow to 240, which is already relatively large; in the subsequent DFM, convolutions with dilation rate 2 therefore enlarge the receptive fields moderately without widening the gap between levels, refining the output result.
Preferably, the loss function of the method is
L_hybrid = L_distance + λ_1 L_Dice + λ_2 L_BCE   (6)
where L_Dice is a loss that evaluates the overlap between prediction and ground truth, L_BCE is the binary cross-entropy, and L_hybrid is the mixed loss; pred is the pixel-wise prediction probability map, target is the ground truth, and ε is a smoothing term set to 1e-8 in the experiments; λ_1, λ_2 are the loss weighting coefficients, set experimentally to λ_1 = 0.5, λ_2 = 0.5.
It will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above embodiments. The storage medium may be ROM/RAM, a magnetic disk, an optical disk, a memory card, or the like. Accordingly, corresponding to the method, the invention also comprises a three-dimensional ultrasonic thyroid segmentation device based on the multi-scale fusion network, generally expressed as functional modules corresponding to the steps of the method. The device comprises:
a segmentation module configured to train the network to predict a boundary distance map in order to segment the edges of heterogeneous organ structures;
a boundary distance map module configured to add constraints guiding training in a deep-supervision manner, so that the final segmentation result does not depend on intermediate results of distance-map prediction;
an attention module configured to focus on edge distance information using the CBAM attention module at multiple scales;
and a dilated dense fusion module configured to fuse the probability maps of all levels with dilated convolution, progressively refining the output probability maps to produce the final result for each level.
The present invention is described in more detail below.
Following the V-Net approach, each stage of the encoder uses the same number of convolution blocks, and the max-pooling layer is replaced by a strided convolution with stride 2, so that down-sampling has learnable parameters and better suits the semantic segmentation task. Each convolution layer is built with the GN-PReLU-Conv sequence; placing the normalization layer first lets the network train faster and more effectively. GN performs better than BN here and is stable over a wider range of batch sizes. All convolution layers use grouped convolution with 4 groups, which reduces long-range coupling between channels, reduces network parameters, and improves parameter utilization. Dilated convolution with dilation rate 2 is used in the third and fourth encoder stages to enlarge the receptive field and obtain more instructive semantic information.
To make the network attend to organ edge information earlier, the feature maps of the later levels are resized to the size of the first-stage feature map by bilinear interpolation, and the edge distance map computed from the mask is used as a constraint for deep supervision. Unlike methods in which the prediction is used directly as an intermediate result, so that the final result is affected by it, this method uses deep supervision, combining edge-feature learning with backward propagation of the features.
Given a thyroid mask, its edge mask is obtained, and for each pixel P_i the distance D_i to the thyroid edge is computed; the normalized distance map d is then obtained by
d(P_i) = exp(−λ D_i)
where D_i = min_{b_j ∈ b} dist(P_i, b_j) is the distance from pixel P_i to the nearest boundary pixel in b = {b_j}_{j∈J}, and λ is a parameter controlling the normalization effect, experimentally set to 0.01. Under these parameters, the distance map obtained from the thyroid mask is as shown in figure 1.
The deep-supervision loss function on the normalized distance map takes the following form:
where d̂ is the distance map predicted by the network. By learning to predict the distance map, the network focuses more on the edge information of the thyroid.
The feature maps of the network's levels have different resolutions, and after deep supervision with the edge distance map they contain organ-edge semantic information at different levels. To combine the semantic information of each level more effectively, the per-level feature maps slf (single-layer feature map) of the encoder are brought to a common resolution by bilinear interpolation, using the resolution of level 2 as the reference; after concatenation, a multi-layer convolution feature mlf (multi-layer feature map) is obtained by a convolution operation. Then slf and mlf are concatenated and fed into the attention module, which learns information-fusion weights guided by the multi-scale features; the weights are applied to mlf to extract the information needed to refine that level's slf, which is combined with slf to produce the output feature map.
Effectively combining multi-layer feature maps requires an attention mechanism to learn how information is combined across channels and positions. To make effective use of the organ-edge features produced under deep supervision, the attention mechanism must also learn important spatial position information. Considering the computational cost at the current resolution and the good performance of serial two-branch attention, the CBAM module is adopted, covering both channel and spatial weight learning at low computational cost.
The channel attention weight of the attention module is M_c(F) = σ(W_1 W_0(AvgPool(F)) + W_1 W_0(MaxPool(F))), where σ denotes the sigmoid function, AvgPool and MaxPool are average-pooling and max-pooling operations respectively, and W_0 ∈ R^{C/r×C}, W_1 ∈ R^{C×C/r} are the two convolutions for channel compression and recovery, each followed by a PReLU activation function.
The spatial attention weight is M_S(F) = σ(f^{7×7}([AvgPool(F); MaxPool(F)])), where σ denotes the sigmoid function, AvgPool and MaxPool are average-pooling and max-pooling operations respectively, and f^{7×7} denotes a convolution with a 7×7 kernel.
The channel and spatial attention modules are connected in series and form a residual block; the convolutions in the block are grouped convolutions, the normalization layer is a GN layer, and the activation function is a PReLU.
Some methods adopt the idea of ensemble learning by averaging the outputs of each layer as the final result, which helps produce more accurate segmentations, but the per-layer results are not further refined or combined. The idea of the decoder here is to optimize the results at multiple scales in a learnable process. Inspired by DenseASPP, the invention passes the feature map of each level through a dilated dense fusion module and takes the average of the per-level probability maps as the final output.
The excellent performance of DeepLabv3 in semantic segmentation benefits from the parallel convolutions with different dilation rates in ASPP, which capture rich long-range semantic information. DenseASPP obtains a wider and denser multi-scale receptive field by applying the dense-connection idea of DenseNet, densely connecting convolutions with increasing dilation rates. The present invention uses this idea for result refinement: instead of processing a single feature map, multiple hierarchical feature maps are successively input and fused in a densely connected manner. The fourth-layer feature map, which carries high-level semantic information, is input first; its convolution result is concatenated in turn with the feature maps of the other levels, with iterative convolution operations. As the receptive field keeps expanding and new hierarchical information is integrated, the prediction result is gradually optimized. Compared with the parallel structure of ASPP, this denser scheme continuously enlarges the receptive field and progressively combines features of different receptive fields.
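The connectivity pattern of this dense fusion can be sketched as follows; `mean_conv` is an element-wise stand-in for the module's dilated convolution, chosen only to make the wiring concrete, and the feature maps are reduced to flat lists.

```python
def dense_fusion(level_feats, conv):
    # level_feats: feature maps ordered from the deepest level (high-level
    # semantics) outward; each step concatenates all previous fusion results
    # with the next level's map before convolving, as in a dense connection
    fused = conv([level_feats[0]])
    outputs = [fused]
    for feat in level_feats[1:]:
        fused = conv(outputs + [feat])
        outputs.append(fused)
    return outputs  # one refined map per level

def mean_conv(feats):
    # illustrative stand-in for the rate-2 dilated convolution: element-wise mean
    return [sum(vals) / len(vals) for vals in zip(*feats)]

outs = dense_fusion([[4.0], [2.0], [0.0]], mean_conv)
final = [sum(vals) / len(vals) for vals in zip(*outs)]  # averaged final output
```

Each later output sees all earlier fusion results, so the effective receptive field grows at every step while new level information is folded in.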
The receptive field is computed as R_cur = R_pre + S_pre × (K_cur - 1) × rate, where R_cur is the receptive field of the current layer, R_pre is the receptive field of the previous layer, S_pre is the stride of the previous layer, K_cur is the kernel size of the current layer, and rate is the current dilation rate. For a convolution with kernel size 3 and dilation rate 2, assuming the previous layer has stride 1, the receptive field after n such rate-2 dilated convolutions is R_n = R_pre + 4n. Because of the way mlf and slf are combined, the receptive fields of the levels are virtually identical; after the attention module with global pooling, the receptive field grows rapidly to 240, which is already relatively large. For these reasons, the DFM does not use dilation rates that double geometrically; instead, convolutions with a fixed dilation rate of 2 moderately enlarge the receptive fields without widening the gap between levels, achieving the effect of refining the output result.
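The growth rule above can be checked in a few lines; the function name is illustrative.

```python
def receptive_field(r_prev, s_prev, kernel, rate=1):
    # R_cur = R_pre + S_pre * (K_cur - 1) * rate
    return r_prev + s_prev * (kernel - 1) * rate

# each 3x3 convolution with dilation rate 2 after a stride-1 layer adds 4:
r = 1
for n in range(1, 5):
    r = receptive_field(r, 1, 3, rate=2)
    assert r == 1 + 4 * n  # matches R_n = R_pre + 4n with R_pre = 1
print(r)  # 17
```

With a fixed rate of 2 the receptive field grows linearly (by 4 per layer), whereas geometrically doubled rates would spread the per-level receptive fields much further apart.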
BCE loss, i.e. the binary cross-entropy loss, is a common loss function for binary segmentation; it suits large-object segmentation but performs poorly in imbalanced scenes with small targets. Dice loss is commonly used for small-object segmentation, since its calculation is not affected by the imbalance between foreground and background pixels. To account during training for both the slices in the middle of the organ, which contain many foreground pixels, and the slices at its two ends, which contain few, the invention combines BCE loss and Dice loss. In addition, the loss on the object edge distance map is used as deep supervision, so that the network attends to object edge information earlier. The hybrid loss function of the invention is defined as follows:
L_hybrid = L_distance + λ_1 L_Dice + λ_2 L_BCE
where pred is the pixel prediction probability map, target is the ground truth, and ε is a smoothing term, set to 1e-8 in the experiments. λ_1 and λ_2 are loss weighting coefficients, set experimentally to λ_1 = 0.5, λ_2 = 0.5.
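A minimal sketch of the hybrid loss in plain Python. Since the exact form of L_distance is not spelled out in this excerpt, mean squared error between the predicted and true normalized distance maps is assumed here purely for illustration.

```python
import math

def bce_loss(pred, target, eps=1e-8):
    # binary cross-entropy, averaged over pixels
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(pred)

def dice_loss(pred, target, eps=1e-8):
    # soft Dice loss: insensitive to the foreground/background pixel ratio
    inter = sum(p * t for p, t in zip(pred, target))
    return 1 - (2 * inter + eps) / (sum(pred) + sum(target) + eps)

def hybrid_loss(pred, target, dist_pred, dist_true, lam1=0.5, lam2=0.5):
    # L_hybrid = L_distance + lam1 * L_Dice + lam2 * L_BCE
    # L_distance is assumed here to be MSE on the distance maps (not given above)
    l_dist = sum((a - b) ** 2 for a, b in zip(dist_pred, dist_true)) / len(dist_pred)
    return l_dist + lam1 * dice_loss(pred, target) + lam2 * bce_loss(pred, target)
```

A perfect prediction drives all three terms to (near) zero; on end slices with few foreground pixels the Dice term dominates the gradient, while BCE keeps the large mid-organ slices well calibrated.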
The method was validated on clinical ultrasound data and the public OpenCAS dataset. Experiments show that the proposed model segments the three-dimensional thyroid accurately and achieves state-of-the-art results.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent variations and modifications made to the above embodiment according to the technical spirit of the present invention still belong to the protection scope of the technical solution of the present invention.
Claims (10)
1. A three-dimensional ultrasonic thyroid segmentation method based on a multi-scale fusion network, characterized by comprising the following steps:
(1) training the network to predict a boundary distance map, so as to segment the edges of the heterogeneous organ structure;
(2) adding constraints that guide network training in a deeply supervised manner, so that the final segmentation result does not depend on the intermediate result of the distance map prediction;
(3) using a CBAM attention module at multiple scales to focus on edge distance information;
(4) fusing the probability maps of all levels with a dilated-convolution dense fusion module, gradually refining the output probability map of each layer to generate the final result.
2. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 1, wherein: in the step (1), each stage of the encoder part uses the same number of convolution blocks, and the max-pooling layers are replaced by strided convolutions with stride 2; each convolution layer is built in the GN-PReLU-Conv order, i.e. with pre-layer normalization; all convolution layers use grouped convolution with 4 groups; dilated convolutions with dilation rate 2 are used in the third and fourth stages of the encoder to enlarge the receptive field and obtain more instructive semantic information.
3. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 2, wherein: in the step (2), the feature maps of the subsequent levels are unified to the feature-map size of the first stage by bilinear interpolation, and the edge distance map computed from the mask is used as the constraint for deep supervision.
4. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 3, wherein: in the step (2), given a thyroid mask, after the edge mask is obtained, the distance D_i from each pixel P_i to the thyroid edge is calculated, and d is the normalized distance map obtained by formula (1),
d(P_i) = exp(-λD_i) (1)
where D_i = min_{b_j ∈ B} dist(P_i, b_j) is the distance from pixel P_i to the boundary pixels B = (b_j)_{j∈J}, and λ is a parameter controlling the normalization effect,
the normalized distance map deep supervised loss function is of the form:
5. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 4, wherein: in the step (2), λ is set to 0.01.
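To make formula (1) concrete, the following sketch computes D_i as the Euclidean distance to the nearest boundary pixel and then the normalized map d with λ = 0.01; the brute-force search over boundary pixels is illustrative only (a distance transform would be used on real masks).

```python
import math

def distance_to_boundary(h, w, boundary):
    # D_i = min over boundary pixels b_j of dist(P_i, b_j), Euclidean distance
    return [[min(math.hypot(i - bi, j - bj) for bi, bj in boundary)
             for j in range(w)] for i in range(h)]

def normalized_distance_map(dist, lam=0.01):
    # d(P_i) = exp(-lam * D_i): 1 on the boundary, decaying toward 0 away from it
    return [[math.exp(-lam * di) for di in row] for row in dist]

# tiny 3x3 example with two boundary pixels
D = distance_to_boundary(3, 3, [(0, 0), (0, 2)])
d = normalized_distance_map(D, lam=0.01)
```

With the small λ of 0.01 the map decays gently, so pixels near the edge keep values close to 1 and the supervision signal concentrates on a wide band around the organ boundary.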
6. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 5, wherein: in the step (3), the resolution of each single-layer feature map slf of the encoder is unified by bilinear interpolation, taking the resolution of layer 2 as the reference; after concatenation, a multi-layer feature map mlf is obtained by a convolution operation; slf and mlf are concatenated and fed into the attention module, the weights for information fusion are learned under multi-scale feature guidance, the weights are applied to mlf to extract the information needed to refine the slf of each level, and the result is combined with slf to obtain the output feature map.
7. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 6, wherein: in the step (3), the channel attention weight of the attention module is
M_c(F) = σ(W_1 W_0(AvgPool(F)) + W_1 W_0(MaxPool(F))) (3)
where σ is the sigmoid function, AvgPool and MaxPool are the average-pooling and max-pooling operations respectively, and W_0 ∈ R^(C/r×C), W_1 ∈ R^(C×C/r) are the two convolutions for channel compression and recovery, followed by the PReLU activation function;
the spatial attention weight is
M_S(F) = σ(f^(7×7)([AvgPool(F); MaxPool(F)])) (4)
where σ is the sigmoid function, AvgPool and MaxPool are the average-pooling and max-pooling operations respectively, and f^(7×7) denotes a convolution with a 7×7 kernel;
the channel and spatial attention modules are connected in series and form a residual block; the convolutions in the block use grouped convolution, the normalization layers use GN, and the activation function is PReLU.
8. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 7, wherein: in the step (4), the feature maps of the several levels are input one by one and fused in a densely connected manner; the fourth-layer feature map with high-level semantic information is input first, its convolution result is concatenated one by one with the feature maps of the other levels, and iterative convolution operations are performed; as the receptive field keeps expanding and new hierarchical information is integrated, the prediction result is gradually optimized, and the receptive field is computed as
R_cur = R_pre + S_pre × (K_cur - 1) × rate (5)
where R_cur is the receptive field of the current layer, R_pre is the receptive field of the previous layer, S_pre is the stride of the previous layer, K_cur is the kernel size of the current layer, and rate is the current dilation rate; for a convolution with kernel size 3 and dilation rate 2, assuming the previous layer has stride 1, the receptive field after n such rate-2 dilated convolutions is R_n = R_pre + 4n; because of the way mlf and slf are combined, the receptive fields of the levels are virtually identical, and after the attention module with global pooling the receptive field grows rapidly to 240, which is already relatively large; in the subsequent DFM, convolutions with dilation rate 2 moderately enlarge the receptive fields without widening the gap between levels, achieving the effect of refining the output result.
9. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 8, wherein: the loss function of the method is
L_hybrid = L_distance + λ_1 L_Dice + λ_2 L_BCE (6)
where L_Dice is the Dice loss measuring the overlap between prediction and ground truth, L_BCE is the binary cross-entropy loss, L_hybrid is the hybrid loss, pred is the pixel prediction probability map, target is the ground truth, ε is a smoothing term set to 1e-8 in the experiments, and λ_1, λ_2 are loss weighting coefficients, set experimentally to λ_1 = 0.5, λ_2 = 0.5.
10. A three-dimensional ultrasonic thyroid segmentation device based on a multi-scale fusion network, characterized by comprising:
a segmentation module configured to train the network to predict a boundary distance map, so as to segment the edges of the heterogeneous organ structure;
a boundary distance map module configured to add constraints that guide network training in a deeply supervised manner, so that the final segmentation result does not depend on the intermediate result of the distance map prediction;
an attention module configured to use the CBAM attention module at multiple scales to focus on edge distance information;
and a dilated dense fusion module configured to fuse the probability maps of all levels with dilated convolutions, gradually refining the output probability map of each layer to generate the final result.
Publications (1)
Publication Number | Publication Date |
---|---|
CN112967300A true CN112967300A (en) | 2021-06-15 |