CN112967300A - Three-dimensional ultrasonic thyroid segmentation method and device based on multi-scale fusion network - Google Patents

Three-dimensional ultrasonic thyroid segmentation method and device based on multi-scale fusion network

Info

Publication number
CN112967300A
CN112967300A (application CN202110202637.9A)
Authority
CN
China
Prior art keywords
convolution
layer
network
map
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110202637.9A
Other languages
Chinese (zh)
Inventor
杨峰 (Yang Feng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ariemedi Medical Technology Beijing Co ltd
Original Assignee
Ariemedi Medical Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ariemedi Medical Technology Beijing Co ltd filed Critical Ariemedi Medical Technology Beijing Co ltd
Priority to CN202110202637.9A priority Critical patent/CN112967300A/en
Publication of CN112967300A publication Critical patent/CN112967300A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4007Interpolation-based scaling, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4038Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/181Segmentation; Edge detection involving edge growing; involving edge linking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/32Indexing scheme for image data processing or generation, in general involving image mosaicing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10132Ultrasound image
    • G06T2207/101363D ultrasound image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Ultrasonic Diagnosis Equipment (AREA)

Abstract

The three-dimensional ultrasonic thyroid segmentation method and device based on a multi-scale fusion network can accurately segment the edges of heterogeneous organ structures, avoid the final segmentation result depending on the intermediate result of distance map prediction, exploit multi-level semantic information while keeping the computational cost low, and can accurately segment the three-dimensional thyroid with very good results. The method comprises the following steps: (1) training the network to learn to predict a boundary distance map, so that the edges of heterogeneous organ structures can be segmented; (2) adding constraint-guided training to the network through deep supervision, so that the final segmentation result does not depend on the intermediate result of the distance map prediction; (3) using a CBAM attention module at multiple scales to focus on edge distance information; (4) fusing the probability maps of all levels with a dilated-convolution dense fusion module, and progressively refining the output probability maps to generate the final result of each level.

Description

Three-dimensional ultrasonic thyroid segmentation method and device based on multi-scale fusion network
Technical Field
The invention relates to the technical field of medical image processing, in particular to a three-dimensional ultrasonic thyroid gland segmentation method based on a multi-scale fusion network and a three-dimensional ultrasonic thyroid gland segmentation device based on the multi-scale fusion network.
Background
Ultrasound imaging relies on the energy of echoes traveling through tissue, which is partially attenuated, absorbed, or reflected depending on the tissue's properties. The returned signals are collected by an ultrasound probe and rendered into an image. These imaging characteristics produce random speckle noise, low contrast, and blurred organ edges. The anisotropy of tissue in ultrasound images and the echogenic differences between adjacent tissues can give the same tissue several different appearances. In addition, ultrasound imaging is often used for real-time clinical detection, and three-dimensional ultrasound volume data are typically acquired by sliding a handheld two-dimensional ultrasound probe and reconstructing the generated data, which introduces inter-frame displacement and blurring. These factors make organ segmentation from three-dimensional ultrasound a challenging problem.
There are currently many two-dimensional ultrasound segmentation methods. Some are based on region-driven active contour models and require an initialized mask. Some use fuzzy C-means clustering, histogram clustering, region growing, random walk, and the like. Some use edge-free active contours, region-based active contours, or distance-regularized level sets. Some use classical algorithms such as thresholding, region splitting and merging, watershed, and graph cut. There are multi-organ segmentation methods that use information based on speckle-correlated pixels and image artifacts. These methods rely on hand-designed features and are mostly semi-automatic.
Many methods have also been developed for segmenting three-dimensional thyroid images. Some use a 3D spring deformation model to segment the thyroid cartilage, but on CT images. Some use a geodesic active contour model to obtain the thyroid contour in a three-dimensional image, but remain semi-automatic. Some use radial basis functions for direct segmentation of ultrasound data blocks. Some compare level sets, graph cuts, and pixel classifiers, showing the effectiveness of learning-based methods, or segment the volume data frame by frame, or attempt direct thyroid segmentation with 3D UNet. Some adopt SUMNet, based on SegNet, for frame-by-frame three-dimensional thyroid segmentation.
Recently, fully convolutional neural networks have increasingly been applied to ultrasound image segmentation, such as thyroid segmentation with a feedforward neural network. One proposal, IVUS-Net, uses a multi-path multi-scale convolution-kernel segmentation network for two-dimensional intravascular ultrasound segmentation. Some Deeplab-based networks with deep supervision on boundary distance maps are used for two-dimensional ultrasound kidney segmentation. Some deeply supervised methods with multi-channel fusion improve the accuracy of the ultrasound prostate segmentation boundary. These methods are mainly used for two-dimensional segmentation, while three-dimensional segmentation networks are often applied to images acquired directly by a three-dimensional ultrasound probe.
For three-dimensional ultrasound volume data reconstructed from two-dimensional acquisition, the prior art has attempted both frame-by-frame segmentation and direct three-dimensional segmentation. Some note that three-dimensional segmentation networks are difficult to train and that two-dimensional networks focus better on organ edges, and experiments likewise show that segmenting with a two-dimensional network and then reconstructing works better than a three-dimensional segmentation network. However, the earlier methods have only been applied to healthy thyroid data; nodules in clinical thyroid images often contrast strongly with the gland and are difficult to segment as a whole, and when nodules lie at the organ edge they affect the edge segmentation.
There are currently many ways to improve edge segmentation accuracy. Some encoder-decoder structures adopt a mesh-like path to better exploit multi-level features, but in practice training is prone to instability. Some attention modules with two branches attend to the effective information of position and channel simultaneously, but the computation becomes very large on higher-resolution feature maps. Some use high-level features in adjacent levels to guide feature fusion, but do not exploit information at more scales. Some adopt a multi-scale channel-based attention module that does not use positional feature information effectively: weights are learned only for the channels at each position of the feature map, not for the positions themselves.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a three-dimensional ultrasonic thyroid segmentation method based on a multi-scale fusion network, which can accurately segment the edges of heterogeneous organ structures, avoid the final segmentation result depending on the intermediate result of distance map prediction, exploit multi-level semantic information while keeping the computational cost low, and can accurately segment the three-dimensional thyroid with very good results.
The technical scheme of the invention is as follows: the three-dimensional ultrasonic thyroid gland segmentation method based on the multi-scale fusion network comprises the following steps:
(1) training the network to learn to predict a boundary distance map, so that the edges of heterogeneous organ structures can be segmented;
(2) adding constraint-guided training to the network through deep supervision, so that the final segmentation result does not depend on the intermediate result of the distance map prediction;
(3) using a CBAM attention module at multiple scales to focus on edge distance information;
(4) fusing the probability maps of all levels with a dilated-convolution dense fusion module, and progressively refining the output probability maps to generate the final result of each level.
The invention enables the network to learn to predict the boundary distance map, so that the edges of heterogeneous organ structures can be segmented accurately. Unlike a cascaded design, the method adds constraint-guided training through deep supervision, so that the final segmentation result does not depend on the intermediate result of the distance map prediction. In addition, to exploit multi-level semantic information while keeping the computational cost low, the CBAM attention module is used at multiple scales to better focus on edge distance information. Finally, a dilated-convolution dense fusion module fuses the probability maps of all levels and progressively refines the output probability maps to generate the final result of each level. The method can therefore accurately segment the edges of heterogeneous organ structures, avoid dependence of the final result on the intermediate distance-map prediction, exploit multi-level semantic information at small computational cost, and accurately segment the three-dimensional thyroid with good results.
Also provided is a three-dimensional ultrasonic thyroid segmentation device based on a multi-scale fusion network, which comprises:
a segmentation module configured to let the network learn to predict a boundary distance map in order to segment edges of the heterogeneous organ structure;
a boundary distance map module configured to add constraint guidance training to the network in a deep supervised manner to avoid that the final segmentation result depends on intermediate results of distance map prediction;
an attention module configured to focus on edge distance information using the CBAM attention module at a multi-scale;
and a dilated dense fusion module configured to fuse the probability maps of all levels using the dilated-convolution dense fusion module, progressively refine the output probability maps, and generate the final result of each level.
Drawings
Fig. 1 is an example of generating a normalized distance map from a thyroid segmentation mask, in which (a) is the original image, (b) the thyroid mask, and (c) the normalized distance map.
Fig. 2 is a flowchart of a multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to the present invention.
Detailed Description
As shown in fig. 2, the method for three-dimensional ultrasonic thyroid segmentation based on multi-scale fusion network includes the following steps:
(1) training the network to learn to predict a boundary distance map, so that the edges of heterogeneous organ structures can be segmented;
(2) adding constraint-guided training to the network through deep supervision, so that the final segmentation result does not depend on the intermediate result of the distance map prediction;
(3) using a CBAM attention module at multiple scales to focus on edge distance information;
(4) fusing the probability maps of all levels with a dilated-convolution dense fusion module, and progressively refining the output probability maps to generate the final result of each level.
The invention enables the network to learn to predict the boundary distance map, so that the edges of heterogeneous organ structures can be segmented accurately. Unlike a cascaded design, the method adds constraint-guided training through deep supervision, so that the final segmentation result does not depend on the intermediate result of the distance map prediction. In addition, to exploit multi-level semantic information while keeping the computational cost low, the CBAM attention module is used at multiple scales to better focus on edge distance information. Finally, a dilated-convolution dense fusion module fuses the probability maps of all levels and progressively refines the output probability maps to generate the final result of each level. The method can therefore accurately segment the edges of heterogeneous organ structures, avoid dependence of the final result on the intermediate distance-map prediction, exploit multi-level semantic information at small computational cost, and accurately segment the three-dimensional thyroid with good results.
Preferably, in the step (1), the same number of convolution blocks is used in each stage of the encoder, and the max-pooling layers are replaced by strided convolutions with stride 2; each convolution layer is built with a GN-PReLU-Conv sequence, placing the normalization layer in front; all convolution layers use grouped convolution with 4 groups; dilated convolution with dilation rate 2 is used in the third and fourth stages of the encoder to enlarge the receptive field and obtain more instructive semantic information.
Preferably, in the step (2), the feature maps of subsequent levels are unified to the feature-map size of the first stage by bilinear interpolation, and the edge distance map computed from the mask is used as a constraint for deep supervision.
Preferably, in the step (2), given a thyroid mask, its edge mask is obtained, the distance Di from each pixel Pi to the thyroid edge is computed, and the normalized distance map d is obtained by formula (1),
d(Pi)=exp(-λDi) (1)
wherein Di=min_{bj∈b} dist(Pi,bj) is the distance from pixel Pi to the nearest pixel of the boundary set b={bj}, j∈J, and λ is a parameter controlling the normalization effect.
the normalized distance map deep supervised loss function is of the form:
Figure BDA0002948414330000051
wherein
Figure BDA0002948414330000061
A distance map predicted for the network.
Preferably, in the step (2), λ is set to 0.01.
Preferably, in the step (3), the resolution of each single-level feature map slf of the encoder is unified by bilinear interpolation, taking the layer-2 resolution as reference; after concatenation, a multi-level convolution feature mlf is obtained by a convolution operation; slf and mlf are then concatenated and fed into an attention module, where multi-scale features guide the learning of fusion weights; the weights are applied to mlf to extract the effective information needed to refine the level's slf, which is combined with slf to obtain the output feature map.
Preferably, in the step (3), the channel attention weight of the attention module is
Mc(F)=σ(W1W0(AvgPool(F))+W1W0(MaxPool(F))) (3)
wherein σ denotes the sigmoid function, AvgPool and MaxPool are average pooling and max pooling operations respectively, and W0∈R^(C/r×C), W1∈R^(C×C/r) are the two convolution operations for channel compression and recovery, each followed by a PReLU activation function;
the spatial attention weight is
MS(F)=σ(f^(7×7)([AvgPool(F);MaxPool(F)])) (4)
wherein σ denotes the sigmoid function, AvgPool and MaxPool are average pooling and max pooling operations respectively, and f^(7×7) denotes a convolution operation with a 7×7 kernel;
the channel and space attention modules are connected in series and constitute a residual block, the convolution in the block adopts packet convolution, the normalization layer adopts a GN layer, and the activation function adopts a PReLU.
Preferably, in the step (4), the feature maps of the several levels are successively input and fused in a densely connected manner; the fourth-level feature map carrying high-level semantic information is input first, its convolution result is concatenated with the other level feature maps one by one, and the convolution operation is iterated; as the receptive field keeps expanding and new level information is integrated, the prediction result is gradually optimized. The receptive field is computed as
Rcur=Rpre+Spre×(Kcur-1)×rate (5)
wherein Rcur is the receptive field of the current layer, Rpre is the receptive field of the previous layer, Spre is the stride of the previous layer, Kcur is the convolution kernel size of the current layer, and rate is the current dilation rate. For a convolution with kernel size 3 and dilation rate 2, assuming the previous layer has stride 1, the receptive field after n such rate-2 dilated convolutions is Rn=Rpre+4×n. Because of the way mlf is combined with each slf, the receptive fields at each level are virtually identical; after passing through the attention module with global pooling, the receptive field quickly grows to 240 and is therefore already relatively large. In the subsequent DFM, convolutions with dilation rate 2 moderately enlarge the receptive fields without widening the receptive-field gap between levels, refining the output result.
Preferably, the loss function of the method is
LDice=1-(2Σ(pred·target)+ε)/(Σpred+Σtarget+ε)
LBCE=-Σ[target·log(pred)+(1-target)·log(1-pred)]
Lhybrid=Ldistance+λ1LDice+λ2LBCE (6)
wherein LDice is a loss function evaluating the overlap of two regions, LBCE is the binary cross entropy, Lhybrid is the hybrid loss function, pred is the pixel prediction probability map, target is the ground truth, ε is a smoothing term set to 1e-8 in experiments, and λ1, λ2 are the loss weighting coefficients, experimentally set to λ1=0.5, λ2=0.5.
It will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the related hardware; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the methods of the above embodiments; the storage medium may be ROM/RAM, magnetic disk, optical disk, memory card, and the like. Therefore, corresponding to the method of the invention, the invention also comprises a three-dimensional ultrasonic thyroid segmentation device based on the multi-scale fusion network, generally expressed as functional modules corresponding to the steps of the method. The device includes:
a segmentation module configured to let the network learn to predict a boundary distance map in order to segment edges of the heterogeneous organ structure;
a boundary distance map module configured to add constraint guidance training to the network in a deep supervised manner to avoid that the final segmentation result depends on intermediate results of distance map prediction;
an attention module configured to focus on edge distance information using the CBAM attention module at a multi-scale;
and a dilated dense fusion module configured to fuse the probability maps of all levels using the dilated-convolution dense fusion module, progressively refine the output probability maps, and generate the final result of each level.
The present invention is described in more detail below.
Following the VNet design, the same number of convolution blocks is used in each stage of the encoder, and the max-pooling layers are replaced by strided convolutions with stride 2, so that the down-sampling process has learnable parameters and is better suited to semantic segmentation. Each convolution layer is built with a GN-PReLU-Conv sequence; placing the normalization layer in front lets the network train faster and more effectively. GN performs better than BN and is stable over a wider range of batch sizes. All convolution layers use grouped convolution with 4 groups, which reduces long-range coupling between channels, reduces network parameters, and improves parameter utilization. Dilated convolution with dilation rate 2 is used in the third and fourth stages of the encoder to enlarge the receptive field and obtain more instructive semantic information.
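As a rough illustration of the parameter saving from grouped convolution, the sketch below counts the weights of a 2-D convolution layer (bias terms ignored; the 64-channel layer sizes are made up for illustration, not taken from the patent):

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k 2-D convolution: each of the `groups` groups
    maps c_in/groups input channels to c_out/groups output channels."""
    assert c_in % groups == 0 and c_out % groups == 0
    return groups * (c_in // groups) * (c_out // groups) * k * k

dense = conv_params(64, 64, 3)              # ordinary convolution: 36864 weights
grouped = conv_params(64, 64, 3, groups=4)  # 4 groups, as in the encoder: 9216
```

With 4 groups the weight count drops by exactly the group factor, which is the parameter-utilization argument made above.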
So that the network attends to the organ edge information earlier, the feature maps of subsequent levels are unified to the feature-map size of the first stage by bilinear interpolation, and the edge distance map computed from the mask is used as a constraint for deep supervision. Unlike methods that use the prediction directly as an intermediate result, so that the final result is affected by it, this method adopts deep supervision, combining the learning of edge features with backward propagation of the features.
Given a thyroid mask, its edge mask is obtained, and the distance Di from each pixel Pi to the thyroid edge is computed; through
d(Pi)=exp(-λDi)
the normalized distance map d is obtained, wherein Di=min_{bj∈b} dist(Pi,bj) is the distance from pixel Pi to the nearest pixel of the boundary set b={bj}, j∈J, and λ is a parameter controlling the normalization effect, experimentally set to 0.01. Under these parameters, the distance map obtained from the thyroid mask is shown in figure 1.
The deep-supervision loss function on the normalized distance map is of the form:
Ldistance=Σi(d(Pi)-d̂(Pi))²
wherein d̂ is the distance map predicted by the network. By learning to predict the distance map, the network focuses more on learning the edge information of the thyroid.
The feature maps of the network levels have different resolutions, and after deep supervision with the edge distance map, the features contain different levels of semantic information about the organ edges. Inspired by related research, in order to combine the semantic information of each level more effectively, the resolution of each encoder level's feature map slf (single-layer feature map) is unified by bilinear interpolation, taking the layer-2 resolution as reference; after concatenation, a multi-level convolution feature mlf (multi-layer feature map) is obtained by a convolution operation. Then slf and mlf are concatenated and fed into the attention module, fusion weights are learned under multi-scale feature guidance, the weights are applied to mlf to obtain the effective information needed to refine the level's slf, and this is combined with slf to obtain the output feature map.
Effectively combining multi-level feature maps requires an attention mechanism that learns how information is combined across channels and across spatial positions. In order to effectively use the organ-edge features produced under deep supervision, the attention mechanism must be able to learn important spatial position information. Considering the computational cost at the current resolution and the good performance of serially connected channel and spatial attention, the CBAM module is adopted, which takes both channel and spatial weight learning into account at low computational cost.
The channel attention weight of the attention module is Mc(F)=σ(W1W0(AvgPool(F))+W1W0(MaxPool(F))), wherein σ denotes the sigmoid function, AvgPool and MaxPool are average pooling and max pooling operations respectively, and W0∈R^(C/r×C), W1∈R^(C×C/r) are the two convolution operations for channel compression and recovery, each followed by a PReLU activation function.
The spatial attention weight is MS(F)=σ(f^(7×7)([AvgPool(F);MaxPool(F)])), wherein σ denotes the sigmoid function, AvgPool and MaxPool are average pooling and max pooling operations respectively, and f^(7×7) denotes a convolution operation with a 7×7 kernel.
The channel and spatial attention modules are connected in series and form a residual block; the convolutions in the block use grouped convolution, the normalization layer is a GN layer, and the activation function is a PReLU.
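A minimal NumPy sketch of the channel and spatial attention computations above, assuming (as an illustration, not the patent's exact layer layout) random weights, a PReLU between the compression and recovery convolutions, and a residual connection around the serial channel-then-spatial pair; tensors are (C, H, W) and the 7×7 spatial convolution uses 'same' padding:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prelu(x, a=0.25):
    return np.where(x > 0, x, a * x)

def channel_attention(F, W0, W1):
    """M_c(F) = sigmoid(W1 W0 AvgPool(F) + W1 W0 MaxPool(F))  -- formula (3)."""
    avg = F.mean(axis=(1, 2))                       # global average pool, (C,)
    mx = F.max(axis=(1, 2))                         # global max pool, (C,)
    return sigmoid(W1 @ prelu(W0 @ avg) + W1 @ prelu(W0 @ mx))

def spatial_attention(F, W7):
    """M_s(F) = sigmoid(conv7x7([avg_c(F); max_c(F)]))  -- formula (4)."""
    x = np.stack([F.mean(axis=0), F.max(axis=0)])   # (2, H, W) pooled over channels
    pad = np.pad(x, ((0, 0), (3, 3), (3, 3)))
    H, W = F.shape[1:]
    out = np.zeros((H, W))
    for i in range(H):                               # naive 7x7 convolution
        for j in range(W):
            out[i, j] = (W7 * pad[:, i:i + 7, j:j + 7]).sum()
    return sigmoid(out)

def cbam(F, W0, W1, W7):
    """Serial channel-then-spatial attention wrapped in a residual connection."""
    F1 = F * channel_attention(F, W0, W1)[:, None, None]
    F2 = F1 * spatial_attention(F1, W7)[None]
    return F + F2

rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 4                              # toy sizes, r = reduction ratio
F = rng.standard_normal((C, H, W))
W0 = rng.standard_normal((C // r, C))                # compression, R^(C/r x C)
W1 = rng.standard_normal((C, C // r))                # recovery, R^(C x C/r)
W7 = rng.standard_normal((2, 7, 7))
out = cbam(F, W0, W1, W7)
```

The 1×1 channel convolutions reduce to matrix products on the pooled vectors, which is why the weight shapes match the R^(C/r×C) and R^(C×C/r) dimensions stated above.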
Some methods borrow the idea of ensemble learning by averaging the output of each level as the final result, which helps produce more accurate segmentations, but the per-level results are not further refined or combined. The idea here is to optimize the results at multiple scales in a learnable process. Inspired by DenseASPP, the invention passes the feature maps of each level through a dilated dense fusion module and takes the average of the per-level probability maps as the final output.
The excellent semantic-segmentation performance of Deeplabv3 benefits from the parallel dilated convolutions at several rates in ASPP, which capture rich long-range semantic information. DenseASPP obtains a wider and denser multi-scale receptive field by applying the dense-connection concept of DenseNet with progressively increasing dilation rates. The invention uses this idea for result refinement: instead of processing a single feature map, the feature maps of several levels are successively input and fused in a densely connected manner. The fourth-level feature map carrying high-level semantic information is input first, its convolution result is concatenated with the other level feature maps in turn, and the convolution operation is iterated. As the receptive field keeps expanding and new level information is integrated, the prediction result is gradually optimized. Compared with the parallel arrangement of ASPP, this denser arrangement continuously enlarges the receptive field and gradually combines features with different receptive fields.
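The iterate-concatenate-convolve pattern can be sketched as below. This is only a shape-level illustration with random weights: 1×1 pointwise convolutions stand in for the module's actual dilated grouped convolutions, and the channel counts are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

def conv1x1(x, w):
    """Pointwise convolution: (C_in, H, W) with weights (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum('oc,chw->ohw', w, x)

def dense_fusion(level_maps, c_out=4):
    """Start from the deepest (high-level-semantics) map, then repeatedly
    concatenate the next level's map and convolve -- the dense-connection
    fusion pattern described in the text."""
    x = level_maps[0]
    for f in level_maps[1:]:
        cat = np.concatenate([x, f], axis=0)        # splice result with next level
        w = rng.standard_normal((c_out, cat.shape[0]))
        x = conv1x1(cat, w)                          # iterative convolution step
    return x

maps = [rng.standard_normal((4, 8, 8)) for _ in range(4)]  # levels 4, 3, 2, 1
fused = dense_fusion(maps)
```

Each iteration folds one more level into the running result, which is what lets the receptive field grow while new level information is integrated.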
The receptive field is computed as Rcur=Rpre+Spre×(Kcur-1)×rate, wherein Rcur is the receptive field of the current layer, Rpre is the receptive field of the previous layer, Spre is the stride of the previous layer, Kcur is the convolution kernel size of the current layer, and rate is the current dilation rate. For a convolution with kernel size 3 and dilation rate 2, assuming the previous layer has stride 1, the receptive field after n such rate-2 dilated convolutions is Rn=Rpre+4×n. Because of the way mlf is combined with each slf, the receptive fields at each level are virtually identical; after passing through the attention module with global pooling, the receptive field quickly grows to 240 and is therefore already relatively large. For this reason the DFM does not use dilation rates that double geometrically, but instead uses convolutions with dilation rate 2 to moderately enlarge the receptive fields without widening the receptive-field gap between levels, refining the output result.
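The receptive-field recurrence above is easy to check numerically; the sketch below stacks three rate-2 dilated 3×3 convolutions (stride 1) on top of the 240-pixel receptive field mentioned in the text:

```python
def receptive_field(r_pre, s_pre, k_cur, rate):
    """R_cur = R_pre + S_pre * (K_cur - 1) * rate."""
    return r_pre + s_pre * (k_cur - 1) * rate

# A 3x3 convolution with dilation rate 2 after a stride-1 layer adds 4 pixels,
# so n stacked such convolutions give R_n = R_pre + 4n.
r = 240  # receptive field after the global-pooling attention stage, per the text
for _ in range(3):
    r = receptive_field(r, 1, 3, 2)  # 240 -> 244 -> 248 -> 252
```

The small, constant 4-pixel growth per layer is what keeps the per-level receptive fields close together, as argued above.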
The BCE loss, i.e., the binary cross-entropy loss, is a common two-class segmentation loss; it is suitable for large-object segmentation but performs poorly in unbalanced scenes with small targets. Dice loss is commonly used for small-object segmentation, since its computation is unaffected by the foreground/background pixel ratio. To balance, during training, the segmentation of slices with many foreground pixels in the middle of the organ against slices with few foreground pixels at the two ends of the organ, the invention combines BCE loss and Dice loss. In addition, a loss on the object edge distance map is used as deep supervision, so that the network attends to object edge information earlier. The hybrid loss function of the invention is defined as follows:
L_Dice = 1 − (2·Σ(pred·target) + ε) / (Σ pred + Σ target + ε)
L_BCE = −Σ [target·log(pred) + (1 − target)·log(1 − pred)]
L_hybrid = L_distance + λ1·L_Dice + λ2·L_BCE
wherein pred is the pixel prediction probability map, target is the ground truth, and ε is a smoothing term set to 1e-8 in the experiments. λ1, λ2 are the loss weighting coefficients, experimentally set to λ1 = 0.5, λ2 = 0.5.
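The hybrid loss can be sketched with NumPy as follows. Note this is an illustrative sketch: the patent does not give the exact form of L_distance, so the mean-squared-error form used below is our assumption.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-8):
    # L_Dice = 1 - (2*sum(pred*target) + eps) / (sum(pred) + sum(target) + eps)
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(target) + eps)

def bce_loss(pred, target, eps=1e-8):
    # binary cross-entropy; probabilities clipped for numerical stability
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def hybrid_loss(pred, target, dist_pred, dist_gt, lam1=0.5, lam2=0.5):
    # L_hybrid = L_distance + lam1 * L_Dice + lam2 * L_BCE
    l_distance = float(np.mean((dist_pred - dist_gt) ** 2))  # assumed MSE form
    return l_distance + lam1 * dice_loss(pred, target) + lam2 * bce_loss(pred, target)
```

With a perfect prediction all three terms vanish (up to the ε smoothing), so the hybrid loss is near zero.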
The method was validated on clinical ultrasound data and the public OpenCAS dataset. Experiments show that the proposed model can accurately segment the three-dimensional thyroid and achieves the current state-of-the-art performance.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way; all simple modifications, equivalent variations and refinements made to the above embodiment according to the technical spirit of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (10)

1. A three-dimensional ultrasonic thyroid segmentation method based on a multi-scale fusion network, characterized in that it comprises the following steps:
(1) making the network learn to predict a boundary distance map so as to segment the edges of the heterogeneous organ structure;
(2) adding constraint-guided training to the network in a deep-supervision manner, to avoid the final segmentation result depending on the intermediate result of the distance map prediction;
(3) using a CBAM attention module at multiple scales to focus on edge distance information;
(4) fusing the probability maps of all levels with a hole-convolution dense fusion module, and gradually refining the output probability map to generate the final result for each level.
2. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 1, wherein: in the step (1), each stage of the encoder part uses the same number of convolution blocks, and the maximum pooling layer is replaced by strided convolution with stride 2; a single convolution layer is built in the GN-PReLU-Conv order, i.e., the normalization-first (pre-norm) form; all convolution layers use grouped convolution with 4 groups; hole convolution with a hole rate of 2 is used in the third and fourth stages of the encoder to increase the receptive field and obtain more instructive semantic information.
3. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 2, wherein: in the step (2), the feature maps of the subsequent levels are unified to the feature-map size of the first stage by bilinear interpolation, and the edge distance map computed from the mask is used as the constraint for deep supervision.
4. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 3, wherein: in the step (2), given a thyroid mask, after the edge mask is obtained, the distance map D_i from each pixel P_i to the thyroid edge is calculated, and the normalized distance map d is obtained by formula (1),
d(P_i) = exp(−λD_i) (1)
where D_i = min_{b_j ∈ b} dist(P_i, b_j) is the minimum distance from pixel P_i to the boundary pixels b = (b_j)_{j∈J}, and λ is a parameter controlling the normalization effect,
the normalized distance map deep supervised loss function is of the form:
Figure FDA0002948414320000021
wherein
Figure FDA0002948414320000022
A distance map predicted for the network.
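A minimal NumPy sketch of formula (1) follows. The brute-force nearest-boundary search and the 4-neighbour edge-mask definition are our illustrative choices; a production implementation would use a distance transform.

```python
import numpy as np

def normalized_distance_map(mask, lam=0.01):
    # d(P_i) = exp(-lam * D_i), with D_i the minimum Euclidean distance
    # from pixel P_i to an edge pixel of the mask (formula (1)).
    h, w = mask.shape
    pad = np.pad(mask, 1)
    # edge mask: foreground pixels with at least one background 4-neighbour
    edge = (mask == 1) & ((pad[:-2, 1:-1] == 0) | (pad[2:, 1:-1] == 0) |
                          (pad[1:-1, :-2] == 0) | (pad[1:-1, 2:] == 0))
    by, bx = np.nonzero(edge)
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # brute-force min distance of every pixel to every edge pixel
    d = np.sqrt((ys[..., None] - by) ** 2 + (xs[..., None] - bx) ** 2).min(axis=-1)
    return np.exp(-lam * d)
```

Edge pixels get value exp(0) = 1, and the value decays with distance from the edge at the rate set by λ (0.01 in the patent's experiments).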
5. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 4, wherein: in the step (2), λ is set to 0.01.
6. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 5, wherein: in the step (3), the resolutions of the per-level encoder feature maps slf are unified by bilinear interpolation, taking the resolution of the 2nd level as the reference; after concatenation, a multi-level convolution feature mlf is obtained by a convolution operation; slf and mlf are concatenated and fed into the attention module; guided by the multi-scale features, the information-fusion weights are learned and applied to mlf to obtain the effective information required for refining the level's slf, which is combined with slf to obtain the output feature map.
7. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 6, wherein: in the step (3), the channel attention weight of the attention module is
Mc(F) = σ(W1W0(AvgPool(F)) + W1W0(MaxPool(F))) (3)
where σ denotes the sigmoid function, AvgPool and MaxPool are the average-pooling and max-pooling operations, and W0 ∈ R^(C/r×C) and W1 ∈ R^(C×C/r) are two convolution operations for channel compression and recovery, with a PReLU activation function following the compression;
the spatial attention weight is
MS(F) = σ(f^(7×7)([AvgPool(F); MaxPool(F)])) (4)
where σ denotes the sigmoid function, AvgPool and MaxPool are the average-pooling and max-pooling operations respectively, and f^(7×7) denotes a convolution with a 7×7 kernel;
the channel and space attention modules are connected in series and constitute a residual block, the convolution in the block adopts packet convolution, the normalization layer adopts a GN layer, and the activation function adopts a PReLU.
8. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 7, wherein: in the step (4), the feature maps of the several levels are input one by one and fused in a densely connected manner; the fourth-level feature map carrying high-level semantic information is input first, the convolution results are concatenated with the other levels' feature maps one by one, and the convolution operation is applied iteratively; as the receptive field continually expands and new level information is integrated, the prediction result is gradually optimized, the receptive field being calculated as
R_cur = R_pre + S_pre × (K_cur − 1) × rate (5)
where R_cur is the receptive field of the current layer, R_pre is the receptive field of the previous layer, S_pre is the stride of the previous layer, K_cur is the convolution kernel size of the current layer, and rate is the current hole rate; for a convolution with kernel size 3 and hole rate 2, assuming the previous layer has stride 1, the receptive field after n rate-2 hole convolutions is R_n = R_pre + 4 × n; because mlf and slf are combined so that the receptive fields at each level are virtually identical, after passing through the attention module with global pooling the receptive field quickly grows to 240, so a relatively large receptive field is already obtained; in the subsequent DFM, the receptive fields are moderately increased with convolutions of hole rate 2, so that the receptive-field gap between the levels is not widened, achieving the effect of refining the output result.
9. The multi-scale fusion network-based three-dimensional ultrasonic thyroid segmentation method according to claim 8, wherein: the loss function of the method is
L_Dice = 1 − (2·Σ(pred·target) + ε) / (Σ pred + Σ target + ε)
L_BCE = −Σ [target·log(pred) + (1 − target)·log(1 − pred)]
L_hybrid = L_distance + λ1·L_Dice + λ2·L_BCE (6)
wherein L_Dice is a loss function evaluating the degree of overlap between two sets, L_BCE is the binary cross-entropy, L_hybrid is the hybrid loss function, pred is the pixel prediction probability map, target is the ground truth, ε is a smoothing term set to 1e-8 in the experiments, and λ1, λ2 are the loss weighting coefficients, experimentally set to λ1 = 0.5, λ2 = 0.5.
10. A three-dimensional ultrasonic thyroid segmentation device based on a multi-scale fusion network, characterized in that it comprises:
a segmentation module configured to make the network learn to predict a boundary distance map so as to segment the edges of the heterogeneous organ structure;
a boundary distance map module configured to add constraint-guided training to the network in a deep-supervision manner, to avoid the final segmentation result depending on the intermediate result of the distance map prediction;
an attention module configured to use the CBAM attention module at multiple scales to focus on edge distance information;
and a hole dense fusion module configured to fuse the probability maps of all levels by hole convolution, gradually refining the output probability map to generate the final result for each level.
CN202110202637.9A 2021-02-23 2021-02-23 Three-dimensional ultrasonic thyroid segmentation method and device based on multi-scale fusion network Pending CN112967300A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110202637.9A CN112967300A (en) 2021-02-23 2021-02-23 Three-dimensional ultrasonic thyroid segmentation method and device based on multi-scale fusion network

Publications (1)

Publication Number Publication Date
CN112967300A true CN112967300A (en) 2021-06-15

Family

ID=76285747

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110202637.9A Pending CN112967300A (en) 2021-02-23 2021-02-23 Three-dimensional ultrasonic thyroid segmentation method and device based on multi-scale fusion network

Country Status (1)

Country Link
CN (1) CN112967300A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109215079A (en) * 2018-07-17 2019-01-15 艾瑞迈迪医疗科技(北京)有限公司 Image processing method, operation navigation device, electronic equipment, storage medium
US20190015059A1 (en) * 2017-07-17 2019-01-17 Siemens Healthcare Gmbh Semantic segmentation for cancer detection in digital breast tomosynthesis
CN109671086A (en) * 2018-12-19 2019-04-23 深圳大学 A kind of fetus head full-automatic partition method based on three-D ultrasonic
CN111260741A (en) * 2020-02-07 2020-06-09 北京理工大学 Three-dimensional ultrasonic simulation method and device by utilizing generated countermeasure network
CN111833273A (en) * 2020-07-17 2020-10-27 华东师范大学 Semantic boundary enhancement method based on long-distance dependence
CN111968138A (en) * 2020-07-15 2020-11-20 复旦大学 Medical image segmentation method based on 3D dynamic edge insensitivity loss function


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
D. JHA et al.: "A Comprehensive Study on Colorectal Polyp Segmentation With ResUNet++, Conditional Random Field and Test-Time Augmentation", IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 6, 5 January 2021, pages 2029-2040
FENG C et al.: "MDIFNet: Multiscale Distant Information Fusion Network for Thyroid Segmentation in 3D Ultrasound Image", Proceedings of the 2021 6th International Conference on Multimedia Systems and Signal Processing, 6 September 2021, pages 22-28
MILLETARI F et al.: "V-net: Fully convolutional neural networks for volumetric medical image segmentation", arXiv:1606.04797v1, 15 June 2016, pages 1-11
YIN S et al.: "Automatic kidney segmentation in ultrasound images using subsequent boundary distance regression and pixelwise classification networks", arXiv:1811.04815v3, 30 May 2019, pages 1-22
ZHANG C et al.: "Dial/Hybrid cascade 3DResUNet for liver and tumor segmentation", Proceedings of the 2020 4th International Conference on Digital Signal Processing, 10 September 2020, pages 92-96
SHI Feifei et al.: "Saliency Detection Based on Deep Residual Network and Edge Supervised Learning", Laser & Optoelectronics Progress, vol. 56, no. 15, 31 August 2019, pages 1-9

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037714A (en) * 2021-11-02 2022-02-11 大连理工大学人工智能大连研究院 3D MR and TRUS image segmentation method for prostate system puncture
CN114565770A (en) * 2022-03-23 2022-05-31 中南大学 Image segmentation method and system based on edge auxiliary calculation and mask attention
CN114565770B (en) * 2022-03-23 2022-09-13 中南大学 Image segmentation method and system based on edge auxiliary calculation and mask attention
CN116524191A (en) * 2023-05-11 2023-08-01 山东省人工智能研究院 Blood vessel segmentation method of deep learning network integrated with geodesic voting algorithm
CN116524191B (en) * 2023-05-11 2024-01-19 山东省人工智能研究院 Blood vessel segmentation method of deep learning network integrated with geodesic voting algorithm
CN116823842A (en) * 2023-06-25 2023-09-29 山东省人工智能研究院 Vessel segmentation method of double decoder network fused with geodesic model
CN116823842B (en) * 2023-06-25 2024-02-02 山东省人工智能研究院 Vessel segmentation method of double decoder network fused with geodesic model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination