CN117557791A - Medical image segmentation method combining selective edge aggregation and deep neural network

Medical image segmentation method combining selective edge aggregation and deep neural network

Info

Publication number
CN117557791A
Authority
CN
China
Prior art keywords
encoder
transformer
medical image
decoder
cnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311231035.1A
Other languages
Chinese (zh)
Inventor
Zhu Min (朱敏)
Chen Jilong (陈纪龙)
Cheng Junlong (程俊龙)
Jiang Lei (姜磊)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202311231035.1A priority Critical patent/CN117557791A/en
Publication of CN117557791A publication Critical patent/CN117557791A/en
Pending legal-status Critical Current

Classifications

    • G06V10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V10/806 Fusion of extracted features at the feature extraction level
    • G06N3/0455 Auto-encoder networks; encoder-decoder networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/0499 Feedforward networks
    • Y02T10/40 Engine management systems


Abstract

The invention discloses a medical image segmentation method combining selective edge aggregation and a deep neural network. First, a Transformer-based encoder is constructed in which the MSA and MLP of the standard Transformer block are replaced with a selective edge aggregation module and a densely connected feed-forward network, realizing feature fusion and complementation. Then an encoder and a decoder based on densely connected CNNs are constructed, and the two encoders are connected in parallel so that the network can exchange information at multiple levels; the densely connected CNN decoder fuses multi-scale features from the dual encoders along a low-to-high up-sampling path, recovering the spatial resolution of the feature map in a fine-grained, deep manner. Finally, a loss function combining the target edge and region is designed, and a multi-level optimization strategy optimizes the encoder and decoder simultaneously, so that the network further learns more semantic information and boundary details and refines the segmentation result. The invention can solve the problem of medical image segmentation in real scenes.

Description

Medical image segmentation method combining selective edge aggregation and deep neural network
Technical Field
The invention relates to a medical image segmentation technology in the field of image processing, in particular to a medical image segmentation method combining selective edge aggregation and a deep neural network.
Background
Medical image segmentation is a widely studied and challenging task whose aim is to help clinicians focus on pathological areas and to extract detailed information from medical images for more accurate diagnosis and analysis. Common medical image segmentation tasks include skin lesion segmentation, gland segmentation, thyroid nodule segmentation, and the like. However, because targets in medical images vary greatly in scale, target structure boundaries are blurred, imaging modalities are numerous, and high-quality annotated images for training are scarce in practice, accurate segmentation results are very difficult to obtain.
With the rapid development of deep learning, many end-to-end automatic segmentation methods have been proposed and applied to medical image analysis. U-Net is currently one of the most widely used medical image segmentation models: it uses an encoder to learn high-level semantic representations, a decoder to recover lost spatial information, and skip connections to fuse features of different scales between the encoder and decoder to generate a more accurate segmentation mask. Many variants improving U-Net have since been proposed, but deep-learning segmentation methods based on U-Net and its variants do not explicitly consider that accurate boundary prediction can produce a higher-quality segmentation mask. To address structural boundary ambiguity, methods such as DeepLab and EANet have been reported, which recover boundary details by learning inter-pixel dependencies. However, they either require manual parameter tuning during post-processing or require a carefully designed learnable module to accomplish this labor-intensive task. In addition, because of the limited receptive field of the convolution operation, most existing CNN methods cannot establish long-range dependencies and global context relationships. Repeated striding and pooling operations inevitably reduce image resolution, which makes dense prediction tasks challenging. The advent of the Transformer, first used for natural language processing tasks and able to encode long-distance dependencies, greatly alleviates this problem. While Transformers are good at global context modeling, they lack the spatial information of images and have limitations especially when capturing image structure boundaries. These problems limit the successful application of pure Transformers to medical image datasets with smaller data volumes.
In summary, efficient boundary prediction, fusion of context and spatial features, and good performance on smaller datasets are key issues in medical image segmentation.
Disclosure of Invention
In view of the above problems, the object of the present invention is to provide a medical image segmentation method combining selective edge aggregation and deep neural networks that captures the global context features and shallow spatial features of the image, realizes the multi-scale learning capability of the network, and enables the network to select and retain edge-related features without additional learning. The technical solution is as follows:
a medical image segmentation method combining selective edge aggregation and deep neural networks, comprising the steps of:
step 1: selecting a disclosed medical image segmentation dataset, and preprocessing a training set in the dataset;
step 2: constructing a selective edge aggregation module to enable the network to pay attention to the accuracy of edge division;
step 3: constructing a densely connected feed-forward network to realize feature reuse and multi-scale learning capacity of the network;
step 4: designing a Transformer-based encoder structure comprising a selective edge aggregation module and a densely connected feed-forward network, and retaining image global context information;
step 5: designing an encoder and decoder structure based on dense connection CNN, and extracting image local information and spatial texture information;
step 6: constructing a multistage optimization strategy, and simultaneously optimizing an encoder and a decoder to learn boundary related information to generate better characteristic representation;
step 7: designing an image segmentation framework consisting of a Transformer encoder structure, an encoder and decoder structure based on densely connected CNNs and a multi-level optimization strategy, and completing the segmentation of the medical image.
Further, the medical image segmentation datasets of step 1 are: ISIC2017, PH2, TN-SCUI 2020 challenge, GlaS (Gland Segmentation) and COVID-19 Infection Segmentation; the preprocessing of the training sets in the datasets is as follows: for ISIC2017 and PH2, after normalizing the image colors, all images are resized to 224×224 pixels; for TN-SCUI 2020 challenge and GlaS, all images are resized to 224×224 pixels; and for COVID-19 Infection Segmentation, all images are resized to 352×352 pixels.
Further, the specific process of the step 2 is as follows:
step 2.1: the input feature map after activation of any convolution layer is represented as:
X_Sig ∈ R^(H×W×C)
wherein H and W are the height and width of the image, respectively, and C represents the number of channels;
step 2.2: extract the edge features of the input feature map X_Sig by a max pooling operation using the edge extraction block, and output the pooling result X_EEB, expressed by the following formula:
X_EEB = Maxpooling(1 - X_Sig, K) - (1 - X_Sig)
wherein K represents a sliding window size;
step 2.3: setting a threshold using the salient feature selection block to select the salient features of the input feature map X_Sig, which is accomplished by the following three steps:
(1) depth-aggregate the channel information of the input feature map X_Sig, i.e. X_agg^(x,y) = (1/C) Σ_(c=1)^(C) X_Sig^(x,y,c);
(2) calculate the average value μ of all positions in the aggregated channel information X_agg, i.e. μ = (1/(H×W)) Σ_(x,y) X_agg^(x,y);
(3) using the average value μ as a threshold, select the salient features of the aggregated channel information X_agg and output the result X_SFS, i.e. X_SFS^(x,y) = 1 if X_agg^(x,y) > μ, and X_SFS^(x,y) = 0 otherwise;
wherein the superscript (x, y) represents the coordinates of a specific position; x ∈ [0, 1, …, H-1], y ∈ [0, 1, …, W-1], and X_SFS ∈ R^(H×W×1);
Step 2.4: x is to be EEB And X SFS Element-by-element multiplication to obtain a feature map M which simultaneously shields the background region and the target boundary 0 ∈R H×W×C
Step 2.5: preserving feature map M using channel selection algorithm 0 ∈R H×W×C The channel which accords with the expected effect is shielded, the channel which does not accord with the expected effect is shielded, and the channel selection result X is output out ∈R H×W×C
Step 2.6: the input feature map after the resolution reduction operation is mapped and expressed as:
T in ∈R H×W×C
wherein H and W are the height and width of the image, respectively, and C represents the number of channels;
step 2.7: aggregate the features of the input feature map T_in using an average pooling operation, and output the result T_avg, expressed by the following formula:
T_avg = Avgpooling(T_in, K) - T_in
wherein K represents the size of a sliding window, and K is 3 in the invention;
step 2.8: activate T_avg with a Sigmoid activation function, concatenate the result with X_out, perform an average aggregation operation on the concatenated feature maps over the channel dimension, and activate the aggregated feature map to obtain the weight map T_out, with the formula:
T_out = f((1/(2C)) Σ_(c=1)^(2C) [f(T_avg) ⊕ X_out]^(c))
wherein f represents the Sigmoid activation function, C represents the number of channels, c represents the index of the image channel, and ⊕ represents the feature concatenation operation;
step 2.9: add X_EEB and T_out element by element, and multiply the result element by element with T_out to obtain the output result SEA_out of the Selective Edge Aggregation (SEA) module:
SEA_out = (X_EEB + T_out) ⊗ T_out
wherein ⊗ represents element-wise multiplication.
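As an illustration of steps 2.1 to 2.5, the following is a minimal TensorFlow sketch of the edge extraction and salient feature selection operations under the definitions above. The function names are illustrative only, and the depth aggregation is assumed to be a channel-wise mean, which the text does not spell out.

import tensorflow as tf

def edge_extraction(x_sig, k=3):
    # X_EEB = Maxpooling(1 - X_Sig, K) - (1 - X_Sig): max-pooling the inverted
    # map dilates the background, so the residual highlights boundary pixels
    inv = 1.0 - x_sig
    return tf.nn.max_pool2d(inv, ksize=k, strides=1, padding='SAME') - inv

def salient_feature_selection(x_sig):
    # (1) depth-aggregate channel information (assumed: mean over channels)
    x_agg = tf.reduce_mean(x_sig, axis=-1, keepdims=True)   # (B, H, W, 1)
    # (2) average over all spatial positions as the threshold
    mu = tf.reduce_mean(x_agg, axis=[1, 2], keepdims=True)  # (B, 1, 1, 1)
    # (3) keep the positions whose aggregated response exceeds the threshold
    return tf.cast(x_agg > mu, x_sig.dtype)                 # X_SFS, binary mask

# step 2.4: mask the background region and the target boundary simultaneously
# m0 = edge_extraction(x_sig) * salient_feature_selection(x_sig)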
Further, the specific process of constructing the densely connected feedforward network (Dense MLP) in the step 3 is as follows:
step 3.1: the feature map output by the Selective Edge Aggregation (SEA) module is represented as X_SEA ∈ R^(H×W×C), wherein H and W are the height and width of the image, respectively, and C represents the number of channels;
step 3.2: reshape X_SEA into X_0 ∈ R^(S×C), wherein S = H×W;
step 3.3: take the outputs of all preceding layers as the input of the next layer, expressed by the following formula:
X_l = MLP([X_0, X_1, …, X_(l-1)])
where MLP denotes a layer in the Dense feed-forward network (Dense MLP), [·] denotes concatenation, and M denotes the growth rate of the channel, i.e. the output dimension of the MLP, which in the present invention is 16.
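As a concrete reading of step 3, the densely connected feed-forward network can be sketched as below, where each channel-direction linear layer consumes the concatenation of all earlier outputs and emits M = 16 channels; the number of layers and the GELU activation are assumptions not fixed by the text.

import tensorflow as tf
from tensorflow.keras import layers

class DenseMLP(layers.Layer):
    # densely connected feed-forward network: layer l sees [X_0, X_1, ..., X_{l-1}]
    def __init__(self, num_layers=4, growth_rate=16):
        super().__init__()
        self.mlps = [layers.Dense(growth_rate, activation='gelu')
                     for _ in range(num_layers)]

    def call(self, x):  # x: (B, S, C), with S = H * W
        feats = [x]
        for mlp in self.mlps:
            # each layer takes the channel-wise concatenation of all earlier outputs
            feats.append(mlp(tf.concat(feats, axis=-1)))
        return tf.concat(feats, axis=-1)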
Furthermore, the Transformer-based encoder in step 4 is formed by repeatedly connecting a plurality of Transformer blocks; each Transformer block comprises a normalization layer, a Selective Edge Aggregation (SEA) module and a densely connected feed-forward network (Dense MLP), and a Patch Embedding layer is added before each Transformer block to reduce the resolution of the input feature map; the processing procedure is as follows:
step 4.1: the Patch Embedding layer reduces the resolution of the feature map input to the Transformer block by the following steps:
(1) the input feature map of the Transformer block is represented as X_in ∈ R^(H×W×C), wherein H and W are the height and width of the image, respectively, and C represents the number of channels;
(2) sample the pixels of X_in at intervals, and expand the number of channels to 4 times the original, obtaining X′ ∈ R^((H/2)×(W/2)×4C);
(3) with a convolution kernel of 1 and a grouped convolution with 4 groups, map the channels of X′ back to the same number of channels as X_in, obtaining X_emb ∈ R^((H/2)×(W/2)×C);
Step 4.2: x is to be emb A transducer block consisting of three parts, the normalization layer, SEA module and the transform MLP, is input, expressed by the following formula:
wherein,representing features from the transducer branch through Patch Embedding and features from the CNN branch, respectively, norm represents the normalization layer, SEA represents the Selective Edge Aggregation (SEA) module, and DenseMLP represents the Dense connectivity feed forward network (Dense MLP).
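Step 4.1 can be read as a pixel-unshuffle followed by a grouped pointwise convolution. The sketch below assumes the interval sampling is a space-to-depth rearrangement of 2×2 neighbourhoods and that C is divisible by 4; neither detail is stated explicitly.

import tensorflow as tf
from tensorflow.keras import layers

class PatchEmbedding(layers.Layer):
    # halves resolution: space-to-depth (channels x4), grouped 1x1 conv back to C
    def __init__(self, channels):
        super().__init__()
        # kernel size 1, 4 groups; requires channels % 4 == 0
        self.proj = layers.Conv2D(channels, kernel_size=1, groups=4)

    def call(self, x):                               # (B, H, W, C)
        x = tf.nn.space_to_depth(x, block_size=2)    # (B, H/2, W/2, 4C)
        return self.proj(x)                          # (B, H/2, W/2, C)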
Further, the specific steps of designing the encoder and decoder structure based on densely connected CNNs in step 5 include:
step 5.1: downsample the input feature map twice in succession starting from the first convolution block of the encoder, the final resolution becoming (H/16, W/16);
step 5.2: construct the fused feature of the skip connection between the encoder and the decoder, expressed by the following formula:
F^l = DenseConv(E_c^(l-1) ⊕ E_t^(l-1))
wherein ⊕ represents element-wise addition, E_c^(l-1) and E_t^(l-1) represent the outputs of block l-1 of the CNN encoder and the Transformer encoder, respectively, and DenseConv represents densely connected convolution blocks;
step 5.3: the concatenated channels are reduced to 1/4 of the original using a standard convolution, and then the number of channels is increased to 1/2 of the original channels by a series of densely connected convolution blocks.
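A decoder stage following steps 5.2 and 5.3 might be wired as in the functional-style sketch below; the placement of the upsampling path and the dense_conv_block callable are assumptions based on the description.

import tensorflow as tf
from tensorflow.keras import layers

def decoder_stage(d_prev, e_cnn, e_trans, dense_conv_block):
    # fuse the skip connections of the two encoders with the upsampled decoder path
    skip = layers.add([e_cnn, e_trans])          # element-wise addition of encoder outputs
    d_up = layers.UpSampling2D(2)(d_prev)        # low-to-high upsampling path
    x = layers.concatenate([d_up, skip])         # concatenate skip and decoder features
    x = layers.Conv2D(x.shape[-1] // 4, 1)(x)    # standard conv: channels reduced to 1/4
    return dense_conv_block(x)                   # densely connected conv blocks grow channels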
Further, the specific steps of constructing the multi-level optimization strategy in step 6 are as follows:
step 6.1: calculate the overlap error between the prediction and the true value using the IoU loss, i.e. the target region loss l_IoU, expressed by the formula:
l_IoU = 1 - (Σ_i P_i G_i) / (Σ_i (P_i + G_i - P_i G_i))
wherein P represents the prediction result of the network, G represents the true value, and the subscript i represents the index of the pixel;
step 6.2: the boundary loss for minimizing the boundary error between P and G is calculated by:
(1) extract the boundaries P_b and G_b of P and G using max pooling operations, with the formulas:
G_b = Maxpooling(1 - G, K) - (1 - G),
P_b = Maxpooling(1 - P, K) - (1 - P)
wherein K represents the size of a sliding window, and K is 3 in the invention;
(2) construct the boundary loss l_Edge from P_b and G_b, with the formula:
l_Edge = -Σ_i [α G_b^(i) log P_b^(i) + (1 - α)(1 - G_b^(i)) log(1 - P_b^(i))]
wherein G_b^(i) and P_b^(i) respectively represent the true value and the predicted boundary probability value of the i-th position, and α is a weight coefficient for balancing the number of pixels;
step 6.3: calculate the loss function l_Seg using the target region loss l_IoU and the boundary loss l_Edge:
l_Seg = λ_1 l_IoU + λ_2 l_Edge
wherein λ_1 and λ_2 are weight coefficients to balance the target region loss l_IoU and the boundary loss l_Edge;
step 6.4: perform multi-level optimization on the probability maps P_e output by the Transformer encoder and the probability map P_d output by the decoder based on densely connected CNNs to obtain the total loss l_Total of the training stage, with the formula:
l_Total = l_Seg(P_d, G) + Σ_(n=1)^(N) l_Seg(P_e^(n), G)
where N represents the number of Transformer blocks in the Transformer encoder and n represents the index of the Transformer blocks.
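The losses of step 6 translate directly into tensor operations, as in the sketch below. It assumes the boundary loss is a class-balanced cross-entropy on the pooled boundary maps and that the encoder probability maps have already been upsampled to the ground-truth size; alpha, lam1 and lam2 are unspecified hyperparameters.

import tensorflow as tf

def iou_loss(p, g, eps=1e-7):
    # l_IoU = 1 - sum(P*G) / sum(P + G - P*G)
    inter = tf.reduce_sum(p * g)
    union = tf.reduce_sum(p + g - p * g)
    return 1.0 - inter / (union + eps)

def boundary_loss(p, g, k=3, alpha=0.9, eps=1e-7):
    # boundaries via the max-pooling trick of step 6.2 (K = 3)
    gb = tf.nn.max_pool2d(1.0 - g, k, 1, 'SAME') - (1.0 - g)
    pb = tf.nn.max_pool2d(1.0 - p, k, 1, 'SAME') - (1.0 - p)
    # assumed form: class-balanced cross-entropy on the boundary maps
    return -tf.reduce_mean(alpha * gb * tf.math.log(pb + eps)
                           + (1.0 - alpha) * (1.0 - gb) * tf.math.log(1.0 - pb + eps))

def seg_loss(p, g, lam1=1.0, lam2=1.0):
    # l_Seg = lambda_1 * l_IoU + lambda_2 * l_Edge
    return lam1 * iou_loss(p, g) + lam2 * boundary_loss(p, g)

def total_loss(p_decoder, encoder_probs, g):
    # multi-level optimization: supervise the CNN-decoder output and every
    # Transformer-encoder probability map against the same ground truth
    return seg_loss(p_decoder, g) + tf.add_n([seg_loss(pe, g) for pe in encoder_probs])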
Furthermore, step 7 designs an image segmentation framework consisting of a Transformer encoder structure, an encoder and decoder structure based on densely connected CNNs and a multi-level optimization strategy, and the specific process of completing the segmentation of the medical image is as follows:
step 7.1: inputting the original image into the Transformer-based encoder and the densely connected CNN-based encoder, capturing and retaining global context information with the Transformer branch, and extracting local information and spatial texture information with the CNN branch;
step 7.2: fusing multi-scale features from the dual encoders and the up-sampling path from low to high through the densely connected CNN decoder;
step 7.3: directly upsampling the output of the Transformer encoder to the target size and computing the loss against the true value, computing the loss of the output of the densely connected CNN decoder against the true value, and optimizing the encoder and decoder simultaneously in a multi-level optimization manner.
The beneficial effects of adopting this technical solution are as follows:
1) The invention provides a novel and effective medical image segmentation framework combining selective edge aggregation and a deep neural network to comprehensively address the medical image segmentation problem. The framework can handle the multi-scale and blurred-structure-boundary problems in medical images of different modalities, and still shows excellent segmentation performance even on smaller medical image segmentation datasets.
2) The invention designs a Selective Edge Aggregation (SEA) module that selectively aggregates edge information without additional supervision, making the network pay more attention to the accuracy of edge division. In addition, by adopting dense connections throughout, the codec has fewer parameters and multi-scale learning capability.
3) The invention constructs a loss function combining the target edge and region and simultaneously optimizes the encoder and decoder using a multi-level optimization strategy. This optimization encourages the encoder to learn more boundary-related information, yielding better feature representations.
Description of the drawings:
FIG. 1 is a schematic diagram of a selective edge aggregation module according to the present invention.
Fig. 2 is an edge extraction block of the present invention.
Fig. 3 is a salient feature selection block of the present invention.
FIG. 4 is a Transformer block of the present invention.
FIG. 5 is a flow chart of a medical image segmentation method incorporating selective edge aggregation and deep neural networks of the present invention.
Detailed Description
The technical solution of the invention is described in further detail below with reference to the accompanying drawings.
The invention designs a medical image segmentation method combining selective edge aggregation and a deep neural network. First, densely connected CNNs and Transformers with densely connected feed-forward networks (Dense MLPs) are combined in parallel to form the encoder, and dense connections are likewise used in the decoder, effectively capturing shallow texture information and global context information in medical images in a deeper and multi-scale manner. Second, a plug-and-play Selective Edge Aggregation (SEA) module is proposed that removes noisy background without supervision and selects and retains useful edge features, making the network pay more attention to information related to the target boundary. In addition, a loss function combining target content and edges is designed, and multi-level optimization strategies are used to refine fuzzy structures, helping the network learn better feature representations and produce more accurate segmentation results.
The proposed method is evaluated on a number of different and challenging medical segmentation tasks; it performs well compared with most state-of-the-art methods and has fewer parameters and GFLOPs than other methods.
Step 1: Select the public medical image segmentation datasets and preprocess them.
The specific implementation of preprocessing the training set is as follows:
the invention performs segmentation training tasks on four disclosed medical image segmentation datasets. Wherein the data sets are respectively: ISIC2017, PH2, TN-SCUI 2020challenge, GLAnd segmentation and COVID-19Infection segmentation.
The ISIC2017 dataset is provided by the International Skin Imaging Collaboration (ISIC) and includes 2000 training images, 150 validation images and 600 test images. The PH2 dataset comprises 200 dermoscopic skin images with a resolution of 765×572 pixels; 140 images are randomly selected as the training set, 20 images as the validation set, and the remaining 40 images as the test set. For these two datasets, the image colors are first normalized using the gray-world color constancy algorithm, then all images are resized to 224×224 pixels for the experiments, and finally the training data are augmented during training to improve the generalization capability of the model.
The TN-SCUI 2020 challenge dataset provides 3644 thyroid nodule images of different sizes, and the nodules have been annotated by highly experienced physicians. The dataset is first divided into training, validation and test sets in a ratio of 6:2:2. During training, data augmentation methods such as random rotation, random horizontal and vertical shifts and random flips are applied to the training set to increase the diversity of the training data, and the resolution of all images is uniformly adjusted to 224×224 pixels.
The Gland Segmentation (GlaS) dataset contains microscopy images of hematoxylin and eosin (H&E) stained slides, together with ground truth provided by expert pathologists. The dataset contains 165 images of non-uniform resolution, with a minimum resolution of 433×574 pixels and a maximum resolution of 775×522 pixels. 85 images are selected for training and 80 images are used for testing. The resolution of all images is adjusted to 224×224 pixels in the experiments.
The COVID-19 Infection Segmentation dataset contains 100 axial CT images and corresponding annotation images from more than 40 COVID-19 patients. Considering that the data volume of this dataset is very small, the experiments are performed with five-fold cross-validation (i.e. 80 images are used for training and 20 for validation each time). During training, data augmentation strategies are likewise employed to increase the diversity of the training set, and the images are uniformly scaled to 352×352 pixels.
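The common part of this preprocessing, resizing plus paired random flips, can be sketched as follows; rotations, shifts and the gray-world color step are dataset-specific and omitted. The sketch assumes an RGB image and a single-channel mask.

import tensorflow as tf

def preprocess(image, mask, size=(224, 224)):
    # resize images bilinearly; use nearest-neighbour for masks to keep labels crisp
    image = tf.image.resize(image, size)
    mask = tf.image.resize(mask, size, method='nearest')
    return image, mask

def flip_augment(image, mask):
    # stack image and mask so each random flip is applied to both identically
    both = tf.concat([image, mask], axis=-1)     # (H, W, 3 + 1)
    both = tf.image.random_flip_left_right(both)
    both = tf.image.random_flip_up_down(both)
    return both[..., :3], both[..., 3:]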
Step 2: Construct a Selective Edge Aggregation (SEA) module that receives features from both the Transformer and CNN branches and makes the network focus on the accuracy of edge division. Because the CNN better captures the spatial information of the segmentation target, the CNN branch is used to supplement the Transformer branch so that the two branches realize feature fusion and complementation; the selective edge aggregation module of the invention is shown in fig. 1. The specific construction steps are as follows:
1) The input feature map after activation of any convolution layer is represented as:
X_Sig ∈ R^(H×W×C)
where H and W are the height and width of the image, respectively, and C represents the number of channels.
2) Using the edge extraction block (EEB), extract the edge features of X_Sig in the CNN branch by a max pooling operation and output the result X_EEB; referring to fig. 2, the edge extraction block of the present invention, it is expressed by the following formula:
X_EEB = Maxpooling(1 - X_Sig, K) - (1 - X_Sig)
where K represents the max pooling sliding window size; K is 3 in the present invention.
3) Using the salient feature selection block (SFS), set a threshold to select the salient features of X_Sig in the CNN branch; referring to fig. 3, the salient feature selection block of the present invention, this is accomplished by the following three steps:
(1) depth-aggregate the channel information of X_Sig, i.e. X_agg^(x,y) = (1/C) Σ_(c=1)^(C) X_Sig^(x,y,c);
(2) calculate the average value μ of all positions in X_agg, i.e. μ = (1/(H×W)) Σ_(x,y) X_agg^(x,y);
(3) using μ as a threshold, select the salient features of X_agg and output the result X_SFS, i.e. X_SFS^(x,y) = 1 if X_agg^(x,y) > μ, and X_SFS^(x,y) = 0 otherwise;
where the superscript (x, y) represents the coordinates of a specific position; x ∈ [0, 1, …, H-1], y ∈ [0, 1, …, W-1], and X_SFS ∈ R^(H×W×1).
4) Multiply X_EEB and X_SFS element by element to obtain a feature map M_0 ∈ R^(H×W×C) that masks the background region and the target boundary simultaneously.
5) Use a channel selection algorithm to retain the channels of M_0 ∈ R^(H×W×C) that meet the expected effect and mask the channels that do not, and output the result X_out ∈ R^(H×W×C).
6) The input feature map after the resolution reduction operation is represented as:
T_in ∈ R^(H×W×C)
where H and W are the height and width of the image, respectively, and C represents the number of channels.
7) Aggregate the features of the input feature map T_in using an average pooling operation, and output the result T_avg, expressed by the following formula:
T_avg = Avgpooling(T_in, K) - T_in
where K represents the average pooling sliding window size; K is 3 in the present invention.
8) Activate T_avg with a Sigmoid activation function, concatenate the result with X_out, perform an average aggregation operation on the concatenated feature maps over the channel dimension, and activate the aggregated feature map to obtain the weight map T_out, with the formula:
T_out = f((1/(2C)) Σ_(c=1)^(2C) [f(T_avg) ⊕ X_out]^(c))
where f represents the Sigmoid activation function, C represents the number of channels, c represents the index of the image channel, and ⊕ represents the feature concatenation operation.
9) Add X_EEB and T_out element by element, and multiply the result element by element with T_out to obtain the output result SEA_out of the Selective Edge Aggregation (SEA) module:
SEA_out = (X_EEB + T_out) ⊗ T_out
where ⊗ represents element-wise multiplication.
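For completeness, steps 6) to 9) of the SEA module reduce to the sketch below, under the same assumptions as the earlier sketch; sea_weight_and_output is an illustrative name, not the patent's.

import tensorflow as tf

def sea_weight_and_output(t_in, x_eeb, x_out, k=3):
    # T_avg = Avgpooling(T_in, K) - T_in: local-mean residual of the Transformer feature
    t_avg = tf.nn.avg_pool2d(t_in, k, 1, 'SAME') - t_in
    # concatenate the Sigmoid-activated T_avg with the channel-selected CNN feature,
    # average over channels, and activate again to obtain the weight map T_out
    cat = tf.concat([tf.sigmoid(t_avg), x_out], axis=-1)
    t_out = tf.sigmoid(tf.reduce_mean(cat, axis=-1, keepdims=True))
    # SEA_out = (X_EEB + T_out) * T_out, element-wise
    return (x_eeb + t_out) * t_out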
Step 3: Construct the Transformer block. Each Transformer block contains a densely connected feed-forward network (Dense MLP), which is constructed by applying linear layers in the channel direction in a densely connected manner, further improving the information flow between channels; referring to fig. 4, the Transformer block of the present invention. The specific construction flow is as follows:
1) The feature map output by the Selective Edge Aggregation (SEA) module is represented as X_SEA ∈ R^(H×W×C), where H and W are the height and width of the image, respectively, and C represents the number of channels.
2) Reshape X_SEA into X_0 ∈ R^(S×C), where S = H×W.
3) Take the outputs of all preceding layers as the input of the next layer, expressed by the following formula:
X_l = MLP([X_0, X_1, …, X_(l-1)])
where MLP denotes a layer in the Dense feed-forward network (Dense MLP), [·] denotes concatenation, and M denotes the growth rate of the channel, i.e. the output dimension of the MLP, which in the present invention is 16.
Step 4: Construct the Transformer-based encoder, which is formed by repeatedly connecting a plurality of Transformer blocks. Each Transformer block comprises a normalization layer, a Selective Edge Aggregation (SEA) module and a densely connected feed-forward network (Dense MLP), can adapt to high-resolution images, and is complementary to the spatial features captured by the CNN. A Patch Embedding layer is added before each Transformer block to reduce the resolution of the input feature map, so that the Transformer can expand its receptive field layer by layer like a CNN; referring to fig. 5, the flow chart of the medical image segmentation method combining selective edge aggregation and a deep neural network of the present invention, the Transformer-based encoder is the 'Transformer Encoder' branch in FIG. 5. The specific implementation steps are as follows:
1) The Patch Embedding layer reduces the resolution of the feature map input to the Transformer block, which is mainly completed by the following three steps:
(1) the input feature map of the Transformer block is represented as X_in ∈ R^(H×W×C), where H and W are the height and width of the image, respectively, and C represents the number of channels;
(2) sample the pixels of X_in at intervals, and expand the number of channels to 4 times the original, obtaining X′ ∈ R^((H/2)×(W/2)×4C);
(3) with a convolution kernel of 1 and a grouped convolution with 4 groups, map the channels of X′ back to the same number of channels as X_in, obtaining X_emb ∈ R^((H/2)×(W/2)×C).
2) Input X_emb into the Transformer block consisting of three parts, the normalization layer, the SEA module and the Dense MLP, expressed by the following formulas:
z′ = z_t + SEA(Norm(z_t), z_c)
z_out = z′ + DenseMLP(Norm(z′))
where z_t and z_c represent the features from the Transformer branch through Patch Embedding and the features from the CNN branch, respectively, Norm represents the normalization layer, SEA represents the Selective Edge Aggregation (SEA) module, and DenseMLP represents the densely connected feed-forward network (Dense MLP).
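Assembling the pieces, one Transformer block with MSA replaced by SEA and MLP replaced by the Dense MLP could be sketched as follows; the residual connections and the projection back to the block width are assumptions carried over from the standard Transformer layout.

import tensorflow as tf
from tensorflow.keras import layers

class SEATransformerBlock(layers.Layer):
    # standard block layout with MSA -> SEA and MLP -> Dense MLP, residuals kept
    def __init__(self, dim, sea_module, dense_mlp):
        super().__init__()
        self.norm1 = layers.LayerNormalization()
        self.norm2 = layers.LayerNormalization()
        self.sea = sea_module        # consumes Transformer- and CNN-branch features
        self.dense_mlp = dense_mlp
        # assumed: project the widened Dense MLP output back to dim for the residual
        self.proj = layers.Dense(dim)

    def call(self, x_trans, x_cnn):
        x = x_trans + self.sea(self.norm1(x_trans), x_cnn)    # SEA replaces MSA
        return x + self.proj(self.dense_mlp(self.norm2(x)))   # Dense MLP replaces MLP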
Step 5: Construct the encoder and decoder based on densely connected CNNs, which form a U-shaped network in which the encoder extracts semantic information of the medical image from shallow to deep layers and the decoder recovers the spatial resolution of the encoder's output features. In addition, skip connections are applied to acquire detailed information from the encoder and decoder to compensate for the information loss caused by downsampling and convolution operations. Referring to fig. 5, the flow chart of the medical image segmentation method combining selective edge aggregation and a deep neural network of the present invention, the encoder and decoder based on densely connected CNNs are the 'CNN Encoder' and 'CNN Decoder' branches in fig. 5. The specific steps for designing the encoder and decoder structure based on densely connected CNNs include:
1) Downsample the input feature map twice in succession starting from the first convolution block of the encoder; the final resolution becomes (H/16, W/16).
2) Construct the fused feature of the skip connection between the encoder and the decoder, expressed by the following formula:
F^l = DenseConv(E_c^(l-1) ⊕ E_t^(l-1))
where ⊕ represents element-wise addition, E_c^(l-1) and E_t^(l-1) represent the outputs of block l-1 of the CNN encoder and the Transformer encoder, respectively, and DenseConv represents densely connected convolution blocks.
3) The concatenated channels are reduced to 1/4 of the original using a standard convolution, and then the number of channels is increased to 1/2 of the original channels by a series of densely connected convolution blocks.
Step 6: To reduce the difference between the prediction and the true value, two loss functions are used here to focus on two independent aspects: the segmentation content and the segmentation boundary. The first is the IoU loss, which minimizes the overlap error between the prediction and the true value; the second is the boundary loss, which minimizes the boundary error between them. In addition, a multi-level optimization strategy is introduced to optimize the encoder and decoder simultaneously; referring to fig. 5, the flow chart of the medical image segmentation method combining selective edge aggregation and a deep neural network of the present invention, the multi-level optimization strategy is the 'MLO Strategy' branch in fig. 5. The specific steps for designing the loss function and the multi-level optimization strategy include:
1) Calculate the overlap error between the prediction and the true value using the IoU loss, i.e. the target region loss l_IoU, expressed by the formula:
l_IoU = 1 - (Σ_i P_i G_i) / (Σ_i (P_i + G_i - P_i G_i))
where P represents the predicted result of the network, G represents the true value, and i represents the index of all pixels in P and G.
2) The boundary loss for minimizing the boundary error between P and G is calculated as follows:
(1) extract the boundaries P_b and G_b of P and G using max pooling operations, with the formulas:
G_b = Maxpooling(1 - G, K) - (1 - G),
P_b = Maxpooling(1 - P, K) - (1 - P)
wherein K represents the maximum pooled sliding window size, and K is 3 in the invention;
(2) construct the boundary loss l_Edge from P_b and G_b, with the formula:
l_Edge = -Σ_i [α G_b^(i) log P_b^(i) + (1 - α)(1 - G_b^(i)) log(1 - P_b^(i))]
where G_b^(i) and P_b^(i) respectively represent the true value and the predicted boundary probability value of the i-th position, and α is a weight coefficient for balancing the number of pixels.
3) Calculate the loss function l_Seg using l_IoU and l_Edge, i.e.
l_Seg = λ_1 l_IoU + λ_2 l_Edge
where λ_1 and λ_2 are weight coefficients to balance the target region loss l_IoU and the boundary loss l_Edge.
4) Perform multi-level optimization on the probability maps P_e output by the Transformer encoder and the probability map P_d output by the decoder based on densely connected CNNs to obtain the total loss l_Total of the training stage, with the formula:
l_Total = l_Seg(P_d, G) + Σ_(n=1)^(N) l_Seg(P_e^(n), G)
where N represents the number of Transformer blocks in the Transformer encoder and n represents the index of the Transformer blocks.
Step 7: Design an image segmentation framework consisting of the Transformer encoder structure, the encoder and decoder structure based on densely connected CNNs and the multi-level optimization strategy; referring to fig. 5, the flow chart of the medical image segmentation method combining selective edge aggregation and a deep neural network of the present invention. The specific process of completing the segmentation of the medical image is as follows:
1) The framework consists of three modules:
A Transformer-based encoder consisting of a plurality of Transformer blocks is constructed to capture and retain important global context information. A Patch Embedding layer is introduced before the Transformer blocks to adapt to high-resolution images and dense prediction tasks, and the MSA and MLP in the standard Transformer block are replaced with the Selective Edge Aggregation (SEA) module and the densely connected feed-forward network (Dense MLP) constructed by the invention, so that the network can accept features from the two branches, the Transformer-based encoder and the densely connected CNN-based encoder, realizing feature fusion and complementation. Through the encoder and decoder based on densely connected CNNs, the network has a natural multi-scale feature extraction capability; the Transformer-based encoder and the densely connected CNN-based encoder are connected in parallel to exchange information at multiple levels, making full use of the local and global information of the image, and the densely connected CNN decoder fuses multi-scale features from the dual encoders and the up-sampling path from low to high, recovering the spatial resolution of the feature map in a finer-grained and deeper manner. In addition, a loss function combining the target edge and region is designed, and the encoder and decoder are optimized simultaneously by a multi-level optimization strategy, so that the network further learns more semantic information and boundary details and refines the segmentation result.
2) Model architecture and hyperparameter settings:
the Keras-based method is realized on NVIDIA RTX3090 GPU (24 g) through training. The learning rate was fixed at 1e-4 using Adam optimizer. The mini batch size was set to 16 and training was stopped using an early stop mechanism when the validation loss was stable and there was no significant change in 30 epochs. Training data was augmented by applying random rotations (+ -25 °), random horizontal and vertical shifts (15%) and random flips (horizontal and vertical). Furthermore, all comparative experiments used the same training set and validation set. After the second phase of the SEAformer CNN branch, the initial weights come from Block2, block3 and Block4 of the pre-trained DenseNet121 on ImageNet, the other layers the invention trains from scratch.
3) Model evaluation method:
Five widely used metrics are used to evaluate model performance: accuracy (Acc), sensitivity (Sens), specificity (Spec), intersection over union (IoU) and the Dice similarity coefficient (Dice). The number of parameters, GFLOPs and FPS of the invention are also reported.
4) The model is implemented as follows:
The original image is input into the Transformer-based encoder and the encoder based on densely connected CNNs; global context information is captured and retained through the Transformer branch, and local information and spatial texture information are extracted through the CNN branch. Multi-scale features from the dual encoders and the up-sampling path are fused from low to high through the densely connected CNN decoder. The output of the Transformer encoder is directly upsampled to the target size and its loss against the true value is computed, the loss of the output of the densely connected CNN decoder against the true value is computed, and the encoder and decoder are optimized simultaneously in a multi-level optimization manner.

Claims (8)

1. A medical image segmentation method combining selective edge aggregation and deep neural network, comprising the steps of:
step 1: selecting a disclosed medical image segmentation dataset, and preprocessing a training set in the dataset;
step 2: constructing a selective edge aggregation module to enable the network to pay attention to the accuracy of edge division;
step 3: constructing a densely connected feed-forward network to realize feature reuse and multi-scale learning capacity of the network;
step 4: designing a Transformer-based encoder structure comprising a selective edge aggregation module and a densely connected feed-forward network, and retaining image global context information;
step 5: designing an encoder and decoder structure based on dense connection CNN, and extracting image local information and spatial texture information;
step 6: constructing a multistage optimization strategy, and simultaneously optimizing an encoder and a decoder to learn boundary related information to generate better characteristic representation;
step 7: designing an image segmentation framework consisting of a Transformer encoder structure, an encoder and decoder structure based on densely connected CNNs and a multi-level optimization strategy, and completing the segmentation of the medical image.
2. The medical image segmentation method combining selective edge aggregation and deep neural network according to claim 1, wherein the medical image segmentation datasets of step 1 are: ISIC2017, PH2, TN-SCUI 2020 challenge, GlaS (Gland Segmentation) and COVID-19 Infection Segmentation; the preprocessing of the training sets in the datasets is as follows: for ISIC2017 and PH2, after normalizing the image colors, all images are resized to 224×224 pixels; for TN-SCUI 2020 challenge and GlaS, all images are resized to 224×224 pixels; and for COVID-19 Infection Segmentation, all images are resized to 352×352 pixels.
3. The medical image segmentation method combining selective edge aggregation and deep neural network according to claim 1, wherein the specific procedure of the step 2 is as follows:
step 2.1: the input feature map after activation of any convolution layer is represented as:
X_Sig ∈ R^(H×W×C)
wherein H and W are the height and width of the image, respectively, and C represents the number of channels;
step 2.2: extract the edge features of the input feature map X_Sig by a max pooling operation using the edge extraction block, and output the pooling result X_EEB, expressed by the following formula:
X_EEB = Maxpooling(1 - X_Sig, K) - (1 - X_Sig)
wherein K represents a sliding window size;
step 2.3: setting a threshold using the salient feature selection block to select the salient features of the input feature map X_Sig, which is accomplished by the following three steps:
(1) depth-aggregate the channel information of the input feature map X_Sig, i.e. X_agg^(x,y) = (1/C) Σ_(c=1)^(C) X_Sig^(x,y,c);
(2) calculate the average value μ of all positions in the aggregated channel information X_agg, i.e. μ = (1/(H×W)) Σ_(x,y) X_agg^(x,y);
(3) using the average value μ as a threshold, select the salient features of the aggregated channel information X_agg and output the result X_SFS, i.e. X_SFS^(x,y) = 1 if X_agg^(x,y) > μ, and X_SFS^(x,y) = 0 otherwise;
wherein the superscript (x, y) represents the coordinates of a specific position; x ∈ [0, 1, …, H-1], y ∈ [0, 1, …, W-1], and X_SFS ∈ R^(H×W×1);
Step 2.4: x is to be EEB And X SFS Element-by-element multiplication to obtain a feature map M which simultaneously shields the background region and the target boundary 0 ∈R H×W×C
Step 2.5: preserving feature map M using channel selection algorithm 0 ∈R H×W×C The channel which accords with the expected effect is shielded, the channel which does not accord with the expected effect is shielded, and the channel selection result X is output out ∈R H×W×C
Step 2.6: the input feature map after the resolution reduction operation is mapped and expressed as:
T in ∈R H×W×C
wherein H and W are the height and width of the image, respectively, and C represents the number of channels;
step 2.7: aggregate the features of the input feature map T_in using an average pooling operation, and output the result T_avg, expressed by the following formula:
T_avg = Avgpooling(T_in, K) - T_in
wherein K represents the sliding window size;
step 2.8: activate T_avg with a Sigmoid activation function, concatenate the result with X_out, perform an average aggregation operation on the concatenated feature maps over the channel dimension, and activate the aggregated feature map to obtain the weight map T_out, with the formula:
T_out = f((1/(2C)) Σ_(c=1)^(2C) [f(T_avg) ⊕ X_out]^(c))
wherein f represents the Sigmoid activation function, C represents the number of channels, c represents the index of the image channel, and ⊕ represents the feature concatenation operation;
step 2.9: add X_EEB and the weight map T_out element by element, and multiply the result element by element with T_out to obtain the output result SEA_out of the selective edge aggregation module:
SEA_out = (X_EEB + T_out) ⊗ T_out
wherein ⊗ represents element-wise multiplication.
4. The medical image segmentation method combining selective edge aggregation and deep neural network according to claim 3, wherein the specific process of constructing the densely connected feedforward network in the step 3 is as follows:
step 3.1: the feature map output by the selective edge aggregation module is represented as X_SEA ∈ R^(H×W×C), wherein H and W are the height and width of the image, respectively, and C represents the number of channels;
step 3.2: reshape X_SEA into X_0 ∈ R^(S×C), wherein S = H×W;
step 3.3: take the outputs of all preceding layers as the input of the next layer, expressed by the following formula:
X_l = MLP([X_0, X_1, …, X_(l-1)])
where MLP denotes a layer in the densely connected feed-forward network, [·] denotes concatenation, and M denotes the growth rate of the channel, i.e. the output dimension of the MLP.
5. The method for segmenting the medical image by combining selective edge aggregation and deep neural network according to claim 4, wherein the Transformer-based encoder in step 4 is formed by repeatedly connecting a plurality of Transformer blocks, each Transformer block comprises a normalization layer, a selective edge aggregation module and a densely connected feed-forward network, and a Patch Embedding layer is added before each Transformer block to reduce the resolution of the input feature map; the processing procedure is as follows:
step 4.1: the Patch Embedding layer reduces the resolution of the feature map input to the Transformer block by the following three steps:
(1) the input feature map of the Transformer block is represented as X_in ∈ R^(H×W×C);
(2) sample the pixels of X_in at intervals, and expand the number of channels to 4 times the original, obtaining X′ ∈ R^((H/2)×(W/2)×4C);
(3) with a convolution kernel of 1 and a grouped convolution with 4 groups, map the channels of X′ back to the same number of channels as X_in, obtaining X_emb ∈ R^((H/2)×(W/2)×C);
Step 4.2: x is to be emb A transducer block consisting of three parts, the normalization layer, SEA module and the transform MLP, is input, expressed by the following formula:
wherein,representing the features from the transducer branch through Patch Embedding and from the CNN branch, respectively, the Norm represents the normalization layer, the SEA represents the selective edge aggregation module, and the DenseMLP represents the densely connected feed forward network.
6. The method for medical image segmentation combining selective edge aggregation and deep neural networks according to claim 5, wherein the specific steps of designing the encoder and decoder structure based on densely connected CNNs in step 5 include:
step 5.1: downsample the input feature map twice in succession starting from the first convolution block of the encoder, the final resolution becoming (H/16, W/16);
step 5.2: construct the fused feature of the skip connection between the encoder and the decoder, expressed by the following formula:
F^l = DenseConv(E_c^(l-1) ⊕ E_t^(l-1))
wherein ⊕ represents element-wise addition, E_c^(l-1) and E_t^(l-1) represent the outputs of block l-1 of the CNN encoder and the Transformer encoder, respectively, and DenseConv represents densely connected convolution blocks;
step 5.3: the concatenated channels are reduced to 1/4 of the original using a standard convolution, and then the number of channels is increased to 1/2 of the original channels by a series of densely connected convolution blocks.
7. The medical image segmentation method combining selective edge aggregation and deep neural network according to claim 6, wherein the specific steps of constructing the multi-level optimization strategy in step 6 are as follows:
step 6.1: calculate the overlap error between the prediction and the true value using the IoU loss, i.e. the target region loss l_IoU, expressed by the formula:
l_IoU = 1 - (Σ_i P_i G_i) / (Σ_i (P_i + G_i - P_i G_i))
wherein P represents the prediction result of the network, G represents the true value, and the subscript i represents the index of the pixel;
step 6.2: the boundary loss for minimizing the boundary error between P and G is calculated by:
(1) extract the boundaries P_b and G_b of P and G using max pooling operations, with the formulas:
G_b = Maxpooling(1 - G, K) - (1 - G),
P_b = Maxpooling(1 - P, K) - (1 - P)
wherein K represents a sliding window size;
(2) construct the boundary loss l_Edge from P_b and G_b, with the formula:
l_Edge = -Σ_i [α G_b^(i) log P_b^(i) + (1 - α)(1 - G_b^(i)) log(1 - P_b^(i))]
wherein G_b^(i) and P_b^(i) respectively represent the true value and the predicted boundary probability value of the i-th position, and α is a weight coefficient for balancing the number of pixels;
step 6.3: calculate the loss function l_Seg using the target region loss l_IoU and the boundary loss l_Edge:
l_Seg = λ_1 l_IoU + λ_2 l_Edge
wherein λ_1 and λ_2 are weight coefficients to balance the target region loss l_IoU and the boundary loss l_Edge;
step 6.4: perform multi-level optimization on the probability maps P_e output by the Transformer encoder and the probability map P_d output by the decoder based on densely connected CNNs to obtain the total loss l_Total of the training stage, with the formula:
l_Total = l_Seg(P_d, G) + Σ_(n=1)^(N) l_Seg(P_e^(n), G)
where N represents the number of Transformer blocks in the Transformer encoder and n represents the index of the Transformer blocks.
8. The method for segmenting the medical image by combining selective edge aggregation and deep neural network according to claim 7, wherein step 7 designs an image segmentation framework consisting of a Transformer encoder structure, an encoder and decoder structure based on densely connected CNNs and a multi-level optimization strategy, and the specific process of completing the segmentation of the medical image is as follows:
step 7.1: inputting the original image into the Transformer-based encoder and the densely connected CNN-based encoder, capturing and retaining global context information with the Transformer branch, and extracting local information and spatial texture information with the CNN branch;
step 7.2: fusing multi-scale features from the dual encoders and the up-sampling path from low to high through the densely connected CNN decoder;
step 7.3: directly upsampling the output of the Transformer encoder to the target size and computing the loss against the true value, computing the loss of the output of the densely connected CNN decoder against the true value, and optimizing the encoder and decoder simultaneously in a multi-level optimization manner.
CN202311231035.1A 2023-09-22 2023-09-22 Medical image segmentation method combining selective edge aggregation and deep neural network Pending CN117557791A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311231035.1A CN117557791A (en) 2023-09-22 2023-09-22 Medical image segmentation method combining selective edge aggregation and deep neural network


Publications (1)

Publication Number Publication Date
CN117557791A true CN117557791A (en) 2024-02-13

Family

ID=89817390


Country Status (1)

Country Link
CN (1) CN117557791A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118072024A (en) * 2024-04-16 2024-05-24 英瑞云医疗科技(烟台)有限公司 Fusion convolution self-adaptive network skin lesion segmentation method
CN118072024B (en) * 2024-04-16 2024-07-19 英瑞云医疗科技(烟台)有限公司 Fusion convolution self-adaptive network skin lesion segmentation method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination