CN117934824A - Target region segmentation method and system for ultrasonic image and electronic equipment - Google Patents

Target region segmentation method and system for ultrasonic image and electronic equipment

Info

Publication number
CN117934824A
CN117934824A
Authority
CN
China
Prior art keywords
ultrasonic image
layer
scale
channel
convolution
Prior art date
Legal status
Pending
Application number
CN202311691938.8A
Other languages
Chinese (zh)
Inventor
王旭
李海燕
马玉军
白崇斌
明文俊
曾文
Current Assignee
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date
Filing date
Publication date
Application filed by Yunnan University YNU filed Critical Yunnan University YNU
Priority to CN202311691938.8A
Publication of CN117934824A
Legal status: Pending

Abstract

The invention discloses a method, a system, and electronic equipment for segmenting a target region of an ultrasonic image, and relates to the technical field of image processing. The method comprises the following steps: acquiring an ultrasonic image to be detected; and inputting the ultrasonic image to be detected into a full-scale convolutional neural network model to obtain a target region segmentation result. The full-scale convolutional neural network model is obtained by training an initial full-scale convolutional neural network model with a joint loss function on an enhanced historical target ultrasonic image set. By constructing and training the initial full-scale convolutional neural network model, the method can improve the accuracy of target region segmentation in ultrasonic images.

Description

Target region segmentation method and system for ultrasonic image and electronic equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, a system, and an electronic device for segmenting a target region of an ultrasound image.
Background
The target region segmentation of ultrasound images is widely used in different fields. However, conventional target region segmentation techniques rely largely on manual labeling, a time-consuming and labor-intensive process that is subject to multiple factors, resulting in low accuracy and reliability of the segmentation results. Conventional segmentation methods are typically based on optimal thresholding, region growing, active contours, supervised methods, and edge detection algorithms. However, they often require manual intervention or extensive hyperparameter tuning, resulting in poor usability in complex scenarios. In contrast, deep learning algorithms can extract features automatically, effectively overcoming the defects of traditional target region segmentation algorithms, and can be rapidly extended to different task scenarios by means of transfer learning. The advent of deep Convolutional Neural Networks (CNNs) has greatly driven the development of image segmentation. U-Net is widely used in the field of image segmentation, and, inspired by the U-Net architecture, modified U-Net networks have been widely applied to target region segmentation, including gated-attention U-Net (AttU-Net), Transformer-encoder-enhanced U-Net (TransUNet), Transformer skip-connection-enhanced U-Net (UCTransNet), convolution-and-MLP U-Net (UNext), and ConvMixer-based U-Net (CMU-Net). 1) AttU-Net: introduces an attention mechanism into U-Net and reconstructs its skip connections, enhancing the network's ability to extract spatial information features and suppressing the learning of noise and irrelevant information. 2) TransUNet: integrates a Transformer into the U-Net encoder, combining the inductive bias of CNNs with the global context extraction capability of ViT to capture correlations between different targets and improve image segmentation performance. 3) UCTransNet: replaces the U-Net skip connections with a Transformer-based multi-scale channel cross-fusion method, supplementing important spatial structure information and thereby improving segmentation performance. 4) UNext: integrates multilayer perceptrons (MLPs) into the U-Net architecture to improve the extraction of global context information, enabling more accurate target region localization. 5) CMU-Net: integrates ConvMixer and multi-scale attention gating with large-kernel convolution into the U-Net framework to capture long-range dependencies and detailed multi-scale spatial features, making the features more diverse.
The above ultrasound image segmentation algorithms mainly have the following disadvantages: (1) limited network feature extraction capability, so that features are easily lost, accuracy is low, and the target segmentation effect is poor; (2) ultrasound images exhibit obvious speckle noise and artifacts, target regions vary greatly in shape, and boundaries are blurred, which limits the target segmentation results; (3) available data samples are scarce, degrading the performance and target segmentation effect of the neural network. The reasons for these disadvantages are: (1) insufficient capability to extract global context information, insufficient dense prediction of detailed spatial information, and neglect of large-kernel convolution for extracting global context from high-level semantic features, so that targets with blurred edges cannot be accurately located; (2) neglect of information at different scales of high-level semantic features, so that target regions with large shape variance cannot be accurately predicted; (3) neglect of the characteristics of different channels in the encoding stage, so that extraction of valuable features from noise-filled target ultrasonic images is insufficient, causing inaccurate target boundaries; (4) neglect of automatic data enhancement to increase the data sample size, so that the sample size and categories are not rich enough, affecting the segmentation effect of the neural network model.
Disclosure of Invention
The invention aims to provide a method, a system and electronic equipment for segmenting a target region of an ultrasonic image, which can improve the accuracy of segmenting the target region of the ultrasonic image.
In order to achieve the above object, the present invention provides the following solutions:
a target region segmentation method of an ultrasonic image comprises the following steps:
Acquiring an ultrasonic image to be detected;
Inputting the ultrasonic image to be detected into a full-scale convolutional neural network model to obtain a target region segmentation result; the full-scale convolutional neural network model is obtained by training an initial full-scale convolutional neural network model with a joint loss function on the enhanced historical target ultrasonic image set; the initial full-scale convolutional neural network model comprises an encoder, a global local feature mixing module network and a decoder which are sequentially connected; the encoder and the decoder are connected by a plurality of channel multi-scale attention gating modules.
Optionally, the encoder includes: a plurality of coding layers connected in sequence;
The difference between the number N of the coding layers and the number of the channel multi-scale attention gating modules is 1;
the input of the 1 st coding layer is an ultrasonic image to be detected or a historical target ultrasonic image set;
The output end of the i-th coding layer is connected with the input end of the i-th channel multi-scale attention gating module; i = 1, 2, …, N−1;
the output end of the Nth coding layer is connected with the input end of the global local feature mixing module network;
Any coding layer includes: a first convolution block, a second convolution block and a maximum pooling downsampling block which are sequentially connected;
the structures of the first convolution block and the second convolution block are the same;
The first convolution block includes: a first convolution layer, a first batch normalization layer and a first ReLU activation function which are sequentially connected.
Optionally, the decoder includes: a plurality of decoding layers connected in sequence;
the difference between the number of decoding layers and the number of coding layers is 1;
The input end of the 1 st decoding layer is connected with the output end of the global local feature mixing module network;
The input end of the i-th decoding layer is connected with the output end of the (N−i)-th channel multi-scale attention gating module; i = 1, 2, …, N−1;
The output of the (N-1) -th decoding layer is a target region segmentation result;
any decoding layer includes: a feature decoding block and a connection convolution block which are sequentially connected;
the feature decoding block includes: a bilinear upsampling layer, a second convolution layer, a second batch normalization layer and a second ReLU activation function which are sequentially connected;
The connection convolution block includes: a third convolution layer, a third batch normalization layer, a third ReLU activation function, a fourth convolution layer, a fourth batch normalization layer and a fourth ReLU activation function which are sequentially connected.
Optionally, the global local feature mixing module network includes: a plurality of global local feature mixing modules connected in sequence;
any global local feature mixing module includes: an overall detail perception layer and a channel feature fusion unit;
The input end of the overall detail perception layer is connected with the first input end of the channel feature fusion unit;
the output end of the overall detail perception layer is connected with the second input end of the channel feature fusion unit;
the overall detail perception layer comprises: a channel grouping unit, a multi-channel depth separable convolution unit and a channel splicing unit which are sequentially connected;
The multi-channel depth separable convolution unit comprises a plurality of single-channel depth separable convolution subunits which are connected in parallel;
Any single-channel depth separable convolution subunit includes: a fifth convolution layer and a large-kernel depth separable convolution layer which are sequentially connected;
The channel feature fusion unit includes: a multi-layer perceptron layer and a sixth convolution layer which are sequentially connected; the input end of the sixth convolution layer serves as the first input end of the channel feature fusion unit and is connected with the input end of the channel grouping unit.
Optionally, the channel multi-scale attention gating module includes: a multi-scale feature extraction unit, a channel dimension splicing unit, a seventh convolution layer, an eighth convolution layer, a Sigmoid activation function and a multiplication splicing unit which are sequentially connected;
The input end of the multi-scale feature extraction unit is connected with the input end of the multiplication and splicing unit;
the multi-scale feature extraction unit includes: a plurality of parallel single-scale feature extraction subunits;
any single-scale feature extraction subunit includes: a ninth convolution layer, a fifth batch normalization layer and a fifth ReLU activation function which are sequentially connected.
Optionally, before acquiring the ultrasonic image to be detected, the method further includes:
constructing an initial full-scale convolutional neural network model;
Acquiring a historical target ultrasonic image set; the historical target ultrasonic image set comprises a plurality of historical target ultrasonic images;
Performing enhancement processing on the historical target ultrasonic image set to obtain an enhanced historical target ultrasonic image set;
and training the initial full-scale convolutional neural network model by utilizing a joint loss function according to the enhanced historical target ultrasonic image set to obtain the full-scale convolutional neural network model.
Optionally, performing enhancement processing on the historical target ultrasonic image set to obtain an enhanced historical target ultrasonic image set, including:
Randomly selecting one or more pixel-level enhancement algorithms from a pixel-level serial enhancement processing set, and performing pixel-level serial enhancement processing on the historical target ultrasonic image set to obtain a first enhanced historical target ultrasonic image set; the pixel-level serial enhancement processing set includes brightness adjustment, sharpness adjustment, Gaussian noise, Gaussian blur, and contrast adjustment;
randomly selecting one or more spatial-level enhancement algorithms from a spatial-level serial enhancement processing set, and performing spatial-level serial enhancement processing on the historical target ultrasonic image set to obtain a second enhanced historical target ultrasonic image set; the spatial-level serial enhancement processing set includes: rotation, scaling, horizontal flipping, vertical translation, horizontal translation, vertical shear and horizontal shear;
Randomly selecting one or more spatial-level enhancement algorithms from the spatial-level serial enhancement processing set, and performing spatial-level serial enhancement processing on the first enhanced historical target ultrasonic image set to obtain a third enhanced historical target ultrasonic image set;
And determining the union of the second enhanced historical target ultrasonic image set and the third enhanced historical target ultrasonic image set as the enhanced historical target ultrasonic image set.
A target region segmentation system for ultrasound images, comprising:
a to-be-detected ultrasonic image acquisition module, used for acquiring an ultrasonic image to be detected;
The target region segmentation module is used for inputting the ultrasonic image to be detected into a full-scale convolutional neural network model to obtain a target region segmentation result; the full-scale convolutional neural network model is obtained by training an initial full-scale convolutional neural network model with a joint loss function on the enhanced historical target ultrasonic image set; the initial full-scale convolutional neural network model comprises an encoder, a global local feature mixing module network and a decoder which are sequentially connected; the encoder and the decoder are connected by a plurality of channel multi-scale attention gating modules.
An electronic device, comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the above method for segmenting a target region of an ultrasonic image.
Optionally, the memory is a readable storage medium.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
According to the method, system and electronic equipment for segmenting the target region of an ultrasonic image provided by the invention, target ultrasonic image segmentation is completed based on an automatic target ultrasonic image enhancement strategy and a full-scale convolutional neural network. In the first stage, automatic enhancement of the target ultrasonic image expands the training data through a pixel-space joint data enhancement method; applying it to network training yields target ultrasonic images with stronger diversity and generalization capability, improves data enhancement efficiency, alleviates the problem of data scarcity, and significantly improves the performance and generalization capability of the model. In the second stage, the target ultrasonic image is segmented by the full-scale convolutional neural network. On the basis of U-Net, a global local feature mixing module fuses global and local features in the network, establishing long-range dependencies between pixels and extracting effective local features and global context information; a channel multi-scale attention gating module fully integrates multi-scale features from different layers of the encoder stage, enhancing important channel features. The channel multi-scale attention gating module extracts spatial information at different scales on each channel, so that the network can more comprehensively understand and utilize the characteristics of the input data, improving the expressive capacity of the model on complex tasks. A joint loss function based on boundary highlighting and region consistency effectively guides the network to learn at the probability and pixel levels, accurately predicting target regions with clear boundaries and improving the accuracy and reliability of target region segmentation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for segmenting a target region of an ultrasound image in embodiment 1 of the present invention;
FIG. 2 is a flowchart of the image enhancement process in embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of a full-scale convolutional neural network model in embodiment 1 of the present invention;
FIG. 4 is a schematic diagram of a global local feature mixing module according to embodiment 1 of the present invention;
FIG. 5 is a schematic diagram of a multi-scale attention gating module in embodiment 1 of the present invention;
FIG. 6 is a graph of the target segmentation effect of different networks in embodiment 1 of the present invention;
FIG. 7 is a schematic diagram of a target region segmentation system for ultrasound images in embodiment 1 of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a method, a system and electronic equipment for segmenting a target region of an ultrasonic image, which can improve the accuracy of segmenting the target region of the ultrasonic image.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1
As shown in fig. 1, the present embodiment provides a method for segmenting a target region of an ultrasound image, including:
Step 101: acquiring an ultrasonic image to be detected.
Step 102: inputting the ultrasonic image to be detected into the full-scale convolutional neural network model to obtain a target region segmentation result. The full-scale convolutional neural network model is obtained by training the initial full-scale convolutional neural network model with a joint loss function on the enhanced historical target ultrasonic image set. The initial full-scale convolutional neural network model comprises an encoder, a global local feature mixing module network and a decoder which are sequentially connected. The encoder and decoder are connected by a plurality of channel multi-scale attention gating modules.
The encoder includes a plurality of coding layers connected in sequence. The difference between the number of coding layers N and the number of channel multi-scale attention gating modules is 1. The input of the 1st coding layer is the ultrasonic image to be detected or the historical target ultrasonic image set. The output end of the i-th coding layer is connected with the input end of the i-th channel multi-scale attention gating module, i = 1, 2, …, N−1. The output end of the N-th coding layer is connected with the input end of the global local feature mixing module network. Each coding layer includes a first convolution block, a second convolution block and a maximum pooling downsampling block which are sequentially connected. The first convolution block and the second convolution block have the same structure. The first convolution block includes a first convolution layer, a first batch normalization layer and a first ReLU activation function which are sequentially connected.
The decoder includes a plurality of decoding layers connected in sequence. The difference between the number of decoding layers and the number of coding layers is 1. The input end of the 1st decoding layer is connected with the output end of the global local feature mixing module network. The input end of the i-th decoding layer is connected with the output end of the (N−i)-th channel multi-scale attention gating module, i = 1, 2, …, N−1. The output of the (N−1)-th decoding layer is the target region segmentation result. Each decoding layer includes a feature decoding block and a connection convolution block which are sequentially connected. The feature decoding block includes a bilinear upsampling layer, a second convolution layer, a second batch normalization layer and a second ReLU activation function which are sequentially connected. The connection convolution block includes a third convolution layer, a third batch normalization layer, a third ReLU activation function, a fourth convolution layer, a fourth batch normalization layer and a fourth ReLU activation function which are sequentially connected.
The global local feature mixing module network comprises global local feature mixing modules connected in sequence. Each global local feature mixing module includes an overall detail perception layer and a channel feature fusion unit. The input end of the overall detail perception layer is connected with the first input end of the channel feature fusion unit. The output end of the overall detail perception layer is connected with the second input end of the channel feature fusion unit. The overall detail perception layer comprises a channel grouping unit, a multi-channel depth separable convolution unit and a channel splicing unit which are sequentially connected. The multi-channel depth separable convolution unit includes a plurality of single-channel depth separable convolution subunits connected in parallel. Each single-channel depth separable convolution subunit includes a fifth convolution layer and a large-kernel depth separable convolution layer which are sequentially connected. The channel feature fusion unit includes a multi-layer perceptron layer and a sixth convolution layer which are sequentially connected. The input end of the sixth convolution layer serves as the first input end of the channel feature fusion unit and is connected with the input end of the channel grouping unit.
The channel multi-scale attention gating module comprises a multi-scale feature extraction unit, a channel dimension splicing unit, a seventh convolution layer, an eighth convolution layer, a Sigmoid activation function and a multiplication splicing unit which are sequentially connected. The input end of the multi-scale feature extraction unit is connected with the input end of the multiplication splicing unit. The multi-scale feature extraction unit includes a plurality of parallel single-scale feature extraction subunits. Each single-scale feature extraction subunit includes a ninth convolution layer, a fifth batch normalization layer and a fifth ReLU activation function which are sequentially connected.
Prior to step 101, the method further comprises:
Step 103: constructing an initial full-scale convolutional neural network model.
Step 104: acquiring a historical target ultrasonic image set. The historical target ultrasonic image set includes a plurality of historical target ultrasonic images.
Step 105: performing enhancement processing on the historical target ultrasonic image set to obtain the enhanced historical target ultrasonic image set.
Step 106: training the initial full-scale convolutional neural network model with the joint loss function on the enhanced historical target ultrasonic image set to obtain the full-scale convolutional neural network model.
Step 105, comprising:
Step 105-1: randomly selecting one or more pixel-level enhancement algorithms from the pixel-level serial enhancement processing set, and performing pixel-level serial enhancement processing on the historical target ultrasonic image set to obtain a first enhanced historical target ultrasonic image set.
Step 105-2: randomly selecting one or more spatial-level enhancement algorithms from the spatial-level serial enhancement processing set, and performing spatial-level serial enhancement processing on the historical target ultrasonic image set to obtain a second enhanced historical target ultrasonic image set. The pixel-level serial enhancement processing set includes brightness adjustment, sharpness adjustment, Gaussian noise, Gaussian blur, and contrast adjustment.
Step 105-3: randomly selecting one or more spatial-level enhancement algorithms from the spatial-level serial enhancement processing set, and performing spatial-level serial enhancement processing on the first enhanced historical target ultrasonic image set to obtain a third enhanced historical target ultrasonic image set. The spatial-level serial enhancement processing set includes: rotation, scaling, horizontal flipping, vertical translation, horizontal translation, vertical shear, and horizontal shear.
Step 105-4: determining the union of the second enhanced historical target ultrasonic image set and the third enhanced historical target ultrasonic image set as the enhanced historical target ultrasonic image set.
Specifically, the target ultrasonic image segmentation model is a full-scale convolutional neural network trained on the automatically enhanced target ultrasonic image data. The full-scale convolutional neural network is based on U-Net: convolution operations and maximum pooling serve as the encoder of the U-Net; the global local feature mixing module serves as the bottleneck layer of the U-Net, its input being the last-layer feature of the encoder and its output being the global-local mixed feature; and the channel multi-scale attention gating modules serve as the skip connections between the encoder and the decoder. In the decoder, each decoding layer combines the output of the feature decoding block with the output of the channel multi-scale attention gating module and performs a decoding operation by convolution. The output of the last decoding layer of the full-scale convolutional neural network is the target ultrasonic image segmentation result.
The training process of the full-scale convolutional neural network comprises the following steps:
1. A dataset of target ultrasound images is acquired.
The datasets comprise the skin lesion image UDIAT dataset and the BUSI dataset. The UDIAT dataset contains 163 target ultrasound images. The BUSI dataset provides 780 target ultrasound images, of which 210 contain a first target type and 437 contain a second target type, the rest being ultrasound images without targets.
Each target ultrasound image in the dataset is resized to 256×256.
2. The target ultrasound image is automatically enhanced.
A pixel-space joint enhancement method is applied to the resized target ultrasonic images to generate target ultrasonic images with rich characteristics, increasing the diversity of image data samples in the training process.
The target ultrasonic image enhancement method as shown in fig. 2 comprises the following steps:
First, p random pixel-level serial enhancement operations are performed on the input original target ultrasonic image; then s random spatial-level serial enhancement operations are performed on the pixel-enhanced target ultrasonic image, finally obtaining the pixel-space jointly enhanced target ultrasonic image. To avoid destroying the original data features, enhancement operations such as color inversion and exposure are not employed, so the pixel enhancement set includes only 5 enhancement methods: brightness adjustment, sharpness adjustment, Gaussian noise, Gaussian blur, and contrast adjustment; the spatial enhancement set contains 7 methods: rotation, scaling, horizontal flipping, vertical translation, horizontal translation, vertical shear, and horizontal shear. In addition, to avoid distortion of the ultrasonic image caused by excessive enhancement, the number of pixel-level enhancement operations is set to p ≤ 1, and the total number of enhancement operations satisfies 2 ≤ p + s ≤ 3, as follows:
$O_p = \mathrm{SSample}_{2-p}\left[\mathrm{PSample}_p(I)\right] \cup \mathrm{SSample}_{3-p}\left[\mathrm{PSample}_p(I)\right]$
where I represents a resized target ultrasonic image; $\mathrm{PSample}_p$ denotes randomly taking p of the enhancement methods from the pixel enhancement set for serial enhancement; $\mathrm{SSample}_s$ denotes randomly taking s enhancement methods from the spatial enhancement set for serial enhancement; ∪ denotes parallel enhancement, combining the two resulting enhanced target ultrasonic images into one dataset; and $O_p$ denotes the target ultrasonic image enhancement dataset generated with p pixel-level serial enhancements. Since p takes the values 0 and 1, one input target ultrasonic image I finally yields 4 enhanced images.
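For illustration, a minimal Python sketch of this sampling strategy follows. The specific operation implementations and parameter ranges are assumptions, since the disclosure fixes only the operation types and the constraints p ≤ 1 and 2 ≤ p + s ≤ 3:

```python
import random
from PIL import Image, ImageEnhance, ImageFilter

# Assumed operation pools: the disclosure lists the op types but not their
# magnitudes, so the parameter ranges below are illustrative only.
PIXEL_OPS = [
    lambda im: ImageEnhance.Brightness(im).enhance(random.uniform(0.8, 1.2)),
    lambda im: ImageEnhance.Sharpness(im).enhance(random.uniform(0.8, 1.2)),
    lambda im: ImageEnhance.Contrast(im).enhance(random.uniform(0.8, 1.2)),
    lambda im: im.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 1.5))),
    # Gaussian noise would be added analogously on the pixel array.
]
SPATIAL_OPS = [
    lambda im: im.rotate(random.uniform(-15, 15)),
    lambda im: im.transpose(Image.FLIP_LEFT_RIGHT),
    lambda im: im.transform(im.size, Image.AFFINE,
                            (1, 0, random.uniform(-10, 10), 0, 1, 0)),  # horizontal translation
    lambda im: im.transform(im.size, Image.AFFINE,
                            (1, 0, 0, 0, 1, random.uniform(-10, 10))),  # vertical translation
    # scaling and the two shears would be added analogously.
]

def serial(ops, k, im):
    """Apply k distinct randomly chosen operations from a pool in sequence."""
    for op in random.sample(ops, k):
        im = op(im)
    return im

def augment(image):
    """O_p = SSample_{2-p}[PSample_p(I)] U SSample_{3-p}[PSample_p(I)], p in {0, 1}."""
    out = []
    for p in (0, 1):                       # at most one pixel-level operation
        pixel_enhanced = serial(PIXEL_OPS, p, image)
        for s in (2 - p, 3 - p):           # total operations kept within 2 <= p+s <= 3
            out.append(serial(SPATIAL_OPS, s, pixel_enhanced.copy()))
    return out                             # four enhanced images per input
```

Each call thus realizes the four parallel branches of the formula above for a single input image.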
The generated target ultrasonic images are added in batches to the original target ultrasonic image dataset, standardized by the mean and standard deviation, and used to train the full-scale convolutional neural network; the trained full-scale convolutional neural network serves as the target ultrasonic image segmentation model.
3. The full-scale convolutional neural network is trained.
Based on the PyTorch platform, training of the full-scale convolutional neural network was completed on an NVIDIA GeForce RTX 3060 Ti GPU with 8 GB of video memory. The network was optimized using the Adam optimizer with an initial learning rate of 0.0001 and momentum set to 0.9. The batch size was set to 8 and training ran for 300 iterations (epochs). The performance of the different methods was evaluated using five metrics: the Dice coefficient (Dice), Intersection over Union (IoU), Sensitivity (SE), Specificity (SP) and Accuracy (ACC).
where TP, FP, TN and FN denote true positives, false positives, true negatives and false negatives, respectively.
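For reference, the standard definitions of these metrics in terms of TP, FP, TN and FN are as follows (these are the conventional formulas; the disclosure itself does not spell them out):

$\mathrm{Dice} = \frac{2TP}{2TP + FP + FN}$, $\quad \mathrm{IoU} = \frac{TP}{TP + FP + FN}$, $\quad \mathrm{SE} = \frac{TP}{TP + FN}$, $\quad \mathrm{SP} = \frac{TN}{TN + FP}$, $\quad \mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$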
In the training process of the full-scale convolutional neural network, a joint loss function based on region consistency and boundary highlighting is adopted to effectively highlight the position of the target region and accurately predict the clear boundary and shape structure of the target region. The joint loss function $L_{total} = \lambda L_{BCE} + L_{Dice}$ supervises the network at the probability and pixel levels respectively to obtain accurate boundary shape information and high-confidence segmentation results, where $L_{BCE}$ and $L_{Dice}$ are the binary cross entropy loss (Binary Cross Entropy Loss) and the Dice loss (Dice Loss), respectively, and λ is the weight coefficient between the loss functions, set to 0.5.
The Dice loss is used to evaluate the similarity between the predicted result and the real label in the image segmentation task. The binary cross entropy loss measures the difference between the predicted probability distribution of the model and the true probability distribution by calculating the cross entropy between the real labels and the model predictions, which helps improve the discrimination between similar targets and the accuracy of the boundaries. The specific calculations of the binary cross entropy loss and the Dice loss are as follows:
$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$
$L_{Dice} = 1 - \frac{2\left|A \cap B\right|}{\left|A\right| + \left|B\right|}$
where $y_i$ represents the real label and $p_i$ represents the model prediction; $L_{Dice}$ represents the Dice loss, A represents the feature map segmented by the network of the present invention, and B represents the ground truth (mask).
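For illustration, a minimal PyTorch sketch of this joint loss is given below, assuming the standard BCE and Dice forms above; the smoothing constant eps is an assumption for numerical stability:

```python
import torch
import torch.nn.functional as F

def joint_loss(logits, target, lam=0.5, eps=1e-6):
    """L_total = lam * L_BCE + L_Dice, supervising at the probability and pixel levels."""
    prob = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    inter = (prob * target).sum(dim=(1, 2, 3))
    union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = 1.0 - (2.0 * inter + eps) / (union + eps)
    return lam * bce + dice.mean()
```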
The full-scale convolutional neural network (FSC-Net) performs feature extraction on the input target ultrasonic image, capturing global context features and local texture detail features, mixing global and local features, extracting channel multi-scale attention features, and fusing features of different layers from the encoding stage. The pixel-space joint enhancement method is adopted when training the full-scale network, increasing the diversity of image data samples during training and providing richer lesion information. The full-scale convolutional neural network is designed based on U-Net, adopting convolution operations and maximum pooling as the encoder to extract rich feature information. The global local feature mixing module (LGFM) serves as the bottleneck layer in the U-Net: it captures long-range dependencies between pixels through large-kernel convolution operations while extracting local detail features, and by fusing overall and detail information it better acquires, locates and identifies targets of different shapes and sizes. The symmetric encoder-decoder structure uses the channel multi-scale attention gating module (MSAG) to reconstruct the skip connections of the network backbone: multi-scale operations extract features from encoder features of different scales containing noise and redundant information, and attention weights are dynamically adjusted to adaptively select features, helping to better retain useful information. The joint loss supervises network training at the probability and pixel levels to obtain accurate boundary shape information and high-confidence segmentation results.
The technical effects of the full-scale convolutional neural network of the invention are as follows: (1) the ultrasonic target data enhancement method increases sample diversity, alleviates data scarcity and improves network performance; (2) the global local feature mixing module LGFM, used as the bottleneck layer, retains detail features while capturing the relations among all pixels, effectively capturing targets and boundaries of different scales and improving segmentation precision; (3) the channel multi-scale attention module MSAG reconstructs the skip connections, capturing more useful features from noise-filled target ultrasonic images, reducing noise redundancy and avoiding misclassification of pixels.
As shown in fig. 3, the encoder in the full-scale convolutional neural network comprises the 1st to N-th encoding layers connected in sequence, and the network further comprises the 1st to (N−1)-th decoding layers connected in sequence, where N is a positive integer. Each encoding layer contains two ordinary convolution blocks and one downsampling operation. Each ordinary convolution block consists of a convolution layer with a 3×3 kernel, stride 1 and padding 1, a batch normalization layer and a ReLU activation function. Each encoding layer finally performs a downsampling operation using max pooling with a 2×2 window.
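For illustration, a minimal PyTorch sketch of one encoding layer built from these components follows; the channel widths and the returning of the pre-pool feature for the skip connection are assumptions consistent with the described wiring:

```python
import torch.nn as nn

class ConvBlock(nn.Sequential):
    """One ordinary convolution block: 3x3 conv (stride 1, padding 1) -> BatchNorm -> ReLU."""
    def __init__(self, c_in, c_out):
        super().__init__(
            nn.Conv2d(c_in, c_out, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

class EncoderLayer(nn.Module):
    """Two convolution blocks followed by 2x2 max-pool downsampling."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.blocks = nn.Sequential(ConvBlock(c_in, c_out), ConvBlock(c_out, c_out))
        self.pool = nn.MaxPool2d(kernel_size=2)

    def forward(self, x):
        skip = self.blocks(x)          # pre-pool feature routed to the MSAG skip connection
        return self.pool(skip), skip
```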
As shown in fig. 4, the global local feature mixing modules comprise the 1st to L-th global local feature mixing modules, L in total. The input of the 1st global local feature mixing module is the output feature of the N-th encoding layer, the input of the j-th global local feature mixing module is the output feature of the (j−1)-th global local feature mixing module, and the output of the L-th global local feature mixing module is connected with the 1st decoding layer; j ranges from 2 to L.
As shown in fig. 5, the channel multi-scale attention gating modules comprise the 1st to (N−1)-th channel multi-scale attention gating modules, N−1 in total. The input of the i-th channel multi-scale attention gating module is the output feature of the i-th encoding layer, and the output of the i-th channel multi-scale attention gating module is connected with the (N−i)-th decoding layer; i ranges from 1 to N−1.
As shown in fig. 4, the global local feature mixing module includes an overall detail perception block and a channel feature fusion unit. The overall detail perception block divides the channels of the input feature map X into four groups, extracts global features using large-kernel depth separable convolutions with 7×1 and 1×7 kernels, retains local features using a convolution with a 3×3 kernel, and splices the four groups of convolution results in the channel dimension to obtain a first global local feature map X′ with C channels. The channel feature fusion unit adopts a multi-layer perceptron (MLP) to increase the number of channels of the first global local feature map from C to 4×C, obtaining a more abstract, higher-level feature representation; the MLP then reduces the number of channels from 4×C back to C, weighting and integrating the features to extract a more representative second global local feature map. The second global local feature map is then processed by a convolution with kernel size 1, whose input and output channels are both C, to carry out feature fusion, realizing the fusion of global and local features and finally obtaining the fused feature Y.
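A PyTorch sketch of this module under stated assumptions follows: the assignment of the 7×1, 1×7 and 3×3 kernels across the four groups, the MLP activation, and the residual combination of the MLP output with the input X are assumptions; the channel grouping, the C→4C→C expansion and the final 1×1 fusion follow the description above:

```python
import torch
import torch.nn as nn

class GLFM(nn.Module):
    """Global-local feature mixing: grouped large-kernel depthwise convs plus
    MLP channel fusion. Assumes the channel count C is divisible by 4."""
    def __init__(self, c):
        super().__init__()
        g = c // 4
        # Three groups use large-kernel depthwise convs (7x1, 1x7, and their
        # composition) for global context; one group keeps a 3x3 conv for
        # local detail. This particular pairing is an assumption.
        self.g1 = nn.Conv2d(g, g, (7, 1), padding=(3, 0), groups=g)
        self.g2 = nn.Conv2d(g, g, (1, 7), padding=(0, 3), groups=g)
        self.g3 = nn.Sequential(nn.Conv2d(g, g, (7, 1), padding=(3, 0), groups=g),
                                nn.Conv2d(g, g, (1, 7), padding=(0, 3), groups=g))
        self.g4 = nn.Conv2d(g, g, 3, padding=1)
        # Channel feature fusion: MLP expands C -> 4C -> C (GELU is an assumption),
        # then a 1x1 convolution fuses the result.
        self.mlp = nn.Sequential(nn.Conv2d(c, 4 * c, 1), nn.GELU(), nn.Conv2d(4 * c, c, 1))
        self.fuse = nn.Conv2d(c, c, 1)

    def forward(self, x):
        a, b, cg, d = torch.chunk(x, 4, dim=1)
        x_prime = torch.cat([self.g1(a), self.g2(b), self.g3(cg), self.g4(d)], dim=1)
        return self.fuse(self.mlp(x_prime) + x)  # residual combination with X is assumed
```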
As shown in fig. 5, given the input feature F, the channel multi-scale attention gating module divides the channels of the output feature of the i-th encoder into four groups and convolves the four channel groups with convolution kernels of different sizes to obtain the multi-scale features F₁′, F₂′, F₃′, F₄′. The channel multi-scale features are obtained using the formula:
$F_k' = \mathrm{ReLU}\left(\mathrm{Norm}\left(\mathrm{Conv}_k(F_k)\right)\right), \quad k = 1, 2, 3, 4$
where Norm represents batch normalization, ReLU is the activation function for nonlinear variation, and $\mathrm{Conv}_k$ is the convolution applied to the k-th channel group.
The multi-scale channel features are spliced along the channel dimension, the feature channels are reduced from C to C/2 and restored to C using two 1×1 convolutions, the Sigmoid activation function then converts the multi-scale feature output into probabilistic attention weights, and finally the attention weights are multiplied with the input feature F to control the flow and screening of features, yielding the channel multi-scale attention feature map. The process can be expressed as:
$a = \delta\left(\mathrm{Conv}_{1\times1}\left(\mathrm{Conv}_{1\times1}\left(\mathrm{Concat}\left(F_1', F_2', F_3', F_4'\right)\right)\right)\right), \qquad F_{out} = a \otimes F$
where F represents the output feature of the i-th encoder, Concat represents channel concatenation, F′ represents the channel multi-scale feature, the two 1×1 convolutions have input channel C, intermediate channel C/2 and output channel C, δ represents the Sigmoid nonlinearity, a represents the attention weights, and $F_{out}$ represents the channel multi-scale attention feature map.
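A PyTorch sketch of this gating module follows; the four branch kernel sizes (1, 3, 5, 7) are an assumption, since the text specifies only "convolution kernels of different sizes":

```python
import torch
import torch.nn as nn

class MSAG(nn.Module):
    """Channel multi-scale attention gating: multi-scale convs on four channel
    groups, two 1x1 convs (C -> C/2 -> C), Sigmoid attention, gated input."""
    def __init__(self, c):
        super().__init__()
        g = c // 4  # assumes C divisible by 4
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(g, g, k, padding=k // 2),
                          nn.BatchNorm2d(g),
                          nn.ReLU(inplace=True))
            for k in (1, 3, 5, 7)  # assumed kernel sizes
        )
        self.gate = nn.Sequential(
            nn.Conv2d(c, c // 2, 1), nn.Conv2d(c // 2, c, 1), nn.Sigmoid()
        )

    def forward(self, f):
        groups = torch.chunk(f, 4, dim=1)
        f_prime = torch.cat([b(g) for b, g in zip(self.branches, groups)], dim=1)
        a = self.gate(f_prime)   # probabilistic attention weights
        return a * f             # F_out = a (x) F
```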
The decoding layer comprises a feature decoding block and a connection convolution block; the input end of the connection convolution block splices the channel multi-scale attention feature map and the first decoding feature in the channel dimension.
The first decoding feature is obtained by the feature decoding block through a bilinear upsampling operation (Bilinear Interpolation Upsampling) and a convolution operation with a 3×3 kernel, followed by batch normalization and a ReLU activation function. The connection convolution block consists of a 3×3 convolution, batch normalization and ReLU activation repeated twice, and finally outputs the second decoding feature of the decoding layer. The input of the 1st decoding layer is the output Y of the L-th global local feature mixing module; the input of each other decoding layer is the second decoding feature of the previous decoding layer.
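A PyTorch sketch of one decoding layer assembled from these two blocks follows; the channel widths are assumptions:

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Bilinear upsample + 3x3 conv (feature decoding block), concatenation with
    the MSAG skip feature, then two 3x3 conv-BN-ReLU blocks (connection block)."""
    def __init__(self, c_in, c_skip, c_out):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(c_in, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(c_out + c_skip, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.decode(x)                              # first decoding feature
        return self.fuse(torch.cat([x, skip], dim=1))   # second decoding feature
```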
The output feature of the last decoding layer undergoes a 1×1 convolution operation to output a feature map with 1 channel and a size of 256×256, finally yielding the target ultrasonic image segmentation result. The decoder progressively analyzes the global context information and local detail features from the encoder, and the embedded efficient channel multi-scale attention gating module (MSAG) enhances the model's ability to process multi-scale features, better retains detail information within features containing substantial noise, strengthens semantic information, helps the decoder restore detail information, and improves the accuracy of the segmentation result.
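Putting the pieces together, the following wiring sketch reuses the EncoderLayer, GLFM, MSAG and DecoderLayer sketches above. The channel widths (32, 64, 128, 256), the number of encoding layers (N = 4) and the GLFM depth (L = 2) are assumptions, and the deepest pre-pool feature is fed to the bottleneck so that the N−1 decoding layers recover the 256×256 output:

```python
import torch
import torch.nn as nn

class FSCNet(nn.Module):
    """Full-scale network wiring sketch: N encoder layers, an L-deep GLFM
    bottleneck, MSAG-gated skip connections, N-1 decoder layers, and a 1x1
    output convolution. Reuses the module sketches defined above."""
    def __init__(self, chans=(32, 64, 128, 256), glfm_depth=2):
        super().__init__()
        self.encoders = nn.ModuleList(
            EncoderLayer(c_in, c_out)
            for c_in, c_out in zip((1,) + chans[:-1], chans)
        )
        self.bottleneck = nn.Sequential(*(GLFM(chans[-1]) for _ in range(glfm_depth)))
        self.gates = nn.ModuleList(MSAG(c) for c in chans[:-1])
        self.decoders = nn.ModuleList(
            DecoderLayer(c_in, c_skip, c_skip)
            for c_in, c_skip in zip(chans[::-1][:-1], chans[::-1][1:])
        )
        self.head = nn.Conv2d(chans[0], 1, kernel_size=1)  # 1-channel output map

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x, s = enc(x)
            skips.append(s)
        # The deepest pre-pool feature feeds the GLFM bottleneck; pooling it
        # would leave the N-1 decoder layers one upsampling short of 256x256.
        x = self.bottleneck(skips[-1])
        for dec, gate, s in zip(self.decoders, self.gates[::-1], skips[-2::-1]):
            x = dec(x, gate(s))
        return self.head(x)
```

With an input of shape (B, 1, 256, 256), the forward pass returns a single-channel 256×256 logit map, matching the described output.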
Experimentally, the performance of the different models on the breast cancer ultrasound image dataset of the UDIAT Diagnostic Centre of Parc Taulí, Sabadell (UDIAT dataset) and the breast cancer ultrasound image dataset of Baheya Hospital, Egypt (BUSI dataset) is shown in Tables 1 and 2, respectively. As shown in Tables 1 and 2, evaluation of the segmentation metrics shows that the model of the invention achieves significantly better results than the other comparative methods; the Dice of the model FSC-Net is improved by 4.04%, 5.64%, 7.01%, 5.05%, 6.29% and 8.09% respectively compared with the U-shaped network (U-Net), the gated-attention U-Net (AttU-Net), the Transformer-encoder-enhanced U-Net (TransUNet), the Transformer skip-connection-enhanced U-Net (UCTransNet), the convolution-and-MLP U-Net (UNext) and the ConvMixer-based U-Net (CMU-Net).
Table 1 comparison of the performance of different methods on UDIAT datasets
Model IOU Dice SE Precision ACC
U-Net 73.45±0.30 84.42±0.37 86.84±0.40 82.34±0.34 98.52±0.48
AttUNet 75.25±0.29 85.46±0.36 85.49±0.36 85.55±0.36 98.66±0.48
TransUNet 76.49±0.33 86.47±0.39 85.98±0.39 87.01±0.40 98.82±0.48
UCTransNet 74.53±0.25 85.05±0.31 84.23±0.28 87.03±0.36 98.62±0.41
UNext 75.90±0.33 86.14±0.39 84.42±0.37 88.19±0.42 98.79±0.48
CMU-Net 78.31±0.32 87.55±0.39 88.02±0.40 87.16±0.37 98.79±0.48
FSC-Net 81.54±0.42 89.78±0.46 89.87±0.44 90.19±0.47 99.04±0.49
Table 2 comparison of the performance of different methods on BUSI datasets
Fig. 6 shows the segmentation effect graphs of different networks on the UDIAT and BUSI datasets, where the first two rows show the segmentation effects of different networks on the UDIAT dataset, and the last three rows show the segmentation effects of different networks on the BUSI dataset.
The FSC-Net model provided by the invention achieves a better segmentation effect, particularly on lesions with low foreground-background contrast, shadow noise interference, blurred boundaries, large size variation and irregular shapes. Training the network with automatically enhanced target ultrasonic image data alleviates the problem of data scarcity and improves the training effect and performance of the network. The efficient global local feature mixing module solves the problem of cross-scale information fusion in the target ultrasonic image segmentation task; by attending to global context information and local detail information simultaneously, the network can better understand the ultrasonic image, capture fine edge features and accurately locate the target. Extracting multi-scale channel features solves the problem that a fixed convolution kernel size loses feature diversity, helping the network acquire richer and more diverse feature representations and making the model more robust to scale changes of the target. The full-scale convolutional neural network can be applied to computer-aided diagnosis systems for analyzing skin images, provides important guidance for subsequent clinical operations, and can also be used in other medical image segmentation fields.
Example 2
As shown in fig. 7, to execute the method of embodiment 1 and achieve the corresponding functions and technical effects, this embodiment provides a target region segmentation system for ultrasound images, which includes:
The to-be-detected ultrasonic image acquisition module 201, used for acquiring the ultrasonic image to be detected.
The target region segmentation module 202, configured to input the ultrasonic image to be detected into the full-scale convolutional neural network model to obtain a target region segmentation result. The full-scale convolutional neural network model is obtained by training the initial full-scale convolutional neural network model with a joint loss function on the enhanced historical target ultrasonic image set. The initial full-scale convolutional neural network model comprises an encoder, a global local feature mixing module network and a decoder which are sequentially connected. The encoder and decoder are connected by a plurality of channel multi-scale attention gating modules.
Example 3
The present embodiment provides an electronic device, including a memory and a processor, wherein the memory is configured to store a computer program and the processor runs the computer program to cause the electronic device to execute the method for segmenting a target region of an ultrasonic image of embodiment 1. The memory is a readable storage medium.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, which are intended only to assist in understanding the method of the present invention and its core idea; meanwhile, those of ordinary skill in the art may make modifications to the specific embodiments and application scope based on the idea of the present invention. In view of the foregoing, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A method for segmenting a target region of an ultrasonic image, characterized by comprising the following steps:
Acquiring an ultrasonic image to be detected;
Inputting the ultrasonic image to be detected into a full-scale convolutional neural network model to obtain a target region segmentation result; the full-scale convolutional neural network model is obtained by training an initial full-scale convolutional neural network model with a joint loss function on the enhanced historical target ultrasonic image set; the initial full-scale convolutional neural network model comprises an encoder, a global local feature mixing module network and a decoder which are sequentially connected; the encoder and the decoder are connected by a plurality of channel multi-scale attention gating modules.
2. The method of claim 1, wherein the encoder comprises: a plurality of coding layers connected in sequence;
The difference between the number N of the coding layers and the number of the channel multi-scale attention gating modules is 1;
the input of the 1 st coding layer is an ultrasonic image to be detected or a historical target ultrasonic image set;
The output end of the i-th coding layer is connected with the input end of the i-th channel multi-scale attention gating module; i = 1, 2, …, N−1;
the output end of the Nth coding layer is connected with the input end of the global local feature mixing module network;
Any coding layer includes: a first convolution block, a second convolution block and a maximum pooling downsampling block which are sequentially connected;
the structures of the first convolution block and the second convolution block are the same;
The first convolution block includes: a first convolution layer, a first batch normalization layer and a first ReLU activation function which are sequentially connected.
3. The method of claim 2, wherein the decoder comprises: a plurality of decoding layers connected in sequence;
the difference between the number of decoding layers and the number of coding layers is 1;
The input end of the 1 st decoding layer is connected with the output end of the global local feature mixing module network;
The input end of the i-th decoding layer is connected with the output end of the (N−i)-th channel multi-scale attention gating module; i = 1, 2, …, N−1;
The output of the (N-1) -th decoding layer is a target region segmentation result;
any decoding layer includes: a feature decoding block and a connection convolution block which are sequentially connected;
the feature decoding block includes: a bilinear upsampling layer, a second convolution layer, a second batch normalization layer and a second ReLU activation function which are sequentially connected;
The connection convolution block includes: a third convolution layer, a third batch normalization layer, a third ReLU activation function, a fourth convolution layer, a fourth batch normalization layer and a fourth ReLU activation function which are sequentially connected.
4. A method of segmenting a target region of an ultrasound image according to claim 3, wherein the global local feature blending module network comprises: a plurality of global local feature mixing modules connected in sequence;
any global local feature mixing module includes: an overall detail perception layer and a channel feature fusion unit;
The input end of the overall detail perception layer is connected with the first input end of the channel feature fusion unit;
the output end of the overall detail perception layer is connected with the second input end of the channel feature fusion unit;
the overall detail perception layer comprises: a channel grouping unit, a multi-channel depth separable convolution unit and a channel splicing unit which are sequentially connected;
The multi-channel depth separable convolution unit comprises a plurality of single-channel depth separable convolution subunits which are connected in parallel;
Any single-channel depth separable convolution subunit includes: a fifth convolution layer and a large-kernel depth separable convolution layer which are sequentially connected;
The channel feature fusion unit includes: a multi-layer perceptron layer and a sixth convolution layer which are sequentially connected; the input end of the sixth convolution layer serves as the first input end of the channel feature fusion unit and is connected with the input end of the channel grouping unit.
5. The method for segmenting a target region of an ultrasonic image according to claim 4, wherein the channel multi-scale attention gating module comprises: a multi-scale feature extraction unit, a channel dimension splicing unit, a seventh convolution layer, an eighth convolution layer, a Sigmoid activation function and a multiplication splicing unit which are sequentially connected;
The input end of the multi-scale feature extraction unit is connected with the input end of the multiplication and splicing unit;
the multi-scale feature extraction unit includes: a plurality of parallel single-scale feature extraction subunits;
any single-scale feature extraction subunit includes: a ninth convolution layer, a fifth batch normalization layer and a fifth ReLU activation function which are sequentially connected.
6. The method for segmenting a target region of an ultrasonic image according to claim 1, further comprising, before acquiring the ultrasonic image to be detected:
constructing an initial full-scale convolutional neural network model;
Acquiring a historical target ultrasonic image set; the historical target ultrasonic image set comprises a plurality of historical target ultrasonic images;
Performing enhancement processing on the historical target ultrasonic image set to obtain an enhanced historical target ultrasonic image set;
and training the initial full-scale convolutional neural network model by utilizing a joint loss function according to the enhanced historical target ultrasonic image set to obtain the full-scale convolutional neural network model.
7. The method for segmenting the target region of the ultrasound image according to claim 6, wherein the step of performing enhancement processing on the historical target ultrasound image set to obtain the enhanced historical target ultrasound image set comprises:
Randomly selecting one or more pixel-level enhancement algorithms from a pixel-level serial enhancement processing set, and performing pixel-level serial enhancement processing on the historical target ultrasonic image set to obtain a first enhanced historical target ultrasonic image set; the pixel-level serial enhancement processing set includes brightness adjustment, sharpness adjustment, Gaussian noise, Gaussian blur, and contrast adjustment;
randomly selecting one or more spatial-level enhancement algorithms from a spatial-level serial enhancement processing set, and performing spatial-level serial enhancement processing on the historical target ultrasonic image set to obtain a second enhanced historical target ultrasonic image set; the spatial-level serial enhancement processing set includes: rotation, scaling, horizontal flipping, vertical translation, horizontal translation, vertical shear and horizontal shear;
Randomly selecting one or more spatial-level enhancement algorithms from the spatial-level serial enhancement processing set, and performing spatial-level serial enhancement processing on the first enhanced historical target ultrasonic image set to obtain a third enhanced historical target ultrasonic image set;
And determining the union of the second enhanced historical target ultrasonic image set and the third enhanced historical target ultrasonic image set as the enhanced historical target ultrasonic image set.
8. A target region segmentation system for ultrasound images, comprising:
a to-be-detected ultrasonic image acquisition module, used for acquiring an ultrasonic image to be detected;
The target region segmentation module is used for inputting the ultrasonic image to be detected into a full-scale convolutional neural network model to obtain a target region segmentation result; the full-scale convolutional neural network model is obtained by training an initial full-scale convolutional neural network model with a joint loss function on the enhanced historical target ultrasonic image set; the initial full-scale convolutional neural network model comprises an encoder, a global local feature mixing module network and a decoder which are sequentially connected; the encoder and the decoder are connected by a plurality of channel multi-scale attention gating modules.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, the processor running the computer program to cause the electronic device to perform a method of segmenting a target region of an ultrasound image according to any one of claims 1 to 7.
10. The electronic device of claim 9, wherein the memory is a readable storage medium.
CN202311691938.8A 2023-12-11 2023-12-11 Target region segmentation method and system for ultrasonic image and electronic equipment Pending CN117934824A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311691938.8A CN117934824A (en) 2023-12-11 2023-12-11 Target region segmentation method and system for ultrasonic image and electronic equipment

Publications (1)

Publication Number Publication Date
CN117934824A true CN117934824A (en) 2024-04-26

Family

ID=90763796

Country Status (1)

Country Link
CN (1) CN117934824A (en)

Legal Events

Date Code Title Description
PB01 Publication