CN114066902A - Medical image segmentation method, system and device based on convolution and transformer fusion - Google Patents

Medical image segmentation method, system and device based on convolution and transformer fusion Download PDF

Info

Publication number
CN114066902A
CN114066902A (application CN202111381789.6A)
Authority
CN
China
Prior art keywords
convolution
layer
fusion
network
transformer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111381789.6A
Other languages
Chinese (zh)
Inventor
方贤勇
王凯兵
汪粼波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202111381789.6A priority Critical patent/CN114066902A/en
Publication of CN114066902A publication Critical patent/CN114066902A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing

Abstract

The invention belongs to the field of image segmentation, and particularly relates to a medical image segmentation method, system and device based on convolution and transformer fusion. The method comprises the following steps: S1: construct an improved transformer module with a sliding window based on the standard transformer module. S2: construct a deep fusion network comprising a convolution module, the improved transformer module, a feature fusion module and a decoder module. S3: select a plurality of medical images containing polyps to form an original data set, and divide the original data set into a training set and a test set. S4: set the learning strategy, training epochs and loss function for the training stage, and train and test the deep fusion network with the training set and test set. S5: save the trained deep fusion network for segmenting medical images. The invention solves the problems that conventional convolutional neural networks applied to medical image segmentation have an insufficient receptive field and cannot effectively establish long-range dependencies or exploit global context information.

Description

Medical image segmentation method, system and device based on convolution and transformer fusion
Technical Field
The invention belongs to the field of image segmentation, and particularly relates to a medical image segmentation method, system and device based on convolution and transformer fusion.
Background
Medical image segmentation is an important and challenging research topic that covers many common tasks in clinical applications, such as polyp segmentation, lesion segmentation and cell segmentation, and it is also one of the most complex and critical steps in medical image processing. Medical image segmentation plays an important role in computer-aided clinical diagnosis systems: it can semi-automatically or automatically segment and extract regions of special significance from a medical image, thereby providing a reliable basis for clinical diagnosis and pathological research and assisting doctors in making more accurate diagnoses.
Convolutional neural networks, represented by ResNet, have enjoyed great success in computer vision, particularly in object detection, image classification and image segmentation. Likewise, convolutional neural networks dominate a wide range of medical image segmentation tasks. U-Net proposed the classic encoder-decoder structure and stands out in segmentation tasks: the encoder extracts features through successive down-sampling, and the decoder up-samples while progressively reusing the encoder features through skip connections, so the network can exploit the features more fully. On this basis, researchers have developed a series of networks specially designed for medical image segmentation, such as UNet++, Res-UNet, Attention-UNet, DenseUNet and R2U-Net, all of which achieve good segmentation results.
Although CNNs (convolutional neural networks) have achieved great success in medical image segmentation, they have hit a bottleneck. The receptive field of the convolution operation is very limited: it computes only highly local features, cannot capture global features, and cannot exploit context information. Although the receptive field can be enlarged in some networks by stacking convolutional layers and down-sampling, this approach still loses much information. Researchers have also tried to alleviate the problem with new convolution operations (such as dilated convolution or deformable convolution), but these make the network more complicated and more prone to overfitting. These problems all limit the application of convolutional networks to high-precision medical image segmentation.
Disclosure of Invention
To solve the problems that existing convolutional neural networks suffer from an insufficient receptive field, cannot effectively compute global features, and are prone to parameter bloat and overfitting in medical image segmentation, the invention provides a medical image segmentation method, system and device based on convolution and transformer fusion.
The invention is realized by adopting the following technical scheme:
a medical image segmentation method based on convolution and transformer fusion comprises the following steps:
S1: Construct an improved transformer module with a sliding window based on the standard transformer module. The improved transformer module consists of two consecutive Swin Transformer Blocks. The former Swin Transformer Block comprises a window-based MSA (W-MSA, window-based multi-head self-attention) layer and an MLP (multi-layer perceptron) layer connected in sequence; an LN (LayerNorm) layer precedes both the window-based MSA layer and the MLP layer, and a residual connection is applied after each of them. The latter Swin Transformer Block comprises a shifted-window-based MSA (SW-MSA, multi-head self-attention with a sliding window) layer and an MLP layer connected in sequence; an LN layer precedes both the shifted-window-based MSA layer and the MLP layer, and a residual connection is applied after each of them.
S2: Construct a deep fusion network comprising a convolution module, the improved transformer network, a feature fusion module and a decoder module. The input of the deep fusion network is a medical image, and the output is the segmentation result of the target region in the medical image. After being input into the deep fusion network, the medical image is first processed by the convolution module. The convolution module is the backbone network, and the convolution features it outputs are routed along three paths. The first path is input into the improved transformer network. The second path is output to the feature fusion module, where it is fused with the transformer features output by the improved transformer network. The third path, together with the fusion features output by the feature fusion module, is sent to the decoder module for decoding, yielding the required segmentation result.
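The three-path routing described above can be sketched as a small PyTorch module. The submodules are passed in as placeholders, and every name here (DeepFusionNet, transformer_blocks, fusion_modules) is an illustrative assumption rather than an identifier from the patent.

```python
# Minimal wiring sketch of the deep fusion network's forward pass (PyTorch).
# The backbone is assumed to return the four convolution feature maps e1..e4;
# one transformer branch and one fusion module per scale are assumed.
import torch.nn as nn

class DeepFusionNet(nn.Module):
    def __init__(self, backbone, transformer_blocks, fusion_modules, decoder):
        super().__init__()
        self.backbone = backbone                              # convolution module (Res2Net-50 style)
        self.transformer = nn.ModuleList(transformer_blocks)  # improved transformer network, one per scale
        self.fusion = nn.ModuleList(fusion_modules)           # feature fusion modules, one per scale
        self.decoder = decoder

    def forward(self, image):
        conv_feats = self.backbone(image)                     # e1..e4, shallow to deep
        # path 1: each convolution feature passes through the improved transformer network
        trans_feats = [t(e) for t, e in zip(self.transformer, conv_feats)]        # d1..d4
        # path 2: convolution features are fused with the transformer features
        fused = [f(e, d) for f, e, d in zip(self.fusion, conv_feats, trans_feats)]
        # path 3: convolution features and fused features go to the decoder together
        return self.decoder(conv_feats, fused)
```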
S3: Select a plurality of medical images containing polyps as original data to form an original data set, and apply image transformation and enhancement processing to the original data to expand the data set. The original data set is then divided into a training set and a test set at a data-volume ratio of 2:1.
S4: Set the learning strategy, training epochs and loss function for the training stage, train the constructed deep fusion network with the training set, and test the training effect with the test set.
S5: After training and testing, save the deep fusion network whose performance reaches the preset index, and use it as the image segmentation network to perform semantic segmentation on the medical images to be segmented.
As a further improvement of the present invention, in step S1, the consecutive Swin Transformer Blocks in the improved transformer network are computed as:

$\hat{Z}_i = \text{W-MSA}(\text{LN}(Z_{i-1})) + Z_{i-1}$

$Z_i = \text{MLP}(\text{LN}(\hat{Z}_i)) + \hat{Z}_i$

$\hat{Z}_{i+1} = \text{SW-MSA}(\text{LN}(Z_i)) + Z_i$

$Z_{i+1} = \text{MLP}(\text{LN}(\hat{Z}_{i+1})) + \hat{Z}_{i+1}$

In the above formulas, $Z_{i-1}$ denotes the input feature of the i-th layer Swin Transformer Block; $\hat{Z}_i$ denotes the output of the W-MSA of the i-th layer; $Z_i$ is the output feature of the i-th layer Swin Transformer Block and also the input feature of the (i+1)-th layer; $\hat{Z}_{i+1}$ denotes the output of the SW-MSA of the (i+1)-th layer; $Z_{i+1}$ denotes the output feature of the (i+1)-th layer Swin Transformer Block.
as a further improvement of the invention, in the deep fusion network of step S2, the convolution module selects Res2net-50 to form the backbone part of the network, and after the medical image is output to the convolution module, the convolution characteristics e from shallow layer to deep layer are obtained in turn through convolution processingiI is 1, 2, 3, 4; the channel dimensions of the four sets of convolution features are 256, 512, 1024, 2048, respectively, and the feature scales are 128, 64, 32, 16, respectively.
Convolution characteristic e of convolution module outputiAfter being processed by the improved transformer network, four groups of transformer characteristics containing global characteristics are respectively obtained and marked as di,i=1、2、3、4。
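As a rough shape check for these four feature groups, the sketch below extracts multi-stage features with torchvision's ResNet-50 standing in for Res2Net-50 (both expose stages with 256/512/1024/2048 channels); a recent torchvision with create_feature_extractor is assumed, and the 128/64/32/16 scales correspond to a 512×512 input.

```python
# Sketch: extracting the four convolution feature groups e1..e4 with a ResNet-50
# stand-in for Res2Net-50; only the channel widths and strides are the point here.
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor

backbone = create_feature_extractor(
    resnet50(weights=None),
    return_nodes={"layer1": "e1", "layer2": "e2", "layer3": "e3", "layer4": "e4"},
)
feats = backbone(torch.randn(1, 3, 512, 512))
for name, f in feats.items():
    print(name, tuple(f.shape))   # e1: (1, 256, 128, 128) ... e4: (1, 2048, 16, 16)
```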
As a further improvement of the present invention, in the deep fusion network of step S2, the feature fusion module includes a front convolutional layer, an up-sampling layer, a feature splicing layer and a back convolutional layer. Both the front and back convolutional layers consist of two 3×3 convolution modules. The feature map output by the up-sampling layer is twice the scale of its input. The feature splicing layer concatenates the two input features along the channel dimension, and the result is processed by the back convolutional layer and output as the fusion feature.
As a further improvement of the invention, in the feature fusion module, the transformer feature d_i output by the improved transformer network is processed by the front convolutional layer and then rescaled by the up-sampling layer, so that the resulting transformer feature has the same size as the convolution feature e_i. The transformer feature M_i and the convolution feature of the same size are then input to the feature splicing layer, which concatenates the two input features along the channel dimension; after processing by the back convolutional layer, the required fusion feature Z_i is obtained.

The fusion feature Z_i output by the feature fusion module is expressed as:

$M_i = \text{upsample}(\text{conv}(\text{conv}(e_i)))$

$Z_i = \sigma(\text{conv}(\text{cat}(M_i, d_i)))$

where conv denotes a 3×3 convolution with stride 1, upsample denotes up-sampling, cat denotes the concatenation operation, and σ is the ReLU activation function.
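A hedged PyTorch sketch of this fusion step follows. Note that the prose routes the transformer feature d_i through the front convolutions and the up-sampling, while the printed formula applies them to e_i; the sketch follows the prose reading, so treat it as one plausible interpretation rather than the patent's exact implementation, and the module/argument names are illustrative.

```python
# Sketch of the feature fusion module: pre-convolve and 2x-upsample the transformer
# feature to match the convolution feature, concatenate along channels, post-convolve.
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, trans_ch, conv_ch, out_ch):
        super().__init__()
        # "front" convolutional layer: two 3x3 convolutions, stride 1
        self.pre = nn.Sequential(
            nn.Conv2d(trans_ch, out_ch, 3, padding=1),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        # "back" convolutional layer (described as two 3x3 convolutions in the text)
        self.post = nn.Sequential(
            nn.Conv2d(out_ch + conv_ch, out_ch, 3, padding=1),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, e_i, d_i):
        m_i = self.up(self.pre(d_i))            # M_i, now the same spatial size as e_i
        z_i = self.act(self.post(torch.cat([m_i, e_i], dim=1)))   # Z_i
        return z_i
```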
As a further improvement of the invention, in the deep fusion network of step S2, the inputs of the decoder module are the convolution features e_i output by the convolution module and the fusion features output by the feature fusion module; the output of the decoder module is the decoded image segmentation result.
As a further refinement of the present invention, in step S3, the raw data in the original data set are drawn from the public polyp data sets kvasir, cvc-clinicDB, ETIS, cvc-colonDB and EndoScene. The image transformation methods adopted during data set augmentation include random horizontal mirror flipping, vertical mirror flipping, and rotations of 90°, 180° and 270°. The image enhancement processing methods include random brightness, contrast and sharpening adjustments. The random probability of each image transformation and image enhancement method is set to 0.5.
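One way to realize this augmentation policy is with the albumentations library (a version ≥ 1.0 is assumed for A.Sharpen); the parameter ranges are library defaults chosen for illustration, since the patent only fixes the probability of 0.5.

```python
# Approximate augmentation pipeline mirroring the transformations listed above:
# horizontal/vertical flips, 90/180/270 degree rotations, and random brightness,
# contrast and sharpening, each applied with probability 0.5.
import albumentations as A

train_aug = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.RandomRotate90(p=0.5),               # rotations by 90/180/270 degrees
    A.RandomBrightnessContrast(p=0.5),
    A.Sharpen(p=0.5),
])

# usage: augmented = train_aug(image=image, mask=mask)
```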
As a further improvement of the present invention, in the training process of step S4, the BCE loss function and the IoU loss function are selected as the loss functions, the PolyLr learning-rate decay strategy is selected, the learning rate is set to 0.0001, and the number of training epochs is set to 240.
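A hedged sketch of these training settings is given below; the poly exponent of 0.9 and the IoU smoothing constant are assumptions, as the patent does not state them.

```python
# Sketch: combined BCE + IoU loss and a "poly" learning-rate decay (PyTorch).
import torch
import torch.nn.functional as F

def bce_iou_loss(logits, target, eps=1.0):
    bce = F.binary_cross_entropy_with_logits(logits, target)
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(2, 3))
    union = (prob + target - prob * target).sum(dim=(2, 3))
    iou = 1 - (inter + eps) / (union + eps)
    return bce + iou.mean()

def poly_lr(optimizer, base_lr, epoch, max_epoch=240, power=0.9):
    lr = base_lr * (1 - epoch / max_epoch) ** power
    for group in optimizer.param_groups:
        group["lr"] = lr
    return lr

# base_lr = 1e-4 and max_epoch = 240 per the training settings above
```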
The invention also comprises a medical image segmentation system based on convolution and transformer fusion, wherein the medical image segmentation system adopts the medical image segmentation method based on convolution and transformer fusion to perform semantic segmentation on the acquired medical image so as to obtain an image segmentation prediction result of the target characteristic.
The medical image segmentation system comprises: the system comprises an image acquisition module, a convolution network, an improved transformer network, a feature fusion network and a decoder.
The image acquisition module is used for acquiring a medical image to be segmented and preprocessing the medical image so as to meet the input standard of the system.
The convolutional network uses Res2net-50 to form the backbone network of the system. After the medical image is input into the convolution network for processing, the output of the convolution network is convolution characteristics, and the output path of the convolution characteristics comprises three paths.
The improved transformer network receives the first path of convolution features output by the convolution network. The improved transformer network consists of two consecutive Swin Transformer Blocks. The former Swin Transformer Block comprises a window-based MSA layer and an MLP layer connected in sequence; an LN layer precedes both the window-based MSA layer and the MLP layer, and a residual connection is applied after each of them. The latter Swin Transformer Block comprises a shifted-window-based MSA layer and an MLP layer connected in sequence; an LN layer precedes both, and a residual connection is applied after each of them. After the input convolution features are processed by the improved transformer network, the output is the transformer features.
The feature fusion network receives the second path of convolution features output by the convolution network and the transformer features output by the improved transformer network. The feature fusion module comprises a front convolutional layer, an up-sampling layer, a feature splicing layer and a back convolutional layer. Both the front and back convolutional layers consist of two 3×3 convolution modules, and the feature map output by the up-sampling layer is twice the scale of its input. The feature fusion network first processes the input transformer features with the front convolutional layer and then rescales them with the up-sampling layer, so that the up-sampled transformer features have the same size as the convolution features. The convolution features and the transformer features are concatenated along the channel dimension in the feature splicing layer, and the spliced features are output as the fusion features after processing by the back convolutional layer.
The decoder receives the third path of convolution features output by the convolution network and the fusion features output by the feature fusion network, and decodes them to obtain the required semantic segmentation result of the medical image.
The invention also comprises a medical image segmentation apparatus based on convolution and transformer fusion, comprising a memory, a processor and a computer program stored on the memory and executable on the processor. When executing the program, the processor performs the steps of the medical image segmentation method based on convolution and transformer fusion as described above.
The technical scheme provided by the invention has the following beneficial effects:
In the medical image segmentation method based on convolution and transformer fusion, a new deep fusion network is creatively constructed. The deep fusion network introduces an improved Swin Transformer module whose self-attention mechanism can fully exploit context information and establish long-range dependencies. This solves the problems that a traditional convolutional neural network has a small receptive field, cannot exploit global information, loses extracted feature information, and therefore produces inaccurate segmentation results.
Through two consecutive Swin Transformer Blocks with sliding windows, the improved transformer network also reduces the heavy computation and high complexity involved in processing large-scale feature maps, thereby improving network robustness and avoiding overfitting, while at the same time capturing richer global information and improving the accuracy of medical image segmentation.
To make better use of the features generated by the Swin Transformer module, a feature fusion module is further provided. It fuses the convolution features and the transformer features into a single fusion feature, so that the deep fusion network combines the advantages of both convolution and transformer features.
Drawings
Fig. 1 is a flowchart illustrating the steps of the medical image segmentation method based on convolution and transformer fusion according to embodiment 1 of the present invention.
Fig. 2 is a schematic structural diagram of the improved transformer module in embodiment 1 of the present invention.
Fig. 3 is a model architecture diagram of the deep fusion network constructed in embodiment 1 of the present invention.
Fig. 4 is a basic flowchart of the deep fusion network processing procedure in embodiment 1 of the present invention.
Fig. 5 is a schematic structural diagram of the feature fusion module in embodiment 1 of the present invention.
Fig. 6 is a schematic block diagram of the medical image segmentation system based on convolution and transformer fusion according to embodiment 2 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example 1
The present embodiment provides a medical image segmentation method based on convolution and transformer fusion. As shown in fig. 1, the medical image segmentation method includes the following steps:
S1: Construct an improved transformer network with a sliding window based on the standard transformer module. As shown in fig. 2, the improved transformer network consists of two consecutive Swin Transformer Blocks. The former Swin Transformer Block comprises a window-based MSA layer and an MLP layer connected in sequence; an LN layer precedes both the window-based MSA layer and the MLP layer, and a residual connection is applied after each of them. The latter Swin Transformer Block comprises a shifted-window-based MSA layer and an MLP layer connected in sequence; an LN layer precedes both the shifted-window-based MSA layer and the MLP layer, and a residual connection is applied after each of them.
The standard transformer module is usually composed of a multi-head self-attention (MSA) module and a multi-layer perceptron (MLP); a LayerNorm (LN) layer is applied before each MSA module and each MLP module, and a residual connection is applied after each module. The output of the l-th layer of the transformer encoder can therefore be expressed as:

$\hat{Z}_l = \text{MSA}(\text{LN}(Z_{l-1})) + Z_{l-1}$

$Z_l = \text{MLP}(\text{LN}(\hat{Z}_l)) + \hat{Z}_l$

In the above formulas, $Z_{l-1}$ denotes the input feature of the l-th layer transformer block, $\hat{Z}_l$ denotes the output of the l-th layer MSA, and $Z_l$ is the output feature of the l-th layer transformer block and also the input feature of the (l+1)-th layer.

The operation of the multi-head self-attention (MSA) module is defined as follows:

$\text{Attention}(Q, K, V) = \text{SoftMax}\!\left(\frac{QK^{T}}{\sqrt{d}} + B\right) V$

where $Q, K, V \in \mathbb{R}^{M^{2} \times d}$ are obtained by multiplying the input features by three matrices, $M^{2}$ and $d$ denote the number of patches in a divided region and the channel dimension respectively, and the values in $B$ are taken from the bias matrix $\hat{B} \in \mathbb{R}^{(2M-1) \times (2M-1)}$.
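Expressed directly in code, the attention computed inside one divided region is simply the formula above; a minimal single-head sketch, with the bias B supplied externally, is:

```python
# Attention(Q, K, V) = SoftMax(Q K^T / sqrt(d) + B) V, computed per window.
import torch

def window_attention(q, k, v, bias):
    # q, k, v: (num_windows, M*M, d); bias: (M*M, M*M) relative position bias B
    d = q.size(-1)
    attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5 + bias, dim=-1)
    return attn @ v
```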
The standard transformer module performs self-attention over the whole input feature map, so both the amount of computation and the complexity of the process are large. In the improved transformer network of this embodiment, the Swin Transformer module divides the whole input feature map into several non-overlapping regions and computes self-attention within each region, which reduces the computational complexity. The improved transformer network comprises two consecutive Swin Transformer modules; in the second module the regions are partitioned again with a shift, so that the new regions overlap the boundaries of the previous partition and richer global information is captured.
Specifically, in the improved transformer network of this embodiment, the consecutive Swin Transformer Blocks are computed as:

$\hat{Z}_i = \text{W-MSA}(\text{LN}(Z_{i-1})) + Z_{i-1}$

$Z_i = \text{MLP}(\text{LN}(\hat{Z}_i)) + \hat{Z}_i$

$\hat{Z}_{i+1} = \text{SW-MSA}(\text{LN}(Z_i)) + Z_i$

$Z_{i+1} = \text{MLP}(\text{LN}(\hat{Z}_{i+1})) + \hat{Z}_{i+1}$

In the above formulas, $Z_{i-1}$ denotes the input feature of the i-th layer Swin Transformer Block; $\hat{Z}_i$ denotes the output of the W-MSA of the i-th layer; $Z_i$ is the output feature of the i-th layer Swin Transformer Block and also the input feature of the (i+1)-th layer; $\hat{Z}_{i+1}$ denotes the output of the SW-MSA of the (i+1)-th layer; $Z_{i+1}$ denotes the output feature of the (i+1)-th layer Swin Transformer Block.
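A compact PyTorch sketch of this block pair follows. It omits the relative position bias and the attention mask normally paired with the shifted windows, uses nn.MultiheadAttention (batch_first requires a reasonably recent PyTorch), and assumes the feature height and width are multiples of the window size; names such as SwinBlockPair are illustrative, not from the patent.

```python
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Multi-head self-attention computed independently inside ws x ws windows."""
    def __init__(self, dim, window_size, num_heads):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, shift=0):
        # x: (B, H, W, C); H and W are assumed to be multiples of the window size
        B, H, W, C = x.shape
        ws = self.window_size
        if shift:                        # SW-MSA: cyclically shift before partitioning
            x = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
        win = x.view(B, H // ws, ws, W // ws, ws, C).permute(0, 1, 3, 2, 4, 5)
        win = win.reshape(-1, ws * ws, C)              # (num_windows*B, ws*ws, C)
        out, _ = self.attn(win, win, win)
        out = out.reshape(B, H // ws, W // ws, ws, ws, C).permute(0, 1, 3, 2, 4, 5)
        out = out.reshape(B, H, W, C)
        if shift:                        # undo the cyclic shift
            out = torch.roll(out, shifts=(shift, shift), dims=(1, 2))
        return out

class SwinBlockPair(nn.Module):
    """LN -> W-MSA -> residual -> LN -> MLP -> residual, then the same with SW-MSA."""
    def __init__(self, dim, window_size=7, num_heads=4, mlp_ratio=4):
        super().__init__()
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(4)])
        self.w_msa = WindowAttention(dim, window_size, num_heads)
        self.sw_msa = WindowAttention(dim, window_size, num_heads)
        self.mlp1 = nn.Sequential(nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
                                  nn.Linear(mlp_ratio * dim, dim))
        self.mlp2 = nn.Sequential(nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
                                  nn.Linear(mlp_ratio * dim, dim))
        self.shift = window_size // 2

    def forward(self, z):                                        # z: (B, H, W, C)
        z = z + self.w_msa(self.norms[0](z))                     # first block: W-MSA
        z = z + self.mlp1(self.norms[1](z))
        z = z + self.sw_msa(self.norms[2](z), shift=self.shift)  # second block: SW-MSA
        z = z + self.mlp2(self.norms[3](z))
        return z
```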
S2: Construct a deep fusion network comprising a convolution module, the improved transformer network, a feature fusion module and a decoder module, as shown in fig. 3. The input of the deep fusion network is a medical image, and the output is the segmentation result of the target region in the medical image. After being input into the deep fusion network, the medical image is first processed by the convolution module. The convolution module is the backbone network, and the convolution features it outputs are routed along three paths. The first path is input into the improved transformer network. The second path is output to the feature fusion module, where it is fused with the transformer features output by the improved transformer network. The third path, together with the fusion features output by the feature fusion module, is sent to the decoder module for decoding, yielding the required segmentation result.
In the deep fusion network of this embodiment, the network workflow is as shown in fig. 4:
specifically, the convolution module selects Res2net-50 to form a backbone part of the network, and after the medical image is input into the deep fusion network, the medical image is firstly processed by the convolution module to extract characteristic information in the medical image. In the Res2net-50 network, four groups of convolution characteristics from a shallow layer to a deep layer are sequentially obtained from an input medical image through convolution processing, and the four groups of convolution characteristics are marked as eiAnd i is 1, 2, 3 and 4. The channel dimensions of the four sets of convolution features are 256, 512, 1024, 2048, respectively, and the feature scales are 128, 64, 32, 16, respectively.
One path of the convolution features e_i output by the convolution module is sent to the improved transformer network with a sliding window. In the improved transformer network, the two consecutive Swin Transformer modules are applied to the four groups of convolution features e1, e2, e3, e4 separately. During processing, a convolution feature e_i is first reshaped and multiplied by three matrices to obtain the three features Q, K, V; attention is computed from these three values, the output is sent to the LN layer, and the result then enters the MLP module, completing one Swin Transformer module. After the first Swin Transformer module finishes, the same operations are repeated in the second Swin Transformer module, and the required transformer features are finally output. The convolution features input to the improved transformer network thus comprise four groups, e1, e2, e3, e4, and the transformer features output also comprise four groups, d1, d2, d3, d4.
The convolution features e1, e2, e3, e4 and the transformer features d1, d2, d3, d4 are both input into the feature fusion module for feature fusion. As shown in fig. 5, the feature fusion module includes a front convolutional layer, an up-sampling layer, a feature splicing layer and a back convolutional layer. Both the front and back convolutional layers consist of two 3×3 convolution modules. The feature map output by the up-sampling layer is twice the scale of its input. The feature splicing layer concatenates the two input features along the channel dimension, and the result is processed by the back convolutional layer and output as the fusion feature.
The fusion of the two types of features in the feature fusion module therefore proceeds as follows: the transformer feature d_i is first processed by the front convolutional layer and then rescaled by the up-sampling layer. Because the processed transformer feature d_i is smaller than the convolution feature e_i (the latter is twice the size of the former), the transformer feature after the 2× up-sampling operation has the same size as the convolution feature e_i. Once the sizes match, fusion can be performed. The transformer feature M_i and the convolution feature are input to the feature splicing layer, which concatenates the two input features along the channel dimension to obtain a mixed feature; this mixed feature is processed by the back convolutional layer to obtain the required fusion feature Z_i.

The fusion feature Z_i output by the feature fusion module is expressed as:

$M_i = \text{upsample}(\text{conv}(\text{conv}(e_i)))$

$Z_i = \sigma(\text{conv}(\text{cat}(M_i, d_i)))$

where conv denotes a 3×3 convolution with stride 1, upsample denotes up-sampling, cat denotes the concatenation operation, and σ is the ReLU activation function.
The resulting fusion features are output to the decoder module. The decoder module decodes the feature information according to the convolution features e_i output by the convolution module and the fusion features, thereby obtaining the image segmentation result.
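The patent does not detail the decoder's internal layout; the sketch below assumes a U-Net-style top-down pass that repeatedly up-samples, injects the fused feature of the matching scale, and merges with the corresponding convolution feature. This structure, and the assumption that every fusion module outputs the same channel count, are hypothetical.

```python
# Hypothetical decoder sketch: the patent states only that the decoder consumes the
# convolution features e_i and the fused features Z_i; the layout below is assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    def __init__(self, conv_chs=(256, 512, 1024, 2048), fused_ch=64, num_classes=1):
        super().__init__()
        # one merge convolution per scale: (decoder state + e_i channels) -> fused_ch
        self.merge = nn.ModuleList(
            [nn.Conv2d(c + fused_ch, fused_ch, 3, padding=1) for c in conv_chs]
        )
        self.head = nn.Conv2d(fused_ch, num_classes, 1)

    def forward(self, conv_feats, fused_feats):
        # conv_feats: [e1..e4]; fused_feats: [Z1..Z4], each assumed to have fused_ch channels
        x = torch.relu(self.merge[-1](torch.cat([fused_feats[-1], conv_feats[-1]], dim=1)))
        for i in range(len(conv_feats) - 2, -1, -1):
            x = F.interpolate(x, size=conv_feats[i].shape[2:], mode="bilinear",
                              align_corners=False)
            x = torch.relu(self.merge[i](torch.cat([x + fused_feats[i], conv_feats[i]], dim=1)))
        # e1 is at 1/4 of the input resolution, so upsample by 4 before the prediction head
        x = F.interpolate(x, scale_factor=4, mode="bilinear", align_corners=False)
        return self.head(x)
```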
S3: Select a plurality of medical images containing polyps as original data to form an original data set, and apply image transformation and enhancement processing to the original data to expand the data set. The original data set is then divided into a training set and a test set at a data-volume ratio of 2:1.
S4: Set the learning strategy, training epochs and loss function for the training stage, train the constructed deep fusion network with the training set, and test the training effect with the test set.
S5: After training and testing, save the deep fusion network whose performance reaches the preset index, and use it as the image segmentation network to perform semantic segmentation on the medical images to be segmented.
In the medical image segmentation method based on convolution and transformer fusion, a new deep fusion network is creatively constructed. The deep fusion network introduces an improved Swin Transformer module whose self-attention mechanism can fully exploit context information and establish long-range dependencies. This solves the problems that a traditional convolutional neural network has a small receptive field, cannot exploit global information, loses extracted feature information, and therefore produces inaccurate segmentation results.
Through two consecutive Swin Transformer Blocks with sliding windows, the improved transformer network also reduces the heavy computation and high complexity involved in processing large-scale feature maps, thereby improving network robustness and avoiding overfitting, while at the same time capturing richer global information and improving the accuracy of medical image segmentation.
The Transformer is a novel architecture originally designed for sequence-to-sequence modeling in natural language processing, where it has made great progress on most NLP tasks such as machine translation, named entity recognition and question answering. This embodiment applies it to the field of medical image segmentation and makes effective use of the multi-head self-attention (MSA) mechanism, leveraging its powerful ability to establish global connections between the tokens of a sequence and to relate long-range context information.
To make better use of the features generated by the Swin Transformer module, a feature fusion module is further provided. It fuses the convolution features and the transformer features into a single fusion feature, so that the deep fusion network combines the advantages of both convolution and transformer features.
In order to verify the effectiveness of the method provided by this embodiment, a simulation experiment was also designed. The experimental environment is: Intel(R) Xeon(R) CPU E5-2609 v4 @ 1.70 GHz, 16 GB memory, Ubuntu 20.04, a GTX 2060 graphics card, the PyCharm programming environment, and the PyTorch 1.5.1 deep learning framework.
For training data, this embodiment uses the five polyp data sets kvasir, cvc-clinicDB (cvc-612), ETIS, cvc-colonDB and EndoScene that are publicly available online. The kvasir data set contains 1000 images; 900 are randomly selected for the training set and the remaining 100 go to the test set. The cvc-clinicDB data set has 612 images; following the same procedure as for kvasir, 550 images are randomly selected for the training set and the rest go to the test set. At this point, the training set has 1450 images and the test set has 162 images.
In addition, the ETIS and cvc-colonDB data sets contain 196 and 380 images, respectively; this embodiment uses both of them entirely as test sets, further verifying the generalization capability of the network. The EndoScene data set is made up of cvc-612 and cvc-300. Since part of the cvc-612 data has already been used for training, only the EndoScene-cvc300 portion, containing 60 images, is used as a test set. In total, the training set employed in this embodiment includes 1450 polyp images and the test set includes 789 polyp images; the data volume ratio of the training set to the test set is close to 2:1.
For the original data in the training set and the test set, this embodiment applies data set augmentation to enlarge the amount of training data and enhance the robustness of the deep fusion network. The augmentation comprises image transformation methods and image enhancement processing methods. The image transformation methods adopted in this embodiment include random horizontal mirror flipping, vertical mirror flipping, and rotations of 90°, 180° and 270°. The image enhancement processing methods include random brightness, contrast and sharpening adjustments. The random probability of each image transformation and image enhancement method is set to 0.5.
In the initial stage of training, this embodiment resizes all images to 448×448; in the later stage, to improve the generalization of the deep fusion network across different tasks, a multi-scale training strategy is also adopted. In addition, during training the BCE loss function and the IoU loss function are selected as the loss functions, a PolyLr learning-rate decay strategy is used, the learning rate in the training phase is set to 0.0001, and the number of training epochs is set to 240.
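The multi-scale strategy mentioned above could, for example, rescale each batch randomly before the forward pass; the scale set {0.75, 1.0, 1.25} and snapping sizes to multiples of 32 are assumptions, since the patent does not give these details.

```python
# Sketch of one multi-scale training step around the 448x448 base size.
import random
import torch.nn.functional as F

def multiscale_step(model, images, masks, criterion, base_size=448,
                    scales=(0.75, 1.0, 1.25)):
    s = random.choice(scales)
    size = int(round(base_size * s / 32) * 32)        # keep sizes divisible by 32
    images = F.interpolate(images, size=(size, size), mode="bilinear", align_corners=False)
    masks = F.interpolate(masks, size=(size, size), mode="nearest")
    logits = model(images)
    return criterion(logits, masks)
```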
After training, the training effect of the deep fusion network is verified on the test sets in this embodiment. The segmentation performance of the deep fusion network on images from the different data sets is evaluated using mIoU (mean intersection-over-union) and mDice (mean Dice coefficient) as criteria, giving the results shown in the following table.
Table 1: Segmentation results of the deep fusion network on images from different data sets in this embodiment
[Table 1 is reproduced as an image in the original publication; it lists the mIoU and mDice of the deep fusion network for each test data set.]
Analysis of the above data shows that the deep fusion network adopted in the method of this embodiment achieves a good segmentation effect on images from the kvasir and cvc-clinicDB data sets, with mIoU reaching 0.856 and 0.877 respectively, higher than traditional convolutional neural networks such as U-Net, SFA and PraNet. For the ETIS, cvc-colonDB and EndoScene data sets, whose data were not used during training, the deep fusion network still shows a good segmentation effect at test time. This demonstrates that the deep fusion network model provided by this embodiment also generalizes well and is suitable for segmenting a variety of different medical images.
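For reference, the two reported criteria can be computed per image as below, with mIoU and mDice then taken as averages over each test set; thresholding the prediction at 0.5 is an assumption.

```python
# Per-image IoU and Dice coefficient for binary segmentation masks.
import torch

def iou_dice(pred, target, thresh=0.5, eps=1e-6):
    p = (pred > thresh).float()
    t = (target > 0.5).float()
    inter = (p * t).sum()
    union = (p + t - p * t).sum()
    iou = (inter + eps) / (union + eps)
    dice = (2 * inter + eps) / (p.sum() + t.sum() + eps)
    return iou.item(), dice.item()
```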
Example 2
The present embodiment provides a medical image segmentation system based on convolution and transformer fusion, which performs semantic segmentation on an acquired medical image using the medical image segmentation method based on convolution and transformer fusion described in embodiment 1, so as to obtain an image segmentation prediction result for the target feature.
As shown in fig. 6, the medical image segmentation system includes: the system comprises an image acquisition module, a convolution network, an improved transformer network, a feature fusion network and a decoder.
The image acquisition module is used for acquiring a medical image to be segmented and preprocessing the medical image so as to meet the input standard of the system.
The convolutional network uses Res2net-50 to form the backbone network of the system. After the medical image is input into the convolution network for processing, the output of the convolution network is convolution characteristics, and the output path of the convolution characteristics comprises three paths.
The improved transformer network receives the first path of convolution features output by the convolution network. The improved transformer network consists of two consecutive Swin Transformer Blocks. The former Swin Transformer Block comprises a window-based MSA layer and an MLP layer connected in sequence; an LN layer precedes both the window-based MSA layer and the MLP layer, and a residual connection is applied after each of them. The latter Swin Transformer Block comprises a shifted-window-based MSA layer and an MLP layer connected in sequence; an LN layer precedes both, and a residual connection is applied after each of them. After the input convolution features are processed by the improved transformer network, the output is the transformer features.
The feature fusion network receives the second path of convolution features output by the convolution network and the transformer features output by the improved transformer network. The feature fusion module comprises a front convolutional layer, an up-sampling layer, a feature splicing layer and a back convolutional layer. Both the front and back convolutional layers consist of two 3×3 convolution modules, and the feature map output by the up-sampling layer is twice the scale of its input. The feature fusion network first processes the input transformer features with the front convolutional layer and then rescales them with the up-sampling layer, so that the up-sampled transformer features have the same size as the convolution features. The convolution features and the transformer features are concatenated along the channel dimension in the feature splicing layer, and the spliced features are output as the fusion features after processing by the back convolutional layer.
The decoder receives the third path of convolution features output by the convolution network and the fusion features output by the feature fusion network, and decodes them to obtain the required semantic segmentation result of the medical image.
Example 3
The invention also comprises a medical image segmentation apparatus based on convolution and transformer fusion, comprising a memory, a processor and a computer program stored on the memory and executable on the processor. When executing the program, the processor implements the steps of the medical image segmentation method based on convolution and transformer fusion described in embodiment 1.
The computer device may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server or a cabinet server (including an independent server or a server cluster composed of a plurality of servers) capable of executing programs, and the like. The computer device of the embodiment at least includes but is not limited to: a memory, a processor communicatively coupled to each other via a system bus.
In this embodiment, the memory (i.e., the readable storage medium) includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory may be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. In other embodiments, the memory may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. provided on the computer device. Of course, the memory may also include both internal and external storage devices for the computer device. In this embodiment, the memory is generally used for storing an operating system, various types of application software, and the like installed in the computer device. In addition, the memory may also be used to temporarily store various types of data that have been output or are to be output.
The processor may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor is typically used to control the overall operation of the computer device. In this embodiment, the processor is configured to run the program code stored in the memory or process data to implement the processing procedure of the medical image segmentation method based on convolution and transform fusion in the foregoing embodiment, so as to obtain a segmentation result of feature information such as polyps in an image according to a given medical image.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A medical image segmentation method based on convolution and transformer fusion is characterized by comprising the following steps:
s1: constructing an improved Transformer network with a sliding window based on a standard Transformer module, wherein the improved Transformer network consists of two continuous Swin Transformer blocks; the former Swin Transformer Block comprises a window based MSA layer and an MLP layer which are connected in sequence; an LN layer is arranged in front of the window based MSA layer and the MLP layer, and residual errors are used for connection after the window based MSA layer and the MLP layer; the latter Swin Transformer Block comprises a shifted window based MSA layer and an MLP layer which are connected in sequence; an LN layer is arranged in front of the shifted window based MSA layer and the MLP layer, and residual errors are used for connection after the shifted window based MSA layer and the MLP layer;
s2: constructing a deep fusion network, wherein the deep fusion network comprises a convolution module, the improved transformer module, a feature fusion module and a decoder module; the input of the depth fusion network is a medical image, and the output is a segmentation result of a target region in the medical image; after being input into a depth fusion network, a medical image is firstly processed by a convolution module; the convolution module is a backbone network, the output path of the convolution characteristic output by the convolution module is divided into three paths, and the first path is input into the improved transformer network; the second path is output to a characteristic fusion module and is subjected to characteristic fusion with the transformer characteristics output by the improved transformer network; the third path and the feature fusion module output fusion features and jointly send the fusion features to the decoder module to complete decoding, and further required segmentation results are obtained;
s3: selecting a plurality of medical images with polyps as original data to form an original data set, and dividing the original data set into a training set and a testing set according to a data volume ratio of 2: 1;
s4: setting a learning strategy, a training epoch and a loss function in a training stage, training the constructed deep fusion network by using a training set, and testing a training effect by using a test set;
s5: storing a deep fusion network with the performance reaching a preset index after training is finished and testing; and performing semantic segmentation on the medical image to be segmented by using the network as an image segmentation network.
2. The medical image segmentation method based on convolution and transformer fusion of claim 1, characterized in that: in step S1, the consecutive Swin Transformer Blocks in the improved transformer network are computed as:

$\hat{Z}_i = \text{W-MSA}(\text{LN}(Z_{i-1})) + Z_{i-1}$

$Z_i = \text{MLP}(\text{LN}(\hat{Z}_i)) + \hat{Z}_i$

$\hat{Z}_{i+1} = \text{SW-MSA}(\text{LN}(Z_i)) + Z_i$

$Z_{i+1} = \text{MLP}(\text{LN}(\hat{Z}_{i+1})) + \hat{Z}_{i+1}$

In the above formulas, $Z_{i-1}$ denotes the input feature of the i-th layer Swin Transformer Block; $\hat{Z}_i$ denotes the output of the W-MSA of the i-th layer; $Z_i$ is the output feature of the i-th layer Swin Transformer Block and also the input feature of the (i+1)-th layer; $\hat{Z}_{i+1}$ denotes the output of the SW-MSA of the (i+1)-th layer; $Z_{i+1}$ denotes the output feature of the (i+1)-th layer Swin Transformer Block.
3. The medical image segmentation method based on convolution and transformer fusion of claim 2, characterized in that: in the deep fusion network of step S2, the convolution module uses Res2net-50 to form the backbone part of the network; after the medical image is input into the convolution module, four groups of convolution features from shallow to deep, denoted e_i, i = 1, 2, 3, 4, are obtained in turn through convolution processing; the channel dimensions of the four groups of convolution features are 256, 512, 1024 and 2048, respectively, and the feature scales are 128, 64, 32 and 16, respectively;
the convolution features e_i output by the convolution module are processed by the improved transformer network to obtain four groups of transformer features containing global features, denoted d_i, i = 1, 2, 3, 4.
4. The medical image segmentation method based on convolution and transformer fusion according to claim 3, characterized in that: in the deep fusion network of step S2, the feature fusion module includes a front convolutional layer, an up-sampling layer, a feature splicing layer and a back convolutional layer; both the front and back convolutional layers consist of two 3×3 convolution modules, and the feature map output by the up-sampling layer is twice the scale of its input; the feature splicing layer concatenates the two input features along the channel dimension, and the result is processed by the back convolutional layer and output as the fusion feature.
5. The medical image segmentation method based on convolution and transformer fusion of claim 4, characterized in that: in the feature fusion module, the transformer feature d_i output by the improved transformer network is processed by the front convolutional layer and then rescaled by the up-sampling layer, so that the resulting transformer feature M_i has the same size as the convolution feature e_i; the transformer feature and the convolution feature of the same size are input to the feature splicing layer, which concatenates the two input features along the channel dimension, and the required fusion feature is obtained after processing by the back convolutional layer; the fusion feature Z_i output by the feature fusion module is expressed as:

$M_i = \text{upsample}(\text{conv}(\text{conv}(e_i)))$

$Z_i = \sigma(\text{conv}(\text{cat}(M_i, d_i)))$

where conv denotes a 3×3 convolution with stride 1, upsample denotes up-sampling, cat denotes the concatenation operation, and σ is the ReLU activation function.
6. The medical image segmentation method based on convolution and transformer fusion of claim 5, characterized in that: in the deep fusion network of step S2, the inputs of the decoder module are the convolution features e_i output by the convolution module and the fusion features output by the feature fusion module; the output of the decoder module is the decoded image segmentation result.
7. The medical image segmentation method based on convolution and transform fusion of claim 1, characterized by: in step S3, the original data in the original data set is derived from the public polyp data sets kvasir, cvc-clinicDB, ETIS, cvc-colonDB and EndoScene; the number of the original data sets is amplified by carrying out image transformation and enhancement processing on the original data in the original data sets; the image transformation method adopted in the data set amplification process comprises random horizontal mirror image turning, vertical mirror image turning and angle rotation of 90 degrees, 180 degrees and 270 degrees; the adopted image enhancement processing method comprises random brightness, contrast and sharpening adjustment; the random probability of each image transformation method and image enhancement processing method is set to be 0.5.
8. The medical image segmentation method based on convolution and transformer fusion of claim 1, characterized in that: in the training process of step S4, the BCE loss function and the IoU loss function are selected as the loss functions, the PolyLr learning-rate decay strategy is selected, the learning rate is set to 0.0001, and the number of training epochs is set to 240.
9. A medical image segmentation system based on convolution and transform fusion, wherein the medical image segmentation system adopts the medical image segmentation method based on convolution and transform fusion according to any one of claims 1 to 8 to perform semantic segmentation on an acquired medical image so as to obtain an image segmentation prediction result of a target feature; the medical image segmentation system comprises:
the image acquisition module is used for acquiring a medical image to be segmented and preprocessing the medical image so as to meet the input standard of a system;
a convolution network, which adopts Res2net-50 to form a backbone network of the system; after the medical image is input into a convolution network for processing, the output of the convolution network is a convolution characteristic, and the output path of the convolution characteristic comprises three paths;
the improved transformer network receives a first path of convolution characteristics output by the convolution network; the improved Transformer network consists of two consecutive Swin Transformer blocks; the former Swin Transformer Block comprises a window based MSA layer and an MLP layer which are connected in sequence; an LN layer is arranged in front of the window based MSA layer and the MLP layer, and residual errors are used for connection after the window based MSA layer and the MLP layer; the latter Swin Transformer Block comprises a shifted window based MSA layer and an MLP layer which are connected in sequence; an LN layer is arranged in front of the shifted window based MSA layer and the MLP layer, and residual errors are used for connection after the shifted window based MSA layer and the MLP layer; after the input convolution characteristics are processed by the improved transformer network, outputting the convolution characteristics as transformer characteristics;
a feature fusion network receiving the second path of convolution features output by the convolution network and the transform features output by the improved transform network; the characteristic fusion module comprises a front convolution layer, an upper sampling layer, a characteristic splicing layer and a rear convolution layer; the front convolution layer and the rear convolution layer are both two convolution modules of 3 x 3, and the characteristic graph scale output by the upper sampling layer is twice of the input characteristic graph scale; the feature fusion network firstly carries out pre-convolution processing on input transformer features and then carries out scale transformation on the input transformer features through an upper sampling layer, and the dimensions of the transformer features processed by the upper sampling layer are the same as those of the convolution features; splicing the convolution characteristics and the transformer characteristics in the channel dimension in a characteristic splicing network, and then outputting the spliced characteristics as fusion characteristics after the processing of a post convolution layer; and the decoder is used for receiving the third path of convolution characteristics output by the convolution network and the fusion characteristics output by the characteristic fusion network, and then decoding the third path of convolution characteristics and the fusion characteristics to obtain a semantic segmentation result of the required medical image.
10. Medical image segmentation apparatus based on convolution and transform fusion, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the medical image segmentation method based on convolution and transform fusion according to any one of claims 1 to 8 when executing the program.
CN202111381789.6A 2021-11-22 2021-11-22 Medical image segmentation method, system and device based on convolution and transformer fusion Pending CN114066902A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111381789.6A CN114066902A (en) 2021-11-22 2021-11-22 Medical image segmentation method, system and device based on convolution and transformer fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111381789.6A CN114066902A (en) 2021-11-22 2021-11-22 Medical image segmentation method, system and device based on convolution and transformer fusion

Publications (1)

Publication Number Publication Date
CN114066902A true CN114066902A (en) 2022-02-18

Family

ID=80278600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111381789.6A Pending CN114066902A (en) 2021-11-22 2021-11-22 Medical image segmentation method, system and device based on convolution and transformer fusion

Country Status (1)

Country Link
CN (1) CN114066902A (en)

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114565763B (en) * 2022-02-28 2024-01-05 北京百度网讯科技有限公司 Image segmentation method, device, apparatus, medium and program product
CN114565763A (en) * 2022-02-28 2022-05-31 北京百度网讯科技有限公司 Image segmentation method, apparatus, device, medium, and program product
CN114638842B (en) * 2022-03-15 2024-03-22 桂林电子科技大学 Medical image segmentation method based on MLP
CN114638842A (en) * 2022-03-15 2022-06-17 桂林电子科技大学 Medical image segmentation method based on MLP
CN114612759A (en) * 2022-03-22 2022-06-10 北京百度网讯科技有限公司 Video processing method, video query method, model training method and model training device
CN114419449A (en) * 2022-03-28 2022-04-29 成都信息工程大学 Self-attention multi-scale feature fusion remote sensing image semantic segmentation method
CN114912575A (en) * 2022-04-06 2022-08-16 西安交通大学 Medical image segmentation model and method based on Swin transform connection path
CN114912575B (en) * 2022-04-06 2024-04-09 西安交通大学 Medical image segmentation model and method based on connection Swin transducer path
GB2617555A (en) * 2022-04-07 2023-10-18 Milestone Systems As Image processing method, apparatus, computer program and computer-readable data carrier
CN114494254A (en) * 2022-04-14 2022-05-13 科大智能物联技术股份有限公司 Product appearance defect classification method based on fusion of GLCM and CNN-Transformer and storage medium
CN114898110A (en) * 2022-04-25 2022-08-12 四川大学 Medical image segmentation method based on full-resolution representation network
CN115115523A (en) * 2022-08-26 2022-09-27 中加健康工程研究院(合肥)有限公司 CNN and Transformer fused medical image depth information extraction method
CN115393321A (en) * 2022-08-26 2022-11-25 南通大学 Multi-classification pulmonary tuberculosis detection method based on deep learning multi-layer spiral CT
CN115170808A (en) * 2022-09-05 2022-10-11 中邮消费金融有限公司 Image segmentation method and system
CN115661507A (en) * 2022-09-22 2023-01-31 北京建筑大学 Building garbage classification method and device based on optimized Swin Transformer network
CN115409990A (en) * 2022-09-28 2022-11-29 北京医准智能科技有限公司 Medical image segmentation method, device, equipment and storage medium
CN115311317A (en) * 2022-10-12 2022-11-08 广州中平智能科技有限公司 Laparoscope image segmentation method and system based on ScaleFormer algorithm
CN115578406A (en) * 2022-12-13 2023-01-06 四川大学 CBCT jaw bone region segmentation method and system based on context fusion mechanism
CN116258658A (en) * 2023-05-11 2023-06-13 齐鲁工业大学(山东省科学院) Swin transducer-based image fusion method
CN116258914B (en) * 2023-05-15 2023-08-25 齐鲁工业大学(山东省科学院) Remote Sensing Image Classification Method Based on Machine Learning and Local and Global Feature Fusion
CN116258914A (en) * 2023-05-15 2023-06-13 齐鲁工业大学(山东省科学院) Remote sensing image classification method based on machine learning and local and global feature fusion
CN116309596B (en) * 2023-05-23 2023-08-04 杭州华得森生物技术有限公司 CTC cell detection method and system based on micro-fluidic chip
CN116309596A (en) * 2023-05-23 2023-06-23 杭州华得森生物技术有限公司 CTC cell detection method and system based on micro-fluidic chip
CN116912253A (en) * 2023-09-14 2023-10-20 吉林大学 Lung cancer pathological image classification method based on multi-scale mixed neural network
CN116912253B (en) * 2023-09-14 2023-12-05 吉林大学 Lung cancer pathological image classification method based on multi-scale mixed neural network

Similar Documents

Publication Publication Date Title
CN114066902A (en) Medical image segmentation method, system and device based on convolution and transformer fusion
CN113468996B (en) Camouflage object detection method based on edge refinement
EP4085369A1 (en) Forgery detection of face image
CN113888541B (en) Image identification method, device and storage medium for laparoscopic surgery stage
CN113012155A (en) Bone segmentation method in hip image, electronic device, and storage medium
CN110969089A (en) Lightweight face recognition system and recognition method under noise environment
CN114445904A (en) Iris segmentation method, apparatus, medium, and device based on full convolution neural network
CN114529574A (en) Image matting method and device based on image segmentation, computer equipment and medium
CN111104941B (en) Image direction correction method and device and electronic equipment
CN115880317A (en) Medical image segmentation method based on multi-branch feature fusion refining
Zhu et al. DFTR: Depth-supervised fusion transformer for salient object detection
Wang et al. Thermal images-aware guided early fusion network for cross-illumination RGB-T salient object detection
CN112749576B (en) Image recognition method and device, computing equipment and computer storage medium
CN114066905A (en) Medical image segmentation method, system and device based on deep learning
TWI803243B (en) Method for expanding images, computer device and storage medium
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN114241411B (en) Counting model processing method and device based on target detection and computer equipment
Fan et al. EGFNet: Efficient guided feature fusion network for skin cancer lesion segmentation
CN115049546A (en) Sample data processing method and device, electronic equipment and storage medium
CN114694150A (en) Method and system for improving generalization capability of digital image classification model
Pei et al. FGO-Net: Feature and Gaussian Optimization Network for visual saliency prediction
Kim et al. Stereo confidence estimation via locally adaptive fusion and knowledge distillation
CN112633285A (en) Domain adaptation method, domain adaptation device, electronic equipment and storage medium
CN111476267A (en) Method and electronic device for classifying drug efficacy according to cell image
Jones Deep learning for image enhancement and visibility improvement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination