CN116433898A - Method for Transformer multi-modal image segmentation based on semantic constraints - Google Patents

Method for Transformer multi-modal image segmentation based on semantic constraints Download PDF

Info

Publication number
CN116433898A
CN116433898A (application CN202310150411.8A)
Authority
CN
China
Prior art keywords
modal
mode
features
decoder
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310150411.8A
Other languages
Chinese (zh)
Inventor
马伟
陈颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202310150411.8A
Publication of CN116433898A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a Transformer multi-modal image segmentation method based on semantic constraints, which comprises the following steps: features of the M modalities are extracted by a backbone encoder to obtain a feature map for each modality; redundant features are removed by the multi-modal feature interaction module, and the current modal features are strengthened to different degrees according to a gating matrix G generated by the cross-modal interaction module (CIF); the modality-specific enhanced feature maps are then concatenated and input to a Transformer for inter-modality feature fusion to obtain the final coding features; finally, the features are input to a K-means Transformer decoder. Because the modal fusion network fuses the multi-modal features and assigns corresponding weights to the different modalities, the disclosed embodiment can effectively highlight important modalities that are favorable for multi-sequence image segmentation, suppress the interference of unimportant modalities on multi-modal segmentation, and effectively improve multi-modal image segmentation accuracy.

Description

Method for Transformer multi-modal image segmentation based on semantic constraints
Technical Field
The invention belongs to the field of computer vision and image processing, and relates to a Transformer multi-modal image segmentation method based on semantic constraints.
Background
Multi-modal data play a vital role in image segmentation: complementary information allows segmentation with higher accuracy. Magnetic resonance imaging (MRI) is a common imaging technique for quantitative evaluation, with a variety of imaging modes, namely T1-weighted (T1), T2-weighted (T2), contrast-enhanced T1-weighted (T1c) and fluid-attenuated inversion recovery (FLAIR) images. Since each imaging modality provides unique contrast and structure, multi-modal magnetic resonance imaging provides rich complementary information for analysis, and joint learning across modalities benefits multi-modal image segmentation. Furthermore, contrast-enhanced imaging is often used in practice: the contrast agent produces a distinct contrast between normal tissue and abnormalities. Three-phase contrast-enhanced imaging protocols include the arterial phase, the venous phase and the delayed phase after intravenous contrast injection. The three-phase images complement each other well and therefore help to segment the image better.
Multi-modal image segmentation data are of important research significance and value. However, existing segmentation algorithms perform poorly and do not fully exploit multi-modal information, so improvement is needed. Owing to their strong feature representation capability, convolutional neural networks (CNNs) have been widely used for image segmentation tasks and have achieved improved performance. Recently, the Vision Transformer (ViT) has brought the most powerful technique in natural language processing into the field of computer vision. Thanks to the self-attention mechanism, the Transformer can capture long-range features, which fits 3D volumetric data well; it has therefore quickly been adapted to the segmentation of 3D MRI sequences. Based on these two popular techniques, many outstanding approaches have been proposed for image segmentation to address challenges including positional and morphological uncertainty, low contrast and annotation bias. However, existing work ignores an important issue, namely how to fuse multi-modal images in a reasonable way: most methods merge the modalities at the input stage or the feature stage, and rarely consider how to fuse multi-modal images properly.
Accurate multi-modal image segmentation generally requires efficiently learning complementary information from multi-modal data and removing redundant information. Developing an efficient multi-sequence segmentation algorithm can therefore improve segmentation capability, so algorithms for multi-sequence segmentation have important research significance and wide application value.
Disclosure of Invention
Aiming at the shortcomings of existing multi-sequence image segmentation methods, the invention provides a multi-layer-fusion regional Transformer multi-modal image segmentation method. Single-modality features are first encoded by a single-modality hierarchical encoder. A gating mechanism then performs inter-modality interaction on the multi-modal features, strengthening the current sequence to different degrees according to its importance, so that the gating module enhances expressions beneficial to multi-sequence images. Non-local information between different modalities is fused through the Transformer self-attention mechanism to further enhance the feature expression of the multiple sequences; a region fusion module uses the ground truth to compute a ground-truth region probability map, focusing on the region of interest and suppressing features of non-focus regions. Finally, network convergence is accelerated through a K-means Transformer decoder. The whole network enhances the feature expression of the multiple sequences, and experimental results show that segmenting with the enhanced multiple sequences effectively improves network accuracy and achieves good performance.
To achieve this object, the technical scheme of the invention is as follows: step 1, features of the M modalities are extracted by a backbone encoder to obtain a feature map for each modality; step 2, the cross-modal interaction module generates a modality weight matrix G, which judges the importance of each of the M modalities to the segmentation of the current modality; G can be divided into M individual maps {g_1, ..., g_m, ..., g_M}, one per modality; next, the content code is re-weighted as F_m = z_m · g_m, i.e., the initial feature map of each modality is multiplied element-wise by its gating map, strengthening the current modal features to different degrees and yielding the modality-enhanced feature map F_m; step 3, the modality-enhanced feature maps are concatenated (feature F_r) and input to a Transformer for inter-modality feature fusion, obtaining the final coding feature F_global; step 4, the coding features are finally input to a K-means Transformer decoder to realize multi-sequence image segmentation. The invention thus provides a multi-layer-fusion regional Transformer multi-modal image segmentation method.
Advantageous effects
1) Multi-scale encoder: the interleaved sparse Transformer encoder with convolutional token hierarchical fusion outperforms a simple serial stacking approach. 2) Cross-modal interaction module (CIF) and multi-modal feature fusion module (MFF): the inherent information redundancy of the multiple modalities is eliminated while their inherent complementary relationships are exploited, so that multi-modal feature fusion is more complete. 3) K-means Transformer decoder: the affinity logits between pixel features and cluster centers correspond directly to the softmax logits of the segmentation mask, which speeds up convergence.
Drawings
FIG. 1 is a schematic diagram of a network framework of the method of the present invention;
FIG. 2 is a schematic diagram of cross-modal interactions in an example of the invention;
FIG. 3 is a schematic diagram of the multi-modal fusion Transformer of the present invention.
Detailed Description
The invention is implemented with the open-source deep learning framework PyTorch, and the network model is trained on an NVIDIA RTX 3090 GPU.
The module configurations of the method of the present invention will be further described with reference to the drawings and the detailed description. It should be understood that the specific examples described below are intended to illustrate the invention rather than limit its scope, and that equivalent modifications made by those skilled in the art after reading the invention fall within the scope of the appended claims.
The network framework and processing flow of the invention are shown in FIG. 1 and specifically comprise the following steps:
Step 1 includes: the multi-sequence images f = {f_1, ..., f_m, ..., f_M} are passed through the backbone encoder model. A convolutional encoder produces feature maps with local context within each modality; each block contains concatenated group normalization, ReLU and kernel-size-3 convolution layers, while the first convolution block of the first stage contains only a convolution layer. The input tokens are gradually downsampled from stage to stage by dividing the input volume into blocks and linearly embedding the patches. Multi-layer perceptron (MLP) blocks encode local features in the first two stages: the first stage contains one MLP block and the second stage two, and each MLP consists of two fully connected layers with a layer normalization and a GELU activation between them. In the third and fourth stages, three and four Transformer blocks are employed, respectively, to capture long-range dependencies through multi-head self-attention (MSA). f_m denotes the initial modal feature map extracted for the m-th of the M sequences,
f_m ∈ R^(C×H×W),
where f_m is the feature map of the m-th modality of the image, R denotes the feature space, m indexes the sequences, and C, H and W represent the number of channels, the height and the width of each sequence feature map, respectively.
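For illustration only, the following PyTorch sketch shows one way the hierarchical single-modality encoder described above could be organized: convolution blocks, then MLP blocks, then Transformer blocks. The 2D convolutions, channel counts (assumed divisible by 8 for GroupNorm), residual connections and class names are assumptions made for this sketch, not the patented implementation.

```python
# Hedged sketch of the hierarchical single-modality encoder blocks described above.
import torch.nn as nn

class ConvBlock(nn.Module):
    """GroupNorm -> ReLU -> kernel-size-3 convolution, as described for the convolutional blocks."""
    def __init__(self, in_ch, out_ch, first=False):
        super().__init__()
        layers = []
        if not first:  # the first convolution block of stage 1 contains only the convolution layer
            layers += [nn.GroupNorm(8, in_ch), nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

class MLPBlock(nn.Module):
    """LayerNorm, then two fully connected layers with a GELU in between (stages 1-2)."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fc1, self.act, self.fc2 = nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)

    def forward(self, tokens):  # tokens: (B, N, dim)
        return tokens + self.fc2(self.act(self.fc1(self.norm(tokens))))

class TransformerBlock(nn.Module):
    """Multi-head self-attention block used in stages 3-4 to capture long-range dependencies."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = MLPBlock(dim, hidden=dim * 4)

    def forward(self, tokens):
        h = self.norm(tokens)
        tokens = tokens + self.attn(h, h, h, need_weights=False)[0]
        return self.mlp(tokens)
```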
Step 2 includes: the initial modal feature maps of the M modalities,
z_m ∈ R^(C×H×W),
are input to the cross-modal interaction module (CIF), which filters the modal information of the multi-sequence input. The modal features are concatenated and then input to a convolution layer with M output channels (kernel size 3×3, stride 1, boundary padding 0) followed by an activation, yielding the modality weight matrix G, which can be divided into M individual maps {g_1, ..., g_m, ..., g_M}, one per modality. Next, the content code is re-weighted as F_m = z_m · g_m: the initial feature map of each modality is multiplied element-wise by its own gating map, strengthening the current modal features to different degrees and giving the M modality-enhanced feature maps of the image, F = {F_1, ..., F_m, ..., F_M}, F_m ∈ R^(C×H×W). Interaction is performed in four stages in total; these outputs are concatenated, forwarded to a 1×1 convolution and then input to the LeakyReLU activation function, so that features rich in sequence information are highlighted. During training, some modality weights are randomly set to 0, which improves the robustness of the model to missing data.
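A minimal sketch of this cross-modal gating is given below, assuming 2D feature maps, a sigmoid gate activation, zero padding that preserves the spatial size, and the hypothetical class name CrossModalGate; it illustrates the re-weighting F_m = z_m · g_m, not the patented module itself.

```python
# Minimal sketch of the CIF gating: concatenate the M modal feature maps, predict an
# M-channel gating matrix G with a 3x3 convolution, and re-weight each modality element-wise.
import torch
import torch.nn as nn

class CrossModalGate(nn.Module):
    def __init__(self, num_modalities, channels):
        super().__init__()
        # one gating map per modality, predicted from the concatenated modal features
        self.gate_conv = nn.Conv2d(num_modalities * channels, num_modalities,
                                   kernel_size=3, stride=1, padding=1)
        self.act = nn.Sigmoid()  # assumed activation keeping gate values in [0, 1]

    def forward(self, feats, drop_prob=0.0):
        # feats: list of M tensors, each of shape (B, C, H, W)
        g = self.act(self.gate_conv(torch.cat(feats, dim=1)))        # G: (B, M, H, W)
        if self.training and drop_prob > 0:                           # randomly zero some modal gates
            keep = (torch.rand(g.shape[:2], device=g.device) > drop_prob).float()
            g = g * keep[:, :, None, None]
        return [f * g[:, m:m + 1] for m, f in enumerate(feats)]       # F_m = z_m * g_m
```

In this reading, passing drop_prob > 0 during training realizes the random zeroing of modality weights mentioned above.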
Step 3 comprises the following sub-steps: the modality-enhanced feature maps F = {F_1, ..., F_m, ..., F_M} are input separately to the multi-modal feature fusion module (MFF). The MFF first convolves each modality (kernel size 3×3, stride 1, boundary padding 0, input channel 1, output channel 3), computes a foreground probability map, and supervises it with the ground truth:
P_m^FG, P_m^BG = φ(Conv(F_m); θ),
where φ(·) is the foreground/background classifier with parameter set θ, Conv(·) is a 3×3 convolution operation, and FG and BG denote foreground and background, respectively. The modal foreground probability map is then dot-multiplied with the original modal features, highlighting discriminative regions and suppressing redundant information. The re-expressed modal features are then concatenated to obtain the feature F_r; after the multi-modal features are extracted, they are fused in the multi-modal feature fusion module. Depending on the inter- and intra-modal context, related and complementary features of the different modalities can be combined. First, the multi-modal features are converted into tokens, which are fed to a Transformer to enhance the discriminability of the fused features. The foreground is estimated as an ROI in the form of a per-modality probability map, and the probability map is embedded into the tokens. The feature-based foreground probability map prediction is sent to a Vision Transformer module to generate the new feature F_global:
T' = MSA(LN(T)) + T,   F_global = FFN(LN(T')) + T',
where T denotes the token sequence and LN(·), MSA(·) and FFN(·) denote layer normalization, multi-head self-attention and the feed-forward multi-layer perceptron, respectively. By embedding foreground cues, cross-modal fusion is performed with foreground awareness. The Transformer multi-head self-attention mechanism (MSA) breaks the locality of the features and realizes cross-modal non-local feature enhancement, so that the feature representation at any spatial position of any modality becomes richer and the expression of the modal features is effectively strengthened.
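The sketch below illustrates, under stated assumptions, this foreground-aware fusion: a small convolution head stands in for the FG/BG classifier φ, the foreground probability re-weights each modality, and a single pre-norm Transformer encoder layer (LN, MSA, FFN) fuses the concatenated tokens. The patch embedding, embed_dim, head count and class name are illustrative choices, not the patented network.

```python
# Hedged sketch of the MFF: foreground probability maps re-weight the modal features,
# the re-expressed features are concatenated (F_r) and fused by a Transformer into F_global.
import torch
import torch.nn as nn

class ForegroundAwareFusion(nn.Module):
    def __init__(self, num_modalities, channels, embed_dim=256, heads=8):
        super().__init__()
        self.fg_head = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                     nn.Conv2d(channels, 2, 1))           # FG/BG logits per modality
        self.to_tokens = nn.Conv2d(num_modalities * channels, embed_dim,
                                   kernel_size=4, stride=4)                # assumed patch embedding
        layer = nn.TransformerEncoderLayer(embed_dim, heads, dim_feedforward=embed_dim * 4,
                                           batch_first=True, norm_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, feats):
        # feats: list of M modality-enhanced maps, each (B, C, H, W)
        fg_logits, reweighted = [], []
        for f in feats:
            logits = self.fg_head(f)                     # (B, 2, H, W), supervised by the ground truth
            p_fg = logits.softmax(dim=1)[:, :1]          # foreground probability map
            reweighted.append(f * p_fg)                  # highlight discriminative regions
            fg_logits.append(logits)
        f_r = torch.cat(reweighted, dim=1)               # concatenated re-expressed features F_r
        tokens = self.to_tokens(f_r).flatten(2).transpose(1, 2)   # (B, N, embed_dim)
        f_global = self.fusion(tokens)                   # cross-modal non-local fusion -> F_global
        return f_global, fg_logits
```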
Step 4 comprises: the cross-modal feature maps F' = {F'_1, ..., F'_m, ..., F'_M} are concatenated to obtain F_r, and after dimension reduction the feature is spliced with F_global and input to the K-means decoder for segmentation. To enhance the discriminability of the fused features, a K-means Transformer is introduced as the decoder. The K-means Transformer applies intra-class consistency to the features to enhance their expressive power; the goal is to enhance the semantics of the decoded features by regularizing them with semantic centers. The K-means decoder consists of a pixel decoder and KMaX decoder layers. The pixel decoder is composed of a Transformer encoder and an upsampling layer. Each KMaX decoder layer updates the cluster centers of the target classes by taking a set of cluster centers and the corresponding features and outputting updated cluster centers. The cluster centers of the first KMaX decoder layer are randomly initialized; the others take the output of the previous KMaX decoder layer. The input to the KMaX decoder is first re-expressed by a K-means cross-attention module, the K-means cross-attention being
C' = argmax_N(Q^c (K^j)^T) V^j,
where C ∈ R^(N×D) is the input cluster-center matrix with N segmentation classes (ROI plus background) and D channels, C' is the updated center, the superscripts j and c denote features projected from the pixel features and the class queries, respectively, and Q^c ∈ R^(N×D), K^j ∈ R^(HW×D), V^j ∈ R^(HW×D) are the linear projection features of queries, keys and values. K-means cross-attention replaces the spatial softmax operation of the ordinary cross-attention mechanism with an argmax function; in this way, similar pixel features are clustered into the same cluster. The cluster centers output by the last KMaX decoder layer are used to regularize the pixel features to enhance the representation consistency of pixels within the same class. Specifically, the features from the pixel decoder are denoted F_de, F_de ∈ R^(HW×D), and the cluster-regularized pixel features are
F_de' = softmax_N(F_de C^T) C,
where the subscript N indicates the axis along which the softmax is applied; multi-modal image segmentation is then realized from these features.
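A compact sketch of one k-means cross-attention update and the pixel-feature regularization follows, assuming batched tensors and simple linear projections; the class and function names are hypothetical, and the shapes follow the notation above (N classes, D channels, HW pixels).

```python
# Hedged sketch: the pixel-wise softmax of ordinary cross-attention is replaced by a
# cluster-wise argmax (hard assignment), and the cluster centers later regularize the pixels.
import torch
import torch.nn as nn

class KMeansCrossAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # queries from the cluster centers C
        self.k = nn.Linear(dim, dim)   # keys from the pixel features
        self.v = nn.Linear(dim, dim)   # values from the pixel features

    def forward(self, centers, pixels):
        # centers: (B, N, D) cluster centers; pixels: (B, HW, D) pixel-decoder features
        qc, kj, vj = self.q(centers), self.k(pixels), self.v(pixels)
        affinity = qc @ kj.transpose(1, 2)                        # (B, N, HW)
        assign = torch.zeros_like(affinity).scatter_(             # argmax over N -> one-hot
            1, affinity.argmax(dim=1, keepdim=True), 1.0)
        centers_new = assign @ vj                                 # C' = argmax_N(Q^c K^jT) V^j
        return centers_new, affinity

def regularize_pixels(pixels, centers):
    # F_de' = softmax_N(F_de C^T) C : pull each pixel toward its (soft) semantic center
    attn = (pixels @ centers.transpose(1, 2)).softmax(dim=-1)     # (B, HW, N), softmax over N
    return attn @ centers                                         # (B, HW, D)
```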
In this embodiment, a comparison experiment is also performed to evaluate the segmentation performance of the proposed method, which combines the modality-importance gating network with the self-attention mechanism. The BraTS2020 dataset, containing 369 cases, was selected for the experiments and evaluation; 315 cases were used as the training set, 37 as the test set and 17 as the validation set. The Dice and HD95 metrics reported in other works are followed.
TABLE 1 Dice and HD95 comparison of different multi-sequence segmentation methods
[Table 1 is provided as an image in the original publication; the key Dice results are summarized in the text below.]
As shown in Table 1, under the four modalities the Dice values of the invention for ET, TC and WT are 0.821, 0.867 and 0.923, respectively, and the segmentation accuracy is higher than that of the other multi-modal segmentation methods. The experiments therefore show that the invention achieves advanced performance in multi-modal image segmentation and realizes multi-modal image segmentation better.
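For reference, the Dice coefficient reported in Table 1 can be computed for a binary mask as follows; this is a generic metric sketch, not code from the patent.

```python
# Generic Dice coefficient for a binary segmentation mask, as used for the ET/TC/WT scores.
import numpy as np

def dice(pred, target, eps=1e-7):
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```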
In summary, the method of the present invention is better than previous methods in terms of both quantitative and qualitative results. From the viewpoint of computational cost, the method obtains a higher evaluation index at a lower computational cost, so the network is efficient.
Experimental effects and effects
According to the multi-modal image segmentation method combining the K-means Transformer with a gating mechanism, the gating mechanism performs multi-modal feature interaction among the modalities and re-weights the expression of each modality, highlighting modalities beneficial to multi-modal segmentation while suppressing information from redundant modalities that contribute little to multi-sequence segmentation; the region fusion module allows discriminative information to be highlighted. The Transformer multi-head self-attention mechanism operates on the concatenated per-modality features to realize cross-modal non-local feature enhancement, effectively fusing local and non-local information to strengthen the feature expression of each modality. Finally, the segmentation of the multi-sequence images is realized through the KMaX decoder. In summary, the present embodiment can be applied to multi-sequence image segmentation.
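To summarize the data flow, a hedged end-to-end sketch is given below; it simply chains per-modality encoders, the gating module, the foreground-aware fusion and the decoder, and all module interfaces are assumptions carried over from the earlier sketches rather than the patented code.

```python
# End-to-end sketch tying the components above together (steps 1-4).
import torch.nn as nn

class MultiModalSegNet(nn.Module):
    def __init__(self, encoders, gate, fusion, decoder):
        super().__init__()
        self.encoders = nn.ModuleList(encoders)   # one hierarchical backbone per modality
        self.gate = gate                          # CIF-style cross-modal gating
        self.fusion = fusion                      # foreground-aware Transformer fusion (MFF)
        self.decoder = decoder                    # K-means Transformer (kMaX-style) decoder

    def forward(self, images):
        # images: list of M single-modality inputs
        feats = [enc(x) for enc, x in zip(self.encoders, images)]   # step 1: per-modality features
        gated = self.gate(feats)                                    # step 2: modality re-weighting
        f_global, fg_logits = self.fusion(gated)                    # step 3: cross-modal fusion
        masks = self.decoder(f_global)                              # step 4: segmentation masks
        return masks, fg_logits
```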
The above embodiments are preferred examples of the present invention, and are not intended to limit the scope of the present invention.

Claims (5)

1. A method for Transformer multi-modal image segmentation based on semantic constraints, characterized by comprising the following steps:
step 1, extracting features of the M modalities through a backbone encoder to obtain a feature map for each modality;
step 2, generating a modality weight matrix G by the cross-modal interaction module, which judges the importance of each of the M modalities to the segmentation of the current modality; G can be divided into M individual maps {g_1, ..., g_m, ..., g_M}, one per modality; next, the content code is re-weighted as F_m = z_m · g_m, i.e., the initial feature map of each modality is multiplied element-wise by its gating map, strengthening the current modal features to different degrees and yielding the modality-enhanced feature map F_m;
step 3, concatenating the modality-enhanced feature maps (feature F_r) and inputting them to a Transformer for inter-modality feature fusion, obtaining the final coding feature F_global;
and step 4, finally inputting the coding features to a K-means Transformer decoder to realize multi-sequence image segmentation.
2. The method according to claim 1, characterized in that:
wherein step 1 includes: the multi-sequence images f = {f_1, ..., f_m, ..., f_M} are passed through the backbone encoder model; a convolutional encoder produces a feature map with local context within each modality, each block containing concatenated group normalization, ReLU and kernel-size-3 convolution layers, while the first convolution block of the first stage contains only a convolution layer; the input tokens are gradually downsampled from stage to stage by dividing the input volume into blocks and linearly embedding the patches; multi-layer perceptron (MLP) blocks encode local features in the first two stages; the first stage contains one MLP block and the second stage two, each MLP consisting of two fully connected layers with a layer normalization and a GELU activation between them; in the third and fourth stages, three and four Transformer blocks are employed, respectively, to capture long-range dependencies through multi-head self-attention (MSA); f_m denotes the initial modal feature map extracted for the m-th of the M sequences,
f_m ∈ R^(C×H×W),
wherein z_m is the m-th modality view of the image, R represents the feature space, m indexes the sequences, and C, H and W represent the number of channels, the height and the width of each sequence feature map, respectively.
3. The method of claim 1, wherein step 2 comprises: the initial modal feature maps of the M modalities,
z_m ∈ R^(C×H×W),
are input to the cross-modal interaction module (CIF), which filters the modal information of the multi-sequence input; the modal features are concatenated and then input to a convolution layer with M output channels (kernel size 3×3, stride 1, boundary padding 0) followed by an activation, obtaining the modality weight matrix G, which can be divided into M individual maps {g_1, ..., g_m, ..., g_M}, one per modality; next, the content code is re-weighted as F_m = z_m · g_m, i.e., the initial feature map of each modality is multiplied element-wise by its gating map to obtain the M modality-enhanced feature maps of the image, F = {F_1, ..., F_m, ..., F_M}, F_m ∈ R^(C×H×W); interaction is performed in four stages in total, and these outputs are concatenated, forwarded to a 1×1 convolution and then input to the LeakyReLU activation function.
4. The method according to claim 1, characterized in that step 3 comprises the following sub-steps: the modality-enhanced feature maps F = {F_1, ..., F_m, ..., F_M} are input separately to the multi-modal feature fusion module (MFF); the MFF first convolves each modality (kernel size 3×3, stride 1, boundary padding 0, input channel 1, output channel 3), computes a foreground probability map and supervises it with the ground truth,
P_m^FG, P_m^BG = φ(Conv(F_m); θ),
where φ(·) is the foreground/background classifier with parameter set θ, Conv(·) is a 3×3 convolution operation, and FG and BG denote foreground and background, respectively; the modal foreground probability map is then dot-multiplied with the original modal features, highlighting discriminative regions and suppressing redundant information; the re-expressed modal features are then concatenated to obtain the feature F_r, and after the multi-modal features are extracted they are fused in the multi-modal feature fusion module;
first, the multi-modal features are converted into tokens, which are then fed to a Transformer; the foreground is estimated as an ROI in the form of a per-modality probability map, and the probability map is embedded into the tokens; the feature-based foreground probability map prediction is sent to a Vision Transformer module to generate the new feature F_global,
T' = MSA(LN(T)) + T,   F_global = FFN(LN(T')) + T',
where T denotes the token sequence and LN(·), MSA(·) and FFN(·) denote layer normalization, multi-head self-attention and the feed-forward multi-layer perceptron, respectively.
5. The method according to claim 1, characterized in that step 4 comprises: the cross-modal feature maps F' = {F'_1, ..., F'_m, ..., F'_M} are concatenated to obtain F_r, and after dimension reduction the feature is spliced with F_global and input to the K-means decoder for segmentation; a K-means Transformer is introduced as the decoder; the K-means decoder comprises a pixel decoder and KMaX decoder layers; the pixel decoder consists of a Transformer encoder and an upsampling layer; each KMaX decoder layer updates the cluster centers of the target classes by taking a set of cluster centers and the corresponding features and outputting updated cluster centers; the cluster centers of the first KMaX decoder layer are randomly initialized, and the others take the output of the previous KMaX decoder layer; the input to the KMaX decoder is first re-expressed by a K-means cross-attention module, the K-means cross-attention being
C' = argmax_N(Q^c (K^j)^T) V^j,
wherein C ∈ R^(N×D) is the input cluster-center matrix with N segmentation classes (ROI plus background) and D channels, C' is the updated center, the superscripts j and c denote features projected from the pixel features and the class queries, respectively, and Q^c ∈ R^(N×D), K^j ∈ R^(HW×D), V^j ∈ R^(HW×D) are the linear projection features of queries, keys and values; K-means cross-attention replaces the spatial softmax operation of the ordinary cross-attention mechanism with an argmax function, so that similar pixel features are clustered into the same cluster; the cluster centers output by the last KMaX decoder layer are used to regularize the pixel features to enhance the representation consistency of pixels within the same class; the features from the pixel decoder are denoted F_de, F_de ∈ R^(HW×D), and the cluster-regularized pixel features are
F_de' = softmax_N(F_de C^T) C,
wherein the subscript N indicates the axis along which the softmax is applied to realize multi-modal image segmentation.
CN202310150411.8A 2023-02-22 2023-02-22 Method for segmenting transform multi-mode image based on semantic constraint Pending CN116433898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310150411.8A CN116433898A (en) 2023-02-22 2023-02-22 Method for segmenting transform multi-mode image based on semantic constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310150411.8A CN116433898A (en) 2023-02-22 2023-02-22 Method for segmenting transform multi-mode image based on semantic constraint

Publications (1)

Publication Number Publication Date
CN116433898A true CN116433898A (en) 2023-07-14

Family

ID=87091470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310150411.8A Pending CN116433898A (en) 2023-02-22 2023-02-22 Method for segmenting transform multi-mode image based on semantic constraint

Country Status (1)

Country Link
CN (1) CN116433898A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912503A (en) * 2023-09-14 2023-10-20 湖南大学 Multi-mode MRI brain tumor semantic segmentation method based on hierarchical fusion strategy
CN116912503B (en) * 2023-09-14 2023-12-01 湖南大学 Multi-mode MRI brain tumor semantic segmentation method based on hierarchical fusion strategy
CN117152156A (en) * 2023-10-31 2023-12-01 通号通信信息集团有限公司 Railway anomaly detection method and system based on multi-mode data fusion
CN117152156B (en) * 2023-10-31 2024-02-13 通号通信信息集团有限公司 Railway anomaly detection method and system based on multi-mode data fusion
CN117726990A (en) * 2023-12-27 2024-03-19 浙江恒逸石化有限公司 Method and device for detecting spinning workshop, electronic equipment and storage medium
CN117726990B (en) * 2023-12-27 2024-05-03 浙江恒逸石化有限公司 Method and device for detecting spinning workshop, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Ren et al. A coarse-to-fine indoor layout estimation (cfile) method
CN116433898A (en) Method for segmenting transform multi-mode image based on semantic constraint
CN111461232A (en) Nuclear magnetic resonance image classification method based on multi-strategy batch type active learning
CN110675316B (en) Multi-domain image conversion method, system and medium for generating countermeasure network based on condition
CN111369565A (en) Digital pathological image segmentation and classification method based on graph convolution network
CN110689599A (en) 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement
CN114638994B (en) Multi-modal image classification system and method based on attention multi-interaction network
CN112949707A (en) Cross-mode face image generation method based on multi-scale semantic information supervision
Wang et al. Multiscale transunet++: dense hybrid u-net with transformer for medical image segmentation
Chen et al. Harnessing semantic segmentation masks for accurate facial attribute editing
WO2021139351A1 (en) Image segmentation method, apparatus, medium, and electronic device
CN115249382A (en) Method for detecting silence living body based on Transformer and CNN
CN114972016A (en) Image processing method, image processing apparatus, computer device, storage medium, and program product
Yang et al. RAU-Net: U-Net network based on residual multi-scale fusion and attention skip layer for overall spine segmentation
Li et al. Transformer and group parallel axial attention co-encoder for medical image segmentation
Gao A method for face image inpainting based on generative adversarial networks
Zhao et al. Generative landmarks guided eyeglasses removal 3D face reconstruction
CN111667488B (en) Medical image segmentation method based on multi-angle U-Net
WO2023160157A1 (en) Three-dimensional medical image recognition method and apparatus, and device, storage medium and product
Li et al. CorrDiff: Corrective Diffusion Model for Accurate MRI Brain Tumor Segmentation
WO2022226744A1 (en) Texture completion
CN112164078B (en) RGB-D multi-scale semantic segmentation method based on encoder-decoder
CN115345886B (en) Brain glioma segmentation method based on multi-modal fusion
Chen et al. FSC-UNet: a lightweight medical image segmentation algorithm fused with skip connections
CN117911705B (en) Brain MRI (magnetic resonance imaging) tumor segmentation method based on GAN-UNet variant network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination