CN117314938B - Image segmentation method and device based on multi-scale feature fusion decoding - Google Patents

Image segmentation method and device based on multi-scale feature fusion decoding

Info

Publication number
CN117314938B
Authority
CN
China
Prior art keywords
scale
tensor
feature map
fusion
embedded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311529949.6A
Other languages
Chinese (zh)
Other versions
CN117314938A (en)
Inventor
马腾辉
李叶
许乐乐
徐金中
郭丽丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Technology and Engineering Center for Space Utilization of CAS
Original Assignee
Technology and Engineering Center for Space Utilization of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Technology and Engineering Center for Space Utilization of CAS filed Critical Technology and Engineering Center for Space Utilization of CAS
Priority to CN202311529949.6A priority Critical patent/CN117314938B/en
Publication of CN117314938A publication Critical patent/CN117314938A/en
Application granted granted Critical
Publication of CN117314938B publication Critical patent/CN117314938B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T 7/11 Region-based segmentation
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06V 10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V 10/764 Image or video recognition or understanding using classification, e.g. of video objects
    • G06V 10/806 Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82 Image or video recognition or understanding using neural networks
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the disclosure provides an image segmentation method and device based on multi-scale feature fusion decoding. The method comprises: acquiring a multi-scale feature map of an original image; upsampling the multi-scale feature map to obtain an upsampled feature map, fusing the multi-scale feature map and the upsampled feature map to obtain a multi-scale fused feature map, and sequentially encoding the multi-scale fused feature map to generate a multi-scale embedded tensor; decoding the multi-scale embedded tensor to obtain a multi-scale mask tensor and a multi-scale key corner tensor; re-decoding the multi-scale embedded tensor to obtain a multi-scale contour tensor; and splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale contour tensor into a multi-scale fusion query, and encoding the multi-scale fusion query to obtain the final image segmentation result. The method analyzes global features (mask and contour) and local features (key corner points), performs multi-scale feature fusion decoding, and improves image instance segmentation precision.

Description

Image segmentation method and device based on multi-scale feature fusion decoding
Technical Field
The embodiment of the disclosure relates to the technical field of computer vision, in particular to an image segmentation method, an image segmentation device, computer equipment and a computer readable storage medium based on multi-scale feature fusion decoding.
Background
Image instance segmentation is an important task in the field of computer vision, aimed at separating and labeling the different object instances in an image. The technique has broad application prospects in fields such as autonomous driving, medical image processing, and video surveillance. Conventional image instance segmentation methods typically use manually designed features and classifiers, whose effectiveness is limited on complex instance segmentation problems. In recent years, the development of deep learning has driven rapid progress in the field of image instance segmentation. Deep learning models such as convolutional neural networks (CNNs) can extract high-level features from images, making the instance segmentation task more accurate and robust. However, the task remains challenging because object instances in an image differ in size, shape, and complexity. In existing deep learning models, feature extraction at a single scale cannot capture all the details and features of an object instance, and segmentation accuracy needs to be further improved.
Disclosure of Invention
An object of an embodiment of the present disclosure is to provide an image segmentation method, apparatus, computer device and computer readable storage medium based on multi-scale feature fusion decoding, so as to solve the foregoing problems in the prior art.
In order to achieve the above objective, the technical solution adopted in the embodiments of the present disclosure is as follows:
An aspect of an embodiment of the present disclosure provides an image segmentation method based on multi-scale feature fusion decoding, the method including:
acquiring a multi-scale feature map of an image to be segmented;
performing up-sampling on the minimum-scale feature map in the multi-scale feature map for multiple times to obtain a multi-scale up-sampling feature map, fusing the multi-scale feature map with the up-sampling feature map of the corresponding scale to obtain a multi-scale fused feature map, and sequentially encoding the multi-scale fused feature map to generate a multi-scale embedded tensor;
decoding the multi-scale embedded tensor according to the learnable query quantity to obtain a multi-scale mask tensor and a multi-scale key corner tensor;
re-decoding the multi-scale embedded tensor by taking the multi-scale mask tensor as a query quantity to obtain a multi-scale contour tensor;
and splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale contour tensor into a multi-scale fusion query volume, and encoding the multi-scale fusion query volume to obtain a final image segmentation result.
Illustratively, the acquiring the multi-scale feature map of the image to be segmented includes:
acquiring an original image to be segmented;
and carrying out convolution calculation and downsampling on the original image in sequence by adopting a maximum pooling method to obtain a multi-scale feature map.
Exemplary, the performing up-sampling on the minimum scale feature map of the multi-scale feature map for multiple times to obtain a multi-scale up-sampling feature map, and fusing the multi-scale feature map with the up-sampling feature map of the corresponding scale to obtain a multi-scale fused feature map, including:
continuously upsampling the minimum scale feature map of the multi-scale feature map for multiple times to obtain multi-scale upsampled feature maps, the number of which is the same as that of the multi-scale feature maps;
and respectively superposing the multi-scale feature map and the up-sampling feature map with corresponding scales, and performing convolution smoothing on the superposed multi-scale feature map to obtain a multi-scale fusion feature map.
Illustratively, the encoding the multi-scale fusion feature map sequentially generates a multi-scale embedded tensor, including:
respectively carrying out self-attention calculation on the multi-scale fusion feature images to obtain corresponding initial embedded tensors;
and respectively carrying out two linear transformations on the multi-scale initial embedded tensor, and carrying out nonlinear ReLU activation in the middle of the two linear transformations to generate a final multi-scale embedded tensor.
Illustratively, decoding the multi-scale embedded tensor according to the learnable query volume to obtain a multi-scale mask tensor and a multi-scale key corner tensor, including:
performing self-attention computation and nonlinear transformation on the multiscale embedded tensor respectively to obtain corresponding first output, wherein the query quantity, key and value of the self-attention computation are all corresponding embedded tensors;
performing cross attention calculation and nonlinear transformation on the first output to obtain a corresponding second output; the query quantity in the cross attention calculation is a parameter quantity which can be learned, and the key and the value are the first output for carrying out self attention calculation corresponding to the multi-scale embedded tensor;
and respectively carrying out dot product operation on the second output and the fusion feature map with the largest scale to obtain a multi-scale mask tensor and a multi-scale key corner tensor.
Illustratively, the re-decoding the multi-scale embedded tensor by using the multi-scale mask tensor as a query quantity to obtain a multi-scale contour tensor includes:
and respectively taking the multi-scale mask tensors as query quantities, wherein keys and values correspond to the multi-scale embedded tensors, respectively carrying out cross attention calculation on the multi-scale embedded tensors, carrying out nonlinear transformation on cross attention calculation output, and carrying out dot product operation on a nonlinear transformation result and a fused feature map with the largest scale to obtain the multi-scale contour tensor.
Illustratively, the stitching the multi-scale mask tensor, the multi-scale key corner tensor, and the multi-scale contour tensor into a multi-scale fusion query volume, and encoding the multi-scale fusion query volume to obtain a final image segmentation result, includes:
respectively splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale outline tensor to obtain a multi-scale fusion query quantity;
and performing self-attention calculation and nonlinear transformation on the multi-scale fusion query volume, performing dot product operation with the fusion feature map with the largest scale to obtain segmentation results with different scales, and accumulating the segmentation results with different scales to obtain a final image instance segmentation result.
Another aspect of the disclosed embodiments provides an image segmentation apparatus based on multi-scale feature fusion decoding, the apparatus comprising:
the feature extraction network is used for acquiring images and extracting a multi-scale feature map;
the encoder is used for carrying out up-sampling on the multi-scale feature map for multiple times to obtain a corresponding up-sampling feature map, fusing the multi-scale feature map with the up-sampling feature map of a corresponding scale, and sequentially encoding the fused multi-scale feature map to generate a multi-scale embedded tensor;
the multi-scale feature decoder is used for decoding the multi-scale embedded tensor according to the learnable query quantity to obtain a multi-scale mask tensor and a multi-scale key corner tensor;
the contour decoder is used for re-decoding the multi-scale embedded tensor by taking the multi-scale mask tensor as a query quantity to obtain a multi-scale contour tensor;
the fusion decoder is used for splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale outline tensor into multi-scale fusion query volume, and encoding the multi-scale fusion tensor to obtain a final image segmentation result.
Another aspect of the disclosed embodiments provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the program.
Another aspect of the disclosed embodiments provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method as described above.
The beneficial effects of the embodiments of the disclosure are as follows:
According to the image instance segmentation method based on multi-scale feature fusion decoding, global features (masks and contours) and local features (key corner points) are analyzed and multi-scale feature fusion decoding is performed, which improves image instance segmentation accuracy. The disclosed method is simple to operate and achieves a good segmentation effect.
Drawings
FIG. 1 is a schematic flow diagram of an image segmentation method based on multi-scale feature fusion decoding according to an embodiment of the disclosure;
FIG. 2 is a schematic structural diagram of an image segmentation apparatus based on multi-scale feature fusion decoding according to an embodiment of the disclosure;
FIG. 3 is a workflow diagram of an image segmentation apparatus based on multi-scale feature fusion decoding according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the embodiments of the present disclosure will be further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description is merely illustrative of the disclosed embodiments and is not intended to limit the disclosed embodiments.
As shown in fig. 1, an embodiment of the present disclosure proposes an image segmentation method based on multi-scale feature fusion decoding, where the method includes:
and S1, acquiring a multi-scale feature map of the image to be segmented.
As an example, the acquiring a multi-scale feature map of an image to be segmented includes:
step S11, obtaining an original image to be segmented.
Step S12: sequentially performing convolution calculation on the original image and downsampling by the max pooling method to obtain a multi-scale feature map, according to the following formula:
D_{n+1} = f_downsample(f_Conv(D_n)), n = 0, 1, 2, 3
where f_downsample() denotes the downsampling operation, f_Conv denotes the convolution operation, D_0 is the original image, and D_1~D_4 are the feature maps of progressively decreasing size obtained from the four consecutive downsampling steps.
The max pooling in step S12 aggregates the information of local regions by downsampling local areas of the input feature map, which reduces the number of parameters, lowers the computational complexity and the computation cost of training and inference, reduces the risk of overfitting, and retains the key information of the input feature map so that important features can be better identified and learned.
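As a non-limiting illustration, a minimal PyTorch sketch of this backbone is given below. The four downsampling stages follow the formula above, while the channel widths, the 3×3 kernel and the ReLU activation are assumptions not specified in the embodiment.

```python
import torch
import torch.nn as nn

class MultiScaleBackbone(nn.Module):
    """Sketch of D_{n+1} = f_downsample(f_Conv(D_n)), n = 0..3.
    Channel widths, kernel size and activation are illustrative assumptions."""

    def __init__(self, channels=(3, 64, 128, 256, 512)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels[i], channels[i + 1], kernel_size=3, padding=1),  # f_Conv
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2),  # f_downsample (max pooling)
            )
            for i in range(4)
        )

    def forward(self, d0):
        feats, x = [], d0
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # D_1 .. D_4, each half the spatial size of the previous
        return feats
```

Applied to a batch of images, the returned list corresponds to the feature maps D_1~D_4 of progressively decreasing size.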
Step S2: upsampling the minimum-scale feature map in the multi-scale feature map multiple times to obtain multi-scale upsampled feature maps, fusing the multi-scale feature map with the upsampled feature map of the corresponding scale to obtain a multi-scale fused feature map, and sequentially encoding the multi-scale fused feature map to generate a multi-scale embedded tensor.
As an example, the performing up-sampling on the minimum scale feature map of the multi-scale feature map for multiple times to obtain a multi-scale up-sampling feature map, and fusing the multi-scale feature map with the up-sampling feature map of a corresponding scale to obtain a multi-scale fused feature map, including:
step 21, performing continuous multiple upsampling on the minimum scale feature map of the multi-scale feature map to obtain multi-scale upsampled feature maps with the same number as the multi-scale feature maps, which specifically includes the following steps:
U_{n+1} = f_upsample(U_n), n = 0, 1, 2, 3
where f_upsample() denotes the upsampling operation, U_0 is the smallest-scale feature map D_4 of the multi-scale feature map, and U_1~U_4 are the upsampled feature maps of progressively increasing size obtained from the four consecutive upsampling steps;
step 22, respectively superposing the multi-scale feature map and the up-sampling feature map with corresponding scale, and performing convolution smoothing on the superposed multi-scale feature map to obtain a multi-scale fusion feature map, wherein the steps are as follows:
C_n = f_Conv3×3(U_n + D_{5-n}), n = 1, 2, 3, 4
where f_Conv3×3 denotes a 3×3 convolution, and C_1~C_4 are the fused feature maps obtained by superposing and smoothing the multi-scale feature maps with the upsampled feature maps of the corresponding scale.
In the embodiment of the disclosure, laterally connected upsampling operations are used when superposing each multi-scale feature map with its corresponding upsampled feature map. To avoid the insufficient fusion that may result from directly adding corresponding elements of the two feature maps, the superposed feature map is smoothed with a 3×3 convolution, yielding a more thoroughly fused feature map.
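A sketch of this laterally connected fusion is shown below. It assumes the maps D_1~D_4 have already been projected to a common channel width (e.g. with 1×1 convolutions), pairs U_n with the encoder map prescribed by C_n = f_Conv3×3(U_n + D_{5-n}), and resolves any remaining spatial-size mismatch by interpolation; these are assumptions beyond what the embodiment states.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LateralFusion(nn.Module):
    """Sketch of C_n = Conv3x3(U_n + D_{5-n}): repeated upsampling of the smallest
    map with lateral addition, then a 3x3 smoothing convolution per scale."""

    def __init__(self, channels=256):
        super().__init__()
        self.smooth = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1) for _ in range(4)
        )

    def forward(self, feats):
        # feats = [D_1, D_2, D_3, D_4], assumed projected to a common channel width
        u = feats[-1]                                              # U_0 = D_4
        fused = []
        for n in range(4):
            u = F.interpolate(u, scale_factor=2, mode="nearest")   # U_{n+1} = f_upsample(U_n)
            lateral = feats[3 - n]                                 # D_{5-(n+1)}, per the patent indexing
            if lateral.shape[-2:] != u.shape[-2:]:
                lateral = F.interpolate(lateral, size=u.shape[-2:], mode="nearest")
            fused.append(self.smooth[n](u + lateral))              # C_{n+1}
        return fused                                               # C_1 .. C_4
```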
As an example, sequentially encoding the multi-scale fused feature maps to generate the multi-scale embedded tensors includes:
Step 23: performing self-attention calculation on each multi-scale fused feature map C_n (n = 1, 2, 3, 4) to obtain the corresponding initial embedded tensor Z_1~Z_4, where the self-attention function is:
Attention(Q_n, K_n, V_n) = softmax(Q_n K_n^T / sqrt(d_k)) V_n
where Attention() is the attention function, d_k is the dimension of the query Q_n, key K_n and value V_n, and the query Q_n, key K_n and value V_n are all tensors of the n-th fused feature map C_n. The attention function Attention() can be described as mapping the query Q and a set of key-value (K-V) pairs to an output, yielding a score for each pixel position of the fused feature map C_n as the corresponding initial embedded tensor Z_n, so as to capture long-range dependencies between different positions of the image.
Step 24: applying two linear transformations to each multi-scale initial embedded tensor Z_n with a nonlinear ReLU activation in between to generate the final embedded tensor Z'_n, specifically:
FFN(Z_n) = max(0, Z_n W_1 + b_1) W_2 + b_2, n = 1, 2, 3, 4
where FFN() denotes the feedforward computation of two linear transformations with a ReLU activation between them, and W_1, b_1, W_2, b_2 are learnable parameters.
In the embodiment of the disclosure, the initial embedded tensors Z_1~Z_4 pass through two linear transformations with a nonlinear ReLU activation in between; this nonlinear transformation and mapping produces the final embedded tensors Z'_1~Z'_4 and increases the expressive power and generalization ability of the model, thereby improving performance.
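The encoding of one fused feature map can be sketched as follows, reading each pixel of C_n as a token; the hidden width of the feedforward network is an illustrative assumption.

```python
import math
import torch
import torch.nn as nn

def attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(d_k), dim=-1)
    return scores @ v

class EncoderBlock(nn.Module):
    """Self-attention over a flattened fused feature map C_n, then the two-layer
    FFN(Z) = max(0, Z W1 + b1) W2 + b2; the hidden width (1024) is an assumption."""

    def __init__(self, dim=256, hidden=1024):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, c_n):
        b, ch, h, w = c_n.shape
        tokens = c_n.flatten(2).transpose(1, 2)   # (B, H*W, C): one token per pixel position
        z_n = attention(tokens, tokens, tokens)   # Q = K = V = C_n  ->  initial tensor Z_n
        return self.ffn(z_n)                      # final embedded tensor Z'_n
```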
Step S3: decoding the multi-scale embedded tensor according to the learnable query quantity to obtain a multi-scale mask tensor and a multi-scale key corner tensor.
Step S3 of an embodiment of the present disclosure is to interact and integrate features of different scales to capture global context information.
As an example, the decoding the multi-scale embedded tensor according to the learnable query volume to obtain a multi-scale mask tensor and a multi-scale key corner tensor includes:
step S31, embedding tensor Z 'for multiple scales' n Respectively performing self-attention calculation and nonlinear transformation to obtain corresponding first output Z sn The specific formula is as follows:
Z sn =FFN(Attention(Z’ n ,Z’ n ,Z’ n )) n=1,2,3,4
step S31 of the presently disclosed embodiments queries the quantity Q 'in the self-attention computation' n Bond K' n Value V' n Are all embedded tensors Z' n Correspondingly generate Z s1 ~Z s4
Step S32: performing cross-attention calculation and nonlinear transformation on each first output Z_sn to obtain the corresponding second output Z_c1~Z_c4, specifically:
Z_cn = FFN(Attention(Q_sn, Z_sn, Z_sn)), n = 1, 2, 3, 4
where the query Q_sn is a learnable parameter of shape [100, b, 256], and the key K_sn and value V_sn are the output Z_sn of the self-attention calculation on the corresponding multi-scale embedded tensor Z'_n.
In the embodiment of the disclosure, the query Q_sn has shape [100, b, 256], where b is the number of input images per batch; each 256-dimensional vector represents the information of a detected box, consisting of class information for distinguishing categories and spatial information (box coordinates) describing the position of the object in the image.
Step S33: performing a dot product operation between each second output Z_c1~Z_c4 and the largest-scale fused feature map C_4 to obtain the multi-scale mask tensor V_mn and the multi-scale key corner tensor V_pn, specifically:
V_mn, V_pn = torch.mul(Z_cn, C_4), n = 1, 2, 3, 4
where torch.mul() represents a dot product operation.
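A sketch of the learnable-query decoding of steps S31 to S33 follows, reusing the attention helper from the encoder sketch above. The query count (100) and width (256) follow the embodiment; the separate linear heads that split Z_cn into a mask branch and a key corner branch, and the realization of the final dot product with C_4 as an einsum over the channel dimension, are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class MultiScaleFeatureDecoder(nn.Module):
    """Sketch of steps S31-S33: self-attention on Z'_n, cross-attention with a
    learnable query, then projection of Z_cn onto the largest fused map C_4."""

    def __init__(self, dim=256, num_queries=100, hidden=1024):
        super().__init__()
        self.query = nn.Parameter(torch.randn(num_queries, dim))   # learnable query Q_sn
        self.ffn1 = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
        self.ffn2 = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
        self.mask_head = nn.Linear(dim, dim)     # assumed split into mask / corner branches
        self.corner_head = nn.Linear(dim, dim)

    def forward(self, z_n, c4):
        # z_n: (B, N, 256) embedded tensor Z'_n; c4: (B, 256, H, W) largest fused map
        z_s = self.ffn1(attention(z_n, z_n, z_n))                        # first output Z_sn
        q = self.query.unsqueeze(0).expand(z_n.size(0), -1, -1)          # (B, 100, 256)
        z_c = self.ffn2(attention(q, z_s, z_s))                          # second output Z_cn
        v_m = torch.einsum("bqc,bchw->bqhw", self.mask_head(z_c), c4)    # mask tensor V_mn
        v_p = torch.einsum("bqc,bchw->bqhw", self.corner_head(z_c), c4)  # key corner tensor V_pn
        return v_m, v_p, z_c
```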
Step S4: re-decoding the multi-scale embedded tensor with the multi-scale mask tensor as the query quantity to obtain a multi-scale contour tensor.
As an example, re-decoding the multi-scale embedded tensor with the multi-scale mask tensor as the query quantity to obtain a multi-scale contour tensor includes:
taking each multi-scale mask tensor V_mn as the query, performing cross-attention calculation on the multi-scale embedded tensors Z'_1~Z'_4, applying a nonlinear transformation to the cross-attention output, and performing a dot product operation between the nonlinear transformation result and the largest-scale fused feature map C_4 to obtain the multi-scale contour tensor V_rn, specifically:
V_rn = torch.mul(FFN(Attention(V_mn, Z'_n, Z'_n)), C_4), n = 1, 2, 3, 4
where the key and value of the cross-attention calculation are the corresponding multi-scale embedded tensor.
In step S4 of the embodiment of the disclosure, the multi-scale mask is used as the query quantity, so that the quality of the query quantity is improved, the perception of global features is increased, and the decoding capability is improved.
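A corresponding sketch of step S4 is given below, again reusing the attention helper above. Feeding the mask query in its (B, 100, 256) embedding form rather than as a spatial mask, and the einsum projection onto C_4, are assumptions about the notation V_rn = torch.mul(FFN(Attention(V_mn, Z'_n, Z'_n)), C_4).

```python
import torch
import torch.nn as nn

class ContourDecoder(nn.Module):
    """Sketch of step S4: cross-attention over Z'_n with the mask tensor as query,
    a feedforward transformation, then projection onto the largest fused map C_4."""

    def __init__(self, dim=256, hidden=1024):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, mask_query, z_n, c4):
        # mask_query: (B, 100, 256) mask tensor in query-embedding form (assumption)
        # z_n: (B, N, 256) embedded tensor Z'_n; c4: (B, 256, H, W)
        out = self.ffn(attention(mask_query, z_n, z_n))     # cross-attention, key/value = Z'_n
        return torch.einsum("bqc,bchw->bqhw", out, c4)      # multi-scale contour tensor V_rn
```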
Step S5: splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale contour tensor into a multi-scale fusion query volume, and encoding the multi-scale fusion query volume to obtain the final image segmentation result.
As an example, splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale contour tensor into a multi-scale fusion query volume, and encoding the multi-scale fusion query volume to obtain the final image segmentation result, includes:
splicing the multi-scale mask tensor V_mn, the multi-scale key corner tensor V_pn and the multi-scale contour tensor V_rn to obtain the multi-scale fusion query B_n, where the splicing formula is:
B_n = Concat(V_mn, V_pn, V_rn), n = 1, 2, 3, 4
where Concat denotes tensor concatenation.
Performing self-attention calculation and nonlinear transformation on the multi-scale fusion query B_n establishes association and interaction between global features and local features; a dot product operation with the largest-scale fused feature map then gives the segmentation results M_n at different scales, and accumulating the M_n yields the final image instance segmentation result, specifically:
M_n = torch.mul(FFN(Attention(B_n, Z'_n, Z'_n)), C_4), n = 1, 2, 3, 4
Result = torch.add(M_n), n = 1, 2, 3, 4
where torch.add () represents the accumulation calculation.
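Step S5 can be sketched as follows, reusing the same helpers. Splicing the three branch outputs along the query dimension, keeping each in its (B, 100, 256) embedding form, and the einsum projection onto C_4 are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class FusionDecoder(nn.Module):
    """Sketch of step S5: splice B_n = Concat(V_mn, V_pn, V_rn), attend over Z'_n,
    project onto C_4, and accumulate the per-scale results M_n."""

    def __init__(self, dim=256, hidden=1024):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, masks, corners, contours, embedded, c4):
        # masks / corners / contours: lists over scales of (B, 100, 256) branch outputs
        # embedded: list over scales of (B, N, 256) tensors Z'_n; c4: (B, 256, H, W)
        result = None
        for v_m, v_p, v_r, z_n in zip(masks, corners, contours, embedded):
            b_n = torch.cat([v_m, v_p, v_r], dim=1)                         # B_n = Concat(...)
            m_n = torch.einsum("bqc,bchw->bqhw",
                               self.ffn(attention(b_n, z_n, z_n)), c4)      # scale result M_n
            result = m_n if result is None else result + m_n                # torch.add accumulation
        return result
```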
The embodiment of the disclosure relates to an image instance segmentation method based on multi-scale feature fusion decoding, which is used for acquiring a multi-scale feature map from an input image; performing cross-layer superposition and encoding on the multi-scale feature map to generate a multi-scale embedded tensor; decoding from the multiscale embedded tensor through the learned query quantity to obtain a multiscale mask tensor and a multiscale key corner tensor; re-decoding the multi-scale embedded tensor by taking the multi-scale mask tensor as a query quantity to obtain a multi-scale outline tensor; and splicing the multi-scale mask tensor, the key corner tensor and the outline tensor into a fusion tensor, and encoding the fusion tensor to obtain a final image instance segmentation result. According to the image instance segmentation method based on multi-scale feature fusion decoding provided by the embodiment of the disclosure, global features (masks and contours) and local features (local key corner points) are analyzed, multi-scale feature fusion decoding is carried out, a false segmentation area is removed, a missing part of masks is filled, the saw-tooth effect of segmentation boundaries is reduced, and the image instance segmentation precision is improved.
As shown in fig. 2 and 3, another aspect of the embodiments of the present disclosure provides an image segmentation apparatus based on multi-scale feature fusion decoding, the apparatus including: feature extraction network 100, encoder 200, multi-scale feature decoder 300, contour decoder 400, and fusion decoder 500.
The feature extraction network 100 is configured to obtain a multi-scale feature map of an image to be segmented. The feature extraction network may include at least 1 convolution layer and 1 pooling layer; the convolution layer in the feature extraction network is used for obtaining an original image to be segmented, the pooling layer carries out downsampling on the original image by adopting a maximum pooling method after carrying out convolution calculation, and a multi-scale feature map is obtained, and the specific implementation method is as follows:
D_{n+1} = f_downsample(f_Conv(D_n)), n = 0, 1, 2, 3
where f_downsample() denotes the downsampling operation, f_Conv denotes the convolution operation, D_0 is the original image, and D_1~D_4 are the feature maps of progressively decreasing size obtained from the four consecutive downsampling steps.
Downsampling to realize dimension reduction processing of the feature map; the pooling layer collects the information of the local areas in the input feature map by sampling the local areas, so that the number of parameters is reduced, the complexity of a model is reduced, the calculation cost of training and reasoning is reduced, the risk of overfitting is reduced, key information in the input feature map is reserved, and the segmentation device can better identify and learn important features.
The feature extraction network in the disclosed embodiment may be a residual network or a Transformer feature extraction network.
The feature extraction network is used as a backbone network for acquiring a multi-scale feature map of the image, and is used for processing the multi-scale problem in image instance segmentation.
The encoder 200 is configured to upsample the minimum-scale feature map of the multi-scale feature map multiple times to obtain a multi-scale upsampled feature map, fuse the multi-scale feature map with the upsampled feature map of the corresponding scale to obtain a multi-scale fused feature map, and sequentially encode the multi-scale fused feature map to generate a multi-scale embedded tensor.
The encoder may be a multi-scale deformable self-attention encoder comprising at least 1 basic Transformer layer, 1 upsampling layer and 1 superposition layer, and adopting a self-attention mechanism. The upsampling layer of the encoder upsamples the small-scale feature map of the image; one or more upsampling methods such as nearest-neighbor interpolation, bilinear interpolation or transposed convolution may be adopted. The upsampling layer upsamples the minimum-scale feature map of the multi-scale feature map multiple consecutive times to obtain multi-scale upsampled feature maps equal in number to the multi-scale feature maps, specifically:
U_{n+1} = f_upsample(U_n), n = 0, 1, 2, 3
where f_upsample() denotes the upsampling operation, U_0 is the smallest-scale feature map D_4 of the multi-scale feature map, and U_1~U_4 are the upsampled feature maps of progressively increasing size obtained from the four consecutive upsampling steps.
The superposition layer of the encoder is used for respectively superposing and smoothing the multi-scale feature map and the up-sampling feature map with corresponding scales to obtain an image multi-scale feature fusion feature map; the method comprises the following steps:
C_n = f_Conv3×3(U_n + D_{5-n}), n = 1, 2, 3, 4
where f_Conv3×3 denotes a 3×3 convolution, and C_1~C_4 are the fused feature maps obtained by superposing and smoothing the multi-scale feature maps with the corresponding upsampled feature maps.
Each superposition layer is provided with a laterally connected upsampling layer; the upsampling layers increase the number of network layers, enlarge the receptive field and the expressive capacity of the model, and increase the size of the feature map. Meanwhile, to avoid the insufficient fusion that may result from directly adding corresponding elements of the two feature maps, the superposed feature map is smoothed with a 3×3 convolution, yielding a more thoroughly fused feature map.
The basic Transformer layer of the encoder comprises at least 1 self-attention module and 1 feedforward neural network, and has at least 1 attention head. The self-attention module is used to capture long-range dependencies between different positions of the image: it performs self-attention calculation on each multi-scale fused feature map C_n to obtain the corresponding initial embedded tensor Z_n, where the self-attention calculation is:
Attention(Q_n, K_n, V_n) = softmax(Q_n K_n^T / sqrt(d_k)) V_n
where Attention() is the attention function, the query Q_n, key K_n and value V_n are all tensors of the n-th fused feature map C_n, and d_k is the dimension of Q_n, K_n and V_n.
The feedforward neural network of the encoder applies two linear transformations to the multi-scale initial embedded tensor Z_n with a nonlinear ReLU activation in between, generating the final multi-scale embedded tensor Z'_n, specifically:
FFN(Z_n) = max(0, Z_n W_1 + b_1) W_2 + b_2
where FFN() denotes the feedforward computation of two linear transformations with a ReLU activation between them, and W_1, b_1, W_2, b_2 are learnable parameters.
The feedforward neural network performs nonlinear transformation and mapping on the initial embedded tensors Z_1~Z_4 to generate the final output, increasing the expressive power and generalization ability of the model and thereby improving performance. The encoder encodes the multi-scale fused feature maps with the self-attention module and the feedforward neural network to generate the multi-scale embedded tensors.
The encoder of the embodiment of the disclosure upsamples the small-scale feature map through the upsampling layer, superposes the upsampled feature maps with the multi-scale feature maps through the superposition layer to realize fusion of the multi-scale feature maps, and then computes the multi-scale embedded tensors from the fused multi-scale feature maps with the self-attention module so as to capture long-range dependencies between different positions; this allows object instances of different scales and shapes in the image to be handled better.
The multi-scale feature decoder 300 is configured to decode the multi-scale embedded tensor according to a learnable query volume to obtain a multi-scale mask tensor and a multi-scale key corner tensor.
The multi-scale feature decoder may include at least 1 DetrTransformer decoding layer with at least 1 attention head, each decoding layer including at least 1 self-attention module, 1 cross-attention module and 1 feedforward neural network. The self-attention module of the decoder decodes the multi-scale embedded tensor according to the learnable query: the self-attention module and the feedforward neural network perform self-attention calculation and nonlinear transformation on each multi-scale embedded tensor Z'_n to obtain the corresponding first output Z_sn, specifically:
Z_sn = FFN(Attention(Z'_n, Z'_n, Z'_n)), n = 1, 2, 3, 4
In this self-attention calculation of the embodiment of the disclosure, the query Q'_n, key K'_n and value V'_n are all the embedded tensor Z'_n, correspondingly generating Z_sn.
The cross-attention module and the feedforward neural network of the multi-scale feature decoder perform cross-attention calculation and nonlinear transformation on the first output Z_sn to obtain the corresponding second output Z_cn, specifically:
Z_cn = FFN(Attention(Q_sn, Z_sn, Z_sn)), n = 1, 2, 3, 4
where the query Q_sn is a learnable query of shape [100, b, 256], and the key K_sn and value V_sn are the output Z_sn of the self-attention calculation on the multi-scale embedded tensor Z'_n.
The cross-attention module of the multi-scale feature decoder performs a dot product operation between the second output Z_cn and the largest-scale fused feature map C_4 to obtain the multi-scale mask tensor V_mn and the multi-scale key corner tensor V_pn, specifically:
V_mn, V_pn = torch.mul(Z_cn, C_4), n = 1, 2, 3, 4
where torch.mul() represents a dot product operation.
The multi-scale feature decoder of the disclosed embodiments extracts local features by constraining cross-attention (cross-attention module) within the foreground region of the prediction mask for each query volume, resulting in multi-scale mask tensors and multi-scale key-point tensors.
The multi-scale feature decoder of the embodiment of the disclosure decodes from the multi-scale embedded tensor according to the learned query quantity to obtain a multi-scale mask tensor and a multi-scale key corner tensor; the self-attention module and the cross-attention module of the multi-scale feature decoder are used for interacting and integrating features with different scales and capturing global context information. The query volume is a learnable embedded tensor, which can inject the information of the target category into the self-attention module, so that the method can obtain faster convergence speed and better performance.
The contour decoder 400 is configured to re-decode the multi-scale embedded tensor by using the multi-scale mask tensor as a query quantity to obtain a multi-scale contour tensor.
The contour decoder may include at least 1 DetrTransformer decoding layer with at least 1 attention head, each decoding layer including at least 1 self-attention module, 1 cross-attention module and 1 feedforward neural network. The contour decoder re-decodes the multi-scale embedded tensor with the multi-scale mask tensor as the query. Specifically, the cross-attention module of the contour decoder takes each multi-scale mask tensor V_mn as the query, performs cross-attention calculation on the multi-scale embedded tensors Z'_1~Z'_4, feeds the cross-attention output to the feedforward neural network for nonlinear transformation, and performs a dot product operation between the nonlinear transformation result and the largest-scale fused feature map C_4 to obtain the multi-scale contour tensor V_rn, specifically:
V_rn = torch.mul(FFN(Attention(V_mn, Z'_n, Z'_n)), C_4), n = 1, 2, 3, 4
where the key and value of the cross-attention calculation are the corresponding multi-scale embedded tensor.
The contour decoder takes the multi-scale mask tensor as the query quantity, improves the quality of the query quantity, increases the perception of the model on the global characteristics, and improves the decoding capability.
The contour decoder of the embodiment of the disclosure re-decodes the multi-scale embedded tensor with the multi-scale mask tensor as the query to obtain the multi-scale contour tensor; the cross-attention module takes the multi-scale mask as the query, which improves the quality of the query, increases the model's perception of global features, and improves the decoding capability. The feedforward neural network performs nonlinear transformation and mapping on the cross-attention results to generate the final outputs, helping the model learn task-specific representations and thereby improving performance.
The fusion decoder 500 is configured to splice the multi-scale mask tensor, the multi-scale key corner tensor, and the multi-scale contour tensor into a fusion query volume, and encode the fusion tensor by using an encoder network structure to obtain a final image segmentation result.
The fusion decoder may be a self-attention encoder comprising at least 1 basic Transformer layer and adopting a self-attention encoding mechanism; the basic Transformer layer has at least 1 attention head and includes at least 1 self-attention module and 1 feedforward neural network. The self-attention module of the fusion decoder splices the multi-scale mask tensor V_mn, the key corner tensor V_pn and the contour tensor V_rn into the multi-scale fusion query B_n, where the splicing formula is:
B_n = Concat(V_mn, V_pn, V_rn), n = 1, 2, 3, 4
where Concat denotes tensor concatenation.
The self-attention module of the fusion decoder is used for establishing a query-key-value relation based on the fusion query quantity, capturing multi-scale and multi-type information in the image and obtaining a final image instance segmentation result.
The self-attention module and the feedforward neural network of the fusion decoder perform self-attention calculation and nonlinear transformation on the multi-scale fusion query B_n, establishing association and interaction between global and local features, and then perform a dot product operation with the largest-scale fused feature map to obtain the segmentation results M_n at different scales; accumulating the M_n yields the final image instance segmentation result, specifically:
M_n = torch.mul(FFN(Attention(B_n, Z'_n, Z'_n)), C_4), n = 1, 2, 3, 4
Result = torch.add(M_n), n = 1, 2, 3, 4
where torch.add () represents the accumulation calculation.
The fusion decoder of the embodiment of the disclosure splices the multi-scale mask tensor, the key corner tensor and the outline tensor into the fusion query volume, and the self-attention module is used for establishing association and interaction between the global features and the local features so as to help the model to better understand semantic association and correlation of input data. And the fusion decoder encodes the fusion tensor by utilizing the encoder network structure to obtain a final image instance segmentation result.
The embodiment of the disclosure relates to an image instance segmentation device based on multi-scale feature fusion decoding, which comprises a feature extraction network, an encoder, a multi-scale feature decoder, a contour decoder and a fusion decoder; the feature extraction network is used for acquiring a multi-scale feature map from an input image; the encoder carries out cross-layer superposition and encoding on the multi-scale feature map to generate a multi-scale embedded tensor; the multi-scale feature decoder decodes the multi-scale embedded tensor through the learned query quantity to obtain a multi-scale mask tensor and a multi-scale key corner tensor; and the contour decoder re-decodes the multi-scale embedded tensor by taking the multi-scale mask tensor as a query quantity to obtain the multi-scale contour tensor. And the fusion decoder is used for splicing the multi-scale mask tensor, the key corner tensor and the outline tensor into a fusion tensor, and the fusion tensor is encoded by the fusion decoder by utilizing the encoder network structure to obtain a final image instance segmentation result.
Another aspect of the disclosed embodiments provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the image segmentation method as described above when executing the program.
Another aspect of the disclosed embodiments provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image segmentation method as described above.
The computer-readable storage medium may be any tangible medium that can contain or store a program, and may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples include, but are not limited to: a connection having one or more wires, a portable computer diskette, a hard disk, an optical fiber, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. The computer-readable medium may also include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code; specific examples include, but are not limited to, electromagnetic signals, optical signals, or any suitable combination thereof.
The foregoing is merely a preferred implementation of the embodiments of the disclosure, and it should be noted that, for a person skilled in the art, several improvements and modifications may be made without departing from the principles of the embodiments of the disclosure, which should also be considered as protective scope of the embodiments of the disclosure.

Claims (6)

1. An image segmentation method based on multi-scale feature fusion decoding, the method comprising:
acquiring a multi-scale feature map of an image to be segmented;
performing up-sampling on the minimum-scale feature map in the multi-scale feature map for multiple times to obtain a multi-scale up-sampling feature map, fusing the multi-scale feature map with the up-sampling feature map of the corresponding scale to obtain a multi-scale fused feature map, and sequentially encoding the multi-scale fused feature map to generate a multi-scale embedded tensor;
the encoding of the multi-scale fusion feature map in turn generates a multi-scale embedded tensor, comprising: respectively carrying out self-attention calculation on the multi-scale fusion feature images to obtain corresponding initial embedded tensors;
respectively carrying out two linear transformations on the multi-scale initial embedding tensor, and carrying out nonlinear ReLU activation in the middle of the two linear transformations to generate a final multi-scale embedding tensor;
decoding the multi-scale embedded tensor according to the learnable query quantity to obtain a multi-scale mask tensor and a multi-scale key corner tensor, wherein the method specifically comprises the following steps of: performing self-attention computation and nonlinear transformation on the multi-scale embedded tensors respectively to obtain corresponding first output, wherein the query quantity, key and value of the self-attention computation are all corresponding multi-scale embedded tensors;
performing cross attention calculation and nonlinear transformation on the first output to obtain a corresponding second output; the query quantity in the cross attention calculation is a parameter quantity which can be learned, and the key and the value are the first output for carrying out self attention calculation corresponding to the multi-scale embedded tensor;
respectively carrying out dot product operation on the second output and the fusion feature map with the largest scale to obtain a multi-scale mask tensor and a multi-scale key corner tensor;
re-decoding the multi-scale embedded tensor by taking the multi-scale mask tensor as a query quantity to obtain a multi-scale contour tensor, wherein the method specifically comprises the following steps of: respectively taking the multi-scale mask tensor as query quantity, wherein keys and values correspond to the multi-scale embedded tensor, respectively carrying out cross attention calculation on the multi-scale embedded tensor, carrying out nonlinear transformation on cross attention calculation output, and carrying out dot product operation on a nonlinear transformation result and a fused feature map with the largest scale to obtain a multi-scale contour tensor;
splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale contour tensor into a multi-scale fusion query volume, and encoding the multi-scale fusion query volume to obtain a final image segmentation result, wherein the method specifically comprises the following steps of:
respectively splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale outline tensor to obtain a multi-scale fusion query quantity;
and performing self-attention calculation and nonlinear transformation on the multi-scale fusion query volume, performing dot product operation with the fusion feature map with the largest scale to obtain segmentation results with different scales, and accumulating the segmentation results with different scales to obtain a final image instance segmentation result.
2. The method of claim 1, wherein the acquiring the multi-scale feature map of the image to be segmented comprises:
acquiring an original image to be segmented;
and carrying out convolution calculation and downsampling on the original image in sequence by adopting a maximum pooling method to obtain a multi-scale feature map.
3. The method according to claim 1 or 2, wherein the upsampling the minimum-scale feature map of the multi-scale feature map multiple times to obtain a multi-scale upsampled feature map, and fusing the multi-scale feature map with the upsampled feature map of the corresponding scale to obtain a multi-scale fused feature map, includes:
continuously upsampling the minimum scale feature map of the multi-scale feature map for multiple times to obtain multi-scale upsampled feature maps, the number of which is the same as that of the multi-scale feature maps;
and respectively superposing the multi-scale feature map and the up-sampling feature map with corresponding scales, and performing convolution smoothing on the superposed multi-scale feature map to obtain a multi-scale fusion feature map.
4. An image segmentation apparatus based on multi-scale feature fusion decoding, the apparatus comprising:
the feature extraction network is used for acquiring images and extracting multi-scale feature images;
the coder is used for carrying out up-sampling on the multi-scale feature map for a plurality of times to obtain a corresponding up-sampling feature map, fusing the multi-scale feature map with the up-sampling feature map with a corresponding scale, and sequentially coding the fused multi-scale feature map to generate a multi-scale embedded tensor; the encoding of the multi-scale fusion feature map in turn generates a multi-scale embedded tensor, comprising: respectively carrying out self-attention calculation on the multi-scale fusion feature images to obtain corresponding initial embedded tensors;
respectively carrying out two linear transformations on the multi-scale initial embedding tensor, and carrying out nonlinear ReLU activation in the middle of the two linear transformations to generate a final multi-scale embedding tensor;
the multi-scale feature decoder is configured to decode the multi-scale embedded tensor according to a learnable query quantity to obtain a multi-scale mask tensor and a multi-scale key corner tensor, and specifically includes: performing self-attention computation and nonlinear transformation on the multi-scale embedded tensors respectively to obtain corresponding first output, wherein the query quantity, key and value of the self-attention computation are all corresponding multi-scale embedded tensors;
performing cross attention calculation and nonlinear transformation on the first output to obtain a corresponding second output; the query quantity in the cross attention calculation is a parameter quantity which can be learned, and the key and the value are the first output for carrying out self attention calculation corresponding to the multi-scale embedded tensor;
respectively carrying out dot product operation on the second output and the fusion feature map with the largest scale to obtain a multi-scale mask tensor and a multi-scale key corner tensor;
the contour decoder is configured to re-decode the multi-scale embedded tensor by using the multi-scale mask tensor as a query quantity to obtain a multi-scale contour tensor, and specifically includes: respectively taking the multi-scale mask tensor as query quantity, wherein keys and values correspond to the multi-scale embedded tensor, respectively carrying out cross attention calculation on the multi-scale embedded tensor, carrying out nonlinear transformation on cross attention calculation output, and carrying out dot product operation on a nonlinear transformation result and a fused feature map with the largest scale to obtain a multi-scale contour tensor;
the fusion decoder is used for splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale outline tensor into multi-scale fusion query quantity, and encoding the multi-scale fusion tensor to obtain a final image segmentation result, and specifically comprises the following steps:
respectively splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale outline tensor to obtain a multi-scale fusion query quantity;
and performing self-attention calculation and nonlinear transformation on the multi-scale fusion query volume, performing dot product operation with the fusion feature map with the largest scale to obtain segmentation results with different scales, and accumulating the segmentation results with different scales to obtain a final image instance segmentation result.
5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 3 when the program is executed by the processor.
6. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements a method according to any one of claims 1-3.
CN202311529949.6A 2023-11-16 2023-11-16 Image segmentation method and device based on multi-scale feature fusion decoding Active CN117314938B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311529949.6A CN117314938B (en) 2023-11-16 2023-11-16 Image segmentation method and device based on multi-scale feature fusion decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311529949.6A CN117314938B (en) 2023-11-16 2023-11-16 Image segmentation method and device based on multi-scale feature fusion decoding

Publications (2)

Publication Number Publication Date
CN117314938A CN117314938A (en) 2023-12-29
CN117314938B true CN117314938B (en) 2024-04-05

Family

ID=89237565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311529949.6A Active CN117314938B (en) 2023-11-16 2023-11-16 Image segmentation method and device based on multi-scale feature fusion decoding

Country Status (1)

Country Link
CN (1) CN117314938B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020216227A1 (en) * 2019-04-24 2020-10-29 华为技术有限公司 Image classification method and apparatus, and data processing method and apparatus
CN115761222A (en) * 2022-09-27 2023-03-07 阿里巴巴(中国)有限公司 Image segmentation method, remote sensing image segmentation method and device
CN116091942A (en) * 2023-02-16 2023-05-09 中国科学院半导体研究所 Feature enhancement and fusion small target detection method, device and equipment
CN116597263A (en) * 2023-05-12 2023-08-15 深圳亿嘉和科技研发有限公司 Training method and related device for image synthesis model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Double-branch U-Net for multi-scale organ segmentation; Liu Yuhao et al.; Methods; 2022-12-31; pp. 1-8 *
Prostate image segmentation based on dense connections and Inception modules; Xu Yaoyao et al.; Electronic Measurement Technology; 2022-12-31; pp. 1-9 *

Also Published As

Publication number Publication date
CN117314938A (en) 2023-12-29

Similar Documents

Publication Publication Date Title
CN111444881A (en) Fake face video detection method and device
CN110929736B (en) Multi-feature cascading RGB-D significance target detection method
CN111984772B (en) Medical image question-answering method and system based on deep learning
CN112560831B (en) Pedestrian attribute identification method based on multi-scale space correction
CN110705566B (en) Multi-mode fusion significance detection method based on spatial pyramid pool
CN114418030B (en) Image classification method, training method and device for image classification model
CN111881731A (en) Behavior recognition method, system, device and medium based on human skeleton
CN114529982A (en) Lightweight human body posture estimation method and system based on stream attention
CN114140831B (en) Human body posture estimation method and device, electronic equipment and storage medium
CN113807361A (en) Neural network, target detection method, neural network training method and related products
CN111325766A (en) Three-dimensional edge detection method and device, storage medium and computer equipment
CN114283352A (en) Video semantic segmentation device, training method and video semantic segmentation method
CN114756763A (en) False news detection method and device for social network
CN117078930A (en) Medical image segmentation method based on boundary sensing and attention mechanism
CN115797731A (en) Target detection model training method, target detection model detection method, terminal device and storage medium
CN113743521B (en) Target detection method based on multi-scale context awareness
CN114913342A (en) Motion blurred image line segment detection method and system fusing event and image
CN114677536A (en) Pre-training method and device based on Transformer structure
CN117314938B (en) Image segmentation method and device based on multi-scale feature fusion decoding
CN111539435A (en) Semantic segmentation model construction method, image segmentation equipment and storage medium
CN116229584A (en) Text segmentation recognition method, system, equipment and medium in artificial intelligence field
CN116152334A (en) Image processing method and related equipment
CN113780241A (en) Acceleration method and device for detecting salient object
CN112966569B (en) Image processing method and device, computer equipment and storage medium
Wyzykowski et al. A Universal Latent Fingerprint Enhancer Using Transformers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant