CN117314938B - Image segmentation method and device based on multi-scale feature fusion decoding - Google Patents
- Publication number: CN117314938B (application CN202311529949.6A)
- Authority: CN (China)
- Prior art keywords: scale, tensor, feature map, fusion, embedded
- Legal status: Active (assumed; not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
An image segmentation method and device based on multi-scale feature fusion decoding are provided in an embodiment of the disclosure. The method comprises: acquiring a multi-scale feature map of an original image; upsampling the multi-scale feature map to obtain upsampled feature maps, fusing the multi-scale feature map with the upsampled feature maps to obtain multi-scale fused feature maps, and sequentially encoding the multi-scale fused feature maps to generate multi-scale embedded tensors; decoding the multi-scale embedded tensors to obtain multi-scale mask tensors and multi-scale key corner tensors; re-decoding the multi-scale embedded tensors to obtain multi-scale contour tensors; and splicing the multi-scale mask tensors, multi-scale key corner tensors and multi-scale contour tensors into a multi-scale fusion query, which is encoded to obtain the final image segmentation result. The method analyzes masks and contours of global features together with key corner points of local features, performs multi-scale feature fusion decoding, and improves image instance segmentation accuracy.
Description
Technical Field
The embodiment of the disclosure relates to the technical field of computer vision, in particular to an image segmentation method, an image segmentation device, computer equipment and a computer readable storage medium based on multi-scale feature fusion decoding.
Background
Image instance segmentation is an important task in the field of computer vision, aimed at separating and labeling the different object instances in an image. The technology has broad application prospects in fields such as autonomous driving, medical image processing and video surveillance. Conventional image instance segmentation methods typically use manually designed features and classifiers, which are of limited effectiveness on complex instance segmentation problems. In recent years, the development of deep learning techniques has driven rapid progress in the field of image instance segmentation. Deep learning models such as convolutional neural networks (CNNs) can extract high-level features from images, making the instance segmentation task more accurate and robust. However, the instance segmentation task remains challenging because object instances in an image vary in size, shape and complexity. In existing deep learning models, feature extraction at a single scale cannot capture all the details and features of an object instance, and their segmentation accuracy needs further improvement.
Disclosure of Invention
An object of an embodiment of the present disclosure is to provide an image segmentation method, apparatus, computer device and computer readable storage medium based on multi-scale feature fusion decoding, so as to solve the foregoing problems in the prior art.
In order to achieve the above objective, the technical solution adopted in the embodiments of the present disclosure is as follows:
an aspect of an embodiment of the present disclosure provides an image segmentation method based on multi-scale feature fusion decoding, the method including:
acquiring a multi-scale feature map of an image to be segmented;
performing up-sampling on the minimum-scale feature map in the multi-scale feature map for multiple times to obtain a multi-scale up-sampling feature map, fusing the multi-scale feature map with the up-sampling feature map of the corresponding scale to obtain a multi-scale fused feature map, and sequentially encoding the multi-scale fused feature map to generate a multi-scale embedded tensor;
decoding the multi-scale embedded tensor according to the learnable query quantity to obtain a multi-scale mask tensor and a multi-scale key corner tensor;
re-decoding the multi-scale embedded tensor by taking the multi-scale mask tensor as a query quantity to obtain a multi-scale contour tensor;
and splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale contour tensor into a multi-scale fusion query volume, and encoding the multi-scale fusion query volume to obtain a final image segmentation result.
Illustratively, the acquiring the multi-scale feature map of the image to be segmented includes:
acquiring an original image to be segmented;
and performing convolution calculation on the original image in sequence and downsampling with a maximum pooling method to obtain the multi-scale feature map.
Illustratively, the performing up-sampling on the minimum-scale feature map of the multi-scale feature map multiple times to obtain a multi-scale up-sampling feature map, and fusing the multi-scale feature map with the up-sampling feature map of the corresponding scale to obtain a multi-scale fused feature map, includes:
continuously upsampling the minimum scale feature map of the multi-scale feature map for multiple times to obtain multi-scale upsampled feature maps, the number of which is the same as that of the multi-scale feature maps;
and respectively superposing the multi-scale feature map and the up-sampling feature map with corresponding scales, and performing convolution smoothing on the superposed multi-scale feature map to obtain a multi-scale fusion feature map.
Illustratively, the encoding the multi-scale fusion feature map sequentially generates a multi-scale embedded tensor, including:
respectively carrying out self-attention calculation on the multi-scale fusion feature images to obtain corresponding initial embedded tensors;
and respectively carrying out two linear transformations on the multi-scale initial embedded tensor, and carrying out nonlinear ReLU activation in the middle of the two linear transformations to generate a final multi-scale embedded tensor.
Illustratively, decoding the multi-scale embedded tensor according to the learnable query volume to obtain a multi-scale mask tensor and a multi-scale key corner tensor, including:
performing self-attention computation and nonlinear transformation on the multiscale embedded tensor respectively to obtain corresponding first output, wherein the query quantity, key and value of the self-attention computation are all corresponding embedded tensors;
performing cross attention calculation and nonlinear transformation on the first output to obtain a corresponding second output; the query quantity in the cross attention calculation is a parameter quantity which can be learned, and the key and the value are the first output for carrying out self attention calculation corresponding to the multi-scale embedded tensor;
and respectively carrying out dot product operation on the second output and the fusion feature map with the largest scale to obtain a multi-scale mask tensor and a multi-scale key corner tensor.
Illustratively, the re-decoding the multi-scale embedded tensor by using the multi-scale mask tensor as a query quantity to obtain a multi-scale contour tensor includes:
taking each multi-scale mask tensor as a query, with the keys and values being the corresponding multi-scale embedded tensors; performing cross-attention calculation on the multi-scale embedded tensors; performing nonlinear transformation on the cross-attention output; and performing a dot product operation between the nonlinear transformation result and the largest-scale fused feature map to obtain the multi-scale contour tensor.
Illustratively, the stitching the multi-scale mask tensor, the multi-scale key corner tensor, and the multi-scale contour tensor into a multi-scale fusion query volume, and encoding the multi-scale fusion query volume to obtain a final image segmentation result, includes:
respectively splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale outline tensor to obtain a multi-scale fusion query quantity;
and performing self-attention calculation and nonlinear transformation on the multi-scale fusion query volume, performing dot product operation with the fusion feature map with the largest scale to obtain segmentation results with different scales, and accumulating the segmentation results with different scales to obtain a final image instance segmentation result.
Another aspect of the disclosed embodiments provides an image segmentation apparatus based on multi-scale feature fusion decoding, the apparatus comprising:
the feature extraction network is used for acquiring images and extracting a multi-scale feature map;
the encoder is used for carrying out up-sampling on the multi-scale feature map for multiple times to obtain a corresponding up-sampling feature map, fusing the multi-scale feature map with the up-sampling feature map of a corresponding scale, and sequentially encoding the fused multi-scale feature map to generate a multi-scale embedded tensor;
the multi-scale feature decoder is used for decoding the multi-scale embedded tensor according to the learnable query quantity to obtain a multi-scale mask tensor and a multi-scale key corner tensor;
the contour decoder is used for re-decoding the multi-scale embedded tensor by taking the multi-scale mask tensor as a query quantity to obtain a multi-scale contour tensor;
the fusion decoder is used for splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale contour tensor into a multi-scale fusion query, and encoding the multi-scale fusion query to obtain a final image segmentation result.
Another aspect of the disclosed embodiments provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as described above when executing the program.
Another aspect of the disclosed embodiments provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements a method as described above.
The beneficial effects of the embodiment of the disclosure are that:
according to the image instance segmentation method based on multi-scale feature fusion decoding, global features such as masks, outlines and key corner points of local features are analyzed, multi-scale feature fusion decoding is carried out, and image instance segmentation accuracy is improved. The method disclosed by the invention is simple and convenient to operate and good in segmentation effect.
Drawings
FIG. 1 is a schematic flow diagram of an image segmentation method based on multi-scale feature fusion decoding according to an embodiment of the disclosure;
FIG. 2 is a schematic structural diagram of an image segmentation apparatus based on multi-scale feature fusion decoding according to an embodiment of the disclosure;
fig. 3 is a workflow diagram of an image segmentation apparatus based on multi-scale feature fusion decoding in accordance with an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the embodiments of the present disclosure will be further described in detail below with reference to the accompanying drawings. It should be understood that the detailed description is merely illustrative of the disclosed embodiments and is not intended to limit the disclosed embodiments.
As shown in fig. 1, an embodiment of the present disclosure proposes an image segmentation method based on multi-scale feature fusion decoding, where the method includes:
and S1, acquiring a multi-scale feature map of the image to be segmented.
As an example, the acquiring a multi-scale feature map of an image to be segmented includes:
step S11, obtaining an original image to be segmented.
Step S12, performing convolution calculation on the original image in sequence and downsampling with a maximum pooling method to obtain the multi-scale feature map, according to the formula:

D_{n+1} = f_downsample(f_Conv(D_n)), n = 0, 1, 2, 3

where f_downsample() denotes the downsampling operation, f_Conv() denotes the convolution operation, D_0 is the original image, and D_1 ~ D_4 are the feature maps of progressively decreasing size obtained from four consecutive downsampling steps.
The maximum pooling method in step S12 aggregates information from local regions of the input feature map by downsampling them, which reduces the number of parameters, lowers computational complexity, cuts the computational cost of training and inference, reduces the risk of overfitting, and retains the key information of the input feature map so that important features can be better identified and learned.
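The conv-then-pool pipeline of steps S11 and S12 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the convolution f_Conv is stubbed out as the identity, the channel dimension is dropped, and `max_pool_2x2` and `build_pyramid` are hypothetical helper names (a real implementation would use e.g. torch.nn.Conv2d and torch.nn.MaxPool2d).

```python
import numpy as np

def max_pool_2x2(x):
    # f_downsample: 2x2 max pooling halves each spatial dimension,
    # keeping the strongest response of every local region
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def build_pyramid(d0, levels=4):
    # D_{n+1} = f_downsample(f_Conv(D_n)); f_Conv is omitted here
    maps, x = [], d0
    for _ in range(levels):
        x = max_pool_2x2(x)
        maps.append(x)
    return maps  # D_1 .. D_levels, each half the size of the previous
```

For instance, an 8×8 input yields 4×4, 2×2 and 1×1 maps after three levels.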
Step S2, upsampling the minimum-scale feature map of the multi-scale feature map multiple times to obtain a multi-scale upsampled feature map, fusing the multi-scale feature map with the upsampled feature map of the corresponding scale to obtain a multi-scale fused feature map, and sequentially encoding the multi-scale fused feature map to generate a multi-scale embedded tensor.
As an example, the performing up-sampling on the minimum-scale feature map of the multi-scale feature map multiple times to obtain a multi-scale up-sampling feature map, and fusing the multi-scale feature map with the up-sampling feature map of the corresponding scale to obtain a multi-scale fused feature map, includes:

Step 21, continuously upsampling the minimum-scale feature map multiple times to obtain multi-scale upsampled feature maps equal in number to the multi-scale feature maps:

U_{n+1} = f_upsample(U_n), n = 0, 1, 2, 3

where f_upsample() denotes the upsampling operation, U_0 is the minimum-scale feature map D_4, and U_1 ~ U_4 are the upsampled feature maps of progressively increasing size obtained from four consecutive upsampling steps;

Step 22, superposing each multi-scale feature map with the upsampled feature map of the corresponding scale, and performing convolution smoothing on the superposed maps to obtain the multi-scale fused feature maps:

C_n = f_Conv3×3(U_n + D_{5-n}), n = 1, 2, 3, 4

where f_Conv3×3() denotes a 3×3 convolution and C_1 ~ C_4 are the fused feature maps obtained by superposing and smoothing the multi-scale feature maps with the upsampled feature maps of the corresponding scale.
In the embodiment of the disclosure, the multi-scale feature maps and the corresponding upsampled feature maps are superposed through upsampling operations with lateral connections. To avoid the insufficient fusion that direct element-wise addition of the two feature maps may cause, the superposed feature map is smoothed with a 3×3 convolution, yielding a more thoroughly fused feature map.
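The lateral-connection fusion of steps 21 and 22 can be sketched as follows in NumPy. This is a hedged illustration: upsampling is nearest-neighbor, the 3×3 smoothing convolution is omitted, and each upsampled map is paired with the equally sized feature map (an assumption, since the literal index D_{5-n} in the formula does not match the spatial sizes). `upsample_2x` and `fuse` are hypothetical names.

```python
import numpy as np

def upsample_2x(x):
    # f_upsample: nearest-neighbor upsampling, each pixel -> a 2x2 block
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse(d_maps):
    # d_maps: [D_1, D_2, D_3, D_4] with decreasing size; U_0 = D_4,
    # U_{n+1} = upsample(U_n); the 3x3 smoothing conv is omitted
    u, ups = d_maps[-1], []
    for _ in range(len(d_maps) - 1):
        u = upsample_2x(u)
        ups.append(u)
    # superpose each U_n with the same-sized feature map
    return [u_n + d for u_n, d in zip(ups, reversed(d_maps[:-1]))]
```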
As an example, the sequential encoding of the multi-scale fused feature maps to generate the multi-scale embedded tensors includes:

Step 23, performing self-attention calculation on each multi-scale fused feature map C_n (n = 1, 2, 3, 4) to obtain the corresponding initial embedded tensor Z_1 ~ Z_4, using the standard scaled dot-product attention:

Attention(Q_n, K_n, V_n) = softmax(Q_n K_n^T / √d_k) V_n

where Attention() is the attention function and d_k is the dimension of the query Q_n, key K_n and value V_n, which are all tensors of the n-th fused feature map C_n. The attention function maps the query Q and a set of key-value (K-V) pairs to an output, yielding a score for each pixel position of the fused feature map C_n, i.e. the corresponding initial embedded tensor Z_n, so as to capture long-range dependencies between different positions of the image.
Step 24, performing two linear transformations on each multi-scale initial embedded tensor Z_n, with a nonlinear ReLU activation between them, to generate the final embedded tensor Z'_n:

FFN(Z_n) = max(0, Z_n W_1 + b_1) W_2 + b_2, n = 1, 2, 3, 4

where FFN() denotes the feed-forward computation (a ReLU activation between two linear transformations) and W_1, b_1, W_2, b_2 are learnable parameters.

In the embodiment of the disclosure, the initial embedded tensors Z_1 ~ Z_4 pass through two linear transformations with a nonlinear ReLU activation in between, which nonlinearly transforms and maps them into the final embedded tensors Z'_1 ~ Z'_4, increasing the expressive and generalization capability of the model and thereby improving performance.
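The two encoder building blocks, scaled dot-product self-attention and the FFN, follow directly from the formulas above. This plain NumPy sketch handles a single head and no batching; it illustrates the standard operations, not the patent's exact implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d_k)) @ v

def ffn(z, w1, b1, w2, b2):
    # FFN(Z) = max(0, Z W_1 + b_1) W_2 + b_2:
    # two linear maps with a ReLU activation in between
    return np.maximum(0.0, z @ w1 + b1) @ w2 + b2
```

Because each softmax row sums to 1, attending over a constant value matrix returns that constant, which makes for a quick sanity check.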
Step S3, decoding the multi-scale embedded tensor according to the learnable query to obtain a multi-scale mask tensor and a multi-scale key corner tensor.
Step S3 of an embodiment of the present disclosure is to interact and integrate features of different scales to capture global context information.
As an example, the decoding of the multi-scale embedded tensors according to the learnable query to obtain the multi-scale mask tensor and multi-scale key corner tensor includes:

Step S31, performing self-attention calculation and nonlinear transformation on each multi-scale embedded tensor Z'_n to obtain the corresponding first output Z_sn:

Z_sn = FFN(Attention(Z'_n, Z'_n, Z'_n)), n = 1, 2, 3, 4

In step S31 of the presently disclosed embodiments, the query Q'_n, key K'_n and value V'_n of the self-attention computation are all the embedded tensor Z'_n, correspondingly generating Z_s1 ~ Z_s4.

Step S32, performing cross-attention calculation and nonlinear transformation on each first output Z_sn to obtain the corresponding second output Z_c1 ~ Z_c4:

Z_cn = FFN(Attention(Q_sn, Z_sn, Z_sn)), n = 1, 2, 3, 4

where the query Q_sn is a learnable parameter of shape [100, b, 256], and the key K_sn and value V_sn are the self-attention output Z_sn of the corresponding multi-scale embedded tensor Z'_n.

In the embodiment of the disclosure, the query Q_sn has shape [100, b, 256], where b is the number of input images per batch; each 256-dimensional vector represents detected box information, consisting of class information for distinguishing categories and spatial information (box coordinates) describing the position of the object in the image.

Step S33, performing a dot product operation between each second output Z_c1 ~ Z_c4 and the largest-scale fused feature map C_4 to obtain the multi-scale mask tensor V_mn and the multi-scale key corner tensor V_pn:

V_mn, V_pn = torch.mul(Z_cn, C_4), n = 1, 2, 3, 4

where torch.mul() denotes the dot product (element-wise multiplication) operation.
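The wiring of step S32, a learnable query bank attending over the self-attention output Z_sn, might look as follows. The batch dimension b of the [100, b, 256] query shape is dropped for clarity, the random initialization stands in for learned parameters, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cross_attention(q, k, v):
    # query comes from one source; key and value from another
    scores = q @ k.T / np.sqrt(k.shape[-1])
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (e / e.sum(axis=-1, keepdims=True)) @ v

n_queries, d_model, n_tokens = 100, 256, 50
q_s = 0.02 * rng.standard_normal((n_queries, d_model))  # learnable query Q_sn
z_s = rng.standard_normal((n_tokens, d_model))          # first output Z_sn
z_c = cross_attention(q_s, z_s, z_s)                    # second output Z_cn
```

Each of the 100 query vectors produces one 256-dimensional output row regardless of how many tokens Z_sn contains.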
Step S4, re-decoding the multi-scale embedded tensor with the multi-scale mask tensor as the query to obtain a multi-scale contour tensor.
As an example, the re-decoding of the multi-scale embedded tensors with the multi-scale mask tensors as queries to obtain the multi-scale contour tensors includes:

taking each multi-scale mask tensor V_mn as the query, performing cross-attention calculation on the multi-scale embedded tensors Z'_1 ~ Z'_4, performing nonlinear transformation on the cross-attention output, and performing a dot product operation between the nonlinear transformation result and the largest-scale fused feature map C_4 to obtain the multi-scale contour tensor V_rn:

V_rn = torch.mul(FFN(Attention(V_mn, Z'_n, Z'_n)), C_4), n = 1, 2, 3, 4

where the key and value of the cross-attention computation are the corresponding multi-scale embedded tensor.
In step S4 of the embodiment of the disclosure, the multi-scale mask is used as the query quantity, so that the quality of the query quantity is improved, the perception of global features is increased, and the decoding capability is improved.
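Conditioning contour decoding on the decoded masks (step S4) reduces to a cross-attention whose query is V_mn and whose key and value are the embedded tensor Z'_n. The FFN stage and the dot product with C_4 are omitted in this hedged sketch, and `contour_decode` is a hypothetical name.

```python
import numpy as np

def contour_decode(v_m, z_emb):
    # mask tensor V_mn as the query; embedded tensor Z'_n as key and value
    scores = v_m @ z_emb.T / np.sqrt(z_emb.shape[-1])
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ z_emb
```

With an all-zero query the attention weights are uniform, so the output is the mean of the value rows, which is handy for a sanity check.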
Step S5, splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale contour tensor into a multi-scale fusion query, and encoding the multi-scale fusion query to obtain a final image segmentation result.
As an example, the stitching the multi-scale mask tensor, the multi-scale key corner tensor, and the multi-scale contour tensor into a multi-scale fusion query volume, and encoding the multi-scale fusion query volume to obtain a final image segmentation result, includes:
splicing the multi-scale mask tensor V_mn, multi-scale key corner tensor V_pn and multi-scale contour tensor V_rn to obtain the multi-scale fusion query B_n:

B_n = Concat(V_mn, V_pn, V_rn), n = 1, 2, 3, 4

where Concat() denotes tensor concatenation.
Performing self-attention calculation and nonlinear transformation on each multi-scale fusion query B_n establishes association and interaction between global and local features; a dot product operation with the largest-scale fused feature map then yields the segmentation results M_n at the different scales, which are accumulated into the final image instance segmentation result:

M_n = torch.mul(FFN(Attention(B_n, Z'_n, Z'_n)), C_4), n = 1, 2, 3, 4

result = torch.add(M_n), n = 1, 2, 3, 4

where torch.add() denotes the accumulation over the four scales.
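The concatenation and accumulation of step S5 can be sketched as follows; the attention/FFN stages and the dot product with C_4 are stubbed out, and both helper names are assumptions, not the patent's API.

```python
import numpy as np

def fuse_queries(v_m, v_p, v_r):
    # B_n = Concat(V_mn, V_pn, V_rn): splice along the feature axis
    return np.concatenate([v_m, v_p, v_r], axis=-1)

def accumulate(results):
    # result = sum over the scales of M_n (the torch.add accumulation)
    out = results[0].copy()
    for m in results[1:]:
        out = out + m
    return out
```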
The embodiment of the disclosure relates to an image instance segmentation method based on multi-scale feature fusion decoding: a multi-scale feature map is obtained from the input image; the multi-scale feature maps are superposed across layers and encoded to generate multi-scale embedded tensors; the multi-scale embedded tensors are decoded through learnable queries to obtain multi-scale mask tensors and multi-scale key corner tensors; the multi-scale embedded tensors are re-decoded with the multi-scale mask tensors as queries to obtain multi-scale contour tensors; and the multi-scale mask, key corner and contour tensors are spliced into a fusion tensor, which is encoded to obtain the final image instance segmentation result. By analyzing global features (masks and contours) together with local features (local key corner points) and performing multi-scale feature fusion decoding, the method removes falsely segmented regions, fills in missing mask parts, reduces the jagged effect at segmentation boundaries, and improves image instance segmentation accuracy.
As shown in fig. 2 and 3, another aspect of the embodiments of the present disclosure provides an image segmentation apparatus based on multi-scale feature fusion decoding, the apparatus including: feature extraction network 100, encoder 200, multi-scale feature decoder 300, contour decoder 400, and fusion decoder 500.
The feature extraction network 100 is configured to obtain a multi-scale feature map of an image to be segmented. The feature extraction network may include at least one convolution layer and one pooling layer; the convolution layer receives the original image to be segmented and performs convolution calculation, and the pooling layer then downsamples the result by max pooling to obtain the multi-scale feature map. The specific implementation is:
D_{n+1} = f_downsample(f_Conv(D_n)), n = 0, 1, 2, 3
where f_downsample() denotes the downsampling operation, f_Conv() the convolution operation, D_0 the original image, and D_1–D_4 the feature maps of progressively decreasing size obtained from four consecutive downsampling steps.
Downsampling to realize dimension reduction processing of the feature map; the pooling layer collects the information of the local areas in the input feature map by sampling the local areas, so that the number of parameters is reduced, the complexity of a model is reduced, the calculation cost of training and reasoning is reduced, the risk of overfitting is reduced, key information in the input feature map is reserved, and the segmentation device can better identify and learn important features.
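As a rough illustration of the convolution-plus-max-pooling pyramid described above, here is a minimal NumPy sketch. The 16×16 input, the 2×2 pooling window, and the omission of the learned convolution f_Conv (treated as identity) are all illustrative assumptions, not details from the patent:

```python
import numpy as np

def max_pool_2x2(x):
    """Downsample a (H, W) map by taking the max over non-overlapping 2x2 blocks."""
    h, w = x.shape
    x = x[: h - h % 2, : w - w % 2]               # trim odd rows/columns
    h2, w2 = x.shape
    return x.reshape(h2 // 2, 2, w2 // 2, 2).max(axis=(1, 3))

def build_pyramid(d0, levels=4):
    """D_1..D_4: successive downsamplings of the original image D_0.
    The learned convolution f_Conv is elided (identity) for brevity."""
    maps = [d0]
    for _ in range(levels):
        maps.append(max_pool_2x2(maps[-1]))
    return maps

pyramid = build_pyramid(np.arange(256, dtype=float).reshape(16, 16))
print([m.shape for m in pyramid])   # [(16, 16), (8, 8), (4, 4), (2, 2), (1, 1)]
```

Each level halves both spatial dimensions, matching the "progressively decreasing size" of D_1 through D_4 above.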
The feature extraction network in the embodiment of the disclosure may be a residual network or a Transformer feature extraction network.
The feature extraction network is used as a backbone network for acquiring a multi-scale feature map of the image, and is used for processing the multi-scale problem in image instance segmentation.
The encoder 200 is configured to upsample the minimum-scale feature map of the multi-scale feature map multiple times to obtain a multi-scale upsampled feature map, fuse the multi-scale feature map with the upsampled feature map of the corresponding scale to obtain a multi-scale fused feature map, and sequentially encode the multi-scale fused feature map to generate a multi-scale embedded tensor.
The encoder may be a multi-scale deformable self-attention encoder comprising at least one basic Transformer layer, one upsampling layer, and one superposition layer, and employs a self-attention mechanism. The upsampling layer upsamples the small-scale feature maps of the image, and may use one or more upsampling methods such as nearest-neighbor interpolation, bilinear interpolation, or transposed convolution. The upsampling layer performs several consecutive upsamplings of the minimum-scale feature map to obtain as many upsampled feature maps as there are multi-scale feature maps, as follows:
U_{n+1} = f_upsample(U_n), n = 0, 1, 2, 3
where f_upsample() denotes the upsampling operation, U_0 is the smallest-scale feature map D_4, and U_1–U_4 are the upsampled feature maps of progressively increasing size obtained from four consecutive upsampling steps.
The superposition layer of the encoder superposes and smooths each multi-scale feature map with the upsampled feature map of the corresponding scale to obtain the multi-scale fused feature maps of the image, as follows:
C_n = f_Conv3×3(U_n + D_{5-n}), n = 1, 2, 3, 4
where f_Conv3×3 denotes a 3×3 convolution, and C_1–C_4 are the fused feature maps obtained by superposing and smoothing each multi-scale feature map with its corresponding upsampled feature map.
Each superposition layer is laterally connected to an upsampling layer; the upsampling layers deepen the network, enlarge the receptive field, increase the expressive capacity of the model, and enlarge the feature maps. Meanwhile, to eliminate the insufficient fusion that direct element-wise addition of two feature maps may cause, a 3×3 convolution smooths the fused feature map, yielding a more thoroughly fused feature map.
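The superpose-then-smooth step C_n = f_Conv3×3(U_n + D_{5-n}) can be sketched as element-wise addition followed by a 3×3 "same" convolution. The uniform box kernel below stands in for the learned convolution weights and is purely illustrative:

```python
import numpy as np

def conv3x3_same(x, k):
    """3x3 'same' convolution with zero padding (smoothing a single channel)."""
    p = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (p[i:i + 3, j:j + 3] * k).sum()
    return out

# Fuse the upsampled map with the same-scale feature map, then smooth:
# C_n = f_Conv3x3(U_n + D_{5-n})
u = np.ones((4, 4))                  # stand-in for U_n
d = np.full((4, 4), 2.0)             # stand-in for D_{5-n}
box = np.full((3, 3), 1.0 / 9.0)     # illustrative smoothing kernel; learned in practice
c = conv3x3_same(u + d, box)
print(round(c[1, 1], 6))             # 3.0 (interior value: mean of nine 3.0s)
```

Note how the smoothing blends each fused value with its neighbours, which is exactly what mitigates the rough seams that raw element-wise addition can leave.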
The basic Transformer layer of the encoder comprises at least one self-attention module and one feedforward neural network, with at least one attention head. The self-attention module captures long-range dependencies between different positions of the image: it performs self-attention calculation on each multi-scale fused feature map C_n to obtain the corresponding initial embedded tensor Z_n. The self-attention calculation formula is:
Attention(Q_n, K_n, V_n) = softmax(Q_n K_n^T / √d_k) V_n
where Attention() is the attention function; the query Q_n, key K_n, and value V_n are all tensors of the n-th fused feature map C_n, and d_k is the dimension of Q_n, K_n, and V_n.
The feedforward neural network of the encoder applies two linear transformations to each multi-scale initial embedded tensor Z_n, with a nonlinear ReLU activation between them, to generate the final multi-scale embedded tensor Z'_n:
FFN(Z_n) = max(0, Z_n W_1 + b_1) W_2 + b_2
where FFN() denotes the feedforward network (two linear transformations with a ReLU activation between them), and W_1, b_1, W_2, b_2 are learnable parameters.
The feedforward neural network performs nonlinear transformation and mapping on the initial embedded tensors Z_1–Z_4 to generate the final output, increasing the model's expressive and generalization capacity and thereby its performance; the encoder thus encodes the multi-scale fused feature maps with the self-attention module and the feedforward neural network to generate the multi-scale embedded tensors.
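The encoder's two building blocks — scaled dot-product self-attention and the FFN with a ReLU between two linear maps — can be sketched in NumPy as follows; the token count, embedding width, and random weights are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d_k)) @ v

def ffn(z, w1, b1, w2, b2):
    """FFN(Z) = max(0, Z W1 + b1) W2 + b2: two linear maps with a ReLU between."""
    return np.maximum(0.0, z @ w1 + b1) @ w2 + b2

rng = np.random.default_rng(0)
c_n = rng.standard_normal((5, 8))      # fused feature map C_n as 5 tokens of width 8
z_n = attention(c_n, c_n, c_n)         # self-attention: Q = K = V = C_n
w1, b1 = rng.standard_normal((8, 16)), np.zeros(16)
w2, b2 = rng.standard_normal((16, 8)), np.zeros(8)
z_prime = ffn(z_n, w1, b1, w2, b2)     # final embedded tensor Z'_n
print(z_prime.shape)                   # (5, 8)
```

With a single identical token, attention degenerates to returning the value unchanged, which is a quick sanity check of the softmax weighting.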
The encoder of the embodiment of the disclosure carries out up-sampling on the small-scale feature map through the up-sampling layer, then stacks the up-sampled feature map and the multi-scale feature map by utilizing the stacking layer to realize fusion of the multi-scale feature map, and then calculates a multi-scale embedded tensor by utilizing the self-attention module aiming at the fused multi-scale feature map so as to capture long-distance dependency relations among different positions, so that the method can better process object examples with different scales and shapes in the image.
The multi-scale feature decoder 300 is configured to decode the multi-scale embedded tensor according to a learnable query volume to obtain a multi-scale mask tensor and a multi-scale key corner tensor.
The multi-scale feature decoder may include at least one DetrTransformer decoding layer with at least one attention head; each DetrTransformer decoding layer includes at least one self-attention module, one cross-attention module, and one feedforward neural network. The self-attention module of the decoder decodes the multi-scale embedded tensor according to the learnable query: the self-attention module and the feedforward network perform self-attention calculation and nonlinear transformation on each multi-scale embedded tensor Z'_n to obtain the corresponding first output Z_sn. The specific formula is:
Z_sn = FFN(Attention(Z'_n, Z'_n, Z'_n)), n = 1, 2, 3, 4
In this self-attention calculation, the query Q'_n, key K'_n, and value V'_n are all the embedded tensor Z'_n, correspondingly generating Z_sn.
The cross-attention module and feedforward network of the multi-scale feature decoder perform cross-attention calculation and nonlinear transformation on the first output Z_sn to obtain the corresponding second output Z_cn. The specific formula is:
Z_cn = FFN(Attention(Q_sn, Z_sn, Z_sn)), n = 1, 2, 3, 4
where the query Q_sn is a learnable query of shape [100, b, 256], and the key K_sn and value V_sn are the first output Z_sn obtained from the self-attention calculation on the multi-scale embedded tensor Z'_n.
The cross-attention module of the multi-scale feature decoder performs a dot product between the second output Z_cn and the largest-scale fused feature map C_4 to obtain the multi-scale mask tensor V_mn and the multi-scale key corner tensor V_pn, specifically:
V_mn, V_pn = torch.mul(Z_cn, C_4), n = 1, 2, 3, 4
where torch.mul() denotes the dot product operation.
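The dot product between the decoded query embeddings Z_cn and the largest fused feature map C_4 can be sketched as an einsum producing one score map per query. The shapes (100 queries, 256 channels, a 32×32 spatial grid) echo the [100, b, 256] query shape mentioned above but are otherwise assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
z_cn = rng.standard_normal((100, 256))       # 100 decoded query embeddings
c4 = rng.standard_normal((256, 32, 32))      # largest fused feature map C_4
# Dot each query with every spatial position of C_4: one score map per query.
masks = np.einsum('qc,chw->qhw', z_cn, c4)
print(masks.shape)                           # (100, 32, 32)
```

Thresholding each score map would yield a binary mask per query; the patent leaves that post-processing implicit.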
The multi-scale feature decoder of the disclosed embodiments extracts local features by constraining the cross-attention (cross-attention module) of each query to the foreground region of its predicted mask, yielding the multi-scale mask tensors and multi-scale key corner tensors.
The multi-scale feature decoder of the embodiment of the disclosure decodes the multi-scale embedded tensor according to the learnable query to obtain a multi-scale mask tensor and a multi-scale key corner tensor; its self-attention and cross-attention modules interact and integrate features of different scales and capture global context information. The query is a learnable embedded tensor that injects target-category information into the self-attention module, giving the method faster convergence and better performance.
The contour decoder 400 is configured to re-decode the multi-scale embedded tensor by using the multi-scale mask tensor as a query quantity to obtain a multi-scale contour tensor.
The contour decoder may include at least one DetrTransformer decoding layer with at least one attention head, each comprising at least one self-attention module, one cross-attention module, and one feedforward neural network. The contour decoder re-decodes the multi-scale embedded tensor with the multi-scale mask tensor as the query: its cross-attention module takes each multi-scale mask V_mn as the query, performs cross-attention calculation on the corresponding embedded tensors Z'_1–Z'_4, feeds the cross-attention output to the feedforward network for nonlinear transformation, and takes a dot product of the nonlinear transformation result with the largest-scale fused feature map C_4 to obtain the multi-scale contour tensor V_rn, specifically:
V_rn = torch.mul(FFN(Attention(V_mn, Z'_n, Z'_n)), C_4), n = 1, 2, 3, 4
where the key and value of the cross-attention calculation are the corresponding multi-scale embedded tensor.
The contour decoder takes the multi-scale mask tensor as the query quantity, improves the quality of the query quantity, increases the perception of the model on the global characteristics, and improves the decoding capability.
The contour decoder of the embodiment of the disclosure uses the multi-scale mask tensor as a query quantity to decode the multi-scale embedded tensor again to obtain the multi-scale contour tensor; the cross attention module takes a multi-scale mask as the query quantity, improves the quality of the query quantity, increases the perception of the model on the global features, and improves the decoding capability. The feed-forward neural network is used to perform nonlinear transformation and mapping on the cross-attention computation results to generate final outputs that help model learn task-specific representations, thereby improving performance.
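The contour decoder's key step — cross-attention with the mask tensor as query and the embedded tensor as key/value — can be sketched as follows; all shapes are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V, with query and key/value from different sources."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

rng = np.random.default_rng(3)
v_mn = rng.standard_normal((100, 256))   # multi-scale mask tensor, used as the query
z_n = rng.standard_normal((400, 256))    # embedded tensor Z'_n as 400 tokens: key and value
contour = cross_attention(v_mn, z_n, z_n)
print(contour.shape)                     # (100, 256)
```

Because each mask query attends over all embedded tokens with weights that sum to one, the output stays in the same embedding space as the queries, which is what lets it be re-used against C_4 downstream.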
The fusion decoder 500 is configured to concatenate the multi-scale mask tensor, the multi-scale key corner tensor, and the multi-scale contour tensor into a fusion query, and to encode the fusion query with an encoder network structure to obtain the final image segmentation result.
The fusion decoder may be a self-attention encoder comprising at least one basic Transformer layer with at least one attention head, each layer including at least one self-attention module and one feedforward neural network, and employs a self-attention encoding mechanism. The self-attention module of the fusion decoder concatenates the multi-scale mask tensor V_mn, key corner tensor V_pn, and contour tensor V_rn into the multi-scale fusion query B_n. The concatenation formula is:
B_n = Concat(V_mn, V_pn, V_rn), n = 1, 2, 3, 4
where Concat denotes tensor concatenation.
The self-attention module of the fusion decoder is used for establishing a query-key-value relation based on the fusion query quantity, capturing multi-scale and multi-type information in the image and obtaining a final image instance segmentation result.
The self-attention module and feedforward network of the fusion decoder perform self-attention calculation and nonlinear transformation on the multi-scale fusion query B_n, establishing association and interaction between global and local features, and take a dot product with the largest-scale fused feature map to obtain the segmentation results M_n at each scale; the M_n are accumulated to obtain the final image instance segmentation result, specifically:
M_n = torch.mul(FFN(Attention(B_n, Z'_n, Z'_n)), C_4), n = 1, 2, 3, 4
Result = torch.add(M_n), n = 1, 2, 3, 4
where torch.add() denotes the accumulation calculation.
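The fusion decoder's concatenate-then-accumulate pipeline can be sketched as below. The concatenation axis (stacking the three query sets) and all shapes are assumptions, and the attention/dot-product step is replaced by random stand-in maps M_n:

```python
import numpy as np

rng = np.random.default_rng(2)
# Per-scale mask, key-corner, and contour tensors (100 queries, width 256 each).
v_m = [rng.standard_normal((100, 256)) for _ in range(4)]
v_p = [rng.standard_normal((100, 256)) for _ in range(4)]
v_r = [rng.standard_normal((100, 256)) for _ in range(4)]

# B_n = Concat(V_mn, V_pn, V_rn): here the three query sets are stacked.
b = [np.concatenate([v_m[n], v_p[n], v_r[n]], axis=0) for n in range(4)]
print(b[0].shape)                      # (300, 256)

# Stand-ins for the per-scale maps M_n (after attention + dot product with C_4),
# accumulated into the final result: Result = M_1 + M_2 + M_3 + M_4.
m = [rng.standard_normal((32, 32)) for _ in range(4)]
result = m[0] + m[1] + m[2] + m[3]
print(result.shape)                    # (32, 32)
```

Summing the per-scale maps is what lets a coarse scale fill a region a fine scale missed, which is the stated motivation for the fusion step.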
The fusion decoder of the embodiment of the disclosure splices the multi-scale mask tensor, the key corner tensor and the outline tensor into the fusion query volume, and the self-attention module is used for establishing association and interaction between the global features and the local features so as to help the model to better understand semantic association and correlation of input data. And the fusion decoder encodes the fusion tensor by utilizing the encoder network structure to obtain a final image instance segmentation result.
The embodiment of the disclosure relates to an image instance segmentation device based on multi-scale feature fusion decoding, comprising a feature extraction network, an encoder, a multi-scale feature decoder, a contour decoder, and a fusion decoder. The feature extraction network acquires a multi-scale feature map from the input image; the encoder cross-layer superposes and encodes the multi-scale feature map to generate a multi-scale embedded tensor; the multi-scale feature decoder decodes the multi-scale embedded tensor with a learnable query to obtain a multi-scale mask tensor and a multi-scale key corner tensor; the contour decoder re-decodes the multi-scale embedded tensor with the multi-scale mask tensor as the query to obtain a multi-scale contour tensor; and the fusion decoder concatenates the multi-scale mask, key corner, and contour tensors into a fusion query and encodes it with an encoder network structure to obtain the final image instance segmentation result.
Another aspect of the disclosed embodiments provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the image segmentation method as described above when executing the program.
Another aspect of the disclosed embodiments provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image segmentation method as described above.
A computer-readable storage medium may be any tangible medium that can contain or store a program; it may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, an optical fiber, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. The computer-readable storage medium may also include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code; specific examples include, but are not limited to, electromagnetic signals, optical signals, or any suitable combination thereof.
The foregoing is merely a preferred implementation of the embodiments of the disclosure, and it should be noted that, for a person skilled in the art, several improvements and modifications may be made without departing from the principles of the embodiments of the disclosure, which should also be considered as protective scope of the embodiments of the disclosure.
Claims (6)
1. An image segmentation method based on multi-scale feature fusion decoding, the method comprising:
acquiring a multi-scale feature map of an image to be segmented;
performing up-sampling on the minimum-scale feature map in the multi-scale feature map for multiple times to obtain a multi-scale up-sampling feature map, fusing the multi-scale feature map with the up-sampling feature map of the corresponding scale to obtain a multi-scale fused feature map, and sequentially encoding the multi-scale fused feature map to generate a multi-scale embedded tensor;
the encoding of the multi-scale fusion feature map in turn generates a multi-scale embedded tensor, comprising: respectively carrying out self-attention calculation on the multi-scale fusion feature images to obtain corresponding initial embedded tensors;
respectively carrying out two linear transformations on the multi-scale initial embedding tensor, and carrying out nonlinear ReLU activation in the middle of the two linear transformations to generate a final multi-scale embedding tensor;
decoding the multi-scale embedded tensor according to the learnable query quantity to obtain a multi-scale mask tensor and a multi-scale key corner tensor, wherein the method specifically comprises the following steps of: performing self-attention computation and nonlinear transformation on the multi-scale embedded tensors respectively to obtain corresponding first output, wherein the query quantity, key and value of the self-attention computation are all corresponding multi-scale embedded tensors;
performing cross attention calculation and nonlinear transformation on the first output to obtain a corresponding second output; the query quantity in the cross attention calculation is a parameter quantity which can be learned, and the key and the value are the first output for carrying out self attention calculation corresponding to the multi-scale embedded tensor;
respectively carrying out dot product operation on the second output and the fusion feature map with the largest scale to obtain a multi-scale mask tensor and a multi-scale key corner tensor;
re-decoding the multi-scale embedded tensor by taking the multi-scale mask tensor as a query quantity to obtain a multi-scale contour tensor, wherein the method specifically comprises the following steps of: respectively taking the multi-scale mask tensor as query quantity, wherein keys and values correspond to the multi-scale embedded tensor, respectively carrying out cross attention calculation on the multi-scale embedded tensor, carrying out nonlinear transformation on cross attention calculation output, and carrying out dot product operation on a nonlinear transformation result and a fused feature map with the largest scale to obtain a multi-scale contour tensor;
splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale contour tensor into a multi-scale fusion query volume, and encoding the multi-scale fusion query volume to obtain a final image segmentation result, wherein the method specifically comprises the following steps of:
respectively splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale outline tensor to obtain a multi-scale fusion query quantity;
and performing self-attention calculation and nonlinear transformation on the multi-scale fusion query volume, performing dot product operation with the fusion feature map with the largest scale to obtain segmentation results with different scales, and accumulating the segmentation results with different scales to obtain a final image instance segmentation result.
2. The method of claim 1, wherein the acquiring the multi-scale feature map of the image to be segmented comprises:
acquiring an original image to be segmented;
and carrying out convolution calculation and downsampling on the original image in sequence by adopting a maximum pooling method to obtain a multi-scale feature map.
3. The method according to claim 1 or 2, wherein the upsampling the minimum-scale feature map of the multi-scale feature map multiple times to obtain a multi-scale upsampled feature map, and fusing the multi-scale feature map with the upsampled feature map of the corresponding scale to obtain a multi-scale fused feature map, includes:
continuously upsampling the minimum scale feature map of the multi-scale feature map for multiple times to obtain multi-scale upsampled feature maps, the number of which is the same as that of the multi-scale feature maps;
and respectively superposing the multi-scale feature map and the up-sampling feature map with corresponding scales, and performing convolution smoothing on the superposed multi-scale feature map to obtain a multi-scale fusion feature map.
4. An image segmentation apparatus based on multi-scale feature fusion decoding, the apparatus comprising:
the feature extraction network is used for acquiring images and extracting multi-scale feature images;
the coder is used for carrying out up-sampling on the multi-scale feature map for a plurality of times to obtain a corresponding up-sampling feature map, fusing the multi-scale feature map with the up-sampling feature map with a corresponding scale, and sequentially coding the fused multi-scale feature map to generate a multi-scale embedded tensor; the encoding of the multi-scale fusion feature map in turn generates a multi-scale embedded tensor, comprising: respectively carrying out self-attention calculation on the multi-scale fusion feature images to obtain corresponding initial embedded tensors;
respectively carrying out two linear transformations on the multi-scale initial embedding tensor, and carrying out nonlinear ReLU activation in the middle of the two linear transformations to generate a final multi-scale embedding tensor;
the multi-scale feature decoder is configured to decode the multi-scale embedded tensor according to a learnable query quantity to obtain a multi-scale mask tensor and a multi-scale key corner tensor, and specifically includes: performing self-attention computation and nonlinear transformation on the multi-scale embedded tensors respectively to obtain corresponding first output, wherein the query quantity, key and value of the self-attention computation are all corresponding multi-scale embedded tensors;
performing cross attention calculation and nonlinear transformation on the first output to obtain a corresponding second output; the query quantity in the cross attention calculation is a parameter quantity which can be learned, and the key and the value are the first output for carrying out self attention calculation corresponding to the multi-scale embedded tensor;
respectively carrying out dot product operation on the second output and the fusion feature map with the largest scale to obtain a multi-scale mask tensor and a multi-scale key corner tensor;
the contour decoder is configured to re-decode the multi-scale embedded tensor by using the multi-scale mask tensor as a query quantity to obtain a multi-scale contour tensor, and specifically includes: respectively taking the multi-scale mask tensor as query quantity, wherein keys and values correspond to the multi-scale embedded tensor, respectively carrying out cross attention calculation on the multi-scale embedded tensor, carrying out nonlinear transformation on cross attention calculation output, and carrying out dot product operation on a nonlinear transformation result and a fused feature map with the largest scale to obtain a multi-scale contour tensor;
the fusion decoder is used for splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale outline tensor into multi-scale fusion query quantity, and encoding the multi-scale fusion tensor to obtain a final image segmentation result, and specifically comprises the following steps:
respectively splicing the multi-scale mask tensor, the multi-scale key corner tensor and the multi-scale outline tensor to obtain a multi-scale fusion query quantity;
and performing self-attention calculation and nonlinear transformation on the multi-scale fusion query volume, performing dot product operation with the fusion feature map with the largest scale to obtain segmentation results with different scales, and accumulating the segmentation results with different scales to obtain a final image instance segmentation result.
5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 3 when the program is executed by the processor.
6. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements a method according to any one of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311529949.6A CN117314938B (en) | 2023-11-16 | 2023-11-16 | Image segmentation method and device based on multi-scale feature fusion decoding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117314938A CN117314938A (en) | 2023-12-29 |
CN117314938B true CN117314938B (en) | 2024-04-05 |
Family
ID=89237565
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311529949.6A Active CN117314938B (en) | 2023-11-16 | 2023-11-16 | Image segmentation method and device based on multi-scale feature fusion decoding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117314938B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020216227A1 (en) * | 2019-04-24 | 2020-10-29 | 华为技术有限公司 | Image classification method and apparatus, and data processing method and apparatus |
CN115761222A (en) * | 2022-09-27 | 2023-03-07 | 阿里巴巴(中国)有限公司 | Image segmentation method, remote sensing image segmentation method and device |
CN116091942A (en) * | 2023-02-16 | 2023-05-09 | 中国科学院半导体研究所 | Feature enhancement and fusion small target detection method, device and equipment |
CN116597263A (en) * | 2023-05-12 | 2023-08-15 | 深圳亿嘉和科技研发有限公司 | Training method and related device for image synthesis model |
Non-Patent Citations (2)
Title |
---|
Double-branch U-Net for multi-scale organ segmentation; Liu Yuhao et al.; Methods; 2022; pp. 1-8 *
Prostate image segmentation based on dense connections and Inception modules; Xu Yaoyao et al.; Electronic Measurement Technology (电子测量技术); 2022; pp. 1-9 *
Also Published As
Publication number | Publication date |
---|---|
CN117314938A (en) | 2023-12-29 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |