CN113554657A - Super-pixel segmentation method and system based on attention mechanism and convolutional neural network - Google Patents


Info

Publication number
CN113554657A
CN113554657A
Authority
CN
China
Prior art keywords: superpixel, pixel, segmentation, image
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110943140.2A
Other languages
Chinese (zh)
Inventor
王晶晶
栾振业
于子舒
任金雯
张立人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Aowande Information Technology Co ltd
Original Assignee
Shandong Aowande Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Aowande Information Technology Co ltd
Priority to CN202110943140.2A
Publication of CN113554657A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

The present disclosure provides a superpixel segmentation method and system based on an attention mechanism and a convolutional neural network. First, an SE-Net attention module is introduced into a convolutional neural network to construct a superpixel segmentation model. Second, the resulting squeeze-and-excitation network is trained end to end. The superpixels for a particular task are then learned with a flexible loss function. Finally, superpixel segmentation of an image can be performed by the trained network, which greatly reduces dimensionality and eliminates some abnormal pixels. A better segmentation result can be obtained by this superpixel segmentation algorithm based on an attention mechanism and a convolutional neural network, providing a method with clear advantages for the field of image superpixel segmentation.

Description

Super-pixel segmentation method and system based on attention mechanism and convolutional neural network
Technical Field
The disclosure belongs to the technical field of image processing, and particularly relates to a superpixel segmentation method and system based on an attention mechanism and a convolutional neural network.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Superpixel segmentation is an important pre-processing step in computer image processing. In computer vision, a superpixel is a larger element, formed from pixels with similar characteristics, that is more representative than the individual pixels. This new element becomes the basic unit for other image processing algorithms. It not only greatly reduces dimensionality but also eliminates some abnormal pixels. Extensive experimental analysis shows that superpixel segmentation methods based on deep learning outperform existing superpixel algorithms on traditional segmentation benchmarks and can also learn superpixels for other tasks. Furthermore, a deep learning network can be easily integrated into a downstream deep network, thereby improving performance. Currently, owing to their representativeness and computational efficiency, superpixels are widely used in computer vision algorithms such as object detection, semantic segmentation, saliency estimation, optical flow estimation, depth estimation, and tracking.
The inventor finds that the existing super pixel segmentation method has the following defects:
(1) Most gradient-based superpixel methods start from an initial clustering of pixels and iteratively update the clusters by gradient changes until certain criteria are met to form the superpixels.
(2) SLIC (Simple Linear Iterative Clustering) is a superpixel segmentation algorithm based on K-means clustering that can control the number and compactness of superpixel blocks. However, this method only considers the color and coordinate relationship between each pixel and the seed points, and does not consider the relationship between pixels and boundaries, so its fit to image boundaries is poor.
(3) DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a superpixel segmentation algorithm based on density clustering. Because the DBSCAN clustering algorithm can find clusters of arbitrary shape, it has good segmentation potential for objects with complex, irregular shapes; however, it does not consider the spatial relationship between pixels and seed points, so the resulting superpixels are irregular in shape.
Disclosure of Invention
To solve the above problems, the present disclosure provides a superpixel segmentation method and system based on an attention mechanism and a convolutional neural network.
According to a first aspect of the embodiments of the present disclosure, there is provided a superpixel segmentation method based on an attention mechanism and a convolutional neural network, including:
acquiring an image to be subjected to superpixel segmentation;
inputting the image into a pre-trained super-pixel segmentation model to obtain a predicted super-pixel association diagram, and determining an image super-pixel segmentation result based on the super-pixel association diagram;
the superpixel segmentation model adopts an encoder-decoder design, the encoder comprises a plurality of convolution layers, and the image passes through the convolution layers at different levels of the encoder to generate feature maps of different scales; the decoder comprises a plurality of deconvolution layers, and the feature maps generated by the convolution layers at different levels of the encoder are passed to the deconvolution layers at the corresponding levels of the decoder through skip connections; meanwhile, an attention module is arranged before the input of each deconvolution layer in the decoder.
Further, the attention module adopts an SE-Net attention module which comprises a squeezing operation and an excitation operation, wherein the squeezing operation generates a channel descriptor by gathering the feature maps in a spatial dimension; the excitation operation uses a gating mechanism, and takes the embedding of the global distribution generated by the channel descriptor as an input to obtain a set of modulation weights of each channel.
Further, the determining a super-pixel segmentation result of the image based on the super-pixel correlation map specifically includes: obtaining a predicted superpixel association graph through the superpixel segmentation model, wherein the superpixel association graph determines the probability of each pixel being allocated to different grid units on the basis of a soft association graph instead of actual hard pixel allocation; the superpixel segmentation result is obtained by assigning each pixel to a grid cell having the highest probability.
Further, the loss function adopted in the super-pixel segmentation model training process comprises two parts, wherein the first part is used for combining pixels with similar attributes; the second part is used to enforce constraints on the superpixel to remain compact in space.
According to a second aspect of the embodiments of the present disclosure, there is provided a superpixel segmentation system based on an attention mechanism and a convolutional neural network, including:
a data acquisition unit for acquiring an image to be subjected to superpixel segmentation;
a superpixel segmentation unit, which is used for inputting the image into a pre-trained superpixel segmentation model, obtaining a predicted superpixel association diagram, and determining an image superpixel segmentation result based on the superpixel association diagram;
the superpixel segmentation model adopts an encoder-decoder design, the encoder comprises a plurality of convolution layers, and the image passes through the convolution layers at different levels of the encoder to generate feature maps of different scales; the decoder comprises a plurality of deconvolution layers, and the feature maps generated by the convolution layers at different levels of the encoder are passed to the deconvolution layers at the corresponding levels of the decoder through skip connections; meanwhile, an attention module is arranged before the input of each deconvolution layer in the decoder.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the superpixel segmentation method based on an attention mechanism and a convolutional neural network when executing the program.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for superpixel segmentation based on an attention mechanism and a convolutional neural network.
Compared with the prior art, the beneficial effect of this disclosure is:
(1) the scheme of the disclosure provides a superpixel segmentation method based on an attention mechanism and a convolutional neural network, and the scheme is characterized in that a superpixel segmentation model is constructed by introducing an SE-Net attention module into the convolutional neural network, the scheme utilizes extrusion and excitation networks generated by SE-Net to carry out end-to-end training, and utilizes a flexible loss function to learn superpixels of a specific task; and finally, the superpixel segmentation of the image can be carried out through the trained network, the dimensionality is greatly reduced, and some abnormal pixels are eliminated.
(2) Compared with a traditional FCN, the disclosed scheme adds an attention module to the convolutional neural network, which better models the dependencies between channels and adaptively adjusts the feature response value of each channel. Adding an attention module to the network introduces only a small computational overhead but can greatly improve network performance.
(3) The scheme of the disclosure can effectively improve the efficiency and accuracy of superpixel segmentation by finding the association scores between image pixels and regular grid cells, directly predicting these scores with the squeeze-and-excitation network, and obtaining the final superpixel segmentation result by assigning each pixel to the regular grid cell with the highest probability.
Advantages of additional aspects of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
FIG. 1 is a schematic diagram illustrating an overall network structure of a superpixel segmentation method based on an attention mechanism and a convolutional neural network according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a super-pixel squeeze and excitation network structure according to a first embodiment of the disclosure;
FIG. 3 is a schematic diagram illustrating a configuration of a squeeze and fire module according to a first embodiment of the present disclosure;
FIG. 4 is a diagram illustrating the result of superpixel segmentation on the BSDS500 dataset according to the first embodiment of the disclosure;
fig. 5 is a diagram illustrating a super-pixel segmentation result on the NYUv2 data set according to the first embodiment of the present disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
The first embodiment is as follows:
the present embodiment is directed to a superpixel segmentation method based on an attention mechanism and a convolutional neural network.
A superpixel segmentation method based on an attention mechanism and a convolutional neural network comprises the following steps:
acquiring an image to be subjected to superpixel segmentation;
inputting the image into a pre-trained super-pixel segmentation model to obtain a predicted super-pixel association diagram, and determining an image super-pixel segmentation result based on the super-pixel association diagram;
the superpixel segmentation model adopts an encoder-decoder design, the encoder comprises a plurality of convolution layers, and the image passes through the convolution layers at different levels of the encoder to generate feature maps of different scales; the decoder comprises a plurality of deconvolution layers, and the feature maps generated by the convolution layers at different levels of the encoder are passed to the deconvolution layers at the corresponding levels of the decoder through skip connections; meanwhile, an attention module is arranged before the input of each deconvolution layer in the decoder.
Further, the attention module adopts an SE-Net (Squeeze-and-Excitation Networks) attention module which comprises a squeezing operation and an Excitation operation, wherein the squeezing operation generates a channel descriptor by gathering the feature maps in a spatial dimension; the excitation operation uses a gating mechanism, and takes the embedding of the global distribution generated by the channel descriptor as an input to obtain a set of modulation weights of each channel.
Further, the determining a super-pixel segmentation result of the image based on the super-pixel correlation map specifically includes: obtaining a predicted superpixel association graph through the superpixel segmentation model, wherein the superpixel association graph determines the probability of each pixel being allocated to different grid units on the basis of a soft association graph instead of actual hard pixel allocation; the superpixel segmentation result is obtained by assigning each pixel to a grid cell having the highest probability.
Further, the loss function adopted in the super-pixel segmentation model training process comprises two parts, wherein the first part is used for combining pixels with similar attributes; the second part is used to enforce constraints on the superpixel to remain compact in space.
Further, the predetermined loss function is specifically expressed as follows:
L(Q) = \sum_{p} \left( \mathrm{dist}(f(p), f'(p)) + \frac{m}{s} \lVert p - p' \rVert_2 \right)
wherein p is a certain pixel point in the image, p 'is a certain pixel point in the reconstructed image, f (p) is the characteristic of the pixel point, f' (p) is the characteristic of the pixel point in the reconstructed image, dist () represents the difference between the reconstructed characteristic and the original characteristic and the difference between the reconstructed position and the original position, m is the weight for balancing the two items, and s is the sampling interval of the superpixel.
Specifically, for ease of understanding, the embodiments of the present disclosure are described in detail below with reference to the accompanying drawings:
as shown in fig. 1, the scheme of the present disclosure employs an encoder-decoder design to predict a superpixel association graph Q through a superpixel segmentation model with a skip connection and an attention module. The encoder takes a color image as input and generates a high-level feature map through a convolutional network. The decoder then gradually upsamples the feature map by deconvolution. An attention module is added before each deconvolution to pay more attention to the feature weight of each super pixel, so that the segmentation accuracy is improved and the final prediction is carried out.
For any given transformation that maps an input to a feature map, such as a convolution operation, the scheme described in this disclosure places a corresponding attention module to perform feature recalibration before each convolution operation. This embodiment adopts an SE-Net attention module comprising a squeeze operation and an excitation operation. The input features of the convolution operation first undergo the squeeze operation, which aggregates the feature maps over the spatial dimensions to generate a channel descriptor; the role of this descriptor is to produce an embedding of the global distribution of channel feature responses, so that information from the network's global receptive field can be used by all its layers. The squeeze operation is followed by an excitation operation in the form of a simple gating mechanism that takes the embedding produced by the squeeze operation as input and produces a set of per-channel modulation weights. These weights are applied to the feature map to produce the output of the attention module, which can be fed directly into subsequent layers of the network. We can then write the output as U = [u_1, u_2, \ldots, u_C], where:
u_c = v_c * X = \sum_{s=1}^{C'} v_c^{s} * x^{s}

where * denotes convolution, v_c = [v_c^1, v_c^2, \ldots, v_c^{C'}], X = [x^1, x^2, \ldots, x^{C'}], and u_c \in R^{H \times W}. Each v_c^{s} is a two-dimensional spatial kernel that acts on the corresponding channel of X. Here X is the input, v_c is the c-th convolution kernel, v_c^1 is its first channel parameter, v_c^2 its second, and v_c^{C'} its C'-th; C is the number of convolution kernels and C' the number of channels of the input feature map; u_1, u_2, and u_c are the outputs of convolution kernels 1, 2, and c; H and W are the height and width of the output feature map.
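The convolution and squeeze-and-excitation steps above can be traced numerically. This is a minimal NumPy illustration with random inputs and random, untrained gating weights (the tiny tensor sizes and the shapes of the two gating layers are assumptions for illustration, not the patent's configuration):

```python
import numpy as np

def conv2d_single(x, k):
    """Valid 2D cross-correlation of one channel x with one 2D kernel k."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

rng = np.random.default_rng(1)
C_in, C_out, H, W, K = 2, 3, 6, 6, 3
X = rng.standard_normal((C_in, H, W))            # input with C' = 2 channels
V = rng.standard_normal((C_out, C_in, K, K))     # C = 3 kernels, each with C' channels

# u_c = v_c * X: sum of per-channel 2D convolutions over the input channels.
U = np.stack([sum(conv2d_single(X[s], V[c, s]) for s in range(C_in))
              for c in range(C_out)])

# Squeeze: aggregate each u_c over the spatial dimensions into a channel descriptor z.
z = U.mean(axis=(1, 2))

# Excitation: a small gating network maps z to per-channel weights in (0, 1).
W1 = rng.standard_normal((C_out, C_out)) * 0.1
W2 = rng.standard_normal((C_out, C_out)) * 0.1
weights = 1.0 / (1.0 + np.exp(-(W2 @ np.maximum(W1 @ z, 0.0))))

# Recalibration: rescale each channel of U by its learned weight.
U_recal = U * weights[:, None, None]
```

Because the excitation ends in a sigmoid, every modulation weight lies strictly between 0 and 1, so channels are attenuated rather than inverted.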
Further, the disclosed scheme uses a soft association map Q \in R^{H \times W \times |N_p|} (i.e., a superpixel association map that represents the probability that each pixel belongs to each candidate superpixel) instead of a hard pixel assignment G. For example:
\sum_{s \in N_p} q_s(p) = 1
where s is the center of a superpixel, N_p is the set of superpixels around pixel p, and q_s(p) denotes the probability that pixel p is assigned to superpixel s \in N_p. Finally, the superpixel segmentation is obtained by assigning each pixel to the grid cell with the highest probability: s^* = \arg\max_s q_s(p). Unlike methods based on a fully convolutional neural network, the present method adds an attention mechanism to the original network structure, which effectively improves segmentation accuracy.
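A minimal sketch of converting a soft association map Q into a hard segmentation via s^* = argmax_s q_s(p). A random softmax output stands in for the network's prediction, and the image and neighbourhood sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
H, W, Np = 4, 4, 9   # 9 candidate grid cells per pixel, e.g. a 3x3 neighbourhood

# Softmax over the candidate superpixels yields the soft association map Q.
logits = rng.standard_normal((H, W, Np))
Q = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Hard segmentation: each pixel goes to its highest-probability grid cell.
labels = Q.argmax(axis=-1)
```

Each pixel's row of Q sums to one, so argmax picks a valid candidate index in [0, Np).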
Further, the scheme of the present disclosure offers good flexibility in the choice of loss function. In general, we denote by f(p) the pixel property that we wish the superpixel to retain; in this embodiment, f(p) comprises a 3-dimensional CIELab color vector and an N-dimensional one-hot semantic label vector, where N is the number of classes. We further use the image coordinates p = [x, y]^T to indicate the location of a pixel. Given the predicted superpixel association map Q, we can compute the center of any superpixel, with u_s as its attribute vector and l_s its position vector, as follows:
u_s = \frac{\sum_p f(p)\, q_s(p)}{\sum_p q_s(p)} \quad (2)

l_s = \frac{\sum_p p\, q_s(p)}{\sum_p q_s(p)} \quad (3)
where N_p is the set of superpixels surrounding pixel p, and q_s(p) is the network-predicted probability that p is associated with superpixel s. In equations (2) and (3), each sum runs over all pixels that may be assigned to s. The reconstructed property and position of any pixel p are then given by:
f'(p) = \sum_{s \in N_p} u_s\, q_s(p), \qquad p' = \sum_{s \in N_p} l_s\, q_s(p)
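The centre and reconstruction computations described above can be sketched as follows. For simplicity this toy example associates every pixel with every superpixel rather than only a local neighbourhood N_p, and uses random features and associations in place of real network outputs:

```python
import numpy as np

rng = np.random.default_rng(3)
H, W, S = 4, 4, 4   # 16 pixels, 4 superpixels (global association for simplicity)

f = rng.standard_normal((H * W, 3))  # per-pixel attributes f(p), e.g. CIELab colour
pos = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing="ij"),
               axis=-1).reshape(-1, 2).astype(float)  # pixel coordinates p = [x, y]^T

# Soft associations q_s(p): softmax rows over the S superpixels.
logits = rng.standard_normal((H * W, S))
Q = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Eq. (2)/(3): probability-weighted averages give each superpixel's centre.
u_s = (Q.T @ f) / Q.sum(axis=0)[:, None]    # attribute vector per superpixel
l_s = (Q.T @ pos) / Q.sum(axis=0)[:, None]  # position vector per superpixel

# Reconstruction: f'(p) = sum_s u_s q_s(p),  p' = sum_s l_s q_s(p).
f_rec = Q @ u_s
p_rec = Q @ l_s
```

The reconstruction is a soft mixture of the centres, so f_rec and p_rec have the same shapes as the original per-pixel attributes and coordinates.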
finally, our general expression of the loss function has two terms. The first term encourages trained models to combine pixels with similar attributes, and the second term enforces that superpixels are spatially compact.
L(Q) = \sum_p \left( \mathrm{dist}(f(p), f'(p)) + \frac{m}{s} \lVert p - p' \rVert_2 \right)
The superpixel segmentation model is trained based on the determined loss function, and superpixel segmentation of an image is realized with the trained model.
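A toy computation of the two-term loss under stated assumptions: random features and associations in place of network outputs, squared Euclidean distance as one possible choice of dist(.,.), and made-up values for the balance weight m and sampling interval s:

```python
import numpy as np

rng = np.random.default_rng(4)
H, W, S = 4, 4, 4
m, s_interval = 0.1, 2.0  # assumed balance weight m and superpixel sampling interval s

f = rng.standard_normal((H * W, 3))
pos = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing="ij"),
               axis=-1).reshape(-1, 2).astype(float)
logits = rng.standard_normal((H * W, S))
Q = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Superpixel centres and per-pixel reconstructions, as in eqs. (2)-(4).
centers_f = (Q.T @ f) / Q.sum(axis=0)[:, None]
centers_p = (Q.T @ pos) / Q.sum(axis=0)[:, None]
f_rec, p_rec = Q @ centers_f, Q @ centers_p

# First term pulls pixels with similar attributes together;
# second term keeps superpixels spatially compact.
dist_term = np.square(f - f_rec).sum(axis=1)      # dist(f(p), f'(p))
compact_term = np.linalg.norm(pos - p_rec, axis=1)  # ||p - p'||_2
loss = (dist_term + (m / s_interval) * compact_term).sum()
```

Both terms are non-negative, so the loss is bounded below by zero; minimizing it over the association logits is what the end-to-end training performs.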
Further, to verify the advantages of the present disclosure for image superpixel segmentation, the scheme described in this embodiment was subjected to extensive superpixel segmentation experiments on the BSDS500 and NYUv2 datasets. The model is first trained on the training set, then validated on the validation set, and finally evaluated on the test set. The experimental results are shown in fig. 4 and fig. 5, respectively. As can be seen from fig. 4 and fig. 5, the attention-mechanism-based convolutional neural network superpixel segmentation method established in the present disclosure achieves a good effect on image superpixel segmentation, which not only greatly reduces image dimensionality but also eliminates some abnormal pixels. The superpixel segmentation method based on the attention mechanism is effective, improves the computational efficiency of downstream tasks such as object detection, semantic segmentation, saliency estimation, optical flow estimation, depth estimation, and tracking, and has practical value.
The second embodiment is as follows:
the present embodiment is directed to a superpixel segmentation system based on attention mechanism and convolutional neural network.
A superpixel segmentation system based on an attention mechanism and a convolutional neural network, comprising:
a data acquisition unit for acquiring an image to be subjected to superpixel segmentation;
a superpixel segmentation unit, which is used for inputting the image into a pre-trained superpixel segmentation model, obtaining a predicted superpixel association diagram, and determining an image superpixel segmentation result based on the superpixel association diagram;
the superpixel segmentation model adopts an encoder-decoder design, the encoder comprises a plurality of convolution layers, and the image passes through the convolution layers at different levels of the encoder to generate feature maps of different scales; the decoder comprises a plurality of deconvolution layers, and the feature maps generated by the convolution layers at different levels of the encoder are passed to the deconvolution layers at the corresponding levels of the decoder through skip connections; meanwhile, an attention module is arranged before the input of each deconvolution layer in the decoder.
Further, the attention module adopts an SE-Net attention module which comprises a squeezing operation and an excitation operation, wherein the squeezing operation generates a channel descriptor by gathering the feature maps in a spatial dimension; the excitation operation uses a gating mechanism, and takes the embedding of the global distribution generated by the channel descriptor as an input to obtain a set of modulation weights of each channel.
Further, the determining a super-pixel segmentation result of the image based on the super-pixel correlation map specifically includes: obtaining a predicted superpixel association graph through the superpixel segmentation model, wherein the superpixel association graph determines the probability of each pixel being allocated to different grid units on the basis of a soft association graph instead of actual hard pixel allocation; the superpixel segmentation result is obtained by assigning each pixel to a grid cell having the highest probability.

In further embodiments, there is also provided:
an electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor; when executed by the processor, the computer instructions perform the method of the first embodiment. For brevity, no further description is provided herein.
It should be understood that in this embodiment, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of embodiment one.
The method of the first embodiment may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, details are not described here.
Those of ordinary skill in the art will appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or as combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The superpixel segmentation method and system based on the attention mechanism and the convolutional neural network described above are practical and have broad application prospects.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. A superpixel segmentation method based on an attention mechanism and a convolutional neural network is characterized by comprising the following steps:
acquiring an image to be subjected to superpixel segmentation;
inputting the image into a pre-trained super-pixel segmentation model to obtain a predicted super-pixel association diagram, and determining an image super-pixel segmentation result based on the super-pixel association diagram;
the superpixel segmentation model adopts an encoder-decoder design, the encoder comprises a plurality of convolution layers, and the image passes through the convolution layers at different levels of the encoder to generate feature maps of different scales; the decoder comprises a plurality of deconvolution layers, and the feature maps generated by the convolution layers at different levels of the encoder are passed to the deconvolution layers at the corresponding levels of the decoder through skip connections; meanwhile, an attention module is arranged before the input of each deconvolution layer in the decoder.
2. The method of claim 1, wherein the attention module employs an SE-Net attention module comprising a squeeze operation and an excitation operation, the squeeze operation generating a channel descriptor by clustering feature maps in spatial dimensions; the excitation operation uses a gating mechanism, and takes the embedding of the global distribution generated by the channel descriptor as an input to obtain a set of modulation weights of each channel.
3. The method for superpixel segmentation based on an attention mechanism and a convolutional neural network as claimed in claim 1, wherein said determining an image superpixel segmentation result based on said superpixel association map specifically comprises: obtaining a predicted superpixel association graph through the superpixel segmentation model, wherein the superpixel association graph determines the probability of each pixel being allocated to different grid units on the basis of a soft association graph instead of actual hard pixel allocation; the superpixel segmentation result is obtained by assigning each pixel to a grid cell having the highest probability.
4. The superpixel segmentation method based on an attention mechanism and a convolutional neural network as claimed in claim 1, wherein the loss function used in training the superpixel segmentation model comprises two parts: the first part encourages pixels with similar properties to be grouped together; the second part enforces a constraint that keeps superpixels spatially compact.
5. The method of claim 4, wherein the loss function is expressed as follows:

$$L = \sum_{p}\Big[\,\operatorname{dist}\big(f(p),\,f'(p)\big) + \frac{m}{s}\,\operatorname{dist}\big(p,\,p'\big)\Big]$$

wherein p is a pixel in the image, p' is the corresponding pixel position in the reconstructed image, f(p) is the feature of pixel p, f'(p) is the reconstructed feature, dist(·,·) measures the difference between the reconstructed and original features and between the reconstructed and original positions, m is a weight balancing the two terms, and s is the superpixel sampling interval.
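The two-part loss of claim 5 can be sketched numerically. This is a hedged sketch under assumptions: dist() is taken as the L2 distance for both the feature and position terms (one common choice; the patent text does not fix it), and the reconstructions `f_rec`/`pos_rec` are toy perturbations standing in for what the association map would produce.

```python
import numpy as np

rng = np.random.default_rng(2)

def superpixel_loss(f, f_rec, pos, pos_rec, m=0.1, s=4):
    """Two-part superpixel loss.

    First term: feature reconstruction error, pulling pixels with similar
    properties into the same superpixel. Second term: spatial
    reconstruction error scaled by m/s, keeping superpixels compact.
    """
    feat_term = np.linalg.norm(f - f_rec, axis=0).sum()   # sum over pixels
    pos_term = np.linalg.norm(pos - pos_rec, axis=0).sum()
    return feat_term + (m / s) * pos_term

D, H, W = 3, 8, 8
f = rng.standard_normal((D, H, W))                 # per-pixel features
f_rec = f + 0.01 * rng.standard_normal((D, H, W))  # toy reconstructed features
ys, xs = np.mgrid[0:H, 0:W]
pos = np.stack([ys, xs]).astype(float)             # per-pixel positions
pos_rec = pos + 0.5                                # toy reconstructed positions
loss = superpixel_loss(f, f_rec, pos, pos_rec)
print(loss > 0.0)                                  # True
```

A larger m tightens the compactness constraint relative to feature similarity, while a larger sampling interval s relaxes it, consistent with the balancing role the claim assigns to m and s.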
6. A superpixel segmentation system based on an attention mechanism and a convolutional neural network, comprising:
a data acquisition unit for acquiring an image to be subjected to superpixel segmentation;
a superpixel segmentation unit for inputting the image into a pre-trained superpixel segmentation model to obtain a predicted superpixel association map, and determining an image superpixel segmentation result based on the superpixel association map;
the superpixel segmentation model adopts an encoder-decoder design: the encoder comprises a plurality of convolutional layers, and the image passes through convolutional layers at different levels of the encoder to generate feature maps of different scales; the decoder comprises a plurality of deconvolution layers, and the feature maps generated by the convolutional layers at different levels of the encoder are passed through skip connections to the deconvolution layers at the corresponding levels of the decoder; in addition, an attention module is placed before the input of each deconvolution layer in the decoder.
7. The system of claim 6, wherein the attention module is an SE-Net attention module comprising a squeeze operation and an excitation operation; the squeeze operation generates a channel descriptor by aggregating the feature maps across their spatial dimensions; the excitation operation uses a gating mechanism that takes the embedding of the global distribution produced by the channel descriptor as input and outputs a set of per-channel modulation weights.
8. The superpixel segmentation system based on an attention mechanism and a convolutional neural network as claimed in claim 6, wherein determining the image superpixel segmentation result based on the superpixel association map specifically comprises: obtaining the predicted superpixel association map from the superpixel segmentation model, wherein the association map is a soft assignment rather than an actual hard pixel assignment, giving the probability of each pixel being allocated to each of the different grid cells; the superpixel segmentation result is obtained by assigning each pixel to the grid cell with the highest probability.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the superpixel segmentation method based on an attention mechanism and a convolutional neural network as claimed in any one of claims 1-5.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the superpixel segmentation method based on an attention mechanism and a convolutional neural network as claimed in any one of claims 1-5.
CN202110943140.2A 2021-08-17 2021-08-17 Super-pixel segmentation method and system based on attention mechanism and convolutional neural network Pending CN113554657A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110943140.2A CN113554657A (en) 2021-08-17 2021-08-17 Super-pixel segmentation method and system based on attention mechanism and convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110943140.2A CN113554657A (en) 2021-08-17 2021-08-17 Super-pixel segmentation method and system based on attention mechanism and convolutional neural network

Publications (1)

Publication Number Publication Date
CN113554657A true CN113554657A (en) 2021-10-26

Family

ID=78133926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110943140.2A Pending CN113554657A (en) 2021-08-17 2021-08-17 Super-pixel segmentation method and system based on attention mechanism and convolutional neural network

Country Status (1)

Country Link
CN (1) CN113554657A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114037715A (en) * 2021-11-09 2022-02-11 北京字节跳动网络技术有限公司 Image segmentation method, device, equipment and storage medium
CN115886839A (en) * 2022-12-19 2023-04-04 广州华见智能科技有限公司 Diagnosis system based on brain wave analysis
CN117349545A (en) * 2023-12-04 2024-01-05 中国电子科技集团公司第五十四研究所 Target space-time distribution prediction method based on environment constraint grid


Similar Documents

Publication Publication Date Title
WO2020177651A1 (en) Image segmentation method and image processing device
CN113554657A (en) Super-pixel segmentation method and system based on attention mechanism and convolutional neural network
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
CN112529146B (en) Neural network model training method and device
Kong et al. Pixel-wise attentional gating for scene parsing
Kong et al. Pixel-wise attentional gating for parsimonious pixel labeling
CN111046939A (en) CNN (CNN) class activation graph generation method based on attention
CN112232355B (en) Image segmentation network processing method, image segmentation device and computer equipment
JP2023523029A (en) Image recognition model generation method, apparatus, computer equipment and storage medium
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
WO2023116632A1 (en) Video instance segmentation method and apparatus based on spatio-temporal memory information
US10733498B1 (en) Parametric mathematical function approximation in integrated circuits
CN113179421B (en) Video cover selection method and device, computer equipment and storage medium
CN109523546A (en) A kind of method and device of Lung neoplasm analysis
US20210350230A1 (en) Data dividing method and processor for convolution operation
Saltori et al. Gipso: Geometrically informed propagation for online adaptation in 3d lidar segmentation
WO2020062299A1 (en) Neural network processor, data processing method and related device
CN114022359A (en) Image super-resolution model training method and device, storage medium and equipment
CN114067389A (en) Facial expression classification method and electronic equipment
CN115457492A (en) Target detection method and device, computer equipment and storage medium
Wang et al. Superpixel segmentation with squeeze-and-excitation networks
CN113158970B (en) Action identification method and system based on fast and slow dual-flow graph convolutional neural network
CN109711315A (en) A kind of method and device of Lung neoplasm analysis
CN112598663B (en) Grain pest detection method and device based on visual saliency
CN115082840A (en) Action video classification method and device based on data combination and channel correlation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination