CN114119993A - Salient object detection method based on self-attention mechanism - Google Patents

Salient object detection method based on self-attention mechanism

Info

Publication number
CN114119993A
CN114119993A (application CN202111278451.8A)
Authority
CN
China
Prior art keywords
features
feature
self
convolution
level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111278451.8A
Other languages
Chinese (zh)
Inventor
陈福康
孙凤铭
袁夏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202111278451.8A
Publication of CN114119993A
Legal status: Pending (Current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a salient object detection method based on a self-attention mechanism, which specifically comprises the following steps: extracting features from an input picture with a convolutional neural network to generate a group of feature maps, including shallow feature maps and deep feature maps that carry semantic information at different scales; fusing the shallow feature maps to generate low-level integrated features, and merging the deep feature maps to form high-level integrated features; constructing a self-attention module based on the self-attention mechanism, inputting the low-level and high-level integrated features into the module, capturing the features within each level, and exchanging semantic information between the two to form dependency relationships; and strengthening the obtained features through a multi-scale feature enhancement module, then sending the fused, enhanced features into a cascade decoder to generate the final salient object detection map. The invention reduces dependence on external information, is better at capturing the internal correlations of data or features, can accurately locate salient objects, and improves detection efficiency.

Description

Salient object detection method based on self-attention mechanism
Technical Field
The invention relates to the technical field of computer machine vision, in particular to a salient object detection method based on a self-attention mechanism.
Background
Computer vision uses computers to realize the human visual functions of perceiving, recognizing, and understanding three-dimensional scenes in the objective world. The human eye has a mechanism that rapidly scans the surrounding environment, filters out secondary information, and locates the main targets in a scene; this is called the human visual attention mechanism. In the field of computer vision, attention mechanisms that model and simulate the human visual system have attracted great academic interest and show broad application prospects. Research has shown that the human visual system pays more attention to certain objects in an image; these objects are referred to as salient objects.
Image saliency detection uses a computer to simulate the human visual attention mechanism and establishes a complete image saliency detection model, so that the regions a human eye would fixate on are detected accurately and quickly and are represented in the form of a saliency map. The purpose of salient object detection is to highlight the most visually distinctive parts of an image, which requires the computer to have a deep understanding of the semantics of the entire image and the detailed structure of objects. Traditionally, saliency detection relied on hand-crafted features, but such features cannot capture high-level semantics, so saliency prediction could not achieve satisfactory results. With the application of convolutional neural networks and the emergence of high-quality datasets, salient object detection based on deep learning has made substantial progress.
The fully convolutional neural network is currently the main approach to salient object detection: by stacking convolution and pooling layers, it gradually enlarges the receptive field and produces high-level semantics, and it plays an important role in salient object detection. To handle the complicated semantic information in images, several methods based on fully convolutional networks have been proposed to enhance detection capability. For example, document 1 (Chen L C, Papandreou G, Kokkinos I, et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848.) enhances detection through multi-scale context fusion; document 2 (Ding H, et al. Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation [C]// IEEE/CVF Conference on Computer Vision & Pattern Recognition. IEEE, 2018.) proposes an encoder-decoder architecture to fuse multi-scale semantic features.
The attention mechanism is inspired by the human visual attention mechanism; the self-attention mechanism is one kind of attention mechanism and is also an important component of the Transformer. Document 3 (A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems (NIPS), 2017, pp. 6000-6010.) describes this mechanism in detail; its characteristic is that dependencies are computed directly, regardless of the distance between features. Self-attention has achieved notable results, for example in document 4 (Tan Z, Wang M, Xie J, et al. Deep Semantic Role Labeling with Self-Attention [J]. 2017.) and document 5 (Verga P, Strubell E, McCallum A. Simultaneously Self-Attending to All Mentions for Full-Abstract Biological Relation Extraction [J]. 2018.).
However, under challenging conditions such as small salient objects, complex semantic information, and low background contrast, existing methods still cannot accurately predict salient objects, owing to the lack of semantic information and weak feature dependencies, so the final salient object detection results are poor.
Disclosure of Invention
The invention aims to provide a salient object detection method based on a self-attention mechanism, which improves the salient object detection effect by fully utilizing image semantic information.
The technical solution for realizing the purpose of the invention is as follows: a salient object detection method based on a self-attention mechanism comprises the following steps:
step 1, performing feature extraction on an input picture by utilizing a convolutional neural network to generate a group of feature maps, wherein the feature maps comprise a shallow feature map and a deep feature map, and each feature map has semantic information with different scales;
step 2, fusing the shallow feature maps to generate low-level integrated features, and merging the deep feature maps to form high-level integrated features;
step 3, constructing a self-attention module based on a self-attention mechanism, inputting low-level integrated features and high-level integrated features into the self-attention module, respectively capturing the features in the high-level features and the low-level features, and exchanging semantic information to form a dependency relationship;
Step 4, strengthening the obtained features through a multi-scale feature enhancement module, and sending the fused and enhanced features into a cascade decoder to generate the final salient object detection map.
Further, the convolutional neural network in step 1 is specifically as follows:
A ResNet-101 convolutional neural network is selected as the picture feature extractor. It comprises 5 convolution groups, each containing a convolution computation process and a downsampling operation; the first convolution group contains only 1 convolution operation, while the 2nd to 5th convolution groups each consist of several identical residual units. The final global pooling layer and fully connected layer are discarded.
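For illustration, the following is a minimal PyTorch sketch of such a truncated ResNet-101 extractor. The use of torchvision, the stage grouping, and the pretrained weights are assumptions made for the sketch, not part of the disclosure; torchvision's layer grouping also differs slightly from the five downsampling groups described above.

```python
# A minimal sketch, assuming torchvision's ResNet-101 as the backbone.
import torch
import torchvision.models as models

class ResNet101Backbone(torch.nn.Module):
    def __init__(self):
        super().__init__()
        net = models.resnet101(weights=models.ResNet101_Weights.IMAGENET1K_V1)
        # Convolution group 1: a single convolution (plus BN/ReLU/pooling).
        self.stage1 = torch.nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        # Convolution groups 2-5: stacks of identical residual units.
        self.stage2, self.stage3 = net.layer1, net.layer2
        self.stage4, self.stage5 = net.layer3, net.layer4
        # net.avgpool and net.fc are intentionally discarded.

    def forward(self, x):
        r1 = self.stage1(x)
        r2 = self.stage2(r1)
        r3 = self.stage3(r2)
        r4 = self.stage4(r3)
        r5 = self.stage5(r4)
        return r1, r2, r3, r4, r5   # multi-scale feature maps R1..R5
```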
Further, the feature integration in step 2 is specifically as follows:
The shallow feature maps are fused into the low-level integrated features through a Concat operation, and the deep feature maps are merged into the high-level integrated features through the same Concat operation. The resulting high-level integrated features provide a large amount of semantic information, while the low-level integrated features contain spatial details that help refine object boundaries; the two contain complementary information.
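A minimal sketch of this integration step follows. The bilinear resizing to a common resolution before the Concat operation and the illustrative split of {R1..R5} into shallow and deep groups are both assumptions, since the text names only the Concat operation.

```python
# A minimal sketch, assuming resizing before concatenation.
import torch
import torch.nn.functional as F

def integrate(feature_maps, size):
    """Resize each map to `size` and concatenate along the channel axis."""
    resized = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
               for f in feature_maps]
    return torch.cat(resized, dim=1)

# Illustrative grouping (an assumption):
# L_feat = integrate([r1, r2], size=r2.shape[2:])      # low-level integrated features
# H_feat = integrate([r3, r4, r5], size=r5.shape[2:])  # high-level integrated features
```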
Further, in step 3, a self-attention module is constructed based on a self-attention mechanism, specifically as follows:
The self-attention module comprises two 1 × 1 convolution layers and six 3 × 3 convolution layers. The two features are reduced in dimensionality through the embedding convolution layers and converted into queries, keys and values, while reshaping and pooling operations are applied; long-distance spatial interaction is then performed through dot-product attention computed between the features at the two levels, the result passes through a 1 × 1 convolution, and finally an element-wise summation with the original features yields the final representation of the long-range context information.
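The following hedged PyTorch sketch illustrates one way to realize this module with exactly two 1 × 1 and six 3 × 3 convolutions. The embedding width, the pooling stride on keys and values, and the channel bookkeeping for the residual sum are assumptions.

```python
# A hedged sketch of the cross self-attention exchange between H and L.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossSelfAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # The two 1x1 embedding layers, one per feature. Keeping the embedding
        # width equal to `channels` is an assumption so the residual sum with
        # the original features type-checks.
        self.embed_h = nn.Conv2d(channels, channels, 1)
        self.embed_l = nn.Conv2d(channels, channels, 1)
        # The six 3x3 layers: query/key/value per feature.
        self.hq = nn.Conv2d(channels, channels, 3, padding=1)
        self.hk = nn.Conv2d(channels, channels, 3, padding=1)
        self.hv = nn.Conv2d(channels, channels, 3, padding=1)
        self.lq = nn.Conv2d(channels, channels, 3, padding=1)
        self.lk = nn.Conv2d(channels, channels, 3, padding=1)
        self.lv = nn.Conv2d(channels, channels, 3, padding=1)

    @staticmethod
    def attend(q, k, v):
        b, c, h, w = q.shape
        # Pool keys/values to shrink the attention matrix (the pooling
        # operation in the text; the stride of 2 is an assumption).
        k = F.avg_pool2d(k, 2).flatten(2)      # B x C x Nk
        v = F.avg_pool2d(v, 2).flatten(2)      # B x C x Nk
        q = q.flatten(2)                       # B x C x Nq
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)  # B x Nq x Nk
        return (v @ attn.transpose(1, 2)).view(b, c, h, w)

    def forward(self, H, L):
        h, l = self.embed_h(H), self.embed_l(L)
        # H attends to L and vice versa, then element-wise residual sums.
        H_out = H + self.attend(self.hq(h), self.lk(l), self.lv(l))
        L_out = L + self.attend(self.lq(l), self.hk(h), self.hv(h))
        return H_out, L_out
```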
Further, the multi-scale feature enhancement module in step 4 is specifically as follows:
The multi-scale feature enhancement module comprises one 1 × 1 convolution layer and a group of 3 × 3 convolution layers whose dilation rates are 2, 4 and 6 respectively; the feature maps reduced by the 3 × 3 convolution layers undergo a Concat operation, and the result is point-wise multiplied with the feature map reduced by the 1 × 1 convolution layer to form the enhanced map;
the multi-scale feature enhancement module extracts spatial information at different scales, enlarges the receptive field with dilated (atrous) convolution, and combines the multi-scale information into a fused, optimized feature output; the two groups of features are then predicted in a cascade decoding manner to form the prediction map.
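A hedged sketch of this module follows; the channel counts are assumptions, while the dilation rates 2, 4 and 6 and the Concat-then-product structure follow the description above.

```python
# A hedged sketch of the multi-scale feature enhancement module.
import torch
import torch.nn as nn

class MultiScaleEnhance(nn.Module):
    def __init__(self, in_ch, mid_ch=64):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, 3 * mid_ch, 1)  # the 1x1 branch
        # Three 3x3 dilated (atrous) convolutions, rates 2, 4, 6.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, mid_ch, 3, padding=r, dilation=r)
            for r in (2, 4, 6)
        ])

    def forward(self, x):
        multi = torch.cat([b(x) for b in self.branches], dim=1)  # Concat
        return self.reduce(x) * multi   # point-wise (Hadamard) product
```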
Compared with the prior art, the invention has the following significant advantages: (1) the self-attention mechanism is used effectively to exchange and fuse semantic information and spatial edge information, and the final representation of long-range context information is obtained through dot-product attention; (2) the multi-scale feature enhancement module applies dilated convolution to the feature maps at different dilation rates, effectively enlarging the receptive field while preserving semantic information and improving the efficiency of salient object detection; (3) the final prediction map is obtained with a cascade decoder, and experiments show that the method achieves better results on 3 evaluation metrics over 5 public datasets than current state-of-the-art salient object detection methods.
Drawings
FIG. 1 is an overall framework diagram of a neural network model based on the self-attention mechanism in the present invention.
FIG. 2 is a block diagram of the self-attention module of the present invention.
FIG. 3 is a diagram of a multi-scale feature enhancement method according to the present invention.
FIG. 4 is a block diagram of a multi-scale feature enhancement module according to the present invention.
Detailed Description
The invention relates to a salient object detection method based on a self-attention mechanism, which comprises the following steps:
step 1, performing feature extraction on an input picture by utilizing a convolutional neural network to generate a group of feature maps, wherein the feature maps comprise a shallow feature map and a deep feature map, and each feature map has semantic information with different scales;
step 2, fusing the shallow feature maps to generate low-level integrated features, and merging the deep feature maps to form high-level integrated features;
step 3, constructing a self-attention module based on a self-attention mechanism, inputting low-level integrated features and high-level integrated features into the self-attention module, respectively capturing the features in the high-level features and the low-level features, and exchanging semantic information to form a dependency relationship;
Step 4, strengthening the obtained features through a multi-scale feature enhancement module, and sending the fused and enhanced features into a cascade decoder to generate the final salient object detection map.
As a specific example, step 1 is as follows: features are extracted from the input picture with a convolutional neural network, generating a group of feature maps R1, R2, R3, R4 and R5, which range from low-level to high-level feature maps carrying semantic information at different scales. The shallow feature maps are then fused to generate the low-level integrated features L, and the deep feature maps are merged to form the high-level integrated features H. The integrated features are input into the self-attention module built on the self-attention mechanism, the obtained features are then strengthened through the multi-scale feature enhancement module, and the prediction map is finally obtained.
The convolutional neural network is specifically as follows:
A ResNet-101 convolutional neural network is selected as the picture feature extractor. It comprises 5 convolution groups, each containing a convolution computation process and a downsampling operation; the first convolution group contains only 1 convolution operation, while the 2nd to 5th convolution groups each consist of several identical residual units. The final global pooling layer and fully connected layer are discarded.
As a specific example, step 2 is as follows: the high-level integrated features H formed in step 1 provide a large amount of semantic information, the low-level integrated features L contain spatial details that help refine object boundaries, and H and L contain complementary information. The self-attention module exploits the fact that the self-attention mechanism reduces dependence on external information and is better at capturing the internal correlations of data or features: the two features are reduced in dimensionality through embedding convolution layers, long-distance spatial interaction is performed through dot-product attention computed between the features at the two levels, and finally the result is summed element-wise with the original features to obtain the final representations H' and L' of the long-range context information.
The method comprises the following steps of feature integration:
The shallow feature maps are fused into the low-level integrated features through a Concat operation, and the deep feature maps are merged into the high-level integrated features through the same Concat operation. The resulting high-level integrated features provide a large amount of semantic information, while the low-level integrated features contain spatial details that help refine object boundaries; the two contain complementary information.
As a specific example, step 3 is as follows: the high-level integrated features provide a large amount of semantic information, and the low-level integrated features contain spatial details that help refine object boundaries; the two contain complementary information. The self-attention module exploits the fact that the self-attention mechanism reduces dependence on external information and is better at capturing the internal correlations of data or features: the two features are reduced in dimensionality through embedding convolution layers, long-distance spatial interaction is performed through dot-product attention computed between the features at the two levels, and finally the result is summed element-wise with the original features to obtain the final representation of the long-range context information.
The self-attention module comprises two 1 × 1 convolution layers and six 3 × 3 convolution layers. The two features are reduced in dimensionality through the embedding convolution layers and converted into queries, keys and values, while reshaping and pooling operations are applied; long-distance spatial interaction is then performed through dot-product attention computed between the features at the two levels, the result passes through a 1 × 1 convolution, and finally an element-wise summation with the original features yields the final representation of the long-range context information.
As a specific example, step 4 is as follows: after the feature extraction of steps 1 and 2 and the dot-product attention computation of step 3, H' and L' are sent to the multi-scale feature enhancement module, which extracts spatial information at different scales, enlarges the receptive field with dilated convolution, and combines the multi-scale information into a fused, optimized feature output; the two groups of features are then predicted in a cascade decoding manner to form the prediction map.
The multi-scale feature enhancement module is specifically as follows:
The multi-scale feature enhancement module comprises one 1 × 1 convolution layer and a group of 3 × 3 convolution layers whose dilation rates are 2, 4 and 6 respectively; the feature maps reduced by the 3 × 3 convolution layers undergo a Concat operation, and the result is point-wise multiplied with the feature map reduced by the 1 × 1 convolution layer to form the enhanced map.
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Examples
As shown in fig. 1 to 4, the salient object detection method based on the self-attention mechanism of the present invention mainly includes the following steps:
step a: and extracting multi-scale convolution characteristics by using a multi-scale characteristic extraction module.
ResNet-101 is selected as the picture feature extractor; its five convolution modules are retained and the final global pooling layer and fully connected layer are discarded to fit our practical needs, yielding multi-scale feature maps R1, R2, R3, R4 and R5, which range from low-level detail maps to high-level feature maps carrying semantic information at different scales. We fuse the shallow feature maps to generate the low-level integrated features (denoted L) and merge the deep feature maps together to form the high-level integrated features (denoted H).
Step b: a self-attention module is utilized to capture features in the high-level features and the low-level features respectively and exchange semantic information to form the dependency relationships.
In the self-attention module, the high-level integrated features H provide a large amount of semantic information, while the low-level integrated features L contain spatial details that help refine object boundaries; L and H contain complementary information. The two features are reduced in dimensionality through embedding convolution layers and are denoted H ∈ R^(C×H×W) and L ∈ R^(C×H×W). Convolution layers convert the features into queries q, keys k and values v, denoted H_q, H_k, H_v and L_q, L_k, L_v respectively. To keep the embedding cost-effective while still capturing spatial information, we use a 1 × 1 convolution to reduce the channel dimensionality of the query, key and value embedding layers, and then use 3 × 3 convolution layers to extract spatial information. We then compute dot-product attention between the two features to capture their long-range relationship. The calculation process is as follows:
DA(H_q, L_k, L_v) = softmax((H_q)^T L_k) (L_v)^T
DA(L_q, H_k, H_v) = softmax((L_q)^T H_k) (H_v)^T
The interaction between the query embedding layer and the key embedding layer forms a spatial attention matrix that models the spatial relationship between any two pixels of the features. The attention matrix is then matrix-multiplied with the value embedding layer, and the result of the multiplication is summed element-wise with the original features to obtain the final representations H' and L' of the long-range context information. The calculation process is as follows:

H' = H + DA(H_q, L_k, L_v)
L' = L + DA(L_q, H_k, H_v)
step c: and c, performing multi-scale feature reinforcement on the feature map obtained in the step b, and then sending the reinforced features into a cascade decoder to generate a final salient target detection map.
The H' and L' feature maps are input into the multi-scale feature enhancement module and processed in parallel by four convolution branches: the first performs average pooling followed by a 1 × 1 convolution for channel transformation, and the remaining three perform dilated convolution with dilation rates of 2, 4 and 6. The outputs of the four branches are then concatenated (Concat) to obtain the enhanced feature maps, which finally pass through cascade decoding to obtain the prediction map.
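For illustration only, the following is one plausible minimal reading of the cascade decoding step (a coarse map predicted from H' and refined with L'); the patent does not detail the decoder, so every layer shape here is an assumption.

```python
# A heavily hedged sketch of a cascade decoder, under assumed layer shapes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadeDecoder(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.coarse = nn.Conv2d(ch, 1, 3, padding=1)   # coarse saliency from H'
        self.refine = nn.Sequential(
            nn.Conv2d(ch + 1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1),            # refined saliency
        )

    def forward(self, h_feat, l_feat):
        coarse = self.coarse(h_feat)
        coarse_up = F.interpolate(coarse, size=l_feat.shape[2:],
                                  mode="bilinear", align_corners=False)
        # Cascade: the low-level branch refines the upsampled coarse map.
        fine = self.refine(torch.cat([l_feat, coarse_up], dim=1))
        return torch.sigmoid(fine + coarse_up)
```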
Description of experimental procedures and results:
the present invention first trains the proposed model using the DUTS-TR dataset. It contains 10533 images with high quality pixel-level annotations. The training set is increased by horizontal-vertical flipping and image cropping to alleviate the overfitting problem. And obtaining a final network model through pre-training and fine-tuning.
After training is completed, the network model is evaluated on five benchmark datasets widely used in the saliency detection field: ECSSD, DUT-OMRON, HKU-IS, PASCAL-S, and DUTS-TE. All of these datasets were manually labeled at the pixel level for quantitative evaluation. Evaluation metrics include precision-recall (PR) curves, the F-measure, and the mean absolute error (MAE). The PR curve is a standard indicator for evaluating saliency performance. The F-measure, denoted F_β, is an overall performance metric obtained as the weighted harmonic mean of precision and recall. The MAE measures the average difference between the predicted saliency map and the ground truth. Compared with existing methods, the proposed method achieves good results on 3 evaluation metrics over 5 public datasets, accurately locating salient objects in each dataset and segmenting out complete salient objects.
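For reference, hedged implementations of the two scalar metrics follow; β² = 0.3 is the value conventionally used in salient object detection, and the fixed-threshold binarization is an assumption about the exact evaluation protocol.

```python
# A hedged sketch of MAE and the F-measure on numpy saliency maps in [0, 1].
import numpy as np

def mae(pred, gt):
    """Mean absolute error between the predicted map and the ground truth."""
    return np.abs(pred - gt).mean()

def f_measure(pred, gt, beta2=0.3, thresh=0.5):
    """F_beta = (1 + beta^2) P R / (beta^2 P + R), with assumed threshold."""
    binary = pred >= thresh
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)
```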
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (5)

1. A salient object detection method based on a self-attention mechanism is characterized by comprising the following steps:
step 1, performing feature extraction on an input picture by utilizing a convolutional neural network to generate a group of feature maps, wherein the feature maps comprise a shallow feature map and a deep feature map, and each feature map has semantic information with different scales;
step 2, fusing the shallow feature maps to generate low-level integrated features, and merging the deep feature maps to form high-level integrated features;
step 3, constructing a self-attention module based on a self-attention mechanism, inputting low-level integrated features and high-level integrated features into the self-attention module, respectively capturing the features in the high-level features and the low-level features, and exchanging semantic information to form a dependency relationship;
Step 4, strengthening the obtained features through a multi-scale feature enhancement module, and sending the fused and enhanced features into a cascade decoder to generate the final salient object detection map.
2. The salient object detection method based on the self-attention mechanism as claimed in claim 1, wherein the convolutional neural network in step 1 is as follows:
selecting a ResNet-101 convolutional neural network as the picture feature extractor, wherein the ResNet-101 network comprises 5 convolution groups, each containing a convolution computation process and a downsampling operation, the first convolution group containing only 1 convolution operation, the 2nd to 5th convolution groups each consisting of several identical residual units, and the final global pooling layer and fully connected layer being discarded.
3. The salient object detection method based on the self-attention mechanism as claimed in claim 1, wherein the feature integration in step 2 is as follows:
fusing the shallow feature maps into the low-level integrated features through a Concat operation, and merging the deep feature maps into the high-level integrated features through the same Concat operation, wherein the resulting high-level integrated features provide a large amount of semantic information, the low-level integrated features contain spatial details that help refine object boundaries, and the two contain complementary information.
4. The salient object detection method based on the self-attention mechanism as claimed in claim 1, wherein the self-attention module is constructed based on the self-attention mechanism in step 3, and specifically comprises the following steps:
the self-attention module comprises two 1 × 1 convolution layers and six 3 × 3 convolution layers; the two features are reduced in dimensionality through the embedding convolution layers and converted into queries, keys and values, while reshaping and pooling operations are applied; long-distance spatial interaction is then performed through dot-product attention computed between the features at the two levels, the result passes through a 1 × 1 convolution, and finally an element-wise summation with the original features yields the final representation of the long-range context information.
5. The salient object detection method based on the self-attention mechanism as claimed in claim 1, wherein the multi-scale feature enhancement module in step 4 is specifically as follows:
the multi-scale feature enhancement module comprises one 1 × 1 convolution layer and a group of 3 × 3 convolution layers whose dilation rates are 2, 4 and 6 respectively; the feature maps reduced by the 3 × 3 convolution layers undergo a Concat operation, and the result is point-wise multiplied with the feature map reduced by the 1 × 1 convolution layer to form the enhanced map;
the multi-scale feature enhancement module extracts spatial information at different scales, enlarges the receptive field with dilated convolution, and combines the multi-scale information into a fused, optimized feature output; the two groups of features are then predicted in a cascade decoding manner to form the prediction map.
CN202111278451.8A 2021-10-30 2021-10-30 Salient object detection method based on self-attention mechanism Pending CN114119993A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111278451.8A CN114119993A (en) 2021-10-30 2021-10-30 Salient object detection method based on self-attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111278451.8A CN114119993A (en) 2021-10-30 2021-10-30 Salient object detection method based on self-attention mechanism

Publications (1)

Publication Number Publication Date
CN114119993A (en) 2022-03-01

Family

ID=80380031

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111278451.8A Pending CN114119993A (en) 2021-10-30 2021-10-30 Salient object detection method based on self-attention mechanism

Country Status (1)

Country Link
CN (1) CN114119993A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115067945A (en) * 2022-08-22 2022-09-20 深圳市海清视讯科技有限公司 Fatigue detection method, device, equipment and storage medium
CN115424023A (en) * 2022-11-07 2022-12-02 北京精诊医疗科技有限公司 Self-attention mechanism module for enhancing small target segmentation performance
CN115424023B (en) * 2022-11-07 2023-04-18 北京精诊医疗科技有限公司 Self-attention method for enhancing small target segmentation performance
CN117492398A (en) * 2023-11-16 2024-02-02 北京雷格讯电子股份有限公司 High-speed data acquisition system and acquisition method thereof
CN117492398B (en) * 2023-11-16 2024-05-28 北京雷格讯电子股份有限公司 High-speed data acquisition system and acquisition method thereof

Similar Documents

Publication Publication Date Title
Xu et al. RSSFormer: Foreground saliency enhancement for remote sensing land-cover segmentation
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
Chen et al. EF-Net: A novel enhancement and fusion network for RGB-D saliency detection
Cong et al. Does thermal really always matter for RGB-T salient object detection?
Zhang et al. Deep hierarchical guidance and regularization learning for end-to-end depth estimation
CN114119993A (en) Salient object detection method based on self-attention mechanism
Ding et al. DCU-Net: a dual-channel U-shaped network for image splicing forgery detection
CN111325165B (en) Urban remote sensing image scene classification method considering spatial relationship information
CN110796026A (en) Pedestrian re-identification method based on global feature stitching
CN113378938B (en) Edge transform graph neural network-based small sample image classification method and system
Huang et al. TISNet‐Enhanced Fully Convolutional Network with Encoder‐Decoder Structure for Tongue Image Segmentation in Traditional Chinese Medicine
CN116863319B (en) Copy mobile tamper detection method based on cross-scale modeling and alternate refinement
Zhou et al. Attention transfer network for nature image matting
CN116596966A (en) Segmentation and tracking method based on attention and feature fusion
Fang et al. Context enhancing representation for semantic segmentation in remote sensing images
Ge et al. WGI-Net: A weighted group integration network for RGB-D salient object detection
Zhu et al. DFTR: Depth-supervised fusion transformer for salient object detection
Wang et al. Msfnet: multistage fusion network for infrared and visible image fusion
Yao et al. Double cross-modality progressively guided network for RGB-D salient object detection
Su et al. Physical model and image translation fused network for single-image dehazing
Zhou et al. CMPFFNet: Cross-modal and progressive feature fusion network for RGB-D indoor scene semantic segmentation
CN114445620A (en) Target segmentation method for improving Mask R-CNN
CN114332122A (en) Cell counting method based on attention mechanism segmentation and regression
CN116935178A (en) Cross-modal image fusion method based on multi-scale hole attention
Ou et al. A scene segmentation algorithm combining the body and the edge of the object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination