CN111598108A - Rapid salient object detection method of multi-scale neural network based on three-dimensional attention control - Google Patents

Rapid salient object detection method of multi-scale neural network based on three-dimensional attention control

Info

Publication number
CN111598108A
CN111598108A
Authority
CN
China
Prior art keywords
attention
convolution
neural network
scale
stereo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010319916.9A
Other languages
Chinese (zh)
Inventor
Yun Liu (刘云)
Xinyu Zhang (张鑫禹)
Mingming Cheng (程明明)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN202010319916.9A priority Critical patent/CN111598108A/en
Publication of CN111598108A publication Critical patent/CN111598108A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A fast salient object detection method using a multi-scale neural network based on stereo attention control. The method aims to design a lightweight convolutional neural network for salient object detection. The method extracts multi-scale convolution features through a multi-branch structure, in which each branch is a depthwise separable convolution with a different dilation rate; the convolution features of all branches are summed, and a stereo attention unit computes an attention map for each branch; each computed attention map is then multiplied with the features of its branch, the multiplication results of all branches are summed, and a residual connection is added, forming a multi-scale convolution module controlled by stereo attention; finally, the multi-scale modules are stacked to form a deep convolutional neural network that performs salient object detection on natural images. Experiments show that, compared with existing methods, the method is faster, has fewer parameters, requires less computation, and achieves similar accuracy.

Description

Rapid salient object detection method of multi-scale neural network based on three-dimensional attention control
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a fast salient object detection method using a multi-scale neural network controlled by stereo attention.
Background
Salient object detection, also referred to as saliency detection, aims to detect the most visually distinctive objects or regions in natural images. Saliency detection techniques have many applications in computer vision, such as image retrieval, image segmentation, object detection, object tracking, scene classification, and content-based image editing. Traditional salient object detection methods mainly rely on manually designed features and prior knowledge, such as image contrast, texture features, and the observation that salient objects often appear at the center of an image, but these methods usually lack high-level semantic information. Recently, thanks to the great progress of deep learning, the accuracy of saliency detection based on convolutional neural networks has improved continuously.
However, the improvement in accuracy comes at a huge price: current convolutional-neural-network-based methods typically rely on large networks with heavy computation and many parameters. For example, although the EGNet model proposed by Jiaxing Zhao et al. at the 2019 ICCV conference is one of the most accurate saliency detection methods at present, it has 108 million parameters, and merely storing these parameters requires 432 MB of memory. For images of size 336×336, EGNet runs at only 0.09 frames/second on an i7-8700K CPU, and at 12.7 frames/second even on a powerful NVIDIA TITAN XP GPU. Note that the rated power of a TITAN XP GPU is approximately 250 W, so the EGNet model cannot realistically be deployed on mobile devices. This makes EGNet difficult to deploy in practical applications, especially on mobile devices. Meanwhile, with the recent rise of mobile platforms such as smartphones, robots, virtual reality, and various intelligent terminals, deploying saliency detection systems on mobile devices has become an urgent problem.
Designing a lightweight convolutional neural network is a good way to solve these problems: a lightweight neural network uses design techniques to obtain a network with little computation, few parameters, and high speed. Lightweight neural networks have been studied in other fields, such as image classification and semantic segmentation, with well-known image classification models including MobileNet and ShuffleNet, but the present invention is dedicated to designing a lightweight neural network for salient object detection. Saliency detection typically faces two challenges: 1) it needs both high-level semantic information and low-level detail information to locate salient objects and refine object details; 2) it requires extracting multi-scale information to handle salient objects of different sizes in natural images. Since lightweight neural networks are usually shallow and their operations are often simplified, their learning and representation capabilities are usually inferior to those of large-scale convolutional neural networks. For this reason, directly using MobileNet or ShuffleNet as the backbone network to design a lightweight convolutional neural network for saliency detection does not work well.
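For intuition on why MobileNet-style designs, and the depthwise separable convolution the invention also adopts in step a below, are lightweight, the following PyTorch sketch compares parameter counts of an ordinary 3×3 convolution and its depthwise separable counterpart; the channel width is an arbitrary illustrative choice, not a value from the patent:

```python
import torch.nn as nn

C = 128  # example channel width, chosen only for illustration

# Ordinary 3x3 convolution: C * C * 3 * 3 weights.
conv = nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False)

# Depthwise separable counterpart: a per-channel 3x3 convolution
# (C * 3 * 3 weights) followed by a 1x1 pointwise convolution (C * C weights).
ds_conv = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1, groups=C, bias=False),
    nn.Conv2d(C, C, kernel_size=1, bias=False),
)

def count(m):
    return sum(p.numel() for p in m.parameters())

print(count(conv))     # 147456
print(count(ds_conv))  # 17536, roughly 8.4x fewer parameters
```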
Disclosure of Invention
The invention aims to solve the problems of excessive computational complexity, low speed, and large parameter counts in existing salient object detection methods based on convolutional neural networks, and provides a fast salient object detection method using a multi-scale neural network controlled by stereo attention. The method achieves performance similar to previous methods with only 1.33 million parameters, reaching 343 frames/second on an NVIDIA TITAN XP GPU while still running at 5 frames/second on an i7-8700K CPU.
To achieve this purpose, a multi-scale convolution module controlled by stereo attention is first designed. The designed module extracts multi-scale convolution features well while remaining lightweight. These modules are stacked to form a deep convolutional neural network that learns high-level semantic information and low-level detail information from the image, so that salient objects in the image can be detected quickly and accurately.
The invention provides a fast salient object detection method using a multi-scale neural network based on stereo attention control, which comprises the following steps:
a. A multi-scale convolution module controlled by stereo attention is designed.
The module extracts multi-scale convolution features from the input image or feature map using several parallel depthwise separable convolutions with different dilation rates, and then fuses the convolution features of all branches by element-wise addition. Two attention mechanisms are applied to the fused convolution features: a channel-based attention mechanism, which produces a vector whose dimensionality equals the number of convolution feature channels multiplied by the number of parallel branches, and a space-based attention mechanism, which produces a single-channel matrix with the same spatial size as the convolution features. The two attentions are multiplied by expanding their matrix dimensions, yielding a stereo attention map, which is split along the channel dimension so that the convolution features extracted by each parallel branch correspond to a stereo attention map of the same size. The convolution features extracted by each branch are multiplied with the corresponding stereo attention map, the multiplication results of all branches are summed, and the module input is added to obtain the module output; a minimal sketch of the expansion-and-multiplication step follows.
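The following PyTorch sketch illustrates only that broadcasting step, building the stereo attention map from a channel vector and a spatial matrix; the shapes and names are illustrative assumptions, not the patented implementation:

```python
import torch

# Illustrative shapes (assumptions): N+1 = 4 branches, C = 64 channels, 56x56 feature maps.
num_branches, C, H, W = 4, 64, 56, 56

d = torch.randn(num_branches * C)      # channel attention vector, length (N+1)*C
s = torch.randn(H, W)                  # spatial attention matrix, H x W

# Expand both attentions to (N+1) x C x H x W by broadcasting, then multiply element-wise.
d_hat = d.view(num_branches, C, 1, 1)  # replicated over the spatial dimensions
s_hat = s.view(1, 1, H, W)             # replicated over branch and channel dimensions
v = d_hat * s_hat                      # stereo attention map, (N+1) x C x H x W

# Normalize across the branch dimension so the branch weights sum to 1 at every position.
w = torch.softmax(v, dim=0)
print(w.shape)  # torch.Size([4, 64, 56, 56])
```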
b. A deep convolutional neural network with an encoder-decoder structure is designed.
The encoding sub-network of the designed convolutional neural network can be divided into five stages. Each stage first applies a convolution layer with stride 2 to downsample the input spatially by a factor of two, followed by several of the designed multi-scale convolution modules controlled by stereo attention. The decoding sub-network starts from the last layer of the encoding sub-network, fuses the convolution features extracted by the encoding sub-network at different stages by progressive upsampling, predicts a saliency map after each feature fusion, and adds deep supervision for training.
c. The color natural image to be detected is input into the deep convolutional neural network designed in step b. The saliency map predicted after the last fusion in the decoding sub-network is the output of the designed convolutional neural network; the output saliency map has the same size as the original input image.
Advantages and advantageous effects of the invention
The invention obtains a convolutional neural network by stacking the designed multi-scale convolution modules based on stereo attention control, and can detect salient objects quickly and accurately. Because the designed module replaces traditional convolution with depthwise separable convolution, the method has few parameters, little computation, and high speed. At the same time, the designed module uses depthwise separable convolutions to learn multi-scale, rich image representations efficiently, so the method achieves accuracy similar to that of traditional methods.
Drawings
Fig. 1 is a multi-scale convolution module based on stereo attention control designed by the present invention.
Fig. 2 is an overall architecture of the convolutional neural network designed by the present invention.
FIG. 3 is a comparison of experimental results of the present invention and related methods.
Fig. 4 shows several sets of exemplary results of the present invention.
Detailed Description
The following describes in further detail embodiments of the present invention with reference to the accompanying drawings. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
The fast salient object detection method using the multi-scale neural network based on stereo attention control specifically comprises the following operations:
a. Design a multi-scale convolution module controlled by stereo attention for extracting multi-scale convolution features.
Assume that DSConv3×3 denotes a depthwise separable convolution with kernel size 3×3, Conv3×3 denotes an ordinary convolution with kernel size 3×3, Conv1×1 denotes an ordinary convolution with kernel size 1×1, and r denotes the dilation rate of a convolution.
Assume the convolution feature input to the multi-scale convolution module based on stereo attention control is $I \in \mathbb{R}^{C\times H\times W}$, whose channel number, height, and width are denoted by $C$, $H$, and $W$, respectively. The input $I$ is first processed with a depthwise separable convolution DSConv3×3, formulated as

$$F_0 = \mathrm{DSConv}_{3\times 3}(I),$$

where $\mathrm{DSConv}_{3\times 3}(\cdot)$ denotes the DSConv3×3 operation. A multi-branch structure is designed, with each branch taking $F_0$ as input:

$$F_i = \mathrm{DSConv}^{(i)}_{3\times 3}(F_0), \quad i = 1, 2, \ldots, N,$$

where $\mathrm{DSConv}^{(i)}_{3\times 3}$ denotes the DSConv3×3 of the $i$-th branch, $N$ denotes the total number of branches, and the DSConv3×3 of each branch has a different dilation rate. The resulting convolution features of all branches are added element by element:

$$M = \sum_{i=0}^{N} F_i.$$

The fused feature $M$ will be used to compute the attention over the convolution features.
First consider the channel-based attention mechanism, which computes a channel attention vector $d \in \mathbb{R}^{(N+1)C}$. To explore the relationship between the different channels of the convolution feature $M$, global average pooling is performed on $M$:

$$z_c = \frac{1}{H \times W} \sum_{h=1}^{H} \sum_{w=1}^{W} M_{c,h,w},$$

where $z \in \mathbb{R}^{C}$. The feature vector $z$ is then processed with a multi-layer perceptron consisting of two layers:

$$d = \mathbf{W}_2\,\delta(\mathbf{W}_1 z),$$

where $\mathbf{W}_1$ and $\mathbf{W}_2$ are the weights of the two layers, $\delta$ is a nonlinear activation function, and the resulting vector $d \in \mathbb{R}^{(N+1)C}$ is the derived channel-based attention vector.
Next is the space-based attention mechanism, which computes a spatial attention matrix $s \in \mathbb{R}^{H\times W}$. Because a large receptive field extracts context information better, which is important for learning position-dependent attention, dilated DSConv3×3 is used to enlarge the receptive field. Specifically, the convolution feature $M$ is first mapped by a Conv1×1 to a feature with a smaller channel number $C/\lambda$, where $\lambda$ is the channel reduction ratio ($\lambda = 4$ in the experiments of the invention). The resulting convolution feature is then processed with two DSConv3×3 with dilation rates 2 and 4, respectively. Finally, the channel number is reduced to 1 with one Conv1×1, yielding a feature of size $\mathbb{R}^{1\times H\times W}$, which is reshaped into $\mathbb{R}^{H\times W}$. These operations may be formulated as

$$s = \mathrm{Conv}^{(4)}_{1\times 1}\Big(\mathrm{DSConv}^{(3)}_{3\times 3}\Big(\mathrm{DSConv}^{(2)}_{3\times 3}\big(\mathrm{Conv}^{(1)}_{1\times 1}(M)\big)\Big)\Big),$$

where $\mathrm{Conv}^{(i)}_{k\times k}$ denotes the $i$-th convolution with kernel size $k\times k$: an ordinary convolution when $k = 1$ and a depthwise separable convolution DSConv3×3 when $k = 3$. $s \in \mathbb{R}^{H\times W}$ is the derived space-based attention matrix.
The stereo attention map can be calculated according to the following formula:

$$v = \hat{d} \odot \hat{s},$$

where $d \in \mathbb{R}^{(N+1)C}$ and $s \in \mathbb{R}^{H\times W}$ are the channel-based attention vector and the space-based attention matrix described above, and $v \in \mathbb{R}^{(N+1)\times C\times H\times W}$ is the obtained stereo attention map. The symbol $\odot$ denotes element-by-element multiplication; before this multiplication, the channel-based attention $d$ is expanded by replication to $\hat{d} \in \mathbb{R}^{(N+1)\times C\times H\times W}$, and the space-based attention $s$ is likewise expanded by replication to $\hat{s} \in \mathbb{R}^{(N+1)\times C\times H\times W}$. A Softmax function is applied to normalize over the dimension corresponding to the branches, yielding the final stereo attention weights:

$$w^{i}_{c,h,w} = \frac{\exp\big(v^{i}_{c,h,w}\big)}{\sum_{j=0}^{N}\exp\big(v^{j}_{c,h,w}\big)},$$

where $c \in \{0, 1, \ldots, C-1\}$, $h \in \{0, 1, \ldots, H-1\}$, and $w \in \{0, 1, \ldots, W-1\}$ are indices in the channel, height, and width dimensions, respectively, and $w^{i} \in \mathbb{R}^{C\times H\times W}$ is the weight of the $i$-th branch. All branches can then be weighted and summed to obtain the attention-based fused feature:

$$F = \sum_{i=0}^{N} w^{i} \odot F_i,$$

where $\odot$ again denotes element-by-element multiplication.
Finally, the fused feature $F$ is processed by a Conv1×1 and added to the module input $I$ as a residual connection, yielding the output $O$ of the multi-scale convolution module based on stereo attention control:

$$O = \mathrm{Conv}_{1\times 1}(F) + I,$$

where $\mathrm{Conv}_{1\times 1}(\cdot)$ denotes a Conv1×1 operation. Fig. 1 is a diagram of the multi-scale convolution module based on stereo attention control.
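To make the above formulas concrete, here is a minimal PyTorch sketch of such a module, assuming N = 3 branches with dilation rates 1, 2, and 3, a ReLU between the two perceptron layers, a hidden width of C/λ in the perceptron, and λ = 4; the layer names and these unstated details are assumptions, not the patented implementation:

```python
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise separable 3x3 convolution with a given dilation rate."""
    def __init__(self, channels, dilation=1):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=dilation,
                                   dilation=dilation, groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class StereoAttentionModule(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 3), reduction=4):
        super().__init__()
        self.pre = DSConv(channels)                   # produces F_0
        self.branches = nn.ModuleList(DSConv(channels, d) for d in dilations)
        n_feats = len(dilations) + 1                  # N + 1 (branches plus F_0)
        # Channel attention: global average pooling, then a two-layer perceptron.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, n_feats * channels))
        # Spatial attention: 1x1, dilated DSConv (r=2 and r=4), then 1x1 to one channel.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            DSConv(channels // reduction, dilation=2),
            DSConv(channels // reduction, dilation=4),
            nn.Conv2d(channels // reduction, 1, 1, bias=False))
        self.proj = nn.Conv2d(channels, channels, 1, bias=False)

    def forward(self, x):
        f0 = self.pre(x)
        feats = [f0] + [b(f0) for b in self.branches]  # F_0 ... F_N
        stacked = torch.stack(feats, dim=1)            # (B, N+1, C, H, W)
        m = stacked.sum(dim=1)                         # element-wise fusion M
        b, c, h, w = m.shape
        d = self.mlp(m.mean(dim=(2, 3)))               # channel attention, (B, (N+1)*C)
        d = d.view(b, len(feats), c, 1, 1)
        s = self.spatial(m).view(b, 1, 1, h, w)        # spatial attention, (B, 1, 1, H, W)
        att = torch.softmax(d * s, dim=1)              # stereo attention, softmax over branches
        fused = (att * stacked).sum(dim=1)             # F = sum_i w_i * F_i
        return self.proj(fused) + x                    # residual connection
```

For example, `StereoAttentionModule(64)(torch.randn(1, 64, 56, 56))` returns a tensor of shape `(1, 64, 56, 56)`.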
b. Design a deep convolutional neural network with an encoder-decoder structure. The multi-scale convolution modules based on stereo attention control designed in step a are stacked to form the encoding sub-network of the designed convolutional neural network. As shown in fig. 2, the encoding sub-network can be divided into five stages. Each stage first applies a Conv3×3 or DSConv3×3 with stride 2, followed by n_i (i = 1, 2, 3, 4, 5) multi-scale convolution modules based on stereo attention control, with C_i (i = 1, 2, 3, 4, 5) convolution channels for all operations of stage i. Only the first stage uses Conv3×3 and all other stages use DSConv3×3, because the input color image typically has only three channels and does not need a separable convolution. The module counts n_i are set to 1, 3, 6, 3, and the channel numbers C_i (i = 1, 2, 3, 4, 5) are 16, 32, 64, 96, and 128, respectively. The modules of the first four stages each have three branches, with dilation rates 1, 2, and 3; the modules of the fifth stage have two branches, with dilation rates 1 and 2, because the convolution feature maps of the fifth stage are already small and a large dilation rate is unnecessary. After the fifth stage, a pyramid pooling module is attached, as proposed by Hengshuang Zhao et al. in "Pyramid Scene Parsing Network", published at the 2017 CVPR conference. Let the convolution feature map output by each stage be denoted S_i (i = 1, 2, 3, 4, 5, 6), where S_5 denotes the convolution features before the fifth-stage pyramid pooling module, S_6 denotes the convolution features output by the pyramid pooling module, and S_{i+1} (i = 1, 2, 3, 4) is half the size of S_i.
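A sketch of how such an encoder could be assembled from the module above; the per-stage module counts are a hypothetical completion, since the text lists only four of the five values of n_i:

```python
import torch.nn as nn

# Hypothetical per-stage configuration; the patent gives channel widths
# 16/32/64/96/128 but only four of the five module counts survive translation,
# so the last entry of stage_blocks is an assumption.
stage_channels = [16, 32, 64, 96, 128]
stage_blocks = [1, 3, 6, 3, 3]
stage_dilations = [(1, 2, 3)] * 4 + [(1, 2)]  # two branches in the small fifth stage

def build_encoder(in_channels=3):
    """Build the five encoder stages, reusing StereoAttentionModule from the
    sketch above. Each stage opens with a stride-2 convolution: an ordinary
    Conv3x3 for the first stage (RGB input), depthwise separable afterwards."""
    stages, prev = [], in_channels
    for i, (c, n, dil) in enumerate(zip(stage_channels, stage_blocks, stage_dilations)):
        if i == 0:
            entry = nn.Conv2d(prev, c, 3, stride=2, padding=1, bias=False)
        else:
            entry = nn.Sequential(
                nn.Conv2d(prev, prev, 3, stride=2, padding=1, groups=prev, bias=False),
                nn.Conv2d(prev, c, 1, bias=False))
        blocks = [StereoAttentionModule(c, dilations=dil) for _ in range(n)]
        stages.append(nn.Sequential(entry, *blocks))
        prev = c
    return nn.ModuleList(stages)
```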
The generated convolution features S_i (i = 1, 2, 3, 4, 5, 6) are used to construct a decoding sub-network for saliency detection. To fuse the highest-level features S_5 and S_6, a Conv1×1 first adjusts the channel number of S_5, then the processed S_5 is added element-wise to S_6, followed by a depthwise separable convolution with kernel size k×k. This step can be expressed as

$$R_5 = \mathrm{DSConv}^{(5)}_{k\times k}\big(\mathrm{Conv}_{1\times 1}(S_5) + S_6\big),$$

where $\mathrm{DSConv}^{(5)}_{k\times k}$ denotes the depthwise separable convolution of the fifth decoding stage with kernel size $k\times k$, $\mathrm{Conv}_{1\times 1}$ is an ordinary convolution with kernel size 1×1, and $R_5$ denotes the fused features of the fifth stage. The fusion of lower-level features is similar, except that the features passed down from the higher level are first upsampled to the same spatial size as the features of the current stage. This can be formulated as

$$R_i = \mathrm{DSConv}^{(i)}_{k\times k}\big(\mathrm{Conv}_{1\times 1}(S_i) + \mathrm{Up}(R_{i+1})\big), \quad i = 1, 2, 3, 4,$$

where Up denotes upsampling the convolution features by a factor of 2. The decoded convolution features $R_i$ (i = 1, 2, 3, 4, 5) are thus obtained. For neural network training, the invention uses deep supervision. Specifically, a Conv1×1 and a Sigmoid function are attached in turn to each $R_i$ (i = 1, 2, 3, 4, 5) to perform saliency prediction, the prediction results are upsampled to the same size as the original image, and the ground-truth saliency maps in the training set supervise the training.
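A minimal sketch of this coarse-to-fine decoder with deep supervision; the decoder kernel size k = 3, the shared decoder width, the separate 1×1 adjustment of S_6, and bilinear upsampling are illustrative assumptions the patent text leaves open:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    """R_5 = DSConv(Conv1x1(S_5) + S_6); R_i = DSConv(Conv1x1(S_i) + Up(R_{i+1})).
    A Conv1x1 + Sigmoid head on every R_i provides deep supervision."""
    def __init__(self, enc_channels=(16, 32, 64, 96, 128, 128), width=32, k=3):
        super().__init__()
        # Lateral 1x1 convolutions for S_1..S_5; S_6 gets its own (an assumption).
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, width, 1, bias=False) for c in enc_channels[:5])
        self.top = nn.Conv2d(enc_channels[5], width, 1, bias=False)
        # Depthwise separable k x k convolution after each fusion.
        self.fuse = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(width, width, k, padding=k // 2, groups=width, bias=False),
                nn.Conv2d(width, width, 1, bias=False))
            for _ in range(5))
        self.heads = nn.ModuleList(nn.Conv2d(width, 1, 1) for _ in range(5))

    def forward(self, feats, img_size):
        # feats = [S_1, ..., S_6]; S_5 and S_6 share the same spatial size.
        r = self.fuse[4](self.laterals[4](feats[4]) + self.top(feats[5]))  # R_5
        preds = []
        for i in (4, 3, 2, 1, 0):
            if i < 4:  # upsample the higher-level feature by 2x, then fuse
                up = F.interpolate(r, scale_factor=2.0, mode='bilinear',
                                   align_corners=False)
                r = self.fuse[i](self.laterals[i](feats[i]) + up)
            # Deep supervision: predict a saliency map at the original image size.
            p = F.interpolate(self.heads[i](r), size=img_size, mode='bilinear',
                              align_corners=False)
            preds.append(torch.sigmoid(p))
        return preds  # preds[-1], predicted from R_1, is the final saliency map
```

At inference time only `preds[-1]`, the map predicted from R_1, is used, matching step c below.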
c. Input the color natural image to be detected into the deep convolutional neural network with the encoder-decoder structure designed in step b. The saliency map predicted from the convolution feature R_1 in the decoding sub-network is the output of the designed neural network; the output saliency map has the same size as the input original image.
FIG. 3 shows a comparison of our method with other methods on six datasets: ECSSD, DUT-O, DUTS-TE, HKU-IS, SOD, and THUR15K. #Param denotes the number of parameters of the convolutional neural network, in millions (M). FLOPs denotes the number of floating-point operations, in billions (G). Speed is given in frames per second (FPS). F_β denotes the F-measure, where larger is better; MAE denotes the mean absolute error, where smaller is better. These are common metrics for saliency detection. SAMNet is the method of the present invention. RFCN is the method in "Saliency Detection with Recurrent Fully Convolutional Networks" published by Linzhao Wang et al. at the 2016 ECCV conference; DSS is the method in "Deeply Supervised Salient Object Detection with Short Connections" published by Qibin Hou et al. in IEEE TPAMI in 2019; SRM is the method in "A Stagewise Refinement Model for Detecting Salient Objects in Images" published by Tiantian Wang et al. at the 2017 ICCV conference; Amulet is the method in "Amulet: Aggregating Multi-level Convolutional Features for Salient Object Detection" published by Pingping Zhang et al. at the 2017 ICCV conference; UCF is the method in "Learning Uncertain Convolutional Features for Accurate Saliency Detection" published by Pingping Zhang et al. at the 2017 ICCV conference; C2S is the method in "Contour Knowledge Transfer for Salient Object Detection" published by Xin Li et al. at the 2018 ECCV conference; RAS is the method in "Reverse Attention for Salient Object Detection" published by Shuhan Chen et al. at the 2018 ECCV conference; CPD is the method in "Cascaded Partial Decoder for Fast and Accurate Salient Object Detection" published by Zhe Wu et al. at the 2019 CVPR conference; BASNet is the method in "BASNet: Boundary-Aware Salient Object Detection" published by Xuebin Qin et al. at the 2019 CVPR conference; and EGNet is the method in "EGNet: Edge Guidance Network for Salient Object Detection" published by Jiaxing Zhao et al. at the 2019 ICCV conference. It can be seen that the deep convolutional neural network designed by the invention has few parameters and little computation, and is fast, reaching 343 FPS, while its accuracy is almost the same as that of previous methods. The proposed method still runs at 5 frames/second on an i7-8700K CPU, which previous methods could not achieve.
FIG. 4 compares saliency maps obtained by the method of the present invention with those of other methods. The first column of each row is the original image, the second-to-last column is the method of the invention, and the last column is the ground-truth annotation from the dataset. The names of the methods corresponding to the columns are marked at the bottom of the figure.

Claims (4)

1. A fast salient object detection method using a multi-scale neural network based on stereo attention control, characterized by comprising the following steps:
a. designing a multi-scale convolution module controlled by stereo attention, which extracts multi-scale convolution features from an input image or feature map using a plurality of parallel branches formed by depthwise separable convolutions with different dilation rates, computes an attention map for each branch, and finally performs attention-map-based feature fusion on the multi-scale convolution features extracted by the plurality of branches;
b. designing a deep convolutional neural network with an encoder-decoder structure, wherein the encoding sub-network of the designed convolutional neural network is formed by stacking the stereo-attention-controlled multi-scale convolution modules designed in step a, and the decoding sub-network of the designed convolutional neural network fuses the convolution features extracted by the encoding sub-network at different stages by progressive upsampling;
c. inputting the color natural image to be detected into the deep convolutional neural network designed in step b, which outputs a saliency map of the same size as the original image.
2. The fast salient object detection method using a multi-scale neural network based on stereo attention control as claimed in claim 1, characterized in that: the designed multi-scale convolution module controlled by stereo attention fuses the multi-scale convolution features extracted by all branches by addition, and then computes the attention map of each branch from the fused features, so that the attention map corresponding to each branch is computed from the features of all branches.
3. The fast salient object detection method using a multi-scale neural network based on stereo attention control as claimed in claim 2, characterized in that: the computation of the stereo attention map fuses an attention mechanism based on convolution channels and an attention mechanism based on convolution space, so that each branch corresponds to a different stereo attention map.
4. The fast salient object detection method using a multi-scale neural network based on stereo attention control as claimed in claim 2, characterized in that: the feature fusion based on the stereo attention maps multiplies the convolution features computed on each branch element-wise with the corresponding stereo attention map, then adds the multiplied convolution features of all branches element-wise, and adds the input feature map as a residual connection.
CN202010319916.9A 2020-04-22 2020-04-22 Rapid salient object detection method of multi-scale neural network based on three-dimensional attention control Pending CN111598108A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010319916.9A CN111598108A (en) 2020-04-22 2020-04-22 Rapid salient object detection method of multi-scale neural network based on three-dimensional attention control

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010319916.9A CN111598108A (en) 2020-04-22 2020-04-22 Rapid salient object detection method of multi-scale neural network based on three-dimensional attention control

Publications (1)

Publication Number Publication Date
CN111598108A true CN111598108A (en) 2020-08-28

Family

ID=72185120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010319916.9A Pending CN111598108A (en) 2020-04-22 2020-04-22 Rapid salient object detection method of multi-scale neural network based on three-dimensional attention control

Country Status (1)

Country Link
CN (1) CN111598108A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110533084A (en) * 2019-08-12 2019-12-03 长安大学 A kind of multiscale target detection method based on from attention mechanism
CN110866907A (en) * 2019-11-12 2020-03-06 中原工学院 Full convolution network fabric defect detection method based on attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jiang-Jiang Liu et al.: "A Simple Pooling-Based Design for Real-Time Salient Object Detection", arXiv *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112258431A (en) * 2020-09-27 2021-01-22 成都东方天呈智能科技有限公司 Image classification model based on mixed depth separable expansion convolution and classification method thereof
US11694301B2 (en) * 2020-09-30 2023-07-04 Alibaba Group Holding Limited Learning model architecture for image data semantic segmentation
CN112037520A (en) * 2020-11-05 2020-12-04 杭州科技职业技术学院 Road monitoring method and system and electronic equipment
CN112381164A (en) * 2020-11-20 2021-02-19 北京航空航天大学杭州创新研究院 Ultrasound image classification method and device based on multi-branch attention mechanism
CN112528899A (en) * 2020-12-17 2021-03-19 南开大学 Image salient object detection method and system based on implicit depth information recovery
CN112528899B (en) * 2020-12-17 2022-04-12 南开大学 Image salient object detection method and system based on implicit depth information recovery
CN112861978A (en) * 2021-02-20 2021-05-28 齐齐哈尔大学 Multi-branch feature fusion remote sensing scene image classification method based on attention mechanism
CN112597985A (en) * 2021-03-04 2021-04-02 成都西交智汇大数据科技有限公司 Crowd counting method based on multi-scale feature fusion
CN114418003A (en) * 2022-01-20 2022-04-29 北京科技大学 Double-image identification and classification method based on attention mechanism and multi-size information extraction
CN114418003B (en) * 2022-01-20 2022-09-16 北京科技大学 Double-image recognition and classification method based on attention mechanism and multi-size information extraction
CN114494703A (en) * 2022-04-18 2022-05-13 成都理工大学 Intelligent workshop scene target lightweight semantic segmentation method
CN114494703B (en) * 2022-04-18 2022-06-28 成都理工大学 Intelligent workshop scene target lightweight semantic segmentation method

Similar Documents

Publication Publication Date Title
CN111598108A (en) Rapid salient object detection method of multi-scale neural network based on three-dimensional attention control
Cong et al. Going from RGB to RGBD saliency: A depth-guided transformation model
Zhang et al. LFNet: Light field fusion network for salient object detection
CN110543841A (en) Pedestrian re-identification method, system, electronic device and medium
CN111914107B (en) Instance retrieval method based on multi-channel attention area expansion
CN114612759B (en) Video processing method, video query method, model training method and model training device
CN107590505B (en) Learning method combining low-rank representation and sparse regression
CN113515656B (en) Multi-view target identification and retrieval method and device based on incremental learning
CN110929735B (en) Rapid significance detection method based on multi-scale feature attention mechanism
CN109766918B (en) Salient object detection method based on multilevel context information fusion
Yang et al. HybridNet: Integrating GCN and CNN for skeleton-based action recognition
Kishorjit Singh et al. Image classification using SLIC superpixel and FAAGKFCM image segmentation
Zhong et al. Highly efficient natural image matting
Wang et al. STCD: efficient Siamese transformers-based change detection method for remote sensing images
Wang et al. A lightweight network with attention decoder for real-time semantic segmentation
Zong et al. A cascaded refined rgb-d salient object detection network based on the attention mechanism
CN114282055A (en) Video feature extraction method, device and equipment and computer storage medium
CN113066089A (en) Real-time image semantic segmentation network based on attention guide mechanism
Zhao et al. Depth enhanced cross-modal cascaded network for RGB-D salient object detection
CN116958324A (en) Training method, device, equipment and storage medium of image generation model
CN113052156B (en) Optical character recognition method, device, electronic equipment and storage medium
CN115830633A (en) Pedestrian re-identification method and system based on multitask learning residual error neural network
Zhang et al. A multi-cue guidance network for depth completion
Pei et al. Fusing appearance and motion information for action recognition on depth sequences
Zhou et al. Salient object detection via reliability‐based depth compactness and depth contrast

Legal Events

Code Title
PB01: Publication
SE01: Entry into force of request for substantive examination
WD01: Invention patent application deemed withdrawn after publication (application publication date: 20200828)