CN114283315A - RGB-D salient object detection method based on interactive guidance attention and trapezoidal pyramid fusion - Google Patents
RGB-D salient object detection method based on interactive guidance attention and trapezoidal pyramid fusion
- Publication number
- CN114283315A (application number CN202111565805.7A)
- Authority
- CN
- China
- Prior art keywords
- rgb
- fusion
- modal
- features
- data set
- Prior art date
- 2021-12-17
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention belongs to the field of computer vision and provides an RGB-D salient object detection method based on interactive guidance attention and trapezoidal pyramid fusion, which comprises the following steps: 1) acquiring an RGB-D data set for training and testing the task, and defining the algorithm target of the invention; 2) constructing an RGB encoder for extracting RGB image features and a Depth encoder for extracting Depth image features; 3) establishing a cross-modal feature fusion network that guides the cross fusion of the RGB image features and the Depth image features through an interactively guided attention mechanism; 4) constructing an ultra-large-scale receptive field fusion mechanism to enhance the high-level semantic information of the multi-modal features; 5) generating a saliency map P_est with a decoder based on a trapezoidal pyramid feature fusion network; 6) calculating the loss between the predicted saliency map P_est and the manually labeled salient object segmentation map P_GT; 7) testing on the test data set to generate a saliency map P_test and evaluating performance with the evaluation indexes.
Description
Technical field:
The invention relates to the fields of computer vision and image processing, in particular to an RGB-D salient object detection method based on interactive guidance attention and trapezoidal pyramid fusion.
Background art:
Salient object detection (SOD) aims to locate the most conspicuous objects or regions in given data (such as RGB images, RGB-D images, or video) by simulating the human visual attention mechanism. In recent years, owing to its wide applicability, salient object detection has developed rapidly and is used in many computer vision tasks, such as image retrieval, video segmentation, semantic segmentation, video tracking, person re-identification, thumbnail creation, and quality evaluation.
Because single-modality RGB salient object detection algorithms struggle in challenging scenes (e.g., complex backgrounds, salient objects highly similar to the background, low-contrast scenes), it is difficult for them to locate salient objects accurately and completely against the background. To address this problem, the Depth image is introduced into salient object detection: an RGB image and a Depth image are combined to form an RGB-D image pair for detection.
A Depth map provides much useful information, such as spatial structure, 3D layout, and object edges. Introducing the Depth map into the SOD task helps SOD models handle challenging scenes such as complex backgrounds, low contrast, and salient objects similar in appearance to the background. How to use the Depth map to assist an RGB-D salient object detection model in accurately locating salient objects is therefore very important. Most previous RGB-D salient object detection methods either treat the Depth map as a data stream independent of the RGB image and extract features separately, or feed the Depth image into the RGB-D saliency detection model as a fourth channel of the RGB image. Such methods treat the RGB image and the Depth image indiscriminately and ignore two facts: different regions of the RGB and Depth images carry very different amounts of saliency information, and the two modalities represent the information of a salient object differently.
Considering the ambiguity that exists between cross-modal RGB and Depth image data, the invention explores an efficient cross-modal feature fusion method and uses it to effectively eliminate this ambiguity. In addition, to further explore the connection and cooperation among multi-scale features, the invention exploits multi-scale feature information to improve detection performance, taking both high-level semantic information and low-level detail information into account so as to perceive both the edge details and the overall integrity of the salient object. The method further exploits the effect of the feature pyramid on multi-scale feature fusion, helping the saliency detection model predict salient objects more accurately.
Summary of the invention:
In view of the problems described above, the invention provides an RGB-D salient object detection method based on interactive guidance attention and trapezoidal pyramid fusion, which adopts the following technical scheme:
1. An RGB-D data set for training and testing the task is acquired.
1.1) The NJUD, NLPR and DUT-RGBD data sets are used as the training set, and the remaining NLPR images, the remaining DUT-RGBD images, the SIP data set, the STERE data set and the SSD data set are used as the test sets.
1.2) Each RGB-D image pair in the data set comprises an RGB image P_RGB, a corresponding Depth image P_Depth, and a corresponding manually labeled salient object segmentation map P_GT.
2. A salient object detection model network is constructed with convolutional neural networks to extract RGB image features and Depth image features;
2.1) VGG16 is used as the backbone network of the model to extract, from each RGB-Depth image pair, the RGB image features and the corresponding Depth image features, denoted f_i^r and f_i^d respectively, where i ∈ {1, 2, 3, 4, 5} indexes the feature level.
2.2) The VGG16 weights of the backbone networks are initialized with VGG16 parameter weights pre-trained on the ImageNet data set.
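For illustration, a minimal PyTorch sketch of one such VGG16 encoder stream follows. The stage split points, the class name VGG16Encoder, and the torchvision weight-loading call are assumptions of this sketch, not the exact implementation of the invention.

```python
import torch.nn as nn
import torchvision

class VGG16Encoder(nn.Module):
    """One encoder stream: extracts 5 levels of features f_1..f_5 (steps 2.1/2.2)."""
    def __init__(self):
        super().__init__()
        trunk = torchvision.models.vgg16(weights="IMAGENET1K_V1").features
        # Split the VGG16 convolutional trunk into the 5 stages conv1..conv5
        # (each later stage starts with the max-pooling of the previous one).
        self.stages = nn.ModuleList([
            trunk[0:4], trunk[4:9], trunk[9:16], trunk[16:23], trunk[23:30],
        ])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # [f_1, ..., f_5] with 64/128/256/512/512 channels

rgb_encoder = VGG16Encoder()    # extracts f_i^r from P_RGB
depth_encoder = VGG16Encoder()  # extracts f_i^d from the 3-channel P_Depth
```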
3. Based on the multi-scale RGB image features f_i^r extracted in step 2 and the corresponding Depth image features f_i^d, multi-scale cross-modal feature interactive fusion is performed, and a cross-modal feature fusion network is constructed to generate the multi-modal features.
3.1) The cross-modal feature fusion network is composed of CMAF modules at 5 levels; it takes the 5 levels of RGB image features f_i^r and the corresponding Depth image features f_i^d as input and generates the 5 levels of multi-modal features f_i^rd.
3.2) The input data of the CMAF module at the i-th level are f_i^r and f_i^d, and the module generates the multi-modal feature f_i^rd of the i-th level through an interactively guided attention mechanism, where i ∈ {1, 2, 3, 4, 5}.
3.3) The CMAF module generates the multi-modal features through the interactively guided attention mechanism as follows:
3.3.1) First, a residual convolution module Resconv is constructed to enlarge the receptive field of the features, enrich their semantic information, and enhance their saliency expression capability; the RGB image features and the corresponding Depth image features are further enhanced by this module.
3.3.2) The enhanced RGB image features and the corresponding enhanced Depth image features are further fused using element-wise matrix multiplication and element-wise matrix addition, and the fused features are then converted into the global context-aware attention weight W_s and the channel-aware attention weight W_c using the softmax activation function:
W_s = softmax(add(multi(Resconv(f_i^r), Resconv(f_i^d))))   equation (1)
W_c = softmax(GAP(add(multi(Resconv(f_i^r), Resconv(f_i^d)))))   equation (2)
where Resconv denotes the residual convolution module, multi denotes the element-wise matrix multiplication operation, add denotes the element-wise matrix addition operation, GAP denotes global average pooling, and softmax denotes the softmax activation function.
3.3.3) After the global context-aware attention weight W_s and the channel-aware attention weight W_c are obtained, W_s and W_c are combined with the enhanced RGB image features and the corresponding enhanced Depth image features respectively, and the weight matrices generated by the attention mechanism guide the features to focus on the salient regions, yielding the filtered multi-modal features:
f'_i^α = add(multi(W_s, Resconv(f_i^α)), multi(W_c, Resconv(f_i^α)))   equation (3)
where α ∈ {r, d}; through this operation the filtered RGB image features f'_i^r and the corresponding filtered Depth image features f'_i^d are obtained.
3.3.4) The filtered RGB image features f'_i^r and the corresponding Depth image features f'_i^d are fused across modalities by a cross-interactive fusion method to obtain the fused feature f_i^rd:
f_i^rd = conv3(cat(f'_i^r, f'_i^d))   equation (4)
where i ∈ {1, 2, 3, 4, 5} denotes the level of the model at which the feature resides, conv3 denotes a convolution with a 3×3 kernel, and cat denotes the feature concatenation operation.
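The following minimal PyTorch sketch illustrates one CMAF module following steps 3.3.1-3.3.4. The internal layout of Resconv and the precise way W_s and W_c are formed and applied are assumptions reconstructed from the description above, not the verified design of the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Resconv(nn.Module):
    """Residual convolution block that enhances f_i^r and f_i^d (step 3.3.1)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return F.relu(x + self.body(x))

class CMAF(nn.Module):
    """Cross-modal attention fusion at one level i (steps 3.3.2-3.3.4)."""
    def __init__(self, ch):
        super().__init__()
        self.res_r, self.res_d = Resconv(ch), Resconv(ch)
        self.conv3 = nn.Conv2d(2 * ch, ch, 3, padding=1)

    def forward(self, fr, fd):
        fr, fd = self.res_r(fr), self.res_d(fd)      # enhancement (3.3.1)
        fused = fr * fd + (fr + fd)                  # multi(.) and add(.) fusion
        b, c, h, w = fused.shape
        # W_s: softmax over all spatial positions (global context attention).
        ws = F.softmax(fused.flatten(2), dim=-1).view(b, c, h, w)
        # W_c: global average pooling + softmax over channels (channel attention).
        wc = F.softmax(F.adaptive_avg_pool2d(fused, 1).flatten(1), dim=1).view(b, c, 1, 1)
        # Filtered features, eq. (3): attention weights guide each modality.
        fr_f = fr * ws + fr * wc
        fd_f = fd * ws + fd * wc
        # Cross-interactive fusion, eq. (4): concatenate, then 3x3 convolution.
        return self.conv3(torch.cat([fr_f, fd_f], dim=1))
```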
4) Through the above operations, the multi-modal features f_i^rd of 5 levels are extracted. These 5 levels of features are fed into a dense hole convolution module, and the receptive field information and high-level semantic information of the multi-modal features are enhanced through multi-rate hole convolution operations.
4.1) Ultra-large-scale receptive field information is extracted from the multi-scale multi-modal features through hole convolution operations with different dilation rates:
f_i^{rd,k} = DLA_k(f_i^rd), k ∈ {1, 2, 4, 8}   equation (5)
where i ∈ {1, 2, 3, 4, 5} denotes the level at which the multi-modal feature resides, DLA_1(), DLA_2(), DLA_4() and DLA_8() denote hole convolution operations with dilation rates of 1, 2, 4 and 8 respectively, and f_i^{rd,k} denotes the feature with dilation rate k generated from the multi-modal feature of the i-th level.
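A minimal sketch of the parallel hole (dilated) convolutions of step 4.1 follows, for illustration; keeping the channel width unchanged per branch and adding a ReLU are assumptions of this sketch.

```python
import torch.nn as nn

class DLABlock(nn.Module):
    """Parallel hole convolutions DLA_1, DLA_2, DLA_4, DLA_8 (step 4.1)."""
    def __init__(self, ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=r, dilation=r),  # DLA_r(.)
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])

    def forward(self, f):
        # Returns [f^{rd,1}, f^{rd,2}, f^{rd,4}, f^{rd,8}], later fused by TPNet.
        return [branch(f) for branch in self.branches]
```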
4.2) The multi-modal features with multi-level receptive fields generated above are fed into the trapezoidal pyramid feature fusion network, which fuses the multi-modal features of different receptive fields:
f_i = TPNet(f_i^{rd,1}, f_i^{rd,2}, f_i^{rd,4}, f_i^{rd,8})   equation (6)
where TPNet denotes the trapezoidal pyramid feature fusion network.
5) The 5 levels of ultra-large-scale receptive field multi-modal features obtained in step 4 are fed into the decoder formed by the trapezoidal pyramid feature fusion network to obtain the final fused feature, and the predicted saliency map P_est is obtained after sigmoid activation:
P_est = sigmoid(TPNet(f_1, f_2, f_3, f_4, f_5))   equation (7)
6) The loss function between the saliency map P_est predicted by the invention and the manually labeled salient object segmentation map P_GT is calculated, the parameter weights of the proposed model are updated step by step through SGD (stochastic gradient descent) and the back-propagation algorithm, and the structure and parameter weights of the RGB-D salient object detection algorithm are finally determined.
7) On the basis of the model structure and parameter weights determined in step 6, the RGB-D image pairs of the test sets are tested to generate saliency maps P_test, which are evaluated with the MAE, S-measure, F-measure and E-measure evaluation indexes.
The invention realizes multi-modal salient object detection based on a deep convolutional neural network. It exploits the rich spatial structure information in the Depth image and fuses the Depth features with the features extracted from the RGB image across modalities in an interactively guided attention manner, so it can meet the requirements of salient object detection in different scenes and retains robustness even in challenging ones (complex backgrounds, low contrast, transparent objects, and the like). Compared with previous RGB-D salient object detection methods, the invention has the following benefits:
First, using deep learning, the relationship between an RGB-D image pair and the salient objects in the image is modeled through an encoder and a decoder, and the saliency prediction is obtained by extracting and fusing cross-modal features.
Second, through interactive fusion, the information in the Depth image features that is complementary to the RGB image features is effectively modulated; the depth distribution information of the Depth features guides the cross-modal feature fusion and suppresses the interference of background information in the RGB image, laying a foundation for the salient object prediction of the next stage.
Finally, multi-scale multi-modal feature fusion is performed through the constructed trapezoidal pyramid feature fusion network, and the final saliency map is predicted.
Drawings
FIG. 1 is a schematic diagram of the model structure of the invention
FIG. 2 is a schematic diagram of the cross-modal feature fusion module
FIG. 3 is a schematic diagram of the ultra-large-scale receptive field fusion module
FIG. 4 is a schematic diagram of the trapezoidal pyramid feature fusion network (TPNet)
FIG. 5 is a schematic diagram of model training and testing
FIG. 6 is a comparison of the results of the invention and other RGB-D salient object detection methods
Detailed Description
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings of these embodiments. The described embodiments are only a part of the embodiments of the invention, not all of them; all other embodiments obtained by a person of ordinary skill in the art from the described embodiments without creative effort fall within the protection scope of the invention.
Referring to FIG. 1, an RGB-D salient object detection method based on interactive guidance attention and trapezoidal pyramid fusion mainly comprises the following steps:
1. An RGB-D data set for training and testing the task is acquired, the algorithm target of the invention is defined, and the training set and test sets used to train and test the algorithm are determined. The NJUD, NLPR and DUT-RGBD data sets are used as the training set, and the remaining data are used as the test sets, comprising the SIP data set, the remaining NLPR images, the remaining DUT-RGBD images, the STERE data set and the SSD data set.
2. A salient object detection model network is constructed with convolutional neural networks, comprising an RGB encoder for extracting RGB image features and a Depth encoder for extracting Depth image features:
2.1. The three-channel RGB image is fed into the RGB encoder to generate RGB image features at 5 levels, denoted f_1^r, f_2^r, f_3^r, f_4^r and f_5^r.
2.2. The three-channel Depth image is fed into the Depth encoder to generate Depth image features at 5 levels, denoted f_1^d, f_2^d, f_3^d, f_4^d and f_5^d.
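As a quick usage illustration, the shapes of the 5 feature levels can be checked as follows (this reuses the VGG16Encoder sketch given in the summary above; the 256×256 input size is an assumption, since the patent does not state the training resolution):

```python
import torch

x_rgb = torch.randn(1, 3, 256, 256)    # a P_RGB batch
x_depth = torch.randn(1, 3, 256, 256)  # P_Depth replicated to 3 channels (2.2)
feats_r = VGG16Encoder()(x_rgb)
feats_d = VGG16Encoder()(x_depth)
for i, (fr, fd) in enumerate(zip(feats_r, feats_d), start=1):
    print(f"level {i}: f^r {tuple(fr.shape)}, f^d {tuple(fd.shape)}")
# level 1: (1, 64, 256, 256) ... level 5: (1, 512, 16, 16)
```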
3. Referring to FIG. 2, the 5 levels of RGB image features f_i^r generated in step 2 and the Depth image features f_i^d are interactively fused by the cross-modal fusion modules to obtain the multi-modal features f_i^rd at 5 levels. The main steps are as follows:
3.1. The cross-modal feature fusion network is composed of CMAF modules at 5 levels; it takes the 5 levels of RGB image features f_i^r and the corresponding Depth image features f_i^d as input and generates the 5 levels of multi-modal features f_i^rd.
3.2. The input data of the CMAF module at the i-th level are f_i^r and f_i^d, and the module outputs the multi-modal feature f_i^rd of the i-th level through the interactively guided attention mechanism, where i ∈ {1, 2, 3, 4, 5}.
The specific process by which the CMAF module generates the multi-modal features through the interactively guided attention mechanism is as follows:
3.3.1. First, the invention constructs a residual convolution module Resconv to enlarge the receptive field of the features, enrich their semantic information, and enhance their saliency expression capability; the RGB image features and the corresponding Depth image features are further enhanced by this module.
3.3.2. The enhanced RGB image features and the corresponding enhanced Depth image features are further fused using element-wise matrix multiplication and element-wise matrix addition, and the fused features are then converted into the global context-aware attention weight W_s and the channel-aware attention weight W_c using the softmax activation function:
W_s = softmax(add(multi(Resconv(f_i^r), Resconv(f_i^d))))   equation (1)
W_c = softmax(GAP(add(multi(Resconv(f_i^r), Resconv(f_i^d)))))   equation (2)
where Resconv denotes the residual convolution module, multi denotes the element-wise matrix multiplication operation, add denotes the element-wise matrix addition operation, GAP denotes global average pooling, and softmax denotes the softmax activation function.
3.3.3. After the global context-aware attention weight W_s and the channel-aware attention weight W_c are obtained, W_s and W_c are combined with the enhanced RGB image features and the corresponding enhanced Depth image features respectively, and the weight matrices generated by the attention mechanism guide the features to focus on the salient regions, yielding the filtered multi-modal features:
f'_i^α = add(multi(W_s, Resconv(f_i^α)), multi(W_c, Resconv(f_i^α)))   equation (3)
where α ∈ {r, d}; through this operation the filtered RGB image features f'_i^r and the corresponding filtered Depth image features f'_i^d are obtained.
3.3.4. The filtered RGB image features f'_i^r and the corresponding Depth image features f'_i^d are fused across modalities by a cross-interactive fusion method to obtain the fused feature f_i^rd:
f_i^rd = conv3(cat(f'_i^r, f'_i^d))   equation (4)
where i ∈ {1, 2, 3, 4, 5} denotes the level of the model at which the feature resides, conv3 denotes a convolution with a 3×3 kernel, and cat denotes the feature concatenation operation.
4. Referring to FIG. 3, the ultra-large-scale receptive field fusion module is used to enhance the receptive field information and high-level semantic information of the multi-modal features:
4.1) Ultra-large-scale receptive field information is extracted from the multi-scale multi-modal features through hole convolution operations with different dilation rates:
f_i^{rd,k} = DLA_k(f_i^rd), k ∈ {1, 2, 4, 8}   equation (5)
where i ∈ {1, 2, 3, 4, 5} denotes the level at which the multi-modal feature resides, DLA_1(), DLA_2(), DLA_4() and DLA_8() denote hole convolution operations with dilation rates of 1, 2, 4 and 8 respectively, and f_i^{rd,k} denotes the feature with dilation rate k generated from the multi-modal feature of the i-th level.
4.2) The multi-modal features with multi-level receptive fields generated above are fed into the trapezoidal pyramid feature fusion network, which fuses the multi-modal features of different receptive fields:
f_i = TPNet(f_i^{rd,1}, f_i^{rd,2}, f_i^{rd,4}, f_i^{rd,8})   equation (6)
where TPNet() denotes the trapezoidal pyramid feature fusion network.
5. Referring to FIG. 4, the proposed algorithm uses the trapezoidal pyramid network as its decoder: the 5 levels of multi-modal enhanced features f_1, f_2, f_3, f_4 and f_5 are fed into the decoder, and the predicted saliency map P_est is obtained after sigmoid activation:
P_est = sigmoid(TPNet(f_1, f_2, f_3, f_4, f_5))   equation (7)
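Because FIG. 4 is not reproduced here, the following PyTorch sketch shows only one plausible reading of the trapezoidal pyramid decoder: the five projected levels are merged pairwise, so the pyramid narrows by one level per pass (5 → 4 → 3 → 2 → 1); the common 64-channel width and the shared fusion convolution are assumptions of the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TPNet(nn.Module):
    """Trapezoid-shaped top-down fusion of the 5 feature levels (one reading)."""
    def __init__(self, chs=(64, 128, 256, 512, 512)):
        super().__init__()
        # Project every level to a common width, then fuse neighbouring levels
        # repeatedly, shrinking the pyramid by one level per pass.
        self.proj = nn.ModuleList([nn.Conv2d(c, 64, 1) for c in chs])
        self.fuse = nn.Conv2d(128, 64, 3, padding=1)
        self.head = nn.Conv2d(64, 1, 3, padding=1)

    def forward(self, feats):  # feats = [f_1, ..., f_5], fine to coarse
        fs = [p(f) for p, f in zip(self.proj, feats)]
        while len(fs) > 1:  # each pass merges neighbours: 5 -> 4 -> ... -> 1
            nxt = []
            for fine, coarse in zip(fs[:-1], fs[1:]):
                up = F.interpolate(coarse, size=fine.shape[2:],
                                   mode="bilinear", align_corners=False)
                nxt.append(self.fuse(torch.cat([fine, up], dim=1)))
            fs = nxt
        return self.head(fs[0])  # single-channel map, pre-sigmoid

decoder = TPNet()
# P_est = torch.sigmoid(decoder([f1, f2, f3, f4, f5]))  # equation (7)
```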
6) The loss function between the saliency map P_est predicted by the invention and the manually labeled salient object segmentation map P_GT is calculated, the parameter weights of the proposed model are updated step by step through SGD (stochastic gradient descent) and the back-propagation algorithm, and the structure and parameter weights of the RGB-D salient object detection algorithm are finally determined.
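A minimal sketch of this training step, assuming a binary cross-entropy loss (the patent names only a loss function and SGD) and hypothetical model and train_loader objects:

```python
import torch
import torch.nn as nn

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9,
                            weight_decay=5e-4)  # hyper-parameters are assumptions
criterion = nn.BCEWithLogitsLoss()  # BCE on logits, i.e. before the sigmoid

for rgb, depth, gt in train_loader:  # batches of (P_RGB, P_Depth, P_GT)
    logits = model(rgb, depth)       # full network up to (not including) sigmoid
    loss = criterion(logits, gt)     # loss between P_est and P_GT
    optimizer.zero_grad()
    loss.backward()                  # back-propagation
    optimizer.step()                 # SGD weight update
```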
7) On the basis of the model structure and parameter weights determined in step 6, the RGB-D image pairs of the test sets are tested to generate saliency maps P_test, which are evaluated with the MAE, S-measure, F-measure and E-measure evaluation indexes.
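For illustration, the MAE index can be computed as below; S-measure, F-measure and E-measure follow their standard definitions in the salient object detection literature and are omitted for brevity.

```python
import torch

def mae(pred: torch.Tensor, gt: torch.Tensor) -> float:
    """Mean absolute error between a predicted saliency map P_test and the
    ground truth P_GT, both (H, W) maps normalized to [0, 1]."""
    return (pred - gt).abs().mean().item()
```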
The above description covers only preferred embodiments of the present application and is not intended to limit it; various modifications and variations of the present application will occur to those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in its protection scope.
Claims (5)
1. An RGB-D salient object detection method based on interactive guidance attention and trapezoidal pyramid fusion, characterized by comprising the following steps:
1) acquiring an RGB-D data set for training and testing the task, defining the algorithm target, and determining the training set and test sets used to train and test the algorithm;
2) constructing an RGB encoder for extracting RGB image features and a Depth encoder for extracting Depth image features;
3) establishing a cross-modal feature fusion network in which an interactively guided attention mechanism guides the cross fusion of the RGB image features and the Depth image features;
4) constructing, on the multi-modal features obtained from the cross-modal feature fusion, an ultra-large-scale receptive field fusion mechanism to enhance the receptive field information and high-level semantic information of the multi-modal features;
5) establishing a decoder based on the trapezoidal pyramid feature fusion network and obtaining the final predicted saliency map through an activation function;
6) calculating a loss function between the predicted saliency map P_est and the manually labeled salient object segmentation map P_GT, updating the parameter weights of the model step by step through SGD (stochastic gradient descent) and the back-propagation algorithm, and finally determining the structure and parameter weights of the RGB-D salient object detection algorithm;
7) on the basis of the model structure and parameter weights determined in step 6), testing the RGB-D image pairs of the test sets to generate saliency maps P_test and evaluating performance with the evaluation indexes.
2. The RGB-D salient object detection method based on interactive guidance attention and trapezoidal pyramid fusion according to claim 1, characterized in that the specific method of step 1) is as follows:
1.1) the NJUD, NLPR and DUT-RGBD data sets are used as the training set, and the remaining NLPR images, the remaining DUT-RGBD images, the SIP data set, the STERE data set and the SSD data set are used as the test sets;
1.2) the RGB-D image data set comprises RGB images P_RGB, corresponding Depth images P_Depth, and corresponding manually labeled salient object segmentation maps P_GT.
3. The RGB-D salient object detection method based on interactive guidance attention and trapezoidal pyramid fusion according to claim 1, characterized in that the specific method of step 2) is as follows:
2.1) VGG16 is used as the backbone network of the model to extract the RGB image features and the corresponding Depth image features, denoted f_i^r and f_i^d respectively;
2.2) the VGG16 weights of the backbone networks are initialized with VGG16 parameter weights pre-trained on the ImageNet data set.
4. The RGB-D salient object detection method based on interactive guidance attention and trapezoidal pyramid fusion according to claim 1, characterized in that the specific method of step 3) is as follows:
3.1) the cross-modal feature fusion network is composed of CMAF modules at 5 levels and generates 5 levels of multi-modal features f_i^rd, where i ∈ {1, 2, 3, 4, 5}.
5. The RGB-D salient object detection method based on interactive guidance attention and trapezoidal pyramid fusion according to claim 1, characterized in that the specific methods of steps 4) to 6) are as follows:
4.1) ultra-large-scale receptive field information is extracted from the multi-scale multi-modal features through hole convolution operations with different dilation rates:
f_i^{rd,k} = DLA_k(f_i^rd), k ∈ {1, 2, 4, 8}   equation (1)
where i ∈ {1, 2, 3, 4, 5} denotes the level at which the multi-modal feature resides, DLA_1(), DLA_2(), DLA_4() and DLA_8() denote hole convolution operations with dilation rates of 1, 2, 4 and 8 respectively, and f_i^{rd,k} denotes the feature with dilation rate k generated from the multi-modal feature of the i-th level;
4.2) the multi-modal features with multi-level receptive fields generated above are fed into the trapezoidal pyramid feature fusion network, which fuses the multi-modal features of different receptive fields:
f_i = TPNet(f_i^{rd,1}, f_i^{rd,2}, f_i^{rd,4}, f_i^{rd,8})   equation (2)
where TPNet() denotes the trapezoidal pyramid feature fusion network;
5) the 5 levels of ultra-large-scale receptive field multi-modal features obtained in step 4) are fed into the decoder formed by the trapezoidal pyramid feature fusion network to obtain the final fused feature, and the predicted saliency map P_est is obtained after sigmoid activation:
P_est = sigmoid(TPNet(f_1, f_2, f_3, f_4, f_5))   equation (3)
6) the loss function between the predicted saliency map P_est and the manually labeled salient object segmentation map P_GT is calculated, the parameter weights of the model are updated step by step through SGD (stochastic gradient descent) and the back-propagation algorithm, and the structure and parameter weights of the RGB-D salient object detection algorithm are finally determined.
Priority Applications (1)
- CN202111565805.7A, filed 2021-12-17 (priority date 2021-12-17): RGB-D salient object detection method based on interactive guidance attention and trapezoidal pyramid fusion (granted as CN114283315B)
Publications (2)
- CN114283315A, published 2022-04-05
- CN114283315B, published 2024-08-16
Family
- ID=80873250
Family Applications (1)
- CN202111565805.7A (CN), filed 2021-12-17, status Active, granted as CN114283315B
Cited By (3)
- CN115082553A (priority 2022-08-23, published 2022-09-20), Qingdao Yunzhiju Intelligent Technology Co., Ltd.: Logistics package position detection method and system
- CN115439726A (priority 2022-11-07, published 2022-12-06), Tencent Technology (Shenzhen) Co., Ltd.: Image detection method, device, equipment and storage medium
- CN117854009A (priority 2024-01-29, published 2024-04-09), Nantong University: Cross-collaboration fusion lightweight cross-modal crowd counting method
Patent Citations (4)
- CN111242238A (priority 2020-01-21, published 2020-06-05), Beijing Jiaotong University: Method for acquiring RGB-D image saliency target
- US2021/0390723A1 (priority 2020-06-15, published 2021-12-16), Dalian University of Technology: Monocular unsupervised depth estimation method based on contextual attention mechanism
- CN112347859A (priority 2020-10-15, published 2021-02-09), Beijing Jiaotong University: Optical remote sensing image saliency target detection method
- CN113763422A (priority 2021-07-30, published 2021-12-07), Beijing Jiaotong University: RGB-D image saliency target detection method
Non-Patent Citations (1)
- Shen Wenxiang; Qin Pinle; Zeng Jianchao: "Indoor crowd detection network based on multi-level features and hybrid attention mechanism", Journal of Computer Applications (计算机应用), no. 12, 15 October 2019
Also Published As
- CN114283315B, published 2024-08-16
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant