CN111814895A - Salient object detection method based on absolute and relative depth induction network - Google Patents

Salient object detection method based on absolute and relative depth induction network

Info

Publication number
CN111814895A
CN111814895A (application CN202010695446.6A)
Authority
CN
China
Prior art keywords
depth
absolute
feature
network
relative depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010695446.6A
Other languages
Chinese (zh)
Inventor
杨钢
尹学玲
卢湖川
岳廷秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Institute Of Artificial Intelligence Dalian University Of Technology
Original Assignee
Dalian Institute Of Artificial Intelligence Dalian University Of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Institute Of Artificial Intelligence Dalian University Of Technology filed Critical Dalian Institute Of Artificial Intelligence Dalian University Of Technology
Priority to CN202010695446.6A priority Critical patent/CN111814895A/en
Publication of CN111814895A publication Critical patent/CN111814895A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a salient object detection method based on absolute and relative depth induction networks, which comprises the following steps: training a depth induction network with a residual network as the backbone; cross-modal feature fusion by the absolute depth induction module to locate objects; and construction of a spatial geometric model by the relative depth induction module to supplement detail information. The method not only extracts RGB image features with the residual network but also exploits depth information to assist the salient object detection task. The absolute depth induction module fuses the RGB image features and the depth image information in a coarse-to-fine manner, avoiding the cluttered noise interference caused by the unsynchronized characteristics of the two feature spaces; the relative depth induction module builds a spatial graph convolution model to explore spatial structure and geometric information and enhance the local feature representation, thereby improving detection accuracy and robustness, achieving excellent detection results, and offering broad application prospects.

Description

Salient object detection method based on absolute and relative depth induction network
Technical Field
The invention belongs to the technical field of salient object detection, and in particular relates to a salient object detection method based on absolute and relative depth induction networks.
Background
Salient object detection, which aims to locate and segment the most visually distinctive objects in an image, is a fundamental operation in computer image processing. In recent years it has been widely applied in fields such as image retargeting, scene classification, visual tracking, and semantic segmentation. Before subsequent image processing operations, a computer can use saliency detection to filter out irrelevant information, greatly reducing the amount of image processing and improving efficiency.
Early salient object detection methods were mainly designed to detect salient objects using hand-crafted features such as brightness, color, and texture. In recent years, with the development of CNNs, a variety of deep learning based models have been proposed. In 2017, Hou et al. proposed short connections between layers and used them to aggregate feature maps from multiple scales. Also in 2017, Zhang et al. explored multi-level features at each scale and generated saliency maps in a recursive manner. In 2019, Feng et al. proposed an attention feedback module to better explore the structure of salient objects. However, these recently proposed methods still face difficulties in extremely complex situations such as semantically complex backgrounds, low-light environments, and transparent objects. To solve this problem, we propose to supplement the RGB image with depth information, so that the spatial structure and 3D geometric information of the scene can be explored and the effectiveness and robustness of the network improved.
The features extracted by traditional RGB-D salient object detection methods lack global context information and semantic cues. In recent years, the efficient integration of depth and RGB features has become a key issue for this task. In 2019, Zhao et al. designed a contrast loss to exploit the contrast prior in depth images; an attention map is generated by fusing the refined depth and RGB features, and the final saliency map is output by a fluid pyramid integration strategy that fully exploits multi-scale cross-modal features. In 2019, Piao et al. hierarchically integrated depth and RGB images and refined the final saliency map with a recurrent attention model. However, current methods fuse depth and RGB features whose feature spaces are not synchronized, which introduces cluttered noise into the network.
In summary, existing salient object detection techniques have the following shortcomings. First, most existing methods extract features only from the RGB image, which is insufficient to distinguish a salient object from a cluttered background region. Second, most existing methods extract depth and RGB features with separate networks and fuse them directly using various strategies; however, the two modal feature spaces are not consistent, and fusing them directly leads to noisy responses in the prediction. Third, although salient objects can be accurately located by the absolute depth induction module, the detailed saliency information of local regions is not deeply explored, which also limits further improvement of model performance.
Disclosure of Invention
Technical problem to be solved
Aiming at the shortcomings of the prior art, the invention provides a salient object detection method based on absolute and relative depth induction networks, which solves the problems mentioned in the background.
(II) Technical scheme
In order to achieve the above purpose, the invention provides the following technical scheme: a salient object detection method based on absolute and relative depth induction networks, comprising the following steps:
a. depth induction network training with a residual network as the backbone: removing the final pooling layer and the fully connected layer of ResNet-50, uniformly resizing the network input images to 256 × 256 and normalizing the data set, generating corresponding side-output maps from the feature maps produced by the five convolution blocks in a pyramid fashion, and then performing top-down fusion in the network;
b. cross-modal feature fusion by the absolute depth induction module to locate objects: feeding the depth image of the input image into a group of convolution layers to obtain a depth feature map of the same size as the Res2_x feature map, applying the absolute depth induction network multiple times to integrate the depth feature map with the RGB feature maps in a recursive manner, achieving cross-modal feature fusion, avoiding the noise interference caused by naively fusing two unsynchronized modal features, strengthening the interaction between depth and color features, and adaptively fusing the RGB and depth features at each scale;
c. building a spatial geometric model with the relative depth induction module to supplement detail information: first up-sampling the feature map from the final stage Res5_x of the decoding network and integrating it with the cross-modally fused feature map obtained by the absolute depth induction module to generate a new feature map, then jointly feeding the new feature map and the depth map generated by the absolute depth induction module into the relative depth induction module to explore the spatial structure and detailed saliency information of the image and to embed the relative depth information into the network, improving the performance of the saliency model.
Further, when unifying the sizes of the network input images in step a, the data set is processed using a bilinear interpolation method.
Further, when the side-output maps are generated in step a, the output feature maps of the four residual blocks are fed into a 1 × 1 convolution layer and the channels of the feature maps are reduced to obtain the side-output maps, which are used for the subsequent top-down integration of the multi-level feature maps.
Further, for the recursive integration of the depth feature map and the RGB feature map described in step b, the absolute depth induction network is implemented with a gated recurrent unit (GRU), which is designed to handle sequence problems; the multi-scale feature integration process is formulated as a sequence problem, and each scale is treated as a time step.
Furthermore, at each time step, the dimension of the depth feature map is reduced, the depth and RGB feature maps are then concatenated and transformed by global max pooling to generate a new feature vector, and the RGB and depth feature maps are adaptively fused at each scale through operations such as a fully connected layer.
Further, in step c, the spatial structure and detailed saliency information of the image are explored by the relative depth induction module, which exploits relative depth information through a graph convolution network (GCN).
Further, the proposed graph convolution network (GCN) projects image pixels into 3D space according to their spatial positions and depth values, overcoming the problem that pixels adjacent in 2D space are not necessarily strongly related in 3D point cloud space, propagates information within local regions according to short-range relative depth relations, and progressively enhances the local feature representation by exploring spatial structure and geometric information at multiple scales.
(III) Advantageous effects
Compared with the prior art, the salient object detection method based on absolute and relative depth induction networks provided by the invention has the following beneficial effects:
the method not only extracts RGB image features from a residual error network, but also provides the method for assisting a saliency target detection task by using depth information, most of the existing RGB-D models only simply extract depth and RGB features and fusion the depth and RGB features heuristically, and an absolute depth induction module is used for cross-modal fusion of the RGB image features and the depth image information in a mode from coarse to fine, so that the disordered noise interference caused by the asynchronous characteristics of two spaces is avoided, and an object is accurately positioned; the relative depth induction module is utilized to establish a spatial graph convolution model to explore spatial structure and geometric information so as to enhance the local feature representation capability, thereby improving the detection accuracy and robustness, achieving excellent detection effect, being conductive to fusion with other fields and having wide application prospect.
Drawings
Fig. 1 is a flow chart of the salient object detection method based on absolute and relative depth induction networks proposed by the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a technical solution: a salient object detection method based on absolute and relative depth induction networks, comprising the following steps.
deep induced network training with a residual error network as a backbone network: and removing the last pooling layer and the full-link layer of Resnet-50, uniformly adjusting the network input image to 256 multiplied by 256, carrying out normalization processing on the data set, and generating a corresponding side output image by the feature map generated by the five convolution blocks in a pyramid mode. Then the fusion operation is carried out from top to bottom in the network.
Specifically, after removing the final pooling layer and the fully connected layer of ResNet-50, the backbone network consists of five convolution blocks Conv1, Res2_x, ..., Res5_x. Given an input RGB image of size 3 × W × H, the convolution blocks respectively produce feature maps {f_l^rgb, l = 1, ..., 5}. The shallower layers capture low-level information of the image, such as texture and spatial detail, while the deeper feature maps contain high-level semantic information. We merge the feature maps in a pyramidal fashion: the channels of each f_l^rgb are reduced to C with a 1 × 1 convolution kernel, yielding side-output maps {s_l}. The multi-level feature maps are then integrated in a top-down manner,

f_l^dec = σ(W_l * CAT[s_l, UP(f_{l+1}^dec)] + b_l)   (1)

where σ is the ReLU activation function, CAT[·] is a concatenation operation that joins two feature maps along the channel dimension, UP is an up-sampling operation with bilinear interpolation, and W_l, b_l are trainable parameters of the network.
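To make the top-down integration concrete, the following is a minimal sketch of the side-output reduction and the decoder fusion of equation (1), assuming a PyTorch implementation; the module names, the channel width C = 64 and the 3 × 3 fusion kernel are illustrative assumptions rather than the patented implementation.

```python
# Minimal sketch of side-output reduction and equation (1); shapes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SideOutput(nn.Module):
    """Reduce a backbone feature map to C channels with a 1x1 convolution."""
    def __init__(self, in_channels, out_channels=64):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, f_rgb):
        return self.reduce(f_rgb)

class TopDownFusion(nn.Module):
    """Equation (1): f_l_dec = ReLU(W_l * CAT[s_l, UP(f_{l+1}_dec)] + b_l)."""
    def __init__(self, channels=64):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, s_l, f_dec_higher):
        # up-sample the coarser decoder feature to the side-output resolution
        up = F.interpolate(f_dec_higher, size=s_l.shape[-2:],
                           mode='bilinear', align_corners=False)
        # concatenate along channels, convolve, apply ReLU
        return F.relu(self.conv(torch.cat([s_l, up], dim=1)))

# Example with hypothetical Res4_x / Res5_x channel counts of ResNet-50.
s4 = SideOutput(1024)(torch.randn(1, 1024, 16, 16))
f5 = SideOutput(2048)(torch.randn(1, 2048, 8, 8))
f4_dec = TopDownFusion()(s4, f5)   # (1, 64, 16, 16)
```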
The absolute depth induction module performs cross-modal feature fusion to locate objects: first, the input depth image D of size W × H is fed into a set of convolution layers to generate a depth feature map f_d. The absolute depth induction module (ADIM) is then applied multiple times to recursively integrate the depth feature map f_d with the RGB feature maps f_l^rgb and strengthen the interaction between depth and color features,

(f_d^(l), f_l^ad) = ADIM(f_l^rgb, f_d^(l+1))   (2)

where f_d^(l) is the updated depth feature and f_l^ad is the aggregated result of depth and RGB information in the l-th layer.
According to the above embodiment, ADIM is preferably implemented with a gated recurrent unit (GRU), which is designed to handle sequence problems; we formulate the multi-scale feature integration process as a sequence problem and treat each scale as a time step. At each time step, the RGB feature f_l^rgb is treated as the input of the GRU and the depth feature f_d^(l+1) as the hidden state of the previous step. The two feature maps are concatenated and transformed by a global max-pooling (GMP) operation to generate a feature vector; a fully connected layer is then applied over this feature vector to generate a reset gate r and an update gate z, whose values are normalized by a sigmoid function. In practice, the gate r controls the integration of the depth and RGB features, while z controls the update of the depth feature. In this way, RGB and depth features can be adaptively fused at each scale, and the interaction between depth and RGB features is enhanced by the network. The generated multi-scale feature maps f_l^ad are then combined with the feature maps in the decoding stage, i.e. equation (1) is re-expressed as

f_l^dec = σ(W_l * CAT[s_l, f_l^ad, UP(f_{l+1}^dec)] + b_l)   (3)
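The sketch below illustrates the GRU-style gated fusion described above, again assuming PyTorch; the patent only specifies the roles of the reset gate r and the update gate z, so the exact gating equations here are an assumption modeled on a standard GRU cell, and the two feature maps are assumed to share one spatial size.

```python
# Hedged sketch of the absolute depth induction module (ADIM) gating.
import torch
import torch.nn as nn

class ADIM(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # gates are predicted from a globally max-pooled descriptor of both modalities
        self.gate_fc = nn.Linear(2 * channels, 2 * channels)
        self.candidate = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, f_rgb, f_depth):
        b, c, _, _ = f_rgb.shape
        # global max pooling of the concatenated maps -> one descriptor per channel
        desc = torch.cat([f_rgb, f_depth], dim=1).amax(dim=(2, 3))   # (B, 2C)
        gates = torch.sigmoid(self.gate_fc(desc))                    # sigmoid-normalized
        r = gates[:, :c].view(b, c, 1, 1)                            # reset gate
        z = gates[:, c:].view(b, c, 1, 1)                            # update gate
        # r controls how much depth information enters the cross-modal fusion
        fused = torch.relu(self.candidate(torch.cat([f_rgb, r * f_depth], dim=1)))
        # z controls how strongly the depth state is rewritten
        f_depth_new = z * fused + (1.0 - z) * f_depth
        return f_depth_new, fused   # updated depth feature, aggregated feature f_l^ad

f_d_new, f_ad = ADIM()(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```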
the relative depth induction module establishes a space geometric model to supplement detail information: the Relative Depth Induction Module (RDIM) is used in the decoding phase, first the feature map from the last phase of the decoding network
Figure BDA0002589987510000081
Performing upsampling and feature mapping
Figure BDA0002589987510000082
Integrated together, the generated feature map is represented as described in equation (3)
Figure BDA0002589987510000083
RDIM is then applied to the feature graph
Figure BDA0002589987510000084
And depth image to embed relative depth information in the network
Figure BDA0002589987510000085
According to the above embodiment, RDIM is preferably implemented with a graph convolution network (GCN). To explore the relative depth relationship between pixels, we first build a graph G = (V, E) from the feature map f_l^dec produced with the help of ADIM, where V is the node set and E is the edge set. Each node n_i of the graph is treated as a point in a 3D coordinate system with coordinates (x_i, y_i, d_i), where (x_i, y_i) is the spatial location in the feature map f_l^dec and d_i is the corresponding depth value. The node set is written as V = {n_1, n_2, ..., n_k}, where k is the number of nodes. An edge e_{i,j} ∈ E is defined between each node and its m nearest neighbors in 3D coordinates, and the weight w_{i,j} of edge e_{i,j} is computed as a relative depth value that measures the spatial correlation between nodes n_i and n_j,

w_{i,j} = |(x_i, y_i, d_i) - (x_j, y_j, d_j)|   (5)

To describe the semantic relationship between nodes n_i and n_j, an attribute feature a_{i,j} is defined for edge e_{i,j}. To further take the global context information of the image into account, global average pooling (GAP) is applied to the feature map f_l^dec to extract high-level semantic information and output a feature vector f_g.

The spatial GCN consists of a set of stacked graph convolution layers (GCLs). For each GCL, the attribute feature a_{i,j} of edge e_{i,j} is updated first,

a_{i,j} = MLP(CAT[h_i, h_j, f_g])   (6)

where h_i and h_j are the features of the feature map f_l^dec at positions (x_i, y_i) and (x_j, y_j), respectively. An MLP is then used to update the feature of each node,

h_i = MLP(CAT[h_i, Σ_{n_j ∈ N(n_i)} w_{i,j} · a_{i,j}])   (7)

where N(n_i) is the set of neighboring pixels of node n_i and w_{i,j} serves as the weight of edge e_{i,j}, so that RDIM focuses more on regions with larger relative distances; in this way messages are transmitted along the edges from the neighboring nodes. The updated features of all nodes are then fed to a global max pooling layer to obtain an updated global feature vector f_g. Finally, the output of the last GCL yields the feature map f_l^rd, the overall output of RDIM at scale l. By using the GCN to pass messages between nodes, the feature of each node is updated and refined according to its relationship with its neighboring nodes. In our network, RDIM is applied at levels 3 and 4 of the decoding stage, and the generated RDIM feature map is then fed into the next decoding stage.
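The following sketch shows one way the RDIM graph of equations (5)-(7) could be built and traversed, assuming PyTorch; the neighbor count m, the MLP width and the aggregation rule are assumptions, and the edge attribute a_{i,j} and global vector f_g are folded into a single weighted message for brevity.

```python
# Simplified sketch of graph construction and one graph convolution layer for RDIM.
import torch
import torch.nn as nn

def build_relative_depth_graph(feat, depth, m=8):
    """feat: (C, H, W) feature map, depth: (H, W) depth map.
    Each pixel becomes a node with 3D coordinates (x, y, d); every node is linked
    to its m nearest neighbors in that 3D space, and the edge weight is the
    distance |(x_i, y_i, d_i) - (x_j, y_j, d_j)| of equation (5)."""
    C, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    coords = torch.stack([xs.reshape(-1).float(),
                          ys.reshape(-1).float(),
                          depth.reshape(-1)], dim=1)        # (N, 3)
    dist = torch.cdist(coords, coords)                      # pairwise 3D distances
    dist.fill_diagonal_(float('inf'))                       # no self-loops
    w, nbr = dist.topk(m, largest=False)                    # weights and neighbor ids
    nodes = feat.reshape(C, -1).t()                         # (N, C) node features
    return nodes, nbr, w

class GraphConvLayer(nn.Module):
    """Aggregate neighbor features weighted by relative depth, then update each node."""
    def __init__(self, channels):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * channels, channels), nn.ReLU())

    def forward(self, nodes, nbr, w):
        # larger relative distance -> larger weight, so regions with larger
        # relative distances receive more attention in the aggregation
        msg = (nodes[nbr] * w.unsqueeze(-1)).mean(dim=1)     # (N, C)
        return self.mlp(torch.cat([nodes, msg], dim=1))

nodes, nbr, w = build_relative_depth_graph(torch.randn(64, 16, 16), torch.rand(16, 16))
nodes = GraphConvLayer(64)(nodes, nbr, w)
```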
The feature map generated in the last decoding stage is selected to predict the final saliency map, since it combines absolute and relative depth information. A bilinear interpolation operation is first used to up-sample this feature map to the input size, and it is then fed into a single-channel convolution layer to obtain the final saliency map S. During training, the final saliency map is supervised by the ground-truth map G through a cross-entropy loss function,

L = -Σ_{i,j} [ G_{i,j} log(S_{i,j}) + (1 - G_{i,j}) log(1 - S_{i,j}) ]   (8)

where G_{i,j} and S_{i,j} are the saliency values at position (i, j) of the ground-truth map and the saliency map, respectively.
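A minimal sketch of the single-channel prediction head and the supervision of equation (8), assuming PyTorch; the 64-channel decoder width is an assumption, and the loss is averaged over pixels here rather than summed.

```python
# Sketch of the saliency head and the cross-entropy supervision of equation (8).
import torch
import torch.nn as nn
import torch.nn.functional as F

head = nn.Conv2d(64, 1, kernel_size=1)            # single-channel output layer

def saliency_loss(f_dec, gt):
    """f_dec: (B, 64, h, w) last decoder feature, gt: (B, 1, H, W) ground-truth map G."""
    up = F.interpolate(f_dec, size=gt.shape[-2:],
                       mode='bilinear', align_corners=False)   # up-sample to input size
    s = torch.sigmoid(head(up))                                # final saliency map S
    # L = -sum( G*log(S) + (1-G)*log(1-S) ), here averaged over all pixels
    return F.binary_cross_entropy(s, gt)

loss = saliency_loss(torch.randn(2, 64, 64, 64), torch.rand(2, 1, 256, 256).round())
```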
The invention not only extracts RGB image features with a residual network but also uses depth information to assist the salient object detection task. Most existing RGB-D models simply extract depth and RGB features and fuse them heuristically; here, an absolute depth induction module is designed to fuse the RGB image features and the depth image information in a coarse-to-fine manner, avoiding the cluttered noise interference caused by the unsynchronized characteristics of the two feature spaces and locating objects accurately. Meanwhile, a relative depth induction module is designed to build a spatial graph convolution model that explores spatial structure and geometric information to enhance the local feature representation, thereby improving detection accuracy and robustness and achieving excellent detection results; the method is also conducive to integration with other fields and has broad application prospects.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
In the description of the present invention, it should be noted that the terms "upper", "lower", "inner", "outer", "front", "rear", "both ends", "one end", "the other end", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A salient object detection method based on absolute and relative depth induction networks, characterized by comprising the following steps:
a. depth induction network training with a residual network as the backbone: removing the final pooling layer and the fully connected layer of ResNet-50, uniformly resizing the network input images to 256 × 256 and normalizing the data set, generating corresponding side-output maps from the feature maps produced by the five convolution blocks in a pyramid fashion, and then performing top-down fusion in the network;
b. cross-modal feature fusion by the absolute depth induction module to locate objects: feeding the depth image of the input image into a group of convolution layers to obtain a depth feature map of the same size as the Res2_x feature map, applying the absolute depth induction network multiple times to integrate the depth feature map with the RGB feature maps in a recursive manner, achieving cross-modal feature fusion, avoiding the noise interference caused by naively fusing two unsynchronized modal features, strengthening the interaction between depth and color features, and adaptively fusing the RGB and depth features at each scale;
c. building a spatial geometric model with the relative depth induction module to supplement detail information: first up-sampling the feature map from the final stage Res5_x of the decoding network and integrating it with the cross-modally fused feature map obtained by the absolute depth induction module to generate a new feature map, then jointly feeding the new feature map and the depth map generated by the absolute depth induction module into the relative depth induction module to explore the spatial structure and detailed saliency information of the image and to embed the relative depth information into the network, improving the performance of the saliency model.
2. The salient object detection method based on absolute and relative depth induction networks according to claim 1, wherein: when unifying the sizes of the network input images in step a, the data set is processed using a bilinear interpolation method.
3. The salient object detection method based on absolute and relative depth induction networks according to claim 1, wherein: when the side-output maps are generated in step a, the output feature maps of the four residual blocks are fed into a 1 × 1 convolution layer and the channels of the feature maps are reduced to obtain the side-output maps, which are used for the subsequent top-down integration of the multi-level feature maps.
4. The salient object detection method based on absolute and relative depth induction networks according to claim 1, wherein: for the recursive integration of the depth feature map and the RGB feature map described in step b, the absolute depth induction network is implemented with a gated recurrent unit (GRU), which is designed to handle sequence problems; the multi-scale feature integration process is formulated as a sequence problem, and each scale is treated as a time step.
5. The salient object detection method based on absolute and relative depth induction networks according to claim 1, wherein: at each time step, the dimension of the depth feature map is reduced, the depth and RGB feature maps are then concatenated and transformed by global max pooling to generate a new feature vector, and the RGB and depth feature maps are adaptively fused at each scale through operations such as a fully connected layer.
6. The salient object detection method based on absolute and relative depth induction networks according to claim 1, wherein: in step c, the spatial structure and detailed saliency information of the image are explored by the relative depth induction module, which exploits relative depth information through a graph convolution network (GCN).
7. The salient object detection method based on absolute and relative depth induction networks according to claim 1, wherein: the graph convolution network (GCN) projects image pixels into 3D space according to their spatial positions and depth values, overcoming the problem that pixels adjacent in 2D space are not necessarily strongly related in 3D point cloud space, propagates information within local regions according to short-range relative depth relations, and progressively enhances the local feature representation by exploring spatial structure and geometric information at multiple scales.
CN202010695446.6A 2020-07-17 2020-07-17 Salient object detection method based on absolute and relative depth induction network Pending CN111814895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010695446.6A CN111814895A (en) Salient object detection method based on absolute and relative depth induction network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010695446.6A CN111814895A (en) Salient object detection method based on absolute and relative depth induction network

Publications (1)

Publication Number Publication Date
CN111814895A true CN111814895A (en) 2020-10-23

Family

ID=72866457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010695446.6A Pending CN111814895A (en) Salient object detection method based on absolute and relative depth induction network

Country Status (1)

Country Link
CN (1) CN111814895A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076947A (en) * 2021-03-26 2021-07-06 东北大学 RGB-T image significance detection system with cross-guide fusion
CN113537279A (en) * 2021-05-18 2021-10-22 齐鲁工业大学 COVID-19 identification system based on similar residual convolution and LSTM
CN113963081A (en) * 2021-10-11 2022-01-21 华东师范大学 Intelligent image chart synthesis method based on graph convolution network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170351941A1 (en) * 2016-06-03 2017-12-07 Miovision Technologies Incorporated System and Method for Performing Saliency Detection Using Deep Active Contours
WO2019136946A1 (en) * 2018-01-15 2019-07-18 中山大学 Deep learning-based weakly supervised salient object detection method and system
CN110210539A (en) * 2019-05-22 2019-09-06 西安电子科技大学 The RGB-T saliency object detection method of multistage depth characteristic fusion
CN110399907A (en) * 2019-07-03 2019-11-01 杭州深睿博联科技有限公司 Thoracic cavity illness detection method and device, storage medium based on induction attention
AU2020100274A4 (en) * 2020-02-25 2020-03-26 Huang, Shuying DR A Multi-Scale Feature Fusion Network based on GANs for Haze Removal
CN111242138A (en) * 2020-01-11 2020-06-05 杭州电子科技大学 RGBD significance detection method based on multi-scale feature fusion
CN111242238A (en) * 2020-01-21 2020-06-05 北京交通大学 Method for acquiring RGB-D image saliency target

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170351941A1 (en) * 2016-06-03 2017-12-07 Miovision Technologies Incorporated System and Method for Performing Saliency Detection Using Deep Active Contours
WO2019136946A1 (en) * 2018-01-15 2019-07-18 中山大学 Deep learning-based weakly supervised salient object detection method and system
CN110210539A (en) * 2019-05-22 2019-09-06 西安电子科技大学 The RGB-T saliency object detection method of multistage depth characteristic fusion
CN110399907A (en) * 2019-07-03 2019-11-01 杭州深睿博联科技有限公司 Thoracic cavity illness detection method and device, storage medium based on induction attention
CN111242138A (en) * 2020-01-11 2020-06-05 杭州电子科技大学 RGBD significance detection method based on multi-scale feature fusion
CN111242238A (en) * 2020-01-21 2020-06-05 北京交通大学 Method for acquiring RGB-D image saliency target
AU2020100274A4 (en) * 2020-02-25 2020-03-26 Huang, Shuying DR A Multi-Scale Feature Fusion Network based on GANs for Haze Removal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liu Zhengyi; Duan Quntao; Shi Song; Zhao Peng: "RGB-D image saliency detection based on multi-modal feature fusion supervision", Journal of Electronics & Information Technology, no. 04, 15 April 2020 (2020-04-15), pages 206 - 213 *
Chen Kai; Wang Yongxiong: "Saliency detection combining spatial attention and multi-level feature fusion", Journal of Image and Graphics, no. 06, 16 June 2020 (2020-06-16), pages 66 - 77 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076947A (en) * 2021-03-26 2021-07-06 东北大学 RGB-T image significance detection system with cross-guide fusion
CN113076947B (en) * 2021-03-26 2023-09-01 东北大学 Cross-guided fusion RGB-T image saliency detection system
CN113537279A (en) * 2021-05-18 2021-10-22 齐鲁工业大学 COVID-19 identification system based on similar residual convolution and LSTM
CN113963081A (en) * 2021-10-11 2022-01-21 华东师范大学 Intelligent image chart synthesis method based on graph convolution network
CN113963081B (en) * 2021-10-11 2024-05-17 华东师范大学 Image chart intelligent synthesis method based on graph convolution network

Similar Documents

Publication Publication Date Title
Cong et al. Global-and-local collaborative learning for co-salient object detection
TWI821671B (en) A method and device for positioning text areas
Leng et al. Robust obstacle detection and recognition for driver assistance systems
Wang et al. Background-driven salient object detection
CN111814895A (en) Significance target detection method based on absolute and relative depth induction network
CN107392930B (en) Quantum Canny edge detection method
Saber et al. Partial shape recognition by sub-matrix matching for partial matching guided image labeling
Shen et al. ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection
Wu et al. Handmap: Robust hand pose estimation via intermediate dense guidance map supervision
CN111480169A (en) Method, system and apparatus for pattern recognition
EP3992908A1 (en) Two-stage depth estimation machine learning algorithm and spherical warping layer for equi-rectangular projection stereo matching
Wang et al. Towards Weakly Supervised Semantic Segmentation in 3D Graph-Structured Point Clouds of Wild Scenes.
Chen et al. Image splicing localization using residual image and residual-based fully convolutional network
Hu et al. RGB-D image multi-target detection method based on 3D DSF R-CNN
Zhang et al. R2Net: Residual refinement network for salient object detection
Yao et al. As‐global‐as‐possible stereo matching with adaptive smoothness prior
CN114445479A (en) Equal-rectangular projection stereo matching two-stage depth estimation machine learning algorithm and spherical distortion layer
Tang et al. SDRNet: An end-to-end shadow detection and removal network
CN116385660A (en) Indoor single view scene semantic reconstruction method and system
Zhou et al. FANet: Feature aggregation network for RGBD saliency detection
Long et al. Adaptive surface normal constraint for geometric estimation from monocular images
CN111914809B (en) Target object positioning method, image processing method, device and computer equipment
Xie et al. 3D surface segmentation from point clouds via quadric fits based on DBSCAN clustering
CN117315724A (en) Open scene-oriented three-dimensional pedestrian detection method, system, equipment and medium
Haryono et al. Oriented object detection in satellite images using convolutional neural network based on ResNeXt

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination