CN110020658B - Salient object detection method based on multitask deep learning - Google Patents


Info

Publication number
CN110020658B
CN110020658B (application CN201910243220.XA)
Authority
CN
China
Prior art keywords
features
module
network
deconvolution
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910243220.XA
Other languages
Chinese (zh)
Other versions
CN110020658A (en)
Inventor
张立和
吴杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201910243220.XA
Publication of CN110020658A
Application granted
Publication of CN110020658B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of deep learning and discloses a salient object detection method based on multi-task deep learning. A multi-task salient object detection network is built on the standard VGG16 model: a residual module that contrasts semantic and local features extracts richer local and semantic information, and two task networks then learn interactively, so that each network can learn the other's features and compensate for the shortcomings of its own. Compared with prior methods, the detection results are more accurate. For images containing multiple objects, or objects similar to the background, the results better match human visual perception, and the resulting saliency maps are more accurate. In addition, because the companion object-contour network is sensitive to object contours, the edges of the detected salient objects are greatly improved.

Description

Salient object detection method based on multitask deep learning
Technical Field
The invention belongs to the technical field of deep learning and relates to a computer-vision task known as salient object detection.
Background
With the development of science and technology, the amount of image and video data people receive has grown explosively, and processing image data quickly and effectively has become an urgent problem. Typically, a viewer attends only to the more salient regions of an image that attract the human eye, i.e. the foreground regions or salient objects, and ignores the background. Saliency detection therefore uses a computer to simulate the human visual system. Salient object detection is now applied as a preprocessing step in many areas of computer vision, including image retrieval, image compression, object recognition, and image segmentation.
In saliency detection, accurately locating the salient object in an image is a central problem. Traditional saliency detection methods have many shortcomings: when facing complex multi-object images, or salient objects similar to the background, the detection results are often inaccurate, and edge details are frequently missed.
Disclosure of Invention
The technical problem solved by the invention is: building on a deep network, to provide a novel salient object detection method whose detection results are more accurate.
The technical scheme of the invention is as follows:
a method for detecting a salient object based on multitask deep learning comprises the following steps:
(1) modules are added to a VGG16 network to obtain a salient object detection task network and an object contour detection task network; each deconvolution module of the salient object detection network contains only a feature interaction module and a residual module based on contrasting semantic and local features, while each deconvolution module of the object contour detection network contains only a feature interaction module and basic convolution layers; the encoding part is a basic VGG16 network composed of several convolution modules that progressively downsample the image into high-level features; the decoding part consists of several deconvolution modules, each of which upsamples the features by a factor of two, progressively upsampling the highest-level encoder features to the original image size for task prediction;
(2) a residual module based on contrasting semantic and local features is used in the salient object detection task network; it extracts local features and semantic features separately;
(3) to achieve good interaction between the two task networks, a feature interaction module is designed so that the salient object detection network and the object contour detection network promote each other; the feature interaction module is used only in the decoding part of each network; for the interaction of the two networks, they are trained alternately; when either network is trained, its feature interaction module takes four groups of features as input: the output feature S_t of the deconvolution module preceding the feature interaction module of the current network; S_t upsampled by a factor of two, denoted S_t^up; the encoder convolution-module output feature S_t^encoder of the same size as S_t^up; and the two-times-upsampled output C_t^up of the corresponding deconvolution module of the other task network; in the feature interaction module, the last three features are concatenated (concat) along the channel dimension; global average pooling (GAP) is then applied to S_t to obtain an attention channel vector; a 1x1 convolution makes the length of this vector equal to the number of channels of the concatenated features; a sigmoid function squashes the vector values into the range (0, 1); finally, the attention vector weights each channel of the concatenated features to screen them, so that the features after interaction are those most useful for the current task;
(4) for the attention vector in step (3), a sparse convolution module is provided that makes the attention vector sparse, further improving the generalization ability of the model;
(5) the final output of each network's deconvolution modules is supervised with ground truth to train the network; finally, softmax is applied to the prediction of the last deconvolution module of the decoding network to obtain the final prediction.
The invention has the following beneficial effects: the proposed multi-task salient object detection network obtains more local and semantic information by introducing a residual module that contrasts semantic and local features on top of the existing VGG16 base model, and then performs interactive learning between the two task networks, so that each network can learn the other's features and compensate for the shortcomings of its own. Compared with prior methods, the detection results are more accurate. For images with multiple objects, or objects similar to the background, the results of the proposed method better match human visual perception, and the resulting saliency maps are more accurate. In addition, because the companion object-contour network is sensitive to object contours, the edges of the salient object detection results are greatly improved.
Drawings
Fig. 1 is a general block diagram of a network in which the method of the present invention is implemented.
FIG. 2 is a block diagram of a sparse convolution in the method of the present invention.
FIG. 3 is a diagram of semantic comparison local feature residual module in the method of the present invention.
FIG. 4 shows the results of a test performed on a number of images by the method of the present invention.
Detailed Description
The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.
The idea of the invention is as follows: the complementarity between tasks in a multi-task network is exploited so that the tasks improve one another, ultimately improving the salient object detection result. Besides the feature interaction module designed specifically for multi-task interaction, additional modules are added to obtain more local and semantic information and to improve the generalization ability of the model. Finally, an alternating training scheme lets the two tasks learn each other's features in a targeted way, making the final detection more accurate.
The invention is implemented as follows:
a method for detecting a salient object based on multitask deep learning comprises the following steps:
(1) modules are added to a VGG16 network to obtain a salient object detection task network and an object contour detection task network; each deconvolution module of the salient object detection network contains only a feature interaction module and a residual module based on contrasting semantic and local features, while each deconvolution module of the object contour detection network contains only a feature interaction module and basic convolution layers; the encoding part is a basic VGG16 network composed of several convolution modules that progressively downsample the image into high-level features; the decoding part consists of several deconvolution modules, each of which upsamples the features by a factor of two, progressively upsampling the highest-level encoder features to the original image size for task prediction;
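The encoder/decoder sizing described in step (1) can be sketched numerically. The pooling and nearest-neighbour operations below are simplified stand-ins for VGG16's learned convolution and deconvolution layers; all names and the five-stage depth are illustrative assumptions based on the standard VGG16 layout:

```python
import numpy as np

def downsample2x(x):
    # 2x2 max pooling, the downsampling used between VGG16 convolution modules
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x(x):
    # nearest-neighbour stand-in for a learned 2x deconvolution
    return x.repeat(2, axis=0).repeat(2, axis=1)

image = np.random.rand(224, 224)   # VGG16 input resolution
feat = image
for _ in range(5):                 # encoder: five conv modules, each halving the size
    feat = downsample2x(feat)
assert feat.shape == (7, 7)        # highest-level features

for _ in range(5):                 # decoder: five deconv modules, each doubling
    feat = upsample2x(feat)
assert feat.shape == image.shape   # back to the original image size for prediction
```

This only tracks spatial sizes; in the actual network each stage also changes the channel count and applies learned weights.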
(2) a residual module based on contrasting semantic and local features is used in the salient object detection task network; it extracts local features and semantic features separately, defined as:
F_out = F_in + (f_l(F_in; W_l) - f_c(F_in; W_c))
where F_in is the input feature of the residual module, F_out is its final output feature, f_l(·) denotes the local convolution operation with parameters W_l, and f_c(·) denotes the semantics-extracting convolution operation with parameters W_c; the local and semantic features are subtracted to obtain a contrast feature, which is added to the original feature to give the final output feature;
(3) to achieve good interaction between the two task networks, a feature interaction module is designed so that the salient object detection network and the object contour detection network promote each other; the feature interaction module is used only in the decoding part of each network; for the interaction of the two networks, they are trained alternately; when either network is trained, its feature interaction module takes four groups of features as input: the output feature S_t of the deconvolution module preceding the feature interaction module of the current network; S_t upsampled by a factor of two, denoted S_t^up; the encoder convolution-module output feature S_t^encoder of the same size as S_t^up; and the two-times-upsampled output C_t^up of the corresponding deconvolution module of the other task network; in the feature interaction module, the last three features are concatenated along the channel dimension; global average pooling is then applied to S_t to obtain an attention channel vector; a 1x1 convolution makes the length of this vector equal to the number of channels of the concatenated features; a sigmoid function squashes the vector values into the range (0, 1); finally, the attention vector weights each channel of the concatenated features to screen them, so that the features after interaction are those most useful for the current task, defined specifically as:
[Equation shown as an image in the original document.]
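The channel-attention pipeline of the feature interaction module (concatenate, global average pool, 1x1 convolution, sigmoid, channel-wise weighting) can be sketched with random tensors. Shapes and channel counts are illustrative assumptions, and the 1x1 convolution is written as the equivalent linear map over channels:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Feature tensors are (channels, height, width).
S_t = rng.normal(size=(64, 8, 8))                     # previous deconv-module output
S_t_up = S_t.repeat(2, axis=1).repeat(2, axis=2)      # S_t^up: 2x upsample (nearest-neighbour stand-in)
S_t_encoder = rng.normal(size=(64, 16, 16))           # same-size encoder feature
C_t_up = rng.normal(size=(64, 16, 16))                # C_t^up from the other task network

# Concatenate the last three feature groups along the channel dimension.
concat = np.concatenate([S_t_up, S_t_encoder, C_t_up], axis=0)   # 192 channels

gap = S_t.mean(axis=(1, 2))                           # global average pooling of S_t -> 64-vector
W = rng.normal(scale=0.1, size=(concat.shape[0], gap.shape[0]))  # 1x1 conv as a 64 -> 192 linear map
attn = sigmoid(W @ gap)                               # attention vector, one weight per channel, in (0, 1)

weighted = concat * attn[:, None, None]               # screen the concatenated features channel by channel
assert weighted.shape == (192, 16, 16)
assert np.all((attn > 0) & (attn < 1))
```

The key point is that the attention vector is derived only from the current network's own feature S_t, so the current task decides how strongly to admit each channel of the combined (including cross-task) features.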
(4) for the attention vector in step (3), a sparse convolution module is provided that makes the attention vector sparse, further improving the generalization ability of the model;
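The patent does not detail the sparse convolution module, only its effect: driving the attention vector toward sparsity. As an illustration of that effect (an assumption, not the patent's actual module), a soft-thresholding step zeroes out small attention weights while shrinking the rest:

```python
import numpy as np

def soft_threshold(attn, tau=0.2):
    # Shrink all weights by tau and clamp at zero, so small weights become exactly 0.
    # Illustrative stand-in for the patent's (undisclosed) sparse convolution module.
    return np.sign(attn) * np.maximum(np.abs(attn) - tau, 0.0)

attn = np.array([0.05, 0.9, 0.15, 0.7, 0.3])
sparse = soft_threshold(attn)
# The two weights below the threshold are zeroed; the channels they gate are dropped entirely.
assert np.count_nonzero(sparse) < np.count_nonzero(attn)
```

Sparsifying the channel weights acts as a form of feature selection, which is one plausible reading of the claimed generalization benefit.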
(5) the final output of each network's deconvolution modules is supervised with ground truth to train the network; finally, softmax is applied to the prediction of the last deconvolution module of the decoding network to obtain the final prediction.
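The final softmax of step (5), sketched for a two-channel (background/salient) prediction map; the two-class layout and tensor shape are assumptions for illustration:

```python
import numpy as np

def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Two-channel logits from the last deconvolution module: (class, height, width).
logits = np.random.randn(2, 4, 4)
probs = softmax(logits, axis=0)
saliency_map = probs[1]                 # per-pixel probability of being salient

assert np.allclose(probs.sum(axis=0), 1.0)   # a proper distribution at every pixel
assert saliency_map.shape == (4, 4)
```

Taking the salient-class channel of the softmax output yields the final saliency map at the original image resolution.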

Claims (1)

1. A method for detecting a salient object based on multitask deep learning is characterized by comprising the following steps:
(1) modules are added to a VGG16 network to obtain a salient object detection task network and an object contour detection task network; each deconvolution module of the salient object detection network contains only a feature interaction module and a residual module based on contrasting semantic and local features, while each deconvolution module of the object contour detection network contains only a feature interaction module and basic convolution layers; the encoding part is a basic VGG16 network composed of several convolution modules that progressively downsample the image into high-level features; the decoding part consists of several deconvolution modules, each of which upsamples the features by a factor of two, progressively upsampling the highest-level encoder features to the original image size for task prediction;
(2) a residual module based on contrasting semantic and local features is used in the salient object detection task network; it extracts local features and semantic features separately, defined as:
F_out = F_in + (f_l(F_in; W_l) - f_c(F_in; W_c))
where F_in is the input feature of the residual module, F_out is its final output feature, f_l(·) denotes the local convolution operation with parameters W_l, and f_c(·) denotes the semantics-extracting convolution operation with parameters W_c; the local and semantic features are subtracted to obtain a contrast feature, which is added to the original feature to give the final output feature;
(3) to achieve good interaction between the two task networks, a feature interaction module is designed so that the salient object detection network and the object contour detection network promote each other; the feature interaction module is used only in the decoding part of each network; for the interaction of the two networks, they are trained alternately; when either network is trained, its feature interaction module takes four groups of features as input: the output feature S_t of the deconvolution module preceding the feature interaction module of the current network; S_t upsampled by a factor of two, denoted S_t^up; the encoder convolution-module output feature S_t^encoder of the same size as S_t^up; and the two-times-upsampled output C_t^up of the corresponding deconvolution module of the other task network; in the feature interaction module, the last three features are concatenated along the channel dimension; global average pooling is then applied to S_t to obtain an attention channel vector; a 1x1 convolution makes the length of this vector equal to the number of channels of the concatenated features; a sigmoid function squashes the vector values into the range (0, 1); finally, the attention vector weights each channel of the concatenated features to screen them, so that the features after interaction are those most useful for the current task, defined specifically as:
[Equation shown as an image in the original document.]
(4) for the attention vector in step (3), a sparse convolution module is provided that makes the attention vector sparse, further improving the generalization ability of the model;
(5) the final output of each network's deconvolution modules is supervised with ground truth to train the network; finally, softmax is applied to the prediction of the last deconvolution module of the decoding network to obtain the final prediction.
CN201910243220.XA 2019-03-28 2019-03-28 Salient object detection method based on multitask deep learning Active CN110020658B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910243220.XA CN110020658B (en) 2019-03-28 2019-03-28 Salient object detection method based on multitask deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910243220.XA CN110020658B (en) 2019-03-28 2019-03-28 Salient object detection method based on multitask deep learning

Publications (2)

Publication Number Publication Date
CN110020658A CN110020658A (en) 2019-07-16
CN110020658B true CN110020658B (en) 2022-09-30

Family

ID=67190116

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910243220.XA Active CN110020658B (en) 2019-03-28 2019-03-28 Salient object detection method based on multitask deep learning

Country Status (1)

Country Link
CN (1) CN110020658B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598610B (en) * 2019-09-02 2022-02-22 北京航空航天大学 Target significance detection method based on neural selection attention
CN113298748B (en) * 2020-02-21 2022-11-18 安徽大学 Image collaborative salient object detection model based on attention mechanism
CN112257526B (en) * 2020-10-10 2023-06-20 中国科学院深圳先进技术研究院 Action recognition method based on feature interactive learning and terminal equipment
CN113505634B (en) * 2021-05-24 2024-06-14 安徽大学 Optical remote sensing image salient target detection method of double-flow decoding cross-task interaction network
CN114494999B (en) * 2022-01-18 2022-11-15 西南交通大学 Double-branch combined target intensive prediction method and system

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10929977B2 (en) * 2016-08-25 2021-02-23 Intel Corporation Coupled multi-task fully convolutional networks using multi-scale contextual information and hierarchical hyper-features for semantic image segmentation
CN106909924B (en) * 2017-02-18 2020-08-28 北京工业大学 Remote sensing image rapid retrieval method based on depth significance
CN107133955B (en) * 2017-04-14 2019-08-09 大连理工大学 A kind of collaboration conspicuousness detection method combined at many levels
CN107240066A (en) * 2017-04-28 2017-10-10 天津大学 Image super-resolution rebuilding algorithm based on shallow-layer and deep layer convolutional neural networks
JP7023613B2 (en) * 2017-05-11 2022-02-22 キヤノン株式会社 Image recognition device and learning device
CN107688821B (en) * 2017-07-11 2021-08-06 西安电子科技大学 Cross-modal image natural language description method based on visual saliency and semantic attributes
CN107463892A (en) * 2017-07-27 2017-12-12 北京大学深圳研究生院 Pedestrian detection method in a kind of image of combination contextual information and multi-stage characteristics
CN107886117A (en) * 2017-10-30 2018-04-06 国家新闻出版广电总局广播科学研究院 The algorithm of target detection merged based on multi-feature extraction and multitask
CN108304765B (en) * 2017-12-11 2020-08-11 中国科学院自动化研究所 Multi-task detection device for face key point positioning and semantic segmentation
CN108960069A (en) * 2018-06-05 2018-12-07 天津大学 A method of the enhancing context for single phase object detector
CN109165660B (en) * 2018-06-20 2021-11-09 扬州大学 Significant object detection method based on convolutional neural network
CN109190626A (en) * 2018-07-27 2019-01-11 国家新闻出版广电总局广播科学研究院 A kind of semantic segmentation method of the multipath Fusion Features based on deep learning
CN109376576A (en) * 2018-08-21 2019-02-22 中国海洋大学 The object detection method for training network from zero based on the intensive connection of alternately update
CN109165697B (en) * 2018-10-12 2021-11-30 福州大学 Natural scene character detection method based on attention mechanism convolutional neural network
CN109447136A (en) * 2018-10-15 2019-03-08 方玉明 A kind of conspicuousness detection method for 360 degree of images
CN111428088B (en) * 2018-12-14 2022-12-13 腾讯科技(深圳)有限公司 Video classification method and device and server

Also Published As

Publication number Publication date
CN110020658A (en) 2019-07-16

Similar Documents

Publication Publication Date Title
CN110020658B (en) Salient object detection method based on multitask deep learning
CN109190752B (en) Image semantic segmentation method based on global features and local features of deep learning
CN108492319B (en) Moving target detection method based on deep full convolution neural network
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
Lin et al. Image manipulation detection by multiple tampering traces and edge artifact enhancement
CN112150450B (en) Image tampering detection method and device based on dual-channel U-Net model
CN111738054B (en) Behavior anomaly detection method based on space-time self-encoder network and space-time CNN
CN111914654B (en) Text layout analysis method, device, equipment and medium
CN113011357A (en) Depth fake face video positioning method based on space-time fusion
CN113283356B (en) Multistage attention scale perception crowd counting method
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN113936235A (en) Video saliency target detection method based on quality evaluation
CN112348809A (en) No-reference screen content image quality evaluation method based on multitask deep learning
CN115861756A (en) Earth background small target identification method based on cascade combination network
CN117557775A (en) Substation power equipment detection method and system based on infrared and visible light fusion
CN112036300A (en) Moving target detection method based on multi-scale space-time propagation layer
CN112800932B (en) Method for detecting remarkable ship target in offshore background and electronic equipment
CN112132867B (en) Remote sensing image change detection method and device
CN117315284A (en) Image tampering detection method based on irrelevant visual information suppression
CN117409244A (en) SCKConv multi-scale feature fusion enhanced low-illumination small target detection method
CN115457385A (en) Building change detection method based on lightweight network
CN110728316A (en) Classroom behavior detection method, system, device and storage medium
CN116229228A (en) Small target detection method based on center surrounding mechanism
CN115797684A (en) Infrared small target detection method and system based on context information
CN115223033A (en) Synthetic aperture sonar image target classification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant