CN110097115A - Video salient object detection method based on an attention-shift mechanism - Google Patents

Video salient object detection method based on an attention-shift mechanism

Info

Publication number
CN110097115A
CN110097115A (application number CN201910347420.XA)
Authority
CN
China
Prior art keywords
attention
module
network
shift
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910347420.XA
Other languages
Chinese (zh)
Other versions
CN110097115B (en)
Inventor
Cheng Mingming (程明明)
Fan Dengping (范登平)
Lin Zheng (林铮)
Wu Wenhai (吴文海)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Device Co Ltd
Nankai University
Original Assignee
Huawei Device Co Ltd
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Device Co Ltd, Nankai University filed Critical Huawei Device Co Ltd
Priority to CN201910347420.XA priority Critical patent/CN110097115B/en
Publication of CN110097115A publication Critical patent/CN110097115A/en
Application granted granted Critical
Publication of CN110097115B publication Critical patent/CN110097115B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A video salient object detection method based on an attention-shift mechanism. Attention shifting is a distinctive function of the human visual system, yet current methods ignore this important mechanism. The method of the present invention designs a new convolutional neural network architecture that efficiently exploits the characteristics of a static convolutional network, a pyramid dilated convolutional network, a long short-term memory network and an attention-shift-aware module, so as to fully embody the attention-shift mechanism of the human visual system. It is therefore more practically meaningful for real application scenarios and obtains better salient object detection results. Compared with all current video salient object detection methods, the method of the present invention reaches the internationally leading level and, in performance evaluations on mainstream public datasets, surpasses the current best video salient object detection methods.

Description

Video salient object detection method based on an attention-shift mechanism
Technical field
The invention belongs to the technical field of image processing, and relates in particular to a video salient object detection method based on an attention-shift mechanism.
Background technique
Video salient object detection (VSOD) aims to extract attention-grabbing objects from dynamic video. This task originates from research on human visual attention behavior, i.e., the remarkable ability of the human visual system to quickly locate the important information in a scene (the visual attention mechanism). Early psychophysical studies quantitatively confirmed the existence of this specific, strongly object-correlated saliency judgment and implicit visual attention allocation behavior. Since we live in a dynamically changing world, video salient object detection is of great significance. Moreover, it has extensive practical applications, such as video segmentation, video summarization, video compression, autonomous driving, human-machine interaction, etc. Because of the large amount of diverse video data (for example, different motion patterns, occlusion, blur, object deformation, etc.) and the complexity of human visual attention behavior (i.e., dynamic allocation of selective attention, attention shifts, etc.), video saliency detection faces great challenges, has attracted the highest attention, and has important academic value.
Early VSOD models were based on simple features (for example, color and motion) and relied largely on classical image salient object detection algorithms (for example, center-surround contrast and background priors) and on cognitive theories of visual attention (for example, feature integration theory and guided search). They explored and studied ways of integrating spatial-domain and temporal-domain saliency features, such as gradient flow fields, geodesic distance, random walks and graph structures. Traditional VSOD models are limited by their restricted feature representation capability. More recently, VSOD models based on deep learning have received more attention; by applying deep neural networks to images, saliency detection on still images was successfully realized. More specifically, Wang et al. published a paper entitled "Video salient object detection via fully convolutional networks" in the IEEE TIP journal (27(1):38-49, 2018), which built a fully convolutional neural network for VSOD. Another contemporaneous paper published at BMVC, entitled "Deeply supervised 3d recurrent fcn for salient object detection in videos", used 3D filters to incorporate spatial and temporal information into a conditional random field framework. Subsequently, spatio-temporal deep features, recurrent neural networks and the like were proposed to better capture spatial and temporal saliency features. In general, VSOD models based on deep networks possess powerful learning ability because they use neural networks to extract features; since the literature is too extensive, it is not enumerated here one by one. However, these models neglect the attention-shift mechanism, which is very important in human visual attention. For example: a video scene contains a static black cat and a moving white cat, and at first people concentrate their attention on the moving white cat. A few seconds later, when the static black cat suddenly quarrels and fights with the white cat, people shift their attention to both the black cat and the white cat. Existing models at home and abroad mostly focus on considering moving objects, or on saliency detection of purely static objects. Therefore, in such scenes, which require a comprehensive understanding of human attention shifts, the performance of these models drops significantly and the detection results are unsatisfactory.
Summary of the invention
The object of the present invention is to solve the problem that existing video salient object detection methods fail to handle shifts of the salient object, and accordingly to propose a video salient object detection method based on an attention-shift mechanism.
The method of the present invention, called saliency-shift-aware video salient object detection (SSAV), consists of two basic modules: a pyramid dilated convolution (PDC) module and a saliency-shift-aware module (SSLSTM). The former is trained with a strong still-image salient object learning method; the latter extends the traditional convolutional long short-term memory network (convLSTM) so that it possesses a saliency-shift-aware mechanism. The present invention takes the static feature sequence obtained from the PDC module as input and generates the corresponding VSOD results with dynamic representations and attention shifts.
Technical solution of the present invention
A video salient object detection method based on an attention-shift mechanism, comprising the following steps:
a. Static convolutional network module: a multilayer convolutional neural network is used to extract features from multiple frames of still images {I_t}, t = 1, ..., T, obtaining a set of features {Q_t}, where T denotes the total number of frames of the input video and t denotes one of the frames. The multilayer convolutional neural network is built from different basic convolutional neural networks, including the VGG-16 network, the ResNet-50 network, the ResNet-101 network and the SE network.
b. Pyramid dilated convolution (PDC) module: the features extracted in step a serve as the input of this module, and multi-scale features are obtained with the pyramid dilated convolution module. Specifically, the PDC module consists of K dilated convolutional layers, each dilated convolution corresponding to a different dilation rate, so as to extract the multi-scale feature tensor X_t.
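As an illustrative sketch only (not part of the claimed method), the following pure-Python example shows the principle the PDC module exploits: a dilated convolution enlarges the receptive field in proportion to the dilation rate without adding parameters, so running the same kernel at several rates (the embodiment uses 2, 4, 8 and 16) yields multi-scale responses. The function name and the toy 1-D signal are hypothetical simplifications of the 2-D case.

```python
def dilated_conv1d(signal, kernel, dilation):
    """1-D dilated ('atrous') convolution with valid padding.

    Each kernel tap is `dilation` samples apart, so the receptive
    field grows to (len(kernel) - 1) * dilation + 1 while the number
    of kernel parameters stays the same.
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(len(kernel)))
        for i in range(len(signal) - span)
    ]

# A pyramid of dilation rates applied to the same feature sequence
# yields multi-scale responses, which the PDC module then concatenates
# with the input feature.
signal = [0, 0, 0, 1, 0, 0, 0, 0, 0]
pyramid = {d: dilated_conv1d(signal, [1, 1, 1], d) for d in (1, 2)}
```

With dilation 1 the impulse spreads over 3 neighboring outputs; with dilation 2 the same 3-tap kernel covers a span of 5 samples, illustrating the multi-scale effect.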
c. Attention-shift-aware module A_t: based on the long short-term memory network convLSTM, a weighting module F_A is added on top of that network. The weighting module F_A, specially designed in the present invention, consists of a stack of simple convolutional layers; it is used to assign weights to the multi-scale features extracted in step b, thereby realizing attention-shift awareness.
The input of the attention-shift-aware module A_t is the multi-scale feature tensor X_t output by the PDC module, and the output is a two-dimensional map S_t ∈ [0,1]^(W×H). The processing of the attention-shift-aware module A_t is as follows:
Hidden state: H_t = convLSTM(X_t, H_{t−1})
Attention-shift map: A_t = F_A({X_1, ..., X_t})
Shift-aware gating: G_{m,t} = A_t ⊙ H_{m,t}
Salient object prediction: S_t = σ(w_S * G_t)
Assume the total length of the input video is T frames; the subscript t denotes the current frame and t−1 the previous frame. H_t is the hidden state of the 3D tensor at the current time, obtained by the long short-term memory network convLSTM(·) from the current input feature X_t and the hidden state of the previous time. The weighting module F_A(·), specially designed in the present invention, is built by stacking a group of simple convolutional layers and assigns weights to the set of features {X_1, ..., X_t} extracted in step b. G_{m,t} denotes the shift-aware gating, where m ∈ {1, ..., M} is the channel index and M the total number of channels; the symbol ⊙ denotes element-wise matrix multiplication, and H_{m,t} denotes the hidden state of channel m of the 3D tensor at the current time. w_S is a 1 × 1 × M convolution kernel, * denotes the convolution operation, and σ is an activation function.
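As an illustrative sketch only, the gating step above can be written out for flattened maps: the attention map re-weights every channel of the hidden state, and a 1 × 1 × M convolution followed by a sigmoid collapses the channels into the saliency map. The function `attention_gate` and all shapes are hypothetical simplifications (a real implementation operates on 3D tensors inside a convLSTM).

```python
import math

def attention_gate(A_t, H_t, w_s, bias=0.0):
    """Sketch of the gating: G[m] = A_t ⊙ H[m] per channel m, then
    S = sigmoid(w_s · G), a 1x1xM convolution over channels.

    A_t : attention map, list of floats (flattened W*H positions)
    H_t : hidden state, list of M channels, each shaped like A_t
    w_s : the M weights of the 1x1xM convolution kernel
    """
    M = len(H_t)
    # Channel-wise gating: the attention map re-weights every channel.
    G = [[a * h for a, h in zip(A_t, H_t[m])] for m in range(M)]
    # 1x1 convolution across channels followed by a sigmoid activation.
    return [
        1.0 / (1.0 + math.exp(-(sum(w_s[m] * G[m][p] for m in range(M)) + bias)))
        for p in range(len(A_t))
    ]
```

Where the attention map is zero, the gated response is zero and the sigmoid outputs 0.5 (the kernel bias then decides the final suppression), illustrating how the attention map steers the prediction.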
d. Generating the image result: a 1 × 1 convolutional layer is applied to the features output by step c, and an activation function is then used to decide which neurons are activated, thereby generating the salient object image for every frame of the video.
e. Updating the network: a cross-entropy loss function is used to compute the loss between the salient object image generated in step d and the manually annotated reference image; the gradient is then back-propagated to update the network.
The function for computing the loss against the manually annotated reference image is as follows:
L(S_t, A_t) = L_VSOD(S_t, M_t) + ℓ(F_t) × L_Att(A_t, F_t)
where L_Att and L_VSOD are cross-entropy losses; ℓ(·) indicates whether a fixation saliency map F_t exists; and M_t is the manually annotated reference image.
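As an illustrative sketch only, the combined loss described above can be written as follows, under the reading that the attention term is applied only on frames for which a fixation map F_t exists (the indicator ℓ(·) in the text). The function names `bce` and `ssav_loss` are hypothetical.

```python
import math

def bce(pred, ref, eps=1e-7):
    """Pixel-wise binary cross-entropy between a predicted map and a
    binary reference map, both flattened to lists of values in [0, 1]."""
    return -sum(
        r * math.log(max(p, eps)) + (1 - r) * math.log(max(1 - p, eps))
        for p, r in zip(pred, ref)
    ) / len(pred)

def ssav_loss(S_t, M_t, A_t, F_t):
    """Combined loss sketch: the VSOD term always applies; the
    attention term is added only when a fixation map F_t exists."""
    loss = bce(S_t, M_t)          # L_VSOD(S_t, M_t)
    if F_t is not None:           # indicator l(.) = 1 iff F_t exists
        loss += bce(A_t, F_t)     # L_Att(A_t, F_t)
    return loss
```

This structure lets frames without eye-fixation annotations still contribute a saliency-supervision signal during training.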
The advantages and beneficial effects of the present invention are:
The video saliency detection method of the present invention takes the attention-shift mechanism into account. This mechanism is not prior art: it occurs naturally in the human visual system but has long been overlooked by researchers, and introducing it into a network is creative and involves a certain degree of difficulty. Compared with other current models that detect saliency only from single video frames, the method of the present invention considers the spatial relationships between frames in a video and models saliency shifts according to attention viewpoint transitions. It is therefore more practically meaningful for applications, obtains better practical results, and reaches the top international standard.
Detailed description of the invention
Fig. 1 is the flow chart of the SSAV method of the present invention.
Fig. 2 is the specific implementation framework diagram of the SSAV method of the present invention, where the numbers 473 × 473 × 3 on the image indicate the length × width × number of channels of the input image.
Fig. 3 shows saliency map examples obtained on the complete ViSal dataset by the SSAV method of the present invention and 17 existing best deep learning and conventional methods (the 17 compared methods are, in order: PDBM, MBNM, FGRN, DLVS, SCNN, SCOM, SFLR, SGSP, STBP, MSTM, GFVM, SAGM, MB+M, RWRV, SPVM, TIMP, SIVM).
Fig. 4 shows saliency map examples obtained on the FBMS test dataset by the SSAV method of the present invention and the same 17 compared methods as in Fig. 3.
Fig. 5 shows saliency map examples obtained on the DAVIS test dataset by the SSAV method of the present invention and the same 17 compared methods as in Fig. 3.
Fig. 6 shows saliency map examples obtained on the DAVSOD test dataset by the SSAV method of the present invention and the same 17 compared methods as in Fig. 3.
Fig. 7 shows examples with saliency-object shifts obtained on the DAVSOD test dataset by the SSAV method of the present invention and 5 current best deep learning and conventional methods. Column (a) shows the video input frames, (b) the corresponding human attention fixations observed on the input frames, (c) the manually annotated reference image frames, (d) the saliency maps obtained by the SSAV method of the present invention, and (e)-(i) the saliency maps obtained by the 5 compared methods, in order: MBNM, FGRN, PDBM, SFLR, SAGM.
Specific embodiment
With reference to Fig. 1 and Fig. 2, the specific implementation steps of the present invention are as follows:
a. Static convolutional network module: the ResNet-50 neural network is used to extract features from multiple frames of still images, obtaining a set of features, where T denotes the total number of frames of the input video and t denotes one of the frames. The example in Fig. 2 shows 3 input frames: I_{t−1}, I_t, I_{t+1}. After the ResNet-50 network, a set of features is obtained: Q_{t−1}, Q_t, Q_{t+1}.
b. Pyramid dilated convolution (PDC) module: the features extracted in step a serve as the input of this module, and multi-scale features are obtained with the pyramid dilated convolution module. Specifically, the PDC module consists of K parallel dilated convolutional layers, each dilated convolution corresponding to a different dilation rate. In this embodiment, 4 dilated convolutional layers are used, with dilation rates of 2, 4, 8 and 16, respectively. For example, the feature Q obtained in step a participates in the pyramid convolution operation to obtain a set of features {P_1, ..., P_k, ..., P_K}, which is then merged with Q to obtain the multi-scale feature:
X = [Q, P_1, ..., P_k, ..., P_K],
where X is the enhanced extracted feature, Q is the 3D feature tensor of frame I in the video, and [·] denotes the parallel (concatenation) operation. The pyramid dilated convolution module obtains multi-scale information and extracts more robust features.
c. Attention-shift-aware module A_t: based on the long short-term memory network convLSTM, a weighting module F_A is added on top of that network. The F_A module, specially designed in the present invention, consists of a stack of simple convolutional layers; it is used to assign weights to the multi-scale features extracted in step b, thereby realizing the attention-shift mechanism.
The input of the attention-shift-aware module A_t is the multi-scale feature tensor X_t output by the PDC module, and the output is a two-dimensional map S_t ∈ [0,1]^(W×H). The processing of the attention-shift-aware module A_t is as follows:
Hidden state: H_t = convLSTM(X_t, H_{t−1})
Attention-shift map: A_t = F_A({X_1, ..., X_t})
Shift-aware gating: G_{m,t} = A_t ⊙ H_{m,t}
Salient object prediction: S_t = σ(w_S * G_t)
Assume the total length of the input video is T frames; the subscript t denotes the current frame and t−1 the previous frame. H_t is the hidden state of the 3D tensor at the current time, obtained by the long short-term memory network convLSTM(·) from the current input feature X_t and the hidden state of the previous time. The weighting module F_A(·), specially designed in the present invention, is built by stacking a group of simple convolutional layers and assigns weights to the set of features {X_1, ..., X_t} extracted in step b. G_{m,t} denotes the shift-aware gating, where m ∈ {1, ..., M} is the channel index and M the total number of channels; the symbol ⊙ denotes element-wise matrix multiplication, and H_{m,t} denotes the hidden state of channel m of the 3D tensor at the current time. w_S is a 1 × 1 × M convolution kernel, * denotes the convolution operation, and σ is a sigmoid activation function. As shown in Fig. 2, the convLSTM network uses a 3 × 3 × 32 convolution kernel.
d. Generating the image result: a 1 × 1 convolutional layer is applied to the features output by step c, and an activation function is then used to decide which neurons are activated, thereby generating the salient object image for every frame of the video.
e. Updating the network: the cross-entropy loss function is used to compute the loss between the salient object image generated in step d and the manually annotated reference image; the gradient is then back-propagated to update the network.
The finally obtained model can be used to extract salient objects with attention shifts from any video. The loss function computed against the manually annotated reference image is as follows:
L(S_t, A_t) = L_VSOD(S_t, M_t) + ℓ(F_t) × L_Att(A_t, F_t)
where L_Att and L_VSOD are cross-entropy losses; ℓ(·) indicates whether a fixation saliency map F_t exists; and M_t is the manually annotated reference image.
The effect of the present invention is further illustrated by the following simulation experiments:
(1) experimental data set and simulated conditions
The test images used in this experiment include the ViSal dataset constructed by Wang Wenguan et al. in 2015; FBMS, constructed in 2014 by the group of Professor Jitendra Malik at the University of California, Berkeley, USA; DAVIS, published in 2016 by the Adobe scientist Perazzi at the famous international conference on computer vision and pattern recognition (CVPR); the VOS dataset constructed in 2018 by Li Jia's group at Beihang University; and the DAVSOD dataset announced in 2019 by Fan Dengping et al. Among them, ViSal is the first dataset designed exclusively for the video salient object detection task; it contains 17 video sequences totaling 193 annotated image frames. FBMS is an earlier classic dataset designed for the object segmentation task, with 59 videos totaling 720 annotated frames, and is now widely used for video salient object detection. DAVIS is the first high-quality densely annotated dataset, with 3455 densely annotated image frames over 50 videos; within a short two years, this dataset has been widely used. As for the VOS dataset, it is the largest of the current datasets by quantity, consisting of 200 videos with 7467 annotated image frames. In 2018, the media computing laboratory of Nankai University constructed DAVSOD, currently the world's largest video dataset of its kind: its total number of videos exceeds 200, and its 23938 annotated image frames exceed the sum of the annotated frames of all current datasets. The experimental platform is an Intel E5-2676 v3 @ 2.4 GHz × 24, with a GTX TITAN XP graphics card. The simulation uses Python and Caffe.
(2) Video salient object detection performance evaluation criteria
We use three gold-standard metrics to measure the results of video salient object detection: the maximum F-measure (max F), the structure measure (S), and the mean absolute error (M).
Mathematically, the F-measure is the weighted harmonic mean of precision and recall, enabling a comprehensive evaluation. It is calculated as follows:
F_β = ((1 + β²) × Precision × Recall) / (β² × Precision + Recall)
β² is the weight applied to Precision, giving accuracy a higher status; the widespread practice in the contemporary literature is to set β² = 0.3. Precision and Recall are calculated as follows:
Precision = TP / (TP + FP), Recall = TP / (TP + FN)
Precision and Recall are in turn built from a confusion matrix: in a binary decision problem, TP denotes samples predicted positive whose reference is also positive, FP denotes samples predicted positive whose reference is negative, and FN denotes samples predicted negative whose reference is positive. We binarize the detected result maps with 256 different thresholds; each threshold yields one F value, and the maximum F-measure is the largest of the 256 F values.
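As an illustrative sketch only, the max-F protocol described above (binarize at many thresholds, keep the best F value) can be written as follows, with hypothetical function names and flattened maps:

```python
def precision_recall(pred, ref, thresh):
    """Binarize the prediction at `thresh` and compute precision/recall
    against a binary reference (both flattened to equal-length lists)."""
    tp = sum(1 for p, r in zip(pred, ref) if p >= thresh and r == 1)
    fp = sum(1 for p, r in zip(pred, ref) if p >= thresh and r == 0)
    fn = sum(1 for p, r in zip(pred, ref) if p < thresh and r == 1)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

def max_f_measure(pred, ref, beta2=0.3, levels=256):
    """Sweep `levels` thresholds and return the best F_beta value,
    with beta^2 = 0.3 as in the common practice cited in the text."""
    best = 0.0
    for k in range(levels):
        t = k / levels
        prec, rec = precision_recall(pred, ref, t)
        if prec + rec > 0:
            f = (1 + beta2) * prec * rec / (beta2 * prec + rec)
            best = max(best, f)
    return best
```

A prediction that cleanly separates foreground from background reaches max F = 1 at some threshold, regardless of its absolute confidence values.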
The structure measure (S), proposed by Fan et al. in 2017, measures the degree of structural difference between the predicted result and the reference result. It combines a region-aware measure S_r and an object-aware measure S_o:
S = α × S_o + (1 − α) × S_r
where setting α = 0.5 assigns equal weight to the region measure and the object measure. For the detailed calculation formulas, refer to the original paper: "Structure-measure: A New Way to Evaluate Foreground Maps", ICCV 2017.
The mean absolute error (MAE) measures the mean absolute error between the predicted result and the reference result. Let the binary reference result be a two-dimensional matrix G and the predicted result a two-dimensional matrix S:
MAE = (1/N) × Σ |S(i) − G(i)|
where N is the total number of pixels in the image. MAE estimates pixel-level accuracy and is one of the most widely used evaluation metrics.
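As an illustrative sketch only, the MAE formula above can be computed over flattened maps as follows (the function name is hypothetical):

```python
def mae(pred, ref):
    """Mean absolute error between a predicted map S and a reference
    map G, both flattened to N pixel values in [0, 1]."""
    assert len(pred) == len(ref)
    return sum(abs(s - g) for s, g in zip(pred, ref)) / len(pred)
```

Unlike the thresholded F-measure, MAE is threshold-free and directly penalizes the per-pixel deviation of the continuous saliency map.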
Table 1 below gives the maximum F-measure (max F), structure measure (S) and mean error (M) obtained by the present invention and by 17 current classic, state-of-the-art compared methods on the 5 most challenging public test datasets in the world (ViSal, FBMS-T, DAVIS-T, VOS-T, DAVSOD-T).
Table 1
(3) experiment content
Experiment one
As can be seen from Table 1 above, the SSAV method of the present invention has a clear advantage over the 17 current compared methods, reaching the highest precision on all 3 metrics across the 5 datasets (ViSal, FBMS, DAVIS, VOS and DAVSOD). This fully demonstrates the validity and robustness of the SSAV method of the present invention. The objective evaluation results above quantitatively illustrate the advantage of the present invention in detecting video salient objects under various scenarios; besides the numerical results, subjective evaluation of the visual results is also required.
Experiment two
In this experiment, we further present representative test results on 4 datasets to illustrate the performance of the method of the present invention. In Figs. 3-6, column (a) shows 3 different frames of the input video, (b) the manually annotated reference image frames, and (c) the saliency maps obtained by the SSAV method of the present invention; columns (d)-(t) show the saliency maps obtained, in order, by the PDBM, MBNM, FGRN, DLVS, SCNN, SCOM, SFLR, SGSP, STBP, MSTM, GFVM, SAGM, MB+M, RWRV, SPVM, TIMP and SIVM methods.
As the combined results of Figs. 3-6 show, our method is in all cases very close to the manually annotated reference image frames, while the 17 compared methods all show larger gaps from the reference images.
To further verify that the present invention can effectively cope with the saliency-object-shift phenomenon, Fig. 7 illustrates this result. Column (a) shows 5 video frames of one video in the DAVSOD dataset, (b) the human attention fixations, (c) the manually annotated reference image GT, (d) the saliency maps obtained by the SSAV method herein, (e) the saliency maps obtained by the MBNM method, (f) FGRN, (g) PDBM, (h) SFLR, and (i) SAGM. As can be seen from the figure, the SSAV method of the present invention obtains more satisfying results than the other several classic methods. The method of the present invention can effectively capture the saliency-shift phenomenon: [cat] → [cat, box] → [cat] → [box] → [cat, box]. The other methods, however, either fail to detect the salient objects completely (for example, the SFLR and SAGM methods) or only capture the moving cat and ignore the box (for example, the MBNM method).
Parts of this embodiment that are not described in detail belong to common knowledge well known in the field and are not repeated here one by one. The specific networks used in the implementation above (ResNet-50, etc.) serve only as examples of the invention and do not limit the protection scope of the present invention; all designs similar or identical to the present invention fall within the protection scope of the present invention.

Claims (5)

1. A video salient object detection method based on an attention-shift mechanism, characterized in that the method comprises the following steps:
a. static convolutional network module: using a multilayer convolutional neural network to extract features from multiple frames of still images;
b. pyramid dilated convolution (PDC) module: taking the features extracted in step a as the input of the module, and obtaining multi-scale features with the pyramid dilated convolution module;
c. attention-shift-aware module A_t: based on the long short-term memory network convLSTM, adding a weighting module F_A on top of that network, the F_A module consisting of a stack of simple convolutional layers, and using the weighting module F_A to assign weights to the multi-scale features extracted in step b, thereby realizing attention-shift awareness;
d. generating the image result: applying a 1 × 1 convolutional layer to the features output by step c, and then using an activation function to decide which neurons are activated, thereby generating the salient object image for every frame of the video;
e. updating the network: using a cross-entropy loss function to compute the loss between the salient object image generated in step d and the manually annotated reference image, back-propagating the gradient, and updating the network.
2. The video salient object detection method based on an attention-shift mechanism according to claim 1, characterized in that the multilayer convolutional neural network in step a is built from different basic convolutional neural networks.
3. The video salient object detection method based on an attention-shift mechanism according to claim 2, characterized in that the basic convolutional neural networks include the VGG-16 network, the ResNet-50 network, the ResNet-101 network and the SE network.
4. The video salient object detection method based on an attention-shift mechanism according to any one of claims 1 to 3, characterized in that the input of the attention-shift-aware module A_t in step c is the multi-scale feature tensor X_t output by the PDC module, and the output is a two-dimensional map S_t ∈ [0,1]^(W×H), where W is the image width and H is the image height; the processing of the attention-shift-aware module A_t is as follows:
Hidden state: H_t = convLSTM(X_t, H_{t−1})
Attention-shift map: A_t = F_A({X_1, ..., X_t})
Shift-aware gating: G_{m,t} = A_t ⊙ H_{m,t}
Salient object prediction: S_t = σ(w_S * G_t)
where it is assumed that the total length of the input video is T frames; the subscript t denotes the current frame and t−1 the previous frame; H_t is the hidden state of the 3D tensor at the current time, obtained by the long short-term memory network convLSTM(·) from the current input feature X_t and the hidden state of the previous time; the weighting module F_A(·) is built by stacking a group of simple convolutional layers and assigns weights to the set of features {X_1, ..., X_t} extracted in step b; G_{m,t} denotes the shift-aware gating, m ∈ {1, ..., M} is the channel index, the symbol ⊙ denotes element-wise matrix multiplication, and H_{m,t} denotes the hidden state of channel m of the 3D tensor at the current time; w_S is a 1 × 1 × M convolution kernel, * denotes the convolution operation, and σ is an activation function.
5. The video salient object detection method based on an attention transfer mechanism according to any one of claims 1 to 3, characterized in that: the function described in step e for computing the loss value against the manually annotated reference map is as follows:
L = L_VSOD(S_t, M_t) + ℓ(F_t) · L_Att(A_t, F_t)
where L_Att and L_VSOD are cross-entropy losses; ℓ(·) indicates whether a fixation saliency map F_t exists for the current frame; and M_t is the manually annotated reference map.
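A minimal NumPy sketch of such a combined loss, under the assumption that both terms are pixel-averaged binary cross-entropy and that ℓ(F_t) is a 0/1 indicator of whether a fixation map is available for the frame; the function names are illustrative, not from the patent.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Pixel-averaged binary cross-entropy; eps guards against log(0)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def total_loss(S_t, M_t, A_t=None, F_t=None):
    """L = L_VSOD(S_t, M_t) + l(F_t) * L_Att(A_t, F_t).

    S_t : predicted saliency map   M_t : annotated reference map
    A_t : attention-shift map      F_t : fixation map (may be absent)
    """
    loss = bce(S_t, M_t)      # L_VSOD term, always present
    if F_t is not None:       # l(F_t) = 1 only when a fixation map exists
        loss += bce(A_t, F_t)  # L_Att term
    return loss

S = np.full((4, 4), 0.9)
M = np.ones((4, 4))
print(round(total_loss(S, M), 4))  # prints 0.1054, i.e. -ln(0.9)
```

When no fixation annotation exists for a frame, only the saliency term contributes, which matches the role of the indicator ℓ(·) in the claim.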
CN201910347420.XA 2019-04-28 2019-04-28 Video salient object detection method based on attention transfer mechanism Active CN110097115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910347420.XA CN110097115B (en) 2019-04-28 2019-04-28 Video salient object detection method based on attention transfer mechanism


Publications (2)

Publication Number Publication Date
CN110097115A true CN110097115A (en) 2019-08-06
CN110097115B CN110097115B (en) 2022-11-25

Family

ID=67446180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910347420.XA Active CN110097115B (en) 2019-04-28 2019-04-28 Video salient object detection method based on attention transfer mechanism

Country Status (1)

Country Link
CN (1) CN110097115B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8363939B1 (en) * 2006-10-06 2013-01-29 Hrl Laboratories, Llc Visual attention and segmentation system
CN101430689A (en) * 2008-11-12 2009-05-13 哈尔滨工业大学 Detection method for figure action in video
US20140270707A1 (en) * 2013-03-15 2014-09-18 Disney Enterprises, Inc. Method and System for Detecting and Recognizing Social Interactions In a Video
WO2017155661A1 (en) * 2016-03-11 2017-09-14 Qualcomm Incorporated Video analysis with convolutional attention recurrent neural networks
US20170262995A1 (en) * 2016-03-11 2017-09-14 Qualcomm Incorporated Video analysis with convolutional attention recurrent neural networks
CN106127799A (en) * 2016-06-16 2016-11-16 方玉明 A kind of visual attention detection method for 3 D video
WO2018023734A1 (en) * 2016-08-05 2018-02-08 深圳大学 Significance testing method for 3d image
CN109118459A (en) * 2017-06-23 2019-01-01 南开大学 Image significance object detection method and device
CN108428238A (en) * 2018-03-02 2018-08-21 南开大学 A kind of detection method general based on the polymorphic type task of depth network
CN109309834A (en) * 2018-11-21 2019-02-05 北京航空航天大学 Video-frequency compression method based on convolutional neural networks and the significant information of HEVC compression domain

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HONGMEI SONG et al.: "Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection", European Conference on Computer Vision *
WENGUAN WANG et al.: "Revisiting Video Saliency: A Large-scale Benchmark and a New Model", IEEE Conference on Computer Vision and Pattern Recognition 2018 *
ZHANG QING: "Experimental design of salient object detection based on visual attention", Research and Exploration in Laboratory *
XIAO LIMEI et al.: "Salient moving object detection based on multi-scale phase spectrum", Journal of Lanzhou University of Technology *
HU CHUNHAI et al.: "Visual-saliency-driven video segmentation algorithm for moving fish bodies", Journal of Yanshan University *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929735A (en) * 2019-10-17 2020-03-27 杭州电子科技大学 Rapid significance detection method based on multi-scale feature attention mechanism
CN110929735B (en) * 2019-10-17 2022-04-01 杭州电子科技大学 Rapid significance detection method based on multi-scale feature attention mechanism
CN111242003A (en) * 2020-01-10 2020-06-05 南开大学 Video salient object detection method based on multi-scale constrained self-attention mechanism
CN111242003B (en) * 2020-01-10 2022-05-27 南开大学 Video salient object detection method based on multi-scale constrained self-attention mechanism
CN111275694B (en) * 2020-02-06 2020-10-23 电子科技大学 Attention mechanism guided progressive human body division analysis system and method
CN111275694A (en) * 2020-02-06 2020-06-12 电子科技大学 Attention mechanism guided progressive division human body analytic model and method
CN111340046A (en) * 2020-02-18 2020-06-26 上海理工大学 Visual saliency detection method based on feature pyramid network and channel attention
CN111507215A (en) * 2020-04-08 2020-08-07 常熟理工学院 Video target segmentation method based on space-time convolution cyclic neural network and cavity convolution
CN111523410A (en) * 2020-04-09 2020-08-11 哈尔滨工业大学 Video saliency target detection method based on attention mechanism
CN111523410B (en) * 2020-04-09 2022-08-26 哈尔滨工业大学 Video saliency target detection method based on attention mechanism
CN115359310A (en) * 2022-07-08 2022-11-18 中国人民解放军国防科技大学 SIC prediction method and system based on ConvLSTM and conditional random field
CN115359310B (en) * 2022-07-08 2023-09-01 中国人民解放军国防科技大学 SIC prediction method and system based on ConvLSTM and conditional random field
CN115276784A (en) * 2022-07-26 2022-11-01 西安电子科技大学 Deep learning-based orbital angular momentum modal identification method
CN115276784B (en) * 2022-07-26 2024-01-23 西安电子科技大学 Deep learning-based orbital angular momentum modal identification method

Also Published As

Publication number Publication date
CN110097115B (en) 2022-11-25

Similar Documents

Publication Publication Date Title
CN110097115A Video salient object detection method based on an attention transfer mechanism
Tao et al. Smoke detection based on deep convolutional neural networks
CN109697435B People flow monitoring method and device, storage medium and equipment
CN106897670B Computer-vision-based identification method for violent sorting of express parcels
CN105608456B Multi-directional text detection method based on a fully convolutional network
CN110532900A Facial expression recognition method based on U-Net and LS-CNN
CN106952269B Neighborhood-reversible video foreground object sequence detection and segmentation method and system
CN104732208B Video human behavior recognition method based on sparse subspace clustering
CN104392228B UAV image object class detection method based on conditional random field models
CN105160310A Human behavior recognition method based on 3D convolutional neural networks
CN108764142A UAV-image forest smoke detection and classification method based on 3D CNN
CN105869173A Stereoscopic vision saliency detection method
CN103186775B Human motion recognition method based on mixed descriptors
CN108921822A Image object counting method based on convolutional neural networks
CN106845374A Pedestrian detection method and device based on deep learning
CN108805078A Video pedestrian re-identification method and system based on pedestrian average state
CN109559310A Power transmission and transformation inspection image quality evaluation method and system based on saliency detection
CN112926453B Examination room cheating behavior analysis method based on motion feature enhancement and long-term time sequence modeling
CN113591968A Infrared weak and small target detection method based on asymmetric attention feature fusion
CN106570874A Image labeling method combining local image constraints and global target constraints
CN109376753A Densely connected three-dimensional spatial-spectral separable convolution deep network and construction method
CN111582091B Pedestrian recognition method based on multi-branch convolutional neural network
CN105303163B Target detection method and detection device
CN107463954A Template matching recognition method for blurred images of different spectra
CN110399882A Text detection method based on deformable convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant