CN110097115A - Video salient object detection method based on an attention-shift mechanism - Google Patents
Video salient object detection method based on an attention-shift mechanism
- Publication number
- CN110097115A (application CN201910347420.XA)
- Authority
- CN
- China
- Prior art keywords
- attention
- module
- network
- shift
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Abstract
A video salient object detection method based on an attention-shift mechanism. Attention shift is a distinctive function of the human visual system, yet current methods ignore this important mechanism. The method of the present invention designs a new convolutional neural network architecture that efficiently combines the characteristics of a static convolutional network, a pyramid dilated convolutional network, a long short-term memory network and an attention-shift-aware module, thereby fully reflecting the attention-shift mechanism of the human visual system. It is more meaningful for real application scenarios and obtains better salient object detection results. Compared with current video salient object detection methods, the method of the present invention reaches a world-leading level, surpassing the current best methods on the performance evaluations of mainstream public datasets.
Description
Technical field
The invention belongs to the technical field of image processing, and specifically relates to a video salient object detection method based on an attention-shift mechanism.
Background technique
Video salient object detection (VSOD) aims to extract the most attention-grabbing objects from dynamic video. The task originates from the study of human visual attention behavior, i.e., the remarkable ability of the human visual system to rapidly locate the most important information in a scene (the visual attention mechanism). Early physiological studies quantitatively confirmed the strong correlation between explicit object-level saliency judgments and implicit visual attention allocation. Since we live in a dynamically changing world, video salient object detection is of great significance, and it has a wide range of practical applications, such as video segmentation, video summarization, video compression, autonomous driving and human-machine interaction. Because video data are large and diverse (e.g., different motion patterns, occlusion, blur, object deformation, etc.) and human visual attention behavior is complex (i.e., dynamic allocation of selective attention, attention shift, etc.), video saliency detection faces great challenges; it has attracted intense interest and has important academic value.
Early VSOD models were based on simple features (e.g., color, motion) and largely relied on classical image salient object detection heuristics (e.g., center-surround contrast, background priors) and cognitive theories of visual attention (e.g., feature integration theory, guided search). They explored ways of integrating spatial and temporal saliency features, such as gradient flow fields, geodesic distance, random walks and graph structures. Traditional VSOD models are constrained by their limited feature representation ability. Recently, however, VSOD models based on deep learning have received more attention, as deep neural networks applied to images have successfully achieved saliency detection for still images. More specifically, Wang et al. published the paper "Video salient object detection via fully convolutional networks" in the IEEE TIP journal (27(1): 38-49, 2018), which built a fully convolutional neural network for VSOD. Another contemporaneous paper, "Deeply supervised 3d recurrent fcn for salient object detection in videos", published at BMVC, used 3D filters to incorporate spatial and temporal information into a conditional random field framework. Subsequently, spatio-temporal deep features, recurrent neural networks and other techniques were proposed to better capture spatial and temporal saliency cues. In general, deep-network-based VSOD models possess powerful learning ability because they use neural networks to extract features; since the literature is extensive, it is not enumerated here one by one. However, these models ignore the very important attention-shift mechanism of human visual attention. For example, suppose a video scene contains a static black cat and a moving white cat: at first, people concentrate their attention on the moving white cat. A few seconds later, when the static black cat suddenly starts to quarrel and fight with the white cat, people's attention shifts to both the black cat and the white cat. Because existing models at home and abroad mostly focus on moving objects, or on saliency detection for purely static objects, their performance drops significantly in such scenes that require a comprehensive understanding of human attention shifts, and their detection results are unsatisfactory.
Summary of the invention
The object of the present invention is to address the failure of existing video salient object detection methods to account for the shift of the salient object, and to propose a video salient object detection method based on an attention-shift mechanism.
The method of the present invention, termed Saliency-Shift-Aware Video Salient Object Detection (SSAV), consists of two basic modules: a pyramid dilated convolution (PDC) module and a saliency-shift-aware module (SSLSTM). The former is trained with a strong still-image salient object learning method; the latter extends the traditional convolutional long short-term memory network (convLSTM) so that it is aware of saliency shifts. The present invention takes the static feature sequence obtained from the PDC module as input and produces the corresponding VSOD results with dynamic representations and attention shifts.
Technical solution of the present invention
A video salient object detection method based on an attention-shift mechanism, comprising the following steps:
a. Static convolutional network module: a multi-layer convolutional neural network is used to extract features from the still frames {I_t} of the input video, yielding a set of features {Q_t}; here T denotes the total number of frames of the input video and t indexes one frame. The multi-layer convolutional neural network is built from a backbone convolutional neural network; candidate backbones include the VGG-16 network, the ResNet-50 network, the ResNet-101 network and the SE network.
b. Pyramid dilated convolution (PDC) module: the features extracted in step a are fed into this module, and a pyramid of dilated convolutions is used to obtain multi-scale features. Specifically, the PDC module consists of K dilated convolutional layers, each with a different dilation rate, so as to extract the multi-scale feature vector X_t.
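The pyramid of dilated convolutions in step b can be illustrated with a small sketch. The following is a minimal NumPy illustration, not the patent's actual implementation: a single 3 × 3 kernel is applied at the dilation rates 2, 4, 8 and 16 used in the embodiment, and the resulting maps P_k are concatenated with the input Q along the channel axis, mirroring X = [Q, P_1, ..., P_K]. The single-channel map and random weights are illustrative stand-ins.

```python
import numpy as np

def dilated_conv2d(feat, kernel, rate):
    """'Same'-padded 2D dilated convolution of a single-channel feature map."""
    k = kernel.shape[0]
    eff = k + (k - 1) * (rate - 1)      # effective kernel size grows with the rate
    pad = eff // 2
    padded = np.pad(feat, pad, mode="constant")
    out = np.zeros_like(feat, dtype=float)
    for i in range(feat.shape[0]):
        for j in range(feat.shape[1]):
            # sample the padded map with stride `rate` (the "holes" of the kernel)
            patch = padded[i:i + eff:rate, j:j + eff:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

rng = np.random.default_rng(0)
Q = rng.random((32, 32))                # toy single-channel feature map
kernel = rng.random((3, 3))
rates = [2, 4, 8, 16]                   # dilation rates of the embodiment
P = [dilated_conv2d(Q, kernel, r) for r in rates]
X = np.stack([Q] + P)                   # X = [Q, P1, ..., PK]
print(X.shape)                          # (5, 32, 32)
```

Larger dilation rates enlarge the effective receptive field (k + (k − 1)(r − 1) for a k × k kernel) without adding parameters, which is how the PDC module gathers multi-scale context.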
c. Attention-shift-aware module: on the basis of a convolutional long short-term memory network (convLSTM), a weight module F_A, specially designed in the present invention, is added. The F_A module is built from a stack of simple convolutional layers; it assigns weights to the multi-scale features extracted in step b, thereby realizing attention-shift awareness.

The input of the attention-shift-aware module is the multi-scale feature vector X_t produced by the PDC module, and the output is a two-dimensional map S_t ∈ [0,1]^{W×H}. The module computes:

Hidden state: H_t = convLSTM(X_t, H_{t-1})
Attention-shift map: A_t = F_A({X_1, ..., X_t})
Gated state: G_{m,t} = A_t ⊙ H_{m,t}
Salient object prediction: S_t = σ(w_S * G_t)

Suppose the total length of the input video is T frames; the subscript t denotes the current frame and t-1 the previous frame. H_t is the hidden state of the 3D tensor at the current time, obtained by the long short-term memory network convLSTM(·) from the current input feature X_t and the hidden state of the previous time step. The weight module F_A(·), specially designed in the present invention, is built from a stack of simple convolutional layers and assigns weights to the set of features {X_1, ..., X_t} extracted in step b. G_{m,t} denotes the gated (shift-aware) state, where m ∈ {1, ..., M} is the channel index, M is the total number of channels, ⊙ denotes element-wise multiplication, and H_{m,t} is the hidden state of channel m of the 3D tensor at the current time. w_S is a 1 × 1 × M convolution kernel, * denotes the convolution operation, and σ is an activation function.
d. Generate the image result: a 1×1 convolutional layer is applied to the features output by step c, and an activation function is then used to decide which neurons are activated, so as to generate the salient object image of each video frame;
e. Update the network: a cross-entropy loss function is used to compute the loss between the salient object image generated in step d and the manually annotated reference image; the gradient is back-propagated and the network is updated.

The loss computed against the manually annotated reference image is:

L = L_VSOD(S_t, M_t) + I(F_t) · L_Att(A_t, F_t)

where L_Att and L_VSOD are cross-entropy losses, I(·) indicates whether a fixation (gaze-point) saliency map F_t exists, and M_t is the manually annotated reference image.
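The per-pixel cross-entropy used in step e can be written directly. The sketch below shows a generic binary cross-entropy between a predicted saliency map and a binary reference mask; the indicator-weighted combination of L_VSOD and L_Att is not reproduced.

```python
import numpy as np

def binary_cross_entropy(S, G, eps=1e-7):
    """Mean per-pixel cross-entropy between prediction S in (0,1) and binary mask G."""
    S = np.clip(S, eps, 1.0 - eps)       # avoid log(0)
    return float(-np.mean(G * np.log(S) + (1.0 - G) * np.log(1.0 - S)))

G = np.array([[0.0, 1.0], [1.0, 0.0]])      # toy ground-truth mask M_t
S_good = np.array([[0.1, 0.9], [0.8, 0.2]]) # prediction close to the mask
S_bad = np.array([[0.9, 0.1], [0.2, 0.8]])  # prediction far from the mask
assert binary_cross_entropy(S_good, G) < binary_cross_entropy(S_bad, G)
```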
Advantages and beneficial effects of the present invention:

The video saliency detection method of the present invention takes the attention-shift mechanism into account. This mechanism does not appear in the prior art: it occurs naturally in the human visual system but has long been ignored by researchers. Introducing it into a network is creative and non-trivial. Compared with other current models that detect saliency only from single video frames, the method of the present invention considers the spatio-temporal relations between frames of the video and models the shift of saliency according to shifts of attention fixation, which is more meaningful for applications; it obtains better practical results and reaches a world-leading standard.
Detailed description of the invention
Fig. 1 is the flowchart of the SSAV method of the present invention.

Fig. 2 is the implementation framework of the SSAV method of the present invention, where the numbers 473 × 473 × 3 on the image denote the length × width × number of channels of the input image.
Fig. 3 shows saliency-map examples obtained on the full ViSal dataset by the SSAV method of the present invention and by 17 existing state-of-the-art deep learning and conventional methods (the 17 compared methods are, in order: PDBM, MBNM, FGRN, DLVS, SCNN, SCOM, SFLR, SGSP, STBP, MSTM, GFVM, SAGM, MB+M, RWRV, SPVM, TIMP, SIVM);

Fig. 4 shows saliency-map examples obtained on the FBMS test set by the SSAV method of the present invention and by the same 17 methods as in Fig. 3;

Fig. 5 shows saliency-map examples obtained on the DAVIS test set by the SSAV method of the present invention and by the same 17 methods as in Fig. 3;

Fig. 6 shows saliency-map examples obtained on the DAVSOD test set by the SSAV method of the present invention and by the same 17 methods as in Fig. 3;

Fig. 7 shows examples with saliency shift obtained on the DAVSOD test set by the SSAV method of the present invention and by 5 existing state-of-the-art methods. Column (a) shows the input video frames, (b) the human fixation maps recorded for the corresponding input frames, (c) the manually annotated reference image frames, (d) the saliency maps obtained by the present SSAV method, and (e)-(i) the saliency maps of the 5 compared methods, in order: MBNM, FGRN, PDBM, SFLR, SAGM.
Specific embodiments

With reference to Fig. 1 and Fig. 2, the specific implementation steps of the present invention are as follows:
a. Static convolutional network module: a ResNet-50 neural network is used to extract features from the still frames {I_t}, obtaining a set of features {Q_t}, where T denotes the total number of frames of the input video and t indexes one frame. The example in Fig. 2 shows 3 input frames I_{t-1}, I_t, I_{t+1}; after the ResNet-50 network, the set of features Q_{t-1}, Q_t, Q_{t+1} is obtained.
b. Pyramid dilated convolution (PDC) module: the features extracted in step a are fed into this module, and a pyramid of dilated convolutions is used to obtain multi-scale features. Specifically, the PDC module consists of K parallel dilated convolutional layers, each with a different dilation rate. This embodiment uses 4 dilated convolutional layers with dilation rates 2, 4, 8 and 16, respectively. For example, the feature Q obtained in step a is passed through the pyramid convolutions to produce a set of features {P_1, ..., P_k, ..., P_K}, which is then concatenated with Q to obtain the multi-scale feature

X = [Q, P_1, ..., P_k, ..., P_K],

where X is the enhanced extracted feature, Q is the 3D feature tensor of frame I in the video, and [·] denotes the concatenation of the parallel branches. The pyramid dilated convolution module captures multi-scale information and extracts more robust features.
c. Attention-shift-aware module: on the basis of a convolutional long short-term memory network (convLSTM), a weight module F_A, specially designed in the present invention, is added. The F_A module is built from a stack of simple convolutional layers; it assigns weights to the multi-scale features extracted in step b, thereby realizing the attention-shift mechanism.

The input of the attention-shift-aware module is the multi-scale feature vector X_t produced by the PDC module, and the output is a two-dimensional map S_t ∈ [0,1]^{W×H}. The module computes:

Hidden state: H_t = convLSTM(X_t, H_{t-1})
Attention-shift map: A_t = F_A({X_1, ..., X_t})
Gated state: G_{m,t} = A_t ⊙ H_{m,t}
Salient object prediction: S_t = σ(w_S * G_t)

Suppose the total length of the input video is T frames; the subscript t denotes the current frame and t-1 the previous frame. H_t is the hidden state of the 3D tensor at the current time, obtained by the long short-term memory network convLSTM(·) from the current input feature X_t and the hidden state of the previous time step. The weight module F_A(·), specially designed in the present invention, is built from a stack of simple convolutional layers and assigns weights to the set of features {X_1, ..., X_t} extracted in step b. G_{m,t} denotes the gated state, m ∈ {1, ..., M} is the channel index, M is the total number of channels, ⊙ denotes element-wise multiplication, and H_{m,t} is the hidden state of channel m of the 3D tensor at the current time. w_S is a 1 × 1 × M convolution kernel, * denotes the convolution operation, and σ is a sigmoid activation function. As shown in Fig. 2, the convLSTM network uses a 3 × 3 × 32 convolution kernel.
d. Generate the image result: a 1×1 convolutional layer is applied to the features output by step c, and an activation function is then used to decide which neurons are activated, so as to generate the salient object image of each video frame;

e. Update the network: a cross-entropy loss function is used to compute the loss between the salient object image generated in step d and the manually annotated reference image; the gradient is back-propagated and the network is updated. The finally obtained model can be used to extract the salient objects, including attention shifts, from any video.

The loss computed against the manually annotated reference image is:

L = L_VSOD(S_t, M_t) + I(F_t) · L_Att(A_t, F_t)

where L_Att and L_VSOD are cross-entropy losses, I(·) indicates whether a fixation (gaze-point) saliency map F_t exists, and M_t is the manually annotated reference image.
The effect of the invention is further illustrated by the following simulation experiments:

(1) Experimental datasets and simulation conditions

The test images used in this experiment come from: the ViSal dataset constructed by Wang Wenguan et al. in 2015; the FBMS dataset constructed in 2014 by the group of Prof. Jitendra Malik at the University of California, Berkeley; the DAVIS dataset published in 2016 by the Adobe scientist Perazzi at the famous international conference on computer vision and pattern recognition (CVPR); the VOS dataset constructed in 2018 by the group of Li Jia at Beihang University; and the DAVSOD dataset released in 2019 by Fan Dengping et al. Among them, ViSal is the first dataset designed exclusively for the video salient object detection task; it contains 17 video sequences with a total of 193 annotated frames. FBMS is an earlier classical dataset designed for the object segmentation task; it has 59 videos with a total of 720 annotated frames and is now widely used for video salient object detection. DAVIS is the first densely and accurately annotated dataset, with a total of 3455 densely annotated frames across 50 videos; in only two years it has become widely used. VOS is the largest of these earlier datasets in video count: it consists of 200 videos with 7467 annotated frames. DAVSOD, constructed by the Media Computing Laboratory of Nankai University, is currently the world's largest dataset of its kind, with more than 200 videos and 23938 annotated frames, exceeding the sum of the annotated frames of all previous datasets. The experimental platform is an Intel E5-2676 v3 @ 2.4 GHz × 24 machine with a GTX TITAN XP graphics card; the simulation uses Python and Caffe.
(2) Evaluation criteria for video salient object detection

We use three gold-standard metrics, the maximum F-measure (max F), the structure measure (S) and the mean absolute error (M), to evaluate the results of video salient object detection.

Mathematically, the F-measure is the weighted harmonic mean of precision and recall, which enables a comprehensive evaluation; it is computed as:

F_β = ((1 + β²) · Precision · Recall) / (β² · Precision + Recall)

β² is the weight that gives precision a higher status; the widespread practice in the literature is to set β² = 0.3. Precision and Recall are computed as:

Precision = TP / (TP + FP),  Recall = TP / (TP + FN)

Precision and recall are built from a confusion matrix: in the binary decision problem, TP denotes pixels predicted positive whose reference is also positive, FP denotes pixels predicted positive whose reference is negative, and FN denotes pixels predicted negative whose reference is positive. We binarize the detected result map with 256 different thresholds; each threshold yields an F value, and the maximum F-measure is the largest of these 256 F values.
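The max-F protocol described above — binarize the result map at 256 thresholds, compute F_β with β² = 0.3 at each, and keep the maximum — can be sketched as follows (toy data; the exact threshold spacing is an assumption):

```python
import numpy as np

def max_f_measure(S, G, beta2=0.3, n_thresh=256):
    """Maximum F-measure of saliency map S (values in [0,1]) against binary mask G."""
    best = 0.0
    for th in np.linspace(0.0, 1.0, n_thresh, endpoint=False):
        B = S > th                        # binarized prediction at this threshold
        tp = np.sum(B & (G > 0.5))        # predicted positive, reference positive
        fp = np.sum(B & (G <= 0.5))       # predicted positive, reference negative
        fn = np.sum(~B & (G > 0.5))       # predicted negative, reference positive
        if tp == 0:
            continue
        prec = tp / (tp + fp)
        rec = tp / (tp + fn)
        f = (1 + beta2) * prec * rec / (beta2 * prec + rec)
        best = max(best, f)
    return best

G = np.zeros((8, 8))
G[2:6, 2:6] = 1.0                         # toy square ground-truth mask
assert max_f_measure(G, G) == 1.0         # a perfect prediction scores 1
```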
The structure measure (S) was proposed by Fan et al. in 2017 to measure the structural difference between the predicted result and the reference result. It combines a region-aware term S_r and an object-aware term S_o:

S = α · S_o + (1 − α) · S_r

where setting α = 0.5 assigns equal weight to the region and object terms. For the detailed formulas, refer to the original paper: "Structure-measure: A New Way to Evaluate Foreground Maps", ICCV 2017.
The mean absolute error (MAE, M) measures the mean absolute error between the predicted result and the reference result. Let the binary reference result be a two-dimensional matrix G and the prediction a two-dimensional matrix S; then

M = (1/N) · Σ |S − G|,

where N is the total number of pixels in the image and the sum runs over all pixels. The mean absolute error estimates pixel-level accuracy and is one of the most widely used evaluation metrics.
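MAE as defined above reduces to one line of NumPy; the toy example below checks it by hand:

```python
import numpy as np

def mae(S, G):
    """Mean absolute error between prediction S and reference G (same shape)."""
    return float(np.mean(np.abs(S - G)))

G = np.array([[0.0, 1.0], [1.0, 1.0]])   # toy reference mask
S = np.array([[0.1, 0.9], [1.0, 0.8]])   # toy prediction
print(round(mae(S, G), 6))               # (0.1 + 0.1 + 0.0 + 0.2) / 4 = 0.1
assert mae(G, G) == 0.0                  # a perfect prediction has zero error
```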
Table 1 below gives the maximum F-measure (max F), structure measure (S) and mean error (M) obtained by the method of the invention and by 17 current classical and state-of-the-art compared methods on the 5 most challenging public test datasets in the world (ViSal, FBMS-T, DAVIS-T, VOS-T, DAVSOD-T).
Table 1
(3) experiment content
Experiment one
From Table 1 above it can be seen that, compared with the current 17 methods, the SSAV method of the invention has a clear advantage: it reaches the highest accuracy on all 3 metrics across the 5 datasets ViSal, FBMS, DAVIS, VOS and DAVSOD. This fully demonstrates the effectiveness and robustness of the SSAV method of the present invention. The objective evaluation above quantitatively illustrates the advantage of the invention in detecting video salient objects in various scenes; besides the numerical results, a subjective evaluation of the visual results is also needed.
Experiment two
In this experiment, we further show representative test results on 4 datasets to illustrate the performance of the method of the invention. In Figs. 3-6, (a) shows 3 different frames of the input video, (b) the manually annotated reference image frames, (c) the saliency maps obtained by the present SSAV method, (d) the saliency maps obtained by the PDBM method, (e) MBNM, (f) FGRN, (g) DLVS, (h) SCNN, (i) SCOM, (j) SFLR, (k) SGSP, (l) STBP, (m) MSTM, (n) GFVM, (o) SAGM, (p) MB+M, (q) RWRV, (r) SPVM, (s) TIMP, and (t) SIVM.

Combining the results of Figs. 3-6, our method is very close to the manually annotated reference frames in all cases, whereas the 17 compared methods all show larger gaps from the reference images.
To further verify that the present invention can effectively cope with the saliency-shift phenomenon, Fig. 7 illustrates this result. In the figure, (a) shows 5 frames of a video in the DAVSOD dataset, (b) the human fixation maps, (c) the manually annotated reference images (GT), (d) the saliency maps obtained by the present SSAV method, (e) MBNM, (f) FGRN, (g) PDBM, (h) SFLR, and (i) SAGM. It can be seen from the figure that the SSAV method of the present invention obtains more satisfactory results than the other classical methods. The method of the invention effectively captures the saliency-shift phenomenon: [cat] → [cat, box] → [cat] → [box] → [cat, box]. The other methods, however, either fail to detect the salient objects completely (e.g., the SFLR and SAGM methods) or capture only the moving cat while ignoring the box (e.g., the MBNM method).
Parts of this embodiment that are not described in detail belong to common knowledge in the field and are not repeated here. The specific networks used in the implementation above (ResNet-50, etc.) serve only as examples of the invention and do not limit its protection scope; all designs similar or identical to the present invention fall within the protection scope of the present invention.
Claims (5)
1. A video salient object detection method based on an attention-shift mechanism, characterized in that the method comprises the following steps:
a. static convolutional network module: using a multi-layer convolutional neural network to extract features from the still frames of the video;
b. pyramid dilated convolution (PDC) module: taking the features extracted in step a as the input of the module, and obtaining multi-scale features with a pyramid of dilated convolutions;
c. attention-shift-aware module: on the basis of a convolutional long short-term memory network (convLSTM), adding a weight module F_A built from a stack of simple convolutional layers, and using the weight module F_A to assign weights to the multi-scale features extracted in step b, thereby realizing attention-shift awareness;
d. generating the image result: applying a 1×1 convolutional layer to the features output by step c, and then using an activation function to decide which neurons are activated, so as to generate the salient object image of each video frame;
e. updating the network: using a cross-entropy loss function to compute the loss between the salient object image generated in step d and the manually annotated reference image, back-propagating the gradient, and updating the network.
2. The video salient object detection method based on an attention-shift mechanism according to claim 1, characterized in that: the multi-layer convolutional neural network in step a is built from different backbone convolutional neural networks.
3. The video salient object detection method based on an attention-shift mechanism according to claim 2, characterized in that: the backbone convolutional neural networks include the VGG-16 network, the ResNet-50 network, the ResNet-101 network and the SE network.
4. The video salient object detection method based on an attention-shift mechanism according to any one of claims 1 to 3, characterized in that: the input of the attention-shift-aware module in step c is the multi-scale feature vector X_t produced by the PDC module, and the output is a two-dimensional map S_t ∈ [0,1]^{W×H}, where W is the image width and H is the image height; the attention-shift-aware module computes:

Hidden state: H_t = convLSTM(X_t, H_{t-1})
Attention-shift map: A_t = F_A({X_1, ..., X_t})
Gated state: G_{m,t} = A_t ⊙ H_{m,t}
Salient object prediction: S_t = σ(w_S * G_t)

where it is assumed that the total length of the input video is T frames, the subscript t denotes the current frame and t-1 the previous frame; H_t is the hidden state of the 3D tensor at the current time, obtained by the long short-term memory network convLSTM(·) from the current input feature X_t and the hidden state of the previous time step; the weight module F_A(·) is built from a stack of simple convolutional layers and assigns weights to the set of features {X_1, ..., X_t} extracted in step b; G_{m,t} denotes the gated state, m ∈ {1, ..., M} is the channel index, ⊙ denotes element-wise multiplication, and H_{m,t} is the hidden state of channel m of the 3D tensor at the current time; w_S is a 1 × 1 × M convolution kernel, * denotes the convolution operation, and σ is an activation function.
5. The video salient object detection method based on an attention-shift mechanism according to any one of claims 1 to 3, characterized in that: the loss computed in step e against the manually annotated reference image is:

L = L_VSOD(S_t, M_t) + I(F_t) · L_Att(A_t, F_t)

where L_Att and L_VSOD are cross-entropy losses, I(·) indicates whether a fixation (gaze-point) saliency map F_t exists, and M_t is the manually annotated reference image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910347420.XA CN110097115B (en) | 2019-04-28 | 2019-04-28 | Video salient object detection method based on attention transfer mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910347420.XA CN110097115B (en) | 2019-04-28 | 2019-04-28 | Video salient object detection method based on attention transfer mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110097115A true CN110097115A (en) | 2019-08-06 |
CN110097115B CN110097115B (en) | 2022-11-25 |
Family
ID=67446180
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910347420.XA Active CN110097115B (en) | 2019-04-28 | 2019-04-28 | Video salient object detection method based on attention transfer mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110097115B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929735A (en) * | 2019-10-17 | 2020-03-27 | Hangzhou Dianzi University | Rapid saliency detection method based on a multi-scale feature attention mechanism |
CN111242003A (en) * | 2020-01-10 | 2020-06-05 | Nankai University | Video salient object detection method based on a multi-scale constrained self-attention mechanism |
CN111275694A (en) * | 2020-02-06 | 2020-06-12 | University of Electronic Science and Technology of China | Attention-mechanism-guided progressive human body parsing model and method |
CN111340046A (en) * | 2020-02-18 | 2020-06-26 | University of Shanghai for Science and Technology | Visual saliency detection method based on a feature pyramid network and channel attention |
CN111507215A (en) * | 2020-04-08 | 2020-08-07 | Changshu Institute of Technology | Video object segmentation method based on a spatio-temporal convolutional recurrent neural network and dilated convolution |
CN111523410A (en) * | 2020-04-09 | 2020-08-11 | Harbin Institute of Technology | Video salient object detection method based on an attention mechanism |
CN115276784A (en) * | 2022-07-26 | 2022-11-01 | Xidian University | Orbital angular momentum mode recognition method based on deep learning |
CN115359310A (en) * | 2022-07-08 | 2022-11-18 | National University of Defense Technology | SIC prediction method and system based on ConvLSTM and conditional random fields |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101430689A (en) * | 2008-11-12 | 2009-05-13 | Harbin Institute of Technology | Method for detecting human actions in video |
US8363939B1 (en) * | 2006-10-06 | 2013-01-29 | Hrl Laboratories, Llc | Visual attention and segmentation system |
US20140270707A1 (en) * | 2013-03-15 | 2014-09-18 | Disney Enterprises, Inc. | Method and System for Detecting and Recognizing Social Interactions In a Video |
CN106127799A (en) * | 2016-06-16 | 2016-11-16 | Fang Yuming | Visual attention detection method for 3D video |
WO2017155661A1 (en) * | 2016-03-11 | 2017-09-14 | Qualcomm Incorporated | Video analysis with convolutional attention recurrent neural networks |
WO2018023734A1 (en) * | 2016-08-05 | 2018-02-08 | Shenzhen University | Significance testing method for 3D image |
CN108428238A (en) * | 2018-03-02 | 2018-08-21 | Nankai University | General multi-type task detection method based on deep networks |
CN109118459A (en) * | 2017-06-23 | 2019-01-01 | Nankai University | Image salient object detection method and device |
CN109309834A (en) * | 2018-11-21 | 2019-02-05 | Beihang University | Video compression method based on convolutional neural networks and HEVC compression-domain saliency information |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8363939B1 (en) * | 2006-10-06 | 2013-01-29 | Hrl Laboratories, Llc | Visual attention and segmentation system |
CN101430689A (en) * | 2008-11-12 | 2009-05-13 | Harbin Institute of Technology | Method for detecting human actions in video |
US20140270707A1 (en) * | 2013-03-15 | 2014-09-18 | Disney Enterprises, Inc. | Method and System for Detecting and Recognizing Social Interactions In a Video |
WO2017155661A1 (en) * | 2016-03-11 | 2017-09-14 | Qualcomm Incorporated | Video analysis with convolutional attention recurrent neural networks |
US20170262995A1 (en) * | 2016-03-11 | 2017-09-14 | Qualcomm Incorporated | Video analysis with convolutional attention recurrent neural networks |
CN106127799A (en) * | 2016-06-16 | 2016-11-16 | Fang Yuming | Visual attention detection method for 3D video |
WO2018023734A1 (en) * | 2016-08-05 | 2018-02-08 | Shenzhen University | Significance testing method for 3D image |
CN109118459A (en) * | 2017-06-23 | 2019-01-01 | Nankai University | Image salient object detection method and device |
CN108428238A (en) * | 2018-03-02 | 2018-08-21 | Nankai University | General multi-type task detection method based on deep networks |
CN109309834A (en) * | 2018-11-21 | 2019-02-05 | Beihang University | Video compression method based on convolutional neural networks and HEVC compression-domain saliency information |
Non-Patent Citations (5)
Title |
---|
HONGMEI SONG et al.: "Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection", European Conference on Computer Vision * |
WENGUAN WANG et al.: "Revisiting Video Saliency: A Large-scale Benchmark and a New Model", IEEE Conference on Computer Vision and Pattern Recognition 2018 * |
ZHANG QING: "Experimental Design of Salient Object Detection Based on Visual Attention", Research and Exploration in Laboratory * |
XIAO LIMEI et al.: "Salient Moving Object Detection Based on Multi-scale Phase Spectrum", Journal of Lanzhou University of Technology * |
HU CHUNHAI et al.: "Visual-Saliency-Driven Segmentation Algorithm for Videos of Moving Fish", Journal of Yanshan University * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929735A (en) * | 2019-10-17 | 2020-03-27 | Hangzhou Dianzi University | Rapid saliency detection method based on a multi-scale feature attention mechanism |
CN110929735B (en) * | 2019-10-17 | 2022-04-01 | Hangzhou Dianzi University | Rapid saliency detection method based on a multi-scale feature attention mechanism |
CN111242003A (en) * | 2020-01-10 | 2020-06-05 | Nankai University | Video salient object detection method based on a multi-scale constrained self-attention mechanism |
CN111242003B (en) * | 2020-01-10 | 2022-05-27 | Nankai University | Video salient object detection method based on a multi-scale constrained self-attention mechanism |
CN111275694B (en) * | 2020-02-06 | 2020-10-23 | University of Electronic Science and Technology of China | Attention-mechanism-guided progressive human body parsing system and method |
CN111275694A (en) * | 2020-02-06 | 2020-06-12 | University of Electronic Science and Technology of China | Attention-mechanism-guided progressive human body parsing model and method |
CN111340046A (en) * | 2020-02-18 | 2020-06-26 | University of Shanghai for Science and Technology | Visual saliency detection method based on a feature pyramid network and channel attention |
CN111507215A (en) * | 2020-04-08 | 2020-08-07 | Changshu Institute of Technology | Video object segmentation method based on a spatio-temporal convolutional recurrent neural network and dilated convolution |
CN111523410A (en) * | 2020-04-09 | 2020-08-11 | Harbin Institute of Technology | Video salient object detection method based on an attention mechanism |
CN111523410B (en) * | 2020-04-09 | 2022-08-26 | Harbin Institute of Technology | Video salient object detection method based on an attention mechanism |
CN115359310A (en) * | 2022-07-08 | 2022-11-18 | National University of Defense Technology | SIC prediction method and system based on ConvLSTM and conditional random fields |
CN115359310B (en) * | 2022-07-08 | 2023-09-01 | National University of Defense Technology | SIC prediction method and system based on ConvLSTM and conditional random fields |
CN115276784A (en) * | 2022-07-26 | 2022-11-01 | Xidian University | Orbital angular momentum mode recognition method based on deep learning |
CN115276784B (en) * | 2022-07-26 | 2024-01-23 | Xidian University | Orbital angular momentum mode recognition method based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN110097115B (en) | 2022-11-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110097115A (en) | Video salient object detection method based on an attention transfer mechanism | |
Tao et al. | Smoke detection based on deep convolutional neural networks | |
CN109697435B (en) | People flow monitoring method and device, storage medium and equipment | |
CN106897670B (en) | Express violence sorting identification method based on computer vision | |
CN105608456B (en) | Multi-directional text detection method based on fully convolutional networks | |
CN110532900A (en) | Facial expression recognition method based on U-Net and LS-CNN | |
CN106952269B (en) | Neighborhood-reversible video foreground object sequence detection and segmentation method and system | |
CN104732208B (en) | Video human activity recognition method based on sparse subspace clustering | |
CN104392228B (en) | UAV image object class detection method based on conditional random field models | |
CN105160310A (en) | Human behavior recognition method based on 3D convolutional neural networks | |
CN108764142A (en) | UAV image forest smoke detection and classification method based on 3D CNN | |
CN105869173A (en) | Stereoscopic vision saliency detection method | |
CN103186775B (en) | Human motion recognition method based on mixed descriptors | |
CN108921822A (en) | Image object counting method based on convolutional neural networks | |
CN106845374A (en) | Pedestrian detection method and detection device based on deep learning | |
CN108805078A (en) | Video pedestrian re-identification method and system based on pedestrian average state | |
CN109559310A (en) | Power transmission and transformation inspection image quality evaluation method and system based on saliency detection | |
CN112926453B (en) | Examination room cheating behavior analysis method based on motion feature enhancement and long-term temporal modeling | |
CN113591968A (en) | Infrared dim and small target detection method based on asymmetric attention feature fusion | |
CN106570874A (en) | Image annotation method combining local image constraints and global target constraints | |
CN109376753A (en) | Densely connected three-dimensional spatial-spectral separable convolution deep network and construction method | |
CN111582091B (en) | Pedestrian recognition method based on a multi-branch convolutional neural network | |
CN105303163B (en) | Target detection method and detection device | |
CN107463954A (en) | Template matching recognition method for blurred images of different spectra | |
CN110399882A (en) | Text detection method based on deformable convolutional neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||