CN110334718A - Two-dimensional video saliency detection method based on long short-term memory - Google Patents

Two-dimensional video saliency detection method based on long short-term memory Download PDF

Info

Publication number
CN110334718A
CN110334718A (application CN201910614888.0A)
Authority
CN
China
Prior art keywords
two-dimensional video
long short-term memory
temporal features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910614888.0A
Other languages
Chinese (zh)
Inventor
方玉明
黄汉秦
乐晨阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201910614888.0A
Publication of CN110334718A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The present invention relates to a two-dimensional video saliency detection method based on long short-term memory, characterized in that: short-term temporal features are first extracted with a 3D convolutional network (3D-ConvNet); long-term temporal features are then extracted with a bidirectional convolutional long short-term memory network (B-ConvLSTM); the extracted short-term and long-term temporal features are fused; finally, the fused result is deconvolved to obtain the saliency map. By combining long-term and short-term temporal features, the model effectively preserves the motion information of salient targets in the video, and two-dimensional video saliency prediction experiments show that the proposed model achieves good detection performance.

Description

Two-dimensional video saliency detection method based on long short-term memory
Technical field
The present invention relates to a visual attention method for detecting saliency in two-dimensional video. It belongs to the field of multimedia technology, in particular to digital image and digital video processing, and specifically concerns a two-dimensional video saliency detection method based on long short-term memory.
Background art
Visual attention is an important mechanism in visual perception: it rapidly detects salient information in natural images. When observing a natural scene, selective attention allows the visual system to concentrate its limited processing resources on specific salient information while ignoring other, less important content. Visual attention methods can be broadly divided into two kinds: bottom-up and top-down. Bottom-up processing is data-driven, task-independent, automatic salient-region detection, while top-down methods involve cognitive processes tied to specific tasks. Saliency detection models aim to predict the important regions of a visual scene that people attend to during scene viewing. Many saliency prediction methods have been designed for various visual tasks, such as image segmentation, object detection, video summarization, and video compression. However, most existing saliency detection models are designed for static images, and research on video saliency detection remains limited because extracting complex motion features is difficult.
Traditional saliency detection models mainly start from two aspects: low-level features, such as luminance, color, texture, and contrast; and semantic information, such as faces, people, and text. However, these hand-crafted methods cannot comprehensively account for all relevant factors, and manual feature extraction is also time-consuming. Early video saliency detection models were designed by simply extending static saliency detection models with additional temporal information. Because motion features were extracted by means of optical flow, which has high computational complexity, the applicability of these methods was limited. Such video saliency detection models mainly follow two steps: (1) extract spatial and temporal information from the video sequence to compute a spatial saliency map and a temporal saliency map separately; (2) combine the spatial and temporal saliency maps through a fusion strategy to compute the final spatiotemporal saliency map.
Itti et al. described an early method for extracting still-image saliency, which predicts saliency using a multi-scale center-surround mechanism over intensity, color, and orientation. Li et al. designed a bottom-up saliency method that uses regional features and image detail, and optimizes image boundaries to provide more accurate saliency results. Sun et al. built a saliency detection model using Markov absorption probabilities, incorporating image boundary information in addition to common low-level features such as luminance, texture, color, and orientation. Zhu et al. proposed the concept of boundary connectivity, which describes the spatial layout of image regions relative to the image boundary. Compared with still-image saliency, video saliency prediction must take the motion information in the video sequence into account. Mahadevan et al. designed a spatiotemporal saliency detection method by combining perceptual motion grouping with a center-surround mechanism. Liu et al. proposed a superpixel-based spatiotemporal saliency prediction model. Leboran et al. assumed that perceptual features can be represented by higher-order statistics and designed a saliency detection model accordingly. Fang et al. proposed a video saliency method that fuses spatial saliency and temporal saliency through uncertainty weighting.
With the rapid development of deep learning, many studies have built still-image saliency detection models using deep learning, and these models have proven effective at extracting salient regions. Compared with still-image saliency, video saliency prediction is more challenging because of the complex temporal information involved. Tran et al. introduced 3D ConvNets by using three-dimensional convolutional neural networks (3D-CNN) to learn the spatiotemporal features of video sequences. However, most deep-learning-based video saliency studies do not consider long-term temporal information. To overcome these drawbacks, the present invention designs a new deep video saliency detection network (DevsNet) that performs spatiotemporal saliency prediction with a new 3D convolutional network (3D-ConvNet) and a bidirectional convolutional long short-term memory network (B-ConvLSTM). The steps are as follows: first, a 3D convolutional network (3D-ConvNet) is built to extract short-term temporal features; second, a bidirectional long short-term memory network (B-ConvLSTM) is used to obtain long-term temporal features; the final saliency map is then predicted by combining the short-term and long-term temporal features. In short, the main contributions of the proposed method are: (1) a new video saliency detection model that uses a 3D convolutional network (3D-ConvNet) and a bidirectional long short-term memory network (B-ConvLSTM) to extract short-term and long-term temporal features respectively; combining short-term with long-term temporal features improves video saliency prediction performance. (2) A new two-layer bidirectional long short-term memory network (B-ConvLSTM) structure for extracting the long-term temporal features used in video saliency detection; the proposed B-ConvLSTM can extract temporal information not only from previous video frames but also from subsequent video frames, which means the proposed network considers forward and backward temporal characteristics simultaneously.
Although many visual attention models have been proposed, as noted above, their application to two-dimensional video remains limited. New methods are therefore needed in this field to improve two-dimensional video saliency detection performance.
Summary of the invention
To overcome the current limitations of visual attention research on two-dimensional video, the present invention proposes a new visual attention method for two-dimensional video. The extracted features include short-term temporal features and long-term temporal features, obtained respectively with a 3D convolutional network (3D-ConvNet) and a bidirectional long short-term memory network (B-ConvLSTM); the short-term and long-term temporal features are then fused and deconvolved to obtain the final saliency map.
The present invention relates to a two-dimensional video saliency detection method based on long short-term memory, characterized in that: short-term temporal features are first extracted with a 3D convolutional network (3D-ConvNet); long-term temporal features are then extracted with a bidirectional long short-term memory network (B-ConvLSTM); the extracted short-term and long-term temporal features are fused; finally, the fused result is deconvolved to obtain the saliency map. By combining long-term and short-term temporal features, the model effectively preserves the motion information of salient targets in the video, and two-dimensional video saliency prediction experiments show that the proposed model achieves good detection performance.
The specific operation of each part of the present invention is as follows:
Extraction of short-term temporal features:
A 3D convolutional network (3D-ConvNet) is designed to extract the short-term temporal features of the video. The kernel size of every 3D convolutional layer is 3*3*3, and the stride and padding of each Conv3D layer are 1*1*1. The numbers of convolution kernels used in the three layers are 16, 32, and 64; the number of kernels determines the number of output channels. Each pooling layer reduces the resolution and the temporal dimension of the video frames to half of their original size, which means the 3D convolutional network learns only local short-term spatiotemporal features, as mentioned before. Batch normalization (BN) is also applied to accelerate deep network training by reducing internal covariate shift.
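For illustration, the following is a minimal PyTorch sketch of such a 3D-ConvNet. The 3*3*3 kernels, 1*1*1 stride and padding, the 16/32/64 channel counts, BN, and the halving pooling follow the description above; the class name, input size, and ReLU activation are assumptions, since the description does not fix them.

```python
import torch
import torch.nn as nn

class ShortTermNet3D(nn.Module):
    """Sketch of the 3D-ConvNet for short-term spatiotemporal features.

    Kernel size 3x3x3, stride/padding 1x1x1, and channel counts 16/32/64
    follow the text; the pooling halves the temporal and spatial sizes
    after each conv block, matching "reduced to half" in the description.
    """
    def __init__(self, in_channels: int = 3):
        super().__init__()
        layers, prev = [], in_channels
        for ch in (16, 32, 64):
            layers += [
                nn.Conv3d(prev, ch, kernel_size=3, stride=1, padding=1),
                nn.BatchNorm3d(ch),           # BN reduces internal covariate shift
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=2),  # halves T, H, and W
            ]
            prev = ch
        self.features = nn.Sequential(*layers)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels, time, height, width)
        return self.features(clip)

# Example: a 16-frame RGB clip at 112x112 -> features of shape (1, 64, 2, 14, 14)
feat = ShortTermNet3D()(torch.randn(1, 3, 16, 112, 112))
```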
Extraction of long-term temporal features:
The fully connected LSTM (FC-LSTM) is extended to convolutional layers, yielding the proposed convolutional long short-term memory network (ConvLSTM), which is used to capture effective spatiotemporal information; its internal structure handles the input-to-state and state-to-state transitions. In the present invention, long-term temporal features are extracted by using a series of video frames as the network input; these features are obtained from the original video frames with pre-trained VGG16 parameters.
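Since the description states that the ConvLSTM inputs are obtained from the original frames with pre-trained VGG16 parameters, the following sketch shows one way to compute such frame-level features with torchvision; truncating VGG16 after its convolutional part, and the input size, are assumptions for illustration.

```python
import torch
from torchvision import models

# Pre-trained VGG16; keep only the convolutional part as a frame encoder.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
encoder = vgg.features.eval()  # truncation point is an assumption

@torch.no_grad()
def frame_features(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, 3, H, W) ImageNet-normalized frames -> (T, 512, H/32, W/32)."""
    return encoder(frames)

feats = frame_features(torch.randn(8, 3, 224, 224))  # -> (8, 512, 7, 7)
```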
Advantages and technical effects of the invention:
The proposed algorithm is reasonable and efficient: it is a novel method that combines the short-term and long-term temporal features of two-dimensional video, extracting them respectively with a 3D convolutional network (3D-ConvNet) and a bidirectional long short-term memory network (B-ConvLSTM). Combining short-term with long-term temporal features improves the performance of video saliency prediction. The present invention is highly robust, its evaluation results surpass those of the current best algorithms, and it has strong scalability.
To achieve the above goals, the technical solution adopted by the present invention is as follows:
A two-dimensional video saliency detection method based on long short-term memory, characterized by comprising the following steps:
Step 1: extract the short-term temporal features in the two-dimensional video frames, using a 3D convolutional neural network (3D-ConvNet);
Step 2: extract the long-term temporal features in the two-dimensional video frames, using a bidirectional long short-term memory network (B-ConvLSTM);
Step 3: fuse the extracted short-term and long-term temporal features, and deconvolve the result to obtain a saliency map with the same resolution as the input video frames (a sketch of this step follows below).
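As referenced in step 3 above, the following is a minimal PyTorch sketch of one plausible fusion-and-deconvolution stage. The concatenation-based fusion, the channel sizes, and the total upsampling factor are assumptions; the description fixes only that the fused features are deconvolved to a saliency map at the input-frame resolution.

```python
import math
import torch
import torch.nn as nn

class FuseAndUpsample(nn.Module):
    """Sketch of step 3: concatenate short- and long-term feature maps,
    fuse them with a convolution, and deconvolve (transposed convolution)
    back to the resolution of the input video frame."""
    def __init__(self, short_ch: int = 64, long_ch: int = 64, scale: int = 8):
        super().__init__()
        self.fuse = nn.Conv2d(short_ch + long_ch, 64, kernel_size=3, padding=1)
        ups, ch = [], 64
        for _ in range(int(math.log2(scale))):  # one 2x deconvolution per factor of 2
            ups += [nn.ConvTranspose2d(ch, ch // 2, kernel_size=4, stride=2, padding=1),
                    nn.ReLU(inplace=True)]
            ch //= 2
        ups += [nn.Conv2d(ch, 1, kernel_size=1), nn.Sigmoid()]
        self.up = nn.Sequential(*ups)

    def forward(self, f_short, f_long):
        x = torch.relu(self.fuse(torch.cat([f_short, f_long], dim=1)))
        return self.up(x)  # (batch, 1, H, W) saliency map with values in [0, 1]

# Example: 14x14 feature maps upsampled by 8 -> a 112x112 saliency map
sal = FuseAndUpsample()(torch.randn(1, 64, 14, 14), torch.randn(1, 64, 14, 14))
```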
Further, the short-term temporal features in the two-dimensional video frames described in step 1 include motion information.
Further, the 3D convolutional network is calculated according to formula (1):
h_n = σ(Σ W_n * h_(n-1) + b_n)   (1)
where W_n denotes the 3D convolution kernel parameters applied to the (n-1)-th hidden layer; h_(n-1) denotes the (n-1)-th hidden layer; b_n denotes the corresponding bias term; the operator '*' denotes the convolution operation; and σ denotes the activation function.
Further, batch normalization (the BN algorithm) is used to accelerate network training; it is calculated by formula (2):
x̂^(k) = (x^(k) - E(x^(k))) / √Var(x^(k))   (2)
where E(x^(k)) and Var(x^(k)) denote the expectation and variance of the batch data x^(k) respectively, and x̂^(k) denotes the standardized result.
Further, batch normalization (the BN algorithm) may change the distribution of the original data; to overcome this problem, formula (3) is used for adjustment:
y^(k) = γ^(k) · x̂^(k) + β^(k)   (3)
where γ^(k) and β^(k) are the corresponding adjustment parameters; x̂^(k) denotes the standardized (BN) result; and y^(k) denotes the adjusted result.
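A small PyTorch sketch of formulas (2)-(3) for a batch of feature vectors follows; the feature dimension in the example is arbitrary, and the eps guard is a standard numerical safeguard not shown in the formulas above.

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    """Sketch of formulas (2)-(3): standardize each feature over the batch,
    then restore representational freedom with the parameters gamma and beta."""
    mean = x.mean(dim=0)                        # E(x^(k)), per feature k
    var = x.var(dim=0, unbiased=False)          # Var(x^(k)), per feature k
    x_hat = (x - mean) / torch.sqrt(var + eps)  # formula (2)
    return gamma * x_hat + beta                 # formula (3)

x = torch.randn(32, 64)                         # batch of 32 samples, 64 features
y = batch_norm(x, gamma=torch.ones(64), beta=torch.zeros(64))
```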
Further, the long-term temporal features in the two-dimensional video frames described in step 2 include motion information.
Further, taking one ConvLSTM layer as an example, each ConvLSTM unit is calculated by formulas (4)-(8):
i_t = σ(W_xi * X_t + W_hi * H_(t-1) + b_i)   (4)
f_t = σ(W_xf * X_t + W_hf * H_(t-1) + b_f)   (5)
o_t = σ(W_xo * X_t + W_ho * H_(t-1) + b_o)   (6)
C_t = f_t ∘ C_(t-1) + i_t ∘ tanh(W_xc * X_t + W_hc * H_(t-1) + b_c)   (7)
H_t = o_t ∘ tanh(C_t)   (8)
where σ and tanh denote the sigmoid and hyperbolic tangent activation functions respectively; W_xi, W_hi, W_xf, W_hf, W_xo, W_ho, W_xc and W_hc are the convolution kernel parameters of the corresponding layers, and b_i, b_f, b_o and b_c are the corresponding bias terms; i_t, f_t and o_t denote the input gate, forget gate and output gate of the t-th video frame respectively; C_t and H_t are the memory cell and the hidden state; '*' denotes the convolution operation; and '∘' denotes the Hadamard (element-wise) product.
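A compact PyTorch sketch of one such ConvLSTM unit follows. Computing the four gate pre-activations with a single convolution over the concatenated input and hidden state is an implementation convenience that is mathematically equivalent to formulas (4)-(8); the kernel size and class name are assumptions.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Sketch of one ConvLSTM unit, formulas (4)-(8): the dense products of
    FC-LSTM are replaced by convolutions ('*'); '∘' is the Hadamard product."""
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        # One convolution produces all four gate pre-activations (i, f, o, g)
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x_t, h_prev, c_prev):
        z = self.gates(torch.cat([x_t, h_prev], dim=1))
        i, f, o, g = torch.chunk(z, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)  # (4)-(6)
        c_t = f * c_prev + i * torch.tanh(g)                            # (7)
        h_t = o * torch.tanh(c_t)                                       # (8)
        return h_t, c_t

# Example: one time step on 64-channel frame features
cell = ConvLSTMCell(in_ch=64, hid_ch=64)
h = c = torch.zeros(1, 64, 14, 14)
h, c = cell(torch.randn(1, 64, 14, 14), h, c)
```

Running such a cell over the frame sequence once forward and once backward and combining the two hidden-state streams yields the bidirectional B-ConvLSTM described above.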
Further, the loss function used in training is calculated by formulas (9)-(10), where y_i denotes the i-th label map in the training dataset, with y_i ∈ (y_1, y_2, ..., y_N); N denotes the total number of images in the training dataset; y'_i denotes the i-th saliency map computed by the model; δ denotes the parameters of the whole network; and δ' denotes the parameters after Adam optimization.
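The bodies of formulas (9)-(10) are not reproduced in this text. One common choice consistent with the definitions above (a per-pixel comparison of the predicted saliency maps y'_i against the label maps y_i, averaged over the N training images) is binary cross-entropy; the sketch below implements that assumption and should not be read as the patent's exact loss.

```python
import torch

def saliency_loss(pred, label, eps=1e-7):
    """Assumed per-pixel binary cross-entropy, averaged over a batch of
    predicted saliency maps y'_i and label maps y_i (values in [0, 1])."""
    pred = pred.clamp(eps, 1 - eps)  # numerical stability
    bce = -(label * torch.log(pred) + (1 - label) * torch.log(1 - pred))
    return bce.mean()
```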
Further, the Adam optimizer of each sub-network can be expressed by formulas (11)-(14):
m_t = β_1 · m_(t-1) + (1 - β_1) · g_t   (11)
v_t = β_2 · v_(t-1) + (1 - β_2) · g_t²   (12)
m̂_t = m_t / (1 - β_1^t),  v̂_t = v_t / (1 - β_2^t)   (13)
W_(t+1) = W_t - η · m̂_t / (√v̂_t + ∈)   (14)
where m_t and v_t are the first-order and second-order momentum terms respectively; β_1 and β_2 are the corresponding decay rates, usually taken as 0.9 and 0.999; m̂_t and v̂_t are the corresponding bias-corrected values; W_t denotes the model parameters at time step t, i.e. at the t-th iteration; g_t = ∇J(W_t) denotes the gradient of the cost function with respect to W at the t-th iteration; η denotes the learning rate; and ∈ is a very small number (typically 1e-8) used to avoid a zero denominator.
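A plain NumPy sketch of one Adam update step implementing formulas (11)-(14) follows; the learning rate value is an assumption, since it is not given above.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update, formulas (11)-(14).
    w: parameters; g: gradient at step t (t starts at 1);
    m, v: running first- and second-order momentum terms."""
    m = beta1 * m + (1 - beta1) * g              # (11) first-order momentum
    v = beta2 * v + (1 - beta2) * g * g          # (12) second-order momentum
    m_hat = m / (1 - beta1 ** t)                 # (13) bias corrections
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # (14) parameter update
    return w, m, v
```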
Description of the drawings
Fig. 1 is the algorithm framework diagram of the present invention.
Fig. 2 shows example comparisons of different saliency detection models.
Specific embodiment
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative labor shall fall within the protection scope of the present invention.
The technical features, abbreviations, and symbols involved herein are to be interpreted according to the knowledge and common understanding of those skilled in the art and the definitions and explanations given herein.
The process of the present invention is shown in Fig. 1; the detailed procedure is as follows:
The present invention designs a 3D convolutional network (3D-ConvNet) to extract the short-term temporal features of the video. The kernel size of every 3D convolutional layer is 3*3*3, and the stride and padding of each Conv3D layer are 1*1*1; the numbers of convolution kernels used in the three layers are 16, 32, and 64, where the number of kernels determines the number of output channels. Each pooling layer reduces the resolution and the temporal dimension of the video frames to half of their original size, which means the 3D convolutional network learns only local short-term spatiotemporal features, as previously mentioned. Batch normalization (BN) is also applied here to accelerate deep network training by reducing internal covariate shift. The fully connected LSTM (FC-LSTM) is extended to convolutional layers, yielding the proposed convolutional long short-term memory network (ConvLSTM), which is used to capture effective spatiotemporal information; its internal structure handles the input-to-state and state-to-state transitions. In the present invention, long-term temporal features are extracted by using a series of video frames as the network input; these features are obtained from the original video frames with pre-trained VGG16 parameters.
Experiments show that the two-dimensional video saliency detection method proposed by the present invention is substantially better than other current methods. It is assessed mainly by three metrics: mean absolute error (MAE), the Pearson linear correlation coefficient (PLCC), and the area under the ROC curve (AUC). The ROC curve is widely used to evaluate visual attention model performance: by defining a threshold, the saliency map of a visual attention model is divided into salient and non-salient points. The true positive rate (TPR) indicates the percentage of target points detected as salient points, and the false positive rate (FPR) indicates the percentage of background points detected as salient points. AUC is the area under the ROC curve and gives a better performance assessment: the better the visual attention model, the larger its AUC value. The Pearson linear correlation coefficient (PLCC) measures the degree of linear correlation between the saliency map and the ground-truth map; the coefficient lies between 0 and 1, and the larger it is, the better the performance of the visual attention model. MAE measures the difference between the predicted saliency map and the label image; a smaller MAE value means a smaller difference between the two maps and a better model. Implementations of these three metrics are sketched below.
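As referenced above, the following NumPy sketch gives straightforward implementations of the three metrics under the stated definitions; the AUC variant shown treats saliency values at fixation points as positives and all other locations as negatives, which is one common formulation rather than necessarily the exact one used in the experiments.

```python
import numpy as np

def mae(pred, label):
    """Mean absolute error between the predicted saliency map and the label map."""
    return float(np.mean(np.abs(pred - label)))

def plcc(pred, label):
    """Pearson linear correlation coefficient between the two maps."""
    return float(np.corrcoef(pred.ravel(), label.ravel())[0, 1])

def auc(pred, fixation):
    """Area under the ROC curve: saliency values at fixation points are
    positives, values everywhere else are negatives. Equals the probability
    that a random positive outranks a random negative.
    (O(P*N) memory; adequate for a sketch.)"""
    pos = pred[fixation > 0].ravel()
    neg = pred[fixation == 0].ravel()
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

# Example on random data
p = np.random.rand(64, 64)
g = (np.random.rand(64, 64) > 0.9).astype(float)
print(mae(p, g), plcc(p, g), auc(p, g))
```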
A two-dimensional video saliency detection method based on long short-term memory, characterized by comprising the following steps:
Step 1: extract the short-term temporal features in the two-dimensional video frames, using a 3D convolutional neural network (3D-ConvNet);
Step 2: extract the long-term temporal features in the two-dimensional video frames, using a bidirectional long short-term memory network (B-ConvLSTM);
Step 3: fuse the extracted short-term and long-term temporal features, and deconvolve the result to obtain a saliency map with the same resolution as the input video frames.
Here, the short-term temporal features in the two-dimensional video frames described in step 1 include motion information.
The 3D convolutional network is calculated according to formula (1):
h_n = σ(Σ W_n * h_(n-1) + b_n)   (1)
where W_n denotes the 3D convolution kernel parameters applied to the (n-1)-th hidden layer; h_(n-1) denotes the (n-1)-th hidden layer; b_n denotes the corresponding bias term; the operator '*' denotes the convolution operation; and σ denotes the activation function.
Batch normalization (the BN algorithm) is used to accelerate network training; it is calculated by formula (2):
x̂^(k) = (x^(k) - E(x^(k))) / √Var(x^(k))   (2)
where E(x^(k)) and Var(x^(k)) denote the expectation and variance of the batch data x^(k) respectively, and x̂^(k) denotes the standardized result.
Batch normalization (the BN algorithm) may change the distribution of the original data; to overcome this problem, formula (3) is used for adjustment:
y^(k) = γ^(k) · x̂^(k) + β^(k)   (3)
where γ^(k) and β^(k) are the corresponding adjustment parameters; x̂^(k) denotes the standardized (BN) result; and y^(k) denotes the adjusted result.
The long-term temporal features in the two-dimensional video frames described in step 2 include motion information.
Taking one ConvLSTM layer as an example, each ConvLSTM unit is calculated by formulas (4)-(8):
i_t = σ(W_xi * X_t + W_hi * H_(t-1) + b_i)   (4)
f_t = σ(W_xf * X_t + W_hf * H_(t-1) + b_f)   (5)
o_t = σ(W_xo * X_t + W_ho * H_(t-1) + b_o)   (6)
C_t = f_t ∘ C_(t-1) + i_t ∘ tanh(W_xc * X_t + W_hc * H_(t-1) + b_c)   (7)
H_t = o_t ∘ tanh(C_t)   (8)
where σ and tanh denote the sigmoid and hyperbolic tangent activation functions respectively; W_xi, W_hi, W_xf, W_hf, W_xo, W_ho, W_xc and W_hc are the convolution kernel parameters of the corresponding layers, and b_i, b_f, b_o and b_c are the corresponding bias terms; i_t, f_t and o_t denote the input gate, forget gate and output gate of the t-th video frame respectively; C_t and H_t are the memory cell and the hidden state; '*' denotes the convolution operation; and '∘' denotes the Hadamard (element-wise) product.
The loss function used in training is calculated by formulas (9)-(10), where y_i denotes the i-th label map in the training dataset, with y_i ∈ (y_1, y_2, ..., y_N); N denotes the total number of images in the training dataset; y'_i denotes the i-th saliency map computed by the model; δ denotes the parameters of the whole network; and δ' denotes the parameters after Adam optimization.
The Adam optimizer of each sub-network can be expressed by formulas (11)-(14):
m_t = β_1 · m_(t-1) + (1 - β_1) · g_t   (11)
v_t = β_2 · v_(t-1) + (1 - β_2) · g_t²   (12)
m̂_t = m_t / (1 - β_1^t),  v̂_t = v_t / (1 - β_2^t)   (13)
W_(t+1) = W_t - η · m̂_t / (√v̂_t + ∈)   (14)
where m_t and v_t are the first-order and second-order momentum terms respectively; β_1 and β_2 are the corresponding decay rates, usually taken as 0.9 and 0.999; m̂_t and v̂_t are the corresponding bias-corrected values; W_t denotes the model parameters at time step t, i.e. at the t-th iteration; g_t = ∇J(W_t) denotes the gradient of the cost function with respect to W at the t-th iteration; η denotes the learning rate; and ∈ is a very small number (typically 1e-8) used to avoid a zero denominator.
Fig. 2 compares different saliency detection algorithms. From the first column to the last, it shows: the original two-dimensional video frame, the reference (ground-truth) image, the saliency maps computed by CE, Fang, Seo, SAGE, GAFL, MC, and MR, and the experimental image of the present invention.
From these comparisons: the CE saliency detection model detects the background; the Fang saliency detection model blurs the boundaries of the detected target; the Seo saliency detection model falsely detects background as foreground; the SAGE saliency detection model over-detects targets; the GAFL saliency detection model misses targets; the edges produced by the MC saliency detection model are not clear enough; and the MR saliency detection model also falsely detects background as foreground. In the end, the saliency detection method proposed by the present invention is closest to the reference images.
Table 1: comparison of different saliency detection models.
The above embodiments are a description of the present invention, not a limitation of it. It will be understood that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and spirit of the present invention; the protection scope of the present invention is defined by the appended claims and their equivalents.

Claims (9)

1. A two-dimensional video saliency detection method based on long short-term memory, characterized by comprising the following steps:
Step 1: extract the short-term temporal features in the two-dimensional video frames, using a 3D convolutional neural network (3D-ConvNet);
Step 2: extract the long-term temporal features in the two-dimensional video frames, using a bidirectional long short-term memory network (B-ConvLSTM);
Step 3: fuse the extracted short-term and long-term temporal features, and deconvolve the result to obtain a saliency map with the same resolution as the input video frames.
2. The two-dimensional video saliency detection method based on long short-term memory according to claim 1, characterized in that: the short-term temporal features in the two-dimensional video frames described in step 1 include motion information.
3. The two-dimensional video saliency detection method based on long short-term memory according to claim 2, characterized in that: the 3D convolutional network is calculated according to formula (1):
h_n = σ(Σ W_n * h_(n-1) + b_n)   (1)
where W_n denotes the 3D convolution kernel parameters applied to the (n-1)-th hidden layer; h_(n-1) denotes the (n-1)-th hidden layer; b_n denotes the corresponding bias term; the operator '*' denotes the convolution operation; and σ denotes the activation function.
4. The two-dimensional video saliency detection method based on long short-term memory according to claim 2, characterized in that: batch normalization (the BN algorithm) is used to accelerate network training, calculated as shown in formula (2):
x̂^(k) = (x^(k) - E(x^(k))) / √Var(x^(k))   (2)
where E(x^(k)) and Var(x^(k)) denote the expectation and variance of the batch data x^(k) respectively, and x̂^(k) denotes the standardized result.
5. The two-dimensional video saliency detection method based on long short-term memory according to claim 4, characterized in that: after batch normalization (the BN algorithm), formula (3) is used to adjust for the change in the distribution of the original data:
y^(k) = γ^(k) · x̂^(k) + β^(k)   (3)
where γ^(k) and β^(k) are the corresponding adjustment parameters; x̂^(k) denotes the result after batch normalization; and y^(k) denotes the adjusted result.
6. The two-dimensional video saliency detection method based on long short-term memory according to claim 1, characterized in that: the long-term temporal features in the two-dimensional video frames described in step 2 include motion information.
7. The two-dimensional video saliency detection method based on long short-term memory according to claim 6, characterized in that: in one ConvLSTM layer, each ConvLSTM unit is calculated as shown in formulas (4)-(8):
i_t = σ(W_xi * X_t + W_hi * H_(t-1) + b_i)   (4)
f_t = σ(W_xf * X_t + W_hf * H_(t-1) + b_f)   (5)
o_t = σ(W_xo * X_t + W_ho * H_(t-1) + b_o)   (6)
C_t = f_t ∘ C_(t-1) + i_t ∘ tanh(W_xc * X_t + W_hc * H_(t-1) + b_c)   (7)
H_t = o_t ∘ tanh(C_t)   (8)
where σ and tanh denote the sigmoid and hyperbolic tangent activation functions respectively; W_xi, W_hi, W_xf, W_hf, W_xo, W_ho, W_xc and W_hc are the convolution kernel parameters of the corresponding layers, and b_i, b_f, b_o and b_c are the corresponding bias terms; i_t, f_t and o_t denote the input gate, forget gate and output gate of the t-th video frame respectively; C_t and H_t are the memory cell and the hidden state; '*' denotes the convolution operation; and '∘' denotes the Hadamard (element-wise) product.
8. The two-dimensional video saliency detection method based on long short-term memory according to claim 1, characterized in that: the loss function used in training is calculated by formulas (9)-(10), where y_i denotes the i-th label map in the training dataset, with y_i ∈ (y_1, y_2, ..., y_N); N denotes the total number of images in the training dataset; y'_i denotes the i-th saliency map computed by the model; δ denotes the parameters of the whole network; and δ' denotes the parameters after Adam optimization.
9. The two-dimensional video saliency detection method based on long short-term memory according to claim 8, characterized in that: the Adam optimizer of each sub-network is expressed by formulas (11)-(14):
m_t = β_1 · m_(t-1) + (1 - β_1) · g_t   (11)
v_t = β_2 · v_(t-1) + (1 - β_2) · g_t²   (12)
m̂_t = m_t / (1 - β_1^t),  v̂_t = v_t / (1 - β_2^t)   (13)
W_(t+1) = W_t - η · m̂_t / (√v̂_t + ∈)   (14)
where m_t and v_t are the first-order and second-order momentum terms respectively; β_1 and β_2 are the decay rates, taken as 0.9 and 0.999 respectively; m̂_t and v̂_t are the corresponding bias-corrected values; W_t denotes the model parameters at time step t, i.e. at the t-th iteration; g_t = ∇J(W_t) denotes the gradient of the cost function with respect to W at the t-th iteration; η denotes the learning rate; and ∈ = 1e-8 is a very small number used to avoid a zero denominator.
CN201910614888.0A 2019-07-09 2019-07-09 Two-dimensional video saliency detection method based on long short-term memory Pending CN110334718A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910614888.0A CN110334718A (en) 2019-07-09 2019-07-09 Two-dimensional video saliency detection method based on long short-term memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910614888.0A CN110334718A (en) 2019-07-09 2019-07-09 Two-dimensional video saliency detection method based on long short-term memory

Publications (1)

Publication Number Publication Date
CN110334718A true CN110334718A (en) 2019-10-15

Family

ID=68143390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910614888.0A Pending CN110334718A (en) Two-dimensional video saliency detection method based on long short-term memory

Country Status (1)

Country Link
CN (1) CN110334718A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018028255A1 (en) * 2016-08-11 2018-02-15 深圳市未来媒体技术研究院 Image saliency detection method based on adversarial network
CN107273800A (en) * 2017-05-17 2017-10-20 大连理工大学 Action recognition method using a convolutional recurrent neural network based on an attention mechanism
CN107451552A (en) * 2017-07-25 2017-12-08 北京联合大学 Gesture recognition method based on 3D CNN and convolutional LSTM
CN108681712A (en) * 2018-05-17 2018-10-19 北京工业大学 Basketball game event recognition method fusing domain knowledge and multi-level deep features
CN108960261A (en) * 2018-07-25 2018-12-07 扬州万方电子技术有限责任公司 Salient object detection method based on an attention mechanism
CN109064507A (en) * 2018-08-21 2018-12-21 北京大学深圳研究生院 Multi-motion-flow deep convolutional network model method for video prediction
CN109376611A (en) * 2018-09-27 2019-02-22 方玉明 Video saliency detection method based on 3D convolutional neural networks

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091045A (en) * 2019-10-25 2020-05-01 重庆邮电大学 Sign language recognition method based on a spatiotemporal attention mechanism
CN111091045B (en) * 2019-10-25 2022-08-23 重庆邮电大学 Sign language recognition method based on a spatiotemporal attention mechanism
CN111008939A (en) * 2019-11-27 2020-04-14 温州大学 Neural network video deblurring method based on a controllable feature space
CN111008939B (en) * 2019-11-27 2022-04-05 温州大学 Neural network video deblurring method based on a controllable feature space
CN111507215A (en) * 2020-04-08 2020-08-07 常熟理工学院 Video object segmentation method based on spatiotemporal convolutional recurrent neural networks and dilated convolution
CN111523410A (en) * 2020-04-09 2020-08-11 哈尔滨工业大学 Video salient object detection method based on an attention mechanism
CN111523410B (en) * 2020-04-09 2022-08-26 哈尔滨工业大学 Video salient object detection method based on an attention mechanism
CN112101382A (en) * 2020-09-11 2020-12-18 北京航空航天大学 Spatiotemporal joint model and video saliency prediction method based on it
CN112101382B (en) * 2020-09-11 2022-10-14 北京航空航天大学 Spatiotemporal joint model and video saliency prediction method based on it
CN113298154A (en) * 2021-05-27 2021-08-24 安徽大学 RGB-D image salient object detection method
CN113298154B (en) * 2021-05-27 2022-11-11 安徽大学 RGB-D image salient object detection method
CN114979801A (en) * 2022-05-10 2022-08-30 上海大学 Dynamic video summarization algorithm and system based on a bidirectional convolutional long short-term memory network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination