CN104113789A - On-line video abstraction generation method based on depth learning - Google Patents

On-line video abstraction generation method based on depth learning

Info

Publication number
CN104113789A
Authority
CN
China
Prior art keywords
video
frame block
frame
depth
dictionary
Prior art date
Legal status
Granted
Application number
CN201410326406.9A
Other languages
Chinese (zh)
Other versions
CN104113789B (en)
Inventor
李平
俞俊
李黎
徐向华
Current Assignee
Hangzhou Huicui Intelligent Technology Co ltd
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201410326406.9A priority Critical patent/CN104113789B/en
Publication of CN104113789A publication Critical patent/CN104113789A/en
Application granted granted Critical
Publication of CN104113789B publication Critical patent/CN104113789B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an online video summarization method based on deep learning. The original video is processed as follows: 1) the video is cut uniformly into a group of small frame blocks, the statistical features of each frame image are extracted, and the corresponding vectorized representations are formed; 2) a multilayer deep network is pre-trained on the video frames to obtain the nonlinear representation of each frame; 3) the first m frame blocks are selected as the initial condensed video, which is reconstructed by a group sparse coding algorithm to obtain an initial dictionary and reconstruction coefficients; 4) the deep network parameters are updated according to the next frame block, which is reconstructed and its reconstruction error computed; if the error is larger than a set threshold, the frame block is added to the condensed video and the dictionary is updated; and 5) new frame blocks are processed online in sequence according to step 4) until the video ends, the updated condensed video being the generated video summary. With this method, the latent high-level semantic information of the video can be mined deeply, the video summary can be generated quickly, users' time is saved, and the visual experience is improved.

Description

An online video summarization method based on deep learning
Technical field
The invention belongs to the technical field of video summarization, and in particular relates to an online video summarization method based on deep learning.
Background art
In recent years, with the growing popularity of portable devices such as digital cameras, smartphones, and handheld computers, the number of videos of all kinds has been increasing explosively. For example, in a medium-sized city the video capture devices deployed in key areas of society such as intelligent transportation, safety monitoring, and public-security patrols number in the tens of thousands, and the video data they produce reaches the petabyte level. To locate a target person or vehicle, personnel such as traffic police must spend a great deal of time sifting through long, tedious surveillance video streams, which greatly reduces work efficiency and hinders the construction of safe cities. Effectively selecting the video frames that contain key information from lengthy video streams, namely video summarization, has therefore attracted wide attention from both academia and industry.
Traditional video summarization techniques mainly target edited, structured video: a film, for instance, can be divided into multiple scenes, each scene consists of several plot segments that take place in the same location, and each plot segment in turn consists of a series of smooth, continuous video frames. Unlike structured video such as films, TV series, and news reports, surveillance video is generally unstructured, unedited video, which poses a greater challenge to the application of video summarization techniques.
At present, the main techniques in the video summarization field are key-frame-based methods, new-image synthesis, video-frame segmentation, and conversion to natural language. Key-frame-based methods include strategies such as shot boundary detection, video-frame clustering, color histograms, and motion stability; new-image synthesis uses several consecutive frames that contain important content to create a new image, but is easily affected by blur between different frames; video-frame segmentation methods use techniques such as scene boundary detection and dialogue analysis to cut structured video into short thematic clips; conversion to natural language refers to techniques that use the subtitles and speech in a video to turn the video summary into a text summary, and is unsuitable for surveillance video that has no subtitles or sound.
Key security areas such as intelligent transportation and crime-prevention surveillance continuously produce large amounts of unstructured video, and traditional video summarization methods cannot meet the application requirement of processing streaming video online. There is therefore an urgent need for a video summarization method that can process video streams online while selecting the key content efficiently and accurately.
Summary of the invention
In order to condense tedious video streams online, efficiently and accurately, thereby saving users' time and improving the visual effect of the video content, the present invention proposes an online video summarization method based on deep learning. The method comprises the following steps:
1. After the original video data is obtained, the following operations are performed:
1) cut the video uniformly into a group of small frame blocks, each containing multiple frames; extract the statistical features of each frame image and form the corresponding vectorized representation;
2) pre-train a multilayer deep network on the video frames to obtain the nonlinear representation of each frame;
3) select the first m frame blocks as the initial condensed video, and reconstruct it with a group sparse coding algorithm to obtain the initial dictionary and reconstruction coefficients;
4) update the deep network parameters according to the next frame block, reconstruct that frame block and compute its reconstruction error; if the error is greater than a set threshold, add the frame block to the condensed video and update the dictionary;
5) process new frame blocks online in sequence according to step 4) until the video ends; the updated condensed video is the generated video summary.
Further, the extraction of the statistical features of each frame image in step 1) to form the corresponding vectorized representation is, specifically:
1) suppose the original video is evenly divided into n frame blocks and each frame block contains t frame images (e.g., t = 80); each frame image is scaled to a uniform pixel size while keeping its original aspect ratio;
2) extract for each frame image global features such as the color histogram, color moments, edge orientation histogram, Gabor wavelet transform, and local binary patterns, and local features such as the scale-invariant feature transform (SIFT: Scale-Invariant Feature Transform) and speeded-up robust features (SURF: Speeded Up Robust Features);
3) concatenate the above image features of each frame in order to form a vectorized representation of dimension n_f; a simplified sketch of steps 2) and 3) is given below.
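The following minimal sketch illustrates steps 2) and 3) for a single frame under simplifying assumptions: it covers only two of the listed descriptors (a per-channel color histogram and an edge orientation histogram), ignores the aspect-ratio handling, and uses NumPy only; the frame size, bin counts, and the function name frame_feature_vector are illustrative choices, not values taken from the patent.
```python
import numpy as np

def frame_feature_vector(frame_rgb, size=(128, 128), color_bins=8, edge_bins=9):
    """Vectorize one frame: resize, then concatenate two simple descriptors.

    Only two of the descriptors listed above are sketched here; the patent
    concatenates several more (color moments, Gabor, LBP, SIFT, SURF, ...).
    """
    # Nearest-neighbour resize to a uniform pixel size (aspect ratio ignored
    # here for brevity; the patent keeps the original aspect ratio).
    h, w, _ = frame_rgb.shape
    rows = np.linspace(0, h - 1, size[0]).astype(int)
    cols = np.linspace(0, w - 1, size[1]).astype(int)
    img = frame_rgb[rows][:, cols].astype(np.float64)

    # Global colour histogram: per-channel histograms, normalised.
    color_hist = np.concatenate([
        np.histogram(img[..., c], bins=color_bins, range=(0, 255), density=True)[0]
        for c in range(3)
    ])

    # Edge-orientation histogram on the grey image, weighted by gradient magnitude.
    grey = img.mean(axis=2)
    gy, gx = np.gradient(grey)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                       # orientations in [-pi, pi]
    edge_hist, _ = np.histogram(ang, bins=edge_bins, range=(-np.pi, np.pi),
                                weights=mag)
    edge_hist = edge_hist / (edge_hist.sum() + 1e-12)

    # Concatenate the descriptors in a fixed order -> n_f-dimensional vector.
    return np.concatenate([color_hist, edge_hist])

# Usage: one random "frame" stands in for a decoded video frame.
frame = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)
x = frame_feature_vector(frame)
print(x.shape)   # (3 * 8 + 9,) = (33,)
```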
Further, the pre-training of the multilayer deep network on video frames in step 2) to obtain the nonlinear representation of each frame is, specifically:
A stacked denoising autoencoder (SDA: Stacked Denoising Autoencoder) is used to pre-train the multilayer deep network (fewer than 10 layers);
A. at each layer, the following operations are performed on each frame image: first, a noisy image of each frame is generated by adding a small amount of Gaussian noise or by randomly setting input variables to arbitrary values; then, the noisy image is mapped by an autoencoder (AE: Auto Encoder) to obtain its nonlinear representation;
B. a stochastic gradient descent algorithm is used to adjust and update the parameters of each layer of the deep network; a single-layer sketch of steps A and B follows.
The reconstruction of the initial condensed video by the group sparse coding algorithm in step 3) is, specifically:
1) the initial condensed video consists of the first m frame blocks of the original video (m is a positive integer less than 50), for a total of n_init = m × t frame images, where X_k denotes the k-th original frame block; the corresponding nonlinear representation is obtained through the pre-trained deep network, with Y_k denoting the nonlinear representation of the k-th frame block;
2) suppose the initial dictionary D consists of n_d atoms, with d_j denoting the j-th atom; let the reconstruction coefficients be C, whose number of elements corresponds to the number of frames and whose dimension corresponds to the number of dictionary atoms, with C_k denoting the coefficients of the k-th frame block and c^i those of the i-th frame image;
3) the initial dictionary D and the reconstruction coefficients C are obtained by using the alternating direction method of multipliers to optimize the regularized-dictionary group sparse coding objective, in which the symbol \|\cdot\|_2 denotes the \ell_2 norm of a variable, the regularization parameter λ is a real number greater than 0, and the multivariate function F(Y_k, C_k, D) is defined as
F(Y_k, C_k, D) = \frac{1}{2 n_f} \sum_{y_i \in Y_k} \left\| y_i - \sum_{j=1}^{n_d} c_j^{i} d_j \right\|_2^2 + \gamma \sum_{j=1}^{n_d} \left\| c_j \right\|_2 ,
where the parameter γ is a real number greater than 0 and the expression \sum_{j=1}^{n_d} c_j^{i} d_j denotes the reconstruction of the i-th frame image with the dictionary D (d_j ∈ D). The alternating direction method of multipliers here is, specifically: first fix the parameter D, so that the above objective becomes a convex function of the parameter C; then fix the parameter C, so that the above objective becomes a convex function of the parameter D; the two parameters are updated alternately and iteratively. A simplified alternating-minimization sketch follows.
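To make the alternating scheme concrete, the sketch below reconstructs frame blocks with a block-coordinate routine: the coefficients are obtained by proximal gradient steps on F (block soft-thresholding of the columns c_j implements the γ Σ_j‖c_j‖_2 group penalty), and the dictionary is refit by ridge-regularized least squares. This is a simplified stand-in for the ADMM scheme named above, and the λ-weighted squared ℓ2 dictionary regularizer is an assumption consistent with, but not spelled out in, the text.
```python
import numpy as np

def group_sparse_code(Y, D, gamma=0.05, n_iter=200, lr=None):
    """Solve min_C (1/(2 n_f)) * ||Y - C D||_F^2 + gamma * sum_j ||c_j||_2.

    Y: (t, n_f) nonlinear frame representations of one frame block.
    D: (n_d, n_f) dictionary atoms. Returns C of shape (t, n_d); column j is c_j.
    """
    t, n_f = Y.shape
    n_d = D.shape[0]
    C = np.zeros((t, n_d))
    if lr is None:                      # step size from a Lipschitz bound of the smooth part
        lr = n_f / (np.linalg.norm(D, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = (C @ D - Y) @ D.T / n_f          # gradient of the smooth term
        C = C - lr * grad
        norms = np.linalg.norm(C, axis=0)       # ||c_j||_2 per atom (one group per atom)
        shrink = np.maximum(0.0, 1.0 - lr * gamma / (norms + 1e-12))
        C = C * shrink                          # block soft-thresholding
    return C

def update_dictionary(Y, C, lam=1e-3):
    """Ridge-regularized least-squares dictionary refit (assumed regularizer)."""
    n_f = Y.shape[1]
    n_d = C.shape[1]
    A = C.T @ C + 2.0 * n_f * lam * np.eye(n_d)
    return np.linalg.solve(A, C.T @ Y)          # (n_d, n_f)

# Usage: alternate coding and dictionary refits over the initial condensed video.
rng = np.random.default_rng(0)
blocks = [rng.random((80, 64)) for _ in range(5)]   # m=5 blocks, t=80 frames, n_f=64
D = rng.random((32, 64))                            # n_d = 32 atoms
for _ in range(10):                                 # alternating (block-coordinate) passes
    Cs = [group_sparse_code(Y, D) for Y in blocks]
    D = update_dictionary(np.vstack(blocks), np.vstack(Cs))
```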
The updating of the deep network parameters according to the next frame block, and the reconstruction of that frame block with computation of its reconstruction error, in step 4) are, specifically:
1) perform the following operations in turn on each frame image of the frame block:
A. use an online gradient descent algorithm to update the parameters of the last layer of the deep neural network, namely the weights W and the bias b (a single-frame sketch of this top-layer update is given after this list);
B. use the back-propagation algorithm to update the parameters of the other layers of the deep neural network;
2) update the nonlinear representation of each frame image with the new parameters;
3) based on the existing dictionary D, use group sparse coding to reconstruct the current frame block and compute the error ε, i.e., reconstruct the nonlinear representation Y_k of the current frame block X_k; concretely, first minimize the multivariate function F(Y_k, C_k, D) to obtain the optimal reconstruction coefficients, then substitute them into the first term of F and take the resulting value as the current reconstruction error ε.
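A single-frame sketch of the top-layer online update (step A above) follows; the squared-error reconstruction objective with a tied linear decoder is an assumption carried over from the pretraining stage, and step B (back-propagating to the remaining layers) is omitted.
```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def online_update_top_layer(W, b, h_prev, lr=0.01):
    """One online gradient step on the top layer for a single frame.

    h_prev : activation of the layer below for this frame (input to the top layer).
    Assumed objective: squared reconstruction error of h_prev through a tied
    linear decoder (an assumption; the text only states that W and b of the
    last layer are updated by online gradient descent).
    """
    h = sigmoid(W @ h_prev + b)          # top-layer nonlinear representation
    recon = W.T @ h                      # tied linear decoder
    delta_out = recon - h_prev           # d(0.5*||recon - h_prev||^2)/d(recon)
    delta_h = (W @ delta_out) * h * (1.0 - h)

    grad_W = np.outer(h, delta_out) + np.outer(delta_h, h_prev)
    grad_b = delta_h
    W_new = W - lr * grad_W
    b_new = b - lr * grad_b
    # Step B (back-propagating to the other layers) is omitted from this sketch.
    return W_new, b_new, sigmoid(W_new @ h_prev + b_new)

# Usage: one incoming frame's lower-layer activation updates the top layer online.
rng = np.random.default_rng(0)
W, b = rng.normal(0, 0.01, (32, 64)), np.zeros(32)
h_prev = rng.random(64)
W, b, y_frame = online_update_top_layer(W, b, h_prev)
print(y_frame.shape)   # (32,) updated nonlinear representation of the frame
```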
In step 4), if the error is greater than the set threshold, the current frame block is added to the condensed video and the dictionary is updated, specifically:
1) if the reconstruction error ε computed for the nonlinear representation Y_k of the current frame block X_k is greater than the set threshold θ (an empirical value), the current frame block is added to the condensed video;
2) if the current condensed video contains q frame blocks, the set of nonlinear representations of their frame images is used to update the dictionary D by solving the corresponding objective function,
in which the parameter λ is a real number greater than 0 and adjusts the influence of the regularization term; this objective is written out below.
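Written out, a plausible form of this dictionary-update objective over the q retained frame blocks is shown below; the λ-weighted squared ℓ2 penalty on the dictionary atoms is an assumption consistent with the definitions above, not a formula stated explicitly in this text.
```latex
\min_{D,\;C_1,\dots,C_q}\;
\sum_{k=1}^{q} F\!\left(Y_k, C_k, D\right)
\;+\; \lambda \sum_{j=1}^{n_d} \lVert d_j \rVert_2^{2},
\qquad\text{with}\qquad
F(Y_k, C_k, D) =
\frac{1}{2 n_f} \sum_{y_i \in Y_k}
\Bigl\lVert\, y_i - \sum_{j=1}^{n_d} c_j^{\,i}\, d_j \Bigr\rVert_2^{2}
\;+\; \gamma \sum_{j=1}^{n_d} \lVert c_j \rVert_2 .
```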
The advantage of the online video summarization method based on deep learning proposed by the present invention is that deep learning is used to mine the high-level semantic features of the video, so that group sparse coding can better reflect how well the dictionary reconstructs the current video frame block; the most informative video frame blocks thus form a video summary that contains the regions of interest and the key persons and events. The condensed video summary saves users a great deal of time while enhancing the visual experience of the key content.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Embodiment
The present invention is further described below with reference to Fig. 1:
1. After the original video data is obtained, the following operations are performed:
1) cut the video uniformly into a group of small frame blocks, each containing multiple frames; extract the statistical features of each frame image and form the corresponding vectorized representation;
2) pre-train a multilayer deep network on the video frames to obtain the nonlinear representation of each frame;
3) select the first m frame blocks as the initial condensed video, and reconstruct it with a group sparse coding algorithm to obtain the initial dictionary and reconstruction coefficients;
4) update the deep network parameters according to the next frame block, reconstruct that frame block and compute its reconstruction error; if the error is greater than a set threshold, add the frame block to the condensed video and update the dictionary;
5) process new frame blocks online in sequence according to step 4) until the video ends; the updated condensed video is the generated video summary.
The extraction of the statistical features of each frame image in step 1) to form the corresponding vectorized representation is, specifically:
1) suppose the original video is evenly divided into n frame blocks and each frame block contains t frame images (e.g., t = 80); each frame image is scaled to a uniform pixel size while keeping its original aspect ratio;
2) extract for each frame image global features such as the color histogram, color moments, edge orientation histogram, Gabor wavelet transform, and local binary patterns, and local features such as the scale-invariant feature transform (SIFT: Scale-Invariant Feature Transform) and speeded-up robust features (SURF: Speeded Up Robust Features);
3) concatenate the above image features of each frame in order to form a vectorized representation of dimension n_f.
The pre-training of the multilayer deep network on video frames in step 2) to obtain the nonlinear representation of each frame is, specifically:
A stacked denoising autoencoder (SDA: Stacked Denoising Autoencoder) is used to pre-train the multilayer deep network (fewer than 10 layers);
A. at each layer, the following operations are performed on each frame image: first, a noisy image of each frame is generated by adding a small amount of Gaussian noise or by randomly setting input variables to arbitrary values; then, the noisy image is mapped by an autoencoder (AE: Auto Encoder) to obtain its nonlinear representation;
B. a stochastic gradient descent algorithm is used to adjust and update the parameters of each layer of the deep network.
The reconstruction of the initial condensed video by the group sparse coding algorithm in step 3) is, specifically:
1) the initial condensed video consists of the first m frame blocks of the original video (m is a positive integer less than 50), for a total of n_init = m × t frame images, where X_k denotes the k-th original frame block; the corresponding nonlinear representation is obtained through the pre-trained deep network, with Y_k denoting the nonlinear representation of the k-th frame block;
2) suppose the initial dictionary D consists of n_d atoms, with d_j denoting the j-th atom; let the reconstruction coefficients be C, whose number of elements corresponds to the number of frames and whose dimension corresponds to the number of dictionary atoms, with C_k denoting the coefficients of the k-th frame block and c^i those of the i-th frame image;
3) the initial dictionary D and the reconstruction coefficients C are obtained by using the alternating direction method of multipliers to optimize the regularized-dictionary group sparse coding objective, in which the symbol \|\cdot\|_2 denotes the \ell_2 norm of a variable, the regularization parameter λ is a real number greater than 0, and the multivariate function F(Y_k, C_k, D) is defined as
F(Y_k, C_k, D) = \frac{1}{2 n_f} \sum_{y_i \in Y_k} \left\| y_i - \sum_{j=1}^{n_d} c_j^{i} d_j \right\|_2^2 + \gamma \sum_{j=1}^{n_d} \left\| c_j \right\|_2 ,
where the parameter γ is a real number greater than 0 and the expression \sum_{j=1}^{n_d} c_j^{i} d_j denotes the reconstruction of the i-th frame image with the dictionary D (d_j ∈ D). The alternating direction method of multipliers here is, specifically: first fix the parameter D, so that the above objective becomes a convex function of the parameter C; then fix the parameter C, so that the above objective becomes a convex function of the parameter D; the two parameters are updated alternately and iteratively.
The updating of the deep network parameters according to the next frame block, and the reconstruction of that frame block with computation of its reconstruction error, in step 4) are, specifically:
1) perform the following operations in turn on each frame image of the frame block:
A. use an online gradient descent algorithm to update the parameters of the last layer of the deep neural network, namely the weights W and the bias b;
B. use the back-propagation algorithm to update the parameters of the other layers of the deep neural network;
2) update the nonlinear representation of each frame image with the new parameters;
3) based on the existing dictionary D, use group sparse coding to reconstruct the current frame block and compute the error ε, i.e., reconstruct the nonlinear representation Y_k of the current frame block X_k; concretely, first minimize the multivariate function F(Y_k, C_k, D) to obtain the optimal reconstruction coefficients, then substitute them into the first term of F and take the resulting value as the current reconstruction error ε.
In step 4), if the error is greater than the set threshold, the current frame block is added to the condensed video and the dictionary is updated, specifically:
1) if the reconstruction error ε computed for the nonlinear representation Y_k of the current frame block X_k is greater than the set threshold θ (an empirical value), the current frame block is added to the condensed video;
2) if the current condensed video contains q frame blocks, the set of nonlinear representations of their frame images is used to update the dictionary D by solving the corresponding objective function,
in which the parameter λ is a real number greater than 0 and adjusts the influence of the regularization term. A simplified end-to-end sketch of the online procedure of Fig. 1 is given below.
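Putting the five steps together, the driver loop below sketches the online flow of Fig. 1 with deliberately simplified placeholders: frame_block_features, deep_representation, fit_dictionary, and reconstruction_error are hypothetical stand-ins (random features, a fixed tanh map, atom sampling, and a plain least-squares fit) for the feature extraction, deep network, and group sparse coding described above, and the threshold value is arbitrary; only the control flow of steps 3) to 5) follows the text.
```python
import numpy as np

rng = np.random.default_rng(0)
T, N_F, N_D, M, THETA = 80, 64, 32, 5, 0.02   # t, n_f, n_d, m, threshold (arbitrary)

def frame_block_features(block_id):
    """Placeholder for step 1: one row of statistical features per frame."""
    return rng.random((T, N_F))

def deep_representation(features):
    """Placeholder for steps 2/4: a fixed nonlinear map instead of a trained SDA."""
    return np.tanh(features)

def fit_dictionary(Y, n_atoms=N_D):
    """Placeholder for step 3: sampled atoms instead of group sparse coding."""
    idx = rng.choice(len(Y), size=n_atoms, replace=False)
    return Y[idx].copy()

def reconstruction_error(Y, D):
    """Placeholder for step 4: mean squared residual of a least-squares fit on D."""
    C, *_ = np.linalg.lstsq(D.T, Y.T, rcond=None)
    return float(np.mean((Y - C.T @ D) ** 2))

# Step 3: the first m frame blocks form the initial condensed video and dictionary.
condensed = [deep_representation(frame_block_features(k)) for k in range(M)]
D = fit_dictionary(np.vstack(condensed))

# Steps 4-5: process the remaining blocks online, keeping only poorly reconstructed ones.
summary_blocks = list(range(M))
for k in range(M, 20):                           # 20 blocks stand in for the whole video
    Y_k = deep_representation(frame_block_features(k))
    if reconstruction_error(Y_k, D) > THETA:     # novel content -> keep and refresh D
        condensed.append(Y_k)
        summary_blocks.append(k)
        D = fit_dictionary(np.vstack(condensed))

print("frame blocks kept in the summary:", summary_blocks)
```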

Claims (6)

1. An online video summarization method based on deep learning, characterized in that, after the original video is obtained, the following operations are performed:
1) cutting the video uniformly into a group of small frame blocks, each frame block containing multiple frames, extracting the statistical features of each frame image, and forming the corresponding vectorized representation;
2) pre-training a multilayer deep network on the video frames to obtain the nonlinear representation of each frame;
3) selecting the first m frame blocks as the initial condensed video, and reconstructing it with a group sparse coding algorithm to obtain the initial dictionary and the reconstruction coefficients;
4) updating the deep network parameters according to the next frame block, reconstructing that frame block and computing its reconstruction error, and, if the error is greater than a set threshold, adding the frame block to the condensed video and updating the dictionary;
5) processing new frame blocks online in sequence according to step 4) until the video ends, the updated condensed video being the generated video summary.
2. The online video summarization method based on deep learning according to claim 1, characterized in that extracting the statistical features of each frame image in step 1) to form the corresponding vectorized representation comprises the concrete steps of:
1.1) evenly dividing the original video into n frame blocks, each frame block containing t frame images, and scaling each frame image to a uniform pixel size while keeping its original aspect ratio;
1.2) extracting global features and local features of each frame image;
the global features comprising the color histogram, color moments, edge orientation histogram, Gabor wavelet transform, and local binary patterns;
the local features comprising the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF);
1.3) concatenating the above image features of each frame in order to form a vectorized representation of dimension n_f.
3. The online video summarization method based on deep learning according to claim 1, characterized in that pre-training the multilayer deep network on video frames in step 2) to obtain the nonlinear representation of each frame specifically uses a stacked denoising autoencoder (SDA) to pre-train the multilayer deep network, comprising:
A. at each layer, performing the following operations on each frame image: first, generating a noisy image of each frame by adding Gaussian noise or randomly setting input variables to arbitrary values; then, mapping the noisy image with an autoencoder (AE) to obtain its nonlinear representation;
B. using a stochastic gradient descent algorithm to adjust and update the parameters of each layer of the deep network.
4. The online video summarization method based on deep learning according to claim 1, characterized in that reconstructing the initial condensed video with the group sparse coding algorithm in step 3) comprises the concrete steps of:
3.1) the initial condensed video consisting of the first m frame blocks of the original video, for a total of n_init = m × t frame images, X_k denoting the k-th original frame block, and the corresponding nonlinear representation being obtained through the pre-trained deep network, with Y_k denoting the nonlinear representation of the k-th frame block;
3.2) the initial dictionary D consisting of n_d atoms, d_j denoting the j-th atom; the reconstruction coefficients being C, whose number of elements corresponds to the number of frames and whose dimension corresponds to the number of dictionary atoms, with C_k denoting the coefficients of the k-th frame block and c^i those of the i-th frame image;
3.3) optimizing the regularized-dictionary group sparse coding objective with the alternating direction method of multipliers to obtain the initial dictionary D and the reconstruction coefficients C respectively,
where the symbol \|\cdot\|_2 denotes the \ell_2 norm of a variable, the regularization parameter λ is a real number greater than 0, and the multivariate function F(Y_k, C_k, D) is defined as
F(Y_k, C_k, D) = \frac{1}{2 n_f} \sum_{y_i \in Y_k} \left\| y_i - \sum_{j=1}^{n_d} c_j^{i} d_j \right\|_2^2 + \gamma \sum_{j=1}^{n_d} \left\| c_j \right\|_2 ,
where the parameter γ is a real number greater than 0 and the expression \sum_{j=1}^{n_d} c_j^{i} d_j denotes the reconstruction of the i-th frame image with the dictionary D (d_j ∈ D); the alternating direction method of multipliers here being, specifically: first fixing the parameter D so that the above objective becomes a convex function of the parameter C, then fixing the parameter C so that the above objective becomes a convex function of the parameter D, and updating the two parameters alternately and iteratively.
5. The online video summarization method based on deep learning according to claim 1, characterized in that updating the deep network parameters according to the next frame block and reconstructing that frame block and computing its reconstruction error in step 4) comprise the concrete steps of:
4.1) performing the following operations in turn on each frame image of the frame block:
4.1.1) using an online gradient descent algorithm to update the parameters of the last layer of the deep neural network, namely the weights W and the bias b;
4.1.2) using the back-propagation algorithm to update the parameters of the other layers of the deep neural network;
4.2) updating the nonlinear representation of each frame image with the new parameters;
4.3) based on the existing dictionary D, using group sparse coding to reconstruct the current frame block and compute the error ε, i.e., reconstructing the nonlinear representation Y_k of the current frame block X_k, specifically: first minimizing the multivariate function F(Y_k, C_k, D) to obtain the optimal reconstruction coefficients, then substituting them into the first term of F, the resulting value being the current reconstruction error ε.
6. The online video summarization method based on deep learning according to claim 1, characterized in that, in step 4), if the error is greater than the set threshold, adding the current frame block to the condensed video and updating the dictionary specifically comprises:
1) if the reconstruction error ε computed for the nonlinear representation Y_k of the current frame block X_k is greater than the set threshold θ, adding the current frame block to the condensed video;
2) if the current condensed video contains q frame blocks, using the set of nonlinear representations of their frame images to update the dictionary D by solving the corresponding objective function,
where the parameter λ is a real number greater than 0 and adjusts the influence of the regularization term.
CN201410326406.9A 2014-07-10 2014-07-10 On-line video abstraction generation method based on depth learning Active CN104113789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410326406.9A CN104113789B (en) 2014-07-10 2014-07-10 On-line video abstraction generation method based on depth learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410326406.9A CN104113789B (en) 2014-07-10 2014-07-10 On-line video abstraction generation method based on depth learning

Publications (2)

Publication Number Publication Date
CN104113789A true CN104113789A (en) 2014-10-22
CN104113789B CN104113789B (en) 2017-04-12

Family

ID=51710398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410326406.9A Active CN104113789B (en) 2014-07-10 2014-07-10 On-line video abstraction generation method based on depth learning

Country Status (1)

Country Link
CN (1) CN104113789B (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778659A (en) * 2015-04-15 2015-07-15 杭州电子科技大学 Single-frame image super-resolution reconstruction method on basis of deep learning
CN105279495A (en) * 2015-10-23 2016-01-27 天津大学 Video description method based on deep learning and text summarization
CN105930314A (en) * 2016-04-14 2016-09-07 清华大学 Text summarization generation system and method based on coding-decoding deep neural networks
CN106331433A (en) * 2016-08-25 2017-01-11 上海交通大学 Video denoising method based on deep recursive neural network
CN106502985A (en) * 2016-10-20 2017-03-15 清华大学 A kind of neural network modeling approach and device for generating title
CN106686403A (en) * 2016-12-07 2017-05-17 腾讯科技(深圳)有限公司 Video preview generation method, device, server and system
CN106778571A (en) * 2016-12-05 2017-05-31 天津大学 A kind of digital video feature extracting method based on deep neural network
CN107209941A (en) * 2014-12-30 2017-09-26 诺基亚技术有限公司 Mobile object detection in video
CN107679031A (en) * 2017-09-04 2018-02-09 昆明理工大学 Based on the advertisement blog article recognition methods for stacking the self-editing ink recorder of noise reduction
CN107729821A (en) * 2017-09-27 2018-02-23 浙江大学 A kind of video summarization method based on one-dimensional sequence study
CN107886109A (en) * 2017-10-13 2018-04-06 天津大学 It is a kind of based on have supervision Video segmentation video summarization method
CN107911755A (en) * 2017-11-10 2018-04-13 天津大学 A kind of more video summarization methods based on sparse self-encoding encoder
CN108388942A (en) * 2018-02-27 2018-08-10 四川云淞源科技有限公司 Information intelligent processing method based on big data
CN108417204A (en) * 2018-02-27 2018-08-17 四川云淞源科技有限公司 Information security processing method based on big data
CN108417206A (en) * 2018-02-27 2018-08-17 四川云淞源科技有限公司 High speed information processing method based on big data
CN108848422A (en) * 2018-04-19 2018-11-20 清华大学 A kind of video abstraction generating method based on target detection
CN109360436A (en) * 2018-11-02 2019-02-19 Oppo广东移动通信有限公司 A kind of video generation method, terminal and storage medium
CN109635777A (en) * 2018-12-24 2019-04-16 广东理致技术有限公司 A kind of video data editing recognition methods and device
CN109803067A (en) * 2017-11-16 2019-05-24 富士通株式会社 Video concentration method, video enrichment facility and electronic equipment
CN109905778A (en) * 2019-01-03 2019-06-18 上海大学 The method of the expansible breviary of single unstructured video based on group sparse coding
CN110110646A (en) * 2019-04-30 2019-08-09 浙江理工大学 A kind of images of gestures extraction method of key frame based on deep learning
CN110225368A (en) * 2019-06-27 2019-09-10 腾讯科技(深圳)有限公司 A kind of video locating method, device and electronic equipment
WO2019169996A1 (en) * 2018-03-05 2019-09-12 腾讯科技(深圳)有限公司 Video processing method and apparatus, video retrieval method and apparatus, storage medium and server
CN110366050A (en) * 2018-04-10 2019-10-22 北京搜狗科技发展有限公司 Processing method, device, electronic equipment and the storage medium of video data
CN110446067A (en) * 2019-08-30 2019-11-12 杭州电子科技大学 Video concentration method based on tensor resolution
CN111046887A (en) * 2018-10-15 2020-04-21 华北电力大学(保定) Method for extracting characteristics of image with noise
CN111246246A (en) * 2018-11-28 2020-06-05 华为技术有限公司 Video playing method and device
CN111563423A (en) * 2020-04-17 2020-08-21 西北工业大学 Unmanned aerial vehicle image target detection method and system based on depth denoising automatic encoder
CN106993240B (en) * 2017-03-14 2020-10-16 天津大学 Multi-video abstraction method based on sparse coding
CN113626641A (en) * 2021-08-11 2021-11-09 南开大学 Method for generating video abstract based on multi-mode data and aesthetic principle through neural network
US11295084B2 (en) 2019-09-16 2022-04-05 International Business Machines Corporation Cognitively generating information from videos
CN117725148A (en) * 2024-02-07 2024-03-19 湖南三湘银行股份有限公司 Question-answer word library updating method based on self-learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102930518A (en) * 2012-06-13 2013-02-13 上海汇纳网络信息科技有限公司 Improved sparse representation based image super-resolution method
CN103118220A (en) * 2012-11-16 2013-05-22 佳都新太科技股份有限公司 Keyframe pick-up algorithm based on multi-dimensional feature vectors
CN103167284A (en) * 2011-12-19 2013-06-19 中国电信股份有限公司 Video streaming transmission method and system based on picture super-resolution
CN103295242A (en) * 2013-06-18 2013-09-11 南京信息工程大学 Multi-feature united sparse represented target tracking method
CN103413125A (en) * 2013-08-26 2013-11-27 中国科学院自动化研究所 Horror video identification method based on discriminant instance selection and multi-instance learning
CN103531199A (en) * 2013-10-11 2014-01-22 福州大学 Ecological sound identification method on basis of rapid sparse decomposition and deep learning
CN103761531A (en) * 2014-01-20 2014-04-30 西安理工大学 Sparse-coding license plate character recognition method based on shape and contour features

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103167284A (en) * 2011-12-19 2013-06-19 中国电信股份有限公司 Video streaming transmission method and system based on picture super-resolution
CN102930518A (en) * 2012-06-13 2013-02-13 上海汇纳网络信息科技有限公司 Improved sparse representation based image super-resolution method
CN103118220A (en) * 2012-11-16 2013-05-22 佳都新太科技股份有限公司 Keyframe pick-up algorithm based on multi-dimensional feature vectors
CN103295242A (en) * 2013-06-18 2013-09-11 南京信息工程大学 Multi-feature united sparse represented target tracking method
CN103413125A (en) * 2013-08-26 2013-11-27 中国科学院自动化研究所 Horror video identification method based on discriminant instance selection and multi-instance learning
CN103531199A (en) * 2013-10-11 2014-01-22 福州大学 Ecological sound identification method on basis of rapid sparse decomposition and deep learning
CN103761531A (en) * 2014-01-20 2014-04-30 西安理工大学 Sparse-coding license plate character recognition method based on shape and contour features

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
尚丽 (Shang Li): "基于稀疏编码的自然特征提起及去噪" [Natural feature extraction and denoising based on sparse coding], 《系统仿真学报》 (Journal of System Simulation) *

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107209941A (en) * 2014-12-30 2017-09-26 诺基亚技术有限公司 Mobile object detection in video
CN104778659A (en) * 2015-04-15 2015-07-15 杭州电子科技大学 Single-frame image super-resolution reconstruction method on basis of deep learning
CN105279495A (en) * 2015-10-23 2016-01-27 天津大学 Video description method based on deep learning and text summarization
CN105279495B (en) * 2015-10-23 2019-06-04 天津大学 A kind of video presentation method summarized based on deep learning and text
CN105930314A (en) * 2016-04-14 2016-09-07 清华大学 Text summarization generation system and method based on coding-decoding deep neural networks
CN105930314B (en) * 2016-04-14 2019-02-05 清华大学 System and method is generated based on coding-decoding deep neural network text snippet
CN106331433B (en) * 2016-08-25 2020-04-24 上海交通大学 Video denoising method based on deep recurrent neural network
CN106331433A (en) * 2016-08-25 2017-01-11 上海交通大学 Video denoising method based on deep recursive neural network
CN106502985A (en) * 2016-10-20 2017-03-15 清华大学 A kind of neural network modeling approach and device for generating title
CN106778571B (en) * 2016-12-05 2020-03-27 天津大学 Digital video feature extraction method based on deep neural network
CN106778571A (en) * 2016-12-05 2017-05-31 天津大学 A kind of digital video feature extracting method based on deep neural network
CN106686403A (en) * 2016-12-07 2017-05-17 腾讯科技(深圳)有限公司 Video preview generation method, device, server and system
CN106686403B (en) * 2016-12-07 2019-03-08 腾讯科技(深圳)有限公司 A kind of video preview drawing generating method, device, server and system
CN106993240B (en) * 2017-03-14 2020-10-16 天津大学 Multi-video abstraction method based on sparse coding
CN107679031A (en) * 2017-09-04 2018-02-09 昆明理工大学 Based on the advertisement blog article recognition methods for stacking the self-editing ink recorder of noise reduction
CN107679031B (en) * 2017-09-04 2021-01-05 昆明理工大学 Advertisement and blog identification method based on stacking noise reduction self-coding machine
CN107729821A (en) * 2017-09-27 2018-02-23 浙江大学 A kind of video summarization method based on one-dimensional sequence study
CN107729821B (en) * 2017-09-27 2020-08-11 浙江大学 Video summarization method based on one-dimensional sequence learning
CN107886109A (en) * 2017-10-13 2018-04-06 天津大学 It is a kind of based on have supervision Video segmentation video summarization method
CN107886109B (en) * 2017-10-13 2021-06-25 天津大学 Video abstraction method based on supervised video segmentation
CN107911755B (en) * 2017-11-10 2020-10-20 天津大学 Multi-video abstraction method based on sparse self-encoder
CN107911755A (en) * 2017-11-10 2018-04-13 天津大学 A kind of more video summarization methods based on sparse self-encoding encoder
CN109803067A (en) * 2017-11-16 2019-05-24 富士通株式会社 Video concentration method, video enrichment facility and electronic equipment
CN108388942A (en) * 2018-02-27 2018-08-10 四川云淞源科技有限公司 Information intelligent processing method based on big data
CN108417206A (en) * 2018-02-27 2018-08-17 四川云淞源科技有限公司 High speed information processing method based on big data
CN108417204A (en) * 2018-02-27 2018-08-17 四川云淞源科技有限公司 Information security processing method based on big data
US11368705B2 (en) 2018-03-05 2022-06-21 Tencent Technology (Shenzhen) Company Limited Video feature extraction and video content understanding method, apparatus, storage medium and server
US11934454B2 (en) 2018-03-05 2024-03-19 Tencent Technology (Shenzhen) Company Limited Video processing method and apparatus, video retrieval method and apparatus, storage medium, and server
WO2019169996A1 (en) * 2018-03-05 2019-09-12 腾讯科技(深圳)有限公司 Video processing method and apparatus, video retrieval method and apparatus, storage medium and server
CN110366050A (en) * 2018-04-10 2019-10-22 北京搜狗科技发展有限公司 Processing method, device, electronic equipment and the storage medium of video data
CN108848422B (en) * 2018-04-19 2020-06-02 清华大学 Video abstract generation method based on target detection
CN108848422A (en) * 2018-04-19 2018-11-20 清华大学 A kind of video abstraction generating method based on target detection
CN111046887A (en) * 2018-10-15 2020-04-21 华北电力大学(保定) Method for extracting characteristics of image with noise
CN109360436A (en) * 2018-11-02 2019-02-19 Oppo广东移动通信有限公司 A kind of video generation method, terminal and storage medium
CN111246246A (en) * 2018-11-28 2020-06-05 华为技术有限公司 Video playing method and device
CN109635777B (en) * 2018-12-24 2022-09-13 广东理致技术有限公司 Video data editing and identifying method and device
CN109635777A (en) * 2018-12-24 2019-04-16 广东理致技术有限公司 A kind of video data editing recognition methods and device
CN109905778A (en) * 2019-01-03 2019-06-18 上海大学 The method of the expansible breviary of single unstructured video based on group sparse coding
CN109905778B (en) * 2019-01-03 2021-12-03 上海大学 Method for scalable compression of single unstructured video based on group sparse coding
CN110110646B (en) * 2019-04-30 2021-05-04 浙江理工大学 Gesture image key frame extraction method based on deep learning
CN110110646A (en) * 2019-04-30 2019-08-09 浙江理工大学 A kind of images of gestures extraction method of key frame based on deep learning
CN110225368A (en) * 2019-06-27 2019-09-10 腾讯科技(深圳)有限公司 A kind of video locating method, device and electronic equipment
CN110446067B (en) * 2019-08-30 2021-11-02 杭州电子科技大学 Tensor decomposition-based video concentration method
CN110446067A (en) * 2019-08-30 2019-11-12 杭州电子科技大学 Video concentration method based on tensor resolution
US11295084B2 (en) 2019-09-16 2022-04-05 International Business Machines Corporation Cognitively generating information from videos
CN111563423A (en) * 2020-04-17 2020-08-21 西北工业大学 Unmanned aerial vehicle image target detection method and system based on depth denoising automatic encoder
CN113626641A (en) * 2021-08-11 2021-11-09 南开大学 Method for generating video abstract based on multi-mode data and aesthetic principle through neural network
CN113626641B (en) * 2021-08-11 2023-09-01 南开大学 Method for generating video abstract based on neural network of multi-modal data and aesthetic principle
CN117725148A (en) * 2024-02-07 2024-03-19 湖南三湘银行股份有限公司 Question-answer word library updating method based on self-learning

Also Published As

Publication number Publication date
CN104113789B (en) 2017-04-12

Similar Documents

Publication Publication Date Title
CN104113789B (en) On-line video abstraction generation method based on depth learning
Ahmed et al. Image splicing detection using mask-RCNN
Kim et al. Median filtered image restoration and anti-forensics using adversarial networks
Thai et al. Image classification using support vector machine and artificial neural network
CN106530200B (en) Steganographic image detection method and system based on deep learning model
CN102393900B (en) Video copying detection method based on robust hash
CN101971190A (en) Real-time body segmentation system
CN102682298B (en) Video fingerprint method based on graph modeling
CN111597983B (en) Method for realizing identification of generated false face image based on deep convolutional neural network
CN111382305B (en) Video deduplication method, video deduplication device, computer equipment and storage medium
CN108564166A (en) Based on the semi-supervised feature learning method of the convolutional neural networks with symmetrical parallel link
Wang et al. Wavelet based region duplication forgery detection
CN106529492A (en) Video topic classification and description method based on multi-image fusion in view of network query
CN107092935A (en) A kind of assets alteration detection method
CN103606136A (en) Video super resolution method based on keyframes and non-local restriction
CN103530634A (en) Face characteristic extraction method
Sasirekha et al. Enhanced techniques for PDF image segmentation and text extraction
CN110807369B (en) Short video content intelligent classification method based on deep learning and attention mechanism
CN102547477B (en) Video fingerprint method based on contourlet transformation model
Wang et al. Versatile Backdoor Attack with Visible, Semantic, Sample-Specific, and Compatible Triggers
Zuo et al. A novel slant transform-based image feature extract algorithm
CN104751470A (en) Image quick-matching method
CN108205666A (en) A kind of face identification method based on depth converging network
CN102034102B (en) Image-based significant object extraction method as well as complementary significance graph learning method and system
Bashir et al. Towards deep learning-based image steganalysis: practices and open research issues

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20220809

Address after: Room 406, building 19, haichuangyuan, No. 998, Wenyi West Road, Yuhang District, Hangzhou City, Zhejiang Province

Patentee after: HANGZHOU HUICUI INTELLIGENT TECHNOLOGY CO.,LTD.

Address before: 310018 No. 2 street, Xiasha Higher Education Zone, Hangzhou, Zhejiang

Patentee before: HANGZHOU DIANZI University