CN106778571A - Digital video feature extraction method based on deep neural network - Google Patents

Digital video feature extraction method based on deep neural network - Download PDF

Info

Publication number
CN106778571A
CN106778571A (Application CN201611104658.2A)
Authority
CN
China
Prior art keywords
video
training
neural network
deep neural
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611104658.2A
Other languages
Chinese (zh)
Other versions
CN106778571B (en)
Inventor
李岳楠
陈学票
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201611104658.2A priority Critical patent/CN106778571B/en
Publication of CN106778571A publication Critical patent/CN106778571A/en
Application granted granted Critical
Publication of CN106778571B publication Critical patent/CN106778571B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06V  IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00  Scenes; Scene-specific elements
    • G06V 20/40  Scenes; Scene-specific elements in video content
    • G06V 20/46  Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06N  COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00  Computing arrangements based on biological models
    • G06N 3/02  Neural networks
    • G06N 3/08  Learning methods

Abstract

The invention discloses a digital video feature extraction method based on a deep neural network. The method comprises the following steps: training a denoising autoencoder network to reduce the dimensionality of the initial video descriptors, and cascading a conditional generative model with the encoder to form one basic feature extraction module; successively training several such feature extraction modules and stacking them bottom-up, in the order in which they were trained, to form a deep neural network; and training a post-processing network placed on top of the deep neural network to optimize the robustness and discriminability of the video descriptor. The method maps video features to a compact video descriptor through the deep neural network. The descriptor provides a concise description of the perceptual content of the video while offering good robustness and discriminability, enabling efficient and accurate video content recognition.

Description

Digital video feature extraction method based on deep neural network
Technical field
The present invention relates to the technical field of signal and information processing, and in particular to a digital video feature extraction method based on a deep neural network.
Background technology
Compared with image data, video data is characterized by a large data volume, temporal correlation between frames, and considerable redundancy. Video copyright protection, video retrieval and video data management usually require a unique and extremely compact descriptor to serve as a label of the video content. The most straightforward way to generate a video descriptor is to extract a descriptor independently from each representative frame and concatenate these descriptors to form the descriptor of the whole video.
Common methods include statistical methods [1], luminance-gradient methods [2] and color-correlation methods [3]. However, such methods cannot capture the temporal characteristics of visual information. To extract spatio-temporal video features, document [4] uses the luminance differences of adjacent blocks along the temporal and spatial directions as the video descriptor, and document [5] uses the trajectories of feature points as the video descriptor. In addition, three-dimensional signal transforms [6], tensor decomposition [7] and optical flow [8] have also been used to construct descriptors that reflect the spatio-temporal properties of video.
In the course of implementing the present invention, the inventors found that the prior art has at least the following drawbacks:
Existing feature extraction methods suffer from relatively high redundancy and sensitivity to temporal distortions. Moreover, they mostly rely on hand-crafted designs, and hand-crafted feature extraction methods can hardly capture the essential attributes of video information along the spatio-temporal directions.
Summary of the invention
The present invention provides a digital video feature extraction method based on a deep neural network. The method maps video features to a compact video descriptor through a deep neural network; the descriptor provides a concise description of the perceptual content of the video while offering good robustness and discriminability, enabling efficient and accurate video content recognition, as described in detail below:
A digital video feature extraction method based on a deep neural network, the method comprising the following steps:
training a denoising autoencoder network to reduce the dimensionality of the initial video descriptors, and cascading a conditional generative model with the encoder to form one basic feature extraction module;
successively training several feature extraction modules, and stacking the resulting modules bottom-up, in the order in which they were trained, to form a deep neural network;
training a post-processing network and placing it on top of the deep neural network to optimize the robustness and discriminability of the video descriptor.
The method further comprises:
pre-processing the input video and expressing the spatio-temporal relationships of the video content by means of a conditional generative model.
The step of pre-processing the input video and expressing the spatio-temporal relationships of the video content by means of a conditional generative model is specifically:
low-pass filtering the video for smoothing and down-sampling it, compressing each frame to the size required by the input layer of the neural network, and regularizing the down-sampled video so that the pixel mean of each frame is zero and the variance is 1;
feeding the video data into a conditional restricted Boltzmann machine (Conditional Restricted Boltzmann Machine, CRBM), setting the pixels of each frame of the pre-processed video as the neurons of the visible layer, and training the CRBM network.
The step of training a denoising autoencoder network to reduce the dimensionality of the initial video descriptors and cascading the conditional generative model with the encoder to form one basic feature extraction module is specifically:
applying distortions to each training video and performing the pre-processing operation, using the distorted videos as input to the CRBM to generate initial descriptors, selecting multiple pairs of initial descriptors of original and distorted videos as training data, and training a denoising autoencoder network;
stacking the encoder E(·) obtained by training on top of the CRBM to obtain the first feature extraction module.
The step of successively training several feature extraction modules and stacking the resulting modules bottom-up, in the order in which they were trained, to form a deep neural network is specifically:
using the output of the above feature extraction module as training data, training another pair of CRBM and encoder, and building the second feature extraction module from the resulting CRBM and encoder;
training further CRBM and encoder modules in turn, the training data of each module consisting of the output of the previous module;
stacking the modules bottom-up in the order in which they were trained to form the deep neural network.
The step of training a post-processing network, placing it on top of the deep neural network and using it to optimize the robustness and discriminability of the video descriptor is specifically:
generating descriptors for the training videos with the deep neural network composed of K CRBM-E(·) modules, and training the post-processing network by minimizing its cost function;
after training is complete, placing the post-processing network on top of the deep neural network composed of CRBMs and encoders.
The beneficial effects of the technical solution provided by the present invention are:
1. The present invention extracts video features through a deep neural network to generate the video descriptor; the CRBM (Conditional Restricted Boltzmann Machine) network can capture the essential spatio-temporal attributes of video information;
2. The autoencoder network reduces the dimensionality of the descriptor and improves its robustness, and the post-processing network globally optimizes the robustness and discriminability of the descriptor;
3. The present invention learns the optimal feature extraction scheme by training the model, without relying on hand-crafted feature extraction;
4. The procedure of the present invention is simple, easy to implement and of low computational complexity. Test results on a computer with a 3.2 GHz CPU and 32 GB of memory show that the method of the present invention takes on average only 1.52 seconds to process a 500-frame video sequence.
Brief description of the drawings
Fig. 1 is a flowchart of the digital video feature extraction method based on a deep neural network;
Fig. 2 is a schematic diagram of the structure of the conditional restricted Boltzmann machine;
Fig. 3 is a schematic diagram of the structure of the deep neural network for video feature extraction.
Specific embodiment
To make the objects, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below.
Embodiment 1
In order to achieve a concise and robust description of video content, an embodiment of the present invention proposes a digital video feature extraction method based on a deep neural network. Referring to Fig. 1, the method comprises the following steps:
101: training a denoising autoencoder network to reduce the dimensionality of the initial video descriptors, and cascading a conditional generative model with the encoder to form one basic feature extraction module;
102: successively training several feature extraction modules, and stacking the resulting modules bottom-up, in the order in which they were trained, to form a deep neural network;
103: training a post-processing network and placing it on top of the deep neural network to optimize the robustness and discriminability of the video descriptor.
Before step 101, the method further comprises:
pre-processing the input video and expressing the spatio-temporal relationships of the video content by means of a conditional generative model.
The above step of pre-processing the input video and expressing the spatio-temporal relationships of the video content by means of a conditional generative model is specifically:
low-pass filtering the video for smoothing and down-sampling it, compressing each frame to the size required by the input layer of the neural network, and regularizing the down-sampled video so that the pixel mean of each frame is zero and the variance is 1;
feeding the video data into the CRBM, setting the pixels of each frame of the pre-processed video as the neurons of the visible layer, and training the CRBM network.
The step of training a denoising autoencoder network in step 101 to reduce the dimensionality of the initial video descriptors and cascading the conditional generative model with the encoder to form one basic feature extraction module is specifically:
applying distortions to each training video and performing the pre-processing operation, using the distorted videos as input to the CRBM to generate initial descriptors, selecting multiple pairs of initial descriptors of original and distorted videos as training data, and training a denoising autoencoder network;
stacking the encoder E(·) obtained by training on top of the CRBM to obtain the first feature extraction module.
The step of successively training several feature extraction modules in step 102 and stacking the resulting modules bottom-up, in the order in which they were trained, to form a deep neural network is specifically:
using the output of the above feature extraction module as training data, training another pair of CRBM and encoder, and building the second feature extraction module from the resulting CRBM and encoder;
training further CRBM and encoder modules in turn, the training data of each module consisting of the output of the previous module;
stacking the modules bottom-up in the order in which they were trained to form the deep neural network.
The step of training a post-processing network in step 103, placing it on top of the deep neural network and using it to optimize the robustness and discriminability of the video descriptor is specifically:
generating descriptors for the training videos with the deep neural network composed of K CRBM-E(·) modules, and training the post-processing network by minimizing its cost function;
after training is complete, placing the post-processing network on top of the deep neural network composed of CRBMs and encoders.
In summary, video features are mapped by the deep neural network to a compact video descriptor. The descriptor provides a concise description of the perceptual content of the video while offering good robustness and discriminability, enabling efficient and accurate video content recognition.
Embodiment 2
The scheme of embodiment 1 is described in detail below with reference to Figs. 2 and 3 and the calculation formulas; see the following description:
201: pre-processing the input video, expressing the spatio-temporal relationships between the video content by means of a conditional generative model, and generating the initial descriptor of the video;
Step 201 is specifically:
1) In the pre-processing stage, each frame of the input video is first smoothed spatially with a low-pass filter, the smoothed video is then down-sampled in time, and finally each frame is normalized to zero pixel mean and unit variance. The embodiment of the present invention places no particular restriction on the low-pass filter parameters.
2) The initial descriptor of the video is generated with a conditional restricted Boltzmann machine (Conditional Restricted Boltzmann Machine, CRBM) [9]. The CRBM can model the statistical correlations between video frames; its structure is shown in Fig. 2. Let the visible layer at the current time (i.e., frame t of the video) be denoted v_t, and the frame at time t-m be v_{t-m} (m ≥ 1). The hidden layer at the current time is h_t, the weight matrix between the visible layer and the hidden layer is W, the bias of the visible layer is a, the bias of the hidden layer is b, the weights from the visible layer at a previous time to the visible layer at the current time are A_k, and the weights from the visible layer at a previous time to the hidden layer at the current time are B_k.
Concrete operations are as follows:
1. A video of size V_1 × S_1 × F_1 (F_1 frames, each frame of size V_1 × S_1) is low-pass filtered for smoothing and down-sampled: each frame is compressed to size V_2 × S_2 to meet the size requirement of the neural network input layer, and the number of frames F_1 is compressed to F_2 (F_2 = F_1/N, replacing every N frames by their average). The down-sampled video of size V_2 × S_2 × F_2 is then regularized so that the pixel mean of each frame is zero and the variance is 1. In this example V_2 = 32, S_2 = 32, F_2 = 4; an illustrative sketch of this pre-processing is given after these operations.
2. The video data is fed into the CRBM; the visible layer corresponding to frame t of the CRBM is v_t ∈ R^1024, and in this embodiment the pixels of each frame of the pre-processed video are set as the neurons of the visible layer, so the number of visible-layer neurons is 1024.
The hidden layer at frame t is h_t; in this example the number of hidden-layer neurons is set to 300. In the CRBM network the visible-hidden weights are W ∈ R^(1024×300), the visible-layer bias is a ∈ R^1024, the hidden-layer bias is b ∈ R^300, the weights between visible layers at different times are A_k ∈ R^(300×300), and the weights between the visible layer and the hidden layer at different times are B_k ∈ R^(300×1024). The CRBM network can be trained by minimizing the following cost function:
L_CRBM = -Σ_t log p(v_t | v_{t-1}, ..., v_{t-m})   (1)
where L_CRBM is the cost function of the CRBM; p(v_t | v_{t-1}, ..., v_{t-m}) is the probability of the current frame v_t conditioned on the frames v_{t-1}, ..., v_{t-m} at times t-1, ..., t-m; and E(v_t, h_t) is the energy function from which this conditional probability is derived.
Here k = 1, ..., m is the time index, m is the order of the CRBM, v_{t-k} is the vector formed by the pixel values of frame t-k, and T denotes matrix transposition. The embodiment of the present invention does not limit the method used to minimize formula (1) or the value of m.
In this example the order of the CRBM is m = 3, the number of training videos is 500, and cost function (1) is minimized by the stochastic gradient descent algorithm with back-propagation.
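For illustration only, the following is a minimal NumPy/SciPy sketch of the pre-processing in operation 1 above; the Gaussian filter width and the bilinear interpolation order are assumptions not fixed by this description.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def preprocess(video, v2=32, s2=32, n_avg=4):
    """video: array of shape (F1, V1, S1); returns (F2, v2, s2) with F2 = F1 // n_avg."""
    f1, v1, s1 = video.shape
    frames = []
    for frame in video.astype(np.float64):
        smooth = gaussian_filter(frame, sigma=1.0)                    # spatial low-pass smoothing
        frames.append(zoom(smooth, (v2 / v1, s2 / s1), order=1))      # down-sample to v2 x s2
    frames = np.asarray(frames)
    f2 = f1 // n_avg
    frames = frames[:f2 * n_avg].reshape(f2, n_avg, v2, s2).mean(axis=1)  # average every n_avg frames
    mean = frames.mean(axis=(1, 2), keepdims=True)                    # per-frame regularization:
    std = frames.std(axis=(1, 2), keepdims=True) + 1e-8               # zero mean, unit variance
    return (frames - mean) / std
```

Likewise, a single contrastive-divergence (CD-1) update for the conditional RBM of [9] might look as follows; the Gaussian-visible simplification, the learning rate, and treating A_k and B_k as visible-to-visible and visible-to-hidden matrices are assumptions made for this sketch, not details taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
NV, NH, M = 1024, 300, 3                        # visible units, hidden units, CRBM order m
W = 0.01 * rng.standard_normal((NV, NH))        # visible-hidden weights
A = 0.01 * rng.standard_normal((M, NV, NV))     # past visible -> current visible
B = 0.01 * rng.standard_normal((M, NV, NH))     # past visible -> current hidden
a, b = np.zeros(NV), np.zeros(NH)               # static biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v_t, v_past, lr=1e-3):
    """v_t: current frame (NV,); v_past: previous m frames (M, NV). Returns h_t, the frame descriptor."""
    global W, a, b
    a_dyn = a + sum(A[k] @ v_past[k] for k in range(M))    # dynamic visible bias
    b_dyn = b + sum(B[k].T @ v_past[k] for k in range(M))  # dynamic hidden bias
    h_prob = sigmoid(v_t @ W + b_dyn)                      # positive phase
    h_samp = (rng.random(NH) < h_prob).astype(float)
    v_rec = a_dyn + W @ h_samp                             # mean reconstruction (Gaussian visible units)
    h_rec = sigmoid(v_rec @ W + b_dyn)                     # negative phase
    W += lr * (np.outer(v_t, h_prob) - np.outer(v_rec, h_rec))
    a += lr * (v_t - v_rec)
    b += lr * (h_prob - h_rec)                             # updates of A and B omitted for brevity
    return h_prob
```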
202: training a denoising autoencoder network to reduce the dimensionality of the initial video descriptors, and cascading the conditional generative model with the encoder to form one basic feature extraction module;
Step 202 is specifically:
1) Distortions (compression, noise addition, rotation, etc.) are applied to each training video, the pre-processing operation is performed, and the distorted videos are used as input to the CRBM to generate initial descriptors. Multiple pairs of initial descriptors of original and distorted videos are selected as training data, and a denoising autoencoder network (Denoising Autoencoder, DAE) [10] is trained; it is used to reduce the dimensionality of the video descriptors generated by the aforementioned CRBM. Before training, the descriptors of the original videos and of the distorted videos (e.g., versions of the original videos after compression, noise addition and similar processing) are generated with the CRBM. Taking the n-th pair of original and distorted videos as an example, let a_n denote the descriptor of the original video and ã_n the descriptor of the distorted video. The goal of training the DAE is to recover a_n from ã_n.
Taking the n-th pair of training data as an example, let a_n ∈ R^(300×4) denote the descriptor of the original video and ã_n the descriptor of the distorted video. The cost function of the denoising autoencoder network is:
L_DAE = Σ_n ||a_n - D(E(ã_n))||² + λ_DAE Σ_{l,i,j} (W_{i,j}^(l))²   (2)
where L_DAE is the cost function of the denoising autoencoder network; λ_DAE is the weight-decay coefficient; W_{i,j}^(l) is the network weight connecting the i-th neuron of layer l to the j-th neuron of layer l+1; E(·) is the encoder and D(·) is the decoder.
Cost function (2) is minimized by stochastic gradient descent with back-propagation to obtain the optimal weights W_{i,j}^(l), which completes the training. The embodiment of the present invention does not limit the method used to minimize formula (2) or the value of λ_DAE.
In this example the input layer and the hidden layer of the denoising autoencoder network consist of 300 and 100 neurons respectively, and λ_DAE = 10^-5.
2) The encoder E(·) obtained by training is stacked on top of the CRBM to obtain the first feature extraction module, denoted {CRBM-E(·)}_1. This feature extraction module is a three-layer neural network with the structure 1024-300-100; a minimal training sketch of the denoising autoencoder is given below.
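As a concrete illustration of step 202, the following PyTorch sketch trains a 300-100 denoising autoencoder with the reconstruction-plus-weight-decay cost of formula (2); the sigmoid activations, the plain SGD optimizer and the learning rate are assumptions of this sketch rather than details stated above.

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, d_in=300, d_hid=100):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_hid), nn.Sigmoid())  # encoder E(.)
        self.dec = nn.Sequential(nn.Linear(d_hid, d_in), nn.Sigmoid())  # decoder D(.)

    def forward(self, x):
        return self.dec(self.enc(x))

def train_dae(clean, noisy, lam=1e-5, epochs=50, lr=1e-3):
    """clean/noisy: (n_pairs, 300) descriptors of original/distorted videos (a_n and its distorted version)."""
    model = DenoisingAE()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        recon = model(noisy)
        loss = ((recon - clean) ** 2).sum()                       # reconstruction term of (2)
        loss = loss + lam * sum((p ** 2).sum()                    # weight-decay term of (2)
                                for p in model.parameters() if p.dim() > 1)
        loss.backward()
        opt.step()
    return model.enc   # the trained encoder E(.) is stacked on top of the CRBM
```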
203: successively training several feature extraction modules, and stacking the trained modules bottom-up, in the order in which they were trained, to form a deep neural network;
Step 203 is specifically:
Using the output of the above feature extraction module {CRBM-E(·)}_1 as training data, another pair of CRBM and encoder is trained according to the above steps, and the second feature extraction module, denoted {CRBM-E(·)}_2, is built from the resulting CRBM and encoder. This process is repeated to train further CRBM and encoder modules in turn, the training data of each module consisting of the output of the previous module. The modules are stacked bottom-up in the order in which they were trained, forming the deep neural network. The deep neural network composed of K modules can be written as {CRBM-E(·)}_1-{CRBM-E(·)}_2-…-{CRBM-E(·)}_K, as shown in Fig. 3. The embodiment of the present invention places no particular restriction on the number of modules K.
The present embodiment uses K = 2, i.e., two feature extraction modules are used for illustration. Using the output of the above feature extraction module {CRBM-E(·)}_1 as training data, another pair of CRBM and denoising encoder is trained according to the above steps, and the second feature extraction module {CRBM-E(·)}_2 is built from the resulting CRBM and encoder.
In this example the numbers of input-layer and hidden-layer neurons of the second CRBM are 100 and 80 respectively, and those of the denoising autoencoder are 80 and 50 respectively, so the structure of the second module is 100-80-50. Stacking the two modules bottom-up yields a neural network with the structure 1024-300-100-80-50.
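A purely structural view of this stacked 1024-300-100-80-50 network, ignoring the autoregressive connections of the CRBMs, could be written as follows; the sigmoid activations are an assumption, and in practice the weights would be copied from the modules trained above.

```python
import torch.nn as nn

# feed-forward skeleton of {CRBM-E(.)}_1 - {CRBM-E(.)}_2 (layer sizes 1024-300-100-80-50);
# the CRBM layers are shown as plain linear layers only to illustrate the dimensions
feature_net = nn.Sequential(
    nn.Linear(1024, 300), nn.Sigmoid(),   # hidden layer of CRBM 1
    nn.Linear(300, 100), nn.Sigmoid(),    # encoder E(.) of module 1
    nn.Linear(100, 80), nn.Sigmoid(),     # hidden layer of CRBM 2
    nn.Linear(80, 50), nn.Sigmoid(),      # encoder E(.) of module 2
)
```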
204: training a post-processing network and placing it on top of the deep neural network to optimize the robustness and discriminability of the video descriptor.
Step 204 is specifically:
1) Descriptors are generated for the training videos with the deep neural network composed of K CRBM-E(·) modules. Take the n-th pair of training data (V_{n,1}, V_{n,2}, y_n) as an example, where V_{n,1} and V_{n,2} are the descriptors of two training videos and y_n is the label (y_n = +1 indicates that the two training videos have the same visual content, y_n = -1 indicates that the two videos have different visual content).
Let φ(·) be the mapping defined by the post-processing network and let L denote the number of layers of the post-processing network (L > 1). The post-processing network is trained with cost function (3), defined over the descriptor pairs and their labels,
where W_{i,j}^(l) are the network weights, the constant λ_Post is the weight-decay coefficient, V_{n,1} is the descriptor of the first video in the n-th pair of training data, and V_{n,2} is the descriptor of the second video. Cost function (3) is minimized; after training is complete, this post-processing unit is placed on top of the deep neural network composed of CRBMs and encoders, as shown in Fig. 3. The embodiment of the present invention does not limit the minimization method or the values of L and λ_Post.
The deep neural network composed of the above two CRBM-E(·) modules is used to generate descriptors for the training videos; these descriptors constitute the training samples of the post-processing network.
The training set chosen in this example consists of n = 4000 pairs of videos with identical or different visual content; the pairs with identical visual content are generated by common distortions such as compression, noise addition and filtering.
In this example the number of layers of the post-processing network is L = 2, λ_Post = 10^-5, and the numbers of neurons in the two layers are 40 and 30 respectively. Cost function (3) is minimized by the back-propagation algorithm; after training is complete, the post-processing network is placed on top of the aforementioned deep network composed of CRBMs and encoders, yielding a feature extraction network with the structure 1024-300-100-80-50-40-30.
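Since cost function (3) is not reproduced in this text, the following sketch only illustrates one plausible realization of step 204: a two-layer 50-40-30 post-processing network φ(·) trained with a contrastive pair loss plus L2 weight decay over labelled descriptor pairs; the loss form, the margin value and the optimizer are assumptions, not the patent's actual formula.

```python
import torch
import torch.nn as nn

phi = nn.Sequential(nn.Linear(50, 40), nn.Sigmoid(),
                    nn.Linear(40, 30), nn.Sigmoid())   # post-processing network, L = 2 layers

def post_loss(v1, v2, y, lam=1e-5, margin=1.0):
    """v1, v2: (n, 50) descriptors from the stacked network; y: (n,) labels in {+1, -1}."""
    d = (phi(v1) - phi(v2)).pow(2).sum(dim=1)           # squared distance between network outputs
    same = (y > 0).float()
    pair_term = same * d + (1.0 - same) * torch.clamp(margin - d, min=0.0)
    decay = lam * sum((p ** 2).sum() for p in phi.parameters() if p.dim() > 1)
    return pair_term.sum() + decay

opt = torch.optim.SGD(phi.parameters(), lr=1e-3)
v1, v2 = torch.randn(8, 50), torch.randn(8, 50)         # stand-in descriptor pairs
y = torch.tensor([1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
opt.zero_grad()
post_loss(v1, v2, y).backward()
opt.step()
```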
In summary, video features are mapped by the deep neural network to a compact video descriptor. The descriptor provides a concise description of the perceptual content of the video while offering good robustness and discriminability, enabling efficient and accurate video content recognition.
Embodiment 3
The feasibility of the scheme of embodiments 1 and 2 is verified below with experimental data; see the following description:
600 videos are chosen as test videos, and the following distortions are applied to each video:
1) XviD lossy compression: the resolution of the original video is reduced to 320 × 240, the frame rate is reduced to 25 fps, and the bit rate is reduced to 256 kbps;
2) median filtering, with filter sizes from 10 pixels to 20 pixels;
3) additive Gaussian noise, with variance 0.1, 0.5 or 1;
4) rotation, with rotation angles of 2, 5 and 10 degrees;
5) histogram equalization, with 16, 32 or 64 gray levels;
6) frame dropping, with a dropping percentage of 25%;
7) picture scaling, with scaling factors of 0.2 and 4.
Applying the above distortions 1) to 7) in turn generates 9600 distorted videos in total.
Feature descriptors are generated for each distorted video and each original video with the deep neural network trained in embodiment 2. Each video is chosen in turn as the query video, a content identification experiment is carried out on the test library, and the precision P, the recall R and the F_1 index are computed. The F_1 index is calculated as follows:
F_1 = 2/(1/P + 1/R)
The test results show that the F_1 index is 0.980, close to the ideal value of 1. This indicates that the constructed deep network can learn video features with good robustness and discriminability, reflect the essential perceptual properties of video, and achieve high identification accuracy in the content identification experiment.
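For reference, a small helper that evaluates the F_1 index from the counts of a content identification run, assuming the standard definitions of precision and recall:

```python
def f1_index(true_pos, false_pos, false_neg):
    p = true_pos / (true_pos + false_pos)   # precision P
    r = true_pos / (true_pos + false_neg)   # recall R
    return 2.0 / (1.0 / p + 1.0 / r)        # F1 = 2/(1/P + 1/R) = 2PR/(P + R)
```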
Bibliography
[1] C. D. Roover, C. D. Vleeschouwer, F. Lefèbvre, and B. Macq, "Robust video hashing based on radial projections of key frames," IEEE Trans. Signal Process., vol. 53, no. 10, pp. 4020-4037, Oct. 2005.
[2] S. Lee and C. D. Yoo, "Robust video fingerprinting for content-based video identification," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 7, pp. 983-988, Jul. 2008.
[3] Y. Lei, W. Luo, Y. Wang and J. Huang, "Video sequence matching based on the invariance of color correlation," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 9, pp. 1332-1343, Sept. 2012.
[4] J. C. Oostveen, T. Kalker, and J. Haitsma, "Visual hashing of digital video: applications and techniques," in Proc. SPIE Applications of Digital Image Processing XXIV, July 2001, vol. 4472, pp. 121-131.
[5] S. Satoh, M. Takimoto, and J. Adachi, "Scene duplicate detection from videos based on trajectories of feature points," in Proc. Int. Workshop on Multimedia Information Retrieval, 2007, pp. 237-244.
[6] B. Coskun, B. Sankur, and N. Memon, "Spatio-temporal transform based video hashing," IEEE Trans. Multimedia, vol. 8, no. 6, pp. 1190-1208, Dec. 2006.
[7] M. Li and V. Monga, "Robust video hashing via multilinear subspace projections," IEEE Trans. Image Process., vol. 21, no. 10, pp. 4397-4409, Oct. 2012.
[8] M. Li and V. Monga, "Twofold video hashing with automatic synchronization," IEEE Trans. Inf. Forens. Sec., vol. 10, no. 8, pp. 1727-1738, Aug. 2015.
[9] G. W. Taylor, G. E. Hinton, and S. T. Roweis, "Modeling human motion using binary latent variables," in Proc. Advances in Neural Information Processing Systems, 2007, vol. 19.
[10] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. A. Manzagol, "Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion," J. Mach. Learn. Res., vol. 11, pp. 3371-3408, Dec. 2010.
Those skilled in the art will appreciate that the drawings are merely schematic diagrams of a preferred embodiment, and that the serial numbers of the embodiments of the present invention are for description only and do not indicate any order of preference.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent substitution, improvement and the like made within the spirit and principle of the present invention shall be included within the scope of protection of the present invention.

Claims (6)

1. A digital video feature extraction method based on a deep neural network, characterized in that the method comprises the following steps:
training a denoising autoencoder network to reduce the dimensionality of the initial video descriptors, and cascading a conditional generative model with the encoder to form one basic feature extraction module;
successively training several feature extraction modules, and stacking the resulting modules bottom-up, in the order in which they were trained, to form a deep neural network;
training a post-processing network and placing it on top of the deep neural network to optimize the robustness and discriminability of the video descriptor.
2. The digital video feature extraction method based on a deep neural network according to claim 1, characterized in that the method further comprises:
pre-processing the input video and expressing the spatio-temporal relationships of the video content by means of a conditional generative model.
3. The digital video feature extraction method based on a deep neural network according to claim 2, characterized in that the step of pre-processing the input video and expressing the spatio-temporal relationships of the video content by means of a conditional generative model is specifically:
low-pass filtering the video for smoothing and down-sampling it, compressing each frame to the size required by the input layer of the neural network, and regularizing the down-sampled video so that the pixel mean of each frame is zero and the variance is 1;
feeding the video data into a conditional restricted Boltzmann machine, setting the pixels of each frame of the pre-processed video as the neurons of the visible layer, and training the CRBM network.
4. The digital video feature extraction method based on a deep neural network according to claim 1, characterized in that the step of training a denoising autoencoder network to reduce the dimensionality of the initial video descriptors and cascading the conditional generative model with the encoder to form one basic feature extraction module is specifically:
applying distortions to each training video and performing the pre-processing operation, using the distorted videos as input to the CRBM to generate initial descriptors, selecting multiple pairs of initial descriptors of original and distorted videos as training data, and training a denoising autoencoder network;
stacking the encoder E(·) obtained by training on top of the CRBM to obtain the first feature extraction module.
5. The digital video feature extraction method based on a deep neural network according to claim 1, characterized in that the step of successively training several feature extraction modules and stacking the resulting modules bottom-up, in the order in which they were trained, to form a deep neural network is specifically:
using the output of the above feature extraction module as training data, training another pair of CRBM and encoder, and building the second feature extraction module from the resulting CRBM and encoder;
training further CRBM and encoder modules in turn, the training data of each module consisting of the output of the previous module;
stacking the modules bottom-up in the order in which they were trained to form the deep neural network.
6. The digital video feature extraction method based on a deep neural network according to claim 1, characterized in that the step of training a post-processing network, placing it on top of the deep neural network and using it to optimize the robustness and discriminability of the video descriptor is specifically:
generating descriptors for the training videos with the deep neural network composed of K CRBM-E(·) modules, and training the post-processing network by minimizing its cost function;
after training is complete, placing the post-processing network on top of the deep neural network composed of CRBMs and encoders.
CN201611104658.2A 2016-12-05 2016-12-05 Digital video feature extraction method based on deep neural network Active CN106778571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611104658.2A CN106778571B (en) 2016-12-05 2016-12-05 Digital video feature extraction method based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611104658.2A CN106778571B (en) 2016-12-05 2016-12-05 Digital video feature extraction method based on deep neural network

Publications (2)

Publication Number Publication Date
CN106778571A true CN106778571A (en) 2017-05-31
CN106778571B CN106778571B (en) 2020-03-27

Family

ID=58878783

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611104658.2A Active CN106778571B (en) 2016-12-05 2016-12-05 Digital video feature extraction method based on deep neural network

Country Status (1)

Country Link
CN (1) CN106778571B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563391A (en) * 2017-09-06 2018-01-09 天津大学 A kind of digital picture feature extracting method based on expert model
CN108021927A (en) * 2017-11-07 2018-05-11 天津大学 A kind of method for extracting video fingerprints based on slow change visual signature
CN108874665A (en) * 2018-05-29 2018-11-23 百度在线网络技术(北京)有限公司 A kind of test result method of calibration, device, equipment and medium
CN108900888A (en) * 2018-06-15 2018-11-27 优酷网络技术(北京)有限公司 Control method for playing back and device
CN109857906A (en) * 2019-01-10 2019-06-07 天津大学 More video summarization methods of unsupervised deep learning based on inquiry
CN111291634A (en) * 2020-01-17 2020-06-16 西北工业大学 Unmanned aerial vehicle image target detection method based on convolution limited Boltzmann machine
CN111488932A (en) * 2020-04-10 2020-08-04 中国科学院大学 Self-supervision video time-space characterization learning method based on frame rate perception

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521671A (en) * 2011-11-29 2012-06-27 华北电力大学 Ultrashort-term wind power prediction method
CN104113789A (en) * 2014-07-10 2014-10-22 杭州电子科技大学 On-line video abstraction generation method based on depth learning
CN104268594A (en) * 2014-09-24 2015-01-07 中安消技术有限公司 Method and device for detecting video abnormal events
CN104900063A (en) * 2015-06-19 2015-09-09 中国科学院自动化研究所 Short distance driving time prediction method
CN105163121A (en) * 2015-08-24 2015-12-16 西安电子科技大学 Large-compression-ratio satellite remote sensing image compression method based on deep self-encoding network
CN106096568A (en) * 2016-06-21 2016-11-09 同济大学 A kind of pedestrian's recognition methods again based on CNN and convolution LSTM network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521671A (en) * 2011-11-29 2012-06-27 华北电力大学 Ultrashort-term wind power prediction method
CN104113789A (en) * 2014-07-10 2014-10-22 杭州电子科技大学 On-line video abstraction generation method based on depth learning
CN104268594A (en) * 2014-09-24 2015-01-07 中安消技术有限公司 Method and device for detecting video abnormal events
CN104900063A (en) * 2015-06-19 2015-09-09 中国科学院自动化研究所 Short distance driving time prediction method
CN105163121A (en) * 2015-08-24 2015-12-16 西安电子科技大学 Large-compression-ratio satellite remote sensing image compression method based on deep self-encoding network
CN106096568A (en) * 2016-06-21 2016-11-09 同济大学 A kind of pedestrian's recognition methods again based on CNN and convolution LSTM network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ADAM PASZKE 等: "ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation", 《ARXIV:1606.02147V1》 *
NOAH J. APTHORPE 等: "Automatic Neuron Detection in Calcium Imaging Data Using Convolutional Networks", 《ARXIV:1606.07372V1》 *
PASCAL VINCENT 等: "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion", 《JOURNAL OF MACHINE LEARNING RESEARCH》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563391A (en) * 2017-09-06 2018-01-09 天津大学 A kind of digital picture feature extracting method based on expert model
CN107563391B (en) * 2017-09-06 2020-12-15 天津大学 Digital image feature extraction method based on expert model
CN108021927A (en) * 2017-11-07 2018-05-11 天津大学 A kind of method for extracting video fingerprints based on slow change visual signature
CN108874665A (en) * 2018-05-29 2018-11-23 百度在线网络技术(北京)有限公司 A kind of test result method of calibration, device, equipment and medium
CN108900888A (en) * 2018-06-15 2018-11-27 优酷网络技术(北京)有限公司 Control method for playing back and device
CN109857906A (en) * 2019-01-10 2019-06-07 天津大学 More video summarization methods of unsupervised deep learning based on inquiry
CN109857906B (en) * 2019-01-10 2023-04-07 天津大学 Multi-video abstraction method based on query unsupervised deep learning
CN111291634A (en) * 2020-01-17 2020-06-16 西北工业大学 Unmanned aerial vehicle image target detection method based on convolution limited Boltzmann machine
CN111488932A (en) * 2020-04-10 2020-08-04 中国科学院大学 Self-supervision video time-space characterization learning method based on frame rate perception

Also Published As

Publication number Publication date
CN106778571B (en) 2020-03-27

Similar Documents

Publication Publication Date Title
CN106778571A (en) A kind of digital video feature extracting method based on deep neural network
Deng et al. Learning to predict crisp boundaries
Kim et al. Fully deep blind image quality predictor
Chen et al. Quaternion pseudo-Zernike moments combining both of RGB information and depth information for color image splicing detection
CN106096568A (en) A kind of pedestrian's recognition methods again based on CNN and convolution LSTM network
He et al. Frame-wise detection of relocated I-frames in double compressed H. 264 videos based on convolutional neural network
CN113592736B (en) Semi-supervised image deblurring method based on fused attention mechanism
CN111241963B (en) First person view video interactive behavior identification method based on interactive modeling
Ayoobkhan et al. Prediction-based Lossless Image Compression
CN114936605A (en) Knowledge distillation-based neural network training method, device and storage medium
CN112801068B (en) Video multi-target tracking and segmenting system and method
Selvaraj et al. Digital image steganalysis: A survey on paradigm shift from machine learning to deep learning based techniques
Shao et al. Generative image inpainting via edge structure and color aware fusion
WO2020043296A1 (en) Device and method for separating a picture into foreground and background using deep learning
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
Kohli et al. CNN based localisation of forged region in object‐based forgery for HD videos
CN114387641A (en) False video detection method and system based on multi-scale convolutional network and ViT
Naeem et al. T-VLAD: Temporal vector of locally aggregated descriptor for multiview human action recognition
CN111046213B (en) Knowledge base construction method based on image recognition
Majumder et al. A tale of a deep learning approach to image forgery detection
Thakur et al. Machine learning based saliency algorithm for image forgery classification and localization
CN112418127B (en) Video sequence coding and decoding method for video pedestrian re-identification
Xia et al. Abnormal event detection method in surveillance video based on temporal CNN and sparse optical flow
CN106570509B (en) A kind of dictionary learning and coding method for extracting digital picture feature
Bashir et al. Towards deep learning-based image steganalysis: Practices and open research issues

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant