CN104156464B

CN104156464B - Micro-video retrieval method and device based on a micro-video feature database

Info

Publication number: CN104156464B
Application number: CN201410416334.7A
Authority: CN
Inventors: 陈芋文; 张矩; 钟坤华; 刘磊锋
Original assignee: Chongqing Institute of Green and Intelligent Technology of CAS
Current assignee: Chongqing Institute of Green and Intelligent Technology of CAS
Priority date: 2014-08-20
Filing date: 2014-08-20
Publication date: 2018-04-27
Anticipated expiration: 2034-08-20
Also published as: CN104156464A

Abstract

The present invention provides a micro-video retrieval method and device based on a micro-video feature database, mainly intended for fast retrieval over massive collections of micro-videos. First, each micro-video is preprocessed to extract its key frames, forming video frames and recording the association between each frame and its micro-video. Then, the video frames are used as the input of an autoencoder neural network, which extracts features through deep network learning and forms a binary code library for the frames of the micro-videos. Finally, retrieval is performed with a K-nearest neighbor algorithm based on Hamming distance. Compared with the prior art, the invention uses deep learning to avoid manual intervention in video feature extraction, so micro-videos can be retrieved quickly and effectively.

Description

Micro-video retrieval method and device based on a micro-video feature database

Technical field

The present invention relates to the field of information retrieval, mainly to the establishment of a database server and the reading and sorting of information in a database, and more particularly to a retrieval method and device for video images.

Background art

With the continuous improvement of mobile communication technology and terminal hardware, the mobile Internet has developed rapidly and set off a wave of change across the Internet. As the mobile Internet and smartphones become widespread, micro-video has once again come into public view: various related products have begun to appear on the market, and entrepreneurs and investors have gradually turned their attention to this field. The increased computing power of smart devices has made it possible to shoot, edit, and create video on a mobile phone, and social products such as microblogs and Facebook have cultivated and stimulated people's desire to share and communicate. This has produced a large number of UGC (User Generated Content) micro-videos around 30 seconds long. Fast retrieval over this massive and largely unstructured micro-video data has become an urgent need.

In recent years, researchers have proposed many interesting methods for video feature extraction and retrieval. Although these methods are effective to some degree, their computation is generally complex, feature extraction requires manual intervention, and the more complex extraction methods also make retrieval harder. In addition, existing video feature extraction and retrieval methods are not tailored to the particular characteristics of micro-video.

In summary, how to retrieve micro-videos more quickly and effectively has become one of the important problems to be solved in the field of content-based information retrieval.

Summary of the invention

In view of the above deficiencies of the prior art, an object of the present invention is to provide a micro-video retrieval method and device based on a micro-video feature database, to solve the problem that the prior art offers no fast and effective retrieval designed for micro-video.

To achieve the above object and other related objects, the present invention provides the following technical solutions:

A method for building a micro-video feature database, comprising: extracting image frames from a micro-video and associating the image frames with the micro-video; normalizing the image frames to obtain normalized image data; using the image data as input for autoencoder network pre-training, to obtain the weight parameters and bias parameters of each layer of the autoencoder network, and unrolling and connecting the layers into a complete autoencoder network; and fine-tuning the complete autoencoder network using a backpropagation (BP) neural network, converting the output data of the middle layer of the fine-tuned complete autoencoder network into binary codes, and storing the binary codes.

Preferably, in the above method for building a micro-video feature database, the image frames comprise a set of multiple images obtained by extracting one frame of the micro-video at a predetermined interval, and the set of images is associated with the micro-video in a many-to-one mapping.

Preferably, in the above method for building a micro-video feature database, normalizing the image frames includes: smoothing each image frame to obtain a denoised image; computing the mean of the denoised image; computing the standard deviation of the denoised image; and subtracting the mean from the denoised image and then dividing by the standard deviation of the denoised image, to obtain normalized image data.

Preferably, in the above method for building a micro-video feature database, the autoencoder network pre-training on the image data is carried out as follows: the input of the first layer of the autoencoder network is set to 3072 visible units and its hidden layer to 8192 hidden units; for every remaining restricted Boltzmann machine connecting the layers of the autoencoder network, the hidden layer is set to N hidden units and the visible layer to 2N visible units; the weights of the restricted Boltzmann machine of each layer are initialized to random real numbers and the biases to zero; and the restricted Boltzmann machine of each layer is trained on the image data, with a learning rate of 0.001 for each Boltzmann machine.

Preferably, in the above method for building a micro-video feature database, the BP neural network fine-tunes the complete autoencoder network as follows: a forward computation network feeds forward through the complete autoencoder network, to obtain the data output by the middle layer of the complete autoencoder network; a feedback correction network feeds back the data output by the middle layer, to correct the data output by the middle layer; the data output by the middle layer of the complete autoencoder network after feed-forward and feedback is converted into binary codes; and the binary codes are stored.

In addition, the present invention provides a fast micro-video retrieval method based on the micro-video feature database, the retrieval method comprising: extracting the video frames of a micro-video to be queried and normalizing the video frames to obtain normalized image data; using the image data as input to the deep learning of the autoencoder network, to extract the binary codes of the video frames; and computing the Hamming distance between the binary codes of the video frames and the binary codes in the micro-video feature database, sorting the videos in the micro-video feature database in ascending order of the computed distance, and outputting them.

In addition, the present invention provides a fast micro-video retrieval device, comprising: a video frame extraction module, for extracting the video frames of a micro-video to be queried and associating the video frames with the micro-video to be queried; a video frame preprocessing module, for normalizing the video frames to obtain normalized image data; a feature extraction module, for using the image data as input to the deep learning of the autoencoder network, to extract the binary codes of the video frames; and a retrieval module, for computing the Hamming distance between the binary codes of the video frames and the binary codes in the micro-video feature database, sorting the videos in the micro-video feature database in ascending order of the computed distance, and outputting them.

Finally, the present invention provides a device for building a micro-video feature database, comprising: a micro-video picture extraction module, for extracting image frames from a micro-video and associating the image frames with the micro-video; an image frame preprocessing module, for normalizing the image frames to obtain normalized image data; an autoencoder pre-training module, for using the image data as input for autoencoder network pre-training, to obtain the weight parameters and bias parameters of each layer of the autoencoder network, and unrolling and connecting the layers into a complete autoencoder network; an autoencoder network fine-tuning module, for fine-tuning the complete autoencoder network using a BP neural network, in which a forward computation network feeds forward through the complete autoencoder network to obtain the data output by its middle layer, and a feedback correction network feeds back the data output by the middle layer to correct it; and an autoencoder feature extraction module, for converting the data output by the middle layer of the complete autoencoder network after feed-forward and feedback into binary codes and storing the binary codes.

In summary, the micro-video retrieval method and device based on a micro-video feature database provided by the present invention are mainly intended for fast retrieval over massive collections of micro-videos. First, each micro-video is preprocessed to extract its key frames, forming video frames and recording the association between each frame and its micro-video. Then, the video frames are used as the input of an autoencoder neural network, which extracts features through deep network learning and forms a binary code library for the frames of the micro-videos. Finally, retrieval is performed with a K-nearest neighbor algorithm based on Hamming distance. Compared with the prior art, the invention uses deep learning to avoid manual intervention in video feature extraction, so micro-videos can be retrieved quickly and effectively.

Brief description of the drawings

Fig. 1 is a flow chart of a method for building a micro-video feature database.

Fig. 2 is a schematic diagram of network pre-training in the method for building a micro-video feature database.

Fig. 3 is a schematic diagram of network unrolling in the method for building a micro-video feature database.

Fig. 4 is a schematic diagram of network fine-tuning in the method for building a micro-video feature database.

Fig. 5 is a flow chart of a fast micro-video retrieval method based on the micro-video feature database.

Fig. 6 is a schematic diagram of a device for building a micro-video feature database.

Fig. 7 is a schematic diagram of a fast micro-video retrieval device.

Description of reference numerals

100 Micro-video feature database building device

110 Micro-video picture extraction module

130 Image frame preprocessing module

150 Autoencoder pre-training module

170 Autoencoder network fine-tuning module

190 Autoencoder feature extraction module

200 Fast micro-video retrieval device

210 Video frame extraction module

230 Video frame preprocessing module

250 Feature extraction module

270 Retrieval module

S10 to S70, A, B, C Steps

Detailed description of the embodiments

Embodiments of the present invention are described below by way of specific examples; those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention can also be implemented or applied through other, different embodiments, and the details in this specification can be modified or changed in various ways, based on different viewpoints and applications, without departing from the spirit of the invention. It should be noted that, where there is no conflict, the following embodiments and the features in them may be combined with one another.

The main characteristic of micro-video is that it is short: a UGC (User Generated Content) micro-video lasts around 30 seconds, so it differs in some respects from ordinary video.

In addition, some terms used in the embodiments are explained here so that those skilled in the art can better understand or implement the technical solutions.

An autoencoder network is an unsupervised learning method that uses the backpropagation algorithm with the target values set equal to the input values. It is essentially a neural network model; in deep learning terminology, an autoencoder network is also called an autoencoder neural network.

A restricted Boltzmann machine (RBM) is a generative stochastic neural network. The network consists of a number of visible units (corresponding to the visible variables, that is, the data samples) and a number of hidden units (corresponding to the hidden variables). Both visible and hidden variables are binary, that is, their states take values in {0, 1}. The whole network is a bipartite graph: edges exist only between visible units and hidden units, and there are no edges among visible units or among hidden units. In the present invention, restricted Boltzmann machines are used to implement the connections between layers of the autoencoder network.

The key to fast and efficient micro-video retrieval is to provide a fast and efficient video feature database. This is because retrieval consists of comparing the features of the query video with those of existing videos or videos in a database, sorting the results, and finding the videos whose features are closest to those of the query video. Therefore, this embodiment first presents a method for building a micro-video feature database, which is described in detail below.

The present invention provides a method for building a micro-video feature database. Referring to Fig. 1, the method includes:

Step S10: extract image frames from the micro-video, and associate the image frames with the micro-video;

Step S30: normalize the image frames to obtain normalized image data;

Step S50: use the image data as input for autoencoder network pre-training, to obtain the weight parameters and bias parameters of each layer of the autoencoder network, and unroll and connect the layers into a complete autoencoder network;

Step S70: fine-tune the complete autoencoder network using a BP neural network, convert the output data of the middle layer of the fine-tuned complete autoencoder network into binary codes, and store the binary codes.

The above method for building a micro-video feature database first preprocesses each micro-video to extract its image frames, recording the association between each frame and its micro-video. It then uses the image frames as the input of an autoencoder neural network, which extracts features through deep network learning and forms a binary code feature library for the frames of the micro-videos. In this way, a feature database of micro-videos can be built quickly and effectively, providing the basis for micro-video retrieval.

Specifically, in step S10, the purpose of extracting image frames is to learn to reconstruct the micro-video: the more information the image frames carry, the better the learning result, whereas extracting only key frames gives the network too little information to learn from. Therefore, in the above scheme, one frame of the micro-video stream is extracted every 10 frames, forming the set of image frames of the micro-video. The image frames are associated with the micro-video in a one-to-many relation (one micro-video to many image frames), and each image frame is named using the name of the micro-video as a prefix followed by a number. For example, with v_i denoting the i-th micro-video, the image frames obtained after frame extraction are (p_i1, p_i2, ..., p_in); they form the training samples of the deep autoencoder network, which completes the preprocessing of the micro-video.
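As an illustration of the sampling and naming scheme in step S10, the following Python sketch keeps one frame out of every 10 and names each kept frame with the micro-video name as a prefix plus a running number. The function name `sample_frame_names`, the underscore separator, the starting offset, and the 1-based numbering are assumptions made for this example; the description fixes only the interval and the prefix-plus-number form. Decoding the video stream itself is left out.

```python
def sample_frame_names(video_name, total_frames, interval=10):
    """Map each kept frame's name to its index in the source video.

    One frame is kept out of every `interval` frames, starting at frame 0
    (the starting offset is an assumption of this sketch).
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    kept = range(0, total_frames, interval)
    # Name = micro-video name as prefix + running number (hypothetical format).
    return {f"{video_name}_{n}": idx for n, idx in enumerate(kept, start=1)}


frames = sample_frame_names("v_1", total_frames=95)
# One-to-many association: every frame name points back to its micro-video.
frame_to_video = {name: "v_1" for name in frames}
```

Keeping the frame-to-video mapping alongside the frame names is what later allows a matching frame code to be traced back to the micro-video it came from.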

Further, in step S30, the image data obtained from micro-video preprocessing is normalized into pictures of size 32 × 32. The specific method is as follows. First, the image frame is smoothed to obtain a denoised image; the smoothing is mainly intended to remove noise from the picture. Then, the mean and the standard deviation of the denoised image are computed. Finally, the mean is subtracted from the denoised image and the result is divided by the standard deviation of the denoised image, which gives the normalized image data.
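The normalization in step S30 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the description does not name a smoothing filter, so a 3 × 3 mean filter is used here as a stand-in; the helper names `box_smooth` and `normalize` and the small `eps` guard against division by zero are also hypothetical. Resizing to 32 × 32 is omitted, and a single-channel image is represented as a list of rows.

```python
from statistics import fmean, pstdev


def box_smooth(image):
    """3x3 mean filter; the window is truncated at the image borders.

    The choice of filter is an assumption; the source only says "smoothing".
    """
    height, width = len(image), len(image[0])
    smoothed = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(height, r + 2))
                      for cc in range(max(0, c - 1), min(width, c + 2))]
            row.append(fmean(window))
        smoothed.append(row)
    return smoothed


def normalize(image, eps=1e-8):
    """Smooth, then subtract the mean and divide by the standard deviation."""
    denoised = box_smooth(image)
    pixels = [p for row in denoised for p in row]
    mean = fmean(pixels)
    std = pstdev(pixels)
    # eps (hypothetical) avoids dividing by zero on a constant image.
    return [[(p - mean) / (std + eps) for p in row] for row in denoised]


image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
normalized = normalize(image)
```

After this step the pixel values have approximately zero mean and unit standard deviation, which is the form fed to the first layer of the network.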

Further, in step S50, the input of the first layer of the autoencoder network is 3072 visible units, and its hidden layer is set to 8192 units. For every remaining restricted Boltzmann machine layer, the hidden layer has N units and the visible layer has 2N units. The weights of the restricted Boltzmann machine of each layer are initialized to small random real numbers, and the biases to zero. The learning rate of each Boltzmann machine is 0.001, and the restricted Boltzmann machine of each layer is trained in turn.
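The layer sizing and initialization in step S50 can be sketched as below. This reads the "N hidden, 2N visible" rule as each later restricted Boltzmann machine halving the width of the one before it, which is an interpretation; the number of layers is not stated in the description and is a free parameter here. The names `rbm_layer_sizes` and `init_rbm`, the Gaussian initializer, its 0.01 scale, and the seed are all assumptions. The training update itself is not shown.

```python
import random

LEARNING_RATE = 0.001  # learning rate stated for every Boltzmann machine


def rbm_layer_sizes(num_layers, first_visible=3072, first_hidden=8192):
    """Return (visible, hidden) unit counts for each stacked RBM."""
    sizes = [(first_visible, first_hidden)]
    for _ in range(num_layers - 1):
        visible = sizes[-1][1]  # previous hidden layer feeds the next RBM
        if visible % 2:
            raise ValueError("visible layer must have 2N units")
        sizes.append((visible, visible // 2))
    return sizes


def init_rbm(visible, hidden, scale=0.01, seed=0):
    """Small random weights, zero biases (scale and seed are assumptions)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, scale) for _ in range(hidden)]
               for _ in range(visible)]
    return weights, [0.0] * visible, [0.0] * hidden


sizes = rbm_layer_sizes(num_layers=4)
# Tiny demo instance; full-size weight matrices would need an array library.
weights, visible_bias, hidden_bias = init_rbm(visible=6, hidden=3)
```

With four layers, the widths run 3072, 8192, 4096, 2048, 1024, so under this reading the last hidden layer determines the length of the binary code.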

The above training performs autoencoder network pre-training on the image data; see Fig. 2, in which the output of the last layer is the feature representation of the image data. After pre-training, the weight parameters and bias parameters of each layer are available, and unrolling the network connects the layers into a complete autoencoder network, as shown in Fig. 3.
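One common way to unroll a stack of pre-trained layers, sketched below, is to reuse each encoder weight matrix in transposed form, in reverse order, as the decoder. The description does not spell out how the decoder half is initialized, so the tied-transpose choice is an assumption of this example, as are the helper names `transpose`, `unroll`, and `shape`. Biases are left out for brevity.

```python
def transpose(matrix):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(column) for column in zip(*matrix)]


def unroll(encoder_weights):
    """Encoder matrices followed by their transposes in reverse order."""
    decoder_weights = [transpose(w) for w in reversed(encoder_weights)]
    return encoder_weights + decoder_weights


def shape(matrix):
    return (len(matrix), len(matrix[0]))


# Toy pre-trained stack: 6 -> 4 -> 2 units (rows = inputs, columns = outputs).
w1 = [[0.1 * (r + c) for c in range(4)] for r in range(6)]
w2 = [[0.2 * (r - c) for c in range(2)] for r in range(4)]

network = unroll([w1, w2])
shapes = [shape(w) for w in network]
```

In this toy network the 2-unit layer sits in the middle of the unrolled stack; it plays the role of the middle (feature) layer whose output is later binarized.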

Further, in step S70, the autoencoder network is fine-tuned mainly with the BP network algorithm, as shown in Fig. 4. In the forward computation network used for fine-tuning, the output data of the middle layer (that is, the feature layer) is forcibly converted into binary codes of 0 and 1, while in the feedback correction network the original output of the middle layer is used for feedback.

Further, after fine-tuning, the autoencoder network can reconstruct pictures. The middle layer of the fine-tuned autoencoder network is used to extract the feature data of the micro-video image data, which forms the binary codes of the picture features; these are stored in the database. When the middle layer output is converted into binary codes, the conversion is done by rounding to the nearest integer.
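The rounding-based conversion can be sketched as follows. The example assumes that the middle-layer activations lie in [0, 1] (for instance, sigmoid outputs) and that a value of exactly 0.5 rounds up; neither detail is stated in the description. Packing the bits into an integer is a storage choice made for this example, and `to_binary_code` is a hypothetical helper name. Python's built-in `round` is avoided on purpose because it rounds halves to the nearest even number.

```python
def to_binary_code(activations):
    """Round each activation to 0 or 1 and pack the bits into an integer."""
    bits = [1 if a >= 0.5 else 0 for a in activations]
    code = 0
    for bit in bits:
        code = (code << 1) | bit  # first activation becomes the highest bit
    return bits, code


bits, code = to_binary_code([0.91, 0.08, 0.5, 0.49])
```

Storing each frame's code as an integer keeps the database compact and makes the later Hamming distance a single XOR followed by a bit count.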

The above method builds the micro-video feature database, and the database so built can then be used for fast micro-video retrieval. The retrieval method relies entirely on this micro-video feature database and is described in more detail below.

In addition, the present invention provides a fast micro-video retrieval method based on the micro-video feature database. Referring to Fig. 5, the retrieval method includes:

Step A: extract the video frames of the micro-video to be queried, and normalize the video frames to obtain normalized image data;

Step B: use the image data as input to the deep learning of the autoencoder network, to extract the binary codes of the video frames;

Step C: compute the Hamming distance between the binary codes obtained by feature extraction from the frames of the micro-video to be queried and the binary codes in the micro-video feature database, sort the videos in the micro-video feature database in ascending order of the computed distance, and output them.

Specifically, the methods used in step A and step B are similar to those of the above method for building the micro-video feature database, so they are not repeated here.

Further, in step C, the Hamming distance computation between the binary codes of the video frames and the binary codes in the micro-video feature database uses a K-nearest neighbor algorithm for retrieval, which yields the retrieval result.
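Step C can be sketched as a brute-force K-nearest neighbor search over integer-packed binary codes. The description does not say how per-frame distances are combined into a per-video ranking; this example takes the smallest frame-to-frame distance for each video, which is an assumption, as are the function names `hamming` and `retrieve`, the `(code, video_id)` layout of the database, and the tie-break on video name.

```python
def hamming(a, b):
    """Number of differing bits between two integer-packed binary codes."""
    return bin(a ^ b).count("1")


def retrieve(query_codes, database, k=3):
    """Rank videos by their closest frame to any query frame (ascending).

    `database` is a list of (frame_code, video_id) pairs.
    """
    best = {}
    for query in query_codes:
        for frame_code, video_id in database:
            distance = hamming(query, frame_code)
            if video_id not in best or distance < best[video_id]:
                best[video_id] = distance
    # Sort by distance, then by video id so ties are ordered deterministically.
    ranked = sorted(best.items(), key=lambda item: (item[1], item[0]))
    return ranked[:k]


database = [(0b1010, "v_1"), (0b1000, "v_1"), (0b0101, "v_2"), (0b1110, "v_3")]
results = retrieve([0b1011], database, k=2)
```

Here the query code 1011 differs from the stored code 1010 of v_1 in one bit, so v_1 ranks first, followed by v_3 at distance 2.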

Referring to Fig. 4, the above fast micro-video retrieval method is similar to the above method for building the micro-video feature database: the micro-video to be queried is processed in the same way as in the database building method to obtain the binary codes of its feature data, these codes are compared against the binary codes in the micro-video feature database, and the results are sorted in ascending order and output. It should be understood that, once the distances have been computed, the association between binary codes and micro-videos in the micro-video feature database can be used to output the micro-videos similar to the query micro-video in ranked order. All of this is common knowledge and conventional technique in retrieval, so it is not described further here.

In addition, the present invention provides a micro-video feature database building device 100. Referring to Fig. 6, the device includes:

a micro-video picture extraction module 110, for extracting image frames from a micro-video and associating the image frames with the micro-video; an image frame preprocessing module 130, for normalizing the image frames to obtain normalized image data; an autoencoder pre-training module 150, for performing autoencoder network pre-training on the image data, and obtaining and outputting the weight matrices and biases for the image data; an autoencoder network fine-tuning module 170, for fine-tuning the complete autoencoder network using a BP neural network, in which a forward computation network feeds forward through the complete autoencoder network to obtain the data output by its middle layer, and a feedback correction network feeds back the data output by the middle layer to correct it; and an autoencoder feature extraction module 190, for converting the data output by the middle layer of the complete autoencoder network after feed-forward and feedback into binary codes and storing the binary codes.

Specifically, in the forward computation network used to fine-tune the autoencoder network, the data output by the middle layer is forcibly converted into binary codes by rounding, while in the feedback correction network the original output of the middle layer is used for feedback. The middle layer of the fine-tuned autoencoder network is then used to obtain the feature data of the image data, which forms the binary feature codes of the image frames; these are stored.

In addition, the present invention provides a fast micro-video retrieval device 200. Referring to Fig. 7, the fast micro-video retrieval device 200 includes: a video frame extraction module 210, for extracting the video frames of a micro-video to be queried and associating the video frames with the micro-video to be queried; a video frame preprocessing module 230, for normalizing the video frames to obtain normalized image data; a feature extraction module 250, for using the image data as input to the deep learning of the autoencoder network, to extract the binary codes of the video frames; and a retrieval module 270, for computing the Hamming distance between the binary codes of the video frames and the binary codes in the micro-video feature database, sorting the videos in the micro-video feature database in ascending order of the computed distance, and outputting them.

The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Therefore, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention shall still be covered by the claims of the invention.

Claims

1. A method for building a micro-video feature database, characterized by:

extracting image frames from a micro-video, and associating the image frames with the micro-video;

normalizing the image frames to obtain normalized image data;

using the image data as input for autoencoder network pre-training, to obtain the weight parameters and bias parameters of each layer of the autoencoder network, and unrolling and connecting the layers into a complete autoencoder network;

fine-tuning the complete autoencoder network using a BP neural network, wherein the method by which the BP neural network fine-tunes the complete autoencoder network includes:

feeding forward through the complete autoencoder network using a forward computation network, to obtain the data output by the middle layer of the complete autoencoder network;

feeding back the data output by the middle layer using a feedback correction network, to correct the data output by the middle layer;

converting the data output by the middle layer of the complete autoencoder network after feed-forward and feedback into binary codes; and storing the binary codes.
2. The method for building a micro-video feature database according to claim 1, characterized in that the image frames comprise a set of multiple images obtained by extracting one frame of the micro-video at a predetermined interval, and the set of images is associated with the micro-video in a many-to-one mapping.
3. The method for building a micro-video feature database according to claim 1, characterized in that the method of normalizing the image frames is:

smoothing the image frame to obtain a denoised image;

computing the mean of the denoised image;

computing the standard deviation of the denoised image;

subtracting the mean of the denoised image from the denoised image and then dividing by the standard deviation of the denoised image, to obtain normalized image data.
4. The method for building a micro-video feature database according to claim 1, characterized in that the method of performing autoencoder network pre-training on the image data is:

setting the input of the first layer of the autoencoder network to 3072 visible units, and setting the hidden layer to 8192 hidden units;

setting the hidden layer of every remaining restricted Boltzmann machine connecting the layers of the autoencoder network to N hidden units, and setting the visible layer to 2N visible units;

initializing the weights of the restricted Boltzmann machine of each layer to random real numbers, and the biases to zero;

training the restricted Boltzmann machine of each layer on the image data, with a learning rate of 0.001 for each Boltzmann machine.
5. A fast micro-video retrieval method based on the method for building a micro-video feature database according to any one of claims 1 to 4, characterized in that the retrieval method comprises:

extracting the video frames of a micro-video to be queried, and normalizing the video frames to obtain normalized image data;

using the image data as input to the deep learning of the autoencoder network, to extract the binary codes of the video frames;

computing the Hamming distance between the binary codes of the video frames and the binary codes in the micro-video feature database, sorting the videos in the micro-video feature database in ascending order of the computed distance, and outputting them.
6. The fast micro-video retrieval method according to claim 5, characterized in that the method used for the Hamming distance computation is a K-nearest neighbor algorithm.
7. A fast micro-video retrieval device, characterized by comprising:

a video frame extraction module, for extracting the video frames of a micro-video to be queried and associating the video frames with the micro-video to be queried;

a video frame preprocessing module, for normalizing the video frames to obtain normalized image data;

a feature extraction module, for using the image data as input to the deep learning of an autoencoder network, to extract the binary codes of the video frames;

a retrieval module, for computing the Hamming distance between the binary codes of the video frames and the binary codes in a micro-video feature database, sorting the videos in the micro-video feature database in ascending order of the computed distance, and outputting them;

the micro-video feature database being built using the method for building a micro-video feature database according to any one of claims 1 to 4.
8. A device for building a micro-video feature database, characterized by comprising:

a micro-video picture extraction module, for extracting image frames from a micro-video and associating the image frames with the micro-video;

an image frame preprocessing module, for normalizing the image frames to obtain normalized image data;

an autoencoder pre-training module, for using the image data as input for autoencoder network pre-training, to obtain the weight parameters and bias parameters of each layer of the autoencoder network, and unrolling and connecting the layers into a complete autoencoder network;

an autoencoder network fine-tuning module, for fine-tuning the complete autoencoder network using a BP neural network, in which a forward computation network feeds forward through the complete autoencoder network to obtain the data output by the middle layer of the complete autoencoder network, and a feedback correction network feeds back the data output by the middle layer to correct the data output by the middle layer;

an autoencoder feature extraction module, for converting the data output by the middle layer of the complete autoencoder network after feed-forward and feedback into binary codes, and storing the binary codes.
9. The device for building a micro-video feature database according to claim 8, characterized in that the data output by the middle layer is converted into binary codes by rounding.