CN104156464A

CN104156464A - Micro-video retrieval method and device based on micro-video feature database

Info

Publication number: CN104156464A
Application number: CN201410416334.7A
Authority: CN
Inventors: 陈芋文; 张矩; 钟坤华; 刘磊锋
Original assignee: Chongqing Institute of Green and Intelligent Technology of CAS
Current assignee: Chongqing Institute of Green and Intelligent Technology of CAS
Priority date: 2014-08-20
Filing date: 2014-08-20
Publication date: 2014-11-19
Anticipated expiration: 2034-08-20
Also published as: CN104156464B

Abstract

The invention provides a micro-video retrieval method and device based on a micro-video feature database, mainly intended for rapid retrieval over a very large number of micro-videos. First, the micro-videos are preprocessed: key frames are extracted to form video frames, and each frame is marked with its association to the micro-video it came from. Next, the video frames serve as input to an autoencoder neural network, features are extracted through deep network learning, and a binary code library covering each frame of the micro-videos is formed. Finally, retrieval is carried out with a k-nearest-neighbor algorithm based on Hamming distance. Compared with the prior art, the invention adopts deep learning, eliminates manual intervention in video feature extraction, and can retrieve micro-videos quickly and effectively.

Description

Micro-video retrieval method and device based on micro-video feature database

Technical field

The present invention relates to the field of information retrieval, chiefly the building of a database server and the reading and sorting of information in the database, and in particular to a retrieval method and device for video images.

Background technology

With the continuous improvement of mobile communication technology and terminal hardware, the mobile Internet has developed rapidly and brought a wave of change to the Internet. As the mobile Internet and smartphones become widespread, micro-video has once again attracted attention: various related products have appeared on the market, and entrepreneurs and investors have gradually turned to this field. Greater computing power in smart devices has made it practical to shoot, edit, and create video on a mobile phone, while social products such as microblogs and Facebook have cultivated and stimulated people's desire to share and communicate. The result is a large volume of UGC (User Generated Content) micro-videos of up to 30 seconds in length. Fast retrieval over this massive and largely unstructured micro-video data has become an urgent need.

In recent years, researchers have proposed many valuable methods for video feature extraction and retrieval. Although these methods are effective to a degree, their computation is generally complex, feature extraction requires manual intervention, and the more complex feature extraction methods in turn make retrieval harder. Moreover, existing video feature extraction and retrieval methods are not tailored to the distinctive characteristics of micro-video.

In summary, how to retrieve micro-videos more quickly and effectively has become one of the important problems to be solved in content-based information retrieval research.

Summary of the invention

In view of the above shortcomings of the prior art, the object of the present invention is to provide a micro-video retrieval method and device based on a micro-video feature database, so as to solve the problem that the prior art cannot retrieve micro-videos quickly and effectively.

To achieve the above object and other related objects, the invention provides the following technical solutions:

A method for building a micro-video feature database comprises: extracting image frames from a micro-video and associating the image frames with the micro-video; normalizing the image frames to obtain normalized image data; pre-training an autoencoder network with the image data as input, to obtain the weight parameters and bias parameters of each layer of the autoencoder network, and unrolling and connecting the layers into a complete autoencoder network; and fine-tuning the complete autoencoder network with a BP (back-propagation) neural network, then converting the output data of the middle layer of the fine-tuned autoencoder network into binary codes and storing them.

Preferably, in the above method for building the micro-video feature database, the image frames comprise a set of images obtained by extracting one frame of the micro-video at a predetermined interval, and the images in the set are associated with the micro-video through a many-to-one mapping.

Preferably, in the above method for building the micro-video feature database, normalizing the image frames comprises: smoothing the image frame to obtain a denoised image; calculating the mean of the denoised image; calculating the standard deviation of the denoised image; and subtracting the mean from the denoised image and then dividing by the standard deviation to obtain the normalized image data.

Preferably, in the above method for building the micro-video feature database, pre-training the autoencoder network with the image data comprises: setting the input of the first layer of the autoencoder network to 3072 visible units and its hidden layer to 8192 hidden units; for every remaining restricted Boltzmann machine connecting the layers of the autoencoder network, setting the number of hidden units to N and the number of visible units to 2N; initializing the weights of each restricted Boltzmann machine to random real numbers and the biases to zero; and training each restricted Boltzmann machine layer on the image data with a learning rate of 0.001.

Preferably, in the above method for building the micro-video feature database, fine-tuning the complete autoencoder network with the BP neural network comprises: feeding forward through the complete autoencoder network in the forward-computation pass to obtain the output data of its middle layer; feeding back the middle-layer output data in the feedback-correction pass in order to correct it; converting the middle-layer output data of the complete autoencoder network, after the forward and feedback passes, into binary codes; and storing the binary codes.

In addition, the invention also provides a rapid micro-video retrieval method based on the micro-video feature database. The retrieval method comprises: extracting video frames from the query micro-video and normalizing the video frames to obtain normalized image data; feeding the image data as input to the deep autoencoder network to extract the binary codes of the video frames; and calculating the Hamming distance between the binary codes of the video frames and the binary codes in the micro-video feature database, then sorting the videos in the micro-video feature database in ascending order of distance and outputting them.

In addition, the invention also provides a rapid micro-video retrieval device, comprising: a video frame extraction module for extracting video frames from the query micro-video and associating the video frames with the query micro-video; a video frame preprocessing module for normalizing the video frames to obtain normalized image data; a feature extraction module for feeding the image data as input to the deep autoencoder network to extract the binary codes of the video frames; and a retrieval module for calculating the Hamming distance between the binary codes of the video frames and the binary codes in the micro-video feature database, then sorting the videos in the micro-video feature database in ascending order of distance and outputting them.

Finally, the invention also provides an apparatus for building a micro-video feature database, comprising: a micro-video image extraction module for extracting image frames from a micro-video and associating the image frames with the micro-video; an image frame preprocessing module for normalizing the image frames to obtain normalized image data; an autoencoder pre-training module for pre-training the autoencoder network with the image data as input, to obtain the weight parameters and bias parameters of each layer of the autoencoder network, and for unrolling and connecting the layers into a complete autoencoder network; an autoencoder network fine-tuning module for fine-tuning the complete autoencoder network with a BP neural network, feeding forward through the complete autoencoder network in the forward-computation pass to obtain the output data of its middle layer, and feeding back the middle-layer output data in the feedback-correction pass in order to correct it; and an autoencoder feature extraction module for converting the middle-layer output data of the complete autoencoder network, after the forward and feedback passes, into binary codes and storing the binary codes.

In summary, the micro-video retrieval method and device based on a micro-video feature database provided by the invention are mainly used for rapid retrieval over massive numbers of micro-videos. First, the micro-video is preprocessed to extract key frames, forming video frames each marked with its association to the micro-video. Then the video frames are used as input to an autoencoder neural network and features are extracted through deep network learning, forming a binary code library covering each frame of the micro-video. Finally, retrieval is performed with a k-nearest-neighbor algorithm based on Hamming distance. Compared with the prior art, the invention adopts deep learning, avoids manual intervention in video feature extraction, and can retrieve micro-videos quickly and effectively.

Brief description of the drawings

Fig. 1 is a flow chart of a method for building a micro-video feature database.

Fig. 2 is a schematic diagram of network pre-training in the method for building the micro-video feature database.

Fig. 3 is a schematic diagram of network unrolling in the method for building the micro-video feature database.

Fig. 4 is a schematic diagram of network fine-tuning in the method for building the micro-video feature database.

Fig. 5 is a flow chart of a rapid micro-video retrieval method based on the micro-video feature database.

Fig. 6 is a schematic diagram of an apparatus for building a micro-video feature database.

Fig. 7 is a schematic diagram of a rapid micro-video retrieval device.

Description of reference numerals

100 Apparatus for building the micro-video feature database

110 Micro-video image extraction module

130 Image frame preprocessing module

150 Autoencoder pre-training module

170 Autoencoder network fine-tuning module

190 Autoencoder feature extraction module

200 Rapid micro-video retrieval device

210 Video frame extraction module

230 Video frame preprocessing module

250 Feature extraction module

270 Retrieval module

S10 to S70, A, B, C Steps

Detailed description of the embodiments

Embodiments of the present invention are described below through specific examples, and those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. The invention may also be implemented or applied through other, different embodiments, and the details in this specification may be modified or changed in various ways, based on different viewpoints and applications, without departing from the spirit of the invention. It should be noted that, where no conflict arises, the following embodiments and the features in them may be combined with one another.

The principal characteristic of micro-video is that it is short: a UGC (User Generated Content) micro-video lasts no more than 30 seconds, so it differs in certain respects from ordinary video.

In addition, some terms used in the embodiments are explained here so that those skilled in the art can better understand or implement the technical solution of the invention.

An autoencoder network is an unsupervised learning method that uses the back-propagation algorithm with the target value set equal to the input value. It is in essence a neural network model; in deep learning terminology, an autoencoder network is also called an autoencoder neural network.
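As an illustration of "target value equal to input value", the training objective of an autoencoder can be written as a reconstruction error. The squared-error form and the symbols f (encoder), g (decoder), and m (number of training frames) are assumptions made for this sketch; the specification does not name a particular loss.

```latex
\hat{x}^{(n)} = g\bigl(f(x^{(n)})\bigr), \qquad
\min_{f,\,g}\; \frac{1}{m} \sum_{n=1}^{m} \bigl\lVert x^{(n)} - \hat{x}^{(n)} \bigr\rVert^{2}
```

Here f(x) is the middle-layer output from which the binary code is later derived.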

A restricted Boltzmann machine (RBM) is a generative stochastic neural network. The network consists of visible units (corresponding to visible variables, that is, the data samples) and hidden units (corresponding to hidden variables). The visible and hidden variables are all binary, that is, their states take values in {0, 1}. The whole network is a bipartite graph: edges exist only between visible units and hidden units, and there are no edges among visible units or among hidden units. In the present invention, restricted Boltzmann machines are used to implement the connections between the layers of the autoencoder network.
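The bipartite structure described above corresponds to the standard energy function of a binary RBM. The symbols below (visible biases a, hidden biases b, weight matrix W, logistic function σ) are notation introduced for this sketch and do not appear in the specification.

```latex
E(v, h) = -\sum_{i} a_i v_i \;-\; \sum_{j} b_j h_j \;-\; \sum_{i,j} v_i W_{ij} h_j

P(h_j = 1 \mid v) = \sigma\Bigl(b_j + \sum_{i} v_i W_{ij}\Bigr), \qquad
P(v_i = 1 \mid h) = \sigma\Bigl(a_i + \sum_{j} W_{ij} h_j\Bigr)
```

Because there are no visible-visible or hidden-hidden edges, the units within one layer are conditionally independent given the other layer, which is why the two conditionals factorize per unit.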

To retrieve micro-videos quickly and effectively, the key is to provide a fast and efficient video feature database. This is because retrieval is the process of comparing the features of the query video with those of existing videos, or of videos in a database, and sorting the results so as to find the videos whose features are closest to those of the query video. Accordingly, this embodiment first presents a method for building a micro-video feature database, which is described in detail below.

The invention provides a method for building a micro-video feature database. Referring to Fig. 1, the method comprises:

Step S10: extracting image frames from a micro-video and associating the image frames with the micro-video;

Step S30: normalizing the image frames to obtain normalized image data;

Step S50: pre-training an autoencoder network with the image data as input, to obtain the weight parameters and bias parameters of each layer of the autoencoder network, and unrolling and connecting the layers into a complete autoencoder network;

Step S70: fine-tuning the complete autoencoder network with a BP neural network, then converting the output data of the middle layer of the fine-tuned autoencoder network into binary codes and storing them.

The above building method first preprocesses the micro-video to extract image frames, forming image frames each marked with its association to the micro-video; it then uses the image frames as input to the autoencoder neural network and extracts features through deep network learning, forming a binary-code feature library covering each frame of the micro-video. In this way a feature database of micro-videos can be built quickly and effectively, providing the basis for micro-video retrieval.

Specifically, in step S10, the purpose of extracting image frames is to learn to reconstruct the micro-video: the more information the image frames carry, the better the learning effect, and if only key frames were extracted the network would have too little information to learn from. Therefore, in the above scheme, one frame may be extracted from the micro-video stream every 10 frames to form the image frame set of the micro-video. Associating the image frames with the micro-video means establishing a one-to-many association between a micro-video and its image frames, with each image frame named by using the micro-video's name as a prefix followed by a number. For example, with v_i denoting the i-th micro-video, the image frames extracted from it, (p_i1, p_i2, ..., p_in), form the training samples of the deep autoencoder network, which completes the preprocessing of the micro-video.
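The sampling and naming scheme of step S10 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `sample_frames` and `video_of`, the underscore separator, and the choice to start sampling at the first frame are all assumptions, and the frames are taken from any iterable so that no particular video library is implied.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

Frame = TypeVar("Frame")


def sample_frames(
    video_name: str, frames: Iterable[Frame], step: int = 10
) -> Iterator[Tuple[str, Frame]]:
    """Yield (frame_name, frame) for every `step`-th frame of one micro-video.

    Frame names use the video name as a prefix plus a running number, so the
    one-to-many link from video to frames can be recovered from the name.
    Starting at frame index 0 is an assumption of this sketch.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    count = 0
    for index, frame in enumerate(frames):
        if index % step == 0:
            count += 1
            yield f"{video_name}_{count}", frame


def video_of(frame_name: str) -> str:
    """Recover the video name (the prefix) from a frame name."""
    return frame_name.rsplit("_", 1)[0]
```

For a 35-frame stream named `v_3`, this yields frames 0, 10, 20, and 30 under the names `v_3_1` to `v_3_4`, and `video_of("v_3_2")` returns `"v_3"`.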

Further, in step S30, the preprocessed micro-video image data is normalized into images of 32 × 32 pixels. Specifically: first, the image frame is smoothed to obtain a denoised image, the main purpose of the smoothing being to remove image noise; then the mean and the standard deviation of the denoised image are calculated; finally, the mean is subtracted from the denoised image and the result is divided by the standard deviation, giving the normalized image data.
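The normalization of step S30 can be sketched in plain Python on a single-channel image stored as nested lists. Several points are assumptions of this sketch: the specification does not name the smoothing filter, so a 3 × 3 mean filter stands in for it; resizing to 32 × 32 is omitted; and a constant image (zero standard deviation) is mapped to all zeros to avoid division by zero.

```python
from math import sqrt
from typing import List

Image = List[List[float]]


def box_blur(img: Image) -> Image:
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = sum(vals) / len(vals)
    return out


def normalize(img: Image) -> Image:
    """Smooth, then subtract the mean and divide by the standard deviation."""
    smooth = box_blur(img)
    pixels = [p for row in smooth for p in row]
    mean = sum(pixels) / len(pixels)
    std = sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    if std == 0.0:
        # Constant image: every pixel equals the mean, so return zeros.
        return [[0.0] * len(row) for row in smooth]
    return [[(p - mean) / std for p in row] for row in smooth]
```

After this step the pixel values of each frame have zero mean and unit variance, which keeps frames with different brightness and contrast on a comparable scale before they reach the network.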

Further, in step S50, the input to the first layer of the autoencoder network is 3072 visible units and its hidden layer is set to 8192 units. For every remaining restricted Boltzmann machine layer, the number of hidden units is N and the number of visible units is 2N. The weights of each restricted Boltzmann machine are initialized to small random real numbers and the biases to zero. Each restricted Boltzmann machine layer is then trained with a learning rate of 0.001.
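The layer sizes implied by step S50 can be enumerated with a short helper. The reading that each later RBM halves its input (2N visible, N hidden) follows the text; the final code length `code_bits` is a hypothetical parameter, since the specification does not state how many layers are stacked, and the function name is likewise invented for this sketch.

```python
from typing import List, Tuple

LEARNING_RATE = 0.001  # per-layer RBM learning rate given in step S50


def rbm_stack_shapes(
    n_input: int = 3072, first_hidden: int = 8192, code_bits: int = 128
) -> List[Tuple[int, int]]:
    """Return (visible, hidden) sizes for each RBM in the stack.

    The first RBM maps n_input -> first_hidden; every later RBM has 2N
    visible and N hidden units, so sizes halve until `code_bits` is reached.
    """
    shapes = [(n_input, first_hidden)]
    visible = first_hidden
    while visible > code_bits:
        if visible % 2:
            raise ValueError("layer size cannot be halved evenly")
        hidden = visible // 2
        shapes.append((visible, hidden))
        visible = hidden
    if visible != code_bits:
        raise ValueError("code_bits must be first_hidden divided by a power of two")
    return shapes
```

For example, `rbm_stack_shapes(3072, 8192, 1024)` gives `[(3072, 8192), (8192, 4096), (4096, 2048), (2048, 1024)]`. The input size 3072 matches a 32 × 32 image with three colour channels.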

The above training mainly pre-trains the autoencoder network on the image data (see Fig. 2), where the output of the last layer is the feature representation of the image data. After pre-training, the weight parameters and bias parameters of each layer are obtained, and the network is unrolled and connected into a complete autoencoder network, as shown in Fig. 3.
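One common way to unroll a stack of pre-trained RBMs into an autoencoder is to use each RBM's weights and hidden biases for the encoder, and the transposed weights and visible biases, in reverse order, for the decoder. The sketch below assumes that convention; the specification only states that the network is unrolled and connected, so the details and the name `unroll` are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]                 # (weights[in][out], bias[out])
Rbm = Tuple[Matrix, List[float], List[float]]      # (W[visible][hidden], visible_bias, hidden_bias)


def transpose(m: Matrix) -> Matrix:
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*m)]


def unroll(rbms: List[Rbm]) -> List[Layer]:
    """Build encoder layers followed by mirrored decoder layers.

    The last encoder layer is the middle (feature) layer of the autoencoder.
    """
    encoder = [(w, hid_b) for w, _vis_b, hid_b in rbms]
    decoder = [(transpose(w), vis_b) for w, vis_b, _hid_b in reversed(rbms)]
    return encoder + decoder
```

With k RBMs this produces 2k layers, and the output of layer k is the middle-layer feature that fine-tuning later binarizes.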

Further, in step S70, fine-tuning the autoencoder network mainly means fine-tuning the network with the BP algorithm, as shown in Fig. 4. During the forward-computation pass of fine-tuning, the output of the middle layer (the feature layer) is forcibly converted into a binary code of 0s and 1s; during the feedback-correction pass, the original, unconverted output of the middle layer is used for feedback.

Further, after fine-tuning, the autoencoder network can reconstruct images. The middle layer of the fine-tuned autoencoder network is used to extract the feature data of the micro-video images, and the resulting binary codes of the image features are stored in the database. When the middle-layer output is converted into a binary code, the conversion is performed by rounding to the nearest integer.

With the above method a micro-video feature database can be built, and the database can then be used for rapid micro-video retrieval. The retrieval method is based on the micro-video feature database and is described in detail below.
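The rounding step can be sketched as follows, assuming the middle-layer activations lie in [0, 1] (for example, sigmoid outputs) and that a value of exactly 0.5 rounds up to 1. Packing the bits into a single Python integer is a storage choice made for this sketch, and the function names are hypothetical.

```python
from typing import Sequence


def to_binary_code(activations: Sequence[float]) -> int:
    """Round each activation to 0 or 1 and pack the bits into one integer.

    An explicit threshold is used instead of Python's round(), which rounds
    0.5 to the nearest even integer (0) rather than up.
    """
    code = 0
    for a in activations:
        bit = 1 if a >= 0.5 else 0
        code = (code << 1) | bit
    return code


def format_code(code: int, n_bits: int) -> str:
    """Render a packed code as a fixed-width bit string for storage or display."""
    return format(code, f"0{n_bits}b")
```

For instance, the activations `[0.9, 0.1, 0.5, 0.49]` become the bit string `1010`.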

In addition, the invention also provides a rapid micro-video retrieval method based on the micro-video feature database. Referring to Fig. 5, the retrieval method comprises:

Step A: extracting video frames from the query micro-video and normalizing the video frames to obtain normalized image data;

Step B: feeding the image data as input to the deep autoencoder network to extract the binary codes of the video frames;

Step C: calculating the Hamming distance between the binary codes obtained by feature extraction from the frames of the query micro-video and the binary codes in the micro-video feature database, then sorting the videos in the micro-video feature database in ascending order of distance and outputting them.

Specifically, steps A and B use methods similar to those of the above method for building the micro-video feature database, so they are not described again here.

Further, in step C, the Hamming distance calculation between the binary code of the video frame and the binary codes in the micro-video feature database is carried out using a k-nearest-neighbor algorithm, which yields the retrieval result.
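Step C can be sketched as a brute-force k-nearest-neighbor search over binary codes stored as integers. The dictionary layout (frame name to code) and the tie-break by name are assumptions of this sketch; a production system would likely use a more scalable index.

```python
from typing import Dict, List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def knn(query: int, database: Dict[str, int], k: int) -> List[Tuple[str, int]]:
    """Return the k stored codes closest to `query`, in ascending distance.

    Ties are broken by name so that the output order is deterministic.
    """
    scored = [(name, hamming(query, code)) for name, code in database.items()]
    scored.sort(key=lambda item: (item[1], item[0]))
    return scored[:k]
```

Because XOR and bit counting are cheap, comparing short binary codes is much faster than comparing real-valued feature vectors, which is the motivation for binarizing the middle-layer output.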

Referring to Fig. 4, the above rapid retrieval method is similar to the above building method: the query micro-video undergoes the same processing as in the building method to obtain the binary codes of its features, these codes are compared with the binary codes in the micro-video feature database, and the results are sorted in ascending order and output. It should be understood that, once the results are computed, the micro-videos close to the query micro-video can be output in ranked order through the mapping between binary codes and micro-videos in the feature database. These are all conventional techniques in retrieval and are therefore not described further here.
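Mapping frame-level matches back to micro-videos can be sketched as below. Scoring each video by the smallest distance among its matched frames is an assumption made for illustration; the specification only says that the ranking is output through the mapping between binary codes and micro-videos. The name `rank_videos` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def rank_videos(
    frame_hits: Iterable[Tuple[str, int]], frame_to_video: Dict[str, str]
) -> List[Tuple[str, int]]:
    """Collapse (frame_name, distance) hits into a ranked list of videos.

    Each video keeps the best (smallest) distance of any of its frames, and
    videos are returned in ascending order of that distance.
    """
    best: Dict[str, int] = {}
    for frame_name, distance in frame_hits:
        video = frame_to_video[frame_name]
        if video not in best or distance < best[video]:
            best[video] = distance
    return sorted(best.items(), key=lambda item: (item[1], item[0]))
```

Other aggregation rules, such as the mean distance over all frames of a video, would fit the same interface.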

In addition, the invention also provides an apparatus 100 for building a micro-video feature database. Referring to Fig. 6, the apparatus comprises:

a micro-video image extraction module 110 for extracting image frames from a micro-video and associating the image frames with the micro-video; an image frame preprocessing module 130 for normalizing the image frames to obtain normalized image data; an autoencoder pre-training module 150 for pre-training the autoencoder network with the image data, obtaining the weight matrices and biases, and outputting them; an autoencoder network fine-tuning module 170 for fine-tuning the complete autoencoder network with a BP neural network, feeding forward through the complete autoencoder network in the forward-computation pass to obtain the output data of its middle layer, and feeding back the middle-layer output data in the feedback-correction pass in order to correct it; and an autoencoder feature extraction module 190 for converting the middle-layer output data of the complete autoencoder network, after the forward and feedback passes, into binary codes and storing the binary codes.

Specifically, during the forward-computation pass of fine-tuning, the middle-layer output data is forcibly converted into a binary code by rounding to the nearest integer, while during the feedback-correction pass the original output of the middle layer is used for feedback. The middle layer of the fine-tuned autoencoder network then yields the feature data of the image data, and the binary feature codes of the image frames are formed and stored.

In addition, the invention also provides a rapid micro-video retrieval device 200. Referring to Fig. 7, the rapid micro-video retrieval device 200 comprises: a video frame extraction module 210 for extracting video frames from the query micro-video and associating the video frames with the query micro-video; a video frame preprocessing module 230 for normalizing the video frames to obtain normalized image data; a feature extraction module 250 for feeding the image data as input to the deep autoencoder network to extract the binary codes of the video frames; and a retrieval module 270 for calculating the Hamming distance between the binary codes of the video frames and the binary codes in the micro-video feature database, then sorting the videos in the micro-video feature database in ascending order of distance and outputting them.
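The way modules 210 to 270 fit together can be sketched as a small class that composes four pieces. Everything here is illustrative: the class name, the callable signatures, and the simplification of storing one code per video in `database` are assumptions, and the three callables stand in for the real extraction, preprocessing, and encoding modules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class RetrievalDevice:
    extract_frames: Callable[[object], Sequence[object]]  # stands in for module 210
    preprocess: Callable[[object], object]                # stands in for module 230
    encode: Callable[[object], int]                       # stands in for module 250
    database: Dict[str, int]  # video name -> stored binary code (simplified)

    def retrieve(self, video: object) -> List[Tuple[str, int]]:
        """Stand-in for module 270: rank stored videos by Hamming distance."""
        codes = [self.encode(self.preprocess(f)) for f in self.extract_frames(video)]
        if not codes:
            return []
        ranked = [
            (name, min(bin(c ^ stored).count("1") for c in codes))
            for name, stored in self.database.items()
        ]
        ranked.sort(key=lambda item: (item[1], item[0]))
        return ranked
```

Keeping the modules as injected callables mirrors the modular description of the device and lets each stage be replaced or tested on its own.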

The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Therefore, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall still be covered by the claims of the present invention.

Claims

1. A method for building a micro-video feature database, characterized by comprising:

extracting image frames from a micro-video and associating the image frames with the micro-video;

normalizing the image frames to obtain normalized image data;

pre-training an autoencoder network with the image data as input, to obtain the weight parameters and bias parameters of each layer of the autoencoder network, and unrolling and connecting the layers into a complete autoencoder network;

fine-tuning the complete autoencoder network with a BP neural network, then converting the output data of the middle layer of the fine-tuned autoencoder network into binary codes and storing them.

2. The method for building a micro-video feature database according to claim 1, characterized in that the image frames comprise a set of images obtained by extracting one frame of the micro-video at a predetermined interval, and the images in the set are associated with the micro-video through a many-to-one mapping.

3. The method for building a micro-video feature database according to claim 1, characterized in that normalizing the image frames comprises:

smoothing the image frame to obtain a denoised image;

calculating the mean of the denoised image;

calculating the standard deviation of the denoised image;

subtracting the mean from the denoised image and then dividing by the standard deviation to obtain the normalized image data.

4. The method for building a micro-video feature database according to claim 1, characterized in that pre-training the autoencoder network with the image data comprises:

setting the input of the first layer of the autoencoder network to 3072 visible units and its hidden layer to 8192 hidden units;

for every remaining restricted Boltzmann machine connecting the layers of the autoencoder network, setting the number of hidden units to N and the number of visible units to 2N;

initializing the weights of each restricted Boltzmann machine to random real numbers and the biases to zero;

training each restricted Boltzmann machine layer on the image data with a learning rate of 0.001.

5. The method for building a micro-video feature database according to claim 1, characterized in that fine-tuning the complete autoencoder network with the BP neural network comprises:

feeding forward through the complete autoencoder network in the forward-computation pass to obtain the output data of its middle layer;

feeding back the middle-layer output data in the feedback-correction pass in order to correct it;

converting the middle-layer output data of the complete autoencoder network, after the forward and feedback passes, into binary codes, and storing the binary codes.

6. A rapid micro-video retrieval method based on the method for building a micro-video feature database according to claims 1 to 5, characterized in that the retrieval method comprises:

extracting video frames from the query micro-video and normalizing the video frames to obtain normalized image data;

feeding the image data as input to the deep autoencoder network to extract the binary codes of the video frames;

calculating the Hamming distance between the binary codes of the video frames and the binary codes in the micro-video feature database, then sorting the videos in the micro-video feature database in ascending order of distance and outputting them.

7. The rapid micro-video retrieval method according to claim 6, characterized in that the Hamming distance calculation is carried out using a k-nearest-neighbor algorithm.

8. A rapid micro-video retrieval device, characterized by comprising:

a video frame extraction module for extracting video frames from the query micro-video and associating the video frames with the query micro-video;

a video frame preprocessing module for normalizing the video frames to obtain normalized image data;

a feature extraction module for feeding the image data as input to the deep autoencoder network to extract the binary codes of the video frames;

a retrieval module for calculating the Hamming distance between the binary codes of the video frames and the binary codes in the micro-video feature database, then sorting the videos in the micro-video feature database in ascending order of distance and outputting them.

9. An apparatus for building a micro-video feature database, characterized by comprising:

a micro-video image extraction module for extracting image frames from a micro-video and associating the image frames with the micro-video;

an image frame preprocessing module for normalizing the image frames to obtain normalized image data;

an autoencoder pre-training module for pre-training the autoencoder network with the image data as input, to obtain the weight parameters and bias parameters of each layer of the autoencoder network, and for unrolling and connecting the layers into a complete autoencoder network;

an autoencoder network fine-tuning module for fine-tuning the complete autoencoder network with a BP neural network, feeding forward through the complete autoencoder network in the forward-computation pass to obtain the output data of its middle layer, and feeding back the middle-layer output data in the feedback-correction pass in order to correct it;

an autoencoder feature extraction module for converting the middle-layer output data of the complete autoencoder network, after the forward and feedback passes, into binary codes, and storing the binary codes.

10. The apparatus for building a micro-video feature database according to claim 9, characterized in that the middle-layer output data is converted into binary codes by rounding to the nearest integer.