CN103617233B - Method and device for detecting repeated video based on semantic content multilayer expression - Google Patents


Info

Publication number
CN103617233B
CN103617233B · Application CN201310611187.4A
Authority
CN
China
Prior art keywords
query
video
key frame
high-dimensional feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310611187.4A
Other languages
Chinese (zh)
Other versions
CN103617233A (en)
Inventor
刘大伟
徐伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yantai Zhong Ke Network Technical Institute
Original Assignee
Yantai Zhong Ke Network Technical Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yantai Zhong Ke Network Technical Institute
Priority to CN201310611187.4A
Publication of CN103617233A
Application granted
Publication of CN103617233B
Legal status: Active (current)
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for detecting repeated video based on a multi-layer representation of semantic content. The method comprises the following steps: a feature database is established from the information of the index videos; shot detection is performed on the query video to be checked; key frames are extracted from each query video segment; each query key frame is processed with a feature extraction algorithm; each query high-dimensional feature vector is hashed; each query feature tag is associated with the corresponding query high-dimensional feature vector identifier, query key frame identifier, query video segment identifier and query video identifier, and the feature tags are retrieved from the feature database; feature filtering is applied to each group of similar feature tags obtained by retrieval; similarity matching is performed on the feature vectors in each candidate feature vector set, yielding the repeated-video detection result. The method avoids the distance computation over high-dimensional feature vectors that constitutes the performance bottleneck, guarantees detection accuracy, and effectively improves the processing speed of repeated-video detection.

Description

A method and device for detecting repeated video based on a multi-layer representation of semantic content
Technical field
The present invention relates to a video detection method, and more particularly to a method and device for detecting repeated video based on a multi-layer representation of semantic content.
Background technology
With the rapid development of networked digital video applications, large-scale repeated-video detection has become a research concern for protecting and managing video content. Repeated-video detection methods fall broadly into two classes: digital watermarking and content-based duplicate detection. Digital watermarking embeds hidden data (the watermark) into images and video and detects it later. Content-based methods, by contrast, use video content analysis algorithms to generate video signatures or key-frame features for retrieval, and offer higher processing efficiency and accuracy. Most research therefore focuses on content-based repeated-video retrieval.
The general procedure of existing methods can be divided into the following three steps:
First, the video is divided into segments by a shot segmentation algorithm, and one or more key frames are extracted from each segment;
Then, a group of high-dimensional feature vectors is generated for each key frame using a feature extraction algorithm;
Finally, a similarity between videos is defined with temporal and spatial matching algorithms over the feature vectors, and detection is performed on that basis.
First come shot segmentation and key-frame extraction algorithms. Shot segmentation is also called shot boundary detection. A shot is the sequence of video frames captured by a camera between two operations, from start to stop. Existing shot segmentation algorithms generally fall into two classes. The first class comprises threshold-based methods: when the similarity between two frames drops below a predefined threshold, a boundary is declared; the threshold may be global, adaptive, or a combination of both. The second class comprises methods based on statistical learning, including supervised and unsupervised approaches: supervised algorithms use models such as SVM and AdaBoost, while unsupervised algorithms are mainly clustering methods such as k-means and fuzzy k-means. Key-frame extraction selects from each shot the frames that best represent its content; the features considered include color, edges, shape, MPEG-7 motion descriptors, and so on. These algorithms mainly fall into two classes: frame-sequence comparison methods and global comparison methods.
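The first, threshold-based class can be illustrated with a minimal sketch. The histogram representation, the intersection similarity, and the fixed global threshold below are illustrative assumptions, not the patent's algorithm:

```python
def histogram_similarity(h1, h2):
    """Intersection similarity of two normalized color histograms (in [0, 1])."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def detect_shot_boundaries(frame_histograms, threshold=0.6):
    """Declare a shot boundary wherever the similarity between two
    consecutive frames drops below a predefined global threshold."""
    return [i for i in range(1, len(frame_histograms))
            if histogram_similarity(frame_histograms[i - 1],
                                    frame_histograms[i]) < threshold]

# Toy clip: two near-identical frames, an abrupt content change, then stability.
frames = [
    [0.50, 0.30, 0.20],
    [0.48, 0.32, 0.20],
    [0.10, 0.10, 0.80],  # cut: the third frame starts a new shot
    [0.12, 0.08, 0.80],
]
print(detect_shot_boundaries(frames))  # → [2]
```

An adaptive variant would replace the fixed threshold with one derived from a sliding window of recent similarities, matching the adaptive-threshold option mentioned above.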
After the preprocessing of shot segmentation and key-frame extraction, the basic object of indexing and retrieval is the feature representation of the key frame, i.e. of an image. Such representations fall into two classes, global features and local features, corresponding to different choices of video content representation algorithm and similarity measure. Yeh et al. proposed a 16-dimensional partition descriptor at the global key-frame level together with a corresponding sequence matching algorithm. Chiu et al. combined global and local feature descriptors and detected repeated video with min-hashing and spatio-temporal registration. Shang et al. proposed a binary global spatio-temporal feature and an inverted-file-based method for indexing and fast detection. Pan et al. proposed a joint spatio-temporal feature based on DCT analysis and designed a video copy detection framework on top of it. Wu et al. further considered the motion of local key points, abstracted a trajectory behavior feature, and used Markov-chain models for representation and matching. Liu et al. proposed a repeated-video detection framework combining local SIFT features with the locality-sensitive hashing (LSH) algorithm and the random sample consensus (RANSAC) algorithm. Avrithis et al. expressed local features as visual words and detected duplicates with a RANSAC-like matching algorithm.
SURF is a detector proposed in recent years that represents digital images based on an approximate Hessian; experiments have shown it to be superior in computational efficiency to other local feature representations such as SIFT and PCA-SIFT. The present invention uses SURF features to optimize the corresponding index: the sign of the Laplacian, an intermediate result of the feature computation given by the trace of the Hessian matrix, partitions the bucket space generated by the hash index, and the positions of the interest points are used for feature vector filtering.
Locality-sensitive hashing (LSH) is an efficient algorithm for approximate nearest-neighbor search in high-dimensional spaces. An LSH function family has the following property: objects that are close together collide with higher probability than objects that are far apart. Different LSH function families correspond to different distance metrics.
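For the Euclidean metric, such a family is conventionally built from projections onto 2-stable (Gaussian) random vectors. A minimal sketch; the dimension, bucket width, and seed below are illustrative choices, not the patent's parameters:

```python
import math
import random

def make_lsh_function(dim, w, seed=None):
    """One hash function from the Euclidean LSH family:
    h(p) = floor((a . p + b) / w), where a is drawn from a 2-stable
    (Gaussian) distribution and b is uniform in [0, w)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    def h(p):
        return math.floor((sum(ai * pi for ai, pi in zip(a, p)) + b) / w)
    return h

h = make_lsh_function(dim=4, w=4.0, seed=7)
p = [0.10, 0.20, 0.30, 0.40]
q = [0.10, 0.20, 0.30, 0.40 + 1e-6]  # a very close neighbour of p
print(h(p) == h(q))  # close points fall into the same bucket with high probability
```

In a full index, k such functions are concatenated into a bucket label and L independent tables are queried, which is the construction the invention adapts below.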
Methods based on local features are more robust than methods based on global features, particularly against transformed videos involving color adjustment, cropping, caption insertion, and transcoding, but they pay a higher computational cost.
In the retrieval procedure of the basic LSH algorithm used by local-feature methods, a query point is hashed into the corresponding buckets of several different hash tables; the distances between the query point and all points in those buckets are then computed, and the closest feature vectors are returned as the retrieval result. We believe that the Euclidean distance computation over high-dimensional feature vectors (e.g. 64-dimensional SURF descriptors) during retrieval consumes a large amount of time and is the performance bottleneck of existing LSH-based algorithms. Network application scenarios impose strict real-time requirements, and repeated-video detection based on multi-layer content analysis must process massive numbers of high-dimensional feature vectors, so processing speed matters more than marginal accuracy. In addition, compared with global-feature algorithms that describe a key frame with a single integrated high-dimensional vector, local-feature algorithms represent each key frame as hundreds of high-dimensional vectors. How to effectively filter and reduce the candidate feature vector sets, and thus the computational load, is therefore an important problem.
Summary of the invention
The technical problem to be solved by the invention is to provide a method and device for detecting repeated video based on a multi-layer representation of semantic content, which index and retrieve the SURF feature vectors of video frames through adaptive locality-sensitive hashing (ADLSH) and effectively estimate the average number of feature vectors in each bucket through parameter learning.
The technical scheme by which the present invention solves the above technical problem is as follows. A method for detecting repeated video based on a multi-layer representation of semantic content comprises the following steps:
Step 1: establish a feature database from the information of the index videos;
Step 2: perform shot detection on the query video to be checked, obtaining multiple query video segments; the query video carries a query video identifier, and each query video segment carries a query video segment identifier;
Step 3: extract key frames from each query video segment, obtaining multiple query key frames, each carrying a query key frame identifier;
Step 4: process each query key frame with the feature extraction algorithm, obtaining a group of query high-dimensional feature vectors, each carrying a query high-dimensional feature vector identifier;
Step 5: hash each query high-dimensional feature vector, obtaining a group of query feature tags;
Step 6: associate each query feature tag with the corresponding query high-dimensional feature vector identifier, query key frame identifier, query video segment identifier and query video identifier, taking these identifiers as the association items of the query feature tag; retrieve the query feature tags and their association items in the feature database, obtaining multiple groups of similar feature tags;
Step 7: according to the position information of each group of feature tags, apply feature filtering to every group of similar feature tags obtained by retrieval, obtaining candidate feature vector sets each containing multiple feature vectors;
Step 8: according to the query key frame identifiers and query video segment identifiers, perform similarity matching on the feature vectors in each candidate feature vector set, obtaining the repeated-video detection result.
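The steps above can be sketched as a single dataflow in which every stage is a pluggable stand-in. The string-based "videos", single-character "features", and majority-vote score below are toy assumptions for illustration, not the patent's algorithms:

```python
def detect_repeats(query_video, feature_db, segment, extract_keyframes,
                   extract_features, hash_tag, filter_candidates, score):
    """Steps 2-8 in one pass: shot segmentation -> key frames -> feature
    vectors -> hashed feature tags -> candidate lookup in the feature
    database -> filtering -> similarity scoring."""
    candidates = []
    for seg in segment(query_video):                        # step 2
        for frame in extract_keyframes(seg):                # step 3
            for vec in extract_features(frame):             # step 4
                tag = hash_tag(vec)                         # step 5
                candidates.extend(feature_db.get(tag, []))  # step 6
    return score(filter_candidates(candidates))             # steps 7-8

# Toy run: "videos" are strings, shots are separated by "|", every
# character is one "feature", and the tag is the character itself.
db = {"a": ["vid1"], "b": ["vid1"], "c": ["vid2"]}
result = detect_repeats(
    "ab|ac", db,
    segment=lambda v: v.split("|"),
    extract_keyframes=lambda s: [s],      # one key frame per shot
    extract_features=lambda f: list(f),
    hash_tag=lambda v: v,
    filter_candidates=lambda c: c,        # no-op filter
    score=lambda c: max(set(c), key=c.count) if c else None,
)
print(result)  # → vid1
```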
The beneficial effects of the invention are as follows. The invention studies repeated-video detection based on a multi-layer representation of semantic content, uses SURF descriptors as local features, and designs a new LSH-based index structure. The index exploits the internal characteristics of the SURF descriptor and reduces the computation consumed during retrieval through parameter learning and adaptive settings, while maintaining the scalability and robustness of retrieval. A simple and effective filtering algorithm and a two-layer matching algorithm are applied to the retrieved feature vector sets, further reducing the size of the candidate feature vector sets and generating a relevance score for the whole video; repeated-video detection is performed by thresholding this score.
The algorithm indexes and retrieves the SURF feature vectors of video frames through adaptive locality-sensitive hashing (ADLSH) and effectively estimates the average number of feature vectors in each bucket through parameter learning, thereby avoiding the distance computation over high-dimensional feature vectors that causes the performance bottleneck. Feature filtering and two-layer matching then complete the multi-layer matching from feature vectors to key frames to videos, and the relevance score is returned as the detection result. The algorithm effectively improves the processing speed of repeated-video detection while guaranteeing detection accuracy, outperforming other current algorithms based on locality-sensitive hashing.
On the basis of the above technical scheme, the present invention can be further improved as follows.
Further, step 1 specifically comprises the following steps:
Step 1.1: perform shot detection on each index video, obtaining multiple video segments; each video segment carries a video segment identifier, and the index video carries an index video identifier;
Step 1.2: extract key frames from each video segment, obtaining multiple key frames, each carrying a key frame identifier;
Step 1.3: process each key frame with the feature extraction algorithm, obtaining a group of high-dimensional feature vectors, each carrying a high-dimensional feature vector identifier;
Step 1.4: hash each high-dimensional feature vector, obtaining a group of feature tags;
Step 1.5: associate each feature tag with the corresponding high-dimensional feature vector identifier, key frame identifier, video segment identifier and index video identifier, and store all the associated feature tags in the feature database.
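Steps 1.1 through 1.5 amount to building an inverted index from feature tags to association items. A minimal sketch, with toy stand-ins for segmentation, key-frame extraction, feature extraction, and hashing:

```python
from collections import defaultdict

def build_feature_db(index_videos, segment, extract_keyframes,
                     extract_features, hash_tag):
    """Store every hashed feature tag together with its association
    items (vector id, key frame id, segment id, video id)."""
    db = defaultdict(list)
    for video_id, video in index_videos.items():
        for seg_id, seg in enumerate(segment(video)):                   # 1.1
            for kf_id, frame in enumerate(extract_keyframes(seg)):      # 1.2
                for vec_id, vec in enumerate(extract_features(frame)):  # 1.3
                    tag = hash_tag(vec)                                 # 1.4
                    db[tag].append((vec_id, kf_id, seg_id, video_id))   # 1.5
    return db

# Toy run with string stand-ins: one indexed video, two shots.
db = build_feature_db(
    {"vid1": "ab|c"},
    segment=lambda v: v.split("|"),
    extract_keyframes=lambda s: [s],
    extract_features=lambda f: list(f),
    hash_tag=lambda v: v,
)
print(dict(db))  # → {'a': [(0, 0, 0, 'vid1')], 'b': [(1, 0, 0, 'vid1')], 'c': [(0, 0, 1, 'vid1')]}
```

A production store would keep these associations in a database table keyed by the bucket label rather than an in-memory dict.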
Further, step 5 specifically comprises the following steps:
Step 5.1: represent each query high-dimensional feature vector with the following sign function:
where p is a 64-dimensional feature vector, and the Hessian matrix is an intermediate result of the feature extraction algorithm;
Step 5.2: the hash function of each query high-dimensional feature vector takes the standard p-stable form
h_{a,b}(p) = ⌊(a·p + b)/W⌋
where a is a 64-dimensional random vector drawn independently from a 2-stable distribution, b is a real number chosen uniformly from [0, W], and the parameter W takes 4 or 8 as its optimal value;
Step 5.3: map each 64-dimensional query high-dimensional feature vector p to one bucket in each of L hash tables:
g_j(p), j = 1, ..., L
The label of each bucket is a k-dimensional vector corresponding to k randomly chosen hash functions:
g_j(p) = (|h_{1,j}(p)|, ..., |h_{k,j}(p)|)
Step 5.4: randomly select m pairs of query high-dimensional feature vectors from those extracted from the query video; the average collision probability over the m pairs is:
The number of query high-dimensional feature vectors in each bucket is:
N_bucket = Σ_n p(c_e)^k ≈ n · E_p(c)^k
where N_bucket · L = n · L · E_p(c)^k ≤ Ratio · n, with Ratio equal to 0.1%;
L is expressed as a function of k:
Solving yields the unique optimal values of k and L.
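The parameter learning of step 5.4 can be illustrated as follows: estimate the average collision probability from m sampled pairs under one hash function, then pick the smallest k for which the expected fraction of vectors sharing a k-dimensional bucket label drops below the target Ratio. The single-coordinate rounding hash and the sample pairs below are toy assumptions, and the sketch solves only for k, not the joint (k, L) optimum:

```python
import math

def estimate_collision_prob(pairs, h):
    """Empirical collision probability of m sampled feature-vector
    pairs under one hash function h."""
    return sum(1 for p, q in pairs if h(p) == h(q)) / len(pairs)

def choose_k(collision_prob, ratio=0.001):
    """Smallest k with collision_prob**k <= ratio: the expected number
    of vectors per k-dimensional bucket label, n * E[p(c)]**k, then
    stays below ratio * n (n cancels out of the inequality)."""
    return math.ceil(math.log(ratio) / math.log(collision_prob))

# Toy stand-in hash over 1-D "features": round to the nearest integer.
h = lambda v: math.floor(v[0] + 0.5)
pairs = [([0.1], [0.2]), ([0.9], [1.1]), ([0.4], [0.6])]
prob = estimate_collision_prob(pairs, h)  # 2/3: only the last pair splits
print(choose_k(prob))  # → 18, since (2/3)**18 <= 0.001 < (2/3)**17
```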
Further, step 7 specifically comprises the following steps:
Step 7.1: during query key frame extraction, store the intermediate results as the position information of each feature point;
Step 7.2: treat each query feature tag obtained by hashing as one feature point, and compute, according to the position information of each feature point, the relative distance between every two feature points in the two-dimensional space;
Step 7.3: perform classified statistics according to the query key frame identifiers, obtaining the mean and standard deviation of the relative distances of all feature points in two corresponding key frame images;
Step 7.4: remove as noise points the feature points whose relative distance exceeds the mean by much more than the standard deviation.
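Steps 7.1 through 7.4 can be sketched as a simple statistical outlier filter over feature-point positions. The Euclidean relative distance and the one-standard-deviation cutoff below are illustrative readings of "much more than the standard deviation":

```python
import math
import statistics

def filter_noise_points(points, sigma_factor=1.0):
    """Drop feature points whose mean relative distance to the other
    points exceeds the overall mean by more than sigma_factor standard
    deviations; such distant points are treated as noise (step 7.4)."""
    mean_dists = [
        statistics.mean(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    cutoff = statistics.mean(mean_dists) + sigma_factor * statistics.stdev(mean_dists)
    return [p for p, d in zip(points, mean_dists) if d <= cutoff]

# Four clustered interest points plus one far-away noise point.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (50, 50)]
print(filter_noise_points(pts))  # → [(0, 0), (1, 0), (0, 1), (1, 1)]
```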
Further, step 8 specifically comprises the following steps:
Step 8.1: according to the query key frame identifiers, hash each matching feature vector in each candidate feature vector set again, and find the key frames that match the query key frames by linear scanning: a key frame whose number of matching feature vectors exceeds a predetermined threshold is a matching key frame;
Step 8.2: for each key frame f_i^q identified by a query video segment identifier, the similarity with a matching key frame is:
where N_m is the number of feature vectors in one bucket corresponding to the matching key frame, and w_{i,j} is the weight of the corresponding bucket, specifically w_{i,j} = 1/N_bucket;
Step 8.3: the relevance score between the query video v_q and an index video v_c is:
where N_frame is the total number of query key frames extracted from the query video; if the relevance score score_c between an index video and the query video exceeds a predetermined threshold S_t, the index video is regarded as a repeated video.
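The two-layer matching of steps 8.2 and 8.3 can be sketched numerically. The bucket-normalised keyframe similarity follows the stated weight w_{i,j} = 1/N_bucket; the simple summation into a video-level score and the example threshold are assumptions:

```python
def keyframe_similarity(matches_per_bucket, bucket_sizes):
    """Per-keyframe similarity: each bucket contributes its number of
    matching vectors N_m weighted by w = 1 / N_bucket."""
    return sum(n_m / n_bucket
               for n_m, n_bucket in zip(matches_per_bucket, bucket_sizes))

def video_score(keyframe_sims, n_query_keyframes):
    """Relevance score of an index video: per-keyframe similarities
    aggregated over the N_frame query key frames."""
    return sum(keyframe_sims) / n_query_keyframes

# Query with two key frames; only the first finds matches in the index video.
sims = [keyframe_similarity([3, 1], [10, 5]),   # 3/10 + 1/5 = 0.5
        keyframe_similarity([0, 0], [10, 5])]   # 0.0
score = video_score(sims, n_query_keyframes=2)
print(score, score > 0.2)  # → 0.25 True  (a duplicate at threshold S_t = 0.2)
```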
Further, a device for detecting repeated video based on a multi-layer representation of semantic content comprises an establishing module, a shot detection module, a key-frame extraction module, a feature extraction module, a hash processing module, an association module, a feature filtering module and a similarity matching module;
the establishing module is used for establishing a feature database from the information of the index videos;
the shot detection module is used for performing shot detection on the query video to be checked, obtaining multiple query video segments; each query video segment carries a query video segment identifier, and the query video carries a query video identifier;
the key-frame extraction module is used for extracting key frames from each query video segment, obtaining multiple query key frames, each carrying a query key frame identifier;
the feature extraction module is used for processing each query key frame with the feature extraction algorithm, obtaining a group of query high-dimensional feature vectors, each carrying a query high-dimensional feature vector identifier;
the hash processing module is used for hashing each query high-dimensional feature vector, obtaining a group of query feature tags;
the association module is used for associating each query feature tag with the corresponding query high-dimensional feature vector identifier, query key frame identifier, query video segment identifier and query video identifier, taking these identifiers as the association items of the query feature tag, and retrieving the query feature tags and their association items in the feature database, obtaining multiple groups of similar feature tags;
the feature filtering module is used for applying feature filtering, according to the position information of each group of feature tags, to every group of similar feature tags obtained by retrieval, obtaining candidate feature vector sets each containing multiple feature vectors;
the similarity matching module is used for performing similarity matching, according to the query key frame identifiers and query video segment identifiers, on the feature vectors in each candidate feature vector set, obtaining the repeated-video detection result.
Further, the establishing module further comprises a detection submodule, a key-frame extraction submodule, a feature extraction submodule, a hash submodule and an association submodule;
the detection submodule is used for performing shot detection on each index video, obtaining multiple video segments; each video segment carries a video segment identifier, and the index video carries an index video identifier;
the key-frame extraction submodule is used for extracting key frames from each video segment, obtaining multiple key frames, each carrying a key frame identifier;
the feature extraction submodule is used for processing each key frame with the feature extraction algorithm, obtaining a group of high-dimensional feature vectors, each carrying a high-dimensional feature vector identifier;
the hash submodule is used for hashing each high-dimensional feature vector, obtaining a group of feature tags;
the association submodule is used for associating each feature tag with the corresponding high-dimensional feature vector identifier, key frame identifier, video segment identifier and index video identifier, and storing all the associated feature tags in the feature database.
Further, the hash processing module further comprises a high-dimensional vector submodule, a hash function submodule, a mapping submodule and an extraction submodule;
the high-dimensional vector submodule is used for representing each query high-dimensional feature vector with the following sign function:
where p is a 64-dimensional feature vector, and the Hessian matrix is an intermediate result of the feature extraction algorithm;
the hash function submodule is used for expressing the hash function of each query high-dimensional feature vector in the standard p-stable form
h_{a,b}(p) = ⌊(a·p + b)/W⌋
where a is a 64-dimensional random vector drawn independently from a 2-stable distribution (for the Euclidean distance, a Gaussian distribution), b is a real number chosen uniformly from [0, W], and the parameter W takes 4 or 8 as its optimal value;
the mapping submodule is used for mapping each 64-dimensional query high-dimensional feature vector p to one bucket in each of L hash tables:
g_j(p), j = 1, ..., L
The label of each bucket is a k-dimensional vector corresponding to k randomly chosen hash functions:
g_j(p) = (|h_{1,j}(p)|, ..., |h_{k,j}(p)|)
the extraction submodule is used for randomly selecting m pairs of query high-dimensional feature vectors from those extracted from the query video; the average collision probability over the m pairs is:
The number of query high-dimensional feature vectors in each bucket is:
N_bucket = Σ_n p(c_e)^k ≈ n · E_p(c)^k
where N_bucket · L = n · L · E_p(c)^k ≤ Ratio · n, with Ratio equal to 0.1%;
L is expressed as a function of k:
Solving yields the unique optimal values of k and L.
Further, the feature filtering module further comprises an intermediate storage submodule, a distance calculation submodule, a classified statistics submodule and a removal submodule;
the intermediate storage submodule is used for storing, during query key frame extraction, the intermediate results as the position information of each feature point;
the distance calculation submodule is used for treating each query feature tag obtained by hashing as one feature point, and computing, according to the position information of each feature point, the relative distance between every two feature points in the two-dimensional space;
the classified statistics submodule is used for performing classified statistics according to the query key frame identifiers, obtaining the mean and standard deviation of the relative distances of all feature points in two corresponding key frame images;
the removal submodule is used for removing as noise points the feature points whose relative distance exceeds the mean by much more than the standard deviation.
Further, the similarity matching module further comprises a traversal submodule, a similarity submodule and a relevance submodule;
the traversal submodule is used for hashing again, according to the query key frame identifiers, each matching feature vector in each candidate feature vector set, and finding by linear scanning the key frames that match the query key frames: a key frame whose number of matching feature vectors exceeds a predetermined threshold is a matching key frame;
the similarity submodule is used for computing, for each key frame f_i^q identified by a query video segment identifier, the similarity with a matching key frame:
where N_m is the number of feature vectors in one bucket corresponding to the matching key frame, and w_{i,j} is the weight of the corresponding bucket, specifically w_{i,j} = 1/N_bucket;
the relevance submodule is used for computing the relevance score between the query video v_q and an index video v_c:
where N_frame is the total number of query key frames extracted from the query video; if the relevance score score_c between an index video and the query video exceeds a predetermined threshold S_t, the index video is regarded as a repeated video.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the method of the invention;
Fig. 2 is a structural diagram of the device of the invention.
In the drawings, the parts represented by the reference numerals are as follows:
1, establishing module; 1-1, detection submodule; 1-2, key-frame extraction submodule; 1-3, feature extraction submodule; 1-4, hash submodule; 1-5, association submodule; 2, shot detection module; 3, key-frame extraction module; 4, feature extraction module; 5, hash processing module; 5-1, high-dimensional vector submodule; 5-2, hash function submodule; 5-3, mapping submodule; 5-4, extraction submodule; 6, association module; 7, feature filtering module; 7-1, intermediate storage submodule; 7-2, distance calculation submodule; 7-3, classified statistics submodule; 7-4, removal submodule; 8, similarity matching module; 8-1, traversal submodule; 8-2, similarity submodule; 8-3, relevance submodule.
Specific embodiments
The principles and features of the present invention are described below with reference to the drawings; the examples serve only to explain the present invention and are not intended to limit its scope.
As shown in the drawings, Fig. 1 is the flow chart of the steps of the method of the invention and Fig. 2 is the structural diagram of the device of the invention.
Embodiment 1
A method for detecting repeated video based on a multi-layer representation of semantic content comprises the following steps:
Step 1: establish a feature database from the information of the index videos;
Step 2: perform shot detection on the query video to be checked, obtaining multiple query video segments; the query video carries a query video identifier, and each query video segment carries a query video segment identifier;
Step 3: extract key frames from each query video segment, obtaining multiple query key frames, each carrying a query key frame identifier;
Step 4: process each query key frame with the feature extraction algorithm, obtaining a group of query high-dimensional feature vectors, each carrying a query high-dimensional feature vector identifier;
Step 5: hash each query high-dimensional feature vector, obtaining a group of query feature tags;
Step 6: associate each query feature tag with the corresponding query high-dimensional feature vector identifier, query key frame identifier, query video segment identifier and query video identifier, taking these identifiers as the association items of the query feature tag; retrieve the query feature tags and their association items in the feature database, obtaining multiple groups of similar feature tags;
Step 7: according to the position information of each group of feature tags, apply feature filtering to every group of similar feature tags obtained by retrieval, obtaining candidate feature vector sets each containing multiple feature vectors;
Step 8: according to the query key frame identifiers and query video segment identifiers, perform similarity matching on the feature vectors in each candidate feature vector set, obtaining the repeated-video detection result.
Step 1 specifically comprises the following steps:
Step 1.1: perform shot detection on each index video, obtaining multiple video segments; each video segment carries a video segment identifier, and the index video carries an index video identifier;
Step 1.2: extract key frames from each video segment, obtaining multiple key frames, each carrying a key frame identifier;
Step 1.3: process each key frame with the feature extraction algorithm, obtaining a group of high-dimensional feature vectors, each carrying a high-dimensional feature vector identifier;
Step 1.4: hash each high-dimensional feature vector, obtaining a group of feature tags;
Step 1.5: associate each feature tag with the corresponding high-dimensional feature vector identifier, key frame identifier, video segment identifier and index video identifier, and store all the associated feature tags in the feature database.
Step 5 specifically includes the following steps:
Step 5.1: Represent each query high-dimensional feature vector with the following sign function:
sgn(p) = +1 if tr(H) ≥ 0, and −1 otherwise,
where p is a 64-dimensional feature vector and the Hessian matrix H is an intermediate result of the feature extraction algorithm.
Step 5.2: The hash function of each query high-dimensional feature vector is expressed as:
h_{a,b}(p) = ⌊(a·p + b)/W⌋,
where a is a 64-dimensional random vector whose entries are drawn independently from a 2-stable distribution (for Euclidean distance, the Gaussian distribution), b is a real number drawn uniformly from [0, W], and the optimal value of the parameter W is chosen as 4 or 8.
Step 5.3: Map each 64-dimensional query high-dimensional feature vector p to L buckets of L hash tables:
g_j(p), j = 1, ..., L
The label of each bucket is a k-dimensional vector corresponding to k randomly chosen hash functions:
g_j(p) = (|h_{1,j}(p)|, ..., |h_{k,j}(p)|)
Step 5.4: Randomly select m pairs from the query high-dimensional feature vectors extracted from the query video; the average collision probability between the m pairs of query high-dimensional feature vectors is:
Ep(c) = p(c_e),
where c_e is the average Euclidean distance of the m sampled pairs. The number of query high-dimensional feature vectors in each bucket is:
N_bucket = Σ_n p(c_e)^k ≈ n·Ep(c)^k,
subject to N_bucket·L = n·L·Ep(c)^k ≤ Ratio·n, where Ratio is 0.1%.
L is expressed as a function of k:
L(k) = ⌈ln δ / ln(1 − p(1)^k)⌉,
and solving yields the unique optimal values of k and L.
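The hash of step 5.2 and the bucket labels of step 5.3 can be sketched as follows; the code assumes the standard p-stable LSH form h_{a,b}(p) = ⌊(a·p + b)/W⌋ with Gaussian a and uniform b, and the concrete W, k values are illustrative:

```python
import math
import random

random.seed(0)
W = 4        # quantization width, chosen as 4 or 8 in the text
DIM = 64     # SURF descriptor dimensionality
k = 8        # hash functions per bucket label (illustrative value)

def make_hash():
    # a: 64-dim Gaussian (2-stable) random vector, b: uniform on [0, W)
    a = [random.gauss(0.0, 1.0) for _ in range(DIM)]
    b = random.uniform(0.0, W)
    return lambda p: math.floor((sum(x * y for x, y in zip(a, p)) + b) / W)

def bucket_label(p, funcs):
    # Step 5.3: k-dimensional bucket label g_j(p) = (|h_1(p)|, ..., |h_k(p)|)
    return tuple(abs(h(p)) for h in funcs)

funcs = [make_hash() for _ in range(k)]
label = bucket_label([0.1] * DIM, funcs)
```

In the full scheme, L independent groups of k such functions form the L hash tables.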
Step 7 specifically includes the following steps:
Step 7.1: During query key frame extraction, store the intermediate results as the position information of each feature point.
Step 7.2: Treat each query feature label obtained by hashing as one feature point, and compute the relative distance between every two feature points in two-dimensional space from the position information of each feature point.
Step 7.3: Classify and count the distances according to the query key frame identifiers to obtain the mean and standard deviation of the relative distances of all feature points in two corresponding key frame images.
Step 7.4: Remove as noise points the feature points whose relative distance exceeds the mean by far more than the standard deviation.
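Steps 7.2–7.4 amount to a simple statistical outlier filter on matched point positions. A minimal sketch, assuming a cutoff of mean + 2×stdev (the text only says "far more than the standard deviation", so the factor 2 is an assumption):

```python
import math
import statistics

def filter_noise_points(matches):
    # matches: list of ((x1, y1), (x2, y2)) matched feature-point positions
    dists = [math.dist(p, q) for p, q in matches]
    mean = statistics.mean(dists)
    std = statistics.pstdev(dists)
    # Step 7.4: drop pairs whose relative distance exceeds the mean by far
    # more than the standard deviation (cutoff factor 2 is an assumption).
    return [m for m, d in zip(matches, dists) if d <= mean + 2 * std]

pairs = [((0, 0), (1, 0))] * 9 + [((0, 0), (50, 0))]   # one obvious outlier
kept = filter_noise_points(pairs)
```

The outlier pair at distance 50 is removed while the nine consistent pairs survive.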
Step 8 specifically includes the following steps:
Step 8.1: According to the query key frame identifiers, hash each matching feature vector in each candidate feature vector set again, and find the key frames matching the query key frames by linear scanning: a key frame whose number of matching feature vectors exceeds a preset threshold is a matching key frame.
Step 8.2: For each key frame f_i^q identified by a query video segment identifier and a matching key frame f_j^c, the similarity is:
sim(f_i^q, f_j^c) = (Σ_{N_d} Σ_L w_{i,j}·N_m) / N_d,
where N_m is the number of feature vectors corresponding to the matching key frame in one bucket, and w_{i,j} is the weight of the corresponding bucket, specifically w_{i,j} = 1/N_bucket.
Step 8.3: The relevance score between the query video v_q and an index video v_c is:
score_c = (Σ_{N_frame} sim(f_i^q, f_j^c)) / N_frame,
where N_frame is the total number of query key frames extracted from the query video. If the relevance score score_c of an index video with respect to the query video exceeds a preset threshold S_t, the index video is regarded as a duplicate video.
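The two scoring levels of steps 8.2–8.3 can be illustrated with a small numeric sketch. The per-bucket match counts, N_bucket, N_d and the threshold 0.6 below are made-up example inputs, not outputs of the actual pipeline, and the sketch assumes the similarity is normalized by N_d:

```python
def keyframe_similarity(matches_per_bucket, n_bucket, n_d):
    # Step 8.2: sim = (sum over buckets of w * N_m) / N_d, with w = 1/N_bucket
    return sum(n_m / n_bucket for n_m in matches_per_bucket) / n_d

def video_score(keyframe_sims):
    # Step 8.3: score_c = (sum of per-key-frame similarities) / N_frame
    return sum(keyframe_sims) / len(keyframe_sims)

sims = [keyframe_similarity([10, 10], n_bucket=10, n_d=2),   # -> 1.0
        keyframe_similarity([5], n_bucket=10, n_d=1)]        # -> 0.5
score = video_score(sims)                                    # -> 0.75
is_duplicate = score > 0.6        # preset threshold S_t (example value)
```

The weight 1/N_bucket damps buckets that collect many vectors, as described in the two-layer matching section.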
A duplicate video detection device based on multi-layer semantic content representation includes a database building module 1, a shot detection module 2, a key frame extraction module 3, a feature extraction module 4, a hashing module 5, an association module 6, a feature filtering module 7 and a similarity matching module 8.
The database building module 1 is used for building the feature database from the information of the index videos.
The shot detection module 2 is used for performing shot detection on the query video to be checked, obtaining multiple query video segments; each query video segment is provided with a query video segment identifier, and the query video is provided with a query video identifier.
The key frame extraction module 3 is used for extracting key frames from each query video segment, obtaining multiple query key frames, each provided with a query key frame identifier.
The feature extraction module 4 is used for processing each query key frame with the feature extraction algorithm, obtaining a group of query high-dimensional feature vectors, each provided with a query high-dimensional feature vector identifier.
The hashing module 5 is used for hashing each query high-dimensional feature vector separately, obtaining a group of query feature labels.
The association module 6 is used for associating each query feature label with its corresponding query high-dimensional feature vector identifier, query key frame identifier, query video segment identifier and query video identifier, taking these identifiers as the associated attributes of the query feature label, and retrieving the query feature labels and their associated attributes in the feature database to obtain multiple groups of similar feature labels.
The feature filtering module 7 is used for performing feature filtering, according to the position information of each group of feature labels, on each group of similar feature labels obtained by retrieval, obtaining a candidate feature vector set containing multiple feature vectors.
The similarity matching module 8 is used for performing similarity matching, according to the query key frame identifiers and query video segment identifiers, on the feature vectors in each candidate feature vector set, obtaining the duplicate video detection result.
The database building module 1 further includes a detection submodule 1-1, a key frame extraction submodule 1-2, a feature extraction submodule 1-3, a hashing submodule 1-4 and an association submodule 1-5.
The detection submodule 1-1 is used for performing shot detection on the index video, obtaining multiple video segments; each video segment is provided with a video segment identifier, and the index video is provided with an index video identifier.
The key frame extraction submodule 1-2 is used for extracting a key frame from each video segment, obtaining multiple key frames, each provided with a key frame identifier.
The feature extraction submodule 1-3 is used for processing each key frame with the feature extraction algorithm, obtaining a group of high-dimensional feature vectors, each provided with a high-dimensional feature vector identifier.
The hashing submodule 1-4 is used for hashing each high-dimensional feature vector separately, obtaining a group of feature labels.
The association submodule 1-5 is used for associating each feature label with its corresponding high-dimensional feature vector identifier, key frame identifier, video segment identifier and index video identifier, and storing all the associated feature labels in the feature database.
The hashing module 5 further includes a high-dimensional vector submodule 5-1, a hash function submodule 5-2, a mapping submodule 5-3 and an extraction submodule 5-4.
The high-dimensional vector submodule 5-1 is used for representing each query high-dimensional feature vector with the following sign function:
sgn(p) = +1 if tr(H) ≥ 0, and −1 otherwise,
where p is a 64-dimensional feature vector and the Hessian matrix H is an intermediate result of the feature extraction algorithm.
The hash function submodule 5-2 expresses the hash function of each query high-dimensional feature vector as:
h_{a,b}(p) = ⌊(a·p + b)/W⌋,
where a is a 64-dimensional random vector whose entries are drawn independently from a 2-stable distribution (for Euclidean distance, the Gaussian distribution), b is a real number drawn uniformly from [0, W], and the optimal value of the parameter W is chosen as 4 or 8.
The mapping submodule 5-3 is used for mapping each 64-dimensional query high-dimensional feature vector p to L buckets of L hash tables:
g_j(p), j = 1, ..., L
The label of each bucket is a k-dimensional vector corresponding to k randomly chosen hash functions:
g_j(p) = (|h_{1,j}(p)|, ..., |h_{k,j}(p)|)
The extraction submodule 5-4 is used for randomly selecting m pairs from the query high-dimensional feature vectors extracted from the query video; the average collision probability between the m pairs of query high-dimensional feature vectors is:
Ep(c) = p(c_e),
where c_e is the average Euclidean distance of the m sampled pairs. The number of query high-dimensional feature vectors in each bucket is:
N_bucket = Σ_n p(c_e)^k ≈ n·Ep(c)^k,
subject to N_bucket·L = n·L·Ep(c)^k ≤ Ratio·n, where Ratio is 0.1%.
L is expressed as a function of k:
L(k) = ⌈ln δ / ln(1 − p(1)^k)⌉,
and solving yields the unique optimal values of k and L.
The feature filtering module 7 further includes an intermediate storage submodule 7-1, a distance calculation submodule 7-2, a classification statistics submodule 7-3 and a removal submodule 7-4.
The intermediate storage submodule 7-1 is used for storing, during query key frame extraction, the intermediate results as the position information of each feature point.
The distance calculation submodule 7-2 is used for treating each query feature label obtained by hashing as one feature point, and computing the relative distance between every two feature points in two-dimensional space from the position information of each feature point.
The classification statistics submodule 7-3 is used for classifying and counting the distances according to the query key frame identifiers, obtaining the mean and standard deviation of the relative distances of all feature points in two corresponding key frame images.
The removal submodule 7-4 is used for removing as noise points the feature points whose relative distance exceeds the mean by far more than the standard deviation.
The similarity matching module 8 further includes a traversal submodule 8-1, a similarity submodule 8-2 and a relevance submodule 8-3.
The traversal submodule 8-1 is used for hashing again, according to the query key frame identifiers, each matching feature vector in each candidate feature vector set, and finding the key frames matching the query key frames by linear scanning: a key frame whose number of matching feature vectors exceeds a preset threshold is a matching key frame.
The similarity submodule 8-2 is used for computing, for each key frame f_i^q identified by a query video segment identifier and a matching key frame f_j^c, the similarity:
sim(f_i^q, f_j^c) = (Σ_{N_d} Σ_L w_{i,j}·N_m) / N_d,
where N_m is the number of feature vectors corresponding to the matching key frame in one bucket, and w_{i,j} is the weight of the corresponding bucket, specifically w_{i,j} = 1/N_bucket.
The relevance submodule 8-3 is used for computing the relevance score between the query video v_q and an index video v_c:
score_c = (Σ_{N_frame} sim(f_i^q, f_j^c)) / N_frame,
where N_frame is the total number of query key frames extracted from the query video. If the relevance score score_c of an index video with respect to the query video exceeds a preset threshold S_t, the index video is regarded as a duplicate video.
In a specific implementation, a sign function is designed for each index feature vector p using a property of the SURF descriptor:
sgn(p) = +1 if tr(H) ≥ 0, and −1 otherwise.
Combining this sign function with the original LSH hash function yields the hash function of ADLSH:
h_{a,b}(p) = ⌊(a·p + b)/W⌋,
where p is a 64-dimensional SURF feature vector, a is a 64-dimensional random vector whose entries are drawn independently from a 2-stable distribution (for Euclidean distance, the Gaussian distribution), and b is a real number drawn uniformly from [0, W]. Each hash function h_{a,b}(p) maps the 64-dimensional vector p to a signed real number. To build the index structure, each point p is mapped to L buckets of L hash tables: g_j(p), j = 1, ..., L. The label of each bucket is a k-dimensional vector corresponding to k randomly chosen hash functions: g_j(p) = (|h_{1,j}(p)|, ..., |h_{k,j}(p)|). In the concrete implementation, we use the absolute value of the hash function, |h_{a,b}(p)|, to represent the bucket label, and split each bucket into two according to the sign function of p. In this way, the number of buckets generated by the ADLSH algorithm is about twice that of the original LSH algorithm, and correspondingly, the number of vectors hashed into each bucket is on average reduced by half.
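The bucket labeling just described can be sketched as follows: the label keeps |h(p)| per hash function and prepends the SURF Laplacian sign (the sign of the Hessian trace), so each plain-LSH bucket splits in two. The p-stable hash form and the concrete k, W values are assumptions of this sketch:

```python
import math
import random

random.seed(1)
W, DIM, k = 4, 64, 6

def make_hash():
    a = [random.gauss(0.0, 1.0) for _ in range(DIM)]
    b = random.uniform(0.0, W)
    return lambda p: math.floor((sum(x * y for x, y in zip(a, p)) + b) / W)

funcs = [make_hash() for _ in range(k)]

def adlsh_label(p, laplacian_sign, funcs):
    # |h(p)| supplies the bucket coordinates; the SURF Laplacian sign splits
    # every plain-LSH bucket in two, so ADLSH has roughly twice the buckets
    # and, on average, half as many vectors per bucket.
    return (laplacian_sign,) + tuple(abs(h(p)) for h in funcs)

p = [0.2] * DIM
```

Two descriptors with identical hash values but opposite Laplacian signs land in different buckets, which is exactly the bucket-splitting effect described above.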
A major issue in the practical application of LSH-based algorithms is the selection of the parameters W, k and L. Most existing algorithms set them experimentally and cannot adapt to the demands of practical applications. The goal of the present invention is, while guaranteeing the locality-sensitive property, to reduce the number of vectors hashed into each bucket according to the concrete situation of the real data, to the point where the in-bucket high-dimensional distance computations can be dispensed with. That is, for a query vector q, all vectors in the L buckets g_1(q), ..., g_L(q) into which q is hashed are taken as the candidate vector set. The size of the candidate vector set is reduced as far as possible while guaranteeing that every neighbor v of q within Euclidean distance R (||q − v||_2 ≤ R) is contained in the candidate set. The following analyzes the constraints relating the parameters of the ADLSH algorithm and proposes an adaptive parameter-learning method.
The ADLSH algorithm solves the R-neighbor search problem with probability 1 − δ, where δ is the failure probability (in the implementation of the invention we take 0.1%). For two vectors p1 and p2, let their distance be c = ||p1 − p2||_2; the probability that the two vectors collide under one hash function is:
p(c) = ∫_0^W (1/c)·f_2(t/c)·(1 − t/W) dt      (1)
where f_2(t) is the probability density function of the positive half of the Gaussian distribution:
f_2(t) = (2/√(2π))·e^(−t²/2)      (2)
(Without loss of generality, assume R = 1; in the R-neighbor search problem, the distances between any vectors of the data set can be scaled by an appropriate ratio into the region of this assumption without affecting the correspondence between data vectors.) To retrieve all vectors within Euclidean distance R, i.e., with c < 1, the following condition must hold:
p(c) ≥ p(1) for all c < 1      (3)
This is the probability condition for a single hash function; for a bucket label that is a k-dimensional vector, the collision probability is:
p(c)^k      (4)
For L hash tables, the probability that a query vector q finds a neighbor within distance 1 is:
Pr_NN[||q − p||_2 ≤ 1] = 1 − (1 − p(1)^k)^L ≥ 1 − δ      (5)
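The single-hash collision probability p(c) and the L-table success probability of formula (5) can be evaluated numerically. The integral form of p(c) used below is the standard p-stable LSH expression and is assumed here, since the original formula images are not reproduced; the k, L values are illustrative:

```python
import math

def f2(t):
    # density of the positive half of a standard Gaussian
    return 2.0 / math.sqrt(2.0 * math.pi) * math.exp(-t * t / 2.0)

def p_collision(c, W, steps=2000):
    # p(c) = integral_0^W (1/c) * f2(t/c) * (1 - t/W) dt  (trapezoidal rule)
    h = W / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        v = (1.0 / c) * f2(t / c) * (1.0 - t / W)
        total += v if 0 < i < steps else v / 2.0
    return total * h

W = 4
p1, p2 = p_collision(1.0, W), p_collision(2.0, W)
success = 1 - (1 - p1 ** 8) ** 10     # formula (5) with k = 8, L = 10
```

As expected, p(c) decreases monotonically with the distance c, which is the locality-sensitive property the parameter analysis relies on.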
For fixed p(c), the optimal value of the parameter W is a function of c, and decreasing W decreases the collision probability p(c) of any two vectors. Likewise, increasing k or decreasing L decreases the probability of finding a neighbor. By further analyzing the relationships among the parameters, the following three steps are designed to complete the adaptive parameter setting:
1) W
Experiments show that for the L values feasible in practical applications (generally less than 10^3), the optimal W value (generally a power of 2) cannot be too small. Provided that the locality-sensitive property holds, all collision probabilities p(c) decrease monotonically with the distance c; for Ep(c) (defined below) to vary significantly, the optimal W value cannot be too large either. Based on these observations and analysis, we choose 4 or 8 as the optimal value of W. Notably, the choice of W is independent of the real data set and requires no learning or correction from the real data.
2) Sample learning and estimation
When applying the algorithm to real data, m pairs of vectors are sampled at random from the n SURF feature vectors extracted from the index videos. The distribution of distances between vectors in the data set is estimated, and the average collision probability between vectors is represented by formula (6):
Ep(c) = p(c_e), where c_e is the average distance of the m sampled pairs      (6)
Note that the c in Ep(c) may vary with the data set and need not satisfy c < 1. The number of vectors in each bucket is estimated with formula (7):
N_bucket = Σ_n p(c_e)^k ≈ n·Ep(c)^k      (7)
3) k and L
Our goal is to reduce as far as possible the total number N_bucket·L of vectors in the L buckets into which a query vector is hashed. A ratio Ratio sets the admissible range of this number according to the requirements of the concrete application. For a data set with n vectors in total, the number of vectors retrieved on average per query vector should not exceed Ratio·n (the present invention uses a Ratio of 0.1%). This gives the constraint:
N_bucket·L = n·L·Ep(c)^k ≤ Ratio·n      (8)
According to formula (5), L can be expressed as a function of k:
L(k) = ⌈ln δ / ln(1 − p(1)^k)⌉
Note that the collision probability in L(k) uses the normalized p(1), not p(c). According to formula (3), p(1) is a definite value for a fixed W. In practical applications, a different Ep(c) is obtained via step 2) depending on the data set; Ratio is preset, an optimal value of k is determined from Ratio, and the optimal L(k) then follows.
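The three-step parameter setting reduces to a small search: given the estimates Ep(c) and p(1), take for each k the smallest L satisfying formula (5), and stop at the first k whose bucket constraint (formula (8)) holds. A minimal sketch; the input values Ep(c) = 0.3, p(1) = 0.8 and n are illustrative, not measured:

```python
import math

def choose_k_L(Ep_c, p1, n, ratio=0.001, delta=0.001, k_max=64):
    # For increasing k, L(k) = ceil(ln(delta) / ln(1 - p1**k)) is the
    # smallest table count meeting formula (5); return the first k where
    # the retrieved-vector constraint n*L*Ep(c)**k <= ratio*n (formula (8))
    # is satisfied.
    for k in range(1, k_max + 1):
        L = math.ceil(math.log(delta) / math.log(1.0 - p1 ** k))
        if n * L * Ep_c ** k <= ratio * n:
            return k, L
    raise ValueError("no feasible (k, L) under the given constraints")

k, L = choose_k_L(Ep_c=0.3, p1=0.8, n=1_000_000)
```

Larger k shrinks the buckets exponentially (via Ep(c)^k) but forces more tables L, so the first feasible k is the optimum under the stated constraint.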
● Feature filtering
Through the bucket splitting of the ADLSH hash algorithm and the three-step adaptive parameter setting, for each query feature vector, a candidate feature vector set of the expected average size is obtained at retrieval time without any in-bucket high-dimensional distance computation. SURF feature matching based on Euclidean distance alone cannot effectively cope with changes or noise in the video key frame images; similar studies typically resort to spatial verification and filtering algorithms such as RANSAC. In the present invention, since the candidate vector set obtained is already significantly smaller than in other approaches, we use a simple distance-based filtering method to remove obviously wrong matching feature vectors. For each pair of retrieved vectors, we use the intermediate result of the SURF feature extraction, namely the position information of the feature points, to compute their relative distance in two-dimensional space. The mean and standard deviation of all feature-point relative distances between two corresponding key frame images are then obtained, and the features whose distance exceeds the mean by far more than the standard deviation are removed as noise points.
The ADLSH and feature filtering methods of the invention yield fewer and more accurately matched feature pairs.
● Two-layer matching method
From the SURF feature vectors obtained through ADLSH and feature filtering, a two-layer matching method further produces the relevance scores of the corresponding videos. During indexing and retrieval, for each SURF feature vector of the query video, the number of corresponding feature points that fall into the same bucket is recorded across feature vectors and across the different hash tables. According to the key frame identifiers, we hash again each matching SURF feature vector obtained, and find the matching key frame for each query key frame by a linear scan: a key frame whose number of matching feature vectors exceeds a preset threshold is regarded as a matching key frame (each key frame generates about 100 SURF feature vectors; our chosen threshold is 60). This gives the detection result at the key frame level. To obtain the detection result at the video level, for each key frame f_i^q of the query video and a matching key frame f_j^c, the similarity is defined as:
sim(f_i^q, f_j^c) = (Σ_{N_d} Σ_L w_{i,j}·N_m) / N_d
where N_d is the total number of feature vectors extracted from the query key frame, and N_m is the number of feature vectors corresponding to the matching key frame in one bucket. w_{i,j} is the weight of the corresponding bucket, used to remove the influence of differing vector counts per bucket, i.e., to reduce the influence of buckets with too many vectors on the similarity; in practice it can simply be set as w_{i,j} = 1/N_bucket. From the above, the relevance score between the query video v_q and an index video v_c is defined as:
score_c = (Σ_{N_frame} sim(f_i^q, f_j^c)) / N_frame
where N_frame is the total number of key frames extracted from the query video. If the relevance score score_c of an index video with respect to the query video exceeds a threshold S_t, the index video is regarded as a duplicate video. In practical applications, the threshold S_t depends on the data set and is set as a trade-off between recall and precision.
The foregoing are only preferred embodiments of the present invention and are not intended to limit the invention; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (8)

1. A duplicate video detection method based on multi-layer semantic content representation, characterized by comprising the following steps:
Step 1: Build a feature database from the information of the index videos;
Step 2: Perform shot detection on the query video to be checked to obtain multiple query video segments; the query video is provided with a query video identifier, and each query video segment is provided with a query video segment identifier;
Step 3: Extract key frames from each query video segment to obtain multiple query key frames, each provided with a query key frame identifier;
Step 4: Process each query key frame with a feature extraction algorithm to obtain a group of query high-dimensional feature vectors, each provided with a query high-dimensional feature vector identifier;
Step 5: Hash each query high-dimensional feature vector separately to obtain a group of query feature labels;
Step 6: Associate each query feature label with its corresponding query high-dimensional feature vector identifier, query key frame identifier, query video segment identifier and query video identifier, taking these identifiers as the associated attributes of the query feature label; retrieve the query feature labels and their associated attributes in the feature database to obtain multiple groups of similar feature labels;
Step 7: According to the position information of each group of feature labels, perform feature filtering on each group of similar feature labels obtained by retrieval, obtaining a candidate feature vector set containing multiple feature vectors;
Step 8: According to the query key frame identifiers and query video segment identifiers, perform similarity matching on the feature vectors in each candidate feature vector set to obtain the duplicate video detection result;
Wherein Step 1 specifically includes the following steps:
Step 1.1: Perform shot detection on the index video to obtain multiple video segments; each video segment is provided with a video segment identifier, and the index video is provided with an index video identifier;
Step 1.2: Extract a key frame from each video segment to obtain multiple key frames, each provided with a key frame identifier;
Step 1.3: Process each key frame with the feature extraction algorithm to obtain a group of high-dimensional feature vectors, each provided with a high-dimensional feature vector identifier;
Step 1.4: Hash each high-dimensional feature vector separately to obtain a group of feature labels;
Step 1.5: Associate each feature label with its corresponding high-dimensional feature vector identifier, key frame identifier, video segment identifier and index video identifier, and store all the associated feature labels in the feature database.
2. The duplicate video detection method based on multi-layer semantic content representation according to claim 1, characterized in that Step 5 specifically includes the following steps:
Step 5.1: Represent each query high-dimensional feature vector with the following sign function:
sgn(p) = +1 if tr(H) ≥ 0, and −1 otherwise,
where p is a 64-dimensional feature vector and the Hessian matrix H is an intermediate result of the feature extraction algorithm;
Step 5.2: The hash function of each query high-dimensional feature vector is expressed as:
h_{a,b}(p) = ⌊(a·p + b)/W⌋,
where a is a 64-dimensional random vector whose entries are drawn independently from a 2-stable distribution, b is a real number drawn uniformly from [0, W], and the optimal value of the parameter W is chosen as 4 or 8;
Step 5.3: Map each 64-dimensional query high-dimensional feature vector p to L buckets of L hash tables:
g_j(p), j = 1, ..., L
The label of each bucket is a k-dimensional vector corresponding to k randomly chosen hash functions:
g_j(p) = (|h_{1,j}(p)|, ..., |h_{k,j}(p)|)
Step 5.4: Randomly select m pairs from the query high-dimensional feature vectors extracted from the query video; the average collision probability between the m pairs of query high-dimensional feature vectors is:
Ep(c) = p(c_e),
where c_e is the average Euclidean distance of the m sampled pairs;
The number of query high-dimensional feature vectors in each bucket is:
N_bucket = Σ_n p(c_e)^k ≈ n·Ep(c)^k,
subject to N_bucket·L = n·L·Ep(c)^k ≤ Ratio·n, where Ratio is 0.1%;
L is expressed as a function of k:
L(k) = ⌈ln δ / ln(1 − p(1)^k)⌉,
and solving yields the unique optimal values of k and L.
3. The duplicate video detection method based on multi-layer semantic content representation according to claim 1, characterized in that Step 7 specifically includes the following steps:
Step 7.1: During query key frame extraction, store the intermediate results as the position information of each feature point;
Step 7.2: Treat each query feature label obtained by hashing as one feature point, and compute the relative distance between every two feature points in two-dimensional space from the position information of each feature point;
Step 7.3: Classify and count the distances according to the query key frame identifiers to obtain the mean and standard deviation of the relative distances of all feature points in two corresponding key frame images;
Step 7.4: Remove as noise points the feature points whose relative distance exceeds the mean by far more than the standard deviation.
4. The duplicate video detection method based on multi-layer semantic content representation according to claim 1, characterized in that Step 8 specifically includes the following steps:
Step 8.1: According to the query key frame identifiers, hash each matching feature vector in each candidate feature vector set again, and find the key frames matching the query key frames by linear scanning: a key frame whose number of matching feature vectors exceeds a preset threshold is a matching key frame;
Step 8.2: For each key frame f_i^q identified by a query video segment identifier and a matching key frame f_j^c, the similarity is:
sim(f_i^q, f_j^c) = (Σ_{N_d} Σ_L w_{i,j}·N_m) / N_d
where N_m is the number of feature vectors corresponding to the matching key frame in one bucket, w_{i,j} is the weight of the corresponding bucket, specifically w_{i,j} = 1/N_bucket, and N_bucket is the number of query high-dimensional feature vectors in each bucket;
Step 8.3: The relevance score between the query video v_q and an index video v_c is:
score_c = (Σ_{N_frame} sim(f_i^q, f_j^c)) / N_frame
where N_frame is the total number of query key frames extracted from the query video; if the relevance score score_c of an index video with respect to the query video exceeds a preset threshold S_t, the index video is regarded as a duplicate video.
5. A duplicate video detection device based on multi-layer semantic content representation, characterized by including a database building module (1), a shot detection module (2), a key frame extraction module (3), a feature extraction module (4), a hashing module (5), an association module (6), a feature filtering module (7) and a similarity matching module (8);
The database building module (1) is used for building the feature database from the information of the index videos;
The shot detection module (2) is used for performing shot detection on the query video to be checked, obtaining multiple query video segments; each query video segment is provided with a query video segment identifier, and the query video is provided with a query video identifier;
The key frame extraction module (3) is used for extracting key frames from each query video segment, obtaining multiple query key frames, each provided with a query key frame identifier;
The feature extraction module (4) is used for processing each query key frame with the feature extraction algorithm, obtaining a group of query high-dimensional feature vectors, each provided with a query high-dimensional feature vector identifier;
The hashing module (5) is used for hashing each query high-dimensional feature vector separately, obtaining a group of query feature labels;
The association module (6) is used for associating each query feature label with its corresponding query high-dimensional feature vector identifier, query key frame identifier, query video segment identifier and query video identifier, taking these identifiers as the associated attributes of the query feature label, and retrieving the query feature labels and their associated attributes in the feature database to obtain multiple groups of similar feature labels;
The feature filtering module (7) is used for performing feature filtering, according to the position information of each group of feature labels, on each group of similar feature labels obtained by retrieval, obtaining a candidate feature vector set containing multiple feature vectors;
The similarity matching module (8) is used for performing similarity matching, according to the query key frame identifiers and query video segment identifiers, on the feature vectors in each candidate feature vector set, obtaining the duplicate video detection result;
Wherein the database building module (1) further includes a detection submodule (1-1), a key frame extraction submodule (1-2), a feature extraction submodule (1-3), a hashing submodule (1-4) and an association submodule (1-5);
The detection submodule (1-1) is used for performing shot detection on the index video, obtaining multiple video segments; each video segment is provided with a video segment identifier, and the index video is provided with an index video identifier;
The key frame extraction submodule (1-2) is used for extracting a key frame from each video segment, obtaining multiple key frames, each provided with a key frame identifier;
The feature extraction submodule (1-3) is used for processing each key frame with the feature extraction algorithm, obtaining a group of high-dimensional feature vectors, each provided with a high-dimensional feature vector identifier;
The hashing submodule (1-4) is used for hashing each high-dimensional feature vector separately, obtaining a group of feature labels;
The association submodule (1-5) is used for associating each feature label with its corresponding high-dimensional feature vector identifier, key frame identifier, video segment identifier and index video identifier, and storing all the associated feature labels in the feature database.
6. The duplicate video detection device based on multilayer semantic content representation according to claim 5, characterized in that: the hash processing module (5) further comprises a high-dimensional vector submodule (5-1), a hash function submodule (5-2), a mapping submodule (5-3) and an extraction submodule (5-4);
The high-dimensional vector submodule (5-1) is configured to represent each query high-dimensional feature vector using the following sign function:
wherein p is a 64-dimensional high-dimensional feature vector, and the Hessian matrix is an intermediate result produced by the feature extraction algorithm;
The hash function submodule (5-2) is configured to express the hash function of each query high-dimensional feature vector as follows:
wherein a is a 64-dimensional random vector whose components are drawn independently from a 2-stable distribution, b is a real number chosen uniformly at random from [0, W], and the parameter W is set to 4 or 8 as the optimal value;
The mapping submodule (5-3) is configured to map each 64-dimensional query high-dimensional feature vector p into the L buckets of L hash tables:
g_j(p), j = 1, ..., L
The label of each bucket is a k-dimensional vector corresponding to k randomly selected hash functions:
g_j(p) = (|h_{1,j}(p)|, ..., |h_{k,j}(p)|)
The extraction submodule (5-4) is configured to randomly select m pairs of query high-dimensional feature vectors from those extracted from the query video, and to estimate from them the mean collision probability between query high-dimensional feature vectors:
E_p(c) = p(c_e),
The number of query high-dimensional feature vectors in each bucket is:
N_bucket = Σ_n p(c_e)^k ≈ n · E_p(c)^k
wherein N_bucket · L = n · L · E_p(c)^k ≤ Ratio, with Ratio being 0.1%;
L is expressed as a function of k:
Solving this yields the unique optimal values of k and L.
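The bucket mapping of claim 6 can be illustrated with the standard 2-stable LSH construction that the claim text describes. Because the patent's formula images are not reproduced in this text, the concrete hash form floor((a·p + b)/W), and the example values of k and L below, are assumptions for illustration rather than the patented formula.

```python
import numpy as np

rng = np.random.default_rng(0)

D, W = 64, 4     # feature dimension; bucket width W (the patent uses 4 or 8)
k, L = 8, 10     # hashes per table and number of tables (illustrative values)

def make_hash():
    """One 2-stable hash: h(p) = floor((a . p + b) / W)."""
    a = rng.normal(size=D)   # components i.i.d. from a 2-stable (Gaussian) distribution
    b = rng.uniform(0, W)    # offset drawn uniformly from [0, W]
    return lambda p: int(np.floor((a @ p + b) / W))

# L tables, each labelling its buckets with a k-dimensional vector g_j(p).
tables = [[make_hash() for _ in range(k)] for _ in range(L)]

def g(p, j):
    """Bucket label of vector p in table j: (|h_1j(p)|, ..., |h_kj(p)|)."""
    return tuple(abs(h(p)) for h in tables[j])

p = rng.normal(size=D)
labels = [g(p, j) for j in range(L)]
assert len(labels) == L and all(len(lbl) == k for lbl in labels)
```

Nearby vectors tend to receive the same k-dimensional label in at least one of the L tables, which is what lets the device trade k (collision selectivity) against L (recall) when solving for the optimal pair.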
7. The duplicate video detection device based on multilayer semantic content representation according to claim 5, characterized in that: the feature filtering module (7) further comprises an intermediate storage submodule (7-1), a distance calculation submodule (7-2), a classification statistics submodule (7-3) and a removal submodule (7-4);
The intermediate storage submodule (7-1) is configured to store, during query key frame extraction, intermediate results as the position information of each feature point;
The distance calculation submodule (7-2) is configured to treat each query feature tag obtained by hash processing as a feature point, and to calculate the relative distance between every two feature points in two-dimensional space from the position information of each feature point;
The classification statistics submodule (7-3) is configured to perform classification statistics according to the query key frame identifiers, obtaining the mean and standard deviation of all feature point relative distances within two corresponding key frame images;
The removal submodule (7-4) is configured to remove as noise points those feature points whose relative distance exceeds the mean by much more than the standard deviation.
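The relative-distance filtering of claim 7 can be sketched as follows. The claim only states that points whose relative distance exceeds the mean by much more than the standard deviation are removed, so the exact outlier rule (per-point mean distance) and the sigma_factor value here are illustrative assumptions.

```python
import numpy as np

def filter_noise(points, sigma_factor=1.0):
    """Remove feature points whose relative distances are outliers.

    points: (n, 2) positions of feature points in a key frame.
    A point is dropped when its mean distance to the other points
    exceeds mean + sigma_factor * std of all pairwise distances
    (sigma_factor is an illustrative choice, not from the patent).
    """
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))            # pairwise distance matrix
    tri = dist[np.triu_indices(len(pts), k=1)]     # each pair counted once
    mean, std = tri.mean(), tri.std()
    per_point = dist.sum(1) / (len(pts) - 1)       # mean distance of each point
    return pts[per_point <= mean + sigma_factor * std]

cluster = [[0, 0], [1, 0], [0, 1], [1, 1]]
with_outlier = cluster + [[50, 50]]
assert len(filter_noise(with_outlier)) == 4        # far-away point removed
assert len(filter_noise(cluster)) == 4             # tight cluster untouched
```

Correctly matched points between two key frames cluster spatially, so a point far from all others is far more likely to be a hash collision than a true match.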
8. The duplicate video detection device based on multilayer semantic content representation according to claim 5, characterized in that: the similarity matching module (8) further comprises a traversal submodule (8-1), a similarity submodule (8-2) and a correlation submodule (8-3);
The traversal submodule (8-1) is configured to re-hash each matching feature vector in the candidate feature vector set according to each query key frame identifier, and to search for key frames matching a query key frame using a linear scan: a key frame whose number of matching feature vectors exceeds a predetermined threshold is a matching key frame;
The similarity submodule (8-2) is configured to compute, for each key frame f_i^q identified in a query video segment, the similarity to a matching key frame f_j^c as:
sim(f_i^q, f_j^c) = Σ_{N_d} Σ_L w_{i,j} · N_m
wherein N_m is the number of feature vectors that fall into the same bucket as the matching key frame, and w_{i,j} is the weight of the corresponding bucket, specifically w_{i,j} = 1/N_bucket, where N_bucket is the number of query high-dimensional feature vectors in each bucket;
The correlation submodule (8-3) is configured to compute the correlation score between a query video v^q and an index video v^c as:
score_c = ( Σ_{N_frame} sim(f_i^q, f_j^c) ) / N_frame
wherein N_frame is the total number of query key frames extracted from the query video; if the correlation score score_c between an index video and the query video exceeds a predetermined threshold S_t, that index video is reported as a duplicate video.
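The claim-8 scoring can be sketched with toy numbers. The data layout (each shared bucket represented as an (N_m, N_bucket) pair) and the threshold value S_t below are illustrative assumptions, not from the patent.

```python
def frame_similarity(shared_buckets):
    """sim(f_i^q, f_j^c): sum over shared buckets of w * N_m with w = 1/N_bucket."""
    return sum(n_m / n_bucket for n_m, n_bucket in shared_buckets)

def video_score(per_frame_similarities, n_frames):
    """score_c: per-frame similarities averaged over the N_frame query key frames."""
    return sum(per_frame_similarities) / n_frames

# Toy example: 3 query key frames, two of which match a candidate key frame.
sims = [frame_similarity([(4, 8), (2, 4)]),   # 4/8 + 2/4 = 1.0
        frame_similarity([(1, 4)]),           # 1/4 = 0.25
        0.0]                                  # no matching key frame found
score = video_score(sims, n_frames=3)         # (1.0 + 0.25 + 0.0) / 3
S_t = 0.3                                     # illustrative threshold
assert score > S_t                            # candidate reported as a duplicate
```

Averaging over all N_frame query key frames (rather than only the matched ones) penalizes candidates that overlap the query in just a few frames, which is the behaviour the threshold S_t gates on.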
CN201310611187.4A 2013-11-26 2013-11-26 Method and device for detecting repeated video based on semantic content multilayer expression Active CN103617233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310611187.4A CN103617233B (en) 2013-11-26 2013-11-26 Method and device for detecting repeated video based on semantic content multilayer expression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310611187.4A CN103617233B (en) 2013-11-26 2013-11-26 Method and device for detecting repeated video based on semantic content multilayer expression

Publications (2)

Publication Number Publication Date
CN103617233A CN103617233A (en) 2014-03-05
CN103617233B true CN103617233B (en) 2017-05-17

Family

ID=50167936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310611187.4A Active CN103617233B (en) 2013-11-26 2013-11-26 Method and device for detecting repeated video based on semantic content multilayer expression

Country Status (1)

Country Link
CN (1) CN103617233B (en)

Families Citing this family (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870574B (en) * 2014-03-18 2017-03-08 江苏物联网研究发展中心 Forming label based on the storage of H.264 ciphertext cloud video and indexing means
CN104008395B (en) * 2014-05-20 2017-06-27 中国科学技术大学 A kind of bad video intelligent detection method based on face retrieval
CN106375850B (en) * 2015-07-23 2019-09-13 无锡天脉聚源传媒科技有限公司 A kind of judgment method and device matching video
CN106933861A (en) * 2015-12-30 2017-07-07 北京大唐高鸿数据网络技术有限公司 A kind of customized across camera lens target retrieval method of supported feature
US10318581B2 (en) * 2016-04-13 2019-06-11 Google Llc Video metadata association recommendation
CN107515937B (en) * 2017-08-29 2020-10-27 千寻位置网络有限公司 Differential account classification method and system, service terminal and memory
CN107908647A (en) * 2017-10-10 2018-04-13 天津大学 A kind of scalable video search method based on digital watermarking
CN108259932B (en) * 2018-03-15 2019-10-18 华南理工大学 Robust hashing based on time-space domain polar coordinates cosine transform repeats video detecting method
CN110324660B (en) * 2018-03-29 2021-01-19 北京字节跳动网络技术有限公司 Method and device for judging repeated video
CN108520047B (en) * 2018-04-04 2021-05-14 南京信安融慧网络技术有限公司 Video characteristic information retrieval method
CN108763295B (en) * 2018-04-18 2021-04-30 复旦大学 Video approximate copy retrieval algorithm based on deep learning
CN108566562B (en) * 2018-05-02 2020-09-08 中广热点云科技有限公司 Method for finishing sample sealing by copyright video information structured arrangement
CN108769731B (en) * 2018-05-25 2021-09-24 北京奇艺世纪科技有限公司 Method and device for detecting target video clip in video and electronic equipment
CN109086830B (en) * 2018-08-14 2021-09-10 江苏大学 Typical correlation analysis near-duplicate video detection method based on sample punishment
CN109189991B (en) * 2018-08-17 2021-06-08 百度在线网络技术(北京)有限公司 Duplicate video identification method, device, terminal and computer readable storage medium
CN111382620B (en) * 2018-12-28 2023-06-09 阿里巴巴集团控股有限公司 Video tag adding method, computer storage medium and electronic device
CN110175267B (en) * 2019-06-04 2020-07-07 黑龙江省七星农场 Agricultural Internet of things control processing method based on unmanned aerial vehicle remote sensing technology
CN110377794B (en) * 2019-06-12 2022-04-01 杭州当虹科技股份有限公司 Video feature description and duplicate removal retrieval processing method
CN110443007B (en) * 2019-07-02 2021-07-30 北京瑞卓喜投科技发展有限公司 Multimedia data tracing detection method, device and equipment
CN110490250A (en) * 2019-08-19 2019-11-22 广州虎牙科技有限公司 A kind of acquisition methods and device of artificial intelligence training set
CN110796088B (en) * 2019-10-30 2023-07-04 行吟信息科技(上海)有限公司 Video similarity judging method and device
CN110866563B (en) * 2019-11-20 2022-04-29 咪咕文化科技有限公司 Similar video detection and recommendation method, electronic device and storage medium
CN111294613A (en) * 2020-02-20 2020-06-16 北京奇艺世纪科技有限公司 Video processing method, client and server
CN111368552B (en) * 2020-02-26 2023-09-26 北京市公安局 Specific-field-oriented network user group division method and device
CN111723692B (en) * 2020-06-03 2022-08-09 西安交通大学 Near-repetitive video detection method based on label features of convolutional neural network semantic classification
CN111696105B (en) * 2020-06-24 2023-05-23 北京金山云网络技术有限公司 Video processing method and device and electronic equipment
CN112235599B (en) * 2020-10-14 2022-05-27 广州欢网科技有限责任公司 Video processing method and system
CN112839257B (en) * 2020-12-31 2023-05-09 四川金熊猫新媒体有限公司 Video content detection method, device, server and storage medium
CN112989114B (en) * 2021-02-04 2023-08-29 有米科技股份有限公司 Video information generation method and device applied to video screening
CN113361313A (en) * 2021-02-20 2021-09-07 温州大学 Video retrieval method based on multi-label relation of correlation analysis
CN113065025A (en) * 2021-03-31 2021-07-02 厦门美图之家科技有限公司 Video duplicate checking method, device, equipment and storage medium
CN113779303B (en) * 2021-11-12 2022-02-25 腾讯科技(深圳)有限公司 Video set indexing method and device, storage medium and electronic equipment
WO2024065692A1 (en) * 2022-09-30 2024-04-04 华为技术有限公司 Vector retrieval method and device
CN116188815A (en) * 2022-12-12 2023-05-30 北京数美时代科技有限公司 Video similarity detection method, system, storage medium and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101159834B (en) * 2007-10-25 2012-01-11 中国科学院计算技术研究所 Method and system for detecting repeatable video and audio program fragment
CN103077203A (en) * 2012-12-28 2013-05-01 青岛爱维互动信息技术有限公司 Method for detecting repetitive audio/video clips

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A fast detection algorithm for duplicate videos; Liu Dawei et al.; Journal of Chinese Computer Systems; 2013-06-15; Vol. 34, No. 06; pp. 1400-1404 *
Fast retrieval and feedback learning for massive videos supporting multilayer representation; Liu Dawei; China Doctoral Dissertations Full-text Database, Information Science and Technology (Monthly); 2013-01-15; Vol. 2013, No. 01; I138-89 *

Also Published As

Publication number Publication date
CN103617233A (en) 2014-03-05

Similar Documents

Publication Publication Date Title
CN103617233B (en) Method and device for detecting repeated video based on semantic content multilayer expression
US11803591B2 (en) Method and apparatus for multi-dimensional content search and video identification
Guo et al. From general to specific: Informative scene graph generation via balance adjustment
CN107133569B (en) Monitoring video multi-granularity labeling method based on generalized multi-label learning
CN109815364A (en) A kind of massive video feature extraction, storage and search method and system
KR20170058263A (en) Methods and systems for inspecting goods
WO2015022020A1 (en) Recognition process of an object in a query image
Thyagharajan et al. Pulse coupled neural network based near-duplicate detection of images (PCNN–NDD)
CN102385592A (en) Image concept detection method and device
CN111126122A (en) Face recognition algorithm evaluation method and device
JP2010231254A (en) Image analyzing device, method of analyzing image, and program
CN115861738A (en) Category semantic information guided remote sensing target detection active sampling method
CN109086830A (en) Typical association analysis based on sample punishment closely repeats video detecting method
JP2012022419A (en) Learning data creation device, learning data creation method, and program
CN115100497A (en) Robot-based method, device, equipment and medium for routing inspection of abnormal objects in channel
CN115082781A (en) Ship image detection method and device and storage medium
CN110287369A (en) A kind of semantic-based video retrieval method and system
CN111241987B (en) Multi-target model visual tracking method based on cost-sensitive three-branch decision
CN107578069B (en) Image multi-scale automatic labeling method
CN111444816A (en) Multi-scale dense pedestrian detection method based on fast RCNN
CN112651996B (en) Target detection tracking method, device, electronic equipment and storage medium
Ovhal et al. Plagiarized image detection system based on CBIR
Zhuang et al. Cross-resolution person re-identification with deep antithetical learning
CN114168780A (en) Multimodal data processing method, electronic device, and storage medium
CN113408356A (en) Pedestrian re-identification method, device and equipment based on deep learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant