CN104166685B

CN104166685B - Method and apparatus for detecting video segments

Info

Publication number: CN104166685B
Application number: CN201410357297.7A
Authority: CN
Inventors: 方伟
Original assignee: BEIJING JETSEN TECHNOLOGY Co Ltd
Current assignee: BEIJING JETSEN TECHNOLOGY Co Ltd
Priority date: 2014-07-24
Filing date: 2014-07-24
Publication date: 2017-07-11
Anticipated expiration: 2034-07-24
Also published as: CN104166685A

Abstract

Embodiments of the invention disclose a method and apparatus for detecting video segments, belonging to the field of image processing. The method takes the currently input video file or real-time video stream as the object to be detected, uses video analysis and retrieval to detect and identify whether a video from the video library appears in the current video stream, locates the exact position of that video segment within the stream under test, and outputs the detection result. The extracted image features fuse global and local image information, which lowers the matching error rate while maintaining a high recall rate and remains robust to changes in image content. Video segments in real-time or offline television or video programs can therefore be located accurately, and detection and identification remain accurate when the video stream drops frames, contains mosaic blocks, differs in resolution or frame rate, or changes in content.

Description

A method and apparatus for detecting video segments

Technical field

The present invention relates to the field of image processing, and in particular to a method and apparatus for detecting video segments.

Background art

With the development of video recording hardware and software, more and more video is shot by professionals and ordinary people alike. In the radio, film, and television industry, the number and data volume of video materials and television program videos are very large. When managing such massive video data, quickly retrieving one or more specific videos on demand has become a common and important operation, for example finding video material, program content, or advertising segments with the same video content.

Meanwhile, with the rise of internet video websites, hundreds of thousands of videos are uploaded every day. Each video website holds a massive amount of video data, so storage overhead is very high, and storage space is wasted and cost increases if users upload similar videos. Individuals and families also keep videos that record their lives, so finding similar video segments within those videos has likewise become a practical need.

Current content-based video retrieval techniques fall into two main classes: 1) video detection based on key frames; 2) video detection based on sequence segment matching.

Key-frame-based video detection first extracts a representative set of key frames from a video to characterize that video, and then matches and retrieves videos by matching key frames. After the key frames are extracted, image features that are more compact and easier to compute are extracted from them. Image features fall into two major classes: global image features and local image features. Common global features include color features (color histogram, color moments, etc.), texture features (gray-level co-occurrence matrix, LBP, Gabor, etc.), and shape and edge features (edge histogram, shape context, etc.). Common local features include local feature detectors such as Harris, Laplace, DoG, and Hessian, and local feature descriptors such as MSER, SIFT, and ORB. Global features are fast to compute but are easily affected by image changes, whereas local image features offer illumination, rotation, translation, and scale invariance as well as partial affine invariance, but are expensive to compute and place higher demands on the machine's computing power.

Video detection based on sequence segment matching uses the temporal and spatial continuity between video frames to detect and locate videos. The method has two main parts: image feature extraction and sequence analysis. Here the image features are mainly simple global image features (such as color features, texture features, and spatial ordinal features) or object motion trajectory features. During matching, such methods perform sequence matching and analysis across multiple consecutive frames, either over the whole video or with a sliding window (for example, edit distance or the diagonal method on a constructed matrix), to determine whether a target video segment appears in the video to be detected. Their computation time is spent mainly on sequence analysis, and inter-frame temporal order is critical: once the order is disrupted, detection performance suffers. A further drawback is that they cannot accurately determine the start and end times of a video segment and perform poorly on short videos, so they are better suited to judging the overall similarity of videos. In real-time stream detection, frame loss, loss of audio information, and mosaic images are also likely to occur.

Summary of the invention

Embodiments of the present invention provide a method and apparatus for detecting video segments, which solve the problem of accurately detecting and identifying video segments in a video file or real-time video stream.

To achieve the above object, the following technical solution is adopted:

The invention discloses a method for detecting video segments, comprising the following steps:

extracting the key frames corresponding to all videos in a video library, and performing feature extraction on the key frames to obtain the local feature vectors and global feature vector corresponding to each key frame;

building an index on the global feature vectors of the key frames using a hash table, and building an index on the local features of the key frames using an inverted index, to form an index library;

extracting key frames to be tested from a video segment, retrieving and matching each key frame to be tested in the index library, and returning the candidate frame set of video segments similar to that key frame;

analyzing and merging the candidate frames corresponding to each key frame of the video segment into video streams, and determining, according to the similarity between the video segment and each video stream, whether the video segment is a video in the video library.

Further, when extracting the key frames corresponding to all videos in the video library, the block chroma histogram of the current image frame is extracted as an image feature vector, and its similarity to the image feature vector of the previous key frame is computed. If the two are dissimilar, the current image frame is added to the key frame set as a new key frame; otherwise, it is determined whether the counter value equals the sampling interval value, and if so, the current image frame is added to the key frame set as a new key frame.

Further, when performing feature extraction on a key frame, the color information of the key frame is normalized and the image is divided into blocks, statistical features are computed separately for the three channels of each image block, and the feature vectors of the three channels are concatenated to produce the global feature vector.

Further, when building the hash-table index on the global feature vectors of the key frames, the normalized global feature vector G^norm is hash-mapped, and the hash value H is:

where ω_i is the weight corresponding to each feature component.

Further, when retrieving and matching each key frame to be tested in the index library, matching is performed in the index library separately on the local features and on the global features of the key frame to be tested, and the intersection of the resulting video key frames is taken as the candidate frame set.

Further, when analyzing and merging the candidate frames corresponding to each key frame of the video segment into video streams:

the candidate frames that belong to the same video number id and satisfy the adjacent temporal relationship are merged into one candidate sequence V^id;

the similarity VS^id between the video segment Q = {…, F^Q, …} and each candidate sequence V^id is computed:

where Q and V^id denote the video segment and the candidate sequence respectively, MF is the number of key frames successfully matched between Q and V^id, FS is the similarity between each pair of successfully matched video key frames, F^Q is a key frame in the video segment, and each candidate frame is a frame of the candidate sequence;

the similarity set between the video segment Q and all candidate sequences is thereby obtained: VS = {…, VS(Q, V^id), …};

the VS(Q, V^id) with the maximum similarity in the similarity set is taken as the detection result for the video segment Q, and if this similarity VS(Q, V^id) to the candidate sequence is greater than a matching threshold, the video segment Q is identified as the video V^id in the video library.

The invention also discloses an apparatus for detecting video segments, comprising the following modules:

an extraction module, configured to extract the key frames corresponding to all videos in a video library, and to perform feature extraction on the key frames to obtain the local feature vectors and global feature vector corresponding to each key frame;

an index module, configured to build an index on the global feature vectors of the key frames using a hash table, and to build an index on the local features of the key frames using an inverted index, to form an index library;

a matching module, configured to extract key frames to be tested from a video segment, retrieve and match each key frame to be tested in the index library, and return the candidate frame set of video segments similar to that key frame;

an analysis module, configured to analyze and merge the candidate frames corresponding to each key frame of the video segment into video streams, and to determine, according to the similarity between the video segment and each video stream, whether the video segment is a video in the video library.

Further, the extraction module is specifically configured to extract the block chroma histogram of the current image frame as an image feature vector and compute its similarity to the image feature vector of the previous key frame; if the two are dissimilar, to add the current image frame to the key frame set as a new key frame; otherwise, to determine whether the counter value equals the sampling interval value and, if so, to add the current image frame to the key frame set as a new key frame.

Further, the extraction module is specifically configured to normalize the color information of a key frame and divide the image into blocks, compute statistical features separately for the three channels of each image block, and concatenate the feature vectors of the three channels to produce the global feature vector.

Further, the index module is specifically configured to hash-map the normalized global feature vector G^norm, and the hash value H is:

where ω_i is the weight corresponding to each feature component.

Further, the matching module is specifically configured to perform matching in the index library separately on the local features and on the global features of the key frame to be tested, and to take the intersection of the resulting video key frames as the candidate frame set.

Further, the analysis module comprises:

a merging unit, configured to merge the candidate frames that belong to the same video number id and satisfy the adjacent temporal relationship into one candidate sequence V^id;

a similarity calculation unit, configured to compute the similarity VS^id between the video segment Q = {…, F^Q, …} and each candidate sequence V^id:

where Q and V^id denote the video segment and the candidate sequence respectively, MF is the number of key frames successfully matched between Q and V^id, FS is the similarity between each pair of successfully matched video key frames, F^Q is a key frame in the video segment, and each candidate frame is a frame of the candidate sequence;

an evaluation unit, configured to take the VS(Q, V^id) with the maximum similarity in the similarity set as the detection result for the video segment Q, and, if this similarity VS(Q, V^id) to the candidate sequence is greater than a matching threshold, to identify the video segment Q as the video V^id in the video library.

The method and apparatus for detecting video segments disclosed by the invention take the currently input video file or real-time video stream as the object to be detected, use video analysis and retrieval to detect and identify whether a video from the video library appears in the current video stream, locate the exact position of that video segment within the stream under test, and output the detection result. The image features extracted by the invention fuse the global and local information of the image, which lowers the matching error rate while maintaining a high recall rate and remains robust to changes in image content. Video segments in real-time or offline television or video programs can thus be located accurately, and accurate detection and identification are maintained when the video stream drops frames, contains mosaic blocks, differs in resolution or frame rate, or changes in content (brightness changes or object motion within the frame).

Brief description of the drawings

Fig. 1 is a flowchart of a method for detecting video segments provided by Embodiment 1 of the present invention;

Fig. 2 is a schematic diagram of the hash table in the method for detecting video segments provided by Embodiment 1 of the present invention;

Fig. 3 is a schematic diagram of the inverted index in the method for detecting video segments provided by Embodiment 1 of the present invention;

Fig. 4 is a module structure diagram of an apparatus for detecting video segments provided by Embodiment 1 of the present invention.

Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings.

The present invention uses content-based video analysis and retrieval to detect and identify video segments in television and video programs. The currently input video file or real-time video stream is taken as the object to be detected; video analysis and retrieval determine whether a video from the video library appears in the current video stream, the exact position of that video segment within the stream under test is located, and the detection result is output. A video segment here may be a sub-segment of a video or a complete video.

A method for detecting video segments according to the present invention, as shown in Fig. 1, comprises the following steps:

Step 101: Extract the key frames corresponding to all videos in the video library, and perform feature extraction on the key frames to obtain the local feature vectors and global feature vector corresponding to each key frame;

When a specific video segment appears in a video file or video stream, there is some image difference between the content of its start and end frames and the content of the adjacent frames of the stream. Exploiting this property, and to ensure accurate matching and localization of video segments, this embodiment extracts the key frames corresponding to all videos in the video library.

When extracting the key frames corresponding to all videos in the video library, the block chroma histogram of the current image frame is extracted as an image feature vector, and its similarity to the image feature vector of the previous key frame is computed. If the two are dissimilar, the current image frame is added to the key frame set as a new key frame; otherwise, it is determined whether the counter value equals the sampling interval value, and if so, the current image frame is added to the key frame set as a new key frame. Specifically:

A. Extract the block chroma histogram of the current image frame as its image feature vector;

B. Compute the similarity between this feature vector and the image feature vector of the previous key frame based on cosine distance (other distance functions may of course be used). If the two are judged dissimilar, the current image frame is treated as a new key frame and added to the key frame set; otherwise proceed to the next step;

C. Increment the key frame sampling interval counter by 1 and check whether the counter value equals the key frame sampling interval value. If the two are equal, the frame is treated as a key frame and added to the key frame set;
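Steps A to C might be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the per-frame feature vectors (for example block chroma histograms) are taken as already computed, the first frame is treated as a key frame, the counter is reset whenever a key frame is added, and the names `select_key_frames`, `sim_threshold`, and `sampling_interval` are hypothetical and do not appear in the patent.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_key_frames(features, sim_threshold=0.9, sampling_interval=25):
    """Return the indices of the frames chosen as key frames.

    features: one feature vector per frame (step A is assumed done upstream).
    sim_threshold, sampling_interval: hypothetical tuning values.
    """
    key_frames = []
    last_key_feature = None
    counter = 0
    for index, feature in enumerate(features):
        # Step B: a frame dissimilar to the previous key frame is a new key frame.
        # Assumption: the very first frame is always taken as a key frame.
        if (last_key_feature is None
                or cosine_similarity(feature, last_key_feature) < sim_threshold):
            key_frames.append(index)
            last_key_feature = feature
            counter = 0  # assumption: restart the interval after each key frame
            continue
        # Step C: similar frame, so advance the sampling-interval counter.
        counter += 1
        if counter == sampling_interval:
            key_frames.append(index)
            last_key_feature = feature
            counter = 0
    return key_frames
```

With five identical frames followed by three different ones and a sampling interval of 3, the sketch selects the first frame, the frame that completes the interval, and the first frame after the content change.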

Extracting key frames for each video in this way captures the start and end frames of a video segment within television and video programs and is unaffected by the segment's duration and frame rate: whether the video is long or a short clip, enough meaningful key frames can be extracted.

After the key frames are extracted, image feature vectors that characterize the image properties and information of these key frames must be extracted. Feature extraction on the key frames is therefore tailored to how video segments appear in television and video programs. Specifically:

1. Normalize the image size;

2. Extract local image features to describe the local image information of the key frame: first use the Difference-of-Gaussian local feature detector to obtain a set of feature points, then describe the image neighborhood of each feature point with a 128-dimensional SIFT feature vector;

3. Extract global image features to describe the global image information of the key frame: first normalize the color information of the image and divide the image into blocks; then compute statistical features, including the mean and standard deviation, separately for the three channels of each image block; finally concatenate the feature vectors of the three channels to produce the final global image feature vector;

In this embodiment, the extracted image features fuse the global and local information of the image, making them robust to changes in image content.
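The global feature of item 3 could be computed roughly as in the following Python sketch. It assumes the image has already been size- and color-normalized and is given as rows of three-channel pixel tuples; the function name `global_feature` and the `grid` parameter (number of blocks per side) are hypothetical, and the patent does not state the block layout.

```python
import math


def global_feature(image, grid=2):
    """Concatenate per-channel block statistics into one global feature vector.

    image: list of rows, each row a list of (c0, c1, c2) pixel tuples,
           assumed already normalized in size and color.
    grid:  hypothetical number of blocks per side; pixels beyond
           grid * block size are ignored when the size is not divisible.
    """
    height, width = len(image), len(image[0])
    block_h, block_w = height // grid, width // grid
    feature = []
    for channel in range(3):  # one sub-vector per channel, then concatenated
        for by in range(grid):
            for bx in range(grid):
                values = [image[y][x][channel]
                          for y in range(by * block_h, (by + 1) * block_h)
                          for x in range(bx * block_w, (bx + 1) * block_w)]
                mean = sum(values) / len(values)
                variance = sum((v - mean) ** 2 for v in values) / len(values)
                feature.extend([mean, math.sqrt(variance)])
    return feature
```

The resulting vector has 3 × grid × grid × 2 components, which is the length n of the global feature in this sketch.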

Step 102: Build an index on the global feature vectors of the key frames using a hash table, and build an index on the local features of the key frames using an inverted index, to form an index library;

After the set of feature vectors characterizing the videos is obtained, fast indexes are built on these feature vectors to speed up feature vector matching during real-time detection. Given the different characteristics of global and local feature vectors, two different indexing schemes are used to achieve fast matching of global and local image feature vectors respectively.

For global image features, this embodiment builds a fast index based on a hash table, as follows:

1) Normalize the global feature vector G = (g₁, g₂, …, g_n):

where B is a constant that controls the quantization precision.

2) Hash-map the normalized global feature vector G^norm; the hash value H is:

where ω_i is the weight corresponding to each feature component.

3) After the global image feature vectors of the key frames of all videos in the video library have been processed by the above steps, the resulting <H, G> pairs are stored in a hash table T, where H is the key and G is the value, as shown in Fig. 2.
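The hash-table index can be illustrated with the Python sketch below. The normalization and hash formulas are not reproduced in this text, so the sketch substitutes assumed stand-ins: each component is quantized as floor(g_i / B), and H is taken as the weighted sum of the quantized components with weights ω_i. Both forms, the function names, and the choice to store a frame identifier alongside G are assumptions made purely for illustration.

```python
from collections import defaultdict


def normalize(g, b=16.0):
    """Quantize each component with constant B.

    Assumption: floor(g_i / B) stands in for the patent's normalization,
    whose formula is not available here.
    """
    return tuple(int(x // b) for x in g)


def hash_key(g_norm, weights):
    """Assumed form of H: weighted sum of the quantized components."""
    return sum(w * x for w, x in zip(weights, g_norm))


def build_hash_table(frames, weights, b=16.0):
    """Build T mapping H to a list of (frame_id, G) entries.

    frames: iterable of (frame_id, global_vector) pairs; storing frame_id
            next to G is an illustrative choice so a hit can be traced back.
    """
    table = defaultdict(list)
    for frame_id, g in frames:
        table[hash_key(normalize(g, b), weights)].append((frame_id, g))
    return table
```

A larger B makes the quantization coarser, so more near-identical vectors share a key; a smaller B makes lookups stricter. Whether this matches the role B plays in the patent's own formula cannot be confirmed from this text.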

For local image features, this embodiment builds a fast index using an inverted index, as follows:

a) Cluster the 128-dimensional SIFT feature vectors extracted from all key frames to generate K cluster centers C = {C₁, …, C_i, …, C_K}, where each cluster center C_i is a 128-dimensional vector. To speed up clustering, the distances between vectors and cluster centers are computed using approximate nearest neighbor search;

b) For each key frame image, compare the distance from each of its local feature vectors (128-SIFT) to the K cluster centers, and assign the class number C_i of the nearest cluster center C_i to that local feature vector, so that every local image feature vector in the key frame image has a corresponding class number;

c) Build the inverted index I: the class numbers C_i of the K cluster centers serve as the entries of the inverted index, and the inverted list for each entry records the information of all key frames that contain that entry. Each index item in an inverted list records one occurrence of the class number C_i: the key frame containing that class number, and the position (x, y), position code (pos), scale (scale), and principal orientation (orient) of the feature point. The resulting inverted index is shown in Fig. 3.
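Steps b) and c) might look as follows in Python, assuming the cluster centers from step a) are already available. For simplicity the sketch uses exhaustive nearest-center search rather than the approximate nearest neighbor search mentioned above, and the dictionary field names (`desc`, `xy`, `pos`, `scale`, `orient`) are hypothetical labels for the quantities the text lists.

```python
from collections import defaultdict


def nearest_center(vector, centers):
    """Index of the closest cluster center (exhaustive squared-distance search)."""
    best_index, best_dist = 0, float("inf")
    for i, center in enumerate(centers):
        dist = sum((a - b) ** 2 for a, b in zip(vector, center))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def build_inverted_index(key_frames, centers):
    """Map each class number to the occurrences of that class in the key frames.

    key_frames: dict of frame_id -> list of local features, each a dict with
                'desc' (descriptor), 'xy', 'pos', 'scale', 'orient'.
    """
    index = defaultdict(list)
    for frame_id, features in key_frames.items():
        for feat in features:
            word = nearest_center(feat["desc"], centers)  # step b): quantize
            index[word].append({                          # step c): record occurrence
                "frame": frame_id,
                "xy": feat["xy"],
                "pos": feat["pos"],
                "scale": feat["scale"],
                "orient": feat["orient"],
            })
    return index
```

Real descriptors would be 128-dimensional; the function works for any dimension as long as descriptors and centers agree.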

Step 103: Extract key frames to be tested from the video segment, retrieve and match each key frame to be tested in the index library, and return the candidate frame set of video segments similar to that key frame;

Once the index library for the video library has been built, the index is used to detect, quickly and in real time, segments in the video stream Q to be detected that are similar to videos in the library, and to locate their start and end times accurately.

A key frame set is extracted from the video stream Q to be detected to characterize that stream; the key frame extraction algorithm and the image feature extraction algorithm are the same as those used in the offline index-building stage.

To support detection on live video streams, each key frame is used as a query image as soon as it is extracted: it is retrieved in the index library, image frame similarity is computed and ranked, and the candidate frame set of key frames from similar video segments is returned. The retrieval process is as follows:

1. Extract the newest key frame F^Q from the current input video stream according to the key frame extraction algorithm, and extract its global image feature G^Q and local image features L^Q, where each local feature is a 128-dimensional vector, m is the number of local feature points the key frame contains, and n is the length of the global feature;

2. Quantize the local image features L^Q into V^Q, where each element of V^Q is the class number corresponding to a local feature after quantization;

3. For every class number in V^Q, take the key frames of all video segments in the corresponding inverted list as first candidates, merge the entries that refer to the same key frame, and count the number of matching class numbers each key frame contains. Sort all key frames returned by the query in descending order of this count to obtain the first ordered set Set_L;

4. Normalize the global image feature G^Q and quantize it to generate the hash key H^Q. Query the hash table T with H^Q for the related global feature vectors, and take the candidate frames corresponding to those global feature vectors as second candidates to obtain the second ordered set Set_H;

5. Take the intersection of the first ordered set Set_L and the second ordered set Set_H; the resulting candidate frame set is the candidate matching video key frame series: Set = Set_L ∩ Set_H;

6. Treat feature points in key frame F^Q and in each candidate frame of Set that share the same class number C_i as a matching point pair, where (x, y) are image coordinates. Because of quantization error, some of these matching point pairs are false matches, so they must be screened for validity:

● Check whether the position codes pos of the two points are the same; if they differ, the pair is treated as a false match and deleted, otherwise proceed to the next check;

● Check whether the principal orientations orient of the two points differ greatly; if so, the pair is treated as a false match and deleted, otherwise proceed to the next check;

● Check whether the image region scaling factors of the two points differ greatly; if so, the pair is treated as a false match and deleted;

● After all matching point pairs between key frame F^Q and the candidate frame have passed the above checks, a reduced set of matching point pairs is obtained. The coordinates of the feature points in key frame F^Q and the candidate frame are used to compute the affine transformation that most of the matching point pairs satisfy. In this embodiment, the RANSAC (RANdom SAmple Consensus) algorithm is used for this selection.

● Delete the matching point pairs that do not satisfy the affine transformation between key frame F^Q and the candidate frame, obtaining the set of matched points between key frame F^Q and the candidate frame;

7. Sort all candidate key frames in the candidate frame set in descending order of the number of point pairs successfully matched between each candidate frame and key frame F^Q.
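The first three validity checks of item 6 can be sketched in Python as below. The text does not quantify what counts as differing "greatly", so the thresholds `max_orient_diff` and `max_scale_ratio` are hypothetical, as are the function names; the RANSAC affine verification that follows these checks is not included in this sketch.

```python
import math


def orientation_diff(a, b):
    """Smallest absolute angle between two directions, in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def screen_pairs(pairs, max_orient_diff=math.pi / 6, max_scale_ratio=2.0):
    """Keep only the matching point pairs that pass all three checks.

    pairs: list of (query_point, candidate_point); each point is a dict with
           'pos', 'orient' (radians), and 'scale' (assumed positive).
    """
    kept = []
    for q, c in pairs:
        if q["pos"] != c["pos"]:
            continue  # check 1: position codes must be identical
        if orientation_diff(q["orient"], c["orient"]) > max_orient_diff:
            continue  # check 2: principal orientations must be close
        ratio = max(q["scale"], c["scale"]) / min(q["scale"], c["scale"])
        if ratio > max_scale_ratio:
            continue  # check 3: region scales must be comparable
        kept.append((q, c))
    return kept
```

Because each check is cheap, running them before the affine estimation reduces the number of pairs RANSAC has to consider.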

Through the above process, the key frames of videos similar to the current query key frame F^Q are obtained as the candidate frame set. Because fast indexes are used, key frame matching is efficient; in addition, the matching method combines global and local feature vectors, which lowers the matching error rate while maintaining a high recall rate.
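The candidate retrieval of items 3 to 5 might be sketched as follows. The index layouts are deliberately simplified and hypothetical: the inverted index maps a class number to a list of key frame identifiers, and the hash table maps a hash key to a list of key frame identifiers. Counting each shared class number once per frame and breaking ties by identifier are also assumptions.

```python
def local_candidates(query_words, inverted_index):
    """Set_L: frames ranked by how many distinct query class numbers they contain."""
    votes = {}
    for word in set(query_words):
        for frame_id in set(inverted_index.get(word, ())):
            votes[frame_id] = votes.get(frame_id, 0) + 1
    # Descending vote count; ties broken by identifier for a stable order.
    return sorted(votes, key=lambda f: (-votes[f], f))


def candidate_set(query_words, query_hash, inverted_index, hash_table):
    """Set = Set_L ∩ Set_H, keeping the ranking of Set_L."""
    set_l = local_candidates(query_words, inverted_index)
    set_h = set(hash_table.get(query_hash, ()))
    return [frame_id for frame_id in set_l if frame_id in set_h]
```

Keeping the order of Set_L in the intersection means the frames that share the most local features with the query are examined first in the later point-pair verification.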

Step 104: Analyze and merge the candidate frames corresponding to each key frame of the video segment into video streams, and determine, according to the similarity between the video segment and each video stream, whether the video segment is a video in the video library.

For the current video segment to be detected, each extracted key frame is queried as in the previous step to obtain its matching candidate frame set. On this basis, the retrieval results of the key frames in the video stream to be detected are analyzed and merged according to the inter-frame temporal relationship and the number of the library video each candidate belongs to, in order to judge whether a candidate video is successfully identified and, if so, to locate it accurately. The detailed process is as follows:

First, compute the similarity between each pair of successfully matched video key frame and candidate frame, using the following formula:

where MP is the number of point pairs successfully matched between F^Q and the candidate frame; S, R, and T are respectively the image scaling factor, image rotation angle, and image translation distance in pixels between F^Q and the candidate frame; and FS is normalized to [0, 1];

Then, merge the matching candidate frames that belong to the same video number and satisfy the adjacent temporal relationship into one candidate sequence V^id;

Compute the similarity VS^id between the video segment to be detected Q = {…, F^Q, …} and each candidate sequence V^id:

where Q and V^id denote the video segment and the candidate sequence respectively, MF is the number of key frames successfully matched between Q and V^id, FS is the similarity between each pair of successfully matched video key frames, F^Q is a key frame in the video segment, and each candidate frame is a frame of the candidate sequence.

Finally, VS is normalized to [0, 1], yielding the similarity set between the video segment Q and all candidate sequences: VS = {…, VS(Q, V^id), …}.
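The merging of candidate frames into candidate sequences could be sketched in Python as follows. The text does not define the "adjacent temporal relationship" precisely, so this sketch assumes that two matches are adjacent when their candidate key frame indices in the library video increase by at most `max_gap`; the tuple layout, the function name, and `max_gap` are all hypothetical. The similarity VS itself is not computed here because its formula is not available in this text.

```python
from collections import defaultdict


def merge_candidates(matches, max_gap=1):
    """Group matched candidate frames into candidate sequences.

    matches: list of (query_time, video_id, candidate_index, fs) tuples, one
             per successfully matched (query key frame, candidate frame) pair.
    Returns a list of (video_id, [matches...]) in order of first appearance.
    """
    by_video = defaultdict(list)
    for match in sorted(matches, key=lambda m: m[0]):  # order by query time
        by_video[match[1]].append(match)

    sequences = []
    for video_id, items in by_video.items():
        current = [items[0]]
        for prev, cur in zip(items, items[1:]):
            # Assumed adjacency rule: candidate index moves forward by <= max_gap.
            if 0 < cur[2] - prev[2] <= max_gap:
                current.append(cur)
            else:
                sequences.append((video_id, current))
                current = [cur]
        sequences.append((video_id, current))
    return sequences
```

Each returned sequence carries the per-pair FS values, so a sequence-level similarity can be derived from them once the scoring formula is chosen.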

In this embodiment, the VS(Q, V^id) with the maximum similarity in the current time segment is taken as the detection result for the video segment. If VS(Q, V^id) > T (0 < T ≤ 1; a larger T demands higher matching confidence), the match is considered reliable and the video segment Q is identified as the video V^id in the video library. The start and end frame times of Q then give the position [start, end] at which Q appears in the video stream currently being detected.

Through the above steps, video segments can be detected and identified in the currently input video file or real-time stream. When a video segment similar to one in the video library is detected in the current video stream and successfully identified, it is located accurately according to the matching result of the video frame sequence, and the start and end times of the identified video segment within the detected stream are reported together with the confidence of the recognition result.
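The final decision can be illustrated with a short Python sketch. It assumes that each candidate sequence has already been scored with a similarity VS in [0, 1] and annotated with the start and end times of the matched key frames in the stream under test; the dictionary keys, the function name `decide`, and the default threshold value are hypothetical.

```python
def decide(candidates, threshold=0.6):
    """Pick the best candidate sequence and accept it only above threshold T.

    candidates: list of dicts with 'video_id', 'vs' (in [0, 1]),
                'start', and 'end' (times in the stream under test).
    Returns a result dict, or None when nothing is confidently identified.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c["vs"])
    if best["vs"] > threshold:
        return {
            "video_id": best["video_id"],
            "span": (best["start"], best["end"]),
            "confidence": best["vs"],
        }
    return None
```

Raising the threshold trades recall for precision: fewer segments are reported, but each reported match carries higher confidence.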

The invention also discloses an apparatus for detecting video segments which, as shown in Fig. 4, comprises the following modules:

an extraction module 401, configured to extract the key frames corresponding to all videos in a video library, and to perform feature extraction on the key frames to obtain the local feature vectors and global feature vector corresponding to each key frame;

an index module 402, configured to build an index on the global feature vectors of the key frames using a hash table, and to build an index on the local features of the key frames using an inverted index, to form an index library;

a matching module 403, configured to extract key frames to be tested from a video segment, retrieve and match each key frame to be tested in the index library, and return the candidate frame set of video segments similar to that key frame;

an analysis module 404, configured to analyze and merge the candidate frames corresponding to each key frame of the video segment into video streams, and to determine, according to the similarity between the video segment and each video stream, whether the video segment is a video in the video library.

Further, the extraction module is specifically configured to perform local feature analysis on a key frame using the Difference-of-Gaussian local feature detector to obtain a set of feature points, and to describe the image neighborhood of each feature point with a 128-dimensional SIFT feature vector.

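Further, the index module is specifically configured to perform hash mapping on the normalized global feature vector $G^{Norm}$, the hash value $H$ being

$$H = \sum_{i=1}^{n} G_{i}^{Norm} \cdot \omega_{i}$$

where $\omega_{i}$ is the weight corresponding to each feature component.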
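The weighted-sum hash to which these weights belong (stated in claims 4 and 9) can be sketched in a few lines of Python. The function names `hash_value` and `bucket_of`, and the quantization of $H$ into integer buckets of width `bucket_width`, are assumptions for illustration only; the patent gives the formula for $H$ but not how hash-table keys are derived from it.

```python
import math


def hash_value(g_norm, weights):
    """H = sum_i G_i^Norm * w_i over the normalized global feature vector."""
    if len(g_norm) != len(weights):
        raise ValueError("feature vector and weights must have equal length")
    return sum(g * w for g, w in zip(g_norm, weights))


def bucket_of(h, bucket_width):
    """Assumed quantization: map the real-valued hash to an integer bucket."""
    return math.floor(h / bucket_width)
```

With such a scheme, frames whose global features are close tend to fall into the same or neighboring buckets, so a lookup might also probe adjacent buckets to tolerate small changes in brightness or content.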

Further, the analysis module includes:

a merging unit, configured to merge candidate frames $F_{id}^{R}$ that belong to the same video number id and satisfy an adjacent temporal relationship into one candidate sequence $V^{id}$;

a similarity calculation unit, configured to calculate the similarity $VS^{id}$ between the video segment $Q = \{\ldots, F^{Q}, \ldots\}$ and each candidate sequence $V^{id}$:

$$VS^{id}(Q, V^{id}) = \log(MF) \cdot \sum FS(F^{Q}, F_{id}^{R})$$

an evaluation unit, configured to take the $VS(Q, V^{id})$ with the maximum similarity in the similarity set as the detection result of the video segment $Q$; if this similarity $VS(Q, V^{id})$ with the candidate sequence is greater than the matching threshold, the video segment $Q$ is identified as the video $V^{id}$ in the video library.
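The three units of the analysis module can be sketched together as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: `matches` is a hypothetical flat list of `(video_id, frame_index, fs)` tuples for successfully matched candidate frames, "adjacent temporal relationship" is approximated by a maximum index gap `max_gap`, and the logarithm is taken as the natural logarithm because the patent does not state a base.

```python
import math
from collections import defaultdict


def merge_candidates(matches, max_gap=1):
    """Merging unit: group frames of the same video id that are adjacent in time."""
    by_video = defaultdict(list)
    for video_id, frame_index, fs in matches:
        by_video[video_id].append((frame_index, fs))
    sequences = []
    for video_id, frames in by_video.items():
        frames.sort()
        current = [frames[0]]
        for frame in frames[1:]:
            if frame[0] - current[-1][0] <= max_gap:
                current.append(frame)
            else:
                # Gap too large: close this sequence and start a new one.
                sequences.append((video_id, current))
                current = [frame]
        sequences.append((video_id, current))
    return sequences


def sequence_similarity(frames):
    """Similarity unit: VS = log(MF) * sum of FS over matched key frame pairs."""
    mf = len(frames)
    return math.log(mf) * sum(fs for _, fs in frames)


def detect(matches, threshold, max_gap=1):
    """Evaluation unit: best sequence, accepted only above the matching threshold.

    Returns (video_id, vs, first_frame_index, last_frame_index) or None.
    """
    best = None
    for video_id, frames in merge_candidates(matches, max_gap):
        vs = sequence_similarity(frames)
        if best is None or vs > best[1]:
            best = (video_id, vs, frames[0][0], frames[-1][0])
    if best is not None and best[1] > threshold:
        return best
    return None
```

Because $\log(1) = 0$, a candidate sequence with a single matched key frame scores zero under this formula, so isolated matches cannot exceed any positive matching threshold; the first and last frame indices of the winning sequence give the located position of the segment.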

The method and device for detecting a video segment disclosed by the invention take the currently input video file or real-time video stream as the object to be detected, use video analysis and retrieval to detect and recognize whether a video from the video library appears in the current video stream, locate the exact position of that video segment within the stream under test, and output the detection result. The features extracted by the invention combine the global and local information of the image, which reduces the matching error rate while maintaining a high recall rate and remains robust to changes in image content. Video segments in real-time or offline television or video programs can therefore be located accurately, and accurate detection and recognition of video segments is maintained when the video stream suffers frame loss, mosaic blocks appear, video resolution and frame rate differ, or video content changes (changes in frame brightness, object motion).

The above is only a specific embodiment of the invention, but the protection scope of the invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the protection scope of the invention. Therefore, the protection scope of the invention shall be defined by the protection scope of the claims.

Claims

1. A method for detecting a video segment, characterized by comprising the following steps:

extracting the key frames corresponding to all videos in a video library, and performing feature extraction on the key frames to obtain the local feature vectors and global feature vectors corresponding to the key frames;

building an index over the global feature vectors of the key frames using a hash table, and building an index over the local features of the key frames using an inverted index, thereby forming an index database;

extracting key frames to be detected from a video segment, retrieving and matching each key frame to be detected in the index database, and returning the candidate frame set of video segments similar to the key frame to be detected;

analyzing and merging the candidate frames corresponding to each key frame in the video segment into video streams, and judging, according to the similarity between the video segment and the video stream, whether the video segment is a video in the video library, including:

$$VS^{id}(Q, V^{id}) = \log(MF) \cdot \sum FS(F^{Q}, F_{id}^{R})$$

where $Q$ and $V^{id}$ denote the video segment and the candidate sequence respectively, $MF$ is the number of key frames successfully matched between $Q$ and $V^{id}$, $FS(F^{Q}, F_{id}^{R})$ is the similarity between each successfully matched pair of video key frames, $F^{Q}$ is a key frame in the video segment, and $F_{id}^{R}$ is a candidate frame in the candidate sequence;

taking the $VS(Q, V^{id})$ with the maximum similarity in the similarity set as the detection result of the video segment $Q$; if this similarity $VS(Q, V^{id})$ with the candidate sequence is greater than the matching threshold, the video segment $Q$ is identified as the video $V^{id}$ in the video library.

2. The method according to claim 1, characterized in that: when extracting the key frames corresponding to all videos in the video library, the block-wise chroma histogram of the current image frame is extracted as an image feature vector and its similarity to the image feature vector of the previous key frame is calculated; if they are dissimilar, the current image frame is added to the key frame set as a new key frame; otherwise the key frame sampling interval counter is incremented by 1 and it is judged whether the counter value equals the sampling interval value, and if so the current image frame is added to the key frame set as a new key frame.

3. The method according to claim 1, characterized in that: when performing feature extraction on a key frame, the color information of the key frame is normalized and then divided into blocks, statistical features are calculated separately for the three channels of each image block, and the feature vectors of the three channels are concatenated to produce the global feature vector.

4. The method according to claim 1, characterized in that: when building the index over the global feature vectors of the key frames using a hash table, hash mapping is performed on the normalized global feature vector $G^{Norm}$, the hash value $H$ being

$$H = \sum_{i=1}^{n} G_{i}^{Norm} \cdot \omega_{i}$$

where $\omega_{i}$ is the weight corresponding to each feature component.

5. The method according to claim 1, characterized in that: when each key frame to be detected is retrieved and matched in the index database, matching is performed in the index database according to the local features and the global features of the key frame to be detected respectively, and the intersection of the resulting video key frames is taken as the candidate frame set.

6. A device for detecting a video segment, characterized by comprising the following modules:

an extraction module, configured to extract the key frames corresponding to all videos in a video library and to perform feature extraction on the key frames, obtaining the local feature vectors and global feature vectors corresponding to the key frames;

an index module, configured to build an index over the global feature vectors of the key frames using a hash table, and to build an index over the local features of the key frames using an inverted index, thereby forming an index database;

a matching module, configured to extract key frames to be detected from a video segment, to retrieve and match each key frame to be detected in the index database, and to return the candidate frame set of video segments similar to the key frame to be detected;

an analysis module, configured to analyze and merge the candidate frames corresponding to each key frame in the video segment into video streams, and to judge, according to the similarity between the video segment and the video stream, whether the video segment is a video in the video library, including:

a similarity calculation unit, configured to calculate the similarity $VS^{id}$ between the video segment $Q = \{\ldots, F^{Q}, \ldots\}$ and each candidate sequence $V^{id}$:

$$VS^{id}(Q, V^{id}) = \log(MF) \cdot \sum FS(F^{Q}, F_{id}^{R})$$

where $Q$ and $V^{id}$ denote the video segment and the candidate sequence respectively, $MF$ is the number of key frames successfully matched between $Q$ and $V^{id}$, $FS(F^{Q}, F_{id}^{R})$ is the similarity between each successfully matched pair of video key frames, $F^{Q}$ is a key frame in the video segment, and $F_{id}^{R}$ is a candidate frame in the candidate sequence;

an evaluation unit, configured to take the $VS(Q, V^{id})$ with the maximum similarity in the similarity set as the detection result of the video segment $Q$; if this similarity $VS(Q, V^{id})$ with the candidate sequence is greater than the matching threshold, the video segment $Q$ is identified as the video $V^{id}$ in the video library.

7. The device according to claim 6, characterized in that: the extraction module is specifically configured to extract the block-wise chroma histogram of the current image frame as an image feature vector and to calculate its similarity to the image feature vector of the previous key frame; if they are dissimilar, the current image frame is added to the key frame set as a new key frame; otherwise the key frame sampling interval counter is incremented by 1 and it is judged whether the counter value equals the sampling interval value, and if so the current image frame is added to the key frame set as a new key frame.

8. The device according to claim 6, characterized in that: the extraction module is specifically configured to normalize the color information of the key frame and then divide it into blocks, to calculate statistical features separately for the three channels of each image block, and to concatenate the feature vectors of the three channels to produce the global feature vector.

9. The device according to claim 6, characterized in that: the index module is specifically configured to perform hash mapping on the normalized global feature vector $G^{Norm}$, the hash value $H$ being

$$H = \sum_{i=1}^{n} G_{i}^{Norm} \cdot \omega_{i}$$

where $\omega_{i}$ is the weight corresponding to each feature component.

10. The device according to claim 6, characterized in that: the matching module is specifically configured to perform matching in the index database according to the local features and the global features of the key frame to be detected respectively, and to take the intersection of the resulting video key frames as the candidate frame set.