CN103077203A - Method for detecting repetitive audio/video clips - Google Patents

Method for detecting repetitive audio/video clips

Info

Publication number
CN103077203A
Authority
CN
China
Prior art keywords
video
clip
audio
repetitive
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012105802599A
Other languages
Chinese (zh)
Inventor
李伟忠
杨磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
QINGDAO AIWEI INTERACTIVE INFORMATION TECHNOLOGY Co Ltd
Original Assignee
QINGDAO AIWEI INTERACTIVE INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by QINGDAO AIWEI INTERACTIVE INFORMATION TECHNOLOGY Co Ltd filed Critical QINGDAO AIWEI INTERACTIVE INFORMATION TECHNOLOGY Co Ltd
Priority to CN2012105802599A priority Critical patent/CN103077203A/en
Publication of CN103077203A publication Critical patent/CN103077203A/en
Pending legal-status Critical Current

Landscapes

  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention discloses a method for detecting repetitive audio/video clips. The method comprises the following steps: A, segmenting the video stream in an audio/video clip to be detected to obtain video stream fragments, and extracting video features from each video stream fragment; B, detecting the video sequences of repetitive audio/video clips that match the video stream fragments, thereby obtaining the repetitive audio/video clips in the clip to be detected; C, accurately locating the start and end time points of each repetitive audio/video clip by means of fine-grained video matching; and D, merging the matched pairs of the repetitive clips accurately located in step C by means of a sequence-based method. With this method, specific audio/video clips in audio/video content can be detected accurately and efficiently.

Description

Method for detecting repetitive audio/video clips
Technical field
The present invention relates to the technical field of audio/video clip detection, and in particular to a method for detecting repetitive audio/video clips.
Background technology
Automatic audio/video detection uses a computer to automatically detect a specific audio/video clip in an audio/video stream and to locate the position of that clip accurately. Common approaches at present are rule-based methods, logo-based methods and recognition-based methods. Rule-based methods manually formulate rules for certain characteristics of the audio/video content. Their drawback is that the selected features are sometimes not stable enough to represent the content, so it is difficult to build a unified detection system on top of them.
Logo-based methods detect audio/video clips using, for example, the station logo of a TV channel. However, many TV stations no longer hide the station logo when inserting specific clips (such as advertisements), and this practice is becoming increasingly common, so logo-based detection fails. Recognition-based methods require a very large and complete database of stored advertisements in advance and use this database to identify the clips embedded in TV programs; such methods cannot detect clips that are not present in the database. In addition, almost all of the above existing methods perform detection through video features. Because of the nature of video data, these methods require large amounts of data, highly complex features and slow computation.
Summary of the invention
The object of the present invention is to provide a method and system for detecting repetitive audio/video clips that can detect specific clips in audio/video content more accurately and efficiently.
To achieve this object, the method for detecting repetitive audio/video clips provided by the invention comprises the following steps:
Step A: segment the video stream in the audio/video clip to be detected to obtain video stream fragments, and extract video features from each video stream fragment;
Step B: detect the video sequences of repetitive audio/video clips that match the video stream fragments, obtaining the repetitive audio/video clips in the audio/video clip to be detected;
Step C: for the repetitive audio/video clips, use fine-grained video matching to accurately locate the start and end time points of each repetitive clip;
Step D: for the repetitive clips accurately located in step C, merge the matched pairs using a sequence-based method to obtain the complete repetitive audio/video clips.
Step A comprises the following steps:
Step A1: divide the video stream in the audio/video clip to be detected into a plurality of video stream fragments, each fragment being marked in units of one time unit;
Step A2: extract the video feature parameters from the video stream fragments.
The video feature parameters comprise one parameter, or a combination of more than one parameter, among the Mel-frequency cepstral coefficients, the zero-crossing rate and the short-time energy.
Extracting the Mel-frequency cepstral coefficients, the zero-crossing rate and the short-time energy comprises the following steps:
the data collected over 40 ms are taken as one frame, adjacent frames do not overlap, and 12 Mel-frequency cepstral coefficients, the zero-crossing rate and the short-time energy, 14 parameters in total, are extracted to form a 14-dimensional frame feature vector.
Detecting the video sequences in the audio/video stream that belong to repetitive audio/video clips matching the video stream fragments is realized by using the Euclidean distance as the distance measure for coarse-grained similarity matching.
Using the Euclidean distance as the distance measure for coarse-grained similarity matching comprises the following steps:
use the Euclidean distance as the coarse-grained distance measure to find all repetitive small audio/video fragments that match a given fragment, and stipulate that two small fragments constitute a matching fragment sequence only if their repeated portion is longer than half of the fragment.
In step D, merging the matched pairs comprises the following steps:
for every exactly matched pair of repetitive audio/video fragments detected, search for all matched pairs whose time interval from it is smaller than a preset threshold TT, connect the newly found pairs with the pairs already connected, and repeat this matching process until no further pair satisfies the condition; the start and end times of the resulting new matched pair are the start and end times of one complete repetitive audio/video clip.
The beneficial effects of the invention are as follows. The detection method exploits the repetitiveness of repetitive audio/video clips, which is their most stable characteristic, more stable than any other feature or rule, so the detection accuracy is higher. In addition, the invention uses the extracted video features to detect repetitive clips: this feature information alone is sufficient to represent and distinguish repetitive clips from normal programs, and it involves a smaller data volume, lower feature complexity, less computation and faster processing than working directly with the full video content.
Description of the drawings
Fig. 1 is a flowchart of the detection method for repetitive audio/video clips according to the present invention.
Detailed description of the embodiments
The detection method for repetitive audio/video clips according to the present invention is described in detail below with reference to the above objects. It comprises the following steps:
Step S100: segment the video stream in the audio/video clip to be detected to obtain video stream fragments, and extract video features from each video stream fragment.
Step S110: divide the video stream in the audio/video clip to be detected into a plurality of video stream fragments, each fragment being marked in units of one time unit.
Extracting the video stream from the audio/video clip to be detected and dividing it into small clips one by one is an important preparatory step for the detection.
The so-called segmentation here does not actually split a long video stream into separate small fragments of n seconds each (e.g. n = 5); instead, a mark is placed every n seconds, and the stream is then processed in units of n-second video stream fragments. The main purpose is to make feature extraction convenient and the subsequent processing more efficient.
In the present invention, a non-overlapping video stream fragment of 10 seconds is used as the elementary unit for segmenting the video stream in the audio/video clip to be detected.
Ten seconds is chosen as the detection unit because repetitive audio/video clips are generally longer than 10 seconds: there is then no need to check for repeated matches inside a single 10-second unit, while it is still guaranteed that all repeated sequences of repetitive clips can be found.
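As an illustration only (not part of the patent text), the logical segmentation described above could be sketched in Python as follows; the frame-based representation and the name unit_boundaries are assumptions made for the example:

```python
# Sketch: mark non-overlapping 10-second unit boundaries over a frame-level
# feature sequence instead of physically splitting the stream.
# Assumes 40 ms frames with a 40 ms shift, as described below.

FRAME_SHIFT_MS = 40                            # 40 ms per frame, 40 ms shift
UNIT_MS = 10_000                               # one detection unit = 10 seconds
FRAMES_PER_UNIT = UNIT_MS // FRAME_SHIFT_MS    # 250 frames per unit

def unit_boundaries(num_frames):
    """Return (start_frame, end_frame) pairs for consecutive detection units."""
    return [(start, min(start + FRAMES_PER_UNIT, num_frames))
            for start in range(0, num_frames, FRAMES_PER_UNIT)]

print(unit_boundaries(1000)[:3])   # [(0, 250), (250, 500), (500, 750)]
```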
Step S120: extract the video feature parameters from the video stream fragments.
Feature extraction means finding a suitable representation of the raw signal and extracting data that can represent the original signal.
In the present invention, the data collected over 40 ms are taken as one frame; adjacent frames do not overlap, i.e. the frame shift is also 40 ms. Twelve Mel-frequency cepstral coefficients (MFCC), the zero-crossing rate and the short-time energy, 14 parameters in total, are extracted to form a 14-dimensional frame feature vector.
1) Extraction of the Mel-frequency cepstral coefficients (MFCC).
The MFCC features are commonly used in speech recognition and speaker identification. They are obtained by filtering the Fourier-transform energy coefficients with a triangular filter bank and applying a Mel-scale transformation in the frequency domain, which better matches the characteristics of human hearing.
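As an illustration only, 12 MFCCs with a 40 ms frame length and a 40 ms shift could be computed with an off-the-shelf toolkit such as librosa. This is an assumption for the sketch, since the patent does not name any particular implementation, and the 1-D signal passed to the MFCC routine is taken from the clip's audio track because MFCCs are defined on a one-dimensional signal:

```python
import librosa

def mfcc_features(path, n_mfcc=12, frame_ms=40):
    """Compute n_mfcc MFCCs per non-overlapping 40 ms frame (illustrative sketch)."""
    y, sr = librosa.load(path, sr=None, mono=True)   # 1-D signal from the clip
    hop = int(sr * frame_ms / 1000)                  # 40 ms shift = frame length
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=hop, hop_length=hop)
    return mfcc.T                                    # shape: (num_frames, n_mfcc)
```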
2) Extraction of the zero-crossing rate (ZCR).
The zero-crossing rate is the number of times the signal value crosses zero per unit time. To some extent it reflects the average signal frequency. A "zero crossing" occurs when two adjacent samples of a discrete-time signal have different signs.

$\mathrm{ZCR} = \frac{1}{2(N-1)} \sum_{m=1}^{N-1} \left| \operatorname{sgn}[x(m+1)] - \operatorname{sgn}[x(m)] \right|$

where sgn[·] is the sign function and x(m) is the m-th sample of the signal within the frame.
3) Extraction of the short-time energy.
Energy analysis is based on the fact that the signal energy varies considerably over time; the short-time average energy reflects the signal intensity.

$E_n = \sum_{m=0}^{N-1} x_n^2(m)$

where x_n(m) is the m-th sample of the signal within the n-th frame.
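By way of illustration only, the per-frame zero-crossing rate and short-time energy defined above, together with externally supplied MFCCs, could be assembled into the 14-dimensional frame feature vectors as in the following sketch (the function names and the NumPy-array representation of frames are assumptions):

```python
import numpy as np

def zero_crossing_rate(frame):
    """ZCR = (1 / (2(N-1))) * sum over m of |sgn(x(m+1)) - sgn(x(m))|."""
    signs = np.sign(frame)
    return float(np.sum(np.abs(np.diff(signs))) / (2.0 * (len(frame) - 1)))

def short_time_energy(frame):
    """E_n = sum of squared sample values within the frame."""
    return float(np.sum(np.asarray(frame, dtype=np.float64) ** 2))

def frame_features(frames, mfcc_per_frame):
    """Stack 12 MFCCs + ZCR + short-time energy into one 14-dim vector per frame.

    frames:          sequence of per-frame sample arrays (one 40 ms frame each)
    mfcc_per_frame:  array of shape (num_frames, 12) from any MFCC implementation
    """
    zcr = np.array([zero_crossing_rate(f) for f in frames])
    energy = np.array([short_time_energy(f) for f in frames])
    return np.column_stack([mfcc_per_frame, zcr, energy])   # (num_frames, 14)
```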
Step S200: using the Euclidean distance as the distance measure for coarse-grained similarity matching, detect the video sequences of repetitive audio/video clips that match the video stream fragments, and preliminarily obtain the repetitive audio/video clips in the audio/video stream.
The most critical stage of repetitive-clip detection is to locate, quickly and accurately, the start and end positions of the repetitive clips within a large audio/video stream.
To meet this requirement, the present invention uses the extracted feature information to detect potential matching sequences. This information is chosen because it has a smaller data volume, lower feature complexity, less computation and faster processing than the full video content, while still achieving the detection effect of video-based analysis.
Each segmented 10-second video stream fragment is used as a probe to search the broadcast streams in the audio/video stream, including the stream in which the 10-second fragment itself is located as well as other streams.
Preferably, the Euclidean distance is used as the distance measure for coarse-grained similarity matching to find all repetitive small fragments that match the probe, and two small fragments are taken to form a matching fragment sequence only if their repeated portion is longer than half of the fragment.
With the Euclidean distance as the coarse-grained distance measure, the frame-level Euclidean distance D1 is computed once every 10 frames. The advantage is that the amount of computation is one tenth of that needed for frame-by-frame Euclidean distances, with almost no loss of accuracy.
$D_1 = \sum_{n=1}^{N} (a_n - b_n)^2$

where N = 14 is the dimension of the frame feature vector and a_n, b_n are the n-th feature components of the two frames being compared.
After the distance matrix is obtained, it is compared with a preset matching threshold TD. If 7 or more of the sampled frame distances are below TD (i.e. the matching span is longer than about 2.5 seconds), the fragment is considered to be the video sequence of a repetitive audio/video clip matching the video stream fragment; in other words, a similar, matching repetitive clip exists in the audio/video stream, and this repetitive clip is preliminarily obtained as a matched pair.
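A minimal sketch of this coarse-grained stage is given below for illustration; the helper name coarse_match, the threshold variable td and the aligned frame-by-frame comparison are assumptions, and the distance is the sum of squared differences exactly as written in the formula for D1 above:

```python
import numpy as np

def coarse_match(feats_a, feats_b, td, step=10, min_hits=7):
    """Coarse-grained matching of two fragments of 14-dim frame features.

    feats_a, feats_b: arrays of shape (num_frames, 14), one per fragment.
    D1 is evaluated only every `step` frames; the pair counts as a match
    when `min_hits` or more of the sampled distances fall below `td`.
    """
    n = min(len(feats_a), len(feats_b))
    hits = 0
    for i in range(0, n, step):
        d1 = float(np.sum((feats_a[i] - feats_b[i]) ** 2))   # D1 = sum (a_n - b_n)^2
        if d1 < td:
            hits += 1
    return hits >= min_hits
```

In a full pass, such a comparison would be applied to every pair of 10-second units, and the pairs reported as matches would be handed to the fine-grained stage described next.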
Step S300: for the repetitive audio/video clips preliminarily obtained in step S200, use fine-grained video matching to accurately locate their start and end time points.
The coarse-grained similarity matching has two limitations: 1) because the matching is rough, the matching boundaries deviate to some extent from the true boundaries of the repetitive clip; 2) only matching sequences longer than half of the fragment length can be detected, so the method is ineffective for matching sequences shorter than half a fragment.
For the matched pairs detected with the Euclidean distance as the coarse-grained distance measure, an improved Euclidean distance D2 is used as the distance measure for fine-grained similarity matching to re-determine the start and end time points of the repetitive clip.
$D_2 = \sum_{n=1}^{N} |a_n - b_n|$

where N = 14 is the dimension of the frame feature vector.
For each matched 10-second pair found, label the two fragments A2 and B2, and let A1, A3 and B1, B3 denote the fragments immediately before and after them. Then compute the frame-by-frame improved Euclidean distance for A2B2, A2B1, A1B2, A1B1, A2B3, A3B2 and A3B3. After the distance matrices are obtained, they are compared with the preset matching threshold TD and the positions of the points below TD are recorded. In this way the start and end times of the repetitive audio/video clip can be located accurately.
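For illustration only, the frame-by-frame fine-grained comparison of one fragment pair could be sketched as follows (the function name, the threshold variable td and the frame-feature-array representation are assumptions):

```python
import numpy as np

def fine_match_region(feats_x, feats_y, td):
    """Fine-grained matching between the frames of two fragments.

    feats_x, feats_y: arrays of shape (num_frames, 14). Builds the full D2
    distance matrix between all frames of the two fragments and records where
    D2 < td; the smallest and largest matching frame indices give the refined
    start and end frames on each side.
    """
    # D2[i, j] = sum over n of |x_i[n] - y_j[n]|
    d2 = np.abs(feats_x[:, None, :] - feats_y[None, :, :]).sum(axis=2)
    rows, cols = np.where(d2 < td)
    if rows.size == 0:
        return None
    return (int(rows.min()), int(rows.max())), (int(cols.min()), int(cols.max()))
```

In the full method this comparison is run for A2B2 and the neighbouring combinations listed above, and the recorded frame positions are converted back into absolute start and end times.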
Step S400: for the repetitive audio/video clips accurately located in step S300, merge the matched pairs using a sequence-based method to obtain the complete repetitive audio/video clips.
Since the audio/video content is divided into 10-second units, a complete repetitive clip longer than 10 seconds will be over-segmented. The over-segmented repetitive clip therefore has to be merged.
The present invention merges a complete repetitive audio/video clip using a sequence-based method.
For every exactly matched 10-second pair of repetitive fragments found in the detection, search for all matched pairs whose time interval from it is smaller than the preset threshold TT, connect the newly found pairs with the pairs already connected, and repeat this matching process until no further pair satisfies the condition; the start and end times of the resulting new matched pair are the start and end times of one complete repetitive audio/video clip.
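A minimal sketch of this sequence-based merging is given below for illustration; the tuple representation of a refined match and the names merge_matches and tt are assumptions, and the example values are taken from the worked example that follows:

```python
def merge_matches(matches, tt):
    """Chain refined matched pairs whose time gap is below the threshold TT.

    matches: list of (a_start, a_end, b_start, b_end) tuples in seconds, one
    per refined matched pair. Consecutive pairs are connected while the gap on
    both sides is smaller than tt; each chain yields one complete clip.
    """
    merged = []
    for m in sorted(matches):
        if merged:
            a_start, a_end, b_start, b_end = merged[-1]
            if m[0] - a_end < tt and m[2] - b_end < tt:
                # Extend the current chain with the newly found pair.
                merged[-1] = (a_start, max(a_end, m[1]), b_start, max(b_end, m[3]))
                continue
        merged.append(m)
    return merged

pairs = [(10, 15, 123, 128), (15, 20, 128, 133), (20, 25, 133, 138)]
print(merge_matches(pairs, tt=1.0))   # [(10, 25, 123, 138)]
```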
The implementation of the detection method for repetitive broadcast TV program fragments according to the present invention is described in detail below, taking the detection of advertisements in a 10-minute broadcast TV program as an example.
The whole process is basically divided into four stages: segmentation of the video stream and extraction of video features; detection of matched pairs of repetitive video sequences; accurate location of the start and end time points of the advertisements using fine-grained video matching; and merging of the matched pairs using the sequence-based method.
In the segmentation and feature extraction stage, the 10-minute broadcast program fragment is divided into 120 non-overlapping small fragments, and feature extraction is performed on each of the 120 fragments. The extracted features are 12 MFCCs, the zero-crossing rate and the short-time energy, with a frame length of 40 ms and a frame shift of 40 ms, forming 14-dimensional feature vectors.
Each small fragment yields 125 feature vectors of 14 dimensions.
For example, this 10-minute TV program contains two different advertisements, "new***" and "Shandong****". "new***" occurs twice, at 10-25 s (fragments 3, 4 and 5) and at 123-138 s (fragments 25, 26, 27 and 28); "Shandong****" occurs twice, at 30-50 s (fragments 7, 8, 9 and 10) and at 155-175 s (fragments 32, 33, 34 and 35).
In the stage of detecting matched pairs of repetitive video sequences, the above feature vectors are used to compute the frame-level Euclidean distance D1 between every two of the 120 small fragments:

$D_1 = \sum_{n=1}^{N} (a_n - b_n)^2$

where N = 14 is the dimension of the frame feature vector. If the identical portion of two small fragments is longer than half of the fragment length, namely 2.5 seconds, the two fragments are marked as matching. The result is that (3, 26), (4, 27) and (5, 28) are similar, and (7, 32), (8, 33), (9, 34) and (10, 35) are similar.
In the stage of accurately locating the start and end time points of the repetitive broadcast TV program fragments with fine-grained video matching, the improved Euclidean distance

$D_2 = \sum_{n=1}^{N} |a_n - b_n|$

where N = 14 is the dimension of the frame feature vector, is computed for each matched pair of small fragments found above and for the neighbouring pairs (3, 25), (2, 26), (2, 25), (5, 29), (6, 28), (6, 29), (7, 31), (6, 32), (6, 31), (10, 36), (11, 35) and (11, 36), and the start and end points of each match are marked accurately. The final result is that seconds 0-2 of fragment 3 are similar to seconds 3-5 of fragment 25, seconds 2-5 of fragment 3 to seconds 0-3 of fragment 26, seconds 0-2 of fragment 4 to seconds 3-5 of fragment 26, seconds 2-5 of fragment 4 to seconds 0-3 of fragment 27, seconds 0-2 of fragment 5 to seconds 3-5 of fragment 27, and seconds 2-5 of fragment 5 to seconds 0-3 of fragment 28, while (7, 32), (8, 33), (9, 34) and (10, 35) are completely similar.
In the stage of merging the matched pairs using the sequence-based method, the exactly matched small fragment pairs found above are merged according to their sequence numbers.
For every exactly matched fragment pair found above, merging according to the sequence-based method gives a segment starting at second 0 of fragment 3 and ending at second 5 of fragment 5 that matches a segment starting at second 3 of fragment 25 and ending at second 3 of fragment 28; both are 15 seconds long, i.e. 10-25 s matches 123-138 s. The pairs (7, 32), (8, 33), (9, 34) and (10, 35) are complete matches, i.e. 30-50 s matches 155-175 s.
In summary, the detection method for repetitive audio/video clips of the present invention segments the video stream in the audio/video clip and extracts video features; detects matched pairs of repetitive video sequences; accurately locates the start and end time points of the repetitive clips with fine-grained video matching; and merges the matched pairs with a sequence-based method. The method is therefore an application of video content analysis and retrieval: using the video features of the audio/video content, it automatically detects the repetitive clips that recur in the content and can accurately locate and mark the positions of all the repeated clips.

Claims (7)

1. A method for detecting repetitive audio/video clips, characterized in that it comprises the following steps:
Step A: segment the video stream in the audio/video clip to be detected to obtain video stream fragments, and extract video features from each video stream fragment;
Step B: detect the video sequences of repetitive audio/video clips that match the video stream fragments, obtaining the repetitive audio/video clips in the audio/video clip to be detected;
Step C: for the repetitive audio/video clips, use fine-grained video matching to accurately locate the start and end time points of each repetitive clip;
Step D: for the repetitive clips accurately located in step C, merge the matched pairs using a sequence-based method to obtain the complete repetitive audio/video clips.
2. The method for detecting repetitive audio/video clips according to claim 1, characterized in that step A comprises the following steps:
Step A1: divide the video stream in the audio/video clip to be detected into a plurality of video stream fragments, each fragment being marked in units of one time unit;
Step A2: extract the video feature parameters from the video stream fragments.
3. The method for detecting repetitive audio/video clips according to claim 2, characterized in that the video feature parameters comprise one parameter, or a combination of more than one parameter, among the Mel-frequency cepstral coefficients, the zero-crossing rate and the short-time energy.
4. The method for detecting repetitive audio/video clips according to claim 3, characterized in that extracting the Mel-frequency cepstral coefficients, the zero-crossing rate and the short-time energy comprises the following steps:
the data collected over 40 ms are taken as one frame, adjacent frames do not overlap, and 12 Mel-frequency cepstral coefficients, the zero-crossing rate and the short-time energy, 14 parameters in total, are extracted to form a 14-dimensional frame feature vector.
5. The method for detecting repetitive audio/video clips according to claim 1 or 2, characterized in that detecting the video sequences in the audio/video stream that belong to repetitive audio/video clips matching the video stream fragments is realized by using the Euclidean distance as the distance measure for coarse-grained similarity matching.
6. The method for detecting repetitive audio/video clips according to claim 5, characterized in that using the Euclidean distance as the distance measure for coarse-grained similarity matching comprises the following steps:
use the Euclidean distance as the coarse-grained distance measure to find all repetitive small audio/video fragments that match a given fragment, and stipulate that two small fragments constitute a matching fragment sequence only if their repeated portion is longer than half of the fragment.
7. The method for detecting repetitive audio/video clips according to claim 1, characterized in that, in step D, merging the matched pairs comprises the following steps:
for every exactly matched pair of repetitive audio/video fragments detected, search for all matched pairs whose time interval from it is smaller than a preset threshold TT, connect the newly found pairs with the pairs already connected, and repeat this matching process until no further pair satisfies the condition; the start and end times of the resulting new matched pair are the start and end times of one complete repetitive audio/video clip.
CN2012105802599A 2012-12-28 2012-12-28 Method for detecting repetitive audio/video clips Pending CN103077203A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012105802599A CN103077203A (en) 2012-12-28 2012-12-28 Method for detecting repetitive audio/video clips

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012105802599A CN103077203A (en) 2012-12-28 2012-12-28 Method for detecting repetitive audio/video clips

Publications (1)

Publication Number Publication Date
CN103077203A true CN103077203A (en) 2013-05-01

Family

ID=48153733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012105802599A Pending CN103077203A (en) 2012-12-28 2012-12-28 Method for detecting repetitive audio/video clips

Country Status (1)

Country Link
CN (1) CN103077203A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617233A (en) * 2013-11-26 2014-03-05 烟台中科网络技术研究所 Method and device for detecting repeated video based on semantic content multilayer expression
CN107180056A (en) * 2016-03-11 2017-09-19 阿里巴巴集团控股有限公司 The matching process and device of fragment in video
CN111246244A (en) * 2020-02-04 2020-06-05 北京贝思科技术有限公司 Method and device for rapidly analyzing and processing audio and video in cluster and electronic equipment
CN111370022A (en) * 2019-12-25 2020-07-03 厦门快商通科技股份有限公司 Audio advertisement detection method and device, electronic equipment and medium
CN116634226A (en) * 2023-06-15 2023-08-22 北京柏睿数据技术股份有限公司 Method and system for intelligent real-time processing of video stream data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060221232A1 (en) * 2003-04-30 2006-10-05 Xuwen Yu Equipment and method for detecting commercial automately
CN101159834A (en) * 2007-10-25 2008-04-09 中国科学院计算技术研究所 Method and system for detecting repeatable video and audio program fragment
CN101241552A (en) * 2008-01-24 2008-08-13 北京六维世纪网络技术有限公司 Image characteristic recognition method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060221232A1 (en) * 2003-04-30 2006-10-05 Xuwen Yu Equipment and method for detecting commercial automately
CN101159834A (en) * 2007-10-25 2008-04-09 中国科学院计算技术研究所 Method and system for detecting repeatable video and audio program fragment
CN101241552A (en) * 2008-01-24 2008-08-13 北京六维世纪网络技术有限公司 Image characteristic recognition method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴潇 (Wu Xiao) et al.: "Video copy detection algorithm based on multi-feature matching" (基于多特征匹配的视频拷贝检测算法), Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617233A (en) * 2013-11-26 2014-03-05 烟台中科网络技术研究所 Method and device for detecting repeated video based on semantic content multilayer expression
CN107180056A (en) * 2016-03-11 2017-09-19 阿里巴巴集团控股有限公司 The matching process and device of fragment in video
CN107180056B (en) * 2016-03-11 2020-11-06 阿里巴巴集团控股有限公司 Method and device for matching segments in video
CN111370022A (en) * 2019-12-25 2020-07-03 厦门快商通科技股份有限公司 Audio advertisement detection method and device, electronic equipment and medium
CN111246244A (en) * 2020-02-04 2020-06-05 北京贝思科技术有限公司 Method and device for rapidly analyzing and processing audio and video in cluster and electronic equipment
CN116634226A (en) * 2023-06-15 2023-08-22 北京柏睿数据技术股份有限公司 Method and system for intelligent real-time processing of video stream data

Similar Documents

Publication Publication Date Title
CN101159834B (en) Method and system for detecting repeatable video and audio program fragment
CN100580693C (en) Advertisement detecting and recognizing method and system
CN107393554B (en) Feature extraction method for fusion inter-class standard deviation in sound scene classification
CN110012349B (en) A kind of news program structural method end to end
CN103077203A (en) Method for detecting repetitive audio/video clips
CN101247470A (en) Method for detecting scene boundaries in genre independent videos
CN102799605A (en) Method and system for monitoring advertisement broadcast
CN102855317B (en) A kind of multi-mode indexing means and system based on demonstration video
CN105701470A (en) Analog circuit fault characteristic extraction method based on optimal wavelet packet decomposition
CN102073636A (en) Program climax search method and system
CN104409080A (en) Voice end node detection method and device
CN102436483A (en) Video advertisement detecting method based on explicit type sharing subspace
CN103207901B (en) A kind of method and apparatus that IP address ownership place is obtained based on search engine
Johnson et al. Spoken Document Retrieval for TREC-8 at Cambridge University.
CN104221079A (en) Modified Mel filter bank structure using spectral characteristics for sound analysis
CN111370022A (en) Audio advertisement detection method and device, electronic equipment and medium
Li et al. Multi-level attention model with deep scattering spectrum for acoustic scene classification
CN113992944A (en) Video cataloging method, device, equipment, system and medium
Kim et al. Quick audio retrieval using multiple feature vectors
CN104167211A (en) Multi-source scene sound abstracting method based on hierarchical event detection and context model
CN103294696A (en) Audio and video content retrieval method and system
CN104731890B (en) A kind of combination PLSA and AT audio event sorting technique
CN103065152A (en) Identification method of digital clock in videos
CN105512272A (en) System for comparing audio information and audio information comparison method
Zhao et al. Fast commercial detection based on audio retrieval

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130501