CN101894251A - Video detection method and device - Google Patents
Video detection method and device
- Publication number: CN101894251A (application CN 200910084336)
- Authority
- CN
- China
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Abstract
The invention discloses a video detection method and device that have a low redundancy rate and can detect a sample video whose boundaries do not coincide with shot boundaries. The method comprises the following steps: determining a standard duration L and a detection interval T for the sample video; intercepting, reaching back from the current detection point, a video segment of length L+T as the detection region; and extracting the video feature of the standard sample video and judging whether the current detection region contains a video segment consistent with that feature, and if so, concluding that the received video stream contains the sample video. The feature extraction in this step comprises: A, sampling the video frames; B, computing the color feature and the texture feature of each sampled frame and applying Gaussian normalization to the texture feature of each frame; and C, clustering the color features and the texture features of the sampled frames, taking the cluster centers of the color features as the color feature of the video and the cluster centers of the texture features as the texture feature of the video.
Description
Technical field
The invention belongs to the field of video identification technology, and in particular relates to a video detection method and device.
Background technology
With the development of information networks, the digital transformation of cable television, and the emergence of new media services such as mobile TV, Web TV, and IPTV, the copyright protection situation for digital media has become increasingly severe.
To protect digital copyright, the broadcast video content must first be accurately identified, in order to judge whether it contains video content propagated without the copyright owner's consent. At the same time, accurate identification of broadcast video content not only helps protect digital copyright, but also supports the management and regulation of video broadcast services under national laws and regulations, which is of great significance to the sustained and healthy development of the digital content industry.
Two video detection methods are currently in common use. The first takes a complete shot as the recognition unit, i.e., uses shot boundaries as cut points. Because the boundaries of a sample video are arbitrary and do not necessarily coincide with shot boundaries, this method cannot correctly identify a sample video with arbitrary boundaries. (A sample video is a known video used as a reference; the purpose of video detection is to determine whether a broadcast video contains a given sample video.) The second method recognizes local images within a video clip, for example vehicle detection in intelligent transportation systems; this method cannot identify a complete video clip.
Another difficulty of video detection is that distortions exist between a video copy and the sample video, for example changes in picture size, bit rate, and frame rate. Robustly detecting different video copies is one of the key problems a recognition algorithm must solve.
The present invention proposes a new video detection method and device to overcome the above defects of the prior art.
Summary of the invention
The technical problem to be solved by the present invention is to provide a video detection method and device that have low redundancy and can detect a sample video whose boundaries are inconsistent with shot boundaries.
To solve the above technical problem, the video detection method of the present invention comprises the following steps:
Step 1: determine the standard duration L and the detection interval T of the sample video.
To obtain complete and accurate detection results, the video detection of the present invention can be a continuous process. The detection interval is the time between two successive detections, i.e., the distance between the check points of two adjacent detection regions; the check points are uniformly spaced on the time axis.
Step 2: in the received video stream, select a moment as the current check point and intercept, reaching back from this check point, a video segment of length L+T as the detection region; and intercept from the sample video a segment of length L as the standard sample video.
Step 3: extract the video feature of the standard sample video and judge whether the current detection region contains a video segment that has the same length as the standard sample video and a consistent video feature. If so, conclude that the received video stream contains the sample video; if not, conclude that this detection region does not contain the sample video and proceed to step 4.
Step 4: choose the next detection region in the manner described above and judge whether it contains a segment of the same length and with a consistent video feature. If so, conclude that the received video stream contains the sample video; if not, continue choosing and checking further detection regions until a predetermined number of regions have been examined. If no segment consistent with the standard sample video has been found by then, conclude that the received video stream does not contain the sample video.
In step 1, a duration shorter than the sample video is usually taken as the standard duration mainly for real-time reasons: the shorter the standard duration, the better the real-time performance, but also the lower the recognition accuracy. The standard duration should therefore be chosen by weighing the real-time requirement against the accuracy requirement. If the length of the sample video itself satisfies the real-time requirement, L can be taken as the full length of the sample video.
In step 2, the standard sample video can be intercepted from the starting point of the sample video onward; the sample video can then be detected as soon as it has played for duration L in the actual video stream, which reduces the detection delay.
In step 3, detection can be performed by sliding the standard sample video across the detection region and checking whether any video segment there is consistent with it.
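As an illustrative, non-authoritative sketch of this sliding detection in Python (the per-frame feature representation, the consistency test, and its threshold are stand-ins, not the patent's concrete implementation):

```python
import numpy as np

def consistent(feat_a, feat_b, threshold=0.1):
    """Hypothetical consistency test: small Euclidean distance between features."""
    return np.linalg.norm(feat_a - feat_b) <= threshold

def detect_in_region(region_feats, sample_feats):
    """Slide an L-long window over the L+T-long detection region and compare
    each window's aggregate feature with the standard sample's.

    region_feats: per-frame features of the detection region, shape (L+T, D)
    sample_feats: per-frame features of the standard sample video, shape (L, D)
    Returns the matching offset t', or None if no window is consistent.
    """
    n, m = len(region_feats), len(sample_feats)
    sample_agg = sample_feats.mean(axis=0)   # stand-in for cluster-center features
    for t in range(n - m + 1):
        if consistent(region_feats[t:t + m].mean(axis=0), sample_agg):
            return t
    return None
```

In the patent the compared features are cluster centers of color and texture features; a per-window mean stands in here only to keep the sketch short.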
In step 3, any video feature extraction method of the prior art can be adopted, for example extracting image features of the video frames or extracting motion features of the video.
In step 3, the video feature can also be extracted by the following steps:
Step A: sample the video frames.
Step B: compute the color feature and the texture feature of each sampled frame, and normalize the texture feature of each frame so that every component of the texture feature falls into a specified interval, for example [0, 1]. The benefit of normalization is that the components contribute roughly equally when feature distances are computed. The present invention preferably uses Gaussian normalization.
Step C: cluster the color features and the texture features of the sampled frames separately to obtain the class centers of the color features and of the texture features; take the class center of the color features as the color feature of the video and the class center of the texture features as the texture feature of the video, and use these as the video feature of the video. The benefit of clustering is that it smooths out the influence of noise and of a few distorted or lost frames, and it reduces the complexity of video matching. The present invention preferably uses the K-Means clustering algorithm.
In the present invention, the video features of the standard sample video and of the corresponding video segments in the detection region are extracted in the same manner.
When the broadcast video is detected against more than one sample video, the sample whose corresponding segment in the received stream is most consistent is taken as the detection result. A similarity threshold can also be set in advance: a sample video is considered detected only when its similarity with the corresponding segment of the received stream exceeds the threshold; otherwise none of the sample videos is considered contained in the broadcast stream. In other words, when more than one sample video would otherwise be considered contained in the received stream, only the sample most consistent with the corresponding segment is taken as contained.
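Steps A-C above can be sketched end to end; this is a non-authoritative sketch in which the per-frame feature functions, the sampling stride, and the tiny K-Means routine are illustrative stand-ins rather than the patent's implementation:

```python
import numpy as np

def tiny_kmeans(X, k, iters=30, seed=0):
    """Minimal Lloyd iteration; returns the class centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=min(k, len(X)), replace=False)]
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(len(centers))])
    return centers

def extract_video_feature(frames, color_fn, texture_fn, k=5, stride=2):
    """Steps A-C: sample frames, compute per-frame color/texture features,
    Gaussian-normalize the texture components, keep K-Means class centers."""
    sampled = frames[::stride]                            # A: uniform sampling
    color = np.array([color_fn(f) for f in sampled])      # B: per-frame features
    texture = np.array([texture_fn(f) for f in sampled])
    mu, sigma = texture.mean(0), texture.std(0)
    texture = (texture - mu) / (3 * np.where(sigma > 0, sigma, 1))  # B: 3-sigma
    return tiny_kmeans(color, k), tiny_kmeans(texture, k)  # C: class centers
```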
To solve the above technical problem, the video detection device of the present invention comprises:
a broadcast video acquisition module, a detection region selection module, and a sample video identification module.
The broadcast video acquisition module is used to collect the broadcast video data.
The detection region selection module selects detection regions in the collected video data as required. Each detection region is the video segment of length L+T immediately preceding that region's check point; the check points are uniformly distributed on the time axis with distance T between adjacent check points, and L is the standard duration of the sample video, not greater than the length of the sample video.
The detection region selection module is also used to intercept from the sample video a segment of length L as the standard sample video.
The sample video identification module extracts the video feature of the standard sample video and judges whether the current detection region contains a video segment of the same length as the standard sample video and with a consistent video feature. If so, it concludes that the broadcast video contains the sample video; if not, it makes the same judgment on the next detection region and, if necessary, on further regions. If a consistent segment is found in some region, the received stream is judged to contain the sample video; if no such segment exists in any of the examined detection regions, the received stream is judged not to contain it.
The sample video identification module comprises a video feature extraction submodule and a judgment submodule.
The video feature extraction submodule extracts the video features of the standard sample video and of the corresponding video segments in the detection region in the following manner:
First, sample the video frames;
Second, compute the color feature and the texture feature of each sampled frame, and normalize the texture feature of each frame so that every component of the texture feature falls into a specified interval, for example [0, 1];
Then, cluster the color features and the texture features of the sampled frames separately, obtain the class centers of the color features and of the texture features, take the class center of the color features as the color feature of the video and the class center of the texture features as the texture feature of the video, and use these as the video feature of the video.
The judgment submodule judges whether the detection region contains a video segment of the same length as the standard sample video and with a consistent video feature, and thereby concludes whether the received video stream contains the sample video.
The beneficial effects of the present invention are:
1) In the received video stream, the present invention takes the segment reaching back L+T from the current detection moment as the detection region and judges whether this region contains the sample video. With this scheme, a known video segment can be detected completely with minimal data redundancy, and the boundaries of the detectable segment are arbitrary; they need not be shot boundaries.
2) The video feature model of the present invention adopts a clustering model that fuses different visual features; it can smooth out the influence of distortion or feature loss in a few frames, thereby improving the robustness of the video feature.
Description of drawings
Fig. 1 is a schematic diagram of the principle of the video detection method of the present invention;
Fig. 2 is a schematic diagram of the calculation of the video segment color histogram;
Fig. 3 is a schematic diagram of the structure of the video detection device of the present invention.
Embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments.
The problem of real-time video detection can be described as identifying content in the broadcast video data while it is being received. A delay τ is allowed between the moment t1 at which content recognition finishes and the moment t0 at which the content was broadcast, but τ must stay within a bounded range; the smaller τ is, the better the real-time performance of the recognition. Content recognition here means identifying, in the received video stream, a fragment that has the same content as a sample video. The fragment is required to have the same duration and frame content as the sample video (or as the standard sample video, if one is defined), but moderate signal distortion is allowed, for example changes in picture size and quality. A sample video library must be built before recognition; during recognition, the fragments intercepted from the video stream are compared against the library to find matching samples.
The present invention requires two important values to be determined in advance. The first is the sample standard duration L: if the duration of a sample video is greater than L, the L-long fragment at its front end is taken as the standard sample video O. The second is the detection interval T, i.e., how often content recognition is performed; its reciprocal is the recognition frequency. If one content recognition takes time T0, then T > T0 must hold; the faster the recognition, the smaller T can be set, and the smaller the corresponding delay τ.
Fig. 1 is a schematic diagram of the principle of the video detection method of the present invention. As shown in the figure, when a content recognition starts, it intercepts, reaching back from the current time point, a video segment S of length L+T for content recognition. S comprises the video data of length T newly added since the previous recognition started, together with a look-back of length L into historical video data. Taking the starting point of S as the origin 0, its interval is (0, L+T], open on the left and closed on the right.
During content recognition, the standard sample video O is slid within the detection region S to search for a segment with identical content; O can be detected only when it is completely contained in S. Denote by O(t) the video object of length L in region S that starts at t; if a video object O(t') identical to the standard sample video O exists, then 0 < t' ≤ T must hold.
Video matching is based on content features. Let the feature representation of a video object O be F(O). The model assumes that O is uniquely determined by its feature, so the following relations hold: if O1 = O2 then F(O1) = F(O2); if O1 ≠ O2 then F(O1) ≠ F(O2). When the standard sample video O slides to position t' from the starting point of the detection region S and F(O(t')) = F(O) holds, the sample video is detected; at offsets where the features do not match, the sample video is not detected.
The above algorithm model can be shown to have the following properties:
Property 1: the standard sample video O is always completely contained in one and only one identification region, and is detected there. Existence is proved as follows:
Denote the identification regions to the left and right of the current region S by S_n, n = ±1, ±2, ...; negative n denotes regions to the left, positive n regions to the right. Taking the starting point of the current region as the time origin, the boundary of the n-th region to the left or right is (nT, L+T+nT]. Consider a standard sample video overlapping the current region, with start position t', −L < t' < L+T. It can be shown that when
n = ⌈t'/T⌉ − 1,
where ⌈·⌉ denotes rounding up, the standard sample video is completely contained in the n-th identification region.
Thus the boundary of the standard sample video is completely contained within the boundary of the n-th identification region, and the sample can be fully identified. The position of the standard sample video falls into one of three cases:
(1) when 0 < t' ≤ T, n = 0: the standard sample video is contained in the current region;
(2) when T < t' ≤ L+T, n ≥ 1: the head of the standard sample video is in the current region and its tail is in future video data; it is identified later;
(3) when −L < t' ≤ 0, n ≤ −1: the tail of the standard sample video is in the current region and its head is in historical video data; it has already been identified.
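The existence claim of Property 1 can be checked numerically; the sketch below uses the region index n = ⌈t'/T⌉ − 1 as stated above:

```python
import math

def region_index(t_prime, T):
    """Index n of the identification region (nT, L+T+nT] that should fully
    contain a standard sample starting at t_prime: n = ceil(t'/T) - 1."""
    return math.ceil(t_prime / T) - 1

def fully_contained(t_prime, L, T, n):
    """Does region n = (nT, L+T+nT] fully contain the sample [t', t'+L]?"""
    return n * T < t_prime and t_prime + L <= L + T + n * T
```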
Uniqueness is proved as follows:
Without loss of generality, suppose the standard sample video O is completely contained in the current region S with boundary [t', t'+L], 0 < t' ≤ T. The regions immediately before and after S, S_{−1} and S_{+1}, have boundaries (−T, L] and (T, L+2T] respectively; neither can completely contain the boundary of the standard sample video O. Hence the same standard sample video cannot be detected in two identification regions simultaneously, which avoids redundant detection.
Property 2: if the delay τ of video identification is defined as the difference between the right boundary of the standard sample video and the moment identification ends, then T0 ≤ τ ≤ T + T0, where T0 is the time required for one identification. The delay is smallest, T0, when the right boundary of the standard sample video coincides with the right boundary of the current identification region; it is largest, T + T0, when the left boundary of the standard sample video coincides with the left boundary of the current identification region.
Property 3: the reuse rate of the video data is L/T.
We define the reuse rate as the ratio of the total amount of data used more than once to the total amount of raw data; this index measures the computational load of the recognition algorithm. Proof: for each newly added stretch of video data of duration T, besides forming part of the current identification region S, all of it is reused by the next ⌊L/T⌋ identification regions, where ⌊·⌋ denotes rounding down, and its right-end portion of length L % T (% denoting the remainder) is reused by one further region. The total amount of reused data is therefore T·⌊L/T⌋ + L % T = L, so the reuse rate is L/T.
Thus the data reuse rate is inversely proportional to the identification interval: the smaller the interval, i.e., the better the real-time performance, the higher the reuse rate and the greater the computational load.
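The counting in the proof of Property 3 can be written out directly; the closed form T·⌊L/T⌋ + (L mod T) = L gives the reuse rate L/T:

```python
def reuse_ratio(L, T):
    """Data reuse rate: each newly added T-long chunk is fully reused by the
    next floor(L/T) identification regions, and its trailing (L mod T) part
    by one more region, so the reused amount per chunk is exactly L."""
    reused = T * (L // T) + L % T   # = L
    return reused / T               # = L / T
```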
Video content identification is based on feature matching. The video content feature model in this method is a statistical model that describes the distribution of color features and texture features over a whole video segment. A video segment can be represented as a set of images; for each frame, its RGB (red-green-blue) color histogram and its texture features are computed. Each RGB channel is divided into 8 levels, so the color histogram is a 512-dimensional feature vector. The texture features are computed with the co-occurrence matrix method of statistical texture analysis: a gray-level spatial dependence matrix is first formed, from which 13 texture feature components can be computed. Because these 13 components differ in physical meaning and value range, they are Gaussian-normalized so that each component carries equal weight when feature distances are computed; the Gaussian normalization method keeps a few extra-large or extra-small element values from distorting the overall normalized value distribution.
On the basis of the per-frame color and texture features, typical feature distributions are then obtained by clustering, using the K-Means algorithm. The color feature vectors (512-dimensional) and the normalized texture feature vectors (13-dimensional) are clustered separately, with Euclidean distance as the feature distance. In this embodiment the number of class centers for both the color and the texture features is set to 5, so the feature F(O) of a video segment is described by 5 color feature vectors and 5 texture feature vectors. The advantage of this representation is that a video segment is represented by feature vectors of fixed dimension, which resists the distortion of individual frame features and eliminates the influence of frame-rate changes.
The video detection method/device of the present invention is further described below through the construction of the video feature model.
One. Color feature
As a commonly used visual feature, the color histogram is widely applied in video retrieval and video identification. It describes the distribution of color values in an image; its advantage is insensitivity to noise and to local image changes, which gives good robustness in identification. To represent the color distribution of a video segment with color histograms, the following processing is needed. A video segment can be represented as a set of images; to reduce the data volume, the image frames are usually sampled. If each sampled frame were represented by its own color histogram, the data volume would still be large, and when the frame rate changes the number of key frames changes, causing feature points to fail to align and making feature matching harder. We therefore describe the color information of the image set by the accumulated color information of the sampled frames. For simplicity of computation, the cumulative color distribution is estimated from the DC coefficients of the sampled frames. The accumulated histogram of each color channel c = Y, Cb, Cr can be defined as
H^c(b) = (1/M) Σ_{i=1}^{M} H_i^c(b), b = 1, ..., B,
where H_i is the color histogram of one sampled frame of the segment, M is the number of sampled frames in the window, and B is the color quantization level. In the present embodiment B = 24 (uniform quantization), so the total dimension of the three-channel color feature is 3 × 24 = 72. The calculation of the video segment color histogram is shown in Fig. 2.
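As a sketch of the accumulated histogram (the patent estimates it from the DC coefficients of the sampled frames; here full-resolution pixel histograms stand in, and the YCbCr conversion is assumed already done):

```python
import numpy as np

def accumulated_histogram(frames_ycbcr, bins=24):
    """Average the per-frame normalized histograms of each channel (Y, Cb, Cr)
    over all M sampled frames; returns a 3 x bins = 72-dimensional vector.

    frames_ycbcr: array of shape (M, H, W, 3), channel values in [0, 256).
    """
    M = len(frames_ycbcr)
    feat = np.zeros((3, bins))
    for frame in frames_ycbcr:
        for c in range(3):
            h, _ = np.histogram(frame[..., c], bins=bins, range=(0, 256))
            feat[c] += h / h.sum()          # normalized per-frame histogram
    return (feat / M).ravel()
```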
Two. Texture features
For each sampled frame of a video segment, besides its color histogram, the present invention also computes its texture features. This embodiment uses the co-occurrence matrix method of statistical texture analysis: a gray-level spatial dependence matrix is first formed, from which various texture features can then be computed. Using these texture features for classification can improve classification accuracy.
1. Gray-level co-occurrence matrix
Suppose the image to be analyzed is rectangular, with Nx resolvable pixels in the horizontal direction and Ny pixels in the vertical direction, and that the pixel gray levels are quantized to Ng levels. Let
Lx = {1, 2, 3, ..., Nx}
Ly = {1, 2, 3, ..., Ny}
G = {1, 2, 3, ..., Ng}
Then Ly × Lx is the set of pixels of the image arranged row by row, and G is the set of quantized gray levels.
The gray-level co-occurrence matrix is composed of the numbers of times that pairs of gray levels occur at adjacent pixels of the image. Its size depends on the gray quantization level Ng: it is Ng × Ng.
If the gray level of some pixel is i, the gray level of an adjacent pixel is j, and the gray-level pair (i, j) occurs P_ij times in the image, then the element in row i, column j of the gray-level dependence matrix is P_ij. The distance between the adjacent pixels is d; if two pixels are immediately adjacent, d = 1. In addition, adjacency has a direction, usually divided into four directions 45° apart: 0° along the horizontal, 90° along the vertical, 45° along the right diagonal, and 135° along the left diagonal. The unnormalized gray-level co-occurrence matrices of the four directions can be expressed as
P(i,j,d,0°) = #{((k,l),(m,n)) ∈ (Ly×Lx)×(Ly×Lx) | k−m = 0, |l−n| = d, I(k,l) = i, I(m,n) = j}
P(i,j,d,45°) = #{((k,l),(m,n)) ∈ (Ly×Lx)×(Ly×Lx) | (k−m = d, l−n = −d) or (k−m = −d, l−n = d), I(k,l) = i, I(m,n) = j}
P(i,j,d,90°) = #{((k,l),(m,n)) ∈ (Ly×Lx)×(Ly×Lx) | |k−m| = d, l−n = 0, I(k,l) = i, I(m,n) = j}
P(i,j,d,135°) = #{((k,l),(m,n)) ∈ (Ly×Lx)×(Ly×Lx) | (k−m = d, l−n = d) or (k−m = −d, l−n = −d), I(k,l) = i, I(m,n) = j}
In the formulas, i is the gray level of some point and j the gray level of a neighboring point; d is the distance between the two points; (k, l) is the position in the image of a pixel with gray level i, and (m, n) the position of a pixel with gray level j; # denotes the number of adjacent pixel pairs with gray levels i and j occurring under the given distance and direction. The gray-level co-occurrence matrix is symmetric, i.e.
P(i,j,d,a) = P(j,i,d,a)
Usually this matrix is normalized, i.e., every element of the matrix is divided by the total number R of adjacent pixel pairs of the image, so that the matrix elements always sum to 1. For d = 1, R is determined by the direction as follows:
when a = 0°, R = 2·Ny·(Nx − 1)
when a = 90°, R = 2·Nx·(Ny − 1)
when a = 45° or 135°, R = 2·(Nx − 1)·(Ny − 1)
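A direct (unoptimized) sketch of the four-direction co-occurrence matrix with the symmetric counting and normalization described above; for d = 1 and a = 0° the raw pair count equals R = 2·Ny·(Nx − 1):

```python
import numpy as np

def glcm(image, d=1, angle=0, levels=8):
    """Normalized symmetric gray-level co-occurrence matrix for one direction.

    image : 2-D integer array with values in {0, ..., levels-1}
    angle : 0, 45, 90 or 135 (degrees); d is the neighbor distance.
    """
    # row/col offset of the neighbor for each direction (rows grow downward)
    offsets = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}
    dk, dl = offsets[angle]
    P = np.zeros((levels, levels))
    H, W = image.shape
    for k in range(H):
        for l in range(W):
            m, n = k + dk, l + dl
            if 0 <= m < H and 0 <= n < W:
                P[image[k, l], image[m, n]] += 1
    P = P + P.T                 # symmetry: count each pair in both orders
    return P / P.sum()          # divide by R so the entries sum to 1
```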
2. Computing the texture features from the co-occurrence matrix
After the gray-level dependence matrix has been formed, the required texture features can be computed from it. The group comprises 13 texture features. In the formulas below, i and j are the gray levels indexing the rows and columns of the matrix; p(i,j) is the (i,j) entry of the normalized gray-level dependence matrix, i.e., the probability of the pair (i,j); p_x(i) = Σ_j p(i,j) and p_y(j) = Σ_i p(i,j) are the marginal distributions, with means μx, μy and standard deviations σx, σy; and p_{x+y}(k) = Σ_{i+j=k} p(i,j), p_{x−y}(k) = Σ_{|i−j|=k} p(i,j). The 13 features are:
1) Angular second moment: f1 = Σ_i Σ_j p(i,j)²
2) Contrast: f2 = Σ_{n=0}^{Ng−1} n² Σ_{|i−j|=n} p(i,j)
3) Correlation coefficient: f3 = (Σ_i Σ_j i·j·p(i,j) − μx·μy) / (σx·σy)
4) Variance: f4 = Σ_i Σ_j (i − μ)²·p(i,j), where μ is the gray mean of the image I
5) Inverse difference moment: f5 = Σ_i Σ_j p(i,j) / (1 + (i − j)²)
6) Sum average: f6 = Σ_{k=2}^{2Ng} k·p_{x+y}(k)
7) Sum variance: f7 = Σ_{k=2}^{2Ng} (k − f6)²·p_{x+y}(k)
8) Sum entropy: f8 = −Σ_{k=2}^{2Ng} p_{x+y}(k)·log p_{x+y}(k)
9) Entropy: f9 = −Σ_i Σ_j p(i,j)·log p(i,j)
10) Difference variance: f10 = the variance of p_{x−y}
11) Difference entropy: f11 = −Σ_{k=0}^{Ng−1} p_{x−y}(k)·log p_{x−y}(k)
12, 13) Information measures of correlation:
f12 = (HXY − HXY1) / max(HX, HY)
f13 = (1 − exp[−2.0·(HXY2 − HXY)])^{1/2}
where HXY = f9; HX and HY are the entropies of p_x and p_y; HXY1 = −Σ_i Σ_j p(i,j)·log(p_x(i)·p_y(j)); and HXY2 = −Σ_i Σ_j p_x(i)·p_y(j)·log(p_x(i)·p_y(j)).
In summary: first compute the gray-level co-occurrence matrix, searching neighborhoods in the four directions 0°, 45°, 90°, and 135° with a configurable neighborhood distance d; the matrix size is Ng × Ng, where Ng is the number of gray levels. From the co-occurrence matrix, compute the 13 texture features, i.e., 13 real numbers including the angular second moment, the contrast, and the correlation coefficient.
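Given a normalized co-occurrence matrix, a few of the 13 features can be computed as follows (only the angular second moment, contrast, and entropy are shown; the remaining features follow the same pattern):

```python
import numpy as np

def haralick_subset(P):
    """Angular second moment (f1), contrast (f2) and entropy (f9) of a
    normalized gray-level co-occurrence matrix P."""
    i, j = np.indices(P.shape)
    asm = (P ** 2).sum()                     # f1: angular second moment
    contrast = ((i - j) ** 2 * P).sum()      # f2: contrast
    nz = P[P > 0]                            # avoid log(0)
    entropy = -(nz * np.log(nz)).sum()       # f9: entropy
    return np.array([asm, contrast, entropy])
```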
3. Texture feature normalization
Because the 13 texture feature components differ in physical meaning and value range, they must be normalized internally so that each component carries equal weight when feature distances are computed. Gaussian normalization is a good normalization method; its characteristic is that a few extra-large or extra-small element values have little influence on the overall normalized value distribution. The concrete method is as follows:
An N-dimensional feature vector can be written as F = [f1, f2, ..., fN]. Let I1, I2, ..., IM denote the M images in the image library; each image Ii has the corresponding feature vector Fi = [f_{i1}, f_{i2}, ..., f_{iN}]. Assume that the value sequence of the j-th component, [f_{1,j}, f_{2,j}, ..., f_{M,j}], follows a Gaussian distribution; compute its mean m_j and standard deviation σ_j. Then f_{i,j} can be normalized to the interval [−1, 1] by
f'_{i,j} = (f_{i,j} − m_j) / (3·σ_j)
With the normalization (f_{i,j} − m_j)/σ_j, each f_{i,j} is transformed to follow the N(0, 1) distribution; normalizing by 3σ_j instead, the probability that the normalized value falls in the interval [−1, 1] reaches 99%.
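The 3σ normalization above, as a sketch (rows are samples, columns are the 13 texture components; the zero-variance guard is an added assumption, not part of the patent):

```python
import numpy as np

def gaussian_normalize(X):
    """3-sigma Gaussian normalization per feature component; about 99% of the
    normalized values fall into [-1, 1] under the Gaussian assumption."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    sigma = np.where(sigma > 0, sigma, 1.0)   # guard constant components
    return (X - mu) / (3 * sigma)
```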
4. Feature clustering
Feature clustering is exactly to find out representative unique point from the unique point that is dispersed in higher dimensional space in a large number to represent.These unique points can be described as the class center, and then carry out follow-up processing according to these class central points.We carry out cluster analysis to the textural characteristics of all key frames of one section video, and the fundamental purpose of cluster has following several respects:
(1) represent a large number of feature points with a few base points, achieving feature compression;
(2) represent the class attributes with a few typical feature points, reducing the complexity of feature classification;
(3) eliminate the feature point misalignment caused by variation in the number of key frames;
(4) eliminate the influence of distortion in individual frame features.
K-Means is a classical clustering algorithm; its goal is to make the sum of squared distances between each point and its group center within every class as small as possible.
Suppose a feature point set is divided into c classes, where the k-th class is the set Ck containing nk feature points {x1, x2, ..., xnk} with class center yk. The squared error ek of class Ck can be defined as:
ek = Σi d²(xi, yk)
where xi is a feature point belonging to the k-th class. The total squared error E of the c classes is the sum of the per-class squared errors:
E = Σ(k=1..c) ek
The K-means method thus becomes an optimization problem: how to choose the c classes and their class centers so that the value of E is minimized.
The concrete K-Means algorithm is implemented iteratively; its steps are as follows:
(1) arbitrarily select c of the data objects as the initial cluster centers;
(2) repeat steps (3) and (4) until every cluster center converges;
(3) compute the distance of each object to each center object (the mean of each cluster's objects), and reassign each object to the center at minimum distance;
(4) recompute the mean (center object) of each cluster.
K-means clustering has the following characteristics: each cluster itself is as compact as possible, and the clusters are separated from one another as far as possible.
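The iterative steps above can be sketched in Python; the random-restart loop is our addition (a common guard against poor local minima), not something the text prescribes.

```python
import numpy as np

def kmeans(points, k, iters=50, restarts=5, seed=0):
    """Plain K-Means: minimise the total within-cluster squared error
    E = sum_k sum_{x in C_k} d^2(x, y_k).  Several random restarts are
    tried and the solution with the smallest E is kept."""
    rng = np.random.default_rng(seed)
    best_e, best_centers, best_labels = np.inf, None, None
    for _ in range(restarts):
        # (1) arbitrarily pick k points as the initial cluster centers
        centers = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(iters):
            # (3) distance of every point to every center, then reassign
            d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # (4) recompute each center as the mean of its members
            new = np.array([points[labels == c].mean(axis=0)
                            if np.any(labels == c) else centers[c]
                            for c in range(k)])
            if np.allclose(new, centers):
                break  # (2) the centers have converged
            centers = new
        e = d2[np.arange(len(points)), labels].sum()  # total squared error E
        if e < best_e:
            best_e, best_centers, best_labels = e, centers, labels
    return best_centers, best_labels
```

With K = 5 as in the embodiment, the returned centers are the 5 texture feature vectors that stand for the whole video segment.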
In the present embodiment we set K = 5, so the texture features of a video segment are represented by 5 texture feature vectors, each of which is a texture feature cluster center.
5. Feature matching
Because a video library usually contains hundreds or even thousands of sample videos, a nearest-neighbor classifier with a threshold can be adopted to judge which sample video, if any, has the same content as the video clip under test. If the minimum distance dmin between the test video Q and any sample video satisfies dmin < θ, where θ is the threshold of the nearest-neighbor classifier, then Q is identified as the sample video nearest to it; otherwise Q is judged not to match any sample video.
Figure 3 is a schematic diagram of the structure of the video identification device of the present invention. As shown in the figure, the device is mainly formed by cascading two parts: a front-end broadcast video signal acquisition part, which acquires the TS (transport stream) signal in real time, and a back-end sample video identification part. In the present embodiment, the video signal acquisition part acquires the digital television signal and, after demultiplexing, sends the video PES (packetized elementary stream) packets over the network to the sample video identification engine; the engine decodes the video data, performs feature extraction and content recognition, and writes the recognition results to a database. In the embodiment of the invention, video acquisition and identification work together on one high-performance computer.
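The thresholded nearest-neighbor decision can be sketched as follows, assuming each video's fingerprint has already been reduced to a fixed-shape feature array; the function name and the choice of Euclidean distance are illustrative assumptions.

```python
import numpy as np

def match_sample(query_fp, sample_fps, theta):
    """Nearest-neighbour classifier with a rejection threshold.

    `query_fp` is the fingerprint of the clip under test; `sample_fps`
    maps sample-video names to fingerprints of the same shape.  Returns
    the name of the closest sample if its distance is below theta,
    otherwise None (the clip matches no sample video)."""
    best_name, best_d = None, float("inf")
    for name, fp in sample_fps.items():
        d = np.linalg.norm(query_fp - fp)  # Euclidean distance
        if d < best_d:
            best_name, best_d = name, d
    return best_name if best_d < theta else None
```

The threshold θ is what turns a plain nearest-neighbor search into a detector: a clip roughly equidistant from all samples is rejected rather than forced onto the least-bad match.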
The present invention builds a sample feature library of known videos and then performs fast content matching against video clips intercepted from a digital television broadcast stream or a network streaming-media stream, thereby detecting in real time whether the broadcast programming contains a known video. The method and device proposed by the present invention can therefore be widely used to detect known video segments, such as advertisements and title sequences, in real time from broadcast video streams such as digital television programs.
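The overall sliding-window scheme (standard duration L, detection interval T, windows of length L + T ending at evenly spaced check points) might be driven by a loop like the following; `window_contains_sample` stands in for the feature extraction and matching of the preceding sections and is a hypothetical callback.

```python
def detect(stream_len, L, T, window_contains_sample):
    """Sliding-window detection driver (a sketch of the claimed scheme).

    Check points are placed every T time units; at each one a window of
    length L + T ending at the check point is examined, so any L-long
    occurrence of the sample video falls entirely inside some window.
    `window_contains_sample(start, end)` is a hypothetical callback
    that runs the feature match on the region [start, end)."""
    t = L + T  # first check point at which a full window is available
    while t <= stream_len:
        if window_contains_sample(t - (L + T), t):
            return t  # sample found in the window ending at t
        t += T
    return None  # no window matched the sample
```

Because adjacent windows overlap by L, a sample occurrence never straddles two windows without lying wholly inside one of them, which is why the window must be L + T long rather than just L.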
It should also be understood that the scope of protection sought by the present invention is set out in the appended claims; the foregoing description of the embodiments shall not be taken as a limitation, and every obvious modification within the spirit of the present invention also falls within the protection scope of the present invention.
Claims (9)
1. A video detection method, characterized by comprising the steps of:
Step 1: determining the standard duration L of the sample video and the detection interval T;
Step 2: in the received video stream, selecting a moment as a check point and, extending forward from this check point, intercepting a video clip of length L + T as a detection region; and intercepting, in the sample video, a segment of length L as the standard sample video;
Step 3: extracting the video features of the standard sample video and judging whether the current detection region contains a video clip equal in length to the standard sample video and consistent with it in video features; if such a clip exists, concluding that the received video stream contains the sample video; if not, concluding that this detection region does not contain the sample video and performing step 4;
Step 4: choosing the next detection region in the manner described above and judging whether it contains a video clip equal in length to the standard sample video and consistent with it in video features; if such a clip exists, concluding that the received video stream contains the sample video; if not, continuing to choose and examine further detection regions until the number of detection regions used reaches a predetermined number; if by then no video clip equal in length to the standard sample video and consistent with it in video features has been found, concluding that the received video stream does not contain the sample video;
wherein the standard duration of the sample video is not greater than the length of the sample video, and the detection interval is the time interval between two successive detections, i.e. the distance between the check points of two adjacent detection regions; the check points of the detection regions are uniformly distributed on the time axis.
2. The video detection method according to claim 1, characterized in that:
the standard duration of the sample video equals the length of the sample video.
3. The video detection method according to claim 1, characterized in that:
in step 2, the standard sample video is intercepted backward from the starting point of the sample video.
4. The video detection method according to claim 1, 2 or 3, characterized in that, in step 3, the video features are extracted in the following way:
A. sampling the video frames;
B. computing the color feature and the texture feature of each sampled frame, and normalizing the texture feature of each frame so that the value range of every component of the texture feature falls within a designated interval;
C. clustering the color features and the texture features of the sampled frames separately to obtain the class center points of the color features and of the texture features; taking the class center of the color features as the color feature of the video and the class center of the texture features as the texture feature of the video; and taking said color feature and texture feature as the video features of the video.
5. The video detection method according to claim 4, characterized in that:
in step B, the designated interval is the interval [0, 1].
6. The video detection method according to claim 4, characterized in that:
in step B, the normalization method is the Gaussian normalization method.
7. The video detection method according to claim 4, characterized in that:
in step C, the clustering method employed is the K-Means clustering method.
8. The video detection method according to claim 1, 2 or 3, characterized in that:
when, according to this method, more than one sample video would be deemed contained in the received video stream, the received video stream is deemed to contain only the sample video whose video features best match the corresponding video clip in the detection region.
9. A video detection device, characterized in that:
the device comprises a broadcast video acquisition module, a detection region selection module and a sample video identification module;
wherein the broadcast video acquisition module is used to acquire broadcast video data;
the detection region selection module is used to select, as required, several detection regions from the acquired video data, each detection region being obtained by intercepting, forward from the check point of the selected region, a video clip of length L + T, where the check points of the regions are uniformly distributed on the time axis, the distance between two adjacent check points is T, and L is the standard duration of the sample video, which is not greater than the length of the sample video;
the detection region selection module is also used to intercept, in the sample video, a segment of length L as the standard sample video;
and the sample video identification module is used to extract the video features of the standard sample video and judge whether the current detection region contains a video clip equal in length to the standard sample video and consistent with it in video features; if such a clip exists, the broadcast video is deemed to contain the sample video; if not, the module judges whether the next detection region contains such a clip; if it does, the received video stream is deemed to contain the sample video, otherwise another detection region is chosen and examined; if no such video clip exists in any of the detection regions, the received video stream is deemed not to contain the sample video.
10. The video detection device according to claim 9, characterized in that:
the sample video identification module comprises a video feature extraction submodule and a judgement submodule;
wherein the video feature extraction submodule is used to extract, in the following manner, the video features of the standard sample video and of the corresponding video clip in the detection region:
first, sampling the video frames;
second, computing the color feature and the texture feature of each sampled frame, and normalizing the texture feature of each frame so that the value range of every component of the texture feature falls within a designated interval, for example [0, 1];
then, clustering the color features and the texture features of the sampled frames separately to obtain the class center points of the color features and of the texture features; taking the class center of the color features as the color feature of the video and the class center of the texture features as the texture feature of the video; and taking said color feature and texture feature as the video features of the video;
and wherein the judgement submodule is used to judge whether the detection region contains a video clip equal in length to the standard sample video and consistent with it in video features, thereby concluding whether the received video stream contains the sample video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910084336XA CN101894251A (en) | 2009-05-21 | 2009-05-21 | Video detection method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910084336XA CN101894251A (en) | 2009-05-21 | 2009-05-21 | Video detection method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101894251A true CN101894251A (en) | 2010-11-24 |
Family
ID=43103440
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200910084336XA Pending CN101894251A (en) | 2009-05-21 | 2009-05-21 | Video detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101894251A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103020153A (en) * | 2012-11-23 | 2013-04-03 | 黄伟 | Advertisement identification method based on videos |
CN103065661A (en) * | 2012-09-20 | 2013-04-24 | 中华电信股份有限公司 | Signal detection method for recording medium |
CN103853795A (en) * | 2012-12-07 | 2014-06-11 | 中兴通讯股份有限公司 | Image indexing method and device based on n-gram model |
CN104079924A (en) * | 2014-03-05 | 2014-10-01 | 北京捷成世纪科技股份有限公司 | Mistakenly-played video detection method and device |
CN104698090A (en) * | 2015-03-17 | 2015-06-10 | 芜湖凯博实业股份有限公司 | Fault diagnosis method of cooling tower |
CN107784321A (en) * | 2017-09-28 | 2018-03-09 | 深圳市奇米教育科技有限公司 | Numeral paints this method for quickly identifying, system and computer-readable recording medium |
CN107943849A (en) * | 2017-11-03 | 2018-04-20 | 小草数语(北京)科技有限公司 | The search method and device of video file |
CN108234433A (en) * | 2016-12-22 | 2018-06-29 | 华为技术有限公司 | For handling the method and apparatus of video traffic |
CN112418167A (en) * | 2020-12-10 | 2021-02-26 | 深圳前海微众银行股份有限公司 | Image clustering method, device, equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
张文哲: "Research on Content-Based Video Analysis and Retrieval Methods", master's thesis, Northwestern Polytechnical University * |
曹建荣, 蔡安妮: "A Shot-Based Video Retrieval Method in the Compressed Domain", Microelectronics & Computer * |
杨显锋, 袁敏, 刘晓蓉: "A Real-Time Digital Video Content Recognition Method", Video Engineering * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103065661A (en) * | 2012-09-20 | 2013-04-24 | 中华电信股份有限公司 | Signal detection method for recording medium |
CN103020153A (en) * | 2012-11-23 | 2013-04-03 | 黄伟 | Advertisement identification method based on videos |
CN103853795A (en) * | 2012-12-07 | 2014-06-11 | 中兴通讯股份有限公司 | Image indexing method and device based on n-gram model |
CN104079924A (en) * | 2014-03-05 | 2014-10-01 | 北京捷成世纪科技股份有限公司 | Mistakenly-played video detection method and device |
CN104698090A (en) * | 2015-03-17 | 2015-06-10 | 芜湖凯博实业股份有限公司 | Fault diagnosis method of cooling tower |
CN108234433A (en) * | 2016-12-22 | 2018-06-29 | 华为技术有限公司 | For handling the method and apparatus of video traffic |
CN107784321A (en) * | 2017-09-28 | 2018-03-09 | 深圳市奇米教育科技有限公司 | Numeral paints this method for quickly identifying, system and computer-readable recording medium |
CN107784321B (en) * | 2017-09-28 | 2021-06-25 | 深圳市快易典教育科技有限公司 | Method and system for quickly identifying digital picture books and computer readable storage medium |
CN107943849A (en) * | 2017-11-03 | 2018-04-20 | 小草数语(北京)科技有限公司 | The search method and device of video file |
CN107943849B (en) * | 2017-11-03 | 2020-05-08 | 绿湾网络科技有限公司 | Video file retrieval method and device |
CN112418167A (en) * | 2020-12-10 | 2021-02-26 | 深圳前海微众银行股份有限公司 | Image clustering method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101894251A (en) | Video detection method and device | |
CN102317957B (en) | Improved image identification | |
Ye et al. | Fast and robust text detection in images and video frames | |
US8150164B2 (en) | System and method for identifying image based on singular value decomposition and feature point | |
US10127454B2 (en) | Method and an apparatus for the extraction of descriptors from video content, preferably for search and retrieval purpose | |
US6744922B1 (en) | Signal processing method and video/voice processing device | |
CN101957920B (en) | Vehicle license plate searching method based on digital videos | |
CN102176208B (en) | Robust video fingerprint method based on three-dimensional space-time characteristics | |
CN103605991A (en) | Automatic video advertisement detection method | |
CN100365661C (en) | Signal processing method and equipment | |
CN103390040A (en) | Video copy detection method | |
US11330329B2 (en) | System and method for detecting and classifying direct response advertisements using fingerprints | |
CN104268590B (en) | The blind image quality evaluating method returned based on complementary combination feature and multiphase | |
CN101937506A (en) | Similar copying video detection method | |
CN101821753B (en) | Enhanced image identification | |
CN106557728A (en) | Query image processing and image search method and device and surveillance | |
US10733453B2 (en) | Method and system for supervised detection of televised video ads in live stream media content | |
CN101711394A (en) | High performance image identification | |
CN100515048C (en) | Method and system for fast detecting static stacking letters in online video stream | |
CN102306179B (en) | Image content retrieval method based on hierarchical color distribution descriptor | |
US10719715B2 (en) | Method and system for adaptively switching detection strategies for watermarked and non-watermarked real-time televised advertisements | |
CN106066887A (en) | A kind of sequence of advertisements image quick-searching and the method for analysis | |
CN103679170A (en) | Method for detecting salient regions based on local features | |
Hesson et al. | Logo and trademark detection in images using color wavelet co-occurrence histograms | |
Ouali et al. | Robust video fingerprints using positions of salient regions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20101124 |