CN101894251A - Video detection method and device - Google Patents

Video detection method and device

Info

Publication number
CN101894251A
Authority
CN
China
Prior art keywords
video
sample
sample video
texture feature
detection region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200910084336XA
Other languages
Chinese (zh)
Inventor
杨显锋 (Yang Xianfeng)
刘晓蓉 (Liu Xiaorong)
袁敏 (Yuan Min)
马爽 (Ma Shuang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Academy of Broadcasting Science Research Institute
Academy of Broadcasting Science of SAPPRFT
Original Assignee
Academy of Broadcasting Science of SAPPRFT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Academy of Broadcasting Science of SAPPRFT filed Critical Academy of Broadcasting Science of SAPPRFT
Priority to CN200910084336XA priority Critical patent/CN101894251A/en
Publication of CN101894251A publication Critical patent/CN101894251A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a video detection method and a video detection device which have a low redundancy rate and can detect a sample video whose boundaries do not coincide with shot boundaries. The method comprises the following steps: determining the standard duration L of the sample video and the detection interval T; intercepting the video clip of length L+T preceding the current detection point as a detection region; and extracting the video features of the standard sample video and judging whether the current detection region contains a video clip consistent with those video features, and if so, determining that the received video stream contains the sample video. The video features in this step are extracted as follows: A, sampling the video frames; B, computing the color feature and the texture feature of each sampled frame and applying Gaussian normalization to the texture feature of each frame; and C, clustering the color features and the texture features of the sampled frames, taking the cluster centers of the color features as the color feature of the video and the cluster centers of the texture features as the texture feature of the video.

Description

Video detection method and device
Technical field
The invention belongs to the field of video identification technology, and in particular relates to a video detection method and device.
Background art
With the development of information networks, the digital transformation of cable television, and the emergence of new media services such as mobile TV, Web TV and IPTV, the copyright protection situation of digital media has become increasingly severe.
To protect digital copyright, the broadcast video content must first be accurately identified, so as to judge whether it includes video content that is being distributed without the copyright owner's authorization. At the same time, accurate identification of broadcast video content not only helps protect digital copyright, but also supports multi-faceted management and regulation of video broadcasting services under national laws and regulations, which is of great significance for the sustained and healthy development of the digital content industry.
At present there are two main video detection methods in common use. The first uses a complete shot as the recognition unit, i.e. it segments and identifies video at shot boundaries. Because the boundaries of a sample video are arbitrary and do not necessarily coincide with shot boundaries, this method cannot correctly identify a sample video with arbitrary boundaries. (A sample video is a known video used as a reference; the purpose of video detection is to determine whether a broadcast video contains a given sample video.) The second method identifies local images within a video clip, for example vehicle detection in intelligent transportation systems; this method cannot identify a complete video clip.
Another difficulty of video detection is that distortions exist between a video copy and the sample video, for example changes in picture size, bit rate and frame rate. How to robustly detect different copies of a video is one of the key problems that a recognition algorithm must solve.
The present invention proposes a new video detection method and device to overcome the above defects of the prior art.
Summary of the invention
The technical problem to be solved by the present invention is to provide a video detection method and device that have low redundancy and can detect a sample video whose boundaries do not coincide with shot boundaries.
To solve the above technical problem, the video detection method of the present invention comprises the following steps:
Step 1: determine the standard duration L of the sample video and the detection interval T.
To obtain complete and accurate detection results, the video detection of the present invention can be a continuous process. The detection interval is the time interval between two successive detections, i.e. the distance between the check points of two adjacent detection regions; the check points of the detection regions are evenly distributed on the time axis.
Step 2: in the received video stream, select a moment as the current check point, and intercept the video clip of length L+T preceding this check point as the detection region; also intercept a clip of length L from the sample video as the standard sample video.
Step 3: extract the video features of the standard sample video, and judge whether the current detection region contains a video clip of the same length as the standard sample video with consistent video features. If such a clip exists, the received video stream is judged to contain the sample video; if not, this detection region is judged not to contain the sample video, and step 4 is executed.
Step 4: select the next detection region in the manner described above, and judge whether it contains a video clip of the same length as the standard sample video with consistent video features. If such a clip exists, the received video stream is judged to contain the sample video; if not, continue selecting the next detection region until the number of detection regions used reaches a predetermined number. If no video clip consistent with the video features of the standard sample video has been found by then, the received video stream is judged not to contain the sample video.
In step 1, a duration shorter than the sample video is usually taken as the standard duration, mainly for real-time considerations. The shorter the standard duration, the better the real-time performance, but also the lower the recognition accuracy. The standard duration should therefore be chosen by balancing the real-time and accuracy requirements. If the length of the sample video itself meets the real-time requirement, L may be taken as the full length of the sample video.
In step 2, the standard sample video may be intercepted backward from the starting point of the sample video, so that the sample video can be detected as soon as its first L seconds have been broadcast in the actual video stream, thereby reducing the detection delay.
In step 3, the detection may be performed by sliding the standard sample video across the detection region and checking whether it coincides with some video clip, as in the sketch below.
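As an illustration of the sliding match in step 3, the following Python sketch (not part of the original patent text) slides a standard sample of m frames across a detection region at frame granularity; `extract_features` stands for any of the feature extraction methods described below, and `theta` is an assumed distance threshold.

```python
import numpy as np

def slide_match(region_frames, sample_frames, extract_features, theta):
    """Slide the standard sample video across the detection region and
    return the first frame offset whose features match, or None."""
    n, m = len(region_frames), len(sample_frames)
    f_sample = extract_features(sample_frames)               # F(O)
    for t in range(n - m + 1):                               # candidate start positions
        f_window = extract_features(region_frames[t:t + m])  # F(O(t))
        if np.linalg.norm(f_sample - f_window) < theta:      # features consistent
            return t
    return None
```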
In step 3, any video feature extraction method of the prior art may be used, for example extracting image features of the video frames, or extracting motion features of the video.
In step 3, the video features may also be extracted by the following steps:
Step A: sample the video frames.
Step B: compute the color feature and the texture feature of each sampled frame, and normalize the texture feature of each frame so that the value range of each component of the texture feature falls within a specified interval, for example [0, 1]. The benefit of normalization is that each component contributes roughly equally when feature distances are computed. The present invention preferably uses Gaussian normalization.
Step C: cluster the color features and the texture features of the sampled frames separately to obtain the cluster centers of the color features and of the texture features; take the cluster centers of the color features as the color feature of the video and the cluster centers of the texture features as the texture feature of the video, and take the color feature and the texture feature together as the video feature of the video. The benefit of clustering is that it smooths out the influence of noise or of distortion and loss in a few frames, and reduces the complexity of video matching. The present invention preferably uses the K-Means clustering algorithm.
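Steps A to C can be sketched as a small pipeline. The sketch below is illustrative only: the per-frame helpers `color_histogram` and `frame_texture` stand for the color and texture computations detailed later in this description, `gaussian_normalize` for the normalization of step B, and scikit-learn's KMeans is assumed to be available.

```python
import numpy as np
from sklearn.cluster import KMeans

def video_feature(frames, sample_step=5, k=5):
    """Steps A-C: sample frames, compute per-frame color and texture
    features, normalize texture, keep K cluster centers per feature set."""
    sampled = frames[::sample_step]                          # A: frame sampling
    color = np.array([color_histogram(f) for f in sampled])  # B: color feature
    texture = np.array([frame_texture(f) for f in sampled])  # B: texture feature
    texture = gaussian_normalize(texture)                    # B: per-component normalization
    # C: cluster color and texture separately; the centers form the video feature
    # (assumes at least k sampled frames)
    c_centers = KMeans(n_clusters=k, n_init=10).fit(color).cluster_centers_
    t_centers = KMeans(n_clusters=k, n_init=10).fit(texture).cluster_centers_
    return c_centers, t_centers
```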
In the present invention, the video features of the standard sample video and of the corresponding video clips in the detection region are extracted in the same way.
When more than one sample video is used to detect the broadcast video, the sample video most consistent with the corresponding video clip in the received stream is taken as the detection result. A similarity threshold may also be set in advance: a sample video is considered detected only when its similarity with the corresponding video clip in the received stream exceeds this threshold; otherwise none of the sample videos is considered contained in the broadcast stream. That is to say, when more than one sample video would be judged to be contained in the received video stream, only the sample video whose video features are most consistent with the corresponding video clip in the detection region is considered contained.
To solve the above technical problem, the video detection device of the present invention comprises:
a broadcast video acquisition module, a detection region selection module and a sample video identification module.
The broadcast video acquisition module is used to acquire the broadcast video data.
The detection region selection module is used to select, as needed, several detection regions from the acquired video data. Each detection region is obtained by intercepting the video clip of length L+T preceding the check point of that region; the check points of the regions are evenly distributed on the time axis, the distance between two adjacent check points is T, and L is the standard duration of the sample video, which is not greater than the length of the sample video.
The detection region selection module is also used to intercept a clip of length L from the sample video as the standard sample video.
The sample video identification module is used to extract the video features of the standard sample video and judge whether the current detection region contains a video clip of the same length as the standard sample video with consistent video features. If such a clip exists, the broadcast video is judged to contain the sample video; if not, the module judges whether the next detection region contains such a clip, and if so, the received video stream is judged to contain the sample video; otherwise it continues to select another detection region for detection. If no such video clip exists in any of the detection regions, the received video stream is judged not to contain the sample video.
The sample video identification module comprises a video feature extraction submodule and a judgment submodule.
The video feature extraction submodule is used to extract the video features of the standard sample video and of the corresponding video clips in the detection region in the following manner:
First, sample the video frames.
Second, compute the color feature and the texture feature of each sampled frame, and normalize the texture feature of each frame so that the value range of each component of the texture feature falls within a specified interval, for example [0, 1].
Then, cluster the color features and the texture features of the sampled frames separately to obtain the cluster centers of the color features and of the texture features; take the cluster centers of the color features as the color feature of the video and the cluster centers of the texture features as the texture feature of the video, and take the color feature and the texture feature together as the video feature of the video.
The judgment submodule is used to judge whether the detection region contains a video clip of the same length as the standard sample video with consistent video features, thereby concluding whether the received video stream contains the sample video.
The beneficial effects of the present invention are:
1) In the received video stream, the present invention takes the video segment of duration L+T traced back from the current detection moment as the detection region, and judges whether this detection region contains the sample video. With this scheme a known video segment can be detected completely with minimal data redundancy, and the boundaries of the detected video clip may be arbitrary; they are not required to be shot boundaries.
2) The video feature model of the present invention adopts a clustering model that fuses different visual features and can smooth out the influence of distortion or feature loss in a few frames, thereby improving the robustness of the video features.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of the video detection method of the present invention;
Fig. 2 is a schematic diagram of the computation of the color histogram of a video segment;
Fig. 3 is a schematic diagram of the structure of the video detection device of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
The problem of real-time video detection can be described as making content identifications on broadcast video data while that data is still being received. A certain delay τ is allowed between the moment t1 at which a content recognition completes and the moment t0 at which the content was broadcast, but the value of τ must be bounded within a range; the smaller τ is, the better the real-time performance of the recognition. Content recognition, as defined here, means identifying from the received video stream a fragment that has the same content as a sample video. The duration and frame content of this fragment are required to be identical to those of the sample video (or of the standard sample video, if one is defined), but moderate signal distortion is allowed, for example changes in picture size and quality. A sample video library must be established before recognition; during recognition, the fragments intercepted from the video stream are compared against the sample library to find a matching sample.
Two important values must be determined in advance when detecting video. The first is the sample video standard duration L: if the duration of a sample video is greater than L, the fragment of length L at its front end may be taken as the standard sample video O. The second is the detection interval T, i.e. how often a content recognition is made; its reciprocal represents the recognition frequency. If one content recognition takes time T0, then T > T0 must hold. The faster the recognition, the smaller the detection interval T can be set, and the smaller the corresponding delay τ.
Fig. 1 is a schematic diagram of the principle of the video detection method of the present invention. As shown in the figure, when a content recognition starts, a video segment S of length L+T is intercepted backward from the current moment and used for the content recognition. S comprises the video data of length T newly added since the last recognition started, plus a look-back of length L into the historical video data. Taking the starting point of S as the origin 0, the interval of S is expressed as (0, L+T], open on the left and closed on the right.
During content recognition, the standard sample video O is slid across the detection region S to search for a segment with identical content; O can be detected only when it is completely contained in S. Denote by O(t) the video object of length L starting at position t in the region S; if there exists a video object O(t′) identical to the standard sample video O, then 0 < t′ ≤ T must hold.
Video matching is based on content features. Let F(O) denote the feature representation of a video object O. The model assumes that O is uniquely determined by its features, so that the following relations hold: if O1 = O2, then F(O1) = F(O2); if O1 ≠ O2, then F(O1) ≠ F(O2). When the standard sample video O slides from the starting point of the detection region S to a position t′ with F(O(t′)) = F(O), the sample video is detected; at offsets where the features do not agree, the sample video is not detected.
It can be proved that the above algorithm model has the following properties:
Property 1: the standard sample video O can always be completely contained, and therefore detected, in one and only one detection region. Proof: number the detection regions to the left and right of the current region S as Sn, n = ±1, ±2, …, where negative n denotes regions to the left and positive n regions to the right. Taking the starting point of the current region as the time origin, the boundary of the n-th region to the left or right can be expressed as (nT, L+T+nT]. Consider a sample video that overlaps the current region, with start position t′, where −L < t′ < L+T. It can be shown that when

n = ⌈(t′ − T) / T⌉,

where ⌈·⌉ denotes rounding up, the standard sample video is completely contained in the n-th detection region. Since ⌈(t′ − T)/T⌉ ≥ (t′ − T)/T, we get nT ≥ t′ − T, hence t′ + L ≤ L + T + nT; and since ⌈(t′ − T)/T⌉ < (t′ − T)/T + 1, we get nT < t′. This shows that the boundary of the standard sample video is completely contained within the boundary of the n-th detection region, so the sample can be completely recognized there. The position of the standard sample video relative to the current region falls into one of the following three cases:
(1) when 0 < t′ ≤ T, n = 0: the standard sample video is contained in the current region;
(2) when T < t′ ≤ L+T, n ≥ 1: the head of the standard sample video lies in the current region and its tail in future video data; it will be recognized later;
(3) when −L < t′ ≤ 0, n ≤ −1: the tail of the standard sample video lies in the current region and its head in historical video data; it has already been recognized.
Uniqueness is proved as follows:
Without loss of generality, suppose the standard sample video O is completely contained in the current region S, with boundary [t′, t′+L], 0 < t′ ≤ T. The regions adjacent to S on the left and right, S−1 and S+1, have the boundaries (−T, L] and (T, L+2T] respectively. Clearly neither of these two regions can completely contain the boundary of the standard sample video O. It follows that the same standard sample video cannot be detected in two detection regions at the same time, which avoids redundant detection.
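A quick numeric check of Property 1, under assumed example values L = 30 and T = 5 that are not taken from the patent: for each start position t′, the formula yields the unique region index n whose boundary (nT, L+T+nT] fully contains [t′, t′+L].

```python
import math

L, T = 30.0, 5.0   # assumed example values

def containing_region(t_start):
    """Index n of the unique region (nT, L+T+nT] containing
    [t_start, t_start + L]; n = ceil((t' - T) / T)."""
    return math.ceil((t_start - T) / T)

for t in (2.0, 5.0, 7.5, 0.0, -12.0):
    n = containing_region(t)
    left, right = n * T, L + T + n * T
    assert left < t <= left + T     # sample start lies in (nT, nT + T]
    assert t + L <= right           # sample end lies inside the region
    print(f"t'={t:6.1f} -> region n={n:3d}, boundary ({left}, {right}]")
```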
Property 2: if the delay τ of video recognition is defined as the difference between the moment recognition ends and the moment of the right boundary of the standard sample video, then T0 < τ < T + T0, where T0 is the time required for one recognition. The delay is smallest, T0, when the right boundary of the standard sample video coincides with the right boundary of the current detection region, and largest, T + T0, when the left boundary of the standard sample video coincides with the left boundary of the current detection region. For example, with T = 5 s and T0 = 1 s, the detection delay lies between 1 s and 6 s.
Property 3: the reuse ratio of the video data is L/T. We define the reuse ratio as the ratio of the total amount of data used more than once to the total amount of raw data; this index can be used to measure the computational load of the recognition algorithm. Proof: each newly added segment of video data of duration T, besides forming part of the current detection region S, will be used in full by the following ⌊L/T⌋ detection regions, where ⌊·⌋ denotes rounding down; in addition, its right-end portion of length L % T, where % denotes the remainder, will be used once more by detection region number ⌊L/T⌋ + 1. The total amount of reused data per new segment is therefore ⌊L/T⌋ · T + L % T = L, and the reuse ratio is L/T. This shows that the data reuse ratio is inversely proportional to the detection interval: the smaller the interval, i.e. the better the real-time performance, the higher the reuse ratio of the data and the greater the computational load.
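A short worked check of Property 3 under assumed values L = 30 and T = 7, chosen so that T does not divide L; the values are illustrative, not from the patent.

```python
L, T = 30, 7                  # assumed example values

full_reuses = L // T          # later regions that reuse the whole new segment
partial = L % T               # length of the right-end part reused once more
reused_total = full_reuses * T + partial
assert reused_total == L      # total reused data per new segment is exactly L
print(f"reuse ratio = {reused_total / T:.2f} (= L/T = {L / T:.2f})")
```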
Video content recognition is based on feature matching. In this method, the video content feature model adopts a statistical model that describes the distribution of the color features and texture features over the whole video segment. A video segment can be expressed as a set of images; for each frame, its RGB (red-green-blue) color histogram and its texture features are computed. Each RGB channel is divided into 8 levels, so the color histogram is a 512-dimensional feature vector. The texture features are computed with the co-occurrence matrix method of statistical texture analysis: a gray-level spatial dependence matrix is first formed, and 13 texture feature components are then computed from this matrix. Because the physical meanings and value ranges of these 13 components differ, Gaussian normalization is applied to them, so that each component carries equal weight when feature distances are computed. With the Gaussian normalization method, a few extremely large or extremely small element values have little influence on the overall distribution of the normalized values.
On the basis of the color and texture features computed for every frame, the typical feature distribution is then obtained by clustering; the clustering algorithm used is K-Means. The color feature vectors (512-dimensional) and the normalized texture feature vectors (13-dimensional) are clustered separately, with the Euclidean distance as the feature distance. In this embodiment, the number of cluster centers for both the color features and the texture features is set to 5, so the feature F(O) of a video segment is described by 5 color feature vectors and 5 texture feature vectors. The advantage of this representation is that a video segment is represented by feature vectors of fixed dimension, which can resist distortion of individual frame features and also eliminate the influence of frame rate changes.
The video detection method and device of the present invention are further described below by way of building the video feature model.
I. Color feature
As a commonly used visual feature, the color histogram is widely applied in video retrieval and video identification. It describes the distribution of color values in an image; its advantages are that it is insensitive to noise and to local changes of the image, so it has good robustness for identification. To represent the color distribution of a video segment with a color histogram, the following processing is needed. A video segment can be expressed as a set of images; to reduce the data volume, the frames are usually sampled. If every sampled frame were represented by its own color histogram, the data volume would still be large, and when the frame rate changes, the number of key frames changes, causing the feature points to become misaligned and making feature matching harder. We therefore describe the color information of the image set by the accumulated color information of the sampled frames. For simplicity of computation, the cumulative color distribution (Cumulative Color Distribution) is estimated from the DC coefficients of the sampled frames. The cumulative histogram of each color channel c = Y, Cb, Cr is defined as follows:

H_c^CCD(j) = (1/M) Σ_{i=b_k}^{b_k+M−1} H_i(j),   j = 1, …, B

where H_i is the color histogram of one sampled frame of the video segment, b_k is the index of the first sampled frame of the window, M is the number of sampled frames in the window, and B is the number of color quantization levels. In this embodiment B = 24 (uniform quantization), so the total dimension of the three-channel color feature is 3 × 24 = 72. The computation of the color histogram of a video segment is shown in Fig. 2.
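A minimal sketch of the cumulative color distribution above, assuming the DC-coefficient image of each sampled frame is already available as a 2-D array of one channel with values in 0..255; the 24-level uniform quantization follows the embodiment, while the function and variable names are illustrative.

```python
import numpy as np

def cumulative_color_histogram(dc_frames, levels=24):
    """Average the per-frame histograms of one channel's DC coefficients
    over the M sampled frames of the window: H_c = (1/M) * sum_i H_i."""
    hists = []
    for frame in dc_frames:
        # uniform quantization of 0..255 values into `levels` bins
        q = np.clip((frame.astype(np.float64) * levels / 256).astype(int),
                    0, levels - 1)
        h = np.bincount(q.ravel(), minlength=levels).astype(float)
        hists.append(h / h.sum())            # per-frame normalized histogram H_i
    return np.mean(hists, axis=0)            # cumulative distribution, B bins

# 72-dimensional color feature: concatenate the three channel histograms, e.g.
# feature = np.concatenate([cumulative_color_histogram(c) for c in (y, cb, cr)])
```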
II. Texture feature
For each sampled frame of a video segment, besides computing its color histogram, the present invention also computes its texture features. This embodiment uses the co-occurrence matrix method of statistical texture analysis: a gray-level spatial dependence matrix is first formed, and various texture features can then be computed from this matrix. Using these texture features for classification can improve the classification accuracy.
1. Gray-level co-occurrence matrix
Suppose the image to be analyzed is rectangular, with Nx resolvable pixels in the horizontal direction and Ny pixels in the vertical direction, and that the gray levels of the pixels are quantized to Ng levels. Let
Lx = {1, 2, 3, …, Nx}
Ly = {1, 2, 3, …, Ny}
G = {1, 2, 3, …, Ng}
Then Ly × Lx is the set of pixels of the image arranged by rows and columns, and G is the set of quantized gray levels.
The gray-level co-occurrence matrix is composed of the counts of the gray-level pairs occurring at adjacent pixels of the image. The size of the matrix depends on the gray quantization level Ng, i.e. it is Ng × Ng.
If the gray level of a pixel is i, the gray level of an adjacent pixel is j, and the pair of gray levels (i, j) occurs P_ij times in the image, then the element in row i, column j of the gray-level dependence matrix is p_ij. The distance between the adjacent pixels is d; if the two pixels are immediately adjacent, d = 1. Adjacency also has a direction, usually divided into 4 directions differing by 45°: 0° horizontal, 90° vertical, 45° along the right diagonal, and 135° along the left diagonal. The unnormalized gray-level co-occurrence matrices of the 4 directions can be expressed as:

P(i, j, d, 0°) = #{((k, l), (m, n)) ∈ (Ly × Lx) × (Ly × Lx) | k − m = 0, |l − n| = d, I(k, l) = i, I(m, n) = j}

P(i, j, d, 45°) = #{((k, l), (m, n)) ∈ (Ly × Lx) × (Ly × Lx) | (k − m = d and l − n = −d) or (k − m = −d and l − n = d), I(k, l) = i, I(m, n) = j}

P(i, j, d, 90°) = #{((k, l), (m, n)) ∈ (Ly × Lx) × (Ly × Lx) | |k − m| = d, l − n = 0, I(k, l) = i, I(m, n) = j}

P(i, j, d, 135°) = #{((k, l), (m, n)) ∈ (Ly × Lx) × (Ly × Lx) | (k − m = d and l − n = d) or (k − m = −d and l − n = −d), I(k, l) = i, I(m, n) = j}

In these formulas, i is the gray level of a point and j the gray level of its neighbor; d is the distance between the two points; (k, l) is the position in the image of the pixel with gray level i, and (m, n) the position of the pixel with gray level j; # denotes the number of adjacent pixel pairs with gray levels i and j occurring under the given distance and direction conditions. The gray-level co-occurrence matrix is symmetric, i.e.

P(i, j, d, a) = P(j, i, d, a)

Usually this matrix needs to be normalized: each element is divided by R, the total number of adjacent pixel pairs in the image, so that the sum of the matrix elements is always 1. R can be determined by the following formulas (for d = 1):
when a = 0°, R = 2Ny(Nx − 1);
when a = 90°, R = 2Nx(Ny − 1);
when a = 45° or 135°, R = 2(Nx − 1)(Ny − 1).
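The four directional co-occurrence matrices can be built directly from the definitions above. The following numpy sketch (symmetric counting, normalized by the element sum so the entries add to 1) is an illustration; recent versions of scikit-image offer a comparable graycomatrix function, but the explicit loop keeps the definition visible.

```python
import numpy as np

OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}  # (row, col) steps

def glcm(image, angle, ng, d=1):
    """Normalized gray-level co-occurrence matrix for one direction.
    `image` holds gray levels already quantized to 0..ng-1."""
    dr, dc = OFFSETS[angle]
    dr, dc = dr * d, dc * d
    p = np.zeros((ng, ng))
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r, c], image[r2, c2]
                p[i, j] += 1
                p[j, i] += 1        # symmetry: P(i,j,d,a) = P(j,i,d,a)
    return p / p.sum()              # divide by R so the elements sum to 1
```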
2. Computing texture features from the co-occurrence matrix
Once the gray-level dependence matrix has been formed, the desired texture features can be computed from it. This group of texture features has 13 members, namely:
1) Angular second moment

f1 = Σ_i Σ_j {P(i, j)}²

where i and j index the rows and columns of the gray-level dependence matrix, and P(i, j) is the probability of the pair (i, j) in the normalized matrix.

2) Contrast

f2 = Σ_{n=0}^{Ng−1} n² { Σ_{|i−j|=n} P(i, j) }

3) Correlation

f3 = [ Σ_i Σ_j (i · j) P(i, j) − μx μy ] / (σx σy)

where μx, μy, σx and σy are the means and standard deviations of Px and Py, and

Px(i) = Σ_{j=1}^{Ng} P(i, j)
Py(j) = Σ_{i=1}^{Ng} P(i, j)

4) Variance

f4 = Σ_i Σ_j (i − μ)² P(i, j)

where μ is the mean gray level of the image I.

5) Inverse difference moment

f5 = Σ_i Σ_j P(i, j) / (1 + (i − j)²)

6) Sum average

f6 = Σ_{k=2}^{2Ng} k P_{x+y}(k)

where

P_{x+y}(k) = Σ_{i+j=k} P(i, j),   k = 2, 3, …, 2Ng

7) Sum variance

f7 = Σ_{k=2}^{2Ng} (k − f6)² P_{x+y}(k)

8) Sum entropy

f8 = −Σ_{k=2}^{2Ng} P_{x+y}(k) log{P_{x+y}(k)}

9) Entropy

f9 = −Σ_i Σ_j P(i, j) log(P(i, j))

10) Difference variance

f10 = Σ_{k=0}^{Ng−1} (k − f14)² P_{x−y}(k)

where

P_{x−y}(k) = Σ_{|i−j|=k} P(i, j),   k = 0, 1, 2, …, Ng−1
f14 = Σ_{k=0}^{Ng−1} k P_{x−y}(k)

11) Difference entropy

f11 = −Σ_{k=0}^{Ng−1} P_{x−y}(k) log{P_{x−y}(k)}

12), 13) Information measures of correlation

f12 = (HXY − HXY1) / max{HX, HY}
f13 = (1 − exp[−2.0 (HXY2 − HXY)])^{1/2}

where HX and HY are the entropies of Px and Py, and

HXY = −Σ_i Σ_j P(i, j) log(P(i, j))
HXY1 = −Σ_i Σ_j P(i, j) log(Px(i) Py(j))
HXY2 = −Σ_i Σ_j Px(i) Py(j) log(Px(i) Py(j))
In summary: first compute the gray-level co-occurrence matrices, searching neighborhoods along the four directions 0°, 45°, 90° and 135°; the neighborhood distance d can be set as needed, and the matrix size is Ng × Ng, where Ng is the number of gray levels. The 13 texture features, including the angular second moment, contrast and correlation, are then computed on the co-occurrence matrix, giving 13 real numbers.
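As a sketch, the following computes four of the 13 features (angular second moment, contrast, inverse difference moment and entropy) from a normalized Ng × Ng co-occurrence matrix; the remaining features follow the same summation pattern. The small epsilon in the entropy term is an implementation choice, not from the patent.

```python
import numpy as np

def texture_features(p):
    """f1 (angular second moment), f2 (contrast), f5 (inverse difference
    moment) and f9 (entropy) from a normalized co-occurrence matrix p."""
    ng = p.shape[0]
    i, j = np.indices((ng, ng))
    f1 = np.sum(p ** 2)                      # angular second moment
    f2 = np.sum(((i - j) ** 2) * p)          # contrast (equivalent closed form)
    f5 = np.sum(p / (1.0 + (i - j) ** 2))    # inverse difference moment
    f9 = -np.sum(p * np.log(p + 1e-12))      # entropy; eps avoids log(0)
    return np.array([f1, f2, f5, f9])
```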
3. Texture feature normalization
Because the physical meanings and value ranges of the above 13 texture feature components differ, they need to be internally normalized so that each component carries equal weight when feature distances are computed. Gaussian normalization is a good normalization method; its characteristic is that a few extremely large or extremely small element values have little influence on the overall distribution of values after normalization. The concrete method is as follows:
An N-dimensional feature vector can be written as F = [f1, f2, …, fN]. Let I1, I2, …, IM denote the M images in the image library; for each image Ii, the corresponding feature vector is Fi = [f_{i,1}, f_{i,2}, …, f_{i,N}]. Assuming that the series of component values [f_{1,j}, f_{2,j}, …, f_{M,j}] follows a Gaussian distribution, compute its mean m_j and standard deviation σ_j; f_{i,j} can then be normalized to the interval [−1, 1] by the following formula:

f_{i,j}^(N) = (f_{i,j} − m_j) / σ_j

After this normalization, each f_{i,j}^(N) is transformed to follow an N(0, 1) distribution. If 3σ_j is used for the normalization, i.e. f_{i,j}^(N) = (f_{i,j} − m_j) / (3σ_j), then the probability that the normalized value falls in the interval [−1, 1] reaches 99%.
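A sketch of the 3σ Gaussian normalization just described, applied column-wise to a matrix whose rows are the 13-component texture vectors of the sampled frames; clipping the rare out-of-range values to [−1, 1] is an added assumption, not stated in the patent.

```python
import numpy as np

def gaussian_normalize(features, n_sigma=3.0, clip=True):
    """Normalize each component series to roughly [-1, 1]:
    f_ij -> (f_ij - m_j) / (n_sigma * sigma_j)."""
    m = features.mean(axis=0)          # per-component mean m_j
    s = features.std(axis=0)           # per-component standard deviation sigma_j
    s[s == 0] = 1.0                    # guard against constant components
    out = (features - m) / (n_sigma * s)
    return np.clip(out, -1.0, 1.0) if clip else out
```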
4. Feature clustering
Feature clustering selects representative feature points from a large number of feature points scattered in a high-dimensional space. These representative points are called class centers, and subsequent processing is carried out on these class center points. We perform cluster analysis on the texture features of all the key frames of a video segment. The main purposes of the clustering are the following:
(1) represent a large number of feature points by representative points, achieving feature compression;
(2) represent the class attributes by a few typical feature points, reducing the complexity of feature classification;
(3) eliminate the feature-point misalignment caused by changes in the number of key frames;
(4) eliminate the influence of distortion of individual frame features.
K-Means is a classical clustering algorithm; its aim is to make the sum of squared distances between every point and its class center as small as possible within each class.
Suppose there is a set of feature points comprising c classes, and the k-th class is represented by a set C_k containing n_k feature points {X1, X2, …, X_{n_k}} with class center y_k. The squared error e_k of class C_k can be defined as:

e_k = Σ_i d²(x_i, y_k)

where x_i is a feature point belonging to the k-th class. The total squared error E of the c classes is the sum of the squared errors of the individual classes:

E = Σ_{k=1}^{c} e_k

The K-means clustering method thus becomes an optimization problem: how to choose the c classes and the associated class centers so that the value of E is minimal.
The concrete K-Means algorithm is realized iteratively; its steps are as follows:
(1) arbitrarily select c objects from the data objects as the initial cluster centers;
(2) repeat steps (3) and (4) until every cluster center converges;
(3) according to the mean (center object) of each cluster, compute the distance of each object to these center objects, and reassign each object to the nearest center;
(4) recompute the mean (center object) of each cluster.
K-means clustering has the following characteristics: each cluster itself is as compact as possible, and the clusters are separated from one another as much as possible.
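A compact numpy version of the iterative K-Means procedure above, minimizing the total squared error E; the random initialization and the fixed iteration cap are implementation choices, not prescribed by the patent.

```python
import numpy as np

def kmeans(points, c, iters=100, seed=0):
    """Plain K-Means: return the c cluster centers minimizing the total
    within-class squared error E = sum_k sum_i d^2(x_i, y_k)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), c, replace=False)]   # step (1)
    for _ in range(iters):                                        # step (2)
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                                 # step (3)
        new = np.array([points[labels == k].mean(axis=0)          # step (4)
                        if np.any(labels == k) else centers[k]
                        for k in range(c)])
        if np.allclose(new, centers):                             # convergence
            break
        centers = new
    return centers
```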
In this embodiment we set K = 5, so the texture feature of a video segment, F_T = {t1, t2, …, t5}, is represented by 5 texture feature vectors, where each t_i is a texture feature cluster center.
5. Feature matching
Because a video library usually contains hundreds or thousands of sample videos, a nearest-neighbor classifier with a threshold can be used to judge which sample video, if any, has the same content as the video segment under test. If the minimum distance between the test video Q and any sample video satisfies d_min < θ, where θ is the threshold of the nearest-neighbor classifier, then the video Q is identified as the sample video nearest to it; otherwise Q is judged not identical to any sample video (a sketch of this decision rule follows the device overview below).
Fig. 3 is a schematic diagram of the structure of the video identification device of the present invention. As shown in the figure, the device consists mainly of two cascaded parts: the front end is the broadcast video signal acquisition part, which acquires the TS (transport stream) signal in real time, and the back end is the sample video identification part. In this embodiment, the video signal acquisition part acquires the digital television signal; after demultiplexing, the video PES (packetized elementary stream) packets are sent over the network to the sample video identification search engine, which decodes the video data, extracts the features, performs the content recognition and writes the recognition results to a database. In this embodiment of the invention, video acquisition and identification work together on one high-performance computer.
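Returning to the feature matching rule above, the following is a minimal sketch of the thresholded nearest-neighbor decision; `dist` stands for the feature distance between the cluster-center representations of two videos (for example a sum of Euclidean distances over matched centers), and all names are illustrative.

```python
import numpy as np

def match_sample(query_feature, library, theta, dist):
    """Return the name of the nearest sample video if its distance is
    below the threshold theta, otherwise None (no sample detected)."""
    best_name, best_d = None, np.inf
    for name, feat in library.items():     # hundreds or thousands of samples
        d = dist(query_feature, feat)
        if d < best_d:
            best_name, best_d = name, d
    return best_name if best_d < theta else None
```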
The present invention establishes a sample model library of known videos and then performs fast content matching on video clips intercepted from a digital television broadcast stream or a network streaming-media stream, thereby detecting in real time whether a broadcast program contains a known video. The method and device proposed by the present invention can therefore be widely used for real-time detection of known video segments, such as advertisements and title sequences, in broadcast video streams such as digital television programs.
It should be understood that the scope of protection claimed by the present invention is set forth in the appended claims, and the above embodiments of the specification shall not be construed as limiting; all obvious modifications within the spirit of the present invention also fall within the scope of protection of the present invention.

Claims (10)

1. A video detection method, characterized by comprising the following steps:
step 1: determining the standard duration L of the sample video and the detection interval T;
step 2: in the received video stream, selecting a moment as a check point, and intercepting the video clip of length L+T preceding this check point as a detection region; and intercepting a clip of length L from the sample video as the standard sample video;
step 3: extracting the video features of the standard sample video, and judging whether the current detection region contains a video clip of the same length as the standard sample video with consistent video features; if such a clip exists, concluding that the received video stream contains the sample video; if not, concluding that this detection region does not contain the sample video, and executing step 4;
step 4: selecting the next detection region in the manner described above, and judging whether this detection region contains a video clip of the same length as the standard sample video with consistent video features; if such a clip exists, concluding that the received video stream contains the sample video; if not, continuing to select the next detection region for detection until the number of detection regions used reaches a predetermined number; if no video clip of the same length as the standard sample video with consistent video features has been found by then, concluding that the received video stream does not contain the sample video;
wherein the standard duration of the sample video is not greater than the length of the sample video; the detection interval is the time interval between two successive detections, i.e. the distance between the check points of two adjacent detection regions; and the check points of the detection regions are evenly distributed on the time axis.
2. The video detection method according to claim 1, characterized in that:
the standard duration of the sample video equals the length of the sample video.
3. The video detection method according to claim 1, characterized in that:
in step 2, the standard sample video is intercepted backward from the starting point of the sample video.
4. The video detection method according to claim 1, 2 or 3, characterized in that, in step 3, the video features are extracted in the following way:
A. sampling the video frames;
B. computing the color feature and the texture feature of each sampled frame, and normalizing the texture feature of each frame so that the value range of each component of the texture feature falls within a specified interval;
C. clustering the color features and the texture features of the sampled frames separately to obtain the cluster centers of the color features and the texture features, taking the cluster centers of the color features as the color feature of the video and the cluster centers of the texture features as the texture feature of the video, and then taking the color feature and the texture feature as the video feature of the video.
5. The video detection method according to claim 4, characterized in that:
in step B, the specified interval is the interval [0, 1].
6. The video detection method according to claim 4, characterized in that:
in step B, the normalization method is Gaussian normalization.
7. The video detection method according to claim 4, characterized in that:
in step C, the clustering method adopted is the K-Means clustering method.
8. The video detection method according to claim 1, 2 or 3, characterized in that:
according to the method, when more than one sample video would be judged to be contained in the received video stream, only the sample video whose video features are most consistent with the corresponding video clip in the detection region is considered to be contained in the received video stream.
9. A video detection device, characterized in that:
the device comprises a broadcast video acquisition module, a detection region selection module and a sample video identification module;
wherein the broadcast video acquisition module is used to acquire the broadcast video data;
the detection region selection module is used to select, as needed, several detection regions from the acquired video data, each detection region being obtained by intercepting the video clip of length L+T preceding the check point of that region, the check points of the regions being evenly distributed on the time axis with a distance T between two adjacent check points, and L being the standard duration of the sample video, which is not greater than the length of the sample video;
the detection region selection module is also used to intercept a clip of length L from the sample video as the standard sample video;
the sample video identification module is used to extract the video features of the standard sample video and judge whether the current detection region contains a video clip of the same length as the standard sample video with consistent video features; if such a clip exists, the broadcast video is judged to contain the sample video; if not, the module judges whether the next detection region contains such a video clip, and if so, the received video stream is judged to contain the sample video; otherwise another detection region is selected for detection; if no such video clip exists in any of the detection regions, the received video stream is judged not to contain the sample video.
10. The video detection device according to claim 9, characterized in that:
the sample video identification module comprises a video feature extraction submodule and a judgment submodule;
wherein the video feature extraction submodule is used to extract the video features of the standard sample video and of the corresponding video clips in the detection region in the following manner:
first, sampling the video frames;
second, computing the color feature and the texture feature of each sampled frame, and normalizing the texture feature of each frame so that the value range of each component of the texture feature falls within a specified interval, for example [0, 1];
then, clustering the color features and the texture features of the sampled frames separately to obtain the cluster centers of the color features and the texture features, taking the cluster centers of the color features as the color feature of the video and the cluster centers of the texture features as the texture feature of the video, and then taking the color feature and the texture feature as the video feature of the video;
wherein the judgment submodule is used to judge whether the detection region contains a video clip of the same length as the standard sample video with consistent video features, thereby concluding whether the received video stream contains the sample video.
CN200910084336XA 2009-05-21 2009-05-21 Video detection method and device Pending CN101894251A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910084336XA CN101894251A (en) 2009-05-21 2009-05-21 Video detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910084336XA CN101894251A (en) 2009-05-21 2009-05-21 Video detection method and device

Publications (1)

Publication Number Publication Date
CN101894251A true CN101894251A (en) 2010-11-24

Family

ID=43103440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910084336XA Pending CN101894251A (en) 2009-05-21 2009-05-21 Video detection method and device

Country Status (1)

Country Link
CN (1) CN101894251A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020153A (en) * 2012-11-23 2013-04-03 黄伟 Advertisement identification method based on videos
CN103065661A (en) * 2012-09-20 2013-04-24 中华电信股份有限公司 Signal detection method for recording medium
CN103853795A (en) * 2012-12-07 2014-06-11 中兴通讯股份有限公司 Image indexing method and device based on n-gram model
CN104079924A (en) * 2014-03-05 2014-10-01 北京捷成世纪科技股份有限公司 Mistakenly-played video detection method and device
CN104698090A (en) * 2015-03-17 2015-06-10 芜湖凯博实业股份有限公司 Fault diagnosis method of cooling tower
CN107784321A (en) * 2017-09-28 2018-03-09 深圳市奇米教育科技有限公司 Numeral paints this method for quickly identifying, system and computer-readable recording medium
CN107943849A (en) * 2017-11-03 2018-04-20 小草数语(北京)科技有限公司 The search method and device of video file
CN108234433A (en) * 2016-12-22 2018-06-29 华为技术有限公司 For handling the method and apparatus of video traffic
CN112418167A (en) * 2020-12-10 2021-02-26 深圳前海微众银行股份有限公司 Image clustering method, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张文哲 (Zhang Wenzhe): "Research on Content-Based Video Analysis and Retrieval" (基于内容的视频分析与检索方法研究), Master's thesis, Northwestern Polytechnical University (西北工业大学) *
曹建荣, 蔡安妮 (Cao Jianrong, Cai Anni): "A Shot-Based Video Retrieval Method in the Compressed Domain" (一种压缩域中基于镜头的视频检索方法), Microelectronics & Computer (微电子学与计算机) *
杨显锋, 袁敏, 刘晓蓉 (Yang Xianfeng, Yuan Min, Liu Xiaorong): "A Real-Time Digital Video Content Identification Method" (一种实时数字视频内容识别方法), Video Engineering (电视技术) *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065661A (en) * 2012-09-20 2013-04-24 中华电信股份有限公司 Signal detection method for recording medium
CN103020153A (en) * 2012-11-23 2013-04-03 黄伟 Advertisement identification method based on videos
CN103853795A (en) * 2012-12-07 2014-06-11 中兴通讯股份有限公司 Image indexing method and device based on n-gram model
CN104079924A (en) * 2014-03-05 2014-10-01 北京捷成世纪科技股份有限公司 Mistakenly-played video detection method and device
CN104698090A (en) * 2015-03-17 2015-06-10 芜湖凯博实业股份有限公司 Fault diagnosis method of cooling tower
CN108234433A (en) * 2016-12-22 2018-06-29 华为技术有限公司 For handling the method and apparatus of video traffic
CN107784321A (en) * 2017-09-28 2018-03-09 深圳市奇米教育科技有限公司 Numeral paints this method for quickly identifying, system and computer-readable recording medium
CN107784321B (en) * 2017-09-28 2021-06-25 深圳市快易典教育科技有限公司 Method and system for quickly identifying digital picture books and computer readable storage medium
CN107943849A (en) * 2017-11-03 2018-04-20 小草数语(北京)科技有限公司 The search method and device of video file
CN107943849B (en) * 2017-11-03 2020-05-08 绿湾网络科技有限公司 Video file retrieval method and device
CN112418167A (en) * 2020-12-10 2021-02-26 深圳前海微众银行股份有限公司 Image clustering method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN101894251A (en) Video detection method and device
CN102317957B (en) Improved image identification
Ye et al. Fast and robust text detection in images and video frames
US8150164B2 (en) System and method for identifying image based on singular value decomposition and feature point
US10127454B2 (en) Method and an apparatus for the extraction of descriptors from video content, preferably for search and retrieval purpose
US6744922B1 (en) Signal processing method and video/voice processing device
CN101957920B (en) Vehicle license plate searching method based on digital videos
CN102176208B (en) Robust video fingerprint method based on three-dimensional space-time characteristics
CN103605991A (en) Automatic video advertisement detection method
CN100365661C (en) Signal processing method and equipment
CN103390040A (en) Video copy detection method
US11330329B2 (en) System and method for detecting and classifying direct response advertisements using fingerprints
CN104268590B (en) The blind image quality evaluating method returned based on complementary combination feature and multiphase
CN101937506A (en) Similar copying video detection method
CN101821753B (en) Enhanced image identification
CN106557728A (en) Query image processing and image search method and device and surveillance
US10733453B2 (en) Method and system for supervised detection of televised video ads in live stream media content
CN101711394A (en) High performance image identification
CN100515048C (en) Method and system for fast detecting static stacking letters in online video stream
CN102306179B (en) Image content retrieval method based on hierarchical color distribution descriptor
US10719715B2 (en) Method and system for adaptively switching detection strategies for watermarked and non-watermarked real-time televised advertisements
CN106066887A (en) A kind of sequence of advertisements image quick-searching and the method for analysis
CN103679170A (en) Method for detecting salient regions based on local features
Hesson et al. Logo and trademark detection in images using color wavelet co-occurrence histograms
Ouali et al. Robust video fingerprints using positions of salient regions

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20101124