CN104077590A - Video fingerprint extraction method and system - Google Patents

Video fingerprint extraction method and system

Info

Publication number
CN104077590A
CN104077590A (also published as CN 104077590 A); application CN201410307572.4A
Authority
CN
China
Prior art keywords
information
video
hash codes
moving objects
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410307572.4A
Other languages
Chinese (zh)
Inventor
吴金勇
孙威
王军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Security and Surveillance Technology PRC Inc
Original Assignee
China Security and Surveillance Technology PRC Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Security and Surveillance Technology PRC Inc filed Critical China Security and Surveillance Technology PRC Inc
Priority to CN201410307572.4A priority Critical patent/CN104077590A/en
Publication of CN104077590A publication Critical patent/CN104077590A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of communication information security and provides a video fingerprint extraction method and system. The method comprises: preprocessing video information and extracting its Y-channel information; obtaining three levels of information from the Y-channel information, deriving different feature information for each of the three levels, and performing hash feature extraction on each of the derived features to obtain a video-clip information hash code, a frame-image information hash code and a moving-object-level information hash code; and gathering the video-clip, frame-image and moving-object-level hash codes and organizing them into a tree structure. The three levels of hash codes are combined and placed in one-to-one correspondence with the actual structure of the video clip to build the tree structure. By comparing hash-code tree structures it can be judged directly whether frame information has been tampered with, and the tampered position can be located quickly, giving strong tamper detection capability and high security.

Description

Video fingerprint extraction method and system
Technical field
The present invention relates to the technical field of communication information security, and in particular to a video fingerprint extraction method and system.
Background technology
With the development of human society, public-security problems attract more and more attention. Video surveillance systems, as a fast and effective auxiliary means of addressing public-security problems, are increasingly applied in daily life, but their application also produces a huge and cluttered volume of video information. Information transmission technology has likewise made great progress, so that large amounts of information can be exchanged and shared among different users. However, information must be transmitted over public channels, and the security of public channels cannot be effectively guaranteed, so the receiving party cannot be sure of the integrity of the received information. Faced with the demand for security evaluation of such an enormous amount of video information, relying on manual evaluation is an almost impossible task.
Video fingerprint extraction generates a unique fingerprint by processing a video, and the security of the video is evaluated by comparing fingerprints. The video fingerprint extraction method is the key technology in a video information security evaluation system: it characterizes a video by extracting features of the video content and producing a binary character string, with the property that videos of similar content yield similar strings. Current video fingerprint extraction methods mainly extract key frame images from the video, apply image perceptual hashing to the key frames to extract perceptual hash features, and then concatenate the hash codes obtained from the key frames to form the video fingerprint. The advantage of such methods is that they reuse relatively mature image perceptual hashing techniques to realize similarity comparison of videos; their shortcoming is that hash features are extracted only from key frame images, so tampering with non-key frames cannot be effectively authenticated, and the authentication result is prone to error. The prior art therefore has low security and weak tamper detection capability in video authentication.
Summary of the invention
In view of this, the present invention proposes a video fingerprint extraction method and system. The Y-channel information is extracted from the video information; different feature information is obtained from the video-clip information, the frame-image information and the moving-object-level information respectively; the different feature information is hashed to obtain multi-level hash codes; a one-to-one correspondence is established between the actual structure of the video clip and the multi-level hash codes; and a tree structure is built, so that a tampered position can be located quickly, giving strong tamper detection capability and high security.
To achieve these goals, the invention provides a video fingerprint extraction method, comprising the steps of:
preprocessing video information and extracting the Y-channel information from the video information;
obtaining three levels of information from the Y-channel information, the three levels being video-clip information, frame-image information and moving-object-level information, and obtaining different feature information for each of the three levels;
performing hash feature extraction on each of the obtained feature information, to obtain a video-clip information hash code, a frame-image information hash code and a moving-object-level information hash code;
gathering the video-clip information hash code, the frame-image information hash code and the moving-object-level information hash code, and organizing them into a tree structure, wherein information of the same level is connected in cascade, and information of different levels is connected with pointers according to the subordination relations between the different levels of information in the video.
Wherein, obtaining the video-clip information from the Y-channel information comprises:
sampling the image frame sequence of the video clip at a fixed frame interval;
performing a weighted addition of the sampled image frame sequence to obtain a spatio-temporal frame.
Wherein, performing hash feature extraction on the video-clip information comprises:
convolving the spatio-temporal frame with a horizontal operator $G_x$ and a vertical operator $G_y$ respectively, to obtain a horizontal gradient image and a vertical gradient image;
wherein the horizontal operator $G_x$ and the vertical operator $G_y$ are respectively:
$$G_x = \begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix}, \qquad G_y = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{pmatrix}$$
partitioning the horizontal gradient image and the vertical gradient image into blocks;
calculating the gray-level mean of the pixels in each block, and thresholding the block means with an adaptive threshold to obtain the video-clip information hash code.
Wherein, the adaptive threshold method is specifically: the median of the block gray-level means is taken as the adaptive threshold; a block whose mean is greater than the adaptive threshold is set to 1, otherwise it is set to 0.
Wherein, obtaining the frame-image information comprises:
normalizing the size of each whole frame image;
partitioning each size-normalized image into image blocks;
calculating the correlation coefficient between each image block and a reference matrix.
Wherein, obtaining the moving-object-level information comprises extracting moving-object regions, which specifically comprises:
adopting multi-Gaussian background modeling: each pixel is regarded as a weighted mixture of Gaussians, with the probability distribution function
$$p(I(i,j)) = \sum_{k=1}^{K} w_{ij,t}^{k}\,\eta\!\left[I(i,j),\ \mu_{ij,t}^{k},\ (\sigma_{ij,t}^{k})^{2}\right]$$
where $w_{ij,t}^{k}$ denotes the weight of the $k$-th Gaussian component of the mixture at pixel $(i,j)$ at time $t$, satisfying
$$\sum_{k=1}^{K} w_{ij,t}^{k} = 1$$
and $\eta[\cdot]$ denotes the Gaussian probability density function of $I(i,j)$, where $\mu_{ij,t}^{k}$ and $(\sigma_{ij,t}^{k})^{2}$ denote respectively the mean and the variance of the $k$-th Gaussian component at pixel $(i,j)$ at time $t$;
taking the pixel values of the first frame image as the mean of the first Gaussian component of the multi-Gaussian background, setting its variance and weight to predetermined values, and sorting the Gaussian components in order of magnitude;
from the second frame image onwards, matching the value of every pixel in each subsequent frame against the multi-Gaussian background model.
Wherein, matching the value of every pixel in each frame after the second frame against the multi-Gaussian background model comprises:
judging whether the current pixel satisfies a Gaussian distribution in the background model; if so, regarding it as background and updating the mean, variance and weight of the satisfied Gaussian distribution in the background model;
if not, regarding it as a target, taking the value of the current pixel as the mean of a new Gaussian distribution, and setting its variance and weight.
Wherein, extracting the hash feature of the moving-object-level information comprises:
normalizing the size of the moving-object region and dividing it into n*n small image blocks, where n is an integer;
applying the block DCT algorithm to the Y-channel information;
extracting 1 DC coefficient and K AC coefficients, where K is an integer in [5, 9];
thereby obtaining
$$\text{feature}^{T} = \{Y_0, Y_1, Y_2, \ldots, Y_i, \ldots, Y_K\}$$
where $Y_0$ denotes the DC coefficient of the Y channel and $Y_i$ is the $i$-th AC coefficient of the Y channel;
composing the n*n feature column vectors into a FEATURE matrix:
$$\text{FEATURE} = \{\text{feature}_1^{T}, \text{feature}_2^{T}, \ldots, \text{feature}_i^{T}, \ldots, \text{feature}_{n^2}^{T}\}$$
where $\text{feature}_i^{T}$ denotes the set of AC and DC coefficients of the $i$-th small image block;
binarizing each row of the FEATURE matrix with an adaptive threshold to obtain the binarized FEATURE matrix.
Wherein, connecting information of different levels with pointers according to the subordination relations between the different levels of information in the video comprises: pointing the pointer in the storage unit of the video-clip information hash code layer to the storage unit holding the hash code of the first frame image (or of the last frame image) of the video clip in the frame-image hash code layer; and pointing the pointer in a storage unit of the frame-image information hash code layer to the storage unit holding the first (or last) moving-object-level information hash code of that frame image in the moving-object hash code layer.
To achieve these goals, the present invention also provides a video fingerprint extraction system, comprising:
a preprocessing module, configured to preprocess video information and extract the Y-channel information from the video information;
an acquisition module, configured to obtain three levels of information from the Y-channel information, the three levels being video-clip information, frame-image information and moving-object-level information, and to obtain different feature information for each of the three levels;
a processing module, configured to perform hash feature extraction on each of the obtained feature information, to obtain a video-clip information hash code, a frame-image information hash code and a moving-object-level information hash code;
a construction module, configured to gather the video-clip information hash code, the frame-image information hash code and the moving-object-level information hash code and to organize them into a tree structure, wherein information of the same level is connected in cascade, and information of different levels is connected with pointers according to the subordination relations between the different levels of information in the video.
In the video fingerprint extraction method and system provided by the invention, the Y-channel information is extracted from the video information; different feature information is obtained from the video-clip information, the frame-image information and the moving-object-level information; each feature information is hashed to obtain its corresponding hash code; and the hash codes of the three different levels are gathered and placed in one-to-one correspondence with the actual structure of the video clip to build a tree structure. Thus, when any storage unit at any level of the video clip changes, the change is directly reflected in a branch of the whole tree structure or in the hash-code unit of a leaf node. By comparing hash-code tree structures it can be judged directly whether frame information has been tampered with, and the tampered position can be located quickly, giving strong tamper detection capability and high security.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the video fingerprint extraction method of an embodiment of the present invention;
Fig. 2 is a flowchart of hash feature extraction from the video-clip information in an embodiment of the present invention;
Fig. 3 is a flowchart of hash feature extraction from the frame-image information in an embodiment of the present invention;
Fig. 4 is a flowchart of hash feature extraction from the moving-object-level information in an embodiment of the present invention;
Fig. 5 illustrates how the binary matrix is cascaded into a hash code in an embodiment of the present invention;
Fig. 6 illustrates the multi-level hash-code organization of an embodiment of the present invention;
Fig. 7 is a structural diagram of the video fingerprint extraction system of an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment 1:
Referring to Fig. 1 to Fig. 4, the embodiment of the present invention provides a video fingerprint extraction method, which specifically comprises the following steps:
S10, preprocessing: extract the Y-channel information from the video information;
It should be noted that in the YUV color space, the Y-channel information determines the lightness of a color, while the U-channel and V-channel information determine the chrominance of the color itself. From the perspective of human cognition, human sensitivity to the luminance information of a video is far greater than to its color information. Therefore the perceptual hash feature extraction in this embodiment processes only the luminance information, that is, only the Y-channel information is extracted from the video information.
In this embodiment, the original video information may be in the YUV color space or in another color space. If it is in the YUV color space, its Y-channel information is extracted directly; if it is in another color space, it is first converted to the YUV color space and then the Y-channel information is extracted. The resolution of the input video may be, but is not limited to, CIF, D1, 720p, 1080p, etc., and the frame rate is likewise not limited.
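As a minimal illustration of this preprocessing step (not part of the patent text), the sketch below decodes frames with OpenCV and keeps only the Y plane after a BGR-to-YUV conversion; the function name extract_y_frames and the use of cv2/numpy are assumptions made for illustration only.

```python
import cv2
import numpy as np

def extract_y_frames(video_path):
    """Yield the Y (luminance) plane of every frame of a video.

    Frames decoded by OpenCV are BGR, so they are converted to YUV first;
    a video already supplied in YUV would be used directly.
    """
    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yuv = cv2.cvtColor(frame, cv2.COLOR_BGR2YUV)
            yield yuv[:, :, 0].astype(np.float32)  # keep only the Y channel
    finally:
        cap.release()
```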
S20, divide the Y-channel information extracted from the video information into three progressively refined levels of information, namely video-clip information, frame-image information and moving-object-level information, and obtain different feature information for each of the three levels through different information representations;
S30, hash each of the extracted feature information to obtain the corresponding video-clip information hash code, frame-image information hash code and moving-object-level information hash code. The detailed procedure is as follows:
S201, obtain the video-clip information from the Y-channel information;
The video-clip information is represented by spatio-temporal information, whose extraction comprises the following two steps:
(a) First, sample the image frame sequence of the video clip at a fixed frame interval. Because the spatio-temporal information of a video is mainly reflected in inter-frame changes, and the change between adjacent frames is generally not obvious, a fixed frame interval of 3-10 frames is used for sampling; in this embodiment one sample is preferably taken every 5 frames.
(b) Then perform a weighted addition of the sampled image frame sequence to obtain the spatio-temporal frame.
The concrete method can be expressed by the following formula:
$$F(m,n) = \sum_{k=1}^{J} w_k\, F(m,n,k)$$
where $F(m,n,k)$ denotes the gray value at coordinate $(m,n)$ in the $k$-th sampled frame, $w_k$ is the weight of the $k$-th frame, and $F(m,n)$ denotes the gray value of the spatio-temporal information at $(m,n)$. In this preferred embodiment $w_k$ is expressed by the exponential function $\gamma^{k}$, preferably with $\gamma = 0.6$.
Thus, through steps (a) and (b) above, the video-clip information in the Y channel, represented by spatio-temporal information, is obtained.
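A non-authoritative sketch of steps (a) and (b), assuming the frames are the Y-channel arrays produced above and γ = 0.6 as in the preferred embodiment; the helper name build_spatiotemporal_frame is invented for illustration.

```python
import numpy as np

def build_spatiotemporal_frame(y_frames, interval=5, gamma=0.6):
    """Sample every `interval`-th Y frame and sum them with exponential weights.

    y_frames : list of equally sized 2-D float arrays (Y channel).
    Returns the spatio-temporal frame F(m, n) = sum_k gamma**k * F(m, n, k).
    """
    sampled = list(y_frames)[::interval]
    if not sampled:
        raise ValueError("no frames to sample")
    st_frame = np.zeros_like(sampled[0], dtype=np.float64)
    for k, frame in enumerate(sampled, start=1):
        st_frame += (gamma ** k) * frame
    return st_frame
```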
S202, perform hash feature extraction on the video-clip information;
In this preferred embodiment, the spatio-temporal frame is convolved with a horizontal operator $G_x$ and a vertical operator $G_y$ respectively, giving a horizontal gradient image and a vertical gradient image. The horizontal operator $G_x$ and the vertical operator $G_y$ are respectively:
$$G_x = \begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix}, \qquad G_y = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{pmatrix}$$
The horizontal gradient image and the vertical gradient image are then each partitioned into blocks, preferably 4*4 blocks here. The gray-level mean of the pixels in each block is calculated, and the block means are thresholded with an adaptive threshold. In this embodiment the adaptive threshold method is specifically: the median MedianNumber of the block gray-level means is taken as the adaptive threshold; a block whose mean is greater than this adaptive threshold is set to 1, otherwise to 0. In this way the video-clip information hash code VideoclipHash is obtained. VideoclipHash has only 4*4*2 = 32 binary digits and serves as a rough representation of the video clip as a whole.
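A minimal sketch of S202 under the stated choices (the two gradient operators, 4*4 blocks, median threshold). The helper names block_means and videoclip_hash and the use of cv2.filter2D are illustrative assumptions; the spatio-temporal frame is assumed to have dimensions divisible by the block count.

```python
import cv2
import numpy as np

GX = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float32)
GY = np.array([[1, 1, 1], [0, 0, 0], [-1, -1, -1]], dtype=np.float32)

def block_means(image, blocks=4):
    """Gray-level mean of each cell of a blocks x blocks partition."""
    h, w = image.shape
    bh, bw = h // blocks, w // blocks
    return np.array([image[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].mean()
                     for i in range(blocks) for j in range(blocks)])

def videoclip_hash(st_frame, blocks=4):
    """32-bit VideoclipHash: median-thresholded 4*4 block means of both gradient images."""
    bits = []
    for op in (GX, GY):
        grad = cv2.filter2D(st_frame.astype(np.float32), -1, op)
        means = block_means(grad, blocks)
        thr = np.median(means)                 # adaptive threshold
        bits.extend(int(m > thr) for m in means)
    return bits                                # 4*4*2 = 32 binary digits
```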
S203, obtain the frame-image information of the video clip;
The frame-image information is represented directly by the whole frame image, and a rough texture feature is extracted from each frame image. The rough texture feature is represented by the correlation coefficients between the image blocks and a reference matrix; in this embodiment the reference matrix is preferably an 8*8 Gaussian low-pass matrix R with standard deviation $\delta_r = 0.5$.
The Gaussian low-pass matrix R is specifically expressed as follows:
$$R = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 8.09\times10^{-5} & 0.0044 & 0.0044 & 8.09\times10^{-5} & 0 & 0 \\
0 & 0 & 0.0044 & 0.241 & 0.241 & 0.0044 & 0 & 0 \\
0 & 0 & 0.0044 & 0.241 & 0.241 & 0.0044 & 0 & 0 \\
0 & 0 & 8.09\times10^{-5} & 0.0044 & 0.0044 & 8.09\times10^{-5} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$
In order to ensure that the image blocks to be processed and the reference matrix have the same size, the whole frame image is normalized in size (to 64*64 in this embodiment) and partitioned into 8*8 blocks, giving 64 image blocks of size 8*8. The correlation coefficient between each image block and the reference matrix is then calculated:
$$\rho = \frac{\sum_{i=1}^{n}\bigl(I(i)-\mu_I\bigr)\bigl(R(i)-\mu_R\bigr)}{\sqrt{\sum_{i=1}^{n}\bigl(I(i)-\mu_I\bigr)^{2}}\,\sqrt{\sum_{i=1}^{n}\bigl(R(i)-\mu_R\bigr)^{2}}}$$
where $\mu_I$ denotes the mean of the block matrix I, $\mu_R$ denotes the mean of the Gaussian low-pass matrix R, and $I(i)$ and $R(i)$ are the $i$-th data values of the block matrix I and the Gaussian low-pass matrix R respectively. The rough texture feature representation of the whole frame image is thus obtained:
$$\text{feature}^{T} = \{\rho_1, \rho_2, \ldots, \rho_i, \ldots, \rho_{64}\}$$
where $\rho_i$ is the correlation coefficient of the $i$-th image block, with $i$ an integer in [1, 64].
S204, perform hash feature extraction on the rough texture feature of the frame-image information;
The rough texture feature of the frame-image information is quantized with an adaptive threshold. In this embodiment the adaptive threshold method is specifically: compute the median MedianNumber of the elements $\rho_i$ of $\text{feature}^{T}$ and use it as the adaptive threshold; each element $\rho_i$ of $\text{feature}^{T}$ greater than MedianNumber is set to 1, otherwise to 0, giving a rough representation of the rough texture feature. The bits are then concatenated in the order of the image blocks, from top to bottom and from left to right, to obtain the frame-image information hash code Imagehash of each frame image; in this embodiment the length of this hash code is 64 binary digits. The set of frame-image information hash codes generated for the frame images of the whole video clip can be expressed as:
$$\text{Imagehash}^{n} = \{\text{Imagehash}_1, \text{Imagehash}_2, \ldots, \text{Imagehash}_n\}$$
where n is the number of frame images in the video clip.
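A sketch of S203/S204 under the stated parameters (64*64 normalization, 8*8 Gaussian low-pass reference with sigma = 0.5, median-threshold binarization). The reference matrix is built here as the outer product of cv2.getGaussianKernel(8, 0.5), which matches the values of R above up to rounding; the helper names gaussian_reference and frame_image_hash are assumptions.

```python
import cv2
import numpy as np

def gaussian_reference(size=8, sigma=0.5):
    """8*8 Gaussian low-pass reference matrix (outer product of a 1-D kernel)."""
    k = cv2.getGaussianKernel(size, sigma)
    return (k @ k.T).astype(np.float64)

def frame_image_hash(y_frame, ref=None, norm_size=64, block=8):
    """64-bit Imagehash: correlation of each 8*8 block with R, median-binarized."""
    if ref is None:
        ref = gaussian_reference(block)
    img = cv2.resize(y_frame.astype(np.float32), (norm_size, norm_size))
    rhos = []
    for i in range(0, norm_size, block):
        for j in range(0, norm_size, block):
            blk = img[i:i + block, j:j + block].ravel()
            # Pearson correlation coefficient between the block and the reference
            rhos.append(np.corrcoef(blk, ref.ravel())[0, 1])
    rhos = np.array(rhos)
    thr = np.median(rhos)                      # adaptive threshold
    return [int(r > thr) for r in rhos]        # 64 binary digits, block order
```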
S205, obtain the moving-object-level information of the video clip;
It should be noted that, from the perspective of general cognitive science, human cognition of a picture frame in a video is divided into two aspects, global information and local information, and the human visual system is most sensitive to the moving objects in a video. On this basis the present invention uses moving objects to represent the local information in an image. This specifically comprises the following steps:
S2051, moving-object region extraction;
Specifically, the moving-object regions are extracted with a multi-Gaussian background modeling method. Each pixel is regarded as a weighted mixture of Gaussians, with the probability distribution function
$$p(I(i,j)) = \sum_{k=1}^{K} w_{ij,t}^{k}\,\eta\!\left[I(i,j),\ \mu_{ij,t}^{k},\ (\sigma_{ij,t}^{k})^{2}\right]$$
where $w_{ij,t}^{k}$ denotes the weight of the $k$-th Gaussian component of the mixture at pixel $(i,j)$ at time $t$, satisfying
$$\sum_{k=1}^{K} w_{ij,t}^{k} = 1$$
and $\eta[\cdot]$ denotes the Gaussian probability density function of $I(i,j)$, where $\mu_{ij,t}^{k}$ and $(\sigma_{ij,t}^{k})^{2}$ denote respectively the mean and the variance of the $k$-th Gaussian component at pixel $(i,j)$ at time $t$.
S2052, during initialization of the multi-Gaussian background template, take the pixel values of the first frame image as the mean of the first Gaussian component of the multi-Gaussian background, set its variance and weight to predetermined values, and sort the Gaussian components in order of magnitude;
S2053, from the second frame image onwards, match the value of every pixel in each subsequent frame against the multi-Gaussian background model. The matching process is as follows:
judge whether the current pixel satisfies a Gaussian distribution in the background model; if so, regard it as background and update the mean, variance and weight of the satisfied Gaussian distribution in the background model; if not, regard it as a target, take the value of the current pixel as the mean of a new Gaussian distribution, and set its variance and weight.
Through continuous updating the multi-Gaussian background model is established, and with this model the moving target in each frame of the video can be extracted.
The moving-object regions obtained in this way are represented by n rectangular boxes aligned with the coordinate axes, each rectangular box consisting of four parameters:
$$\text{rect}_i = \{x_i, y_i, \text{width}_i, \text{height}_i\},\quad i = 1, 2, \ldots, n$$
where $(x_i, y_i)$ denotes the upper-left vertex of the rectangular box, and $\text{width}_i$ and $\text{height}_i$ are its width and height respectively. The resulting set of moving-object regions can be expressed as:
$$\text{rect}^{n} = \{\text{rect}_1, \text{rect}_2, \ldots, \text{rect}_n\}$$
where $\text{rect}_n$ denotes the region information of the $n$-th moving object.
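The extraction of S2051-S2053 can be sketched with OpenCV's built-in Gaussian-mixture background subtractor, which performs the same kind of per-pixel match/update described above; this is an illustrative substitute for the hand-written multi-Gaussian model in the patent, and the function name moving_object_rects, the binarization threshold and the minimum-area filter are assumptions.

```python
import cv2

def moving_object_rects(frames, min_area=100):
    """Yield, per frame, the bounding boxes (x, y, width, height) of moving objects.

    frames   : iterable of 8-bit grayscale or BGR images.
    min_area : small contours below this area are discarded as noise.
    """
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    for frame in frames:
        mask = subtractor.apply(frame)                     # foreground mask
        mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)[1]
        # [-2] keeps this working on both OpenCV 3.x and 4.x return conventions
        contours = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                    cv2.CHAIN_APPROX_SIMPLE)[-2]
        rects = [cv2.boundingRect(c) for c in contours
                 if cv2.contourArea(c) >= min_area]
        yield rects
```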
S206, extract the hash feature of the moving-object-level information;
Each moving-object region contains one object, so only the distinguishing features of the different moving-object regions need to be hashed; the subsequent hash-code comparison can then determine whether an object has been maliciously altered. The detailed procedure is as follows:
First, normalize the size of the moving-object region and divide it into n*n small image blocks, where n is an integer, preferably n = 32 or n = 4 here;
apply the block DCT algorithm to the Y-channel information;
extract 1 direct-current (DC) coefficient and K alternating-current (AC) coefficients, where the value of K determines the sensitivity and robustness of the hash code: the larger K is, the better the sensitivity and the worse the robustness. To balance sensitivity and robustness, K is an integer in [5, 9] in this embodiment;
thereby obtaining
$$\text{feature}^{T} = \{Y_0, Y_1, Y_2, \ldots, Y_i, \ldots, Y_K\}$$
where $Y_0$ denotes the DC coefficient of the Y channel and $Y_i$ is the $i$-th AC coefficient of the Y channel;
compose the n*n feature column vectors into a FEATURE matrix:
$$\text{FEATURE} = \{\text{feature}_1^{T}, \text{feature}_2^{T}, \ldots, \text{feature}_i^{T}, \ldots, \text{feature}_{n^2}^{T}\}$$
where $\text{feature}_i^{T}$ denotes the set of DC and AC coefficients of the $i$-th small image block;
binarize each row of the FEATURE matrix with an adaptive threshold to obtain the binarized FEATURE matrix.
In this embodiment the block DCT (discrete cosine transform) algorithm is applied to the Y-channel information. Taking n = 4 as an example, each moving-object region is divided into 4*4 small image blocks.
From each small image block, 1 DC coefficient is extracted; when the number of AC coefficients extracted at the same time is K = 7, we obtain
$$\text{feature}^{T} = \{Y_0, Y_1, Y_2, \ldots, Y_i, \ldots, Y_7\}$$
where $Y_0$ denotes the DC coefficient of the Y channel and $Y_i$ denotes the $i$-th AC coefficient of the Y channel.
The column vectors obtained from the 4*4 small blocks then form an 8*16 matrix:
$$\text{FEATURE} = \{\text{feature}_1^{T}, \text{feature}_2^{T}, \ldots, \text{feature}_i^{T}, \ldots, \text{feature}_{16}^{T}\}$$
where $\text{feature}_i^{T}$ denotes the set of DC and AC coefficients of the $i$-th small image block.
When 1 DC coefficient is extracted from each small image block and the number of AC coefficients extracted is K = 5, we obtain
$$\text{feature}^{T} = \{Y_0, Y_1, Y_2, \ldots, Y_i, \ldots, Y_5\}$$
where $Y_0$ denotes the DC coefficient of the Y channel and $Y_i$ denotes the $i$-th AC coefficient of the Y channel.
The column vectors obtained from the 4*4 small blocks then form a 6*16 matrix:
$$\text{FEATURE} = \{\text{feature}_1^{T}, \text{feature}_2^{T}, \ldots, \text{feature}_i^{T}, \ldots, \text{feature}_{16}^{T}\}$$
where $\text{feature}_i^{T}$ denotes the set of DC and AC coefficients of the $i$-th small image block.
When 1 DC coefficient is extracted from each small image block and the number of AC coefficients extracted is K = 9, we obtain
$$\text{feature}^{T} = \{Y_0, Y_1, Y_2, \ldots, Y_i, \ldots, Y_9\}$$
where $Y_0$ denotes the DC coefficient of the Y channel and $Y_i$ denotes the $i$-th AC coefficient of the Y channel.
The column vectors obtained from the 4*4 small blocks then form a 10*16 matrix:
$$\text{FEATURE} = \{\text{feature}_1^{T}, \text{feature}_2^{T}, \ldots, \text{feature}_i^{T}, \ldots, \text{feature}_{16}^{T}\}$$
where $\text{feature}_i^{T}$ denotes the set of DC and AC coefficients of the $i$-th small image block.
Depending on how many AC coefficients are extracted, the length of the resulting hash code varies accordingly: the more AC coefficients are extracted, the longer the hash code and the more transmission-channel resources it occupies. In terms of information extraction, the more AC coefficients, the more detail is captured, so the sensitivity of the corresponding hash code is better but its robustness is worse.
The performance comparison can be summarized by the following relations:
$$\text{Length}_5 < \text{Length}_7 < \text{Length}_9$$
$$\text{Sensitive}_5 < \text{Sensitive}_7 < \text{Sensitive}_9$$
$$\text{Robust}_5 > \text{Robust}_7 > \text{Robust}_9$$
where $\text{Length}_i$ denotes the length of the hash code when K = i, $\text{Sensitive}_i$ its sensitivity, and $\text{Robust}_i$ its robustness, with i an integer in [5, 9].
Therefore, in this embodiment the sensitivity of K = 9 is greater than that of K = 7, which is greater than that of K = 5, while the robustness of K = 5 is greater than that of K = 7, which is greater than that of K = 9. Weighing robustness against sensitivity, K = 7 is preferred in the embodiment of the present invention.
Finally, each row of the FEATURE matrix is binarized with an adaptive threshold (the specific adaptive threshold method being essentially the same as the adaptive threshold used for the rough texture feature of the frame-image information above), giving the binarized FEATURE matrix. The rows of the binarized matrix are then concatenated end to end in the manner of Fig. 5, with the last data item of one row followed by the first data item of the next row, to obtain the moving-object-level information hash code Objectnesshash of each object region. The set of moving-object-level information hash codes generated for the frame images of the whole video clip can be expressed as:
$$\text{Objectnesshash}^{n} = \{\text{Objectnesshash}_1, \ldots, \text{Objectnesshash}_n\}$$
where $\text{Objectnesshash}_n$ denotes the moving-object-level information hash code of the $n$-th moving object.
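A sketch of S206 with n = 4 and K = 7, using cv2.dct for the block transform. The exact ordering of the K AC coefficients is not specified in the text, so a simple row-major order over the coefficient matrix is assumed here; the helper name objectness_hash and the 64*64 normalization size are likewise assumptions.

```python
import cv2
import numpy as np

def objectness_hash(region, n=4, K=7, norm_size=64):
    """Hash a moving-object region (Y channel): block DCT, 1 DC + K AC coefficients
    per block, per-row median binarization, rows concatenated end to end."""
    img = cv2.resize(region.astype(np.float32), (norm_size, norm_size))
    step = norm_size // n                       # side length of each of the n*n blocks
    columns = []
    for i in range(0, norm_size, step):
        for j in range(0, norm_size, step):
            coeffs = cv2.dct(img[i:i + step, j:j + step])
            flat = coeffs.ravel()               # coeffs[0, 0] is the DC coefficient
            columns.append(flat[:K + 1])        # DC + first K AC coefficients
    feature = np.stack(columns, axis=1)         # (K+1) x n*n FEATURE matrix
    thr = np.median(feature, axis=1, keepdims=True)   # adaptive threshold per row
    binary = (feature > thr).astype(int)
    return binary.ravel().tolist()              # rows joined end to end (Fig. 5 style)
```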
At this point the hashing of the multi-level feature information of the whole video clip is complete, and the hash-code sets of the three levels have been obtained.
S40, gather the three hash codes of the different levels of information, namely the video-clip information hash code VideoclipHash, the frame-image information hash code Imagehash and the moving-object-level information hash code Objectnesshash of the object regions, and organize them into a tree structure. Information of the same level and information of different levels are organized in two different ways. Referring to Fig. 6, information of the same level is connected directly in cascade, while information of different levels is connected with pointers according to the subordination relations between the different levels of information in the video. As for the storage units of the different levels, each unit contains not only the hash information of that level but also further feature information describing that level. The concrete arrangement is as follows:
Video-clip information hash code layer: this layer represents the global information of the video clip. Since the global information of the video clip is represented by a single VideoclipHash, this layer needs only one storage unit, which stores the video-clip information hash code VideoclipHash and a pointer; the pointer points to the storage unit holding the hash code of the first frame image (or of the last frame image) of the video clip in the frame-image hash code layer.
Frame-image information hash code layer: this layer represents the global information of each frame image in the video clip. Since the video clip consists of multiple frame images, and each frame image generates one frame-image information hash code, this layer consists of multiple storage units, the number of which is determined by the number of frames of the video clip. The frame-image information hash codes obtained in this embodiment are:
$$\text{Imagehash}^{n} = \{\text{Imagehash}_1, \text{Imagehash}_2, \ldots, \text{Imagehash}_n\}$$
where $\text{Imagehash}_n$ denotes the frame-image information hash code of the $n$-th frame image.
Therefore n storage units are needed, cascaded in order of frame number. Each storage unit contains the frame-image information hash code of that frame image and a pointer, which points to the storage unit holding the first (or last) moving-object-level information hash code of that frame image in the moving-object hash code layer.
Moving-object-level information hash code layer: this layer represents the set of local information of the frame image to which the moving objects belong. It likewise consists of multiple storage units, the number of which is determined by the number of moving objects obtained from each frame image; each moving object generates one moving-object-level information hash code. The set of moving-object-level information hash codes obtained from one frame image in this embodiment is:
$$\text{Objectnesshash}^{n} = \{\text{Objectnesshash}_1, \ldots, \text{Objectnesshash}_n\}$$
where $\text{Objectnesshash}_n$ denotes the moving-object-level information hash code of the $n$-th moving object.
Therefore the moving-object-level information hash code layer corresponding to this frame image needs n storage units, and the storage units under the same frame image are sorted by $(x_i, y_i)$ of $\text{rect}_i$, in this embodiment from left to right and from top to bottom. Each storage unit contains two parts: the moving-object hash code and the position information $\text{rect}_i$ of the object.
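The three-layer organization of S40 can be sketched as linked records; the class names, the use of Python dataclasses and the list-based links below are illustrative assumptions rather than the patent's storage format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ObjectNode:                 # moving-object-level layer (leaf)
    objectness_hash: List[int]
    rect: Tuple[int, int, int, int]           # (x, y, width, height)

@dataclass
class FrameNode:                  # frame-image layer
    image_hash: List[int]
    objects: List[ObjectNode] = field(default_factory=list)   # link to object units

@dataclass
class VideoClipNode:              # video-clip layer (root of the fingerprint tree)
    videoclip_hash: List[int]
    frames: List[FrameNode] = field(default_factory=list)     # link to frame units

def build_fingerprint_tree(clip_hash, frame_hashes, objects_per_frame):
    """Cascade same-level units in order and link the levels top-down."""
    root = VideoClipNode(clip_hash)
    for img_hash, objs in zip(frame_hashes, objects_per_frame):
        frame = FrameNode(img_hash)
        for obj_hash, rect in objs:
            frame.objects.append(ObjectNode(obj_hash, rect))
        root.frames.append(frame)
    return root
```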
The video fingerprint extraction method of the present invention divides the Y-channel information of the video information into three progressively refined levels of information, namely video-clip information, frame-image information and moving-object-level information; extracts different feature information for the three levels (a gray-level feature for the video-clip information, a texture feature represented by correlation coefficients for the frame-image information, and a DCT coefficient feature for the moving-object-level information); obtains a feature hash code for each level according to its feature; and gathers the multi-level hash codes and organizes them into a tree structure, thereby obtaining the video fingerprint. The present invention establishes a one-to-one correspondence between the multi-level hash information obtained in the above steps and the actual structure of the video clip, so that when any storage unit at any level of the video clip changes, the change is directly reflected in a branch of the whole tree structure or in the hash-code unit of a leaf node. By comparing hash-code tree structures it can be judged directly whether frame information has been tampered with, and the tampered position can be located quickly, giving strong tamper detection capability and high security.
Embodiment 2:
Referring to Fig. 7, the invention provides a video fingerprint extraction system, comprising:
a preprocessing module, configured to preprocess video information and extract the Y-channel information from the video information;
an acquisition module, configured to obtain three levels of information from the Y-channel information, the three levels being video-clip information, frame-image information and moving-object-level information, and to obtain different feature information for each of the three levels;
wherein obtaining the different feature information specifically comprises extracting a gray-level feature from the video-clip information, extracting a texture feature represented by correlation coefficients from the frame-image information, and extracting a DCT coefficient feature from the moving-object-level information;
a processing module, configured to perform hash feature extraction on each of the obtained feature information, to obtain a video-clip information hash code, a frame-image information hash code and a moving-object-level information hash code;
a construction module, configured to gather the video-clip information hash code, the frame-image information hash code and the moving-object-level information hash code and to organize them into a tree structure, wherein information of the same level is connected in cascade, and information of different levels is connected with pointers according to the subordination relations between the different levels of information in the video;
in particular, the construction module further comprises a storage unit and a generation unit:
the storage unit is configured to store the hash codes and pointers of the different information;
the generation unit is configured to establish a one-to-one correspondence between the hash information of the different levels and the actual structure of the video clip, and to build the tree structure.
In the video fingerprint extraction system of the present invention, the preprocessing module extracts the Y-channel information; different feature information is obtained from the video-clip information, the frame-image information and the moving-object-level information; each feature information is hashed to obtain its corresponding hash code; and the hash codes of the three different levels are gathered and placed in one-to-one correspondence with the actual structure of the video clip to build a tree structure. Thus, when any storage unit at any level of the video clip changes, the change is directly reflected in a branch of the whole tree structure or in the hash-code unit of a leaf node. By comparing hash-code tree structures it can be judged directly whether frame information has been tampered with, and the tampered position can be located quickly.
The video fingerprint extraction method and system of the present invention extract the overall features of the video clip (the video-clip information) roughly and extract the local features (the moving-object-level information) finely, and the hash-code organization retains the structure of the video itself, so that security evaluation of video information and quick, accurate location of tampered regions can be realized. The method and system of the invention can be used for the security evaluation of a range of video information transmitted over public channels, such as video in video surveillance systems and network transmission, and can also be used for video copy detection.
In the present invention, the modules or units and the method steps may be concentrated in a single computing device, or distributed over a network formed by multiple computing devices; they may also be made into individual integrated circuit modules, or several of the modules or steps may be made into a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, without thereby limiting the scope of the rights of the present invention. Those skilled in the art can implement the present invention in many variant ways without departing from the scope and spirit of the present invention; for example, a feature of one embodiment may be used in another embodiment to obtain yet another embodiment. Any modification, equivalent replacement or improvement made within the technical conception of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A video fingerprint extraction method, characterized in that it comprises the steps of:
preprocessing video information and extracting the Y-channel information from the video information;
obtaining three levels of information from the Y-channel information, the three levels being video-clip information, frame-image information and moving-object-level information, and obtaining different feature information for each of the three levels;
performing hash feature extraction on each of the obtained feature information, to obtain a video-clip information hash code, a frame-image information hash code and a moving-object-level information hash code;
gathering the video-clip information hash code, the frame-image information hash code and the moving-object-level information hash code, and organizing them into a tree structure, wherein information of the same level is connected in cascade,
and information of different levels is connected with pointers according to the subordination relations between the different levels of information in the video.
2. The video fingerprint extraction method according to claim 1, characterized in that obtaining the video-clip information from the Y-channel information comprises:
sampling the image frame sequence of the video clip at a fixed frame interval;
performing a weighted addition of the sampled image frame sequence to obtain a spatio-temporal frame.
3. The video fingerprint extraction method according to claim 2, characterized in that performing hash feature extraction on the video-clip information comprises:
convolving the spatio-temporal frame with a horizontal operator $G_x$ and a vertical operator $G_y$ respectively, to obtain a horizontal gradient image and a vertical gradient image;
wherein the horizontal operator $G_x$ and the vertical operator $G_y$ are respectively:
$$G_x = \begin{pmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{pmatrix}, \qquad G_y = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{pmatrix}$$
partitioning the horizontal gradient image and the vertical gradient image into blocks;
calculating the gray-level mean of the pixels in each block, and thresholding the block means with an adaptive threshold to obtain the video-clip information hash code.
4. The video fingerprint extraction method according to claim 3, characterized in that the adaptive threshold method is specifically: the median of the block gray-level means is taken as the adaptive threshold; a block whose mean is greater than the adaptive threshold is set to 1, otherwise it is set to 0.
5. The video fingerprint extraction method according to claim 1, characterized in that obtaining the frame-image information comprises:
normalizing the size of each whole frame image;
partitioning each size-normalized image into image blocks;
calculating the correlation coefficient between each image block and a reference matrix.
6. The video fingerprint extraction method according to claim 1, characterized in that obtaining the moving-object-level information comprises extracting moving-object regions, which specifically comprises:
adopting multi-Gaussian background modeling: each pixel is regarded as a weighted mixture of Gaussians, with the probability distribution function
$$p(I(i,j)) = \sum_{k=1}^{K} w_{ij,t}^{k}\,\eta\!\left[I(i,j),\ \mu_{ij,t}^{k},\ (\sigma_{ij,t}^{k})^{2}\right]$$
where $w_{ij,t}^{k}$ denotes the weight of the $k$-th Gaussian component of the mixture at pixel $(i,j)$ at time $t$, satisfying
$$\sum_{k=1}^{K} w_{ij,t}^{k} = 1$$
and $\eta[\cdot]$ denotes the Gaussian probability density function of $I(i,j)$, where $\mu_{ij,t}^{k}$ and $(\sigma_{ij,t}^{k})^{2}$ denote respectively the mean and the variance of the $k$-th Gaussian component at pixel $(i,j)$ at time $t$;
taking the pixel values of the first frame image as the mean of the first Gaussian component of the multi-Gaussian background, setting its variance and weight to predetermined values, and sorting the Gaussian components in order of magnitude;
from the second frame image onwards, matching the value of every pixel in each subsequent frame against the multi-Gaussian background model.
7. The video fingerprint extraction method according to claim 6, characterized in that matching the value of every pixel in each frame after the second frame against the multi-Gaussian background model comprises:
judging whether the current pixel satisfies a Gaussian distribution in the background model; if so, regarding it as background and updating the mean, variance and weight of the satisfied Gaussian distribution in the background model;
if not, regarding it as a target, taking the value of the current pixel as the mean of a new Gaussian distribution, and setting its variance and weight.
8. The video fingerprint extraction method according to claim 1, characterized in that extracting the hash feature of the moving-object-level information comprises:
normalizing the size of the moving-object region and dividing it into n*n small image blocks, where n is an integer;
applying the block DCT algorithm to the Y-channel information;
extracting 1 DC coefficient and K AC coefficients, where K is an integer in [5, 9];
thereby obtaining
$$\text{feature}^{T} = \{Y_0, Y_1, Y_2, \ldots, Y_i, \ldots, Y_K\}$$
where $Y_0$ denotes the DC coefficient of the Y channel and $Y_i$ is the $i$-th AC coefficient of the Y channel;
composing the n*n feature column vectors into a FEATURE matrix:
$$\text{FEATURE} = \{\text{feature}_1^{T}, \text{feature}_2^{T}, \ldots, \text{feature}_i^{T}, \ldots, \text{feature}_{n^2}^{T}\}$$
where $\text{feature}_i^{T}$ denotes the set of AC and DC coefficients of the $i$-th small image block;
binarizing each row of the FEATURE matrix with an adaptive threshold to obtain the binarized FEATURE matrix.
9. The video fingerprint extraction method according to any one of claims 1 to 8, characterized in that connecting information of different levels with pointers according to the subordination relations between the different levels of information in the video comprises: pointing the pointer in the storage unit of the video-clip information hash code layer to the storage unit holding the hash code of the first frame image (or of the last frame image) of the video clip in the frame-image hash code layer; and pointing the pointer in a storage unit of the frame-image information hash code layer to the storage unit holding the first (or last) moving-object-level information hash code of that frame image in the moving-object hash code layer.
10. A video fingerprint extraction system, characterized in that it comprises:
a preprocessing module, configured to preprocess video information and extract the Y-channel information from the video information;
an acquisition module, configured to obtain three levels of information from the Y-channel information, the three levels being video-clip information, frame-image information and moving-object-level information, and to obtain different feature information for each of the three levels;
a processing module, configured to perform hash feature extraction on each of the obtained feature information, to obtain a video-clip information hash code, a frame-image information hash code and a moving-object-level information hash code;
a construction module, configured to gather the video-clip information hash code, the frame-image information hash code and the moving-object-level information hash code and to organize them into a tree structure, wherein information of the same level is connected in cascade, and information of different levels is connected with pointers according to the subordination relations between the different levels of information in the video.
CN201410307572.4A 2014-06-30 2014-06-30 Video fingerprint extraction method and system Pending CN104077590A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410307572.4A CN104077590A (en) 2014-06-30 2014-06-30 Video fingerprint extraction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410307572.4A CN104077590A (en) 2014-06-30 2014-06-30 Video fingerprint extraction method and system

Publications (1)

Publication Number Publication Date
CN104077590A true CN104077590A (en) 2014-10-01

Family

ID=51598836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410307572.4A Pending CN104077590A (en) 2014-06-30 2014-06-30 Video fingerprint extraction method and system

Country Status (1)

Country Link
CN (1) CN104077590A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104581431A (en) * 2014-11-28 2015-04-29 安科智慧城市技术(中国)有限公司 Video authentication method and device
CN104809248A (en) * 2015-05-18 2015-07-29 成都索贝数码科技股份有限公司 Video fingerprint extraction and retrieval method
CN105611319A (en) * 2015-12-24 2016-05-25 杭州当虹科技有限公司 Video content anti-tampering method
CN105611428A (en) * 2015-12-22 2016-05-25 北京安寻网络科技有限公司 Video evidence preserving and verifying method and device
CN105930478A (en) * 2016-05-03 2016-09-07 福州市勘测院 Element object spatial information fingerprint-based spatial data change capture method
CN108021927A (en) * 2017-11-07 2018-05-11 天津大学 A kind of method for extracting video fingerprints based on slow change visual signature
CN108540823A (en) * 2018-05-15 2018-09-14 北京首汽智行科技有限公司 A kind of integrity of video method of calibration based on block chain technology
CN108629049A (en) * 2018-05-14 2018-10-09 芜湖岭上信息科技有限公司 A kind of image real-time storage and lookup device and method based on hash algorithm
CN109241317A (en) * 2018-09-13 2019-01-18 北京工商大学 Based on the pedestrian's Hash search method for measuring loss in deep learning network
CN110035327A (en) * 2019-04-17 2019-07-19 深圳市摩天之星企业管理有限公司 A kind of safe playback method
CN110083740A (en) * 2019-05-07 2019-08-02 深圳市网心科技有限公司 Video finger print extracts and video retrieval method, device, terminal and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101404750A (en) * 2008-11-11 2009-04-08 清华大学 Video fingerprint generation method and device
CN102077584A (en) * 2008-06-30 2011-05-25 思科技术公司 Video fingerprint systems and methods
CN102208026A (en) * 2011-05-27 2011-10-05 电子科技大学 Method for extracting digital video fingerprints
US8494234B1 (en) * 2007-03-07 2013-07-23 MotionDSP, Inc. Video hashing system and method
CN103593464A (en) * 2013-11-25 2014-02-19 华中科技大学 Video fingerprint detecting and video sequence matching method and system based on visual features
US8660296B1 (en) * 2012-01-10 2014-02-25 Google Inc. Systems and methods for facilitating video fingerprinting using local descriptors
CN103747271A (en) * 2014-01-27 2014-04-23 深圳大学 Video tamper detection method and device based on mixed perceptual hashing

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8494234B1 (en) * 2007-03-07 2013-07-23 MotionDSP, Inc. Video hashing system and method
CN102077584A (en) * 2008-06-30 2011-05-25 思科技术公司 Video fingerprint systems and methods
CN101404750A (en) * 2008-11-11 2009-04-08 清华大学 Video fingerprint generation method and device
CN102208026A (en) * 2011-05-27 2011-10-05 电子科技大学 Method for extracting digital video fingerprints
US8660296B1 (en) * 2012-01-10 2014-02-25 Google Inc. Systems and methods for facilitating video fingerprinting using local descriptors
CN103593464A (en) * 2013-11-25 2014-02-19 华中科技大学 Video fingerprint detecting and video sequence matching method and system based on visual features
CN103747271A (en) * 2014-01-27 2014-04-23 深圳大学 Video tamper detection method and device based on mixed perceptual hashing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李泽洲 (Li Zezhou): "Research on Video Surveillance Technology Based on Video Fingerprints", China Master's Theses Full-text Database, Information Science and Technology Series *
王典 (Wang Dian): "Research on Background Modeling and Shadow Suppression Algorithms Based on Gaussian Mixture", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104581431A (en) * 2014-11-28 2015-04-29 安科智慧城市技术(中国)有限公司 Video authentication method and device
CN104581431B (en) * 2014-11-28 2018-01-30 精宸智云(武汉)科技有限公司 Video authentication method and device
CN104809248A (en) * 2015-05-18 2015-07-29 成都索贝数码科技股份有限公司 Video fingerprint extraction and retrieval method
CN104809248B (en) * 2015-05-18 2018-01-23 成都华栖云科技有限公司 Video finger print extracts and search method
CN105611428A (en) * 2015-12-22 2016-05-25 北京安寻网络科技有限公司 Video evidence preserving and verifying method and device
CN105611319B (en) * 2015-12-24 2018-08-17 杭州当虹科技有限公司 A kind of method that video content is anti-tamper
CN105611319A (en) * 2015-12-24 2016-05-25 杭州当虹科技有限公司 Video content anti-tampering method
CN105930478A (en) * 2016-05-03 2016-09-07 福州市勘测院 Element object spatial information fingerprint-based spatial data change capture method
CN105930478B (en) * 2016-05-03 2019-04-19 福州市勘测院 Spatial data based on feature object spatial information fingerprint changes catching method
CN108021927A (en) * 2017-11-07 2018-05-11 天津大学 A kind of method for extracting video fingerprints based on slow change visual signature
CN108629049A (en) * 2018-05-14 2018-10-09 芜湖岭上信息科技有限公司 A kind of image real-time storage and lookup device and method based on hash algorithm
CN108540823A (en) * 2018-05-15 2018-09-14 北京首汽智行科技有限公司 A kind of integrity of video method of calibration based on block chain technology
CN109241317A (en) * 2018-09-13 2019-01-18 北京工商大学 Based on the pedestrian's Hash search method for measuring loss in deep learning network
CN110035327A (en) * 2019-04-17 2019-07-19 深圳市摩天之星企业管理有限公司 A kind of safe playback method
CN110083740A (en) * 2019-05-07 2019-08-02 深圳市网心科技有限公司 Video finger print extracts and video retrieval method, device, terminal and storage medium
CN110083740B (en) * 2019-05-07 2021-04-06 深圳市网心科技有限公司 Video fingerprint extraction and video retrieval method, device, terminal and storage medium

Similar Documents

Publication Publication Date Title
CN104077590A (en) Video fingerprint extraction method and system
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
CN108108751B (en) Scene recognition method based on convolution multi-feature and deep random forest
US8024775B2 (en) Sketch-based password authentication
US20120027263A1 (en) Hand gesture detection
WO2016082277A1 (en) Video authentication method and apparatus
CN102592148A (en) Face identification method based on non-negative matrix factorization and a plurality of distance functions
CN110991549A (en) Countermeasure sample generation method and system for image data
CN113761259A (en) Image processing method and device and computer equipment
CN112132099A (en) Identity recognition method, palm print key point detection model training method and device
Jia et al. SAR image change detection based on iterative label-information composite kernel supervised by anisotropic texture
CN109242796A (en) Character image processing method, device, electronic equipment and computer storage medium
CN106127748A (en) A kind of characteristics of image sample database and method for building up thereof
CN103839042A (en) Human face recognition method and human face recognition system
CN106845513A (en) Staff detector and method based on condition random forest
CN104050628A (en) Image processing method and image processing device
CN106203448B (en) A kind of scene classification method based on Nonlinear Scale Space Theory
CN110489659A (en) Data matching method and device
CN107644105A (en) One kind searches topic method and device
CN111199175A (en) Training method and device for target detection network model
Sarmah et al. Optimization models in steganography using metaheuristics
CN105069403B (en) A kind of three-dimensional human ear identification based on block statistics feature and the classification of dictionary learning rarefaction representation
CN107358244B (en) A kind of quick local invariant feature extracts and description method
Zanotta et al. An adaptive semisupervised approach to the detection of user-defined recurrent changes in image time series
CN108830217B (en) Automatic signature distinguishing method based on fuzzy mean hash learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20141001

WD01 Invention patent application deemed withdrawn after publication