CN106055632B - Video authentication method based on scene frame fingerprint - Google Patents

Video authentication method based on scene frame fingerprint

Info

Publication number
CN106055632B
CN106055632B (application CN201610367884.3A)
Authority
CN
China
Prior art keywords
video
record
fingerprint
frame
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610367884.3A
Other languages
Chinese (zh)
Other versions
CN106055632A (en)
Inventor
毛家发
张明国
钟丹虹
高飞
肖刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201610367884.3A
Publication of CN106055632A
Application granted
Publication of CN106055632B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Abstract

A video authentication method based on scene frame fingerprints. First, a scene-frame decision rule is used to extract the fingerprints of five consecutive, distinct scene frames from a video clip; together they form the video fingerprint. The fingerprint is then bound to the video's own ID information to form a metadata fingerprint record, and the fingerprint information is stored in Bag-of-Words form, saving 75% of the storage space. During lookup and authentication, an inverted-file technique improves the matching speed by about half. Simulation experiments show that the proposed video authentication method has good detection performance: the average accuracy reaches 98% or more, and authenticating each video takes about 12 seconds in a Matlab environment, so real-time detection in a network environment can be achieved.

Description

Video authentication method based on scene frame fingerprint
Technical field
The invention belongs to the field of video authentication technology and discloses a new method for authenticating video and combating piracy in the new-media environment.
Background technique
Most current digital video works rely on data encryption: the digital video content is encrypted, and only authorized users hold the key needed to decrypt it. However, data encryption faces the problem that the key may be stolen during transmission, and once the key is stolen the digital video loses its protection. The emergence of digital watermarking can solve the problem of key loss: a hidden mark is embedded in the digital content and later extracted and matched by a detection tool, achieving the goal of copyright protection. At present, however, digital watermarking products are not strong at resisting intentional or unintentional attacks, and their robustness is weak, which greatly restricts the application and development of digital watermarking.
Fingerprinting can make up for the deficiencies of encryption and digital watermarking. A video fingerprint is a digital signature that represents the most important visual features of a segment of video; its main purpose is to establish an effective mechanism for comparing the perceptual quality of two pieces of video data. Note that the video data themselves are usually far too large to be compared directly, so their much smaller digital fingerprints are compared instead.
Video fingerprinting is concerned with accuracy, robustness, fingerprint size, granularity, authentication speed and versatility. Accuracy covers the correct recognition rate, the false alarm rate and the missed detection rate. Robustness means that an unknown video can still be identified after undergoing fairly severe video signal processing. Fingerprint size largely determines the capacity of the fingerprint database. Granularity is an application-dependent parameter: how long an unknown video clip must be in order to identify the whole video. For a practical, commercial video fingerprinting system, authentication speed is a crucial parameter. Versatility refers to the ability to recognize different video formats. Around these characteristics, many scholars have carried out video fingerprinting research in the spatio-temporal, spatial, temporal and color-space domains of video and achieved gratifying results. In recent years fingerprinting has been widely applied to copyright authentication, copy monitoring, multimedia retrieval and piracy tracing, and many video fingerprinting algorithms have been proposed. Existing algorithms can be summarized into four classes: color-space-based, temporal, spatial and spatio-temporal.
Color-space fingerprint extraction relies on color histograms over the spatio-temporal domain of the video, using the color statistics of a video clip to extract the fingerprint. However, the vast majority of video today is 24-bit true color, so the statistics become excessively large and slow down fingerprint extraction. Moreover, different video formats can change the colors noticeably, and color-space fingerprint extraction does not apply to black-and-white video, so this class of methods is not widely used.
Temporal fingerprint extraction mainly extracts temporal features from the video sequence. These methods need a fairly long video sequence and are not suitable for short clips. Since short videos are now very common on the web, temporal fingerprints are not well suited to online applications.
Spatial fingerprint methods extract features from each frame or from key frames and are similar to image fingerprinting methods. Spatial fingerprints divide further into global and local fingerprints. Global fingerprints capture global properties such as image histogram statistics; local fingerprints mainly extract local features of the image, such as local interest points in a frame, which are usually applied to object retrieval in multimedia. However, extracting interest points requires preprocessing the images, and the number of video frames is enormous, so a large amount of computer memory is consumed; this kind of fingerprint extraction is therefore rarely applied to video.
Spatio-temporal fingerprints cover both the temporal and the spatial information of a video, so their performance is better than that of purely temporal or spatial fingerprints. The main spatio-temporal extraction methods at present are 3D-DCT, TIRI-DCT and 3D-STIP. Taken together, these video fingerprinting algorithms can, to a certain extent, reasonably resist some common attacks such as resolution reduction, frame-rate reduction, noise addition, brightness change and contrast change, but their ability to authenticate under attacks such as re-encoding, re-capture, Logo/Text insertion and picture-in-picture is limited.
Summary of the invention
The present invention overcomes the above disadvantages of the prior art and provides a video authentication method based on scene frame fingerprints.
The video authentication method based on scene frame fingerprints of the present invention comprises the following steps:
1) Preprocessing of the video frames;
(1.1) Perform a color space conversion on each color frame of the video and take its luminance component to obtain a gray-level image;
(1.2) Crop the borders of the video frame, keeping its central part, then scale it to a fixed size of W × H pixels;
(1.3) Filter the video frame with a 3 × 3 Gaussian low-pass filter with standard deviation 0.95;
(1.4) Scale the image to 3/4 QCIF size (QCIF is 144 × 176 pixels), as in the sketch below.
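The following is a minimal sketch of steps (1.1)-(1.4), assuming OpenCV in Python. The crop ratio and the intermediate W × H working size are illustrative assumptions not fixed by the text; the 132 × 108 target is taken as 3/4 of each QCIF dimension (QCIF is 176 × 144), which also matches the 144-block partition used later.

```python
import cv2

def preprocess_frame(frame_bgr, keep_ratio=0.8, work_size=(352, 288)):
    """Steps (1.1)-(1.4): luminance, center crop, fixed resize, Gaussian filter, 3/4 QCIF."""
    # (1.1) color space conversion: keep only the luminance component
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # (1.2) cut away the borders and keep the central part (keep_ratio is an assumed value)
    h, w = gray.shape
    dh, dw = int(h * (1 - keep_ratio) / 2), int(w * (1 - keep_ratio) / 2)
    center = gray[dh:h - dh, dw:w - dw]
    # ... then scale to a fixed W x H size (work_size is an assumed value)
    fixed = cv2.resize(center, work_size)
    # (1.3) 3 x 3 Gaussian low-pass filter with standard deviation 0.95
    smoothed = cv2.GaussianBlur(fixed, (3, 3), 0.95)
    # (1.4) scale to 3/4 QCIF; QCIF is 176 x 144, so the frame becomes 132 x 108
    return cv2.resize(smoothed, (132, 108))
```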
2) Extract the fingerprint from each preprocessed video frame, comprising the following steps:
(2.1) Partition the preprocessed video frame into blocks. Within each 9 × 11 region, a to h are averages of local pixels. The frame elements are then extracted as: (1) the mean element of the entire 9 × 11 sub-region; (2) the four difference elements a-b, c-d, e-f and g-h. In total 720 frame elements are obtained: 144 mean elements, denoted A elements, and 576 difference elements, denoted D elements;
(2.2) Quantize the A elements into quaternary values. For the A elements of dimensions 1-144, let A_i be the value of an A element; these A elements are quantized into quaternary values x_i using formula (1):
(2.3) Dynamically determine the threshold ThA, comprising the following steps:
(2.3.1) Take a_i = abs(A_i - 128), where abs(·) is the absolute-value operator, and sort the a_i in ascending order into A_k = {a_1, a_2, …, a_k, …, a_N}; the index i and the index k are not the same;
(2.3.2) Threshold ThA = a_k, where k = floor(0.25 * N), N = 144, and floor denotes rounding down;
(2.4) Quantize the D elements into quaternary values. For the D elements D_i of dimensions 145-720, they are quantized into quaternary values x_i using formula (2):
(2.5) Dynamically determine the threshold ThD, comprising the following steps:
(2.5.1) Take d_i = abs(D_i) and sort the d_i in ascending order into D_k = {d_1, d_2, …, d_k, …, d_N}; the index i and the index k are not the same;
(2.5.2) Threshold ThD = d_k, where k = floor(0.25 * N), N = 576, and floor denotes rounding down;
(2.6) Store the extracted quaternary elements X = {x_1, x_2, …, x_720} in binary-coded form.
Let word_i, i = 1, 2, …, 180, be defined so that every group of 4 element dimensions occupies one coding unit; this coding is computed by the following formula (see the sketch below):
word_i = 4^3 * x_((i-1)*4+1) + 4^2 * x_((i-1)*4+2) + 4 * x_((i-1)*4+3) + x_((i-1)*4+4)    (3)
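A sketch of the dynamic thresholds of steps (2.3)/(2.5) and of the word packing of formula (3), in Python. The bodies of formulas (1) and (2) are not reproduced in the text, so quantize_quaternary below is only an assumed stand-in that maps each element to {0, 1, 2, 3} from its sign and magnitude relative to the dynamic threshold; the threshold computation and the packing follow the text.

```python
import numpy as np

def dynamic_threshold(values, center=0.0, quantile=0.25):
    """Steps (2.3)/(2.5): sort |v - center| in ascending order and take the
    element at position floor(0.25 * N)."""
    a = np.sort(np.abs(np.asarray(values, dtype=float) - center))
    return a[int(np.floor(quantile * len(a)))]

def quantize_quaternary(values, center, th):
    """Assumed stand-in for formulas (1)/(2): map each element to a quaternary
    value in {0, 1, 2, 3} from its sign and its magnitude relative to th."""
    v = np.asarray(values, dtype=float) - center
    big = np.abs(v) >= th
    return np.where(v < 0, np.where(big, 0, 1), np.where(big, 3, 2)).astype(np.uint8)

def pack_words(x):
    """Formula (3): pack every 4 consecutive quaternary digits into one word in 0..255."""
    x = np.asarray(x, dtype=np.uint32).reshape(-1, 4)
    return (64 * x[:, 0] + 16 * x[:, 1] + 4 * x[:, 2] + x[:, 3]).astype(np.uint8)

# One frame gives 144 A elements (centered at 128, threshold ThA) and 576 D elements
# (centered at 0, threshold ThD), i.e. 720 quaternary values and 180 words.
```

Packing four 2-bit quaternary values into each 8-bit word is what yields the 75% storage saving claimed for the Bag-of-Words form: 900 bytes instead of one byte per element (3600 bytes) for a five-scene-frame fingerprint.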
(2.7) Extraction algorithm for the scene frame fingerprints, comprising the following steps:
(2.7.1) Decide whether a frame is a blank screen; formula (4) is applied for the blank-screen decision;
mean(F) < Th_BS    (4)
mean(F) denotes the mean value of the image pixels and Th_BS is the blank-screen threshold;
(2.7.2) Decide whether a frame is a new scene frame. Suppose the fingerprint of the previous scene frame is SF_(i-1) and the fingerprint of the current frame is F_i, i = 2, …, 5. If formula (5) holds, the current frame is judged to be another scene frame; otherwise the current frame still belongs to the previous scene;
d(SF_(i-1), F_i) ≥ Th_SF,  i = 2, …, 5    (5)
Here d(SF_(i-1), F_i) denotes the distance between the fingerprint F_i of the current frame and the fingerprint SF_(i-1) of the previous scene frame, and Th_SF is the decision threshold (see the sketch below);
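A sketch of the blank-screen test (4) and the scene-change test (5). The distance d is taken here to be the normalized Hamming distance over the quaternary fingerprint, matching the distance used later in formula (6); the default thresholds are only illustrative (the embodiment uses Th_SF = 0.426 in Fig. 3, while no value for Th_BS is given in the text).

```python
import numpy as np

def is_blank(frame_gray, th_bs=10.0):
    """Formula (4): the frame is a blank screen if the mean pixel value is below Th_BS."""
    return float(np.mean(frame_gray)) < th_bs

def is_new_scene(prev_scene_fp, cur_fp, th_sf=0.426):
    """Formula (5): a new scene frame is declared when the normalized Hamming
    distance between the quaternary fingerprints reaches Th_SF."""
    prev_scene_fp, cur_fp = np.asarray(prev_scene_fp), np.asarray(cur_fp)
    d = np.count_nonzero(prev_scene_fp != cur_fp) / prev_scene_fp.size
    return d >= th_sf
```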
3) Establishment of the video fingerprint database. The user information, product information and fingerprint information of each video whose copyright is to be authenticated are bound into one record, generating metadata (meta data); the set of metadata constitutes the metadata database, which is sorted and stored according to the inverted-file rule;
4) Exploiting the characteristic of the fingerprint, namely its quaternary (Quaternion) values, the invention proposes an inverted-file binary search matching algorithm (inverted file & binary-based search matching), whose steps are as follows:
(4.1) The 3600-dimensional fingerprint vector is combined by formula (3) into 900 words, i.e. a Bag-of-Words; each word takes a value in the range 0-255;
(4.2) Establish the inverted-file queue. Each video fingerprint is inserted into the inverted-file queue in ascending order of its first word; if the first words are equal, the comparison continues in ascending order of the second word, and so on, until all original video fingerprints have been inserted into the inverted-file queue. The video fingerprints sorted by the inverted-file rule, together with the video information, constitute the metadata fingerprint database (see the sketch below);
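A sketch of the metadata record of step 3) and the inverted-file queue of step (4.2). The field names are illustrative assumptions; the ordering itself is simply lexicographic over the 900-word sequence, as described.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MetaRecord:
    user_info: str         # illustrative fields: the text only says that user,
    product_info: str      # product and fingerprint information are bound together
    words: List[int]       # the 900 Bag-of-Words values (0..255) of five scene frames
    visited: bool = False  # the "checked" mark used during the search of step (4.3)

def build_inverted_queue(records: List[MetaRecord]) -> List[MetaRecord]:
    """Step (4.2): sort by the first word, break ties by the second word, and so on,
    i.e. lexicographic order over the whole word sequence."""
    return sorted(records, key=lambda r: tuple(r.words))
```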
(4.3) Binary search matching process. Suppose the Bag-of-Words sequence of the fingerprint of the video to be authenticated is AuBW_i, i = 1, 2, …, 900. The binary search proceeds as follows:
(4.3.1): Mark every record in the metadata database as not yet checked;
(4.3.2): Take the first word AuBW_1 and binary-search for AuBW_1 in the inverted-file queue. The search result falls into one of three cases:
A1) Exactly one record is found. The Bag-of-Words of that record is restored to the quaternary fingerprint MeF_i, the restoration being repeated division of each word by 4, keeping the remainders. The normalized Hamming distance d is then computed by formula (6):
Here i = 1, 2, …, L, where L is the fingerprint length and AuF is the fingerprint of the video to be authenticated. The value T is then computed by formula (7).
When T = 0 the search ends, and the video corresponding to this metadata record is the video to be authenticated. When T = 1, the position of the metadata record and its Hamming distance are noted and the record is marked as checked. When T = 2, the record is only marked as checked;
A2) Several records are found. The Hamming distances of all these records are computed by formula (6) and the records are marked as checked. The minimum Hamming distance is evaluated by formula (7): when T = 0 the search ends and the video corresponding to that metadata record is the video to be authenticated; when T = 1 the position of the metadata record and its Hamming distance are noted; when T = 2 nothing is done and the search proceeds directly to the next step;
A3) No record is found. Nothing is done; the search proceeds directly to the next step;
(4.3.3): Take the i-th word AuBW_i, i = 2, 3, …, K, and binary-search for AuBW_i in the inverted-file queue. The search result falls into one of four cases. Note that K is not known in advance but always satisfies K ≤ L/m, where m is the length of a word, here m = 4;
B1) Several records are found, all already marked as checked. Proceed directly to the next step;
B2) Exactly one record not marked as checked is found. This case is handled as case A1 of (4.3.2);
B3) Several records not marked as checked are found. This case is handled as case A2 of (4.3.2);
B4) No record is found. This case is handled as case A3 of (4.3.2);
Repeat (4.3.3) until T = 0 occurs or all records have been marked as checked;
(4.3.4): If T = 0 did not occur in the preceding steps, only two outcomes are possible:
C1) At least one record satisfied T = 1. In this case the metadata record with the smallest Hamming distance is taken; that record corresponds exactly to the video to be authenticated. The search ends;
C2) No record satisfied T = 1. In this case the video to be authenticated is not in the metadata database and a rejection message is issued. The search ends.
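A sketch of the lookup of step (4.3), reusing MetaRecord and build_inverted_queue from above. Formula (6) is taken to be the normalized Hamming distance over the quaternary digits, and formula (7) is modeled as a two-threshold decision T in {0, 1, 2} (match / candidate / reject) with assumed thresholds t0 and t1, since formula (7) is not reproduced in the text. The queue is sorted by the first word, so each AuBW_i is searched here against the records' first words; how later words index into the queue is not fully specified above.

```python
import bisect
import numpy as np

def unpack_words(words):
    """Case A1: restore a word sequence to its quaternary digits by repeated
    division by 4 (most significant digit first, the inverse of formula (3))."""
    w = np.asarray(words, dtype=np.uint16)
    return np.stack([w // 64, (w // 16) % 4, (w // 4) % 4, w % 4], axis=1).ravel()

def normalized_hamming(fp_a, fp_b):
    """Formula (6), taken as the fraction of quaternary positions that differ."""
    return np.count_nonzero(fp_a != fp_b) / fp_a.size

def decide_T(d, t0=0.05, t1=0.25):
    """Formula (7), modeled with assumed thresholds: 0 = match, 1 = candidate, 2 = reject."""
    return 0 if d <= t0 else (1 if d <= t1 else 2)

def authenticate(au_words, queue):
    """Step (4.3): binary-search each word of the unknown video in the inverted-file
    queue and apply cases A1-A3 / B1-B4 / C1-C2."""
    au_fp = unpack_words(au_words)
    first_words = [rec.words[0] for rec in queue]       # queue is sorted by first word
    best = None                                         # closest record that gave T = 1
    for word in au_words:
        lo = bisect.bisect_left(first_words, word)
        hi = bisect.bisect_right(first_words, word)
        hits = [rec for rec in queue[lo:hi] if not rec.visited]
        for rec in queue[lo:hi]:
            rec.visited = True                          # mark every found record as checked
        if not hits:                                    # cases A3, B1, B4: nothing to do
            continue
        d, rec = min(((normalized_hamming(au_fp, unpack_words(r.words)), r) for r in hits),
                     key=lambda s: s[0])
        t = decide_T(d)
        if t == 0:                                      # the record's video is authenticated
            return rec
        if t == 1 and (best is None or d < best[0]):
            best = (d, rec)                             # note the position and the distance
        if all(r.visited for r in queue):               # every record has been checked
            break
    return best[1] if best is not None else None        # None means: issue a rejection
```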
The advantages of the invention are:
A. The central region of the video frame is selected as the object from which the fingerprint is extracted. This is consistent with the idea of using a human fingerprint to characterize and distinguish different people, and at the same time it reduces the amount of data processed during fingerprint extraction and increases the extraction speed.
B. The differences between regions of a video frame are characterized with quaternary values, which is finer and more reasonable than binary hashing or ternary values, and therefore also improves the authentication recognition rate.
C. The fingerprint metadata database is stored in Bag-of-Words form, saving 75% of the storage space: each quaternary element needs only 2 bits, so the 3600 elements packed four per 8-bit word occupy 900 bytes instead of one byte per element (3600 bytes).
D. The inverted-file binary search algorithm improves the lookup and matching speed.
Detailed description of the invention
Fig. 1 is a schematic diagram of the image blocking of the invention.
Fig. 2 is a schematic flow chart of the video fingerprint extraction of the invention.
Fig. 3 shows, for Th_SF = 0.426, five consecutive different scene frames from a clip of the film "28 Weeks Later".
Fig. 4a shows the five different scene frames obtained with Th_SF = 0.40. Fig. 4b shows the five different scene frames obtained with Th_SF = 0.412. Fig. 4c shows the five different scene frames obtained with Th_SF = 0.44. Fig. 4d shows the five different scene frames obtained with Th_SF = 0.452.
Fig. 5 is the video fingerprint matching architecture diagram of the invention.
Specific embodiment
The present invention is further illustrated with reference to the accompanying drawing.
The video authentication method based on scene frame fingerprints of the invention comprises the following steps:
1) Preprocessing of the video frames;
(1.1) Perform a color space conversion on each color frame of the video and take its luminance component to obtain a gray-level image;
(1.2) Crop the borders of the video frame, keeping its central part, then scale it to a fixed size of W × H pixels;
(1.3) Filter the video frame with a 3 × 3 Gaussian low-pass filter with standard deviation 0.95;
(1.4) Scale the image to 3/4 QCIF size (QCIF is 144 × 176 pixels).
2) Extract the fingerprint from each preprocessed video frame; the process is shown in Fig. 2 of the accompanying drawings and comprises the following steps:
(2.1) As shown in Fig. 1 of the accompanying drawings, partition the preprocessed video frame into blocks. Within each 9 × 11 region, a to h are averages of local pixels. The frame elements are then extracted as: (1) the mean element of the entire 9 × 11 sub-region; (2) the four difference elements a-b, c-d, e-f and g-h. In total 720 frame elements are obtained: 144 mean elements, denoted A elements, and 576 difference elements, denoted D elements;
(2.2) Quantize the A elements into quaternary values. For the A elements of dimensions 1-144, let A_i be the value of an A element; these A elements are quantized into quaternary values x_i using formula (1):
(2.3) Dynamically determine the threshold ThA, comprising the following steps:
(2.3.1) Take a_i = abs(A_i - 128), where abs(·) is the absolute-value operator, and sort the a_i in ascending order into A_k = {a_1, a_2, …, a_k, …, a_N}; the index i and the index k are not the same;
(2.3.2) Threshold ThA = a_k, where k = floor(0.25 * N), N = 144, and floor denotes rounding down;
(2.4) Quantize the D elements into quaternary values. For the D elements D_i of dimensions 145-720, they are quantized into quaternary values x_i using formula (2):
(2.5) Dynamically determine the threshold ThD, comprising the following steps:
(2.5.1) Take d_i = abs(D_i) and sort the d_i in ascending order into D_k = {d_1, d_2, …, d_k, …, d_N}; the index i and the index k are not the same;
(2.5.2) Threshold ThD = d_k, where k = floor(0.25 * N), N = 576, and floor denotes rounding down;
(2.6) Store the extracted quaternary elements X = {x_1, x_2, …, x_720} in binary-coded form.
Let word_i, i = 1, 2, …, 180, be defined so that every group of 4 element dimensions occupies one coding unit; this coding is computed by the following formula:
word_i = 4^3 * x_((i-1)*4+1) + 4^2 * x_((i-1)*4+2) + 4 * x_((i-1)*4+3) + x_((i-1)*4+4)    (3)
(2.7) Extraction algorithm for the scene frame fingerprints, comprising the following steps:
(2.7.1) Decide whether a frame is a blank screen; formula (4) is applied for the blank-screen decision;
mean(F) < Th_BS    (4)
mean(F) denotes the mean value of the image pixels and Th_BS is the blank-screen threshold;
(2.7.2) Decide whether a frame is a new scene frame. Suppose the fingerprint of the previous scene frame is SF_(i-1) and the fingerprint of the current frame is F_i, i = 2, …, 5. If formula (5) holds, the current frame is judged to be another scene frame; otherwise the current frame still belongs to the previous scene;
d(SF_(i-1), F_i) ≥ Th_SF,  i = 2, …, 5    (5)
Here d(SF_(i-1), F_i) denotes the distance between the fingerprint F_i of the current frame and the fingerprint SF_(i-1) of the previous scene frame, and Th_SF is the decision threshold;
Fig. 3 of the accompanying drawings shows, for Th_SF = 0.426, five consecutive different scene frames from a clip of the film "28 Weeks Later". Different decision thresholds give different scene frames, as shown in Fig. 4 of the accompanying drawings: Fig. 4a shows the five different scene frames obtained with threshold Th_SF = 0.40, Fig. 4b those obtained with Th_SF = 0.412, Fig. 4c those obtained with Th_SF = 0.44, and Fig. 4d those obtained with Th_SF = 0.452. A sketch of the overall scene-frame collection is given below.
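A sketch of how the per-frame pieces above fit together: frames are preprocessed, blank frames are skipped, and a frame is kept as a new scene frame whenever the scene-change test fires, until five consecutive distinct scene frames have been collected; their 720-element fingerprints are concatenated into the 3600-element video fingerprint and packed into 900 words. The helper functions reuse the earlier sketches; the a-h pixel averages of Fig. 1 are not visible here, so the difference elements in frame_fingerprint are only assumed stand-ins.

```python
import numpy as np

def extract_video_fingerprint(frames, th_sf=0.426, th_bs=10.0):
    """Collect the fingerprints of 5 consecutive distinct scene frames and pack them
    into the 900-word video fingerprint (frames is any iterable of BGR frames)."""
    scene_fps = []
    for frame in frames:
        gray = preprocess_frame(frame)                    # steps (1.1)-(1.4)
        if is_blank(gray, th_bs):                         # step (2.7.1): skip blank screens
            continue
        fp = frame_fingerprint(gray)                      # 720 quaternary values
        if not scene_fps or is_new_scene(scene_fps[-1], fp, th_sf):   # step (2.7.2)
            scene_fps.append(fp)
        if len(scene_fps) == 5:                           # five consecutive distinct scene frames
            break
    return pack_words(np.concatenate(scene_fps))          # formula (3): 3600 digits -> 900 words

def frame_fingerprint(gray_108x132):
    """Steps (2.1)-(2.5), simplified: 144 block means plus, per block, four assumed
    stand-in differences in place of the a-b .. g-h differences of Fig. 1."""
    blocks = gray_108x132.astype(float).reshape(12, 9, 12, 11).swapaxes(1, 2).reshape(144, 9, 11)
    A = blocks.mean(axis=(1, 2))                                        # 144 mean elements
    left, right = blocks[:, :, :5].mean(axis=(1, 2)), blocks[:, :, 6:].mean(axis=(1, 2))
    top, bottom = blocks[:, :4, :].mean(axis=(1, 2)), blocks[:, 5:, :].mean(axis=(1, 2))
    D = np.stack([left - right, top - bottom, A - left, A - top], axis=1).ravel()  # 576 elements
    xa = quantize_quaternary(A, 128.0, dynamic_threshold(A, 128.0))
    xd = quantize_quaternary(D, 0.0, dynamic_threshold(D, 0.0))
    return np.concatenate([xa, xd])
```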
3) Establishment of the video fingerprint database. The user information, product information and fingerprint information of each video whose copyright is to be authenticated are bound into one record, generating metadata (meta data); the set of metadata constitutes the metadata database, which is sorted and stored according to the inverted-file rule. The Meta Fingerprint Database in Fig. 5 of the accompanying drawings is the video fingerprint database established in this way;
4) Exploiting the characteristic of the fingerprint, namely its quaternary (Quaternion) values, the invention proposes an inverted-file binary search matching algorithm (inverted file & binary-based search matching). Fig. 5 of the accompanying drawings is the video fingerprint matching architecture diagram and illustrates the macroscopic matching process; the steps are as follows:
(4.1) The 3600-dimensional fingerprint vector is combined by formula (3) into 900 words, i.e. a Bag-of-Words; each word takes a value in the range 0-255;
(4.2) Establish the inverted-file queue. Each video fingerprint is inserted into the inverted-file queue in ascending order of its first word; if the first words are equal, the comparison continues in ascending order of the second word, and so on, until all original video fingerprints have been inserted into the inverted-file queue. The video fingerprints sorted by the inverted-file rule, together with the video information, constitute the metadata fingerprint database;
(4.3) Binary search matching process. Suppose the Bag-of-Words sequence of the fingerprint of the video to be authenticated is AuBW_i, i = 1, 2, …, 900. The binary search proceeds as follows:
(4.3.1): Mark every record in the metadata database as not yet checked;
(4.3.2): Take the first word AuBW_1 and binary-search for AuBW_1 in the inverted-file queue. The search result falls into one of three cases:
A1) Exactly one record is found. The Bag-of-Words of that record is restored to the quaternary fingerprint MeF_i, the restoration being repeated division of each word by 4, keeping the remainders. The normalized Hamming distance d is then computed by formula (6):
Here i = 1, 2, …, L, where L is the fingerprint length and AuF is the fingerprint of the video to be authenticated. The value T is then computed by formula (7).
When T = 0 the search ends, and the video corresponding to this metadata record is the video to be authenticated. When T = 1, the position of the metadata record and its Hamming distance are noted and the record is marked as checked. When T = 2, the record is only marked as checked;
A2) Several records are found. The Hamming distances of all these records are computed by formula (6) and the records are marked as checked. The minimum Hamming distance is evaluated by formula (7): when T = 0 the search ends and the video corresponding to that metadata record is the video to be authenticated; when T = 1 the position of the metadata record and its Hamming distance are noted; when T = 2 nothing is done and the search proceeds directly to the next step;
A3) No record is found. Nothing is done; the search proceeds directly to the next step;
(4.3.3): Take the i-th word AuBW_i, i = 2, 3, …, K, and binary-search for AuBW_i in the inverted-file queue. The search result falls into one of four cases. Note that K is not known in advance but always satisfies K ≤ L/m, where m is the length of a word, here m = 4;
B1) Several records are found, all already marked as checked. Proceed directly to the next step;
B2) Exactly one record not marked as checked is found. This case is handled as case A1 of (4.3.2);
B3) Several records not marked as checked are found. This case is handled as case A2 of (4.3.2);
B4) No record is found. This case is handled as case A3 of (4.3.2);
Repeat (4.3.3) until T = 0 occurs or all records have been marked as checked;
(4.3.4): If T = 0 did not occur in the preceding steps, only two outcomes are possible:
C1) At least one record satisfied T = 1. In this case the metadata record with the smallest Hamming distance is taken; that record corresponds exactly to the video to be authenticated. The search ends;
C2) No record satisfied T = 1. In this case the video to be authenticated is not in the metadata database and a rejection message is issued. The search ends.
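An illustrative end-to-end use of the sketches above, tying registration and authentication together; the file names, field values and helper functions are placeholders and assumptions, not part of the patented method.

```python
import cv2

def read_frames(path):
    """Yield BGR frames from a video file (assumed input interface)."""
    cap = cv2.VideoCapture(path)
    ok, frame = cap.read()
    while ok:
        yield frame
        ok, frame = cap.read()
    cap.release()

# Register two copyrighted videos in the metadata fingerprint database.
records = [
    MetaRecord("owner A", "movie A", list(extract_video_fingerprint(read_frames("a.mp4")))),
    MetaRecord("owner B", "movie B", list(extract_video_fingerprint(read_frames("b.mp4")))),
]
queue = build_inverted_queue(records)

# Authenticate an unknown clip: a record is returned on success, None means rejection.
unknown = list(extract_video_fingerprint(read_frames("suspect.mp4")))
match = authenticate(unknown, queue)
print(match.product_info if match else "rejected")
```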

Claims (1)

1. A video authentication method based on scene frame fingerprints, comprising the following steps:
1) preprocessing of the video frames;
(1.1) performing a color space conversion on each color frame of the video and taking its luminance component to obtain a gray-level image;
(1.2) cropping the borders of the video frame, keeping its central part, then scaling it to a fixed size of W × H pixels;
(1.3) filtering the video frame with a 3 × 3 Gaussian low-pass filter with standard deviation 0.95;
(1.4) scaling the image to 3/4 QCIF size, where QCIF is an image of 144 pixels × 176 pixels;
2) extracting the fingerprint from each preprocessed video frame, comprising the following steps:
(2.1) partitioning the preprocessed video frame into blocks, where within each 9 × 11 region, a to h are averages of local pixels, and the frame elements are extracted as: (1) the mean element of the entire 9 × 11 sub-region; (2) the four difference elements a-b, c-d, e-f and g-h; 720 frame elements are obtained in total: 144 mean elements, denoted A elements, and 576 difference elements, denoted D elements;
(2.2) quantizing the A elements into quaternary values: for the A elements of dimensions 1-144, letting A_i be the value of an A element, these A elements are quantized into quaternary values x_i using formula (1):
(2.3) dynamically determining the threshold ThA, comprising the following steps:
(2.3.1) taking a_i = abs(A_i - 128), where abs(·) is the absolute-value operator, and sorting the a_i in ascending order into A_k = {a_1, a_2, …, a_k, …, a_N}, the index i and the index k not being the same;
(2.3.2) setting the threshold ThA = a_k, where k = floor(0.25 * N), N = 144, and floor denotes rounding down;
(2.4) quantizing the D elements into quaternary values: for the D elements D_i of dimensions 145-720, they are quantized into quaternary values x_i using formula (2):
(2.5) dynamically determining the threshold ThD, comprising the following steps:
(2.5.1) taking d_i = abs(D_i) and sorting the d_i in ascending order into D_k = {d_1, d_2, …, d_k, …, d_N}, the index i and the index k not being the same;
(2.5.2) setting the threshold ThD = d_k, where k = floor(0.25 * N), N = 576, and floor denotes rounding down;
(2.6) storing the extracted quaternary elements X = {x_1, x_2, …, x_720} in binary-coded form,
where word_i, i = 1, 2, …, 180, is defined so that every group of 4 element dimensions occupies one coding unit, this coding being computed by the following formula:
word_i = 4^3 * x_((i-1)*4+1) + 4^2 * x_((i-1)*4+2) + 4 * x_((i-1)*4+3) + x_((i-1)*4+4)    (3)
(2.7) extracting the scene frame fingerprints, comprising the following steps:
(2.7.1) deciding whether a frame is a blank screen, formula (4) being applied for the blank-screen decision:
mean(F) < Th_BS    (4)
where mean(F) denotes the mean value of the image pixels and Th_BS is the blank-screen threshold;
(2.7.2) deciding whether a frame is a new scene frame: supposing the fingerprint of the previous scene frame is SF_(i-1) and the fingerprint of the current frame is F_i, i = 2, …, 5, if formula (5) holds, the current frame is judged to be another scene frame, otherwise the current frame still belongs to the previous scene;
d(SF_(i-1), F_i) ≥ Th_SF,  i = 2, …, 5    (5)
where d(SF_(i-1), F_i) denotes the distance between the fingerprint F_i of the current frame and the fingerprint SF_(i-1) of the previous scene frame, and Th_SF is the decision threshold;
3) establishment of the video fingerprint database: the user information, product information and fingerprint information of each video whose copyright is to be authenticated are bound into one record, generating metadata (meta data); the set of metadata constitutes the metadata database, which is sorted and stored according to the inverted-file rule;
4) exploiting the quaternary (Quaternion) values of the fingerprint, an inverted-file binary search matching algorithm (inverted file & binary-based search matching) is proposed, whose steps are as follows:
(4.1) the 3600-dimensional fingerprint vector is combined by formula (3) into 900 words, i.e. a Bag-of-Words, each word taking a value in the range 0-255;
(4.2) establishing the inverted-file queue: each video fingerprint is inserted into the inverted-file queue in ascending order of its first word; if the first words are equal, the comparison continues in ascending order of the second word, and so on, until all original video fingerprints have been inserted into the inverted-file queue; the video fingerprints sorted by the inverted-file rule, together with the video information, constitute the metadata fingerprint database;
(4.3) binary search matching process: supposing the Bag-of-Words sequence of the fingerprint of the video to be authenticated is AuBW_i, i = 1, 2, …, 900, the binary search proceeds as follows:
(4.3.1): marking every record in the metadata database as not yet checked;
(4.3.2): taking the first word AuBW_1 and binary-searching for AuBW_1 in the inverted-file queue, the search result falling into one of three cases:
A1) exactly one record is found: the Bag-of-Words of that record is restored to the quaternary fingerprint MeF_i, the restoration being repeated division of each word by 4, keeping the remainders; the normalized Hamming distance d is then computed by formula (6):
where i = 1, 2, …, L, L is the fingerprint length and AuF is the fingerprint of the video to be authenticated; the value T is then computed by formula (7);
when T = 0 the search ends and the video corresponding to this record is the video to be authenticated; when T = 1, the position of the record and its Hamming distance are noted and the record is marked as checked; when T = 2, the record is only marked as checked;
A2) several records are found: the Hamming distances of all these records are computed by formula (6) and the records are marked as checked; the minimum Hamming distance is evaluated by formula (7): when T = 0 the search ends and the video corresponding to that record is the video to be authenticated; when T = 1 the position of the record and its Hamming distance are noted; when T = 2 nothing is done and the search proceeds directly to the next step;
A3) no record is found: nothing is done and the search proceeds directly to the next step;
(4.3.3): taking the i-th word AuBW_i, i = 2, 3, …, K, and binary-searching for AuBW_i in the inverted-file queue, the search result falling into one of four cases, where K is not known in advance but always satisfies K ≤ L/m, here m = 4;
B1) several records are found, all already marked as checked: proceed directly to the next step;
B2) exactly one record not marked as checked is found: this case is handled as case A1 of (4.3.2);
B3) several records not marked as checked are found: this case is handled as case A2 of (4.3.2);
B4) no record is found: this case is handled as case A3 of (4.3.2);
repeating (4.3.3) until T = 0 occurs or all records have been marked as checked;
(4.3.4): if T = 0 did not occur in the preceding steps, only two outcomes are possible:
C1) at least one record satisfied T = 1: in this case the record with the smallest Hamming distance is taken, and that record corresponds exactly to the video to be authenticated; the search ends;
C2) no record satisfied T = 1: in this case the video to be authenticated is not in the metadata database and a rejection message is issued; the search ends.
CN201610367884.3A 2016-05-27 2016-05-27 Video authentication method based on scene frame fingerprint Active CN106055632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610367884.3A CN106055632B (en) 2016-05-27 2016-05-27 Video authentication method based on scene frame fingerprint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610367884.3A CN106055632B (en) 2016-05-27 2016-05-27 Video authentication method based on scene frame fingerprint

Publications (2)

Publication Number Publication Date
CN106055632A CN106055632A (en) 2016-10-26
CN106055632B (en) 2019-06-14

Family

ID=57175798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610367884.3A Active CN106055632B (en) 2016-05-27 2016-05-27 Video authentication method based on scene frame fingerprint

Country Status (1)

Country Link
CN (1) CN106055632B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815652B (en) * 2018-12-18 2020-12-25 浙江工业大学 Spark-based real-time active picture tracking protection method
CN111008301B (en) * 2019-12-19 2023-08-15 新华智云科技有限公司 Method for searching video by using graph
CN116582282B (en) * 2023-07-13 2023-09-19 深圳市美力高集团有限公司 Anti-tampering encryption storage method for vehicle-mounted video

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008045139A3 (en) * 2006-05-19 2008-07-24 Univ New York State Res Found Determining whether or not a digital image has been tampered with
CN101894156A (en) * 2010-07-12 2010-11-24 清华大学 Bipartite graph-based video fingerprint matching method
CN104036280A (en) * 2014-06-23 2014-09-10 国家广播电影电视总局广播科学研究院 Video fingerprinting method based on region of interest and cluster combination
CN104205865A (en) * 2012-03-29 2014-12-10 阿尔卡特朗讯公司 Method and apparatus for authenticating video content

Also Published As

Publication number Publication date
CN106055632A (en) 2016-10-26

Similar Documents

Publication Publication Date Title
Fang et al. Screen-shooting resilient watermarking
Gao et al. Lossless data embedding using generalized statistical quantity histogram
Li et al. Robust video hashing via multilinear subspace projections
Qin et al. Perceptual image hashing with selective sampling for salient structure features
WO2016082277A1 (en) Video authentication method and apparatus
Duan et al. High-capacity image steganography based on improved FC-DenseNet
CN106503655A (en) A kind of electric endorsement method and sign test method based on face recognition technology
CN106055632B (en) Video authentication method based on scene frame fingerprint
Kim et al. Multimodal biometric image watermarking using two-stage integrity verification
Daniel et al. Texture and quality analysis for face spoofing detection
CN114998080B (en) Face tamper-proof watermark generation method, tamper detection method and attribute detection method
Wan et al. JND-guided perceptually color image watermarking in spatial domain
CN107346528B (en) Image tampering detection method based on double-image reversible fragile watermark
Mao et al. A method for video authenticity based on the fingerprint of scene frame
CN105227964B (en) Video-frequency identifying method and system
Barni et al. Iris deidentification with high visual realism for privacy protection on websites and social networks
Hu et al. Effective forgery detection using DCT+ SVD-based watermarking for region of interest in key frames of vision-based surveillance
CN108491913B (en) Method for authenticating QR (quick response) code with secret beautification function
CN103971321B (en) Method and system for steganalysis of JPEG compatibility
CN114078071A (en) Image tracing method, device and medium
CN114881838B (en) Bidirectional face data protection method, system and equipment for deep forgery
Wang et al. Image authentication based on perceptual hash using Gabor filters
Teja Real-time live face detection using face template matching and DCT energy analysis
Li et al. Perceptual hashing for color images
Iorliam Application of power laws to biometrics, forensics and network traffic analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant