CN101835040B - Digital Video Source Forensics Method - Google Patents


Info

Publication number
CN101835040B
CN101835040B CN201010126186A
Authority
CN
China
Prior art keywords
frame
activity
group
frames
code rate
Prior art date
Legal status
Expired - Fee Related
Application number
CN 201010126186
Other languages
Chinese (zh)
Other versions
CN101835040A (en)
Inventor
苏育挺
张静
徐俊瑜
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University
Priority: CN 201010126186
Publication of CN101835040A
Application granted
Publication of CN101835040B

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention belongs to the technical field of digital video detection, and in particular relates to a digital video source forensics method. The method comprises the following steps: establishing a video sequence sample library for training; calculating the activity of each group of pictures with an activity and complexity analysis module, and classifying it as high, medium, or low activity; partially decoding the video sequence, acquiring various information from its compressed domain, and extracting three kinds of features, namely code rate features, texture features, and motion vector features; establishing three classifiers and classifying groups of pictures of different activity with different classifiers; reading the video sequence of the video source to be detected, acquiring the feature vector of each of its groups of pictures, selecting the corresponding classifier according to its activity, and giving a classification result; and combining the classification results of all groups of pictures of the video sequence to make a final judgment. The method is highly targeted, highly practical, and not easily attacked.

Description

Digital video source forensics method
Technical field
The invention belongs to the field of digital video resource information security, and specifically relates to a digital video source forensics method.
Background technology
Since the beginning of this century, digital video has been widely used in daily life and work. Meanwhile, the rapid development of video editing and processing software allows even video amateurs to use these editing tools to modify video content and produce convincing fakes, overturning the traditional belief that "seeing is believing". If tampered or forged digital video is used in official media, scientific findings, insurance claims, court exhibits, and the like, it can have a significant impact on political and social stability. In 2007, a Czech television station broadcast a video segment that had not been strictly reviewed, showing tens of thousands of viewers a false picture of a nuclear explosion in Bohemia and nearly causing a public panic; in fact it had been spliced together from real Bohemian outdoor footage and nuclear mushroom-cloud video. On the other hand, with the development of networks, various video sharing websites such as YouTube and Youku have appeared. These online video resources differ greatly from traditional video: they are more private and less credible. How to effectively supervise and manage these multimedia resources has therefore become the key to maintaining the healthy and stable development of the information industry.
As network bandwidth grows, digital video resources are gradually replacing text and still images as the mainstream of network information resources, and the availability of easy-to-use video editing software has made video tampering techniques widespread. Under these circumstances, forensic techniques for digital video resources have become a focus of the information security field. Digital video forensics comprises active forensics and passive forensics; both can authenticate whether a video has been tampered with, but they have different applications. Existing active forensic techniques include anti-counterfeiting techniques represented by robust digital watermarking, tamper-proofing techniques represented by fragile digital watermarking, and authentication techniques represented by digital fingerprints and digital signatures. The basic idea of all these techniques is to verify the authenticity and integrity of digital video by adding additional information to it. At present, however, the overwhelming majority of digital videos contain no digital watermark or digital digest. As video forgery and tampering techniques develop rapidly, active forensics, constrained by its application conditions, cannot fundamentally contain the development of video tampering, so current digital video forensics research pays more attention to passive forensic techniques.
Current research on passive video forensics concentrates on two aspects: digital video source detection and tampering detection. Digital video source detection is the first step of media authentication; it mainly provides information about the acquisition, processing, and output devices of a digital video (such as a digital video camera). In short, it analyzes and answers the questions "where did it come from" and "how was it produced". Digital video source forensics can be carried out by the forensic party alone: the source is identified directly from the media itself, without any preprocessing of the digital video in advance (such as embedding a digital watermark), so it is highly practical. The simplest method is to examine the video file header: ordinary digital cameras write information such as system information, camera type, coding mode, date, and time into the file header, but this information is easily altered and therefore has low credibility. Another method is to identify the source from the inherent attributes of the encoder and the statistical properties of the output video stream; this method is highly credible and not easily attacked.
Summary of the invention
The object of the present invention is to provide a digital video source forensics method that is highly credible and not easily attacked. Without needing any other auxiliary information (such as a watermark embedded in advance or a video header file), the technique analyzes the video coding stream alone to identify which video camera or software encoder produced the video file. The method is highly targeted, practical, and not easily attacked. To this end, the present invention adopts the following technical scheme.
A digital video source forensics method comprises the following steps:
(1) Establish a sample library for training. The library comprises video sequences captured by video cameras and video sequences produced by various software encoders; all of these are original compressed sequences.
(2) The video sequence first passes through an activity and complexity analysis module, which calculates the activity of each group of pictures and, using double thresholds, classifies it as high, medium, or low activity. The calculation is as follows:
i. First, according to formula (1), calculate the energy difference fd(x,y) of the luminance components of each pair of adjacent frames within a group of pictures (GOP):

fd(x,y) = |f1(x,y) - f2(x,y)|    (1)

where f1(x,y) and f2(x,y) denote the DC coefficient values of the luminance block at position (x,y) in the 1st and 2nd frames, respectively;
ii. Then calculate the total average energy difference Fd:

Fd = (1/M) Σ_x Σ_y fd(x,y)    (2)

where M denotes the number of blocks in a frame, and fd(x,y) is the energy difference calculated in step i;
iii. Finally, according to formula (3), calculate the energy variance of the group of pictures and use it to decide whether the segment is a high-activity, medium-activity, or low-activity group of pictures:

Z = (1/(n-1)) Σ_{i=1}^{n} |Fd(i)|²    (3)

where Fd(i) is the average energy difference of the i-th pair of adjacent frames, i is the frame index, and n is the number of frames in the group of pictures. Finally, define two thresholds T1 and T2 (T1 < T2): if Z > T2, mark the group of pictures as high activity; if T1 < Z ≤ T2, mark it as medium activity; otherwise mark it as low activity;
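The activity measure of step (2) can be sketched as follows. This is a minimal illustration, assuming each frame is represented by its grid of luminance-block DC coefficients (a list of rows) and that the thresholds T1 and T2 are hypothetical values of the kind that would be fixed during training.

```python
# Sketch of the GOP activity measure (formulas (1)-(3)).
# dc_frames: list of frames; each frame is a list of rows of
# luminance-block DC coefficients. t1 < t2 are assumed thresholds.
def gop_activity(dc_frames, t1=50.0, t2=500.0):
    n = len(dc_frames)
    fds = []
    for f1, f2 in zip(dc_frames, dc_frames[1:]):
        # (1)-(2): average absolute DC difference over the blocks of a frame
        diffs = [abs(a - b) for r1, r2 in zip(f1, f2) for a, b in zip(r1, r2)]
        fds.append(sum(diffs) / len(diffs))
    # (3): "energy variance" of the GOP
    z = sum(fd * fd for fd in fds) / (n - 1)
    if z > t2:
        return "high"
    if z > t1:
        return "medium"
    return "low"
```

A GOP of nearly identical frames yields a small Z and is marked low activity; large frame-to-frame DC swings push Z above T2 and mark it high activity.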
(3) The video sequence is then partially decoded to obtain various information from its compressed domain, and three categories of features are extracted: code rate features, texture features, and motion vector features;
iv. The code rate features consist of the following 7 groups of feature quantities, where NB_I denotes the code rate of the I frame in a group of pictures, NB_P(i) the code rate of the i-th P frame, and NB_B(j) the code rate of the j-th B frame:
a) M, N: the number of P frames and the number of B frames in the group of pictures, respectively;
b) NB_I: the code rate of the I frame in the group of pictures;
c) RPI: the ratio of the average code rate of the P frames to the code rate of the I frame, computed as:

RPI = (1/M) Σ_{i=1}^{M} NB_P(i) / NB_I    (4)

d) RBI: the ratio of the average code rate of the B frames to the code rate of the I frame, computed as:

RBI = (1/N) Σ_{i=1}^{N} NB_B(i) / NB_I    (5)
e) RAP, RVP: the mean and variance of the relative difference of the code rates of adjacent P frames in the group of pictures:

RA_P = (1/(M-1)) Σ_{j=1}^{M-1} D_P(j)    (6)

RV_P = (1/(M-1)) Σ_{j=1}^{M-1} (D_P(j) - RA_P)²    (7)

where D_P(j) is the relative difference of the code rates of adjacent P frames:

D_P(j) = |NB_P(j+1) - NB_P(j)| / NB_P(j),  j = 1, 2, …, M-1    (8)
f) RAB, RVB: the mean and variance of the relative difference of the code rates of consecutive pairs of B frames in the group of pictures:

RA_B = (1/(N-1)) Σ_{j=1}^{N-1} D_B(j)    (9)

RV_B = (1/(N-1)) Σ_{j=1}^{N-1} (D_B(j) - RA_B)²    (10)

where D_B(j) is the relative difference of the code rates of consecutive pairs of B frames:

D_B(j) = |NB_B(j+1) - NB_B(j)| / NB_B(j),  j = 1, 3, …, N-1    (11)
g) RDIP: the ratio of the I-frame code rate difference to the P-frame code rate difference of two adjacent groups of pictures, computed as:

RDIP = (I2 - I1) / (P2 - P1)    (12)

where I1 is the I-frame code rate of the previous group of pictures and P1 is the code rate of the first P frame following I1; I2 is the I-frame code rate of the current group of pictures and P2 is the code rate of the first P frame following I2;
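A subset of the code rate feature quantities above (M, N, NB_I, RPI, RBI, RAP, RVP) can be sketched as follows, assuming the per-frame bit counts of one GOP have already been obtained by partial decoding; the function and input names are illustrative.

```python
# Sketch of part of the code rate feature vector (formulas (4)-(8)).
# nb_i: I-frame bit count; nb_p, nb_b: lists of P- and B-frame bit counts.
def rate_features(nb_i, nb_p, nb_b):
    m, n = len(nb_p), len(nb_b)
    rpi = sum(p / nb_i for p in nb_p) / m                    # (4)
    rbi = sum(b / nb_i for b in nb_b) / n                    # (5)
    # (8): relative differences of adjacent P-frame code rates
    d_p = [abs(nb_p[j + 1] - nb_p[j]) / nb_p[j] for j in range(m - 1)]
    ra_p = sum(d_p) / (m - 1)                                # (6)
    rv_p = sum((d - ra_p) ** 2 for d in d_p) / (m - 1)       # (7)
    return {"M": m, "N": n, "NBI": nb_i, "RPI": rpi, "RBI": rbi,
            "RAP": ra_p, "RVP": rv_p}
```

The B-frame quantities (9)-(11) and RDIP (12) follow the same pattern over the B-frame list and the adjacent GOP.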
v. The texture features comprise the following 7 groups of feature quantities. Q(i)_k denotes the quantization parameter of the i-th macroblock in a video frame of type k, where k is one of I frame, P frame, and B frame; QS(i)_k denotes the length of the i-th run of consecutive macroblocks with identical quantization parameter in a frame of type k; and QD(i)_k denotes the quantization parameter difference of the i-th pair of adjacent macroblocks:
a) QA_k, QV_k, k ∈ {I, P, B}: the mean and variance of Q(i)_k of frames of type k in a group of pictures;
b) QMA_k, QMI_k, k ∈ {I, P, B}: the maximum and minimum of QS(i)_k of frames of type k in a group of pictures;
c) QSA_k, QSV_k, k ∈ {I, P, B}: the mean and variance of QS(i)_k of frames of type k in a group of pictures;
d) QMD_k, k ∈ {I, P, B}: the maximum of QD(i)_k of frames of type k in a group of pictures;
e) QAD_k, QVD_k, k ∈ {I, P, B}: the mean and variance of QD(i)_k of frames of type k in a group of pictures;
f) ADI: the absolute frame difference between the I frames of two adjacent groups of pictures;
g) HEP_k, k ∈ {I, P, B}: the ratio of the high-frequency energy of frames of each type to the total energy in the group of pictures;
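The per-frame quantities underlying the texture features above can be sketched as follows; `qp` stands in for the list of macroblock quantization parameters of one frame obtained by partial decoding, and the function name is illustrative.

```python
# Sketch of the per-frame texture quantities: the mean/variance of Q(i),
# the run lengths QS(i) of identical quantization parameters, and the
# adjacent-macroblock differences QD(i).
def qp_frame_stats(qp):
    n = len(qp)
    qa = sum(qp) / n                            # mean of Q(i)
    qv = sum((q - qa) ** 2 for q in qp) / n     # variance of Q(i)
    # QS(i): run lengths of consecutive identical quantization parameters
    runs, length = [], 1
    for prev, cur in zip(qp, qp[1:]):
        if cur == prev:
            length += 1
        else:
            runs.append(length)
            length = 1
    runs.append(length)
    # QD(i): absolute differences of adjacent macroblock parameters
    qd = [abs(b - a) for a, b in zip(qp, qp[1:])]
    return qa, qv, runs, qd
```

The GOP-level quantities a)-e) are then the max/min/mean/variance of these values over all frames of each type.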
vi. The motion vector features comprise the following feature quantities. MV(k; x, y) denotes the motion vector of the macroblock at position (x, y) in a frame of type k, and MVH(k; x, y) and MVV(k; x, y) are its horizontal and vertical components, respectively:
a) MX, MY: the maxima of the horizontal and vertical components of the motion vectors MV(k; x, y);
b) MZ: the static macroblock feature quantity:

MZ = (MM + MS) / 2    (13)

where MM and MS are defined as:

MM = min_n ( Σ_{x=1}^{8} Σ_{y=1}^{8} |X_M(x,y; n) - X_M^R(x,y; n)| ),  n = 1, 2, …    (14)

MS = max_m ( Σ_{x=1}^{8} Σ_{y=1}^{8} |X_S(x,y; m) - X_S^R(x,y; m)| ),  m = 1, 2, …    (15)

where X_M(x,y; n) is the pixel value at position (x,y) of the n-th moving macroblock of the current frame and X_M^R(x,y; n) is the pixel value at the corresponding position of its reference frame; similarly, X_S(x,y; m) is the pixel value at position (x,y) of the m-th static macroblock of the current frame and X_S^R(x,y; m) is the pixel value at the corresponding position in the reference frame;
c) MAX_k, MAY_k, MDX_k, MDY_k, k ∈ {P, B}: the means and variances of the relative errors of the horizontal and vertical components of the motion vectors. The relative error is the distance between the motion vector MV(x, y) obtained by decoding and the optimal motion vector MV_0(x, y), where MV_0(x, y) is obtained with a full-search algorithm based on TM5.
The relative errors of the horizontal and vertical components are computed as:

F_H(k; x, y) = |(MVH(k; x, y) - MVH_0(k; x, y)) / MVH_0(k; x, y)|    (16)

F_V(k; x, y) = |(MVV(k; x, y) - MVV_0(k; x, y)) / MVV_0(k; x, y)|    (17)

where MVH_0(k; x, y) and MVV_0(k; x, y) are the horizontal and vertical components of the optimal motion vector MV_0 of a predictive frame of type k;
d) MC: the matching criterion feature quantity:

MC = (1/m) Σ_x Σ_y R_m(x, y)    (18)

where R_m(x, y) is the matching attribute of the macroblock at position (x, y) in the m-th P frame, defined as:

R(x, y) = 1 if min_{i,j} MAE(MVH + i, MVV + j) = MAE(MVH, MVV), i, j = -1, 0, 1; 0 otherwise    (19)

where the function MAE(x, y) computes the mean absolute difference between the current macroblock and the reference macroblock indicated by the motion vector (x, y);
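The static macroblock feature quantity b) above can be sketched as follows: the threshold that separates moving from static macroblocks is estimated as the midpoint between the smallest SAD among moving blocks (MM) and the largest SAD among static blocks (MS). The 8×8 block representation and function names are illustrative assumptions.

```python
# Sketch of the static-macroblock feature MZ (formulas (13)-(15)).
# Blocks are 8x8 grids (lists of rows); moving/static are lists of
# (current_block, reference_block) pairs taken from partial decoding.
def sad(cur, ref):
    # sum of absolute differences over the 8x8 block
    return sum(abs(a - b) for rc, rr in zip(cur, ref) for a, b in zip(rc, rr))

def static_threshold(moving, static):
    mm = min(sad(c, r) for c, r in moving)   # (14)
    ms = max(sad(c, r) for c, r in static)   # (15)
    return (mm + ms) / 2                     # (13)
```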
(4) Establish three classifiers and train each of them. Different classifiers are used for groups of pictures of different activity: for high-activity groups of pictures, only the motion vector features are used; for low-activity groups of pictures, only the code rate features and texture features are used; for medium-activity groups of pictures, all three groups of feature quantities are used simultaneously;
(5) Read the video sequence of the video source to be detected. For each of its groups of pictures, repeat steps (2) to (4) to obtain the feature vector of the group, select the corresponding classifier according to its activity, and give a classification result;
(6) Combine the classification results of all groups of pictures of the video sequence to make the final judgment.
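Steps (4)-(6) can be sketched as follows. The classifiers are passed in as callables; a simple majority vote stands in here for the combination of per-GOP results (the feature keys and function names are illustrative assumptions).

```python
from collections import Counter

# Dispatch each GOP to the classifier matching its activity class, then
# fuse the per-GOP labels by majority vote.
def classify_video(gops, clf_high, clf_mid, clf_low):
    votes = []
    for g in gops:  # g: dict with "activity", "mv", "rate", "texture"
        if g["activity"] == "high":
            votes.append(clf_high(g["mv"]))
        elif g["activity"] == "medium":
            votes.append(clf_mid(g["rate"] + g["texture"] + g["mv"]))
        else:
            votes.append(clf_low(g["rate"] + g["texture"]))
    return Counter(votes).most_common(1)[0][0]
```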
The present invention is mainly designed to meet the needs of video resource forensics, namely to learn what type of video camera recorded a video segment. The source detection technique designed here effectively verifies the authenticity of a video, accomplishes the first step of media authentication, and provides evidence for judicial organs. The notable characteristics of the video source forensic technique of the present invention include:
(1) Practicality: video source forensics can be carried out by the forensic party alone. The source forensics technique distinguishes encoders directly from the built-in attributes and statistical properties of the video stream, without requiring any preprocessing of the digital video in advance (such as embedding a digital watermark at the encoder side), so it is highly practical.
(2) Real-time performance: the feature extraction module extracts all feature quantities in the compressed domain, i.e., only partial decoding is needed, so very little system resource and time are required.
(3) Novelty: in the activity and complexity analysis module, each group of pictures (GOP) is judged for activity and classified as high, medium, or low activity, and videos of different activity use different feature quantities and classifiers. This makes the method targeted and flexible enough to adapt to various video resources, greatly improving the accuracy of video source detection.
Description of drawings
Fig. 1 is the flow chart of the overall video source forensics method of the present invention;
Fig. 2 is the flow chart of the feature extraction module of the present invention;
Fig. 3 is the flow chart of the comprehensive decision device of the present invention.
Embodiment
The video source forensic technique of the present invention mainly consists of an activity and complexity analysis module, a feature extraction module, classifiers, and a comprehensive decision device. Fig. 1 depicts the process of the overall video source forensics. In step A, the activity of each GOP is calculated by the activity and complexity analysis module and, using double thresholds, classified as high, medium, or low activity. The calculation is as follows:
First calculate the energy difference of the luminance components of each pair of adjacent frames in the GOP:

fd(x,y) = |f1(x,y) - f2(x,y)|    (1)

where f1(x,y) and f2(x,y) denote the DC coefficient values of the luminance block at position (x,y) in the 1st and 2nd frames, respectively. The total average energy difference is then:

Fd = (1/M) Σ_x Σ_y fd(x,y)    (2)

where M denotes the number of blocks in a frame. The energy variance of the group of pictures (GOP) is then calculated to decide whether the segment is a high-, medium-, or low-activity GOP:

Z = (1/(n-1)) Σ_{i=1}^{n} |Fd(i)|²    (3)

where Fd(i) is the average energy difference of each pair of adjacent frames calculated in the previous step, i is the frame index, and n is the number of frames in the GOP. Finally, define two thresholds T1 and T2 (T1 < T2): if Z > T2, mark the GOP as high activity; if T1 < Z ≤ T2, mark it as medium activity; otherwise mark it as low activity.
In step B, the feature extraction module first decodes the data, obtains three types of raw data, and then extracts the three categories of features: code rate features, texture features, and motion vector features. In step C, based on the result of the activity and complexity analysis module, the corresponding feature vector and classifier are selected to classify the group of pictures under test. In step D, the classification results of all groups of pictures are combined and the method of maximum likelihood estimation is used to make the final judgment.
Fig. 2 depicts the flow chart of the feature extraction module. First, the data analyzer partially decodes the video resource and obtains three types of raw data. These are then fed into the code rate feature extractor, the texture feature extractor, and the motion vector feature extractor, which output the three categories of feature vectors.
1. Code rate feature vector: bit allocation is the first step of a rate control strategy and fully reflects the different design philosophies of encoder designers. Before coding a frame, the rate controller pre-allocates the bit count of the current frame according to many parameters, such as the frame type, the bit counts of previous frames of the same type, the current buffer status, and the complexity of the current frame. Conversely, from the coded size, i.e., the code rate, of each type of frame in each group of pictures (GOP) (there are three types: I frames, P frames, B frames), we can extract feature quantities that model the differences between encoders in reverse. We use NB_I to denote the code rate of the I frame in a GOP, NB_P(i) the code rate of the i-th P frame, and NB_B(j) the code rate of the j-th B frame.
(1) M, N: the numbers of P frames and B frames in the GOP; they mainly reflect the frame structure of a group of pictures.
(2) NBI: the code rate of the I frame in the GOP; it mainly embodies the code rate baseline of the encoder.
(3) RPI: the ratio of the average code rate of the P frames to the code rate of the I frame in the GOP; it reflects the code rate pre-allocation scheme of the encoder. It is computed as:

RPI = (1/M) Σ_{i=1}^{M} NB_P(i) / NB_I    (4)

(4) RBI: the ratio of the average code rate of the B frames to the code rate of the I frame in the GOP; it reflects the code rate pre-allocation scheme of the encoder. It is computed as:

RBI = (1/N) Σ_{i=1}^{N} NB_B(i) / NB_I    (5)
(5) RAP, RVP: the mean and variance of the relative difference of the code rates of adjacent P frames in the GOP; they reflect the encoder's fine-tuning capability for P frames.

RA_P = (1/(M-1)) Σ_{j=1}^{M-1} D_P(j)    (6)

RV_P = (1/(M-1)) Σ_{j=1}^{M-1} (D_P(j) - RA_P)²    (7)

where D_P(j) is the relative difference of the code rates:

D_P(j) = |NB_P(j+1) - NB_P(j)| / NB_P(j),  j = 1, 2, …, M-1    (8)
(6) RAB, RVB: the mean and variance of the relative difference of the code rates of consecutive pairs of B frames in the GOP; they reflect the encoder's fine-tuning capability for B frames.

RA_B = (1/(N-1)) Σ_{j=1}^{N-1} D_B(j)    (9)

RV_B = (1/(N-1)) Σ_{j=1}^{N-1} (D_B(j) - RA_B)²    (10)

where D_B(j) is the relative difference of the code rates:

D_B(j) = |NB_B(j+1) - NB_B(j)| / NB_B(j),  j = 1, 3, …, N-1    (11)
(7) RDIP: the ratio of the I-frame code rate difference to the P-frame code rate difference of two adjacent GOPs, computed as:

RDIP = (I2 - I1) / (P2 - P1)    (12)

where I1 is the I-frame code rate of the previous GOP and P1 is the code rate of the first P frame following I1; likewise, I2 is the I-frame code rate of the current GOP and P2 is the code rate of the first P frame following I2.
The code rate feature vector formed by these feature quantities effectively reflects the differences among encoder rate control strategies in pre-allocation, fine-tuning, and so on, and can effectively distinguish the sources of low-activity video segments. For example, if an encoder emphasizes spatial quality, it strengthens the code rate of the I frame, so RPI falls; conversely, if it emphasizes temporal quality, it raises the code rate of the P frames, so RPI rises. As another example, the relative code rate difference reflects how finely different encoders redistribute bits among adjacent frames: some simple or real-time encoders essentially never change the bit allocation of adjacent B frames.
2. Texture feature vector: different encoders adopt different rate control strategies according to their concrete needs, but the means of stabilizing the output stream are the same, namely adjusting three parameters: the quantization parameter, the frame rate, and the coding mode of inter blocks. The latter two parameters are used to handle abnormal situations such as buffer exceptions; changing the quantization parameter is the main means of achieving the rate control target. The variation pattern of the quantization parameter likewise reflects the differences between encoders. First, the data analyzer partially decodes the video data to obtain the quantization parameter distribution of every frame in the GOP, and then the various feature quantities are extracted. We use Q(i)_k to denote the quantization parameter of the i-th macroblock in a video frame of type k (I frame, P frame, or B frame); QS(i)_k to denote the length of the i-th run of consecutive macroblocks with identical quantization parameter; and QD(i)_k to denote the quantization parameter difference of the i-th pair of adjacent macroblocks.
(1) QA_k, QV_k, k ∈ {I, P, B}: the mean and variance of Q(i)_k of frames of type k in a group of pictures (GOP).
(2) QMA_k, QMI_k, k ∈ {I, P, B}: the maximum and minimum of QS(i)_k of frames of type k in a GOP.
(3) QSA_k, QSV_k, k ∈ {I, P, B}: the mean and variance of QS(i)_k of frames of type k in a GOP.
(4) QMD_k, k ∈ {I, P, B}: the maximum of QD(i)_k of frames of type k in a GOP.
(5) QAD_k, QVD_k, k ∈ {I, P, B}: the mean and variance of QD(i)_k of frames of type k in a GOP.
(6) ADI: the absolute frame difference between two adjacent I frames.
(7) HEP_k, k ∈ {I, P, B}: the ratio of the high-frequency energy of frames of each type to the total energy in the GOP.
These feature quantities are combined to form the texture feature vector, which reflects differences in the rate control strategies of encoders. For example, if a rate control strategy emphasizes the quality of the whole frame, it rarely changes the quantization parameters of adjacent macroblocks significantly; conversely, if it tends to output a stable bit-rate stream, it introduces a fine-tuning mechanism that modulates the quantization parameter of the current macroblock according to the current buffer capacity and the quantization parameter of the previous macroblock.
3. Motion vector feature vector: the motion estimation algorithm is the core module of inter-frame predictive coding, accounting for roughly 70% of the computational load of the whole coding system, and is the key to improving overall system performance. When designing an encoder, the designer weighs coding real-time performance against coding efficiency according to concrete needs. Different coding emphases are therefore also embodied as different statistical distributions of the motion vectors. The data analyzer first obtains the motion vector distribution of each predictive frame (P frame or B frame) in the GOP, and then the various feature quantities are extracted. We use MV(k; x, y) to denote the motion vector of the macroblock at position (x, y) in a frame of type k, and MVH(k; x, y) and MVV(k; x, y) its horizontal and vertical components.
(1) MX, MY: the maxima of the horizontal and vertical components of the motion vectors MV(k; x, y).
(2) MZ: the static macroblock feature quantity; it reflects the threshold by which the motion estimation algorithm judges whether the current macroblock is static. A static macroblock is an inter macroblock in a P frame or B frame whose motion vector is zero.

MZ = (MM + MS) / 2    (13)

where MM and MS are defined as:

MM = min_n ( Σ_{x=1}^{8} Σ_{y=1}^{8} |X_M(x,y; n) - X_M^R(x,y; n)| ),  n = 1, 2, …    (14)

MS = max_m ( Σ_{x=1}^{8} Σ_{y=1}^{8} |X_S(x,y; m) - X_S^R(x,y; m)| ),  m = 1, 2, …    (15)

where X_M(x,y; n) is the pixel value at position (x,y) of the n-th moving macroblock of the current frame and X_M^R(x,y; n) is the pixel value at the corresponding position of its reference frame. Similarly, X_S(x,y; m) is the pixel value at position (x,y) of the m-th static macroblock of the current frame and X_S^R(x,y; m) is the pixel value at the corresponding position in the reference frame.
(3) MAX_k, MAY_k, MDX_k, MDY_k, k ∈ {P, B}: the means and variances of the relative errors of the horizontal and vertical components of the motion vectors. The relative error here is the distance between the motion vector MV(x, y) obtained by decoding and the optimal motion vector MV_0(x, y). To evaluate the performance of the motion estimation algorithm, a full-search algorithm based on TM5 (MPEG-2 Test Model 5) is used to re-estimate the motion and obtain the optimal motion vector MV_0(x, y).
The relative error of the horizontal component is computed as:

F_H(k; x, y) = |(MVH(k; x, y) - MVH_0(k; x, y)) / MVH_0(k; x, y)|    (16)

The relative error of the vertical component is computed as:

F_V(k; x, y) = |(MVV(k; x, y) - MVV_0(k; x, y)) / MVV_0(k; x, y)|    (17)

where MVH_0(k; x, y) and MVV_0(k; x, y) are the horizontal and vertical components of the optimal motion vector MV_0 of a predictive frame of type k.
(4) MC: the matching criterion feature quantity.

MC = (1/m) Σ_x Σ_y R_m(x, y)    (18)

where R_m(x, y) is the matching attribute of the macroblock at position (x, y) in the m-th P frame, defined as:

R(x, y) = 1 if min_{i,j} MAE(MVH + i, MVV + j) = MAE(MVH, MVV), i, j = -1, 0, 1; 0 otherwise    (19)

where the function MAE(x, y) computes the mean absolute difference between the current macroblock and the reference macroblock indicated by the motion vector (x, y). That is, R_m(x, y) = 1 means that the mean absolute difference between the current macroblock and its reference macroblock is the minimum within the 3×3 neighborhood of the reference macroblock. MC reflects how close the matching criterion adopted by the encoder is to the mean absolute difference criterion.
The motion vector feature vector formed by these feature quantities reflects differences among encoders' motion estimation algorithms. For example, although the coding standard specifies a maximum search window, a practical encoder may define a much smaller search window to reduce computational complexity. As another example, a real-time encoder may adopt a larger static macroblock threshold so that more macroblocks are judged static, improving coding speed, while other encoders may adopt a smaller threshold in order to make full use of predictive coding, which can greatly improve coding efficiency.
Fig. 3 shows the flow chart of the comprehensive decision device. This patent takes the group of pictures (GOP) as the detection unit: after the three classes of feature vectors are extracted, a classifier is selected according to the activity of the GOP, so that GOPs of different activity are classified in different ways. Classifier 1 handles high-activity GOPs and uses only the motion-vector feature vector; classifier 2 handles medium-activity GOPs and uses the motion-vector, bit-rate, and texture feature vectors together; classifier 3 handles low-activity GOPs and uses only the bit-rate and texture feature vectors. The same operation is then performed on the remaining GOPs, each yielding a judgment, and finally the classification results of all GOPs in the video sequence are combined with a maximum-likelihood estimate to make the final decision.
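The decision stage described above can be sketched as follows; the dictionary-based dispatch and the simple vote used to stand in for the maximum-likelihood combination (equivalent to it under equal priors and equal per-GOP confidence) are assumptions for illustration:

```python
from collections import Counter

def classify_sequence(gops, classifiers):
    """Classify one video sequence from its GOPs.

    `gops`: list of dicts, each with an 'activity' key ('high', 'medium',
    or 'low') plus whatever features its classifier needs.
    `classifiers`: maps an activity level to a callable that returns an
    encoder label for one GOP.
    The sequence-level label is the encoder chosen by the most GOPs."""
    votes = Counter(classifiers[g["activity"]](g) for g in gops)
    label, _ = votes.most_common(1)[0]
    return label
```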
Before carrying out digital video source forensics, a sample library for training must first be established. It contains video clips captured by cameras and video sequences produced by various software encoders; all of these are original compressed sequences. Each GOP in the library is then partially decoded and its features extracted to obtain feature vectors, and the classifiers are built and trained with the corresponding feature vectors. For a GOP whose source is to be identified, the same method is used to obtain its feature vector; the corresponding classifier then makes a judgment and reports which encoder produced the video sequence, completing the detection of the digital video source.
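The training stage can be sketched as follows. Any classifier object exposing a scikit-learn-style `fit` method fits this sketch; the tuple layout of the sample library and the per-activity bucketing are assumptions for illustration:

```python
from collections import defaultdict

def train_classifiers(sample_library, make_classifier):
    """Train one classifier per activity level.

    `sample_library`: list of (activity, feature_vector, encoder_label)
    tuples, one per GOP from the training videos.
    `make_classifier`: factory returning a fresh untrained classifier
    exposing fit(X, y)."""
    buckets = defaultdict(lambda: ([], []))
    for activity, features, label in sample_library:
        xs, ys = buckets[activity]
        xs.append(features)
        ys.append(label)
    classifiers = {}
    for activity, (xs, ys) in buckets.items():
        clf = make_classifier()
        clf.fit(xs, ys)   # learn encoder labels for this activity level
        classifiers[activity] = clf
    return classifiers
```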

Claims (1)

1. A digital video source forensics method, comprising the following steps:
(1) Establish a sample library for training. The sample library includes video sequences captured by cameras and video sequences produced by various software encoders; all of these video sequences are original compressed sequences;
(2) Pass the video sequence through the activity and complexity analysis module, compute the activity of each group of pictures, and use two thresholds to classify it as high activity, medium activity, or low activity. The activity is computed as follows:
i. First compute, according to formula (1), the energy difference fd(x, y) of the luminance components of each pair of adjacent frames in a group of pictures (GOP):
fd(x, y) = |f1(x, y) - f2(x, y)|    (1)
where f1(x, y) and f2(x, y) are the DC coefficient values of the luminance block at position (x, y) in the first and second frame, respectively;
ii. Then compute the total mean energy difference Fd:
Fd = (1/S) Σx Σy fd(x, y)    (2)
where S is the number of blocks in a frame and fd(x, y) is the energy difference computed in step i;
iii. Finally compute the energy variance of the group of pictures according to formula (3), and use it to decide whether the segment belongs to a high-activity, medium-activity, or low-activity group of pictures:
Z = (1/(n-1)) Σi (Fd(i) - avg(Fd))²    (3)
where avg(Fd) denotes the mean of Fd(i) over the n-1 adjacent-frame pairs of the group of pictures;
where Fd(i) is the average energy difference between each pair of adjacent frames, i is the frame index, and n is the number of frames contained in a group of pictures. Finally, two thresholds T1 and T2 with T1 < T2 are defined: if Z > T2, the group is marked as a high-activity group of pictures; if T2 > Z > T1, it is marked as a medium-activity group of pictures; otherwise it is marked as a low-activity group of pictures;
(3) Then partially decode the video sequence, obtain the various kinds of information in its compressed domain, and extract three classes of features: bit-rate features, texture features, and motion-vector features;
i. The bit-rate features consist of the following 7 groups of feature quantities, where NBI denotes the bit rate of the I frame in a group of pictures, NBP(i) denotes the bit rate of the i-th P frame, and NBB(j) denotes the bit rate of the j-th B frame:
a) M, N: the number of P frames and the number of B frames in a group of pictures, respectively;
b) NBI: the bit rate of the I frame in a group of pictures;
c) RPI: the ratio of the average bit rate of the P frames in a group of pictures to the bit rate of the I frame, computed as:
RPI = ((1/M) Σi=1..M NBP(i)) / NBI
d) RBI: the ratio of the average bit rate of the B frames in a group of pictures to the bit rate of the I frame, computed as:
RBI = ((1/N) Σj=1..N NBB(j)) / NBI
e) RAP, RVP: the mean and variance, respectively, of the relative differences between the bit rates of adjacent P frames in a group of pictures:
RAP = (1/(M-1)) Σj=1..M-1 DP(j)
RVP = (1/(M-1)) Σj=1..M-1 (DP(j) - RAP)²
where DP(j) is the relative difference between the bit rates of two adjacent P frames, computed as:
DP(j) = |NBP(j+1) - NBP(j)| / NBP(j)
f) RAB, RVB: the mean and variance, respectively, of the relative differences between the bit rates of consecutive B frames in a group of pictures:
RAB = (1/(N-1)) Σj=1..N-1 DB(j)
RVB = (1/(N-1)) Σj=1..N-1 (DB(j) - RAB)²
where DB(j) is the relative difference between the bit rates of two consecutive B frames, computed as:
DB(j) = |NBB(j+1) - NBB(j)| / NBB(j)
g) RDIP: the ratio of the I-frame bit-rate difference to the P-frame bit-rate difference of two adjacent groups of pictures, computed as:
RDIP = (I1 - I2) / (P1 - P2)
where I1 is the I-frame bit rate of the previous group of pictures and P1 is the bit rate of the first P frame immediately following I1; I2 is the I-frame bit rate of the current group of pictures and P2 is the bit rate of the first P frame immediately following I2;
ii. The texture features consist of the following 7 groups of feature quantities, where Q(i)k denotes the quantization parameter of the i-th macroblock in a frame of type k, k being one of I frame, P frame, and B frame; QS(i)k denotes the number of macroblocks in the i-th run of consecutive macroblocks with the same quantization parameter in a type-k frame; and QD(i)k denotes the quantization-parameter difference of the i-th pair of adjacent macroblocks:
a) QAk, QVk, k ∈ {I, P, B}: the mean and variance of Q(i)k over the type-k frames of a group of pictures;
b) QMAk, QMIk, k ∈ {I, P, B}: the maximum and minimum of QS(i)k over the type-k frames of a group of pictures;
c) QSAk, QSVk, k ∈ {I, P, B}: the mean and variance of QS(i)k over the type-k frames of a group of pictures;
d) QMDk, k ∈ {I, P, B}: the maximum of QD(i)k over the type-k frames of a group of pictures;
e) QADk, QVDk, k ∈ {I, P, B}: the mean and variance of QD(i)k over the type-k frames of a group of pictures;
f) ADI: the absolute frame difference between the I frames of two adjacent groups of pictures;
g) HEPk, k ∈ {I, P, B}: the ratio of the high-frequency energy of each type of frame in a group of pictures to its overall energy;
iii. The motion-vector features consist of the following groups of feature quantities, where MV(k; x, y) denotes the motion vector of the macroblock at position (x, y) in a type-k frame, and MVH(k; x, y) and MVV(k; x, y) are its horizontal and vertical components, respectively:
a) MX, MY: the maxima of the horizontal and vertical components of the motion vectors MV(k; x, y);
b) MZ: the static-macroblock feature quantity:
Figure FDA0000138723830000031
where MM and MS are defined as follows:
Figure FDA0000138723830000032
Figure FDA0000138723830000033
where XM(x, y; n) is the pixel value at position (x, y) of the n-th moving macroblock in the current frame, paired with the pixel value at the corresponding position in its reference frame; similarly, XS(x, y; m) is the pixel value at position (x, y) of the m-th static macroblock in the current frame, paired with the pixel value at the corresponding position in its reference frame;
c) MAXk, MAYk, MDXk, MDYk, k ∈ {P, B}: the means and variances, respectively, of the relative errors of the motion vectors in the horizontal and vertical components. The relative error is the distance between the motion vector MV(x, y) obtained by the current decoding and the optimal motion vector MV0(x, y), the latter being obtained with a TM5-based full-search algorithm;
The relative errors of the horizontal and vertical components are computed as follows:
FH(k; x, y) = |(MVH(k; x, y) - MVH0(k; x, y)) / MVH0(k; x, y)|
FV(k; x, y) = |(MVV(k; x, y) - MVV0(k; x, y)) / MVV0(k; x, y)|
where MVH0(k; x, y) and MVV0(k; x, y) are the horizontal and vertical components of the optimal motion vector MV0(x, y) of a type-k predicted frame;
d) MC: the matching-criterion feature quantity:
MC = (1/m) Σx Σy Rm(x, y)
where Rm(x, y) is the matching factor of the macroblock located at (x, y) in the m-th P frame, defined as:
R(x, y) = 1 if min over i, j = -1, 0, 1 of MAE(i + MVH, j + MVV) equals MAE(MVH, MVV); otherwise R(x, y) = 0
where the function MAE(x, y) computes the mean absolute difference between the current macroblock and the reference macroblock indicated by the motion vector (x, y);
(4) Establish 3 classifiers and train them separately. For groups of pictures of different activity, different classifiers are used: high-activity groups use only the motion-vector features, low-activity groups use only the bit-rate and texture features, and medium-activity groups use all three groups of feature quantities for classification;
(5) Read the video sequence of the video source to be detected; for each of its groups of pictures, repeat steps (2) to (4) to obtain the feature vector of the group of pictures, select the corresponding classifier according to its activity, and output the classification result;
(6) Combine the classification results of all groups of pictures of the video sequence to make the final decision.
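The activity computation of step (2) can be sketched as follows; the nested-list layout of the DC coefficients and the example threshold values are illustrative assumptions (the patent does not fix T1 and T2), and the variance normalization over the n-1 adjacent-frame pairs is one reasonable reading of formula (3):

```python
def gop_activity(dc_frames, t1, t2):
    """Classify a GOP as 'high', 'medium', or 'low' activity.

    `dc_frames`: one 2-D list per frame holding the DC coefficients of
    its luminance blocks; `t1` < `t2` are the two thresholds."""
    # Fd(i): mean absolute DC difference between adjacent frames (formulas (1)-(2))
    fds = []
    for a, b in zip(dc_frames, dc_frames[1:]):
        diffs = [abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
        fds.append(sum(diffs) / len(diffs))
    # Z: variance of Fd(i) over the GOP (formula (3))
    mean_fd = sum(fds) / len(fds)
    z = sum((f - mean_fd) ** 2 for f in fds) / len(fds)
    if z > t2:
        return "high"
    if z > t1:
        return "medium"
    return "low"
```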
CN 201010126186 2010-03-17 2010-03-17 Digital Video Source Forensics Method Expired - Fee Related CN101835040B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010126186 CN101835040B (en) 2010-03-17 2010-03-17 Digital Video Source Forensics Method


Publications (2)

Publication Number Publication Date
CN101835040A CN101835040A (en) 2010-09-15
CN101835040B true CN101835040B (en) 2012-07-04

Family

ID=42718944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010126186 Expired - Fee Related CN101835040B (en) 2010-03-17 2010-03-17 Digital Video Source Forensics Method

Country Status (1)

Country Link
CN (1) CN101835040B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102122349B * 2011-02-16 2014-01-29 哈尔滨工业大学 Method of Constructing a Multi-class Support Vector Machine Classifier Based on Bhattacharyya Distance and Directed Acyclic Graph, Applied to a Servo Motor System
CN103034993A (en) * 2012-10-30 2013-04-10 天津大学 Digital video transcode detection method
CN103618899B * 2013-12-05 2016-08-17 福建师范大学 Video frame-insertion tampering detection method and device based on the intensity signal
CN105208388B * 2014-06-24 2019-03-05 深圳市腾讯计算机系统有限公司 Method and system for dynamically adjusting the encoding frame rate in video communication
CN104469361B * 2014-12-30 2017-06-09 武汉大学 A motion-adaptive video frame-deletion forensics method
CN105007466B * 2015-07-23 2019-04-19 熊建民 Surveillance video recording system and recording method for preventing editing
TWI554083B (en) * 2015-11-16 2016-10-11 晶睿通訊股份有限公司 Image processing method and camera thereof
CN105845132A (en) * 2016-03-22 2016-08-10 宁波大学 AAC audio recording file source identification method based on coding-parameter statistical features
CN108710893B (en) * 2018-04-04 2021-10-29 中山大学 A Feature Fusion-Based Classification Method for Digital Image Camera Source Models
CN113038142B (en) * 2021-03-25 2022-11-01 北京金山云网络技术有限公司 Video data screening method and device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5987178A (en) * 1996-02-22 1999-11-16 Lucent Technologies, Inc. Apparatus and method for a programmable video motion estimator
KR20010045766A (en) * 1999-11-08 2001-06-05 오길록 Apparatus For Motion Estimation With Control Section Implemented By State Translation Diagram
CN1719898A (en) * 2005-05-25 2006-01-11 中山大学 A Method of Protecting MPEG-2 Video Data


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yuting Su et al. "A Source Video Identification Algorithm Based on Features in Video Stream." 2008 International Workshop on Education Technology and Training & 2008 International Workshop on Geoscience and Remote Sensing, 2008. *
Yuting Su et al. "A Source Video Identification Algorithm Based on Motion Vectors." 2009 Second International Workshop on Computer Science and Engineering, 2009. *


Similar Documents

Publication Publication Date Title
CN101835040B (en) Digital Video Source Forensics Method
CN102124489B (en) Signature derivation for images
CN108235001B (en) Deep sea video quality objective evaluation method based on space-time characteristics
CN109635791B (en) A video forensics method based on deep learning
Ravi et al. Compression noise based video forgery detection
CN113536990A (en) Deep fake face data identification method
CN101790097B (en) Method for detecting multiple compression encodings of digital video
Akbari et al. A new forensic video database for source smartphone identification: Description and analysis
Chen et al. Unsupervised curriculum domain adaptation for no-reference video quality assessment
CN101931821B (en) Video transmission error control method and system
Hong et al. Detection of frame deletion in HEVC-Coded video in the compressed domain
Nam et al. Two-stream network for detecting double compression of H. 264 videos
CN106097241A (en) Reversible information hiding method based on eight-neighborhood pixels
CN113033379A (en) Intra-frame forensics deep learning method based on a two-stream CNN
CN111212291A (en) DFL-CNN network-based video intra-frame object removal tamper detection method
He et al. Exposing fake bitrate videos using hybrid deep-learning network from recompression error
Goodwin et al. Blind video tamper detection based on fusion of source features
Bakas et al. MPEG double compression based intra-frame video forgery detection using CNN
CN102857831A (en) H.264 video integrity authentication method
Yang et al. Blind VQA on 360° video via progressively learning from pixels, frames, and video
Yao et al. An approach to detect video frame deletion under anti-forensics
Tan et al. Hybrid deep-learning framework for object-based forgery detection in video
CN106529405A (en) Local anomaly behavior detection method based on video image block model
CN110503049B (en) A method for estimating the number of vehicles in satellite video based on generative adversarial network
Wang et al. Steganalysis of JPEG images by block texture based segmentation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120704
