CN116994120A - Pseudo identification method for synthesized video images

Pseudo identification method for synthesized video images

Info

Publication number
CN116994120A
Authority
CN
China
Prior art keywords
image
analysis
angle
tag
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310971368.1A
Other languages
Chinese (zh)
Inventor
郑威
云剑
凌霞
郑晓玲
周凡棣
海涵
辛鑫
刘澎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Information and Communications Technology CAICT
Original Assignee
China Academy of Information and Communications Technology CAICT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Information and Communications Technology CAICT
Priority to CN202310971368.1A
Publication of CN116994120A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/40: Extraction of image or video features
    • G06V 10/70: Arrangements using pattern recognition or machine learning
    • G06V 10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
    • G06V 20/95: Pattern authentication; Markers therefor; Forgery detection

Abstract

The application discloses a forgery identification method for synthesized video images, relating to the technical field of synthesized video identification. The method comprises the following specific steps: preprocessing an identification sample to obtain sample data, the sample data comprising first image data and second image data; analyzing the first image data and labeling the identification sample with a first analysis tag; analyzing the second image data and labeling the identification sample with a second analysis tag; and integrating the first analysis tag and the second analysis tag of the identification sample to generate an identification result. By constructing a shadow-light-intensity analysis model, the application analyzes the illumination intensity of single-frame video images in the identification sample using a reflectivity estimation method, matches the person-shadow angle from the separated illumination intensity, and compares the actual person-shadow angle with the model person-shadow angle, thereby realizing the analysis of the second image data and enhancing the reliability and accuracy of the identification data.

Description

Pseudo identification method for synthesized video images
Technical Field
The present application relates to the technical field of synthesized video identification, and in particular to a forgery identification method for synthesized video images.
Background
With the rise of short videos, more and more video software offers frame-by-frame image editing ("P-picture") functions, and some people use them to forge and splice portions of videos, which misleads video viewers and video examiners.
A search found that comparison document CN115662109A provides a timing method, device and system for verifying the time reliability of multi-source fused video. In that document, a timing device turns on a minute timing display device, a second timing display device and a millisecond timing display device according to a set program, so as to provide accurate timing and a time anchor point for the multi-source video; by analyzing frame by frame the display picture of the timing equipment captured by the multi-source video, the frame interval time and hence the frame rate of the multi-source video can be determined, thereby judging the time reliability of the multi-source video.
A review of the comparison document shows that the prior art still has the following disadvantages:
1. During identification of synthesized video, frame-by-frame image editing may have been applied to individual frames, in which case a traditional video identification method may produce erroneous analysis results.
2. During video identification, the video is mostly judged from basic video data rather than analyzed according to the video content, so forgeries can be overlooked.
To solve the above problems, a forgery identification method for synthesized video images is proposed.
Disclosure of Invention
The purpose of the application is to provide a forgery identification method for synthesized video images, so as to solve the defects mentioned in the background.
In order to achieve the above object, the present application provides the following technical solutions:
A forgery identification method for synthesized video images comprises the following steps:
preprocessing an identification sample to obtain sample data, wherein the sample data comprises first image data and second image data;
analyzing the first image data and labeling the identification sample with a first analysis tag;
analyzing the second image data and labeling the identification sample with a second analysis tag;
and integrating the first analysis tag and the second analysis tag in the identification sample to generate an identification result.
In a preferred embodiment, the identification sample is divided into L identification intervals according to the number of frames, where L = {0, 1, 2, 3, 4, ..., n}, n is an integer greater than 1, and each identification interval contains one video frame.
In a preferred embodiment, the first image data includes the edge-contour blur degree of the analysis target (the sharpness of the contour of the identified person image) and the depth-of-field range mean; the smaller the edge-contour blur degree of the analysis target, the greater the suspicion that edited images have been spliced in; the depth-of-field range mean refers to the mean of the maximum and minimum of the depth range regarded as sharp in an image or video, and the depth range represents the distance range from the foreground to the background in the image, that is, the range over which both foreground and background can remain sharp at the same time; a larger depth-of-field range mean indicates that many objects in the image, both foreground and background, appear sharp, and the probability of synthesis is greater, while a smaller depth-of-field range mean indicates that only a few objects in the image are displayed sharply, and the probability of synthesis is likewise greater;
the second image data comprises the person-shadow angle and the image illumination intensity; the person-shadow angle represents the angle between the identification target and its shadow under the image illumination, and the image illumination intensity is calculated from the reflectivity; the closeness of the association between the person-shadow angle and the image illumination intensity is then evaluated: the closer the association, the lower the degree of synthesis of the video, and the weaker the association, the higher the degree of synthesis of the video.
In a preferred embodiment, the first image data analysis process includes the steps of:
obtaining the video image of the nth identification interval among the L identification intervals; obtaining the edge-contour blur degree x of the analysis target and the depth-of-field range mean s in the video image; taking the logarithm of the square root of x to obtain a first operation result; taking the ratio of the first operation result to the depth-of-field range mean s and multiplying it by the blur correction constant P to obtain the first image coefficient ζ; the larger or the smaller the first image coefficient ζ, the lower the authenticity of the video image and the higher the degree of image processing in the video image.
In a preferred embodiment, the first analysis tag includes a first real tag and a first modification tag, and the first analysis tag is labeled as follows:
setting first-analysis-tag comparison thresholds ki1 and ki2, where ki1 is larger than ki2, and comparing the first image coefficient ζ against the comparison thresholds ki1 and ki2;
if the first image coefficient ζ is greater than 0 and smaller than the comparison threshold ki2, labeling the identification sample with the first modification tag;
if the first image coefficient ζ is greater than the comparison threshold ki2 and smaller than the comparison threshold ki1, labeling the identification sample with the first real tag;
if the first image coefficient ζ is larger than the comparison threshold ki1, labeling the identification sample with the first modification tag;
the video image of an identification sample carrying the first real tag has been modified to a lower degree than that of one carrying the first modification tag.
In a preferred embodiment, the second image data analysis process comprises the following steps:
obtaining the person-shadow angle a and the image illumination intensity g in the video image, constructing a shadow-light-intensity analysis model, substituting the image illumination intensity g into the shadow-light-intensity analysis model for analysis, and outputting the model person-shadow angle range a1~a2;
the specific steps for constructing the shadow-light-intensity analysis model are as follows:
data acquisition: collecting images containing person shadows under different illumination conditions, together with the corresponding illumination intensity data, for training and testing the model;
feature extraction: extracting useful features, such as leg key points, and illumination intensity information from the shadow images;
reflectivity estimation: constructing a reflectivity estimation model from the existing illumination intensity data and the feature information in the shadow images, and estimating the illumination intensity of each image;
data processing: combining the human body key points and the illumination intensity data into sample data for training and testing the model;
model training: training the shadow-light-intensity analysis model with a regression algorithm or a classification algorithm, so that it can accurately predict the person-shadow angle corresponding to a given illumination intensity;
model output: converting the shadow-to-light-intensity relationship obtained by training into a predicted person-shadow angle and a predicted illumination intensity;
setting a specific range: setting a specific person-shadow angle range used to judge whether the detected person-shadow angle falls within it;
prediction and comparison: inputting the detected illumination intensity into the model to obtain the predicted person-shadow angle range, and comparing the detected person-shadow angle with it;
result output: judging whether the person-shadow angle falls within the specific range according to the model output, and obtaining the final labeling result.
In a preferred embodiment, the second analysis tag includes a second real tag and a second modification tag, and the second analysis tag is labeled as follows:
substituting the person-shadow angle a into the model person-shadow angle range a1~a2 output by the shadow-light-intensity analysis model for analysis;
if the person-shadow angle a is within the model person-shadow angle range a1~a2, labeling the identification sample with the second real tag;
if the person-shadow angle a is not within the model person-shadow angle range a1~a2, performing a secondary judgment;
the secondary judgment steps are as follows:
adding a correction constant T to the shadow-light-intensity analysis model, widening the model person-shadow angle range a1~a2 to a1-T~a2+T, and substituting the person-shadow angle a into the widened range a1-T~a2+T for analysis;
if the person-shadow angle a is within the model person-shadow angle range a1-T~a2+T, labeling the identification sample with the second real tag;
if the person-shadow angle a is not within the model person-shadow angle range a1-T~a2+T, labeling the identification sample with the second modification tag;
the authenticity of an identification sample carrying the second real tag is greater than that of one carrying the second modification tag.
In a preferred embodiment, the identification result is generated as follows:
performing an integrated analysis of the first analysis tag and the second analysis tag of the identification sample;
counting the number D of modification intervals in the identification sample that carry both the first modification tag and the second modification tag, where 0 ≤ D ≤ L, the total number of identification intervals;
setting modification-interval comparison thresholds DL1 and DL2, where 0 ≤ DL1 < DL2 ≤ L, and comparing the number D of modification intervals against the comparison thresholds DL1 and DL2;
if the number D of modification intervals is greater than or equal to 0 and smaller than the comparison threshold DL1, generating an authentic result for the identification sample;
if the number D of modification intervals is greater than or equal to the comparison threshold DL1 and smaller than the comparison threshold DL2, generating an in-doubt result for the identification sample;
if the number D of modification intervals is greater than or equal to the comparison threshold DL2 and smaller than or equal to the total number L of identification intervals, generating a modified result for the identification sample;
the modified result indicates a greater degree of modification than the in-doubt result, and so on.
Within the above technical scheme, the application has the following technical effects and advantages:
1. By considering the edge contour of the target, the depth-of-field range mean and similar conditions, the application helps determine the foreground and background elements in an image or video, which benefits target detection, tracking and other image processing tasks; at the same time, two types of data that are not obvious choices are adopted, so the authenticity of the video is more easily identified.
2. By constructing a shadow-light-intensity analysis model, the application analyzes the illumination intensity of single-frame video images in the identification sample using a reflectivity estimation method, matches the person-shadow angle from the separated illumination intensity, and compares the actual person-shadow angle with the model person-shadow angle, thereby realizing the analysis of the second image data and enhancing the reliability and accuracy of the identification data.
3. Through superposition analysis of multiple kinds of data, the method accurately obtains the number of spliced, modified or synthesized video image intervals in the identification sample, performs a secondary judgment according to that number, and further analyzes and judges the identification sample, which adds judgment logic and improves the accuracy of video identification.
Drawings
To more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for the embodiments are briefly described below. It is apparent that the drawings described below illustrate only some embodiments of the present application, and those skilled in the art may obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the forgery identification method for synthesized video images according to the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to Fig. 1, the forgery identification method for synthesized video images according to this embodiment comprises the following steps:
preprocessing an identification sample to obtain sample data, wherein the sample data comprises first image data and second image data;
analyzing the first image data and labeling the identification sample with a first analysis tag;
analyzing the second image data and labeling the identification sample with a second analysis tag;
and integrating the first analysis tag and the second analysis tag in the identification sample to generate an identification result.
The identification sample is divided into L identification intervals according to the number of frames, where L = {0, 1, 2, 3, 4, ..., n}, n is an integer greater than 1, and each identification interval contains one video frame.
It should be noted that the preprocessing steps and the related techniques are as follows:
video reading: reading video frame data from a video file; a common image/video processing library such as OpenCV may be used;
video frame extraction: splitting the video into multiple frames of images;
image preprocessing: preprocessing each frame of image, including denoising, enhancing, cropping, scaling and the like, to improve the effect of subsequent analysis;
feature extraction: extracting key features from each frame of image for subsequent analysis and processing; the extracted data are the sample data;
analysis algorithm: selecting an appropriate analysis algorithm, such as target detection, object tracking or action recognition, according to the specific task;
data storage: storing the framed video data, the extracted features and other information for convenient subsequent comprehensive analysis (a minimal preprocessing sketch follows this list).
The application performs processing on the basis of a video identification scheme that contains a person image;
the first image data includes the edge-contour blur degree of the analysis target (the sharpness of the contour of the identified person image) and the depth-of-field range mean; the smaller the edge-contour blur degree of the analysis target, the greater the suspicion that edited images have been spliced in; the depth-of-field range mean refers to the mean of the maximum and minimum of the depth range regarded as sharp in an image or video, and the depth range represents the distance range from the foreground to the background in the image, that is, the range over which both foreground and background can remain sharp at the same time; a larger depth-of-field range mean indicates that many objects in the image, both foreground and background, appear sharp, and the probability of synthesis is greater, while a smaller depth-of-field range mean indicates that only a few objects in the image are displayed sharply, and the probability of synthesis is likewise greater.
It should be noted that the larger the depth-of-field range mean, the greater the likelihood that the image is a composite or computer-generated image, because in the real world the depth of field is usually limited and near and far objects are not all sharply displayed at the same time; an excessively large depth-of-field range can therefore make an image look too perfect when in fact it is synthetic or computer-generated;
the smaller the depth-of-field range mean, the more blurred or unclear the objects may appear, which can be a sign of editing with blurring effects or image processing software; too small a depth-of-field range can make the image look over-processed when in fact it has been edited or modified.
Furthermore, the techniques for extracting the first image data are, respectively:
edge-contour blur extraction: an edge detection algorithm or an edge blur calculation, such as the Canny, Sobel or Laplacian algorithm, is used to detect the blur degree of the edge contours in the image;
depth-of-field range mean extraction: the depth information of objects in the image is estimated using depth estimation techniques from computer vision, such as structured light, stereo matching or deep learning (a minimal sketch of the edge-blur measure follows this list).
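For illustration, a minimal sketch of one possible edge-contour blur measure is given below; it assumes an 8-bit grayscale frame and uses the variance of the Laplacian over Canny edge pixels as a sharpness score, inverted into a blur score. The exact measure and the Canny thresholds are assumptions; the method itself only requires some edge-blur value x.

```python
import cv2
import numpy as np

def edge_contour_blur(gray_frame):
    """Edge-contour blur degree x of a grayscale frame: Laplacian variance restricted to
    Canny edge pixels is used as a sharpness score, then inverted into a blur score."""
    edges = cv2.Canny(gray_frame, 100, 200)
    lap = cv2.Laplacian(gray_frame.astype(np.float64), cv2.CV_64F)
    edge_values = lap[edges > 0]
    if edge_values.size == 0:
        return 1.0                       # no detectable edges: treat as maximally blurred
    sharpness = float(edge_values.var())
    return 1.0 / (1.0 + sharpness)       # hypothetical mapping: sharper edges -> lower blur
```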
The second image data comprises the person-shadow angle and the image illumination intensity; the person-shadow angle represents the angle between the identification target and its shadow under the image illumination, and the image illumination intensity is calculated from the reflectivity; the closeness of the association between the person-shadow angle and the image illumination intensity is then evaluated: the closer the association, the lower the degree of synthesis of the video, and the weaker the association, the higher the degree of synthesis of the video.
It should be noted that:
the person-shadow angle is obtained by a human body detection and key-point localization method that takes the legs as key points, which may proceed as follows:
human body detection: first, a deep learning model is used to detect the human body in the video frame image and find the person's position and posture information; common deep learning models include SSD (Single Shot MultiBox Detector), YOLO (You Only Look Once) and the like;
key-point localization: a key-point localization model, such as OpenPose, is used to localize key points on the detected person; of particular interest are the leg key points, which typically include the junctions of the thigh and the shank;
shadow detection: in the video frame image, image processing and segmentation techniques are used to detect and segment the position and shape of the shadow;
angle calculation: with the position information of the person, the key points and the shadow, the angle between the leg key points and the shadow can be calculated using geometry and trigonometry; the angle information can be obtained by calculating the angle between the line connecting a leg key point to the shadow centre point and the horizontal (or vertical) line;
angle correction: in some cases, the angle may need to be corrected to account for camera tilt or perspective transformation (a minimal angle-calculation sketch follows this list).
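As an illustration of the angle-calculation step only, the sketch below assumes that a leg key point (for example an ankle from OpenPose) and the shadow centroid are already available as pixel coordinates; the choice of the horizontal axis as reference follows the description above.

```python
import math

def person_shadow_angle(leg_keypoint, shadow_centroid):
    """Person-shadow angle a in degrees: angle between the horizontal axis and the line
    from a leg key point (e.g. an ankle) to the shadow centroid, both as (x, y) pixels."""
    dx = shadow_centroid[0] - leg_keypoint[0]
    dy = shadow_centroid[1] - leg_keypoint[1]
    return abs(math.degrees(math.atan2(dy, dx)))
```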
The illumination intensity of the image is obtained through reflectivity estimation, specifically as follows:
reflectivity estimation is a method of estimating the illumination intensity in an image through the relationship between an object's reflectivity and the illumination intensity; the reflectivity refers to the light-reflecting capacity of an object's surface, and objects of different materials have different reflectivities.
A specific reflectivity estimation method can be implemented by the following steps:
acquiring a reference image: first, a reference image is acquired, shot under known, specific conditions, to serve as the reflectivity reference; the known conditions include a known illumination intensity and known camera parameters;
extracting the object region: the object whose reflectivity is to be estimated is selected as required, and the object region is extracted using an image segmentation or object detection algorithm;
estimating the reflectivity: for the extracted object region, the corresponding region is found in the reference image and its pixel values are obtained; suppose the pixel value in the reference image is I_ref and the pixel value in the image to be estimated is I_test;
calculating the reflectivity: by the definition of reflectivity, R = I_test / I_ref; by calculating the reflectivity of each pixel, the reflectivity of the object surface in the image to be estimated can be obtained.
Reflectivity estimation is a relative estimation method: it requires a reference image to be acquired in advance and assumes that the illumination intensity and camera parameters are the same in the reference image and the image to be estimated; otherwise, more complex techniques, such as multi-view imaging or reflection-model fitting, may be needed to obtain a more accurate reflectivity estimate.
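A minimal sketch of this relative estimation is given below, assuming the reference image, the image to be estimated and an object mask are already aligned; the aggregation of per-pixel ratios into a single illumination value is a hypothetical choice, not specified in the description.

```python
import numpy as np

def estimate_illumination(i_test, i_ref, ref_illumination, mask):
    """Per-pixel ratio R = I_test / I_ref over the object region, then a rough estimate of
    the test image's illumination intensity from the known reference illumination."""
    eps = 1e-6
    ratio = i_test.astype(np.float64) / (i_ref.astype(np.float64) + eps)
    object_ratio = ratio[mask > 0]               # restrict to the extracted object region
    # Hypothetical aggregation: scale the known reference illumination by the mean ratio.
    return ref_illumination * float(object_ratio.mean())
```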
The first image data analysis process comprises the following steps:
obtaining the video image of the nth identification interval among the L identification intervals; obtaining the edge-contour blur degree x of the analysis target and the depth-of-field range mean s in the video image; taking the logarithm of the square root of x to obtain a first operation result; taking the ratio of the first operation result to the depth-of-field range mean s and multiplying it by the blur correction constant P to obtain the first image coefficient ζ, which is given by:
ζ = P · ln(√x) / s, where P > 0 and x is always greater than 0;
the larger or the smaller the first image coefficient ζ, the lower the authenticity of the video image and the higher the degree of image processing in the video image.
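For reference, a direct transcription of the first-image-coefficient formula into code might look as follows; the parameter names mirror the description above, and the input checks are assumptions.

```python
import math

def first_image_coefficient(x, s, p):
    """First image coefficient: zeta = P * ln(sqrt(x)) / s, with P > 0 and x > 0."""
    if x <= 0 or p <= 0 or s == 0:
        raise ValueError("x and P must be positive and s must be non-zero")
    return p * math.log(math.sqrt(x)) / s
```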
The first analysis tag includes a first real tag and a first modification tag, and the first analysis tag is labeled as follows:
setting first-analysis-tag comparison thresholds ki1 and ki2, where ki1 is larger than ki2, and comparing the first image coefficient ζ against the comparison thresholds ki1 and ki2;
if the first image coefficient ζ is greater than 0 and smaller than the comparison threshold ki2, labeling the identification sample with the first modification tag;
if the first image coefficient ζ is greater than the comparison threshold ki2 and smaller than the comparison threshold ki1, labeling the identification sample with the first real tag;
if the first image coefficient ζ is larger than the comparison threshold ki1, labeling the identification sample with the first modification tag;
the video image of an identification sample carrying the first real tag has been modified to a lower degree than that of one carrying the first modification tag.
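A minimal sketch of this labeling rule is shown below; the treatment of values falling exactly on a threshold is not specified in the description and is therefore an assumption.

```python
def first_analysis_tag(zeta, ki1, ki2):
    """Label an identification interval from zeta, given thresholds ki1 > ki2 > 0."""
    if 0 < zeta < ki2 or zeta > ki1:
        return "first modification tag"   # too small or too large: image likely processed
    return "first real tag"               # ki2 <= zeta <= ki1: interval treated as authentic
```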
The second image data analysis process comprises the following steps:
obtaining the person-shadow angle a and the image illumination intensity g in the video image, constructing a shadow-light-intensity analysis model, substituting the image illumination intensity g into the shadow-light-intensity analysis model for analysis, and outputting the model person-shadow angle range a1~a2;
the specific steps for constructing the shadow-light-intensity analysis model are as follows:
data acquisition: collecting images containing person shadows under different illumination conditions, together with the corresponding illumination intensity data, for training and testing the model;
feature extraction: extracting useful features, such as leg key points, and illumination intensity information from the shadow images;
reflectivity estimation: constructing a reflectivity estimation model from the existing illumination intensity data and the feature information in the shadow images, and estimating the illumination intensity of each image;
data processing: combining the human body key points and the illumination intensity data into sample data for training and testing the model;
model training: training the shadow-light-intensity analysis model with a regression algorithm or a classification algorithm, so that it can accurately predict the person-shadow angle corresponding to a given illumination intensity;
model output: converting the shadow-to-light-intensity relationship obtained by training into a predicted person-shadow angle and a predicted illumination intensity;
setting a specific range: setting a specific person-shadow angle range used to judge whether the detected person-shadow angle falls within it;
prediction and comparison: inputting the detected illumination intensity into the model to obtain the predicted person-shadow angle range, and comparing the detected person-shadow angle with it;
result output: judging whether the person-shadow angle falls within the specific range according to the model output, and obtaining the final labeling result.
It should be noted that the greater the illumination intensity of the image, the smaller the angle between the person and the shadow: under strong illumination the shadow is sharper and shorter and the included angle between the person and the shadow is smaller, while under weak illumination the shadow is blurred and elongated and the included angle between the person and the shadow is larger;
the above shadow-light-intensity analysis model can be solved with a supervised regression model; when training the shadow-light-intensity analysis model, a labeled data set can be used that contains person-shadow images, illumination intensities and the corresponding person-shadow angles; through training, the shadow-light-intensity analysis model learns the relationship between illumination intensity and person-shadow angle, and this relationship can then be used to predict the person-shadow angle in an image to be examined (a minimal regression sketch follows).
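By way of illustration, the sketch below trains such a supervised regression model with scikit-learn and turns its point prediction into a model angle range a1~a2; the use of linear regression and the fixed half-width of the range are assumptions, since the description only requires some regression or classification algorithm.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def train_shadow_light_model(intensities, angles):
    """Fit a supervised regression from image illumination intensity g to person-shadow angle a."""
    model = LinearRegression()
    model.fit(np.asarray(intensities, dtype=float).reshape(-1, 1),
              np.asarray(angles, dtype=float))
    return model

def model_angle_range(model, g, half_width=5.0):
    """Turn the point prediction for intensity g into the model angle range a1~a2;
    the +/- half_width band (in degrees) is a hypothetical choice."""
    a_pred = float(model.predict(np.array([[g]], dtype=float))[0])
    return a_pred - half_width, a_pred + half_width
```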
The second analysis tag includes a second real tag and a second modification tag, and the second analysis tag is labeled as follows:
substituting the person-shadow angle a into the model person-shadow angle range a1~a2 output by the shadow-light-intensity analysis model for analysis;
if the person-shadow angle a is within the model person-shadow angle range a1~a2, labeling the identification sample with the second real tag;
if the person-shadow angle a is not within the model person-shadow angle range a1~a2, performing a secondary judgment;
the secondary judgment steps are as follows:
adding a correction constant T to the shadow-light-intensity analysis model, widening the model person-shadow angle range a1~a2 to a1-T~a2+T, and substituting the person-shadow angle a into the widened range a1-T~a2+T for analysis;
if the person-shadow angle a is within the model person-shadow angle range a1-T~a2+T, labeling the identification sample with the second real tag;
if the person-shadow angle a is not within the model person-shadow angle range a1-T~a2+T, labeling the identification sample with the second modification tag;
the authenticity of an identification sample carrying the second real tag is greater than that of one carrying the second modification tag.
The identification result is generated as follows:
performing an integrated analysis of the first analysis tag and the second analysis tag of the identification sample;
counting the number D of modification intervals in the identification sample that carry both the first modification tag and the second modification tag, where 0 ≤ D ≤ L, the total number of identification intervals;
setting modification-interval comparison thresholds DL1 and DL2, where 0 ≤ DL1 < DL2 ≤ L, and comparing the number D of modification intervals against the comparison thresholds DL1 and DL2;
if the number D of modification intervals is greater than or equal to 0 and smaller than the comparison threshold DL1, generating an authentic result for the identification sample;
if the number D of modification intervals is greater than or equal to the comparison threshold DL1 and smaller than the comparison threshold DL2, generating an in-doubt result for the identification sample;
if the number D of modification intervals is greater than or equal to the comparison threshold DL2 and smaller than or equal to the total number L of identification intervals, generating a modified result for the identification sample;
the modified result indicates a greater degree of modification than the in-doubt result, and so on.
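For illustration, the result-generation rule can be transcribed as follows; the concrete threshold values DL1 and DL2 remain to be set by the practitioner and are therefore passed in as parameters.

```python
def identification_result(d, dl1, dl2, l_total):
    """Overall result from the number D of intervals carrying both modification tags,
    given thresholds 0 <= DL1 < DL2 <= L."""
    if not 0 <= d <= l_total:
        raise ValueError("D must lie between 0 and the total number of intervals L")
    if d < dl1:
        return "authentic result"
    if d < dl2:
        return "in-doubt result"
    return "modified result"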
By considering the edge contour of the target, the depth-of-field range mean and similar conditions, the method helps determine the foreground and background elements in an image or video, which benefits target detection, tracking and other image processing tasks; at the same time, two types of data that are not obvious choices are adopted, so the authenticity of the video is more easily identified; by constructing a shadow-light-intensity analysis model, the illumination intensity of single-frame video images in the identification sample is analyzed using a reflectivity estimation method, the person-shadow angle is matched from the separated illumination intensity, and the actual person-shadow angle is compared with the model person-shadow angle, thereby realizing the analysis of the second image data and enhancing the reliability and accuracy of the identification data;
in addition, through superposition analysis of multiple kinds of data, the number of spliced, modified or synthesized video image intervals in the identification sample is accurately obtained, a secondary judgment is made according to that number, and the identification sample is further analyzed and judged, which adds judgment logic and improves the accuracy of video identification.
The above formulas are dimensionless, numerical formulas obtained by fitting a large amount of collected data through software simulation so as to approximate the latest real situation; the preset parameters in the formulas are set by those skilled in the art according to the actual situation.
It should be understood that the term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean: there are three cases, a alone, a and B together, and B alone, wherein a, B may be singular or plural. In addition, the character "/" herein generally indicates that the associated object is an "or" relationship, but may also indicate an "and/or" relationship, and may be understood by referring to the context.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A forgery identification method for synthesized video images, characterized in that the method comprises the following steps:
preprocessing an identification sample to obtain sample data, wherein the sample data comprises first image data and second image data;
analyzing the first image data and labeling the identification sample with a first analysis tag;
analyzing the second image data and labeling the identification sample with a second analysis tag;
and integrating the first analysis tag and the second analysis tag in the identification sample to generate an identification result.
2. The forgery identification method for synthesized video images according to claim 1, wherein the identification sample is divided into L identification intervals according to the number of frames, where L = {0, 1, 2, 3, 4, ..., n}, n is an integer greater than 1, and each identification interval contains one video frame.
3. The forgery identification method for synthesized video images according to claim 2, wherein the first image data includes the edge-contour blur degree of the analysis target (the sharpness of the contour of the identified person image) and the depth-of-field range mean; the smaller the edge-contour blur degree of the analysis target, the greater the suspicion that edited images have been spliced in; the depth-of-field range mean refers to the mean of the maximum and minimum of the depth range regarded as sharp in an image or video, and the depth range represents the distance range from the foreground to the background in the image, that is, the range over which both foreground and background can remain sharp at the same time; a larger depth-of-field range mean indicates that many objects in the image, both foreground and background, appear sharp, and the probability of synthesis is greater, while a smaller depth-of-field range mean indicates that only a few objects in the image are displayed sharply, and the probability of synthesis is likewise greater;
the second image data comprises the person-shadow angle and the image illumination intensity; the person-shadow angle represents the angle between the identification target and its shadow under the image illumination, and the image illumination intensity is calculated from the reflectivity; the closeness of the association between the person-shadow angle and the image illumination intensity is then evaluated: the closer the association, the lower the degree of synthesis of the video, and the weaker the association, the higher the degree of synthesis of the video.
4. The forgery identification method for synthesized video images according to claim 3, wherein the first image data analysis process comprises the following steps:
obtaining the video image of the nth identification interval among the L identification intervals; obtaining the edge-contour blur degree x of the analysis target and the depth-of-field range mean s in the video image; taking the logarithm of the square root of x to obtain a first operation result; taking the ratio of the first operation result to the depth-of-field range mean s and multiplying it by the blur correction constant P to obtain the first image coefficient ζ; the larger or the smaller the first image coefficient ζ, the lower the authenticity of the video image and the higher the degree of image processing in the video image.
5. The forgery identification method for synthesized video images according to claim 4, wherein the first analysis tag includes a first real tag and a first modification tag, and the first analysis tag is labeled as follows:
setting first-analysis-tag comparison thresholds ki1 and ki2, where ki1 is larger than ki2, and comparing the first image coefficient ζ against the comparison thresholds ki1 and ki2;
if the first image coefficient ζ is greater than 0 and smaller than the comparison threshold ki2, labeling the identification sample with the first modification tag;
if the first image coefficient ζ is greater than the comparison threshold ki2 and smaller than the comparison threshold ki1, labeling the identification sample with the first real tag;
if the first image coefficient ζ is larger than the comparison threshold ki1, labeling the identification sample with the first modification tag;
the video image of an identification sample carrying the first real tag has been modified to a lower degree than that of one carrying the first modification tag.
6. The forgery identification method for synthesized video images according to claim 5, wherein the second image data analysis process comprises the following steps:
obtaining the person-shadow angle a and the image illumination intensity g in the video image, constructing a shadow-light-intensity analysis model, substituting the image illumination intensity g into the shadow-light-intensity analysis model for analysis, and outputting the model person-shadow angle range a1~a2;
the specific steps for constructing the shadow-light-intensity analysis model are as follows:
data acquisition: collecting images containing person shadows under different illumination conditions, together with the corresponding illumination intensity data, for training and testing the model;
feature extraction: extracting useful features, such as leg key points, and illumination intensity information from the shadow images;
reflectivity estimation: constructing a reflectivity estimation model from the existing illumination intensity data and the feature information in the shadow images, and estimating the illumination intensity of each image;
data processing: combining the human body key points and the illumination intensity data into sample data for training and testing the model;
model training: training the shadow-light-intensity analysis model with a regression algorithm or a classification algorithm, so that it can accurately predict the person-shadow angle corresponding to a given illumination intensity;
model output: converting the shadow-to-light-intensity relationship obtained by training into a predicted person-shadow angle and a predicted illumination intensity;
setting a specific range: setting a specific person-shadow angle range used to judge whether the detected person-shadow angle falls within it;
prediction and comparison: inputting the detected illumination intensity into the model to obtain the predicted person-shadow angle range, and comparing the detected person-shadow angle with it;
result output: judging whether the person-shadow angle falls within the specific range according to the model output, and obtaining the final labeling result.
7. The forgery identification method for synthesized video images according to claim 6, wherein the second analysis tag includes a second real tag and a second modification tag, and the second analysis tag is labeled as follows:
substituting the person-shadow angle a into the model person-shadow angle range a1~a2 output by the shadow-light-intensity analysis model for analysis;
if the person-shadow angle a is within the model person-shadow angle range a1~a2, labeling the identification sample with the second real tag;
if the person-shadow angle a is not within the model person-shadow angle range a1~a2, performing a secondary judgment;
the secondary judgment steps are as follows:
adding a correction constant T to the shadow-light-intensity analysis model, widening the model person-shadow angle range a1~a2 to a1-T~a2+T, and substituting the person-shadow angle a into the widened range a1-T~a2+T for analysis;
if the person-shadow angle a is within the model person-shadow angle range a1-T~a2+T, labeling the identification sample with the second real tag;
if the person-shadow angle a is not within the model person-shadow angle range a1-T~a2+T, labeling the identification sample with the second modification tag;
the authenticity of an identification sample carrying the second real tag is greater than that of one carrying the second modification tag.
8. The forgery identification method for synthesized video images according to claim 7, wherein the identification result is generated as follows:
performing an integrated analysis of the first analysis tag and the second analysis tag of the identification sample;
counting the number D of modification intervals in the identification sample that carry both the first modification tag and the second modification tag, where 0 ≤ D ≤ L, the total number of identification intervals;
setting modification-interval comparison thresholds DL1 and DL2, where 0 ≤ DL1 < DL2 ≤ L, and comparing the number D of modification intervals against the comparison thresholds DL1 and DL2;
if the number D of modification intervals is greater than or equal to 0 and smaller than the comparison threshold DL1, generating an authentic result for the identification sample;
if the number D of modification intervals is greater than or equal to the comparison threshold DL1 and smaller than the comparison threshold DL2, generating an in-doubt result for the identification sample;
if the number D of modification intervals is greater than or equal to the comparison threshold DL2 and smaller than or equal to the total number L of identification intervals, generating a modified result for the identification sample;
the modified result indicates a greater degree of modification than the in-doubt result, and so on.
CN202310971368.1A (priority date 2023-08-03, filing date 2023-08-03): Pseudo identification method for synthesized video images, pending, published as CN116994120A

Priority Applications (1)

Application Number: CN202310971368.1A | Priority Date: 2023-08-03 | Filing Date: 2023-08-03 | Title: Pseudo identification method for synthesized video images

Applications Claiming Priority (1)

Application Number: CN202310971368.1A | Priority Date: 2023-08-03 | Filing Date: 2023-08-03 | Title: Pseudo identification method for synthesized video images

Publications (1)

Publication Number: CN116994120A | Publication Date: 2023-11-03

Family

ID=88526186

Family Applications (1)

Application Number: CN202310971368.1A | Title: Pseudo identification method for synthesized video images | Priority Date: 2023-08-03 | Filing Date: 2023-08-03 | Status: Pending (published as CN116994120A)

Country Status (1)

CN: CN116994120A


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination