CN108596141B - Detection method and system for face images generated by a deep network - Google Patents

Detection method and system for face images generated by a deep network

Info

Publication number
CN108596141B
CN108596141B (application CN201810434620.4A)
Authority
CN
China
Prior art keywords
face image
image
generated
training
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810434620.4A
Other languages
Chinese (zh)
Other versions
CN108596141A (en)
Inventor
Haodong Li
Jiwu Huang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN201810434620.4A
Publication of CN108596141A
Priority to PCT/CN2019/085592
Application granted
Publication of CN108596141B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/56: Extraction of image or video features relating to colour
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a detection method and system for face images generated by a deep network. A training sample set consisting of real face images and generated face images is constructed; the training sample set is modeled based on the color relationships between adjacent pixels, and statistical features are extracted; a classification model is trained on the statistical features; and the image to be detected is predicted with the classification model. Because the statistical characteristics of face images generated by a deep network are inconsistent with those of real images, a set of co-occurrence-matrix features based on the color relationships of adjacent pixels is designed. The method achieves very high detection accuracy on face images of different sizes generated by different types of deep networks, can effectively judge whether a given face image is a false image generated by a deep network, and thereby improves security.

Description

Detection method and system for face images generated by a deep network
Technical Field
The invention relates to the field of multimedia information security and forensics, and in particular to a detection method and system for face images generated by a deep network.
Background
With the rapid development of science and technology, digital images are widely used in all aspects of social production and life and have become important carriers for recording objective facts. At the same time, powerful multimedia processing software has become widespread. With the help of professional image editing software such as Adobe Photoshop, GIMP and ACDSee, an ordinary user can easily edit and modify image data without leaving obvious visual traces, thereby masking or even distorting the facts. Moreover, an image generation model can now be trained by letting a deep network learn from a large number of real images. Such image generation models can be used to generate large numbers of realistic false scene images, such as human faces. Once used in sensitive fields such as news reporting, identity verification and judicial evidence collection, these false images would seriously disrupt normal social order. Therefore, authenticating images is a practical problem that needs to be solved.
Generally, authentication techniques for digital images fall into two broad categories: active authentication and passive authentication. Active authentication includes methods such as digital signatures and digital watermarks. These techniques require additional authentication information, such as an embedded signature or watermark, to be added when a digital image is created or before it is distributed; whether the image is authentic and complete is then judged by checking whether the embedded information has been altered. In reality, however, digital images come from many different sources, and it is often impossible to embed information into them in advance, which greatly limits the applicability of active authentication techniques.
Compared with active authentication, passive authentication does not require information to be embedded in the image in advance; authentication relies only on the image data itself, which makes it more practical. The basic premise of passive authentication is that the hardware characteristics of digital cameras and the various signal processing steps of the imaging pipeline leave inherent properties in the image data, and modifying an image destroys these inherent properties or introduces new traces. By extracting relevant features, the source of an image can be identified and it can be determined whether the image has been modified.
Conventional image tampering operations include splicing, copy-move, image enhancement and the like. What these operations have in common is that they all edit and modify an existing real image. In contrast, an image generation model built with a deep network can achieve a "something from nothing" tampering effect: by choosing appropriate parameters, a tamperer can use a trained deep network to generate images of a specific scene, such as face images conforming to certain shape, pose and age characteristics. Prior work has shown that such generated images can sometimes be lifelike enough to deceive the human eye.
Lawbreakers can profit from false photos generated by deep networks, which creates many potential security hazards.
Therefore, existing detection technology for generated images still needs improvement and development.
Disclosure of Invention
The invention provides an effective detection method for face images generated by a deep network, which can accurately judge whether a given face image is a real image or a false image generated by a deep network, thereby improving security.
The technical scheme adopted by the invention for solving the technical problem is as follows:
A detection method for face images generated by a deep network comprises the following steps:
A. constructing a training sample set consisting of real face images and generated face images;
B. modeling the training sample set based on color relationships, and extracting statistical features;
C. training a classification model on the statistical features;
D. detecting the image to be detected based on the classification model, and outputting a detection result.
In the above detection method, the step A specifically comprises the following steps:
A1, obtaining real face images through an imaging device;
A2, generating face images by feeding random noise vectors through a trained deep network;
and A3, taking the real face images as negative samples and the generated face images as positive samples to form the training sample set.
In the above detection method, the step B specifically comprises the following steps:
B1, extracting the magnitude relations of adjacent pixel values in the color channels of each sample in the training sample set;
B2, describing the color and texture information of each sample in the training sample set through co-occurrence matrices;
and B3, obtaining the feature of each image.
In the above detection method, the step B1 is specifically as follows:
Let the input image be I, with its R, G, B color channels denoted I_r, I_g and I_b respectively. The magnitude relation of adjacent pixel values in each color channel is calculated according to the following formula:

R_{c,i,j}(x, y) = Φ{ I_c(x, y) > I_c(x+i, y+j) }

where c ∈ {r, g, b}, (i, j) ∈ {(0,1), (0,-1), (1,0), (-1,0)}, and Φ{·} = 1 if and only if the logical expression in braces is true, otherwise Φ{·} = 0. The magnitude relations of the three channels R, G, B are regarded as a triplet:

R_{i,j}(x, y) = (R_{r,i,j}(x, y), R_{g,i,j}(x, y), R_{b,i,j}(x, y))

Each component of R_{i,j}(x, y) takes the value 0 or 1, and the following equivalent transformation is performed to obtain a scalar in [0, 7]:

R′_{i,j}(x, y) = 4·R_{r,i,j}(x, y) + 2·R_{g,i,j}(x, y) + R_{b,i,j}(x, y)
In the above detection method, the step B2 specifically includes:
A co-occurrence matrix is used to model R′_{i,j}. Taking the k-th order co-occurrence matrix in the horizontal direction as an example, it is calculated as follows:

C^h_{i,j}(v_1, v_2, …, v_k) = (1/N) · Σ_{x,y} Φ{ (R′_{i,j}(x, y), R′_{i,j}(x, y+1), …, R′_{i,j}(x, y+k-1)) = (v_1, v_2, …, v_k) }

where (v_1, v_2, …, v_k) is the index into the co-occurrence matrix, N is a normalization factor, and Φ{·} = 1 if and only if the logical expression in braces is true, otherwise Φ{·} = 0.
In the above detection method, the step C specifically comprises:
Using a supervised learning method, an ensemble classifier with linear discriminant analysis as the base classifier is trained to serve as the binary classification model.
In the above detection method, the step D specifically comprises:
The image to be detected is predicted with the classification model; if the model predicts that it is a generated face image, it is judged to be a face image generated by a deep network; otherwise it is judged to be a real face image.
A detection system for face images generated by a deep network comprises:
a sample construction module, used for constructing a training sample set consisting of real face images and generated face images;
a feature extraction module, used for modeling the training sample set based on color relationships and extracting statistical features;
a feature training module, used for training a classification model on the statistical features;
and an image detection module, used for detecting the image to be detected based on the classification model and outputting a detection result.
Wherein the feature extraction module comprises: a pixel relation module and a statistical description module.
The pixel relation module is used for extracting the magnitude relations of adjacent pixel values in the color channels of each sample in the training sample set. Specifically:

Let the input image be I, with its R, G, B color channels denoted I_r, I_g and I_b respectively. The magnitude relation of adjacent pixel values in each color channel is calculated according to the following formula:

R_{c,i,j}(x, y) = Φ{ I_c(x, y) > I_c(x+i, y+j) }

where c ∈ {r, g, b}, (i, j) ∈ {(0,1), (0,-1), (1,0), (-1,0)}, and Φ{·} = 1 if and only if the logical expression in braces is true. The magnitude relations of the three channels R, G, B are regarded as a triplet:

R_{i,j}(x, y) = (R_{r,i,j}(x, y), R_{g,i,j}(x, y), R_{b,i,j}(x, y))

Each component of R_{i,j}(x, y) takes the value 0 or 1, and the following equivalent transformation is performed:

R′_{i,j}(x, y) = 4·R_{r,i,j}(x, y) + 2·R_{g,i,j}(x, y) + R_{b,i,j}(x, y)
The statistical description module is used for describing the color and texture information of each sample in the training sample set. Specifically, a co-occurrence matrix is used to model R′_{i,j}, calculated as follows:

C^h_{i,j}(v_1, v_2, …, v_k) = (1/N) · Σ_{x,y} Φ{ (R′_{i,j}(x, y), R′_{i,j}(x, y+1), …, R′_{i,j}(x, y+k-1)) = (v_1, v_2, …, v_k) }

where (v_1, v_2, …, v_k) is the index into the co-occurrence matrix, N is a normalization factor, and Φ{·} = 1 if and only if the logical expression in braces is true, otherwise Φ{·} = 0.
The invention discloses a detection method and system for face images generated by a deep network: a training sample set consisting of real face images and generated face images is constructed; the training sample set is modeled based on the color relationships between adjacent pixels, and statistical features are extracted; a classification model is trained on the statistical features; and the image to be detected is predicted with the classification model. Because the statistical characteristics of face images generated by a deep network are inconsistent with those of real images, a set of co-occurrence-matrix features based on the color relationships of adjacent pixels is designed. The method achieves very high detection accuracy on face images of different sizes generated by different types of deep networks, can effectively judge whether a given face image is a false image generated by a deep network, and thereby improves security.
Drawings
Fig. 1 is a flowchart of detecting a face image according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of generating a face image in the present invention.
Figs. 3(a) and 3(b) illustrate the process of computing the color relationships of adjacent pixels of an image in the present invention.
Fig. 4 is a schematic diagram of computing the co-occurrence matrix of an image from the pixel color relationships in the present invention.
Figs. 5(a) and 5(b) are schematic diagrams comparing the features of real face images and generated face images.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described here are merely illustrative of the invention and do not limit it.
Referring to Fig. 1, Fig. 1 is a flowchart of a method for detecting face images generated by a deep network according to a preferred embodiment of the present invention. The method comprises the following steps:
Step S10: construct a training sample set consisting of real face images and generated face images.
Specifically, firstly, a real face image training depth network is used as a face image generator, and a random noise vector is input into the trained depth network to obtain a generated face image, as shown in fig. 2. The types of the trained deep network include but are not limited to a variational self-coder and a generation countermeasure network. The real face image is taken as a negative sample, the generated face image is taken as a positive sample to form a training set, the real face image is obtained by shooting through imaging equipment, and the generated face image is obtained by generating a random noise vector through a trained depth network.
Step S20: model the training sample set based on color relationships and extract statistical features. Specifically, features are extracted from the deep-network-generated face images and the real face images as follows.
For each image I, the magnitude relation of adjacent pixel values in the R, G, B color channels is calculated according to the following formula:

R_{c,i,j}(x, y) = Φ{ I_c(x, y) > I_c(x+i, y+j) }

In the above formula, c ∈ {r, g, b}, (i, j) ∈ {(0,1), (0,-1), (1,0), (-1,0)}, Φ{·} = 1 if and only if the logical expression in braces is true, and Φ{·} = 0 otherwise. To describe the color relationship between pixels, the magnitude relations of the three channels R, G, B are regarded as a triplet:

R_{i,j}(x, y) = (R_{r,i,j}(x, y), R_{g,i,j}(x, y), R_{b,i,j}(x, y))

To facilitate the subsequent statistics, the triplet R_{i,j}(x, y) is equivalently converted into an integer in the interval [0, 7]:

R′_{i,j}(x, y) = 4·R_{r,i,j}(x, y) + 2·R_{g,i,j}(x, y) + R_{b,i,j}(x, y)
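The pixel-relation computation and the triplet-to-scalar conversion can be sketched in NumPy as follows; this is a minimal illustration assuming an 8-bit RGB input, and the channel-to-bit assignment (R to bit 2, G to bit 1, B to bit 0) is one natural choice for the equivalent transformation rather than a value dictated by the patent text.

```python
import numpy as np

def relation_scalar(img, di, dj):
    """Compute R'_{i,j}: test I_c(x, y) > I_c(x+di, y+dj) in each of the
    R, G, B channels and pack the three binary results into an integer
    in [0, 7]."""
    img = img.astype(np.int16)
    h, w, _ = img.shape
    # Valid region in which both the pixel and its (di, dj)-neighbour exist.
    x0, x1 = max(0, -di), h - max(0, di)
    y0, y1 = max(0, -dj), w - max(0, dj)
    cur = img[x0:x1, y0:y1, :]
    nbr = img[x0 + di:x1 + di, y0 + dj:y1 + dj, :]
    rel = (cur > nbr).astype(np.int64)  # the binary maps R_{c,i,j}
    return rel[..., 0] * 4 + rel[..., 1] * 2 + rel[..., 2]
```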
For R′_{i,j}, the frequencies of runs of 3 consecutive elements are counted in the horizontal and the vertical direction respectively, yielding third-order co-occurrence matrices C^h_{i,j} and C^v_{i,j}. Each co-occurrence matrix has dimension d = 8³ = 512. Taking the horizontal direction as an example, the co-occurrence matrix C^h_{i,j} is calculated as:

C^h_{i,j}(v_1, v_2, v_3) = (1/N) · Σ_{x,y} Φ{ (R′_{i,j}(x, y), R′_{i,j}(x, y+1), R′_{i,j}(x, y+2)) = (v_1, v_2, v_3) }

where (v_1, v_2, v_3) is the index into the co-occurrence matrix, N is a normalization factor, and Φ{·} = 1 if and only if the logical expression in braces is true, otherwise Φ{·} = 0. Finally, the matrices C^h_{i,j} and C^v_{i,j} over the four offsets (i, j) are summed and averaged, giving a group of 512-dimensional statistical features.
For a given pointRespectively extracting R, G, B pixel values of the three color channels, as shown in fig. 3(a), and substituting R, G, B pixel values of the three color channels into Rc,i,j(x,y)=Φ{Ic(x,y)>Ic(x + i, x + j) } and
Ri,j(x,y)=(Rr,i,j(x,y),Rg,i,j(x,y),Rb,i,j(x, y)) was calculated to obtain the result shown in (b) 3.
Then according to
Figure GDA0001725638620000095
The conversion was carried out to obtain the results shown in FIG. 4.
Finally, according to
Figure GDA0001725638620000101
Statistical horizontal direction co-occurrence matrix
Figure GDA0001725638620000102
Resulting in 512-dimensional features. Similarly, separately calculate
Figure GDA0001725638620000103
Figure GDA0001725638620000104
And obtaining the average value of the features to obtain the features of the image.
Fig. 5 shows a mean curve of 512-dimensional features calculated from 1000 real face images and 1000 generated face images, respectively. It can be seen that the features of the real face image and the generated face image are significantly different in many dimensions.
Step S30, training the statistical characteristics to obtain a classification model;
Specifically, for each sample in the training set, the co-occurrence matrices are extracted as features by the above method, and a supervised learning method is used to train an ensemble classifier with linear discriminant analysis (LDA) as the base classifier, which serves as the binary classification model. The classification model is thus a binary classifier obtained by supervised learning over the training sample set, and modeling the training sample set based on color relationships means modeling the color relationships through co-occurrence matrices. An ensemble classifier is used because it remains computationally efficient when the training samples are numerous and the feature dimension is high, while still achieving good classification performance. In practical applications, other types of classifiers, such as support vector machines (SVM), can be chosen as required.
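A hedged scikit-learn sketch of this step is given below: a bagged ensemble of LDA base classifiers over random feature subspaces stands in for the ensemble classifier described above, and the ensemble size and subspace ratio are assumptions rather than values from the patent. `train_images` and `labels` are assumed to come from the sample-construction sketch, and `cooccurrence_feature` from the previous one.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import BaggingClassifier

X = np.vstack([cooccurrence_feature(im) for im in train_images])
y = labels  # 0 = real face image, 1 = generated face image

clf = BaggingClassifier(
    LinearDiscriminantAnalysis(),  # LDA as the base classifier
    n_estimators=50,               # assumed ensemble size
    max_features=0.5,              # random feature subspaces per base learner
    random_state=0,
)
clf.fit(X, y)
```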
And step S40, detecting the image to be detected based on the classification model.
Specifically, for a given face image to be detected, the same method is also adopted to extract the co-occurrence matrix as the feature. And inputting the characteristics into the trained classification model to obtain a prediction result. If the prediction result shows that the image to be detected is a face image, judging that the image is the face image generated through the depth network; otherwise, the image is a real face image.
In a further preferred embodiment, a CelebA face image dataset (containing 202599 real face images) is used to train a depth network consisting of a Variational auto-encoder (VAE) as the face image generator. The trained depth network generates a face image equivalent to the CelebA data set. In this example, the image size is 64 × 64. And randomly dividing the generated face image and the real face image in CelebA into a training set and a testing set at a ratio of 50 percent respectively.
In the training set, the above 512-dimensional features are extracted for each image, and a two-class model is trained. After the classification model is trained, detecting the images in the test set, and obtaining the following detection results:
Actual class \ Predicted class    Real face image    Generated face image
Real face image                    99.90%             0.10%
Generated face image               0.04%              99.96%
In a further preferred embodiment, a generated confrontation network (GAN) is trained using the CelebA face image dataset as the face image generator. And generating a face image with the same quantity as the CelebA data set by using the trained depth network, wherein the size of the image is 64 multiplied by 64. Then, the generated face image and the real face image in CelebA are randomly divided into a training set and a testing set according to a proportion of 50%, characteristics are extracted, a classification model is trained and tested, and the following experimental results are obtained:
Figure GDA0001725638620000111
Figure GDA0001725638620000121
In a further preferred embodiment, a deep network based on a variational auto-encoder is trained on the CelebA face image dataset as the face image generator, generating face images of size 128 × 128. The generated face images and the real face images in CelebA are each randomly split 50/50 into a training set and a test set, and a series of experiments gives the following results:
Actual class \ Predicted class    Real face image    Generated face image
Real face image                    99.99%             0.01%
Generated face image               0.00%              100%
In a further preferred embodiment, a generative confrontation network is trained using the CelebA face image dataset as a face image generator, generating a face image of size 128 x 128. Respectively randomly dividing the generated face image and the real face image in CelebA into a training set and a testing set according to a proportion of 50%, and performing a series of experiments to obtain the following experiment results:
actual class/predicted class Real face image Generating a face image
Real face image 100% 0.00%
Generating a face image 0.00% 100%
In a further preferred embodiment, CelebA-HQ facial image data set (containing 30000 real facial images) is used for training a Progressive Growing confrontation network (Growing of GANs) as a facial image generator to generate high-definition facial images with the size of 1024 × 1024. Randomly dividing the generated face image and the real face image in CelebA into a training set and a testing set according to a proportion of 50%, and performing a series of experiments to obtain the following experimental results:
actual class/predicted class Real face image Generating a face image
Real face image 99.07% 0.93%
Generating a face image 0.38% 99.62%
According to the experimental results, the method has very high detection accuracy rate for the face images with different sizes generated by different types of depth networks, and can effectively judge whether the given face image is the generated false image. The method has important significance for practical occasions related to the safety of the face images.
Based on the above method embodiment, the present invention further provides a detection system for generating a face image by a deep network, wherein the detection system for generating a face image by a deep network comprises: the device comprises a sample construction module, a feature extraction module, a feature training module and an image detection module.
The sample construction module is used for constructing a training sample set consisting of a real face image and a generated face image;
the characteristic extraction module is used for modeling the training sample set based on the color relation and extracting statistical characteristics;
the characteristic training module is used for training the statistical characteristics to obtain a classification model;
and the image detection module is used for detecting the image to be detected based on the classification model.
The feature extraction module comprises a pixel relation module and a statistical description module.
The pixel relation module is used for extracting the magnitude relations of adjacent pixel values in the color channels of each sample in the training sample set. Specifically:

Let the input image be I, with its R, G, B color channels denoted I_r, I_g and I_b respectively. The magnitude relation of adjacent pixel values in each color channel is calculated according to the following formula:

R_{c,i,j}(x, y) = Φ{ I_c(x, y) > I_c(x+i, y+j) }

where c ∈ {r, g, b}, (i, j) ∈ {(0,1), (0,-1), (1,0), (-1,0)}, and Φ{·} = 1 if and only if the logical expression in braces is true. The magnitude relations of the three channels R, G, B are regarded as a triplet:

R_{i,j}(x, y) = (R_{r,i,j}(x, y), R_{g,i,j}(x, y), R_{b,i,j}(x, y))

Each component of R_{i,j}(x, y) takes the value 0 or 1, and the following equivalent transformation is performed:

R′_{i,j}(x, y) = 4·R_{r,i,j}(x, y) + 2·R_{g,i,j}(x, y) + R_{b,i,j}(x, y)
The statistical description module is used for describing the color and texture information of each sample in the training sample set. Specifically, a co-occurrence matrix is used to model R′_{i,j}, calculated as follows:

C^h_{i,j}(v_1, v_2, …, v_k) = (1/N) · Σ_{x,y} Φ{ (R′_{i,j}(x, y), R′_{i,j}(x, y+1), …, R′_{i,j}(x, y+k-1)) = (v_1, v_2, …, v_k) }

where (v_1, v_2, …, v_k) is the index into the co-occurrence matrix, N is a normalization factor, and Φ{·} = 1 if and only if the logical expression in braces is true, otherwise Φ{·} = 0.
The principle of the invention is as follows: although a face image generated by a deep network can, to a certain extent, imitate global facial information such as shape, pose and expression, it cannot reproduce well the texture details present in real images. As a result, the intrinsic relationships among the pixels of a generated image are inconsistent with those of a real image, and the inherent statistical characteristics of real images are not preserved. Therefore, by extracting statistical features from the color relationships between adjacent pixels of an image, real face images and generated face images can be effectively distinguished.
The invention discloses a detection method and system for face images generated by a deep network: a training sample set consisting of real face images and generated face images is constructed; the training sample set is modeled based on the color relationships between adjacent pixels, and statistical features are extracted; a classification model is trained on the statistical features; and the image to be detected is predicted with the classification model. Because the statistical characteristics of face images generated by a deep network are inconsistent with those of real images, a set of co-occurrence-matrix features based on the color relationships of adjacent pixels is designed; the method achieves very high detection accuracy on face images of different sizes generated by different types of deep networks and can effectively judge whether a given face image is a false image generated by a deep network.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (7)

1. A detection method for face images generated by a deep network, characterized in that the method comprises the following steps:
A. constructing a training sample set consisting of real face images and generated face images;
B. modeling the training sample set based on color relationships, and extracting statistical features;
C. training a classification model on the statistical features;
D. detecting the image to be detected based on the classification model, and outputting a detection result;
the step B specifically comprises the following steps:
B1, extracting the magnitude relations of adjacent pixel values in the color channels of each sample in the training sample set;
B2, describing the color and texture information of each sample in the training sample set through co-occurrence matrices;
B3, obtaining the feature of each image;
the step B1 specifically includes:
let the input image be I, with its R, G, B color channels denoted I_r, I_g and I_b respectively, and calculate the magnitude relation of adjacent pixel values in each color channel according to the following formula:

R_{c,i,j}(x, y) = Φ{ I_c(x, y) > I_c(x+i, y+j) }

where c ∈ {r, g, b}, (i, j) ∈ {(0,1), (0,-1), (1,0), (-1,0)}, and Φ{·} = 1 if and only if the logical expression in braces is true; regard the magnitude relations of the three channels R, G, B as a triplet, that is:

R_{i,j}(x, y) = (R_{r,i,j}(x, y), R_{g,i,j}(x, y), R_{b,i,j}(x, y))

each component of R_{i,j}(x, y) takes the value 0 or 1, and the following equivalent transformation is carried out:

R′_{i,j}(x, y) = 4·R_{r,i,j}(x, y) + 2·R_{g,i,j}(x, y) + R_{b,i,j}(x, y)

wherein R′_{i,j}(x, y) is an equivalent scalar representation of the triplet R_{i,j}(x, y).
2. The method for detecting a face image generated by a deep network according to claim 1, wherein the step A specifically includes:
A1, obtaining real face images through an imaging device;
A2, generating face images by feeding random noise vectors through a trained deep network;
and A3, taking the real face images as negative samples and the generated face images as positive samples to form the training sample set.
3. The method for detecting a face image generated by a deep network according to claim 1, wherein the step B2 specifically includes:
using a co-occurrence matrix to model R′_{i,j} and extracting a k-th order co-occurrence matrix, calculated as follows:

C(v_1, v_2, …, v_k) = (1/N) · Σ_{x,y} Φ{ (R′_{i,j}(x, y), R′_{i,j}(x+Δx, y+Δy), …, R′_{i,j}(x+(k-1)·Δx, y+(k-1)·Δy)) = (v_1, v_2, …, v_k) }

wherein (v_1, v_2, …, v_k) is the index into the co-occurrence matrix; N is a normalization factor; Δx, Δy denote the offset between adjacent elements; and Φ{·} = 1 if and only if the logical expressions in braces are all true, otherwise Φ{·} = 0.
4. The method for detecting a face image generated by a deep network according to claim 1, wherein the step C specifically includes:
training, by a supervised learning method, an ensemble classifier with linear discriminant analysis as the base classifier, to serve as the binary classification model.
5. The method for detecting a face image generated by a deep network according to claim 1, wherein the step D specifically includes:
predicting the image to be detected with the classification model; if the classification model predicts that the image to be detected is a generated face image, judging that it is a generated face image; otherwise, judging that it is a real face image.
6. A detection system for face images generated by a deep network, characterized by comprising:
a sample construction module, used for constructing a training sample set consisting of real face images and generated face images;
a feature extraction module, used for modeling the training sample set based on color relationships and extracting statistical features;
a feature training module, used for training a classification model on the statistical features;
and an image detection module, used for detecting the image to be detected based on the classification model and outputting a detection result;
the feature extraction module includes:
a pixel relation module, used for extracting the magnitude relations of adjacent pixel values in the color channels of each sample in the training sample set;

specifically:

let the input image be I, with its R, G, B color channels denoted I_r, I_g and I_b respectively, and calculate the magnitude relation of adjacent pixel values in each color channel according to the following formula:

R_{c,i,j}(x, y) = Φ{ I_c(x, y) > I_c(x+i, y+j) }

where c ∈ {r, g, b}, (i, j) ∈ {(0,1), (0,-1), (1,0), (-1,0)}, and Φ{·} = 1 if and only if the logical expression in braces is true; regard the magnitude relations of the three channels R, G, B as a triplet, that is:

R_{i,j}(x, y) = (R_{r,i,j}(x, y), R_{g,i,j}(x, y), R_{b,i,j}(x, y))

each component of R_{i,j}(x, y) takes the value 0 or 1, and the following equivalent transformation is carried out:

R′_{i,j}(x, y) = 4·R_{r,i,j}(x, y) + 2·R_{g,i,j}(x, y) + R_{b,i,j}(x, y)

wherein R′_{i,j}(x, y) is an equivalent scalar representation of the triplet R_{i,j}(x, y).
7. The system of claim 6, wherein the feature extraction module further comprises:
a statistical description module, used for describing the color and texture information of each sample in the training sample set;

specifically, a co-occurrence matrix is used to model R′_{i,j}, calculated as follows:

C(v_1, v_2, …, v_k) = (1/N) · Σ_{x,y} Φ{ (R′_{i,j}(x, y), R′_{i,j}(x+Δx, y+Δy), …, R′_{i,j}(x+(k-1)·Δx, y+(k-1)·Δy)) = (v_1, v_2, …, v_k) }

wherein (v_1, v_2, …, v_k) is the index into the co-occurrence matrix; N is a normalization factor; Δx, Δy denote the offset between adjacent elements; and Φ{·} = 1 if and only if the logical expressions in braces are all true, otherwise Φ{·} = 0.
CN201810434620.4A 2018-05-08 2018-05-08 Detection method and system for face images generated by a deep network Active CN108596141B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810434620.4A CN108596141B (en) 2018-05-08 2018-05-08 Detection method and system for face images generated by a deep network
PCT/CN2019/085592 WO2019214557A1 (en) 2018-05-08 2019-05-06 Method and system for detecting face image generated by deep network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810434620.4A CN108596141B (en) 2018-05-08 2018-05-08 Detection method and system for face images generated by a deep network

Publications (2)

Publication Number Publication Date
CN108596141A CN108596141A (en) 2018-09-28
CN108596141B (en) 2022-05-17

Family

ID=63635858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810434620.4A Active CN108596141B (en) Detection method and system for face images generated by a deep network

Country Status (2)

Country Link
CN (1) CN108596141B (en)
WO (1) WO2019214557A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596141B (en) * 2018-05-08 2022-05-17 Shenzhen University Detection method and system for face images generated by a deep network
CN109635748B (en) * 2018-12-14 2021-09-03 China Highway Engineering Consulting Group Co., Ltd. Method for extracting road characteristics in high-resolution image
CN109948692B (en) * 2019-03-16 2020-12-15 Sichuan University Computer-generated picture detection method based on multi-color space convolutional neural network and random forest
CN110163815B (en) * 2019-04-22 2022-06-24 Guilin University of Electronic Technology Low-illumination reduction method based on multi-stage variational auto-encoder
CN111046975B (en) * 2019-12-27 2023-05-12 Shenzhen Intellifusion Technologies Co., Ltd. Portrait generation method, device and system, electronic equipment and storage medium
CN111259831B (en) * 2020-01-20 2023-03-24 Northwestern Polytechnical University False face discrimination method based on recombined color space
CN111444881B (en) * 2020-04-13 2020-12-25 National University of Defense Technology Fake face video detection method and device
CN111597983B (en) * 2020-05-14 2023-06-06 Third Research Institute of the Ministry of Public Security Method for identifying generated false face images based on a deep convolutional neural network
CN111639589B (en) * 2020-05-28 2022-04-19 Northwestern Polytechnical University Video false face detection method based on adversarial learning and similar color space
CN111709408B (en) * 2020-08-18 2020-11-20 Tencent Technology (Shenzhen) Co., Ltd. Image authenticity detection method and device
CN112200075B (en) * 2020-10-09 2024-06-04 Xi'an Xitu Zhiguang Intelligent Technology Co., Ltd. Face anti-spoofing method based on anomaly detection
CN112396005A (en) * 2020-11-23 2021-02-23 Ping An Technology (Shenzhen) Co., Ltd. Biological characteristic image recognition method and device, electronic equipment and readable storage medium
CN112561813B (en) * 2020-12-10 2024-03-26 Shenzhen Intellifusion Technologies Co., Ltd. Face image enhancement method and device, electronic equipment and storage medium
CN113095149A (en) * 2021-03-18 2021-07-09 Northwestern Polytechnical University Full-head texture network structure based on a single face image and generation method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944358A (zh) * 2017-11-14 2018-04-20 South China University of Technology Face generation method based on a deep convolutional adversarial network model

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100073191A (en) * 2008-12-22 2010-07-01 Electronics and Telecommunications Research Institute Method and apparatus for face liveness using range data
CN101739555B (en) * 2009-12-01 2014-11-26 Beijing Vimicro Corporation Method and system for detecting false face, and method and system for training false face model
US8705866B2 (en) * 2010-12-07 2014-04-22 Sony Corporation Region description and modeling for image subscene recognition
US9875393B2 (en) * 2014-02-12 2018-01-23 Nec Corporation Information processing apparatus, information processing method, and program
CN104573743B (en) * 2015-01-14 2018-12-18 Nanjing Fenghuo Xingkong Communication Development Co., Ltd. Face image detection and filtering method
WO2017070920A1 (en) * 2015-10-30 2017-05-04 Microsoft Technology Licensing, Llc Spoofed face detection
CN105740787B (en) * 2016-01-25 2019-08-23 Nanjing University of Information Science and Technology Face recognition method based on multi-kernel discriminant color space
US10289822B2 (en) * 2016-07-22 2019-05-14 Nec Corporation Liveness detection for antispoof face recognition
CN106971161A (en) * 2017-03-27 2017-07-21 Shenzhen Datu Kechuang Technology Development Co., Ltd. Face liveness detection system based on color and singular value features
CN107563155B (en) * 2017-08-08 2023-02-28 Institute of Information Engineering, Chinese Academy of Sciences Secure steganography method and device based on generative adversarial networks
CN107844744A (en) * 2017-10-09 2018-03-27 Ping An Technology (Shenzhen) Co., Ltd. Face recognition method and device combining depth information, and storage medium
CN107808161B (en) * 2017-10-26 2020-11-24 Jiangsu University of Science and Technology Underwater target recognition method based on optical vision
CN108596141B (en) * 2018-05-08 2022-05-17 Shenzhen University Detection method and system for face images generated by a deep network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944358A (zh) * 2017-11-14 2018-04-20 South China University of Technology Face generation method based on a deep convolutional adversarial network model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Detection Methods for Synthesized Face Images; Zhu Lin et al.; Computer Engineering and Applications; 2007-05-01 (No. 13); pp. 228-231 *

Also Published As

Publication number Publication date
CN108596141A (en) 2018-09-28
WO2019214557A1 (en) 2019-11-14

Similar Documents

Publication Publication Date Title
CN108596141B (en) Detection method and system for face images generated by a deep network
Li et al. Identification of deep network generated images using disparities in color components
Wu et al. Busternet: Detecting copy-move image forgery with source/target localization
Carvalho et al. Illuminant-based transformed spaces for image forensics
Kong et al. Detect and locate: Exposing face manipulation by semantic-and noise-level telltales
CN106530200A (en) Deep-learning-model-based steganography image detection method and system
Yang et al. Spatiotemporal trident networks: detection and localization of object removal tampering in video passive forensics
Chen et al. SNIS: A signal noise separation-based network for post-processed image forgery detection
Zhang et al. No one can escape: A general approach to detect tampered and generated image
CN104519361A (en) Video steganography analysis method based on space-time domain local binary pattern
Lin et al. Convolutional neural networks for face anti-spoofing and liveness detection
CN114724218A (en) Video detection method, device, equipment and medium
CN113988180A (en) Model fingerprint-based generated image tracing method
CN109190582B (en) Novel micro-expression recognition method
Conotter et al. Detecting photographic and computer generated composites
CN107103327B (en) Dyeing counterfeit image detection method based on color statistical difference
Jolly et al. CNN based deep learning model for deepfake detection
Yu et al. A multi-scale feature selection method for steganalytic feature GFR
Quan et al. Provenance inference for instagram photos through device fingerprinting
Alanazi Comparative Analysis of Deep Fake Detection Techniques
Wang et al. Se-resnet56: Robust network model for deepfake detection
Bommareddy et al. Implementation of a deepfake detection system using convolutional neural networks and adversarial training
Qiao et al. Fully unsupervised deepfake video detection via enhanced contrastive learning
Saealal et al. In-the-Wild Deepfake Detection Using Adaptable CNN Models with Visual Class Activation Mapping for Improved Accuracy
Park Transportability of Adversarial Attack

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant