CN111862031A - Face synthetic image detection method and device, electronic equipment and storage medium - Google Patents

Face synthetic image detection method and device, electronic equipment and storage medium

Info

Publication number
CN111862031A
Authority
CN
China
Prior art keywords
face
face image
detected
image
key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010681943.0A
Other languages
Chinese (zh)
Inventor
王珂尧
冯浩城
岳海潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Dinghang Information Technology Service Co ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010681943.0A
Publication of CN111862031A

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T7/00 Image analysis › G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING › G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data › G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands › G06V40/16 Human faces, e.g. facial parts, sketches or expressions › G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS › G06 › G06V › G06V40/00 › G06V40/10 › G06V40/16 › G06V40/168 Feature extraction; Face representation
    • G PHYSICS › G06 › G06T › G06T2207/00 Indexing scheme for image analysis or image enhancement › G06T2207/20 Special algorithmic details › G06T2207/20081 Training; Learning
    • G PHYSICS › G06 › G06T › G06T2207/00 › G06T2207/20 › G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS › G06 › G06T › G06T2207/00 › G06T2207/30 Subject of image; Context of image processing › G06T2207/30196 Human being; Person › G06T2207/30201 Face


Abstract

The application discloses a face synthetic image detection method and device, an electronic device and a storage medium, relating to the fields of artificial intelligence, deep learning and image recognition. The specific scheme is as follows: a face image to be detected is input into a pre-trained face key point detection model to obtain the face key points of the image; a corresponding region block is generated for each face key point of the image; the region blocks corresponding to all the face key points are combined into a feature map for the image; and the feature map is input into a pre-trained convolutional neural network to obtain a detection result for the face image to be detected. The embodiment of the application can mitigate overfitting in face synthetic image detection, improve the generalization and accuracy of face synthetic image detection in complex environments, and avoid interference from background noise, thereby improving the detection effect on unknown synthetic samples.

Description

Face synthetic image detection method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of computers, in particular to a method and a device for detecting a face synthetic image, an electronic device and a storage medium, and further relates to the fields of artificial intelligence, deep learning and image recognition.
Background
Face composite image detection determines whether the face in an image is a synthetic face. The module implementing it is a basic component of a face recognition system and serves to ensure the system's security. Face synthetic image detection algorithms based on deep learning are currently the mainstream approach in this field. The main deep learning methods for detecting face synthetic images include synthetic image discrimination with convolutional neural networks and synthetic image discrimination based on Long Short-Term Memory (LSTM) networks.
In the prior art, a face synthetic image detection model struggles to learn the features that discriminate a synthetic image from the original, easily overfits to a small range of training samples, and generalizes poorly to unknown synthetic samples. In addition, using only a single convolutional neural network gives poor robustness, and therefore unsatisfactory recognition, when the face pose is too large or the illumination varies strongly in real scenes.
Disclosure of Invention
The application provides a face synthetic image detection method and apparatus, an electronic device and a storage medium, which can mitigate overfitting in face synthetic image detection, improve its generalization and accuracy in complex environments, and avoid interference from background noise, thereby improving the detection effect on unknown synthetic samples.
In a first aspect, the present application provides a method for detecting a face synthesis image, including:
inputting a face image to be detected into a pre-trained face key point detection model, and extracting key points of the face image to be detected through the face key point detection model to obtain face key points of the face image to be detected;
generating a corresponding region block based on each face key point of the face image to be detected;
combining all the region blocks corresponding to the face key points into a feature map corresponding to the face image to be detected;
inputting the feature map corresponding to the face image to be detected into a pre-trained convolutional neural network, and calculating the feature map corresponding to the face image to be detected through the convolutional neural network to obtain a detection result of the face image to be detected; wherein the detection result comprises: the face image to be detected is a synthesized face image or a non-synthesized face image.
In a second aspect, the present application provides a face synthesis image detection apparatus, comprising: the device comprises a key point extraction module, a region block generation module, a region block merging module and a result calculation module; wherein,
the key point extraction module is used for inputting a face image to be detected into a pre-trained face key point detection model, and extracting key points of the face image to be detected through the face key point detection model to obtain face key points of the face image to be detected;
the region block generation module is used for generating a corresponding region block based on each face key point of the face image to be detected;
the region block merging module is used for merging region blocks corresponding to all the face key points into a feature map corresponding to the face image to be detected;
the result calculation module is used for inputting the feature map corresponding to the face image to be detected into a pre-trained convolutional neural network, and calculating the feature map corresponding to the face image to be detected through the convolutional neural network to obtain the detection result of the face image to be detected; wherein the detection result comprises: the face image to be detected is a synthesized face image or a non-synthesized face image.
In a third aspect, an embodiment of the present application provides an electronic device, including:
one or more processors;
a memory for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for detecting a face synthesis graph according to any embodiment of the present application.
In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the method for detecting a face synthesis image according to any embodiment of the present application.
The application solves the problems of the prior art, in which a face synthetic image detection model struggles to learn the features that discriminate a synthetic image from the original, easily overfits to a small range of training samples, and generalizes poorly to unknown synthetic samples. The technical scheme provided by the application can mitigate overfitting in face synthetic image detection, improve its generalization and accuracy, and improve the detection effect on unknown synthetic samples.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a schematic flow chart of a face synthesis image detection method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a face synthesis image detection method according to a second embodiment of the present application;
fig. 3 is a schematic flow chart of a face synthesis image detection method according to a third embodiment of the present application;
fig. 4 is a schematic structural diagram of a face synthesis image detection system according to a third embodiment of the present application;
Fig. 5 is a schematic view of a first structure of a face synthesis image detection apparatus according to a fourth embodiment of the present application;
fig. 6 is a schematic diagram of a second structure of a face synthesis image detection apparatus according to a fourth embodiment of the present application;
fig. 7 is a block diagram of an electronic device for implementing the face synthesis image detection method according to the embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
Example one
Fig. 1 is a flowchart of a face composite image detection method according to an embodiment of the present application, where the method may be executed by a face composite image detection apparatus or an electronic device, where the apparatus or the electronic device may be implemented by software and/or hardware, and the apparatus or the electronic device may be integrated in any intelligent device with a network communication function. As shown in fig. 1, the face synthesis image detection method may include the following steps:
s101, inputting a face image to be detected into a pre-trained face key point detection model, and extracting key points of the face image to be detected through the face key point detection model to obtain face key points of the face image to be detected.
In a specific embodiment of the present application, the electronic device may input the face image to be detected into a pre-trained face key point detection model, and extract key points from the image through the model to obtain its face key points. In one embodiment, extracting key points from the face image to be detected through the face key point detection model may include the following steps: 1) acquiring the face image to be detected and face region information indicating the face region within it; 2) extracting a face image from the face image to be detected based on the face region information; 3) inputting the face image into a pre-trained face key point detection model to obtain, for each pixel included in the face image, the probability that it belongs to the category indicated by each category identifier in a preset category identifier set; 4) inputting the face image into a pre-trained face key point positioning model to obtain the coordinates of each face key point included in the face image; the face key point positioning model represents the correspondence between an image including a face and the coordinates of each face key point. Note that the face key points may be pre-specified points in the face with strong semantic information (e.g., eye corners, mouth corners, nose wing positions, contour points). In practice, the number of face key points may be 72 or another preset value; this embodiment is not limited in this respect.
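The four steps above ultimately yield 72 (x, y) coordinates per face. A minimal sketch of that interface follows; `FaceKeypointModel` is a hypothetical stand-in for the pre-trained model (here it returns seeded random coordinates inside the image, so only the shapes are meaningful):

```python
import numpy as np

class FaceKeypointModel:
    """Hypothetical interface for the pre-trained key point model."""

    NUM_KEYPOINTS = 72  # the count used in this embodiment

    def predict(self, face_image):
        # A real model would regress coordinates from pixels; this stub
        # only produces 72 (x, y) pairs inside the image bounds.
        h, w = face_image.shape[:2]
        rng = np.random.default_rng(0)
        xs = rng.uniform(0, w, self.NUM_KEYPOINTS)
        ys = rng.uniform(0, h, self.NUM_KEYPOINTS)
        return np.stack([xs, ys], axis=1)  # shape (72, 2)
```

A caller would pass the cropped, normalized face image and receive one coordinate pair per key point.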
S102, generating a corresponding area block based on each face key point of the face image to be detected.
In a specific embodiment of the present application, the electronic device may generate a corresponding region block based on each face key point of the face image to be detected. In one embodiment, the electronic device can extract the face key points of the face image to be detected and the image features at each key point through the face key point detection model, and then obtain the region block corresponding to each face key point from the key points and their image features; the size of each region block is 36 × 36 × 3. Specifically, the number of face key points is 72, namely (x1, y1), (x2, y2), …, (x72, y72).
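One plausible reading of this step, cropping a 36 × 36 × 3 pixel patch centered on each key point, can be sketched in plain NumPy. Zero-padding the image so that border key points still yield full-size patches is an assumption; the patent does not specify border handling:

```python
import numpy as np

def extract_region_blocks(image, keypoints, block_size=36):
    """Crop a block_size x block_size x 3 patch centered on each key point.

    `image` is an H x W x 3 array; `keypoints` is a sequence of (x, y)
    pixel coordinates. The image is zero-padded so every patch is full size.
    """
    half = block_size // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)))
    blocks = []
    for x, y in keypoints:
        # Shift indices by `half` to account for the padding.
        py, px = int(y) + half, int(x) + half
        blocks.append(padded[py - half:py + half, px - half:px + half, :])
    return blocks
```

Each returned block has the 36 × 36 × 3 shape stated in the text, even for key points on the image border.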
And S103, combining all the region blocks corresponding to the face key points into a feature map corresponding to the face image to be detected.
In a specific embodiment of the present application, the electronic device may combine the region blocks corresponding to all the face key points into a feature map for the face image to be detected. In one embodiment, the region blocks corresponding to each face key point are combined across the three red, green and blue (RGB) channels to obtain the feature map of the face image to be detected; the size of the feature map is 36 × 36 × 216.
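Stacking the 72 RGB patches along the channel axis reproduces the 36 × 36 × 216 shape stated above (72 key points × 3 channels = 216 channels); a minimal sketch:

```python
import numpy as np

def merge_blocks(blocks):
    """Concatenate per-keypoint 36 x 36 x 3 blocks channel-wise.

    With 72 blocks this yields one 36 x 36 x 216 feature map.
    """
    assert all(b.shape == (36, 36, 3) for b in blocks)
    return np.concatenate(blocks, axis=2)
```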
S104, inputting the feature map corresponding to the face image to be detected into a pre-trained convolutional neural network, and calculating the feature map corresponding to the face image to be detected through the convolutional neural network to obtain a detection result of the face image to be detected; wherein, the detection result includes: the face image to be detected is a synthesized face image or a non-synthesized face image.
In a specific embodiment of the application, the electronic device may input the feature map corresponding to the face image to be detected into a pre-trained convolutional neural network, and compute on the feature map through the convolutional neural network to obtain the detection result for the face image; the detection result is that the face image to be detected is a synthesized face image or a non-synthesized face image. In one embodiment, the convolutional neural network may comprise 5 convolutional layers, 3 max pooling layers and 1 fully connected layer; after the feature map is processed by these layers in turn, a 2-dimensional vector is obtained, and the face image to be detected is judged to be a synthetic or non-synthetic face image according to this vector.
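The patent does not give kernel sizes or channel counts, so the network itself is not reproduced here; the sketch below only shows the final step, turning the 2-dimensional output vector into a label via softmax and argmax. The label ordering is an assumption:

```python
import numpy as np

# Assumed ordering of the 2-dimensional output; the patent does not state it.
LABELS = ("non-synthetic", "synthetic")

def classify(logits):
    """Map the network's 2-dimensional output vector to a label and probabilities."""
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return LABELS[int(np.argmax(probs))], probs
```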
The face synthetic image detection method provided by the embodiment of the application first inputs the face image to be detected into a pre-trained face key point detection model to obtain the face key points of the image; generates a corresponding region block for each face key point; then combines the region blocks corresponding to all the face key points into a feature map for the image; and inputs this feature map into a pre-trained convolutional neural network to obtain the detection result. That is, the application generates a region block for each face key point of the face image to be detected and builds the feature map from these region blocks, so that the convolutional neural network can produce the detection result from that feature map. Existing face synthetic image detection methods mainly use a single deep neural network; such methods struggle to learn the features that discriminate a synthetic image from the original, easily overfit to a small range of training samples, and generalize poorly to unknown synthetic samples, and a single convolutional neural network is not robust, and hence not accurate, when the face pose is too large or the illumination varies strongly in real scenes.
Because a corresponding region block is generated for each face key point of the face image to be detected, and the feature map for the image is built from these region blocks, the method overcomes the defects of the prior art, where a face synthetic image detection model struggles to learn the features that discriminate a synthetic image from the original, easily overfits to a small range of training samples, and generalizes poorly to unknown synthetic samples. The technical scheme provided by the application can mitigate overfitting in face synthetic image detection, improve its generalization and accuracy in complex environments, and avoid interference from background noise, thereby improving the detection effect on unknown synthetic samples; moreover, the technical scheme of the embodiment is simple to implement, easy to popularize and widely applicable.
Example two
Fig. 2 is a schematic flow chart of a face synthesis image detection method according to the second embodiment of the present application. As shown in fig. 2, the face synthesis image detection method may include the following steps:
s201, inputting a face image to be detected into a face key point detection model trained in advance, and extracting key points of the face image to be detected through the face key point detection model to obtain face key points of the face image to be detected.
S202, extracting the face key points of the face image to be detected and the image characteristics of the face key points through the face key point detection model.
In a specific embodiment of the application, the electronic device can extract the face key points of the face image to be detected and the image features at each key point through the face key point detection model; there are 72 face key points, namely (x1, y1), (x2, y2), …, (x72, y72).
S203, obtaining a region block corresponding to each face key point of the face image to be detected according to the face key points of the face image to be detected and the image characteristics of each face key point.
In a specific embodiment of the application, the electronic device may obtain, according to the face key points of the face image to be detected and the image features at each key point, the region block corresponding to each face key point; the size of each region block is 36 × 36 × 3, where the first "36" is the block's length, the second "36" its width, and the "3" its number of channels. The application thus constructs one region block around each face key point.
And S204, combining the region blocks corresponding to each face key point on the red, green and blue channels to obtain a feature map of the face image to be detected.
In a specific embodiment of the application, the electronic device can combine the region blocks corresponding to each face key point across the red, green and blue channels to obtain the feature map of the face image to be detected; the size of the feature map is 36 × 36 × 216. In one embodiment, the electronic device may stack the region blocks corresponding to all the key points to obtain a 36 × 36 × 216 block, which serves as the feature map of the face image to be detected.
S205, inputting the feature map corresponding to the face image to be detected into a pre-trained convolutional neural network, and calculating the feature map corresponding to the face image to be detected through the convolutional neural network to obtain a detection result of the face image to be detected; wherein, the detection result includes: the face image to be detected is a synthesized face image or a non-synthesized face image.
The face synthetic image detection method provided by the embodiment of the application first inputs the face image to be detected into a pre-trained face key point detection model to obtain the face key points of the image; generates a corresponding region block for each face key point; then combines the region blocks corresponding to all the face key points into a feature map for the image; and inputs this feature map into a pre-trained convolutional neural network to obtain the detection result. That is, the application generates a region block for each face key point of the face image to be detected and builds the feature map from these region blocks, so that the convolutional neural network can produce the detection result from that feature map. Existing face synthetic image detection methods mainly use a single deep neural network; such methods struggle to learn the features that discriminate a synthetic image from the original, easily overfit to a small range of training samples, and generalize poorly to unknown synthetic samples, and a single convolutional neural network is not robust, and hence not accurate, when the face pose is too large or the illumination varies strongly in real scenes.
Because a corresponding region block is generated for each face key point of the face image to be detected, and the feature map for the image is built from these region blocks, the method overcomes the defects of the prior art, where a face synthetic image detection model struggles to learn the features that discriminate a synthetic image from the original, easily overfits to a small range of training samples, and generalizes poorly to unknown synthetic samples. The technical scheme provided by the application can mitigate overfitting in face synthetic image detection, improve its generalization and accuracy in complex environments, and avoid interference from background noise, thereby improving the detection effect on unknown synthetic samples; moreover, the technical scheme of the embodiment is simple to implement, easy to popularize and widely applicable.
EXAMPLE III
Fig. 3 is a schematic flow chart of a face synthesis image detection method according to a third embodiment of the present application. As shown in fig. 3, the face synthesis image detection method may include the following steps:
s301, inputting the face image to be detected into a pre-trained face detection model, and identifying the face image to be detected through the face detection model to obtain a face detection frame of the face image to be detected.
In a specific embodiment of the present application, the electronic device may input the face image to be detected into a pre-trained face detection model, and identify the face image to be detected through the face detection model, so as to obtain a face detection frame of the face image to be detected. Specifically, the electronic device may obtain an RGB image including a face first, input the RGB image to a pre-trained face detection model, and recognize the RGB image through the pre-trained face detection model to obtain a face detection frame of the RGB image. The face detection model in this embodiment may be an existing face detection model, and the face detection model may detect a face position.
S302, expanding a face detection frame of the face image to be detected by a preset multiple to obtain an expanded face detection frame; intercepting the face in the face image to be detected in the enlarged face detection frame to obtain an intercepted face image; and adjusting the intercepted face image to a preset size to obtain an adjusted face image.
In a specific embodiment of the application, the electronic device may expand the face detection frame of the face image to be detected by a preset multiple to obtain an expanded face detection frame; intercept the face within the expanded frame to obtain an intercepted face image; and adjust the intercepted face image to a preset size to obtain an adjusted face image. Specifically, the electronic device may expand the face detection frame by 1.5 times, intercept the face in the face image to be detected, and resize the intercepted face image to 224 × 224.
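The preprocessing in this step (expand the detection frame 1.5×, crop, resize to 224 × 224) can be sketched as follows. The nearest-neighbor resize and the clipping of the expanded frame to the image bounds are assumptions, since the patent specifies neither the interpolation nor the border behavior:

```python
import numpy as np

def expand_box(x1, y1, x2, y2, scale=1.5):
    """Expand a detection frame about its center by `scale` (1.5 per the text)."""
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = (x2 - x1) * scale, (y2 - y1) * scale
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

def crop_and_resize(image, box, size=224):
    """Crop `box` (clipped to the image) and nearest-neighbor resize to size x size."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    x1, y1 = max(x1, 0), max(y1, 0)
    x2, y2 = min(x2, w), min(y2, h)
    crop = image[y1:y2, x1:x2]
    # Nearest-neighbor resampling via integer index maps.
    ys = (np.arange(size) * crop.shape[0] / size).astype(int)
    xs = (np.arange(size) * crop.shape[1] / size).astype(int)
    return crop[ys][:, xs]
```

A production system would likely use a library resize (e.g., bilinear) instead of this hand-rolled nearest-neighbor map.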
And S303, calculating the pixel value of each pixel point based on the adjusted face image.
In a specific embodiment of the present application, the electronic device may calculate a pixel value of each pixel point based on the adjusted face image. Specifically, the electronic device may input the adjusted face image to a pixel calculation model, and the pixel value of each pixel point may be calculated by the pixel calculation model.
S304, carrying out normalization processing on the pixel value of each pixel point according to a preset mode to obtain a face image after normalization processing; and enabling the pixel value of each pixel point in the normalized human face image to be within a preset range.
In a specific embodiment of the present application, the electronic device may normalize the pixel value of each pixel point in a predetermined manner to obtain a normalized face image, so that each pixel value lies within a preset range, and input the normalized face image into the pre-trained face key point detection model. Specifically, the electronic device may subtract 128 from each pixel value and divide by 256, so that each pixel value lies within [-0.5, 0.5]. Preferably, the electronic device may further apply random data augmentation to the normalized face image.
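The normalization described above is a simple affine map on the pixel values; a one-line sketch:

```python
import numpy as np

def normalize(face):
    """Map uint8 pixel values into [-0.5, 0.5] via (p - 128) / 256."""
    return (face.astype(np.float32) - 128.0) / 256.0
```

Note that for uint8 input the result actually spans [-0.5, 127/256]; the text's [-0.5, 0.5] is the containing range.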
S305, inputting the normalized human face image into a human face key point detection model trained in advance, and extracting key points of the human face image to be detected through the human face key point detection model to obtain human face key points of the human face image to be detected.
And S306, generating a corresponding area block based on each face key point of the face image after the normalization processing.
And S307, combining all the area blocks corresponding to the face key points into a feature map corresponding to the normalized face image.
S308, inputting the feature map corresponding to the normalized face image into a pre-trained convolutional neural network, and calculating the feature map corresponding to the normalized face image through the convolutional neural network to obtain a detection result of the normalized face image; wherein, the detection result includes: the face image after normalization processing is a synthesized face image or a non-synthesized face image.
It should be noted that the processing procedure of steps S305 to S308 for the normalized face image in this embodiment is the same as the processing procedure of steps S101 to S104 for the face image to be detected in the first embodiment, and details are not repeated here.
Preferably, before the face image to be detected is input to the pre-trained face key point detection model, the electronic device may also train the face key point detection model. Specifically, the electronic device may use a first face image acquired in advance as a current face image; if the face key point detection model does not meet the convergence condition corresponding to the face key point detection model, the electronic equipment can input the current face image into the face key point detection model, and the current face image is used for training the face key point detection model; and taking the next face image of the current face image as the current face image, and repeatedly executing the operations until the face key point detection model meets the corresponding convergence condition.
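The iterate-until-convergence training procedure just described can be sketched as follows; `ToyKeypointModel`, `train_step` and the convergence test are illustrative stand-ins, since the patent does not specify the model internals or the convergence condition:

```python
class ToyKeypointModel:
    """Illustrative stand-in for the face key point detection model."""
    def __init__(self):
        self.steps = 0

    def train_step(self, image):
        self.steps += 1  # a real model would update its weights here

def train_until_converged(model, face_images, has_converged):
    """Take each face image in turn as the current image and train on it,
    cycling through the images until the convergence condition holds."""
    while not has_converged(model):
        for image in face_images:
            if has_converged(model):
                break
            model.train_step(image)
    return model

model = train_until_converged(
    ToyKeypointModel(),
    face_images=["img0", "img1", "img2"],
    has_converged=lambda m: m.steps >= 5,  # toy convergence condition
)
```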
Fig. 4 is a schematic structural diagram of a face synthesis image detection system provided in the third embodiment of the present application. As shown in fig. 4, the face synthesis image detection system may include: a face detection module, a face detection model, a face key point detection model, a region block generation module, a region block merging module and a convolutional neural network. The face detection module is used for detecting a face in the face image to be detected to obtain a face detection frame in the face image to be detected; the detection model is an existing face detection model capable of locating the face position. In addition, the face detection module is also used for preprocessing the face detection frame in the face image to be detected. Specifically, after the face detection frame is expanded by 1.5 times, the face in the face image to be detected is cropped within the expanded face detection frame to obtain a cropped face image; the cropped face image is then resized to a uniform size of 224 × 224 to obtain an adjusted face image. Furthermore, the electronic device may further normalize the adjusted face image. Specifically, the electronic device may subtract 128 from each pixel value in the adjusted face image and divide the result by 256, so that the pixel value of each pixel point falls within [-0.5, 0.5]; preferably, the electronic device may further apply random data augmentation to the normalized face image through the face detection module. The region block generation module may include: a first region block generation module, a second region block generation module, …, and an Nth region block generation module; wherein N is a natural number greater than 1. Preferably, N may take the value of 72.
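The preprocessing performed by the face detection module (1.5× box expansion followed by cropping) can be sketched as below; the clipping to the image bounds and the function names are our additions, and the final resize to 224 × 224 (e.g. with an image library) is omitted:

```python
import numpy as np

def expand_box(x, y, w, h, img_w, img_h, scale=1.5):
    """Expand a face detection box about its center by `scale`,
    clipped to the image bounds."""
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * scale, h * scale
    x0 = max(0, int(round(cx - new_w / 2.0)))
    y0 = max(0, int(round(cy - new_h / 2.0)))
    x1 = min(img_w, int(round(cx + new_w / 2.0)))
    y1 = min(img_h, int(round(cy + new_h / 2.0)))
    return x0, y0, x1, y1

def crop_face(image, box):
    """Cut the face region out of the image within the expanded box."""
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1]

# A 100x100 box at (100, 100) in a 400x400 image expands to 150x150.
box = expand_box(100, 100, 100, 100, img_w=400, img_h=400)
image = np.zeros((400, 400, 3), dtype=np.uint8)
face = crop_face(image, box)
```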
Specifically, the first region block generation module is configured to generate the region block corresponding to the first face key point based on the first face key point; the second region block generation module is configured to generate the region block corresponding to the second face key point based on the second face key point; …; and the Nth region block generation module is configured to generate the region block corresponding to the Nth face key point based on the Nth face key point. The region block merging module is used for merging the region blocks corresponding to all the face key points into a feature map corresponding to the face image to be detected. Specifically, the module may merge 72 region blocks of size 36 × 36 × 3 into one block of size 36 × 36 × 216, which serves as the feature map corresponding to the face image to be detected. The convolutional neural network is used for computing over the feature map corresponding to the face image to be detected to obtain the detection result of the face image to be detected. Specifically, the convolutional neural network may include: 5 convolutional layers, 3 max-pooling layers and 1 fully-connected layer; after the feature map corresponding to the face image to be detected is processed by the 5 convolutional layers, 3 max-pooling layers and 1 fully-connected layer, a 2-dimensional vector can be obtained, and the face image to be detected is judged to be a synthetic face image or a non-synthetic face image according to the 2-dimensional vector.
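The per-key-point block extraction and the 72-block merge can be sketched as follows; the patch is assumed to be centered on the key point and to lie fully inside the image, which the patent does not state explicitly:

```python
import numpy as np

def extract_region_block(image, keypoint, size=36):
    """Cut a size x size patch centered on one face key point
    (assumes the key point lies at least size/2 pixels from the border)."""
    x, y = keypoint
    half = size // 2
    return image[y - half:y + half, x - half:x + half]

def merge_region_blocks(blocks):
    """Stack the per-key-point blocks along the channel axis:
    72 blocks of 36x36x3 become one 36x36x216 feature map."""
    return np.concatenate(blocks, axis=-1)

image = np.zeros((224, 224, 3), dtype=np.float32)
keypoints = [(112, 112)] * 72  # 72 key points (all identical, purely for illustration)
blocks = [extract_region_block(image, kp) for kp in keypoints]
feature_map = merge_region_blocks(blocks)
```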
Face synthesis image detection is one of the basic technologies in face-related fields and is applied in many scenarios such as security, attendance, finance and access control; it is widely used in many current services. The technical solution provided by the present application exploits the prior information that face synthesis images are mostly based on key point blending, and trains the key points jointly with synthesis image detection. This brings more supervision information to synthesis image detection, alleviates overfitting in synthesis image detection, and lets the model attend more to the blending traces in the key point regions of a face synthesis image, so that more discriminative features of original and synthetic images can be extracted for classification. The technical performance of face liveness detection can therefore be improved, helping many applications based on face liveness detection technology to improve their effect and user experience, which is favorable for further popularization of service items.
The face synthesis image detection method provided by the embodiment of the application first inputs the face image to be detected into a pre-trained face key point detection model, and obtains the face key points of the face image to be detected through the face key point detection model; a corresponding region block is generated based on each face key point of the face image to be detected; then the region blocks corresponding to all the face key points are merged into a feature map corresponding to the face image to be detected; and the feature map corresponding to the face image to be detected is input into a pre-trained convolutional neural network, through which the detection result of the face image to be detected is obtained. That is, the present application can generate a region block corresponding to each face key point of the face image to be detected, and obtain the feature map corresponding to the face image to be detected from those region blocks, so that the detection result of the face image to be detected can be obtained from this feature map through the convolutional neural network. Existing face synthesis image detection methods mainly use neural-network deep learning; such methods find it difficult to learn features that discriminate the synthetic image from the original image, easily overfit on a small range of training samples, and generalize poorly to unknown synthetic samples. Moreover, using only a single convolutional neural network, the recognition effect is not ideal when the face pose in a real scene is too large or the illumination differs greatly, and robustness is poor.
Because a corresponding region block is generated for each face key point of the face image to be detected, and the region blocks corresponding to the face key points are used to obtain the feature map corresponding to the face image to be detected, the method overcomes the defects of the prior art, in which a face synthesis image detection model has difficulty learning the features that distinguish a synthetic image from an original image, easily overfits on a small range of training samples, and generalizes poorly to unknown synthetic samples. The technical solution provided by the application can alleviate overfitting in face synthesis image detection, improve the generalization and accuracy of face synthesis image detection in complex environments, and at the same time avoid interference from background noise, thereby improving the detection effect on unknown synthetic samples. Moreover, the technical solution of the embodiment of the application is simple and convenient to implement, easy to popularize and widely applicable.
Example four
Fig. 5 is a schematic view of a first structure of a face synthesis image detection apparatus according to a fourth embodiment of the present application. As shown in fig. 5, the apparatus 500 includes: a key point extracting module 501, a region block generating module 502, a region block merging module 503 and a result calculating module 504; wherein,
the key point extraction module 501 is configured to input a face image to be detected into a pre-trained face key point detection model, and perform key point extraction on the face image to be detected through the face key point detection model to obtain face key points of the face image to be detected;
the region block generating module 502 is configured to generate a corresponding region block based on each face key point of the face image to be detected;
the region block merging module 503 is configured to merge region blocks corresponding to all the face key points into a feature map corresponding to the to-be-detected face image;
the result calculation module 504 is configured to input the feature map corresponding to the facial image to be detected into a pre-trained convolutional neural network, and calculate the feature map corresponding to the facial image to be detected through the convolutional neural network to obtain a detection result of the facial image to be detected; wherein the detection result comprises: the face image to be detected is a synthesized face image or a non-synthesized face image.
Further, the region block generating module 502 is specifically configured to extract, through the face key point detection model, a face key point of the face image to be detected and image features of each face key point; obtaining a region block corresponding to each face key point of the face image to be detected according to the face key points of the face image to be detected and the image characteristics of each face key point; wherein the size of the region block is 36 × 36 × 3.
Further, the region block merging module 503 is specifically configured to merge the region blocks corresponding to each face key point on the three channels, namely the red, green and blue channels, to obtain the feature map of the face image to be detected; wherein the size of the feature map is 36 × 36 × 216.
Fig. 6 is a schematic diagram of a second structure of the face synthesis image detection apparatus according to the fourth embodiment of the present application. As shown in fig. 6, the apparatus 500 further includes: a face detection module 505, configured to input a face image to be detected into a pre-trained face detection model, and identify the face image to be detected through the face detection model to obtain a face detection frame of the face image to be detected; expanding the face detection frame of the face image to be detected by a preset multiple to obtain an expanded face detection frame; intercepting the face in the face image to be detected in the enlarged face detection frame to obtain an intercepted face image; adjusting the intercepted face image to a preset size to obtain an adjusted face image; and executing the operation of inputting the face image to be detected into the face key point detection model trained in advance.
Further, the face detection module 505 is further configured to calculate a pixel value of each pixel point based on the adjusted face image; normalizing the pixel value of each pixel point according to a preset mode to obtain a normalized face image; enabling the pixel value of each pixel point in the normalized human face image to be within a preset range; and executing the operation of inputting the face image to be detected into the face key point detection model trained in advance.
Further, the apparatus further comprises: a training module 506 (not shown in the figure) for taking a first face image obtained in advance as a current face image; if the face key point detection model does not meet the convergence condition corresponding to the face key point detection model, inputting the current face image into the face key point detection model, and training the face key point detection model by using the current face image; and taking the next face image of the current face image as the current face image, and repeatedly executing the operations until the face key point detection model meets the corresponding convergence condition.
The face composite image detection device can execute the method provided by any embodiment of the application, and has corresponding functional modules and beneficial effects of the execution method. For details of the techniques not described in detail in this embodiment, reference may be made to the face synthesis image detection method provided in any embodiment of the present application.
Example five
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 7 is a block diagram of an electronic device according to the face synthesis image detection method in the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 7, the electronic apparatus includes: one or more processors 701, a memory 702, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 7, one processor 701 is taken as an example.
The memory 702 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor executes the face synthesis image detection method provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the face synthesis map detection method provided by the present application.
The memory 702 serves as a non-transitory computer-readable storage medium, and may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the face synthesis map detection method in the embodiment of the present application (for example, the keypoint extraction module 501, the region block generation module 502, the region block merging module 503, and the result calculation module 504 shown in fig. 5). The processor 701 executes various functional applications and data processing of the server by running non-transitory software programs, instructions and modules stored in the memory 702, so as to implement the face synthesis image detection method in the above-described method embodiment.
The memory 702 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the electronic device of the face synthesis image detection method, and the like. Further, the memory 702 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 702 may optionally include a memory remotely located from the processor 701, and these remote memories may be connected to the electronic device of the face composition detection method through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the face synthesis image detection method may further include: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or other means, and fig. 7 illustrates an example of a connection by a bus.
The input device 703 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the face composition detection method, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 704 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system, which overcomes the difficult management and weak service scalability of traditional physical hosts and VPS (Virtual Private Server) services.
According to the technical solution of the embodiment of the application, the face image to be detected is first input into a pre-trained face key point detection model, and the face key points of the face image to be detected are obtained through the face key point detection model; a corresponding region block is generated based on each face key point of the face image to be detected; then the region blocks corresponding to all the face key points are merged into a feature map corresponding to the face image to be detected; and the feature map corresponding to the face image to be detected is input into a pre-trained convolutional neural network, through which the detection result of the face image to be detected is obtained. That is, the present application can generate a region block corresponding to each face key point of the face image to be detected, and obtain the feature map corresponding to the face image to be detected from those region blocks, so that the detection result of the face image to be detected can be obtained from this feature map through the convolutional neural network. Existing face synthesis image detection methods mainly use neural-network deep learning; such methods find it difficult to learn features that discriminate the synthetic image from the original image, easily overfit on a small range of training samples, and generalize poorly to unknown synthetic samples. Moreover, using only a single convolutional neural network, the recognition effect is not ideal when the face pose in a real scene is too large or the illumination differs greatly, and robustness is poor.
Because a corresponding region block is generated for each face key point of the face image to be detected, and the region blocks corresponding to the face key points are used to obtain the feature map corresponding to the face image to be detected, the method overcomes the defects of the prior art, in which a face synthesis image detection model has difficulty learning the features that distinguish a synthetic image from an original image, easily overfits on a small range of training samples, and generalizes poorly to unknown synthetic samples. The technical solution provided by the application can alleviate overfitting in face synthesis image detection, improve the generalization and accuracy of face synthesis image detection in complex environments, and at the same time avoid interference from background noise, thereby improving the detection effect on unknown synthetic samples. Moreover, the technical solution of the embodiment of the application is simple and convenient to implement, easy to popularize and widely applicable.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved; the present application is not limited herein.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A method for detecting a composite image of a human face, the method comprising:
inputting a face image to be detected into a pre-trained face key point detection model, and extracting key points of the face image to be detected through the face key point detection model to obtain face key points of the face image to be detected;
generating a corresponding region block based on each face key point of the face image to be detected;
combining all the region blocks corresponding to the face key points into a feature map corresponding to the face image to be detected;
inputting the feature map corresponding to the face image to be detected into a pre-trained convolutional neural network, and calculating the feature map corresponding to the face image to be detected through the convolutional neural network to obtain a detection result of the face image to be detected; wherein the detection result comprises: the face image to be detected is a synthesized face image or a non-synthesized face image.
2. The method according to claim 1, wherein the generating a corresponding region block based on each face key point of the face image to be detected comprises:
extracting the face key points of the face image to be detected and the image characteristics of each face key point through the face key point detection model;
obtaining a region block corresponding to each face key point of the face image to be detected according to the face key points of the face image to be detected and the image characteristics of each face key point; wherein the size of the region block is 36 × 36 × 3.
3. The method according to claim 1, wherein the merging the region blocks corresponding to all the face key points into the feature map corresponding to the face image to be detected comprises:
combining the region blocks corresponding to each face key point on the red, green and blue channels to obtain a feature map of the face image to be detected; wherein the size of the feature map is 36 × 36 × 216.
4. The method according to claim 1, wherein before inputting the face image to be detected to the face key point detection model trained in advance, the method further comprises:
inputting a face image to be detected into a pre-trained face detection model, and identifying the face image to be detected through the face detection model to obtain a face detection frame of the face image to be detected;
expanding the face detection frame of the face image to be detected by a preset multiple to obtain an expanded face detection frame; intercepting the face in the face image to be detected in the enlarged face detection frame to obtain an intercepted face image; adjusting the intercepted face image to a preset size to obtain an adjusted face image; and executing the operation of inputting the face image to be detected into the face key point detection model trained in advance.
5. The method according to claim 4, wherein before the performing the operation of inputting the face image to be detected into the pre-trained face key point detection model, the method further comprises:
calculating the pixel value of each pixel point based on the adjusted face image;
normalizing the pixel value of each pixel point according to a preset mode to obtain a normalized face image; enabling the pixel value of each pixel point in the normalized human face image to be within a preset range; and executing the operation of inputting the face image to be detected into the face key point detection model trained in advance.
6. The method according to claim 1, wherein before inputting the face image to be detected to the face key point detection model trained in advance, the method further comprises:
taking a first face image obtained in advance as a current face image;
if the face key point detection model does not meet the convergence condition corresponding to the face key point detection model, inputting the current face image into the face key point detection model, and training the face key point detection model by using the current face image; and taking the next face image of the current face image as the current face image, and repeatedly executing the operations until the face key point detection model meets the corresponding convergence condition.
7. A face synthesis image detection apparatus, comprising: the device comprises a key point extraction module, a region block generation module, a region block merging module and a result calculation module; wherein,
the key point extraction module is used for inputting a face image to be detected into a pre-trained face key point detection model, and extracting key points of the face image to be detected through the face key point detection model to obtain face key points of the face image to be detected;
the region block generation module is used for generating a corresponding region block based on each face key point of the face image to be detected;
the region block merging module is used for merging region blocks corresponding to all the face key points into a feature map corresponding to the face image to be detected;
the result calculation module is used for inputting the feature map corresponding to the face image to be detected into a pre-trained convolutional neural network, and calculating the feature map corresponding to the face image to be detected through the convolutional neural network to obtain the detection result of the face image to be detected; wherein the detection result comprises: the face image to be detected is a synthesized face image or a non-synthesized face image.
8. The apparatus of claim 7, wherein:
the region block generation module is specifically used for extracting the face key points of the face image to be detected and the image characteristics of each face key point through the face key point detection model; obtaining a region block corresponding to each face key point of the face image to be detected according to the face key points of the face image to be detected and the image characteristics of each face key point; wherein the size of the region block is 36 × 36 × 3.
9. The apparatus of claim 7, wherein:
the region block merging module is specifically configured to merge the region blocks corresponding to the face key points on the three channels of red, green, and blue to obtain the feature map of the face image to be detected; wherein the size of the feature map is 36 × 36 × 216.
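The recited sizes imply the key point count: 216 channels divided by 3 (red, green, blue) gives 72 key points per face, though the claim itself does not state the number. A sketch of the channel-wise merge:

```python
import numpy as np

num_keypoints = 72  # inferred from 216 / 3; not stated explicitly in the claim

# Placeholder region blocks, one 36 x 36 x 3 patch per key point.
blocks = [np.zeros((36, 36, 3), dtype=np.float32) for _ in range(num_keypoints)]

# Merge on the red, green, and blue channels: concatenate along the channel axis.
feature_map = np.concatenate(blocks, axis=-1)
print(feature_map.shape)  # (36, 36, 216)
```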
10. The apparatus of claim 7, further comprising: a face detection module configured to input the face image to be detected into a pre-trained face detection model, and to identify the face image to be detected through the face detection model to obtain a face detection frame of the face image to be detected; enlarge the face detection frame by a preset multiple to obtain an enlarged face detection frame; crop the face in the face image to be detected within the enlarged face detection frame to obtain a cropped face image; adjust the cropped face image to a preset size to obtain an adjusted face image; and then perform the operation of inputting the face image to be detected into the pre-trained face key point detection model.
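The enlarge-crop-resize preprocessing of claim 10 can be sketched as follows. The claim fixes neither the preset multiple nor the preset size, so the `scale=1.5` and `out_size=128` defaults are placeholders, and nearest-neighbour sampling stands in for whatever resizing the real system uses.

```python
import numpy as np

def expand_and_crop(image, box, scale=1.5, out_size=128):
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    # Enlarge the detection frame about its center by the preset multiple.
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    bw, bh = (x1 - x0) * scale, (y1 - y0) * scale
    # Clip the enlarged frame to the image bounds.
    nx0, ny0 = max(int(cx - bw / 2), 0), max(int(cy - bh / 2), 0)
    nx1, ny1 = min(int(cx + bw / 2), w), min(int(cy + bh / 2), h)
    # Crop the face inside the enlarged frame.
    crop = image[ny0:ny1, nx0:nx1]
    # Adjust the cropped face to the preset size (nearest-neighbour sampling).
    ys = (np.arange(out_size) * crop.shape[0] / out_size).astype(int)
    xs = (np.arange(out_size) * crop.shape[1] / out_size).astype(int)
    return crop[ys][:, xs]
```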
11. The apparatus of claim 10, wherein the face detection module is further configured to calculate a pixel value of each pixel point based on the adjusted face image; normalize the pixel value of each pixel point in a preset manner to obtain a normalized face image, such that the pixel value of each pixel point in the normalized face image lies within a preset range; and then perform the operation of inputting the face image to be detected into the pre-trained face key point detection model.
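Claim 11 leaves the "preset manner" and "preset range" open; one common choice, shown here purely as an assumption, is scaling 8-bit pixel values into [-1, 1]:

```python
import numpy as np

def normalize(face):
    # Map uint8 pixel values 0..255 into the range [-1.0, 1.0].
    return face.astype(np.float32) / 127.5 - 1.0

img = np.array([[[0, 127, 255]]], dtype=np.uint8)
out = normalize(img)
# Every normalized value now lies within the preset range [-1, 1].
```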
12. The apparatus of claim 7, further comprising: a training module configured to take a pre-acquired first face image as a current face image; if the face key point detection model does not meet its corresponding convergence condition, input the current face image into the face key point detection model and train the face key point detection model using the current face image; and take a next face image after the current face image as the current face image, repeating the above operations until the face key point detection model meets the corresponding convergence condition.
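The claimed training loop amounts to feeding face images one at a time until a convergence condition holds. In this sketch the model, its update step, and the convergence test are all placeholders, since the claim specifies none of them:

```python
def train_keypoint_model(model, face_images, converged):
    # Feed the current face image, then take the next one as current,
    # repeating until the convergence condition is met.
    i = 0
    while not converged(model):
        current = face_images[i % len(face_images)]  # current face image
        model.train_step(current)                    # train on it
        i += 1                                       # next becomes current
    return model
```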
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
CN202010681943.0A 2020-07-15 2020-07-15 Face synthetic image detection method and device, electronic equipment and storage medium Pending CN111862031A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010681943.0A CN111862031A (en) 2020-07-15 2020-07-15 Face synthetic image detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111862031A true CN111862031A (en) 2020-10-30

Family

ID=72984294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010681943.0A Pending CN111862031A (en) 2020-07-15 2020-07-15 Face synthetic image detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111862031A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214343A (en) * 2018-09-14 2019-01-15 北京字节跳动网络技术有限公司 Method and apparatus for generating face critical point detection model
CN109711258A (en) * 2018-11-27 2019-05-03 哈尔滨工业大学(深圳) Lightweight face critical point detection method, system and storage medium based on convolutional network
CN109886341A (en) * 2019-02-25 2019-06-14 厦门美图之家科技有限公司 A kind of trained method for generating Face datection model
CN109977781A (en) * 2019-02-26 2019-07-05 上海上湖信息技术有限公司 Method for detecting human face and device, readable storage medium storing program for executing
WO2020015752A1 (en) * 2018-07-20 2020-01-23 华为技术有限公司 Object attribute identification method, apparatus and system, and computing device
WO2020073601A1 (en) * 2018-10-09 2020-04-16 深兰科技(上海)有限公司 Goods recognition method, goods recognition apparatus, and storage medium
CN111241961A (en) * 2020-01-03 2020-06-05 精硕科技(北京)股份有限公司 Face detection method and device and electronic equipment
CN111339832A (en) * 2020-02-03 2020-06-26 中国人民解放军国防科技大学 Method and device for detecting face synthetic image

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191189A (en) * 2021-03-22 2021-07-30 深圳市百富智能新技术有限公司 Face living body detection method, terminal device and computer readable storage medium
CN113033465A (en) * 2021-04-13 2021-06-25 北京百度网讯科技有限公司 Living body detection model training method, device, equipment and storage medium
CN113033465B (en) * 2021-04-13 2023-11-14 北京百度网讯科技有限公司 Living body detection model training method, device, equipment and storage medium
CN114333038A (en) * 2022-03-03 2022-04-12 百度在线网络技术(北京)有限公司 Training method of object recognition model, object recognition method, device and equipment

Similar Documents

Publication Publication Date Title
CN109558832B (en) Human body posture detection method, device, equipment and storage medium
CN112528850B (en) Human body identification method, device, equipment and storage medium
US11687779B2 (en) Image recognition method and apparatus, device, and computer storage medium
CN112528976B (en) Text detection model generation method and text detection method
CN111783620A (en) Expression recognition method, device, equipment and storage medium
CN111862031A (en) Face synthetic image detection method and device, electronic equipment and storage medium
CN113033537A (en) Method, apparatus, device, medium and program product for training a model
CN113343826A (en) Training method of human face living body detection model, human face living body detection method and device
CN112241715A (en) Model training method, expression recognition method, device, equipment and storage medium
CN112507090B (en) Method, apparatus, device and storage medium for outputting information
CN112270745B (en) Image generation method, device, equipment and storage medium
EP4080470A2 (en) Method and apparatus for detecting living face
CN113378770A (en) Gesture recognition method, device, equipment, storage medium and program product
CN112528858A (en) Training method, device, equipment, medium and product of human body posture estimation model
CN111783619B (en) Human body attribute identification method, device, equipment and storage medium
CN113177449A (en) Face recognition method and device, computer equipment and storage medium
CN111783601A (en) Training method and device of face recognition model, electronic equipment and storage medium
CN111862030B (en) Face synthetic image detection method and device, electronic equipment and storage medium
CN110738261B (en) Image classification and model training method and device, electronic equipment and storage medium
CN115393488B (en) Method and device for driving virtual character expression, electronic equipment and storage medium
CN112200169B (en) Method, apparatus, device and storage medium for training a model
CN112560848B (en) Training method and device for POI (Point of interest) pre-training model and electronic equipment
CN116167426A (en) Training method of face key point positioning model and face key point positioning method
CN112001369B (en) Ship chimney detection method and device, electronic equipment and readable storage medium
CN114842541A (en) Model training and face recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240712

Address after: 510000 No. 106 Fengze East Road, Nansha District, Guangzhou City, Guangdong Province

Applicant after: Guangzhou dinghang Information Technology Service Co.,Ltd.

Country or region after: China

Address before: 2 / F, baidu building, 10 Shangdi 10th Street, Haidian District, Beijing 100085

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

Country or region before: China