CN111652878B - Image detection method, image detection device, computer equipment and storage medium


Publication number
CN111652878B
Authority
CN
China
Prior art keywords
face image
image
sample
face
edge
Prior art date
Legal status
Active
Application number
CN202010550677.8A
Other languages
Chinese (zh)
Other versions
CN111652878A (en)
Inventor
姚太平
王鑫瑶
张克越
吴双
孟嘉
丁守鸿
李季檩
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010550677.8A
Publication of CN111652878A
Application granted
Publication of CN111652878B

Classifications

    • G06T 7/0002 (Image analysis): inspection of images, e.g. flaw detection
    • G06T 5/50 (Image enhancement or restoration): by the use of more than one image, e.g. averaging, subtraction
    • G06V 40/168 (Human faces): feature extraction; face representation
    • G06V 40/172 (Human faces): classification, e.g. identification
    • G06T 2207/10004 (Image acquisition modality): still image; photographic image
    • G06T 2207/20221 (Image combination): image fusion; image merging
    • G06T 2207/30201 (Subject of image): human being; face

Abstract

The embodiments of this application disclose an image detection method and apparatus, a computer device, and a storage medium, belonging to the field of computer technology. The method comprises: acquiring texture features of a first face image and edge features of a second face image corresponding to a target face image; fusing the edge features with the texture features to obtain facial features; classifying the facial features to obtain a score; and determining the target face image to be a real face image when the score falls within a preset numerical range. Because texture features can distinguish whether a face has been edited and edge features can distinguish whether a face has been replaced, the fused facial features can distinguish whether a face image is real, which improves the accuracy of image detection. Moreover, since every face image has texture features and edge features, the image detection method is applicable to any face image and generalizes well.

Description

Image detection method, image detection device, computer equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to an image detection method, an image detection device, computer equipment and a storage medium.
Background
With the rapid development of computer technology, image editing and video editing techniques are applied ever more widely, but this also poses threats to network security and social security. For example, a face image can be forged by replacing the face in it, and a video can be forged by replacing the face in every frame, allowing face recognition to be passed fraudulently and undermining the security of face recognition systems. Images therefore need to be detected to determine whether a face image is a real face image or a forged one.
In the related art, an image detection model is trained according to a sample face image, and the image detection model learns image characteristics that can distinguish a real face image from a forged face image in the sample face image, so as to detect the face image by using the image detection model.
However, the image features learned by the image detection model are those that distinguish real from forged face images within the sample set; other face images may not exhibit these features, so the model cannot accurately detect face images that lack them. The accuracy of such an image detection method is therefore low, and its generalization is poor.
Disclosure of Invention
The embodiment of the application provides an image detection method, an image detection device, computer equipment and a storage medium, which can improve the accuracy rate and the generalization of image detection. The technical scheme is as follows:
in one aspect, an image detection method is provided, and the method includes:
acquiring a first face image and a second face image corresponding to a target face image, wherein the first face image comprises a face area which does not contain a face edge in the target face image, and the second face image comprises a face area which contains a face edge in the target face image;
coding the first face image to obtain texture features corresponding to the first face image;
coding the second face image to obtain edge characteristics corresponding to the second face image;
performing fusion processing on the edge features and the texture features to obtain face features corresponding to the target face image;
classifying the facial features to obtain a score corresponding to the target face image;
and determining the target face image as a real face image under the condition that the score value belongs to a preset numerical range.
In another aspect, there is provided an image detection apparatus, the apparatus including:
the image acquisition module is used for acquiring a first face image and a second face image corresponding to a target face image, wherein the first face image comprises a face area which does not contain a face edge in the target face image, and the second face image comprises a face area which contains a face edge in the target face image;
the first encoding processing module is used for encoding the first face image to obtain texture features corresponding to the first face image;
the second coding processing module is used for coding the second face image to obtain the edge characteristics corresponding to the second face image;
the fusion processing module is used for carrying out fusion processing on the edge features and the texture features to obtain face features corresponding to the target face image;
the classification processing module is used for classifying the facial features to obtain a score corresponding to the target face image;
and the determining module is used for determining the target face image as a real face image under the condition that the score belongs to a preset numerical range.
Optionally, the image acquisition module includes:
the first cutting unit is used for carrying out face detection on the target face image to obtain a first face area which does not contain a face edge in the target face image, and cutting the first face area in the target face image to be used as the first face image;
and the second cutting unit is used for acquiring a second face area containing face edges in the target face image according to the first face area, and cutting the second face area in the target face image to be used as the second face image.
Optionally, the image acquisition module includes:
the third cutting unit is used for carrying out face detection on the target face image to obtain a third face area containing face edges in the target face image, and cutting the third face area in the target face image to be used as the second face image;
and the fourth cutting unit is used for acquiring a fourth face area which does not contain the face edge in the target face image according to the third face area, and cutting the fourth face area in the target face image to be used as the first face image.
Optionally, the first encoding processing module is configured to invoke a first encoding network in an image detection model, and perform encoding processing on the first face image to obtain a texture feature corresponding to the first face image;
the second coding processing module is used for calling a second coding network in the image detection model and coding the second face image to obtain an edge feature corresponding to the second face image;
the fusion processing module is used for calling a feature fusion network in the image detection model, and carrying out fusion processing on the edge features and the texture features to obtain face features corresponding to the target face image;
and the classification processing module is used for calling a classification network in the image detection model, classifying the facial features and obtaining a score corresponding to the target face image.
Optionally, the determining module includes:
and the first determining unit is used for determining the target face image as a real face image under the condition that the score is larger than a first preset threshold value.
Optionally, the determining module includes:
and the second determining unit is used for determining the target face image as a real face image under the condition that the score is smaller than a second preset threshold value.
Optionally, the apparatus further comprises:
the image acquisition module is further used for acquiring a first sample human face image and a second sample human face image corresponding to the sample human face image, wherein the first sample human face image comprises a face area which does not contain human face edges in the sample human face image, and the second sample human face image comprises a face area which contains human face edges in the sample human face image;
the first encoding processing module is further configured to invoke a first encoding network in the image detection model, and perform encoding processing on the first sample human face image to obtain a sample texture feature corresponding to the first sample human face image;
the second coding processing module is further configured to call a second coding network in the image detection model, and perform coding processing on the second sample face image to obtain a first sample edge feature corresponding to the second sample face image;
the fusion processing module is further configured to call a feature fusion network in the image detection model, and perform fusion processing on the sample texture features and the first sample edge features to obtain sample facial features corresponding to the sample facial image;
the classification processing module is further used for calling a classification network in the image detection model to classify the sample facial features to obtain a prediction score corresponding to the sample facial image;
and the training module is used for training the image detection model according to the prediction score and the sample score corresponding to the sample face image.
Optionally, the training module comprises:
the first decoding unit is used for calling a first decoding network in the image detection model and decoding the sample texture features to obtain a predicted texture feature image corresponding to the sample texture features;
and the first training unit is used for training the image detection model according to the prediction score and the sample score as well as the prediction texture characteristic image and the sample texture characteristic image corresponding to the sample face image.
Optionally, the training module comprises:
a second decoding unit, configured to invoke a second decoding network in the image detection model, and perform decoding processing on the first sample edge feature to obtain a predicted edge feature image corresponding to the first sample edge feature;
and the second training unit is used for training the image detection model according to the prediction score and the sample score as well as the prediction edge characteristic image and the sample edge characteristic image corresponding to the sample face image.
Optionally, the apparatus further comprises:
the first edge acquisition module is used for labeling an original region and a forged region in the second sample face image to obtain a labeled image, and performing feature extraction processing on the labeled image to obtain a sample edge feature image corresponding to the forged face image, wherein the sample face image is a forged face image; or,
and the second edge acquisition module is used for acquiring a preset edge characteristic image as a sample edge characteristic image corresponding to the real face image, wherein the sample face image is a real face image.
Optionally, the fusion processing module is configured to:
calling a second decoding network in the image detection model, and decoding the first sample edge feature to obtain a predicted edge feature image corresponding to the first sample edge feature;
calling an attention network in the image detection model, and adjusting the first sample edge feature according to the predicted edge feature image to obtain a second sample edge feature;
and calling the feature fusion network, and carrying out fusion processing on the sample texture features and the second sample edge features to obtain the sample facial features.
Optionally, the predicted edge feature image includes a plurality of pixel values, the first sample edge feature is a multi-dimensional feature matrix, the feature matrix of each dimension includes a plurality of feature values, and the plurality of pixel values are in one-to-one correspondence with the plurality of feature values; the fusion processing module is further configured to:
and calling the attention network, and respectively adjusting corresponding characteristic values in the characteristic matrix of each dimension in the first sample edge characteristic according to a plurality of pixel values of the predicted edge characteristic image to obtain the second sample edge characteristic.
In another aspect, a computer device is provided that includes a processor and a memory having stored therein at least one instruction that is loaded and executed by the processor to implement the image detection method.
In yet another aspect, a computer-readable storage medium having at least one instruction stored therein is provided, the at least one instruction being loaded and executed by a processor to implement the image detection method.
In yet another aspect, a computer program product is provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image detection method.
The method, the device, the computer equipment and the storage medium provided by the embodiment of the application extract the texture features and the edge features of the target face image, the texture features can distinguish whether the face in the target face image is an edited face, and the edge features can distinguish whether the face in the target face image is a replaced face, so that the facial features for distinguishing whether the target face image is a real face image or a forged face image can be obtained by fusing the texture features and the edge features. The facial features are classified, so that a score for representing whether the target face image is a real face image or a forged face image can be obtained, and whether the target face image is the real face image or not is determined according to whether the score belongs to a preset numerical range or not. In the process of image detection, the texture features and the edge features of the face image are considered at the same time, so that the accuracy of detecting the face image can be improved. Moreover, each face image comprises texture features and edge features, so that the image detection method provided by the application can be suitable for detecting any face image and has strong generalization.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram of an image detection model provided in an embodiment of the present application.
Fig. 2 is a schematic diagram of another image detection model provided in an embodiment of the present application.
Fig. 3 is a flowchart of an image detection method according to an embodiment of the present application.
Fig. 4 is a flowchart of an image detection model training method according to an embodiment of the present application.
Fig. 5 is a schematic diagram of a training image detection model according to an embodiment of the present application.
Fig. 6 is a flowchart of another image detection method provided in the embodiment of the present application.
Fig. 7 is a schematic diagram of a face image according to an embodiment of the present application.
Fig. 8 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of another image detection apparatus according to an embodiment of the present application.
Fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be further described in detail with reference to the accompanying drawings.
It will be understood that the terms "first," "second," and the like as used herein may be used to describe various concepts, but these concepts are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, a first face image may be referred to as a second face image, and similarly, a second face image may be referred to as a first face image, without departing from the scope of the present application.
"A plurality" means two or more; for example, a plurality of pixel values may be any integer number of pixel values greater than or equal to two, such as two pixel values or three pixel values. "Each" refers to every one of at least one; for example, "each dimension" refers to every one of a plurality of dimensions, and if there are 3 dimensions, it refers to every one of those 3 dimensions.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason, and make decisions.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, spanning both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies include natural language processing and machine learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
Computer Vision (CV) is a science that studies how to make machines "see": it uses cameras and computers in place of human eyes to identify, track, and measure targets, and further performs graphics processing so that the processed image is more suitable for human eyes to observe or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems capable of obtaining information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
The image detection method provided by the embodiment of the application will be described below based on an artificial intelligence technology and a computer vision technology.
The embodiment of the application provides an image detection method whose execution subject is a computer device, and which can detect whether a target face image is a real face image. In one possible implementation, the computer device is a terminal, which may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, or the like. In another possible implementation, the computer device is a server, which may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the present application is not limited thereto.
The method provided by the embodiment of the application can be applied to any scene for detecting whether the face image is a real face image.
For example, in a scene that detects whether a news video is a forged video, a target face image is extracted from the news video to be detected, the first face image and second face image corresponding to the target face image are acquired, and the score corresponding to the target face image is obtained by the method described below. If the target face image is determined to be a forged face image according to the score, the news video is a forged video; if the target face image is determined to be a real face image according to the score, the news video is a real video.
In addition, the image detection method provided by the embodiment of the application can be applied to a face recognition system or other scenes needing to identify whether the video is a real video.
In one possible implementation manner, as shown in fig. 1, the image detection model 11 provided by the embodiment of the present application may include a first coding network 1101, a second coding network 1102, a feature fusion network 1103, and a classification network 1104. The feature fusion network 1103 is connected with the first coding network 1101 and the second coding network 1102 respectively, the classification network 1104 is connected with the feature fusion network 1103, the first coding network 1101 is used for extracting texture features of the face image, the second coding network 1102 is used for extracting edge features of the face image, the feature fusion network 1103 is used for fusing the texture features and the edge features, and the classification network 1104 is used for obtaining scores corresponding to the face image according to the texture features and the edge features.
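As a concrete illustration of this four-network structure, the following is a minimal PyTorch sketch. The convolutional backbone, feature dimensions, input size, and fusion-by-concatenation choice are assumptions for illustration only; the embodiment does not fix any of them.

```python
import torch
import torch.nn as nn

def conv_encoder(out_dim: int) -> nn.Sequential:
    # Small convolutional backbone standing in for either coding network;
    # the actual architecture is not fixed by this embodiment.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(64, out_dim, 3, stride=2, padding=1), nn.ReLU(),
    )

class ImageDetectionModel(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.texture_encoder = conv_encoder(feat_dim)  # first coding network 1101
        self.edge_encoder = conv_encoder(feat_dim)     # second coding network 1102
        # feature fusion network 1103, here channel concatenation + 1x1 conv
        self.fusion = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)
        # classification network 1104: pooled facial features -> score in [0, 1]
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_dim, 1), nn.Sigmoid(),
        )

    def forward(self, first_face: torch.Tensor, second_face: torch.Tensor) -> torch.Tensor:
        texture = self.texture_encoder(first_face)   # texture features (no face edge)
        edge = self.edge_encoder(second_face)        # edge features (contains face edge)
        fused = self.fusion(torch.cat([texture, edge], dim=1))  # facial features
        return self.classifier(fused)                # score for the target face image

# Usage: both crops resized to the same resolution before encoding.
model = ImageDetectionModel()
score = model(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
```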
In another possible implementation manner, as shown in fig. 2, the image detection model 22 provided in the embodiment of the present application may include a first encoding network 2201, a second encoding network 2202, a first decoding network 2203, a second decoding network 2204, an attention network 2205, a feature fusion network 2206, and a classification network 2207.
The first encoding network 2201 is connected with the first decoding network 2203, the second encoding network 2202 is connected with the second decoding network 2204, the attention network 2205 is respectively connected with the second encoding network 2202 and the second decoding network 2204, the feature fusion network 2206 is respectively connected with the first encoding network 2201 and the attention network 2205, and the classification network 2207 is connected with the feature fusion network 2206.
The first coding network 2201 is used for extracting texture features of a face image, the second coding network 2202 is used for extracting edge features of the face image, the first decoding network 2203 is used for acquiring texture feature images according to the texture features, the second decoding network 2204 is used for acquiring edge feature images according to the edge features, the attention network 2205 is used for adjusting the edge features according to the edge feature images, the feature fusion network 2206 is used for fusing the texture features and the edge features, and the classification network 2207 is used for acquiring scores corresponding to the face image according to the texture features and the edge features. When the image detection model 22 is called for inference, only the first coding network 2201, the second coding network 2202, the feature fusion network 2206, and the classification network 2207 are needed to obtain the score corresponding to the face image. When training the image detection model 22, in addition to these 4 networks, one or more of the first decoding network 2203, the second decoding network 2204, and the attention network 2205 may be used to jointly train the model.
Fig. 3 is a flowchart of an image detection method according to an embodiment of the present application. The execution subject of the embodiment of the present application is a computer device, and referring to fig. 3, the method includes:
301. and acquiring a first face image and a second face image corresponding to the target face image.
The computer device obtains a target face image to be detected, wherein the target face image can be a face image uploaded to the computer device by other devices, or a face image downloaded from other devices by the computer device, or a face image stored in the computer device by a user. Or, the target face image is a face image extracted from a face video by a computer device, or may also be a face image from another source, which is not limited in this embodiment of the application.
And the computer equipment cuts the target face image to obtain a first face image and a second face image corresponding to the target face image. The first face image comprises a face area which does not contain face edges in the target face image, and the second face image comprises a face area which contains face edges in the target face image. The face edge refers to an edge region of the face, that is, a contour region of the face.
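A minimal sketch of this cropping step follows, assuming the face detector returns an axis-aligned box. The 0.8 shrink and 1.2 expand ratios used to exclude or include the face contour are hypothetical choices, since the embodiment does not prescribe specific margins.

```python
import numpy as np

def crop_face_regions(image: np.ndarray, box, shrink: float = 0.8, expand: float = 1.2):
    """image: H x W x 3 array; box: (x, y, w, h) from a face detector.
    Returns the first face image (scaled down, excluding the face edge)
    and the second face image (scaled up, including the face edge)."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0          # box center

    def crop(scale: float) -> np.ndarray:
        nw, nh = w * scale, h * scale
        x0, y0 = int(max(cx - nw / 2, 0)), int(max(cy - nh / 2, 0))
        x1 = int(min(cx + nw / 2, image.shape[1]))
        y1 = int(min(cy + nh / 2, image.shape[0]))
        return image[y0:y1, x0:x1]

    first_face = crop(shrink)    # face area without the face contour
    second_face = crop(expand)   # face area with the face contour
    return first_face, second_face
```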
302. And coding the first face image to obtain the texture characteristics corresponding to the first face image.
The real face image means that the face in the image is not edited or replaced, and the fake face image means that the face in the image is edited or replaced. The texture features refer to features of facial textures in the face image, and can represent roughness, directionality, granularity, randomness and the like of the face in the face image. The edge feature refers to the feature of the edge of the face in the face image. Because the face in the forged face image is edited or replaced, the texture features in the forged face image and the texture features in the real face image are different, and the edge features in the forged face image and the edge features in the real face image are also different, the embodiment of the application can detect whether the target face image is the real face image or the forged face image by using the texture features and the edge features corresponding to the target face image.
Since the first face image does not include the face edges, the first face image can be used for extracting accurate texture features. Then, when the computer device obtains the first face image corresponding to the target face image, the first face image is encoded to obtain the texture feature corresponding to the first face image, that is, the texture feature corresponding to the target face image.
303. And coding the second face image to obtain the edge characteristics corresponding to the second face image.
Because the second face image includes the face edge, the second face image can be used to extract the edge feature, and when the computer device acquires the second face image corresponding to the target face image, the second face image is encoded to obtain the edge feature corresponding to the second face image, that is, the edge feature corresponding to the target face image.
304. And carrying out fusion processing on the edge characteristics and the texture characteristics to obtain the face characteristics corresponding to the target face image.
When the computer equipment acquires the textural features and the edge features of the target face image, the edge features and the textural features are fused, and the fused features are the face features corresponding to the target face image and can be used for describing the face area in the target face image.
305. And classifying the facial features to obtain a score corresponding to the target face image.
And when the computer equipment acquires the fused facial features, classifying the facial features to obtain a score corresponding to the target face image. The score value may represent the possibility that the target face image is a real face image, or represent the possibility that the target face image is a fake face image. Therefore, whether the target face image is a real face image or a fake face image can be determined by the score.
306. And under the condition that the score value belongs to a preset numerical range, determining the target face image as a real face image.
When the computer equipment acquires the score corresponding to the target face image, whether the score belongs to a preset numerical range is judged. If the score value belongs to a preset numerical range, determining the target face image as a real face image; and if the score does not belong to the preset numerical range, determining the target face image as a forged face image.
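The decision itself reduces to a range test. A sketch follows, assuming the preset numerical range is bounded by a single illustrative threshold of 0.5 (the embodiment leaves the range unspecified); it covers both conventions used in this application, namely a score greater than a first preset threshold or smaller than a second preset threshold.

```python
def is_real_face(score: float, score_means_real: bool = True,
                 threshold: float = 0.5) -> bool:
    """Return True if the target face image is judged real. The 0.5
    threshold is an illustrative stand-in for the preset numerical range."""
    if score_means_real:
        # score measures how likely the image is real: range is (threshold, 1]
        return score > threshold
    # score measures how likely the image is forged: range is [0, threshold)
    return score < threshold
```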
The method provided by the embodiment of the application extracts the texture features and the edge features of the target face image, the texture features can distinguish whether the face in the target face image is an edited face or not, and the edge features can distinguish whether the face in the target face image is a replaced face or not, so that the face features used for distinguishing whether the target face image is a real face image or a forged face image can be obtained by fusing the texture features and the edge features. The facial features are classified, so that a score for representing whether the target face image is a real face image or a forged face image can be obtained, and whether the target face image is the real face image or not is determined according to whether the score belongs to a preset numerical range or not. In the process of image detection, the texture features and the edge features of the face image are considered at the same time, so that the accuracy of the detection of the face image can be improved. Moreover, each face image comprises texture features and edge features, so that the image detection method provided by the application can be suitable for detecting any face image and has strong generalization.
The image detection method provided by the embodiment of the application can call the image detection model for processing, and the image detection model needs to be trained before the image detection model is called. The following embodiment will describe in detail the process of training the image detection model.
Fig. 4 is a flowchart of an image detection model training method according to an embodiment of the present application. The execution subject of the embodiment of the application is computer equipment, and referring to fig. 4, the method comprises the following steps:
401. the computer equipment acquires a first sample face image and a second sample face image corresponding to the sample face image.
The computer equipment acquires a sample face image, cuts the sample face image and respectively obtains a first sample face image and a second sample face image. The first sample face image comprises a face area which does not contain face edges in the sample face image, and the second sample face image comprises a face area which contains face edges in the sample face image.
The sample face image can be a real face image or a forged face image. The sample face image may be a face image pre-stored in the computer device, or a face image downloaded from another device by the computer device, or an image uploaded to the computer device by another device, which is not limited in this embodiment of the present application.
The process of processing the sample face image by the computer device to obtain the first sample face image and the second sample face image is similar to the process of processing the target face image in the following step 601 to obtain the first face image and the second face image, and will not be described here for the moment.
402. The computer equipment calls a first coding network in the image detection model, and codes the first sample face image to obtain a sample texture feature corresponding to the first sample face image.
Referring to fig. 2, the image detection model 22 in the embodiment of the present application may include a first encoding network 2201, a second encoding network 2202, a first decoding network 2203, a second decoding network 2204, an attention network 2205, a feature fusion network 2206, and a classification network 2207. The first encoding network 2201 is connected with the first decoding network 2203, the second encoding network 2202 is connected with the second decoding network 2204, the attention network 2205 is respectively connected with the second encoding network 2202 and the second decoding network 2204, the feature fusion network 2206 is respectively connected with the first encoding network 2201 and the attention network 2205, and the classification network 2207 is connected with the feature fusion network 2206. The first coding network 2201 is used for extracting texture features of a face image, the second coding network 2202 is used for extracting edge features of the face image, the first decoding network 2203 is used for acquiring texture feature images according to the texture features, the second decoding network 2204 is used for acquiring edge feature images according to the edge features, the attention network 2205 is used for adjusting the edge features according to the edge feature images, the feature fusion network 2206 is used for fusing the texture features and the edge features, and the classification network 2207 is used for acquiring scores corresponding to the face image according to the texture features and the edge features. When the image detection model 22 is called for inference, only the first coding network 2201, the second coding network 2202, the feature fusion network 2206, and the classification network 2207 are needed to obtain the score corresponding to the face image. When training the image detection model 22, in addition to these 4 networks, one or more of the first decoding network 2203, the second decoding network 2204, and the attention network 2205 may be used to jointly train the model.
The first coding network 2201 and the second coding network 2202 may have different structures, or the first coding network 2201 and the second coding network 2202 may have the same structures but different parameters. The first decoding network 2203 and the second decoding network 2204 may have different structures, or the first decoding network 2203 and the second decoding network 2204 may have the same structures but different parameters.
Because the first sample face image does not include the face edge, it can be used to extract accurate texture features. When the computer equipment acquires the first sample face image corresponding to the sample face image, it calls the first coding network in the image detection model and codes the first sample face image to obtain the sample texture feature corresponding to the first sample face image, namely the sample texture feature corresponding to the sample face image. The sample texture feature may be a multi-dimensional feature matrix or a feature in another form, which is not limited in this embodiment of the application.
403. And calling a first decoding network in the image detection model by the computer equipment, and decoding the sample texture features to obtain a predicted texture feature image corresponding to the sample texture features.
When the computer equipment acquires the sample texture features, calling a first decoding network in the image detection model, and decoding the sample texture features to obtain a predicted texture feature image corresponding to the sample texture features, wherein the predicted texture feature image can be used for representing the features of the face textures of the sample face image.
Optionally, the sample texture feature is a multi-dimensional feature matrix of size a × b × c, where a represents the number of dimensions (channels) and b × c is the size of the feature matrix in each dimension; the predicted texture feature image obtained from the sample texture feature then has size b × c.
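This shape relation can be made concrete with a small sketch: a decoder built from 1 × 1 convolutions maps an a × b × c feature matrix to a single-channel b × c feature image. The layer widths and the a, b, c values are illustrative assumptions, not the embodiment's prescribed decoder.

```python
import torch
import torch.nn as nn

a, b, c = 128, 28, 28                      # illustrative dimensions
first_decoding_network = nn.Sequential(    # stands in for the first decoding network
    nn.Conv2d(a, 64, kernel_size=1), nn.ReLU(),
    nn.Conv2d(64, 1, kernel_size=1), nn.Sigmoid(),  # one-channel texture image
)
sample_texture_feature = torch.randn(1, a, b, c)
predicted_texture_image = first_decoding_network(sample_texture_feature)
assert predicted_texture_image.shape[-2:] == (b, c)  # spatial size b x c preserved
```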
404. And the computer equipment calls a second coding network in the image detection model to code the second sample face image to obtain a first sample edge characteristic corresponding to the second sample face image.
Because the second sample face image includes the face edge, the second sample face image can be used for extracting edge features, and when the computer device acquires the second sample face image corresponding to the sample face image, a second coding network in the image detection model is called to perform coding processing on the second sample face image, so as to obtain the first sample edge features corresponding to the second sample face image, that is, the first sample edge features corresponding to the sample face image. The first sample edge feature may be a multi-dimensional feature matrix or a feature in another form, which is not limited in this embodiment of the present application.
405. And calling a second decoding network in the image detection model by the computer equipment, and decoding the first sample edge feature to obtain a predicted edge feature image corresponding to the first sample edge feature.
When the computer equipment acquires the first sample edge feature, calling a second decoding network in the image detection model, and decoding the first sample edge feature to obtain a predicted edge feature image corresponding to the first sample edge feature, wherein the predicted edge feature image can be used for representing the feature of the face edge of the sample face image.
Optionally, the first sample edge feature is a multi-dimensional feature matrix with a size a × b × c, where a represents the number of dimensions, and b × c is the size of the feature matrix, and then the size of the predicted edge feature image obtained according to the first sample edge feature is b × c.
It should be noted that, in the embodiment of the present application, steps 402-403 are described as being performed before steps 404-405 only as an example. In another embodiment, steps 404-405 may be performed first and then steps 402-403; alternatively, steps 402-403 and steps 404-405 may be executed in parallel.
406. And calling a feature fusion network in the image detection model by the computer equipment, and fusing the sample texture features and the first sample edge features to obtain sample facial features corresponding to the sample facial images.
When the computer equipment acquires the sample texture features and the first sample edge features of the sample face image, calling a feature fusion network in an image detection model, carrying out fusion processing on the sample texture features and the first sample edge features, wherein the fused features are the sample face features corresponding to the sample face image, and the sample face features are used for describing the face region in the sample face image.
In one possible implementation, the sample texture feature and the first sample edge feature are both n-dimensional feature matrices, where n is a positive integer. The computer equipment calls the feature fusion network and directly stacks (concatenates) the sample texture feature and the first sample edge feature to obtain a 2n-dimensional feature matrix, which is the fused sample facial feature.
In another possible implementation, the sample texture feature and the first sample edge feature are both n-dimensional feature matrices, and the feature values in each dimension of the sample texture feature correspond one-to-one with the feature values in the corresponding dimension of the first sample edge feature. The computer equipment calls the feature fusion network and adds each feature value in each dimension of the sample texture feature to the corresponding feature value in the corresponding dimension of the first sample edge feature, obtaining the fused sample facial feature. Both variants are illustrated in the sketch after this paragraph.
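A sketch of both fusion variants, using illustrative tensor sizes (batch 1, n = 128 dimensions, 28 × 28 feature matrices), which the embodiment does not fix:

```python
import torch

n, h, w = 128, 28, 28                       # illustrative sizes
texture = torch.randn(1, n, h, w)           # sample texture feature (n-dimensional)
edge = torch.randn(1, n, h, w)              # first sample edge feature (n-dimensional)

# Variant 1: stack the two features into a 2n-dimensional feature matrix.
fused_concat = torch.cat([texture, edge], dim=1)   # shape (1, 2n, h, w)

# Variant 2: add corresponding feature values of corresponding dimensions.
fused_sum = texture + edge                         # shape (1, n, h, w)
```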
In another possible implementation manner, the computer device invokes a second decoding network in the image detection model to decode the first sample edge feature, so as to obtain a predicted edge feature image corresponding to the first sample edge feature. And the computer equipment calls an attention network in the image detection model, and adjusts the first sample edge feature according to the predicted edge feature image to obtain a second sample edge feature. And calling the feature fusion network by the computer equipment, and fusing the sample texture features and the second sample edge features to obtain the sample facial features.
The attention network is used for deeply learning the edge characteristics of the face image, so that the attention network adopts an attention mechanism, the first sample edge characteristics are adjusted according to the predicted edge characteristic image, the second sample edge characteristics capable of describing the edge of the face image of the sample more accurately are obtained, and the accuracy of the image detection model is improved.
Optionally, the predicted edge feature image includes a plurality of pixel values, the first sample edge feature is a multi-dimensional feature matrix, the feature matrix of each dimension includes a plurality of feature values, and the plurality of pixel values are in one-to-one correspondence with the plurality of feature values. The computer device calls an attention network, and according to a plurality of pixel values of the predicted edge feature image, the corresponding feature values in the feature matrix of each dimension in the first sample edge feature are respectively adjusted to obtain a second sample edge feature.
The pixel positions in the predicted edge feature image correspond one-to-one with the pixel positions in the sample face image, and each pixel position in the sample face image corresponds one-to-one with a feature value in each dimension of the first sample edge feature; therefore, the pixel values at the pixel positions of the predicted edge feature image correspond one-to-one with the feature values in each dimension of the first sample edge feature. The computer equipment obtains the pixel values at the pixel positions of the predicted edge feature image and multiplies them, in turn, by the corresponding feature values in each dimension of the first sample edge feature; that is, the feature matrix of each dimension of the first sample edge feature is adjusted according to the pixel values of the predicted edge feature image, yielding the adjusted second sample edge feature.
For example, the pixel values of the predicted edge feature image are values in the interval [0, 1], and the feature values in each dimension of the first sample edge feature are also values in [0, 1]. Larger values represent the face edge: the closer a pixel position in the sample face image is to the face edge, the larger the pixel value at the corresponding position of the predicted edge feature image and the larger the corresponding feature value in each dimension of the first sample edge feature; the farther a pixel position is from the face edge, the smaller both values are. Therefore, multiplying the pixel values of the predicted edge feature image by the corresponding feature values in each dimension of the first sample edge feature makes the feature values at positions far from the face edge even smaller in the adjusted second sample edge feature, so that the feature values at positions close to the face edge are emphasized; that is, the second sample edge feature describes the face edge of the sample face image more accurately.
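The adjustment performed by the attention network is an element-wise re-weighting that tensor broadcasting expresses directly. A sketch with assumed sizes:

```python
import torch

a, b, c = 128, 28, 28                           # illustrative sizes
predicted_edge_image = torch.rand(1, 1, b, c)   # pixel values in [0, 1]
first_edge_feature = torch.rand(1, a, b, c)     # a feature matrices of size b x c

# Broadcasting the single-channel edge image across the a dimensions multiplies
# every feature value by the pixel value at its corresponding pixel position,
# so positions near the face edge keep large values and distant ones shrink.
second_edge_feature = first_edge_feature * predicted_edge_image
```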
It should be noted that, if the predicted edge feature image corresponding to the first sample edge feature has already been obtained in step 405, step 406 need not repeat calling the second decoding network to decode the first sample edge feature; the predicted edge feature image obtained in step 405 can be used directly.
407. And calling a classification network in the image detection model by the computer equipment, and classifying the facial features of the sample to obtain a prediction score corresponding to the facial image of the sample.
When the computer equipment acquires the sample facial features corresponding to the sample facial images, a classification network in the image detection model is called to classify the sample facial features to obtain the prediction scores corresponding to the sample facial images.
The prediction score can represent the possibility that the sample face image is a real face image or the possibility that the sample face image is a fake face image, so that whether the sample face image is the real face image or the fake face image can be determined according to the prediction score. Alternatively, the predicted score may be a numerical value from 0 to 1 and including 0 and 1, or may also be a score in another range, which is not limited in this embodiment of the present application.
It should be noted that, in the embodiment of the present application, a process of processing a sample facial feature by invoking a classification network after fusing a sample texture feature and a first sample edge feature into the sample facial feature is described. In another embodiment, the sample texture feature and the first sample edge feature may not be fused, and the classification network may be directly invoked to process the sample texture feature and the first sample edge feature. Namely, the steps 406 and 407 are replaced by: and calling a classification network in the image detection model, and classifying the sample texture features and the first sample edge features to obtain a prediction score corresponding to the sample face image.
408. The computer equipment trains the image detection model according to the prediction score and the sample score corresponding to the sample face image, the predicted texture feature image and the sample texture feature image corresponding to the sample face image, and the predicted edge feature image and the sample edge feature image corresponding to the sample face image.
After the computer equipment acquires the prediction score, the prediction textural feature image and the prediction edge feature image corresponding to the sample face image, the sample score, the sample textural feature image and the sample edge feature image corresponding to the sample face image are acquired. The prediction score, the prediction texture feature image and the prediction edge feature image are prediction results obtained by processing a sample face image by an image detection model. The sample score, the sample texture feature image and the sample edge feature image are real score, texture feature image and edge feature image corresponding to the sample face image. Therefore, the computer device can adjust parameters of the first coding network, the second coding network and the classification network in the image detection model according to the difference between the prediction score and the sample score, the difference between the prediction texture feature image and the sample texture feature image, and the difference between the prediction edge feature image and the sample edge feature image, so as to train the image detection model, and make the difference between the prediction result output by the image detection model and the real result smaller and smaller.
The sample score is used for indicating whether the sample face image is a real face image or a fake face image. The sample score may be set by default by the computer device. The computer device may set the sample score to represent a likelihood that the sample face image is a real face image, and the sample score corresponding to the real face image in the sample face image is greater than the sample score corresponding to the counterfeit face image. For example, a sample score of 1 is set to indicate that the sample face image is a real face image, and a sample score of 0 is set to indicate that the sample face image is a fake face image. Alternatively, the sample score is set to indicate the likelihood that the sample face image is a fake face image. The sample score corresponding to the real face image in the sample face image is smaller than the sample score corresponding to the forged face image. For example, a sample score of 0 is set to indicate that the sample face image is a real face image, and a sample score of 1 is set to indicate that the sample face image is a fake face image.
Wherein, the sample texture feature image is used for representing the face texture of the sample face image. Optionally, the sample texture feature image may be an LBP (Local Binary Pattern) map of the first sample face image corresponding to the sample face image. Alternatively, the sample texture feature image may also be a depth map, a HOG (Histogram of Oriented Gradients) map, or the like, or an image in another form, which is not limited in this embodiment of the application. Optionally, when the sample face image is a real face image, the computer device uses a preset texture feature image as the sample texture feature image corresponding to the sample face image; for example, the pixel value at every pixel position of the preset texture feature image is 0, that is, the preset texture feature image is a black image. Alternatively, when the sample face image is a forged face image, the computer device may use an LBP network to obtain the LBP map of the first sample face image corresponding to the sample face image.
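A sketch of building the sample texture feature image under these two cases, using scikit-image's local_binary_pattern as a stand-in for the LBP network; the P = 8, R = 1 parameters are common LBP defaults assumed here, not specified by this embodiment.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def texture_label(first_sample_face: np.ndarray, is_real: bool) -> np.ndarray:
    """Sample texture feature image: all-zero (black) for a real face,
    an LBP map of the first sample face image for a forged one."""
    gray = first_sample_face
    if gray.ndim == 3:
        gray = gray.mean(axis=2).astype(np.uint8)  # grayscale input for LBP
    if is_real:
        return np.zeros_like(gray)                 # preset black image
    return local_binary_pattern(gray, P=8, R=1, method="uniform")
```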
The sample edge feature image is used for representing the face edge of the sample face image.
In a possible implementation manner, if the sample face image is a forged face image, the computer device performs labeling processing on the original region and the forged region in the second sample face image to obtain a labeled image. And carrying out feature extraction processing on the labeled image to obtain a sample edge feature image corresponding to the forged face image.
When the sample face image is a forged face image, the computer device acquires the second sample face image corresponding to the sample face image, determines the original region and the forged region in the second sample face image, labels the original region and the forged region to obtain a labeled image, and performs feature extraction on the labeled image to obtain the sample edge feature image corresponding to the second sample face image, namely the sample edge feature image corresponding to the sample face image. The original region refers to the region of the second sample face image that has not been edited or replaced, and the forged region refers to the region that has been edited or replaced.
Optionally, after the computer device determines the original region and the forged region, it sets the pixel value at each pixel position of the original region to 1 and the pixel value at each pixel position of the forged region to 0, so as to obtain the labeled image. The labeled image is then processed through Gaussian blurring, so that the pixel value at any pixel position that differs from the pixel values of its adjacent pixel positions becomes a value between 0 and 1, thereby obtaining a processed first labeled image; the size of the blurring kernel used in Gaussian blurring may be 23 × 23, or a blurring kernel of another size may be used, which is not limited in this embodiment of the present application. The computer device then applies the operation "1 - pixel value" to the pixel value at each pixel position of the first labeled image to obtain a flipped second labeled image, and multiplies the pixel values at the same pixel position of the first labeled image and the second labeled image to obtain the sample edge feature image corresponding to the second sample face image.
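As a concrete illustration of this procedure, a minimal sketch assuming OpenCV follows; the function name and the binary-mask input are hypothetical, and only the 23 × 23 kernel size comes from the embodiment above.

```python
# Sketch of the described edge-label construction; `mask` holds 1 at
# pixel positions of the original region and 0 at those of the forged
# region, as produced by the annotation step above.
import cv2
import numpy as np

def sample_edge_label(mask: np.ndarray, ksize: int = 23) -> np.ndarray:
    m = mask.astype(np.float32)
    # Gaussian blurring turns pixels near the original/forged boundary
    # into values between 0 and 1: the "first labeled image".
    first = cv2.GaussianBlur(m, (ksize, ksize), 0)
    second = 1.0 - first  # the flipped "second labeled image"
    # The per-pixel product peaks on the blending boundary and vanishes
    # inside both regions, giving the sample edge feature image.
    return first * second
```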
In another possible implementation manner, if the sample face image is a real face image, the computer device obtains a preset edge feature image as a sample edge feature image corresponding to the real face image. Optionally, the pixel value corresponding to each pixel position in the preset edge feature image is 0, that is, the preset edge feature image is a black image.
In another possible implementation manner, after the computer device acquires the predicted texture feature image and the sample texture feature image, it acquires the cross entropy between the predicted texture feature image and the sample texture feature image as the first loss value between them. After the computer device acquires the predicted edge feature image and the sample edge feature image, it acquires the cross entropy between the predicted edge feature image and the sample edge feature image as the second loss value between them. After the computer device acquires the prediction score and the sample score, it acquires the cross entropy between the prediction score and the sample score as the third loss value between them. The computer device trains the image detection model according to the first loss value, the second loss value, and the third loss value, so that the results output by the image detection model become more accurate. Optionally, the computer device adds the first loss value, the second loss value, and the third loss value to obtain a total loss value, and trains the image detection model according to the total loss value.
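This loss combination can be sketched as follows in PyTorch; binary cross entropy is assumed here because the predicted images and scores are treated as values in [0, 1], and the tensor and function names are illustrative rather than taken from the patent.

```python
# Sketch of the first, second, and third loss values and their sum.
import torch
import torch.nn.functional as F

def total_detection_loss(pred_texture: torch.Tensor, sample_texture: torch.Tensor,
                         pred_edge: torch.Tensor, sample_edge: torch.Tensor,
                         pred_score: torch.Tensor, sample_score: torch.Tensor) -> torch.Tensor:
    l1 = F.binary_cross_entropy(pred_texture, sample_texture)  # first loss value
    l2 = F.binary_cross_entropy(pred_edge, sample_edge)        # second loss value
    l3 = F.binary_cross_entropy(pred_score, sample_score)      # third loss value
    return l1 + l2 + l3  # the optional total loss used for training
```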
It should be noted that the embodiment of the present application describes, only by way of example, training the image detection model according to the prediction score and the sample score, the predicted texture feature image and the sample texture feature image, and the predicted edge feature image and the sample edge feature image. In another embodiment, step 403 may not be executed, and the computer device trains the image detection model according to the prediction score and the sample score corresponding to the sample face image, and the predicted edge feature image and the sample edge feature image corresponding to the sample face image.
Or, the step 405 may not be executed, and the computer device trains the image detection model only according to the prediction score and the sample score corresponding to the sample face image, and the prediction texture feature image and the sample texture feature image corresponding to the sample face image.
Alternatively, the steps 403 and 405 may not be executed, and the computer device trains the image detection model according to the prediction score and the sample score corresponding to the sample face image.
Fig. 5 is a schematic diagram of training an image detection model provided in an embodiment of the present application, and referring to fig. 5, the image detection model includes a first encoding network 522, a second encoding network 521, a first decoding network 525, a second decoding network 523, an attention network 524, a feature fusion network 526, and a classification network 527. The computer device obtains a sample face image 511, cuts the sample face image 511 to obtain a second sample face image 512, and cuts the second sample face image 512 to obtain a first sample face image 513.
The computer device inputs the first sample face image 513 into the first encoding network 522 to obtain sample texture features 517, and inputs the sample texture features 517 into the first decoding network 525 to obtain a predicted texture feature image 518. The predicted texture feature image 518 and the sample texture feature image 5112 are processed to obtain a first loss value 5113.
The computer device inputs the second sample face image 512 into the second coding network 521 to obtain the first sample edge feature 514, and inputs the first sample edge feature 514 into the second decoding network 523 to obtain the predicted edge feature image 515. The predicted edge feature image 515 and the sample edge feature image 5111 are processed to obtain a second loss value 5114.
The computer device inputs the first sample edge feature 514 and the predicted edge feature image 515 into the attention network 524 resulting in a second sample edge feature 516. The sample texture features 517 and the second sample edge features 516 are input into a feature fusion network 526, resulting in sample facial features 519. Sample facial features 519 are input into the classification network 527 resulting in a prediction score 5110. The computer device processes predicted score 5110 and sample score 5115 to obtain third loss value 5116.
The computer device can train the image detection model based on the first loss value 5113, the second loss value 5114, and the third loss value 5116.
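To make the data flow of Fig. 5 concrete, the following is a minimal structural sketch in PyTorch. Every layer shape, channel count, and the concatenation-based fusion are assumptions, since the embodiment fixes the connections between the networks but not their internal layers; both input crops are also assumed to be resized to the same resolution beforehand.

```python
# Structural sketch of the Fig. 5 pipeline; element numbers in the
# comments refer to the figure. All layer choices are illustrative.
import torch
import torch.nn as nn

class ImageDetectionModel(nn.Module):
    def __init__(self, c: int = 64):
        super().__init__()
        # First encoding network 522 and second encoding network 521.
        self.texture_encoder = nn.Sequential(nn.Conv2d(3, c, 3, 2, 1), nn.ReLU())
        self.edge_encoder = nn.Sequential(nn.Conv2d(3, c, 3, 2, 1), nn.ReLU())
        # First decoding network 525 and second decoding network 523.
        self.texture_decoder = nn.Sequential(nn.Conv2d(c, 1, 3, 1, 1), nn.Sigmoid())
        self.edge_decoder = nn.Sequential(nn.Conv2d(c, 1, 3, 1, 1), nn.Sigmoid())
        # Feature fusion network 526 and classification network 527.
        self.fusion = nn.Conv2d(2 * c, c, 1)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(c, 1), nn.Sigmoid())

    def forward(self, first_face: torch.Tensor, second_face: torch.Tensor):
        texture_feat = self.texture_encoder(first_face)    # sample texture features 517
        edge_feat = self.edge_encoder(second_face)         # first sample edge features 514
        pred_texture = self.texture_decoder(texture_feat)  # predicted texture image 518
        pred_edge = self.edge_decoder(edge_feat)           # predicted edge image 515
        # Attention network 524: rescale the edge features by the predicted
        # edge map to obtain the second sample edge features 516.
        attended = edge_feat * pred_edge
        fused = self.fusion(torch.cat([texture_feat, attended], dim=1))  # features 519
        score = self.classifier(fused)                     # prediction score 5110
        return score, pred_texture, pred_edge
```

Training then amounts to minimizing, over batches of sample face images, the total loss sketched earlier from this model's three outputs.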
According to the method provided by the embodiment of the application, the image detection model is called to extract the sample texture feature and the first sample edge feature of the sample face image, the sample texture feature can distinguish whether the face in the sample face image is an edited face or not, and the first sample edge feature can distinguish whether the face in the sample face image is a replaced face or not, so that the sample face feature used for distinguishing whether the sample face image is a real face image or a forged face image can be obtained by fusing the sample texture feature and the first sample edge feature. Therefore, the sample facial features are classified, a prediction score for representing whether the sample facial image is a real facial image or a fake facial image can be obtained, and the image detection model is trained according to the prediction score and the sample score, so that the trained image detection model can detect whether the facial image is the real facial image or the fake facial image. In the process of training the model, the texture features and the edge features of the face image are considered at the same time, so that the accuracy of the image detection model can be improved, and the accuracy of image detection by using the image detection model is higher. Moreover, each face image comprises texture features and edge features, so that the image detection model trained by the embodiment of the application can be suitable for detecting any face image, and has strong generalization.
Moreover, the first sample face image and the second sample face image corresponding to the sample face image are obtained, and the first sample face image does not include a face edge, so that the sample texture features obtained by coding the first sample face image are not influenced by the face edge, which improves the accuracy of the obtained sample texture features. Because the second sample face image includes the face edge, coding the second sample face image can focus on obtaining the first sample edge feature without considering the sample texture feature, which improves the accuracy of the obtained first sample edge feature. Therefore, the accuracy with which the trained image detection model detects images can be improved.
And moreover, the first sample edge feature and the sample texture feature are fused to obtain a sample face feature corresponding to the sample face image, and when the sample face image is detected according to the sample face feature, the first sample edge feature and the sample texture feature are considered at the same time, so that the obtained detection result is more accurate.
And calling an attention network in the image detection model, and adjusting the first sample edge feature according to the predicted edge feature image to obtain a second sample edge feature, wherein the second sample edge feature can more accurately describe the face edge of the sample face image. Therefore, the image detection model can deeply learn the edge features in the face image through an attention mechanism, so that the accuracy of the image detection model is further improved.
In addition, the image detection model can be trained according to the prediction score and the sample score, the prediction edge feature image and the sample edge feature image, and the prediction texture feature image and the sample texture feature image. Or training the image detection model according to the prediction score and the sample score, the prediction edge characteristic image and the sample edge characteristic image. Or training the image detection model according to the prediction score and the sample score, the prediction texture feature image and the sample texture feature image. Or training the image detection model based only on the prediction scores and the sample scores. Therefore, various schemes for training the image detection model according to the output result of the image detection model are provided, and the flexibility of the training image detection model is improved.
In addition, the setting mode of the sample score corresponding to the sample face image can be set by default by computer equipment or set by a developer according to the actual condition, so that the flexibility of the training image detection model is improved.
After the image detection model is trained, the image detection model can be called to process, and the image can be detected. The following examples will describe the image detection process in detail.
Fig. 6 is a flowchart of another image detection method according to an embodiment of the present application. An execution subject of the embodiment of the present application is a computer device, and referring to fig. 6, the method includes:
601. The computer device acquires a first face image and a second face image corresponding to the target face image.
The computer equipment acquires a target face image, cuts the target face image and respectively obtains a first face image and a second face image. The first face image comprises a face area which does not contain face edges in the target face image, and the second face image comprises a face area which contains face edges in the target face image.
In a possible implementation manner, the computer device performs face detection on the target face image to obtain a first face region that does not include a face edge in the target face image, and cuts the first face region in the target face image to obtain a first face image. And the computer equipment acquires a second face area containing the face edge in the target face image according to the first face area, and cuts the second face area in the target face image to be used as a second face image.
The computer device performs face detection on the acquired target face image to obtain a first face region in the target face image; the first face region does not contain a face edge, so the first face region in the target face image can be cut out to serve as the first face image. Because the first face region does not include the face edge, after the computer device acquires the first face region, it can adjust the target face image according to the first face region to obtain a second face region including the face edge, so that the second face region in the target face image can be cut out to serve as the second face image.
Optionally, after the computer device acquires the first face region, it expands the first face region outward about its center within the target face image to acquire a second face region that includes the first face region, where the ratio between the size of the second face region and the size of the first face region is a preset ratio greater than 1.
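A minimal sketch of this expansion follows; the ratio value 1.3 is a placeholder, since the embodiment only requires the preset ratio to exceed 1, and the function name is hypothetical.

```python
# Expand the first face region (x, y, w, h) about its centre to obtain
# the second face region, clamped to the target face image bounds.
def expand_region(x, y, w, h, img_w, img_h, ratio=1.3):
    cx, cy = x + w / 2.0, y + h / 2.0
    half_w, half_h = w * ratio / 2.0, h * ratio / 2.0
    x0, y0 = max(0, int(cx - half_w)), max(0, int(cy - half_h))
    x1, y1 = min(img_w, int(cx + half_w)), min(img_h, int(cy + half_h))
    return x0, y0, x1 - x0, y1 - y0
```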
Optionally, the computer device invokes a first face detector stored in advance to perform face detection on the target face image, so as to obtain a first face region in the target face image. The first face detector is used for detecting a face area which does not comprise the edge of the face in the face image.
In another possible implementation manner, the computer device performs face detection on the target face image to obtain a third face region including a face edge in the target face image, and performs clipping processing on the third face region in the target face image to serve as a second face image. And the computer equipment acquires a fourth face area which does not contain the face edge in the target face image according to the third face area, and cuts the fourth face area in the target face image to be used as the first face image.
The computer device performs face detection on the acquired target face image to obtain a third face region in the target face image; the third face region includes a face edge, so the third face region in the target face image can be cut out to serve as the second face image. Because the third face region includes the face edge, after the computer device acquires the third face region, it can adjust the target face image according to the third face region to obtain a fourth face region that does not include the face edge, so that the fourth face region in the target face image can be cut out to serve as the first face image.
Optionally, after the computer device acquires the third face region, it shrinks the third face region about its center within the target face image to acquire a fourth face region contained in the third face region, where the ratio between the size of the fourth face region and the size of the third face region is a preset ratio smaller than 1. Optionally, since the third face region includes the fourth face region, the computer device may also crop the fourth face region from the second face image to serve as the first face image.
Optionally, the computer device calls a second face detector stored in advance to perform face detection on the target face image, so as to obtain a third face region in the target face image. The second face detector is used for detecting a face area including a face edge in the face image.
In another possible implementation manner, the computer device performs face detection on the target face image to obtain a fifth face region not including a face edge and a sixth face region including a face edge in the target face image, performs clipping processing on the fifth face region in the target face image to serve as a first face image, and performs clipping processing on the sixth face region in the target face image to serve as a second face image.
Optionally, the computer device invokes a first face detector stored in advance to perform face detection on the target face image, so as to obtain a fifth face area in the target face image. The first face detector is used for detecting a face area which does not comprise the edge of the face in the face image. Optionally, the computer device invokes a second face detector stored in advance to perform face detection on the target face image, so as to obtain a sixth face area in the target face image. The second face detector is used for detecting a face area including a face edge in the face image.
Fig. 7 is a schematic diagram of a face image according to an embodiment of the present application, and referring to fig. 7, a target face image 701 is cut to obtain a first face image 702 and a second face image 703 corresponding to the target face image 701.
602. And the computer equipment calls a first coding network in the image detection model to code the first face image to obtain the texture characteristics corresponding to the first face image.
The image detection model in the embodiment of the application comprises a first coding network, a second coding network, a feature fusion network and a classification network. The feature fusion network is respectively connected with the first coding network and the second coding network, the classification network is connected with the feature fusion network, the first coding network is used for extracting texture features of the face image, the second coding network is used for extracting edge features of the face image, the feature fusion network is used for fusing the texture features and the edge features, and the classification network is used for obtaining scores corresponding to the face image according to the texture features and the edge features.
Since the first face image does not include the face edges, the first face image can be used for extracting accurate texture features. And when the computer equipment acquires a first face image corresponding to the target face image, calling a first coding network in the image detection model, and coding the first face image to obtain texture features corresponding to the first face image, namely texture features corresponding to the target face image. The texture feature may be a multi-dimensional feature matrix or other forms of features, which is not limited in this embodiment of the present application.
603. And the computer equipment calls a second coding network in the image detection model to code the second face image to obtain the edge characteristics corresponding to the second face image.
And when the computer equipment acquires a second face image corresponding to the target face image, calling a second coding network in the image detection model, and coding the second face image to obtain edge features corresponding to the second face image, namely the edge features corresponding to the target face image.
The edge feature may be a multi-dimensional feature matrix or other forms of features, which is not limited in this application.
It should be noted that the embodiment of the present application is described by taking as an example performing step 602 first and then step 603. In another embodiment, step 603 may be performed first and then step 602, that is, the edge feature is obtained before the texture feature. Alternatively, step 602 and step 603 may be performed simultaneously, that is, the first face image and the second face image are processed at the same time to obtain the texture features and the edge features.
604. And calling a feature fusion network in the image detection model by the computer equipment, and carrying out fusion processing on the edge features and the texture features to obtain the face features corresponding to the target face image.
When the computer equipment acquires the texture features and the edge features of the target face image, a feature fusion network in an image detection model is called to perform fusion processing on the edge features and the texture features, and the fused features are the face features corresponding to the target face image.
The fusion method of the edge feature and the texture feature in step 604 is similar to the fusion method of the sample texture feature and the first sample edge feature in step 406, and is not described in detail here.
605. And calling a classification network in the image detection model by the computer equipment, and classifying the facial features to obtain a score corresponding to the target face image.
When the computer equipment acquires the facial features corresponding to the target face image, a classification network in the image detection model is called to classify the facial features to obtain the score corresponding to the target face image.
The score may represent the likelihood that the target face image is a real face image, or the likelihood that it is a forged face image; therefore, whether the target face image is a real face image or a forged face image can be determined from the score. Optionally, the score may range from 0 to 1, inclusive, or may take values in another range, which is not limited in the embodiment of the present application.
It should be noted that the embodiment of the present application describes a process of invoking the classification network to process the facial features after the texture features and the edge features are fused into the facial features. In another embodiment, the texture features and the edge features may not be fused, and the classification network may be directly invoked to process the texture features and the edge features. That is, the above steps 604-605 are replaced by: calling a classification network in the image detection model, and classifying the texture features and the edge features to obtain the score corresponding to the target face image.
606. And under the condition that the score value belongs to a preset numerical range, the computer equipment determines the target face image as a real face image.
The preset numerical range is the numerical range that the score corresponding to a real face image is required to satisfy. After the computer device acquires the score corresponding to the target face image, it judges whether the score belongs to the preset numerical range; if the score belongs to the preset numerical range, the target face image is determined as a real face image, and if not, the target face image is determined as a forged face image.
In a possible implementation manner, if, during the training of the image detection model, the sample score corresponding to the real face image in the sample face image is greater than the sample score corresponding to the forged face image, that is, the score output by the classification network is trained to indicate the possibility that the face image is the real face image, the preset value range is set as: greater than a first preset threshold. And under the condition that the score is larger than a first preset threshold value, the computer equipment determines the target face image as a real face image. The first preset threshold value may be determined by the computer device according to a distribution of preset scores corresponding to a plurality of sample face images obtained in the process of training the image detection model.
After the computer equipment acquires the score, whether the score is larger than a first preset threshold value is judged. And if the score is greater than a first preset threshold value, determining the target face image as a real face image, and if the score is not greater than the first preset threshold value, determining the target face image as a forged face image.
In another possible implementation manner, if, during the training of the image detection model, the sample score corresponding to the real face image in the sample face image is smaller than the sample score corresponding to the fake face image, that is, the score output by the classification network is trained to indicate the possibility that the face image is a fake face image, the preset numerical range is set as: is less than a second preset threshold. And under the condition that the score is smaller than a second preset threshold value, the computer equipment determines the target face image as a real face image. The second preset threshold may be determined by the computer device according to a distribution of preset scores corresponding to a plurality of sample face images obtained in the process of training the image detection model.
After the computer equipment obtains the score, whether the score is smaller than a second preset threshold value is judged. And if the score is smaller than a second preset threshold value, determining the target face image as a real face image, and if the score is not smaller than the second preset threshold value, determining the target face image as a forged face image.
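The decision rule in both conventions can be sketched as a single function; the default thresholds below are placeholders that would in practice be calibrated from the sample-score distributions mentioned above, and the function and parameter names are illustrative.

```python
# Sketch of the score test for both training conventions.
def is_real_face(score: float, real_scores_high: bool = True,
                 first_threshold: float = 0.5,
                 second_threshold: float = 0.5) -> bool:
    if real_scores_high:
        # The score encodes the likelihood of a real face image.
        return score > first_threshold
    # The score encodes the likelihood of a forged face image.
    return score < second_threshold
```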
It should be noted that, the embodiments of the present application are described only by taking an example of implementing image detection by calling an image detection model. In another embodiment, the texture feature and the edge feature of the face image may be extracted by using other methods instead of calling the image detection model, the texture feature and the edge feature are fused to obtain the face feature, the face feature is classified to obtain a score, and the target face image is determined to be the real face image when the score belongs to the preset numerical range.
The method provided by the embodiment of the application extracts the texture features and the edge features of the target face image, the texture features can distinguish whether the face in the target face image is an edited face or not, and the edge features can distinguish whether the face in the target face image is a replaced face or not, so that the face features used for distinguishing whether the target face image is a real face image or a forged face image can be obtained by fusing the texture features and the edge features. The facial features are classified, so that a score for representing whether the target face image is a real face image or a fake face image can be obtained, and whether the target face image is the real face image or not is determined according to whether the score belongs to a preset numerical range or not. In the process of image detection, the texture features and the edge features of the face image are considered at the same time, so that the accuracy of the detection of the face image can be improved. Moreover, each face image comprises texture features and edge features, so that the image detection method provided by the application can be suitable for detecting any face image and has strong generalization.
And moreover, a first face image and a second face image corresponding to the target face image are obtained, and the first face image does not include a face edge, so that texture features obtained by coding the first face image are not affected by the face edge, and the accuracy of the obtained texture features is improved. Because the second face image comprises the face edge, the second face image is coded to only obtain the edge feature without considering the texture feature, and the accuracy of the obtained edge feature is improved. Therefore, the accuracy of the image detection model for detecting the image can be improved.
In addition, according to the method and the device, the first face area of the target face image can be firstly cut to obtain the first face image, then the second face area is determined according to the first face area, and the second face area is cut to obtain the second face image. Or the third face area of the target face image can be cut to obtain a second face image, then the fourth face area is determined according to the third face area, and the fourth face area is cut to obtain the first face image. The method can also be used for cutting a fifth face area of the target face image to obtain a first face image, and cutting a sixth face area of the target face image to obtain a second face image. Therefore, various schemes for acquiring the first human face image and the second human face image according to the target human face image are provided, and the flexibility of acquiring the first human face image and the second human face image is improved.
And moreover, the edge features and the textural features are fused to obtain the facial features corresponding to the target face image, and when the target face image is detected according to the facial features, the edge features and the textural features are considered at the same time, so that the obtained detection result is more accurate.
And determining the target face image as a real face image under the condition that the score value belongs to a preset numerical range. The preset value range can be set according to a setting method of a sample score in the process of training the image detection model, and can be set to be larger than a first preset threshold value or smaller than a second preset threshold value. The setting mode of the sample score can be set by default by a computer device or set by a developer according to the actual situation. Therefore, the flexibility of training the image detection model and the flexibility of detecting the image are improved.
Fig. 8 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present application. Referring to fig. 8, the apparatus includes:
an image obtaining module 801, configured to obtain a first face image and a second face image corresponding to a target face image, where the first face image includes a face region that does not include a face edge in the target face image, and the second face image includes a face region that includes a face edge in the target face image;
the first encoding processing module 802 is configured to perform encoding processing on a first face image to obtain a texture feature corresponding to the first face image;
a second encoding processing module 803, configured to perform encoding processing on the second face image to obtain an edge feature corresponding to the second face image;
the fusion processing module 804 is used for performing fusion processing on the edge features and the texture features to obtain face features corresponding to the target face image;
a classification processing module 805, configured to perform classification processing on the facial features to obtain a score corresponding to the target face image;
a determining module 806, configured to determine the target face image as a real face image if the score belongs to a preset numerical range.
The device provided by the embodiment of the application extracts the texture features and the edge features of the target face image, the texture features can distinguish whether the face in the target face image is an edited face or not, and the edge features can distinguish whether the face in the target face image is a replaced face or not, so that the face features used for distinguishing whether the target face image is a real face image or a forged face image can be obtained by fusing the texture features and the edge features. The facial features are classified, so that a score for representing whether the target face image is a real face image or a forged face image can be obtained, and whether the target face image is the real face image or not is determined according to whether the score belongs to a preset numerical range or not. In the process of image detection, the texture features and the edge features of the face image are considered at the same time, so that the accuracy of the detection of the face image can be improved. Moreover, each face image comprises texture features and edge features, so that the image detection device provided by the application can be suitable for detecting any face image and has strong generalization.
Alternatively, referring to fig. 9, the image acquisition module 801 includes:
the first cropping unit 8011 is configured to perform face detection on the target face image, obtain a first face region that does not include a face edge in the target face image, and crop the first face region in the target face image to serve as a first face image;
the second clipping unit 8012 is configured to, according to the first face area, obtain a second face area including a face edge in the target face image, and perform clipping processing on the second face area in the target face image to obtain a second face image.
Alternatively, referring to fig. 9, the image acquisition module 801 includes:
the third clipping unit 8013 is configured to perform face detection on the target face image, obtain a third face area including a face edge in the target face image, and clip the third face area in the target face image as a second face image;
the fourth cropping unit 8014 is configured to, according to the third face area, acquire a fourth face area that does not include a face edge in the target face image, and perform cropping processing on the fourth face area in the target face image to obtain the first face image.
Optionally, referring to fig. 9, the first encoding processing module 802 is configured to invoke a first encoding network in the image detection model, and perform encoding processing on the first face image to obtain a texture feature corresponding to the first face image;
a second encoding processing module 803, configured to invoke a second encoding network in the image detection model, and perform encoding processing on the second face image to obtain an edge feature corresponding to the second face image;
the fusion processing module 804 is used for calling a feature fusion network in the image detection model, and performing fusion processing on the edge features and the texture features to obtain face features corresponding to the target face image;
and the classification processing module 805 is configured to invoke a classification network in the image detection model, and perform classification processing on the facial features to obtain a score corresponding to the target face image.
Optionally, referring to fig. 9, the determining module 806 includes:
the first determining unit 8061 is configured to determine the target face image as the real face image if the score is greater than a first preset threshold.
Optionally, referring to fig. 9, the determining module 806 includes:
a second determining unit 8062, configured to determine the target face image as a real face image if the score is smaller than a second preset threshold.
Optionally, referring to fig. 9, the apparatus further comprises:
the image obtaining module 801 is further configured to obtain a first sample face image and a second sample face image corresponding to the sample face image, where the first sample face image includes a face region that does not include a face edge in the sample face image, and the second sample face image includes a face region that includes a face edge in the sample face image;
the first encoding processing module 802 is further configured to invoke a first encoding network in the image detection model, and perform encoding processing on the first sample face image to obtain a sample texture feature corresponding to the first sample face image;
the second encoding processing module 803 is further configured to invoke a second encoding network in the image detection model, and perform encoding processing on the second sample face image to obtain a first sample edge feature corresponding to the second sample face image;
the fusion processing module 804 is further configured to invoke a feature fusion network in the image detection model, and perform fusion processing on the sample texture feature and the first sample edge feature to obtain a sample facial feature corresponding to the sample facial image;
the classification processing module 805 is further configured to invoke a classification network in the image detection model, perform classification processing on the sample facial features, and obtain a prediction score corresponding to the sample facial image;
the training module 807 is configured to train the image detection model according to the prediction score and the sample score corresponding to the sample face image.
Optionally, referring to fig. 9, training module 807 includes:
the first decoding unit 8071 is configured to invoke a first decoding network in the image detection model, perform decoding processing on the sample texture feature, and obtain a predicted texture feature image corresponding to the sample texture feature;
the first training unit 8072 is configured to train an image detection model according to the prediction score and the sample score, and the prediction texture feature image and the sample texture feature image corresponding to the sample face image.
Optionally, referring to fig. 9, the training module 807 includes:
a second decoding unit 8073, configured to invoke a second decoding network in the image detection model, and perform decoding processing on the first sample edge feature to obtain a predicted edge feature image corresponding to the first sample edge feature;
the second training unit 8074 is configured to train an image detection model according to the prediction score and the sample score, and the predicted edge feature image and the sample edge feature image corresponding to the sample face image.
Optionally, referring to fig. 9, the apparatus further comprises:
a first edge obtaining module 808, configured to label an original region and a forged region in a second sample face image to obtain a labeled image, and perform feature extraction processing on the labeled image to obtain a sample edge feature image corresponding to the forged face image, where the sample face image is a forged face image; or,
the second edge obtaining module 809 is configured to obtain a preset edge feature image as a sample edge feature image corresponding to the real face image, where the sample face image is a real face image.
Optionally, referring to fig. 9, the fusion processing module 804 is configured to:
calling a second decoding network in the image detection model, and decoding the first sample edge feature to obtain a predicted edge feature image corresponding to the first sample edge feature;
calling an attention network in an image detection model, and adjusting the first sample edge feature according to the predicted edge feature image to obtain a second sample edge feature;
and calling a feature fusion network, and carrying out fusion processing on the sample texture features and the second sample edge features to obtain sample facial features.
Optionally, referring to fig. 9, the predicted edge feature image includes a plurality of pixel values, the first sample edge feature is a multi-dimensional feature matrix, the feature matrix of each dimension includes a plurality of feature values, and the plurality of pixel values are in one-to-one correspondence with the plurality of feature values; a fusion processing module 804, further configured to:
and calling an attention network, and respectively adjusting the corresponding characteristic value in the characteristic matrix of each dimension in the first sample edge characteristic according to a plurality of pixel values of the predicted edge characteristic image to obtain a second sample edge characteristic.
It should be noted that: in the image detection apparatus provided in the foregoing embodiment, only the division of the functional modules is illustrated when performing image detection, and in practical applications, the functions may be distributed by different functional modules as needed, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the functions described above. In addition, the image detection apparatus and the image detection method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail and are not described herein again.
Fig. 10 is a schematic structural diagram of a terminal 1000 according to an exemplary embodiment of the present application. Terminal 1000 can be configured to perform steps performed by a computer device in the image detection methods described above.
In general, terminal 1000 can include: a processor 1001 and a memory 1002.
Processor 1001 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1001 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 1001 may also include a main processor and a coprocessor: the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1001 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1001 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1002 may include one or more computer-readable storage media, which may be non-transitory. The memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 1002 is used to store at least one instruction, the at least one instruction being executed by the processor 1001 to implement the image detection methods provided by the method embodiments herein.
In some embodiments, terminal 1000 may optionally further include: a peripheral interface 1003 and at least one peripheral. The processor 1001, memory 1002, and peripheral interface 1003 may be connected by a bus or signal line. Each peripheral may be connected to the peripheral interface 1003 via a bus, signal line, or circuit board. Specifically, the peripherals include: at least one of a radio frequency circuit 1004, a camera assembly 1005, and a power supply 1006.
Peripheral interface 1003 may be used to connect at least one peripheral associated with I/O (Input/Output) to processor 1001 and memory 1002. In some embodiments, processor 1001, memory 1002, and peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral interface 1003 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.
The radio frequency circuit 1004 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1004 communicates with communication networks and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1004 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1004 may communicate with other devices via at least one wireless communication protocol, including but not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1004 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
The camera assembly 1005 is used to capture images or video. Optionally, camera assembly 1005 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of terminal 1000 and the rear camera is disposed on the back of terminal 1000. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, camera assembly 1005 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
Power supply 1006 is used to provide power to various components in terminal 1000. The power supply 1006 may be ac, dc, disposable or rechargeable. When the power supply 1006 comprises a rechargeable battery, the rechargeable battery may support wired charging or wireless charging. The rechargeable battery may also be used to support fast charge technology.
Those skilled in the art will appreciate that the configuration shown in FIG. 10 is not intended to be limiting and that terminal 1000 can include more or fewer components than shown, or some components can be combined, or a different arrangement of components can be employed.
Fig. 11 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1100 may vary considerably with configuration or performance, and may include one or more processors (CPUs) 1101 and one or more memories 1102, where the memory 1102 stores at least one instruction that is loaded and executed by the processor 1101 to implement the methods provided by the foregoing method embodiments. Of course, the server 1100 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and the server may also include other components for implementing device functions, which are not described herein again.
The server 1100 may be used to perform the steps performed by the computer device in the image detection method described above.
The embodiment of the present application further provides a computer device for image detection, where the computer device includes a processor and a memory, where the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor, so as to implement the image detection method of the foregoing embodiment.
The embodiment of the present application further provides a computer-readable storage medium, where at least one instruction is stored in the computer-readable storage medium, and the at least one instruction is loaded and executed by a processor to implement the image detection method of the foregoing embodiment.
Embodiments of the present application also provide a computer program product including computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image detection method.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only an alternative embodiment of the present application and should not be construed as limiting the present application, and any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. An image detection method, characterized in that the method comprises:
acquiring a first face image and a second face image corresponding to a target face image, wherein the first face image comprises a face area which does not contain a face edge in the target face image, and the second face image comprises a face area which contains a face edge in the target face image;
coding the first face image to obtain texture features corresponding to the first face image;
coding the second face image to obtain an edge feature corresponding to the second face image;
performing fusion processing on the edge features and the texture features to obtain face features corresponding to the target face image;
classifying the facial features to obtain a score corresponding to the target face image, wherein the score is used for determining that the target face image is a real face image or a forged face image, and the classifying process is used for determining that the facial features belong to the facial features of the real face image or the facial features belong to the facial features of the forged face image;
and under the condition that the score value belongs to a preset numerical range, determining the target face image as a real face image.
2. The method according to claim 1, wherein the obtaining of the first face image and the second face image corresponding to the target face image comprises:
performing face detection on the target face image to obtain a first face area which does not contain a face edge in the target face image, and cutting the first face area in the target face image to be used as the first face image;
and according to the first face area, acquiring a second face area containing face edges in the target face image, and cutting the second face area in the target face image to obtain the second face image.
3. The method according to claim 1, wherein the obtaining of the first face image and the second face image corresponding to the target face image comprises:
performing face detection on the target face image to obtain a third face area containing face edges in the target face image, and performing cutting processing on the third face area in the target face image to obtain a second face image;
and according to the third face area, acquiring a fourth face area which does not contain a face edge in the target face image, and cutting the fourth face area in the target face image to be used as the first face image.
4. The method according to claim 1, wherein the encoding the first face image to obtain the texture feature corresponding to the first face image includes: calling a first coding network in an image detection model, and coding the first face image to obtain texture features corresponding to the first face image;
the encoding processing of the second face image to obtain the edge feature corresponding to the second face image includes: calling a second coding network in the image detection model, and coding the second face image to obtain edge features corresponding to the second face image;
the fusion processing of the edge features and the texture features to obtain the face features corresponding to the target face image includes: calling a feature fusion network in the image detection model, and carrying out fusion processing on the edge features and the texture features to obtain face features corresponding to the target face image;
the classifying the facial features to obtain the score corresponding to the target face image includes: and calling a classification network in the image detection model, and classifying the facial features to obtain a score corresponding to the target face image.
5. The method according to claim 1, wherein the determining the target face image as a real face image in the case that the score value belongs to a preset numerical range comprises:
and determining the target face image as a real face image under the condition that the score is larger than a first preset threshold value.
6. The method according to claim 1, wherein the determining the target face image as a real face image if the score value belongs to a preset numerical range comprises:
and determining the target face image as a real face image under the condition that the score is smaller than a second preset threshold value.
7. The method according to claim 4, wherein before the calling a first coding network in an image detection model and performing coding processing on the first face image to obtain the texture feature corresponding to the first face image, the method further comprises:
acquiring a first sample face image and a second sample face image corresponding to a sample face image, wherein the first sample face image comprises a face area which does not contain a face edge in the sample face image, and the second sample face image comprises a face area which contains a face edge in the sample face image;
calling a first coding network in the image detection model, and coding the first sample face image to obtain a sample texture feature corresponding to the first sample face image;
calling a second coding network in the image detection model, and coding the second sample face image to obtain a first sample edge feature corresponding to the second sample face image;
calling a feature fusion network in the image detection model, and carrying out fusion processing on the sample texture features and the first sample edge features to obtain sample facial features corresponding to the sample facial image;
calling a classification network in the image detection model, and carrying out classification processing on the sample facial features to obtain a prediction score corresponding to the sample facial image;
and training the image detection model according to the prediction score and the sample score corresponding to the sample face image.
8. The method of claim 7, wherein the training the image detection model according to the prediction score and a sample score corresponding to the sample face image comprises:
calling a first decoding network in the image detection model, and decoding the sample texture features to obtain a predicted texture feature image corresponding to the sample texture features;
and training the image detection model according to the prediction score and the sample score as well as the predicted texture feature image and the sample texture feature image corresponding to the sample face image.
9. The method of claim 7, wherein the training the image detection model according to the prediction score and a sample score corresponding to the sample face image comprises:
calling a second decoding network in the image detection model, and decoding the first sample edge feature to obtain a predicted edge feature image corresponding to the first sample edge feature;
and training the image detection model according to the prediction score and the sample score, as well as the predicted edge feature image and the sample edge feature image corresponding to the sample face image.
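Claims 8 and 9 add reconstruction supervision: decoders map the intermediate texture and edge features back to feature images, and the image-space errors join the score loss. A sketch under the assumption of an L1 reconstruction distance (the claims do not fix the loss function):

```python
# Hypothetical auxiliary losses for claims 8 and 9; L1 distance is an
# assumed choice of reconstruction loss.
import torch.nn.functional as F

def auxiliary_losses(decoder1, decoder2, texture_feat, edge_feat,
                     sample_texture_img, sample_edge_img):
    pred_texture_img = decoder1(texture_feat)  # first decoding network
    pred_edge_img = decoder2(edge_feat)        # second decoding network
    loss_texture = F.l1_loss(pred_texture_img, sample_texture_img)
    loss_edge = F.l1_loss(pred_edge_img, sample_edge_img)
    return loss_texture + loss_edge  # added to the score loss of claim 7
```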
10. The method according to claim 9, wherein before the training the image detection model according to the prediction score and the sample score, as well as the predicted edge feature image and the sample edge feature image corresponding to the sample face image, the method further comprises:
in the case that the sample face image is a forged face image, labeling an original region and a forged region in the second sample face image to obtain a labeled image, and performing feature extraction processing on the labeled image to obtain a sample edge feature image corresponding to the forged face image; or,
in the case that the sample face image is a real face image, acquiring a preset edge feature image as the sample edge feature image corresponding to the real face image.
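A sketch of one way to build the claim-10 supervision target, assuming OpenCV: for a forged sample, the boundary between the labeled original and forged regions is extracted with a morphological gradient; for a real sample, an all-zero image stands in for the "preset edge feature image". Both choices are assumptions.

```python
# Hypothetical construction of the sample edge feature image of claim 10.
# The morphological-gradient edge extraction and the all-zero preset
# image are illustrative assumptions.
from typing import Optional, Tuple
import cv2
import numpy as np

def sample_edge_image(forged_mask: Optional[np.ndarray],
                      size: Tuple[int, int]) -> np.ndarray:
    if forged_mask is None:
        # Real face image: preset edge feature image (here all zeros).
        return np.zeros(size, dtype=np.float32)
    # Forged face image: the mask labels original (0) vs forged (1)
    # regions of the second sample face image; keep only the boundary.
    kernel = np.ones((3, 3), np.uint8)
    edge = cv2.morphologyEx(forged_mask.astype(np.uint8),
                            cv2.MORPH_GRADIENT, kernel)
    # cv2.resize expects (width, height); size is (height, width).
    return cv2.resize(edge.astype(np.float32), (size[1], size[0]))
```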
11. The method according to claim 7, wherein the calling a feature fusion network in the image detection model to perform fusion processing on the sample texture feature and the first sample edge feature to obtain a sample facial feature corresponding to the sample face image comprises:
calling a second decoding network in the image detection model, and decoding the first sample edge feature to obtain a predicted edge feature image corresponding to the first sample edge feature;
calling an attention network in the image detection model, and adjusting the first sample edge feature according to the predicted edge feature image to obtain a second sample edge feature;
and calling the feature fusion network, and carrying out fusion processing on the sample texture features and the second sample edge features to obtain the sample facial features.
12. The method according to claim 11, wherein the predicted edge feature image comprises a plurality of pixel values, the first sample edge feature is a multi-dimensional feature matrix, the feature matrix of each dimension comprises a plurality of feature values, and the plurality of pixel values are in one-to-one correspondence with the plurality of feature values; the calling an attention network in the image detection model, and adjusting the first sample edge feature according to the predicted edge feature image to obtain a second sample edge feature includes:
and calling the attention network, and respectively adjusting corresponding characteristic values in the characteristic matrix of each dimension in the first sample edge characteristic according to a plurality of pixel values of the predicted edge characteristic image to obtain the second sample edge characteristic.
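Claims 11 and 12 describe the attention adjustment as a pixel-to-feature-value correspondence: each pixel of the predicted edge feature image rescales the matching feature value in every dimension of the first sample edge feature. Read as channel-wise broadcasting (an assumption), this reduces to a single multiply:

```python
# Hypothetical attention adjustment for claims 11 and 12: broadcast the
# single-channel predicted edge feature image across all C dimensions
# of the first sample edge feature. The sigmoid normalization is an
# assumed detail.
import torch

def attention_adjust(first_edge_feat: torch.Tensor,
                     pred_edge_img: torch.Tensor) -> torch.Tensor:
    # first_edge_feat: (N, C, H, W); pred_edge_img: (N, 1, H, W)
    attn = torch.sigmoid(pred_edge_img)
    return first_edge_feat * attn  # the second sample edge feature
```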
13. An image detection apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring a first face image and a second face image corresponding to a target face image, wherein the first face image comprises a face area which does not contain a face edge in the target face image, and the second face image comprises a face area which contains a face edge in the target face image;
the first encoding processing module is used for encoding the first face image to obtain texture features corresponding to the first face image;
the second coding processing module is used for coding the second face image to obtain the edge characteristics corresponding to the second face image;
the fusion processing module is used for carrying out fusion processing on the edge features and the texture features to obtain face features corresponding to the target face image;
the classification processing module is used for classifying the facial features to obtain a score corresponding to the target face image, wherein the score is used for determining whether the target face image is a real face image or a forged face image, that is, whether the facial features belong to a real face image or to a forged face image;
and the determining module is used for determining the target face image as a real face image under the condition that the score value belongs to a preset numerical range.
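Chaining the modules of the apparatus claim gives, for illustration, the following inference path. All module implementations are the assumed sketches above, and the 0.5 threshold is likewise an assumption.

```python
# Hypothetical end-to-end inference mirroring the modules of claim 13,
# for a batch of one image; the threshold value is an assumption.
def detect(image_acquirer, encoder1, encoder2, fusion_classifier,
           target_face_img, threshold: float = 0.5) -> str:
    first_img, second_img = image_acquirer(target_face_img)
    texture_feat = encoder1(first_img)  # first encoding processing module
    edge_feat = encoder2(second_img)    # second encoding processing module
    _, score = fusion_classifier(edge_feat, texture_feat)
    return "real" if score.item() > threshold else "forged"
```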
14. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, the at least one instruction being loaded and executed by the processor to implement the image detection method of any one of claims 1 to 12.
15. A computer-readable storage medium having stored therein at least one instruction, which is loaded and executed by a processor, to implement the image detection method according to any one of claims 1 to 12.
CN202010550677.8A 2020-06-16 2020-06-16 Image detection method, image detection device, computer equipment and storage medium Active CN111652878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010550677.8A CN111652878B (en) 2020-06-16 2020-06-16 Image detection method, image detection device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111652878A CN111652878A (en) 2020-09-11
CN111652878B 2022-09-23

Family

ID=72344312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010550677.8A Active CN111652878B (en) 2020-06-16 2020-06-16 Image detection method, image detection device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111652878B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528760B (en) * 2020-11-24 2024-01-09 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and medium
CN112749686B (en) * 2021-01-29 2021-10-29 腾讯科技(深圳)有限公司 Image detection method, image detection device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034102A (en) * 2018-08-14 2018-12-18 腾讯科技(深圳)有限公司 Human face in-vivo detection method, device, equipment and storage medium
CN110147805A (en) * 2018-07-23 2019-08-20 腾讯科技(深圳)有限公司 Image processing method, device, terminal and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106558025B (en) * 2015-09-29 2021-02-09 腾讯科技(深圳)有限公司 Picture processing method and device
CN107818305B (en) * 2017-10-31 2020-09-22 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN110210393A (en) * 2019-05-31 2019-09-06 百度在线网络技术(北京)有限公司 The detection method and device of facial image
CN110675413B (en) * 2019-09-27 2020-11-13 腾讯科技(深圳)有限公司 Three-dimensional face model construction method and device, computer equipment and storage medium
CN110796593A (en) * 2019-10-15 2020-02-14 腾讯科技(深圳)有限公司 Image processing method, device, medium and electronic equipment based on artificial intelligence
CN111274977B (en) * 2020-01-22 2023-05-23 中能国际高新科技研究院有限公司 Multitasking convolutional neural network model, using method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40028904

Country of ref document: HK

GR01 Patent grant