CN115147895B - Face fake identifying method and device - Google Patents

Face fake identifying method and device Download PDF

Info

Publication number
CN115147895B
CN115147895B (application number CN202210688010.3A)
Authority
CN
China
Prior art keywords
central differential
feature map
attention module
attention
differential attention
Prior art date
Legal status
Active
Application number
CN202210688010.3A
Other languages
Chinese (zh)
Other versions
CN115147895A (en)
Inventor
谭资昌
缪长涛
郭国栋
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210688010.3A
Publication of CN115147895A
Application granted
Publication of CN115147895B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G06V40/168 - Feature extraction; Face representation
    • G06V40/172 - Classification, e.g. identification
    • G06V40/40 - Spoof detection, e.g. liveness detection
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a face fake-identification method and device, an electronic device, and a storage medium, relating to the technical field of artificial intelligence and in particular to deep learning technology, and applicable to face fake-identification scenarios. The specific implementation scheme is as follows: determining an initial feature map of an acquired face image; processing, by each central differential attention module of a plurality of central differential attention modules connected in series, the input feature map of that module based on a central differential convolution method and an attention mechanism, to finally obtain a processed feature map, wherein the input feature map of the first central differential attention module is the initial feature map, and the input feature map of each module after the first is the output feature map of the previous module; and determining, based on the processed feature map, whether the face image is a counterfeit face image. The method and device improve the accuracy of face-image fake-identification results.

Description

Face fake identifying method and device
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to deep learning technology, and specifically to a face fake-identification method and device, a training method and device for a face fake-identification model, an electronic device, and a storage medium, which can be used in face fake-identification scenarios.
Background
With the rapid development of face-forgery technology, various algorithms have emerged that can generate counterfeit face images and videos indistinguishable to the human eye. Counterfeit face data may be misused, for example to spread political propaganda or fabricate fake news, posing a great threat to security. Against this background, face-forgery detection has emerged and received increasing attention. At present, most face fake-identification methods rely on a trained convolutional neural network to classify face images and detect forged face data poorly.
Disclosure of Invention
The disclosure provides a face fake identification method and device, and a training method and device for a face fake identification model, electronic equipment and storage medium.
According to a first aspect, there is provided a face authentication method, including: determining an initial feature map of the acquired face image; processing, by each of the plurality of central differential attention modules in the series, an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of a first central differential attention module of the plurality of central differential attention modules is an initial feature map, and for each central differential attention module subsequent to the first central differential attention module, the input feature map of the central differential attention module is an output feature map of a preceding central differential attention module; based on the processed feature map, it is determined whether the face image is a counterfeit face image.
According to a second aspect, a training method of a face authentication model is provided, including: acquiring a training sample set, wherein a training sample in the training sample set comprises a sample face image and a label for representing whether the sample face image is a fake face image; determining an initial feature map of the input sample face image through an embedding layer; processing, by each of the plurality of central differential attention modules in the series, an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of a first central differential attention module of the plurality of central differential attention modules is an initial feature map, and for each central differential attention module subsequent to the first central differential attention module, the input feature map of the central differential attention module is an output feature map of a preceding central differential attention module; and taking the label corresponding to the input sample face image as expected output of the face fake identifying result obtained by the output layer based on the processed feature map, so as to obtain the face fake identifying model comprising the embedded layer, a plurality of central differential attention modules and the output layer through training by a machine learning method.
According to a third aspect, there is provided a face authentication device, comprising: a first determination unit configured to determine an initial feature map of the acquired face image; a deriving unit configured to process, by each of the plurality of central differential attention modules connected in series, an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally derive a processed feature map, wherein the input feature map of a first central differential attention module of the plurality of central differential attention modules is an initial feature map, and for each central differential attention module following the first central differential attention module, the input feature map of the central differential attention module is an output feature map of a preceding central differential attention module; and a second determination unit configured to determine whether the face image is a fake face image based on the processed feature map.
According to a fourth aspect, there is provided a training device for a face authentication model, including: the training system comprises an acquisition unit, a detection unit and a detection unit, wherein the acquisition unit is configured to acquire a training sample set, and a training sample in the training sample set comprises a sample face image and a label for representing whether the sample face image is a fake face image or not; a training unit configured to determine an initial feature map of the input sample face image through the embedding layer; processing, by each of the plurality of central differential attention modules in the series, an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of a first central differential attention module of the plurality of central differential attention modules is an initial feature map, and for each central differential attention module subsequent to the first central differential attention module, the input feature map of the central differential attention module is an output feature map of a preceding central differential attention module; and taking the label corresponding to the input sample face image as expected output of the face fake identifying result obtained by the output layer based on the processed feature map, so as to obtain the face fake identifying model comprising the embedded layer, a plurality of central differential attention modules and the output layer through training by a machine learning method.
According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first and second aspects.
According to a sixth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method as described in any implementation of the first and second aspects.
According to the technology of the present disclosure, a face fake-identification method is provided in which each of a plurality of serially connected central differential attention modules performs feature processing based on a central differential convolution method and an attention mechanism, so as to capture local, fine-grained forgery traces of the face image in the spatial domain, thereby improving the accuracy of the fake-identification result for the face image.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram to which an embodiment according to the present disclosure may be applied;
FIG. 2 is a flow chart of one embodiment of a face authentication method according to the present disclosure;
fig. 3 is a schematic diagram of an application scenario of the face authentication method according to the present embodiment;
FIG. 4 is a flow chart of yet another embodiment of a face authentication method according to the present disclosure;
FIG. 5 is a schematic diagram of a face authentication model according to the present disclosure;
FIG. 6 is a flow chart of one embodiment of a training method for a face authentication model according to the present disclosure;
FIG. 7 is a block diagram of one embodiment of a face-based authentication device according to the present disclosure;
FIG. 8 is a block diagram of one embodiment of a training device for a face authentication model according to the present disclosure;
FIG. 9 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of users' personal information comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
Fig. 1 illustrates an exemplary architecture 100 to which the face authentication method and apparatus, and the training method and apparatus of the face authentication model of the present disclosure may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. Communication connections among the terminal devices 101, 102, 103 constitute a topology network; the network 104 is the medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The terminal devices 101, 102, 103 may be hardware devices or software supporting network connections for data interaction and data processing. When the terminal device 101, 102, 103 is hardware, it may be various electronic devices supporting network connection, information acquisition, interaction, display, processing, etc., including but not limited to smartphones, tablet computers, electronic book readers, laptop and desktop computers, etc. When the terminal devices 101, 102, 103 are software, they can be installed in the above-listed electronic devices. It may be implemented as a plurality of software or software modules, for example, for providing distributed services, or as a single software or software module. The present invention is not particularly limited herein.
The server 105 may be a server that provides various services, for example, a background processing server that acquires face images provided by the terminal devices 101, 102, 103 and performs feature processing based on a central differential convolution method and an attention mechanism through each of a plurality of serially connected central differential attention modules, finally obtaining a face fake-identification result. For another example, it may be a background processing server that trains a face fake-identification model using training samples provided by the terminal devices 101, 102, 103. As an example, the server 105 may be a cloud server.
The server may be hardware or software. When the server is hardware, the server may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (e.g., software or software modules for providing distributed services), or as a single software or software module. The present invention is not particularly limited herein.
It should also be noted that the face fake-identification method and the training method of the face fake-identification model provided by the embodiments of the present disclosure may be executed by a server, by a terminal device, or by the server and the terminal device in cooperation with each other. Accordingly, the parts (for example, the units) included in the face fake-identification device and in the training device of the face fake-identification model may all be provided in the server, all in the terminal device, or distributed between the server and the terminal device.
It should be understood that the numbers of terminal devices, networks, and servers in fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as needed by the implementation. When the electronic device on which the face fake-identification method and the training method of the face fake-identification model run does not need to exchange data with other electronic devices, the system architecture may include only that electronic device (e.g., a server or a terminal device).
Referring to fig. 2, fig. 2 is a flowchart of a face authentication method provided in an embodiment of the disclosure, where the flowchart 200 includes the following steps:
step 201, determining an initial feature map of the acquired face image.
In this embodiment, the executing body of the face authentication method (for example, the terminal device or the server in fig. 1) may acquire the face image from a remote location or from a local location based on a wired network connection manner or a wireless network connection manner, and determine an initial feature map of the acquired face image.
The face image is image data containing a face object, and may be a still image containing a face object or a video frame from a dynamic video containing a face object. To determine the authenticity of the face object in a face image, fake identification of the face object in the face image is required.
In order to identify the facial image, the executing body firstly performs feature extraction on the acquired facial image to obtain an initial feature map. As an example, the above-mentioned execution subject may perform feature extraction on the face image through a convolution operation, resulting in an initial feature map.
As yet another example, the above-described execution body may construct a feature embedding layer using spatial depth convolution and a linear layer (full-connection layer), and further perform feature extraction on the face image through the feature embedding layer, resulting in an initial feature map.
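The feature-embedding step described above (a spatial convolution followed by a linear, fully connected layer) can be sketched in NumPy as follows. This is a minimal sketch under assumptions: the 4x4 patch size, the dimensions, and the random weight matrices are illustrative stand-ins for the learned parameters, not values from the patent.

```python
import numpy as np

def embed(image, conv_w, proj_w, patch=4):
    """Toy feature-embedding layer: a strided spatial convolution realized as
    non-overlapping patch extraction plus a matrix multiply, followed by a
    linear (fully connected) projection. image: (H, W, C) array."""
    h, w, c = image.shape
    # Split the image into non-overlapping patch x patch blocks (stride = kernel size).
    patches = image.reshape(h // patch, patch, w // patch, patch, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)
    tokens = patches @ conv_w          # spatial convolution as a matrix multiply
    return tokens @ proj_w             # linear (fully connected) layer

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32, 3))            # stand-in for a face image
conv_w = rng.standard_normal((4 * 4 * 3, 64))     # hypothetical learned weights
proj_w = rng.standard_normal((64, 64))
feat = embed(img, conv_w, proj_w)
print(feat.shape)  # (64, 64): 8x8 = 64 tokens, each of dimension 64
```

The resulting token grid is what the patent calls the initial feature map, which is then fed to the first central differential attention module.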
Step 202, processing, by each central differential attention module of the plurality of central differential attention modules connected in series, the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map.
In this embodiment, the executing body may process, through each central differential attention module of the plurality of central differential attention modules connected in series, the input feature map of the central differential attention module based on the central differential convolution method and the attention mechanism, so as to finally obtain the processed feature map. The input feature map of a first central differential attention module of the plurality of central differential attention modules is an initial feature map, and for each central differential attention module following the first central differential attention module, the input feature map of the central differential attention module is an output feature map of a previous central differential attention module.
As an example, for a first central differential attention module in the central differential attention modules, taking the initial feature map as input, performing central differential convolution on the initial feature map by a central differential convolution method to obtain a central differential convolution feature map, and further performing feature processing on the central differential convolution feature map based on an attention mechanism to obtain an output feature map of the first central differential attention module.
For each subsequent central differential attention module, the output feature map of the previous module serves as its input feature map, and the module continues, building on the previous module's feature processing, to process its input feature map based on the central differential convolution method and the attention mechanism, obtaining a corresponding output feature map. Finally, the output feature map of the last central differential attention module is taken as the processed feature map.
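The serial chaining just described is a simple fold over the module list: each module consumes the previous module's output. A generic sketch, with toy callables standing in for central differential attention modules (an assumption for illustration only):

```python
from functools import reduce

def run_modules(initial_feature_map, modules):
    """Chain serially connected modules: the first module takes the initial
    feature map; every later module takes the previous module's output."""
    return reduce(lambda fmap, module: module(fmap), modules, initial_feature_map)

# Toy stand-ins for central differential attention modules: here each
# "module" is just a callable from feature map to feature map.
modules = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_modules(10, modules))  # ((10 + 1) * 2) - 3 = 19
```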
Standard convolution mainly consists of two steps, sampling and aggregation; central differential convolution adds a central-difference step between the sampling step and the aggregation step. Specifically, for a given feature map $X \in \mathbb{R}^{H \times W \times C}$ (H, W, and C denote the height, width, and number of channels of the feature map, respectively), an ordinary two-dimensional standard convolution is expressed as:

$$Y(p_0) = \sum_{p_n \in R} W(p_n) \cdot X(p_0 + p_n)$$

where Y denotes the convolved feature map, $W(p_n)$ the convolution weights, $p_0$ the current position on the feature maps before and after convolution, and $p_n$ an arbitrary position within the receptive-field region R.

Central differential convolution differs from ordinary convolution in the aggregation operation: it aggregates the center-oriented gradients of the sampled values. It can be expressed as:

$$Y(p_0) = \sum_{p_n \in R} W(p_n) \cdot \big(X(p_0 + p_n) - X(p_0)\big)$$

When $p_n = (0, 0)$, the gradient relative to the center position $p_0$ is always zero. The central difference enhances the ability of ordinary convolution to describe fine-grained invariant information. Forged face images produced by different forgery methods differ, but the forgery traces retained in them are similar or consistent across forgery modes; compared with standard convolution, central differential convolution therefore strengthens the ability to capture this invariant information, i.e., the similar or consistent forgery traces.
In this embodiment, the number of central differential attention modules may be set according to the actual situation: when the number of modules is small, the processing depth of the face fake-identification model (composed mainly of central differential attention modules) over the feature map is shallow; when the number is large, the processing depth is deep. As an example, the number of central differential attention modules may be 4.
Step 203, based on the processed feature map, it is determined whether the face image is a fake face image.
In this embodiment, the executing body may determine whether the face image is a fake face image based on the processed feature map.
As an example, the execution subject may input the processed feature map into a softmax classification layer to obtain the fake-identification result indicating whether the face image is a fake face image. Specifically, the classification layer may output the probabilities that the face image is a fake face and a real face; when the fake-face probability exceeds a preset probability threshold, the face image is a fake face image, and when the real-face probability exceeds the preset threshold, the face image is a real face image.
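The thresholding logic above can be sketched as follows. The two-class logit layout and the 0.5 threshold are illustrative assumptions; the patent only specifies a softmax layer with a preset probability threshold.

```python
import numpy as np

def classify(logits, threshold=0.5):
    """Map the classifier's two logits [fake, real] to a decision.
    Returns 'fake', 'real', or 'uncertain' when neither class
    probability exceeds the preset threshold."""
    e = np.exp(logits - logits.max())      # numerically stable softmax
    p_fake, p_real = e / e.sum()
    if p_fake > threshold:
        return "fake"
    if p_real > threshold:
        return "real"
    return "uncertain"

print(classify(np.array([3.0, -1.0])))   # fake
print(classify(np.array([-2.0, 2.0])))   # real
print(classify(np.array([0.0, 0.0])))    # uncertain (exactly 0.5 each)
```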
In this embodiment, the executing body may execute the steps 201 to 203 through a face authentication model. Specifically, the face authentication model includes a feature embedding layer that performs step 201, a plurality of central differential attention modules that perform step 202, and a classification layer that performs step 203.
With continued reference to fig. 3, fig. 3 is a schematic diagram 300 of an application scenario of the face authentication method according to the present embodiment. In the application scenario of fig. 3, a user 301 sends a face image 304 to a server 303 through a terminal device 302. The server 303 first determines an initial feature map 306 of the acquired face image through a feature embedding layer 3051 in the face authentication model 305; the input feature map of the central differential attention module is processed based on a central differential convolution method and an attention mechanism by each central differential attention module 3052 of the plurality of central differential attention modules in series to finally obtain a processed feature map 307. Wherein the input profile of a first central differential attention module of the plurality of central differential attention modules is an initial profile. Finally, based on the processed feature map 307, whether the face image is a fake face image is determined through the classification layer 3053, and a face fake identification result is obtained.
This embodiment provides a face fake-identification method in which each of a plurality of serially connected central differential attention modules performs feature processing based on a central differential convolution method and an attention mechanism, capturing local, fine-grained forgery traces of the face image in the spatial domain and thereby improving the accuracy of the fake-identification result.
In some optional implementations of this embodiment, the executing body may execute step 202 as follows, performing the following steps through each of the plurality of central differential attention modules:
first, the input feature map of the central differential attention module is convolved to obtain a convolution feature map. As an example, the above-described execution subject may convolve the input feature map of the central differential attention module by standard convolution to obtain a convolved feature map.
Second, based on the convolution feature map, a query vector is obtained.
In this implementation, the execution entity may transform the two-dimensional convolution feature map into a one-dimensional query (queries) vector $q \in \mathbb{R}^{N \times D}$ through flattening and unrolling operations. Each pixel in the query vector may be regarded as a token.
Third, central differential convolution is performed on the convolution feature map by the central differential convolution method to obtain a key vector and a value vector.

In this implementation, the execution body first performs central differential convolution on the convolution feature map to obtain a central differential convolution feature map $X_{kv} \in \mathbb{R}^{H \times W \times C}$. The two-dimensional central differential convolution feature map $X_{kv}$ is then flattened into a one-dimensional feature sequence $x_{kv} \in \mathbb{R}^{N \times D}$, which is finally projected by the mapping matrices $W_k$ and $W_v$ into a key (keys) vector k and a value (values) vector v, respectively.
Fourth, the query vector, the key vector and the value vector are processed through an attention mechanism, and an output characteristic diagram of the central differential attention module is obtained.
As an example, the execution body determines the feature with higher attention in the query vector, the key vector and the value vector through the attention mechanism, and obtains the output feature diagram of the central differential attention module.
This implementation provides a concrete way to process the input feature map of a central differential attention module based on the central differential convolution method and the attention mechanism: the query, key, and value vectors are obtained by feature processing based on central differential convolution, and the output feature map is then derived through attention processing, further improving the module's ability to capture local, fine-grained forgery traces of the face image in the spatial domain.
In some optional implementations of this embodiment, the executing body may execute the fourth step by: firstly, processing a query vector, a key vector and a value vector through a multi-head self-attention mechanism to obtain a head characteristic diagram corresponding to each head; and obtaining an output characteristic diagram of the central differential attention module based on the head characteristic diagrams corresponding to the heads.
Specifically, taking a two-dimensional feature map $x \in \mathbb{R}^{N \times D}$ (N denotes the number of tokens, D the dimension of each token) as input, a multi-head self-attention mechanism with M heads is formulated as:

$$q = xW_q, \quad k = xW_k, \quad v = xW_v$$

$$z_m = \sigma\!\left(\frac{q_m k_m^{T}}{\sqrt{d}}\right) v_m, \quad m = 1, \dots, M$$

$$z = \mathrm{cat}(z_1, \dots, z_M)\, W_o$$

where $\sigma(\cdot)$ denotes the softmax function, $d = D/M$ denotes the dimension of each head, $z_m$ denotes the embedded output of the m-th attention head, $q_m, k_m, v_m \in \mathbb{R}^{N \times d}$ denote the per-head query, key, and value vectors respectively, $W_q, W_k, W_v, W_o$ denote the mapping matrices of the query vector, key vector, value vector, and attention output respectively, and $\mathrm{cat}(\cdot)$ denotes concatenation.
In this implementation manner, a specific manner of processing the query vector, the key vector and the value vector based on a multi-head self-attention mechanism is provided, further improving the expressive power of the output feature map for local, fine-grained fake traces in the spatial domain.
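The multi-head self-attention formulas above can be sketched in PyTorch as follows. This is a minimal illustration, not the patent's code: the function name and tensor shapes are assumptions, and q, k, v are taken as already computed by the preceding projection step.

```python
import torch

def multi_head_self_attention(q, k, v, num_heads, W_o):
    """Multi-head self-attention over precomputed q, k, v of shape (N, D)."""
    N, D = q.shape
    d = D // num_heads  # per-head dimension d = D/M
    # split into heads: (M, N, d)
    qh = q.view(N, num_heads, d).transpose(0, 1)
    kh = k.view(N, num_heads, d).transpose(0, 1)
    vh = v.view(N, num_heads, d).transpose(0, 1)
    # sigma(q_m k_m^T / sqrt(d)) v_m for every head at once
    attn = torch.softmax(qh @ kh.transpose(1, 2) / d ** 0.5, dim=-1)
    zh = attn @ vh                        # per-head embeddings z_m
    z = zh.transpose(0, 1).reshape(N, D)  # cat(z_1, ..., z_M)
    return z @ W_o                        # final output projection

N, D, M = 4, 8, 2
q, k, v = (torch.randn(N, D) for _ in range(3))
W_o = torch.randn(D, D)
z = multi_head_self_attention(q, k, v, M, W_o)
print(z.shape)  # torch.Size([4, 8])
```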
In some alternative implementations of this embodiment, a high frequency wavelet sampler is provided between two adjacent central differential attention modules. In this implementation manner, the execution body may execute the step 202 as follows:
first, for each central differential attention module in a plurality of central differential attention modules, processing an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to obtain an output feature map of the central differential attention module, and extracting high-frequency features in the output feature map of the central differential attention module through a high-frequency wavelet sampler between the central differential attention module and a next central differential attention module to obtain an input feature map of the next central differential attention module.
After the output feature map of the central differential attention module is obtained, the high-frequency information of that output feature map is further extracted by the subsequent high-frequency wavelet sampler.
Existing deep neural networks employ a variety of downsampling operations, such as max pooling, average pooling and strided convolution. Max pooling and average pooling are simple and effective, but some research work indicates that they may discard beneficial image details. Although researchers have introduced Mixed Pooling, Stochastic Pooling and MaxBlur Pooling to address these problems, none of them considers the inconsistency between real and fake faces in the frequency domain. The high-frequency wavelet sampler is quite different from these common samplers: it uses the discrete wavelet transform, which not only performs feature downsampling but also decomposes an image into low-frequency and high-frequency components.
And secondly, taking the output characteristic diagram of the last central differential attention module as a processed characteristic diagram.
In this implementation manner, the executing body further extracts high-frequency information in the frequency domain through the high-frequency wavelet sampler on the basis of performing feature processing through each central differential attention module, so that the input feature map of each central differential attention module has the high-frequency features of the space domain and the frequency domain at the same time, and the expressive force of the finally obtained processed feature map on the fake trace of local and fine granularity is further improved.
In some optional implementations of this embodiment, the foregoing execution body may execute the process of extracting the high-frequency information by the high-frequency wavelet sampler by:
first, by a high-frequency wavelet sampler between the center differential attention module and the next center differential attention module, various high-frequency components of each channel in an output characteristic diagram of the center differential attention module in a frequency domain are decomposed based on a discrete wavelet transform mode. And then splicing the same high-frequency components corresponding to the channels to obtain the spliced high-frequency components. And finally, cascading the spliced high-frequency components to obtain the high-frequency characteristics in the output characteristic diagram of the central differential attention module so as to determine the input characteristic diagram corresponding to the next central differential attention module.
In this implementation, the execution entity first decomposes, in the frequency domain, the various high-frequency components of each channel in the output feature map of the preceding central differential attention module by a discrete wavelet transform, which essentially captures different frequencies at different resolutions. The classical two-dimensional discrete wavelet transform comprises two filters, a low-pass filter L and a high-pass filter H.
The low-pass filter and the high-pass filter are specifically expressed as:

L^T = (1/√2)[1, 1], H^T = (1/√2)[1, −1]
In particular, the low-pass filter focuses on smooth surfaces, which are primarily associated with low-frequency signals, while the high-pass filter captures most of the high-frequency signals, such as vertical, horizontal and diagonal edge signals. These two filters can be combined pairwise to form four kernels, corresponding to the LL, LH, HL and HH components.
For a given feature map X ∈ R^(C×H×W) (C, H and W respectively represent the number of channels, the height and the width of the feature map), a discrete wavelet transform operation is performed on each channel. Specifically, for the feature X_i of the i-th channel, the sub-band features generated by the first-order decomposition are as follows:

X_i^ll = LX_iL^T, X_i^lh = HX_iL^T, X_i^hl = LX_iH^T, X_i^hh = HX_iH^T

where i ∈ {0, 1, …, C−1}.
These features are then stacked and connected together in the channel dimension and denoted as X_ll, X_lh, X_hl and X_hh.
By analyzing the wavelet sub-bands (components) of real face images and their corresponding fake face images, it is found that the LL sub-band, composed mainly of low-frequency information, depicts the overall appearance common to real and fake face images, while the LH, HL and HH sub-bands contain information representing the subtle artifacts and fake traces of fake face images (e.g., blending boundaries, checkerboard and blurring artifacts). Since the low-frequency information of a fake face image is essentially an approximation of the original image, and many research efforts have shown that the high frequencies (LH, HL and HH) contribute to fake face detection, this implementation does not use the LL wavelet sub-band for the fake face detection task. The high-frequency features LH, HL and HH are aggregated together by a channel concatenation (cat), which can be expressed as:
X_h = cat(X_lh, X_hl, X_hh)

where X_h ∈ R^(3C×(H/2)×(W/2)).
this not only aggregates the high frequency channel features, but also reduces the resolution of the input feature map.
In this implementation, a specific process of extracting high-frequency features by the high-frequency wavelet sampler is provided, further improving the expressive power of the input feature map of each central differential attention module for local, fine-grained fake traces.
In some optional implementations of this embodiment, the executing body may execute the following operations to concatenate the spliced high-frequency components to obtain the high-frequency features in the output feature map of the central differential attention module, and obtain the input feature map corresponding to the next central differential attention module:
firstly, cascading all spliced high-frequency components to obtain cascading high-frequency characteristics; and then, carrying out layer normalization on the cascade high-frequency characteristics to obtain an input characteristic diagram of the next central differential attention module.
Specifically, the above-mentioned execution body employs layer normalization (LayerNorm) and a linear layer to reduce the channel dimension, specifically expressed as:

X' = Linear(LayerNorm(X_h))

where the linear layer maps the 3C concatenated channels of X_h down and X' represents the input feature map of the next central differential attention module.
in the implementation mode, the characteristic processing is carried out through the layer normalization and the linear layer, so that the data volume of the characteristic is reduced on the basis of keeping the characteristic expressive force, and the information processing efficiency is improved.
In some optional implementations of this embodiment, the executing body may execute the step 202 by:
first, a supplemental feature map derived based on the initial feature map is provided for each of a plurality of center differential attention modules based on a jump connection.
As an example, the execution subject uses the initial feature map as a data base, and performs operations such as downsampling to obtain complementary feature maps of the same size as the input feature map of each central differential attention module.
And secondly, processing the supplementary feature map and the input feature map corresponding to the central differential attention module based on a central differential convolution method and an attention mechanism through each central differential attention module in the plurality of central differential attention modules connected in series to finally obtain a processed feature map.
The supplementary feature map may be subjected to an element-wise addition operation with the input feature map to obtain a fused feature map, and the fused feature map is further input to the central differential attention module. And supplementing the space perception local information for a multi-level network formed by the differential attention modules of each center through the supplementing feature diagram.
In the implementation mode, space perception local information is supplemented for a multi-stage network formed by the differential attention modules of each center in a jump connection mode, and the accuracy of the feature processing process is further improved.
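The jump-connection supplement described above can be sketched as follows, assuming average pooling as the downsampling operation (the text only says "operations such as downsampling" without fixing the operator); function names are illustrative.

```python
import torch
import torch.nn.functional as F

def supplementary_map(initial, target_hw):
    """Downsample the initial feature map to the spatial size expected by a
    given central differential attention module (average pooling assumed)."""
    return F.adaptive_avg_pool2d(initial, target_hw)

def fuse(supplementary, module_input):
    """Element-wise addition of the supplementary and input feature maps,
    producing the fused feature map fed into the module."""
    return module_input + supplementary

initial = torch.randn(1, 8, 16, 16)     # initial feature map from the embedding layer
module_input = torch.randn(1, 8, 4, 4)  # input feature map of a later module
fused = fuse(supplementary_map(initial, (4, 4)), module_input)
print(fused.shape)  # torch.Size([1, 8, 4, 4])
```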
With continued reference to fig. 4, there is shown a schematic flow 400 of yet another embodiment of a face authentication method according to the present disclosure, comprising the steps of:
Step 401, determining an initial feature map of the acquired face image.
Step 402, for a first central differential attention module of the plurality of central differential attention modules, processing the initial feature map based on a central differential convolution method and an attention mechanism to obtain an output feature map of the central differential attention module.
Step 403, for each subsequent central differential attention module, extracting high-frequency features in the output feature map of the previous central differential attention module by using a high-frequency wavelet sampler between the central differential attention module and the previous central differential attention module, and obtaining an input feature map of the central differential attention module.
Step 404, obtaining a supplementary feature map obtained based on the initial feature map and provided for the central differential attention module based on the jump connection mode.
And step 405, processing the supplementary feature map and the input feature map corresponding to the central differential attention module based on the central differential convolution method and the attention mechanism to finally obtain a processed feature map.
Step 406, based on the processed feature map, it is determined whether the face image is a fake face image.
As can be seen from this embodiment, compared with the embodiment corresponding to fig. 2, the flow 400 of the face fake identifying method in this embodiment specifically illustrates the feature processing process based on the central differential attention module and the high-frequency wavelet sampler, and the process of providing the complementary feature map for each central differential attention module based on the jump connection mode, so that the expressive force of the obtained feature map on the local and fine-grained fake traces of the spatial domain is further improved, and the accuracy of the fake identifying result for the face image is improved.
With continued reference to fig. 5, a schematic diagram of the structure of the face authentication model is shown. The face authentication model 500 includes a feature embedding layer 501, four central differential attention modules 502, 503, 504 and 505, three high-frequency wavelet samplers 506, 507 and 508, and a classification layer 509. The initial feature map of the face image determined by the feature embedding layer 501 is input into the central differential attention module 502, which processes the initial feature map based on a central differential convolution method and a multi-head self-attention mechanism to obtain an output feature map; high-frequency features are extracted from the output feature map of the central differential attention module 502 by the high-frequency wavelet sampler 506 to obtain the input feature map of the central differential attention module 503; the local jump-connection strategy provides a supplementary feature map, derived from the initial feature map, for the central differential attention module 503; and the central differential attention module 503 performs feature processing on the corresponding input feature map and supplementary feature map based on the central differential convolution method and the attention mechanism.
By cyclically executing the above process, the processed feature map output by the central differential attention module 505 is finally obtained, so as to determine whether the input face image is a fake face image.
With continued reference to fig. 6, there is shown a schematic flow 600 of one embodiment of a training method for a face authentication model according to the present disclosure, comprising the steps of:
step 601, a training sample set is obtained.
In this embodiment, the execution subject of the training method of the face authentication model (for example, the terminal device or the server in fig. 1) may acquire the training sample set from a remote location or from a local location based on a wired network connection manner or a wireless network connection manner.
The training samples in the training sample set comprise sample face images and labels for representing whether the sample face images are fake face images or not. The training sample set includes both forged sample face images and true sample face images.
Step 602, determining an initial feature map of an input sample face image through an embedding layer; processing an input feature map of each central differential attention module in the plurality of central differential attention modules connected in series based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map; and taking the label corresponding to the input sample face image as expected output of the face fake identifying result obtained by the output layer based on the processed feature map, so as to obtain the face fake identifying model comprising the embedded layer, a plurality of central differential attention modules and the output layer through training by a machine learning method.
In this embodiment, the executing body may determine an initial feature map of the input sample face image through an embedding layer; processing an input feature map of each central differential attention module in the plurality of central differential attention modules connected in series based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map; and taking the label corresponding to the input sample face image as expected output of the face fake identifying result obtained by the output layer based on the processed feature map, so as to obtain the face fake identifying model comprising the embedded layer, a plurality of central differential attention modules and the output layer through training by a machine learning method. The input feature map of a first central differential attention module of the plurality of central differential attention modules is an initial feature map, and for each central differential attention module following the first central differential attention module, the input feature map of the central differential attention module is an output feature map of a previous central differential attention module.
In this embodiment, for an input sample face image, the face fake identifying model outputs an actual face fake identifying result for the sample face image; further, determining cross entropy loss between the actual face identification result and the label corresponding to the input sample face image; further, parameters of the embedded layer, the plurality of center differential attention modules, and the output layer are updated based on the cross entropy loss.
The training operation is performed cyclically, and the trained face authentication model is obtained in response to a preset ending condition being reached. The preset ending condition may be, for example, that the training time exceeds a preset time threshold, that the number of training iterations exceeds a preset count threshold, or that the training loss converges.
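The training operation above (forward pass, cross-entropy loss against the label, parameter update) can be sketched as follows. The stand-in model is a placeholder for the real architecture (embedding layer, central differential attention modules, high-frequency wavelet samplers and output layer), and all names and shapes are illustrative.

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, images, labels):
    """One training iteration: forward pass, cross-entropy loss between the
    predicted authentication result and the labels, then a parameter update."""
    logits = model(images)
    loss = nn.functional.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# stand-in binary classifier over flattened images (placeholder for the
# actual face authentication model described in the text)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.randn(4, 3, 8, 8)
labels = torch.randint(0, 2, (4,))  # 1 = fake face, 0 = real face
loss = train_step(model, optimizer, images, labels)
print(loss >= 0)  # True: cross-entropy loss is non-negative
```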
In the embodiment, the face fake identifying model performs feature processing based on a central difference convolution method and an attention mechanism through each central difference attention module in a plurality of central difference attention modules connected in series so as to capture fake marks of local parts and fine granularity of the face image in a space domain, and accuracy of the face fake identifying model on fake identifying results of the face image is improved.
In some optional implementations of this embodiment, the executing entity may process, by executing, through each of the plurality of central differential attention modules in the series, an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism, by:
by each of the plurality of center differential attention modules, performing the following:
firstly, convolving an input feature map of the central differential attention module to obtain a convolution feature map; secondly, obtaining a query vector based on the convolution feature map; thirdly, performing center differential convolution on the convolution feature map by a center differential convolution method to obtain a key vector and a value vector; fourth, the query vector, the key vector and the value vector are processed through an attention mechanism, and an output characteristic diagram of the central differential attention module is obtained.
In some optional implementations of this embodiment, the executing body may execute the fourth step by: firstly, processing a query vector, a key vector and a value vector through a multi-head self-attention mechanism to obtain a head characteristic diagram corresponding to each head; then, based on the head characteristic diagrams corresponding to the heads, an output characteristic diagram of the central differential attention module is obtained.
In some optional implementations of this embodiment, the face authentication model further includes a high frequency wavelet sampler disposed between two adjacent central differential attention modules. In this implementation manner, the executing body may execute the following manner to process, by using each of the plurality of central differential attention modules connected in series, an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism, so as to finally obtain a processed feature map:
firstly, for each central differential attention module in a plurality of central differential attention modules, processing an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to obtain an output feature map of the central differential attention module, and extracting high-frequency features in the output feature map of the central differential attention module through a high-frequency wavelet sampler between the central differential attention module and the next central differential attention module to obtain the input feature map of the next central differential attention module; then, the output feature map of the last central differential attention module is taken as a processed feature map.
In this implementation, in each training operation, the executing body needs to update parameters of the embedded layer, the plurality of center differential attention modules, the plurality of high-frequency wavelet samplers and the output layer according to the obtained cross entropy loss.
In some optional implementations of this embodiment, the executing body may extract, by a high-frequency wavelet sampler between the central differential attention module and a next central differential attention module, high-frequency features in an output feature map of the central differential attention module to obtain an input feature map of the next central differential attention module, where the executing body includes:
first, decomposing to obtain various high-frequency components of each channel in an output characteristic diagram of the central differential attention module under a frequency domain based on a discrete wavelet transformation mode through a high-frequency wavelet sampler between the central differential attention module and the next central differential attention module; secondly, splicing the same high-frequency components corresponding to each channel to obtain each spliced high-frequency component; thirdly, cascading each spliced high-frequency component to obtain high-frequency characteristics in the output characteristic diagram of the central differential attention module so as to determine an input characteristic diagram corresponding to the next central differential attention module.
In some optional implementations of this embodiment, the executing body may execute the third step by: firstly, cascading all spliced high-frequency components to obtain cascading high-frequency characteristics; and then, carrying out layer normalization on the cascade high-frequency characteristics to obtain an input characteristic diagram of the next central differential attention module.
In some optional implementations of this embodiment, the executing body may execute the step 202 as follows: firstly, providing a supplementary feature map obtained based on an initial feature map for each central differential attention module in a plurality of central differential attention modules based on a jump connection mode; then, through each central differential attention module in the plurality of central differential attention modules connected in series, the supplementary feature map and the input feature map corresponding to the central differential attention module are processed based on a central differential convolution method and an attention mechanism, so as to finally obtain a processed feature map.
It should be noted that, each implementation in the embodiment 600 may be executed with reference to each implementation in the embodiment 200, which is not described herein. The trained face authentication model may be used to implement the embodiments 200, 400 described above.
With continued reference to fig. 7, as an implementation of the method shown in the foregoing drawings, the present disclosure provides an embodiment of a face authentication device, where an embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device may be specifically applied to various electronic devices.
As shown in fig. 7, the face authentication device 700 includes: a first determining unit 701 configured to determine an initial feature map of the acquired face image; a deriving unit 702 configured to process, by each of the plurality of central differential attention modules connected in series, the input feature map of that central differential attention module based on a central differential convolution method and an attention mechanism to finally derive a processed feature map, wherein the input feature map of a first central differential attention module of the plurality of central differential attention modules is an initial feature map, and for each central differential attention module following the first central differential attention module, the input feature map of that central differential attention module is an output feature map of a preceding central differential attention module; the second determining unit 703 is configured to determine whether the face image is a fake face image based on the processed feature map.
In some optional implementations of the present embodiment, the deriving unit 702 is further configured to: by each of the plurality of center differential attention modules, performing the following: convolving the input feature map of the central differential attention module to obtain a convolution feature map; obtaining a query vector based on the convolution feature map; performing center differential convolution on the convolution feature map by using a center differential convolution method to obtain a key vector and a value vector; and processing the query vector, the key vector and the value vector through an attention mechanism to obtain an output characteristic diagram of the central differential attention module.
In some optional implementations of the present embodiment, the deriving unit 702 is further configured to: processing the query vector, the key vector and the value vector through a multi-head self-attention mechanism to obtain a head characteristic diagram corresponding to each head; and obtaining an output characteristic diagram of the central differential attention module based on the head characteristic diagrams corresponding to the heads.
In some optional implementations of the present embodiment, a high frequency wavelet sampler is provided between two adjacent central differential attention modules, and the deriving unit 702 is further configured to: for each central differential attention module in a plurality of central differential attention modules, processing an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to obtain an output feature map of the central differential attention module, and extracting high-frequency features in the output feature map of the central differential attention module through a high-frequency wavelet sampler between the central differential attention module and the next central differential attention module to obtain the input feature map of the next central differential attention module; and taking the output characteristic diagram of the last central differential attention module as a processed characteristic diagram.
In some optional implementations of the present embodiment, the deriving unit 702 is further configured to: decomposing to obtain various high-frequency components of each channel in the output characteristic diagram of the central differential attention module in a frequency domain based on a discrete wavelet transformation mode by a high-frequency wavelet sampler between the central differential attention module and the next central differential attention module; splicing the same high-frequency components corresponding to the channels to obtain the spliced high-frequency components; and cascading the spliced high-frequency components to obtain the high-frequency characteristics in the output characteristic diagram of the central differential attention module so as to determine the input characteristic diagram corresponding to the next central differential attention module.
In some optional implementations of the present embodiment, the deriving unit 702 is further configured to: cascading the spliced high-frequency components to obtain cascading high-frequency characteristics; and carrying out layer normalization on the cascade high-frequency characteristics to obtain an input characteristic diagram of the next central differential attention module.
In some optional implementations of the present embodiment, the deriving unit 702 is further configured to: providing a supplementary feature map obtained based on the initial feature map for each of the plurality of central differential attention modules based on the jump connection; and processing the supplementary feature map and the input feature map corresponding to the central differential attention module based on a central differential convolution method and an attention mechanism through each central differential attention module in the plurality of central differential attention modules connected in series to finally obtain a processed feature map.
In the embodiment, the face false identification device is provided, and the characteristic processing is performed on the basis of the central differential convolution method and the attention mechanism through each central differential attention module in a plurality of central differential attention modules connected in series so as to capture the false trace of the local part and the fine granularity of the face image in the space domain, thereby improving the accuracy of the false identification result of the face image.
With continued reference to fig. 8, as an implementation of the method shown in the foregoing drawings, the present disclosure provides an embodiment of a training apparatus for a face authentication model, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 6, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 8, the training apparatus 800 of the face authentication model includes: an obtaining unit 801 configured to obtain a training sample set, wherein a training sample in the training sample set includes a sample face image and a tag that characterizes whether the sample face image is a fake face image; a training unit 802 configured to determine an initial feature map of the input sample face image through the embedding layer; processing, by each of the plurality of central differential attention modules in the series, an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of a first central differential attention module of the plurality of central differential attention modules is an initial feature map, and for each central differential attention module subsequent to the first central differential attention module, the input feature map of the central differential attention module is an output feature map of a preceding central differential attention module; and taking the label corresponding to the input sample face image as expected output of the face fake identifying result obtained by the output layer based on the processed feature map, so as to obtain the face fake identifying model comprising the embedded layer, a plurality of central differential attention modules and the output layer through training by a machine learning method.
In some optional implementations of this embodiment, the training unit 802 is further configured to perform, by each of the plurality of central differential attention modules, the following operations: convolving the input feature map of the central differential attention module to obtain a convolution feature map; obtaining a query vector based on the convolution feature map; performing central differential convolution on the convolution feature map by using a central differential convolution method to obtain a key vector and a value vector; and processing the query vector, the key vector, and the value vector through an attention mechanism to obtain the output feature map of the central differential attention module.
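A hedged sketch of the per-module computation just listed: a plain convolution yields the query, a central differential convolution (vanilla convolution minus θ times the kernel-weight sum times the center value, following the common CDC formulation) yields the key and value, and scaled dot-product attention combines them. Treating each row of an 8×8 map as one token, the single-channel 3×3 kernels, and θ = 0.7 are illustrative assumptions, not parameters fixed by the patent.

```python
# Sketch of one central differential attention module (illustrative only).
import numpy as np

def conv3x3(x, k):
    """'Same' 3x3 convolution (cross-correlation) with zero padding."""
    h, w = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def central_diff_conv(x, k, theta=0.7):
    """Central differential convolution: the conv output minus
    theta * (sum of kernel weights) * the center pixel, which biases the
    response toward local gradient (fine-grained) cues."""
    return conv3x3(x, k) - theta * k.sum() * x

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
feat = rng.normal(size=(8, 8))                 # input feature map
kq, kkv = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))

q = conv3x3(feat, kq)                          # query from plain convolution
k = central_diff_conv(feat, kkv)               # key from CDC
v = central_diff_conv(feat, kkv)               # value from CDC
attn = softmax(q @ k.T / np.sqrt(q.shape[1]))  # scaled dot-product attention
out = attn @ v                                 # output feature map
print(out.shape)
```

Each row of the map is treated as one 8-dimensional token here purely to keep the attention math visible; a real implementation would use learned projections and a channel dimension.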
In some optional implementations of this embodiment, the training unit 802 is further configured to: process the query vector, the key vector, and the value vector through a multi-head self-attention mechanism to obtain a head feature map corresponding to each head; and obtain the output feature map of the central differential attention module based on the head feature maps corresponding to the heads.
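The multi-head variant above splits the query/key/value channels into heads, attends independently per head to obtain one head feature map each, and combines the head feature maps into the module's output. A minimal sketch follows; two heads over 8-dimensional tokens and plain concatenation as the combination rule are illustrative assumptions.

```python
# Sketch of multi-head self-attention producing per-head feature maps.
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(q, k, v, heads=2):
    n, d = q.shape
    dh = d // heads
    head_maps = []
    for h in range(heads):                      # one head feature map per head
        sl = slice(h * dh, (h + 1) * dh)
        attn = softmax(q[:, sl] @ k[:, sl].T / np.sqrt(dh))
        head_maps.append(attn @ v[:, sl])
    # combine the head feature maps into the module output
    return np.concatenate(head_maps, axis=1)

rng = np.random.default_rng(2)
q, k, v = (rng.normal(size=(16, 8)) for _ in range(3))
out = multi_head_attention(q, k, v)
print(out.shape)
```

The concatenated output has the same shape as a single-head output, which is what lets serial modules chain without reshaping.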
In some optional implementations of this embodiment, the face authentication model further includes a high-frequency wavelet sampler disposed between every two adjacent central differential attention modules, and the training unit 802 is further configured to: for each central differential attention module among the plurality of central differential attention modules, process the input feature map of that module based on a central differential convolution method and an attention mechanism to obtain its output feature map, and extract the high-frequency features in that output feature map through the high-frequency wavelet sampler between the module and the next central differential attention module, to obtain the input feature map of the next central differential attention module; and take the output feature map of the last central differential attention module as the processed feature map.
In some optional implementations of this embodiment, the training unit 802 is further configured to: decompose, through the high-frequency wavelet sampler between the central differential attention module and the next central differential attention module, each channel of the output feature map of the central differential attention module in the frequency domain by means of a discrete wavelet transform, to obtain several high-frequency components per channel; splice the high-frequency components of the same kind across the channels to obtain spliced high-frequency components; and concatenate the spliced high-frequency components to obtain the high-frequency features in the output feature map of the central differential attention module, so as to determine the input feature map of the next central differential attention module.
In some optional implementations of this embodiment, the training unit 802 is further configured to: concatenate the spliced high-frequency components to obtain concatenated high-frequency features; and perform layer normalization on the concatenated high-frequency features to obtain the input feature map of the next central differential attention module.
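The sampler's steps described above — per-channel wavelet decomposition into high-frequency components, splicing like components across channels, then concatenation followed by layer normalization — can be sketched with a one-level Haar transform. The Haar wavelet and the whole-tensor normalization statistics are simplifying assumptions; the patent does not fix these details.

```python
# Sketch of a high-frequency wavelet sampler (Haar DWT assumed).
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar DWT: returns the low-frequency LL band and the
    three high-frequency components LH, HL, HH (each half-resolution)."""
    lo_r = (x[0::2, :] + x[1::2, :]) / 2       # row-wise average
    hi_r = (x[0::2, :] - x[1::2, :]) / 2       # row-wise detail
    ll = (lo_r[:, 0::2] + lo_r[:, 1::2]) / 2
    lh = (lo_r[:, 0::2] - lo_r[:, 1::2]) / 2
    hl = (hi_r[:, 0::2] + hi_r[:, 1::2]) / 2
    hh = (hi_r[:, 0::2] - hi_r[:, 1::2]) / 2
    return ll, lh, hl, hh

rng = np.random.default_rng(4)
C, H, W = 4, 8, 8
feat = rng.normal(size=(C, H, W))              # output feature map of a module

# splice the same high-frequency component (LH, HL, HH) across channels ...
bands = [np.stack([haar_dwt2(feat[c])[i] for c in range(C)]) for i in (1, 2, 3)]
# ... then concatenate the spliced components along the channel axis
high = np.concatenate(bands, axis=0)           # shape (3C, H/2, W/2)
# layer-normalize the concatenated high-frequency features
normed = (high - high.mean()) / (high.std() + 1e-6)
print(high.shape)
```

The LL (low-frequency) band is discarded, which is what makes the sampler "high-frequency": only the detail components reach the next central differential attention module.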
In some optional implementations of this embodiment, the training unit 802 is further configured to: provide, based on a skip connection, a supplementary feature map obtained from the initial feature map to each of the plurality of central differential attention modules; and process, by each of the plurality of central differential attention modules connected in series, the supplementary feature map and the input feature map corresponding to that module based on a central differential convolution method and an attention mechanism, to finally obtain the processed feature map.
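A minimal sketch of the skip connection just described: the initial feature map is projected into a supplementary feature map that every module receives alongside its own input. The linear projection and elementwise addition as the combination rule are assumptions for illustration.

```python
# Sketch of the skip connection feeding every serial module.
import numpy as np

rng = np.random.default_rng(3)
initial = rng.normal(size=(16, 8))             # initial feature map
proj = rng.normal(scale=0.1, size=(8, 8))      # assumed projection weights
supplement = initial @ proj                    # supplementary feature map

def cda_module(feat, w):                       # stand-in module body
    return np.tanh(feat @ w)

feat = initial
for w in [rng.normal(scale=0.1, size=(8, 8)) for _ in range(3)]:
    feat = cda_module(feat + supplement, w)    # every module sees the skip
print(feat.shape)
```

Because the supplement comes from the unprocessed initial feature map, later modules retain direct access to low-level image evidence that the serial stack might otherwise wash out.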
In this embodiment, a training apparatus for a face fake-identifying model is provided. The face fake-identifying model performs feature processing, through each of a plurality of central differential attention modules connected in series, based on a central differential convolution method and an attention mechanism, so as to capture the local, fine-grained fake traces of the face image in the spatial domain and improve the accuracy of the model's fake-identifying results on face images.
According to an embodiment of the present disclosure, the present disclosure further provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the face authentication method and the training method of the face authentication model described in any of the foregoing embodiments.
According to an embodiment of the present disclosure, there is further provided a readable storage medium storing computer instructions that, when executed, cause a computer to perform the face authentication method and the training method of the face authentication model described in any of the foregoing embodiments.
Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, such as a face authentication method. For example, in some embodiments, the face authentication method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the face authentication method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the face authentication method in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services; it may also be a server of a distributed system, or a server combined with a blockchain.
According to the technical solution of the embodiments of the present disclosure, a face fake-identifying method is provided in which each of a plurality of central differential attention modules connected in series performs feature processing based on a central differential convolution method and an attention mechanism, so as to capture the local, fine-grained fake traces of the face image in the spatial domain, thereby improving the accuracy of the fake-identifying result for the face image.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions provided by the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (16)

1. A face authentication method, comprising:
determining an initial feature map of the acquired face image;
processing, by each of a plurality of central differential attention modules in series, an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of a first central differential attention module in the plurality of central differential attention modules is the initial feature map, and for each central differential attention module subsequent to the first central differential attention module, the input feature map of the central differential attention module is an output feature map of a preceding central differential attention module;
determining whether the face image is a fake face image based on the processed feature map;
wherein, through each central differential attention module in a plurality of central differential attention modules in series, the processing of the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism includes:
by each of the plurality of central differential attention modules in the series, performing the following:
convolving the input feature map of the central differential attention module to obtain a convolution feature map;
obtaining a query vector based on the convolution feature map;
performing center differential convolution on the convolution feature map by using a center differential convolution method to obtain a key vector and a value vector;
and processing the query vector, the key vector and the value vector through an attention mechanism to obtain an output feature map of the central differential attention module.
2. The method of claim 1, wherein the processing the query vector, the key vector, and the value vector by an attention mechanism to obtain an output feature map of the central differential attention module comprises:
processing the query vector, the key vector and the value vector through a multi-head self-attention mechanism to obtain a head feature map corresponding to each head;
and obtaining an output feature map of the central differential attention module based on the head feature maps corresponding to the heads.
3. The method according to any of claims 1-2, wherein a high frequency wavelet sampler is provided between two adjacent central differential attention modules, and
the processing, by each of the plurality of center differential attention modules connected in series, the input feature map of the center differential attention module based on a center differential convolution method and an attention mechanism to finally obtain a processed feature map includes:
for each central differential attention module in the plurality of central differential attention modules, processing the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to obtain an output feature map of the central differential attention module, and extracting high-frequency features in the output feature map of the central differential attention module through the high-frequency wavelet sampler between the central differential attention module and the next central differential attention module, to obtain the input feature map of the next central differential attention module;
and taking the output feature map of the last central differential attention module as the processed feature map.
4. A method according to claim 3, wherein the extracting, by the high-frequency wavelet sampler between the central differential attention module and the next central differential attention module, the high-frequency feature in the output feature map of the central differential attention module, to obtain the input feature map of the next central differential attention module includes:
decomposing, by the high-frequency wavelet sampler between the central differential attention module and the next central differential attention module, each channel of the output feature map of the central differential attention module in the frequency domain by means of a discrete wavelet transform, to obtain several high-frequency components per channel;
splicing the high-frequency components of the same kind across the channels to obtain spliced high-frequency components;
and concatenating the spliced high-frequency components to obtain the high-frequency features in the output feature map of the central differential attention module, so as to determine the input feature map corresponding to the next central differential attention module.
5. The method of claim 4, wherein concatenating the spliced high-frequency components to obtain the high-frequency features in the output feature map of the central differential attention module and obtain the input feature map corresponding to the next central differential attention module, includes:
concatenating the spliced high-frequency components to obtain concatenated high-frequency features;
and performing layer normalization on the concatenated high-frequency features to obtain the input feature map of the next central differential attention module.
6. The method of claim 4, wherein the processing the input feature map by each of the plurality of central differential attention modules in series based on a central differential convolution method and an attention mechanism to ultimately result in a processed feature map comprises:
providing a supplementary feature map obtained based on the initial feature map for each of the plurality of central differential attention modules based on a jump connection;
and processing, by each central differential attention module in the plurality of central differential attention modules, the supplementary feature map and the input feature map corresponding to the central differential attention module based on a central differential convolution method and an attention mechanism, so as to finally obtain the processed feature map.
7. A training method of a human face fake identification model comprises the following steps:
acquiring a training sample set, wherein a training sample in the training sample set comprises a sample face image and a label for representing whether the sample face image is a fake face image;
determining an initial feature map of the input sample face image through an embedding layer; processing, by each of a plurality of central differential attention modules in series, an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of a first central differential attention module in the plurality of central differential attention modules is the initial feature map, and for each central differential attention module subsequent to the first central differential attention module, the input feature map of the central differential attention module is an output feature map of a preceding central differential attention module; the label corresponding to the input sample face image is used as the expected output of the face fake identifying result obtained by the output layer based on the processed feature map, so that the face fake identifying model comprising the embedding layer, the plurality of central differential attention modules and the output layer is obtained through training by a machine learning method;
wherein, through each central differential attention module in a plurality of central differential attention modules in series, the processing of the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism includes:
by each of the plurality of central differential attention modules in the series, performing the following:
convolving the input feature map of the central differential attention module to obtain a convolution feature map;
obtaining a query vector based on the convolution feature map;
performing center differential convolution on the convolution feature map by using a center differential convolution method to obtain a key vector and a value vector;
and processing the query vector, the key vector and the value vector through an attention mechanism to obtain an output feature map of the central differential attention module.
8. The method of claim 7, wherein the processing the query vector, the key vector, and the value vector by an attention mechanism to obtain an output feature map of the central differential attention module comprises:
processing the query vector, the key vector and the value vector through a multi-head self-attention mechanism to obtain a head feature map corresponding to each head;
and obtaining an output feature map of the central differential attention module based on the head feature maps corresponding to the heads.
9. The method of any of claims 7-8, wherein the face authentication model further comprises a high frequency wavelet sampler disposed between two adjacent central differential attention modules, and
the processing, by each of the plurality of center differential attention modules connected in series, the input feature map of the center differential attention module based on a center differential convolution method and an attention mechanism to finally obtain a processed feature map includes:
for each central differential attention module in the plurality of central differential attention modules, processing the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to obtain an output feature map of the central differential attention module, and extracting high-frequency features in the output feature map of the central differential attention module through the high-frequency wavelet sampler between the central differential attention module and the next central differential attention module, to obtain the input feature map of the next central differential attention module;
and taking the output feature map of the last central differential attention module as the processed feature map.
10. The method according to claim 9, wherein the extracting, by the high-frequency wavelet sampler between the central differential attention module and the next central differential attention module, the high-frequency feature in the output feature map of the central differential attention module, to obtain the input feature map of the next central differential attention module includes:
decomposing, by the high-frequency wavelet sampler between the central differential attention module and the next central differential attention module, each channel of the output feature map of the central differential attention module in the frequency domain by means of a discrete wavelet transform, to obtain several high-frequency components per channel;
splicing the high-frequency components of the same kind across the channels to obtain spliced high-frequency components;
and concatenating the spliced high-frequency components to obtain the high-frequency features in the output feature map of the central differential attention module, so as to determine the input feature map corresponding to the next central differential attention module.
11. The method of claim 10, wherein concatenating the spliced high-frequency components to obtain the high-frequency features in the output feature map of the central differential attention module, and obtaining the input feature map corresponding to the next central differential attention module, includes:
concatenating the spliced high-frequency components to obtain concatenated high-frequency features;
and performing layer normalization on the concatenated high-frequency features to obtain the input feature map of the next central differential attention module.
12. The method of claim 10, wherein the processing the input feature map by each of the plurality of central differential attention modules in series based on a central differential convolution method and an attention mechanism to ultimately result in a processed feature map comprises:
providing a supplementary feature map obtained based on the initial feature map for each of the plurality of central differential attention modules based on a jump connection;
and processing the supplementary feature map and the input feature map corresponding to the central differential attention module based on a central differential convolution method and an attention mechanism by each central differential attention module in the plurality of central differential attention modules so as to finally obtain the processed feature map.
13. A face authentication device, comprising:
a first determination unit configured to determine an initial feature map of the acquired face image;
a deriving unit configured to process, by each of a plurality of central differential attention modules connected in series, an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally derive a processed feature map, wherein the input feature map of a first central differential attention module of the plurality of central differential attention modules is the initial feature map, and for each central differential attention module following the first central differential attention module, the input feature map of the central differential attention module is an output feature map of a preceding central differential attention module;
a second determination unit configured to determine whether the face image is a fake face image based on the processed feature map;
wherein the deriving unit is further configured to:
by each of the plurality of central differential attention modules in the series, performing the following:
convolving the input feature map of the central differential attention module to obtain a convolution feature map; obtaining a query vector based on the convolution feature map; performing center differential convolution on the convolution feature map by using a center differential convolution method to obtain a key vector and a value vector; and processing the query vector, the key vector and the value vector through an attention mechanism to obtain an output feature map of the central differential attention module.
14. A training device for a face authentication model, comprising:
an acquisition unit configured to acquire a training sample set, wherein a training sample in the training sample set includes a sample face image and a tag that characterizes whether the sample face image is a counterfeit face image;
a training unit configured to determine an initial feature map of the input sample face image through the embedding layer; processing, by each of a plurality of central differential attention modules in series, an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of a first central differential attention module in the plurality of central differential attention modules is the initial feature map, and for each central differential attention module subsequent to the first central differential attention module, the input feature map of the central differential attention module is an output feature map of a preceding central differential attention module; the label corresponding to the input sample face image is used as the expected output of the face fake identifying result obtained by the output layer based on the processed feature map, so that the face fake identifying model comprising the embedding layer, the plurality of central differential attention modules and the output layer is obtained through training by a machine learning method;
Wherein the training unit is further configured to:
by each of the plurality of central differential attention modules in the series, performing the following:
convolving the input feature map of the central differential attention module to obtain a convolution feature map; obtaining a query vector based on the convolution feature map; performing center differential convolution on the convolution feature map by using a center differential convolution method to obtain a key vector and a value vector; and processing the query vector, the key vector and the value vector through an attention mechanism to obtain an output feature map of the central differential attention module.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-12.
16. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-12.
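The overall flow the claims describe, an embedding layer producing an initial feature map, a chain of central differential attention modules where each module consumes the previous module's output, and an output layer producing the fake-identification result that training compares against the sample's label, can be sketched with toy stand-ins. Every layer body below is a placeholder for illustration, not the patented module:

```python
import numpy as np

def embedding_layer(image):
    # Toy stand-in: scale pixels into a small-valued initial feature map.
    return image / 255.0

def make_module(shift):
    # Toy stand-in for one central differential attention module.
    def module(feature_map):
        return np.tanh(feature_map + shift)
    return module

def output_layer(feature_map):
    # Toy head: pool the processed feature map to a forgery probability.
    logit = feature_map.mean()
    return 1.0 / (1.0 + np.exp(-logit))

def fake_detection_forward(image, modules):
    feat = embedding_layer(image)      # initial feature map
    for m in modules:                  # first module takes the initial map,
        feat = m(feat)                 # each later one takes the previous output
    return output_layer(feat)          # fake-identification score in (0, 1)

def bce_loss(score, label):
    # Training uses the sample's label as the expected output of the head.
    return -(label * np.log(score) + (1 - label) * np.log(1 - score))
```

For example, `fake_detection_forward(np.full((8, 8), 128.0), [make_module(s) for s in (0.1, 0.2, 0.3)])` runs three chained modules and yields a probability that a loss such as `bce_loss` can compare against the real/fake label.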
CN202210688010.3A 2022-06-16 2022-06-16 Face fake identifying method and device Active CN115147895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210688010.3A CN115147895B (en) 2022-06-16 2022-06-16 Face fake identifying method and device


Publications (2)

Publication Number Publication Date
CN115147895A CN115147895A (en) 2022-10-04
CN115147895B true CN115147895B (en) 2023-06-30

Family

ID=83408814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210688010.3A Active CN115147895B (en) 2022-06-16 2022-06-16 Face fake identifying method and device

Country Status (1)

Country Link
CN (1) CN115147895B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117275068B * 2023-09-21 2024-05-17 北京中科闻歌科技股份有限公司 Uncertainty-guided test-time training method and system for face forgery detection

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269149A (en) * 2021-06-24 2021-08-17 中国平安人寿保险股份有限公司 Living body face image detection method and device, computer equipment and storage medium
CN114067402A (en) * 2021-11-18 2022-02-18 长沙理工大学 Human face living body detection based on central difference convolution and receptive field feature fusion
CN114359811A (en) * 2022-01-11 2022-04-15 北京百度网讯科技有限公司 Data authentication method and device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant