CN115147895A - Face counterfeit discrimination method and device and computer program product


Info

Publication number
CN115147895A
Authority
CN
China
Prior art keywords
central differential
feature map
attention module
attention
differential attention
Prior art date
Legal status
Granted
Application number
CN202210688010.3A
Other languages
Chinese (zh)
Other versions
CN115147895B (en)
Inventor
谭资昌
缪长涛
郭国栋
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210688010.3A
Publication of CN115147895A
Application granted
Publication of CN115147895B
Status: Active
Anticipated expiration


Classifications

    • G06V40/161 Human faces: Detection; Localisation; Normalisation
    • G06V40/168 Human faces: Feature extraction; Face representation
    • G06V40/172 Human faces: Classification, e.g. identification
    • G06V40/40 Spoof detection, e.g. liveness detection
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a face counterfeit identification method and apparatus, an electronic device, a storage medium and a computer program product, relates to the technical field of artificial intelligence, in particular to deep learning technology, and can be used in face counterfeit identification scenarios. The specific implementation scheme is as follows: determining an initial feature map of an acquired face image; processing the input feature map of each central differential attention module in a plurality of serially connected central differential attention modules based on a central difference convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of the first central differential attention module in the plurality of central differential attention modules is the initial feature map, and the input feature map of each module after the first is the output feature map of the preceding module; and determining, based on the processed feature map, whether the face image is a forged face image. The method and apparatus improve the accuracy of the counterfeit identification result for the face image.

Description

Face counterfeit discrimination method and device and computer program product
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to deep learning technology, and more particularly to a face counterfeit identification method and apparatus, a training method and apparatus for a face counterfeit identification model, an electronic device, a storage medium, and a computer program product, which can be used in face counterfeit identification scenarios.
Background
With the rapid development of face forgery technology, various algorithms have appeared that can generate forged face images and videos indistinguishable to the human eye. Forged face data can be abused, for example to spread political propaganda or fabricate fake news, posing a significant threat to security. In this context, face forgery detection has emerged and is receiving increasing attention. At present, most face counterfeit identification methods discriminate face images with a trained convolutional neural network, and their detection performance on forged face data is poor.
Disclosure of Invention
The disclosure provides a face authentication method and device, a training method and device of a face authentication model, electronic equipment, a storage medium and a computer program product.
According to a first aspect, there is provided a face authentication method, comprising: determining an initial feature map of an acquired face image; processing, by each central differential attention module in a plurality of serially connected central differential attention modules, the input feature map of that module based on a central difference convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of the first central differential attention module in the plurality of central differential attention modules is the initial feature map, and for each central differential attention module after the first, the input feature map is the output feature map of the preceding central differential attention module; and determining, based on the processed feature map, whether the face image is a forged face image.
According to a second aspect, there is provided a training method for a face counterfeit identification model, comprising: acquiring a training sample set, wherein the training samples in the training sample set comprise sample face images and labels representing whether the sample face images are forged face images; determining an initial feature map of an input sample face image through an embedding layer; processing, by each central differential attention module in a plurality of serially connected central differential attention modules, the input feature map of that module based on a central difference convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of the first central differential attention module is the initial feature map, and for each central differential attention module after the first, the input feature map is the output feature map of the preceding central differential attention module; and taking the label corresponding to the input sample face image as the expected output for the counterfeit identification result produced by an output layer from the processed feature map, and training, by a machine learning method, a face counterfeit identification model comprising the embedding layer, the plurality of central differential attention modules and the output layer.
According to a third aspect, there is provided a face authentication apparatus, comprising: a first determination unit configured to determine an initial feature map of an acquired face image; a deriving unit configured to process, by each central differential attention module of a plurality of serially connected central differential attention modules, the input feature map of that module based on a central difference convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of the first central differential attention module is the initial feature map, and for each central differential attention module after the first, the input feature map is the output feature map of the preceding central differential attention module; and a second determination unit configured to determine, based on the processed feature map, whether the face image is a forged face image.
According to a fourth aspect, there is provided a training apparatus for a face counterfeit identification model, comprising: an acquisition unit configured to acquire a training sample set, wherein the training samples in the training sample set comprise sample face images and labels representing whether the sample face images are forged face images; and a training unit configured to determine an initial feature map of an input sample face image through an embedding layer; process, by each central differential attention module in a plurality of serially connected central differential attention modules, the input feature map of that module based on a central difference convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of the first central differential attention module is the initial feature map, and for each central differential attention module after the first, the input feature map is the output feature map of the preceding central differential attention module; and take the label corresponding to the input sample face image as the expected output for the counterfeit identification result produced by an output layer from the processed feature map, and train, by a machine learning method, a face counterfeit identification model comprising the embedding layer, the plurality of central differential attention modules and the output layer.
According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first and second aspects.
According to a sixth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method as described in any one of the implementations of the first and second aspects.
According to a seventh aspect, there is provided a computer program product comprising: a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect and the second aspect.
According to the technology of the present disclosure, a face counterfeit identification method is provided in which each central differential attention module in a plurality of serially connected central differential attention modules performs feature processing based on a central difference convolution method and an attention mechanism so as to capture local, fine-grained forgery traces of the face image in the spatial domain, improving the accuracy of the counterfeit identification result for the face image.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment according to the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for face authentication according to the present disclosure;
fig. 3 is a schematic diagram of an application scenario of the face authentication method according to the embodiment;
FIG. 4 is a flow chart of yet another embodiment of a face authentication method according to the present disclosure;
FIG. 5 is a schematic structural diagram of a face authentication model according to the present disclosure;
FIG. 6 is a flow diagram of one embodiment of a training method for a face authentication model according to the present disclosure;
FIG. 7 is a block diagram of one embodiment of a face authentication device according to the present disclosure;
FIG. 8 is a block diagram of an embodiment of a training apparatus for a face authentication model according to the present disclosure;
FIG. 9 is a schematic block diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other handling of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
Fig. 1 shows an exemplary architecture 100 to which the face authentication method and apparatus, and the training method and apparatus of the face authentication model of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The communication connections between the terminal devices 101, 102, 103 form a topological network, and the network 104 serves to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 may be hardware devices or software that support network connections for data interaction and data processing. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices supporting network connection, information acquisition, interaction, display, processing, and the like, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented, for example, as multiple software or software modules to provide distributed services, or as a single software or software module. And is not particularly limited herein.
The server 105 may be a server providing various services, for example a background processing server that obtains a face image provided by the terminal devices 101, 102, 103 and performs feature processing, through each central differential attention module of a plurality of serially connected central differential attention modules, based on a central difference convolution method and an attention mechanism to finally obtain a face counterfeit identification result; or, as another example, a background processing server that trains the face authentication model with training samples provided by the terminal devices 101, 102, 103. As an example, the server 105 may be a cloud server.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules for providing distributed services) or as a single piece of software or software module. And is not particularly limited herein.
It should be further noted that the face authentication method and the training method of the face authentication model provided by the embodiments of the present disclosure may be executed by a server, or may be executed by a terminal device, or may be executed by the server and the terminal device in cooperation with each other. Accordingly, the face authentication device and the training device of the face authentication model may all be disposed in the server, may all be disposed in the terminal device, and may also be disposed in the server and the terminal device, respectively.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. When the electronic device on which the face authentication method and the training method of the face authentication model are run does not need to perform data transmission with other electronic devices, the system architecture may only include the electronic device (e.g., a server or a terminal device) on which the face authentication method and the training method of the face authentication model are run.
Referring to fig. 2, fig. 2 is a flowchart of a face authentication method according to an embodiment of the present disclosure, where the flowchart 200 includes the following steps:
Step 201, determining an initial feature map of the acquired face image.
In this embodiment, an executing subject (for example, the terminal device or the server in fig. 1) of the face authentication method may acquire the face image from a remote location or a local location based on a wired network connection manner or a wireless network connection manner, and determine an initial feature map of the acquired face image.
A face image is image data containing a face object, and may be a still image containing a face object or a video frame from a video containing a face object. To determine the authenticity of the face object in the face image, the face image needs to undergo counterfeit identification.
To perform counterfeit identification on the face image, the execution subject performs feature extraction on the acquired face image to obtain an initial feature map. As an example, the execution subject may perform feature extraction on the face image through a convolution operation to obtain the initial feature map.
As another example, the execution subject may construct a feature embedding layer from a spatial depth convolution and a linear layer (fully connected layer), and then perform feature extraction on the face image through the feature embedding layer to obtain the initial feature map.
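As an illustration of such an embedding layer, here is a minimal PyTorch sketch; the class name, channel sizes and stride are assumptions rather than the patent's implementation. A strided convolution extracts local spatial features and reduces resolution, and a linear layer then projects each position to the embedding dimension:

```python
import torch
import torch.nn as nn

class FeatureEmbedding(nn.Module):
    # Strided spatial convolution + linear projection, a sketch of the
    # feature embedding layer described above (all sizes are assumptions).
    def __init__(self, in_channels=3, conv_channels=64, embed_dim=256, stride=4):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, conv_channels,
                              kernel_size=stride, stride=stride)
        self.proj = nn.Linear(conv_channels, embed_dim)

    def forward(self, image):            # image: (B, 3, H, W)
        feat = self.conv(image)          # (B, conv_channels, H/4, W/4)
        feat = feat.permute(0, 2, 3, 1)  # channels last for the linear layer
        feat = self.proj(feat)           # (B, H/4, W/4, embed_dim)
        return feat.permute(0, 3, 1, 2)  # initial feature map (B, embed_dim, H/4, W/4)
```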
Step 202, processing the input feature map of each center differential attention module in the plurality of center differential attention modules connected in series based on a center differential convolution method and an attention mechanism to finally obtain a processed feature map.
In this embodiment, the execution subject may process the input feature map of the center differential attention module through each center differential attention module of the plurality of center differential attention modules connected in series based on a center differential convolution method and an attention mechanism, so as to obtain a processed feature map finally. Wherein, the input feature map of the first central differential attention module in the plurality of central differential attention modules is an initial feature map, and for each central differential attention module after the first central differential attention module, the input feature map of the central differential attention module is an output feature map of the previous central differential attention module.
As an example, for a first central differential attention module in the central differential attention modules, the initial feature map is used as an input, the central differential convolution is performed on the initial feature map through a central differential convolution method to obtain a central differential convolution feature map, and then feature processing is performed on the central differential convolution feature map based on an attention mechanism to obtain an output feature map of the first central differential attention module.
And for each subsequent central differential attention module, taking the output feature map of the previous central differential attention module as an input feature map, and continuously processing the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism on the basis of feature processing of the previous central differential attention module to obtain a corresponding output feature map. Finally, the output feature map of the last central differential attention module is taken as the processed feature map.
The standard convolution mainly comprises two steps, sampling and aggregation, and the central difference convolution adds a central differencing step between the sampling step and the aggregation step. Specifically, for a given feature map $X \in \mathbb{R}^{H \times W \times C}$ (H, W and C represent the height, width and number of channels of the feature map, respectively), the ordinary two-dimensional standard convolution is expressed as:

$$Y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot X(p_0 + p_n)$$

where Y represents the feature map after convolution, $w(p_n)$ represents a convolution weight, $p_0$ represents the current position on the pre-convolution and post-convolution feature maps, and $p_n$ represents an arbitrary position in the receptive field region $\mathcal{R}$.
The central difference convolution differs from ordinary convolution in the aggregation operation: it aggregates the centre-oriented gradients of the sampled values. The central difference convolution can be expressed as:

$$Y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot \bigl(X(p_0 + p_n) - X(p_0)\bigr)$$

When $p_n = (0, 0)$, the gradient with respect to the centre position $p_0$ is always zero. The central difference enhances the ability of ordinary convolution to describe fine-grained invariant information. Although forged face images are produced by different forging methods, the forgery traces that the various methods leave in the forged face images share similarity or consistency; compared with the standard convolution, the central difference convolution enhances the ability to capture this similar or consistent, invariant forgery-trace information.
In this embodiment, the number of central differential attention modules may be set according to the actual situation: when the number of central differential attention modules is small, the face authentication model built mainly from these modules processes the feature map at a shallow depth; when the number is large, it processes the feature map at a greater depth. As an example, the number of central differential attention modules may be 4.
Step 203, determining whether the face image is a forged face image based on the processed feature map.
In this embodiment, the execution subject may determine whether the face image is a forged face image based on the processed feature map.
As an example, the execution subject may input the processed feature map into a softmax classification layer to determine whether the face image is a forged face image. Specifically, the classification layer outputs the probabilities that the face image is a forged face and a real face: when the probability corresponding to the forged face exceeds a preset probability threshold, the face image is a forged face image; when the probability corresponding to the real face exceeds the preset probability threshold, the face image is a real face image.
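As an illustration only, the thresholding decision might look as follows in PyTorch (the class ordering and the 0.5 threshold are assumed for the example):

```python
import torch

logits = torch.randn(4, 2)               # stand-in for the classification layer output
probs = torch.softmax(logits, dim=-1)    # [:, 0] = real, [:, 1] = fake (assumed order)
threshold = 0.5                           # preset probability threshold (assumed value)
is_fake = probs[:, 1] > threshold         # forged-face decision per image
is_real = probs[:, 0] > threshold         # real-face decision per image
```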
In this embodiment, the execution subject may execute the steps 201 to 203 through a human face authentication model. Specifically, the face authentication model includes a feature embedding layer for performing step 201, a plurality of central differential attention modules for performing step 202, and a classification layer for performing step 203.
With continued reference to fig. 3, fig. 3 is a schematic diagram 300 of an application scenario of the face authentication method according to the embodiment. In the application scenario of fig. 3, a user 301 sends a face image 304 to a server 303 via a terminal device 302. The server 303 firstly determines an initial feature map 306 of the acquired face image through a feature embedding layer 3051 in the face authentication model 305; the input feature map of each of the plurality of center differential attention modules in the series is processed by each of the center differential attention modules 3052 based on a center differential convolution method and an attention mechanism to finally obtain a processed feature map 307. Wherein the input feature map of a first central differential attention module of the plurality of central differential attention modules is an initial feature map. Finally, based on the processed feature map 307, whether the face image is a forged face image is determined through the classification layer 3053, and a face counterfeit identification result is obtained.
In the embodiment, a human face counterfeit discrimination method is provided, in which each central differential attention module of a plurality of central differential attention modules connected in series is used for performing feature processing based on a central differential convolution method and an attention mechanism to capture local and fine-grained counterfeit traces of a human face image in a spatial domain, so that the accuracy of a counterfeit discrimination result of the human face image is improved.
In some optional implementations of this embodiment, the executing main body may execute the step 202 by: by each central differential attention module of the plurality of central differential attention modules, performing the following operations:
first, the input feature map of the central difference attention module is convolved to obtain a convolved feature map. As an example, the execution subject may convolve the input feature map of the central difference attention module by a standard convolution pair to obtain a convolution feature map.
Secondly, based on the convolution characteristic graph, a query vector is obtained.
In this implementation, the execution agent may transform the two-dimensional convolution feature map into a one-dimensional query vector $x_q \in \mathbb{R}^{N \times D}$ through flattening and expansion operations. Each pixel in the query vector may be considered a token.
Thirdly, performing center difference convolution on the convolution characteristic graph through a center difference convolution method to obtain a key vector and a value vector.
In this implementation, the execution main body first performs central difference convolution on the convolution feature map to obtain a central difference convolution feature map $X_{kv} \in \mathbb{R}^{H \times W \times C}$, then flattens the two-dimensional central difference convolution feature map $X_{kv}$ into a one-dimensional feature sequence $x_{kv} \in \mathbb{R}^{N \times D}$, and finally projects it through the projection mapping matrices $W_k$ and $W_v$ into the key vector k and the value vector v, respectively.
Fourthly, the query vector, the key vector and the value vector are processed through an attention mechanism, and an output feature map of the central differential attention module is obtained.
As an example, the execution subject weights the query vector, key vector and value vector through the attention mechanism, attending to the more informative features, and obtains the output feature map of the central differential attention module.
This implementation provides a concrete way of processing the input feature map of the central differential attention module based on a central difference convolution method and an attention mechanism: on the basis of deriving the query vector, key vector and value vector from the input feature map via central difference convolution, the output feature map is obtained through attention processing, further improving the module's ability to capture local, fine-grained forgery traces of the face image in the spatial domain.
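A sketch of this query/key/value construction might look as follows in PyTorch, reusing the CentralDifferenceConv2d sketch above; the module and parameter names are assumptions:

```python
import torch
import torch.nn as nn

class CentralDifferentialAttentionQKV(nn.Module):
    # Steps 1-4 above: a standard convolution produces the query path, a
    # central difference convolution produces the key/value path, and both
    # are flattened into token sequences before the attention computation.
    def __init__(self, channels, embed_dim):
        super().__init__()
        self.q_conv = nn.Conv2d(channels, embed_dim, 3, padding=1)
        self.kv_conv = CentralDifferenceConv2d(channels, embed_dim)
        self.w_k = nn.Linear(embed_dim, embed_dim)   # projection matrix W_k
        self.w_v = nn.Linear(embed_dim, embed_dim)   # projection matrix W_v

    def forward(self, x):                                     # x: (B, C, H, W)
        q = self.q_conv(x).flatten(2).transpose(1, 2)         # (B, N, D), N = H*W
        x_kv = self.kv_conv(x).flatten(2).transpose(1, 2)     # (B, N, D)
        return q, self.w_k(x_kv), self.w_v(x_kv)              # q, k, v
```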
In some optional implementations of this embodiment, the executing body may execute the fourth step by: firstly, processing a query vector, a key vector and a value vector by a multi-head self-attention mechanism to obtain a head characteristic diagram corresponding to each head; and obtaining an output characteristic diagram of the central differential attention module based on the head characteristic diagram corresponding to each head.
Specifically, taking a two-dimensional feature map $x \in \mathbb{R}^{N \times D}$ (N denotes the number of tokens and D denotes the dimension of each token) as input, a multi-head self-attention mechanism with M heads is formulated as:

$$q = xW_q,\qquad k = xW_k,\qquad v = xW_v$$

$$z_m = \sigma\!\left(\frac{q_m k_m^{\top}}{\sqrt{d}}\right) v_m$$

$$z = \mathrm{cat}(z_1, \ldots, z_M)\, W_o$$

where $\sigma(\cdot)$ denotes the softmax function, $d = D/M$ denotes the dimension of each head, $z_m$ denotes the embedded output of the m-th attention head, $q_m, k_m, v_m \in \mathbb{R}^{N \times d}$ denote the query, key and value vectors of the m-th head respectively, $W_q, W_k, W_v, W_o$ denote the mapping matrices corresponding to the query vector, the key vector, the value vector and the attention output respectively, and $\mathrm{cat}(\cdot)$ denotes concatenation.
This implementation provides a concrete way of processing the query vector, key vector and value vector with a multi-head self-attention mechanism, further improving how well the output feature map expresses local, fine-grained forgery traces in the spatial domain.
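The formulas above translate directly into PyTorch; the following sketch is one possible rendering (the function signature and shapes are assumptions):

```python
import torch
import torch.nn as nn

def multi_head_attention(q, k, v, w_o, num_heads):
    """Compute z = cat(z_1, ..., z_M) W_o with z_m = softmax(q_m k_m^T / sqrt(d)) v_m."""
    B, N, D = q.shape
    d = D // num_heads                                       # per-head dimension d = D/M
    def split(t):
        return t.view(B, N, num_heads, d).transpose(1, 2)    # (B, M, N, d)
    qh, kh, vh = split(q), split(k), split(v)
    attn = torch.softmax(qh @ kh.transpose(-2, -1) / d ** 0.5, dim=-1)
    z = (attn @ vh).transpose(1, 2).reshape(B, N, D)         # concatenate the M heads
    return w_o(z)                                            # output mapping W_o

# example usage with illustrative shapes
q = k = v = torch.randn(2, 196, 256)
w_o = nn.Linear(256, 256)
z = multi_head_attention(q, k, v, w_o, num_heads=8)          # (2, 196, 256)
```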
In some optional implementations of this embodiment, a high-frequency wavelet sampler is disposed between two adjacent central differential attention modules. In this implementation, the executing body may execute the step 202 as follows:
firstly, for each central differential attention module in a plurality of central differential attention modules, processing the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to obtain an output feature map of the central differential attention module, and extracting high-frequency features in the output feature map of the central differential attention module through a high-frequency wavelet sampler between the central differential attention module and a next central differential attention module to obtain an input feature map of the next central differential attention module.
After the output feature map of a central difference attention module is obtained, the subsequent high-frequency wavelet sampler further extracts the high-frequency information from the output feature map of the preceding central difference attention module.
Existing deep neural networks use various down-sampling operations, such as max pooling, average pooling and strided convolution. Max pooling and average pooling are simple and effective, but some research indicates that they may discard beneficial image details. Although researchers have introduced mixed pooling, stochastic pooling and MaxBlur pooling to address these problems, none of these considers the inconsistency between real and fake faces in the frequency domain. The high-frequency wavelet sampler is very different from these common samplers: it uses the discrete wavelet transform, which not only down-samples features but also decomposes an image into low-frequency and high-frequency components.
Second, the output feature map of the last central differential attention module is taken as the processed feature map.
In this implementation, the execution main body further extracts high-frequency information in the frequency domain through a high-frequency wavelet sampler on the basis of performing feature processing through each central differential attention module, so that the input feature map of each central differential attention module has high-frequency features in both the spatial domain and the frequency domain, and the expressive force of the processed feature map in local and fine-grained forged traces is further improved.
In some optional implementations of the present embodiment, the executing body may execute the extracting process of the high-frequency information by the high-frequency wavelet sampler by:
firstly, decomposing and obtaining various high-frequency components of each channel in the output characteristic diagram of the central difference attention module in a frequency domain based on a discrete wavelet transform mode through a high-frequency wavelet sampler between the central difference attention module and the next central difference attention module. And then, splicing the same high-frequency components corresponding to the channels to obtain the spliced high-frequency components. And finally, cascading the spliced high-frequency components to obtain the high-frequency characteristics in the output characteristic diagram of the central differential attention module so as to determine the input characteristic diagram corresponding to the next central differential attention module.
In this implementation, the execution body first decomposes each channel of the output feature map of the preceding central difference attention module into various high-frequency components in the frequency domain by discrete wavelet transform, which essentially captures different frequencies at different resolutions. The classical two-dimensional discrete wavelet transform contains two filters, a low-pass filter L and a high-pass filter H, which (for the Haar wavelet, the classical choice) are specifically:

$$L = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \end{bmatrix},\qquad H = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \end{bmatrix}$$

In particular, the low-pass filter focuses on smooth surfaces that are mainly associated with low-frequency signals, while the high-pass filter captures most high-frequency signals, such as vertical, horizontal and diagonal edge signals. These two filters can be combined pairwise to form four kernels, yielding the LL, LH, HL and HH components.
For a given feature map $X \in \mathbb{R}^{C \times H \times W}$ (C, H and W represent the number of channels, height and width of the feature map, respectively), a discrete wavelet transform operation is performed on each channel. Specifically, for the feature $X_i$ of the i-th channel, the sub-band features generated by the first-order decomposition are:

$$X_{ll}^{i} = L X_i L^{\top},\quad X_{lh}^{i} = H X_i L^{\top},\quad X_{hl}^{i} = L X_i H^{\top},\quad X_{hh}^{i} = H X_i H^{\top}$$

where $i \in \{0, 1, \ldots, C-1\}$. These sub-band features are then stacked and connected together along the channel dimension, and denoted $X_{ll}$, $X_{lh}$, $X_{hl}$ and $X_{hh}$.
By analysing the wavelet sub-bands (components) of a real face image and its corresponding forged face image, it is found that the LL sub-band, composed mainly of low-frequency information, depicts the overall appearance common to real and fake face images, while the LH, HL and HH sub-bands contain information representing the subtle artifacts and forgery traces of the forged face image (e.g., blending boundaries, checkerboard patterns and blurring artifacts). Since the low-frequency information of a forged face image is essentially an approximation of the original image, and many research works have also shown that the high frequencies (LH, HL and HH) contribute to the detection of forged faces, this implementation does not use the LL wavelet sub-band for the face forgery detection task. The high-frequency features LH, HL and HH are aggregated together by channel concatenation (cat), which can be expressed as:

$$X_h = \mathrm{cat}(X_{lh}, X_{hl}, X_{hh}),\qquad X_h \in \mathbb{R}^{3C \times \frac{H}{2} \times \frac{W}{2}}$$

This approach not only aggregates the high-frequency channel features but also reduces the resolution of the input feature map.
In the implementation mode, a specific implementation process of extracting the high-frequency features by the high-frequency wavelet sampler is provided, and the expressive force of the local and fine-grained forged traces of the input feature map corresponding to each central differential attention module is further improved.
In some optional implementation manners of this embodiment, the executing body may execute the following operations to cascade the spliced high-frequency components, obtain a high-frequency feature in the output feature map of the central differential attention module, and obtain an input feature map corresponding to a next central differential attention module:
firstly, cascading the spliced high-frequency components to obtain a cascading high-frequency characteristic; and then, carrying out layer normalization on the cascade high-frequency characteristics to obtain an input characteristic diagram of the next central differential attention module.
Specifically, the implementation body uses layer normalization and a linear layer to reduce the channel dimension, which is specifically expressed as:

$$X' = \mathrm{Linear}\bigl(\mathrm{LayerNorm}(X_h)\bigr)$$

where $X'$ denotes the input feature map of the next central differential attention module, with a reduced channel dimension.
in the implementation mode, the layer normalization and the linear layer are used for carrying out the feature processing, so that the data volume of the features is reduced on the basis of keeping the feature expression, and the improvement of the information processing efficiency is facilitated.
In some optional implementations of this embodiment, the executing main body may execute the step 202 by:
first, a supplemental feature map derived based on the initial feature map is provided for each of a plurality of central differential attention modules based on a jump connection approach.
For example, the execution body processes the initial feature map as a data base and performs down-sampling or the like to obtain a supplementary feature map having the same size as the input feature map of each central difference attention module.
Secondly, processing the supplementary feature map and the input feature map corresponding to the central differential attention module based on a central differential convolution method and an attention mechanism through each central differential attention module in the plurality of central differential attention modules connected in series to finally obtain a processed feature map.
The supplementary feature map and the input feature map may be added element by element to obtain a fused feature map, which is input into the central differential attention module. Through the supplementary feature map, spatially-aware local information is supplemented to the multi-stage network composed of the central differential attention modules.
In this implementation, spatially-aware local information is supplemented, by way of skip connections, to the multi-level network formed by the central differential attention modules, further improving the accuracy of the feature processing, as illustrated in the sketch below.
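A minimal sketch of this fusion, assuming average pooling as the down-sampling operation and matching channel counts between the two maps:

```python
import torch
import torch.nn.functional as F

initial_map = torch.randn(2, 64, 56, 56)   # stand-in for the initial feature map
input_map = torch.randn(2, 64, 14, 14)     # input of a later attention module

# down-sample the initial map to the module's resolution, then fuse by
# element-wise addition (a channel projection would be needed if dims differ)
supp = F.adaptive_avg_pool2d(initial_map, input_map.shape[-2:])
fused = input_map + supp                    # fed into the central differential attention module
```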
With continued reference to fig. 4, a schematic flow chart 400 of yet another embodiment of a method for authenticating a human face according to the present disclosure is shown, comprising the steps of:
Step 401, determining an initial feature map of the acquired face image.
Step 402, for a first central differential attention module of the plurality of central differential attention modules, processing the initial feature map based on a central differential convolution method and an attention mechanism to obtain an output feature map of the central differential attention module.
Step 403, for each subsequent central differential attention module, extracting the high-frequency features in the output feature map of the previous central differential attention module through the high-frequency wavelet sampler between this central differential attention module and the previous one, so as to obtain the input feature map of this central differential attention module.
Step 404, acquiring the supplementary feature map, derived from the initial feature map, that is provided to the central differential attention module through skip connections.
Step 405, processing the supplementary feature map and the input feature map corresponding to the central differential attention module based on a central difference convolution method and an attention mechanism to finally obtain a processed feature map.
Step 406, determining, based on the processed feature map, whether the face image is a forged face image.
As can be seen from this embodiment, compared with the embodiment corresponding to fig. 2, the flow 400 of the face authentication method in this embodiment specifically illustrates the feature processing based on the central differential attention modules and the high-frequency wavelet samplers, as well as the provision of a supplementary feature map to each central differential attention module through skip connections, further improving how well the resulting feature map expresses local, fine-grained forgery traces in the spatial domain and improving the accuracy of the counterfeit identification result for the face image.
With continued reference to fig. 5, a schematic structural diagram of the face authentication model is shown. The face authentication model 500 includes a feature embedding layer 501, four central differential attention modules 502, 503, 504, 505, three high-frequency wavelet samplers 506, 507, 508, and a classification layer 509. The initial feature map of the face image determined by the feature embedding layer 501 is input into the central differential attention module 502, which processes it based on a central difference convolution method and a multi-head self-attention mechanism to obtain an output feature map; the output feature map of the central differential attention module 502 passes through the high-frequency wavelet sampler 506 for high-frequency feature extraction to obtain the input feature map of the central differential attention module 503; the local skip-connection strategy provides a supplementary feature map for the central differential attention module 503 based on the initial feature map; and the central differential attention module 503 processes the corresponding input feature map and supplementary feature map based on the central difference convolution method and the attention mechanism.
These operations repeat through the remaining modules, finally yielding the processed feature map output by the central differential attention module 505, from which it is determined whether the input face image is a forged face image.
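Putting the pieces together, the overall forward pass of the model in fig. 5 might be organised as in the following structural sketch; nn.Identity placeholders stand in for the concrete sub-modules sketched earlier, and the skip connection is indicated only as a comment:

```python
import torch.nn as nn

class FaceForgeryDetectionModel(nn.Module):
    """Structural sketch of fig. 5; sub-modules default to placeholders."""
    def __init__(self, embed=None, blocks=None, samplers=None, classifier=None):
        super().__init__()
        self.embed = embed or nn.Identity()                                  # layer 501
        self.blocks = nn.ModuleList(blocks or [nn.Identity() for _ in range(4)])     # 502-505
        self.samplers = nn.ModuleList(samplers or [nn.Identity() for _ in range(3)]) # 506-508
        self.classifier = classifier or nn.Identity()                        # layer 509

    def forward(self, image):
        x = self.embed(image)                  # initial feature map
        for i, block in enumerate(self.blocks):
            # a skip connection would fuse a resized copy of the initial map here
            x = block(x)                       # central differential attention
            if i < len(self.samplers):
                x = self.samplers[i](x)        # high-frequency wavelet sampling
        return self.classifier(x)              # forged / real decision
```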
With continuing reference to FIG. 6, a schematic flow chart 600 illustrating one embodiment of a method for training a face counterfeit detection model in accordance with the present disclosure is shown and includes the following steps:
step 601, obtaining a training sample set.
In this embodiment, an executing entity (for example, the terminal device or the server in fig. 1) of the training method for the face counterfeit detection model may obtain the training sample set from a remote location or a local location based on a wired network connection manner or a wireless network connection manner.
The training samples in the training sample set comprise sample face images and labels for representing whether the sample face images are forged face images. The training sample set comprises forged sample face images and real sample face images.
Step 602, determining an initial feature map of the input sample face image through an embedding layer; processing the input feature map of each central differential attention module through each central differential attention module in the plurality of central differential attention modules connected in series based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map; and taking the label corresponding to the input sample face image as expected output of a face counterfeit identification result obtained by the output layer based on the processed characteristic diagram, and training to obtain a face counterfeit identification model comprising an embedding layer, a plurality of central differential attention modules and the output layer by a machine learning method.
In this embodiment, the execution subject performs the above operations through the embedding layer, the plurality of serially connected central differential attention modules and the output layer, wherein the input feature map of the first central differential attention module in the plurality of central differential attention modules is the initial feature map, and for each central differential attention module after the first, the input feature map is the output feature map of the preceding central differential attention module.
In this embodiment, for an input sample face image, the face counterfeit identification model outputs an actual counterfeit identification result for the sample face image; a cross-entropy loss is then determined between the actual counterfeit identification result and the label corresponding to the input sample face image; and the parameters of the embedding layer, the plurality of central differential attention modules and the output layer are updated according to the cross-entropy loss.
The above training operation is performed iteratively, and in response to a preset end condition being reached, the trained face counterfeit identification model is obtained. The preset end condition may be, for example, that the training time exceeds a preset time threshold, that the number of training iterations exceeds a preset count threshold, or that the training loss converges.
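A minimal sketch of such a training loop (the optimizer choice, learning rate and label convention are assumptions):

```python
import torch
import torch.nn.functional as F

def train(model, loader, epochs=10, lr=1e-4, device="cuda"):
    # Cross-entropy between the model's counterfeit identification output and
    # the sample labels; the embedding layer, attention modules and output
    # layer are updated jointly through model.parameters().
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.to(device).train()
    for _ in range(epochs):                    # stand-in for the preset end condition
        for images, labels in loader:          # labels: 0 = real, 1 = forged (assumed)
            logits = model(images.to(device))
            loss = F.cross_entropy(logits, labels.to(device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```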
In this embodiment, the human face counterfeit detection model performs feature processing on the basis of the center difference convolution method and the attention mechanism through each center difference attention module of the plurality of center difference attention modules connected in series to capture local and fine-grained forged traces of the human face image in a spatial domain, so that the accuracy of the human face counterfeit detection result of the human face counterfeit detection model on the human face image is improved.
In some optional implementations of this embodiment, the execution subject may process the input feature map of each central differential attention module of the plurality of central differential attention modules connected in series based on a central differential convolution method and an attention mechanism by performing the following:
by each central differential attention module of the plurality of central differential attention modules, performing the following operations:
firstly, convolving the input characteristic diagram of the central difference attention module to obtain a convolution characteristic diagram; secondly, obtaining a query vector based on the convolution characteristic graph; thirdly, performing center difference convolution on the convolution characteristic graph through a center difference convolution method to obtain a key vector and a value vector; fourthly, the query vector, the key vector and the value vector are processed through an attention mechanism, and an output feature map of the central differential attention module is obtained.
In some optional implementations of this embodiment, the executing body may execute the fourth step by: firstly, processing a query vector, a key vector and a value vector through a multi-head self-attention mechanism to obtain a head characteristic diagram corresponding to each head; then, based on the head feature map corresponding to each head, an output feature map of the center differential attention module is obtained.
In some optional implementations of the present embodiment, the face counterfeit detection model further includes a high-frequency wavelet sampler disposed between two adjacent central differential attention modules. In this implementation, the execution body may process the input feature map of each central differential attention module in the plurality of central differential attention modules connected in series based on a central differential convolution method and an attention mechanism to obtain a processed feature map finally by performing the following steps:
Firstly, for each central differential attention module in the plurality of central differential attention modules, the input feature map of that module is processed based on a central difference convolution method and an attention mechanism to obtain its output feature map, and the high-frequency features in the output feature map are extracted through the high-frequency wavelet sampler between that module and the next central differential attention module to obtain the input feature map of the next central differential attention module; the output feature map of the last central differential attention module is then taken as the processed feature map.
In this implementation, in each training operation, the execution subject needs to update parameters of the embedding layer, the central differential attention modules, the high-frequency wavelet samplers, and the output layer according to the obtained cross entropy loss.
In some optional implementations of this embodiment, the executing entity may extract the high-frequency features in the output feature map of the central differential attention module through a high-frequency wavelet sampler between the central differential attention module and the next central differential attention module to obtain the input feature map of the next central differential attention module by performing the following steps:
firstly, decomposing and obtaining various high-frequency components of each channel in an output characteristic diagram of the central differential attention module in a frequency domain based on a discrete wavelet transform mode through a high-frequency wavelet sampler between the central differential attention module and a next central differential attention module; secondly, splicing the same high-frequency components corresponding to each channel to obtain spliced high-frequency components; thirdly, cascading the spliced high-frequency components to obtain the high-frequency characteristics in the output characteristic diagram of the central differential attention module so as to determine the input characteristic diagram corresponding to the next central differential attention module.
In some optional implementations of this embodiment, the executing body may execute the third step by: firstly, cascading all spliced high-frequency components to obtain a cascading high-frequency characteristic; and then, carrying out layer normalization on the cascade high-frequency characteristics to obtain an input characteristic diagram of the next central differential attention module.
In some optional implementations of this embodiment, the executing main body may perform the above feature processing by: firstly, providing a supplementary feature map derived from the initial feature map for each central differential attention module in the plurality of central differential attention modules through skip connections; then, processing, by each central differential attention module in the plurality of serially connected central differential attention modules, the supplementary feature map and the input feature map corresponding to that module based on a central difference convolution method and an attention mechanism to finally obtain the processed feature map.
It should be noted that each implementation manner in embodiment 600 may be executed with reference to each implementation manner in embodiment 200, and details are not described herein. The trained face authentication model can be used to implement the above embodiments 200, 400.
With continuing reference to fig. 7, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a face counterfeit discrimination apparatus. The apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus can be applied to various electronic devices.
As shown in fig. 7, the face counterfeit discrimination apparatus 700 includes: a first determining unit 701 configured to determine an initial feature map of the acquired face image; a deriving unit 702 configured to process, by each central differential attention module of a plurality of central differential attention modules connected in series, the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of the first central differential attention module of the plurality of central differential attention modules is the initial feature map, and for each central differential attention module after the first central differential attention module, the input feature map of the central differential attention module is the output feature map of the previous central differential attention module; and a second determining unit 703 configured to determine, based on the processed feature map, whether the face image is a forged face image.
In some optional implementations of this embodiment, the deriving unit 702 is further configured to: performing, by each central differential attention module in the plurality of central differential attention modules, the following operations: convolving the input feature map of the central differential attention module to obtain a convolution feature map; obtaining a query vector based on the convolution feature map; performing central differential convolution on the convolution feature map to obtain a key vector and a value vector; and processing the query vector, the key vector and the value vector through an attention mechanism to obtain the output feature map of the central differential attention module.
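The sketch below illustrates the two projections just described: a plain convolution yields the query, while a central differential convolution, formulated here as a vanilla convolution blended with a central-difference term as is common in the literature, yields the key and value. The value of `theta`, the 3x3 kernel size, and the class names are assumptions for illustration.

```python
import torch.nn as nn
import torch.nn.functional as F

class CentralDiffConv2d(nn.Module):
    """3x3 convolution blended with a central-difference term."""
    def __init__(self, in_ch: int, out_ch: int, theta: float = 0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.theta = theta

    def forward(self, x):
        out = self.conv(x)  # vanilla convolution term
        # Convolving (x - center pixel) is equivalent to subtracting a 1x1
        # convolution whose weight is the 3x3 kernel summed over its window.
        w_center = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        return out - self.theta * F.conv2d(x, w_center)

class CDAProjections(nn.Module):
    """Derives Q from a plain convolution and K, V from a CDC."""
    def __init__(self, dim: int):
        super().__init__()
        self.pre = nn.Conv2d(dim, dim, 3, padding=1)  # convolution feature map
        self.q = nn.Conv2d(dim, dim, 1)               # query projection
        self.kv = CentralDiffConv2d(dim, 2 * dim)     # key/value projection

    def forward(self, x):
        f = self.pre(x)
        q = self.q(f)
        k, v = self.kv(f).chunk(2, dim=1)
        return q, k, v
```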
In some optional implementations of this embodiment, the deriving unit 702 is further configured to: processing the query vector, the key vector and the value vector through a multi-head self-attention mechanism to obtain a head feature map corresponding to each head; and obtaining the output feature map of the central differential attention module based on the head feature maps corresponding to the heads.
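A minimal sketch of the multi-head step, assuming the query, key and value are channel-first feature maps whose channels divide evenly among the heads; the head count and tensor layout are illustrative assumptions.

```python
import torch

def multi_head_attention(q, k, v, num_heads: int):
    b, c, h, w = q.shape
    d = c // num_heads                     # per-head channel width

    def split(t):                          # (B, C, H, W) -> (B, heads, HW, d)
        return t.reshape(b, num_heads, d, h * w).transpose(2, 3)

    q, k, v = split(q), split(k), split(v)
    attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    heads = attn @ v                       # one head feature map per head
    return heads.transpose(2, 3).reshape(b, c, h, w)  # merge the heads
```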
In some optional implementations of this embodiment, a high-frequency wavelet sampler is disposed between two adjacent central differential attention modules, and the deriving unit 702 is further configured to: for each central differential attention module in the plurality of central differential attention modules, processing the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to obtain the output feature map of the central differential attention module, and extracting the high-frequency features in the output feature map through the high-frequency wavelet sampler between the central differential attention module and the next central differential attention module to obtain the input feature map of the next central differential attention module; and taking the output feature map of the last central differential attention module as the processed feature map.
In some optional implementations of this embodiment, the deriving unit 702 is further configured to: decomposing, through the high-frequency wavelet sampler between the central differential attention module and the next central differential attention module, each channel of the output feature map of the central differential attention module in the frequency domain based on discrete wavelet transform to obtain multiple high-frequency components; splicing the same high-frequency components corresponding to the channels to obtain spliced high-frequency components; and cascading the spliced high-frequency components to obtain the high-frequency features in the output feature map of the central differential attention module, so as to determine the input feature map corresponding to the next central differential attention module.
In some optional implementations of this embodiment, the deriving unit 702 is further configured to: cascading the spliced high-frequency components to obtain a cascaded high-frequency feature; and performing layer normalization on the cascaded high-frequency feature to obtain the input feature map of the next central differential attention module.
In some optional implementations of this embodiment, the deriving unit 702 is further configured to: providing, in a skip-connection manner, each central differential attention module in the plurality of central differential attention modules with a supplementary feature map obtained based on the initial feature map; and processing, through each central differential attention module in the plurality of central differential attention modules connected in series, the supplementary feature map and the input feature map corresponding to the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain the processed feature map.
In this embodiment, a face counterfeit discrimination apparatus is provided. Feature processing is performed based on a central differential convolution method and an attention mechanism through each central differential attention module in a plurality of central differential attention modules connected in series to capture local and fine-grained counterfeit traces of a face image in the spatial domain, thereby improving the accuracy of the counterfeit discrimination result for the face image.
With continuing reference to fig. 8, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of a training apparatus for a face counterfeit discrimination model. The apparatus embodiment corresponds to the method embodiment shown in fig. 6, and the apparatus may be applied to various electronic devices.
As shown in fig. 8, the training apparatus 800 for the face counterfeit discrimination model includes: an obtaining unit 801 configured to obtain a training sample set, where the training samples in the training sample set include a sample face image and a label representing whether the sample face image is a forged face image; and a training unit 802 configured to: determine, through the embedding layer, an initial feature map of the input sample face image; process, by each central differential attention module in a plurality of central differential attention modules connected in series, the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map, where the input feature map of the first central differential attention module in the plurality of central differential attention modules is the initial feature map, and for each central differential attention module after the first central differential attention module, the input feature map of the central differential attention module is the output feature map of the previous central differential attention module; and take the label corresponding to the input sample face image as the expected output of the face counterfeit discrimination result obtained by the output layer based on the processed feature map, and train, by a machine learning method, a face counterfeit discrimination model comprising the embedding layer, the plurality of central differential attention modules and the output layer.
In some optional implementations of the present embodiment, the training unit 802 is further configured to: performing, by each central differential attention module in the plurality of central differential attention modules, the following operations: convolving the input feature map of the central differential attention module to obtain a convolution feature map; obtaining a query vector based on the convolution feature map; performing central differential convolution on the convolution feature map to obtain a key vector and a value vector; and processing the query vector, the key vector and the value vector through an attention mechanism to obtain the output feature map of the central differential attention module.
In some optional implementations of the present embodiment, the training unit 802 is further configured to: processing the query vector, the key vector and the value vector through a multi-head self-attention mechanism to obtain a head feature map corresponding to each head; and obtaining the output feature map of the central differential attention module based on the head feature maps corresponding to the heads.
In some optional implementations of this embodiment, the face counterfeit discrimination model further includes a high-frequency wavelet sampler disposed between two adjacent central differential attention modules, and the training unit 802 is further configured to: for each central differential attention module in the plurality of central differential attention modules, processing the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to obtain the output feature map of the central differential attention module, and extracting the high-frequency features in the output feature map through the high-frequency wavelet sampler between the central differential attention module and the next central differential attention module to obtain the input feature map of the next central differential attention module; and taking the output feature map of the last central differential attention module as the processed feature map.
In some optional implementations of the present embodiment, the training unit 802 is further configured to: decomposing, through the high-frequency wavelet sampler between the central differential attention module and the next central differential attention module, each channel of the output feature map of the central differential attention module in the frequency domain based on discrete wavelet transform to obtain multiple high-frequency components; splicing the same high-frequency components corresponding to the channels to obtain spliced high-frequency components; and cascading the spliced high-frequency components to obtain the high-frequency features in the output feature map of the central differential attention module, so as to determine the input feature map corresponding to the next central differential attention module.
In some optional implementations of the present embodiment, the training unit 802 is further configured to: cascading the spliced high-frequency components to obtain a cascaded high-frequency feature; and performing layer normalization on the cascaded high-frequency feature to obtain the input feature map of the next central differential attention module.
In some optional implementations of the present embodiment, the training unit 802 is further configured to: providing, in a skip-connection manner, each central differential attention module in the plurality of central differential attention modules with a supplementary feature map obtained based on the initial feature map; and processing, through each central differential attention module in the plurality of central differential attention modules connected in series, the supplementary feature map and the input feature map corresponding to the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain the processed feature map.
In this embodiment, a training apparatus for a face counterfeit discrimination model is provided. The face counterfeit discrimination model performs feature processing based on a central differential convolution method and an attention mechanism through each central differential attention module in a plurality of central differential attention modules connected in series to capture local and fine-grained counterfeit traces of a face image in the spatial domain, thereby improving the accuracy of the counterfeit discrimination result of the model for the face image.
According to an embodiment of the present disclosure, the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the face counterfeit discrimination method and the training method of the face counterfeit discrimination model described in any of the above embodiments.
According to an embodiment of the present disclosure, the present disclosure further provides a readable storage medium storing computer instructions that, when executed, cause a computer to perform the face counterfeit discrimination method and the training method of the face counterfeit discrimination model described in any of the above embodiments.
An embodiment of the present disclosure provides a computer program product which, when executed by a processor, implements the face counterfeit discrimination method and the training method of the face counterfeit discrimination model described in any of the above embodiments.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906, such as a keyboard or a mouse; an output unit 907, such as various types of displays and speakers; a storage unit 908, such as a magnetic disk or an optical disk; and a communication unit 909, such as a network card, a modem, or a wireless communication transceiver. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the face counterfeit discrimination method. For example, in some embodiments, the face counterfeit discrimination method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the face counterfeit discrimination method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the face counterfeit discrimination method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability in traditional physical hosts and virtual private server (VPS) services; it may also be a server of a distributed system, or a server combined with a blockchain.
According to the technical solution of the embodiments of the present disclosure, a face counterfeit discrimination method is provided, in which feature processing is performed based on a central differential convolution method and an attention mechanism by each central differential attention module in a plurality of central differential attention modules connected in series, so as to capture local and fine-grained counterfeit traces of a face image in the spatial domain, thereby improving the accuracy of the counterfeit discrimination result for the face image.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions provided by the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A face counterfeit discrimination method, comprising the following steps:
determining an initial feature map of the acquired face image;
processing, by each central differential attention module in a plurality of central differential attention modules connected in series, the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of a first central differential attention module in the plurality of central differential attention modules is the initial feature map, and for each central differential attention module after the first central differential attention module, the input feature map of the central differential attention module is the output feature map of the previous central differential attention module;
and determining whether the face image is a forged face image based on the processed feature map.
2. The method of claim 1, wherein the processing, by each central differential attention module in the plurality of central differential attention modules connected in series, of the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism comprises:
performing, by each central differential attention module in the plurality of central differential attention modules connected in series, the following operations:
convolving the input feature map of the central differential attention module to obtain a convolution feature map;
obtaining a query vector based on the convolution feature map;
performing central differential convolution on the convolution feature map to obtain a key vector and a value vector;
and processing the query vector, the key vector and the value vector through an attention mechanism to obtain an output feature map of the central differential attention module.
3. The method of claim 2, wherein the processing the query vector, the key vector, and the value vector through an attention mechanism to obtain the output feature map of the central differential attention module comprises:
processing the query vector, the key vector and the value vector through a multi-head self-attention mechanism to obtain a head feature map corresponding to each head;
and obtaining the output feature map of the central differential attention module based on the head feature maps corresponding to the heads.
4. The method according to any one of claims 1-3, wherein a high-frequency wavelet sampler is provided between two adjacent central differential attention modules, and
the processing, by each central differential attention module in the plurality of central differential attention modules connected in series, of the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map comprises:
for each central differential attention module in the plurality of central differential attention modules, processing the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to obtain an output feature map of the central differential attention module, and extracting high-frequency features in the output feature map of the central differential attention module through a high-frequency wavelet sampler between the central differential attention module and a next central differential attention module to obtain an input feature map of the next central differential attention module;
and taking the output feature map of the last central differential attention module as the processed feature map.
5. The method of claim 4, wherein said extracting high frequency features from the output feature map of the central differential attention module by a high frequency wavelet sampler between the central differential attention module and a next central differential attention module to obtain the input feature map of the next central differential attention module comprises:
decomposing, through the high-frequency wavelet sampler between the central differential attention module and the next central differential attention module, each channel of the output feature map of the central differential attention module in the frequency domain based on discrete wavelet transform to obtain multiple high-frequency components;
splicing the same high-frequency components corresponding to the channels to obtain spliced high-frequency components;
and cascading the spliced high-frequency components to obtain the high-frequency features in the output feature map of the central differential attention module, so as to determine the input feature map corresponding to the next central differential attention module.
6. The method of claim 5, wherein the cascading the spliced high-frequency components to obtain the high-frequency features in the output feature map of the central differential attention module so as to determine the input feature map corresponding to the next central differential attention module comprises:
cascading the spliced high-frequency components to obtain a cascaded high-frequency feature;
and performing layer normalization on the cascaded high-frequency feature to obtain the input feature map of the next central differential attention module.
7. The method of claim 5, wherein the processing, by each central differential attention module in the plurality of central differential attention modules connected in series, of the input feature map based on a central differential convolution method and an attention mechanism to finally obtain the processed feature map comprises:
providing, in a skip-connection manner, each central differential attention module in the plurality of central differential attention modules with a supplementary feature map obtained based on the initial feature map;
and processing, through each central differential attention module in the plurality of central differential attention modules, the supplementary feature map and the input feature map corresponding to the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain the processed feature map.
8. A training method of a face counterfeit discrimination model, comprising the following steps:
acquiring a training sample set, wherein training samples in the training sample set comprise sample face images and labels for representing whether the sample face images are forged face images;
determining, through an embedding layer, an initial feature map of the input sample face image; processing, by each central differential attention module in a plurality of central differential attention modules connected in series, the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of a first central differential attention module in the plurality of central differential attention modules is the initial feature map, and for each central differential attention module after the first central differential attention module, the input feature map of the central differential attention module is the output feature map of the previous central differential attention module; and taking the label corresponding to the input sample face image as the expected output of the face counterfeit discrimination result obtained by an output layer based on the processed feature map, and training, by a machine learning method, a face counterfeit discrimination model comprising the embedding layer, the plurality of central differential attention modules and the output layer.
9. The method of claim 8, wherein the processing, by each central differential attention module in the plurality of central differential attention modules connected in series, of the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism comprises:
performing, by each central differential attention module in the plurality of central differential attention modules connected in series, the following operations:
convolving the input feature map of the central differential attention module to obtain a convolution feature map;
obtaining a query vector based on the convolution feature map;
performing central differential convolution on the convolution feature map to obtain a key vector and a value vector;
and processing the query vector, the key vector and the value vector through an attention mechanism to obtain an output feature map of the central differential attention module.
10. The method of claim 9, wherein the processing the query vector, the key vector, and the value vector through an attention mechanism to obtain the output feature map of the central differential attention module comprises:
processing the query vector, the key vector and the value vector through a multi-head self-attention mechanism to obtain a head feature map corresponding to each head;
and obtaining the output feature map of the central differential attention module based on the head feature maps corresponding to the heads.
11. The method according to any one of claims 8-10, wherein the face counterfeit discrimination model further comprises a high-frequency wavelet sampler disposed between two adjacent central differential attention modules, and
the processing, by each central differential attention module in the plurality of central differential attention modules connected in series, of the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map comprises:
for each central differential attention module in the plurality of central differential attention modules, processing the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to obtain an output feature map of the central differential attention module, and extracting high-frequency features in the output feature map of the central differential attention module through a high-frequency wavelet sampler between the central differential attention module and a next central differential attention module to obtain an input feature map of the next central differential attention module;
and taking the output feature map of the last central differential attention module as the processed feature map.
12. The method of claim 11, wherein said extracting high frequency features from the output feature map of the central differential attention module by a high frequency wavelet sampler between the central differential attention module and a next central differential attention module to obtain the input feature map of the next central differential attention module comprises:
decomposing, through the high-frequency wavelet sampler between the central differential attention module and the next central differential attention module, each channel of the output feature map of the central differential attention module in the frequency domain based on discrete wavelet transform to obtain multiple high-frequency components;
splicing the same high-frequency components corresponding to the channels to obtain spliced high-frequency components;
and cascading the spliced high-frequency components to obtain the high-frequency features in the output feature map of the central differential attention module, so as to determine the input feature map corresponding to the next central differential attention module.
13. The method of claim 12, wherein the cascading the spliced high-frequency components to obtain the high-frequency features in the output feature map of the central differential attention module so as to determine the input feature map corresponding to the next central differential attention module comprises:
cascading the spliced high-frequency components to obtain a cascaded high-frequency feature;
and performing layer normalization on the cascaded high-frequency feature to obtain the input feature map of the next central differential attention module.
14. The method of claim 12, wherein the processing, by each central differential attention module in the plurality of central differential attention modules connected in series, of the input feature map based on a central differential convolution method and an attention mechanism to finally obtain the processed feature map comprises:
providing, in a skip-connection manner, each central differential attention module in the plurality of central differential attention modules with a supplementary feature map obtained based on the initial feature map;
and processing, through each central differential attention module in the plurality of central differential attention modules, the supplementary feature map and the input feature map corresponding to the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain the processed feature map.
15. A face counterfeit discrimination apparatus, comprising:
a first determination unit configured to determine an initial feature map of the acquired face image;
a deriving unit configured to process, by each central differential attention module of a plurality of central differential attention modules connected in series, an input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally derive a processed feature map, wherein the input feature map of a first central differential attention module of the plurality of central differential attention modules is the initial feature map, and for each central differential attention module subsequent to the first central differential attention module, the input feature map of the central differential attention module is an output feature map of a previous central differential attention module;
a second determination unit configured to determine whether the face image is a forged face image based on the processed feature map.
16. A training apparatus for a face counterfeit discrimination model, comprising:
an acquisition unit configured to acquire a training sample set, wherein training samples in the training sample set comprise sample face images and labels representing whether the sample face images are forged face images;
a training unit configured to: determine, through an embedding layer, an initial feature map of the input sample face image; process, by each central differential attention module in a plurality of central differential attention modules connected in series, the input feature map of the central differential attention module based on a central differential convolution method and an attention mechanism to finally obtain a processed feature map, wherein the input feature map of a first central differential attention module in the plurality of central differential attention modules is the initial feature map, and for each central differential attention module after the first central differential attention module, the input feature map of the central differential attention module is the output feature map of the previous central differential attention module; and take the label corresponding to the input sample face image as the expected output of the face counterfeit discrimination result obtained by an output layer based on the processed feature map, and train, by a machine learning method, a face counterfeit discrimination model comprising the embedding layer, the plurality of central differential attention modules and the output layer.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-14.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-14.
19. A computer program product, comprising: computer program, which when executed by a processor implements the method according to any of claims 1-14.
CN202210688010.3A 2022-06-16 2022-06-16 Face fake identifying method and device Active CN115147895B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210688010.3A CN115147895B (en) 2022-06-16 2022-06-16 Face fake identifying method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210688010.3A CN115147895B (en) 2022-06-16 2022-06-16 Face fake identifying method and device

Publications (2)

Publication Number Publication Date
CN115147895A true CN115147895A (en) 2022-10-04
CN115147895B CN115147895B (en) 2023-06-30

Family

ID=83408814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210688010.3A Active CN115147895B (en) 2022-06-16 2022-06-16 Face fake identifying method and device

Country Status (1)

Country Link
CN (1) CN115147895B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269149A (en) * 2021-06-24 2021-08-17 中国平安人寿保险股份有限公司 Living body face image detection method and device, computer equipment and storage medium
CN114067402A (en) * 2021-11-18 2022-02-18 长沙理工大学 Human face living body detection based on central difference convolution and receptive field feature fusion
CN114359811A (en) * 2022-01-11 2022-04-15 北京百度网讯科技有限公司 Data authentication method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117275068A (en) * 2023-09-21 2023-12-22 北京中科闻歌科技股份有限公司 Method and system for training human face fake detection in test stage containing uncertainty guidance
CN117275068B (en) * 2023-09-21 2024-05-17 北京中科闻歌科技股份有限公司 Method and system for training human face fake detection in test stage containing uncertainty guidance

Also Published As

Publication number Publication date
CN115147895B (en) 2023-06-30

Similar Documents

Publication Publication Date Title
WO2022161286A1 (en) Image detection method, model training method, device, medium, and program product
CN113255694B (en) Training image feature extraction model and method and device for extracting image features
Dollár et al. Fast feature pyramids for object detection
Montazer et al. An improved radial basis function neural network for object image retrieval
Wang et al. Blur image identification with ensemble convolution neural networks
CN109977832B (en) Image processing method, device and storage medium
Hu Illumination invariant face recognition based on dual‐tree complex wavelet transform
CN112949767A (en) Sample image increment, image detection model training and image detection method
CN113239807B (en) Method and device for training bill identification model and bill identification
CN115601820A (en) Face fake image detection method, device, terminal and storage medium
CN113554739A (en) Relighting image generation method and device and electronic equipment
CN115147895B (en) Face fake identifying method and device
Zhang et al. An image denoising method based on BM4D and GAN in 3D shearlet domain
CN113673465B (en) Image detection method, device, equipment and readable storage medium
CN111259792A (en) Face living body detection method based on DWT-LBP-DCT characteristics
CN114693970A (en) Object classification method, deep learning model training method, device and equipment
CN114529750A (en) Image classification method, device, equipment and storage medium
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
CN116071625B (en) Training method of deep learning model, target detection method and device
Hamidi et al. Local selected features of dual‐tree complex wavelet transform for single sample face recognition
CN114781548B (en) Image scene classification method, device, equipment and storage medium
CN113095185B (en) Facial expression recognition method, device, equipment and storage medium
CN113139483B (en) Human behavior recognition method, device, apparatus, storage medium, and program product
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device
CN113761249B (en) Method and device for determining picture type

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant