CN115311719A - Face attribute recognition algorithm and system based on multi-order attention mechanism fusion - Google Patents

Face attribute recognition algorithm and system based on multi-order attention mechanism fusion

Info

Publication number
CN115311719A
Authority
CN
China
Prior art keywords
face
image
attention mechanism
mechanism fusion
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210964078.XA
Other languages
Chinese (zh)
Inventor
姚灿荣
张光斌
吴俊毅
高志鹏
赵建强
杜新胜
韩名羲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd filed Critical Xiamen Meiya Pico Information Co Ltd
Priority to CN202210964078.XA priority Critical patent/CN115311719A/en
Publication of CN115311719A publication Critical patent/CN115311719A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a face attribute recognition algorithm based on multi-order attention mechanism fusion, which comprises the following steps: in response to a face detection method and a face alignment method, acquiring the complete face region in an image and outputting a face image; inputting the obtained face image into a convolutional neural network model and extracting a plurality of image features of the face image for training and processing; simultaneously inputting the image features of the acquired face image into a multi-order attention mechanism fusion network model for training and processing; and completing attribute recognition of the face image. A multi-order attention mechanism fusion network is introduced on the basis of a convolutional neural network, and the Transformer's ability to model global feature information is used to learn all facial attribute information in an image. By combining the strong feature extraction capability of the convolutional neural network with the global modeling capability of the multi-order attention mechanism fusion network, the algorithm's ability to recognize the various attributes of a face is improved.

Description

Face attribute recognition algorithm and system based on multi-order attention mechanism fusion
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a face attribute recognition algorithm and system based on multi-order attention mechanism fusion.
Background
Face recognition is one of the most mature artificial intelligence technologies. As an important module of face recognition, face attribute recognition and analysis has attracted attention from both industry and academia and is widely applied in image retrieval, face recognition, pedestrian re-identification, micro-expression recognition, image generation, recommendation systems, and the like.
With the rapid development of face recognition technology, many related tasks have been derived, for example: facial micro-expression recognition, cross-modal face retrieval, face forgery detection, and face attribute recognition. The task of face attribute recognition is to predict multiple facial attributes such as gender, whether glasses are worn, and whether the subject is smiling.
In recent years, because convolutional neural networks (CNNs) have strong feature learning capability, most image learning tasks use a CNN for feature extraction and then perform various operations on the features to achieve image classification, target detection, and similar results. For example, typical state-of-the-art pedestrian attribute recognition algorithms use a CNN to extract attributes and then classify them. Traditional face attribute recognition algorithms likewise rely mainly on convolutional neural networks and have produced many strong results; such algorithms usually adopt MobileNet or EfficientNet as the backbone network and then predict face attributes in a multi-label or multi-task manner at the classification layer. The performance of the image feature extractor therefore plays a decisive role in the overall face attribute learning task.
However, current algorithms ignore the fact that the receptive field of a convolutional neural network follows a Gaussian distribution: the network over-concentrates on local information in the image and neglects the learning of other local attributes. If a large CNN structure is adopted for feature learning and extraction, the model imposes enormous computational pressure on some front-end face applications, and the learning process may fall into overfitting. Meanwhile, some attribute data in face attribute datasets are difficult to collect and are seriously imbalanced across classes, so CNN-based algorithms generalize poorly on some attributes and show large deviations in their recognition rates.
In view of the above, it is of great significance to provide a face attribute recognition algorithm and system based on multi-order attention mechanism fusion.
Disclosure of Invention
The invention provides a face attribute recognition algorithm and system based on multi-order attention mechanism fusion, aiming to solve the problems of existing algorithms: over-concentration on local information in the image, neglected learning of other local attributes, hard-to-collect and imbalanced attribute data in face attribute datasets, and high model computation pressure.
In a first aspect, the present invention provides a face attribute recognition algorithm based on multi-order attention mechanism fusion, which includes:
responding to a face detection method and a face alignment method, acquiring a complete face region in an image, and outputting a face image;
inputting the obtained face image into a convolutional neural network model, further extracting a plurality of image characteristics of the face image for training and processing; and
simultaneously inputting the image characteristics of the acquired face image into a multi-order attention mechanism fusion network model for training and processing;
and completing the attribute recognition of the face image.
Preferably, the convolutional neural network model adopts a lightweight MobileNetV2 model, wherein the MobileNetV2 model includes 8 residual convolution blocks {b0, b1, …, b7}, and each residual convolution block is composed of a depthwise separable convolution layer, a BN layer, an activation layer, and a residual connection layer.
Further preferably, the convolutional neural network model extracts image features of the face image in the following manner:
F = B(I_i, θ_1, θ_2, …, θ_n)
wherein B denotes the forward operation of the MobileNetV2 model, I_i denotes the input RGB image, and θ_1, θ_2, …, θ_n denote the parameters of the residual convolution blocks.
Further preferably, the training and processing of image features by using the MobileNetV2 model includes:
firstly, passing the acquired image features through a global pooling layer and a classification layer;
the MobileNetV2 model is then trained under the constraints of the BCE loss function.
Preferably, the multi-order attention mechanism fusion network model adopts a Transformer model.
Preferably, the training and processing of the image features of the face image by using the multi-order attention mechanism fusion network model includes:
modeling global feature information in the face image by using the multi-order attention mechanism fusion network model;
the obtained global feature information is used as the input of the multi-order attention mechanism fusion network model and is coded by the multi-order attention mechanism fusion network model;
and carrying out optimization constraint on the multi-order attention mechanism fusion network under the constraint of a BCE loss function and then outputting.
Preferably, the BCE loss function is as follows:
l_bce = −(1/N) · Σ_{i=1}^{N} Σ_{j=1}^{M} ω_j [ y_ij · log σ(logits_ij) + (1 − y_ij) · log(1 − σ(logits_ij)) ]
wherein N denotes the number of training images, M denotes the number of face attribute classes, logits_ij denotes the output of the classification layer for attribute j of image i, y_ij denotes the corresponding image label, σ(z) = 1/(1 + e^(−z)), and ω_j is a per-class weight that compensates for the imbalance between attribute classes; its closed-form expression is given as an equation image in the original filing.
the loss function of the whole algorithm is expressed as follows:
L = l_bce1 + l_bce2
wherein l_bce1 denotes the loss function of the MobileNetV2 model, l_bce2 denotes the loss function of the multi-order attention mechanism fusion network model, and l_bce1 and l_bce2 are weighted equally.
In a second aspect, the present application further provides a face attribute recognition system based on multi-order attention mechanism fusion, including:
an acquisition module: used for acquiring the complete face region in an image according to a face detection method and a face alignment method;
an input module: used for inputting the acquired face image into the convolutional neural network model and inputting the image features of the acquired face image into the multi-order attention mechanism fusion network model;
an output module: used for outputting the face image and the image features of the face image;
an extraction module: used for extracting a plurality of image features of the face image;
a training and processing module: used for training and processing the extracted image features.
In a third aspect, an embodiment of the present invention provides an electronic device, including: one or more processors; storage means for storing one or more programs which, when executed by one or more processors, cause the one or more processors to carry out a method as described in any one of the implementations of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method described in any implementation manner of the first aspect.
Compared with the prior art, the beneficial effects of the invention are as follows:
(1) A multi-order attention mechanism fusion network (Transformer) is introduced on the basis of a convolutional neural network, and the Transformer's ability to model global feature information is used to learn all facial attribute information in an image. In the face attribute recognition learning task, on the basis of a lightweight convolutional network, the strong feature extraction capability of the convolutional neural network is combined with the global modeling capability of the multi-order attention mechanism fusion network, improving the algorithm's recognition of facial attributes. Experiments on an open-source face attribute dataset show that face attribute recognition performance is effectively improved from 83.7% to 92.3%, further improving the algorithm's ability to recognize the various attributes of a face.
(2) The invention addresses the problem that the receptive field of a traditional convolutional neural network follows a Gaussian distribution and over-concentrates on local information in the image, neglecting the learning of some local attributes, and reduces the deviation in the recognition rate of those attributes in face images.
(3) The invention avoids the enormous computational pressure that large CNN models impose on some front-end face applications and the overfitting dilemma in the model learning process, and addresses the problems that some attribute data in face attribute datasets are difficult to collect and that the data are seriously imbalanced across classes.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. The elements of the drawings are not necessarily to scale relative to each other. Like reference numerals designate corresponding similar parts.
FIG. 1 is an exemplary device architecture diagram in which an embodiment of the present invention may be employed;
FIG. 2 is an overall frame diagram of a face attribute recognition algorithm based on multi-order attention mechanism fusion according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a face attribute recognition algorithm based on multi-order attention mechanism fusion according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a face attribute recognition system based on multi-order attention mechanism fusion according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a computer apparatus suitable for use in implementing an electronic device of an embodiment of the invention.
Detailed Description
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. In this regard, directional terminology, such as "top," "bottom," "left," "right," "up," "down," etc., is used with reference to the orientation of the figures being described. Because components of embodiments can be positioned in a number of different orientations, the directional terminology is used for purposes of illustration and is in no way limiting. It is to be understood that other embodiments may be utilized and logical changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 1 illustrates an exemplary system architecture 100 for a method for processing information or an apparatus for processing information to which embodiments of the present invention may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having communication functions, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services, such as a background information processing server that processes verification request information transmitted by the terminal devices 101, 102, 103. The background information processing server may analyze and otherwise process the received verification request information and obtain a processing result (for example, verification success information representing that the verification request is a legal request).
It should be noted that the method for processing information provided by the embodiment of the present invention is generally executed by the server 105, and accordingly, the apparatus for processing information is generally disposed in the server 105. In addition, the method for sending information provided by the embodiment of the present invention is generally executed by the terminal equipment 101, 102, 103, and accordingly, the apparatus for sending information is generally disposed in the terminal equipment 101, 102, 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (for example, to provide distributed services), or may be implemented as a single software or a plurality of software modules, and is not limited in particular herein.
With the rapid development of face recognition technology, many related tasks have been derived, for example: facial micro-expression recognition, cross-modal face retrieval, face forgery detection, and face attribute recognition. The task of face attribute recognition is to predict multiple facial attributes such as gender, whether glasses are worn, and whether the subject is smiling.
In recent years, because convolutional neural networks (CNNs) have strong feature learning capability, most image learning tasks use a CNN for feature extraction and then perform various operations on the features to achieve image classification, target detection, and similar results. For example, typical state-of-the-art pedestrian attribute recognition algorithms use a CNN to extract attributes and then classify them.
Traditional face attribute recognition algorithms likewise rely mainly on convolutional neural networks and have produced many strong results; such algorithms usually adopt MobileNet or EfficientNet as the backbone network and then predict face attributes in a multi-label or multi-task manner at the classification layer. The performance of the image feature extractor therefore plays a decisive role in the whole face attribute learning task.
Multi-task face attribute recognition is an effective learning paradigm in which the performance of a target task is improved with the help of related auxiliary tasks. A multi-label face attribute algorithm can predict multiple face attributes simultaneously in an end-to-end training network; because each face image is associated with multiple attribute labels, multi-label learning is well suited to face attribute recognition. However, current algorithms ignore the fact that the receptive field of a convolutional neural network follows a Gaussian distribution: the network over-concentrates on local information in the image and neglects the learning of other local attributes. If a large CNN structure is adopted for feature learning and extraction, the model imposes enormous computational pressure on some front-end face applications, and the learning process may fall into overfitting. Meanwhile, some attribute data in face attribute datasets are difficult to collect and are seriously imbalanced across classes, so CNN-based algorithms generalize poorly on some attributes and show large deviations in their recognition rates.
Based on the above problems, the invention uses a lightweight network structure, MobileNetV2, as the convolutional neural network part for feature learning and extraction, introduces a multi-order attention mechanism fusion network (Transformer) on this basis, and uses the Transformer's ability to model global feature information to learn all face attribute information in an image.
The invention provides a face attribute recognition algorithm based on multi-order attention mechanism fusion, which improves the algorithm's recognition of face attributes by combining the strong feature extraction capability of a convolutional neural network with the global modeling capability of the multi-order attention mechanism fusion network. As shown in fig. 2, the main network of the whole framework includes a lightweight convolutional neural network, MobileNetV2, and a multi-order attention mechanism fusion network (Transformer).
Fig. 3 shows an embodiment of the present invention that discloses a face attribute recognition algorithm based on multi-order attention mechanism fusion. As shown in fig. 2 and fig. 3, the algorithm includes:
s1, responding to a face detection method and a face alignment method, acquiring a complete face region in an image, and outputting a face image;
specifically, the face detection method and the face alignment method mentioned in this embodiment both adopt a method common to face processing, such as an mtcnn face detection algorithm, and therefore do not relate to the core invention point of the present invention, and are not described herein again. And (4) scratching the complete face area in the image through a face detection algorithm, and outputting the face image in the size of 224x 224.
S21, inputting the acquired face image into a convolutional neural network model, further extracting a plurality of image characteristics of the face image, and training and processing the image characteristics; and
specifically, in this embodiment, the convolutional neural network model is a lightweight MobileNetV2 model, and MobileNetV2 is a lightweight convolutional neural network and uses deep separable convolution. In the invention, a lightweight network structure of MobileNet V2 is used as a convolutional neural network part for feature learning and extraction. Convolutional neural networks with the same effect, such as MobileNet, leNet-5, alexNet, etc., may also be used in other embodiments.
The extracted face image is used as the input of MobileNetV2. The MobileNetV2 network comprises 8 residual convolution blocks {b0, b1, …, b7}, and each residual convolution block consists of a depthwise separable convolution layer, a BN layer, an activation layer, and a residual connection layer. The 8 residual convolution blocks extract the image feature information layer by layer, progressing from edge information to high-level semantic information. The feature extraction is formulated as follows:
F = B(I_i, θ_1, θ_2, …, θ_n)
wherein B denotes the forward operation of the MobileNetV2 model, I_i denotes the input RGB image, and θ_1, θ_2, …, θ_n denote the parameters of the residual convolution blocks. An illustrative sketch of such a block follows.
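The following PyTorch sketch implements one such residual convolution block and stacks 8 of them. The channel width, activation, and omitted stem are illustrative assumptions; the actual MobileNetV2 uses inverted residual blocks with expansion factors and varying strides.

```python
import torch
import torch.nn as nn

class ResidualDSConvBlock(nn.Module):
    """One residual convolution block b_k as described above:
    depthwise separable convolution -> BN -> activation -> residual connection.
    Channel counts are illustrative, not the exact MobileNetV2 configuration."""
    def __init__(self, channels: int):
        super().__init__()
        # Depthwise separable convolution = depthwise conv + pointwise (1x1) conv.
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.act(self.bn(self.pointwise(self.depthwise(x))))
        return out + x  # residual connection

# F = B(I_i, theta_1, ..., theta_n): stack the 8 blocks b0..b7.
# A stem convolution (not shown) would first map the 3-channel RGB input
# to the working channel width.
backbone = nn.Sequential(*[ResidualDSConvBlock(64) for _ in range(8)])
features = backbone(torch.randn(1, 64, 56, 56))  # (1, 64, 56, 56) feature map
```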
Specifically, the features obtained from the face image in step S21 are then passed through a global pooling layer and a classification layer, and MobileNetV2 is trained under the constraint of the BCE loss function.
S22, inputting the image characteristics of the acquired face image into a multi-order attention mechanism fusion network model for training and processing;
in this embodiment, a transform model is used as the multi-order attention mechanism fusion network model. The features of the step S21 are sent into a global pooling layer and a classification layer, and also input into a multi-order attention mechanism fusion network for modeling global feature information in the human face image, and the feature information is used as the input of the multi-order attention mechanism fusion network and is converted into an image pixel sequence for final human face attribute prediction. The characteristics are coded by the multi-order attention mechanism fusion network, and the final output of the multi-order attention mechanism fusion network is optimized and constrained under the constraint of a BCE loss function.
Specifically, the training and processing of the image features of the face image by using the multi-order attention mechanism fusion network model comprises:
s221, modeling global feature information in the face image by using the multi-order attention mechanism fusion network model;
s222, the obtained global feature information is used as input of the multi-order attention mechanism fusion network model and is coded through the multi-order attention mechanism fusion network model;
and S223, carrying out optimization constraint on the multi-order attention mechanism fusion network under the constraint of a BCE loss function, and then outputting.
Wherein the BCE loss function is as follows:
l_bce = −(1/N) · Σ_{i=1}^{N} Σ_{j=1}^{M} ω_j [ y_ij · log σ(logits_ij) + (1 − y_ij) · log(1 − σ(logits_ij)) ]
wherein N denotes the number of training images, M denotes the number of face attribute classes, logits_ij denotes the output of the classification layer for attribute j of image i, y_ij denotes the corresponding image label, σ(z) = 1/(1 + e^(−z)), and ω_j is a per-class weight that compensates for the imbalance between attribute classes; its closed-form expression is given as an equation image in the original filing.
the loss function of the whole algorithm in this embodiment is expressed as follows:
L = l_bce1 + l_bce2
wherein l_bce1 denotes the loss function of the MobileNetV2 model, l_bce2 denotes the loss function of the multi-order attention mechanism fusion network model, and l_bce1 and l_bce2 are weighted equally.
In order for the convolutional neural network and the multi-order attention mechanism fusion network to learn the local and global feature information of the image at the same level, the two loss functions are given the same weight.
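The combined objective can be sketched as follows. Since the closed-form expression for ω_j is not reproduced here, the per-class weights are passed in by the caller; the loss mirrors the weighted BCE above, and the two branch losses are summed with equal weight.

```python
import torch

def weighted_bce(logits: torch.Tensor, labels: torch.Tensor,
                 w: torch.Tensor) -> torch.Tensor:
    """Weighted multi-label BCE, mirroring l_bce above.
    logits, labels: (N, M); w: (M,) per-class weights omega_j, supplied by
    the caller since their exact expression is not reproduced here."""
    p = torch.sigmoid(logits)
    per_elem = -(labels * torch.log(p + 1e-8)
                 + (1 - labels) * torch.log(1 - p + 1e-8))
    return (w * per_elem).mean()

def total_loss(cnn_logits, transformer_logits, labels, w):
    # L = l_bce1 + l_bce2: equal weighting of the CNN branch and the
    # multi-order attention fusion branch.
    return weighted_bce(cnn_logits, labels, w) + weighted_bce(transformer_logits, labels, w)
```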
And S3, completing the attribute identification of the face image.
To address the problem that the receptive field of a traditional convolutional neural network follows a Gaussian distribution and over-concentrates on local information in the image, neglecting certain local attributes, and to reduce the deviation in the recognition rate of those attributes in face images, the invention introduces a multi-order attention mechanism fusion network (Transformer) on the basis of the convolutional neural network and uses the Transformer's ability to model global feature information to learn all face attribute information in the image. In the face attribute recognition learning task, on the basis of a lightweight convolutional network, this global modeling capability improves the algorithm's recognition of the various facial attributes.
To verify the effectiveness of the algorithm, experiments were carried out on an open-source face attribute dataset, and face attribute recognition performance was effectively improved from 83.7% to 92.3%.
In a second aspect, the present application further provides a face attribute recognition system based on multi-order attention mechanism fusion, as shown in fig. 4, including:
the acquisition module 41: used for acquiring the complete face region in an image according to a face detection method and a face alignment method;
the input module 42: used for inputting the acquired face image into the convolutional neural network model and inputting the image features of the acquired face image into the multi-order attention mechanism fusion network model;
the output module 43: used for outputting the face image and the image features of the face image;
the extraction module 44: used for extracting a plurality of image features of the face image;
the training and processing module 45: used for training and processing the extracted image features.
Referring now to FIG. 5, a block diagram of a computer apparatus 600 suitable for use with an electronic device (e.g., the server or terminal device shown in FIG. 1) to implement an embodiment of the invention is shown. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the computer apparatus 600 includes a Central Processing Unit (CPU) 601 and a Graphics Processing Unit (GPU) 602, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 603 or a program loaded from a storage section 609 into a Random Access Memory (RAM) 604. The RAM 604 also stores various programs and data necessary for the operation of the apparatus 600. The CPU 601, GPU 602, ROM 603, and RAM 604 are connected to each other via a bus 605. An input/output (I/O) interface 606 is also connected to the bus 605.
The following components are connected to the I/O interface 606: an input portion 607 including a keyboard, a mouse, and the like; an output section 608 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 609 including a hard disk and the like; and a communication section 610 including a network interface card such as a LAN card, a modem, or the like. The communication section 610 performs communication processing via a network such as the internet. The drive 611 may also be connected to the I/O interface 606 as needed. A removable medium 612 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 611 as necessary, so that a computer program read out therefrom is mounted into the storage section 609 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 610, and/or installed from the removable media 612. The computer programs, when executed by a Central Processing Unit (CPU) 601 and a Graphics Processor (GPU) 602, perform the above-described functions defined in the method of the present invention.
It should be noted that the computer readable medium of the present invention can be a computer readable signal medium or a computer readable medium or any combination of the two. A computer readable medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor device, apparatus, or a combination of any of the foregoing. More specific examples of the computer readable medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution apparatus, device, or apparatus. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution apparatus, device, or apparatus. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The modules described may also be provided in a processor.
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: responding to a face detection method and a face alignment method, acquiring a complete face region in the image, and outputting a face image; inputting the obtained face image into a convolutional neural network model, further extracting a plurality of image characteristics of the face image for training and processing; simultaneously inputting the image characteristics of the acquired face image into a multi-order attention mechanism fusion network model for training and processing; and completing the attribute recognition of the face image.
The foregoing description is only exemplary of the preferred embodiments of the invention and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention is not limited to the specific combination of the above features, but also encompasses other embodiments formed by any combination of the above features or their equivalents without departing from the scope defined by the appended claims; for example, embodiments in which the above features are interchanged with (but not limited to) features with similar functions disclosed in the present invention.

Claims (10)

1. A face attribute recognition algorithm based on multi-order attention mechanism fusion is characterized by comprising the following steps:
responding to a face detection method and a face alignment method, acquiring a complete face region in an image, and outputting a face image;
inputting the obtained face image into a convolutional neural network model, further extracting a plurality of image characteristics of the face image for training and processing; and
simultaneously inputting the image characteristics of the acquired face image into a multi-order attention mechanism fusion network model for training and processing;
and completing the attribute recognition of the face image.
2. The algorithm of claim 1, wherein the convolutional neural network model is a lightweight MobileNetV2 model, wherein the MobileNetV2 model comprises 8 residual convolution blocks {b0, b1, …, b7}, and each residual convolution block is composed of a depthwise separable convolution layer, a BN layer, an activation layer, and a residual connection layer.
3. The algorithm for face attribute recognition based on multi-order attention mechanism fusion as claimed in claim 2, wherein the convolutional neural network model extracts the image features of the face image as follows:
F = B(I_i, θ_1, θ_2, …, θ_n)
wherein B denotes the forward operation of the MobileNetV2 model, I_i denotes the input RGB image, and θ_1, θ_2, …, θ_n denote the parameters of the residual convolution blocks.
4. The algorithm for face attribute recognition based on multi-order attention mechanism fusion of claim 3, wherein the training and processing of image features using the MobileNetV2 model comprises:
firstly, passing the acquired image features through a global pooling layer and a classification layer;
the MobileNetV2 model is then trained under the constraints of the BCE loss function.
5. The algorithm for face attribute recognition based on multi-order attention mechanism fusion of claim 1, wherein the multi-order attention mechanism fusion network model employs a Transformer model.
6. The algorithm for recognizing human face attributes based on multi-order attention mechanism fusion as claimed in claim 1, wherein the training and processing of image features of human face images by using the multi-order attention mechanism fusion network model comprises:
modeling global feature information in the face image by utilizing the multi-order attention mechanism fusion network model;
the obtained global feature information is used as the input of the multi-order attention mechanism fusion network model and is coded by the multi-order attention mechanism fusion network model;
and carrying out optimization constraint on the multi-order attention mechanism fusion network under the constraint of a BCE loss function and then outputting.
7. The algorithm for face attribute recognition based on multi-order attention mechanism fusion as claimed in claim 3 or 6, wherein the BCE loss function is as follows:
l_bce = −(1/N) · Σ_{i=1}^{N} Σ_{j=1}^{M} ω_j [ y_ij · log σ(logits_ij) + (1 − y_ij) · log(1 − σ(logits_ij)) ]
wherein N denotes the number of training images, M denotes the number of face attribute classes, logits_ij denotes the output of the classification layer for attribute j of image i, y_ij denotes the corresponding image label, σ(z) = 1/(1 + e^(−z)), and ω_j is a per-class weight that compensates for the imbalance between attribute classes; its closed-form expression is given as an equation image in the original filing.
the loss function of the whole algorithm is expressed as follows:
L = l_bce1 + l_bce2
wherein l_bce1 denotes the loss function of the MobileNetV2 model, l_bce2 denotes the loss function of the multi-order attention mechanism fusion network model, and l_bce1 and l_bce2 are weighted equally.
8. A face attribute recognition system based on multi-order attention mechanism fusion is characterized by comprising:
an acquisition module: used for acquiring the complete face region in an image according to a face detection method and a face alignment method;
an input module: used for inputting the acquired face image into the convolutional neural network model and inputting the image features of the acquired face image into the multi-order attention mechanism fusion network model;
an output module: used for outputting the face image and the image features of the face image;
an extraction module: used for extracting a plurality of image features of the face image;
a training and processing module: used for training and processing the extracted image features.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202210964078.XA 2022-08-11 2022-08-11 Face attribute recognition algorithm and system based on multi-order attention mechanism fusion Pending CN115311719A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210964078.XA CN115311719A (en) 2022-08-11 2022-08-11 Face attribute recognition algorithm and system based on multi-order attention mechanism fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210964078.XA CN115311719A (en) 2022-08-11 2022-08-11 Face attribute recognition algorithm and system based on multi-order attention mechanism fusion

Publications (1)

Publication Number Publication Date
CN115311719A true CN115311719A (en) 2022-11-08

Family

ID=83862460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210964078.XA Pending CN115311719A (en) 2022-08-11 2022-08-11 Face attribute recognition algorithm and system based on multi-order attention mechanism fusion

Country Status (1)

Country Link
CN (1) CN115311719A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115631330A (en) * 2022-12-20 2023-01-20 浙江太美医疗科技股份有限公司 Feature extraction method, model training method, image recognition method and application
CN115631330B (en) * 2022-12-20 2023-03-10 浙江太美医疗科技股份有限公司 Feature extraction method, model training method, image recognition method and application


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination