CN115311719A - Face attribute recognition algorithm and system based on multi-order attention mechanism fusion - Google Patents

Face attribute recognition algorithm and system based on multi-order attention mechanism fusion

Info

Publication number
CN115311719A
Authority
CN
China
Prior art keywords
face
image
attention mechanism
mechanism fusion
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210964078.XA
Other languages
Chinese (zh)
Inventor
姚灿荣
张光斌
吴俊毅
高志鹏
赵建强
杜新胜
韩名羲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd filed Critical Xiamen Meiya Pico Information Co Ltd
Priority to CN202210964078.XA priority Critical patent/CN115311719A/en
Publication of CN115311719A publication Critical patent/CN115311719A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a face attribute recognition algorithm based on multi-order attention mechanism fusion, which comprises the following steps: in response to a face detection method and a face alignment method, acquiring the complete face region in an image and outputting a face image; inputting the obtained face image into a convolutional neural network model and extracting a plurality of image features of the face image for training and processing; simultaneously inputting the image features of the acquired face image into a multi-order attention mechanism fusion network model for training and processing; and completing attribute recognition of the face image. A multi-order attention mechanism fusion network is introduced on the basis of a convolutional neural network, and the Transformer's ability to model global feature information is used to learn all facial attribute information in an image. By combining the strong feature extraction capability of the convolutional neural network with the global modeling capability of the multi-order attention mechanism fusion network, the algorithm's ability to recognize the various attributes of a face is improved.

Description

Face attribute recognition algorithm and system based on multi-order attention mechanism fusion
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a face attribute recognition algorithm and system based on multi-order attention mechanism fusion.
Background
Face recognition is one of the most mature artificial intelligence technologies. As an important module of face recognition, face attribute recognition and analysis has attracted attention from both industry and academia and is widely applied in image retrieval, face recognition, pedestrian re-identification, micro-expression recognition, image generation, recommendation systems, and the like.
With the rapid development of face recognition technology, many related tasks have been derived, for example: facial micro-expression recognition, cross-modal face retrieval, face forgery detection, and face attribute recognition. The task of face attribute recognition is to predict multiple facial attributes such as gender, whether glasses are worn, and whether the subject is smiling.
In recent years, because convolutional neural networks (CNNs) have strong feature learning capability, most image learning tasks use a CNN for feature extraction and then perform various operations on the features to achieve image classification, target detection, and similar results. For example, typical state-of-the-art pedestrian attribute recognition algorithms use a CNN to extract attributes and then classify them. Traditional face attribute recognition algorithms likewise rely mainly on convolutional neural networks and have produced many strong results; such algorithms usually adopt MobileNet or EfficientNet as the backbone network and then predict face attributes in a multi-label or multi-task manner at the classification layer. The performance of the image feature extractor therefore plays a decisive role in the overall face attribute learning task.
However, current algorithms ignore the fact that the receptive field of a convolutional neural network follows a Gaussian distribution: the network over-concentrates on local information in the image and neglects the learning of other local attributes. If a large CNN structure is adopted for feature learning and extraction, the model imposes enormous computational pressure on some front-end face applications, and the learning process may fall into overfitting. Meanwhile, some attribute data in face attribute datasets are difficult to collect and are seriously imbalanced across classes, so CNN-based algorithms generalize poorly on some attributes and show large deviations in their recognition rates.
In view of the above, it is of great significance to provide a face attribute recognition algorithm and system based on multi-order attention mechanism fusion.
Disclosure of Invention
The invention provides a face attribute recognition algorithm and system based on multi-order attention mechanism fusion, aiming to solve the problems of existing algorithms: over-concentration on local information in the image, neglected learning of other local attributes, hard-to-collect and imbalanced attribute data in face attribute datasets, and high model computation pressure.
In a first aspect, the present invention provides a face attribute recognition algorithm based on multi-order attention mechanism fusion, which includes:
responding to a face detection method and a face alignment method, acquiring a complete face region in an image, and outputting a face image;
inputting the obtained face image into a convolutional neural network model, further extracting a plurality of image characteristics of the face image for training and processing; and
simultaneously inputting the image characteristics of the acquired face image into a multi-order attention mechanism fusion network model for training and processing;
and completing the attribute recognition of the face image.
Preferably, the convolutional neural network model adopts a lightweight MobileNetV2 model, wherein the MobileNetV2 model includes 8 residual convolution blocks {b0, b1, …, b7}, and each residual convolution block is composed of a depthwise separable convolution layer, a BN layer, an activation layer, and a residual connection layer.
Further preferably, the convolutional neural network model extracts image features of the face image in the following manner:
F = B(I_i, θ_1, θ_2, …, θ_n)
wherein B denotes the forward operation of the MobileNetV2 model, I_i denotes the input RGB image, and θ_1, θ_2, …, θ_n denote the parameters of the residual convolution blocks.
Further preferably, the training and processing of image features by using the MobileNetV2 model includes:
firstly, passing the acquired image features through a global pooling layer and a classification layer;
the MobileNetV2 model is then trained under the constraints of the BCE loss function.
Preferably, the multi-order attention mechanism fusion network model adopts a Transformer model.
Preferably, the training and processing of the image features of the face image by using the multi-order attention mechanism fusion network model includes:
modeling global feature information in the face image by using the multi-order attention mechanism fusion network model;
the obtained global feature information is used as the input of the multi-order attention mechanism fusion network model and is coded by the multi-order attention mechanism fusion network model;
and carrying out optimization constraint on the multi-order attention mechanism fusion network under the constraint of a BCE loss function and then outputting.
Preferably, the BCE loss function is as follows:
l_bce = −(1/N) · Σ_{i=1}^{N} Σ_{j=1}^{M} ω_j [ y_ij · log σ(logits_ij) + (1 − y_ij) · log(1 − σ(logits_ij)) ]
wherein N denotes the number of training images, M denotes the number of face attribute classes, logits_ij denotes the output of the classification layer for attribute j of image i, y_ij denotes the corresponding image label, σ(z) = 1/(1 + e^(−z)), and ω_j is a per-class weight that compensates for the imbalance between attribute classes; its closed-form expression is given as an equation image in the original filing.
the loss function of the whole algorithm is expressed as follows:
L = l_bce1 + l_bce2
wherein l_bce1 denotes the loss function of the MobileNetV2 model, l_bce2 denotes the loss function of the multi-order attention mechanism fusion network model, and l_bce1 and l_bce2 are weighted equally.
In a second aspect, the present application further provides a face attribute recognition system based on multi-order attention mechanism fusion, including:
an acquisition module: used for acquiring the complete face region in an image according to a face detection method and a face alignment method;
an input module: used for inputting the acquired face image into the convolutional neural network model and inputting the image features of the acquired face image into the multi-order attention mechanism fusion network model;
an output module: used for outputting the face image and the image features of the face image;
an extraction module: used for extracting a plurality of image features of the face image;
a training and processing module: used for training and processing the extracted image features.
In a third aspect, an embodiment of the present invention provides an electronic device, including: one or more processors; storage means for storing one or more programs which, when executed by one or more processors, cause the one or more processors to carry out a method as described in any one of the implementations of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method described in any implementation manner of the first aspect.
Compared with the prior art, the beneficial effects of the invention are as follows:
(1) A multi-order attention mechanism fusion network (Transformer) is introduced on the basis of a convolutional neural network, and the Transformer's ability to model global feature information is used to learn all facial attribute information in an image. In the face attribute recognition learning task, on the basis of a lightweight convolutional network, the strong feature extraction capability of the convolutional neural network is combined with the global modeling capability of the multi-order attention mechanism fusion network, improving the algorithm's recognition of facial attributes. Experiments on an open-source face attribute dataset show that face attribute recognition performance is effectively improved from 83.7% to 92.3%, further improving the algorithm's ability to recognize the various attributes of a face.
(2) The invention addresses the problem that the receptive field of a traditional convolutional neural network follows a Gaussian distribution and over-concentrates on local information in the image, neglecting the learning of some local attributes, and reduces the deviation in the recognition rate of those attributes in face images.
(3) The invention avoids the enormous computational pressure that large CNN models impose on some front-end face applications and the overfitting dilemma in the model learning process, and addresses the problems that some attribute data in face attribute datasets are difficult to collect and that the data are seriously imbalanced across classes.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. The elements of the drawings are not necessarily to scale relative to each other. Like reference numerals designate corresponding similar parts.
FIG. 1 is an exemplary device architecture diagram in which an embodiment of the present invention may be employed;
FIG. 2 is an overall frame diagram of a face attribute recognition algorithm based on multi-order attention mechanism fusion according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a face attribute recognition algorithm based on multi-order attention mechanism fusion according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a face attribute recognition system based on multi-order attention mechanism fusion according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a computer apparatus suitable for use in implementing an electronic device of an embodiment of the invention.
Detailed Description
In the following detailed description, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. In this regard, directional terminology, such as "top," "bottom," "left," "right," "up," "down," etc., is used with reference to the orientation of the figures being described. Because components of embodiments can be positioned in a number of different orientations, the directional terminology is used for purposes of illustration and is in no way limiting. It is to be understood that other embodiments may be utilized and logical changes may be made without departing from the scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 1 illustrates an exemplary system architecture 100 for a method for processing information or an apparatus for processing information to which embodiments of the present invention may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having communication functions, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services, such as a background information processing server that processes verification request information transmitted by the terminal devices 101, 102, 103. The background information processing server may analyze and otherwise process the received verification request information and obtain a processing result (for example, verification success information representing that the verification request is a legal request).
It should be noted that the method for processing information provided by the embodiment of the present invention is generally executed by the server 105, and accordingly, the apparatus for processing information is generally disposed in the server 105. In addition, the method for sending information provided by the embodiment of the present invention is generally executed by the terminal equipment 101, 102, 103, and accordingly, the apparatus for sending information is generally disposed in the terminal equipment 101, 102, 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (for example, to provide distributed services), or may be implemented as a single software or a plurality of software modules, and is not limited in particular herein.
With the rapid development of face recognition technology, many related tasks have been derived, for example: facial micro-expression recognition, cross-modal face retrieval, face forgery detection, and face attribute recognition. The task of face attribute recognition is to predict multiple facial attributes such as gender, whether glasses are worn, and whether the subject is smiling.
In recent years, because convolutional neural networks (CNNs) have strong feature learning capability, most image learning tasks use a CNN for feature extraction and then perform various operations on the features to achieve image classification, target detection, and similar results. For example, typical state-of-the-art pedestrian attribute recognition algorithms use a CNN to extract attributes and then classify them.
Traditional face attribute recognition algorithms likewise rely mainly on convolutional neural networks and have produced many strong results; such algorithms usually adopt MobileNet or EfficientNet as the backbone network and then predict face attributes in a multi-label or multi-task manner at the classification layer. The performance of the image feature extractor therefore plays a decisive role in the whole face attribute learning task.
Multi-task face attribute recognition is an effective learning paradigm in which the performance of a target task is improved with the help of related auxiliary tasks. A multi-label face attribute algorithm can predict multiple face attributes simultaneously in an end-to-end training network; because each face image is associated with multiple attribute labels, multi-label learning is well suited to face attribute recognition. However, current algorithms ignore the fact that the receptive field of a convolutional neural network follows a Gaussian distribution: the network over-concentrates on local information in the image and neglects the learning of other local attributes. If a large CNN structure is adopted for feature learning and extraction, the model imposes enormous computational pressure on some front-end face applications, and the learning process may fall into overfitting. Meanwhile, some attribute data in face attribute datasets are difficult to collect and are seriously imbalanced across classes, so CNN-based algorithms generalize poorly on some attributes and show large deviations in their recognition rates.
Based on the above problems, the invention uses a lightweight network structure, MobileNetV2, as the convolutional neural network part for feature learning and extraction, introduces a multi-order attention mechanism fusion network (Transformer) on this basis, and uses the Transformer's ability to model global feature information to learn all face attribute information in an image.
The invention provides a face attribute recognition algorithm based on multi-order attention mechanism fusion, which improves the algorithm's recognition of face attributes by combining the strong feature extraction capability of a convolutional neural network with the global modeling capability of the multi-order attention mechanism fusion network. As shown in fig. 2, the main network of the whole framework includes a lightweight convolutional neural network, MobileNetV2, and a multi-order attention mechanism fusion network (Transformer).
Fig. 3 shows an embodiment of the present invention that discloses a face attribute recognition algorithm based on multi-order attention mechanism fusion. As shown in fig. 2 and fig. 3, the algorithm includes:
s1, responding to a face detection method and a face alignment method, acquiring a complete face region in an image, and outputting a face image;
specifically, the face detection method and the face alignment method mentioned in this embodiment both adopt a method common to face processing, such as an mtcnn face detection algorithm, and therefore do not relate to the core invention point of the present invention, and are not described herein again. And (4) scratching the complete face area in the image through a face detection algorithm, and outputting the face image in the size of 224x 224.
S21, inputting the acquired face image into a convolutional neural network model, further extracting a plurality of image characteristics of the face image, and training and processing the image characteristics; and
specifically, in this embodiment, the convolutional neural network model is a lightweight MobileNetV2 model, and MobileNetV2 is a lightweight convolutional neural network and uses deep separable convolution. In the invention, a lightweight network structure of MobileNet V2 is used as a convolutional neural network part for feature learning and extraction. Convolutional neural networks with the same effect, such as MobileNet, leNet-5, alexNet, etc., may also be used in other embodiments.
The extracted face image is used as the input of MobileNetV2. The MobileNetV2 network comprises 8 residual convolution blocks {b0, b1, …, b7}, and each residual convolution block consists of a depthwise separable convolution layer, a BN layer, an activation layer, and a residual connection layer. The 8 residual convolution blocks extract the image feature information layer by layer, progressing from edge information to high-level semantic information. The feature extraction is formulated as follows:
F = B(I_i, θ_1, θ_2, …, θ_n)
wherein B denotes the forward operation of the MobileNetV2 model, I_i denotes the input RGB image, and θ_1, θ_2, …, θ_n denote the parameters of the residual convolution blocks. An illustrative sketch of such a block follows.
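The following PyTorch sketch implements one such residual convolution block and stacks 8 of them. The channel width, activation, and omitted stem are illustrative assumptions; the actual MobileNetV2 uses inverted residual blocks with expansion factors and varying strides.

```python
import torch
import torch.nn as nn

class ResidualDSConvBlock(nn.Module):
    """One residual convolution block b_k as described above:
    depthwise separable convolution -> BN -> activation -> residual connection.
    Channel counts are illustrative, not the exact MobileNetV2 configuration."""
    def __init__(self, channels: int):
        super().__init__()
        # Depthwise separable convolution = depthwise conv + pointwise (1x1) conv.
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.act(self.bn(self.pointwise(self.depthwise(x))))
        return out + x  # residual connection

# F = B(I_i, theta_1, ..., theta_n): stack the 8 blocks b0..b7.
# A stem convolution (not shown) would first map the 3-channel RGB input
# to the working channel width.
backbone = nn.Sequential(*[ResidualDSConvBlock(64) for _ in range(8)])
features = backbone(torch.randn(1, 64, 56, 56))  # (1, 64, 56, 56) feature map
```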
Specifically, the features obtained from the face image in step S21 are then passed through a global pooling layer and a classification layer, and MobileNetV2 is trained under the constraint of the BCE loss function.
S22, inputting the image characteristics of the acquired face image into a multi-order attention mechanism fusion network model for training and processing;
in this embodiment, a transform model is used as the multi-order attention mechanism fusion network model. The features of the step S21 are sent into a global pooling layer and a classification layer, and also input into a multi-order attention mechanism fusion network for modeling global feature information in the human face image, and the feature information is used as the input of the multi-order attention mechanism fusion network and is converted into an image pixel sequence for final human face attribute prediction. The characteristics are coded by the multi-order attention mechanism fusion network, and the final output of the multi-order attention mechanism fusion network is optimized and constrained under the constraint of a BCE loss function.
Specifically, the training and processing of the image features of the face image by using the multi-order attention mechanism fusion network model comprises:
s221, modeling global feature information in the face image by using the multi-order attention mechanism fusion network model;
s222, the obtained global feature information is used as input of the multi-order attention mechanism fusion network model and is coded through the multi-order attention mechanism fusion network model;
and S223, carrying out optimization constraint on the multi-order attention mechanism fusion network under the constraint of a BCE loss function, and then outputting.
Wherein the BCE loss function is as follows:
l_bce = −(1/N) · Σ_{i=1}^{N} Σ_{j=1}^{M} ω_j [ y_ij · log σ(logits_ij) + (1 − y_ij) · log(1 − σ(logits_ij)) ]
wherein N denotes the number of training images, M denotes the number of face attribute classes, logits_ij denotes the output of the classification layer for attribute j of image i, y_ij denotes the corresponding image label, σ(z) = 1/(1 + e^(−z)), and ω_j is a per-class weight that compensates for the imbalance between attribute classes; its closed-form expression is given as an equation image in the original filing.
the loss function of the whole algorithm in this embodiment is expressed as follows:
L = l_bce1 + l_bce2
wherein l_bce1 denotes the loss function of the MobileNetV2 model, l_bce2 denotes the loss function of the multi-order attention mechanism fusion network model, and l_bce1 and l_bce2 are weighted equally.
In order for the convolutional neural network and the multi-order attention mechanism fusion network to learn the local and global feature information of the image at the same level, the two loss functions are given the same weight.
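The combined objective can be sketched as follows. Since the closed-form expression for ω_j is not reproduced here, the per-class weights are passed in by the caller; the loss mirrors the weighted BCE above, and the two branch losses are summed with equal weight.

```python
import torch

def weighted_bce(logits: torch.Tensor, labels: torch.Tensor,
                 w: torch.Tensor) -> torch.Tensor:
    """Weighted multi-label BCE, mirroring l_bce above.
    logits, labels: (N, M); w: (M,) per-class weights omega_j, supplied by
    the caller since their exact expression is not reproduced here."""
    p = torch.sigmoid(logits)
    per_elem = -(labels * torch.log(p + 1e-8)
                 + (1 - labels) * torch.log(1 - p + 1e-8))
    return (w * per_elem).mean()

def total_loss(cnn_logits, transformer_logits, labels, w):
    # L = l_bce1 + l_bce2: equal weighting of the CNN branch and the
    # multi-order attention fusion branch.
    return weighted_bce(cnn_logits, labels, w) + weighted_bce(transformer_logits, labels, w)
```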
And S3, completing the attribute identification of the face image.
To address the problem that the receptive field of a traditional convolutional neural network follows a Gaussian distribution and over-concentrates on local information in the image, neglecting certain local attributes, and to reduce the deviation in the recognition rate of those attributes in face images, the invention introduces a multi-order attention mechanism fusion network (Transformer) on the basis of the convolutional neural network and uses the Transformer's ability to model global feature information to learn all face attribute information in the image. In the face attribute recognition learning task, on the basis of a lightweight convolutional network, this global modeling capability improves the algorithm's recognition of the various facial attributes.
To verify the effectiveness of the algorithm, experiments were carried out on an open-source face attribute dataset, and face attribute recognition performance was effectively improved from 83.7% to 92.3%.
In a second aspect, the present application further provides a face attribute recognition system based on multi-order attention mechanism fusion, as shown in fig. 4, including:
the acquisition module 41: used for acquiring the complete face region in an image according to a face detection method and a face alignment method;
the input module 42: used for inputting the acquired face image into the convolutional neural network model and inputting the image features of the acquired face image into the multi-order attention mechanism fusion network model;
the output module 43: used for outputting the face image and the image features of the face image;
the extraction module 44: used for extracting a plurality of image features of the face image;
the training and processing module 45: used for training and processing the extracted image features.
Referring now to FIG. 5, a block diagram of a computer apparatus 600 suitable for use with an electronic device (e.g., the server or terminal device shown in FIG. 1) to implement an embodiment of the invention is shown. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the computer apparatus 600 includes a Central Processing Unit (CPU) 601 and a Graphics Processing Unit (GPU) 602, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 603 or a program loaded from a storage section 609 into a Random Access Memory (RAM) 604. The RAM 604 also stores various programs and data necessary for the operation of the apparatus 600. The CPU 601, GPU 602, ROM 603, and RAM 604 are connected to each other via a bus 605. An input/output (I/O) interface 606 is also connected to the bus 605.
The following components are connected to the I/O interface 606: an input portion 607 including a keyboard, a mouse, and the like; an output section 608 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 609 including a hard disk and the like; and a communication section 610 including a network interface card such as a LAN card, a modem, or the like. The communication section 610 performs communication processing via a network such as the internet. The drive 611 may also be connected to the I/O interface 606 as needed. A removable medium 612 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 611 as necessary, so that a computer program read out therefrom is mounted into the storage section 609 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 610, and/or installed from the removable media 612. The computer programs, when executed by a Central Processing Unit (CPU) 601 and a Graphics Processor (GPU) 602, perform the above-described functions defined in the method of the present invention.
It should be noted that the computer readable medium of the present invention can be a computer readable signal medium or a computer readable medium or any combination of the two. A computer readable medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor device, apparatus, or a combination of any of the foregoing. More specific examples of the computer readable medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution apparatus, device, or apparatus. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution apparatus, device, or apparatus. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The modules described may also be provided in a processor.
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: responding to a face detection method and a face alignment method, acquiring a complete face region in the image, and outputting a face image; inputting the obtained face image into a convolutional neural network model, further extracting a plurality of image characteristics of the face image for training and processing; simultaneously inputting the image characteristics of the acquired face image into a multi-order attention mechanism fusion network model for training and processing; and completing the attribute recognition of the face image.
The foregoing description is only exemplary of the preferred embodiments of the invention and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention is not limited to the specific combination of the above features, but also encompasses other embodiments formed by any combination of the above features or their equivalents without departing from the scope defined by the appended claims; for example, embodiments in which the above features are interchanged with (but not limited to) features with similar functions disclosed in the present invention.

Claims (10)

1. A face attribute recognition algorithm based on multi-order attention mechanism fusion is characterized by comprising the following steps:
responding to a face detection method and a face alignment method, acquiring a complete face region in an image, and outputting a face image;
inputting the obtained face image into a convolutional neural network model, further extracting a plurality of image characteristics of the face image for training and processing; and
simultaneously inputting the image characteristics of the acquired face image into a multi-order attention mechanism fusion network model for training and processing;
and completing the attribute recognition of the face image.
2. The algorithm of claim 1, wherein the convolutional neural network model is a lightweight MobileNetV2 model, wherein the MobileNetV2 model comprises 8 residual convolution blocks {b0, b1, …, b7}, and each residual convolution block is composed of a depthwise separable convolution layer, a BN layer, an activation layer, and a residual connection layer.
3. The algorithm for face attribute recognition based on multi-order attention mechanism fusion as claimed in claim 2, wherein the convolutional neural network model extracts the image features of the face image as follows:
F = B(I_i, θ_1, θ_2, …, θ_n)
wherein B denotes the forward operation of the MobileNetV2 model, I_i denotes the input RGB image, and θ_1, θ_2, …, θ_n denote the parameters of the residual convolution blocks.
4. The algorithm for face attribute recognition based on multi-order attention mechanism fusion of claim 3, wherein the training and processing of image features using the MobileNetV2 model comprises:
firstly, passing the acquired image features through a global pooling layer and a classification layer;
the MobileNetV2 model is then trained under the constraints of the BCE loss function.
5. The algorithm for face attribute recognition based on multi-order attention mechanism fusion of claim 1, wherein the multi-order attention mechanism fusion network model employs a Transformer model.
6. The algorithm for recognizing human face attributes based on multi-order attention mechanism fusion as claimed in claim 1, wherein the training and processing of image features of human face images by using the multi-order attention mechanism fusion network model comprises:
modeling global feature information in the face image by utilizing the multi-order attention mechanism fusion network model;
the obtained global feature information is used as the input of the multi-order attention mechanism fusion network model and is coded by the multi-order attention mechanism fusion network model;
and carrying out optimization constraint on the multi-order attention mechanism fusion network under the constraint of a BCE loss function and then outputting.
7. The algorithm for face attribute recognition based on multi-order attention mechanism fusion as claimed in claim 3 or 6, wherein the BCE loss function is as follows:
l_bce = −(1/N) · Σ_{i=1}^{N} Σ_{j=1}^{M} ω_j [ y_ij · log σ(logits_ij) + (1 − y_ij) · log(1 − σ(logits_ij)) ]
wherein N denotes the number of training images, M denotes the number of face attribute classes, logits_ij denotes the output of the classification layer for attribute j of image i, y_ij denotes the corresponding image label, σ(z) = 1/(1 + e^(−z)), and ω_j is a per-class weight that compensates for the imbalance between attribute classes; its closed-form expression is given as an equation image in the original filing.
the loss function of the whole algorithm is expressed as follows:
L = l_bce1 + l_bce2
wherein l_bce1 denotes the loss function of the MobileNetV2 model, l_bce2 denotes the loss function of the multi-order attention mechanism fusion network model, and l_bce1 and l_bce2 are weighted equally.
8. A face attribute recognition system based on multi-order attention mechanism fusion is characterized by comprising:
an acquisition module: used for acquiring the complete face region in an image according to a face detection method and a face alignment method;
an input module: used for inputting the acquired face image into the convolutional neural network model and inputting the image features of the acquired face image into the multi-order attention mechanism fusion network model;
an output module: used for outputting the face image and the image features of the face image;
an extraction module: used for extracting a plurality of image features of the face image;
a training and processing module: used for training and processing the extracted image features.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202210964078.XA 2022-08-11 2022-08-11 Face attribute recognition algorithm and system based on multi-order attention mechanism fusion Pending CN115311719A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210964078.XA CN115311719A (en) 2022-08-11 2022-08-11 Face attribute recognition algorithm and system based on multi-order attention mechanism fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210964078.XA CN115311719A (en) 2022-08-11 2022-08-11 Face attribute recognition algorithm and system based on multi-order attention mechanism fusion

Publications (1)

Publication Number Publication Date
CN115311719A true CN115311719A (en) 2022-11-08

Family

ID=83862460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210964078.XA Pending CN115311719A (en) 2022-08-11 2022-08-11 Face attribute recognition algorithm and system based on multi-order attention mechanism fusion

Country Status (1)

Country Link
CN (1) CN115311719A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115631330A (en) * 2022-12-20 2023-01-20 浙江太美医疗科技股份有限公司 Feature extraction method, model training method, image recognition method and application
CN115631330B (en) * 2022-12-20 2023-03-10 浙江太美医疗科技股份有限公司 Feature extraction method, model training method, image recognition method and application


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination