CN113239844A - Intelligent cosmetic mirror system based on multi-head attention target detection - Google Patents

Intelligent cosmetic mirror system based on multi-head attention target detection

Info

Publication number
CN113239844A
CN113239844A
Authority
CN
China
Prior art keywords
target detection
control unit
makeup
head attention
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110576729.3A
Other languages
Chinese (zh)
Other versions
CN113239844B (en)
Inventor
刘斌毓
张丽平
夏劲松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202110576729.3A priority Critical patent/CN113239844B/en
Publication of CN113239844A publication Critical patent/CN113239844A/en
Application granted
Publication of CN113239844B publication Critical patent/CN113239844B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 — Feature extraction; Face representation
    • A — HUMAN NECESSITIES
    • A45 — HAND OR TRAVELLING ARTICLES
    • A45D — HAIRDRESSING OR SHAVING EQUIPMENT; EQUIPMENT FOR COSMETICS OR COSMETIC TREATMENTS, e.g. FOR MANICURING OR PEDICURING
    • A45D42/00 — Hand, pocket, or shaving mirrors
    • A45D42/08 — Shaving mirrors
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 — Commerce
    • G06Q30/06 — Buying, selling or leasing transactions
    • G06Q30/0601 — Electronic shopping [e-shopping]
    • G06Q30/0631 — Item recommendations
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 — Commerce
    • G06Q30/06 — Buying, selling or leasing transactions
    • G06Q30/0601 — Electronic shopping [e-shopping]
    • G06Q30/0641 — Shopping interfaces
    • G06Q30/0643 — Graphical representation of items or shoppers

Abstract

The invention discloses an intelligent cosmetic mirror system based on multi-head attention target detection, comprising an image acquisition unit, a target detection unit, and a control unit. The image acquisition unit acquires a facial image of the user; the target detection unit extracts features from the user image, detects the different face regions, evaluates the makeup of each region, and renders high-definition virtual makeup; the control unit combines the information fed back by the target detection unit to generate a preview image of the current makeup on the cosmetic mirror. On the basis of the traditional cosmetic mirror, deep-learning techniques from artificial intelligence are added, including target detection, a multi-head attention mechanism, and generative adversarial networks: the target detection technology identifies the different parts of the face, the makeup information of the different parts is compared against a cloud database, the current makeup is scored, high-definition virtual makeup is applied to specific parts, and the final makeup effect is predicted, so that the user can intuitively see the effect of trying on a makeup look, saving the user's time.

Description

Intelligent cosmetic mirror system based on multi-head attention target detection
Technical Field
The invention relates to the technical field of cosmetic mirrors, in particular to an intelligent cosmetic mirror system based on multi-head attention target detection.
Background
The cosmetic mirror is a necessity of daily life for the female user group, yet the traditional cosmetic mirror has a single function, offering only basic imaging or supplementary lighting. When a user merely wants to try a makeup look, she must fully apply the makeup to her face and then wipe it off, which easily causes great damage to the skin and wastes not only a large amount of cosmetics but also the user's time. In addition, while making up, the user cannot tell whether the current makeup will achieve the expected effect, nor which makeup suits her best. How to improve the use efficiency of the cosmetic mirror is therefore a problem to be solved urgently.
The intelligent cosmetic mirror system designed here, based on multi-head attention target detection and generative adversarial network technology, solves the above problems and effectively improves the use efficiency of the cosmetic mirror.
Disclosure of Invention
The invention provides an intelligent cosmetic mirror system based on multi-head attention target detection, which aims to overcome the defects of the prior art. In order to achieve the purpose, the invention adopts the following specific technical scheme:
the method for extracting the face features of the face image by using the residual error network comprises the following steps:
(1) Extract the feature information of the face image through the convolution layer; the feature extraction formula is:

$z_p = W_p \circledast X + b_p$

$Y_p = f(z_p)$

where $W_p \in \mathbb{R}^{c \times w \times h}$ is a three-dimensional convolution kernel, $\circledast$ represents the convolution operation, $X$ is the input face image data, $b_p$ is the convolution bias, $f(\cdot)$ is the ReLU activation function, and $Y_p$ is the extracted face feature. The ReLU activation function is calculated as:

$f(x) = \max(0, x)$
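As a minimal numeric sketch of step (1) — illustrative only, not the patented implementation; the toy input, kernel size, and bias below are assumptions — the convolution-plus-ReLU feature extraction can be written as:

```python
import numpy as np

def relu(x):
    # ReLU activation: f(x) = max(0, x), applied element-wise
    return np.maximum(0.0, x)

def conv2d_single(X, W, b):
    # Valid cross-correlation of one c x kh x kw kernel W over a
    # c x H x W input X, plus a scalar bias: z_p = W_p (*) X + b_p
    c, kh, kw = W.shape
    _, H, Wd = X.shape
    out = np.zeros((H - kh + 1, Wd - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(W * X[:, i:i + kh, j:j + kw]) + b
    return out

# Y_p = f(z_p): one feature map produced by one kernel
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 8, 8))   # toy 3-channel stand-in for a face image
W = rng.standard_normal((3, 3, 3))   # one 3x3 kernel spanning all 3 channels
Y = relu(conv2d_single(X, W, b=0.1))
```

In a real network many such kernels run in parallel, one per output channel; this single-kernel loop only makes the arithmetic of the formula concrete.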
(2) Apply residual connections to the face features output by the sub-networks in the different residual modules; the calculation takes the standard residual form

$Y_{l+1} = Y_l + F(Y_l, W_l)$

where $F(\cdot)$ is the mapping computed by the $l$-th residual sub-network with parameters $W_l$.
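The residual connection of step (2) simply adds each sub-network's output to its own input; a one-line sketch (with a linear stand-in mapping `f`, not the actual convolutional sub-network):

```python
import numpy as np

def residual_connect(x, f):
    # y = x + F(x): the residual mapping F is learned; the identity
    # path lets gradients flow straight through the module
    return x + f(x)

# Toy check with a linear stand-in for the residual mapping
x = np.array([1.0, 2.0, 3.0])
y = residual_connect(x, lambda v: 0.5 * v)   # y = x + 0.5x = 1.5x
```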
(3) Input the face feature map finally output by the residual network into the multi-head attention encoding neural network, where the multi-head attention for one head is calculated as

$\mathrm{head}_i = \mathrm{softmax}\!\left(v^T \tanh\left(W_k K_i + W_q q_i\right)\right) W_v V_i$

where $K_i$ is the $i$-th key, $V_i$ is the $i$-th value, $q_i$ is the $i$-th query, $W_k$, $W_v$, $W_q$, and $v^T$ are the projection matrices of the keys, values, queries, and the activation vector, and $\tanh(\cdot)$ and $\mathrm{softmax}(\cdot)$ are activation functions.
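One plausible reading of the attention formula above — assuming additive (tanh) scoring with the listed projection matrices; all dimensions here are illustrative toys, not the network's real sizes — is:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_head(q, K, V, Wq, Wk, Wv, v):
    # score_i = v^T tanh(Wk @ K_i + Wq @ q); weights = softmax(scores);
    # head output = sum_i weights_i * (Wv @ V_i)
    scores = np.array([v @ np.tanh(Wk @ k + Wq @ q) for k in K])
    weights = softmax(scores)
    projected = V @ Wv.T            # row i is Wv @ V_i
    return weights @ projected

rng = np.random.default_rng(1)
d, d_h, n = 6, 4, 5                 # token dim, head dim, sequence length
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
q = rng.standard_normal(d)
Wq, Wk, Wv = (rng.standard_normal((d_h, d)) for _ in range(3))
v = rng.standard_normal(d_h)
out = attention_head(q, K, V, Wq, Wk, Wv, v)
```

A full multi-head layer would run several such heads with independent projection matrices and concatenate their outputs.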
(4) Define a face region target detection loss function and use it to train the target detection network based on the multi-head attention mechanism.
(5) Using the face region target prediction boxes generated by the target detection unit, the makeup information of the different parts of the face image is transmitted to a cloud server through the wireless communication module of the control unit. The cloud server compares this information with the related makeup information in the cloud database and returns a score for the current user's makeup; after receiving the score, the control unit feeds it back to the user through the cosmetic mirror. If the makeup score is too low, the server recommends a corresponding cosmetic to the user according to the current face area.
(6) The face features obtained by the residual network are input into the generative adversarial network to obtain the final predicted effect image of the user's current makeup.
The invention has the following beneficial effects: compared with other target detection methods, the method disclosed by the invention predicts the target box directly from the feature information of the image, eliminating the manual anchor-box design step of traditional target detection methods and achieving truly end-to-end target detection. The target detection technology helps the user understand the current makeup information more quickly, saves a large amount of the user's time, and improves the user experience.
Drawings
FIG. 1 is a schematic diagram of the face feature extraction residual network of the target detection unit used in the present invention.
FIG. 2 is a schematic diagram of the target detection network based on the multi-head attention mechanism of the target detection unit used in the present invention.
FIG. 3 is a schematic diagram of the generator in the generative adversarial network of the target detection unit used in the present invention.
FIG. 4 is a schematic diagram of the discriminator in the generative adversarial network of the target detection unit used in the present invention.
Description of reference numerals: 1-face image data; 2-residual module 1; 3-residual module 2; 4-residual module 3; 5-residual module 4; 6-sequence embedding layer; 7-multi-head attention target detection; 8-target prediction box; 9-residual module 5; 10-residual module 6; 11-residual module 7; 12-residual module 8; 13-average pooling layer 1; 14-residual module 9; 15-residual module 10; 16-residual module 11; 17-residual module 12; 18-average pooling layer 2.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and to the detailed description of an embodiment of the invention. It is to be understood that the described embodiments are merely exemplary of some, and not all, embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic diagram of the face feature extraction residual network of the target detection unit used in the present invention.
As shown in fig. 1, the method for extracting features of a face image using the residual network comprises the following steps:
(1) Input the face image data (1), a plurality of images obtained by the image acquisition unit, into the residual network as one batch;
(2) First extract features from the images through the first residual module (2), in which the residual sub-networks are linked by residual connections; through several 1 × 1 and 3 × 3 convolution kernels, the first residual module finally outputs a 56 × 56 × 256 face feature map;
(3) Input the 56 × 56 × 256 face feature map into the second residual module (3), in which the residual sub-networks are linked by residual connections; through several 1 × 1 and 3 × 3 convolution kernels, the second residual module finally outputs a 28 × 28 × 512 face feature map;
(4) Input the 28 × 28 × 512 face feature map into the third residual module (4), in which the residual sub-networks are linked by residual connections; through several 1 × 1 and 3 × 3 convolution kernels, the third residual module finally outputs a 14 × 14 × 1024 face feature map;
(5) Input the 14 × 14 × 1024 face feature map into the fourth residual module (5), in which the residual sub-networks are linked by residual connections; through several 1 × 1 and 3 × 3 convolution kernels, the fourth residual module finally outputs a 7 × 7 × 2048 face feature map.
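The stage-by-stage shapes above follow the usual halve-spatial-size, double-channels pattern; a small sketch (assuming a 224 × 224 input, which the text does not state explicitly):

```python
def backbone_shapes(input_size=224, stages=4):
    # After the stem, the first stage works at input_size // 4 with 256
    # channels; each later stage halves the spatial size and doubles the
    # channel count, matching the 56x56x256 ... 7x7x2048 progression
    shapes = []
    size, channels = input_size // 4, 256
    for _ in range(stages):
        shapes.append((size, size, channels))
        size //= 2
        channels *= 2
    return shapes
```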
FIG. 2 is a schematic diagram of the target detection network based on the multi-head attention mechanism of the target detection unit used in the present invention.
As shown in fig. 2, the method for identifying the different regions of a human face using the multi-head attention target detection network comprises the following steps:
(1) Compute the Hadamard product of the face feature map and the encoding information of the different face regions in the face image, input the result into the sequence embedding layer (6) and then into the multi-head attention encoding neural network (7), and project it into different subspaces using the projection matrices of the keys, values, and queries respectively;
(2) Compute the attention scores of the different heads using scaled dot products;
(3) Normalize the attention scores with the softmax(·) activation function;
(4) Normalize and regularize the attention scores of the different heads respectively, and input the final sequence result into the feedforward neural network;
(5) Predict the boxes (8) of the different face parts through different fully connected layers and input the final prediction box information into the control unit; the control unit marks the different face parts on the cosmetic mirror according to the prediction boxes, captures the makeup information of the different parts, and transmits it to the cloud server for makeup effect evaluation.
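The shape bookkeeping behind these steps — a 7 × 7 feature map becoming 49 tokens that are then normalized between sub-layers — can be sketched as follows (toy channel count of 8 instead of 2048, purely for illustration):

```python
import numpy as np

def embed_as_sequence(feature_map):
    # An h x w x c feature map becomes h*w tokens of dimension c,
    # e.g. 7 x 7 x 2048 -> 49 tokens of size 2048
    h, w, c = feature_map.shape
    return feature_map.reshape(h * w, c)

def layer_norm(x, eps=1e-5):
    # Per-token normalization, as used after the attention and
    # feed-forward sub-layers of the encoder
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

rng = np.random.default_rng(2)
fmap = rng.standard_normal((7, 7, 8))   # toy stand-in for 7 x 7 x 2048
tokens = layer_norm(embed_as_sequence(fmap))
```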
FIG. 3 is a schematic diagram of the generator in the generative adversarial network of the target detection unit used in the present invention.
FIG. 4 is a schematic diagram of the discriminator in the generative adversarial network of the target detection unit used in the present invention.
As shown in figs. 3 and 4, the method for predicting the final makeup effect of the face using the generator and discriminator of the generative adversarial network comprises the following steps:
(1) Input the face feature map finally output by the residual network into the generator of the generative adversarial network; the generator produces a predicted image of the final effect of the user's current makeup through several different 3 × 3 convolution layers (9, 10, 11, 12) and several different 2 × 2 pooling layers (13);
(2) Input the predicted image generated by the generator into the discriminator. The discriminator extracts features from the image using several 3 × 3 convolution layers (14, 15, 16, 17) and the pooling layer (18), and inputs these features into the control unit. The control unit queries the cloud database for the makeup effect image that best matches the features of the current makeup and feeds that image back to the discriminator. The discriminator computes the similarity between the generator's predicted image and the fed-back makeup effect image using a pixel-wise inner product, and obtains the confidence of the predicted image through a cross-entropy loss. If the confidence is greater than the preset threshold of 0.8, the predicted image is passed to the control unit, which draws it on the cosmetic mirror; otherwise, the predicted image is returned to the generator, which regenerates it until the confidence exceeds the preset threshold.
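The accept-or-regenerate loop of step (2) amounts to the following control flow. This is a sketch with stand-in generator and confidence callables: the 0.8 threshold comes from the text, while the normalized inner-product similarity and every other detail here are assumptions for illustration.

```python
import numpy as np

def pixel_confidence(pred, ref):
    # Pixel-wise inner product of two images, normalized into [0, 1]
    p, r = pred.ravel(), ref.ravel()
    cos = p @ r / (np.linalg.norm(p) * np.linalg.norm(r) + 1e-8)
    return (cos + 1.0) / 2.0

def generate_until_confident(generate, confidence, threshold=0.8, max_iters=100):
    # Keep asking the generator for a new prediction until the
    # discriminator's confidence exceeds the threshold
    for _ in range(max_iters):
        img = generate()
        if confidence(img) > threshold:
            return img
    return img

# Stub run: each attempt scores 0.2 higher than the last
attempt = {"n": 0}
def fake_generate():
    attempt["n"] += 1
    return attempt["n"]
result = generate_until_confident(fake_generate, lambda img: 0.2 * img)
```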
Examples
To verify the effectiveness of multi-head attention target detection, a comparison was made on the well-known target detection dataset COCO against a current state-of-the-art target detection method, the Faster R-CNN network. Here, Faster RCNN-FPN is a Faster R-CNN network with region proposals, and Faster RCNN-R101-FPN is a Faster R-CNN network with ResNet-101 as its backbone. The results are shown in Table 1, where AP denotes the mean average precision, AP-50 the AP at an IoU threshold of 0.5, AP-75 the AP at an IoU threshold of 0.75, AP-S the AP for target boxes with pixel area less than 32 × 32, AP-M for pixel areas between 32 × 32 and 96 × 96, and AP-L for pixel areas greater than 96 × 96.
Table 1: test results of the validity experiment of multi-head attention target detection of the present invention.
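The IoU thresholds behind AP-50 and AP-75 compare predicted and ground-truth boxes by area overlap; a standard computation (with boxes given as (x1, y1, x2, y2) corners — a conventional encoding, not one specified by the text):

```python
def iou(box_a, box_b):
    # Intersection-over-union of two axis-aligned boxes
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0

# A prediction counts as a hit for AP-50 when iou(pred, truth) >= 0.5
```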

Claims (8)

1. An intelligent cosmetic mirror system based on multi-head attention target detection, characterized by comprising: an image acquisition unit, a target detection unit, and a control unit; the image acquisition unit acquires a face image of the user through an external camera under the control of the control unit; the target detection unit, under the control of the control unit, extracts face features from the collected face images using a residual network, and the extracted face features are input respectively into an encoding neural network based on a multi-head attention mechanism and into the generator of a generative adversarial network; the output of the encoding neural network is input into a decoding neural network to obtain face region detection boxes, and the face makeup image output by the generator is input into the discriminator of the generative adversarial network to obtain the final makeup prediction image; the results of the target detection unit are displayed on the cosmetic mirror and presented to the user under the control of the control unit; the control unit comprises a wireless communication module, a voice input module, a voice output module, a clock module, a storage module, an arithmetic logic module, and a microprogram conversion module.
2. The intelligent cosmetic mirror system based on multi-head attention target detection according to claim 1, characterized in that: the residual network of the target detection unit comprises 3 residual modules 1, 4 residual modules 2, 6 residual modules 3, and 3 residual modules 4, with residual connections between the residual modules; residual module 1 comprises one 1 × 1 × 64 convolution layer, one 3 × 3 × 64 convolution layer, and one 1 × 1 × 256 convolution layer; residual module 2 comprises one 1 × 1 × 128 convolution layer, one 3 × 3 × 128 convolution layer, and one 1 × 1 × 512 convolution layer; residual module 3 comprises one 1 × 1 × 256 convolution layer, one 3 × 3 × 256 convolution layer, and one 1 × 1 × 1024 convolution layer; residual module 4 comprises one 1 × 1 × 512 convolution layer, one 3 × 3 × 512 convolution layer, and one 1 × 1 × 2048 convolution layer; the residual network finally outputs a 7 × 7 × 2048 feature map.
3. The intelligent cosmetic mirror system based on multi-head attention target detection according to claim 1, characterized in that: the encoding neural network based on the multi-head attention mechanism of the target detection unit comprises a sequence embedding layer, a multi-head attention layer, a network layer that regularizes the output of the multi-head attention layer, a feedforward neural network layer, and a network layer that regularizes the output of the feedforward neural network; the input of the encoding neural network based on the multi-head attention mechanism is the 7 × 7 × 2048 feature map finally output by the residual network; the sequence embedding layer converts the feature map into 49 sequence elements of size 1 × 2048 and inputs them into the multi-head attention layer; the multi-head attention layer captures the face region features in the sequence data, which pass through a regularization layer into the feedforward neural network; after regularization, the final sequence data describing the face region features is obtained.
4. The intelligent cosmetic mirror system based on multi-head attention target detection according to claim 1, characterized in that: the generative adversarial network of the target detection unit includes a generator and a discriminator; the generator outputs data of a specified type from random input data, and the discriminator judges the generator's output data against real data; the input of the generator is the 7 × 7 × 2048 feature map finally output by the residual network, and its output is a predicted effect image of the face after makeup; the input of the discriminator is the makeup effect image output by the generator; when the confidence of the generated makeup effect image computed by the discriminator is greater than the threshold, the image is output to the control unit, which displays it on the cosmetic mirror; otherwise, the discriminator returns the image to the generator and requires the generator to regenerate the makeup effect image until the confidence of the generated image is greater than the threshold.
5. The intelligent cosmetic mirror system based on multi-head attention target detection according to claim 1, characterized in that: the wireless communication module of the control unit establishes connections between the control unit and the cloud server, mobile terminals, and the mobile network, realizing the data interaction and control interaction of the control unit with the corresponding components.
6. The intelligent cosmetic mirror system based on multi-head attention target detection according to claim 1, characterized in that: the voice input module of the control unit receives the user's voice control commands and completes the execution of and response to the corresponding commands.
7. The intelligent cosmetic mirror system based on multi-head attention target detection according to claim 1, characterized in that: the voice output module of the control unit converts internal control signals into voice information and outputs it to a loudspeaker to prompt the user with related information.
8. The intelligent cosmetic mirror system based on multi-head attention target detection according to claim 1, characterized in that: the clock module of the control unit generates periodic pulse signals so that the components in the control unit execute commands in order; the storage module of the control unit stores the data generated while the microprograms run, the execution results of the corresponding control commands, the results generated by the target detection unit, and the input data of each target detection network; the arithmetic logic module of the control unit performs the corresponding arithmetic and logic operations; the microprogram conversion module of the control unit converts other programs into microprograms executable by the control unit.
CN202110576729.3A 2021-05-26 2021-05-26 Intelligent cosmetic mirror system based on multi-head attention target detection Active CN113239844B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110576729.3A CN113239844B (en) 2021-05-26 2021-05-26 Intelligent cosmetic mirror system based on multi-head attention target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110576729.3A CN113239844B (en) 2021-05-26 2021-05-26 Intelligent cosmetic mirror system based on multi-head attention target detection

Publications (2)

Publication Number Publication Date
CN113239844A true CN113239844A (en) 2021-08-10
CN113239844B CN113239844B (en) 2022-11-01

Family

ID=77138938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110576729.3A Active CN113239844B (en) 2021-05-26 2021-05-26 Intelligent cosmetic mirror system based on multi-head attention target detection

Country Status (1)

Country Link
CN (1) CN113239844B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10381105B1 (en) * 2017-01-24 2019-08-13 Bao Personalized beauty system
CN110807332A (en) * 2019-10-30 2020-02-18 腾讯科技(深圳)有限公司 Training method of semantic understanding model, semantic processing method, semantic processing device and storage medium
US20200134428A1 (en) * 2018-10-29 2020-04-30 Nec Laboratories America, Inc. Self-attentive attributed network embedding
WO2020171550A1 (en) * 2019-02-18 2020-08-27 Samsung Electronics Co., Ltd. Image processing method and apparatus, electronic device and computer readable storage medium
CN111639596A (en) * 2020-05-29 2020-09-08 上海锘科智能科技有限公司 Anti-glasses-shielding face recognition method based on attention mechanism and residual error network
CN111861945A (en) * 2020-09-21 2020-10-30 浙江大学 Text-guided image restoration method and system
CN112017301A (en) * 2020-07-24 2020-12-01 武汉纺织大学 Style migration model and method for specific relevant area of clothing image
CN112084841A (en) * 2020-07-27 2020-12-15 齐鲁工业大学 Cross-modal image multi-style subtitle generation method and system
CN112232156A (en) * 2020-09-30 2021-01-15 河海大学 Remote sensing scene classification method based on multi-head attention generation countermeasure network
WO2021022521A1 (en) * 2019-08-07 2021-02-11 华为技术有限公司 Method for processing data, and method and device for training neural network model

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10381105B1 (en) * 2017-01-24 2019-08-13 Bao Personalized beauty system
US20200321074A1 (en) * 2017-01-24 2020-10-08 Ha Tran Personalized cosmetic system
US20200134428A1 (en) * 2018-10-29 2020-04-30 Nec Laboratories America, Inc. Self-attentive attributed network embedding
WO2020171550A1 (en) * 2019-02-18 2020-08-27 Samsung Electronics Co., Ltd. Image processing method and apparatus, electronic device and computer readable storage medium
WO2021022521A1 (en) * 2019-08-07 2021-02-11 华为技术有限公司 Method for processing data, and method and device for training neural network model
CN110807332A (en) * 2019-10-30 2020-02-18 腾讯科技(深圳)有限公司 Training method of semantic understanding model, semantic processing method, semantic processing device and storage medium
CN111639596A (en) * 2020-05-29 2020-09-08 上海锘科智能科技有限公司 Anti-glasses-shielding face recognition method based on attention mechanism and residual error network
CN112017301A (en) * 2020-07-24 2020-12-01 武汉纺织大学 Style migration model and method for specific relevant area of clothing image
CN112084841A (en) * 2020-07-27 2020-12-15 齐鲁工业大学 Cross-modal image multi-style subtitle generation method and system
CN111861945A (en) * 2020-09-21 2020-10-30 浙江大学 Text-guided image restoration method and system
CN112232156A (en) * 2020-09-30 2021-01-15 河海大学 Remote sensing scene classification method based on multi-head attention generation countermeasure network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHANG Xinxu et al.: "Speech enhancement method fusing a multi-head self-attention mechanism", Journal of Xidian University *
SHI Kai et al.: "Multi-head attention and semantic video annotation", Computer Engineering and Applications *

Also Published As

Publication number Publication date
CN113239844B (en) 2022-11-01

Similar Documents

Publication Publication Date Title
Lee et al. From big to small: Multi-scale local planar guidance for monocular depth estimation
CN110689008A (en) Monocular image-oriented three-dimensional object detection method based on three-dimensional reconstruction
CN110674688B (en) Face recognition model acquisition method, system and medium for video monitoring scene
CN109508663A (en) A kind of pedestrian's recognition methods again based on multi-level supervision network
JP7166388B2 (en) License plate recognition method, license plate recognition model training method and apparatus
US9299011B2 (en) Signal processing apparatus, signal processing method, output apparatus, output method, and program for learning and restoring signals with sparse coefficients
CN114202740A (en) Pedestrian re-identification method based on multi-scale feature fusion
CN111523377A (en) Multi-task human body posture estimation and behavior recognition method
CN112581409A (en) Image defogging method based on end-to-end multiple information distillation network
CN112163508A (en) Character recognition method and system based on real scene and OCR terminal
KR20110063989A (en) Video object detection apparatus and method thereof
CN115328319B (en) Intelligent control method and device based on light-weight gesture recognition
CN104346630A (en) Cloud flower identifying method based on heterogeneous feature fusion
CN112906493A (en) Cross-modal pedestrian re-identification method based on cross-correlation attention mechanism
CN115188066A (en) Moving target detection system and method based on cooperative attention and multi-scale fusion
CN109002776B (en) Face recognition method, system, computer device and computer-readable storage medium
CN113065506B (en) Human body posture recognition method and system
CN112016592B (en) Domain adaptive semantic segmentation method and device based on cross domain category perception
US11810366B1 (en) Joint modeling method and apparatus for enhancing local features of pedestrians
CN113239844A (en) Intelligent cosmetic mirror system based on multi-head attention target detection
WO2022127576A1 (en) Site model updating method and system
He et al. MTRFN: Multiscale temporal receptive field network for compressed video action recognition at edge servers
Zhu et al. HDRD-Net: High-resolution detail-recovering image deraining network
CN114663910A (en) Multi-mode learning state analysis system
CN113837230A (en) Image description generation method based on adaptive attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant