CN113239844B - Intelligent cosmetic mirror system based on multi-head attention target detection - Google Patents
- Publication number
- CN113239844B (application number CN202110576729.3A)
- Authority
- CN
- China
- Prior art keywords
- target detection
- control unit
- makeup
- head attention
- generator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- A—HUMAN NECESSITIES
- A45—HAND OR TRAVELLING ARTICLES
- A45D—HAIRDRESSING OR SHAVING EQUIPMENT; EQUIPMENT FOR COSMETICS OR COSMETIC TREATMENTS, e.g. FOR MANICURING OR PEDICURING
- A45D42/00—Hand, pocket, or shaving mirrors
- A45D42/08—Shaving mirrors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0641—Shopping interfaces
- G06Q30/0643—Graphical representation of items or shoppers
Abstract
The invention discloses an intelligent cosmetic mirror system based on multi-head attention target detection, comprising an image acquisition unit, a target detection unit and a control unit. The image acquisition unit acquires a facial image of the user; the target detection unit extracts features from the user image, detects the different face regions, evaluates the makeup of each region and renders it in high definition; the control unit combines the information fed back by the target detection unit to generate a preview of the current makeup on the cosmetic mirror. On the basis of the traditional cosmetic mirror, the system adds deep-learning techniques from artificial intelligence such as target detection, a multi-head attention mechanism and a generative adversarial network: the target detection technology identifies the different parts of the face, the makeup information of each part is compared with a cloud database to score the current makeup, specific parts are rendered in high definition, and the final makeup effect is predicted, so that the user can directly see the try-on effect and save time.
Description
Technical Field
The invention relates to the technical field of cosmetic mirrors, in particular to an intelligent cosmetic mirror system based on multi-head attention target detection.
Background
The cosmetic mirror is a daily necessity for many users, but the traditional cosmetic mirror has a single function, offering only basic imaging or supplementary lighting. A user who merely wants to try a look must apply the full makeup on the face and then wipe it off, which easily damages the skin and wastes a large amount of cosmetics and time. Moreover, while making up, the user cannot tell whether the current makeup will achieve the expected effect, or which makeup suits her best. How to improve the usefulness of the cosmetic mirror is therefore a problem to be solved urgently.
The intelligent cosmetic mirror system designed here, based on multi-head attention target detection and a generative adversarial network, solves these problems and effectively improves the usefulness of the cosmetic mirror.
Disclosure of Invention
The invention provides an intelligent cosmetic mirror system based on multi-head attention target detection, which aims to overcome the defects in the prior art.
The intelligent cosmetic mirror system consists of an image acquisition unit, a target detection unit and a control unit.
The image acquisition unit acquires the face image of the user under the control of the control unit through an external camera.
The target detection unit, under the control of the control unit, extracts face features from the acquired face images with a residual network, identifies the different face regions with an encoding neural network based on a multi-head attention mechanism, and obtains the final makeup-effect image with a generative adversarial network.
The different face regions are identified with the encoding neural network based on the multi-head attention mechanism as follows. The face feature map output by the residual network is combined, via a Hadamard product, with the encoding information of the different face regions, where this encoding information is a learnable, randomly initialized tensor with the same dimensions as the face feature map. The Hadamard product result is input into a sequence embedding layer, and the resulting sequence is fed into the multi-head attention encoding neural network. Prediction boxes for the different parts of the face are then predicted by separate fully connected layers. The final prediction-box information is input into the control unit, which marks the different parts of the face on the cosmetic mirror according to the boxes, captures the makeup information of the different parts, and transmits it to the cloud server for makeup-effect evaluation.
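The region encoding and sequence embedding step above can be sketched in a few lines of numpy. This is an illustrative sketch, not the patented implementation: the shapes (a 7 × 7 × 2048 feature map flattened into 49 tokens of dimension 2048) follow the description, while the random values merely stand in for the learned tensors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Backbone output and the learnable region encoding of identical shape
# (here both are random stand-ins for the trained tensors).
feature_map = rng.standard_normal((7, 7, 2048))
region_encoding = rng.standard_normal((7, 7, 2048))

# Hadamard (element-wise) product, then flattening into a token sequence
# for the sequence embedding layer: 7 x 7 positions -> 49 tokens.
hadamard = feature_map * region_encoding
sequence = hadamard.reshape(49, 2048)

print(sequence.shape)  # (49, 2048)
```

Each of the 49 rows then serves as one element of the sequence fed into the multi-head attention layer.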
The final makeup-effect image is obtained with a generative adversarial network consisting of a generator and a discriminator. The input of the generator is the feature map finally output by the residual network, and its output is a predicted image of the face after makeup. The input of the discriminator is the makeup-effect image output by the generator; when the confidence the discriminator computes for the generated image is greater than a threshold, the image is output to the control unit, which displays it on the cosmetic mirror; otherwise the discriminator returns the image to the generator, which is required to regenerate it until the confidence of the generated image is greater than the threshold.
The control unit comprises a wireless communication module, a voice input module, a voice output module, a clock module, a storage module, an arithmetic logic operation module and a microprogram conversion module.
The wireless communication module of the control unit establishes connections between the control unit and the cloud server, mobile terminals and the mobile network, realizing data and control interaction between the control unit and the corresponding parts.
The voice input module contained in the control unit receives the voice control command of the user and completes the execution and response of the corresponding command.
The voice output module contained in the control unit converts the internal control signal into voice information and outputs the voice information to the loudspeaker to prompt the user of the related information.
The control unit comprises a clock module which generates periodic pulse signals so that the components in the control unit can execute commands in order.
The control unit comprises a storage module used for storing data generated in the running of the microprogram, an execution result of the control command, a result generated by the target detection unit and input data of each network for target detection.
And the arithmetic logic operation module of the control unit performs corresponding arithmetic operation and logic operation.
The microprogram conversion module contained in the control unit is used for converting other programs into microprograms executable by the control unit.
The intelligent cosmetic mirror system based on multi-head attention target detection operates in the following steps:
S1, extracting the face features with the residual network in the target detection unit;
S2, inputting the face features extracted by the residual network into the encoding neural network based on the multi-head attention mechanism to obtain prediction boxes for the different face regions;
S3, inputting the face features extracted by the residual network into the generative adversarial network to obtain a makeup-effect image.
Preferably, in step S1 the face features are extracted with the residual network of the target detection unit. The residual network contains 3 first residual modules, 4 second residual modules, 6 third residual modules and 3 fourth residual modules, with residual connections between the modules. The first residual module contains a 1 × 1 × 64 convolutional layer, a 3 × 3 × 64 convolutional layer and a 1 × 1 × 256 convolutional layer; the second residual module contains a 1 × 1 × 128, a 3 × 3 × 128 and a 1 × 1 × 512 convolutional layer; the third residual module contains a 1 × 1 × 256, a 3 × 3 × 256 and a 1 × 1 × 1024 convolutional layer; the fourth residual module contains a 1 × 1 × 512, a 3 × 3 × 512 and a 1 × 1 × 2048 convolutional layer. The residual network finally outputs a 7 × 7 × 2048 feature map. The face features are extracted from the face image with the residual network as follows:
(1) The feature information of the face image is extracted through the convolutional layers of the residual network, where the feature extraction of the l-th convolutional layer is

$$Y_l = f\left(W_l \ast X_{l-1} + b_l\right)$$

where $W_l \in R^{c \times w \times h}$ is a learnable three-dimensional convolution kernel of the residual network, $\ast$ denotes the convolution operation, $X_{l-1}$ is the output of the (l-1)-th convolutional layer serving as the input of the l-th layer (when l = 1, $X_0$ is the face image data acquired by the image acquisition unit), $b_l \in R^{c \times w \times h}$ is a randomly initialized convolution bias, $f(\cdot)$ is the ReLU activation function, and $Y_l$ is the face feature output by the l-th convolutional layer. The ReLU activation function is

$$f(x) = \max(0, x)$$
(2) The face features output by different residual modules are combined through residual connections to enhance the feature extraction capability of the residual network. The residual connection of the l-th residual module is

$$\tilde{Y}_l = Y_l \oplus X_l$$

where $X_l$ is the input of the l-th residual module, $Y_l$ is its output, $\oplus$ denotes element-wise matrix addition, and $\tilde{Y}_l$ is the final output of the l-th residual module after the residual connection.
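The ReLU activation and the residual connection above can be illustrated with a toy numpy sketch. The lambda standing in for the convolutional stack of a residual module is purely illustrative; in the patent that transform is the 1 × 1 / 3 × 3 / 1 × 1 convolution sequence.

```python
import numpy as np

def relu(x):
    # ReLU activation: f(x) = max(0, x), applied element-wise.
    return np.maximum(0.0, x)

def residual_block(x, transform):
    # Residual connection: output = transform(x) + x (element-wise addition).
    return transform(x) + x

x = np.array([-1.0, 0.0, 2.0])
assert np.array_equal(relu(x), np.array([0.0, 0.0, 2.0]))

# A trivial transform stands in here for the module's convolution stack.
y = residual_block(x, relu)
print(y)  # [-1.  0.  4.]
```

Note how the residual connection lets the negative input component pass through unchanged even though ReLU zeroed it inside the transform.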
Preferably, in step S2 the face features extracted by the residual network are input into the encoding neural network based on the multi-head attention mechanism to obtain prediction boxes for the different face regions. This network comprises a sequence embedding layer, a multi-head attention layer, a network layer regularizing the multi-head attention output, a feedforward neural network layer, and a network layer regularizing the feedforward network output. Its input is the 7 × 7 × 2048 feature map finally output by the residual network; the sequence embedding layer converts the feature map into 49 sequence elements of dimension 1 × 2048, which are fed into the multi-head attention layer. The multi-head attention layer captures the face-region features in the sequence data, which pass through a regularization layer into the feedforward neural network; after regularization, the final sequence data describing the face-region features are obtained. Using the face-region prediction boxes generated by the target detection unit, the makeup information of the different parts of the face image is transmitted to the cloud server through the wireless communication module of the control unit. The cloud server compares this information with the related makeup information in the cloud database and returns a score for the user's current makeup, which the control unit feeds back to the user on the cosmetic mirror. If the makeup score is too low, the server recommends corresponding cosmetics to the user according to the current face region. Wherein:
(1) The final face feature map output by the residual network is input into the multi-head attention encoding neural network, whose multi-head attention is computed as

$$att((K_i, V_i), q_i) = softmax\left(v^{T} \tanh(K_i \oplus q_i)\right) V_i$$
$$att((K, V), Q) = \bigoplus_{i=1}^{h} att((K_i, V_i), q_i)$$

where $K_i$, $V_i$, $q_i$ are the key matrix, value matrix and query vector obtained by projecting the input face features into the i-th feature space; $W_k$, $W_v$, $W_q$, $v^T$ are learnable projection matrices for the keys, values, query vectors and the activation vector; $att((K_i, V_i), q_i)$ denotes the attention score of the i-th head; $att((K, V), Q)$ denotes the final multi-head attention score; $h$ is the number of attention heads; $\oplus$ denotes element-wise matrix addition; and $\tanh(\cdot)$, $softmax(\cdot)$ are activation functions.
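A minimal numpy sketch of one additive attention head of the form described above, with the heads combined by element-wise addition. All dimensions and random tensors are toy stand-ins; the real network would use the learned projections $W_k$, $W_v$, $W_q$ of the face-feature sequence.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a score vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def additive_head(K, V, q, v):
    # One head: score_j = v^T tanh(k_j + q), weights = softmax(scores),
    # output = weighted sum of the value rows.
    scores = np.tanh(K + q) @ v
    return softmax(scores) @ V

rng = np.random.default_rng(1)
n, d, h = 5, 8, 2   # toy sizes: tokens, feature dim, number of heads

heads = [additive_head(rng.standard_normal((n, d)),   # K_i
                       rng.standard_normal((n, d)),   # V_i
                       rng.standard_normal(d),        # q_i
                       rng.standard_normal(d))        # v
         for _ in range(h)]
out = np.sum(heads, axis=0)   # heads combined by element-wise addition
print(out.shape)  # (8,)
```

The softmax guarantees the per-head attention weights sum to one, so each head's output is a convex combination of its value rows.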
(2) The loss function for face-region target detection is defined as

$$L(Y, \hat{Y}) = \sum_{i=1}^{N}\left[-\log \hat{p}(c_i) + \mathbb{1}_{\{c_i \neq \varnothing\}}\, L_{box}(b_i, \hat{b}_i)\right]$$

where $Y$ contains the true face-region boxes and $\hat{Y}$ the predictions of the multi-head attention encoding neural network; $c_i$ denotes the i-th face region and $\hat{p}(c_i)$ its predicted class score; $\mathbb{1}_{\{c_i \neq \varnothing\}}$ is the indicator that the i-th face region is non-empty; $b_i$ is the true value of the i-th face-region box and $\hat{b}_i$ the value of the i-th region's prediction box; and $L_{box}(\cdot)$ is a commonly used prediction-box loss such as the MAE. The encoding neural network based on the multi-head attention mechanism is trained with this face-region target detection loss and a gradient descent algorithm until the network converges.
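A toy numpy sketch of a loss of this form, with MAE as the box term. The class probabilities, box coordinates and the index of the "empty" class are illustrative values, not from the patent.

```python
import numpy as np

def detection_loss(class_probs, labels, boxes_true, boxes_pred, empty_class):
    # Sum over regions: classification negative log-likelihood, plus an
    # MAE box term applied only when the region is non-empty.
    loss = 0.0
    for p, c, b, b_hat in zip(class_probs, labels, boxes_true, boxes_pred):
        loss += -np.log(p[c])
        if c != empty_class:
            loss += np.abs(b - b_hat).mean()
    return loss

probs = np.array([[0.7, 0.2, 0.1],     # region 1: true class 0
                  [0.1, 0.1, 0.8]])    # region 2: "empty" (class 2)
labels = [0, 2]
bt = np.array([[0.2, 0.2, 0.5, 0.5], [0.0, 0.0, 0.0, 0.0]])
bp = np.array([[0.25, 0.2, 0.5, 0.5], [0.0, 0.0, 0.0, 0.0]])

print(round(detection_loss(probs, labels, bt, bp, empty_class=2), 4))  # 0.5923
```

The empty region contributes only its classification term, matching the indicator in the formula.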
Preferably, in step S3 the face features extracted by the residual network are input into the generative adversarial network, and the makeup-effect image is obtained as follows:
(1) The face feature map finally output by the residual network is input into the generator of the generative adversarial network, which generates a predicted image of the final effect of the user's current makeup through several different convolutional and pooling layers;
(2) The predicted image of the final effect of the user's current makeup generated by the generator is input into the discriminator, which extracts the image features with several convolutional and pooling layers and sends them to the control unit. The control unit queries the cloud database for the makeup-effect image that best matches the current makeup features and feeds this image back to the discriminator. The discriminator computes the similarity between the generated prediction and the fed-back image with a pixel-wise inner product, and obtains a confidence for the generated prediction through a cross-entropy loss. If the confidence is greater than a preset threshold, the predicted image is input into the control unit, which draws it on the cosmetic mirror; if it is below the threshold, the prediction is returned to the generator, which regenerates it until the confidence of the predicted image exceeds the preset threshold.
The invention has the beneficial effects that: compared with other target detection methods, the disclosed method predicts the target boxes directly from the feature information of the image, dispensing with the manual anchor-box design of traditional target detection methods and realizing truly end-to-end target detection. The target detection technology helps the user understand the current makeup more quickly, saving the user considerable time and improving the user experience.
Drawings
Fig. 1 is a schematic diagram of the residual network used by the target detection unit for face feature extraction in the present invention.
Fig. 2 is a schematic diagram of the target detection network based on the multi-head attention mechanism used by the target detection unit in the present invention.
Fig. 3 is a schematic diagram of the generator in the generative adversarial network of the target detection unit used in the present invention.
Fig. 4 is a schematic diagram of the discriminator in the generative adversarial network of the target detection unit used in the present invention.
Description of the reference numerals: 1-face image data; 2-first residual module; 3-a second residual module; 4-a third residual module; 5-a fourth residual module; 6-sequence embedding layer; 7-multi-head attention target detection; 8-target prediction box; 9-a fifth residual module; 10-a sixth residual module; 11-seventh residual module; 12-an eighth residual module; 13-first average pooling layer; 14-a ninth residual module; 15-a tenth residual module; 16-an eleventh residual module; 17-a twelfth residual module; 18-second average pooling layer.
Detailed Description
The invention will be further elucidated and described with reference to the drawings and to the detailed description of an embodiment of the invention. It is to be understood that the described embodiments are merely exemplary of some, and not all, embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic diagram of the residual network used by the target detection unit for face feature extraction in the present invention.
As shown in fig. 1, the method for extracting features of a face image by using a residual error network comprises the following steps:
(1) Inputting a plurality of images obtained by an image acquisition unit into a residual error network as a batch;
(2) Features are first extracted from the image by the first residual module 2, with residual connections between the residual sub-networks; through several 1 × 1 and 3 × 3 convolution kernels, the first residual module 2 finally outputs a 56 × 56 × 256 face feature map;
(3) The 56 × 56 × 256 face feature maps are input into the second residual module 3, with residual connections between the residual sub-networks; through several 1 × 1 and 3 × 3 convolution kernels, the second residual module 3 finally outputs a 28 × 28 × 512 face feature map;
(4) The 28 × 28 × 512 face feature maps are input into the third residual module 4, with residual connections between the residual sub-networks; through several 1 × 1 and 3 × 3 convolution kernels, the third residual module 4 finally outputs a 14 × 14 × 1024 face feature map;
(5) The 14 × 14 × 1024 face feature maps are input into the fourth residual module 5, with residual connections between the residual sub-networks; through several 1 × 1 and 3 × 3 convolution kernels, the fourth residual module 5 finally outputs a 7 × 7 × 2048 face feature map.
Fig. 2 is a schematic diagram of the target detection network based on the multi-head attention mechanism used by the target detection unit in the present invention.
As shown in fig. 2, the method for identifying different regions of a human face by using a multi-head attention target detection network comprises the following steps:
(1) The face feature map is multiplied element-wise (Hadamard product) with the encoding information of the different face regions; the result is input into the sequence embedding layer 6, the output of the sequence embedding layer 6 is fed into the multi-head attention encoding neural network 7, and there it is projected into different subspaces by the key, value and query projection matrices;
(2) The attention scores of the different heads are computed with a scaled dot product;
(3) The attention scores are normalized with the softmax(·) activation function;
(4) The attention scores of the different heads are normalized and regularized, and the final sequence result 8 is input into the feedforward neural network;
(5) Prediction boxes for the different parts of the face are predicted by separate fully connected layers; the final prediction-box information is input into the control unit, which marks the different parts of the face on the cosmetic mirror according to the boxes, captures the makeup information of the different parts, and transmits it to the cloud server for makeup-effect evaluation.
Fig. 3 is a schematic diagram of the generator in the generative adversarial network of the target detection unit used in the present invention.
Fig. 4 is a schematic diagram of the discriminator in the generative adversarial network of the target detection unit used in the present invention.
As shown in fig. 3 and 4, the method for predicting the final makeup effect of the face with the generator and discriminator of the generative adversarial network comprises the following steps:
(1) The face feature map finally output by the residual network is input into the generator of the generative adversarial network, which generates a predicted image of the final effect of the user's current makeup through several different 3 × 3 convolutional layers 9, 10, 11, 12 and a 2 × 2 pooling layer 13;
(2) The predicted image of the final effect of the user's current makeup generated by the generator is input into the discriminator, which extracts the image features with several 3 × 3 convolutional layers 14, 15, 16, 17 and 2 × 2 pooling layers 18 and sends them to the control unit. The control unit queries the cloud database for the makeup-effect image that best matches the current makeup features and feeds it back to the discriminator. The discriminator computes the similarity between the generated prediction and the fed-back image with a pixel-wise inner product, and obtains a confidence for the prediction through a cross-entropy loss. If the confidence is greater than the preset threshold of 0.8, the predicted image is input into the control unit, which draws it on the cosmetic mirror; if it is below the threshold, the prediction is returned to the generator, which regenerates it until the confidence exceeds the preset threshold.
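The accept-or-regenerate loop above can be sketched with numpy. This is a schematic stand-in only: `generate` produces random images in place of the GAN generator, the reference image stands in for the cloud-matched makeup-effect image, and a sigmoid of the pixel-wise inner product stands in for the cross-entropy-derived confidence.

```python
import numpy as np

def confidence(pred_img, ref_img):
    # Pixel-wise inner product as a similarity score, squashed to (0, 1)
    # as a stand-in for the cross-entropy-based confidence.
    sim = float(np.sum(pred_img * ref_img))
    return 1.0 / (1.0 + np.exp(-sim))

def generate(seed):
    # Placeholder for the GAN generator's predicted makeup-effect image.
    return np.random.default_rng(seed).random((4, 4))

reference = np.full((4, 4), 0.5)   # stand-in for the cloud-matched image
threshold = 0.8

seed = 0
img = generate(seed)
while confidence(img, reference) <= threshold:   # regenerate until confident
    seed += 1
    img = generate(seed)

print(confidence(img, reference) > threshold)  # True
```

The loop exits only once the confidence exceeds the threshold, which is exactly the acceptance condition sent to the control unit.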
Examples
To verify the effectiveness of multi-head attention target detection, a comparison was made on the well-known COCO target detection data set against one of the current best target detection methods, the Faster R-CNN network. Here Faster RCNN-FPN denotes a Faster R-CNN network with a feature pyramid network and region proposals, and Faster RCNN-R101-FPN denotes a Faster R-CNN network with ResNet101 as the backbone. The results are shown in Table 1, where AP denotes average precision, AP-50 the AP at an IoU threshold of 0.5, AP-75 the AP at an IoU threshold of 0.75, AP-S the AP for target boxes with pixel area below 32 × 32, AP-M the AP for areas between 32 × 32 and 96 × 96, and AP-L the AP for areas above 96 × 96.
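The IoU (intersection-over-union) thresholds behind AP-50 and AP-75 can be computed with a short helper; the boxes here are illustrative.

```python
def iou(a, b):
    # Intersection-over-union for boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

# Two unit-offset 2x2 boxes overlap in a single unit square: IoU = 1/7.
print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 4))  # 0.1429
```

A detection counts as correct for AP-50 when its IoU with the ground-truth box is at least 0.5, and for AP-75 when it is at least 0.75.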
Table 1 test results of validity experiments of the present invention for multi-head attention target detection.
Claims (8)
1. An intelligent cosmetic mirror system based on multi-head attention target detection, characterized by comprising: an image acquisition unit, a target detection unit and a control unit; the image acquisition unit acquires a face image of the user through an external camera under the control of the control unit; the target detection unit, under the control of the control unit, extracts face features from the acquired face image with a residual network, and the extracted features are input respectively into an encoding neural network based on a multi-head attention mechanism and into the generator of a generative adversarial network; the face feature map is combined via a Hadamard product with the encoding information of the different face regions, the result is input into a sequence embedding layer, and the output is fed into the multi-head attention encoding neural network; prediction boxes for the different parts of the face are predicted by separate fully connected layers, the final prediction-box information is input into the control unit, the control unit marks the different parts of the face on the cosmetic mirror according to the prediction boxes, captures the makeup information of the different parts, and transmits it to the cloud server for makeup-effect evaluation; the feature map finally output by the residual network serves as the input of the generator, which outputs a predicted image of the face after makeup; the input of the discriminator is the makeup-effect image output by the generator; when the confidence computed by the discriminator for the generated image is greater than a threshold, the image is output to the control unit, which displays it on the cosmetic mirror; otherwise the discriminator returns the image to the generator, which regenerates it until the confidence computed by the discriminator is greater than the threshold; the result of the target detection unit is displayed on the cosmetic mirror and presented to the user under the control of the control unit; the control unit comprises a wireless communication module, a voice input module, a voice output module, a clock module, a storage module, an arithmetic logic operation module and a microprogram conversion module.
2. The intelligent cosmetic mirror system based on multi-head attention target detection as claimed in claim 1, wherein: the residual network of the target detection unit comprises 3 first residual modules, 4 second residual modules, 6 third residual modules and 3 fourth residual modules, with residual connections between the modules; the first residual module contains a 1 × 1 × 64, a 3 × 3 × 64 and a 1 × 1 × 256 convolutional layer; the second residual module contains a 1 × 1 × 128, a 3 × 3 × 128 and a 1 × 1 × 512 convolutional layer; the third residual module contains a 1 × 1 × 256, a 3 × 3 × 256 and a 1 × 1 × 1024 convolutional layer; the fourth residual module contains a 1 × 1 × 512, a 3 × 3 × 512 and a 1 × 1 × 2048 convolutional layer; the residual network finally outputs a 7 × 7 × 2048 feature map.
3. The intelligent cosmetic mirror system based on multi-head attention target detection as claimed in claim 1, wherein: the encoding neural network based on the multi-head attention mechanism of the target detection unit comprises a sequence embedding layer, a multi-head attention layer, a network layer regularizing the multi-head attention output, a feedforward neural network layer and a network layer regularizing the feedforward network output; its input is the 7 × 7 × 2048 feature map finally output by the residual network, which the sequence embedding layer converts into 49 sequence elements of dimension 1 × 2048 and feeds into the multi-head attention layer; the multi-head attention layer captures the face-region features in the sequence data, which pass through a regularization layer into the feedforward neural network; after regularization, the final sequence data describing the face-region features are obtained.
4. The intelligent cosmetic mirror system based on multi-head attention target detection according to claim 1, characterized in that: the generative adversarial network of the target detection unit comprises a generator and a discriminator, the generator outputting data of a specified type from random input data, and the discriminator judging the generator's output against real data; the input of the generator is the 7×7×2048 feature map finally output by the residual network, and the generator outputs a predicted face-makeup effect image; the input of the discriminator is the makeup effect image output by the generator; when the confidence computed by the discriminator for the generated makeup effect image exceeds the threshold, the image is output to the control unit, which displays it on the cosmetic mirror; otherwise the discriminator feeds the image back to the generator and requires it to regenerate the makeup effect image, until the confidence of the generated image exceeds the threshold.
5. The intelligent cosmetic mirror system based on multi-head attention target detection as claimed in claim 1, wherein: the wireless communication module contained in the control unit establishes connections between the control unit and the cloud server, the mobile terminal and the mobile network, realizing data interaction and control interaction between the control unit and the corresponding components.
6. The intelligent cosmetic mirror system based on multi-head attention target detection according to claim 1, characterized in that: the voice input module contained in the control unit receives the user's voice control commands and executes and responds to the corresponding commands.
7. The intelligent cosmetic mirror system based on multi-head attention target detection according to claim 1, characterized in that: the voice output module contained in the control unit converts internal control signals into voice information and outputs it to the loudspeaker to prompt the user with relevant information.
8. The intelligent cosmetic mirror system based on multi-head attention target detection according to claim 1, characterized in that: the clock module contained in the control unit generates periodic pulse signals so that the components in the control unit execute commands in an orderly manner; the storage module contained in the control unit stores data generated during microprogram execution, the execution results of control commands, the results produced by the target detection unit, and the input data of each target detection network; the arithmetic logic module of the control unit performs the corresponding arithmetic and logic operations; the microprogram conversion module contained in the control unit converts other programs into microprograms executable by the control unit.
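The stage layout in claim 2 (3/4/6/3 bottleneck modules with 1×1–3×3–1×1 convolutions) matches a ResNet-50-style backbone. The sketch below is hypothetical and traces only the shape arithmetic: the 224×224 input size, the stem convolution/pooling, and the per-stage stride placement are assumptions not stated in the claim, chosen so that the network yields the claimed 7×7×2048 feature map.

```python
# Hypothetical shape trace of the claim-2 residual network (ResNet-50-style).
# Block counts and channel widths come from claim 2; the 224x224 input,
# the stem conv/pool, and the per-stage strides are illustrative assumptions.

def feature_map_shape(input_size=224):
    # Stage layout: (number of bottleneck modules, output channels per claim 2)
    stages = [(3, 256), (4, 512), (6, 1024), (3, 2048)]
    size = input_size // 2   # stem: 7x7 conv, stride 2  -> 112
    size = size // 2         # 3x3 max pool, stride 2    -> 56
    channels = 64
    for i, (num_blocks, out_channels) in enumerate(stages):
        if i > 0:            # stages 2-4 each downsample once at their first block
            size = size // 2
        channels = out_channels
    return size, size, channels

print(feature_map_shape())  # -> (7, 7, 2048), the map fed to the encoder
```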
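Claim 3 describes a standard Transformer-style encoder layer: the 7×7×2048 map is flattened into 49 tokens of dimension 2048, passed through multi-head attention, a normalization layer, a feedforward network, and a second normalization. A minimal NumPy sketch follows; the head count (8), the feedforward width (512), and the use of LayerNorm for the "regularization" layers are assumptions, since the claim does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, d_head = 2048, 8, 256  # assumed split: 8 heads * 256 = 2048

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def encoder_layer(tokens, Wq, Wk, Wv, Wo, W1, W2):
    # Multi-head self-attention over the 49 face-region tokens.
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    heads = []
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        att = softmax(q[:, s] @ k[:, s].T / np.sqrt(d_head))
        heads.append(att @ v[:, s])
    x = layer_norm(tokens + np.concatenate(heads, axis=-1) @ Wo)
    # Position-wise feedforward network, then the second normalization layer.
    return layer_norm(x + np.maximum(0, x @ W1) @ W2)

# Sequence embedding: 7x7x2048 feature map -> 49 tokens of size 2048.
feature_map = rng.standard_normal((7, 7, 2048)).astype(np.float32)
tokens = feature_map.reshape(49, 2048)
params = [rng.standard_normal((a, b)).astype(np.float32) * 0.02
          for a, b in [(2048, 2048)] * 4 + [(2048, 512), (512, 2048)]]
out = encoder_layer(tokens, *params)
print(out.shape)  # (49, 2048)
```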
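The control flow of claim 4 — generate, score with the discriminator, and regenerate until the confidence exceeds the threshold, then hand the image to the control unit — can be sketched as below. The generator and discriminator here are stand-in stubs, and the 0.8 threshold is an assumed value; the claim specifies neither the networks' internals nor the threshold.

```python
import random

random.seed(7)
THRESHOLD = 0.8  # assumed confidence threshold; claim 4 does not fix a value

def generator(feature_map):
    # Stub: the real generator maps the 7x7x2048 feature map to a
    # predicted makeup effect image.
    return {"image": "makeup-effect", "score": random.random()}

def discriminator(effect_image):
    # Stub: the real discriminator scores the image against real data.
    return effect_image["score"]

def predict_makeup(feature_map, max_rounds=1000):
    # Regenerate until the discriminator's confidence exceeds the threshold;
    # the accepted image is then returned for display on the mirror.
    for _ in range(max_rounds):
        image = generator(feature_map)
        confidence = discriminator(image)
        if confidence > THRESHOLD:
            return image, confidence
    raise RuntimeError("no confident makeup prediction produced")

image, conf = predict_makeup(feature_map=None)
print(conf > THRESHOLD)  # True
```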
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110576729.3A CN113239844B (en) | 2021-05-26 | 2021-05-26 | Intelligent cosmetic mirror system based on multi-head attention target detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113239844A (en) | 2021-08-10
CN113239844B (en) | 2022-11-01
Family
ID=77138938
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110576729.3A Active CN113239844B (en) | 2021-05-26 | 2021-05-26 | Intelligent cosmetic mirror system based on multi-head attention target detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113239844B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111861945A (en) * | 2020-09-21 | 2020-10-30 | 浙江大学 | Text-guided image restoration method and system |
CN112084841A (en) * | 2020-07-27 | 2020-12-15 | 齐鲁工业大学 | Cross-modal image multi-style subtitle generation method and system |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10381105B1 (en) * | 2017-01-24 | 2019-08-13 | Bao | Personalized beauty system |
US11544530B2 (en) * | 2018-10-29 | 2023-01-03 | Nec Corporation | Self-attentive attributed network embedding |
CN111583097A (en) * | 2019-02-18 | 2020-08-25 | 北京三星通信技术研究有限公司 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
WO2021022521A1 (en) * | 2019-08-07 | 2021-02-11 | 华为技术有限公司 | Method for processing data, and method and device for training neural network model |
CN110807332B (en) * | 2019-10-30 | 2024-02-27 | 腾讯科技(深圳)有限公司 | Training method, semantic processing method, device and storage medium for semantic understanding model |
CN111639596B (en) * | 2020-05-29 | 2023-04-28 | 上海锘科智能科技有限公司 | Glasses-shielding-resistant face recognition method based on attention mechanism and residual error network |
CN112017301A (en) * | 2020-07-24 | 2020-12-01 | 武汉纺织大学 | Style migration model and method for specific relevant area of clothing image |
CN112232156B (en) * | 2020-09-30 | 2022-08-16 | 河海大学 | Remote sensing scene classification method based on multi-head attention generation countermeasure network |
2021-05-26: CN202110576729.3A filed → patent CN113239844B/en (status: Active)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11302009B2 (en) | Method of image processing using a neural network | |
WO2021043168A1 (en) | Person re-identification network training method and person re-identification method and apparatus | |
KR20210036244A (en) | System and method for boundary aware semantic segmentation | |
WO2021022521A1 (en) | Method for processing data, and method and device for training neural network model | |
CN112434655B (en) | Gait recognition method based on adaptive confidence map convolution network | |
CN109858392B (en) | Automatic face image identification method before and after makeup | |
EP4099220A1 (en) | Processing apparatus, method and storage medium | |
CN111797683A (en) | Video expression recognition method based on depth residual error attention network | |
CN110991380A (en) | Human body attribute identification method and device, electronic equipment and storage medium | |
US9299011B2 (en) | Signal processing apparatus, signal processing method, output apparatus, output method, and program for learning and restoring signals with sparse coefficients | |
CN110210344B (en) | Video action recognition method and device, electronic equipment and storage medium | |
CN112581370A (en) | Training and reconstruction method of super-resolution reconstruction model of face image | |
CN110599395A (en) | Target image generation method, device, server and storage medium | |
KR101366776B1 (en) | Video object detection apparatus and method thereof | |
KR102357000B1 (en) | Action Recognition Method and Apparatus in Untrimmed Videos Based on Artificial Neural Network | |
Fu et al. | A compromise principle in deep monocular depth estimation | |
CN111523377A (en) | Multi-task human body posture estimation and behavior recognition method | |
CN116246338B (en) | Behavior recognition method based on graph convolution and transducer composite neural network | |
US20140086479A1 (en) | Signal processing apparatus, signal processing method, output apparatus, output method, and program | |
CN114694089A (en) | Novel multi-mode fusion pedestrian re-recognition algorithm | |
CN116453025A (en) | Volleyball match group behavior identification method integrating space-time information in frame-missing environment | |
Du et al. | Adaptive visual interaction based multi-target future state prediction for autonomous driving vehicles | |
CN114240999A (en) | Motion prediction method based on enhanced graph attention and time convolution network | |
CN117854160A (en) | Human face living body detection method and system based on artificial multi-mode and fine-granularity patches | |
CN113239844B (en) | Intelligent cosmetic mirror system based on multi-head attention target detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||