WO2020252903A1 - Au检测方法、装置、电子设备及存储介质 - Google Patents

Au检测方法、装置、电子设备及存储介质 Download PDF

Info

Publication number
WO2020252903A1
WO2020252903A1 (PCT/CN2019/102615)
Authority
WO
WIPO (PCT)
Prior art keywords
picture
parameter
detection
face
face picture
Prior art date
Application number
PCT/CN2019/102615
Other languages
English (en)
French (fr)
Inventor
盛建达
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020252903A1 publication Critical patent/WO2020252903A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Definitions

  • This application relates to the technical field of intelligent decision-making, in particular to an AU detection method, device, electronic equipment and storage medium.
  • AU detection usually uses neural network algorithms, but in the traditional training method the training samples are single and the training parameters are limited. Therefore, the accuracy of the detection results and the detection speed are limited to a certain extent, which adversely affects the development of artificial intelligence.
  • An AU detection method includes:
  • before the picture to be detected is input into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, the method further includes:
  • a face picture with an AU mark and a face picture with a classification mark are obtained as sample pictures;
  • the acquiring the face picture with AU mark and the face picture with classification mark as sample pictures includes one or a combination of the following methods:
  • the acquiring a face picture with an AU mark and a face picture with a classification mark as a sample picture further includes:
  • Data enhancement is performed on the face picture with the AU mark and the face picture with the classification mark to obtain the sample picture.
  • the expression recognition model is trained to obtain the second parameter.
  • the use of a backpropagation algorithm in combination with the first parameter and the second parameter to train the sample picture to obtain an AU detection model includes:
  • the back propagation algorithm is used to adjust all parameters in the AU detection model until the accuracy value reaches the accuracy threshold, and training is stopped.
  • the method further includes:
  • An AU detection device, the device including:
  • the acquiring unit is used to acquire the target face picture when the AU detection instruction is received;
  • the de-averaging unit is used to de-average the adjusted target face picture;
  • the normalization unit is used to normalize the target face picture after de-averaging to obtain the picture to be detected;
  • the input unit is used to input the picture to be detected into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, wherein the AU detection model is obtained by training with face pictures bearing AU marks and face pictures bearing classification marks, and is used to output the expression detection result according to the target face picture.
  • the acquisition unit is further configured to, before the picture to be detected is input into the AU detection model trained by combining the back-propagation algorithm and the neural network algorithm to obtain the expression detection result, acquire the face picture with the AU mark and the face picture with the classification mark as sample pictures when the training instruction is received;
  • the device also includes:
  • An extraction unit for extracting the first parameter of the pre-trained face classification model and the second parameter of the pre-trained facial expression recognition model
  • the training unit is configured to use a back propagation algorithm, combine the first parameter and the second parameter, and use a neural network algorithm to train the sample picture to obtain an AU detection model.
  • the acquiring unit acquires a face picture with an AU mark and a face picture with a classification mark as sample pictures, including one or a combination of the following methods:
  • Data enhancement is performed on the face picture with the AU mark and the face picture with the classification mark to obtain the sample picture.
  • the training unit is further configured to train the face classification model, before the first parameter of the pre-trained face classification model and the second parameter of the pre-trained expression recognition model are extracted, to obtain the first parameter;
  • the training unit is also used to train the expression recognition model by using a back propagation algorithm in combination with the first parameter to obtain the second parameter.
  • the training unit uses a backpropagation algorithm to train the sample picture in combination with the first parameter and the second parameter to obtain an AU detection model including:
  • the back propagation algorithm is used to adjust all parameters in the AU detection model until the accuracy value reaches the accuracy threshold, and training is stopped.
  • the device further includes:
  • An electronic device which includes:
  • the memory stores at least one computer readable instruction
  • a non-volatile readable storage medium in which at least one computer-readable instruction is stored, the at least one computer-readable instruction being executed by a processor in an electronic device to implement the AU detection method described above.
  • Fig. 1 is a flowchart of a preferred embodiment of the AU detection method of the present application.
  • Fig. 2 is a functional block diagram of a preferred embodiment of the AU detection device of the present application.
  • the AU detection method is applied to one or more electronic devices.
  • the electronic device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), embedded equipment, etc.
  • the electronic device may be any electronic product capable of human-machine interaction with a user, such as a personal computer, a tablet computer, a smart phone, a personal digital assistant (PDA), a game console, an interactive network television (Internet Protocol Television, IPTV), smart wearable devices, etc.
  • the AU detection instruction can be triggered by the user, or can be triggered automatically when certain conditions are met, which is not limited by this application.
  • in order to ensure that training and calculation are performed in the same dimension, the electronic device first preprocesses the target face picture, that is, adjusts the size of the target face picture.
  • the electronic device de-averages the 224*224 face picture, centering every dimension of the matrix corresponding to the input picture at 0, that is, the mean is subtracted from the face picture.
  • the electronic device normalizes the 224*224 size face picture to obtain the picture to be detected.
  • normalization refers to normalizing the amplitude to the same range, that is, dividing the face picture by its standard deviation. After normalization, the interference caused by differences in the value ranges of the data in each dimension is reduced. For example, suppose there are two feature dimensions A and B, where A ranges from 0 to 10 and B ranges from 0 to 10000; using these two features directly would be problematic, but after normalization the data of A and B both fall into the same range, which is convenient for training and calculation in the same dimension.
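  • For illustration only, a minimal Python/NumPy sketch of this de-averaging and normalization (the feature arrays A and B follow the example above and are purely illustrative):

        import numpy as np

        A = np.random.uniform(0, 10, size=100)     # feature dimension A, range 0 to 10
        B = np.random.uniform(0, 10000, size=100)  # feature dimension B, range 0 to 10000

        def standardize(x):
            # de-average (center each dimension at 0), then divide by the
            # standard deviation so that both dimensions share the same range
            return (x - x.mean()) / x.std()

        A_n, B_n = standardize(A), standardize(B)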
  • the AU detection model is obtained by training a face picture with an AU mark and a face picture with a classification mark, and is used for outputting the expression detection result according to the target face picture.
  • before the picture to be detected is input into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, the method further includes:
  • the electronic device trains the AU detection model.
  • training the AU detection model by the electronic device includes:
  • When the training instruction is received, the electronic device obtains the face picture with the AU mark and the face picture with the classification mark as sample pictures. Further, the electronic device extracts a first parameter of a pre-trained face classification model and a second parameter of a pre-trained expression recognition model. The electronic device uses a back-propagation algorithm, combines the first parameter and the second parameter, and uses a neural network algorithm to train on the sample pictures to obtain the AU detection model.
  • the electronic device receiving the training instruction includes, but is not limited to, one or a combination of the following:
  • the electronic device receives a signal that the user triggers the configuration button to confirm that the training instruction is received.
  • the configuration button is pre-configured and used to trigger the training instruction.
  • the configuration button may be a virtual button or a physical button.
  • the electronic device receives the configuration voice signal to determine that the training instruction is received.
  • the configuration voice signal is pre-configured and used to trigger the training instruction.
  • the electronic device receives the voice input by the user and performs voice recognition on the voice to determine whether the voice is consistent with the configuration voice, and when the voice is consistent with the configuration voice, all The electronic device determines to receive the training instruction.
  • FACS (Facial Action Coding System)
  • AU is the subtle movement of facial muscles, that is, the basic muscle action unit of the human face, and AU mainly includes single AU and combined AU.
  • AU detection refers to the process of judging which AU a face picture exhibits by comparing the similarity between the face picture and each AU.
  • 19 single AUs in FACS can be selected, such as 6 upper-half-face AUs and 13 lower-half-face AUs, which is not limited in this application.
  • the AU detection refers to using the above 19 AUs as the standard for detection and comparison to predict the probability that the input face picture belongs to each of the 19 AUs.
  • the AU marking of a face picture includes, but is not limited to, any of the following methods:
  • the electronic device tags the AU in the face picture.
  • the electronic device marks the AU in the face picture with a special mark (such as a red circle, a red box, etc.).
  • the electronic device classifying and marking the face picture includes:
  • the face pictures are classified, and the face pictures belonging to different users are marked separately.
  • the marking method may be labeling, etc., which is not limited in this application.
  • the sample picture is used to train the AU detection model using a neural network algorithm.
  • the electronic device acquiring the face picture with the AU mark and the face picture with the classification mark as sample pictures includes, but is not limited to, one or a combination of the following methods:
  • the electronic device uses web crawler technology to obtain the sample picture.
  • the electronic device adopts the web crawler technology to obtain a large number of face pictures as the sample pictures, which effectively guarantees the training accuracy of the model.
  • the electronic device obtains the sample picture through a face picture obtaining tool.
  • the electronic device reads pictures through opencv (open source library).
  • the electronic device obtains the uploaded face picture as the sample picture.
  • the electronic device may also receive a face picture uploaded by the user as the sample picture to ensure the accuracy of the model.
  • the format of the sample picture includes, but is not limited to: jpg, png, gif, etc., which are not limited here.
  • acquiring, by the electronic device, the face picture with the AU mark and the face picture with the classification mark as the sample picture further includes:
  • Data enhancement is performed on the face picture with the AU mark and the face picture with the classification mark to obtain the sample picture.
  • the electronic device first performs data enhancement on the face image to ensure sufficient training samples, so as to achieve a better training effect.
  • this application does not limit the data enhancement technology adopted.
  • the face picture with the AU mark and the face picture with the classification mark are used as sample pictures at the same time, it can ensure that there are sufficient training samples during training, and the training effect is better. And because more feature details are provided, the over-fitting phenomenon can be effectively avoided.
  • the electronic device first trains the face classification model and the expression recognition model.
  • the method further includes:
  • the electronic device trains the face classification model to obtain the first parameter, and uses a backpropagation algorithm to combine the first parameter to train the expression recognition model to obtain the second parameter.
  • the electronic device uses a migration method to gradually train to obtain the face classification model.
  • the parameters of the last fully connected layer represent the number of face classes, and the softmax layer is used for multi-class classification.
  • the network compositions of the face classification model and the expression recognition model are similar: only the last layer differs in structure, while the numbers of all other parameters are identical. Therefore, the network parameters can be loaded into each other.
  • the electronic device transfer-learns the expression recognition model with the first parameter of the face classification model, that is, it directly loads the first parameter for training, takes the first parameter as the initial parameter, fixes the parameters of each layer of the neural network from the beginning up to conv3_x, and uses the back-propagation algorithm to train the parameters of conv4_x and the following layers.
  • the electronic device transfers the training results of the face classification model while locking some network-layer parameters. Therefore, the facial structure features learned in face classification can be fully utilized in AU prediction.
  • the prior knowledge of the face classification learning is used to improve the AU training accuracy.
  • the electronic device utilizes the backpropagation algorithm and makes full use of prior knowledge to be able to continuously update network parameters to achieve higher training accuracy.
  • the electronic device uses a backpropagation algorithm to train the sample picture in combination with the first parameter and the second parameter to obtain an AU detection model including:
  • the electronic device calculates the accuracy value of the output result of the AU detection model and obtains the accuracy threshold of the output result of the AU detection model. Further, the electronic device uses a back-propagation algorithm to adjust all parameters in the AU detection model until the accuracy value reaches the accuracy threshold, then stops training.
  • the accuracy threshold can be customized to train a model that meets the accuracy requirements according to actual needs.
  • the electronic device can realize the training of the AU detection model through continuous adjustment of all parameters to obtain a model that meets the accuracy requirements, and since all parameters participate in the adjustment, the training effect is better.
  • the electronic device retains the AU detection result in the expression detection result, and further, the electronic device deletes the face classification result in the expression detection result.
  • since the face pictures with classification marks are used as sample pictures, more feature details are provided, which can effectively prevent wrong information from being learned due to overfitting.
  • during training, the features of the face pictures with AU marks are gradually strengthened while the features of the face pictures with classification marks are gradually weakened, so that after training the accuracy of the AU detection results gradually improves.
  • meanwhile, the accuracy of the face classification result gradually decreases. Therefore, the electronic device retains the AU detection result in the expression detection result and deletes the face classification result in the expression detection result, to make reasonable use of the results.
  • the acquiring unit 110 acquires the target face picture.
  • the target face picture refers to a picture that needs to be detected.
  • the adjusting unit 117 adjusts the size of the target face picture.
  • in order to ensure that training and calculation are performed in the same dimension, the adjustment unit 117 first preprocesses the target face picture, that is, adjusts the size of the target face picture.
  • the adjustment unit 117 scales the face picture proportionally so that its minimum side length is 256, for example scaling [1280, 720] to [455, 256]. Further, the adjustment unit 117 performs center cropping on the face picture, taking the middle 224*224 region of the face picture, for example taking the [115:339, 16:240] region of a [455, 256] picture, to obtain a 224*224 face picture.
  • the de-averaging unit 118 performs de-averaging on the adjusted target face picture.
  • the normalization unit 113 normalizes a face picture with a size of 224*224 to obtain the picture to be detected.
  • normalization refers to normalizing the amplitude to the same range, that is, dividing the standard deviation of the face image. After normalization, the interference caused by the difference in the value range of the data of each dimension is reduced. For example, we have two dimensions of features A and B. The range of A is 0 to 10, and the range of B is 0 to 10000. If it is problematic to use these two features directly, after normalization, the data of A and B will both become the range of 0 to 1, which is convenient for training and calculation in the same dimension.
  • the input unit 114 inputs the image to be detected into an AU detection model trained in combination with a back propagation algorithm and a neural network algorithm to obtain an expression detection result.
  • the AU detection model is obtained by training a face picture with an AU mark and a face picture with a classification mark, and is used for outputting the expression detection result according to the target face picture.
  • before the picture to be detected is input into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, the method further includes:
  • training the AU detection model includes:
  • the acquiring unit 110 acquires a face picture with an AU mark and a face picture with a classification mark as sample pictures. Further, the extraction unit 111 extracts a first parameter of a pre-trained face classification model and a second parameter of a pre-trained expression recognition model.
  • the training unit 112 uses a backpropagation algorithm, combines the first parameter and the second parameter, and uses a neural network algorithm to train the sample picture to obtain AU detection model.
  • the training instruction received by the acquiring unit 110 includes, but is not limited to, one or a combination of the following:
  • the acquiring unit 110 receives a signal that the user triggers the configuration button to confirm that the training instruction is received.
  • the configuration button is pre-configured and used to trigger the training instruction.
  • the configuration button may be a virtual button or a physical button.
  • the acquiring unit 110 receives the configuration voice signal to determine that the training instruction is received.
  • the configuration voice signal is pre-configured and used to trigger the training instruction.
  • the acquiring unit 110 receives the voice input by the user and performs voice recognition on the voice to determine whether the voice is consistent with the configuration voice, and when the voice is consistent with the configuration voice, The acquiring unit 110 determines that the training instruction is received.
  • FACS (Facial Action Coding System)
  • AU is the subtle movement of facial muscles, that is, the basic muscle action unit of the human face, and AU mainly includes single AU and combined AU.
  • AU may include raised inner eyebrows, raised mouth corners, and wrinkled nose.
  • AU detection refers to comparing the similarity between the face picture and each AU to determine which AU the face picture exhibits.
  • 19 single AUs in FACS can be selected, such as 6 upper half face AUs and 13 lower half face AUs, and this application is not limited.
  • AU detection refers to using the above 19 AUs as the standard for detection and comparison to predict the probability that the input face picture belongs to each of the 19 AUs.
  • the AU marking of a face picture includes, but is not limited to, any of the following methods:
  • classifying and marking the face picture includes:
  • the face pictures are classified, and the face pictures belonging to different users are marked separately.
  • the marking method may be labeling, etc., which is not limited in this application.
  • the sample picture is used to train the AU detection model using a neural network algorithm.
  • the acquiring unit 110 acquiring the face picture with the AU mark and the face picture with the classification mark as sample pictures includes, but is not limited to, one or a combination of the following methods:
  • the obtaining unit 110 uses web crawler technology to obtain the sample picture.
  • the acquisition unit 110 adopts the web crawler technology to acquire a large number of face pictures as the sample pictures, which effectively guarantees the training accuracy of the model.
  • the obtaining unit 110 obtains the sample picture through a face picture obtaining tool.
  • the acquiring unit 110 reads pictures and the like through opencv (open source library).
  • the obtaining unit 110 obtains the uploaded face picture as the sample picture.
  • the acquisition unit 110 may also receive a face picture uploaded by the user as the sample picture to ensure the accuracy of the model.
  • the format of the sample picture includes, but is not limited to: jpg, png, gif, etc., which are not limited here.
  • the acquiring unit 110 acquires the face picture with AU mark and the face picture with classification mark as sample pictures further includes:
  • the acquiring unit 110 performs data enhancement on the face picture with the AU mark and the face picture with the classification mark to obtain the sample picture.
  • the acquiring unit 110 first performs data enhancement on the face pictures to ensure sufficient training samples, so as to achieve a better training effect.
  • this application does not limit the data enhancement technology adopted.
  • the training unit 112 first trains the face classification model and the expression recognition model.
  • the method further includes:
  • the training unit 112 trains the face classification model to obtain the first parameter, and uses a backpropagation algorithm in combination with the first parameter to train the expression recognition model to obtain the second parameter.
  • the training unit 112 adopts a migration method to gradually train to obtain the face classification model.
  • the parameters of the last fully connected layer represent the number of face types, and the softmax layer is used for multi-classification.
  • the training unit 112 first trains a classification model with 100 classes; when the accuracy of the classification results reaches 70%, the 100-class classification results are transferred to a 1200-class classification model for training; when the accuracy of the classification results reaches 90%, the 1200-class classification results are transferred to a 16000-class classification model for training, and so on, finally obtaining a 16000-class face classification model with accuracy as high as possible.
  • the network compositions of the face classification model and the expression recognition model are similar: only the last layer differs in structure, while the numbers of all other parameters are identical. Therefore, the network parameters can be loaded into each other.
  • the training unit 112 utilizes the backpropagation algorithm and makes full use of prior knowledge to realize continuous training and updating of network parameters to achieve higher training accuracy.
  • the training unit 112 uses a backpropagation algorithm to train the sample picture in combination with the first parameter and the second parameter to obtain an AU detection model including:
  • the training unit 112 calculates the accuracy value of the output result of the AU detection model and obtains the accuracy threshold of the output result of the AU detection model. Further, the training unit 112 uses a back-propagation algorithm to adjust all parameters in the AU detection model until the accuracy value reaches the accuracy threshold, then stops training.
  • the accuracy threshold can be customized to train a model that meets the accuracy requirements according to actual needs.
  • the AU detection model obtained after training by the training unit 112 first performs data splicing on the sample pictures, then passes them through a convolutional layer and a pooling layer, then through two groups of convolution packages with different parameters, after which the data is split; the data further passes through two more groups of convolution packages with different parameters, then through a pooling layer and a fully connected layer, and finally through a sigmoid layer for binary classification to obtain the AU prediction result.
  • at the second convolutional layer of the first convolution package of each group, downsampling with a stride of 2 is performed to reduce the dimensionality of the data, remove secondary factors, and reduce the amount of redundant data.
  • the training unit 112 can train the AU detection model through continuous adjustment of all parameters to obtain a model that meets the accuracy requirements; and since all parameters participate in the adjustment, the training effect is better.
  • the method further includes:
  • the retaining unit 115 retains the AU detection result in the expression detection result, and further, the deleting unit 116 deletes the face classification result in the expression detection result.
  • since the face pictures with classification marks are used as sample pictures, more feature details are provided, which can effectively prevent wrong information from being learned due to overfitting.
  • during training, the features of the face pictures with AU marks are gradually strengthened while the features of the face pictures with classification marks are gradually weakened, so that after training the accuracy of the AU detection results gradually improves.
  • meanwhile, the accuracy of the face classification result gradually decreases. Therefore, the retaining unit 115 retains the AU detection result in the expression detection result, and further, the deletion unit 116 deletes the face classification result in the expression detection result, to make reasonable use of the results.
  • the present application can obtain a target face picture, adjust the size of the target face picture, de-average the adjusted target face picture, and further normalize the de-averaged target face picture to obtain the picture to be detected, thereby preprocessing the target face picture so that training and calculation are performed in the same dimension; the picture to be detected is further input into the AU detection model trained by combining the back-propagation algorithm and the neural network algorithm to obtain the expression detection result, so that intelligent decision-making is realized in combination with the neural network algorithm, accurately and efficiently.
  • Fig. 3 is a schematic structural diagram of an electronic device implementing a preferred embodiment of the AU detection method of the present application.
  • the electronic device 1 is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), embedded equipment, etc.
  • the electronic device 1 can also be, but is not limited to, any electronic product that can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice-control device, for example, a personal computer, a tablet computer, a smart phone, a personal digital assistant (PDA), a game console, an interactive network television (Internet Protocol Television, IPTV), smart wearable devices, etc.
  • the electronic device 1 may also be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the network where the electronic device 1 is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), etc.
  • the electronic device 1 includes, but is not limited to, a memory 12, a processor 13, and computer-readable instructions stored in the memory 12 and runnable on the processor 13, such as an AU detection program.
  • the schematic diagram is only an example of the electronic device 1 and does not constitute a limitation on the electronic device 1. It may include more or fewer components than those shown in the figure, a combination of certain components, or different components; for example, the electronic device 1 may also include input and output devices, network access devices, buses, and so on.
  • the processor 13 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor can be a microprocessor or the processor can also be any conventional processor, etc.
  • the processor 13 is the computing core and control center of the electronic device 1. It connects all parts of the entire electronic device 1 with various interfaces and lines, and executes the operating system of the electronic device 1 as well as the various installed applications, program codes, etc.
  • the processor 13 executes the operating system of the electronic device 1 and various installed applications.
  • the processor 13 executes the application program to implement the steps in the foregoing embodiments of the AU detection method, such as steps S10, S11, S12, S13, and S14 shown in FIG. 1.
  • the computer-readable instructions may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 12 and executed by the processor 13 to complete this application.
  • the one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions in the electronic device 1.
  • the computer-readable instructions can be divided into an acquisition unit 110, an extraction unit 111, a training unit 112, a normalization unit 113, an input unit 114, a retention unit 115, a deletion unit 116, an adjustment unit 117, and a de-averaging unit 118.
  • the memory 12 may be used to store the computer-readable instructions and/or modules.
  • the processor 13 runs or executes the computer-readable instructions and/or modules stored in the memory 12 and calls the data stored in the memory 12 to realize the various functions of the electronic device 1.
  • the memory 12 may mainly include a storage program area and a storage data area.
  • the storage program area may store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.), etc.; the storage data area may store data (such as audio data, etc.) created based on the use of the electronic device.
  • the memory 12 may include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the memory 12 may be an external memory and/or an internal memory of the electronic device 1. Further, the memory 12 may be a non-volatile memory in a physical form, such as a memory stick, a TF card (Trans-flash Card), and so on.
  • if the integrated module/unit of the electronic device 1 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a non-volatile readable storage medium.
  • all or part of the processes in the methods of the above embodiments of this application may also be completed by instructing relevant hardware through a computer program.
  • the computer program can be stored in a non-volatile readable storage medium. When the computer program is executed by the processor, it can implement the steps of the foregoing method embodiments.
  • the execution of multiple instructions by the processor 13 includes:
  • a face picture with an AU mark and a face picture with a classification mark are obtained as sample pictures;
  • the execution of multiple instructions by the processor 13 includes:
  • the processor 13 further executing multiple instructions includes:
  • the expression recognition model is trained to obtain the second parameter.
  • the back propagation algorithm is used to adjust all parameters in the AU detection model until the accuracy value reaches the accuracy threshold, and training is stopped.
  • modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional modules.

Abstract

An AU detection method, apparatus, electronic device and storage medium. The AU detection method is capable of: when an AU detection instruction is received, acquiring a target face picture (S10); adjusting the size of the target face picture (S11); de-averaging the adjusted target face picture (S12); and further normalizing the de-averaged target face picture to obtain a picture to be detected (S13), so as to preprocess the target face picture and facilitate training and calculation in the same dimension; the picture to be detected is further input into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result (S14), thereby realizing intelligent decision-making in combination with the neural network algorithm, accurately and efficiently.

Description

AU detection method, apparatus, electronic device and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 18, 2019 with application number 201910528234.6 and invention title "AU detection method, apparatus, electronic device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of intelligent decision-making, and in particular to an AU detection method, apparatus, electronic device and storage medium.
Background
With the rapid development of artificial-intelligence technology, AU (action units, facial action units) detection has become more and more important in order to meet scientific research needs in many fields such as psychological research, medical care and public safety.
In existing technical solutions, AU detection usually adopts a neural network algorithm. However, in the traditional training method the training samples are single and the training parameters are limited, so the accuracy of the detection results and the detection speed are both restricted to a certain extent, which adversely affects the development of artificial intelligence.
Summary of the Invention
In view of the above, it is necessary to provide an AU detection method, apparatus, electronic device and storage medium. This application can continuously update network parameters and make full use of prior knowledge, so as to achieve higher training accuracy.
An AU detection method, the method including:
when an AU detection instruction is received, acquiring a target face picture;
adjusting the size of the target face picture;
de-averaging the adjusted target face picture;
normalizing the de-averaged target face picture to obtain a picture to be detected;
inputting the picture to be detected into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, wherein the AU detection model is obtained by training with face pictures bearing AU marks and face pictures bearing classification marks, and is used to output the expression detection result according to the target face picture.
According to a preferred embodiment of this application, before the picture to be detected is input into the AU detection model trained by combining the back-propagation algorithm and the neural network algorithm to obtain the expression detection result, the method further includes:
when a training instruction is received, acquiring face pictures bearing AU marks and face pictures bearing classification marks as sample pictures;
extracting a first parameter of a pre-trained face classification model and a second parameter of a pre-trained expression recognition model;
using a back-propagation algorithm, combining the first parameter and the second parameter, and training on the sample pictures with a neural network algorithm to obtain the AU detection model.
According to a preferred embodiment of this application, acquiring the face pictures bearing AU marks and the face pictures bearing classification marks as sample pictures includes one or a combination of the following methods:
acquiring the sample pictures by web crawler technology; and/or
acquiring the sample pictures through a face picture acquisition tool; and/or
acquiring uploaded face pictures as the sample pictures.
According to a preferred embodiment of this application, acquiring the face pictures bearing AU marks and the face pictures bearing classification marks as sample pictures further includes:
performing data enhancement on the face pictures bearing AU marks and the face pictures bearing classification marks to obtain the sample pictures.
According to a preferred embodiment of this application, before the first parameter of the pre-trained face classification model and the second parameter of the pre-trained expression recognition model are extracted, the method further includes:
training the face classification model to obtain the first parameter;
using a back-propagation algorithm and combining the first parameter, training the expression recognition model to obtain the second parameter.
According to a preferred embodiment of this application, using the back-propagation algorithm, combining the first parameter and the second parameter, and training on the sample pictures to obtain the AU detection model includes:
calculating an accuracy value of the output result of the AU detection model;
acquiring an accuracy threshold of the output result of the AU detection model;
using the back-propagation algorithm to adjust all parameters in the AU detection model until the accuracy value reaches the accuracy threshold, then stopping training.
According to a preferred embodiment of this application, after the expression detection result is obtained, the method further includes:
retaining the AU detection result in the expression detection result;
deleting the face classification result in the expression detection result.
An AU detection apparatus, the apparatus including:
an acquiring unit, configured to acquire a target face picture when an AU detection instruction is received;
an adjusting unit, configured to adjust the size of the target face picture;
a de-averaging unit, configured to de-average the adjusted target face picture;
a normalization unit, configured to normalize the de-averaged target face picture to obtain a picture to be detected;
an input unit, configured to input the picture to be detected into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, wherein the AU detection model is obtained by training with face pictures bearing AU marks and face pictures bearing classification marks, and is used to output the expression detection result according to the target face picture.
According to a preferred embodiment of this application, the acquiring unit is further configured to, before the picture to be detected is input into the AU detection model trained by combining the back-propagation algorithm and the neural network algorithm to obtain the expression detection result, acquire face pictures bearing AU marks and face pictures bearing classification marks as sample pictures when a training instruction is received;
the apparatus further includes:
an extracting unit, configured to extract a first parameter of a pre-trained face classification model and a second parameter of a pre-trained expression recognition model;
a training unit, configured to use a back-propagation algorithm, combine the first parameter and the second parameter, and train on the sample pictures with a neural network algorithm to obtain the AU detection model.
According to a preferred embodiment of this application, the acquiring unit acquiring the face pictures bearing AU marks and the face pictures bearing classification marks as sample pictures includes one or a combination of the following methods:
acquiring the sample pictures by web crawler technology; and/or
acquiring the sample pictures through a face picture acquisition tool; and/or
acquiring uploaded face pictures as the sample pictures.
According to a preferred embodiment of this application, the acquiring unit acquiring the face pictures bearing AU marks and the face pictures bearing classification marks as sample pictures further includes:
performing data enhancement on the face pictures bearing AU marks and the face pictures bearing classification marks to obtain the sample pictures.
According to a preferred embodiment of this application, the training unit is further configured to train the face classification model to obtain the first parameter, before the first parameter of the pre-trained face classification model and the second parameter of the pre-trained expression recognition model are extracted;
the training unit is further configured to use a back-propagation algorithm and combine the first parameter to train the expression recognition model to obtain the second parameter.
According to a preferred embodiment of this application, the training unit using the back-propagation algorithm, combining the first parameter and the second parameter, and training on the sample pictures to obtain the AU detection model includes:
calculating an accuracy value of the output result of the AU detection model;
acquiring an accuracy threshold of the output result of the AU detection model;
using the back-propagation algorithm to adjust all parameters in the AU detection model until the accuracy value reaches the accuracy threshold, then stopping training.
According to a preferred embodiment of this application, the apparatus further includes:
a retaining unit, configured to retain the AU detection result in the expression detection result after the expression detection result is obtained;
a deleting unit, configured to delete the face classification result in the expression detection result.
An electronic device, the electronic device including:
a memory storing at least one computer-readable instruction; and
a processor executing the at least one computer-readable instruction stored in the memory to implement the AU detection method.
A non-volatile readable storage medium, in which at least one computer-readable instruction is stored, the at least one computer-readable instruction being executed by a processor in an electronic device to implement the AU detection method.
It can be seen from the above technical solutions that this application can, when an AU detection instruction is received, acquire a target face picture, adjust the size of the target face picture, and de-average the adjusted target face picture, and can further normalize the de-averaged target face picture to obtain a picture to be detected, thereby preprocessing the target face picture so that training and calculation are performed in the same dimension; the picture to be detected is further input into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, thereby realizing intelligent decision-making in combination with the neural network algorithm, accurately and efficiently.
Brief Description of the Drawings
Fig. 1 is a flowchart of a preferred embodiment of the AU detection method of this application.
Fig. 2 is a functional module diagram of a preferred embodiment of the AU detection apparatus of this application.
Fig. 3 is a schematic structural diagram of an electronic device implementing a preferred embodiment of the AU detection method of this application.
Detailed Description
In order to make the purpose, technical solutions and advantages of this application clearer, this application is described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, it is a flowchart of a preferred embodiment of the AU detection method of this application. According to different needs, the order of the steps in the flowchart may be changed, and some steps may be omitted.
The AU detection method is applied in one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), embedded equipment, etc.
The electronic device may be any electronic product capable of human-machine interaction with a user, for example, a personal computer, a tablet computer, a smart phone, a personal digital assistant (PDA), a game console, an interactive network television (Internet Protocol Television, IPTV), a smart wearable device, etc.
The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing.
The network where the electronic device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), etc.
S10: when an AU detection instruction is received, acquire a target face picture.
In at least one embodiment of this application, the AU detection instruction may be triggered by a user, or may be triggered automatically when certain conditions are met, which is not limited in this application.
In at least one embodiment of this application, the target face picture refers to a picture that needs to be detected.
S11: adjust the size of the target face picture.
In at least one embodiment of this application, in order to ensure that training and calculation are performed in the same dimension, the electronic device first preprocesses the target face picture, that is, adjusts the size of the target face picture.
Specifically, the electronic device scales the face picture proportionally so that its minimum side length is 256, for example scaling a [1280, 720] picture to [455, 256]; further, the electronic device performs center cropping on the face picture, taking the middle 224*224 region of the face picture, for example taking the [115:339, 16:240] region of a [455, 256] picture, to obtain a 224*224 face picture.
S12: de-average the adjusted target face picture.
In at least one embodiment of this application, the electronic device de-averages the 224*224 face picture, centering every dimension of the matrix corresponding to the input picture at 0, that is, the mean is subtracted from the face picture.
S13: normalize the de-averaged target face picture to obtain a picture to be detected.
In at least one embodiment of this application, the electronic device normalizes the 224*224 face picture to obtain the picture to be detected.
Specifically, normalization refers to normalizing the amplitude to the same range, that is, dividing the face picture by its standard deviation. After normalization, the interference caused by differences in the value ranges of the data in each dimension is reduced. For example, suppose there are two feature dimensions A and B, where A ranges from 0 to 10 and B ranges from 0 to 10000; using these two features directly would be problematic, but after normalization the data of A and B both fall into the same range, which is convenient for training and calculation in the same dimension.
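For illustration, the preprocessing of steps S11 to S13 can be sketched in Python as follows; OpenCV is assumed only because this application mentions reading pictures with opencv, and the function is a sketch rather than the claimed method itself:

    import cv2
    import numpy as np

    def preprocess(path):
        # S11: scale proportionally so the minimum side length is 256,
        # e.g. [1280, 720] -> [455, 256], then center-crop the middle
        # 224*224 region, e.g. the [115:339, 16:240] region of [455, 256]
        img = cv2.imread(path).astype(np.float32)
        h, w = img.shape[:2]
        scale = 256.0 / min(h, w)
        img = cv2.resize(img, (int(round(w * scale)), int(round(h * scale))))
        h, w = img.shape[:2]
        top, left = (h - 224) // 2, (w - 224) // 2
        img = img[top:top + 224, left:left + 224]
        img -= img.mean()   # S12: de-average, centering every dimension at 0
        img /= img.std()    # S13: normalize the amplitude to the same range
        return img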
S14: input the picture to be detected into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result.
The AU detection model is obtained by training with face pictures bearing AU marks and face pictures bearing classification marks, and is used to output the expression detection result according to the target face picture.
Through the above embodiment, the electronic device can obtain the expression detection result with the AU detection model, and the result is more accurate.
In at least one embodiment of this application, before the picture to be detected is input into the AU detection model trained by combining the back-propagation algorithm and the neural network algorithm to obtain the expression detection result, the method further includes:
the electronic device trains the AU detection model.
Specifically, the electronic device training the AU detection model includes:
when a training instruction is received, the electronic device acquires face pictures bearing AU marks and face pictures bearing classification marks as sample pictures; further, the electronic device extracts a first parameter of a pre-trained face classification model and a second parameter of a pre-trained expression recognition model; the electronic device uses a back-propagation algorithm, combines the first parameter and the second parameter, and trains on the sample pictures with a neural network algorithm to obtain the AU detection model.
In at least one embodiment of this application, the electronic device receiving the training instruction includes, but is not limited to, one or a combination of the following:
(1) The electronic device receives a signal of a user triggering a configuration button, to determine that the training instruction is received.
Specifically, the configuration button is pre-configured and used to trigger the training instruction. The configuration button may be a virtual button or a physical button.
(2) The electronic device receives a configuration voice signal, to determine that the training instruction is received.
Specifically, the configuration voice signal is pre-configured and used to trigger the training instruction.
Further, the electronic device receives a voice input by the user and performs voice recognition on the voice to determine whether the voice is consistent with the configuration voice; when the voice is consistent with the configuration voice, the electronic device determines that the training instruction is received.
In at least one embodiment of this application, FACS (Facial Action Coding System) analyzes in detail the activity of all facial muscle tissue, the changes in each independent part of the face caused by that activity, and the observable expressions caused by these muscle activities. On this basis, facial movement is decomposed into some basic AUs. An AU is a subtle movement of facial muscles, that is, a basic muscle action unit of the human face; AUs mainly include single AUs and combined AUs.
For example, AUs may include a raised inner eyebrow, raised mouth corners, a wrinkled nose, etc.
Further, AU detection refers to the process of judging which AU a face picture exhibits by comparing the similarity between the face picture and each AU.
In this proposal, 19 single AUs in FACS may be selected, for example 6 upper-half-face AUs and 13 lower-half-face AUs, which is not limited in this application.
AU detection then refers to using the above 19 AUs as the standard for detection and comparison, to predict the probability that an input face picture belongs to each of the 19 AUs.
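For illustration, a hypothetical Python sketch of how a 19-dimensional model output can be read as per-AU probabilities (the AU identifiers are placeholders; this application does not fix which 19 single AUs are chosen):

    import torch

    AU_IDS = [f"AU{i:02d}" for i in range(1, 20)]  # 19 placeholder AU names

    def au_probabilities(logits: torch.Tensor) -> dict:
        # each AU is scored independently with a sigmoid, giving the
        # probability that the input face picture belongs to that AU
        probs = torch.sigmoid(logits)
        return {au: float(p) for au, p in zip(AU_IDS, probs)}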
In at least one embodiment of this application, AU-marking a face picture includes, but is not limited to, either of the following methods:
(1) The electronic device tags the AUs in the face picture.
(2) The electronic device makes special marks on the AUs in the face picture (for example, a red circle, a red box, etc.).
In at least one embodiment of this application, the electronic device classification-marking a face picture includes:
classifying face pictures according to different users, and separately marking the face pictures belonging to different users.
Specifically, the marking method may be labeling, etc., which is not limited in this application.
In at least one embodiment of this application, the sample pictures are used to train the AU detection model with a neural network algorithm.
In at least one embodiment of this application, the electronic device acquiring the face pictures bearing AU marks and the face pictures bearing classification marks as sample pictures includes, but is not limited to, one or a combination of the following methods:
(1) The electronic device acquires the sample pictures by web crawler technology.
Since a larger number of training samples yields higher training accuracy, the electronic device can acquire a large number of face pictures as the sample pictures by web crawler technology, which effectively guarantees the training accuracy of the model.
(2) The electronic device acquires the sample pictures through a face picture acquisition tool.
For example, the electronic device reads pictures through opencv (an open-source library).
(3) The electronic device acquires uploaded face pictures as the sample pictures.
When the detection scope is set according to a user's needs, the electronic device may also receive face pictures uploaded by the user as the sample pictures, to ensure the accuracy of the model.
In at least one embodiment of this application, the format of the sample pictures includes, but is not limited to, jpg, png, gif, etc., which is not limited here.
In at least one embodiment of this application, the electronic device acquiring the face pictures bearing AU marks and the face pictures bearing classification marks as sample pictures further includes:
performing data enhancement on the face pictures bearing AU marks and the face pictures bearing classification marks to obtain the sample pictures.
It can be understood that, on the one hand, the number of available face pictures is limited, and on the other hand, AU marking is difficult. Therefore, in order to ensure that there are enough training samples to obtain a more accurate AU detection model, the electronic device first performs data enhancement on the face pictures to ensure sufficient training samples, thereby achieving a better training effect.
Specifically, this application does not limit the data enhancement technology adopted, as long as it serves the purpose of data enhancement.
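For illustration, a minimal Python sketch of common data enhancement operations; the specific operations are our assumptions, since, as stated above, this application does not limit the technology adopted:

    import cv2
    import numpy as np

    def enhance(img: np.ndarray) -> list:
        # produce several variants of one face picture to enlarge the sample set
        out = [img]
        out.append(cv2.flip(img, 1))                               # horizontal mirror
        out.append(np.clip(img * 1.2, 0, 255).astype(img.dtype))   # brightness change
        h, w = img.shape[:2]
        out.append(img[h // 10: h, w // 10: w])                    # offset crop
        return out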
In this embodiment, since the face pictures bearing AU marks and the face pictures bearing classification marks are used as sample pictures at the same time, sufficient training samples during training can be guaranteed and the training effect is better; and because more feature details are provided, overfitting is effectively avoided.
In at least one embodiment of this application, the electronic device first trains the face classification model and the expression recognition model.
Specifically, before the electronic device extracts the first parameter of the pre-trained face classification model and the second parameter of the pre-trained expression recognition model, the method further includes:
the electronic device trains the face classification model to obtain the first parameter, and uses a back-propagation algorithm, combining the first parameter, to train the expression recognition model to obtain the second parameter.
For example, the electronic device obtains the face classification model by gradual training in a transfer manner. Specifically, the parameters of the last fully connected layer represent the number of face classes, and a softmax layer is used for multi-class classification. A classification model with 100 classes is trained first; when the accuracy of the classification results reaches 70%, the 100-class classification results are transferred to a 1200-class classification model for training; when the accuracy of the classification results reaches 90%, the 1200-class classification results are transferred to a 16000-class classification model for training, and so on, finally obtaining a 16000-class face classification model with accuracy as high as possible.
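For illustration, the gradual transfer (100 -> 1200 -> 16000 classes) can be sketched in Python as follows, assuming a torchvision ResNet as the backbone; the backbone choice is our assumption, while the class counts follow the example above:

    import torch.nn as nn
    from torchvision.models import resnet18

    def widen_classifier(model: nn.Module, num_classes: int) -> nn.Module:
        # keep all backbone weights; replace only the last fully connected
        # layer, whose size represents the number of face classes
        model.fc = nn.Linear(model.fc.in_features, num_classes)
        return model

    model = resnet18(num_classes=100)       # stage 1: train 100 classes to ~70%
    model = widen_classifier(model, 1200)   # stage 2: transfer, train to ~90%
    model = widen_classifier(model, 16000)  # stage 3: 16000-class face classification model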
Further, since the network compositions of the face classification model and the expression recognition model are similar, with only the last layer differing in structure and the numbers of all other parameters being identical, the network parameters can be loaded into each other.
When the accuracy of the face classification model reaches a preset value (for example, 95%), the electronic device transfer-learns the expression recognition model with the first parameter of the face classification model, that is, it directly loads the first parameter for training, takes the first parameter as the initial parameter, fixes the parameters of each layer of the neural network from the beginning up to conv3_x, and uses the back-propagation algorithm to train the parameters of conv4_x and the following layers.
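For illustration, loading the first parameter and fixing the layers up to conv3_x can be sketched as follows, assuming a torchvision ResNet in which conv2_x, conv3_x and conv4_x correspond to layer1, layer2 and layer3; this mapping, the expression-class count and the state-dict filtering are assumptions of the sketch:

    from torchvision.models import resnet18

    face_model = resnet18(num_classes=16000)  # stands in for the trained face classification model
    expr_model = resnet18(num_classes=7)      # hypothetical number of expression classes

    # directly load the first parameter everywhere except the differing last layer
    state = {k: v for k, v in face_model.state_dict().items() if not k.startswith("fc")}
    expr_model.load_state_dict(state, strict=False)

    # fix the layers from the beginning up to conv3_x; back-propagation then
    # only trains conv4_x and the following layers
    for name, p in expr_model.named_parameters():
        if name.startswith(("conv1", "bn1", "layer1", "layer2")):
            p.requires_grad = False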
Through the above embodiment, the electronic device transfers the training results of the face classification model while locking some network-layer parameters; therefore, the facial structure features learned in face classification can be fully utilized in AU prediction, and the prior knowledge of face classification learning is fully utilized to improve AU training accuracy.
In at least one embodiment of this application, the electronic device uses the back-propagation algorithm and makes full use of prior knowledge, so that network parameters can be continuously updated to achieve higher training accuracy.
Specifically, the electronic device using the back-propagation algorithm, combining the first parameter and the second parameter, and training on the sample pictures to obtain the AU detection model includes:
the electronic device calculates the accuracy value of the output result of the AU detection model and acquires the accuracy threshold of the output result of the AU detection model; further, the electronic device uses the back-propagation algorithm to adjust all parameters in the AU detection model until the accuracy value reaches the accuracy threshold, then stops training.
Specifically, the accuracy threshold can be custom-configured, so that a model meeting the accuracy requirements can be trained according to actual needs.
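For illustration, the stopping rule can be sketched in Python as follows; the model, the data loaders (train_loader, val_loader) and the threshold value are placeholders, not fixed by this application:

    import torch
    from torchvision.models import resnet18

    ACCURACY_THRESHOLD = 0.95                 # custom-configured per actual needs
    model = resnet18(num_classes=19)          # hypothetical 19-AU detector

    def evaluate_accuracy(model, loader) -> float:
        # accuracy value of the output result, measured on held-out pictures
        correct = total = 0
        with torch.no_grad():
            for pictures, au_targets in loader:
                preds = (torch.sigmoid(model(pictures)) > 0.5).float()
                correct += (preds == au_targets).sum().item()
                total += au_targets.numel()
        return correct / total

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # all parameters adjusted
    criterion = torch.nn.BCEWithLogitsLoss()                  # per-AU binary targets

    while evaluate_accuracy(model, val_loader) < ACCURACY_THRESHOLD:
        for pictures, au_targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(pictures), au_targets)
            loss.backward()                   # back-propagation over all parameters
            optimizer.step()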
Further, the AU detection model obtained after training first performs data splicing on the sample pictures, then passes them through a convolutional layer and a pooling layer, then through two groups of convolution packages with different parameters, after which the data is split; the data then passes through two more groups of convolution packages with different parameters, then through a pooling layer and a fully connected layer, and finally through a sigmoid layer for binary classification to obtain the AU prediction result. At the second convolutional layer of the first convolution package of each group, downsampling with a stride of 2 is performed, so as to reduce the dimensionality of the data, remove secondary factors and reduce the amount of redundant data.
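For illustration, a loose Python sketch of a "convolution package" with stride-2 downsampling at its second convolutional layer, and of the sigmoid layer that scores each of the 19 AUs as an independent binary classification; the exact splicing and splitting of the branches is described only at a high level above, so this is an approximation:

    import torch
    import torch.nn as nn

    def convolution_package(cin, cout):
        return nn.Sequential(
            nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
            nn.Conv2d(cout, cout, 3, stride=2, padding=1), nn.ReLU(),  # stride-2 downsampling
        )

    head = nn.Sequential(
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(128, 19),                    # fully connected layer: one logit per AU
    )

    features = convolution_package(64, 128)(torch.randn(1, 64, 56, 56))
    au_probs = torch.sigmoid(head(features))   # sigmoid layer -> 19 AU probabilities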
Through the above embodiment, the electronic device can train the AU detection model through continuous adjustment of all parameters to obtain a model meeting the accuracy requirements; and since all parameters participate in the adjustment, the training effect is better.
In at least one embodiment of this application, after the expression detection result is obtained, the method further includes:
the electronic device retains the AU detection result in the expression detection result, and further, the electronic device deletes the face classification result in the expression detection result.
Through the above embodiment, since the face pictures bearing classification marks are used as sample pictures, more feature details are provided, which can effectively prevent wrong information from being learned due to overfitting. However, during training, the features of the face pictures bearing AU marks are gradually strengthened while the features of the face pictures bearing classification marks are gradually weakened, so that after training the accuracy of the AU detection result gradually improves while the accuracy of the face classification result gradually declines. Therefore, the electronic device retains the AU detection result in the expression detection result and deletes the face classification result in the expression detection result, to make reasonable use of the results.
It can be seen from the above technical solutions that this application can, when an AU detection instruction is received, acquire a target face picture, adjust the size of the target face picture, and de-average the adjusted target face picture, and can further normalize the de-averaged target face picture to obtain a picture to be detected, thereby preprocessing the target face picture so that training and calculation are performed in the same dimension; the picture to be detected is further input into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, thereby realizing intelligent decision-making in combination with the neural network algorithm, accurately and efficiently.
As shown in Fig. 2, it is a functional module diagram of a preferred embodiment of the AU detection apparatus of this application. The AU detection apparatus 11 includes an acquiring unit 110, an extracting unit 111, a training unit 112, a normalization unit 113, an input unit 114, a retaining unit 115, a deleting unit 116, an adjusting unit 117 and a de-averaging unit 118. A module/unit referred to in this application is a series of computer-readable instruction segments that can be executed by the processor 13 and can complete fixed functions, and that are stored in the memory 12. In this embodiment, the functions of each module/unit will be detailed in subsequent embodiments.
When an AU detection instruction is received, the acquiring unit 110 acquires a target face picture.
In at least one embodiment of this application, the AU detection instruction may be triggered by a user, or may be triggered automatically when certain conditions are met, which is not limited in this application.
In at least one embodiment of this application, the target face picture refers to a picture that needs to be detected.
The adjusting unit 117 adjusts the size of the target face picture.
In at least one embodiment of this application, in order to ensure that training and calculation are performed in the same dimension, the adjusting unit 117 first preprocesses the target face picture, that is, adjusts the size of the target face picture.
Specifically, the adjusting unit 117 scales the face picture proportionally so that its minimum side length is 256, for example scaling a [1280, 720] picture to [455, 256]; further, the adjusting unit 117 performs center cropping on the face picture, taking the middle 224*224 region of the face picture, for example taking the [115:339, 16:240] region of a [455, 256] picture, to obtain a 224*224 face picture.
The de-averaging unit 118 de-averages the adjusted target face picture.
In at least one embodiment of this application, the de-averaging unit 118 de-averages the 224*224 face picture, centering every dimension of the matrix corresponding to the input picture at 0, that is, the mean is subtracted from the face picture.
The normalization unit 113 normalizes the de-averaged target face picture to obtain a picture to be detected.
In at least one embodiment of this application, the normalization unit 113 normalizes the 224*224 face picture to obtain the picture to be detected.
Specifically, normalization refers to normalizing the amplitude to the same range, that is, dividing the face picture by its standard deviation. After normalization, the interference caused by differences in the value ranges of the data in each dimension is reduced. For example, suppose there are two feature dimensions A and B, where A ranges from 0 to 10 and B ranges from 0 to 10000; using these two features directly would be problematic, but after normalization the data of A and B both fall into the same range, which is convenient for training and calculation in the same dimension.
The input unit 114 inputs the picture to be detected into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result.
The AU detection model is obtained by training with face pictures bearing AU marks and face pictures bearing classification marks, and is used to output the expression detection result according to the target face picture.
Through the above embodiment, the expression detection result can be obtained with the AU detection model, and the detection result is more accurate.
In at least one embodiment of this application, before the picture to be detected is input into the AU detection model trained by combining the back-propagation algorithm and the neural network algorithm to obtain the expression detection result, the method further includes:
training the AU detection model.
Specifically, training the AU detection model includes:
when a training instruction is received, the acquiring unit 110 acquires face pictures bearing AU marks and face pictures bearing classification marks as sample pictures; further, the extracting unit 111 extracts a first parameter of a pre-trained face classification model and a second parameter of a pre-trained expression recognition model; the training unit 112 uses a back-propagation algorithm, combines the first parameter and the second parameter, and trains on the sample pictures with a neural network algorithm to obtain the AU detection model.
In at least one embodiment of this application, the acquiring unit 110 receiving the training instruction includes, but is not limited to, one or a combination of the following:
(1) The acquiring unit 110 receives a signal of a user triggering a configuration button, to determine that the training instruction is received.
Specifically, the configuration button is pre-configured and used to trigger the training instruction. The configuration button may be a virtual button or a physical button.
(2) The acquiring unit 110 receives a configuration voice signal, to determine that the training instruction is received.
Specifically, the configuration voice signal is pre-configured and used to trigger the training instruction.
Further, the acquiring unit 110 receives a voice input by the user and performs voice recognition on the voice to determine whether the voice is consistent with the configuration voice; when the voice is consistent with the configuration voice, the acquiring unit 110 determines that the training instruction is received.
In at least one embodiment of this application, FACS (Facial Action Coding System) analyzes in detail the activity of all facial muscle tissue, the changes in each independent part of the face caused by that activity, and the observable expressions caused by these muscle activities. On this basis, facial movement is decomposed into some basic AUs. An AU is a subtle movement of facial muscles, that is, a basic muscle action unit of the human face; AUs mainly include single AUs and combined AUs.
For example, AUs may include a raised inner eyebrow, raised mouth corners, a wrinkled nose, etc.
Further, AU detection refers to judging which AU a face picture exhibits by comparing the similarity between the face picture and each AU.
In this proposal, 19 single AUs in FACS may be selected, for example 6 upper-half-face AUs and 13 lower-half-face AUs, which is not limited in this application.
AU detection then refers to using the above 19 AUs as the standard for detection and comparison, to predict the probability that an input face picture belongs to each of the 19 AUs.
In at least one embodiment of this application, AU-marking a face picture includes, but is not limited to, either of the following methods:
(1) Tagging the AUs in the face picture.
(2) Making special marks on the AUs in the face picture (for example, a red circle, a red box, etc.).
In at least one embodiment of this application, classification-marking a face picture includes:
classifying face pictures according to different users, and separately marking the face pictures belonging to different users.
Specifically, the marking method may be labeling, etc., which is not limited in this application.
In at least one embodiment of this application, the sample pictures are used to train the AU detection model with a neural network algorithm.
In at least one embodiment of this application, the acquiring unit 110 acquiring the face pictures bearing AU marks and the face pictures bearing classification marks as sample pictures includes, but is not limited to, one or a combination of the following methods:
(1) The acquiring unit 110 acquires the sample pictures by web crawler technology.
Since a larger number of training samples yields higher training accuracy, the acquiring unit 110 can acquire a large number of face pictures as the sample pictures by web crawler technology, which effectively guarantees the training accuracy of the model.
(2) The acquiring unit 110 acquires the sample pictures through a face picture acquisition tool.
For example, the acquiring unit 110 reads pictures through opencv (an open-source library).
(3) The acquiring unit 110 acquires uploaded face pictures as the sample pictures.
When the detection scope is set according to a user's needs, the acquiring unit 110 may also receive face pictures uploaded by the user as the sample pictures, to ensure the accuracy of the model.
In at least one embodiment of this application, the format of the sample pictures includes, but is not limited to, jpg, png, gif, etc., which is not limited here.
In at least one embodiment of this application, the acquiring unit 110 acquiring the face pictures bearing AU marks and the face pictures bearing classification marks as sample pictures further includes:
the acquiring unit 110 performs data enhancement on the face pictures bearing AU marks and the face pictures bearing classification marks to obtain the sample pictures.
It can be understood that, on the one hand, the number of available face pictures is limited, and on the other hand, AU marking is difficult. Therefore, in order to ensure that there are enough training samples to obtain the AU detection model, the acquiring unit 110 first performs data enhancement on the face pictures to ensure sufficient training samples, thereby achieving a better training effect.
Specifically, this application does not limit the data enhancement technology adopted, as long as it serves the purpose of data enhancement.
In this embodiment, since the face pictures bearing AU marks and the face pictures bearing classification marks are used as sample pictures at the same time, sufficient training samples during training can be guaranteed and the training effect is better; and because more feature details are provided, overfitting is effectively avoided.
In at least one embodiment of this application, the training unit 112 first trains the face classification model and the expression recognition model.
Specifically, before the extracting unit 111 extracts the first parameter of the pre-trained face classification model and the second parameter of the pre-trained expression recognition model, the method further includes:
the training unit 112 trains the face classification model to obtain the first parameter, and uses a back-propagation algorithm, combining the first parameter, to train the expression recognition model to obtain the second parameter.
For example, the training unit 112 obtains the face classification model by gradual training in a transfer manner. Specifically, the parameters of the last fully connected layer represent the number of face classes, and a softmax layer is used for multi-class classification. The training unit 112 first trains a classification model with 100 classes; when the accuracy of the classification results reaches 70%, the 100-class classification results are transferred to a 1200-class classification model for training; when the accuracy reaches 90%, the 1200-class classification results are transferred to a 16000-class classification model for training, and so on, finally obtaining a 16000-class face classification model with accuracy as high as possible.
Further, since the network compositions of the face classification model and the expression recognition model are similar, with only the last layer differing in structure and the numbers of all other parameters being identical, the network parameters can be loaded into each other.
When the accuracy of the face classification model reaches a preset value (for example, 95%), the training unit 112 transfer-learns the expression recognition model with the first parameter of the face classification model, that is, it directly loads the first parameter for training, takes the first parameter as the initial parameter, fixes the parameters of each layer of the neural network from the beginning up to conv3_x, and uses the back-propagation algorithm to train the parameters of conv4_x and the following layers.
Through the above embodiment, the training unit 112 transfers the training results of the face classification model while locking some network-layer parameters; therefore, the facial structure features learned in face classification can be fully utilized in AU prediction, and the prior knowledge of face classification learning is fully utilized to improve AU training accuracy.
In at least one embodiment of this application, the training unit 112 uses the back-propagation algorithm and makes full use of prior knowledge, so that the network parameters can be continuously trained and updated to achieve higher training accuracy.
Specifically, the training unit 112 using the back-propagation algorithm, combining the first parameter and the second parameter, and training on the sample pictures to obtain the AU detection model includes:
the training unit 112 calculates the accuracy value of the output result of the AU detection model and acquires the accuracy threshold of the output result of the AU detection model; further, the training unit 112 uses the back-propagation algorithm to adjust all parameters in the AU detection model until the accuracy value reaches the accuracy threshold, then stops training.
Specifically, the accuracy threshold can be custom-configured, so that a model meeting the accuracy requirements can be trained according to actual needs.
Further, the AU detection model obtained after training by the training unit 112 first performs data splicing on the sample pictures, then passes them through a convolutional layer and a pooling layer, then through two groups of convolution packages with different parameters, after which the data is split; the data then passes through two more groups of convolution packages with different parameters, then through a pooling layer and a fully connected layer, and finally through a sigmoid layer for binary classification to obtain the AU prediction result. At the second convolutional layer of the first convolution package of each group, downsampling with a stride of 2 is performed, so as to reduce the dimensionality of the data, remove secondary factors and reduce the amount of redundant data.
Through the above embodiment, the training unit 112 can train the AU detection model through continuous adjustment of all parameters to obtain a model meeting the accuracy requirements; and since all parameters participate in the adjustment, the training effect is better.
In at least one embodiment of this application, after the expression detection result is obtained, the method further includes:
the retaining unit 115 retains the AU detection result in the expression detection result, and further, the deleting unit 116 deletes the face classification result in the expression detection result.
Through the above embodiment, since the face pictures bearing classification marks are used as sample pictures, more feature details are provided, which can effectively prevent wrong information from being learned due to overfitting. However, during training, the features of the face pictures bearing AU marks are gradually strengthened while the features of the face pictures bearing classification marks are gradually weakened, so that after training the accuracy of the AU detection result gradually improves while the accuracy of the face classification result gradually declines. Therefore, the retaining unit 115 retains the AU detection result in the expression detection result, and further, the deleting unit 116 deletes the face classification result in the expression detection result, to make reasonable use of the results.
It can be seen from the above technical solutions that this application can, when an AU detection instruction is received, acquire a target face picture, adjust the size of the target face picture, and de-average the adjusted target face picture, and can further normalize the de-averaged target face picture to obtain a picture to be detected, thereby preprocessing the target face picture so that training and calculation are performed in the same dimension; the picture to be detected is further input into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, thereby realizing intelligent decision-making in combination with the neural network algorithm, accurately and efficiently.
As shown in Fig. 3, it is a schematic structural diagram of an electronic device implementing a preferred embodiment of the AU detection method of this application.
The electronic device 1 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), embedded equipment, etc.
The electronic device 1 may also be, but is not limited to, any electronic product capable of human-machine interaction with a user through a keyboard, a mouse, a remote control, a touch panel or a voice-control device, for example, a personal computer, a tablet computer, a smart phone, a personal digital assistant (PDA), a game console, an interactive network television (Internet Protocol Television, IPTV), a smart wearable device, etc.
The electronic device 1 may also be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server.
The network where the electronic device 1 is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), etc.
In one embodiment of this application, the electronic device 1 includes, but is not limited to, a memory 12, a processor 13, and computer-readable instructions stored in the memory 12 and runnable on the processor 13, for example an AU detection program.
Those skilled in the art can understand that the schematic diagram is only an example of the electronic device 1 and does not constitute a limitation on the electronic device 1; it may include more or fewer components than shown, a combination of certain components, or different components; for example, the electronic device 1 may also include input/output devices, network access devices, buses, etc.
The processor 13 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc. The processor 13 is the computing core and control center of the electronic device 1; it connects all parts of the entire electronic device 1 with various interfaces and lines, and executes the operating system of the electronic device 1 as well as the various installed applications, program codes, etc.
The processor 13 executes the operating system of the electronic device 1 and the various installed applications. The processor 13 executes the applications to implement the steps in the above AU detection method embodiments, such as steps S10, S11, S12, S13 and S14 shown in Fig. 1.
Alternatively, when the processor 13 executes the computer-readable instructions, the functions of the modules/units in the above apparatus embodiments are implemented, for example: when an AU detection instruction is received, acquiring a target face picture; adjusting the size of the target face picture; de-averaging the adjusted target face picture; normalizing the de-averaged target face picture to obtain a picture to be detected; inputting the picture to be detected into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, wherein the AU detection model is obtained by training with face pictures bearing AU marks and face pictures bearing classification marks, and is used to output the expression detection result according to the target face picture.
Exemplarily, the computer-readable instructions may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 12 and executed by the processor 13 to complete this application. The one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions in the electronic device 1. For example, the computer-readable instructions may be divided into the acquiring unit 110, the extracting unit 111, the training unit 112, the normalization unit 113, the input unit 114, the retaining unit 115, the deleting unit 116, the adjusting unit 117 and the de-averaging unit 118.
The memory 12 may be used to store the computer-readable instructions and/or modules; the processor 13 runs or executes the computer-readable instructions and/or modules stored in the memory 12 and calls the data stored in the memory 12 to realize the various functions of the electronic device 1. The memory 12 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required by at least one function (such as a sound playback function, an image playback function, etc.), and the storage data area may store data (such as audio data, etc.) created according to the use of the electronic device. In addition, the memory 12 may include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The memory 12 may be an external memory and/or an internal memory of the electronic device 1. Further, the memory 12 may be a non-volatile memory in physical form, such as a memory stick, a TF card (Trans-flash Card), etc.
If the integrated modules/units of the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a non-volatile readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of this application may also be completed by instructing relevant hardware through a computer program; the computer program may be stored in a non-volatile readable storage medium, and when the computer program is executed by a processor, the steps of the above method embodiments can be implemented.
The computer program includes computer-readable instruction code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The non-volatile readable medium may include any entity or apparatus capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), etc.
With reference to Fig. 1, the memory 12 in the electronic device 1 stores multiple instructions to implement an AU detection method, and the processor 13 can execute the multiple instructions to implement: when an AU detection instruction is received, acquiring a target face picture; adjusting the size of the target face picture; de-averaging the adjusted target face picture; normalizing the de-averaged target face picture to obtain a picture to be detected; inputting the picture to be detected into an AU detection model trained by combining a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, wherein the AU detection model is obtained by training with face pictures bearing AU marks and face pictures bearing classification marks, and is used to output the expression detection result according to the target face picture.
根据本申请优选实施例,所述处理器13执行多个指令包括:
当接收到训练指令时,获取带有AU标记的人脸图片及带有分类标记的人脸图片作为样本图片;
提取预先训练的人脸分类模型的第一参数,及预先训练的表情识别模型的第二参数;
利用反向传播算法,结合所述第一参数及所述第二参数,采用神经网络算法训练所述样本图片,得到AU检测模型。
根据本申请优选实施例,所述处理器13执行多个指令包括:
采用网络爬虫技术获取所述样本图片;及/或
通过人脸图片获取工具获取所述样本图片;及/或
获取上传的人脸图片作为所述样本图片。
According to a preferred embodiment of this application, the processor 13 further executes the plurality of instructions to implement:
performing data augmentation on the face pictures with AU labels and the face pictures with classification labels to obtain the sample pictures.
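The application does not enumerate which augmentation operations are applied; the torchvision pipeline below is one plausible choice, and every transform and magnitude in it is an assumption of this sketch.

```python
from torchvision import transforms

# Hypothetical augmentation pipeline for the labelled face pictures; the
# specific operations and magnitudes are illustrative, not prescribed here.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                # mirror the face
    transforms.RandomRotation(degrees=10),                 # small pose jitter
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # lighting changes
    transforms.ToTensor(),
])
```

Applying such transforms to each labelled face picture enlarges the sample set without collecting new labels.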
According to a preferred embodiment of this application, the processor 13 further executes the plurality of instructions to implement:
training the face classification model to obtain the first parameter;
using the back-propagation algorithm and combining the first parameter to train the expression recognition model to obtain the second parameter.
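This two-stage hand-off can be read as a transfer-learning initialization: the trained classification weights seed the expression recognition model, whose trained weights in turn seed the AU detection model. The sketch below assumes PyTorch state_dict semantics and a hypothetical shared backbone; the application does not spell out the transfer mechanism.

```python
import torch.nn as nn


def make_backbone() -> nn.Sequential:
    """Shared convolutional backbone; its exact shape is an assumption."""
    return nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2), nn.Flatten(),
    )


# Stage 1: train the face classification model; its trained weights
# constitute the "first parameter".
face_cls = nn.Sequential(make_backbone(), nn.LazyLinear(100))
# ... train face_cls with back-propagation on classification-labelled pictures ...
first_parameter = face_cls[0].state_dict()

# Stage 2: initialise the expression recognition model from the first
# parameter, then fine-tune with back-propagation; its trained weights
# constitute the "second parameter".
expr_net = nn.Sequential(make_backbone(), nn.LazyLinear(7))
expr_net[0].load_state_dict(first_parameter)
# ... train expr_net with back-propagation ...
second_parameter = expr_net[0].state_dict()
```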
According to a preferred embodiment of this application, the processor 13 further executes the plurality of instructions to implement:
calculating an accuracy value of the output result of the AU detection model;
acquiring an accuracy threshold for the output result of the AU detection model;
using the back-propagation algorithm to adjust all parameters in the AU detection model until the accuracy value reaches the accuracy threshold, and then stopping training.
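A minimal training-loop sketch of this stopping criterion; the binary cross-entropy loss, the Adam optimizer, the 0.5 decision threshold, and the data loaders are all assumptions standing in for details the application leaves open.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def compute_accuracy(model: nn.Module, loader) -> float:
    """Accuracy value: mean per-AU correctness at a 0.5 threshold (assumed)."""
    correct, total = 0, 0
    for images, labels in loader:
        preds = (model(images) >= 0.5).float()
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / max(total, 1)


def train_until_threshold(model: nn.Module, train_loader, val_loader,
                          accuracy_threshold: float) -> None:
    """Adjust *all* parameters by back-propagation until the accuracy
    value reaches the acquired accuracy threshold, then stop training."""
    criterion = nn.BCELoss()                          # per-AU binary targets
    optimizer = torch.optim.Adam(model.parameters())  # every parameter adjusted
    while compute_accuracy(model, val_loader) < accuracy_threshold:
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()                           # back-propagation
            optimizer.step()
```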
According to a preferred embodiment of this application, the processor 13 further executes the plurality of instructions to implement:
retaining the AU detection result in the expression detection result;
deleting the face classification result from the expression detection result.
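Viewed as data, this post-processing amounts to a simple filter over the model output; the key names in the sketch below are hypothetical.

```python
def keep_au_only(expression_result: dict) -> dict:
    """Retain the AU detection result; drop the face classification result.
    The "au_detection" key is an assumed name, not defined by this application."""
    return {"au_detection": expression_result["au_detection"]}
```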
Specifically, for the specific implementation of the foregoing instructions by the processor 13, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the modules is merely a logical function division, and there may be other division manners in actual implementation.
The modules described as separate components may or may not be physically separate, and the components displayed as modules may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, or each unit may physically exist alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is apparent to those skilled in the art that this application is not limited to the details of the foregoing exemplary embodiments, and that this application can be implemented in other specific forms without departing from the spirit or basic features of this application.
Therefore, from whichever point of view, the embodiments should be regarded as exemplary and non-limiting; the scope of this application is defined by the appended claims rather than by the foregoing description, and it is therefore intended that all changes falling within the meaning and scope of the equivalent elements of the claims be embraced in this application. Any reference sign in the claims should not be construed as limiting the claim concerned.
Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or apparatuses recited in the system claims may also be implemented by one unit or apparatus through software or hardware. Words such as "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of this application and not to limit them; although this application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements may be made to the technical solutions of this application without departing from the spirit and scope of the technical solutions of this application.

Claims (20)

  1. An AU detection method, wherein the method comprises:
    when an AU detection instruction is received, acquiring a target face picture;
    resizing the target face picture;
    performing mean subtraction on the resized target face picture;
    normalizing the mean-subtracted target face picture to obtain a picture to be detected;
    inputting the picture to be detected into an AU detection model trained with a combination of a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, wherein the AU detection model is trained from face pictures with AU labels and face pictures with classification labels, and is used for outputting the expression detection result according to the target face picture.
  2. The AU detection method according to claim 1, wherein before the picture to be detected is input into the AU detection model trained with a combination of the back-propagation algorithm and the neural network algorithm to obtain the expression detection result, the method further comprises:
    when a training instruction is received, acquiring face pictures with AU labels and face pictures with classification labels as sample pictures;
    extracting a first parameter of a pre-trained face classification model and a second parameter of a pre-trained expression recognition model;
    using the back-propagation algorithm, combining the first parameter and the second parameter, and training on the sample pictures with the neural network algorithm to obtain the AU detection model.
  3. The AU detection method according to claim 2, wherein the acquiring face pictures with AU labels and face pictures with classification labels as sample pictures comprises one of the following manners or a combination of more of them:
    acquiring the sample pictures by web crawler technology; and/or
    acquiring the sample pictures through a face picture acquisition tool; and/or
    acquiring uploaded face pictures as the sample pictures.
  4. The AU detection method according to claim 2, wherein the acquiring face pictures with AU labels and face pictures with classification labels as sample pictures further comprises:
    performing data augmentation on the face pictures with AU labels and the face pictures with classification labels to obtain the sample pictures.
  5. The AU detection method according to claim 2, wherein before the first parameter of the pre-trained face classification model and the second parameter of the pre-trained expression recognition model are extracted, the method further comprises:
    training the face classification model to obtain the first parameter;
    using the back-propagation algorithm and combining the first parameter to train the expression recognition model to obtain the second parameter.
  6. The AU detection method according to claim 2, wherein the using the back-propagation algorithm, combining the first parameter and the second parameter, and training on the sample pictures to obtain the AU detection model comprises:
    calculating an accuracy value of the output result of the AU detection model;
    acquiring an accuracy threshold for the output result of the AU detection model;
    using the back-propagation algorithm to adjust all parameters in the AU detection model until the accuracy value reaches the accuracy threshold, and then stopping training.
  7. The AU detection method according to claim 1, wherein after the expression detection result is obtained, the method further comprises:
    retaining the AU detection result in the expression detection result;
    deleting the face classification result from the expression detection result.
  8. An AU detection apparatus, wherein the apparatus comprises:
    an acquiring unit, configured to acquire a target face picture when an AU detection instruction is received;
    an adjusting unit, configured to resize the target face picture;
    a mean-subtraction unit, configured to perform mean subtraction on the resized target face picture;
    a normalizing unit, configured to normalize the mean-subtracted target face picture to obtain a picture to be detected;
    an inputting unit, configured to input the picture to be detected into an AU detection model trained with a combination of a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, wherein the AU detection model is trained from face pictures with AU labels and face pictures with classification labels, and is used for outputting the expression detection result according to the target face picture.
  9. An electronic device, wherein the electronic device comprises:
    a memory storing at least one computer-readable instruction; and
    a processor executing the at least one computer-readable instruction to implement the following steps:
    when an AU detection instruction is received, acquiring a target face picture;
    resizing the target face picture;
    performing mean subtraction on the resized target face picture;
    normalizing the mean-subtracted target face picture to obtain a picture to be detected;
    inputting the picture to be detected into an AU detection model trained with a combination of a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, wherein the AU detection model is trained from face pictures with AU labels and face pictures with classification labels, and is used for outputting the expression detection result according to the target face picture.
  10. The electronic device according to claim 9, wherein before the picture to be detected is input into the AU detection model trained with a combination of the back-propagation algorithm and the neural network algorithm to obtain the expression detection result, the processor executes the at least one computer-readable instruction to further implement the following steps:
    when a training instruction is received, acquiring face pictures with AU labels and face pictures with classification labels as sample pictures;
    extracting a first parameter of a pre-trained face classification model and a second parameter of a pre-trained expression recognition model;
    using the back-propagation algorithm, combining the first parameter and the second parameter, and training on the sample pictures with the neural network algorithm to obtain the AU detection model.
  11. The electronic device according to claim 10, wherein when the processor executes the at least one computer-readable instruction to implement the acquiring face pictures with AU labels and face pictures with classification labels as sample pictures, the steps further comprise:
    performing data augmentation on the face pictures with AU labels and the face pictures with classification labels to obtain the sample pictures.
  12. The electronic device according to claim 10, wherein before the first parameter of the pre-trained face classification model and the second parameter of the pre-trained expression recognition model are extracted, the processor executes the at least one computer-readable instruction to further implement the following steps:
    training the face classification model to obtain the first parameter;
    using the back-propagation algorithm and combining the first parameter to train the expression recognition model to obtain the second parameter.
  13. The electronic device according to claim 10, wherein when the processor executes the at least one computer-readable instruction to implement the using the back-propagation algorithm, combining the first parameter and the second parameter, and training on the sample pictures to obtain the AU detection model, the following steps are included:
    calculating an accuracy value of the output result of the AU detection model;
    acquiring an accuracy threshold for the output result of the AU detection model;
    using the back-propagation algorithm to adjust all parameters in the AU detection model until the accuracy value reaches the accuracy threshold, and then stopping training.
  14. The electronic device according to claim 9, wherein after the expression detection result is obtained, the processor executes the at least one computer-readable instruction to further implement the following steps:
    retaining the AU detection result in the expression detection result;
    deleting the face classification result from the expression detection result.
  15. A non-volatile readable storage medium, wherein the non-volatile readable storage medium stores at least one computer-readable instruction, and the at least one computer-readable instruction is executed by a processor to implement the following steps:
    when an AU detection instruction is received, acquiring a target face picture;
    resizing the target face picture;
    performing mean subtraction on the resized target face picture;
    normalizing the mean-subtracted target face picture to obtain a picture to be detected;
    inputting the picture to be detected into an AU detection model trained with a combination of a back-propagation algorithm and a neural network algorithm to obtain an expression detection result, wherein the AU detection model is trained from face pictures with AU labels and face pictures with classification labels, and is used for outputting the expression detection result according to the target face picture.
  16. The storage medium according to claim 15, wherein before the picture to be detected is input into the AU detection model trained with a combination of the back-propagation algorithm and the neural network algorithm to obtain the expression detection result, the at least one computer-readable instruction is executed by the processor to further implement the following steps:
    when a training instruction is received, acquiring face pictures with AU labels and face pictures with classification labels as sample pictures;
    extracting a first parameter of a pre-trained face classification model and a second parameter of a pre-trained expression recognition model;
    using the back-propagation algorithm, combining the first parameter and the second parameter, and training on the sample pictures with the neural network algorithm to obtain the AU detection model.
  17. The storage medium according to claim 16, wherein when the at least one computer-readable instruction is executed by the processor to implement the acquiring face pictures with AU labels and face pictures with classification labels as sample pictures, the steps further comprise:
    performing data augmentation on the face pictures with AU labels and the face pictures with classification labels to obtain the sample pictures.
  18. The storage medium according to claim 16, wherein before the first parameter of the pre-trained face classification model and the second parameter of the pre-trained expression recognition model are extracted, the at least one computer-readable instruction is executed by the processor to further implement the following steps:
    training the face classification model to obtain the first parameter;
    using the back-propagation algorithm and combining the first parameter to train the expression recognition model to obtain the second parameter.
  19. The storage medium according to claim 16, wherein when the at least one computer-readable instruction is executed by the processor to implement the using the back-propagation algorithm, combining the first parameter and the second parameter, and training on the sample pictures to obtain the AU detection model, the following steps are included:
    calculating an accuracy value of the output result of the AU detection model;
    acquiring an accuracy threshold for the output result of the AU detection model;
    using the back-propagation algorithm to adjust all parameters in the AU detection model until the accuracy value reaches the accuracy threshold, and then stopping training.
  20. The storage medium according to claim 15, wherein after the expression detection result is obtained, the at least one computer-readable instruction is executed by the processor to further implement the following steps:
    retaining the AU detection result in the expression detection result;
    deleting the face classification result from the expression detection result.
PCT/CN2019/102615 2019-06-18 2019-08-26 AU detection method and apparatus, electronic device and storage medium WO2020252903A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910528234.6 2019-06-18
CN201910528234.6A CN110427802A (zh) 2019-06-18 2019-06-18 AU detection method and apparatus, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2020252903A1 true WO2020252903A1 (zh) 2020-12-24

Family

ID=68408671

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/102615 WO2020252903A1 (zh) 2019-06-18 2019-08-26 AU detection method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN110427802A (zh)
WO (1) WO2020252903A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818772A (zh) * 2021-01-19 2021-05-18 Netease (Hangzhou) Network Co., Ltd. Facial parameter recognition method and apparatus, electronic device and storage medium
CN113792572A (zh) * 2021-06-17 2021-12-14 Chongqing University of Posts and Telecommunications Facial expression recognition method based on local representations

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085063B (zh) * 2020-08-10 2023-10-13 Ubtech Robotics Corp., Ltd. Target recognition method and apparatus, terminal device and storage medium
CN112541445B (zh) * 2020-12-16 2023-07-18 China United Network Communications Group Co., Ltd. Facial expression transfer method and apparatus, electronic device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409198A (zh) * 2018-08-31 2019-03-01 Ping An Technology (Shenzhen) Co., Ltd. AU detection model training method, AU detection method, apparatus, device and medium
US20190087686A1 (en) * 2017-09-21 2019-03-21 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for detecting human face
CN109583431A (zh) * 2019-01-02 2019-04-05 Shanghai Jilian Network Technology Co., Ltd. Facial emotion recognition model and method, and electronic apparatus thereof
CN109740657A (zh) * 2018-12-27 2019-05-10 Zhengzhou Yunhai Information Technology Co., Ltd. Training method and device for a neural network model for image data classification
CN109784153A (zh) * 2018-12-10 2019-05-21 Ping An Technology (Shenzhen) Co., Ltd. Emotion recognition method and apparatus, computer device and storage medium
CN109840513A (zh) * 2019-02-28 2019-06-04 University of Science and Technology Beijing Facial micro-expression recognition method and recognition apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862292B (zh) * 2017-11-15 2019-04-12 Ping An Technology (Shenzhen) Co., Ltd. Person emotion analysis method and apparatus, and storage medium
CN108875901B (zh) * 2017-11-20 2021-03-23 Beijing Megvii Technology Co., Ltd. Neural network training method, and general object detection method, apparatus and system


Also Published As

Publication number Publication date
CN110427802A (zh) 2019-11-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 19933629
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.06.2022)
122 Ep: pct application non-entry in european phase
Ref document number: 19933629
Country of ref document: EP
Kind code of ref document: A1