WO2020248841A1 - AU detection method and apparatus for image, and electronic device and storage medium - Google Patents

AU detection method and apparatus for image, and electronic device and storage medium

Info

Publication number
WO2020248841A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
image
network
face image
sample picture
Prior art date
Application number
PCT/CN2020/093313
Other languages
French (fr)
Chinese (zh)
Inventor
盛建达
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020248841A1 publication Critical patent/WO2020248841A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships

Definitions

  • This application relates to the field of artificial intelligence image processing, and in particular to an image AU detection method, device, electronic equipment and storage medium.
  • Existing AU (Action Units, the units used to describe subtle movements of facial muscles) detection refers to comparing the similarity between the expression in a face image and each AU to determine which AU category the face image belongs to.
  • FACS (Facial Action Coding System) analyzes in detail the activities of all facial muscle tissues and the facial changes they cause, and on this basis decomposes facial movement into basic AUs.
  • AU refers to the basic muscle action units of the human face, such as: raised inner eyebrows, raised mouth corners, and wrinkled nose.
  • At present, the AU detection methods for video streams in the industry usually include the following: (1) AU detection based on a single frame of image; (2) AU detection using the LSTM (Long Short-Term Memory) algorithm.
  • The AU detection method based on a single frame of image detects AUs on an average face.
  • The inventor realized that this method ignores the correlation between frames, so its AU detection accuracy is not high.
  • Although the method of using the LSTM algorithm for AU detection exploits spatial correlation well, its extraction of AU feature values is relatively rough, which also keeps the AU detection accuracy low.
  • The first aspect of the present application provides an image AU detection method, the method including: acquiring a face image; performing detection processing on the acquired face image to obtain a unified face area; inputting the detected face image as an original image into an optimized ResNet network for feature value extraction to output a face feature vector; and
  • inputting the face feature vector output by the ResNet network into an LSTM network for training to obtain the AU recognition result of the face image.
  • the second aspect of the present application provides an image AU detection device, the device includes:
  • the acquisition module is used to acquire a face image
  • the preprocessing module is used to detect and process the acquired face images to obtain a unified face area
  • the feature extraction module is used to input the detected face image as the original image into the optimized ResNet network for feature value extraction to output the face feature vector;
  • the recognition module is used to input the face feature vector output by the ResNet network into the LSTM network for training, and obtain the AU recognition result of the face image.
  • a third aspect of the present application provides an electronic device, the electronic device includes a processor, and the processor is configured to implement the AU detection method of the image when executing computer-readable instructions stored in a memory.
  • The fourth aspect of the present application provides one or more readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the image AU detection method.
  • This application enables the training model to make full use of the dynamic information of facial AU changes to automatically learn the mapping relationships between the AU features of the recognized object, thereby improving the prediction accuracy and robustness of the training model and, in turn, the AU recognition performance on face images.
  • Fig. 1 is an application environment diagram of an image AU detection method in an embodiment of this application.
  • Fig. 2 is a flowchart of an image AU detection method in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the basic operation structure of the ResNet network in this application.
  • Fig. 4 is a structural diagram of a ResNet network in an embodiment of this application.
  • FIG. 5 is a schematic diagram of the sequence processing flow of the LSTM network in an embodiment of this application.
  • FIG. 6 is a structural diagram of an image AU detection device in an embodiment of this application.
  • Fig. 7 is a schematic diagram of the electronic equipment of this application.
  • the AU detection method of the image of this application is applied to one or more electronic devices.
  • The electronic device is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
  • the electronic device may be a computing device such as a desktop computer, a notebook computer, a tablet computer, and a cloud server.
  • the device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • FIG. 1 is a schematic diagram of an application environment of an image AU detection method in an embodiment of the present application.
  • the terminal device 1 includes an image acquisition unit 11.
  • the image collection unit 11 is used to collect face images.
  • the terminal device 1 may obtain a face image through the image acquisition unit 11 and perform AU detection on the face image.
  • In this application, 19 AUs from FACS are selected, including 6 upper-half-face AUs and 13 lower-half-face AUs.
  • These 19 AUs are used as the standard for detecting and comparing face images to predict which AU category a face image belongs to.
  • the terminal device 1 is also connected to an external device 2 in communication.
  • the terminal device 1 is in communication connection with the external device 2 via a network.
  • The network used to support communication between the terminal device 1 and the external device 2 may be a wired network or a wireless network, such as radio, wireless fidelity (WIFI), cellular, satellite, broadcast, etc.
  • the terminal device 1 may be a computer device, a single server, a server cluster, or a cloud server.
  • the external device 2 can be, but is not limited to, a computer device, a mobile phone, a notebook computer, a tablet computer, and other devices.
  • Fig. 2 is a flowchart of an image AU detection method in an embodiment of the present application. According to different needs, the order of the steps in the flowchart can be changed, and some steps can be omitted.
  • the AU detection method of the image specifically includes the following steps:
  • Step S201 Obtain a face image.
  • the image acquisition unit 11 may be a 2D camera, and the terminal device 1 acquires the user's 2D face image as the user's face image through the 2D camera.
  • the image acquisition unit 11 may also be a 3D camera, and the terminal device 1 acquires the user's 3D face image as the user's face image through the 3D camera.
  • the terminal device 1 receives a face picture sent by an external device 2 communicatively connected with the terminal device.
  • the face image is stored in a storage device of the terminal device 1, and the terminal device 1 obtains the face image from the storage device.
  • the face image includes consecutive frames of face pictures.
  • the face picture may be a face video or the like.
  • Step S202 Perform detection processing on the acquired face image to acquire a unified face area.
  • the terminal device 1 may use the Adaboost face detection algorithm based on Haar-like features to perform face detection on each frame of face images in the acquired face images to determine the face area.
  • The Adaboost face detection algorithm may be used to scan each frame of the face image with a window of a preset size and a preset step until the face area in each frame of the image is determined.
  • the face area may be a fixed rectangular area including the forehead, chin, left cheek, and right cheek in the face image.
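• As a concrete illustration of this detection step, the following is a minimal sketch using OpenCV's bundled Haar-cascade (Viola-Jones/AdaBoost) frontal-face detector; the cascade file, scale step, and window sizes are illustrative assumptions rather than values fixed by this application.

```python
# Sketch of per-frame face detection with OpenCV's Haar-cascade (AdaBoost)
# detector; parameters are illustrative assumptions.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face_region(frame_bgr):
    """Return the largest detected face rectangle (x, y, w, h), or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(
        gray,
        scaleFactor=1.1,   # window scale step between scan passes
        minNeighbors=5,    # merge threshold for overlapping detections
        minSize=(64, 64))  # smallest face window to consider
    if len(faces) == 0:
        return None
    return max(faces, key=lambda r: r[2] * r[3])  # keep the largest face
```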
  • the terminal device 1 is also used to calibrate the face area. Specifically, the terminal device 1 detects the key feature points in the face area, and performs alignment and calibration on the corresponding face images based on the positions of the detected key feature points.
  • the key feature points in the face area may be eyes, nose, mouth, left cheek outer contour, right cheek outer contour, and so on.
  • the face image can be aligned and calibrated by the landmark method, so that the positions of the key feature points of the face in the face image are basically the same.
  • To avoid non-uniform face image sizes affecting subsequent recognition results, the terminal device 1 may also edit the aligned and calibrated face image according to a preset template to obtain a face image of uniform size.
  • The editing process includes one or both of cropping and scaling.
  • For example, the terminal device 1 crops the corresponding face image according to a uniform template based on the key feature points detected in the face area and scales it to a uniform size, thereby realizing the editing of the face image.
  • OpenCV's resize can be used to scale the face image to a uniform size based on a bilinear interpolation or area interpolation algorithm.
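• A minimal sketch of this crop-and-unify step with OpenCV follows; the 256x256 target size is an assumption borrowed from the 256*256 original-image size mentioned in the augmentation step below.

```python
# Crop the detected face rectangle and scale it to a fixed template size.
import cv2

def crop_and_normalize(frame_bgr, rect, size=(256, 256)):
    x, y, w, h = rect
    face = frame_bgr[y:y + h, x:x + w]
    # INTER_LINEAR is bilinear interpolation; INTER_AREA suits downscaling.
    interp = cv2.INTER_AREA if face.shape[0] > size[1] else cv2.INTER_LINEAR
    return cv2.resize(face, size, interpolation=interp)
```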
  • Step S203 Input the face image subjected to the detection process as the original image to the optimized ResNet (Residual Neural Network, deep residual network) network for feature value extraction to output a face feature vector.
  • FIG. 3 shows a schematic diagram of the basic operation structure of the ResNet network in this application.
  • The first basic operation structure of the ResNet network is shown in FIG. 3(a): the output of three convolutional layers is superimposed on the original input through a shortcut connection.
  • The first basic operation structure shown in Figure 3(a) is used when the input and output matrices have the same size.
  • The second basic operation structure of the ResNet network, shown in FIG. 3(b), likewise superimposes the output of three convolutional layers on the input.
  • The second basic operation structure shown in Figure 3(b) is used when the input and output sizes differ.
  • FIG. 4 shows a structural diagram of a ResNet network in an embodiment of this application.
  • The overall structure of the ResNet network includes: a convolutional layer, a pooling layer, 4 sets of convolution packages with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer.
  • The original image is processed sequentially through the convolutional layer, pooling layer, 4 sets of convolution packages with different parameters, pooling layer, fully connected layer, and sigmoid layer of the ResNet network to obtain the face feature vector.
  • The four convolution packages are conv2_x, conv3_x, conv4_x, and conv5_x.
  • The second convolutional layer of the first package in each group of convolution packages is down-sampled with a stride of 2.
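• The following PyTorch sketch illustrates the two basic operation structures of Fig. 3 as a three-convolution bottleneck block whose output is summed with the input; the channel widths and the 1x1 projection shortcut are standard ResNet choices assumed here, not values specified by this application.

```python
# Bottleneck block: three convolutions whose output is added to the input.
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            # stride on the second (3x3) conv matches the stride-2
            # down-sampling described for the first package of each group
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch))
        # First structure (Fig. 3(a)): identity shortcut when shapes match.
        # Second structure (Fig. 3(b)): projection shortcut when they differ.
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))
        else:
            self.shortcut = nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))
```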
  • the step of "inputting the face image after the detection process as the original image into the optimized ResNet network for feature value extraction to output the face feature vector” includes:
  • Augmenting the original image to obtain sample data specifically includes: obtaining the original image; randomly cropping a picture with a preset resolution from the original image to obtain an initial sample picture; and generating a random number uniformly distributed in [0,1].
  • If the random number for the initial sample picture is less than a random threshold, the picture is flipped, and a new random number uniformly distributed in [0,1] is generated.
  • If the new random number is less than 0.5, the initial sample picture is grayed to obtain a first sample picture; point-light-source processing is then added to the first sample picture to obtain a second sample picture; and the obtained initial sample picture, first sample picture, and second sample picture are used as sample data.
  • the purpose of augmenting the original image is to increase the number of training samples.
  • The augmentation methods include, but are not limited to, flipping, randomly cropping a 248*248 picture from a 256*256 original image, graying the original image, modifying the lighting of the original image, and adding point-light-source lighting to the original image.
  • When the random number is less than 0.5, the picture is flipped and a new random number uniformly distributed in [0,1] is generated.
  • When that random number is less than 0.5, the initial sample picture is grayed to obtain the first sample picture, or the first sample picture is processed by adding point light sources to obtain the second sample picture; when the numbers of initial sample pictures, first sample pictures, and second sample pictures reach the requirements of the actual application scenario, the augmentation ends.
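• The sketch below mirrors this augmentation pipeline (random 248x248 crop from a 256x256 original, random flip, random graying, point-light adjustment); the radial brightening model is a hypothetical stand-in, since the application does not specify how the point light source is rendered.

```python
# Augmentation sketch: crop, flip, gray, and a hypothetical point light.
import cv2
import numpy as np

def augment(original):                      # original: 256x256 BGR image
    samples = []
    y, x = np.random.randint(0, 9, size=2)  # 256 - 248 = 8 -> offsets 0..8
    initial = original[y:y + 248, x:x + 248].copy()
    if np.random.uniform(0, 1) < 0.5:       # flip using a [0,1] random number
        initial = cv2.flip(initial, 1)
    samples.append(initial)
    if np.random.uniform(0, 1) < 0.5:       # new draw decides graying
        gray = cv2.cvtColor(initial, cv2.COLOR_BGR2GRAY)
        first = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
        samples.append(first)
        # Hypothetical point light source: brighten around a random center.
        h, w = first.shape[:2]
        cy, cx = np.random.randint(0, h), np.random.randint(0, w)
        yy, xx = np.mgrid[0:h, 0:w]
        gain = 1.0 + 0.5 * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / 5000.0)
        second = np.clip(first * gain[..., None], 0, 255).astype(np.uint8)
        samples.append(second)
    return samples
```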
  • The ResNet network composed of the first basic operation structure or the second basic operation structure is optimized by training it on the sample data.
  • The first basic operation structure of the ResNet network superimposes the output of the input passed through three convolutional layers on the original input.
  • the first basic operation structure is used when the input and output matrices have the same size.
  • Training on the sample data to optimize the ResNet network includes: training a classification network on face images to obtain a face classification network; and migrating the trained face classification network to train the AU neural network, obtaining the trained network.
  • The migration method can be used to train the face classification network step by step; that is, the parameter of the last fully connected layer is set to the number of face classes, and the last 19 AU sigmoid outputs in the AU neural network are replaced with a softmax layer.
  • The parameters of each layer of the ResNet network from the first convolutional layer through conv3_x are fixed, and the parameters trained on 16,000 face classes are transferred as initial parameters to train conv4_x and the subsequent layers. In this way, the prior knowledge from existing face classification learning is fully utilized to improve AU detection accuracy.
  • The migration training refers to directly loading the face classification network parameters into the AU neural network; because only the last layer of the two neural networks differs in structure and the other parameters have the same dimensions, the parameters can be loaded.
  • The AU neural network has a low output dimension with only 19 results, while the face classification network has a high output dimension. The face classification training results are transferred and, at the same time, some layers are locked, so that the facial structure features learned in face classification are fully utilized in AU detection.
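• A hedged PyTorch sketch of this migration step follows: loading face-classification weights into the AU network, freezing everything through conv3_x, and training a 19-way sigmoid head. The checkpoint file is hypothetical, and the layer names (conv1/layer1/layer2 corresponding to the layers through conv3_x) follow common torchvision ResNet conventions assumed here.

```python
# Transfer face-classification weights and freeze layers through conv3_x.
import torch
import torch.nn as nn
import torchvision.models as models

au_net = models.resnet50(weights=None)
au_net.fc = nn.Linear(au_net.fc.in_features, 19)   # 19 AU logits

state = torch.load("face_classifier.pth")          # hypothetical checkpoint
state = {k: v for k, v in state.items()            # drop the old softmax head
         if not k.startswith("fc.")}
au_net.load_state_dict(state, strict=False)        # keep all shared layers

for name, param in au_net.named_parameters():
    # conv1/layer1/layer2 cover the layers up to and including conv3_x.
    if name.startswith(("conv1", "bn1", "layer1", "layer2")):
        param.requires_grad = False                # lock transferred layers

scores = torch.sigmoid(au_net(torch.randn(1, 3, 248, 248)))  # 19 AU scores
```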
  • the face image after the detection process is input to the optimized ResNet network to obtain the features of the fully connected layer as the face feature vector output by the ResNet network.
  • Step S204 Input the face feature vector output by the ResNet network into an LSTM (Long Short-Term Memory) network for training, and obtain the AU recognition result of the face image.
  • the LSTM network is a special recurrent neural network.
  • The LSTM network treats the input sequence as a time series and can learn both the short-term and long-term temporal dependencies of the data in the input sequence.
  • the aforementioned AU recognition result may indicate the AU category of the face image.
  • FIG. 5 is a schematic diagram of the sequence processing flow of the LSTM network in an embodiment of this application.
  • X0, X1, ..., Xn are the frames of a face image sequence of length n; each frame of the face image is passed through the ResNet network to extract the face feature vectors Y0, Y1, ..., Yn, and the face feature vectors Y0, Y1, ..., Yn are input to the LSTM network sequentially in chronological order to obtain the AU recognition results h0, h1, ..., hn.
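• A minimal sketch of this Fig. 5 flow follows: per-frame ResNet feature vectors fed in chronological order to an LSTM, with a sigmoid read-out per time step. The feature and hidden sizes are illustrative assumptions.

```python
# Per-frame features -> LSTM -> one 19-dim AU score vector per frame.
import torch
import torch.nn as nn

feat_dim, hidden, n_aus = 2048, 512, 19
lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
head = nn.Linear(hidden, n_aus)

frame_feats = torch.randn(1, 30, feat_dim)   # Y0..Yn for a 30-frame clip
seq_out, _ = lstm(frame_feats)               # one hidden state per frame
au_scores = torch.sigmoid(head(seq_out))     # h0..hn: 19 AU scores per frame
```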
  • The LSTM network includes: input gates, forget gates, output gates, state units (cells), and LSTM outputs.
  • the processing procedures of the input gate, forget gate, output gate, state unit and LSTM output gate can be calculated and implemented by the following formulas:
  • i_t = σ(W_ix·x_t + W_im·m_{t-1} + W_ic·c_{t-1} + b_i);
  • f_t = σ(W_fx·x_t + W_fm·m_{t-1} + W_fc·c_{t-1} + b_f);
  • c_t = f_t ⊙ c_{t-1} + i_t ⊙ h(W_cx·x_t + W_cm·m_{t-1} + b_c);
  • o_t = σ(W_ox·x_t + W_om·m_{t-1} + W_oc·c_{t-1} + b_o);
  • m_t = o_t ⊙ h(c_t);
  • where x_t is the face feature vector input at time t; W_ix, W_im, W_ic, W_fx, W_fm, W_fc, W_cx, W_cm, W_ox, W_om, and W_oc are preset weight matrices, indicating that the elements of each gate are obtained from the data of the corresponding dimension, that is, nodes of different dimensions do not interfere with each other; b_i, b_f, b_c, and b_o are preset bias vectors; i_t, f_t, o_t, c_t, and m_t respectively represent the input gate, forget gate, output gate, state unit, and LSTM output at time t; ⊙ is the element-wise product; σ() is the sigmoid function; and h() is the output activation function.
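• The following NumPy sketch transcribes the gate formulas above directly; it assumes h() is tanh (the application leaves the activation unspecified) and treats the W_ic/W_fc/W_oc peephole terms as element-wise weights, a common implementation choice assumed here.

```python
# Single LSTM step implementing the peephole-style gate equations above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, m_prev, c_prev, W, b, h=np.tanh):
    # W_ix etc. are full matrices (@); the c-peepholes are element-wise (*).
    i_t = sigmoid(W["ix"] @ x_t + W["im"] @ m_prev + W["ic"] * c_prev + b["i"])
    f_t = sigmoid(W["fx"] @ x_t + W["fm"] @ m_prev + W["fc"] * c_prev + b["f"])
    c_t = f_t * c_prev + i_t * h(W["cx"] @ x_t + W["cm"] @ m_prev + b["c"])
    o_t = sigmoid(W["ox"] @ x_t + W["om"] @ m_prev + W["oc"] * c_prev + b["o"])
    m_t = o_t * h(c_t)                     # the LSTM output at time t
    return m_t, c_t
```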
  • The face feature vector extracted by the ResNet network from each frame of the face image is input into the LSTM network, and the LSTM network is trained based on the back-propagation algorithm so that the deviation between the value produced by the LSTM network for the input image and the mapped value of the expression category to which the image belongs falls within a preset allowable range.
  • the training process of the LSTM network can also be implemented with reference to other existing technical solutions, which is not limited here.
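• Continuing the pipeline sketch after Fig. 5, one plausible training criterion is multi-label binary cross-entropy between the 19 per-frame sigmoid outputs and the AU annotations, minimized by back-propagation; the optimizer and learning rate are assumptions, not values given by this application.

```python
# One training step for the lstm/head sketch above (au_scores from there).
import torch
import torch.nn as nn

criterion = nn.BCELoss()
optimizer = torch.optim.Adam(
    [p for p in list(lstm.parameters()) + list(head.parameters())
     if p.requires_grad], lr=1e-4)

labels = torch.randint(0, 2, (1, 30, 19)).float()  # per-frame AU annotations
optimizer.zero_grad()
loss = criterion(au_scores, labels)
loss.backward()                                    # back-propagation through time
optimizer.step()
```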
  • The image AU detection method in this application builds a training model based on the ResNet network and the LSTM network and uses a collection of continuous frame images of the face (such as a video) as the training input, so that the training model can make full use of the dynamic information of facial AU changes to automatically learn the mapping relationships between the AU features of the recognized object, thereby improving the prediction accuracy and robustness of the training model and, in turn, the AU recognition performance on face images.
  • FIG. 6 is a structural diagram of an image AU detection device 40 in an embodiment of this application.
  • the image AU detection device 40 runs in the terminal device 1.
  • the image AU detection device 40 may include multiple functional modules composed of program code segments.
  • the program code of each program segment in the image AU detection device 40 can be stored in the memory and executed by at least one processor to perform the function of face recognition.
  • the image AU detection device 40 can be divided into multiple functional modules according to the functions it performs.
  • the image AU detection device 40 may include an acquisition module 401, a preprocessing module 402, a feature extraction module 403, and an identification module 404.
  • A module referred to in this application is a series of computer-readable instruction segments that can be executed by at least one processor, that can complete fixed functions, and that are stored in a memory. In some embodiments, the functions of each module will be detailed in subsequent embodiments.
  • the acquiring module 401 is used to acquire a face image.
  • the image acquisition unit 11 may be a 2D camera, and the acquisition module 401 acquires the user's 2D face image as the user's face image through the 2D camera.
  • the image acquisition unit 11 may also be a 3D camera, and the acquisition module 401 acquires the user's 3D face image as the user's face image through the 3D camera.
  • the acquisition module 401 receives a face picture sent by the external device 2 communicatively connected with the terminal device.
  • the face image is stored in a storage device of the terminal device 1, and the acquisition module 401 acquires the face image from the storage device.
  • the face image includes consecutive frames of face pictures.
  • the face picture may be a face video or the like.
  • the preprocessing module 402 is used to perform detection processing on the acquired face image to acquire a unified face area.
  • the preprocessing module 402 may use the Adaboost face detection algorithm based on Haar-like features to perform face detection on each frame of face images in the acquired face images to determine the face area.
  • The Adaboost face detection algorithm may be used to scan each frame of the face image with a window of a preset size and a preset step until the face area in each frame of the image is determined.
  • the face area may be a fixed rectangular area including the forehead, chin, left cheek, and right cheek in the face image.
  • the preprocessing module 402 is also used to perform calibration processing on the face area. Specifically, the preprocessing module 402 detects key feature points in the face region, and performs alignment and calibration on the corresponding face image based on the positions of the detected key feature points.
  • the key feature points in the face area may be eyes, nose, mouth, left cheek outer contour, right cheek outer contour, and so on.
  • the face image can be aligned and calibrated by the landmark method, so that the positions of the key feature points of the face in the face image are basically the same.
  • To avoid non-uniform face image sizes affecting subsequent recognition results, the preprocessing module 402 may also edit the aligned and calibrated face image according to a preset template to obtain a face image of uniform size.
  • The editing process includes one or both of cropping and scaling.
  • the preprocessing module 402 cuts out the corresponding face image according to a uniform template based on the key feature points in the detected face area and scales the face image to a uniform size.
  • OpenCV's resize can be used to scale the face image to a uniform size based on a bilinear interpolation or area interpolation algorithm.
  • the feature extraction module 403 is configured to input the detected face image as an original image into an optimized ResNet (Residual Neural Network, deep residual network) network for feature value extraction to output a face feature vector.
  • The first basic operation structure of the ResNet network is shown in FIG. 3(a): the output of three convolutional layers is superimposed on the original input through a shortcut connection.
  • The first basic operation structure shown in Figure 3(a) is used when the input and output matrices have the same size.
  • The second basic operation structure of the ResNet network, shown in FIG. 3(b), likewise superimposes the output of three convolutional layers on the input.
  • The second basic operation structure shown in Figure 3(b) is used when the input and output sizes differ.
  • The overall structure of the ResNet network includes: a convolutional layer, a pooling layer, 4 sets of convolution packages with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer.
  • The original image is processed sequentially through the convolutional layer, pooling layer, 4 sets of convolution packages with different parameters, pooling layer, fully connected layer, and sigmoid layer of the ResNet network to obtain the face feature vector.
  • The four convolution packages are conv2_x, conv3_x, conv4_x, and conv5_x.
  • The second convolutional layer of the first package in each group of convolution packages is down-sampled with a stride of 2.
  • the “inputting the face image after the detection process as the original image into the optimized ResNet network for feature value extraction to output the face feature vector” includes:
  • Augmenting the original image to obtain sample data specifically includes: obtaining the original image; randomly cropping a picture with a preset resolution from the original image to obtain an initial sample picture; and generating a random number uniformly distributed in [0,1].
  • If the random number for the initial sample picture is less than a random threshold, the picture is flipped, and a new random number uniformly distributed in [0,1] is generated.
  • If the new random number is less than 0.5, the initial sample picture is grayed to obtain a first sample picture; point-light-source processing is then added to the first sample picture to obtain a second sample picture; and the obtained initial sample picture, first sample picture, and second sample picture are used as sample data.
  • the purpose of augmenting the original image is to increase the number of training samples.
  • The augmentation methods include, but are not limited to, flipping, randomly cropping a 248*248 picture from a 256*256 original image, graying the original image, modifying the lighting of the original image, and adding point-light-source lighting to the original image.
  • When the random number is less than 0.5, the picture is flipped and a new random number uniformly distributed in [0,1] is generated.
  • When that random number is less than 0.5, the initial sample picture is grayed to obtain the first sample picture, or the first sample picture is processed by adding point light sources to obtain the second sample picture; when the numbers of initial sample pictures, first sample pictures, and second sample pictures reach the requirements of the actual application scenario, the augmentation ends.
  • The ResNet network composed of the first basic operation structure or the second basic operation structure is optimized by training it on the sample data.
  • The first basic operation structure of the ResNet network superimposes the output of the input passed through three convolutional layers on the original input.
  • the first basic operation structure is used when the input and output matrices have the same size.
  • Training on the sample data to optimize the ResNet network includes: training a classification network on face images to obtain a face classification network; and migrating the trained face classification network to train the AU neural network, obtaining the trained network.
  • The migration method can be used to train the face classification network step by step; that is, the parameter of the last fully connected layer is set to the number of face classes, and the last 19 AU sigmoid outputs in the AU neural network are replaced with a softmax layer.
  • The parameters of each layer of the ResNet network from the first convolutional layer through conv3_x are fixed, and the parameters trained on 16,000 face classes are transferred as initial parameters to train conv4_x and the subsequent layers. In this way, the prior knowledge from existing face classification learning is fully utilized to improve AU detection accuracy.
  • The migration training refers to directly loading the face classification network parameters into the AU neural network; because only the last layer of the two neural networks differs in structure and the other parameters have the same dimensions, the parameters can be loaded.
  • The AU neural network has a low output dimension with only 19 results, while the face classification network has a high output dimension. The face classification training results are transferred and, at the same time, some layers are locked, so that the facial structure features learned in face classification are fully utilized in AU detection.
  • the face image after the detection process is input to the optimized ResNet network to obtain the features of the fully connected layer as the face feature vectors output by the ResNet network.
  • the recognition module 404 is configured to input the face feature vector output by the ResNet network into an LSTM (Long Short-Term Memory) network for training, and obtain the AU recognition result of the face image.
  • the LSTM network is a special recurrent neural network.
  • The LSTM network treats the input sequence as a time series and can learn both the short-term and long-term temporal dependencies of the data in the input sequence.
  • the aforementioned AU recognition result may indicate the AU category of the face image.
  • FIG. 5 is a schematic diagram of the sequence processing flow of the LSTM network in an embodiment of this application.
  • X0, X1, ..., Xn are the frames of a face image sequence of length n; each frame of the face image is passed through the ResNet network to extract the face feature vectors Y0, Y1, ..., Yn, and the face feature vectors Y0, Y1, ..., Yn are input to the LSTM network sequentially in chronological order to obtain the AU recognition results h0, h1, ..., hn.
  • The LSTM network includes: input gates, forget gates, output gates, state units (cells), and LSTM outputs.
  • the processing procedures of the input gate, forget gate, output gate, state unit and LSTM output gate can be calculated and implemented by the following formulas:
  • i_t = σ(W_ix·x_t + W_im·m_{t-1} + W_ic·c_{t-1} + b_i);
  • f_t = σ(W_fx·x_t + W_fm·m_{t-1} + W_fc·c_{t-1} + b_f);
  • c_t = f_t ⊙ c_{t-1} + i_t ⊙ h(W_cx·x_t + W_cm·m_{t-1} + b_c);
  • o_t = σ(W_ox·x_t + W_om·m_{t-1} + W_oc·c_{t-1} + b_o);
  • m_t = o_t ⊙ h(c_t);
  • where x_t is the face feature vector input at time t; W_ix, W_im, W_ic, W_fx, W_fm, W_fc, W_cx, W_cm, W_ox, W_om, and W_oc are preset weight matrices, indicating that the elements of each gate are obtained from the data of the corresponding dimension, that is, nodes of different dimensions do not interfere with each other; b_i, b_f, b_c, and b_o are preset bias vectors; i_t, f_t, o_t, c_t, and m_t respectively represent the input gate, forget gate, output gate, state unit, and LSTM output at time t; ⊙ is the element-wise product; σ() is the sigmoid function; and h() is the output activation function.
  • The face feature vector extracted by the ResNet network from each frame of the face image is input into the LSTM network, and the LSTM network is trained based on the back-propagation algorithm so that the deviation between the value produced by the LSTM network for the input image and the mapped value of the expression category to which the image belongs falls within a preset allowable range.
  • the training process of the LSTM network can also be implemented with reference to other existing technical solutions, which is not limited here.
  • The image AU detection method in this application builds a training model based on the ResNet network and the LSTM network and uses a collection of continuous frame images of the face (such as a video) as the training input, so that the training model can make full use of the dynamic information of facial AU changes to automatically learn the mapping relationships between the AU features of the recognized object, thereby improving the prediction accuracy and robustness of the training model and, in turn, the AU recognition performance on face images.
  • FIG. 7 is a schematic diagram of the electronic device 6 in an embodiment of the application.
  • the electronic device 6 includes a memory 61, a processor 62, and computer readable instructions 63 stored in the memory 61 and executable on the processor 62.
  • When the processor 62 executes the computer-readable instructions 63, the steps in the embodiment of the image AU detection method are implemented, such as steps S201 to S204 shown in FIG. 2.
  • Alternatively, when the processor 62 executes the computer-readable instructions 63, the functions of the modules/units in the embodiment of the image AU detection device are realized, such as modules 401 to 404 in FIG. 6.
  • the computer-readable instructions 63 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 61 and executed by the processor 62, To complete this application.
  • the one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 63 in the electronic device 6.
  • the computer-readable instruction 63 may be divided into the acquisition module 401, the preprocessing module 402, the feature extraction module 403, and the recognition module 404 in FIG. 6.
  • the specific functions of each module refer to Embodiment 2.
  • The electronic device 6 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server.
  • The schematic diagram is only an example of the electronic device 6 and does not constitute a limitation on the electronic device 6; it may include more or fewer components than shown, combine certain components, or use different components; for example, the electronic device 6 may also include input/output devices, network access devices, buses, and so on.
  • The processor 62 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor can be a microprocessor or the processor 62 can also be any conventional processor, etc.
  • The processor 62 is the control center of the electronic device 6 and connects all parts of the entire electronic device 6 through various interfaces and lines.
  • The present application provides an electronic device, wherein the electronic device includes a processor configured to execute computer-readable instructions stored in a memory to implement the following steps: acquiring a face image; performing detection processing on the acquired face image to obtain a unified face area; inputting the detected face image as an original image into an optimized ResNet network for feature value extraction to output a face feature vector; and
  • inputting the face feature vector output by the ResNet network into an LSTM network for training to obtain the AU recognition result of the face image.
  • the processor further implements the following steps when executing the computer-readable instruction:
  • the processor further implements the following steps when executing the computer-readable instruction:
  • The key feature points in the face area include the eyes, nose, mouth, the outer contour of the left cheek, and the outer contour of the right cheek; and
  • the face image after alignment and calibration is edited according to a preset template to obtain a face image of a uniform size.
  • the ResNet network structure includes a convolutional layer, a pooling layer, four groups of convolutional packets with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer.
  • the processor further implements the following steps when executing the computer-readable instruction:
  • the processor further implements the following steps when executing the computer-readable instruction:
  • the memory 61 may be used to store the computer-readable instructions 63 and/or modules/units, and the processor 62 can run or execute the computer-readable instructions and/or modules/units stored in the memory 61, and The data stored in the memory 61 is called to realize various functions of the electronic device 6.
  • The memory 61 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function, an image playback function, etc.), and the data storage area may store data created according to the use of the electronic device 6 (such as audio data, a phone book, etc.).
  • The memory 61 may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.
  • If the integrated module/unit of the electronic device 6 is implemented in the form of a software functional module and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • All or part of the processes in the methods of the above embodiments of this application can also be completed by instructing the relevant hardware through computer-readable instructions, which may be stored in a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile, and the computer-readable instructions may implement the steps of the foregoing method embodiments when executed by a processor.
  • The computer-readable instructions include computer-readable instruction code, which may be in the form of source code, object code, an executable file, or some intermediate form.
  • The computer-readable medium may include: any entity or device capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium.
  • The content contained in the computer-readable medium can be appropriately added or deleted in accordance with the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
  • This application also provides one or more readable storage media storing computer-readable instructions, wherein when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: acquiring a face image; performing detection processing on the acquired face image to obtain a unified face area; inputting the detected face image as an original image into an optimized ResNet network for feature value extraction to output a face feature vector; and
  • inputting the face feature vector output by the ResNet network into an LSTM network for training to obtain the AU recognition result of the face image.
  • the one or more processors when executed by one or more processors, the one or more processors further execute the following steps:
  • the one or more processors when executed by one or more processors, the one or more processors further execute the following steps:
  • The key feature points in the face area include the eyes, nose, mouth, the outer contour of the left cheek, and the outer contour of the right cheek; and
  • the face image after alignment and calibration is edited according to a preset template to obtain a face image of a uniform size.
  • the ResNet network structure includes a convolutional layer, a pooling layer, four groups of convolutional packets with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer.
  • the one or more processors when executed by one or more processors, the one or more processors further execute the following steps:
  • the one or more processors when executed by one or more processors, the one or more processors further execute the following steps:
  • the functional modules in the various embodiments of the present application may be integrated in the same processing module, or each module may exist alone physically, or two or more modules may be integrated in the same module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An AU detection method and apparatus for an image, an electronic device, and a storage medium, relating to image processing and predictive analysis fields such as artificial intelligence. The method comprises: acquiring a facial image (S201); carrying out detection processing on the acquired facial image to acquire a unified facial area (S202); inputting the facial image on which the detection processing has been carried out, as an original image, into an optimized ResNet network for feature value extraction so as to output a facial feature vector (S203); and inputting the facial feature vector output by the ResNet network into an LSTM network for training to obtain an AU recognition result of the facial image (S204). According to the method, a training model can make full use of the dynamic information of facial AU changes to automatically learn the mapping relationships between AU features of a recognized object, such that the prediction accuracy and robustness of the training model are improved, and the AU recognition performance on facial images is thus improved.

Description

Image AU detection method, apparatus, electronic device and storage medium

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 13, 2019, with application number 201910511707.1 and entitled "Image AU detection method, apparatus, electronic device and storage medium", the entire content of which is incorporated herein by reference.

Technical Field

This application relates to the field of artificial intelligence image processing, and in particular to an image AU detection method, apparatus, electronic device, and storage medium.

Background

Existing AU (Action Units, used to detect subtle movements of facial muscles) detection refers to comparing the similarity between the expression in a face image and each AU to determine which AU category the face image belongs to. FACS (Facial Action Coding System) analyzes in detail the activities of all facial muscle tissues, the changes in individual parts of the face caused by those activities, and the observable expressions caused by these muscle activities; on this basis, facial movement is decomposed into basic AUs. An AU is a basic muscle action unit of the human face, for example: raised inner eyebrows, raised mouth corners, and a wrinkled nose.
Technical Problem

At present, the AU detection methods for video streams in the industry usually include the following: (1) AU detection based on a single frame of image; (2) AU detection using the LSTM (Long Short-Term Memory) algorithm. However, the AU detection method based on a single frame of image detects AUs on an average face; the inventor realized that this method ignores the correlation between frames, so its AU detection accuracy is not high. Although the method of using the LSTM algorithm for AU detection exploits spatial correlation well, its extraction of AU feature values is relatively rough, which also keeps the AU detection accuracy low.

Technical Solution

In view of the above, it is necessary to provide an image AU detection method, apparatus, electronic device, and computer-readable storage medium to solve the problem of low AU detection accuracy in AU detection based on a single frame of image or on the LSTM algorithm.
The first aspect of the present application provides an image AU detection method, the method including:

acquiring a face image;

performing detection processing on the acquired face image to obtain a unified face area;

inputting the detected face image as an original image into an optimized ResNet network for feature value extraction to output a face feature vector; and

inputting the face feature vector output by the ResNet network into an LSTM network for training to obtain the AU recognition result of the face image.
The second aspect of the present application provides an image AU detection apparatus, the apparatus including:

an acquisition module, configured to acquire a face image;

a preprocessing module, configured to perform detection processing on the acquired face image to obtain a unified face area;

a feature extraction module, configured to input the detected face image as an original image into an optimized ResNet network for feature value extraction to output a face feature vector; and

a recognition module, configured to input the face feature vector output by the ResNet network into an LSTM network for training to obtain the AU recognition result of the face image.
The third aspect of the present application provides an electronic device, the electronic device including a processor configured to implement the image AU detection method when executing computer-readable instructions stored in a memory.

The fourth aspect of the present application provides one or more readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the image AU detection method.

Beneficial Effects

This application enables the training model to make full use of the dynamic information of facial AU changes to automatically learn the mapping relationships between the AU features of the recognized object, thereby improving the prediction accuracy and robustness of the training model and, in turn, the AU recognition performance on face images.

The details of one or more embodiments of the present application are presented in the following drawings and description; other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings

Fig. 1 is an application environment diagram of an image AU detection method in an embodiment of this application.

Fig. 2 is a flowchart of an image AU detection method in an embodiment of the present application.

Fig. 3 is a schematic diagram of the basic operation structure of the ResNet network in this application.

Fig. 4 is a structural diagram of a ResNet network in an embodiment of this application.

Fig. 5 is a schematic diagram of the sequence processing flow of the LSTM network in an embodiment of this application.

Fig. 6 is a structural diagram of an image AU detection device in an embodiment of this application.

Fig. 7 is a schematic diagram of the electronic device of this application.
Embodiments of the Invention

In order to understand the above objectives, features, and advantages of this application more clearly, the application is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where there is no conflict, the embodiments of this application and the features in the embodiments can be combined with each other.

Many specific details are set forth in the following description to facilitate a full understanding of this application; the described embodiments are only a part of the embodiments of this application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used in the description of the application are only for the purpose of describing specific embodiments and are not intended to limit the application.

Preferably, the image AU detection method of this application is applied in one or more electronic devices. An electronic device is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.

The electronic device may be a computing device such as a desktop computer, a notebook computer, a tablet computer, or a cloud server. The device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
Embodiment 1

FIG. 1 is a schematic diagram of the application environment of an image AU detection method in an embodiment of the present application.

Referring to FIG. 1, the image AU detection method is applied in the terminal device 1. The terminal device 1 includes an image acquisition unit 11, which is used to collect face images. In this embodiment, the terminal device 1 may obtain a face image through the image acquisition unit 11 and perform AU detection on the face image. In this application, 19 AUs from FACS are selected, including 6 upper-half-face AUs and 13 lower-half-face AUs. These 19 AUs are used as the standard for detecting and comparing face images to predict which AU category a face image belongs to.

The terminal device 1 is also communicatively connected to an external device 2. In one embodiment, the terminal device 1 communicates with the external device 2 via a network. In a specific embodiment, the network used to support the communication between the terminal device 1 and the external device 2 may be a wired network or a wireless network, such as radio, wireless fidelity (WIFI), cellular, satellite, broadcast, etc. In this embodiment, the terminal device 1 may be a computer device, a single server, a server cluster, or a cloud server. The external device 2 may be, but is not limited to, a computer device, a mobile phone, a notebook computer, a tablet computer, or another device.
FIG. 2 is a flowchart of the image AU detection method in an embodiment of the present application. Depending on requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.
Referring to FIG. 2, the image AU detection method specifically includes the following steps.
Step S201: obtain a face image.
In this embodiment, the image acquisition unit 11 may be a 2D camera, and the terminal device 1 obtains a 2D face image of the user through the 2D camera as the user's face image. In another embodiment, the image acquisition unit 11 may be a 3D camera, and the terminal device 1 obtains a 3D face image of the user through the 3D camera. In another embodiment, the terminal device 1 receives face pictures sent by the external device 2 communicatively connected with the terminal device. In other embodiments, the face image is stored in a storage device of the terminal device 1, from which the terminal device 1 obtains it. The face image contains consecutive frames of face pictures; for example, in one embodiment, the face pictures may be a face video.
Step S202: perform detection processing on the obtained face image to obtain a unified face area.
In this embodiment, the terminal device 1 may use an Adaboost face detection algorithm based on Haar-like features to perform face detection on each frame of the obtained face image and determine the face area. In a specific implementation, each frame of the face image may be scanned, based on the Adaboost face detection algorithm, with a window of preset size and a preset step until the face area in each frame is determined. In this embodiment, the face area may be a fixed rectangular area of the face image that includes the forehead, chin, left cheek, and right cheek.
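As one possible illustration of this step, the sketch below uses OpenCV's Haar-cascade detector, which is trained with AdaBoost on Haar-like features; the cascade file, scale factor, and minimum window size are illustrative assumptions rather than values fixed by this application.

```python
import cv2

def detect_face_regions(frames):
    """Detect a face rectangle in each frame with a Haar-like/AdaBoost cascade.

    Returns a list of (x, y, w, h) boxes, one per frame (None if no face found).
    """
    # OpenCV ships Haar cascades trained with AdaBoost on Haar-like features.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    boxes = []
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # scaleFactor/minSize stand in for the "preset window size and step".
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                         minNeighbors=5, minSize=(64, 64))
        boxes.append(tuple(faces[0]) if len(faces) else None)
    return boxes
```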
In one embodiment, the terminal device 1 is further configured to calibrate the face area. Specifically, the terminal device 1 detects key feature points in the face area, such as the eyes, nose, mouth, and the outer contours of the left and right cheeks, and aligns the corresponding face image based on the positions of the detected key feature points. According to the key feature points detected in the face area, the face image can be aligned by the landmark method so that the positions of the key feature points are essentially consistent across face images. In this embodiment, to prevent non-uniform image sizes from affecting subsequent recognition results, the terminal device 1 may further edit the aligned face image according to a preset template to obtain face images of uniform size, where the editing includes one or both of cropping and scaling. For example, during editing, the terminal device 1 cuts out the corresponding face image along a uniform template based on the detected key feature points and scales it to a uniform size. In this embodiment, opencv's resize may be used to scale the face image to the uniform size based on bilinear interpolation or area interpolation.
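A minimal sketch of the cropping and scaling step, assuming landmark detection has already produced a face box; cv2.resize with INTER_LINEAR (bilinear) or INTER_AREA (area) corresponds to the two interpolation choices mentioned above, and the 256*256 target size is borrowed from the augmentation example later in the text.

```python
import cv2

def crop_and_normalize(image, box, size=(256, 256), shrinking=False):
    """Cut the face out along a uniform template and scale it to one size."""
    x, y, w, h = box
    face = image[y:y + h, x:x + w]
    # INTER_AREA is typically preferred when shrinking; INTER_LINEAR otherwise.
    interp = cv2.INTER_AREA if shrinking else cv2.INTER_LINEAR
    return cv2.resize(face, size, interpolation=interp)
```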
Step S203: input the face image after detection processing, as an original image, into an optimized ResNet (Residual Neural Network) network for feature extraction to output a face feature vector.
Please refer to FIG. 3, which shows the basic operation structures of the ResNet network in the present application. In this embodiment, the first basic operation structure of the ResNet network is shown in FIG. 3(a): the output of the input after three convolutional layers is superimposed with the original input. The first basic operation structure is used when the input and output matrices have the same size. The second basic operation structure is shown in FIG. 3(b): the output of the input after three convolutional layers is likewise superimposed with the original input, and this structure is used when the input and output matrices differ in size.
Please refer to FIG. 4, which shows a structural diagram of the ResNet network in an embodiment of the present application. As shown in FIG. 4, the overall structure of the ResNet network includes a convolutional layer, a pooling layer, 4 groups of convolution packages with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer. In this embodiment, the original image is processed in turn by the convolutional layer, the pooling layer, the 4 groups of convolution packages, the pooling layer, the fully connected layer, and the sigmoid layer of the ResNet network to obtain the face feature vector. The 4 convolution packages are conv2_x, conv3_x, conv4_x, and conv5_x, respectively, and the second convolutional layer of the first package in each group performs downsampling with a stride of 2.
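The overall layout described above might be sketched in PyTorch as follows. The bottleneck internals, channel widths, and exact downsampling placement follow common ResNet-50 conventions rather than anything fixed by this application, and the 19-way sigmoid head matches the 19 selected AUs; all names here are illustrative assumptions.

```python
import torch.nn as nn
import torchvision

class AUResNet(nn.Module):
    """Conv -> pool -> 4 convolution packages -> pool -> FC -> sigmoid (FIG. 4)."""
    def __init__(self, num_aus=19):
        super().__init__()
        backbone = torchvision.models.resnet50(weights=None)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        # conv2_x .. conv5_x: the four convolution packages with different parameters.
        self.conv2_x, self.conv3_x = backbone.layer1, backbone.layer2
        self.conv4_x, self.conv5_x = backbone.layer3, backbone.layer4
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(2048, num_aus)   # FC features serve as the face feature vector
        self.head = nn.Sigmoid()

    def forward(self, x):
        x = self.stem(x)
        x = self.conv5_x(self.conv4_x(self.conv3_x(self.conv2_x(x))))
        x = self.pool(x).flatten(1)
        return self.head(self.fc(x))
```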
In this embodiment, the step of inputting the face image after detection processing, as an original image, into the optimized ResNet network for feature extraction to output a face feature vector includes the following.
(S2031) Augment the original image to obtain sample data.
In this embodiment, augmenting the original image to obtain sample data specifically includes: obtaining the original image; randomly cropping a picture of preset resolution from the original image to obtain an initial sample picture; obtaining a random number uniformly distributed in [0,1], and when the random number for the initial sample picture is less than a random threshold, flipping the picture and generating a new uniformly distributed random number in [0,1]; when the new random number is less than 0.5, converting the initial sample picture to grayscale to obtain a first sample picture; adding point-light-source processing to the first sample picture to obtain a second sample picture; and taking the obtained initial sample picture, first sample picture, and second sample picture as the sample data.
In this embodiment, the original image is augmented to increase the number of training samples. The augmentation methods include, but are not limited to, flipping, randomly cropping a 248*248 picture from a 256*256 original image, converting the original image to grayscale, modifying its illumination, and adding point-light-source illumination to it, and multiple augmentation methods are coupled together. For example, a 248*248-pixel picture is randomly cropped from a 256*256 original image to obtain an initial sample picture; a random number uniformly distributed in [0,1] is generated, and when it is less than 0.5 the picture is flipped and a new uniformly distributed random number is generated; when the new random number is less than 0.5, the initial sample picture is converted to grayscale to obtain a first sample picture; point-light-source processing may then be applied to the first sample picture to obtain a second sample picture. The augmentation ends when the numbers of initial, first, and second sample pictures meet the requirements of the actual application scenario.
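A sketch of the coupled augmentation procedure above, following the stated 256*256 to 248*248 random crop and the random-number-driven flip/grayscale decisions. The application does not specify the exact form of the point-light-source processing, so it is reduced here to a simple radial brightness gain; the light position and falloff are pure assumptions.

```python
import random
import numpy as np
import cv2

def augment(original):
    """original: 256x256 BGR image -> (initial, first, second) sample pictures."""
    # Randomly crop a 248x248 picture to get the initial sample picture.
    x, y = random.randint(0, 8), random.randint(0, 8)
    initial = original[y:y + 248, x:x + 248]

    sample = initial
    if random.random() < 0.5:          # first uniform [0,1] draw: flip
        sample = cv2.flip(sample, 1)
    first = sample
    if random.random() < 0.5:          # new uniform draw: grayscale
        first = cv2.cvtColor(cv2.cvtColor(sample, cv2.COLOR_BGR2GRAY),
                             cv2.COLOR_GRAY2BGR)

    # Second sample picture: add a point light source (illustrative radial gain).
    h, w = first.shape[:2]
    cx, cy = random.randint(0, w - 1), random.randint(0, h - 1)
    yy, xx = np.mgrid[0:h, 0:w]
    gain = 1.0 + 0.5 * np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * 60.0 ** 2))
    second = np.clip(first * gain[..., None], 0, 255).astype(np.uint8)
    return initial, first, second
```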
(S2032) Train on the sample data to optimize the ResNet network and obtain the optimized ResNet network.
In this embodiment, the ResNet network is optimized by training on the sample data through the first or second basic operation structure of the ResNet network, where the first basic operation structure superimposes the output of three convolutional layers with the original input and is used when the input and output matrices have the same size.
In this embodiment, training on the sample data to optimize the ResNet network includes: training a classification network on face images to obtain a face classification network; and migrating the trained face classification network to train the AU neural network.
In this embodiment, the face classification network may be trained step by step in a migration manner: the parameter of the last fully connected layer is set to the number of face classes, and the sigmoid layer over the 19 AUs at the end of the AU neural network is replaced with a softmax layer. Faces of 100 classes are trained first; when the accuracy reaches 70%, the 100-class result is migrated to 1200-class face classification for training; when the 1200-class accuracy reaches 90%, training is migrated to 16000-class face classification, which is finally trained to as high an accuracy as possible. In this embodiment, the parameters of the ResNet network from the convolutional layer to conv3_x are fixed, and the 16000-class face training parameters are migrated as initial parameters to train conv4_x and the subsequent layers. In this way, the prior knowledge learned from existing face classification is fully utilized to improve AU detection accuracy.
In this embodiment, migration training means loading the face classification network parameters directly into the AU neural network; since the two neural networks differ only in their last layer and the numbers of all other parameters are the same, the parameters can be loaded. The output dimension of the AU neural network is low, with only 19 results, while that of the face classification network is high. The face classification training result is used and migrated while some layers are locked, so that the face structure features learned in face classification are fully utilized in AU detection.
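The parameter loading and layer locking described above might look as follows in PyTorch, reusing the AUResNet sketch from earlier; the checkpoint path, state-dict filtering, and attribute names are assumptions, while the choice to freeze everything up to conv3_x and train conv4_x onward mirrors the text.

```python
import torch

def migrate_from_face_classifier(au_net, face_ckpt_path):
    """Load face-classification weights into the AU network and lock early layers."""
    face_state = torch.load(face_ckpt_path, map_location="cpu")
    # Only the last layer differs between the two networks, so every parameter
    # whose name and shape match can be loaded directly.
    au_state = au_net.state_dict()
    shared = {k: v for k, v in face_state.items()
              if k in au_state and v.shape == au_state[k].shape}
    au_state.update(shared)
    au_net.load_state_dict(au_state)

    # Fix the parameters from the stem up to conv3_x; conv4_x and later train.
    for module in (au_net.stem, au_net.conv2_x, au_net.conv3_x):
        for p in module.parameters():
            p.requires_grad = False
    return au_net
```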
(S2033) Input the face image to be detected into the optimized ResNet network to obtain the face feature vector.
In this embodiment, after the ResNet network has been optimized, the face image after detection processing is input into the optimized ResNet network, and the features of the fully connected layer are taken as the face feature vector output by the ResNet network.
Step S204: input the face feature vector output by the ResNet network into an LSTM (Long Short-Term Memory) network for training, and obtain the AU recognition result of the face image.
In this embodiment, the LSTM network is a special recurrent neural network. The LSTM network treats the input sequence as a time series and can learn both short-term and long-term temporal dependencies of the data in the input sequence. In this embodiment, the AU recognition result indicates the AU category of the face image.
Please refer to FIG. 5, which is a schematic diagram of the sequential processing flow of the LSTM network in an embodiment of the present application. Here X0, X1, ..., Xn are the individual frames of a face image of length n frames; face feature vectors Y0, Y1, ..., Yn are extracted from the frames by the ResNet network and input into the LSTM network in chronological order, and the LSTM network outputs the AU recognition results h0, h1, ..., hn at the different times.
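The per-frame feature extraction followed by chronological LSTM processing in FIG. 5 can be sketched as below. The feature width, hidden size, and the assumption that the backbone returns pooled per-frame features are illustrative, not prescribed by the application.

```python
import torch.nn as nn

class AUSequenceModel(nn.Module):
    """X0..Xn -> ResNet features Y0..Yn -> LSTM -> AU results h0..hn (FIG. 5)."""
    def __init__(self, feature_extractor, feat_dim=2048, hidden=256, num_aus=19):
        super().__init__()
        self.backbone = feature_extractor              # e.g. the AUResNet trunk
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.out = nn.Sequential(nn.Linear(hidden, num_aus), nn.Sigmoid())

    def forward(self, frames):                         # frames: (batch, n, C, H, W)
        b, n = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))    # (b*n, feat_dim) per frame
        feats = feats.view(b, n, -1)                   # restore chronological order
        h, _ = self.lstm(feats)                        # hidden state per time step
        return self.out(h)                             # h0..hn AU results
```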
In this embodiment, the LSTM network includes an input gate, a forget gate, an output gate, a state unit (cell), and an LSTM output gate.
When the input face image sequence contains more than two frames of face images, the processing of the input gate, forget gate, output gate, state unit, and LSTM output gate can be computed by the following formulas, respectively:
i_t = \sigma(W_{ix} \cdot x_t + W_{im} \cdot m_{t-1} + W_{ic} \cdot c_{t-1} + b_i);
f_t = \sigma(W_{fx} \cdot x_t + W_{fm} \cdot m_{t-1} + W_{fc} \cdot c_{t-1} + b_f);
c_t = f_t \odot c_{t-1} + i_t \odot \sigma(W_{cx} \cdot x_t + W_{cm} \cdot m_{t-1} + b_c);
o_t = \sigma(W_{ox} \cdot x_t + W_{om} \cdot m_{t-1} + W_{oc} \cdot c_{t-1} + b_o);
m_t = o_t \odot h(c_t).
In the above formulas, x_t denotes the face feature vector input at time t; W (i.e., W_{ix}, W_{im}, W_{ic}, W_{fx}, W_{fm}, W_{fc}, W_{cx}, W_{cm}, W_{ox}, W_{om}, and W_{oc}) are preset weight matrices, meaning that the elements of each gate are obtained from data of the corresponding dimension, i.e., nodes of different dimensions do not interfere with each other; b (i.e., b_i, b_f, b_c, b_o) are preset bias vectors; i_t, f_t, o_t, c_t, and m_t denote the states of the input gate, forget gate, output gate, state unit, and LSTM output gate at time t, respectively; \odot is the element-wise product; \sigma(\cdot) is the sigmoid function; and h(\cdot) is the output activation function of the state unit, which may specifically be the tanh function.
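For concreteness, one time step of the gate equations above can be written directly in NumPy. The dictionary layout of W and b is an assumption; the peephole terms (W_{ic}, W_{fc}, W_{oc}) are applied element-wise, matching the statement that nodes of different dimensions do not interfere, and h is taken as tanh as the text suggests.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, m_prev, c_prev, W, b):
    """One step of the gate equations; W and b hold the preset weights/biases."""
    i_t = sigmoid(W["ix"] @ x_t + W["im"] @ m_prev + W["ic"] * c_prev + b["i"])
    f_t = sigmoid(W["fx"] @ x_t + W["fm"] @ m_prev + W["fc"] * c_prev + b["f"])
    c_t = f_t * c_prev + i_t * sigmoid(W["cx"] @ x_t + W["cm"] @ m_prev + b["c"])
    # o_t uses c_{t-1}, exactly as in the formula above.
    o_t = sigmoid(W["ox"] @ x_t + W["om"] @ m_prev + W["oc"] * c_prev + b["o"])
    m_t = o_t * np.tanh(c_t)     # h() taken as tanh
    return m_t, c_t
```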
The specific process of training the LSTM network in this embodiment is as follows: the face feature vectors extracted from the frames of the face image by the ResNet network are input into the LSTM network, and the LSTM network is trained based on the back-propagation algorithm so that the deviation between the value output for an input image after processing by the LSTM network and the mapping value of the expression category to which the image belongs is within a preset allowable range. Of course, the training process of the LSTM network may also be implemented with reference to other existing technical solutions, which is not limited here.
The image AU detection method of the present application builds a training model based on the ResNet network and the LSTM network and takes a set of consecutive frame images of a face (for example, a video) as the training input of the model, so that the model can fully exploit the dynamic information of facial AU changes and automatically learn the mapping relationships among the AU features of the recognized object, thereby improving the prediction accuracy and robustness of the training model and, in turn, the AU recognition performance on face images.
Example 2
FIG. 6 is a structural diagram of an image AU detection device 40 in an embodiment of the present application.
In some embodiments, the image AU detection device 40 runs in the terminal device 1. The image AU detection device 40 may include multiple functional modules composed of program code segments. The program code of each segment in the image AU detection device 40 can be stored in a memory and executed by at least one processor to perform the face recognition function.
In this embodiment, the image AU detection device 40 can be divided into multiple functional modules according to the functions it performs. Referring to FIG. 6, the image AU detection device 40 may include an acquisition module 401, a preprocessing module 402, a feature extraction module 403, and a recognition module 404. A module referred to in this application is a series of computer-readable instruction segments that can be executed by at least one processor and can complete a fixed function, stored in a memory. In some embodiments, the functions of the modules are detailed below.
The acquisition module 401 is used to obtain a face image.
In this embodiment, the image acquisition unit 11 may be a 2D camera, and the acquisition module 401 obtains a 2D face image of the user through the 2D camera as the user's face image. In another embodiment, the image acquisition unit 11 may be a 3D camera, and the acquisition module 401 obtains a 3D face image of the user through the 3D camera. In another embodiment, the acquisition module 401 receives face pictures sent by the external device 2 communicatively connected with the terminal device. In other embodiments, the face image is stored in a storage device of the terminal device 1, from which the acquisition module 401 obtains it. The face image contains consecutive frames of face pictures; for example, in one embodiment, the face pictures may be a face video.
The preprocessing module 402 is used to perform detection processing on the obtained face image to obtain a unified face area.
In this embodiment, the preprocessing module 402 may use the Adaboost face detection algorithm based on Haar-like features to perform face detection on each frame of the obtained face image and determine the face area. In a specific implementation, each frame of the face image may be scanned, based on the Adaboost face detection algorithm, with a window of preset size and a preset step until the face area in each frame is determined. In this embodiment, the face area may be a fixed rectangular area of the face image that includes the forehead, chin, left cheek, and right cheek.
In one embodiment, the preprocessing module 402 is further configured to calibrate the face area. Specifically, the preprocessing module 402 detects key feature points in the face area, such as the eyes, nose, mouth, and the outer contours of the left and right cheeks, and aligns the corresponding face image based on the positions of the detected key feature points. According to the key feature points detected in the face area, the face image can be aligned by the landmark method so that the positions of the key feature points are essentially consistent across face images. In this embodiment, to prevent non-uniform image sizes from affecting subsequent recognition results, the preprocessing module 402 may further edit the aligned face image according to a preset template to obtain face images of uniform size, where the editing includes one or both of cropping and scaling. For example, during editing, the preprocessing module 402 cuts out the corresponding face image along a uniform template based on the detected key feature points and scales it to a uniform size. In this embodiment, opencv's resize may be used to scale the face image to the uniform size based on bilinear interpolation or area interpolation.
The feature extraction module 403 is used to input the face image after detection processing, as an original image, into the optimized ResNet (Residual Neural Network) network for feature extraction to output a face feature vector.
In this embodiment, the first basic operation structure of the ResNet network is shown in FIG. 3(a): the output of the input after three convolutional layers is superimposed with the original input, and this structure is used when the input and output matrices have the same size. The second basic operation structure is shown in FIG. 3(b): the output of the input after three convolutional layers is likewise superimposed with the original input, and this structure is used when the input and output matrices differ in size.
As shown in FIG. 4, the overall structure of the ResNet network includes a convolutional layer, a pooling layer, 4 groups of convolution packages with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer. In this embodiment, the original image is processed in turn by these layers to obtain the face feature vector. The 4 convolution packages are conv2_x, conv3_x, conv4_x, and conv5_x, respectively, and the second convolutional layer of the first package in each group performs downsampling with a stride of 2.
In this embodiment, inputting the face image after detection processing, as an original image, into the optimized ResNet network for feature extraction to output a face feature vector includes the following.
(S2031) Augment the original image to obtain sample data.
In this embodiment, augmenting the original image to obtain sample data specifically includes: obtaining the original image; randomly cropping a picture of preset resolution from the original image to obtain an initial sample picture; obtaining a random number uniformly distributed in [0,1], and when the random number for the initial sample picture is less than a random threshold, flipping the picture and generating a new uniformly distributed random number in [0,1]; when the new random number is less than 0.5, converting the initial sample picture to grayscale to obtain a first sample picture; adding point-light-source processing to the first sample picture to obtain a second sample picture; and taking the obtained initial sample picture, first sample picture, and second sample picture as the sample data.
In this embodiment, the original image is augmented to increase the number of training samples. The augmentation methods include, but are not limited to, flipping, randomly cropping a 248*248 picture from a 256*256 original image, converting the original image to grayscale, modifying its illumination, and adding point-light-source illumination to it, and multiple augmentation methods are coupled together. For example, a 248*248-pixel picture is randomly cropped from a 256*256 original image to obtain an initial sample picture; a random number uniformly distributed in [0,1] is generated, and when it is less than 0.5 the picture is flipped and a new uniformly distributed random number is generated; when the new random number is less than 0.5, the initial sample picture is converted to grayscale to obtain a first sample picture; point-light-source processing may then be applied to the first sample picture to obtain a second sample picture. The augmentation ends when the numbers of initial, first, and second sample pictures meet the requirements of the actual application scenario.
(S2032) Train on the sample data to optimize the ResNet network and obtain the optimized ResNet network.
In this embodiment, the ResNet network is optimized by training on the sample data through the first or second basic operation structure of the ResNet network, where the first basic operation structure superimposes the output of three convolutional layers with the original input and is used when the input and output matrices have the same size.
In this embodiment, training on the sample data to optimize the ResNet network includes: training a classification network on face images to obtain a face classification network; and migrating the trained face classification network to train the AU neural network.
In this embodiment, the face classification network may be trained step by step in a migration manner: the parameter of the last fully connected layer is set to the number of face classes, and the sigmoid layer over the 19 AUs at the end of the AU neural network is replaced with a softmax layer. Faces of 100 classes are trained first; when the accuracy reaches 70%, the 100-class result is migrated to 1200-class face classification for training; when the 1200-class accuracy reaches 90%, training is migrated to 16000-class face classification, which is finally trained to as high an accuracy as possible. In this embodiment, the parameters of the ResNet network from the convolutional layer to conv3_x are fixed, and the 16000-class face training parameters are migrated as initial parameters to train conv4_x and the subsequent layers. In this way, the prior knowledge learned from existing face classification is fully utilized to improve AU detection accuracy.
In this embodiment, migration training means loading the face classification network parameters directly into the AU neural network; since the two neural networks differ only in their last layer and the numbers of all other parameters are the same, the parameters can be loaded. The output dimension of the AU neural network is low, with only 19 results, while that of the face classification network is high. The face classification training result is used and migrated while some layers are locked, so that the face structure features learned in face classification are fully utilized in AU detection.
(S2033) Input the face image to be detected into the optimized ResNet network to obtain the face feature vector.
In this embodiment, after the ResNet network has been optimized, the face image after detection processing is input into the optimized ResNet network, and the features of the fully connected layer are taken as the face feature vector output by the ResNet network.
The recognition module 404 is used to input the face feature vector output by the ResNet network into an LSTM (Long Short-Term Memory) network for training to obtain the AU recognition result of the face image.
In this embodiment, the LSTM network is a special recurrent neural network. The LSTM network treats the input sequence as a time series and can learn both short-term and long-term temporal dependencies of the data in the input sequence. In this embodiment, the AU recognition result indicates the AU category of the face image.
FIG. 5 is a schematic diagram of the sequential processing flow of the LSTM network in an embodiment of the present application. Here X0, X1, ..., Xn are the individual frames of a face image of length n frames; face feature vectors Y0, Y1, ..., Yn are extracted from the frames by the ResNet network and input into the LSTM network in chronological order, and the LSTM network outputs the AU recognition results h0, h1, ..., hn at the different times.
In this embodiment, the LSTM network includes an input gate, a forget gate, an output gate, a state unit (cell), and an LSTM output gate.
When the input face image sequence contains more than two frames of face images, the processing of the input gate, forget gate, output gate, state unit, and LSTM output gate can be computed by the following formulas, respectively:
i_t = \sigma(W_{ix} \cdot x_t + W_{im} \cdot m_{t-1} + W_{ic} \cdot c_{t-1} + b_i);
f_t = \sigma(W_{fx} \cdot x_t + W_{fm} \cdot m_{t-1} + W_{fc} \cdot c_{t-1} + b_f);
c_t = f_t \odot c_{t-1} + i_t \odot \sigma(W_{cx} \cdot x_t + W_{cm} \cdot m_{t-1} + b_c);
o_t = \sigma(W_{ox} \cdot x_t + W_{om} \cdot m_{t-1} + W_{oc} \cdot c_{t-1} + b_o);
m_t = o_t \odot h(c_t).
In the above formulas, x_t denotes the face feature vector input at time t; W (i.e., W_{ix}, W_{im}, W_{ic}, W_{fx}, W_{fm}, W_{fc}, W_{cx}, W_{cm}, W_{ox}, W_{om}, and W_{oc}) are preset weight matrices, meaning that the elements of each gate are obtained from data of the corresponding dimension, i.e., nodes of different dimensions do not interfere with each other; b (i.e., b_i, b_f, b_c, b_o) are preset bias vectors; i_t, f_t, o_t, c_t, and m_t denote the states of the input gate, forget gate, output gate, state unit, and LSTM output gate at time t, respectively; \odot is the element-wise product; \sigma(\cdot) is the sigmoid function; and h(\cdot) is the output activation function of the state unit, which may specifically be the tanh function.
The specific process of training the LSTM network in this embodiment is as follows: the face feature vectors extracted from the frames of the face image by the ResNet network are input into the LSTM network, and the LSTM network is trained based on the back-propagation algorithm so that the deviation between the value output for an input image after processing by the LSTM network and the mapping value of the expression category to which the image belongs is within a preset allowable range. Of course, the training process of the LSTM network may also be implemented with reference to other existing technical solutions, which is not limited here.
The image AU detection method of the present application builds a training model based on the ResNet network and the LSTM network and takes a set of consecutive frame images of a face (for example, a video) as the training input of the model, so that the model can fully exploit the dynamic information of facial AU changes and automatically learn the mapping relationships among the AU features of the recognized object, thereby improving the prediction accuracy and robustness of the training model and, in turn, the AU recognition performance on face images.
Example 3
FIG. 7 is a schematic diagram of an electronic device 6 in an embodiment of the present application.
The electronic device 6 includes a memory 61, a processor 62, and computer-readable instructions 63 stored in the memory 61 and executable on the processor 62. When the processor 62 executes the computer-readable instructions 63, the steps in the above embodiments of the image AU detection method are implemented, such as steps S201 to S204 shown in FIG. 2. Alternatively, when the processor 62 executes the computer-readable instructions 63, the functions of the modules/units in the above embodiment of the image AU detection device are implemented, such as modules 401 to 404 in FIG. 6.
Exemplarily, the computer-readable instructions 63 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 62 to complete the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the segments describe the execution process of the computer-readable instructions 63 in the electronic device 6. For example, the computer-readable instructions 63 may be divided into the acquisition module 401, the preprocessing module 402, the feature extraction module 403, and the recognition module 404 in FIG. 6; for the specific functions of each module, see Example 2.
The electronic device 6 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud terminal device. Those skilled in the art will understand that the schematic diagram is only an example of the electronic device 6 and does not constitute a limitation on it; the electronic device 6 may include more or fewer components than shown, or combine certain components, or have different components. For example, the electronic device 6 may also include input/output devices, network access devices, buses, and so on.
The processor 62 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor, or the processor 62 may be any conventional processor; the processor 62 is the control center of the electronic device 6 and connects all parts of the entire electronic device 6 through various interfaces and lines.
In one embodiment, the present application provides an electronic device, wherein the electronic device includes a processor configured to execute computer-readable instructions stored in a memory to implement the following steps:
obtaining a face image;
performing detection processing on the obtained face image to obtain a unified face area;
inputting the face image after detection processing, as an original image, into the optimized ResNet network for feature extraction to output a face feature vector; and
inputting the face feature vector output by the ResNet network into an LSTM network for training to obtain the AU recognition result of the face image.
When executing the computer-readable instructions, the processor further implements the following step:
scanning, based on the Adaboost face detection algorithm, each frame of the face image with a window of preset size and a preset step until the face area in each frame is determined, wherein the face area is a fixed rectangular area including the forehead, chin, left cheek, and right cheek.
When executing the computer-readable instructions, the processor further implements the following steps:
detecting key feature points in the face area and aligning the face image based on the positions of the detected key feature points, wherein the key feature points in the face area include the eyes, nose, mouth, and the outer contours of the left and right cheeks; and
editing the aligned face image according to a preset template to obtain face images of uniform size.
The ResNet network structure includes a convolutional layer, a pooling layer, 4 groups of convolution packages with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer.
When executing the computer-readable instructions, the processor further implements the following steps:
augmenting the original image to obtain training sample data;
training on the sample data to optimize the ResNet network to obtain the optimized ResNet network; and
inputting the face image to be detected into the optimized ResNet network to obtain the face feature vector.
When executing the computer-readable instructions, the processor further implements the following steps:
obtaining the original image;
randomly cropping a picture of preset resolution from the original image to obtain an initial sample picture;
obtaining a random number uniformly distributed in [0,1], and when the random number for the initial sample picture is less than a random threshold, flipping the picture and generating a new uniformly distributed random number in [0,1];
when the new random number is less than 0.5, converting the initial sample picture to grayscale to obtain a first sample picture;
adding point-light-source processing to the first sample picture to obtain a second sample picture;
and taking the obtained initial sample picture, first sample picture, and second sample picture as the sample data.
The memory 61 may be used to store the computer-readable instructions 63 and/or modules/units. The processor 62 implements the various functions of the electronic device 6 by running or executing the computer-readable instructions and/or modules/units stored in the memory 61 and calling the data stored in the memory 61. The memory 61 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created according to the use of the electronic device 6 (such as audio data or a phone book). In addition, the memory 61 may include a high-speed random access memory, and may also include a non-volatile memory such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
If the modules/units integrated in the electronic device 6 are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the above method embodiments of the present application may also be completed by instructing relevant hardware through computer-readable instructions. The computer-readable instructions may be stored in a computer-readable storage medium, which may be non-volatile or volatile, and when executed by a processor, the computer-readable instructions implement the steps of the above method embodiments. The computer-readable instructions include computer-readable instruction code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium may be appropriately added or deleted in accordance with the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
In one embodiment, the present application further provides one or more readable storage media storing computer-readable instructions, wherein, when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
obtaining a face image;
performing detection processing on the obtained face image to obtain a unified face area;
inputting the face image after detection processing, as an original image, into the optimized ResNet network for feature extraction to output a face feature vector; and
inputting the face feature vector output by the ResNet network into an LSTM network for training to obtain the AU recognition result of the face image.
When executed by the one or more processors, the computer-readable instructions further cause the one or more processors to perform the following step:
scanning, based on the Adaboost face detection algorithm, each frame of the face image with a window of preset size and a preset step until the face area in each frame is determined, wherein the face area is a fixed rectangular area including the forehead, chin, left cheek, and right cheek.
When executed by the one or more processors, the computer-readable instructions further cause the one or more processors to perform the following steps:
detecting key feature points in the face area and aligning the face image based on the positions of the detected key feature points, wherein the key feature points in the face area include the eyes, nose, mouth, and the outer contours of the left and right cheeks; and
editing the aligned face image according to a preset template to obtain face images of uniform size.
The ResNet network structure includes a convolutional layer, a pooling layer, 4 groups of convolution packages with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer.
When executed by the one or more processors, the computer-readable instructions further cause the one or more processors to perform the following steps:
augmenting the original image to obtain training sample data;
training on the sample data to optimize the ResNet network to obtain the optimized ResNet network; and
inputting the face image to be detected into the optimized ResNet network to obtain the face feature vector.
When executed by the one or more processors, the computer-readable instructions further cause the one or more processors to perform the following steps:
obtaining the original image;
randomly cropping a picture of preset resolution from the original image to obtain an initial sample picture;
obtaining a random number uniformly distributed in [0,1], and when the random number for the initial sample picture is less than a random threshold, flipping the picture and generating a new uniformly distributed random number in [0,1];
when the new random number is less than 0.5, converting the initial sample picture to grayscale to obtain a first sample picture;
adding point-light-source processing to the first sample picture to obtain a second sample picture;
and taking the obtained initial sample picture, first sample picture, and second sample picture as the sample data.
In the several embodiments provided in this application, it should be understood that the disclosed electronic device and method may be implemented in other ways. For example, the electronic device embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other division methods are possible in actual implementation.
In addition, the functional modules in the embodiments of the present application may be integrated in the same processing module, or each module may exist alone physically, or two or more modules may be integrated in the same module. The above integrated modules may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments and can be implemented in other specific forms without departing from the spirit or basic characteristics of the application. Therefore, from whatever point of view, the embodiments should be regarded as exemplary and non-limiting. The scope of this application is defined by the appended claims rather than by the above description, and all changes falling within the meaning and scope of equivalent elements of the claims are therefore intended to be embraced in this application. Any reference sign in the claims should not be regarded as limiting the claim involved. In addition, it is obvious that the word "including" does not exclude other modules or steps, and the singular does not exclude the plural. Multiple modules or devices stated in an electronic device claim may also be implemented by the same module or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any specific order.
Finally, it should be noted that the above embodiments are only used to illustrate, and not to limit, the technical solutions of the present application. Although the application has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the application can be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of this application.

Claims (20)

  1. An image AU detection method, wherein the method comprises:
    obtaining a face image;
    performing detection processing on the obtained face image to obtain a unified face area;
    inputting the face image that has undergone the detection processing, as an original image, into an optimized ResNet network for feature value extraction, so as to output a face feature vector; and
    inputting the face feature vector output by the ResNet network into an LSTM network for training, to obtain an AU recognition result of the face image.
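Purely as an editorial illustration, and not part of the claims, the following sketch shows how the pipeline of claim 1 might be wired together in PyTorch; the resnet18 backbone, the hidden size, and the number of AU labels are assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class AUPipeline(nn.Module):
    """Sketch of claim 1: ResNet features per frame, an LSTM over the
    frame sequence, and a sigmoid head for multi-label AU outputs.
    resnet18, hidden size 256, and 17 AU labels are assumptions."""
    def __init__(self, num_aus=17, hidden=256):
        super().__init__()
        backbone = resnet18()
        # Drop the classification head; keep the pooled feature extractor.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, num_aus), nn.Sigmoid())

    def forward(self, frames):  # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.features(frames.flatten(0, 1)).flatten(1)  # (b*t, 512)
        seq, _ = self.lstm(feats.view(b, t, -1))
        return self.head(seq[:, -1])  # AU probabilities for the sequence
```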
  2. The image AU detection method of claim 1, wherein the performing detection processing on the obtained face image to obtain a unified face area comprises:
    scanning, based on the Adaboost face detection algorithm, each frame of the face image with a window of a preset size and a preset step until the face area in each frame of image is determined, wherein the face area is a fixed rectangular region comprising the forehead, chin, left cheek, and right cheek.
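As an illustrative aside (not part of the claims), OpenCV ships an AdaBoost-trained cascade detector that performs this kind of sliding-window scan; the cascade file and the scan parameters below are assumptions, not values from the patent.

```python
import cv2

# A Haar cascade is an AdaBoost-trained sliding-window face detector,
# analogous to the scan recited in claim 2.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(frame_bgr):
    """Return the first detected face box (x, y, w, h), or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # scaleFactor/minNeighbors/minSize stand in for the preset window
    # size and step; these values are assumptions.
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                     minNeighbors=5, minSize=(60, 60))
    return boxes[0] if len(boxes) else None
```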
  3. The image AU detection method of claim 1, wherein the performing detection processing on the obtained face image to obtain a unified face area comprises:
    detecting key feature points in the face area, and performing alignment calibration on the face image based on the positions of the detected key feature points, wherein the key feature points in the face area include the eyes, the nose, the mouth, the outer contour of the left cheek, and the outer contour of the right cheek; and
    editing the aligned and calibrated face image according to a preset template to obtain face images of a uniform size.
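Again for illustration only, alignment calibration of the kind recited in claim 3 can be sketched as an affine warp that maps detected key points onto a preset template; the template coordinates and output size below are assumptions.

```python
import cv2
import numpy as np

# Illustrative template positions (assumptions): where the left eye,
# right eye, and mouth centre should land in a 224x224 aligned crop.
TEMPLATE = np.float32([[74, 90], [150, 90], [112, 160]])

def align_face(image, left_eye, right_eye, mouth, size=224):
    """Warp the image so three detected key points match the preset
    template, yielding a uniformly sized, aligned face image."""
    src = np.float32([left_eye, right_eye, mouth])
    matrix = cv2.getAffineTransform(src, TEMPLATE)
    return cv2.warpAffine(image, matrix, (size, size))
```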
  4. The image AU detection method of claim 1, wherein the ResNet network structure comprises a convolutional layer, a pooling layer, four groups of convolution blocks with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer.
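A hedged sketch of the layer sequence recited in claim 4, with residual (ResNet-style) blocks standing in for the "convolution blocks"; the channel widths and number of outputs are assumptions.

```python
import torch.nn as nn

class Residual(nn.Module):
    """A standard residual block; the claim does not fix its internals."""
    def __init__(self, in_ch, out_ch, stride):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch))
        self.skip = (nn.Identity() if in_ch == out_ch and stride == 1
                     else nn.Conv2d(in_ch, out_ch, 1, stride=stride))
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.body(x) + self.skip(x))

class SmallResNet(nn.Module):
    """Claim-4 layer sequence: conv, pool, four groups of differently
    parameterised blocks, pool, fully connected layer, sigmoid."""
    def __init__(self, num_aus=17):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.MaxPool2d(3, stride=2, padding=1))
        self.groups = nn.Sequential(
            Residual(64, 64, 1), Residual(64, 128, 2),
            Residual(128, 256, 2), Residual(256, 512, 2))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(512, num_aus)
        self.out = nn.Sigmoid()

    def forward(self, x):
        x = self.pool(self.groups(self.stem(x))).flatten(1)
        return self.out(self.fc(x))
```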
  5. The image AU detection method of claim 4, wherein the inputting the face image that has undergone the detection processing, as an original image, into the optimized ResNet network for feature value extraction to output a face feature vector comprises:
    augmenting the original image to obtain training sample data;
    training on the sample data to optimize the ResNet network, so as to obtain an optimized ResNet network; and
    inputting the face image to be detected into the optimized ResNet network to obtain a face feature vector.
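For illustration (not part of the claims), the training step of claim 5 could look like the following; the optimizer, loss, epoch count, and learning rate are assumptions, with BCELoss chosen to match the sigmoid output layer of claim 4.

```python
import torch
import torch.nn as nn

def optimise_resnet(net, loader, epochs=10, lr=1e-3):
    """Fit the ResNet on augmented sample data with a multi-label loss.
    Hyperparameters are assumptions; the patent leaves them open."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.BCELoss()  # pairs with the sigmoid output layer
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(net(images), labels)
            loss.backward()
            opt.step()
    return net
```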
  6. The image AU detection method of claim 5, wherein the augmenting the original image to obtain training sample data comprises:
    acquiring the original image;
    randomly cropping a picture of a preset resolution in pixels from the original image to obtain an initial sample picture;
    obtaining a random number uniformly distributed over [0, 1], and when the random number of the initial sample picture is less than a random threshold, flipping the picture and generating a new random number uniformly distributed over [0, 1];
    when the random number is less than 0.5, converting the initial sample picture to grayscale to obtain a first sample picture;
    adding a point light source to the first sample picture to obtain a second sample picture; and
    using the obtained initial sample picture, first sample picture, and second sample picture as the sample data.
  7. The image AU detection method of claim 1, wherein the LSTM network comprises an input gate, a forget gate, an output gate, a state unit, and an LSTM output gate, computed by the following formulas:

    $i_t = \sigma(W_{ix} x_t + W_{im} m_{t-1} + W_{ic} c_{t-1} + b_i)$

    $f_t = \sigma(W_{fx} x_t + W_{fm} m_{t-1} + W_{fc} c_{t-1} + b_f)$

    $c_t = f_t \odot c_{t-1} + i_t \odot \sigma(W_{cx} x_t + W_{cm} m_{t-1} + b_c)$

    $o_t = \sigma(W_{ox} x_t + W_{om} m_{t-1} + W_{oc} c_{t-1} + b_o)$

    $m_t = o_t \odot h(c_t)$

    where $x_t$ is the face feature vector input at time $t$; $W_{ix}$, $W_{im}$, $W_{ic}$, $W_{fx}$, $W_{fm}$, $W_{fc}$, $W_{cx}$, $W_{cm}$, $W_{ox}$, $W_{om}$, and $W_{oc}$ are preset weight matrices; $b_i$, $b_f$, $b_c$, and $b_o$ are preset bias vectors; $i_t$, $f_t$, $o_t$, $c_t$, and $m_t$ are the states at time $t$ of the input gate, forget gate, output gate, state unit, and LSTM output gate, respectively; $\sigma(\cdot)$ is the sigmoid function; and $h(\cdot)$ is the output activation function of the state unit, which is the tanh function.
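The recited equations transcribe directly into code. Below is a sketch of a single time step; note that, following the claim, the candidate state uses the sigmoid function rather than the tanh of the standard LSTM formulation. The weight and bias shapes are the caller's assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, m_prev, c_prev, W, b):
    """One step of the claim-7 LSTM cell, transcribing the recited
    equations. W and b are dicts of the preset weight matrices and
    bias vectors keyed by subscript (e.g. W["ix"], b["i"])."""
    i = sigmoid(W["ix"] @ x_t + W["im"] @ m_prev + W["ic"] @ c_prev + b["i"])
    f = sigmoid(W["fx"] @ x_t + W["fm"] @ m_prev + W["fc"] @ c_prev + b["f"])
    # The claim applies sigma (not tanh) to the candidate state.
    c = f * c_prev + i * sigmoid(W["cx"] @ x_t + W["cm"] @ m_prev + b["c"])
    o = sigmoid(W["ox"] @ x_t + W["om"] @ m_prev + W["oc"] @ c_prev + b["o"])
    m = o * np.tanh(c)  # h() is tanh per the claim
    return m, c
```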
  8. An image AU detection apparatus, wherein the apparatus comprises:
    an acquisition module, configured to acquire a face image;
    a preprocessing module, configured to perform detection processing on the acquired face image to obtain a unified face area;
    a feature extraction module, configured to input the face image that has undergone the detection processing, as an original image, into an optimized ResNet network for feature value extraction, so as to output a face feature vector; and
    a recognition module, configured to input the face feature vector output by the ResNet network into an LSTM network for training, to obtain an AU recognition result of the face image.
  9. An electronic device, wherein the electronic device comprises a processor, and the processor is configured to execute computer-readable instructions stored in a memory to implement the following steps:
    obtaining a face image;
    performing detection processing on the obtained face image to obtain a unified face area;
    inputting the face image that has undergone the detection processing, as an original image, into an optimized ResNet network for feature value extraction, so as to output a face feature vector; and
    inputting the face feature vector output by the ResNet network into an LSTM network for training, to obtain an AU recognition result of the face image.
  10. The electronic device of claim 9, wherein the processor, when executing the computer-readable instructions, further implements the following steps:
    scanning, based on the Adaboost face detection algorithm, each frame of the face image with a window of a preset size and a preset step until the face area in each frame of image is determined, wherein the face area is a fixed rectangular region comprising the forehead, chin, left cheek, and right cheek.
  11. The electronic device of claim 9, wherein the processor, when executing the computer-readable instructions, further implements the following steps:
    detecting key feature points in the face area, and performing alignment calibration on the face image based on the positions of the detected key feature points, wherein the key feature points in the face area include the eyes, the nose, the mouth, the outer contour of the left cheek, and the outer contour of the right cheek; and
    editing the aligned and calibrated face image according to a preset template to obtain face images of a uniform size.
  12. The electronic device of claim 9, wherein the ResNet network structure comprises a convolutional layer, a pooling layer, four groups of convolution blocks with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer.
  13. The electronic device of claim 12, wherein the processor, when executing the computer-readable instructions, further implements the following steps:
    augmenting the original image to obtain training sample data;
    training on the sample data to optimize the ResNet network, so as to obtain an optimized ResNet network; and
    inputting the face image to be detected into the optimized ResNet network to obtain a face feature vector.
  14. The electronic device of claim 13, wherein the processor, when executing the computer-readable instructions, further implements the following steps:
    acquiring the original image;
    randomly cropping a picture of a preset resolution in pixels from the original image to obtain an initial sample picture;
    obtaining a random number uniformly distributed over [0, 1], and when the random number of the initial sample picture is less than a random threshold, flipping the picture and generating a new random number uniformly distributed over [0, 1];
    when the random number is less than 0.5, converting the initial sample picture to grayscale to obtain a first sample picture;
    adding a point light source to the first sample picture to obtain a second sample picture; and
    using the obtained initial sample picture, first sample picture, and second sample picture as the sample data.
  15. One or more readable storage media storing computer-readable instructions, wherein, when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
    obtaining a face image;
    performing detection processing on the obtained face image to obtain a unified face area;
    inputting the face image that has undergone the detection processing, as an original image, into an optimized ResNet network for feature value extraction, so as to output a face feature vector; and
    inputting the face feature vector output by the ResNet network into an LSTM network for training, to obtain an AU recognition result of the face image.
  16. The readable storage medium of claim 15, wherein, when executed by one or more processors, the computer-readable instructions further cause the one or more processors to perform the following steps:
    scanning, based on the Adaboost face detection algorithm, each frame of the face image with a window of a preset size and a preset step until the face area in each frame of image is determined, wherein the face area is a fixed rectangular region comprising the forehead, chin, left cheek, and right cheek.
  17. The readable storage medium of claim 15, wherein, when executed by one or more processors, the computer-readable instructions further cause the one or more processors to perform the following steps:
    detecting key feature points in the face area, and performing alignment calibration on the face image based on the positions of the detected key feature points, wherein the key feature points in the face area include the eyes, the nose, the mouth, the outer contour of the left cheek, and the outer contour of the right cheek; and
    editing the aligned and calibrated face image according to a preset template to obtain face images of a uniform size.
  18. The readable storage medium of claim 15, wherein the ResNet network structure comprises a convolutional layer, a pooling layer, four groups of convolution blocks with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer.
  19. The readable storage medium of claim 18, wherein, when executed by one or more processors, the computer-readable instructions further cause the one or more processors to perform the following steps:
    augmenting the original image to obtain training sample data;
    training on the sample data to optimize the ResNet network, so as to obtain an optimized ResNet network; and
    inputting the face image to be detected into the optimized ResNet network to obtain a face feature vector.
  20. The readable storage medium of claim 19, wherein, when executed by one or more processors, the computer-readable instructions further cause the one or more processors to perform the following steps:
    acquiring the original image;
    randomly cropping a picture of a preset resolution in pixels from the original image to obtain an initial sample picture;
    obtaining a random number uniformly distributed over [0, 1], and when the random number of the initial sample picture is less than a random threshold, flipping the picture and generating a new random number uniformly distributed over [0, 1];
    when the random number is less than 0.5, converting the initial sample picture to grayscale to obtain a first sample picture;
    adding a point light source to the first sample picture to obtain a second sample picture; and
    using the obtained initial sample picture, first sample picture, and second sample picture as the sample data.
PCT/CN2020/093313 (filed 2020-05-29; priority 2019-06-13) WO2020248841A1 (en)

Applications Claiming Priority (2)

Application Number                 Priority Date  Filing Date  Title
CN201910511707.1A (CN110399788A)   2019-06-13     2019-06-13   AU detection method, device, electronic equipment and the storage medium of image
CN201910511707.1                   2019-06-13

Publications (1)

Publication Number: WO2020248841A1

Legal Events

121: EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20821617; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
122: EP: PCT application non-entry in European phase (Ref document number: 20821617; Country of ref document: EP; Kind code of ref document: A1)