WO2020248841A1 - AU detection method and apparatus for an image, electronic device and storage medium - Google Patents

AU detection method and apparatus for an image, electronic device and storage medium

Info

Publication number
WO2020248841A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
image
network
face image
sample picture
Prior art date
Application number
PCT/CN2020/093313
Other languages
English (en)
Chinese (zh)
Inventor
盛建达
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020248841A1 publication Critical patent/WO2020248841A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships

Definitions

  • This application relates to the field of artificial intelligence image processing, and in particular to an image AU detection method, device, electronic equipment and storage medium.
  • Existing AU (Action Units, used to describe subtle movements of facial muscles) detection refers to comparing the similarity between the expression in a face image and each AU to determine which AU category the face image belongs to.
  • FACS refers to the Facial Action Coding System.
  • AU refers to the basic muscle action units of the human face, such as: raised inner eyebrows, raised mouth corners, and wrinkled nose.
  • The AU detection methods for video streams in the industry usually include the following: (1) AU detection based on a single frame image; (2) AU detection using the LSTM (Long Short-Term Memory) algorithm.
  • the AU detection method based on a single frame image detects AU on an average face.
  • The inventor realized that this method ignores the correlation between frames, so its AU detection accuracy is not high.
  • Although the method of using the LSTM algorithm for AU detection makes better use of the correlation between frames, its extraction of AU feature values is relatively coarse, which also keeps the AU detection accuracy low.
  • the first aspect of the present application provides an image AU detection method, the method includes:
  • acquiring a face image; performing detection processing on the acquired face image to obtain a unified face area; inputting the face image subjected to the detection processing as an original image into an optimized ResNet network for feature value extraction to output a face feature vector; and inputting the face feature vector output by the ResNet network into an LSTM network for training to obtain the AU recognition result of the face image.
  • the second aspect of the present application provides an image AU detection device, the device includes:
  • the acquisition module is used to acquire a face image
  • the preprocessing module is used to detect and process the acquired face images to obtain a unified face area
  • the feature extraction module is used to input the detected face image as the original image into the optimized ResNet network for feature value extraction to output the face feature vector;
  • the recognition module is used to input the face feature vector output by the ResNet network into the LSTM network for training, and obtain the AU recognition result of the face image.
  • a third aspect of the present application provides an electronic device, the electronic device includes a processor, and the processor is configured to implement the AU detection method of the image when executing computer-readable instructions stored in a memory.
  • the fourth aspect of the present application provides one or more readable storage media storing computer readable instructions.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors implement the AU detection method of the image.
  • This application enables the training model to make full use of the dynamic information of facial AU changes and to automatically learn the mapping relationship between the AU features of the recognized object, which improves the prediction accuracy and robustness of the training model and thereby improves the AU recognition performance for face images.
  • Fig. 1 is an application environment diagram of an image AU detection method in an embodiment of this application.
  • Fig. 2 is a flowchart of an image AU detection method in an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the basic operation structure of the ResNet network in this application.
  • Fig. 4 is a structural diagram of a ResNet network in an embodiment of this application.
  • FIG. 5 is a schematic diagram of the sequence processing flow of the LSTM network in an embodiment of this application.
  • FIG. 6 is a structural diagram of an image AU detection device in an embodiment of this application.
  • Fig. 7 is a schematic diagram of the electronic equipment of this application.
  • the AU detection method of the image of this application is applied to one or more electronic devices.
  • The electronic device is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
  • the electronic device may be a computing device such as a desktop computer, a notebook computer, a tablet computer, and a cloud server.
  • the device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • FIG. 1 is a schematic diagram of an application environment of an image AU detection method in an embodiment of the present application.
  • the terminal device 1 includes an image acquisition unit 11.
  • the image collection unit 11 is used to collect face images.
  • the terminal device 1 may obtain a face image through the image acquisition unit 11 and perform AU detection on the face image.
  • 19 AUs in FACS are selected, including 6 upper half face AUs and 13 lower half face AUs.
  • the above 19 AUs are used as the standard for detecting and comparing face images to predict which AU category the face image belongs to.
  • the terminal device 1 is also connected to an external device 2 in communication.
  • the terminal device 1 is in communication connection with the external device 2 via a network.
  • The network used to support the communication between the terminal device 1 and the external device 2 may be a wired network or a wireless network, such as radio, wireless fidelity (Wi-Fi), cellular, satellite, broadcast, and the like.
  • the terminal device 1 may be a computer device, a single server, a server cluster, or a cloud server.
  • the external device 2 can be, but is not limited to, a computer device, a mobile phone, a notebook computer, a tablet computer, and other devices.
  • Fig. 2 is a flowchart of an image AU detection method in an embodiment of the present application. According to different needs, the order of the steps in the flowchart can be changed, and some steps can be omitted.
  • the AU detection method of the image specifically includes the following steps:
  • Step S201 Obtain a face image.
  • the image acquisition unit 11 may be a 2D camera, and the terminal device 1 acquires the user's 2D face image as the user's face image through the 2D camera.
  • the image acquisition unit 11 may also be a 3D camera, and the terminal device 1 acquires the user's 3D face image as the user's face image through the 3D camera.
  • the terminal device 1 receives a face picture sent by an external device 2 communicatively connected with the terminal device.
  • the face image is stored in a storage device of the terminal device 1, and the terminal device 1 obtains the face image from the storage device.
  • the face image includes consecutive frames of face pictures.
  • the face picture may be a face video or the like.
  • Step S202 Perform detection processing on the acquired face image to acquire a unified face area.
  • the terminal device 1 may use the Adaboost face detection algorithm based on Haar-like features to perform face detection on each frame of face images in the acquired face images to determine the face area.
  • The Adaboost face detection algorithm may scan each frame of the face image with a window of a preset size and a preset stride until the face area in each frame of the image is determined.
  • the face area may be a fixed rectangular area including the forehead, chin, left cheek, and right cheek in the face image.
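  • As an illustration only (the description does not name a specific detector implementation or its parameters), the per-frame Haar-feature Adaboost detection step above could be sketched with OpenCV's pre-trained frontal-face cascade, assuming the opencv-python distribution that ships the cascade files:

```python
import cv2

# Assumed cascade file and scanning parameters; the description only specifies a
# Haar-feature Adaboost detector scanned with a preset window size and stride.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face_area(frame_bgr):
    """Return the (x, y, w, h) rectangle of the largest detected face, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Keep the largest rectangle as the face area covering forehead, chin and both cheeks.
    return max(faces, key=lambda r: r[2] * r[3])
```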
  • the terminal device 1 is also used to calibrate the face area. Specifically, the terminal device 1 detects the key feature points in the face area, and performs alignment and calibration on the corresponding face images based on the positions of the detected key feature points.
  • the key feature points in the face area may be eyes, nose, mouth, left cheek outer contour, right cheek outer contour, and so on.
  • the face image can be aligned and calibrated by the landmark method, so that the positions of the key feature points of the face in the face image are basically the same.
  • The terminal device 1 may also edit the face image after alignment and calibration according to a preset template to obtain a face image of uniform size.
  • The editing process includes one or both of cropping and scaling.
  • The terminal device 1 cuts out the corresponding face image according to a uniform template based on the key feature points in the detected face area and scales the face image to a uniform size; in this way, the editing of the face image is realized.
  • OpenCV's resize can be used to scale the face image to a uniform size based on a bilinear interpolation or area interpolation algorithm.
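  • A minimal sketch of the cropping and scaling step, assuming OpenCV and a 256*256 target size (the target size is an assumption consistent with the 256*256 originals used for augmentation below); INTER_LINEAR selects bilinear interpolation and INTER_AREA selects area interpolation:

```python
import cv2

def crop_and_resize(frame_bgr, face_rect, out_size=(256, 256), method=cv2.INTER_LINEAR):
    """Crop the detected face area and scale it to a uniform size."""
    x, y, w, h = face_rect
    face = frame_bgr[y:y + h, x:x + w]
    # method can be cv2.INTER_LINEAR (bilinear) or cv2.INTER_AREA (area interpolation).
    return cv2.resize(face, out_size, interpolation=method)
```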
  • Step S203 Input the face image subjected to the detection process as the original image to the optimized ResNet (Residual Neural Network, deep residual network) network for feature value extraction to output a face feature vector.
  • FIG. 3 shows a schematic diagram of the basic operation structure of the ResNet network in this application.
  • The first basic operation structure of the ResNet network is shown in FIG. 3(a); in the first basic operation structure, the output of the input after passing through three convolutional layers is superimposed with the original input.
  • When the input and output matrices have the same size, the first basic operation structure shown in Figure 3(a) is used.
  • the second basic operation structure of the ResNet network is shown in FIG. 3(b).
  • In the second basic operation structure, the output of the input after passing through three convolutional layers is superimposed with the original input.
  • When the input and output matrices have different sizes, the second basic operation structure shown in Figure 3(b) is used.
  • FIG. 4 shows a structural diagram of a ResNet network in an embodiment of this application.
  • the overall structure of the ResNet network includes: a convolutional layer, a pooling layer, 4 sets of convolution packages with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer.
  • the original image is sequentially processed through the convolutional layer, pooling layer, 4 sets of convolution packages with different parameters, pooling layer, fully connected layer, and sigmoid layer of the ResNet network to obtain the face Feature vector.
  • The four convolution packages are conv2_x, conv3_x, conv4_x, and conv5_x, respectively.
  • the second convolutional layer of the first package in each group of convolutional packages is down-sampled with a stride of 2.
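  • The residual units and convolution packages described above could be sketched in PyTorch roughly as follows; this is an assumption-laden illustration (channel counts, block counts and the bottleneck layout are not fixed by the description), not the exact network of the application:

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Basic operation structure of Fig. 3: the input passes through three convolutional
    layers and the result is superimposed with a shortcut of the original input. With
    matching sizes the shortcut is the identity (first structure); otherwise a 1x1
    convolution projects the input (second structure, assumed here)."""

    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            # The second convolution down-samples with stride 2 in the first unit of a package.
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))
        else:
            self.shortcut = nn.Identity()
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.branch(x) + self.shortcut(x))

def make_package(in_ch, mid_ch, out_ch, num_blocks, downsample=True):
    """Build one of conv2_x .. conv5_x: only the first unit may down-sample."""
    blocks = [Bottleneck(in_ch, mid_ch, out_ch, stride=2 if downsample else 1)]
    blocks += [Bottleneck(out_ch, mid_ch, out_ch) for _ in range(num_blocks - 1)]
    return nn.Sequential(*blocks)
```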
  • the step of "inputting the face image after the detection process as the original image into the optimized ResNet network for feature value extraction to output the face feature vector” includes:
  • Augmenting the original image to obtain sample data specifically includes: obtaining the original image; randomly cropping a picture with a preset resolution from the original image to obtain an initial sample picture; obtaining a [0,1] uniformly distributed random number; if the random number is less than a random threshold, flipping the initial sample picture and generating a new [0,1] uniformly distributed random number; if the new random number is less than 0.5, graying the initial sample picture to obtain a first sample picture; adding point light source processing to the first sample picture to obtain a second sample picture; and using the obtained initial sample picture, first sample picture, and second sample picture as sample data.
  • the purpose of augmenting the original image is to increase the number of training samples.
  • The methods of augmentation include, but are not limited to, flipping, randomly cropping a 248*248 picture from an original image with a size of 256*256, graying the original image, modifying the lighting of the original image, and adding point light source lighting to the original image.
  • For example, when the random number is less than 0.5, the picture is flipped and a new [0,1] uniformly distributed random number is generated; when that random number is less than 0.5, the initial sample picture is grayed to obtain the first sample picture, or the first sample picture is processed by adding point light sources to obtain the second sample picture, and so on. When the numbers of initial sample pictures, first sample pictures, and second sample pictures reach the requirements of the actual application scenario, the augmentation ends.
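  • A sketch of this augmentation pipeline is given below; the 248*248 crop and the 0.5 thresholds follow the description, while the point-light-source model (a Gaussian brightness spot at a random position) and its parameters are illustrative assumptions where the description leaves the choice open:

```python
import random
import numpy as np
import cv2

def augment(original_256_bgr):
    """Produce the initial, first and second sample pictures from one 256x256 original."""
    h, w = original_256_bgr.shape[:2]
    top, left = random.randint(0, h - 248), random.randint(0, w - 248)
    initial = original_256_bgr[top:top + 248, left:left + 248].copy()

    if random.random() < 0.5:                       # random horizontal flip
        initial = cv2.flip(initial, 1)

    first = initial
    if random.random() < 0.5:                       # random graying
        gray = cv2.cvtColor(initial, cv2.COLOR_BGR2GRAY)
        first = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)

    # Assumed point light source: add a Gaussian brightness spot at a random position.
    yy, xx = np.mgrid[0:248, 0:248]
    cy, cx = random.randint(0, 247), random.randint(0, 247)
    spot = 80.0 * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 60.0 ** 2))
    second = np.clip(first.astype(np.float32) + spot[..., None], 0, 255).astype(np.uint8)

    return initial, first, second
```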
  • The ResNet network is optimized by training it on the sample data using the first basic operation structure or the second basic operation structure of the ResNet network.
  • The first basic operation structure of the ResNet network superimposes the output of the input after passing through the three convolutional layers with the original input.
  • the first basic operation structure is used when the input and output matrices have the same size.
  • Training on the sample data to optimize the ResNet network includes: training a classification network of face images to obtain a face classification network; and transferring the trained face classification network to train an AU neural network on the sample data, obtaining the trained AU neural network.
  • The transfer (migration) method can be used to train the face classification network in a first stage, that is, the parameter of the last fully connected layer is set to the number of face classes, and the last 19 AU sigmoid layers in the AU neural network are replaced with a softmax layer.
  • The parameters of each layer of the ResNet network from the first convolutional layer up to conv3_x are fixed, and the parameters from the 16,000-class face training are transferred as the initial parameters to train the parameters of conv4_x and the subsequent layers. In this way, the prior knowledge of the existing face classification learning is fully utilized to improve the AU detection accuracy.
  • The transfer training refers to directly loading the face classification network parameters into the AU neural network: because only the last layer of the two neural networks differs in structure and the numbers of the other parameters are the same, the parameters can be loaded directly.
  • The AU neural network has a low output dimension (only 19 results), while the face classification network has a high output dimension. The face classification training results are used and transferred, and at the same time part of the layers are locked, so that the face structure features learned in the face classification are fully utilized in the AU detection.
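  • A rough PyTorch sketch of this transfer step under stated assumptions: the backbone attribute names (conv1, conv2_x, conv3_x, fc) are hypothetical, and only the ideas of locking the early layers and swapping the softmax head for 19 sigmoid outputs come from the description:

```python
import torch
import torch.nn as nn

def build_au_network(face_classification_net, num_aus=19):
    """Reuse a trained face classification backbone for AU detection."""
    au_net = face_classification_net

    # Lock the parameters from the first convolutional layer up to conv3_x, so the
    # face-structure features learned during face classification are reused as-is.
    for module in (au_net.conv1, au_net.conv2_x, au_net.conv3_x):
        for p in module.parameters():
            p.requires_grad = False

    # Replace the face-classification softmax head with a 19-way head whose outputs
    # are passed through sigmoid, one probability per AU.
    au_net.fc = nn.Linear(au_net.fc.in_features, num_aus)

    # Only conv4_x, the later layers and the new head are trained.
    trainable = [p for p in au_net.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=1e-3, momentum=0.9)
    return au_net, optimizer
```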
  • the face image after the detection process is input to the optimized ResNet network to obtain the features of the fully connected layer as the face feature vector output by the ResNet network.
  • Step S204 Input the face feature vector output by the ResNet network into an LSTM (Long Short-Term Memory) network for training, and obtain the AU recognition result of the face image.
  • the LSTM network is a special recurrent neural network.
  • the LSTM network regards the input sequence as a time sequence, and the long and short-term memory network can learn the short-term and long-term dependencies of the data in the input sequence in time.
  • the aforementioned AU recognition result may indicate the AU category of the face image.
  • FIG. 5 is a schematic diagram of the sequence processing flow of the LSTM network in an embodiment of this application.
  • X0, X1, ..., Xn are the frame images of a face image sequence with a length of n frames. Each frame image in the face image is passed through the ResNet network to extract the face feature vectors Y0, Y1, ..., Yn; the face feature vectors Y0, Y1, ..., Yn are sequentially input into the LSTM network in chronological order, and the AU recognition results h0, h1, ..., hn are obtained.
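  • The sequence processing of Fig. 5 could be sketched roughly as follows, assuming a 2048-dimensional ResNet feature and a 512-unit LSTM (both sizes are illustrative assumptions, not values taken from the description):

```python
import torch
import torch.nn as nn

class AUSequenceModel(nn.Module):
    """ResNet features per frame (Y0..Yn) fed in chronological order into an LSTM,
    which outputs a per-frame AU prediction (h0..hn)."""

    def __init__(self, backbone, feat_dim=2048, hidden_dim=512, num_aus=19):
        super().__init__()
        self.backbone = backbone                      # optimized ResNet feature extractor
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_aus)

    def forward(self, frames):                        # frames: (batch, n_frames, 3, H, W)
        b, n = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))   # (batch * n_frames, feat_dim)
        outputs, _ = self.lstm(feats.view(b, n, -1))  # (batch, n_frames, hidden_dim)
        return torch.sigmoid(self.head(outputs))      # per-frame AU probabilities
```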
  • The LSTM network includes: an input gate, a forget gate, an output gate, a state unit (cell), and an LSTM output.
  • The processing procedures of the input gate, forget gate, output gate, state unit and LSTM output can be calculated and implemented by the following formulas:
  • i_t = σ(W_ix·x_t + W_im·m_(t-1) + W_ic·c_(t-1) + b_i);
  • f_t = σ(W_fx·x_t + W_fm·m_(t-1) + W_fc·c_(t-1) + b_f);
  • c_t = f_t ⊙ c_(t-1) + i_t ⊙ h(W_cx·x_t + W_cm·m_(t-1) + b_c);
  • o_t = σ(W_ox·x_t + W_om·m_(t-1) + W_oc·c_(t-1) + b_o);
  • m_t = o_t ⊙ h(c_t);
  • where x_t is the face feature vector input at time t; W_ix, W_im, W_ic, W_fx, W_fm, W_fc, W_cx, W_cm, W_ox, W_om and W_oc are preset weight matrices, indicating that the elements of each gate are obtained from the data of the corresponding dimension, that is, nodes of different dimensions do not interfere with each other; b_i, b_f, b_c and b_o are preset bias vectors; i_t, f_t, o_t, c_t and m_t respectively denote the input gate, forget gate, output gate, state unit and LSTM output at time t; ⊙ is the element-wise (dot) product; σ(·) is the sigmoid function; and h(·) is the output activation function.
  • The face feature vector extracted by the ResNet network from each frame image of the face image is input into the LSTM network, and the LSTM network is trained based on the back-propagation algorithm so that the deviation between the value of the input image processed by the LSTM network and the mapping value of the expression category to which the image belongs falls within a preset allowable range.
  • the training process of the LSTM network can also be implemented with reference to other existing technical solutions, which is not limited here.
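  • One back-propagation training step consistent with the description might look as follows, assuming binary per-frame AU labels and the sequence model sketched earlier; the loss function, optimizer and stopping threshold are assumptions rather than values given in the description:

```python
import torch.nn as nn

def train_step(model, optimizer, frames, au_labels, tolerance=0.05):
    """Run one optimization step and report whether the deviation is within tolerance."""
    criterion = nn.BCELoss()                      # deviation between prediction and AU labels
    predictions = model(frames)                   # (batch, n_frames, 19) probabilities
    loss = criterion(predictions, au_labels.float())
    optimizer.zero_grad()
    loss.backward()                               # back propagation through the LSTM and head
    optimizer.step()
    return loss.item(), loss.item() < tolerance   # stop when within the preset allowable range
```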
  • The image AU detection method in this application builds a training model based on the ResNet network and the LSTM network and uses a collection of continuous frame images of the face (such as a video) as the training input of the training model, so that the training model can make full use of the dynamic information of face AU changes to automatically learn the mapping relationship between the AU features of the recognized object, thereby improving the prediction accuracy and robustness of the training model and further improving the AU recognition performance for face images.
  • FIG. 6 is a structural diagram of an image AU detection device 40 in an embodiment of this application.
  • the image AU detection device 40 runs in the terminal device 1.
  • the image AU detection device 40 may include multiple functional modules composed of program code segments.
  • the program code of each program segment in the image AU detection device 40 can be stored in the memory and executed by at least one processor to perform the function of face recognition.
  • the image AU detection device 40 can be divided into multiple functional modules according to the functions it performs.
  • the image AU detection device 40 may include an acquisition module 401, a preprocessing module 402, a feature extraction module 403, and an identification module 404.
  • the module referred to in this application refers to a series of computer-readable instruction segments that can be executed by at least one processor and can complete fixed functions, and are stored in a memory. In some embodiments, the functions of each module will be detailed in subsequent embodiments.
  • the acquiring module 401 is used to acquire a face image.
  • the image acquisition unit 11 may be a 2D camera, and the acquisition module 401 acquires the user's 2D face image as the user's face image through the 2D camera.
  • the image acquisition unit 11 may also be a 3D camera, and the acquisition module 401 acquires the user's 3D face image as the user's face image through the 3D camera.
  • the acquisition module 401 receives a face picture sent by the external device 2 communicatively connected with the terminal device.
  • the face image is stored in a storage device of the terminal device 1, and the acquisition module 401 acquires the face image from the storage device.
  • the face image includes consecutive frames of face pictures.
  • the face picture may be a face video or the like.
  • the preprocessing module 402 is used to perform detection processing on the acquired face image to acquire a unified face area.
  • the preprocessing module 402 may use the Adaboost face detection algorithm based on Haar-like features to perform face detection on each frame of face images in the acquired face images to determine the face area.
  • The Adaboost face detection algorithm may scan each frame of the face image with a window of a preset size and a preset stride until the face area in each frame of the image is determined.
  • the face area may be a fixed rectangular area including the forehead, chin, left cheek, and right cheek in the face image.
  • the preprocessing module 402 is also used to perform calibration processing on the face area. Specifically, the preprocessing module 402 detects key feature points in the face region, and performs alignment and calibration on the corresponding face image based on the positions of the detected key feature points.
  • the key feature points in the face area may be eyes, nose, mouth, left cheek outer contour, right cheek outer contour, and so on.
  • the face image can be aligned and calibrated by the landmark method, so that the positions of the key feature points of the face in the face image are basically the same.
  • The preprocessing module 402 may also edit the face image after alignment and calibration according to a preset template to obtain a face image of uniform size.
  • the editing processing includes one or two of cutting processing and zoom processing.
  • the preprocessing module 402 cuts out the corresponding face image according to a uniform template based on the key feature points in the detected face area and scales the face image to a uniform size.
  • OpenCV's resize can be used to scale the face image to a uniform size based on a bilinear interpolation or area interpolation algorithm.
  • the feature extraction module 403 is configured to input the detected face image as an original image into an optimized ResNet (Residual Neural Network, deep residual network) network for feature value extraction to output a face feature vector.
  • The first basic operation structure of the ResNet network is shown in FIG. 3(a); in the first basic operation structure, the output of the input after passing through three convolutional layers is superimposed with the original input.
  • When the input and output matrices have the same size, the first basic operation structure shown in Figure 3(a) is used.
  • The second basic operation structure of the ResNet network is shown in FIG. 3(b); in the second basic operation structure, the output of the input after passing through three convolutional layers is superimposed with the original input.
  • When the input and output matrices have different sizes, the second basic operation structure shown in Figure 3(b) is used.
  • the overall structure of the ResNet network includes: a convolutional layer, a pooling layer, 4 sets of convolution packages with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer.
  • the original image is sequentially processed through the convolutional layer, pooling layer, 4 sets of convolution packages with different parameters, pooling layer, fully connected layer, and sigmoid layer of the ResNet network to obtain the face Feature vector.
  • The four convolution packages are conv2_x, conv3_x, conv4_x, and conv5_x, respectively.
  • the second convolutional layer of the first package in each group of convolutional packages is down-sampled with a stride of 2.
  • the “inputting the face image after the detection process as the original image into the optimized ResNet network for feature value extraction to output the face feature vector” includes:
  • Augmenting the original image to obtain sample data specifically includes: obtaining the original image; randomly cropping a picture with a preset resolution from the original image to obtain an initial sample picture; obtaining a [0,1] uniformly distributed random number; if the random number is less than a random threshold, flipping the initial sample picture and generating a new [0,1] uniformly distributed random number; if the new random number is less than 0.5, graying the initial sample picture to obtain a first sample picture; adding point light source processing to the first sample picture to obtain a second sample picture; and using the obtained initial sample picture, first sample picture, and second sample picture as sample data.
  • the purpose of augmenting the original image is to increase the number of training samples.
  • The methods of augmentation include, but are not limited to, flipping, randomly cropping a 248*248 picture from an original image with a size of 256*256, graying the original image, modifying the lighting of the original image, and adding point light source lighting to the original image.
  • For example, when the random number is less than 0.5, the picture is flipped and a new [0,1] uniformly distributed random number is generated; when that random number is less than 0.5, the initial sample picture is grayed to obtain the first sample picture, or the first sample picture is processed by adding point light sources to obtain the second sample picture, and so on. When the numbers of initial sample pictures, first sample pictures, and second sample pictures reach the requirements of the actual application scenario, the augmentation ends.
  • The ResNet network is optimized by training it on the sample data using the first basic operation structure or the second basic operation structure of the ResNet network.
  • The first basic operation structure of the ResNet network superimposes the output of the input after passing through three convolutional layers with the original input.
  • the first basic operation structure is used when the input and output matrices have the same size.
  • Training on the sample data to optimize the ResNet network includes: training a classification network of face images to obtain a face classification network; and transferring the trained face classification network to train an AU neural network on the sample data, obtaining the trained AU neural network.
  • The transfer (migration) method can be used to train the face classification network in a first stage, that is, the parameter of the last fully connected layer is set to the number of face classes, and the last 19 AU sigmoid layers in the AU neural network are replaced with a softmax layer.
  • The parameters of each layer of the ResNet network from the first convolutional layer up to conv3_x are fixed, and the parameters from the 16,000-class face training are transferred as the initial parameters to train the parameters of conv4_x and the subsequent layers. In this way, the prior knowledge of the existing face classification learning is fully utilized to improve the AU detection accuracy.
  • The transfer training refers to directly loading the face classification network parameters into the AU neural network: because only the last layer of the two neural networks differs in structure and the numbers of the other parameters are the same, the parameters can be loaded directly.
  • The AU neural network has a low output dimension (only 19 results), while the face classification network has a high output dimension. The face classification training results are used and transferred, and at the same time part of the layers are locked, so that the face structure features learned in the face classification are fully utilized in the AU detection.
  • the face image after the detection process is input to the optimized ResNet network to obtain the features of the fully connected layer as the face feature vectors output by the ResNet network.
  • the recognition module 404 is configured to input the face feature vector output by the ResNet network into an LSTM (Long Short-Term Memory) network for training, and obtain the AU recognition result of the face image.
  • the LSTM network is a special recurrent neural network.
  • the LSTM network regards the input sequence as a time sequence, and the long and short-term memory network can learn the short-term and long-term dependence of data in the input sequence in time.
  • the aforementioned AU recognition result may indicate the AU category of the face image.
  • FIG. 5 is a schematic diagram of the sequence processing flow of the LSTM network in an embodiment of this application.
  • X0, X1, ..., Xn are the frame images of a face image sequence with a length of n frames. Each frame image in the face image is passed through the ResNet network to extract the face feature vectors Y0, Y1, ..., Yn; the face feature vectors Y0, Y1, ..., Yn are sequentially input into the LSTM network in chronological order, and the AU recognition results h0, h1, ..., hn are obtained.
  • The LSTM network includes: an input gate, a forget gate, an output gate, a state unit (cell), and an LSTM output.
  • The processing procedures of the input gate, forget gate, output gate, state unit and LSTM output can be calculated and implemented by the following formulas:
  • i_t = σ(W_ix·x_t + W_im·m_(t-1) + W_ic·c_(t-1) + b_i);
  • f_t = σ(W_fx·x_t + W_fm·m_(t-1) + W_fc·c_(t-1) + b_f);
  • c_t = f_t ⊙ c_(t-1) + i_t ⊙ h(W_cx·x_t + W_cm·m_(t-1) + b_c);
  • o_t = σ(W_ox·x_t + W_om·m_(t-1) + W_oc·c_(t-1) + b_o);
  • m_t = o_t ⊙ h(c_t);
  • where x_t is the face feature vector input at time t; W_ix, W_im, W_ic, W_fx, W_fm, W_fc, W_cx, W_cm, W_ox, W_om and W_oc are preset weight matrices, indicating that the elements of each gate are obtained from the data of the corresponding dimension, that is, nodes of different dimensions do not interfere with each other; b_i, b_f, b_c and b_o are preset bias vectors; i_t, f_t, o_t, c_t and m_t respectively denote the input gate, forget gate, output gate, state unit and LSTM output at time t; ⊙ is the element-wise (dot) product; σ(·) is the sigmoid function; and h(·) is the output activation function.
  • The face feature vector extracted by the ResNet network from each frame image of the face image is input into the LSTM network, and the LSTM network is trained based on the back-propagation algorithm so that the deviation between the value of the input image processed by the LSTM network and the mapping value of the expression category to which the image belongs falls within a preset allowable range.
  • the training process of the LSTM network can also be implemented with reference to other existing technical solutions, which is not limited here.
  • The image AU detection method in this application builds a training model based on the ResNet network and the LSTM network and uses a collection of continuous frame images of the face (such as a video) as the training input of the training model, so that the training model can make full use of the dynamic information of face AU changes to automatically learn the mapping relationship between the AU features of the recognized object, thereby improving the prediction accuracy and robustness of the training model and further improving the AU recognition performance for face images.
  • FIG. 7 is a schematic diagram of the electronic device 6 in an embodiment of the application.
  • the electronic device 6 includes a memory 61, a processor 62, and computer readable instructions 63 stored in the memory 61 and executable on the processor 62.
  • When the processor 62 executes the computer-readable instructions 63, the steps in the embodiment of the image AU detection method are implemented, such as steps S201 to S204 shown in FIG. 2.
  • Alternatively, when the processor 62 executes the computer-readable instructions 63, the functions of the modules/units in the embodiment of the image AU detection device are realized, such as the modules 401 to 404 in FIG. 6.
  • the computer-readable instructions 63 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 61 and executed by the processor 62, To complete this application.
  • the one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 63 in the electronic device 6.
  • the computer-readable instruction 63 may be divided into the acquisition module 401, the preprocessing module 402, the feature extraction module 403, and the recognition module 404 in FIG. 6.
  • the specific functions of each module refer to Embodiment 2.
  • The electronic device 6 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server.
  • The schematic diagram is only an example of the electronic device 6 and does not constitute a limitation on the electronic device 6; it may include more or fewer components than those shown in the figure, combine certain components, or have different components. For example, the electronic device 6 may also include input and output devices, network access devices, buses, and so on.
  • The so-called processor 62 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor can be a microprocessor or the processor 62 can also be any conventional processor, etc.
  • The processor 62 is the control center of the electronic device 6 and connects the various parts of the entire electronic device 6 through various interfaces and lines.
  • the present application provides an electronic device, wherein the electronic device includes a processor, and the processor is configured to execute computer-readable instructions stored in a memory to implement the following steps:
  • acquiring a face image; performing detection processing on the acquired face image to obtain a unified face area; inputting the face image subjected to the detection processing as an original image into an optimized ResNet network for feature value extraction to output a face feature vector; and inputting the face feature vector output by the ResNet network into an LSTM network for training to obtain the AU recognition result of the face image.
  • the processor further implements the following steps when executing the computer-readable instruction:
  • the processor further implements the following steps when executing the computer-readable instruction:
  • the key feature points in the face area include the eyes, the nose, the mouth, the outer contour of the left cheek, and the outer contour of the right cheek; and
  • the face image after alignment and calibration is edited according to a preset template to obtain a face image of a uniform size.
  • the ResNet network structure includes a convolutional layer, a pooling layer, four groups of convolutional packets with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer.
  • the processor further implements the following steps when executing the computer-readable instruction:
  • the processor further implements the following steps when executing the computer-readable instruction:
  • the memory 61 may be used to store the computer-readable instructions 63 and/or modules/units, and the processor 62 can run or execute the computer-readable instructions and/or modules/units stored in the memory 61, and The data stored in the memory 61 is called to realize various functions of the electronic device 6.
  • the memory 61 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function (such as a sound playback function, an image playback function, etc.); the storage data area may The data (such as audio data, phone book, etc.) created according to the use of the electronic device 6 is stored.
  • The memory 61 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • If the integrated module/unit of the electronic device 6 is implemented in the form of a software function module and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, all or part of the processes in the methods of the above embodiments of this application can also be completed by instructing the relevant hardware through computer-readable instructions, and the computer-readable instructions may be stored in a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile, and the computer-readable instructions may implement the steps of the foregoing method embodiments when executed by a processor.
  • the computer-readable instruction includes computer-readable instruction code
  • the computer-readable instruction code may be in the form of source code, object code, executable file, or some intermediate form.
  • The computer-readable medium may include: any entity or device capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium.
  • the content contained in the computer-readable medium can be appropriately added or deleted in accordance with the requirements of the legislation and patent practice in the jurisdiction.
  • For example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
  • The present application also provides one or more readable storage media storing computer-readable instructions, wherein when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • acquiring a face image; performing detection processing on the acquired face image to obtain a unified face area; inputting the face image subjected to the detection processing as an original image into an optimized ResNet network for feature value extraction to output a face feature vector; and inputting the face feature vector output by the ResNet network into an LSTM network for training to obtain the AU recognition result of the face image.
  • the one or more processors when executed by one or more processors, the one or more processors further execute the following steps:
  • the one or more processors when executed by one or more processors, the one or more processors further execute the following steps:
  • the key feature points in the face area include the eyes, the nose, the mouth, the outer contour of the left cheek, and the outer contour of the right cheek; and
  • the face image after alignment and calibration is edited according to a preset template to obtain a face image of a uniform size.
  • the ResNet network structure includes a convolutional layer, a pooling layer, four groups of convolutional packets with different parameters, a pooling layer, a fully connected layer, and a sigmoid layer.
  • the one or more processors when executed by one or more processors, the one or more processors further execute the following steps:
  • the one or more processors when executed by one or more processors, the one or more processors further execute the following steps:
  • the functional modules in the various embodiments of the present application may be integrated in the same processing module, or each module may exist alone physically, or two or more modules may be integrated in the same module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to an AU detection method and apparatus for an image, an electronic device and a storage medium, which relate to image processing and predictive analysis fields such as artificial intelligence. The method comprises: acquiring a face image (S201); performing detection processing on the acquired face image to acquire a unified face area (S202); inputting the face image on which the detection processing has been performed, as an original image, into an optimized ResNet network for feature value extraction so as to output a face feature vector (S203); and inputting the face feature vector output by the ResNet network into an LSTM network for training so as to obtain an AU recognition result of the face image (S204). According to the method, a training model can make full use of the dynamic information of facial AU changes to automatically learn the mapping relationship between the AU features of a recognized object, so that the prediction accuracy and robustness of the training model are improved, and the AU recognition performance for the face image is thereby improved.
PCT/CN2020/093313 2019-06-13 2020-05-29 Procédé et appareil de détection d'au pour une image et dispositif électronique et support d'informations WO2020248841A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910511707.1A CN110399788A (zh) 2019-06-13 2019-06-13 图像的au检测方法、装置、电子设备及存储介质
CN201910511707.1 2019-06-13

Publications (1)

Publication Number Publication Date
WO2020248841A1 true WO2020248841A1 (fr) 2020-12-17

Family

ID=68324059

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093313 WO2020248841A1 (fr) 2019-06-13 2020-05-29 Procédé et appareil de détection d'au pour une image et dispositif électronique et support d'informations

Country Status (2)

Country Link
CN (1) CN110399788A (fr)
WO (1) WO2020248841A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528265A (zh) * 2020-12-18 2021-03-19 平安银行股份有限公司 基于在线会议的身份识别方法、装置、设备及介质
CN112633218A (zh) * 2020-12-30 2021-04-09 深圳市优必选科技股份有限公司 人脸检测方法、装置、终端设备及计算机可读存储介质
CN116206355A (zh) * 2023-04-25 2023-06-02 鹏城实验室 人脸识别模型训练、图像注册、人脸识别方法及装置
CN116416667A (zh) * 2023-04-25 2023-07-11 天津大学 基于动态关联信息嵌入的面部动作单元检测方法
CN117475360A (zh) * 2023-12-27 2024-01-30 南京纳实医学科技有限公司 基于改进型mlstm-fcn的音视频特点的生物体征提取与分析方法

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399788A (zh) * 2019-06-13 2019-11-01 平安科技(深圳)有限公司 图像的au检测方法、装置、电子设备及存储介质
CN111639537A (zh) * 2020-04-29 2020-09-08 深圳壹账通智能科技有限公司 人脸动作单元识别方法、装置、电子设备及存储介质
CN111612785B (zh) * 2020-06-03 2024-02-02 浙江大华技术股份有限公司 人脸图片质量评估方法、装置及存储介质
CN111723709B (zh) * 2020-06-09 2023-07-11 大连海事大学 一种基于深度卷积神经网络的蝇类面部识别方法
CN113780202A (zh) * 2021-09-15 2021-12-10 北京紫光展锐通信技术有限公司 人脸检测方法及装置、计算机可读存储介质、终端设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650813A (zh) * 2016-12-27 2017-05-10 华南理工大学 一种基于深度残差网络和lstm的图像理解方法
CN108596069A (zh) * 2018-04-18 2018-09-28 南京邮电大学 基于深度3d残差网络的新生儿疼痛表情识别方法及系统
CN109409198A (zh) * 2018-08-31 2019-03-01 平安科技(深圳)有限公司 Au检测模型训练方法、au检测方法、装置、设备及介质
CN109508660A (zh) * 2018-10-31 2019-03-22 上海交通大学 一种基于视频的au检测方法
CN110399788A (zh) * 2019-06-13 2019-11-01 平安科技(深圳)有限公司 图像的au检测方法、装置、电子设备及存储介质

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117788A (zh) * 2018-08-10 2019-01-01 重庆大学 一种融合ResNet和LSTM的公交车厢拥挤度检测方法
CN109492822B (zh) * 2018-11-24 2021-08-03 上海师范大学 空气污染物浓度时空域关联预测方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650813A (zh) * 2016-12-27 2017-05-10 华南理工大学 一种基于深度残差网络和lstm的图像理解方法
CN108596069A (zh) * 2018-04-18 2018-09-28 南京邮电大学 基于深度3d残差网络的新生儿疼痛表情识别方法及系统
CN109409198A (zh) * 2018-08-31 2019-03-01 平安科技(深圳)有限公司 Au检测模型训练方法、au检测方法、装置、设备及介质
CN109508660A (zh) * 2018-10-31 2019-03-22 上海交通大学 一种基于视频的au检测方法
CN110399788A (zh) * 2019-06-13 2019-11-01 平安科技(深圳)有限公司 图像的au检测方法、装置、电子设备及存储介质

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528265A (zh) * 2020-12-18 2021-03-19 平安银行股份有限公司 基于在线会议的身份识别方法、装置、设备及介质
CN112633218A (zh) * 2020-12-30 2021-04-09 深圳市优必选科技股份有限公司 人脸检测方法、装置、终端设备及计算机可读存储介质
CN112633218B (zh) * 2020-12-30 2023-10-13 深圳市优必选科技股份有限公司 人脸检测方法、装置、终端设备及计算机可读存储介质
CN116206355A (zh) * 2023-04-25 2023-06-02 鹏城实验室 人脸识别模型训练、图像注册、人脸识别方法及装置
CN116416667A (zh) * 2023-04-25 2023-07-11 天津大学 基于动态关联信息嵌入的面部动作单元检测方法
CN116416667B (zh) * 2023-04-25 2023-10-24 天津大学 基于动态关联信息嵌入的面部动作单元检测方法
CN117475360A (zh) * 2023-12-27 2024-01-30 南京纳实医学科技有限公司 基于改进型mlstm-fcn的音视频特点的生物体征提取与分析方法
CN117475360B (zh) * 2023-12-27 2024-03-26 南京纳实医学科技有限公司 基于改进型mlstm-fcn的音视频特点的生物特征提取与分析方法

Also Published As

Publication number Publication date
CN110399788A (zh) 2019-11-01

Similar Documents

Publication Publication Date Title
WO2020248841A1 (fr) Procédé et appareil de détection d'au pour une image et dispositif électronique et support d'informations
WO2020221013A1 (fr) Procédé et appareil de traitement d'image, dispositif électronique et support de stockage
WO2021057848A1 (fr) Procédé d'entraînement de réseau, procédé de traitement d'image, réseau, dispositif terminal et support
WO2020199693A1 (fr) Procédé et appareil de reconnaissance faciale de grande pose et dispositif associé
CN111369427B (zh) 图像处理方法、装置、可读介质和电子设备
WO2021184902A1 (fr) Procédé et appareil de classification d'image, procédé et appareil d'entraînement, dispositif et support
WO2023116231A1 (fr) Procédé et appareil de classification d'image, dispositif informatique et support de stockage
CN110287836B (zh) 图像分类方法、装置、计算机设备和存储介质
CN111133453A (zh) 人工神经网络
CN111209933A (zh) 基于神经网络和注意力机制的网络流量分类方法和装置
CN110555334B (zh) 人脸特征确定方法、装置、存储介质及电子设备
CN115311730B (zh) 一种人脸关键点的检测方法、系统和电子设备
WO2024109374A1 (fr) Procédé et appareil d'entraînement pour modèle de permutation de visage, dispositif, support de stockage et produit programme
CN111414879A (zh) 人脸遮挡程度识别方法、装置、电子设备及可读存储介质
CN113205047B (zh) 药名识别方法、装置、计算机设备和存储介质
CN110717401A (zh) 年龄估计方法及装置、设备、存储介质
CN111881740A (zh) 人脸识别方法、装置、电子设备及介质
CN115131803A (zh) 文档字号的识别方法、装置、计算机设备和存储介质
CN110826534A (zh) 一种基于局部主成分分析的人脸关键点检测方法及系统
WO2024041108A1 (fr) Procédé et appareil d'entraînement de modèle de correction d'image, procédé et appareil de correction d'image, et dispositif informatique
WO2020244076A1 (fr) Procédé et appareil de reconnaissance faciale, dispositif électronique et support d'informations
CN115393376A (zh) 医学图像处理方法、装置、计算机设备和存储介质
JP7479507B2 (ja) 画像処理方法及び装置、コンピューター機器、並びにコンピュータープログラム
CN114792295B (zh) 基于智能相框的被遮挡物修正方法、装置、设备及介质
US20230289605A1 (en) Neural bregman divergences for distance learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20821617

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20821617

Country of ref document: EP

Kind code of ref document: A1
