WO2020103700A1 - Micro-expression-based image recognition method and apparatus, and related device - Google Patents

Micro-expression-based image recognition method and apparatus, and related device

Info

Publication number
WO2020103700A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
expression
model
sample
original
Prior art date
Application number
PCT/CN2019/116515
Other languages
English (en)
French (fr)
Inventor
张凯皓
罗文寒
马林
刘威
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP19886293.0A (EP3885965B1)
Publication of WO2020103700A1
Priority to US17/182,024 (US20210174072A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • G06V40/175Static expression
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/169Holistic features and representations, i.e. based on the facial image taken as a whole
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • G06V40/176Dynamic expression

Definitions

  • The present application relates to the field of computer technology, and in particular to a micro-expression-based image recognition method, apparatus, and related device.
  • Artificial Intelligence (AI)
  • Artificial Intelligence is a theory, method, technology, and application system that uses digital computers or digital computer-controlled machines to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence is a comprehensive technology in computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a way similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
  • Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies.
  • Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology and machine learning / deep learning.
  • Computer Vision (CV) is a science that studies how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to identify, track, and measure targets, and further processes the resulting images so that they become more suitable for human observation or for transmission to instruments for detection.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.
  • Micro-expressions are subtle, hard-to-perceive facial expressions that involuntarily reveal the true emotions a person is trying to hide.
  • Micro-expressions usually occur when a person has something to conceal. Compared with ordinary facial expressions, the most obvious characteristics of micro-expressions are their short duration and weak intensity; they reflect the true emotions that a person tries to suppress and hide, and are therefore an effective non-verbal cue. They are especially likely to appear when a person intends to cover up a psychological change. For this reason, micro-expression recognition can be used in fields such as security, criminal investigation, and psychology, where it is necessary to explore a person's real thoughts and uncover hidden intentions.
  • Existing methods for recognizing micro-expression images mainly extract features from the micro-expression image and then classify and identify the image according to the extracted features.
  • However, micro-expressions have low expression intensity and rapid motion, and even micro-expression images of different types are very similar. As a result, the extracted features are not very distinguishable, which reduces the recognition accuracy of micro-expression images.
  • Embodiments of the present application provide a micro-expression-based image recognition method, device, and related equipment, which can improve the accuracy of micro-expression image recognition.
  • An aspect of an embodiment of the present application provides an image recognition method based on micro-expressions, which is executed by an electronic device and includes:
  • acquiring an original expression image belonging to a first expression type, and inputting the original expression image into an image enhancement model; the original expression image belonging to the first expression type is an image containing a micro-expression; the image enhancement model is trained based on sample expression images belonging to the first expression type and sample expression images belonging to a second expression type; the expression intensity of the sample expression images belonging to the second expression type is greater than the expression intensity of the sample expression images belonging to the first expression type;
  • enhancing the expression features of the micro-expression in the original expression image in the image enhancement model to obtain a target expression image belonging to the second expression type; and
  • identifying the expression attribute type corresponding to the target expression image, and determining the expression attribute type corresponding to the target expression image as the expression attribute type corresponding to the original expression image.
  • An aspect of an embodiment of the present application provides an image recognition device based on micro-expressions, including:
  • a first acquisition module for acquiring an original expression image belonging to the first expression type, and inputting the original expression image into an image enhancement model;
  • the original expression image belonging to the first expression type is an image containing micro expressions;
  • the image enhancement model is trained based on sample expression images belonging to the first expression type and sample expression images belonging to the second expression type; the expression intensity of the sample expression images belonging to the second expression type is greater than the expression intensity of the sample expression images belonging to the first expression type;
  • An enhancement module configured to enhance the expression characteristics of the micro-expressions in the original expression image in the image enhancement model to obtain a target expression image belonging to the second expression type
  • a recognition module used to recognize the expression attribute type corresponding to the target expression image
  • the determining module is configured to determine the expression attribute type corresponding to the target expression image as the expression attribute type corresponding to the original expression image.
  • An aspect of an embodiment of the present application provides an electronic device, including: a processor and a memory;
  • the processor is connected to a memory, wherein the memory is used to store program code, and the processor is used to call the program code to perform the method as in the embodiments of the present application.
  • An aspect of an embodiment of the present application provides a computer storage medium, where the computer storage medium stores a computer program, the computer program includes program instructions, and when the program instructions are executed by a processor, the method as in the embodiments of the present application is performed.
  • FIG. 1 is a system architecture diagram of an image recognition method based on micro-expressions provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a scene of an image recognition method based on micro-expressions provided by an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a micro-expression-based image recognition method provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of enhancing expression features provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of another image recognition method based on micro-expressions provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of generating a model loss value according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of calculating a model loss value provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an image recognition device based on micro-expressions provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • Artificial intelligence technology has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, intelligent medical care, intelligent customer service, and speech recognition. It is believed that with the development of technology, artificial intelligence will be applied in more fields and deliver increasingly important value.
  • FIG. 1 is a system architecture diagram of a micro-expression-based image recognition method provided by an embodiment of the present application.
  • the server 40a provides services for the user terminal cluster.
  • the user terminal cluster may include: a user terminal 40b, a user terminal 40c, ..., and a user terminal 40d.
  • The user terminal (which may be the user terminal 40b, the user terminal 40c, or the user terminal 40d) sends the micro-expression image to the server 40a.
  • The server 40a enhances the expression features of the micro-expression in the micro-expression image based on the pre-trained image enhancement model, so as to convert the micro-expression image into an exaggerated expression image with high emotional expression intensity; the server 40a then recognizes the attribute type of the exaggerated expression image based on the pre-trained image recognition model, and this attribute type is taken as the attribute type of the micro-expression image sent by the user terminal.
  • the server 40a may send the identified attribute type to the user terminal, and store the micro-expression image and the identified attribute type in the database in association.
  • the user terminal may display the attribute type in text on the screen.
  • Alternatively, the micro-expression image can be converted into an exaggerated expression image locally on the user terminal, the exaggerated expression image can then be recognized, and the recognized attribute type similarly serves as the attribute type corresponding to the micro-expression image.
  • Since training the image enhancement model and the image recognition model involves a large amount of offline computation, the user terminal's local image enhancement model and image recognition model may be sent to the user terminal after the server 40a has trained them. The following takes the recognition of the attribute type of one micro-expression image as an example (the recognition may take place in the server 40a or in the user terminal).
  • User terminals may include mobile phones, tablets, laptops, personal digital assistants (PDAs), smart speakers, mobile Internet devices (MIDs), point-of-sale (POS) machines, wearable devices (such as smart watches and smart bracelets), and the like.
  • FIG. 2 is a schematic diagram of an image recognition method based on micro-expressions provided by an embodiment of the present application.
  • The region images belonging to the facial features in the micro-expression image 10a are extracted from the micro-expression image 10a, namely image 10b (the image located in the left-eye region of the micro-expression image 10a), image 10c (the image located in the right-eye region), image 10d (the image located in the nose region), and image 10e (the image located in the mouth region).
  • the above image 10b is input into the image enhancement model 20a.
  • The image enhancement model 20a is used to enhance the expression features of an image, where enhancing the expression features means adjusting the shape of the facial features, for example: opening the eyes, raising the eyelids, furrowing the eyebrows, opening the mouth, exposing the teeth, or turning the lips down.
  • The expression features of the image 10b are enhanced in the image enhancement model 20a to obtain the image 10f. Similarly, the image 10c is input into the image enhancement model 20a and its expression features are enhanced to obtain the image 10g; the image 10d is input into the image enhancement model 20a and its expression features are enhanced to obtain the image 10h; and the image 10e is input into the image enhancement model 20a and its expression features are enhanced to obtain the image 10k.
  • The following uses the image 10b as an example to explain how the expression features of the image 10b are enhanced in the image enhancement model 20a; the expression features of the other images can be enhanced in the same manner.
  • In the image enhancement model 20a, the image 10b is first converted into a matrix, and a 1 * n column vector is obtained by random sampling from that matrix; deconvolution processing is then performed on this 1 * n column vector.
  • The deconvolution processing is the reverse operation of convolution processing, and its specific process is to fully connect and reconstruct the above column vector.
  • After deconvolution, the high-level representation tensor of the image 10b has the shape 1 * p * p * 3. If the image to be enhanced is a color image, the 0th index dimension of this tensor is compressed to obtain a color image of size p * p; if the image to be enhanced is a grayscale image, the 0th and 3rd index dimensions of this tensor are compressed to obtain a grayscale image of size p * p.
  • Taking the case where the image output from the image enhancement model 20a is a grayscale image as an example, the 0th and 3rd index dimensions are compressed to obtain the image 10f; that is, the image 10f output from the image enhancement model 20a is the image obtained after enhancing the expression features of the image 10b.
  • The image enhancement model 20a can also be understood as the reverse of the process in which a convolutional neural network identifies the attribute type of an object in an image.
  • A convolutional neural network recognizes an image by taking an image as input and outputting a column vector that represents the degree of matching between the input image and multiple attribute types; the image enhancement model, in contrast, works from a column vector randomly sampled from the image, that is, the input to the image enhancement model 20a is a column vector and the output is an image.
  • the above image enhancement model 20a may correspond to the generation model in the adversarial network.
  • the adversarial network includes a generation model and a discriminant model. The generation model is used to generate simulated sample data.
  • The generation model (the image enhancement model) is used to generate exaggerated expression images with stronger emotional expressiveness;
  • the discriminant model is used to determine the probability that the exaggerated expression images generated by the generation model are real images. Images belonging to the real expression type are real images; correspondingly, images belonging to the simulated expression type are simulated images. A real image is a normal facial image collected by an image collector, and a simulated image is an image fabricated by the model; for example, an image of a person's facial expression photographed by a camera is a real image belonging to the real expression type.
  • The complete micro-expression image 10a is input into the image enhancement model 20b, and the expression features of the micro-expression in the micro-expression image 10a are enhanced in the image enhancement model 20b to obtain an image 10m with enhanced expression features. The image 10m has a higher degree of expression recognizability and a stronger expression intensity than the micro-expression image 10a. Since the image enhancement model 20b and the image enhancement model 20a have exactly the same structure and differ only in the values of their parameters, the specific process by which the image enhancement model 20b enhances expression features can be found in the process by which the image enhancement model 20a enhances the image 10b.
  • The image enhancement model 20a is used to enhance the image 10b, image 10c, image 10d, and image 10e, and the image enhancement model 20b is used to enhance the micro-expression image 10a. The order of these operations is not limited: the image 10b, image 10c, image 10d, and image 10e may be enhanced first based on the image enhancement model 20a, the micro-expression image 10a may be enhanced first based on the image enhancement model 20b, or the two image enhancement models may enhance the expression features of the images in parallel.
  • After the images 10f, 10g, 10h, 10k, and 10m with enhanced expression features are obtained, the image 10f, image 10g, image 10h, image 10k, and image 10m are combined into one image 10n according to the position information of the corresponding image 10b, image 10c, image 10d, and image 10e in the micro-expression image 10a. Since the image 10n is a combination of multiple exaggerated expressions, it is also an image with high emotional expressiveness and high expression recognizability; that is, compared with the micro-expression image 10a, the image 10n is an exaggerated expression image.
  • the image recognition model 20c is used to identify the expression attribute type corresponding to the expression in the image 10n.
  • The expression attribute type may include: happy, sad, scared, surprised, disgusted, angry, and so on.
  • the image recognition model 20c may be a convolutional neural network model.
  • The recognition process is specifically as follows: the image 10n is input to the input layer of the image recognition model 20c; the convolution operations of the convolutional layers and the pooling operations of the pooling layers in the image recognition model 20c are used to extract the static structural feature information corresponding to the image 10n; and the classifier in the image recognition model 20c is used to calculate the degree to which the static structural feature information corresponding to the image 10n matches the six expression attribute type features included in the image recognition model.
  • Suppose the calculated probabilities are: 0.1 happy, 0.8 sad, 0.2 scared, 0.2 surprised, 0.1 disgusted, and 0.3 angry, where each value in the matching result indicates the probability that the static structural feature information of the image 10n matches the corresponding expression attribute type feature. For example, "0.1 happy" means that the probability that the static structural feature information of the image 10n matches the "happy" expression attribute type feature is 0.1.
  • The image enhancement model 20a and the image enhancement model 20b not only enhance the expression features of the micro-expression image and the micro-expression sub-images, so that the micro-expression image can be converted into an exaggerated expression image, but also ensure that the expression attribute type recognized by the image recognition model 20c for the exaggerated expression image is the same as that of the micro-expression image; that is, the converted exaggerated expression image is not only exaggerated (with higher expression intensity) and realistic, but its expression attribute type is also guaranteed to be consistent with that of the micro-expression image.
  • After the expression attribute type is recognized, the terminal may perform a corresponding operation. For example, if the recognized expression attribute type is sadness, and the probability corresponding to the expression attribute type "sadness" is greater than or equal to 0.8, the terminal performs a payment operation or a photographing operation.
  • Since the expression features of the target expression image are clearly distinguishable, the expression attribute type of the target expression image can be accurately identified, and the accuracy of recognizing the micro-expression image can thus be improved.
  • FIG. 3 is a schematic flowchart of a micro-expression-based image recognition method provided by an embodiment of the present application.
  • the micro-expression-based image recognition method may include:
  • Step S101 Obtain an original expression image belonging to the first expression type, and input the original expression image into an image enhancement model; the original expression image belonging to the first expression type is an image containing micro expressions; the image enhancement model is based on A sample expression image belonging to the first expression type and a sample expression image belonging to the second expression type are trained; the expression intensity of the sample expression image belonging to the second expression type is greater than the expression intensity of the sample image belonging to the first expression type .
  • An image to be recognized or classified that belongs to the first expression type (such as the micro-expression image 10a in the embodiment corresponding to FIG. 2 above) is called an original expression image belonging to the first expression type. Images belonging to the first expression type are images that contain micro-expressions, and micro-expressions are short-lived facial expressions that people make unconsciously when they try to hide their emotions.
  • An image belonging to the second expression type is an image containing an exaggerated expression (such as the image 10n in the embodiment corresponding to FIG. 2 above); in other words, the expression intensity and the distinguishability of the expression in an image belonging to the second expression type are much greater than in an image belonging to the first expression type.
  • Images with high expression intensity are images with obvious facial emotion and exaggerated facial features; for example, the expression intensity of a happy laugh is much greater than that of an expressionless face.
  • The original expression image is input to the image enhancement model (such as the image enhancement model 20a and the image enhancement model 20b in the embodiment corresponding to FIG. 2 above), and the expression features of the micro-expression in the original expression image are enhanced in the image enhancement model.
  • the image enhancement model is obtained by training the adversarial network based on the sample expression image belonging to the first expression type and the sample expression image belonging to the second expression type, and the image enhancement model corresponds to the generation model in the adversarial network.
  • Step S102 Enhance the expression characteristics of the micro expressions in the original expression image in the image enhancement model to obtain a target expression image belonging to the second expression type.
  • In the image enhancement model, the expression features of the micro-expression in the original expression image are enhanced. Since a facial expression is composed of the eyes, eyebrows, nose, mouth, forehead, cheeks, and jaw, enhancing the expression features means adjusting the external shape of the eyes, eyebrows, nose, mouth, forehead, cheeks, and jaw to strengthen the emotional expressiveness of the facial expression, for example: opening the eyes, raising the eyebrows, pulling down the corners of the mouth, enlarging the nostrils, wrinkling the cheeks, tightly furrowing the forehead, clenching the jaw, and so on. After the expression features are enhanced, the resulting image has higher expression intensity and an obviously distinguishable expression, so the image obtained by enhancing the expression features of the original expression image belongs to the second expression type and is called the target expression image belonging to the second expression type.
  • The image enhancement model may include two sub-models, namely a first enhancement sub-model (such as the image enhancement model 20a in the embodiment corresponding to FIG. 2 above) and a second enhancement sub-model (such as the image enhancement model 20b in the embodiment corresponding to FIG. 2 above), where the first enhancement sub-model is used to enhance the expression features of the eyes, eyebrows, nose, and mouth, and the second enhancement sub-model is used to enhance the expression features of the entire micro-expression. The expression images respectively enhanced by these two sub-models are then combined into a target expression image.
  • The specific process of enhancing the original expression image based on the first enhancement sub-model and the second enhancement sub-model to obtain the target expression image is as follows: in the original expression image, the expression identification areas are determined, and the determined expression identification areas are extracted from the original expression image as unit original expression images (such as the image 10b, image 10c, image 10d, and image 10e in the embodiment corresponding to FIG. 2 described above).
  • The expression identification areas are the areas where the eyes, eyebrows, nose, and mouth are located in the facial expression; it can be seen that there are multiple unit original expression images here.
  • The unit original expression images are respectively input into the first enhancement sub-model, and the expression features of these unit original expression images are respectively enhanced in the first enhancement sub-model to obtain unit auxiliary images (such as the image 10f, image 10g, image 10h, and image 10k in the embodiment corresponding to FIG. 2 described above). The number of unit auxiliary images is the same as the number of unit original expression images, and each unit auxiliary image corresponds uniquely to one unit original expression image.
  • the original expression image is input into the second enhancement sub-model, and the expression characteristics of the original expression image are enhanced in the second enhancement sub-model, and the image with enhanced expression is called a target auxiliary image.
  • The execution order of obtaining the unit auxiliary images based on the first enhancement sub-model and obtaining the target auxiliary image based on the second enhancement sub-model is not limited. After the unit auxiliary images and the target auxiliary image are determined, since the unit auxiliary images and the unit original expression images are in one-to-one correspondence, the unit auxiliary images and the target auxiliary image are combined into the target expression image according to the position information of the unit original expression images in the original expression image, where the target expression image is an image with high expression intensity.
  • Binary processing is performed on the original expression image, and the image obtained after the binarization processing is called a binary image, and the pixel value of the pixels in the binary image is 1 or 0.
  • the binarization process is to set the value of pixels in the original expression image whose pixel value is greater than the pixel threshold to 1, and correspondingly set the value of pixels in the original expression image whose pixel value is less than or equal to the above pixel threshold to 0.
  • the pixel values of the expression images have been normalized, that is, the pixel values of all original expression images range from 0 to 1. In terms of display effect, if the pixel value is equal to 1, then the pixel value is displayed as white; if the pixel value is equal to 0, then the pixel value is displayed as black. Edge detection is performed on binary images.
  • Edge detection refers to the detection of areas in the binary image where the gray scale changes sharply.
  • The change in the gray scale of the image can be reflected by the gradient of the gray-level distribution. Therefore, edge detection can be performed on the binary image based on a gradient operator to obtain a gradient image, where the gradient operator may include the Roberts operator, Prewitt operator, Sobel operator, Laplacian operator, and so on.
  • the gradient image is an image that reflects the sharp change of the gray level in the binary image
  • the gradient image is an image composed of the edge contour of the original expression image.
  • the edge contour is the contour of the eyes, eyebrows, nose, and mouth.
  • The position information of the above edge contour (referred to as target position information) is then determined.
  • The target position information may include 4 coordinates, which represent the coordinates of the 4 vertices of a rectangular area, namely the smallest rectangular area containing the edge contour.
  • The area identified by the target position information is the area where the eyes, eyebrows, nose, and mouth are located in the facial expression; that is, the area identified by the target position information in the original expression image is the expression identification area.
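  • As a minimal sketch of the localization procedure just described (binarization with a pixel threshold, gradient-operator edge detection, and taking the smallest rectangle containing the edge contour), the following Python snippet uses OpenCV and NumPy; the function name, the threshold value, and the choice of the Sobel operator are illustrative assumptions rather than part of the disclosure.

```python
# Sketch: binarize -> gradient-based edge detection -> bounding box of the edge contour.
import cv2
import numpy as np

def expression_identification_region(gray_face, pixel_threshold=0.5):
    """Return (x0, y0, x1, y1) of the smallest rectangle enclosing the edge contour."""
    face = gray_face.astype(np.float32) / 255.0           # normalize pixel values to [0, 1]
    binary = (face > pixel_threshold).astype(np.float32)  # 1 is displayed as white, 0 as black

    # Gradient-based edge detection (Sobel operator, one of the listed choices)
    gx = cv2.Sobel(binary, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(binary, cv2.CV_32F, 0, 1, ksize=3)
    gradient = np.sqrt(gx ** 2 + gy ** 2)                 # gradient (edge) image

    # Smallest rectangle containing the edge contour = target position information
    ys, xs = np.nonzero(gradient > 0)
    return xs.min(), ys.min(), xs.max(), ys.max()
```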
  • Step S103 Identify the expression attribute type corresponding to the target expression image, and determine the expression attribute type corresponding to the target expression image as the expression attribute type corresponding to the original expression image.
  • The target expression image can first be adjusted to a fixed size, and the resized target expression image can then be input into the image recognition model (such as the image recognition model 20c in the embodiment corresponding to FIG. 2 above).
  • the image recognition model may include an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer; wherein the parameter size of the input layer is equal to the size of the adjusted target expression image.
  • After the target expression image is input to the input layer of the convolutional neural network and then enters the convolutional layer, a small patch of the target expression image is first randomly selected as a sample, and some feature information is learned from this small sample; this sample is then slid as a window across all pixel regions of the target expression image, that is, the feature information learned from the sample is convolved with the target expression image to obtain the most significant feature information at different positions of the target expression image.
  • After the convolution operation is completed, the feature information of the target expression image has been extracted, but the number of features extracted by the convolution operation alone is large. To reduce the amount of computation, a pooling operation is also required: the feature information extracted from the target expression image by the convolution operation is transmitted to the pooling layer, where the extracted feature information is aggregated and statistically summarized. The magnitude of this aggregated feature information is much lower than that of the feature information extracted by the convolution operation, and it also improves the classification effect.
  • Commonly used pooling methods mainly include average pooling calculation method and maximum pooling calculation method.
  • the average pooling operation method is to calculate an average feature information in a feature information set to represent the characteristics of the feature information set;
  • the maximum pooling operation is to extract the largest feature information in a feature information set to represent the characteristics of the feature information set.
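  • A tiny numeric illustration of the two pooling methods, applied to one 2 x 2 block of a feature map (the values are arbitrary):

```python
# Average pooling vs. max pooling on one 2x2 block of a feature map.
import numpy as np

block = np.array([[0.1, 0.4],
                  [0.3, 0.8]])

average_pooled = block.mean()  # average pooling: 0.4
max_pooled = block.max()       # max pooling: 0.8
```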
  • Through the convolution and pooling operations, the structural feature information of the target expression image can be extracted; this is called the target structural feature information, and the target structural feature information has a lower order of magnitude.
  • The classifier in the image recognition model (that is, the fully connected layer and the output layer of the image recognition model) is then used to identify the degree of matching between the target structural feature information of the target expression image and the multiple expression attribute type features in the image recognition model.
  • the classifier is trained in advance.
  • the input of the classifier is static structural feature information
  • the output is the matching degree of the static structural feature information and the features of multiple expression attribute types.
  • The higher the matching degree with an expression attribute type feature, the greater the probability of the corresponding expression attribute type; the number of matching degrees obtained is the same as the number of expression attribute type features in the image recognition model.
  • the number and types of expression attribute type features included in the image recognition model are determined by the number and types of expression attribute types included in the training data set when training the image recognition model.
  • For example, suppose the matching degree between the target structural feature information A and the "happy" expression attribute type feature is recognized as 0.1; the matching degree between the target structural feature information A and the "fear" expression attribute type feature is 0.3; the matching degree between the target structural feature information A and the "angry" expression attribute type feature is 0.6; and the matching degree between the target structural feature information A and the "surprise" expression attribute type feature is 0.9.
  • The expression attribute type corresponding to the maximum matching degree is extracted from the above four matching degrees, that is, the "surprise" expression attribute type corresponding to the maximum matching degree of 0.9.
  • The expression attribute type corresponding to the target expression image is therefore: surprise, and the expression attribute type corresponding to the original expression image is also: surprise.
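  • The following is a minimal sketch of the recognition pipeline of step S103: convolutional and pooling layers extract the target structural feature information, a fully connected classifier with a softmax output produces one matching degree per expression attribute type, and the type with the maximum matching degree is selected. The layer sizes, the fixed 64 x 64 input resolution, and the use of PyTorch are assumptions for illustration only.

```python
# Sketch of a convolution + pooling feature extractor followed by a classifier over
# six expression attribute types; the type with the maximum matching degree wins.
import torch
import torch.nn as nn

EXPRESSION_TYPES = ["happy", "sad", "scared", "surprised", "disgusted", "angry"]

class ExpressionRecognizer(nn.Module):
    def __init__(self, num_types=len(EXPRESSION_TYPES)):
        super().__init__()
        self.features = nn.Sequential(          # convolutional and pooling layers
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(        # fully connected layer + output layer
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, num_types),
            nn.Softmax(dim=1),                  # one matching degree per attribute type
        )

    def forward(self, x):                       # x: (batch, 1, 64, 64) grayscale input
        return self.classifier(self.features(x))

model = ExpressionRecognizer()
target_image = torch.rand(1, 1, 64, 64)          # resized target expression image (placeholder)
matching = model(target_image)[0]                # matching degree for each expression type
best = EXPRESSION_TYPES[int(matching.argmax())]  # attribute type with the maximum matching degree
```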
  • FIG. 4 is a schematic flowchart of enhancing expression features provided by an embodiment of the present application.
  • the specific process of enhancing expression features includes the following steps S201-S203, and steps S201-S203 are a specific embodiment of step S102 in the embodiment corresponding to FIG. 3:
  • Step S201: Determine the expression identification areas in the original expression image as unit original expression images, input the unit original expression images into the first enhancement sub-model, and enhance the expression features of the unit original expression images in the first enhancement sub-model to obtain unit auxiliary images.
  • The image enhancement model includes a first enhancement sub-model and a second enhancement sub-model.
  • the expression identification area is determined, and the determined expression identification area is extracted from the original expression image as the unit original expression image.
  • the expression identification area is the area where the eyes, eyebrows, nose and mouth are located in the facial expression.
  • The following uses the first enhancement sub-model as an example to describe how the expression features of one unit original expression image are enhanced; if there are multiple unit original expression images, their expression features can be enhanced in the same way to obtain a unit auxiliary image with enhanced expression features for each.
  • The unit original expression image is input into the input layer of the first enhancement sub-model to obtain a matrix corresponding to the unit original expression image, called the first original matrix; that is, the pixels of the unit original expression image are discretized to obtain a first original matrix with the same size as the unit original expression image. Values are randomly sampled from the first original matrix and combined into a 1 * n column vector of length n (called the first original vector), where the target length n is set in advance; for example, the target length n may be 100, in which case sampling from the first original matrix yields a 1 * 100 first original vector.
  • In order to input it into the transposed convolutional layers of the first enhancement sub-model, the first original vector needs to be expanded into a four-dimensional tensor of 1 * 1 * 1 * n.
  • Deconvolution is performed on the above 4-dimensional tensor to obtain the tensor corresponding to the first transposed convolutional layer; the deconvolution operation is the reverse of the convolution operation, and the spatial size changes from small to large.
  • Deconvolution is then performed on the tensor corresponding to the first transposed convolutional layer to obtain the tensor corresponding to the second transposed convolutional layer, and so on, until the deconvolution based on the last transposed convolutional layer in the first enhancement sub-model is completed.
  • In this way, a 4-dimensional tensor of size 1 * a * b * 3 is obtained. After compressing the 0th index dimension and the 3rd index dimension, a 2-dimensional tensor a * b is obtained, called the first target tensor; the first target tensor is determined as the unit auxiliary image, whose size is equal to a * b.
  • In this case the unit auxiliary image obtained is a grayscale image; if only the 0th index dimension is compressed, the unit auxiliary image obtained is a color image of size a * b, and the first target tensor corresponding to the color image is a 3-dimensional tensor a * b * 3.
  • Since the unit auxiliary images need to be combined later, the size of each unit auxiliary image is the same as that of the corresponding unit original expression image.
  • the expression characteristics can be enhanced in the above manner to obtain a unit auxiliary image corresponding to each unit original expression image.
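  • A minimal sketch of the first enhancement sub-model described in step S201: a length-n vector is randomly sampled from the discretized unit original expression image, expanded into a 4-dimensional tensor, and upsampled back to an image through a stack of transposed convolution (deconvolution) layers. The number of layers, the channel sizes, n = 100, and the 32 x 32 output size are illustrative assumptions; PyTorch's NCHW layout is used, so the 1 * 1 * 1 * n tensor of the text is represented here as shape (1, n, 1, 1).

```python
# Sketch: sample a 1*n vector from the unit original expression image, expand it to a
# 4-D tensor, and upsample it through transposed-convolution layers into an image.
import torch
import torch.nn as nn

n = 100  # target length of the first original vector

class EnhancerSubModel(nn.Module):
    def __init__(self, out_channels=1):          # 1 = grayscale output, 3 = color output
        super().__init__()
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(n, 128, 4), nn.ReLU(),                         # 1x1  -> 4x4
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 4x4  -> 8x8
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 8x8  -> 16x16
            nn.ConvTranspose2d(32, out_channels, 4, stride=2, padding=1),     # 16x16 -> 32x32
            nn.Sigmoid(),                         # keep pixel values in [0, 1]
        )

    def forward(self, first_original_vector):    # shape (1, n)
        x = first_original_vector.view(1, n, 1, 1)  # expand into a 4-D tensor
        # squeeze() drops the size-1 dimensions (batch, and channel for grayscale output)
        return self.deconv(x).squeeze()

unit_original = torch.rand(32, 32)                       # discretized unit original image (placeholder)
flat = unit_original.flatten()
idx = torch.randint(len(flat), (n,))                     # random sampling from the first original matrix
first_original_vector = flat[idx].view(1, n)             # 1 * n first original vector
unit_auxiliary = EnhancerSubModel()(first_original_vector)  # enhanced unit auxiliary image
```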
  • Step S202 input the original expression image into the second enhancement sub-model, and enhance the expression characteristics of the original expression image in the second enhancement sub-model to obtain a target auxiliary image.
  • The original expression image is input into the input layer of the second enhancement sub-model to obtain a matrix corresponding to the original expression image, called the second original matrix; that is, the pixels of the original expression image are discretized to obtain a second original matrix of the same size as the original expression image. Values are randomly sampled from the second original matrix and combined into a 1 * m column vector of length m (referred to as the second original vector).
  • In order to input it into the transposed convolutional layers of the second enhancement sub-model, it is first necessary to expand the second original vector into a four-dimensional tensor of 1 * 1 * 1 * m.
  • Deconvolution is performed on the above 4-dimensional tensor to obtain the tensor corresponding to the first transposed convolutional layer; the deconvolution operation is the reverse of the convolution operation, and the spatial size changes from small to large.
  • Deconvolution is then performed on the tensor corresponding to the first transposed convolutional layer to obtain the tensor corresponding to the second transposed convolutional layer, and so on, until the deconvolution based on the last transposed convolutional layer in the second enhancement sub-model is completed, yielding a 4-dimensional tensor of size 1 * c * d * 3.
  • After compressing the 0th index dimension and the 3rd index dimension, a 2-dimensional tensor c * d is obtained, called the second target tensor; the second target tensor is determined as the target auxiliary image, whose size is equal to c * d.
  • In this case the target auxiliary image is a grayscale image; if only the 0th index dimension is compressed, the target auxiliary image obtained is a color image of size c * d, and the second target tensor corresponding to the color image is a 3-dimensional tensor c * d * 3.
  • The first enhancement sub-model and the second enhancement sub-model have the same structure, but their model parameters (for example, the convolution kernels of the transposed convolutional layers and the number of transposed convolutional layers) are not identical, and the size of the target auxiliary image is the same as that of the original expression image.
  • Step S203 Combine the unit auxiliary image and the target auxiliary image into the target expression image according to the position information of the unit original expression image in the original expression image.
  • Since the unit auxiliary images and the unit original expression images are in one-to-one correspondence and have the same size, the unit auxiliary images and the target auxiliary image are combined into the target expression image according to the position information of the unit original expression images in the original expression image, where the position information refers to the position coordinates of the corresponding unit original expression image in the original expression image; the combined target expression image is also the same size as the original expression image.
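  • A short sketch of the combination in step S203: each unit auxiliary image is pasted back into the target auxiliary image at the position its unit original expression image occupied in the original expression image. The (x0, y0, x1, y1) coordinate convention is an assumption; the disclosure only requires that position information be recorded.

```python
# Sketch: paste unit auxiliary images into the target auxiliary image by position.
import numpy as np

def combine(target_auxiliary, unit_auxiliaries, positions):
    """positions[i] = (x0, y0, x1, y1) of unit i in the original expression image."""
    target_expression = target_auxiliary.copy()
    for unit, (x0, y0, x1, y1) in zip(unit_auxiliaries, positions):
        # each unit auxiliary image has the same size as its unit original expression image
        target_expression[y0:y1, x0:x1] = unit
    return target_expression
```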
  • In summary, the image enhancement model enhances the expression features of the micro-expression in the micro-expression image so as to convert the micro-expression image into a highly recognizable target expression image, and the distinguishable expression features of the target expression image are used to identify its expression attribute type, which is taken as the expression attribute type of the micro-expression image. Because the expression features of the target expression image are clearly distinguishable, the expression attribute type of the target expression image can be accurately identified, and the accuracy of recognizing the micro-expression image can thus be improved.
  • FIG. 5 is a schematic flowchart of another micro-expression-based image recognition method provided by an embodiment of the present application.
  • the specific process of the micro-expression-based image recognition method is as follows:
  • Step S301 Obtain a first sample expression image belonging to a first expression type, and obtain a second sample expression image belonging to a second expression type.
  • A first sample expression image belonging to the first expression type and a second sample expression image belonging to the second expression type are obtained; the expression in the first sample expression image is a micro-expression, and the expression in the second sample expression image is an exaggerated expression.
  • the following uses a first sample expression image and a second sample expression image as examples.
  • Steps S302-S305 describe the process of training the image enhancement model and the image recognition model, and steps S306-S308 describe the process of recognizing an image containing a micro-expression.
  • The image enhancement model is used to enhance the expression features of an image; it can also be understood as generating an image with higher expression recognizability and stronger expression intensity. The image enhancement model therefore corresponds to the sample generation model in an adversarial network, where the adversarial network includes a sample generation model and a sample discriminant model.
  • The sample generation model is used to generate sample data, here expression images with high expression intensity, and the sample discriminant model is used to identify the probability that the sample data belongs to a real expression image and the probability that it belongs to a simulated expression image (the sample data generated by the sample generation model is simulated sample data, an image collected by an image collector is real sample data, and the probability of being a real expression image and the probability of being a simulated expression image sum to 1). Therefore, training the image enhancement model is essentially training an adversarial network: not only is the sample generation model trained, but the sample discriminant model is trained as well.
  • The adversarial network can also be understood as follows: the sample generation model should generate expression images that are as realistic and exaggerated as possible, while the sample discriminant model should try to recognize that an image generated by the sample generation model is a simulated image rather than a genuinely collected expression image. This is an adversarial game process (hence the name adversarial network), and the training process seeks a balance between the realism achieved by the sample generation model and the accuracy achieved by the sample discriminant model.
  • the objective function of the adversarial network can be expressed as formula (1):
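  • Based on the definitions of x, z, T, and D given below and the 1 - D(T(x)) term mentioned there, formula (1) can be written in the standard adversarial (min-max) form, reconstructed here from those definitions:

```latex
\min_{T}\max_{D}\; \mathbb{E}_{z}\big[\log D(z)\big] \;+\; \mathbb{E}_{x}\big[\log\big(1 - D(T(x))\big)\big] \tag{1}
```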
  • x represents the first sample expression image belonging to the first expression type
  • z represents the second sample expression image belonging to the second expression type
  • T represents the sample generation model, which is used to enhance the expression characteristics of the micro expression image
  • T ( x) represents the image with enhanced expression
  • D represents the sample discriminant model, which is used to identify the probability that an object (here, either the second sample expression image or the image with enhanced expression features) belongs to the real expression type.
  • An image of the real expression type here refers to a facial expression image collected by an image acquisition device; corresponding to the real expression type is the simulated expression type, and images belonging to the simulated expression type are fictitious expression images generated by the model.
  • Maximizing over D means that for the sample discriminant model, when the second sample expression image (a real and exaggerated expression image) is input, the desired identification label of the second sample expression image is 1 (an identification label of 1 indicates that the probability that the corresponding image is a real expression image is 1), so the larger D(z), the better.
  • Similarly, when the image with enhanced expression features is input, the desired identification label is 0 (an identification label of 0 indicates that the probability that the corresponding image is a real expression image is 0, because the sample enhanced image is generated by the model rather than actually collected), that is, the smaller D(T(x)), the better; writing the second term as 1 - D(T(x)) turns it into "the larger the better", so for the discriminant model both terms are to be maximized together.
  • Minimizing over T means that for the sample generation model, when the sample enhanced image is input, the identification label that the generation model hopes the discriminant model will assign to the sample enhanced image is 1 (since the sample generation model wants the generated image to be sufficiently realistic), so the larger D(T(x)), the better; expressed in terms of 1 - D(T(x)), this corresponds to minimizing 1 - D(T(x)).
  • Step S302 Enhance the expression characteristics of the micro-expressions in the first sample expression image based on the sample generation model to obtain a sample enhanced image.
  • a sample generation model is initialized, and based on the sample generation model, the expression characteristics of the micro expressions in the first sample expression image are enhanced to obtain a sample enhanced image.
  • Since the sample generation model has not yet been trained, the quality of the sample enhanced image may be relatively low (that is, the sample enhanced image may be unrealistic, of low expression intensity, not exaggerated, or even not an expression image at all).
  • The structure of the sample generation model before training and that of the trained sample generation model (that is, the image enhancement model) are the same; the difference lies in the values of the parameters in the model.
  • Step S303 Extract first structural feature information corresponding to the sample enhanced image based on the sample discriminant model, and identify the matching probability corresponding to the first structural feature information according to the classifier in the sample discriminant model; the matching The probability is used to characterize the probability that the sample enhanced image belongs to the real expression type.
  • a sample discrimination model is initialized, and the sample discrimination model may be a classification model based on a convolutional neural network. Extract the structural feature information corresponding to the sample enhanced image based on the sample discriminant model (referred to as the first structural feature information), and identify the matching probability that the sample enhanced image belongs to the real expression type based on the classifier and the first structural feature information in the sample discriminant model .
  • the image of the real expression type here refers to a real and normal facial expression image collected by an image collector (for example, a camera).
  • the matching probability calculated by the sample discriminant model can only determine the probability that the sample enhanced image is a real and normal facial expression.
  • Step S304 extracting the second structural feature information corresponding to the sample enhanced image based on the sample identification model, and identifying the label information set corresponding to the second structural feature information according to the classifier in the sample identification model;
  • the tag information set is used to characterize the degree of matching between the sample enhanced image and multiple expression attribute types.
  • a sample recognition model is initialized, and the sample recognition model may be a classification model based on a convolutional neural network.
  • the sample recognition model Based on the sample recognition model, extract the structural feature information corresponding to the sample enhanced image (referred to as the second structural feature information), and identify the matching between the sample enhanced image and multiple expression attribute types based on the classifier and the second structural feature information in the sample recognition model Degree, associating multiple matching degrees and corresponding expression attribute types to obtain multiple tag information, and combining multiple tag information into a set of tag information.
  • For example, the matching degree between sample enhanced image A and the "happy" expression attribute type is 0.2, the matching degree between sample enhanced image A and the "sad" expression attribute type is 0.1, and the matching degree between sample enhanced image A and the "fear" expression attribute type is 0.7; after association with the corresponding expression attribute types, the label information set is obtained: 0.2-happy, 0.1-sad, 0.7-fear.
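A small sketch of how such a label information set could be assembled from classifier scores; the attribute names and values are the example figures from the text above, and the label order is an assumption.

```python
# Illustrative sketch: turning recognition scores into the "label information set".
import torch

attribute_types = ["happy", "sad", "fear"]           # assumed label order
matching_degrees = torch.tensor([0.2, 0.1, 0.7])      # e.g. softmax output of the sample recognition model

label_information_set = {
    attribute: float(degree)
    for attribute, degree in zip(attribute_types, matching_degrees)
}
# -> {"happy": 0.2, "sad": 0.1, "fear": 0.7}
```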
  • Although both the sample discrimination model and the sample recognition model can be classification models based on convolutional neural networks, they extract different information.
  • The first structural feature information extracted by the sample discrimination model mainly reflects hidden high-level feature information about whether the sample enhanced image is a real image or a simulated image.
  • The second structural feature information extracted by the sample recognition model mainly reflects hidden high-level feature information about the expression attribute type of the sample enhanced image.
  • Step S305 Generate a model loss value according to the sample enhanced image, the second sample expression image, the matching probability, and the label information set, and determine the image enhancement model and the image recognition model according to the model loss value.
  • A generation loss value is generated based on the sample enhanced image and the second sample expression image; a discrimination loss value is generated based on the matching probability identified by the sample discrimination model and the second sample expression image; and a verification loss value is generated according to the label information set and the expression attribute type corresponding to the first sample expression image.
  • the above three loss values are combined into a model loss value, and the weight values of the parameters in the sample generation model, the sample discrimination model, and the sample identification model are adjusted according to the model loss value.
  • After the parameter weights are adjusted, a sample enhanced image is generated again using the above method and the model loss value is recalculated, looping until the model loss value is less than the target threshold, the model loss value converges, or the number of iterations reaches the target number; at that point the sample generation model, the sample discrimination model, and the sample recognition model are trained, and the sample generation model can then be determined as the image enhancement model and the sample recognition model as the image recognition model. It can be seen from the above that a sample discrimination model exists in the training phase, but no sample discrimination model is required in the application phase.
  • Since the image enhancement model includes the first enhancement sub-model and the second enhancement sub-model, the sample generation model correspondingly includes a first generation sub-model and a second generation sub-model.
  • Adjusting the weights of the sample generation model parameters according to the model loss value therefore means adjusting the weights of the first generation sub-model parameters and the weights of the second generation sub-model parameters according to the model loss value.
  • Step S306 Obtain an original expression image belonging to the first expression type, and input the original expression image into the image enhancement model; the original expression image belonging to the first expression type is an image containing micro-expressions; the image enhancement model is trained based on sample expression images belonging to the first expression type and sample expression images belonging to the second expression type; the expression intensity of the sample expression images belonging to the second expression type is greater than the expression intensity of the sample images belonging to the first expression type.
  • Step S307 Enhance the expression characteristics of the micro expressions in the original expression image in the image enhancement model to obtain a target expression image belonging to the second expression type.
  • Step S308 Identify the expression attribute type corresponding to the target expression image, and determine the expression attribute type corresponding to the target expression image as the expression attribute type corresponding to the original expression image.
  • For the specific implementation of steps S306-S308, reference may be made to steps S101-S103 in the embodiment corresponding to FIG. 2 above, and for the specific process of enhancing expression features, reference may be made to steps S201-S203 in the embodiment corresponding to FIG. 3 above; no more details are given here.
  • FIG. 6 is a schematic flowchart of generating a model loss value according to an embodiment of the present application.
  • the specific process of generating the model loss value includes the following steps S401-S402, and steps S401-S402 are a specific embodiment of step S305 in the embodiment corresponding to FIG. 5:
  • Step S401 Generate a model loss value according to the sample enhanced image, the second sample expression image, the matching probability, the label information set, and the expression attribute type corresponding to the first sample expression image.
  • According to the sample enhanced image and the second sample expression image, the error between the generated sample enhanced image and the second sample expression image with high expression intensity can be calculated; formula (2) can be used to calculate the generation loss value between each pixel of the sample enhanced image and each pixel of the second sample expression image.
  • x represents the micro-expression image
  • z represents the real exaggerated image
  • T represents the enhancement of the expression features of the micro-expression image
  • T (x) represents the sample enhanced image.
  • The generation loss value calculated by formula (2) is used, in the subsequent adjustment of the sample generation model, to ensure that the expression intensity of the image generated by the sample generation model (the image with enhanced expression features) is as large as possible, that is, the expression in the enhanced image should be as exaggerated as possible.
  • For the sample discrimination model, since the sample enhanced image is an image generated by the model rather than an image actually collected, the sample enhanced image does not belong to the real expression type; therefore it is hoped that the matching probability that the sample enhanced image belongs to the real expression type is 0, that is, the identification label corresponding to the sample enhanced image is expected to be 0.
  • Based on the matching probability determined by the sample discrimination model and the second sample expression image, the error corresponding to the sample discrimination model can be calculated based on formula (3);
  • T (x) represents the sample enhanced image
  • z represents the second sample expression image
  • x represents the first sample expression image
  • D (z) represents the probability of identifying the second sample expression image belongs to the real expression type
  • D(T(x)) represents the probability of identifying that the sample enhanced image belongs to the real expression type.
  • The image belonging to the real expression type here refers to an actually collected image of a human facial expression, not a facial expression image simulated by the model.
  • For the sample generation model, the matching probability that the sample enhanced image belongs to the real expression type is expected to be 1, that is, the identification label corresponding to the sample enhanced image is expected to be 1.
  • the error corresponding to the sample generation model can be calculated based on formula (4):
  • D(T(x)) represents the probability of identifying that the sample enhanced image belongs to the real expression type.
  • The sum of the above two errors is called the discriminant loss value and can be calculated using formula (5), where L2 represents the error corresponding to the sample discrimination model and L3 represents the error corresponding to the sample generation model.
  • The discriminant loss value calculated by formula (5) is used in the subsequent adjustment of the sample generation model and the sample discrimination model to ensure that the sample enhanced image generated by the sample generation model is as realistic as possible and that the discrimination result of the sample discrimination model is as accurate as possible, that is, to ensure that the sample generation model and the sample discrimination model can reach a balance.
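A hedged sketch of the discriminant loss value in the form given by formulas (3)-(5) in the description: L2 = log D(z) + log(1 - D(T(x))), L3 = log(1 - D(T(x))), and L4 = L2 + L3. The sign and optimization conventions (who maximizes, who minimizes) are not spelled out here, so the code only mirrors the stated formulas; `d_real` and `d_fake` are assumed names.

```python
# Illustrative sketch of the discriminant loss value (formulas (3)-(5)).
import torch

def discriminant_loss(d_real: torch.Tensor, d_fake: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """d_real = D(z): probability that the second sample expression image is real;
    d_fake = D(T(x)): probability that the sample enhanced image is real."""
    l2 = torch.log(d_real + eps) + torch.log(1.0 - d_fake + eps)   # formula (3)
    l3 = torch.log(1.0 - d_fake + eps)                             # formula (4)
    return (l2 + l3).mean()                                        # formula (5): L4 = L2 + L3
```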
  • According to the label information set determined by the sample recognition model and the expression attribute type (the real expression attribute type) corresponding to the first sample expression image, the verification loss value between the recognized result and the real result is determined; it can be calculated using formula (6), where p(x) represents the label information set recognized by the sample recognition model and q(x) represents the real expression attribute type corresponding to the first sample expression image.
  • The verification loss value calculated by formula (6) is used in the subsequent adjustment of the sample recognition model to ensure that the expression attribute type the sample recognition model determines for the image (the image with enhanced expression features) is as accurate as possible, that is, to ensure that the expression attribute type recognized from the sample enhanced image is as close as possible to the expression attribute type of the first sample expression image.
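A possible reading of formula (6), assuming it is a standard cross-entropy between the recognized label information set p(x) and the true expression attribute type q(x) treated as a one-hot target; the exact form of the formula image is not reproduced here, so this is only an assumption-labelled sketch.

```python
# Illustrative sketch of a cross-entropy style verification loss (assumed form of formula (6)).
import torch
import torch.nn.functional as F

def verification_loss(recognized_logits: torch.Tensor, true_attribute_index: torch.Tensor) -> torch.Tensor:
    # recognized_logits: raw scores over expression attribute types for the sample enhanced image
    # true_attribute_index: index of the expression attribute type of the first sample expression image
    return F.cross_entropy(recognized_logits, true_attribute_index)
```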
  • In order for the above three loss values to jointly adjust the sample generation model, the sample discrimination model, and the sample recognition model, they are combined into the model loss value; the combination can adopt formula (7):
  • L6 = L1 + α·L4 + β·L5    (7)
  • α and β are connection weights, with values between 0 and 1.
  • L1 represents the generation loss value, L4 represents the discrimination loss value, and L5 represents the verification loss value.
  • The combined model loss value can be understood as follows: the generation loss value ensures that the sample enhanced image is as exaggerated as possible, the discrimination loss value ensures that the sample enhanced image is as realistic as possible, and the verification loss value ensures that the expression attribute type of the sample enhanced image is as accurate as possible; the combined model loss value therefore ensures that the sample enhanced image is exaggerated and realistic and that its expression attribute type is accurate.
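A short sketch of the combination described by formula (7), together with a pixel-wise generation loss as described for formula (2); the use of an L1 distance for formula (2) and the concrete values of the connection weights α and β are assumptions.

```python
# Illustrative sketch of the generation loss (assumed L1 form of formula (2)) and
# of combining the three loss values into the model loss value (formula (7)).
import torch

def generation_loss(enhanced: torch.Tensor, exaggerated: torch.Tensor) -> torch.Tensor:
    # Pixel-wise error between the sample enhanced image T(x) and the second
    # sample expression image z; an L1 distance is assumed here.
    return torch.mean(torch.abs(enhanced - exaggerated))

def model_loss(l1, l4, l5, alpha: float = 0.5, beta: float = 0.5):
    # Formula (7): L6 = L1 + alpha*L4 + beta*L5, with connection weights in (0, 1).
    return l1 + alpha * l4 + beta * l5
```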
  • Step S402 Adjust the weights of the parameters in the sample generation model, the weights of the parameters in the sample discrimination model, and the weights of the parameters in the sample identification model according to the model loss value; when the model loss value is less than the target threshold, determine the adjusted sample generation model as the image enhancement model, and determine the adjusted sample recognition model as the image recognition model.
  • After the parameter weights are adjusted, a sample enhanced image is generated again using the above method and the model loss value is recalculated, looping until the model loss value is less than the target threshold, the model loss value converges, or the number of iterations reaches the target number; at that point the sample generation model, the sample discrimination model, and the sample recognition model are trained, and the sample generation model can then be determined as the image enhancement model for enhancing expression features, and the sample recognition model as the image recognition model for identifying expression attribute types.
  • FIG. 7 is a schematic diagram of calculating a model loss value provided by an embodiment of the present application.
  • The first sample expression image 30a is input into the sample generation model 30d (the sample generation model 30d includes the first generation sub-model 30b and the second generation sub-model 30c) to obtain a sample enhanced image 30e.
  • the sample enhanced image 30e is input into the sample discrimination model 30g, and the matching probability that the sample enhanced image 30e belongs to the real expression type is calculated.
  • the sample enhanced image 30e is input into the sample recognition model 30h, and the matching degree between the sample enhanced image 30e and various expression attribute types is calculated, and the multiple matching degrees and corresponding expression attribute types are combined into a tag information set.
  • The error function calculator 30k calculates the generation loss value according to the sample enhanced image 30e, the second sample expression image 30f, and formula (2); the error function calculator 30k calculates the discrimination loss value according to the matching probability that the sample enhanced image 30e belongs to the real expression type, the matching probability that the second sample expression image 30f belongs to the real expression type, and formulas (3), (4), and (5); and the error function calculator 30k calculates the verification loss value according to the label information set, the real expression attribute type corresponding to the first sample expression image 30a, and formula (6).
  • The error function calculator 30k combines the above three loss values into the model loss value, adjusts the weights of the parameters in the sample generation model 30d (the first generation sub-model 30b and the second generation sub-model 30c) according to the model loss value, adjusts the weights of the parameters in the sample discrimination model 30g according to the model loss value, and adjusts the weights of the parameters in the sample identification model 30h according to the model loss value.
  • The above uses three loss values to guarantee that the sample enhanced image is exaggerated and realistic and that its expression attribute type is accurate.
  • Alternatively, the above discrimination loss value and verification loss value can be combined into a single loss value.
  • The specific process is as follows: obtain a first sample expression image belonging to the first expression type and a second sample expression image belonging to the second expression type; the following description still takes one first sample expression image and one second sample expression image as an example. A sample generation model is initialized, and the expression characteristics of the micro-expressions in the first sample expression image are enhanced based on the sample generation model to obtain a sample enhanced image.
  • After the sample enhanced image generated by the sample generation model is obtained, a sample discrimination model is initialized; the sample discrimination model may be a classification model based on a convolutional neural network. Based on the sample discrimination model, the structural feature information corresponding to the sample enhanced image is extracted (referred to as the third structural feature information). According to the classifier in the sample discrimination model and the third structural feature information, the joint matching probabilities that the sample enhanced image belongs to the real expression type and matches each expression attribute type are identified; the number of joint matching probabilities is the same as the number of expression attribute types in the discrimination model. Here, the third structural feature information mainly reflects hidden high-level feature information about whether the sample enhanced image is a real image and about its attribute type.
  • This can also be understood as the above sample discrimination model and sample recognition model being merged into a single sample discrimination model.
  • Similarly, according to the sample enhanced image and the second sample expression image, the error between the generated sample enhanced image and the second sample expression image with high expression intensity can be calculated, and the generation loss value between each pixel of the sample enhanced image and each pixel of the second sample expression image can be calculated using the above formula (2).
  • According to the joint matching probabilities identified by the sample discrimination model and the real expression attribute type corresponding to the first sample expression image (here the first sample expression image belongs to the real expression type), the identification loss value between the recognized result and the real result is calculated. The two loss values are combined into the model loss value, and the weights of the parameters in the sample generation model and the sample discrimination model are adjusted according to the model loss value. After the parameter weights are adjusted, the above method is used to generate a sample enhanced image again and the model loss value is recalculated, looping until the model loss value is less than the target threshold, the model loss value converges, or the number of iterations reaches the target number; at that point the sample generation model and the sample discrimination model are trained, and the sample generation model can later be determined as the image enhancement model.
  • The model loss value can likewise be understood as follows: the generation loss value ensures that the sample enhanced image is as exaggerated as possible, and the discrimination loss value ensures that the sample enhanced image is as realistic as possible and that its expression attribute type is as accurate as possible; the combined model loss value therefore ensures that the sample enhanced image is exaggerated and realistic and that its expression attribute type is accurate.
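To tie the training steps together, the following is a minimal, assumption-laden sketch of the loop described in steps S302-S305 and S401-S402, reusing the loss helpers sketched above. The model objects, the data iterator, the learning rate, the threshold, and the iteration cap are placeholders; a faithful adversarial setup would also alternate generator and discriminator updates with opposite objectives, whereas a single combined update is shown here only to mirror the text's single model loss value.

```python
# Illustrative training-loop sketch (not the patent's exact procedure).
import torch

def train(generator, discriminator, recognizer, data_pairs,
          target_threshold=0.05, target_iterations=10000, lr=2e-4):
    params = (list(generator.parameters()) + list(discriminator.parameters())
              + list(recognizer.parameters()))
    optimizer = torch.optim.Adam(params, lr=lr)
    for step, (micro_img, exaggerated_img, attribute_idx) in enumerate(data_pairs):
        enhanced = generator(micro_img)                  # sample enhanced image T(x)
        d_fake = discriminator(enhanced)                 # matching probability for T(x)
        d_real = discriminator(exaggerated_img)          # matching probability for z
        logits = recognizer(enhanced)                    # scores over expression attribute types

        l1 = generation_loss(enhanced, exaggerated_img)  # formula (2)
        l4 = discriminant_loss(d_real, d_fake)           # formulas (3)-(5)
        l5 = verification_loss(logits, attribute_idx)    # formula (6), assumed cross-entropy
        l6 = model_loss(l1, l4, l5)                      # formula (7)

        optimizer.zero_grad()
        l6.backward()
        optimizer.step()

        # Stop when the model loss value is below the target threshold or the
        # number of iterations reaches the target number.
        if l6.item() < target_threshold or step + 1 >= target_iterations:
            break
    return generator, recognizer  # image enhancement model / image recognition model
```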
  • FIG. 8 is a schematic structural diagram of an image recognition device based on micro-expressions provided by an embodiment of the present application.
  • the image recognition device 1 based on micro-expressions may include: a first acquisition module 11, an enhancement module 12, a recognition module 13, and a determination module 14.
  • The first obtaining module 11 is configured to obtain an original expression image belonging to a first expression type and input the original expression image into an image enhancement model; the original expression image belonging to the first expression type is an image containing micro-expressions; the image enhancement model is trained based on sample expression images belonging to the first expression type and sample expression images belonging to the second expression type; the expression intensity of the sample expression images belonging to the second expression type is greater than the expression intensity of the sample images belonging to the first expression type;
  • the enhancement module 12 is used to enhance the expression characteristics of the micro expressions in the original expression image in the image enhancement model to obtain a target expression image belonging to the second expression type;
  • the recognition module 13 is used to recognize the expression attribute type corresponding to the target expression image
  • the determining module 14 is configured to determine the expression attribute type corresponding to the target expression image as the expression attribute type corresponding to the original expression image.
  • For the specific implementation manners of the first acquisition module 11, the enhancement module 12, the recognition module 13, and the determination module 14, reference may be made to steps S101-S103 in the embodiment corresponding to FIG. 3 above; details are not repeated here.
  • the enhancement module 12 may include: a determination unit 121, a first input unit 122, a second input unit 123, and a combination unit 124.
  • the determining unit 121 is configured to determine the expression identification area in the original expression image as a unit original expression image
  • the first input unit 122 is configured to input the unit original expression image into the first enhancement sub-model, and enhance the expression characteristics of the unit original expression image in the first enhancement sub-model to obtain a unit auxiliary image;
  • the second input unit 123 is configured to input the original expression image into the second enhancement sub-model, and enhance the expression characteristics of the original expression image in the second enhancement sub-model to obtain a target auxiliary image;
  • the combining unit 124 is configured to combine the unit auxiliary image and the target auxiliary image into the target expression image according to the position information of the unit original expression image in the original expression image.
  • For the specific function implementation manners of the determining unit 121, the first input unit 122, the second input unit 123, and the combining unit 124, reference may be made to steps S201-S203 in the embodiment corresponding to FIG. 4 above; details are not repeated here.
  • the first input unit 122 may include: a first input subunit 1221 and a first convolution subunit 1222.
  • the first input subunit 1221 is configured to input the unit original expression image into the input layer of the first enhancement submodel to obtain a first original matrix corresponding to the unit original expression image;
  • The first convolution subunit 1222 is configured to randomly sample from the first original matrix to obtain a first original vector with a target length, perform deconvolution processing on the first original vector according to the transposed convolution layer in the first enhancement sub-model to obtain a first target tensor, and determine the first target tensor as the unit auxiliary image.
  • For the specific function implementation manner of the first input subunit 1221 and the first convolution subunit 1222, reference may be made to step S201 in the embodiment corresponding to FIG. 4 above; details are not repeated here.
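The following is a minimal sketch (assuming PyTorch) of the path just described for the first enhancement sub-model: a vector of target length is randomly sampled from the flattened original matrix, reshaped to a 1x1 spatial tensor, and deconvolved through transposed convolution layers into the unit auxiliary image. Layer counts, channel sizes, and the fixed 16x16 output are illustrative assumptions; a real sub-model would produce an output matching the unit original expression image size.

```python
# Illustrative sketch of a transposed-convolution ("deconvolution") enhancement sub-model.
import torch
import torch.nn as nn

class FirstEnhanceSubModel(nn.Module):
    def __init__(self, target_length: int = 100, out_channels: int = 1):
        super().__init__()
        self.target_length = target_length
        # Transposed convolution layers grow the 1x1 input into an image-sized tensor.
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(target_length, 128, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),             # 4x4 -> 8x8
            nn.ReLU(),
            nn.ConvTranspose2d(64, out_channels, kernel_size=4, stride=2, padding=1),    # 8x8 -> 16x16
            nn.Tanh(),
        )

    def forward(self, unit_original_image: torch.Tensor) -> torch.Tensor:
        flat = unit_original_image.flatten(1)                         # "first original matrix", flattened
        idx = torch.randint(0, flat.shape[1], (self.target_length,))
        first_original_vector = flat[:, idx]                          # random sampling to target length
        x = first_original_vector.view(-1, self.target_length, 1, 1)
        return self.deconv(x)                                         # "first target tensor" = unit auxiliary image
```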
  • the second input unit 123 may include: a second input subunit 1231 and a second convolution subunit 1232.
  • the second input subunit 1231 is configured to input the original expression image into the input layer of the second enhancement submodel to obtain a second original matrix corresponding to the original expression image;
  • The second convolution subunit 1232 is configured to randomly sample from the second original matrix to obtain a second original vector with the target length, perform deconvolution processing on the second original vector according to the transposed convolution layer in the second enhancement sub-model to obtain a second target tensor, and determine the second target tensor as the target auxiliary image.
  • For the specific function implementation manner of the second input subunit 1231 and the second convolution subunit 1232, reference may be made to step S202 in the embodiment corresponding to FIG. 4 above; details are not repeated here.
  • The micro-expression-based image recognition device 1 may include: a first acquisition module 11, an enhancement module 12, a recognition module 13, and a determination module 14, and may further include: a binary processing module 15 and an edge detection module 16.
  • The binary processing module 15 is configured to binarize the original expression image to obtain a binary image;
  • The edge detection module 16 is configured to perform edge detection on the binary image based on a gradient operator to obtain a gradient image, and determine the target position information where an edge contour is located in the gradient image;
  • the determination module 14 is also used to determine the area identified by the target position information as the expression identification area in the original expression image.
  • For the specific function implementation manners of the binary processing module 15 and the edge detection module 16, reference may be made to step S102 in the embodiment corresponding to FIG. 3 above; details are not repeated here.
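A small sketch of the binarization and gradient-based edge detection performed by modules 15 and 16: threshold the normalized image into a binary image, apply a Sobel-style gradient operator, and keep the positions with strong gradients as candidate edge contours. The thresholds and the choice of the Sobel kernels are assumptions.

```python
# Illustrative sketch of binary processing plus gradient-operator edge detection.
import numpy as np

def expression_edge_positions(gray: np.ndarray, pixel_threshold: float = 0.5,
                              gradient_threshold: float = 1.0) -> np.ndarray:
    binary = (gray > pixel_threshold).astype(float)      # binary image (pixel values 0 or 1)

    # 3x3 Sobel kernels used as the gradient operator.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = binary.shape
    grad = np.zeros_like(binary)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = binary[i - 1:i + 2, j - 1:j + 2]
            gx = np.sum(patch * kx)
            gy = np.sum(patch * ky)
            grad[i, j] = np.hypot(gx, gy)                 # gradient image

    ys, xs = np.nonzero(grad >= gradient_threshold)       # positions along edge contours
    return np.stack([ys, xs], axis=1)                     # target position information (row, col)
```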
  • the identification module 13 may include an extraction unit 131 and an identification unit 132.
  • the extraction unit 131 is used to input the target facial expression image into the image recognition model
  • the extraction unit 131 is further configured to extract target structural feature information corresponding to the target facial expression image according to the convolution processing of the forward convolution layer and the pooling processing of the pooling layer in the image recognition model;
  • The recognition unit 132 is configured to recognize, according to the classifier in the image recognition model, the matching degrees between the target structural feature information and the multiple expression attribute type features in the image recognition model, and to take, from the multiple matching degrees obtained for the target structural feature information, the expression attribute type corresponding to the maximum matching degree as the expression attribute type corresponding to the target expression image.
  • For the specific function implementation manners of the extraction unit 131 and the recognition unit 132, reference may be made to step S103 in the embodiment corresponding to FIG. 3 above; details are not repeated here.
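The decision performed by the recognition unit reduces to picking the expression attribute type with the maximum matching degree; a tiny sketch follows. The attribute order is an assumption, and the example matching degrees are the values used in the description's example (0.1 happy, 0.8 sad, 0.2 fear, 0.2 surprise, 0.1 disgust, 0.3 anger).

```python
# Illustrative sketch: select the expression attribute type with the maximum matching degree.
attribute_types = ["happy", "sad", "fear", "surprise", "disgust", "anger"]  # assumed label order
matching_degrees = [0.1, 0.8, 0.2, 0.2, 0.1, 0.3]                           # example values from the description

best_index = max(range(len(matching_degrees)), key=matching_degrees.__getitem__)
target_attribute_type = attribute_types[best_index]   # also used as the original expression image's attribute type
```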
  • The micro-expression-based image recognition device 1 may include: a first acquisition module 11, an enhancement module 12, a recognition module 13, a determination module 14, a binary processing module 15, and an edge detection module 16, and may further include: a second acquisition module 17, an extraction module 18, and a generation module 19.
  • the second obtaining module 17 is configured to obtain a first sample expression image belonging to the first expression type, and obtain a second sample expression image belonging to the second expression type;
  • the determination module 14 is further configured to enhance the expression characteristics of the micro-expressions in the first sample expression image based on the sample generation model to obtain a sample enhanced image;
  • the extraction module 18 is configured to extract the first structural feature information corresponding to the sample enhanced image based on the sample discrimination model, and identify the matching probability corresponding to the first structural feature information according to the classifier in the sample discrimination model;
  • the matching probability is used to characterize the probability that the sample enhanced image belongs to the real expression type;
  • the extraction module 18 is further configured to extract second structural feature information corresponding to the sample enhanced image based on a sample identification model, and identify the second structural feature information corresponding to the second structural feature information according to the classifier in the sample identification model A set of tag information; the set of tag information is used to characterize the degree of matching of the sample enhanced image with multiple expression attribute types;
  • the generating module 19 is configured to generate a model loss value according to the sample enhanced image, the second sample expression image, the matching probability, and the label information set, and determine the image enhancement model and according to the model loss value The image recognition model.
  • the generation module 19 may include: a generation unit 191 and an adjustment unit 192.
  • the generating unit 191 is configured to generate a model loss value based on the sample enhancement image, the second sample expression image, the matching probability, the label information set, and the expression attribute type corresponding to the first sample expression image;
  • The adjusting unit 192 is configured to adjust the weights of the parameters in the sample generation model, the weights of the parameters in the sample discrimination model, and the weights of the parameters in the sample identification model according to the model loss value; when the model loss value is less than the target threshold, the adjusted sample generation model is determined as the image enhancement model, and the adjusted sample recognition model is determined as the image recognition model.
  • the generating unit 191 may include a first determining subunit 1911 and a second determining subunit 1912.
  • the first determining subunit 1911 is configured to determine the generated loss value according to the sample enhanced image and the second sample expression image;
  • the first determining subunit 1911 is further configured to determine the discriminant loss value according to the matching probability and the second sample expression image;
  • a second determination subunit 1912 configured to determine the verification loss value according to the tag information set and the expression attribute type corresponding to the first sample expression image
  • the second determining subunit 1912 is further configured to generate the model loss value based on the generated loss value, the discriminated loss value, and the verification loss value.
  • For the specific function implementation manner of the first determining subunit 1911 and the second determining subunit 1912, reference may be made to step S401 in the embodiment corresponding to FIG. 6 above; details are not repeated here.
  • In the embodiment of the present application, an original expression image belonging to the first expression type is acquired and input into the image enhancement model; the original expression image belonging to the first expression type is an image containing micro-expressions; the image enhancement model is trained based on sample expression images belonging to the first expression type and sample expression images belonging to the second expression type; the expression intensity of the sample expression images belonging to the second expression type is greater than the expression intensity of the sample images belonging to the first expression type. The expression characteristics of the micro-expressions in the original expression image are enhanced in the image enhancement model to obtain a target expression image belonging to the second expression type; the expression attribute type corresponding to the target expression image is identified and determined as the expression attribute type corresponding to the original expression image.
  • The image enhancement model enhances the expression characteristics of the micro-expressions in micro-expression images to convert them into target expression images with high recognizability and high expression intensity, and the distinguishing expression features of the target expression image are used to identify its expression attribute type, which is taken as the expression attribute type of the micro-expression image; since the expression characteristics of the target expression image obtained after enhancement are clearly distinguishable, the expression attribute type of the target expression image can be accurately identified, which improves the accuracy of identifying micro-expression images.
  • FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • The micro-expression-based image recognition apparatus 1 in FIG. 8 described above may be applied to the electronic device 1000, and the electronic device 1000 may include: a processor 1001, a network interface 1004, and a memory 1005.
  • the electronic device 1000 may further include: a user interface 1003, and at least one communication bus 1002.
  • the communication bus 1002 is used to implement connection communication between these components.
  • the user interface 1003 may include a display (Display) and a keyboard (Keyboard), and the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), for example, at least one magnetic disk memory.
  • the memory 1005 may also be at least one storage device located away from the foregoing processor 1001. As shown in FIG. 9, the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
  • The network interface 1004 can provide network communication functions, the user interface 1003 is mainly used to provide an input interface for the user, and the processor 1001 can be used to call the device control application program stored in the memory 1005 to implement the following:
  • the original expression image belonging to the first expression type is an image containing micro expressions
  • The image enhancement model is trained based on sample expression images belonging to the first expression type and sample expression images belonging to the second expression type; the expression intensity of the sample expression images belonging to the second expression type is greater than the expression intensity of the sample images belonging to the first expression type;
  • Identify the expression attribute type corresponding to the target expression image and determine the expression attribute type corresponding to the target expression image as the expression attribute type corresponding to the original expression image.
  • the image enhancement model includes a first enhancer model and a second enhancer model
  • the processor 1001 specifically executes the following steps when performing the enhancement of the expression feature of the micro-expression in the original expression image in the image enhancement model to obtain the target expression image belonging to the second expression type:
  • the unit auxiliary image and the target auxiliary image are combined into the target expression image.
  • When the processor 1001 inputs the unit original expression image into the first enhancement sub-model and enhances the expression characteristics of the unit original expression image in the first enhancement sub-model to obtain the unit auxiliary image, the following steps are specifically performed:
  • a first target tensor is obtained, and the first target tensor is determined as the unit auxiliary image.
  • When the processor 1001 inputs the original expression image into the second enhancement sub-model and enhances the expression characteristics of the original expression image in the second enhancement sub-model to obtain the target auxiliary image, the following steps are specifically performed:
  • the processor 1001 also performs the following steps:
  • the area identified by the target position information is determined as the expression identification area.
  • the processor 1001 specifically performs the following steps when performing recognition of the expression attribute type corresponding to the target expression image:
  • According to the classifier in the image recognition model, identify the matching degrees between the target structural feature information and the multiple expression attribute type features in the image recognition model, and, among the multiple matching degrees obtained for the target structural feature information, use the expression attribute type corresponding to the maximum matching degree as the expression attribute type corresponding to the target expression image.
  • the processor 1001 also performs the following steps:
  • a model loss value is generated according to the sample enhanced image, the second sample expression image, the matching probability, and the label information set, and the image enhancement model and the image recognition model are determined according to the model loss value.
  • When the processor 1001 generates a model loss value according to the sample enhanced image, the second sample expression image, the matching probability, and the label information set, and determines the image enhancement model and the image recognition model according to the model loss value, the following steps are specifically performed:
  • The model loss value includes: a generation loss value, a discrimination loss value, and a verification loss value;
  • When the processor 1001 generates the model loss value according to the sample enhanced image, the second sample expression image, the matching probability, the label information set, and the expression attribute type corresponding to the first sample expression image, the following steps are specifically performed:
  • the model loss value is generated based on the generated loss value, the discrimination loss value, and the verification loss value.
  • The image enhancement model enhances the expression characteristics of the micro-expressions in micro-expression images to convert them into highly recognizable target expression images, and the distinguishing expression features of the target expression image are used to identify its expression attribute type, which is taken as the expression attribute type of the micro-expression image; because the expression characteristics of the target expression image are clearly distinguishable, the expression attribute type of the target expression image can be accurately identified, and the accuracy of identifying micro-expression images can be improved.
  • The electronic device 1000 described in the embodiments of the present application can perform the description of the micro-expression-based image recognition method in the embodiments corresponding to FIG. 3 to FIG. 7 above, and can also perform the description of the micro-expression-based image recognition device 1 in the embodiment corresponding to FIG. 8 above; details are not repeated here.
  • The description of the beneficial effects of using the same method is likewise not repeated.
  • The embodiments of the present application also provide a computer storage medium, and the computer storage medium stores the computer program executed by the micro-expression-based image recognition device 1 mentioned above, and the computer program includes program instructions.
  • When the processor executes the program instructions, it can perform the description of the micro-expression-based image recognition method in the embodiments corresponding to FIG. 3 to FIG. 7 above; details are not repeated here.
  • the description of the beneficial effects of using the same method will not be repeated.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

A micro-expression-based image recognition method, apparatus, and related device, relating to computer vision technology in artificial intelligence. The method includes: obtaining an original expression image belonging to a first expression type and inputting the original expression image into an image enhancement model, where the original expression image belonging to the first expression type is an image containing micro-expressions, the image enhancement model is trained on sample expression images belonging to the first expression type and sample expression images belonging to a second expression type, and the expression intensity of the sample expression images belonging to the second expression type is greater than the expression intensity of the sample images belonging to the first expression type (S101); enhancing, in the image enhancement model, the expression characteristics of the micro-expressions in the original expression image to obtain a target expression image belonging to the second expression type (S102); and identifying the expression attribute type corresponding to the target expression image and determining the expression attribute type corresponding to the target expression image as the expression attribute type corresponding to the original expression image (S103).

Description

一种基于微表情的图像识别方法、装置以及相关设备
本申请要求于2018年11月21日提交中国专利局、申请号为201811392529.7、发明名称为“一种基于微表情的图像识别方法、装置以及相关设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,尤其涉及一种基于微表情的图像识别方法、装置以及相关设备。
背景技术
人工智能(Artificial Intelligence,AI)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。
人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习等几大方向。
计算机视觉技术(Computer Vision,CV)中的计算机视觉是一门研究如何使机器“看”的科学,更进一步的说,就是指用摄影机和电脑代替人眼对目标进行识别、跟踪和测量等机器视觉,并进一步做图形处理,使电脑处理成为更适合人眼观察或传送给仪器检测的图像。作为一个科学学科,计算机视觉研究相关的理论和技术,试图建立能够从图像或者多维数据中获取信息的人工智能系统。计算机视觉技术通常包括图像处理、图像识别、图像语义理解、图像检索、OCR、视频处理、视频语义理解、视频内容/行为识别、三维物体重建、3D技术、虚拟现实、增强现实、同步定位与地图构建等技术,还包括常见的人脸识别、指纹识别等生物特征识别技术。
在生活的任何地方、任何时刻,人们都会具有各种不同的情绪。情绪与表情具有密切的联系,表情是情绪的外部表现,情绪是表情内心体验。而表情中的微表情是人物试图隐藏内心真实情感却又不由自主流露出的不易察觉的面部表情,微表情通常发生于人物具有隐瞒心理的状态下,与一般的面部表情相比,微表情最显著的特点是持续时间短、强度弱,反应了人物试图压抑与隐藏的真实情感,是一种有效的非言语线索。特别是在意图对自己心理变化做出掩饰时,更容易做出相应动作,因此微表情的识别可以用于安全、刑侦、心理等需要探查人物真实想法的领域,破解人物的隐瞒意图。
在现有技术中,对微表情图像的识别方法主要是通过提取微表情图像的特征,再根据提取出来的特征进行分类和识别。但是由于微表情具有表情强度低,动作行 为快的特点,即使是不同类型的微表情图像也非常的相似,就导致提取出来的特征不具备很好的区分性,进而就会降低对微表情图像识别的准确率。
发明内容
本申请实施例提供一种基于微表情的图像识别方法、装置以及相关设备,可以提高微表情图像识别的准确率。
本申请实施例一方面提供了一种基于微表情的图像识别方法,由电子设备执行,包括:
获取属于第一表情类型的原始表情图像,将所述原始表情图像输入图像增强模型;所述属于第一表情类型的原始表情图像是包含微表情的图像;所述图像增强模型是根据属于第一表情类型的样本表情图像和属于第二表情类型的样本表情图像训练得到;所述属于第二表情类型的样本表情图像的表情强度大于所述属于第一表情类型的样本图像的表情强度;
在所述图像增强模型中增强所述原始表情图像中微表情的表情特征,得到属于第二表情类型的目标表情图像;
识别与所述目标表情图像对应的表情属性类型,并将与所述目标表情图像对应的表情属性类型确定为与所述原始表情图像对应的表情属性类型。
本申请实施例一方面提供了一种基于微表情的图像识别装置,包括:
第一获取模块,用于获取属于第一表情类型的原始表情图像,将所述原始表情图像输入图像增强模型;所述属于第一表情类型的原始表情图像是包含微表情的图像;所述图像增强模型是根据属于第一表情类型的样本表情图像和属于第二表情类型的样本表情图像训练得到;所述属于第二表情类型的样本表情图像的表情强度大于所述属于第一表情类型的样本图像的表情强度;
增强模块,用于在所述图像增强模型中增强所述原始表情图像中微表情的表情特征,得到属于第二表情类型的目标表情图像;
识别模块,用于识别与所述目标表情图像对应的表情属性类型;
确定模块,用于将与所述目标表情图像对应的表情属性类型确定为与所述原始表情图像对应的表情属性类型。
本申请实施例一方面提供了一种电子设备,包括:处理器和存储器;
所述处理器和存储器相连,其中,所述存储器用于存储程序代码,所述处理器用于调用所述程序代码,以执行如本申请实施例中的方法。
本申请实施例一方面提供了一种计算机存储介质,所述计算机存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被处理器执行时,执行如本申请实施例中的方法。
附图简要说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种基于微表情的图像识别方法的系统架构图;
图2是本申请实施例提供的一种基于微表情的图像识别方法的场景示意图;
图3是本申请实施例提供的一种基于微表情的图像识别方法的流程示意图;
图4是本申请实施例提供的一种增强表情特征的流程示意图;
图5是本申请实施例提供的另一种基于微表情的图像识别方法的流程示意图;
图6是本申请实施例提供的一种生成模型损失值的流程示意图;
图7是本申请实施例提供的一种计算模型损失值的示意图;
图8是本申请实施例提供的一种基于微表情的图像识别装置的结构示意图;
图9是本申请实施例提供的一种电子设备的结构示意图。
实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
随着人工智能技术研究和进步,人工智能技术在多个领域展开研究和应用,例如常见的智能家居、智能穿戴设备、虚拟助理、智能音箱、智能营销、无人驾驶、自动驾驶、无人机、机器人、智能医疗、智能客服、语音识别等,相信随着技术的发展,人工智能技术将在更多的领域得到应用,并发挥越来越重要的价值。
请参见图1,是本申请实施例提供的一种基于微表情的图像识别方法的系统架构图。服务器40a为用户终端集群提供服务,用户终端集群可以包括:用户终端40b、用户终端40c、...、用户终端40d。当用户终端(可以是用户终端40b、用户终端40c或用户终端40d)获取到微表情图像,并需要识别该微表情图像的属性类型时,将该微表情图像发送至服务器40a。服务器40a基于提前训练好的图像增强模型增强微表情图像中微表情的表情特征,以将该微表情图像转换为情绪表现力度高的夸张表情图像,服务器40a再基于提前训练好的图像识别模型识别上述夸张表情图像的属性类型,识别出来的属性类型即是用户终端发送的微表情图像的属性类型。后续,服务器40a可以将识别出来的属性类型发送至用户终端,以及将微表情图像和识别出来的属性类型关联存储在数据库中。用户终端接收到服务器发送的属性类型后,可以在屏幕上以文字的方式显示该属性类型。当然,若用户终端的本地存储了训练好的图像增强模型和图像识别模型,可以在用户终端本地将微表情图像转换为夸张表情图像,再对夸张表情图像进行识别,同样地将识别出来的属性类型就作为微表情图像对应的属性类型。其中,由于训练图像增强模型和训练图像识别模型涉及到大量的离线计算,因此用户终端本地的图像增强模型和图像识别模型可以是由服务器40a训练完成后发送至用户终端的。下述以识别一张微表情图像的属性类型为例(可以是在服务器40a中识别,也可以是在用户终端中识别),进行说明。
其中,用户终端可以包括手机、平板电脑、笔记本电脑、掌上电脑、智能音响、移动互联网设备(MID,mobile internet device)、POS(Point Of Sales,销售点)机、可穿戴设备(例如智能手表、智能手环等)等。
请参见图2,是本申请实施例提供的一种基于微表情的图像识别方法的场景示意图。获取待识别的微表情图像10a,其中微表情是持续时间短、情绪表达强度低下、且区别特征不明显的表情。由于人脸表情主要是由五官构成,因此将微表情图像10a中属于脸部五官的区域图像提取出来,即从微表情图像10a中提取出图像10b(微表情图像10a中位于左眼区域的图像)、图像10c(微表情图像10a中位于右眼 区域的图像)、图像10d(微表情图像10a中位于鼻子区域的图像)和图像10e(微表情图像10a中位于嘴巴区域的图像)。将上述图像10b输入图像增强模型20a中,图像增强模型20a是用于增强图像的表情特征,其中增强表情特征是调整人脸五官的形态,例如:眼睛外张、眼睑抬起、眉毛内皱、嘴角张开、牙齿露出或唇角向下等。在图像增强模型20a中增强图像10b的表情特征,得到图像10f;同样地,将图像10c输入图像增强模型20a中,在图像增强模型20a中增强图像10c的表情特征,得到图像10g;将图像10d输入图像增强模型20a中,在图像增强模型20a中增强图像10d的表情特征,得到图像10h;将图像10e输入图像增强模型20a中,在图像增强模型20a中增强图像10e的表情特征,得到图像10k。
下面以图像10b为例,说明如何在图像增强模型20a中增强图像10b的表情特征,其余的图像输入增强模型20a后都可以采用相同的方式增强图像的表情特征。将图像10b输入图像增强模型20a的输入层,以将图像10b转化为对应的矩阵,从上述矩阵中随机采样,将采样数据组合为一个列向量,此处的列向量可以是一个大小为1*n的列向量。根据图像增强模型20a中的转置卷积层,对上述1*n的列向量进行反卷积处理,反卷积处理是卷积处理的逆向操作,具体过程是将上述列向量全连接和重塑(reshape)为一个四维张量1*1*1*n,再将上述四维张量投影到有多个特征映射的小空间范围卷积,通过一连串的微步幅卷积,得到能够表征图像10b的高级表征向量1*p*p*3。若需要增强后的图像是彩色图像,那么就将上述高级表征向量的第0个索引维度压缩,即可以得到一张尺寸为p*p的彩色图像;需要增强后的图像是灰度图像,那么就将上述高级表征向量的第0个和第3个索引维度压缩,可以得到一张尺寸为p*p的灰度图像。以从图像增强模型20a中输出的图像是灰度图像为例,那么就压缩第0个和第3个索引维度,即可以得到图像10f,即从图像增强模型中输出的图像10f是图像10b增强表情特征后的图像。
对于图像增强模型20a也可以理解为是一个基于卷积神经网络识别图像中对象的属性类型的逆过程,卷积神经网络识别图像是输入一张图像,输出一个列向量,该列向量就表示输入图像与多种属性类型的匹配程度;而图像增强模型是从图像中随机采样后得到一个列向量,即是从图像增强模型20a中输入的是一个列向量,输出是一张图像。还需要说明的是,上述图像增强模型20a可以对应于对抗网络中的生成模型,对抗网络包括生成模型和判别模型,生成模型是用于生成模拟的样本数据,在本申请中生成模型(图像增强模型)就用于生成更具有情绪表达能力的夸张的表情图像;判别模型就是用于判断生成模型所生成的夸张表情图像是真实图像的概率,其中,属于真实表情类型的图像是真实图像,对应地,属于模拟表情类型的图像是模拟图像(也可以称为仿真图像),真实图像是利用图像采集器采集的正常人脸图像,模拟图像是由模型虚构生成的图像,例如,由照相机拍摄人脸表情所得到的图像就是属于真实表情类型的真实图像。
再将完整的微表情图像10a输入图像增强模型20b中,在图像增强模型20b中增强微表情图像10a中微表情的表情特征,得到表情特征增强后的图像10m,表情特征增强后的图像10m相比微表情图像10a具有更高表情辨识度,且表情强度更强。由于图像增强模型20b和图像增强模型20a的结构完全相同,不同的是模型中参数的取值,因此图像增强模型20b增强表情特征的具体过程可以参见上述图像增强模型20a增强图像10b的表情特征的过程。
需要说明的是,利用图像增强模型20a分别增强图像10b、图像10c、图像10d、图像10e,和利用图像增强模型20b增强微表情图像10a是没有先后顺序限定的, 即是可以先基于图像增强模型20a增强图像10b、图像10c、图像10d、图像10e;也可以先基于图像增强模型20b增强微表情图像10a;或者两个图像增强模型并行地对图像进行表情特征的增强。
确定了表情特征增强后的图像10f、图像10g、图像10h、图像10k和图像10m后,根据上述对应的图像10b、图像10c、图像10d和图像10e在微表情图像10a中的位置信息,将图像10f、图像10g、图像10h、图像10k和图像10m组合为一张图像10n。由于图像10n是由多张表情夸张的图像组合而来,因此图像10n也是一张具有高情绪表达力、高表情辨识度的图像,即图像10n相比微表情图像10a是一张表情夸张的图像。
将表情夸张的图像10n输入图像识别模型20c中,图像识别模型20c是用于识别图像10n中表情所对应的表情属性类型,其中表情属性类型可以包括:高兴、悲伤、害怕、惊奇、厌恶和愤怒等。图像识别模型20c可以是一个卷积神经网络模型,识别过程具体为:将图像10n输入至图像识别模型20c中的输入层,利用图像识别模型20c中的卷积层的卷积运算和池化层的池化运算,提取与图像10n对应的静态结构特征信息,利用图像识别模型20c中的分类器,计算图像10n对应的静态结构特征信息与图像识别模型中所包含的6种表情属性类型特征匹配的概率,分别是:0.1高兴、0.8悲伤、0.2害怕、0.2惊奇、0.1厌恶和0.3愤怒,其中,匹配结果中的数值表示图像10n的静态结构特征信息与6种表情属性类型特征匹配的概率,例如:“0.1高兴”就表示图像10n的静态结构特征信息与“高兴”表情属性类型特征匹配的概率为0.1。从上述匹配的结果中,将匹配概率最高的表情属性类型特征对应表情属性类型作为图像10n对应的表情属性类型。由于图像10n是微表情图像10a经由增强表情特征后得到的,因此识别出来的图像10n的表情属性类型即是与微表情图像10a对应的表情属性类型。因此,根据上述6个匹配概率的结果,可以确定与图像10n对应的表情属性类型是:悲伤(0.8悲伤>0.3愤怒>0.2惊奇=0.2害怕>0.1高兴=0.1厌恶)。那么与微表情图像10a对应的表情属性类型也是:悲伤。上述可知,图像增强模型20a、图像增强模型20b不仅可以将微表情图像或者微表情子图像的表情特征增强,使得微表情图像可以转换为夸张表情图像,且能够保证由图像识别模型20c识别出来的夸张表情图像的表情属性类型具有与微表情图像相同的表情属性类型,即转换而来的夸张表情图像不仅夸张(具有更高的表情强度)、真实、且保证表情属性类型与微表情图像的表情属性类型一致。
后续若确定出微表情图像10a的表情属性类型与预设的表情属性类型相同,且该表情属性类型对应的概率大于预设的概率阈值时,则终端可以执行相应的操作。例如,若识别出的表情属性类型是:悲伤,且表情属性类型“悲伤”对应的概率大于或等于0.8,则终端执行支付操作或者执行拍照操作。
通过图像增强模型增强微表情图像中微表情的表情特征,以将微表情图像转换为具有高辨识度的目标表情图像,利用目标表情图像所具备的表情区分特征识别该目标表情图像的表情属性类型,作为微表情图像的表情属性类型,由于目标表情图像的表情特征区分性明显,因此可以准确地识别出目标表情图像的表情属性类型,进而可以提高识别微表情图像的准确率。
其中,基于图像增强模型增强表情特征、识别与图像对应的表情属性类型的具体过程可以参见以下图3至图7所对应的实施例。
进一步地,请参见图3,是本申请实施例提供的一种基于微表情的图像识别方法的流程示意图。如图3所示,所述基于微表情的图像识别方法可以包括:
步骤S101,获取属于第一表情类型的原始表情图像,将所述原始表情图像输入图像增强模型;所述属于第一表情类型的原始表情图像是包含微表情的图像;所述图像增强模型是根据属于第一表情类型的样本表情图像和属于第二表情类型的样本表情图像训练得到;所述属于第二表情类型的样本表情图像的表情强度大于所述属于第一表情类型的样本图像的表情强度。
具体的,获取待识别或者待分类的属于第一表情类型的图像(如上述图2所对应实施例中的微表情图像10a),称为属于第一表情类型的原始表情图像,其中属于第一表情类型的图像是包含微表情的图像,且微表情是人物试图隐藏情感时无意识做出的、短暂的、表情强度低的面部表情。那么,对应地属于第二表情类型的图像是包含夸张表情的图像(如上述图2所对应实施例中的图像10n),也可以理解为属于第二表情类型的图像所对应的的表情强度、表情区分性都要远大于属于第一表情类型的图像,表情强度大的图像是指脸部情绪表达明显,五官形态夸张的图像,例如开心大笑的表情强度就远大于目无表情的表情强度。为了识别与原始表情图像对应的表情属性类型,而原始表情图像又是不具有特征区分性的属于第一表情类型的图像,因此将原始表情图像输入图像增强模型(如上述图2所对应实施例中的图像增强模型20a和图像增强模型20b)中,以增强原始表情图像中微表情的表情特征。其中,图像增强模型是根据属于第一表情类型的样本表情图像和属于第二表情类型的样本表情图像训练对抗网络得到的,且图像增强模型就对应于对抗网络中的生成模型。
步骤S102,在所述图像增强模型中增强所述原始表情图像中微表情的表情特征,得到属于第二表情类型的目标表情图像。
具体的,在图像增强模型中,增强原始表情图像中微表情的表情特征,由于人脸表情是由脸部五官中的眼睛、眉毛、鼻子、嘴巴、前额、面颊和下颚构成,因此增强表情特征即是调整眼睛、眉毛、鼻子、嘴巴、前额、面颊和下颚的外部形态,以增强面部表情的情绪表达能力,例如,眼睛张开、眉毛上扬、嘴角下拉、鼻孔张大、面颊上抬起皱、前额紧皱、下颚收紧等。可以知道,由于增强了表情特征后,所得到的图像具有较高的表情强度和明显的表情区分性,因此原始表情图像经过表情特征增强后的图像就属于第二表情类型的图像,称为属于第二表情类型的目标表情图像。
由于人脸表情主要是由五官中眼睛、眉毛、鼻子和嘴巴的变化构成,为了使原始表情图像(微表情图像)可以转换为表情强度高的目标表情图像(夸张表情图像),目标增强模型可以包括两个子模型,分别为第一增强子模型(如上述图2所对应实施例中的图像增强模型20a)和第二增强子模型(如上述图2所对应实施例中的图像增强模型20b),其中第一增强子模型是用于增强脸部表情中眼睛、眉毛、鼻子和嘴巴的表情特征;第二增强子模型是用于增强整个微表情的表情特征。再将经过上述两个增强子模型分别增强的表情图像组合为目标表情图像。
基于第一增强子模型和第二增强子模型增强原始表情图像以得到目标表情图像的具体过程是:在原始表情图像中,确定表情标识区域,将确定出来的表情标识区域从原始表情图像中提取出来,作为单位原始表情图像(如上述图2所对应实施例中的图像10b、图像10c、图像10d和图像10e)。其中,表情标识区域是脸部表情中眼睛、眉毛、鼻子和嘴巴所在的区域。可以知道,此处的单位原始表情图像的数量有多个,将单位原始表情图像分别输入第一增强子模型中,在第一增强子模型中分别增强上述单位原始图像的表情特征,得到的图像均称为单位辅助图像(如上述 图2所对应实施例中的图像10f、图像10g、图像10h和图像10k)。可以知道,单位辅助图像的数量和单位原始表情图像的数量是相同的,且每个单位辅助图像都具有唯一对应的单位原始表情图像。
再将原始表情图像输入第二增强子模型中,在第二增强子模型中增强原始表情图像的表情特征,将表情增强后的图像称为目标辅助图像。
基于第一增强子模型得到单位辅助图像和基于第二增强子模型得到目标辅助图像的执行顺序没有限定。确定了单位辅助图像和目标辅助图像后,由于单位辅助图像和单位原始表情图像是一一对应的,根据单位原始表情图像在原始表情图像中的位置信息,将单位辅助图像和目标辅助图像组合为目标表情图像,其中目标表情图像是具有高表情强度的图像。
将原始表情图像进行二值化处理,二值化处理后得到的图像称为二值图像,二值图像中像素的像素值是1或0。二值化处理是将原始表情图像中像素值大于像素阈值的像素的数值设置为1,对应地将原始表情图像中像素值小于或等于上述像素阈值的像素的数值设置为0,此处的原始表情图像的像素值是已经归一化处理后的,即所有原始表情图像的像素值范围在0到1之间。从显示效果上来说,若像素值等于1,那么该像素值显示为白色;若像素值等于0,那么该像素值显示为黑色。对二值图像进行边缘检测,边缘检测是指检测出二值图像中灰度发生急剧变化的区域,图像灰度的变化情况可以用灰度分布的梯度来反映,因此可以基于梯度算子对二值图像进行边缘检测,得到梯度图像,其中梯度算子可以包括:Roberts算子、Prewitt算子、Sobel算子、Lapacian算子等。由于梯度图像是反应二值图像中灰度发生急剧变化的图像,那么梯度图像就是由原始表情图像的边缘轮廓所组成的图像。对脸部表情来说,边缘轮廓即是眼睛的轮廓、眉毛的轮廓、鼻子的轮廓、嘴巴的轮廓。在梯度图像中,确定上述边缘轮廓所在的位置信息(称为目标位置信息),目标位置信息可以包括4个坐标信息,上述4个坐标信息表示一个矩形区域的4个顶点坐标,该矩形区域是包含边缘轮廓的最小矩形区域。在原始表情图像中,上述目标位置信息所标识的区域就是脸部表情中眼睛、眉毛、鼻子和嘴巴所在的区域,即目标位置信息在原始表情图像中所标识的区域就是表情标识区域。
步骤S103,识别与所述目标表情图像对应的表情属性类型,并将与所述目标表情图像对应的表情属性类型确定为与所述原始表情图像对应的表情属性类型。
具体的,为了提高后续目标表情图像识别的准确率,可以先将目标表情图像调整至固定尺寸大小,随后将调整尺寸后的目标表情图像输入图像识别模型(如上述图2所对应实施例中的图像识别模型20c)中的输入层。图像识别模型可以包括输入层、卷积层、池化层、全连接层和输出层;其中输入层的参数大小等于调整尺寸后的目标表情图像的尺寸。当目标表情图像输入至卷积神经网络的输出层后,随后进入卷积层,首先随机选取目标表情图像中的一小块作为样本,并从这个小样本中学习到一些特征信息,然后利用这个样本作为一个窗口依次滑过目标表情图像的所有像素区域,也就是说,从样本中学习到的特征信息跟目标表情图像做卷积运算,从而获得目标表情图像不同位置上最显著的特征信息。在做完卷积运算后,已经提取到目标表情图像的特征信息,但仅仅通过卷积运算提取的特征数量大,为了减少计算量,还需进行池化运算,也就是将从目标表情图像中通过卷积运算提取的特征信息传输至池化层,对提取的特征信息进行聚合统计,这些统计特征信息的数量级要远远低于卷积运算提取到的特征信息的数量级,同时还会提高分类效果。常用的池化方法主要包括平均池化运算方法和最大池化运算方法。平均池化运算方法是在 一个特征信息集合里计算出一个平均特征信息代表该特征信息集合的特征;最大池化运算是在一个特征信息集合里提取出最大特征信息代表该特征信息集合的特征。
通过上述卷积层的卷积处理和池化层的池化处理,可以提取出目标表情图像的静态结构特征信息,称为目标结构特征信息,同时该目标结构特征信息的数量级较低。卷积神经网络中的卷积层可以只有一个也可以有多个,同理池化层可以只有一个也可以有多个。
利用图像识别模型中的分类器(也就对应于图像识别模型的全连接层和输出层),识别目标表情图像的目标结构特征信息与图像识别模型中多个表情属性类型特征的匹配度,上述分类器是提前训练完成的,该分类器的输入是静态结构特征信息,输出是静态结构特征信息与多种表情属性类型特征的匹配度,匹配度越高说明目标表情图像中的表情与表情属性类型特征对应的表情属性类型的匹配概率越大,得到的匹配度的数量和图像识别模型中表情属性类型特征的数量相同。图像识别模型中包含的表情属性类型特征的数量和种类是训练图像识别模型时由训练数据集中所包含的表情属性类型的数量和种类决定的。从上述得到的与多个表情属性类型特征的匹配度中,提取出最大匹配度所对应的表情属性类型特征对应的表情属性类型,将提取出来的表情属性类型作为目标表情图像的表情属性类型,且该提取出来的表情属性类型也是原始表情图像的表情属性类型。
举例来说,图像识别模型中存在“开心”表情属性类型特征、“恐惧”表情属性类型特征、“愤怒”表情属性类型特征,根据图像识别模型中的分类器识别目标结构特征信息A与“开心”表情属性类型特征的匹配度为0.1;识别目标结构特征信息A与“恐惧”表情属性类型特征的匹配度为0.3;识别目标结构特征信息A与“愤怒”表情属性类型特征的匹配度为0.6;识别目标结构特征信息A与“惊奇”表情属性类型特征的匹配度为0.9。从上述4个匹配度中提取最大匹配度所对应的表情属性类型,即提取出最大匹配度0.9对应的“惊奇”表情属性类型,那么与目标图像对应的表情属性类型为:惊奇;且与原始表情图像对应的表情属性类型也是:惊奇。
进一步地,请参见图4,是本申请实施例提供的一种增强表情特征的流程示意图。如图4所示,增强表情特征的具体过程包括如下步骤S201-步骤S203,且步骤S201-步骤S203为图3所对应实施例中步骤S102的一个具体实施例:
步骤S201,将所述原始表情图像中的表情标识区域确定为单位原始表情图像,并将所述单位原始表情图像输入所述第一增强子模型,在所述第一增强子模型中增强所述单位原始表情图像的表情特征,得到单位辅助图像。
具体的,图像增强模型包括第一增强子模型和第二增强子模型。在原始表情图像中,确定表情标识区域,将确定出来的表情标识区域从原始表情图像中提取出来,作为单位原始表情图像。其中,表情标识区域是脸部表情中眼睛、眉毛、鼻子和嘴巴所在的区域。下述以第一增强子模型增强一个单位原始表情图像的表情特征为例进行说明,若有多个单位原始表情图像都可以采用相同的方式增强表情特征,得到表情特征增强后的单位辅助图像。
将单位原始表情图像输入第一增强子模型的输入层,得到与单位原始表情图像对应的矩阵,称为第一原始矩阵,也即是将单位原始表情图像的像素点进行离散化,得到一个与单位原始表情图像的尺寸相同的第一原始矩阵。从第一原始矩阵中随机采样,将采样到的数值组合为一个具有长度为n的列向量1*n(称为第一原始向量),其中该目标长度是提前设置好的,例如目标长度n可以是100,那么就是通过在第一原始矩阵中下采样,得到一个1*100的第一原始向量。为了输入到第一增强子模 型中的转置卷积层,首先需要将上述第一原始向量扩充为一个1*1*1*n的四维张量。基于第一增强子模型中的第一个转置卷积层,对上述4维张量反卷积处理,得到与第一个转置卷积层对应的张量,其中反卷积处理和卷积处理的运算相反,空间由小变大。再经过第一增强子模型中的第二个转置卷积层,对上述与第一个转置卷积层对应的张量反卷积处理,得到与第二个转置卷积层对应的张量....,一直到基于第一增强子模型中最后一个转置卷积层的反卷积处理后,可以得到大小为1*a*b*3的4维张量,压缩第0个索引维度和第3个索引维度之后就得到一个2维的张量a*b,称为第一目标张量,将该第一目标张量确定为单位辅助图像,且该单位辅助图像的尺寸就等于a*b。需要说明的是,在确定了大小为1*a*b*3的4维张量后,若压缩第0个索引维度和第3个索引维度所得到的单位辅助图像是灰度图像;若压缩第0个索引维度所得到的单位辅助图像是尺寸为a*b的彩色图像,该彩色图像对应的第一目标张量就是3维的张量a*b*3。还需要说明的是,由于后续还需将单位辅助图像进行组合,因此单位辅助图像的尺寸和单位原始表情图像的尺寸是相同的。
对多个单位原始表情图像,都可以采用上述方式增强表情特征,得到与每个单位原始表情图像对应的单位辅助图像。
步骤S202,将所述原始表情图像输入所述第二增强子模型,在所述第二增强子模型中增强所述原始表情图像的表情特征,得到目标辅助图像。
将原始表情图像输入第二增强子模型的输入层,得到与原始表情图像对应的矩阵,称为第二原始矩阵,也即是将原始表情图像的像素点进行离散化,得到一个与原始表情图像的尺寸相同的第二原始矩阵。从第二原始矩阵中随机采样,将采样到的数值组合为一个具有长度为m的列向量1*m(称为第二原始向量)。为了输入到第二增强子模型中的转置卷积层,首先需要将上述第二原始向量扩充为一个1*1*1*m的四维张量。基于第二增强子模型中的第一个转置卷积层,对上述4维张量反卷积处理,得到与第一个转置卷积层对应的张量,其中反卷积处理和卷积处理的运算相反,空间由小变大。再经过第二增强子模型中的第二个转置卷积层,对上述与第一个转置卷积层对应的张量反卷积处理,得到与第二个转置卷积层对应的张量....,一直到基于第二增强子模型中最后一个转置卷积层的反卷积处理后,可以得到大小为1*c*d*3的4维张量,同样地,压缩第0个索引维度和第3个索引维度之后就得到一个2维的张量c*d,称为第二目标张量,将该第二目标张量确定为目标辅助图像,且该目标辅助图像的尺寸就等于c*d。同样地,在确定了大小为1*c*d*3的4维张量后,若压缩第0个索引维度和第3个索引维度所得到的是目标辅助图像是灰度图像;若压缩第0个索引维度所得到的是目标辅助图像是尺寸为c*d的彩色图像,该彩色图像对应的第二目标张量就是3维的张量c*d*3。上述可知,第一增强子模型和第二增强子模型具有相同的结构,只是模型参数(例如,转置卷积层的卷积核以及转置卷积层的数量)不一致,且目标辅助图像的尺寸和原始表情图像的尺寸是一样的。
步骤S203,根据所述单位原始表情图像在所述原始表情图像中的位置信息,将所述单位辅助图像和所述目标辅助图像组合为所述目标表情图像。
由于单位辅助图像和单位原始表情图像具有一一对应的关系,且单位辅助图像的尺寸和单位原始表情图像的尺寸是相同的,根据单位原始表情图像在原始表情图像中的位置信息,将单位辅助图像和目标辅助图像组合为目标表情图像,其中位置信息是指对应单位原始表情图像在原始表情图像中的位置坐标,且组合得到的目标表情图像也和原始表情图像的尺寸是相同的。
上述可知,通过图像增强模型增强微表情图像中微表情的表情特征,以将微表情图像转换为具有高辨识度的目标表情图像,利用目标表情图像所具备的表情区分特征识别该目标表情图像的表情属性类型,作为微表情图像的表情属性类型,由于目标表情图像的表情特征区分性明显,因此可以准确地识别出目标表情图像的表情属性类型,进而可以提高识别微表情图像的准确率。
请参见图5,是本申请实施例提供的另一种基于微表情的图像识别方法的流程示意图,该基于微表情的图像识别方法具体过程如下:
步骤S301,获取属于第一表情类型的第一样本表情图像,并获取属于第二表情类型的第二样本表情图像。
具体的,为了训练图像增强模型和图像识别模型,获取属于第一表情类型的第一样本表情图像,以及获取属于第二表情类型的第二样本表情图像,第一样本表情图像中的表情是微表情,第二样本表情图像中的表情是夸张表情。下述以一张第一样本表情图像和一张第二样本表情图像为例进行说明。
下述步骤S302-步骤S305是用于描述训练图像增强模型和图像识别模型的过程,步骤S306-步骤S308是用于描述识别包含微表情图像的过程。图像增强模型是用于将一张图像的表情特征增强,也可以理解为利用图像增强模型生成了一张表情辨识度更高、表情强度更强的图像,因此图像增强模型可以对应于对抗网络中的样本生成模型,其中对抗网络包括样本生成模型和样本判别模型。样本生成模型是用于生成样本数据,此处就是生成表情强度高的表情图像,样本判别模型是用于判别出样本数据属于真实表情图像的概率,和属于模拟表情图像的概率(由样本生成模型生成的样本数据就是模拟的样本数据,由数据采集器采集来的图像是真实的样本数据,且属真实表情图像的概率和属于模拟表情图像的概率之和为1),因此训练图像增强模型实质上是在训练对抗网络,既要训练样本生成模型,也要训练样本判别模型。对于对抗网络也可以理解为:样本生成模型要生成尽量真实、夸张的表情图像,样本判别模型要尽量识别出由样本生成模型生成的图像是模型的仿真图像,而不是真实采集的表情图像,所以这是一个对抗博弈的过程(也就称为对抗网络),因此训练过程就是在样本生成模型对应的真实性和样本判别模型对应的准确性之间寻找一个平衡。对抗网络的目标函数可以表示为公式(1):
Figure PCTCN2019116515-appb-000001
其中,x表示属于第一表情类型的第一样本表情图像,z表示属于第二表情类型的第二样本表情图像,T表示样本生成模型,用于将微表情图像的表情特征增强,T(x)表示表情增强后的图像。D表示样本判别模型,用于识别对象(此处的对象包括第二样本表情图像或者表情特征增强后的图像)属于真实表情类型的概率,此处的属于真实表情类型的图像是指利用图像采集器采集的关于人脸脸部表情的图像,与真实表情类型对应的是模拟表情类型,属于模拟表情类型的图像是由模型生成的、虚构的表情图像。
其中,最大化D表示对样本判别模型来说,当第二样本表情图像(第二样本表情图像是真实、夸张的表情图像)输入的时候,希望识别出来的第二样本表情图像的识别标签为1(识别标签为1表示对应图像属于正常表情图像类型的概率为1),所以D(z)越大越好。当表情特征增强后的图像输入的时候,希望识别出来的表情特征增强后的图像的识别标签是0(识别标签为0表示对应图像属于正常表情图像类型的概率为0,这是因为样本增强图像是由模型生成的图像而不是真实采集的图 像),也就是D(T(x))越小越好,所以把第二项改成1-D(T(x)),这样就是越大越好,两者合起来就是越大越好。
其中,最小化T表示对样本生成模型来说,当样本增强图像输入的时候,希望识别出来的样本增强图像的识别标签是1(由于样本生成模型希望样本生成图像足够真实、逼真,所以希望经由样本判别模型识别样本增强图像的识别标签是1),所以D(T(x))越大越好,为了统一写成D(T(x))的形式,那么对应地就是最小化1-D(T(x))。
步骤S302,基于样本生成模型增强所述第一样本表情图像中微表情的表情特征,得到样本增强图像。
具体的,初始化一个样本生成模型,基于样本生成模型增强第一样本表情图像中微表情的表情特征,得到样本增强图像,具体的增强过程可以参见上述图4所对应实施例中的步骤S201-步骤S203。需要说明的是,由于此时的样本生成模型还没有训练完成,样本增强图像的质量可能比较低(即是样本增强图像不逼真、表情强度低、不夸张,甚至不是一张表情图像),且不论训练前的样本生成模型,或者是训练完成后的样本生成模型(即是图像增强模型)结构都是相同的,不同的是模型中参数的取值。
步骤S303,基于样本判别模型提取与所述样本增强图像对应的第一结构特征信息,并根据所述样本判别模型中的分类器识别与所述第一结构特征信息对应的匹配概率;所述匹配概率用于表征所述样本增强图像属于真实表情类型的概率。
具体的,得到了由样本生成模型生成的样本增强图像后,初始化一个样本判别模型,该样本判别模型可以是一个基于卷积神经网络的分类模型。基于样本判别模型提取样本增强图像对应的结构特征信息(称为第一结构特征信息,),根据样本判别模型中的分类器和第一结构特征信息,识别样本增强图像属于真实表情类型的匹配概率,此处的属于真实表情类型的图像是指图像采集器(例如,照相机)采集的真实、正常的人脸表情图像,当然,若生成的样本增强图像越逼真,那么对应的匹配概率就越高。需要说明的是,样本判别模型所计算出来的匹配概率只能确定样本增强图像是真实、正常的人脸表情的概率。
步骤S304,基于样本识别模型提取与所述样本增强图像对应的第二结构特征信息,并根据所述样本识别模型中的分类器识别与所述第二结构特征信息对应的标签信息集合;所述标签信息集合用于表征所述样本增强图像与多种表情属性类型的匹配度。
具体的,得到了由样本生成模型生成的样本增强图像后,初始化一个样本识别模型,该样本识别模型可以是一个基于卷积神经网络的分类模型。基于样本识别模型提取样本增强图像对应的结构特征信息(称为第二结构特征信息),根据样本识别模型中的分类器和第二结构特征信息,识别样本增强图像与多种表情属性类型的匹配度,将多个匹配度以及对应的表情属性类型进行关联,得到多个标签信息,将多个标签信息组合为标签信息集合。
例如,样本增强图像A与“开心”表情属性类型的匹配度为0.2;样本增强图像A与“伤心”表情属性类型的匹配度为0.1;样本增强图像A与“恐惧”表情属性类型的匹配度为0.7,再关联对应的表情属性类型,即得到标签信息集合:0.2-开心、0.1-伤心、0.7-恐惧。
上述可知,虽然样本判别模型和样本识别模型都可以是基于卷积神经网络的分类模型,但样本判别模型所提取出来的第一结构特征信息主要是反映样本增强图像 是真实图像或者是模拟图像的隐藏高级特征信息,样本识别模型所提取出来的第二结构特征信息主要是反映样本增强图像关于属性类型的隐藏高级特征信息。
步骤S305,根据所述样本增强图像、所述第二样本表情图像、所述匹配概率所述标签信息集合生成模型损失值,并根据所述模型损失值确定所述图像增强模型和所述图像识别模型。
具体的,根据样本增强图像、第二样本表情图像,生成损失值;根据样本判别模型识别出来的匹配概率和第二样本表情图像生成判别损失值;根据标签信息集合、第一样本表情图像对应的表情属性类型生成模型损失值。将上述3种损失值组合为模型损失值,根据模型损失值调整样本生成模型、样本判别模型和样本识别模型中参数的权值。参数权值调整后采用上述方法再次生成样本增强图像,再次计算模型损失值,不断循环,直至当模型损失值小于目标阈值时,或者当模型损失值收敛时,或者当循环的次数达到目标次数时,此时样本生成模型、样本判别模型和样本识别模型就训练完毕,后续就可以将样本生成模型确定为图像增强模型,并将样本识别模型确定为图像识别模型。上述可知,训练阶段是存在样本判别模型的,但在应用阶段,就不需要样本判别模型。
还需要说明的是,由于图像增强模型包括第一增强子模型和第二增强子模型,那么对应地,样本生成模型包括第一生成子模型和第二生成子模型,根据模型损失值调整样本生成模型参数的权值,就是根据模型损失值调整第一生成子模型参数的权值和调整第二生成子模型参数的权值。
步骤S306,获取属于第一表情类型的原始表情图像,将所述原始表情图像输入图像增强模型;所述属于第一表情类型的原始表情图像是包含微表情的图像;所述图像增强模型是根据属于第一表情类型的样本表情图像和属于第二表情类型的样本表情图像训练得到;所述属于第二表情类型的样本表情图像的表情强度大于所述属于第一表情类型的样本图像的表情强度。
步骤S307,在所述图像增强模型中增强所述原始表情图像中微表情的表情特征,得到属于第二表情类型的目标表情图像。
步骤S308,识别与所述目标表情图像对应的表情属性类型,并将与所述目标表情图像对应的表情属性类型确定为与所述原始表情图像对应的表情属性类型。
其中,步骤S306-步骤S308的具体实现方式可以参见上述图2对应实施例中的步骤S101-步骤S103,且增强表情特征的具体过程可以参见上述图3对应实施例中的步骤S201-步骤S203,这里不再进行赘述。
进一步地,请参见图6,是本申请实施例提供的一种生成模型损失值的流程示意图。如图6所示,生成模型损失值的具体过程包括如下步骤S401-步骤S402,且步骤S401-步骤S402为图5所对应实施例中步骤S305的一个具体实施例:
步骤S401,根据所述样本增强图像、所述第二样本表情图像、所述匹配概率、所述标签信息集合和所述第一样本表情图像对应的表情属性类型生成模型损失值。
具体的,根据样本增强图像和第二样本表情图像,可以计算出生成的样本增强图像和表情强度高(表情夸张)的第二样本表情图像之间的误差,可以利用公式(2)计算样本增强图像中每个像素点和第二样本表情图像的每个像素点之间的生成损失值:
Figure PCTCN2019116515-appb-000002
其中,x表示微表情图像,z表示真实夸张图像,T表示将微表情图像的表情特 征增强,T(x)表示样本增强图像。公式(2)所计算出来的生成损失值是用于在后续调整样本生成模型过程中保证样本生成模型所生成的图像(表情特征增强后的图像)的表情强度要尽量的大,即是增强后的图像中的表情要尽可能的夸张。
对样本判别模型来说,由于样本增强图像是由模型生成的图像,而不是真实采集的图像,因此样本增强图像不属于真实表情类型,进而对样本判别模型来说,希望样本增强图像属于真实表情类型的匹配概率为0,即希望样本增强图像对应的识别标签为0。那么根据样本判别模型所判别出来的匹配概率(二分类匹配概率,分别对应属于真实表情的概率和属于模拟表情的概率),以及第二样本表情图像,可以基于公式(3)计算出对应于样本判别模型的误差;
L 2=log D(z)+log(1-D(T(x))       (3)
其中,T(x)表示样本增强图像,z表示第二样本表情图像,x表示第一样本表情图像。D(z)表示识别第二样本表情图像属于真实表情类型的概率,D(T(z))表示识别样本增强图像属于真实表情类型的概率,此处的属于真实表情类型的图像是指真实采集的人脸脸部表情的图像,而不是模型虚构模拟的人脸表情图像。
对样本生成模型来说,希望样本增强图像识别出来属于真实表情类型的匹配概率为1,即希望样本增强图像对应的识别标签为1。那么根据样本判别模型所判别出来的匹配概率(二分类匹配概率),以及样本增强图像对应的识别标签1,可以基于公式(4)计算出对应于样本生成模型的误差:
L 3=log(1-D(T(x))      (4)
其中,D(T(z))表示识别样本增强图像属于真实表情类型的概率。
上述两个误差之和称为判别损失值。可以利用公式(4)计算判别损失值:
L 4=L 2+L 3       (5)
其中,L 2表示样本判别模型对应的误差,L 3表示样本生成模型对应的误差。公式(5)所计算出来的判别损失值是用于在后续调整样本生成模型和样本判别模型过程中,保证样本生成模型所生成的样本增强图像尽可能的真实,且样本判别模型判别结果尽可能的准确,或者说保证样本生成模型和样本判别模型可以达到平衡。
根据样本识别模型所确定的标签信息集合,和第一样本表情图像对应的表情属性类型(真实的表情属性类型),确定样本识别模型识别出来的结果和真实的结果之间的验证损失值,可以利用公式(6)计算上述验证损失值:
Figure PCTCN2019116515-appb-000003
其中,p(x)表示样本识别模型所识别出来的标签信息集合,q(x)表示第一样本表情图像对应的真实的表情属性类型。公式(6)所计算出来的验证损失值是用于在后续调整样本识别模型过程中,保证样本识别模型所判定图像(表情特征增强后的图像)的表情属性类型要尽可能的准确,或者说保证识别样本增强图像所识别出来的表情属性类型和第一样本表情图像的表情属性类型尽可能相同。
为了使上述3种损失值协同调节样本生成模型、样本判别模型和样本识别模型,将3种损失值组合为模型损失值,组合的方式可以采用公式(7):
L 6=L 1+α·L 4+β·L 5       (7)
其中,α和β是连接权重,且取值在0到1之间。L 1表示生成损失值、L 4表示判别损失值,L 5表示识别损失值。模型损失值可以合并理解为:生成损失值保证样本增强图像尽量夸张、判别损失值保证样本增强图像尽量真实、验证损失值保证样本增强图像的表情属性类型尽量准确,那么合并后的模型损失值保证样本增强图像夸张、真实且表情属性类型准确。
步骤S402,根据所述模型损失值调整所述样本生成模型中参数的权值、所述样本判别模型中参数的权值和所述样本识别模型中参数的权值,当所述模型损失值小于目标阈值时,将调整后的样本生成模型确定为所述图像增强模型,并将调整后的样本识别模型确定为所述图像识别模型。
根据模型损失值调整样本生成模型、样本判别模型和样本识别模型中参数的权值。参数权值调整后采用上述方法再次生成样本增强图像，再计算模型损失值，不断循环，直至模型损失值小于目标阈值、模型损失值收敛或者循环的次数达到目标次数时，样本生成模型、样本判别模型和样本识别模型就训练完毕，后续就可以将样本生成模型确定为图像增强模型用于增强表情特征，并将样本识别模型确定为图像识别模型用于识别表情属性类型。
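整个训练循环可以用如下示意性代码概括（沿用前文示例中定义的 generation_loss、discriminator_loss、verification_loss 与 model_loss；其中 generator、discriminator、recognizer、loader、优化器与停止阈值等均为假设，并非本申请限定的实现）：

```python
import itertools
import torch

def train(generator, discriminator, recognizer, loader,
          target_threshold=0.05, max_iters=10000, lr=2e-4):
    """示意性训练循环：根据模型损失值调整三个模型中参数的权值，直至满足停止条件。"""
    params = itertools.chain(generator.parameters(),
                             discriminator.parameters(),
                             recognizer.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)

    for step, (micro_img, exag_img, attr_onehot) in enumerate(itertools.cycle(loader)):
        enhanced = generator(micro_img)          # 样本增强图像 T(x)
        D_Tx = discriminator(enhanced)           # T(x) 属于真实表情类型的匹配概率
        D_z = discriminator(exag_img)            # 第二样本表情图像的匹配概率
        p = recognizer(enhanced)                 # 标签信息集合（假设已归一化的匹配度）

        L1 = generation_loss(enhanced, exag_img)     # 生成损失值
        L4 = discriminator_loss(D_z, D_Tx)           # 判别损失值
        L5 = verification_loss(p, attr_onehot)       # 验证损失值
        L6 = model_loss(L1, L4, L5)                  # 模型损失值

        optimizer.zero_grad()
        L6.backward()
        optimizer.step()                         # 调整三个模型中参数的权值

        # 停止条件：模型损失值小于目标阈值，或循环次数达到目标次数
        if L6.item() < target_threshold or step >= max_iters:
            break
    # 训练完毕后，样本生成模型作为图像增强模型，样本识别模型作为图像识别模型
    return generator, recognizer
```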
请参见图7，是本申请实施例提供的一种计算模型损失值的示意图。第一样本表情图像30a输入样本生成模型30d（样本生成模型30d包括第一生成子模型30b和第二生成子模型30c）中，得到样本增强图像30e。将样本增强图像30e输入样本判别模型30g中，计算样本增强图像30e属于真实表情类型的匹配概率。再将样本增强图像30e输入样本识别模型30h中，计算样本增强图像30e与多种表情属性类型之间的匹配度，并将多个匹配度和对应的表情属性类型组合为标签信息集合。误差函数计算器30k根据样本增强图像30e、第二样本表情图像30f和公式(2)计算生成损失值；误差函数计算器30k根据样本增强图像30e属于真实表情类型的匹配概率、第二样本表情图像30f属于真实表情类型的匹配概率、公式(3)、公式(4)和公式(5)计算判别损失值；误差函数计算器30k根据标签信息集合、第一样本表情图像30a对应的真实表情属性类型和公式(6)计算验证损失值。误差函数计算器30k将上述3种损失值组合为模型损失值，根据模型损失值调整样本生成模型30d（第一生成子模型30b和第二生成子模型30c）中参数的权值，根据模型损失值调整样本判别模型30g中参数的权值，根据模型损失值调整样本识别模型30h中参数的权值。
上述是通过3个损失值来保证样本增强图像夸张、真实且表情属性类型准确，也可以将上述判别损失值和验证损失值合并为一个损失值。具体过程为：获取属于第一表情类型的第一样本表情图像，以及获取属于第二表情类型的第二样本表情图像，下述仍以一张第一样本表情图像和一张第二样本表情图像为例进行说明。初始化一个样本生成模型，基于样本生成模型增强第一样本表情图像中微表情的表情特征，得到样本增强图像。
得到了由样本生成模型生成的样本增强图像后，初始化一个样本判别模型，该样本判别模型可以是一个基于卷积神经网络的分类模型。基于样本判别模型提取样本增强图像对应的结构特征信息（称为第三结构特征信息），根据样本判别模型中的分类器和第三结构特征信息，识别样本增强图像属于真实表情类型，且与多种表情属性类型匹配的联合匹配概率，联合匹配概率的数量和判别模型中表情属性类型的数量相同，其中第三结构特征信息主要反映样本增强图像是真实图像、且关于属性类型的隐藏高级特征信息。这里也可以理解为上述样本判别模型和样本识别模型合并为样本判别模型。同样地，根据样本增强图像和第二样本表情图像，可以计算出生成的样本增强图像和表情强度高（表情夸张）的第二样本表情图像之间的误差，可以利用上述公式(2)计算样本增强图像中每个像素点和第二样本表情图像的每个像素点之间的生成损失值。
根据样本判别模型所识别出来的联合匹配概率以及第一样本表情图像对应的真实的表情属性类型(此处第一样本表情图像是属于真实表情类型),计算识别出来的结果和真实的结果之间的识别损失值。将2种损失值组合为模型损失值。根据模型损失值调整样本生成模型和样本判别模型中参数的权值,参数权值调整后采用上述方法再次生成样本增强图像,再计算模型损失值,不断循环,直至当模型损失值小于目标阈值,或者模型损失值收敛,或者循环的次数达到目标次数时,此时样本生成模型和样本判别模型就训练完毕,后续就可以将样本生成模型确定为图像增强模型。
其中,模型损失值同样可以合并理解为:生成损失值保证样本增强图像尽量夸张、判别损失值保证样本增强图像尽量真实,且表情属性类型尽量准确,那么合并后的模型损失值保证样本增强图像夸张、真实且表情属性类型准确。
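将样本判别模型与样本识别模型合并时，判别模型输出“属于真实表情类型且与各表情属性类型匹配”的联合匹配概率，其数量与表情属性类型的数量相同。下面给出输出层的一个示意性写法（网络结构与类别数均为假设）：

```python
import torch
import torch.nn as nn

class JointDiscriminator(nn.Module):
    """合并后的样本判别模型：输出与表情属性类型数量相同的联合匹配概率。"""
    def __init__(self, num_attr_types=3, in_channels=3):
        super().__init__()
        # 卷积部分提取第三结构特征信息
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # 每个输出对应“图像真实且属于某一表情属性类型”的联合匹配概率
        self.joint_head = nn.Sequential(nn.Linear(64, num_attr_types), nn.Sigmoid())

    def forward(self, x):
        return self.joint_head(self.features(x))

# 示例：3 种表情属性类型对应 3 个联合匹配概率
joint_probs = JointDiscriminator()(torch.rand(1, 3, 64, 64))
```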
上述仅仅是训练图像增强模型的过程，后续还需要对图像的表情属性类型进行识别，或者说图像识别模型可以和图像增强模型分开训练。由于图像增强模型所生成的图像夸张、真实且表情属性类型准确，因此后续任意一个表情识别准确率高的图像识别模型都可以准确识别出表情特征增强后的图像，且由于已经将微表情图像转换为夸张表情图像，提取夸张表情图像的特征并识别其表情属性类型的难度大大降低。
进一步的,请参见图8,是本申请实施例提供的一种基于微表情的图像识别装置的结构示意图。如图8所示,基于微表情的图像识别装置1可以包括:第一获取模块11、增强模块12、识别模块13、确定模块14。
第一获取模块11,用于获取属于第一表情类型的原始表情图像,将所述原始表情图像输入图像增强模型;所述属于第一表情类型的原始表情图像是包含微表情的图像;所述图像增强模型是根据属于第一表情类型的样本表情图像和属于第二表情类型的样本表情图像训练得到;所述属于第二表情类型的样本表情图像的表情强度大于所述属于第一表情类型的样本图像的表情强度;
增强模块12,用于在所述图像增强模型中增强所述原始表情图像中微表情的表情特征,得到属于第二表情类型的目标表情图像;
识别模块13,用于识别与所述目标表情图像对应的表情属性类型;
确定模块14,用于将与所述目标表情图像对应的表情属性类型确定为与所述原始表情图像对应的表情属性类型。
其中,第一获取模块11、增强模块12、识别模块13、确定模块14的具体功能实现方式可以参见上述图3对应实施例中的步骤S101-步骤S103,这里不再进行赘述。
请参见图8,增强模块12可以包括:确定单元121、第一输入单元122、第二输入单元123、组合单元124。
确定单元121,用于将所述原始表情图像中的表情标识区域确定为单位原始表情图像;
第一输入单元122,用于将所述单位原始表情图像输入所述第一增强子模型,在所述第一增强子模型中增强所述单位原始表情图像的表情特征,得到单位辅助图像;
第二输入单元123,用于将所述原始表情图像输入所述第二增强子模型,在所述第二增强子模型中增强所述原始表情图像的表情特征,得到目标辅助图像;
组合单元124,用于根据所述单位原始表情图像在所述原始表情图像中的位置信息,将所述单位辅助图像和所述目标辅助图像组合为所述目标表情图像。
其中,确定单元121、第一输入单元122、第二输入单元123、组合单元124的具体功能实现方式可以参见上述图4对应实施例中的步骤S201-步骤S203,这里不再进行赘述。
请参见图8,第一输入单元122可以包括:第一输入子单元1221、第一卷积子单元1222。
第一输入子单元1221,用于将所述单位原始表情图像输入所述第一增强子模型的输入层,得到与所述单位原始表情图像对应的第一原始矩阵;
第一卷积子单元1222,用于从所述第一原始矩阵中随机采样,得到具有目标长度的第一原始向量,根据所述第一增强子模型中的转置卷积层,对所述第一原始向量进行反卷积处理,得到第一目标张量,并将所述第一目标张量确定为所述单位辅助图像。
其中,第一输入子单元1221、第一卷积子单元1222的具体功能实现方式可以参见上述图4对应实施例中的步骤S201,这里不再进行赘述。
请参见图8,第二输入单元123可以包括:第二输入子单元1231、第二卷积子单元1232。
第二输入子单元1231,用于将所述原始表情图像输入所述第二增强子模型的输入层,得到与所述原始表情图像对应的第二原始矩阵;
第二卷积子单元1232,用于从所述第二原始矩阵中随机采样,得到具有所述目标长度的第二原始向量,根据所述第二增强子模型中的转置卷积层,对所述第二原始向量进行反卷积处理,得到第二目标张量,并将所述第二目标张量确定为所述目标辅助图像。
其中,第二输入子单元1231、第二卷积子单元1232的具体功能实现方式可以参见上述图4对应实施例中的步骤S202,这里不再进行赘述。
请参见图8,基于微表情的图像识别装置1可以包括:第一获取模块11、增强模块12、识别模块13、确定模块14,还可以包括:二值处理模块15、边缘检测模块16。
二值处理模块15,用于将所述原始表情图像二值化处理,得到二值图像;
边缘检测模块16,用于基于梯度算子对所述二值图像进行边缘检测,得到梯度图像,并在所述梯度图像中确定边缘轮廓所在的目标位置信息;
所述确定模块14,还用于在所述原始表情图像中,将所述目标位置信息所标识的区域确定为所述表情标识区域。
其中,二值处理模块15、边缘检测模块16的具体功能实现方式可以参见上述图3对应实施例中的步骤S102,这里不再进行赘述。
请参见图8,识别模块13可以包括:提取单元131、识别单元132。
提取单元131,用于将所述目标表情图像输入图像识别模型中;
所述提取单元131，还用于根据所述图像识别模型中的正向卷积层的卷积处理和池化层的池化处理，提取与所述目标表情图像对应的目标结构特征信息；
识别单元132,用于根据所述图像识别模型中的分类器,识别所述目标结构特征信息与所述图像识别模型中多个表情属性类型特征的匹配度,在由所述目标结构特征信息得到的多个匹配度中,将最大匹配度所对应的表情属性类型,作为与所述目标表情图像对应的表情属性类型。
其中,提取单元131、识别单元132的具体功能实现方式可以参见上述图3对应实施例中的步骤S103,这里不再进行赘述。
请参见图8,基于微表情的图像识别装置1可以包括:第一获取模块11、增强模块12、识别模块13、确定模块14、二值处理模块15、边缘检测模块16;还可以包括:第二获取模块17、提取模块18、生成模块19。
第二获取模块17,用于获取属于第一表情类型的第一样本表情图像,并获取属于第二表情类型的第二样本表情图像;
所述确定模块14,还用于基于样本生成模型增强所述第一样本表情图像中微表情的表情特征,得到样本增强图像;
提取模块18,用于基于样本判别模型提取与所述样本增强图像对应的第一结构特征信息,并根据所述样本判别模型中的分类器识别与所述第一结构特征信息对应的匹配概率;所述匹配概率用于表征所述样本增强图像属于真实表情类型的概率;
所述提取模块18,还用于基于样本识别模型提取与所述样本增强图像对应的第二结构特征信息,并根据所述样本识别模型中的分类器识别与所述第二结构特征信息对应的标签信息集合;所述标签信息集合用于表征所述样本增强图像与多种表情属性类型的匹配度;
生成模块19,用于根据所述样本增强图像、所述第二样本表情图像、所述匹配概率、所述标签信息集合生成模型损失值,并根据所述模型损失值确定所述图像增强模型和所述图像识别模型。
其中,第二获取模块17、提取模块18、生成模块19的具体功能实现方式可以参见上述图5对应实施例中的步骤S301-步骤S305,这里不再进行赘述。
请参见图8,生成模块19可以包括:生成单元191、调整单元192。
生成单元191,用于根据所述样本增强图像、所述第二样本表情图像、所述匹配概率、所述标签信息集合和所述第一样本表情图像对应的表情属性类型生成模型损失值;
调整单元192,用于根据所述模型损失值调整所述样本生成模型中参数的权值、所述样本判别模型中参数的权值和所述样本识别模型中参数的权值,当所述模型损失值小于目标阈值时,将调整后的样本生成模型确定为所述图像增强模型,并将调整后的样本识别模型确定为所述图像识别模型。
其中,生成单元191、调整单元192的具体功能实现方式可以参见上述图6对应实施例中的步骤S401-步骤S402,这里不再进行赘述。
请参见图8,生成单元191可以包括:第一确定子单元1911、第二确定子单元1912。
第一确定子单元1911,用于根据所述样本增强图像和所述第二样本表情图像,确定所述生成损失值;
所述第一确定子单元1911,还用于根据所述匹配概率和所述第二样本表情图像,确定所述判别损失值;
第二确定子单元1912，用于根据所述标签信息集合和所述第一样本表情图像对应的表情属性类型，确定所述验证损失值；
所述第二确定子单元1912,还用于根据所述生成损失值、所述判别损失值和所述验证损失值,生成所述模型损失值。
其中,第一确定子单元1911、第二确定子单元1912的具体功能实现方式可以参见上述图6对应实施例中的步骤S401,这里不再进行赘述。
本申请实施例通过获取属于第一表情类型的原始表情图像,将原始表情图像输入图像增强模型;属于第一表情类型的原始表情图像是包含微表情的图像;图像增强模型是根据属于第一表情类型的样本表情图像和属于第二表情类型的样本表情图像训练得到;属于第二表情类型的样本表情图像的表情强度大于属于第一表情类型的样本图像的表情强度;在图像增强模型中增强原始表情图像中微表情的表情特征,得到属于第二表情类型的目标表情图像;识别与目标表情图像对应的表情属性类型,并将与目标表情图像对应的表情属性类型确定为与原始表情图像对应的表情属性类型。上述可知,通过图像增强模型增强微表情图像中微表情的表情特征,以将微表情图像转换为辨识度高、表情强度大的目标表情图像,利用目标表情图像所具备的表情区分特征,识别该目标表情图像的表情属性类型,作为微表情图像的表情属性类型,由于表情特征增强后的目标表情图像的表情特征区分性明显,因此可以准确地识别出目标表情图像的表情属性类型,进而可以提高识别微表情图像的准确率。
进一步地，请参见图9，是本申请实施例提供的一种电子设备的结构示意图。如图9所示，上述图8中的基于微表情的图像识别装置1可以应用于所述电子设备1000，所述电子设备1000可以包括：处理器1001，网络接口1004和存储器1005，此外，所述电子设备1000还可以包括：用户接口1003和至少一个通信总线1002。其中，通信总线1002用于实现这些组件之间的连接通信。其中，用户接口1003可以包括显示屏（Display）、键盘（Keyboard），用户接口1003还可以包括标准的有线接口、无线接口。网络接口1004可以包括标准的有线接口、无线接口（如WI-FI接口）。存储器1005可以是高速RAM存储器，也可以是非易失性存储器（non-volatile memory），例如至少一个磁盘存储器。存储器1005还可以是至少一个位于远离前述处理器1001的存储装置。如图9所示，作为一种计算机存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及设备控制应用程序。
在图9所示的电子设备1000中,网络接口1004可提供网络通讯功能;而用户接口1003主要用于为用户提供输入的接口;而处理器1001可以用于调用存储器1005中存储的设备控制应用程序,以实现:
获取属于第一表情类型的原始表情图像,将所述原始表情图像输入图像增强模型;所述属于第一表情类型的原始表情图像是包含微表情的图像;所述图像增强模型是根据属于第一表情类型的样本表情图像和属于第二表情类型的样本表情图像训练得到;所述属于第二表情类型的样本表情图像的表情强度大于所述属于第一表情类型的样本图像的表情强度;
在所述图像增强模型中增强所述原始表情图像中微表情的表情特征,得到属于第二表情类型的目标表情图像;
识别与所述目标表情图像对应的表情属性类型,并将与所述目标表情图像对应的表情属性类型确定为与所述原始表情图像对应的表情属性类型。
在一个实施例中,所述图像增强模型包括第一增强子模型和第二增强子模型;
所述处理器1001在执行在所述图像增强模型中增强所述原始表情图像中微表情的表情特征，得到属于第二表情类型的目标表情图像时，具体执行以下步骤：
将所述原始表情图像中的表情标识区域确定为单位原始表情图像,并将所述单位原始表情图像输入所述第一增强子模型,在所述第一增强子模型中增强所述单位原始表情图像的表情特征,得到单位辅助图像;
将所述原始表情图像输入所述第二增强子模型,在所述第二增强子模型中增强所述原始表情图像的表情特征,得到目标辅助图像;
根据所述单位原始表情图像在所述原始表情图像中的位置信息,将所述单位辅助图像和所述目标辅助图像组合为所述目标表情图像。
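单位辅助图像与目标辅助图像的组合，可以理解为按照单位原始表情图像在原始表情图像中的位置信息，把局部增强结果贴回整体增强结果的对应位置，示意性实现如下（坐标与图像尺寸均为假设）：

```python
import numpy as np

def combine(target_aux, unit_aux, position):
    """按位置信息将单位辅助图像贴回目标辅助图像，得到目标表情图像。
    position: (y, x, h, w)，即单位原始表情图像在原始表情图像中的位置信息。
    """
    y, x, h, w = position
    target = target_aux.copy()
    target[y:y + h, x:x + w] = unit_aux   # 用局部增强结果覆盖对应区域
    return target

# 示例：假设表情标识区域位于 (20, 30)，大小为 16x24
target_aux = np.zeros((64, 64, 3), dtype=np.uint8)     # 目标辅助图像
unit_aux = np.full((16, 24, 3), 255, dtype=np.uint8)   # 单位辅助图像
target_expression = combine(target_aux, unit_aux, (20, 30, 16, 24))
```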
在一个实施例中,所述处理器1001在执行将所述单位原始表情图像输入所述第一增强子模型,在所述第一增强子模型中增强所述单位原始表情图像的表情特征,得到单位辅助图像时,具体执行以下步骤:
将所述单位原始表情图像输入所述第一增强子模型的输入层,得到与所述单位原始表情图像对应的第一原始矩阵;
从所述第一原始矩阵中随机采样,得到具有目标长度的第一原始向量,根据所述第一增强子模型中的转置卷积层,对所述第一原始向量进行反卷积处理,得到第一目标张量,并将所述第一目标张量确定为所述单位辅助图像。
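“随机采样得到具有目标长度的第一原始向量、再经转置卷积层反卷积得到第一目标张量”的过程，可参考如下示意性草图（向量长度、各层通道数与输出尺寸均为假设，并非本申请限定的结构）：

```python
import torch
import torch.nn as nn

class FirstEnhanceSubModel(nn.Module):
    """示意性的第一增强子模型：输入层展平 + 随机采样 + 转置卷积层反卷积。"""
    def __init__(self, target_length=128):
        super().__init__()
        self.target_length = target_length
        # 转置卷积层：把第一原始向量逐步反卷积为图像大小的第一目标张量
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(target_length, 128, 4, 1, 0), nn.ReLU(),   # 1x1 -> 4x4
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),              # 4x4 -> 8x8
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),                # 8x8 -> 16x16
        )

    def forward(self, unit_img):
        n = unit_img.size(0)
        first_matrix = unit_img.flatten(1)        # 第一原始矩阵（此处以展平表示，属于假设）
        # 从第一原始矩阵中随机采样，得到具有目标长度的第一原始向量
        idx = torch.randint(0, first_matrix.size(1), (self.target_length,))
        first_vector = first_matrix[:, idx]
        # 反卷积处理，得到第一目标张量，即单位辅助图像
        return self.deconv(first_vector.view(n, self.target_length, 1, 1))

# 示例：16x16 的单位原始表情图像
unit_aux = FirstEnhanceSubModel()(torch.rand(1, 3, 16, 16))
```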
在一个实施例中,所述处理器1001在执行将所述原始表情图像输入所述第二增强子模型,在所述第二增强子模型中增强所述原始表情图像的表情特征,得到目标辅助图像时,具体执行以下步骤:
将所述原始表情图像输入所述第二增强子模型的输入层,得到与所述原始表情图像对应的第二原始矩阵;
从所述第二原始矩阵中随机采样,得到具有所述目标长度的第二原始向量,根据所述第二增强子模型中的转置卷积层,对所述第二原始向量进行反卷积处理,得到第二目标张量,并将所述第二目标张量确定为所述目标辅助图像。
在一个实施例中,所述处理器1001还执行以下步骤:
将所述原始表情图像二值化处理,得到二值图像;
基于梯度算子对所述二值图像进行边缘检测,得到梯度图像,并在所述梯度图像中确定边缘轮廓所在的目标位置信息;
在所述原始表情图像中,将所述目标位置信息所标识的区域确定为所述表情标识区域。
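“二值化处理、基于梯度算子做边缘检测、进而确定表情标识区域”的流程，可参考如下基于 OpenCV 的示意性实现（阈值方法与梯度算子的选择均为假设，此处以 Otsu 阈值和 Sobel 算子为例，并用边缘点的外接矩形近似目标位置信息）：

```python
import cv2
import numpy as np

def locate_expression_region(img_path):
    """示意性流程：二值化 -> 梯度算子边缘检测 -> 返回边缘轮廓所在的目标位置信息。"""
    gray = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    # 二值化处理，得到二值图像
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # 基于梯度算子（这里用 Sobel 算子）进行边缘检测，得到梯度图像
    gx = cv2.Sobel(binary, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(binary, cv2.CV_64F, 0, 1, ksize=3)
    gradient = cv2.convertScaleAbs(cv2.magnitude(gx, gy))
    # 在梯度图像中确定边缘轮廓所在的目标位置信息（这里用外接矩形近似）
    ys, xs = np.nonzero(gradient)
    y, x = int(ys.min()), int(xs.min())
    h, w = int(ys.max() - ys.min() + 1), int(xs.max() - xs.min() + 1)
    return (y, x, h, w)   # 目标位置信息，可用于在原始表情图像中确定表情标识区域
```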
在一个实施例中,所述处理器1001在执行识别与所述目标表情图像对应的表情属性类型时,具体执行以下步骤:
将所述目标表情图像输入图像识别模型中;
根据所述图像识别模型中的正向卷积层的卷积处理和池化层的池化处理,提取与所述目标表情图像对应的目标结构特征信息;
根据所述图像识别模型中的分类器,识别所述目标结构特征信息与所述图像识别模型中多个表情属性类型特征的匹配度,在由所述目标结构特征信息得到的多个匹配度中,将最大匹配度所对应的表情属性类型,作为与所述目标表情图像对应的表情属性类型。
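图像识别模型“正向卷积层卷积处理与池化层池化处理提取目标结构特征信息，再由分类器取最大匹配度对应的表情属性类型”的过程，可用如下示意性代码表示（层数、通道数与表情属性类型均为假设）：

```python
import torch
import torch.nn as nn

class ImageRecognitionModel(nn.Module):
    """示意性的图像识别模型：卷积 + 池化提取目标结构特征信息，分类器输出匹配度。"""
    def __init__(self, num_attr_types=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 正向卷积层 + 池化层
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64, num_attr_types)   # 分类器

    def forward(self, target_img):
        feats = self.backbone(target_img)                          # 目标结构特征信息
        return torch.softmax(self.classifier(feats), dim=-1)      # 与各表情属性类型的匹配度

# 示例：取最大匹配度对应的表情属性类型，作为目标表情图像的表情属性类型
attr_types = ["开心", "伤心", "恐惧"]   # 假设的表情属性类型
degrees = ImageRecognitionModel()(torch.rand(1, 3, 64, 64))
print(attr_types[int(degrees.argmax(dim=-1))])
```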
在一个实施例中,所述处理器1001还执行以下步骤:
获取属于第一表情类型的第一样本表情图像,并获取属于第二表情类型的第二样本表情图像;
基于样本生成模型增强所述第一样本表情图像中微表情的表情特征，得到样本增强图像；
基于样本判别模型提取与所述样本增强图像对应的第一结构特征信息,并根据所述样本判别模型中的分类器识别与所述第一结构特征信息对应的匹配概率;所述匹配概率用于表征所述样本增强图像属于真实表情类型的概率;
基于样本识别模型提取与所述样本增强图像对应的第二结构特征信息,并根据所述样本识别模型中的分类器识别与所述第二结构特征信息对应的标签信息集合;所述标签信息集合用于表征所述样本增强图像与多种表情属性类型的匹配度;
根据所述样本增强图像、所述第二样本表情图像、所述匹配概率、所述标签信息集合生成模型损失值,并根据所述模型损失值确定所述图像增强模型和所述图像识别模型。
在一个实施例中,所述处理器1001在执行根据所述样本增强图像、所述第二样本表情图像、所述匹配概率、所述标签信息集合生成模型损失值,并根据所述模型损失值确定所述图像增强模型和所述图像识别模型时,具体执行以下步骤:
根据所述样本增强图像、所述第二样本表情图像、所述匹配概率、所述标签信息集合和所述第一样本表情图像对应的表情属性类型生成模型损失值;
根据所述模型损失值调整所述样本生成模型中参数的权值、所述样本判别模型中参数的权值和所述样本识别模型中参数的权值,当所述模型损失值小于目标阈值时,将调整后的样本生成模型确定为所述图像增强模型,并将调整后的样本识别模型确定为所述图像识别模型。
在一个实施例中,所述模型损失值包括:生成损失值、判别损失值和验证损失值;
所述处理器1001在执行根据所述样本增强图像、所述第二样本表情图像、所述匹配概率、所述标签信息集合和所述第一样本表情图像对应的表情属性类型生成模型损失值时,具体执行以下步骤:
根据所述样本增强图像和所述第二样本表情图像,确定所述生成损失值;
根据所述匹配概率和所述第二样本表情图像,确定所述判别损失值;
根据所述标签信息集合和所述第一样本表情图像对应的表情属性类型,确定所述验证损失值;
根据所述生成损失值、所述判别损失值和所述验证损失值,生成所述模型损失值。
上述可知,通过图像增强模型增强微表情图像中微表情的表情特征,以将微表情图像转换为具有高辨识度的目标表情图像,利用目标表情图像所具备的表情区分特征识别该目标表情图像的表情属性类型,作为微表情图像的表情属性类型,由于目标表情图像的表情特征区分性明显,因此可以准确地识别出目标表情图像的表情属性类型,进而可以提高识别微表情图像的准确率。
应当理解,本申请实施例中所描述的电子设备1000可执行前文图3到图7所对应实施例中对所述基于微表情的图像识别方法的描述,也可执行前文图8所对应实施例中对所述基于微表情的图像识别装置1的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
此外，这里需要指出的是：本申请实施例还提供了一种计算机存储介质，且所述计算机存储介质中存储有前文提及的基于微表情的图像识别装置1所执行的计算机程序，且所述计算机程序包括程序指令，当所述处理器执行所述程序指令时，能够执行前文图3到图7所对应实施例中对所述基于微表情的图像识别方法的描述，因此，这里将不再进行赘述。另外，对采用相同方法的有益效果描述，也不再进行赘述。对于本申请所涉及的计算机存储介质实施例中未披露的技术细节，请参照本申请方法实施例的描述。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。
以上所揭露的仅为本申请较佳实施例而已,当然不能以此来限定本申请之权利范围,因此依本申请权利要求所作的等同变化,仍属本申请所涵盖的范围。

Claims (17)

  1. 一种基于微表情的图像识别方法,由电子设备执行,包括:
    获取属于第一表情类型的原始表情图像,将所述原始表情图像输入图像增强模型;所述属于第一表情类型的原始表情图像是包含微表情的图像;所述图像增强模型是根据属于第一表情类型的样本表情图像和属于第二表情类型的样本表情图像训练得到;所述属于第二表情类型的样本表情图像的表情强度大于所述属于第一表情类型的样本图像的表情强度;
    在所述图像增强模型中增强所述原始表情图像中微表情的表情特征,得到属于第二表情类型的目标表情图像;
    识别与所述目标表情图像对应的表情属性类型,并将与所述目标表情图像对应的表情属性类型确定为与所述原始表情图像对应的表情属性类型。
  2. 根据权利要求1所述的方法,其中,所述图像增强模型包括第一增强子模型和第二增强子模型;
    所述在所述图像增强模型中增强所述原始表情图像中微表情的表情特征,得到属于第二表情类型的目标表情图像,包括:
    将所述原始表情图像中的表情标识区域确定为单位原始表情图像,并将所述单位原始表情图像输入所述第一增强子模型,在所述第一增强子模型中增强所述单位原始表情图像的表情特征,得到单位辅助图像;
    将所述原始表情图像输入所述第二增强子模型,在所述第二增强子模型中增强所述原始表情图像的表情特征,得到目标辅助图像;
    根据所述单位原始表情图像在所述原始表情图像中的位置信息,将所述单位辅助图像和所述目标辅助图像组合为所述目标表情图像。
  3. 根据权利要求2所述的方法,其中,所述将所述单位原始表情图像输入所述第一增强子模型,在所述第一增强子模型中增强所述单位原始表情图像的表情特征,得到单位辅助图像,包括:
    将所述单位原始表情图像输入所述第一增强子模型的输入层,得到与所述单位原始表情图像对应的第一原始矩阵;
    从所述第一原始矩阵中随机采样,得到第一原始向量,根据所述第一增强子模型中的转置卷积层,对所述第一原始向量进行反卷积处理,得到第一目标张量,并将所述第一目标张量确定为所述单位辅助图像。
  4. 根据权利要求2所述的方法,其中,所述将所述原始表情图像输入所述第二增强子模型,在所述第二增强子模型中增强所述原始表情图像的表情特征,得到目标辅助图像,包括:
    将所述原始表情图像输入所述第二增强子模型的输入层,得到与所述原始表情图像对应的第二原始矩阵;
    从所述第二原始矩阵中随机采样,得到第二原始向量,根据所述第二增强子模型中的转置卷积层,对所述第二原始向量进行反卷积处理,得到第二目标张量,并将所述第二目标张量确定为所述目标辅助图像。
  5. 根据权利要求2所述的方法,其中,还包括:
    将所述原始表情图像二值化处理,得到二值图像;
    基于梯度算子对所述二值图像进行边缘检测，得到梯度图像，并在所述梯度图像中确定边缘轮廓所在的目标位置信息；
    在所述原始表情图像中,将所述目标位置信息所标识的区域确定为所述表情标识区域。
  6. 根据权利要求1所述的方法,其中,所述识别与所述目标表情图像对应的表情属性类型,包括:
    将所述目标表情图像输入图像识别模型中;
    根据所述图像识别模型中的正向卷积层的卷积处理和池化层的池化处理,提取与所述目标表情图像对应的目标结构特征信息;
    根据所述图像识别模型中的分类器,识别所述目标结构特征信息与所述图像识别模型中多个表情属性类型特征的匹配度,在由所述目标结构特征信息得到的多个匹配度中,将最大匹配度所对应的表情属性类型,作为与所述目标表情图像对应的表情属性类型。
  7. 根据权利要求1所述的方法,其中,还包括:
    获取属于第一表情类型的第一样本表情图像,并获取属于第二表情类型的第二样本表情图像;
    基于样本生成模型增强所述第一样本表情图像中微表情的表情特征,得到样本增强图像;
    基于样本判别模型提取与所述样本增强图像对应的第一结构特征信息,并根据所述样本判别模型中的分类器识别与所述第一结构特征信息对应的匹配概率;所述匹配概率用于表征所述样本增强图像属于真实表情类型的概率;
    基于样本识别模型提取与所述样本增强图像对应的第二结构特征信息,并根据所述样本识别模型中的分类器识别与所述第二结构特征信息对应的标签信息集合;所述标签信息集合用于表征所述样本增强图像与多种表情属性类型的匹配度;
    根据所述样本增强图像、所述第二样本表情图像、所述匹配概率、所述标签信息集合生成模型损失值,并根据所述模型损失值确定所述图像增强模型和所述图像识别模型。
  8. 根据权利要求7所述的方法,其中,所述根据所述样本增强图像、所述第二样本表情图像、所述匹配概率、所述标签信息集合生成模型损失值,并根据所述模型损失值确定所述图像增强模型和所述图像识别模型,包括:
    根据所述样本增强图像、所述第二样本表情图像、所述匹配概率、所述标签信息集合和所述第一样本表情图像对应的表情属性类型生成所述模型损失值;
    根据所述模型损失值调整所述样本生成模型中参数的权值、所述样本判别模型中参数的权值和所述样本识别模型中参数的权值,当所述模型损失值小于目标阈值时,将调整后的样本生成模型确定为所述图像增强模型,并将调整后的样本识别模型确定为所述图像识别模型。
  9. 根据权利要求8所述的方法,其中,所述模型损失值包括:生成损失值、判别损失值和验证损失值;
    所述根据所述样本增强图像、所述第二样本表情图像、所述匹配概率、所述标签信息集合和所述第一样本表情图像对应的表情属性类型生成模型损失值,包括:
    根据所述样本增强图像和所述第二样本表情图像,确定所述生成损失值;
    根据所述匹配概率和所述第二样本表情图像,确定所述判别损失值;
    根据所述标签信息集合和所述第一样本表情图像对应的表情属性类型,确定所述验证损失值;
    根据所述生成损失值、所述判别损失值和所述验证损失值,生成所述模型损失值。
  10. 一种基于微表情的图像识别装置,包括:
    第一获取模块,用于获取属于第一表情类型的原始表情图像,将所述原始表情图像输入图像增强模型;所述属于第一表情类型的原始表情图像是包含微表情的图像;所述图像增强模型是根据属于第一表情类型的样本表情图像和属于第二表情类型的样本表情图像训练得到;所述属于第二表情类型的样本表情图像的表情强度大于所述属于第一表情类型的样本图像的表情强度;
    增强模块,用于在所述图像增强模型中增强所述原始表情图像中微表情的表情特征,得到属于第二表情类型的目标表情图像;
    识别模块,用于识别与所述目标表情图像对应的表情属性类型;
    确定模块,用于将与所述目标表情图像对应的表情属性类型确定为与所述原始表情图像对应的表情属性类型。
  11. 根据权利要求10所述的装置,其中,所述图像增强模型包括第一增强子模型和第二增强子模型;
    所述增强模块,包括:
    确定单元,用于将所述原始表情图像中的表情标识区域确定为单位原始表情图像;
    第一输入单元,用于将所述单位原始表情图像输入所述第一增强子模型,在所述第一增强子模型中增强所述单位原始表情图像的表情特征,得到单位辅助图像;
    第二输入单元,用于将所述原始表情图像输入所述第二增强子模型,在所述第二增强子模型中增强所述原始表情图像的表情特征,得到目标辅助图像;
    组合单元,用于根据所述单位原始表情图像在所述原始表情图像中的位置信息,将所述单位辅助图像和所述目标辅助图像组合为所述目标表情图像。
  12. 根据权利要求11所述的装置,其中,所述第一输入单元,包括:
    第一输入子单元,用于将所述单位原始表情图像输入所述第一增强子模型的输入层,得到与所述单位原始表情图像对应的第一原始矩阵;
    第一卷积子单元,用于从所述第一原始矩阵中随机采样,得到第一原始向量,根据所述第一增强子模型中的转置卷积层,对所述第一原始向量进行反卷积处理,得到第一目标张量,并将所述第一目标张量确定为所述单位辅助图像。
  13. 根据权利要求11所述的装置,其中,所述第二输入单元,包括:
    第二输入子单元,用于将所述原始表情图像输入所述第二增强子模型的输入层,得到与所述原始表情图像对应的第二原始矩阵;
    第二卷积子单元,用于从所述第二原始矩阵中随机采样,得到第二原始向量,根据所述第二增强子模型中的转置卷积层,对所述第二原始向量进行反卷积处理,得到第二目标张量,并将所述第二目标张量确定为所述目标辅助图像。
  14. 根据权利要求10所述的装置,其中,所述装置进一步包括:
    二值处理模块,用于将所述原始表情图像二值化处理,得到二值图像;
    边缘检测模块,用于基于梯度算子对所述二值图像进行边缘检测,得到梯度图像,并在所述梯度图像中确定边缘轮廓所在的目标位置信息;
    所述确定模块,还用于在所述原始表情图像中,将所述目标位置信息所标识的区域确定为所述表情标识区域。
  15. 根据权利要求10所述的装置，其中，所述识别模块进一步包括提取单元和识别单元；
    所述提取单元用于将所述目标表情图像输入图像识别模型中;
    所述提取单元,还用于根据所述图像识别模型中的正向卷积层的卷积处理和池化层的池化处理,提取与所述目标表情图像对应的目标结构特征信息;
    所述识别单元,用于根据所述图像识别模型中的分类器,识别所述目标结构特征信息与所述图像识别模型中多个表情属性类型特征的匹配度,在由所述目标结构特征信息得到的多个匹配度中,将最大匹配度所对应的表情属性类型,作为与所述目标表情图像对应的表情属性类型。
  16. 一种电子设备,包括:处理器和存储器;
    所述处理器和存储器相连,其中,所述存储器用于存储程序代码,所述处理器用于调用所述程序代码,以执行如权利要求1-9任一项所述的方法。
  17. 一种计算机存储介质,所述计算机存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被处理器执行时,执行如权利要求1-9任一项所述的方法。
PCT/CN2019/116515 2018-11-21 2019-11-08 一种基于微表情的图像识别方法、装置以及相关设备 WO2020103700A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19886293.0A EP3885965B1 (en) 2018-11-21 2019-11-08 Image recognition method based on micro facial expressions, apparatus and related device
US17/182,024 US20210174072A1 (en) 2018-11-21 2021-02-22 Microexpression-based image recognition method and apparatus, and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811392529.7 2018-11-21
CN201811392529.7A CN109657554B (zh) 2018-11-21 2018-11-21 一种基于微表情的图像识别方法、装置以及相关设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/182,024 Continuation US20210174072A1 (en) 2018-11-21 2021-02-22 Microexpression-based image recognition method and apparatus, and related device

Publications (1)

Publication Number Publication Date
WO2020103700A1 true WO2020103700A1 (zh) 2020-05-28

Family

ID=66111311

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116515 WO2020103700A1 (zh) 2018-11-21 2019-11-08 一种基于微表情的图像识别方法、装置以及相关设备

Country Status (4)

Country Link
US (1) US20210174072A1 (zh)
EP (1) EP3885965B1 (zh)
CN (1) CN109657554B (zh)
WO (1) WO2020103700A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256589A (zh) * 2020-11-11 2021-01-22 腾讯科技(深圳)有限公司 一种仿真模型的训练方法、点云数据的生成方法及装置
CN113920575A (zh) * 2021-12-15 2022-01-11 深圳佑驾创新科技有限公司 一种人脸表情识别方法、装置及存储介质

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109657554B (zh) * 2018-11-21 2022-12-20 腾讯科技(深圳)有限公司 一种基于微表情的图像识别方法、装置以及相关设备
KR20210067442A (ko) * 2019-11-29 2021-06-08 엘지전자 주식회사 객체 인식을 위한 자동 레이블링 장치 및 방법
CN111026899A (zh) * 2019-12-11 2020-04-17 兰州理工大学 一种基于深度学习的产品生成方法
CN111460981B (zh) * 2020-03-30 2022-04-01 山东大学 一种基于重构跨域视频生成对抗网络模型的微表情识别方法
CN112487980A (zh) * 2020-11-30 2021-03-12 深圳市广信安科技股份有限公司 基于微表情治疗方法、装置、系统与计算机可读存储介质
CN112529978B (zh) * 2020-12-07 2022-10-14 四川大学 一种人机交互式抽象画生成方法
CN113269173B (zh) * 2021-07-20 2021-10-22 佛山市墨纳森智能科技有限公司 一种建立情感识别模型和识别人物情感的方法和装置
US11910120B2 (en) * 2021-11-11 2024-02-20 International Business Machines Corporation Visual experience modulation based on stroboscopic effect
CN114882578B (zh) * 2022-07-12 2022-09-06 华中科技大学 一种多域对抗学习的小样本条件下复合表情识别方法
CN116369918B (zh) * 2023-02-22 2024-02-20 北京决明科技有限公司 情绪检测方法、系统、设备及存储介质
CN116894653B (zh) * 2023-08-16 2024-02-23 广州红海云计算股份有限公司 一种基于多预测模型联动的人事管理数据处理方法及系统
CN116824280B (zh) * 2023-08-30 2023-11-24 安徽爱学堂教育科技有限公司 基于微表情变化的心理预警方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570474A (zh) * 2016-10-27 2017-04-19 南京邮电大学 一种基于3d卷积神经网络的微表情识别方法
CN107292256A (zh) * 2017-06-14 2017-10-24 西安电子科技大学 基于辅任务的深度卷积小波神经网络表情识别方法
US20180027307A1 (en) * 2016-07-25 2018-01-25 Yahoo!, Inc. Emotional reaction sharing
CN108491835A (zh) * 2018-06-12 2018-09-04 常州大学 面向面部表情识别的双通道卷积神经网络
CN108830237A (zh) * 2018-06-21 2018-11-16 北京师范大学 一种人脸表情的识别方法
CN109657554A (zh) * 2018-11-21 2019-04-19 腾讯科技(深圳)有限公司 一种基于微表情的图像识别方法、装置以及相关设备

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170031814A (ko) * 2015-09-11 2017-03-22 한국과학기술원 얼굴의 미세 표정 인식 방법 및 장치
CN106548149B (zh) * 2016-10-26 2020-04-03 河北工业大学 监控视频序列中人脸微表情图像序列的识别方法
KR20180057096A (ko) * 2016-11-21 2018-05-30 삼성전자주식회사 표정 인식과 트레이닝을 수행하는 방법 및 장치
KR102387570B1 (ko) * 2016-12-16 2022-04-18 삼성전자주식회사 표정 생성 방법, 표정 생성 장치 및 표정 생성을 위한 학습 방법
CN107273876B (zh) * 2017-07-18 2019-09-10 山东大学 一种基于深度学习的‘宏to微转换模型’的微表情自动识别方法
CN108710829A (zh) * 2018-04-19 2018-10-26 北京红云智胜科技有限公司 一种基于深度学习的表情分类及微表情检测的方法
CN108629314B (zh) * 2018-05-07 2021-08-10 山东大学 一种基于主动迁移学习的微表情识别方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180027307A1 (en) * 2016-07-25 2018-01-25 Yahoo!, Inc. Emotional reaction sharing
CN106570474A (zh) * 2016-10-27 2017-04-19 南京邮电大学 一种基于3d卷积神经网络的微表情识别方法
CN107292256A (zh) * 2017-06-14 2017-10-24 西安电子科技大学 基于辅任务的深度卷积小波神经网络表情识别方法
CN108491835A (zh) * 2018-06-12 2018-09-04 常州大学 面向面部表情识别的双通道卷积神经网络
CN108830237A (zh) * 2018-06-21 2018-11-16 北京师范大学 一种人脸表情的识别方法
CN109657554A (zh) * 2018-11-21 2019-04-19 腾讯科技(深圳)有限公司 一种基于微表情的图像识别方法、装置以及相关设备

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3885965A4 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256589A (zh) * 2020-11-11 2021-01-22 腾讯科技(深圳)有限公司 一种仿真模型的训练方法、点云数据的生成方法及装置
CN112256589B (zh) * 2020-11-11 2022-02-01 腾讯科技(深圳)有限公司 一种仿真模型的训练方法、点云数据的生成方法及装置
CN113920575A (zh) * 2021-12-15 2022-01-11 深圳佑驾创新科技有限公司 一种人脸表情识别方法、装置及存储介质

Also Published As

Publication number Publication date
CN109657554A (zh) 2019-04-19
EP3885965B1 (en) 2024-02-28
EP3885965A4 (en) 2022-01-12
CN109657554B (zh) 2022-12-20
EP3885965A1 (en) 2021-09-29
US20210174072A1 (en) 2021-06-10

Similar Documents

Publication Publication Date Title
WO2020103700A1 (zh) 一种基于微表情的图像识别方法、装置以及相关设备
CN109359538B (zh) 卷积神经网络的训练方法、手势识别方法、装置及设备
CN110348387B (zh) 一种图像数据处理方法、装置以及计算机可读存储介质
EP3992919B1 (en) Three-dimensional facial model generation method and apparatus, device, and medium
WO2023098128A1 (zh) 活体检测方法及装置、活体检测系统的训练方法及装置
CN109685713B (zh) 化妆模拟控制方法、装置、计算机设备及存储介质
US20230095182A1 (en) Method and apparatus for extracting biological features, device, medium, and program product
CN110909680A (zh) 人脸图像的表情识别方法、装置、电子设备及存储介质
CN113850168A (zh) 人脸图片的融合方法、装置、设备及存储介质
CN108388889B (zh) 用于分析人脸图像的方法和装置
WO2021127916A1 (zh) 脸部情感识别方法、智能装置和计算机可读存储介质
WO2024109374A1 (zh) 换脸模型的训练方法、装置、设备、存储介质和程序产品
WO2024001095A1 (zh) 面部表情识别方法、终端设备及存储介质
WO2022227765A1 (zh) 生成图像修复模型的方法、设备、介质及程序产品
CN117237547B (zh) 图像重建方法、重建模型的处理方法和装置
Das et al. A fusion of appearance based CNNs and temporal evolution of skeleton with LSTM for daily living action recognition
WO2021217919A1 (zh) 人脸动作单元识别方法、装置、电子设备及存储介质
Guo et al. Hand gesture recognition and interaction with 3D stereo camera
Purps et al. Reconstructing facial expressions of HMD users for avatars in VR
Shukla et al. Deep Learning Model to Identify Hide Images using CNN Algorithm
CN114677476A (zh) 一种脸部处理方法、装置、计算机设备及存储介质
CN117011449A (zh) 三维面部模型的重构方法和装置、存储介质及电子设备
CN114005156A (zh) 人脸替换方法、系统、终端设备及计算机存储介质
CN112132107A (zh) 图像处理方法、装置、电子设备及计算机可读存储介质
US20240104180A1 (en) User authentication based on three-dimensional face modeling using partial face images

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19886293

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019886293

Country of ref document: EP

Effective date: 20210621