WO2020140723A1 - 人脸动态表情的检测方法、装置、设备及存储介质 - Google Patents
人脸动态表情的检测方法、装置、设备及存储介质 Download PDFInfo
- Publication number
- WO2020140723A1 (PCT/CN2019/124928)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sequence
- state
- face
- facial
- dynamic expression
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 55
- 230000008921 facial expression Effects 0.000 title claims abstract description 14
- 230000014509 gene expression Effects 0.000 claims abstract description 123
- 230000001815 facial effect Effects 0.000 claims abstract description 95
- 210000004709 eyebrow Anatomy 0.000 claims description 104
- 210000001508 eye Anatomy 0.000 claims description 81
- 230000008859 change Effects 0.000 claims description 39
- 238000001514 detection method Methods 0.000 claims description 26
- 239000011159 matrix material Substances 0.000 claims description 26
- 210000003128 head Anatomy 0.000 claims description 24
- 210000000744 eyelid Anatomy 0.000 claims description 20
- 238000004364 calculation method Methods 0.000 claims description 10
- 238000010606 normalization Methods 0.000 claims description 5
- 238000004590 computer program Methods 0.000 claims description 4
- 230000004044 response Effects 0.000 claims description 4
- 210000000214 mouth Anatomy 0.000 description 66
- 210000001331 nose Anatomy 0.000 description 8
- 238000010586 diagram Methods 0.000 description 6
- 230000003993 interaction Effects 0.000 description 5
- 230000006870 function Effects 0.000 description 4
- 230000009471 action Effects 0.000 description 2
- 238000013145 classification model Methods 0.000 description 2
- 238000004891 communication Methods 0.000 description 2
- 230000008569 process Effects 0.000 description 2
- 230000003044 adaptive effect Effects 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000004397 blinking Effects 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000008451 emotion Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000004399 eye closure Effects 0.000 description 1
- 230000000193 eyeblink Effects 0.000 description 1
- 210000000887 face Anatomy 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 210000000056 organ Anatomy 0.000 description 1
- 238000011897 real-time detection Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G06V40/176—Dynamic expression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
Definitions
- the present application relates to the field of image recognition technology, for example, to a method, apparatus, device, and storage medium for detecting dynamic facial expressions.
- in human-computer interaction scenarios (such as live streaming platforms and liveness detection), dynamic facial expressions are detected from facial images; by detecting and recognizing dynamic facial expressions, a computer can better understand the user's emotional state and thereby improve the user experience during human-computer interaction.
- the dynamic expression detection methods in the related art need to collect dynamic expression video data over a period of time, which limits the scalability of such solutions.
- in addition, dynamic facial expression detection can be achieved by adaptively extracting features from video with a deep neural network.
- although this type of scheme has high accuracy and good scalability, it requires a large amount of video data as training samples, its computational complexity is high, and it is difficult to detect dynamic expressions in real time.
- the embodiments of the present application provide a method, an apparatus, a device, and a storage medium for detecting dynamic facial expressions, which can accurately detect, in real time, the dynamic expressions of persons appearing in a video stream.
- an embodiment of the present application provides a method for detecting dynamic facial expressions, including: acquiring at least two frames of face images in a video stream; determining a face state sequence according to the key point coordinate information sets in the at least two frames of face images; and comparing the face state sequence with a preset dynamic expression sequence to determine a dynamic facial expression.
- an embodiment of the present application provides a device for detecting facial dynamic expressions, including:
- the facial image acquisition module is configured to acquire at least two frames of facial images in the video stream;
- a state sequence determination module configured to determine a face state sequence based on the coordinate information set of key points in the at least two frames of face images
- the dynamic expression determination module is configured to determine the facial dynamic expression by comparing the facial state sequence and the preset dynamic expression sequence.
- an embodiment of the present application provides a computer device, including:
- one or more processors;
- a storage device configured to store one or more programs;
- the one or more programs are executed by the one or more processors, so that the one or more processors implement the method provided by the embodiments of the present application.
- an embodiment of the present application provides a computer-readable storage medium that stores a computer program on the computer-readable storage medium, and the computer program is executed by a processor to implement the method provided by the embodiment of the present application.
- FIG. 1 is a schematic flowchart of a method for detecting a facial dynamic expression provided in an embodiment of the present application
- Figure 2 shows a schematic diagram of a face image with key point identification after key point detection
- FIG. 3 is a structural block diagram of a device for detecting dynamic facial expressions provided by an embodiment of the present application
- FIG. 4 is a schematic diagram of a hardware structure of a computer device provided by an embodiment of the present application.
- the embodiment of the present application is applicable to a live streaming platform that provides a video stream or other human-computer interaction scenes that detect dynamic expressions.
- based on the method provided by the embodiments of the present application, dynamic expressions (such as blinking, opening the mouth, shaking the head, nodding, raising the eyebrows, etc.) of a face in the video stream can be detected quickly and simply.
- the method may be implemented by a device for detecting facial dynamic expressions, where the device may be implemented by software and/or hardware, and may generally be integrated as a plug-in in application software with human-computer interaction.
- the key to dynamic expression detection is to detect whether there is a facial expression change in the video information over a period of time.
- the detection methods include: 1) analyzing the texture and geometric information of each frame of image over a period of time, fusing the features of the multiple frames, and identifying the corresponding dynamic expression in that period through a classification model; 2) detecting the position information of the key points in the face pictures over a period of time, using those key points as the input of a classifier, and predicting the dynamic expression in that period by training the classification model.
- although the above two types of schemes have a fast detection speed, both need to collect corresponding dynamic expression video data, which affects their scalability. The embodiments of the present application provide a method, an apparatus, a device, and a storage medium for detecting dynamic facial expressions, which ensure scalability in practical applications and can accurately detect, in real time, the dynamic expressions of persons appearing in a video stream.
- FIG. 1 is a schematic flowchart of a method for detecting a facial dynamic expression provided in an embodiment of the present application. As shown in FIG. 1, the method includes S1010 to S1030.
- the video stream may be understood as a video being played in real time, such as a live video, etc.
- the face image may be understood as an image including facial information of a person in an image frame constituting the video stream.
- images of consecutive frames in the video stream may be acquired in real time, or at least two frames of images may be acquired at set intervals.
- the acquired image includes face information of a person who performs face detection.
- an image including face information of a person is referred to as a face image.
- the acquired image may include the facial information of multiple persons, which is equivalent to the existence of multiple faces that can perform dynamic expressions; based on the method provided by this embodiment, dynamic expression detection can be performed on each of the persons appearing in the face image.
- in an embodiment, dynamic facial expression detection is performed on the same person across the multiple acquired face images.
- for each frame of face image, a corresponding key point coordinate information set can be determined; the key point coordinate information set can be understood as a collection of coordinate points used to identify the facial contour and facial organs of the person in the face image.
- the key point coordinate information set of the face image of any frame includes key point coordinate information identifying the contour of the face and key point coordinate information identifying the eyes, eyebrows, mouth and nose.
- a preset key point detection model may be used to detect a face image, and then a key point coordinate information set corresponding to the face image may be obtained.
- the key point detection model used is obtained through pre-training and learning.
- the training and learning process of the key point detection model can be expressed as: given a set of sample pictures with actual annotated key point values, the feature vectors of the sample pictures are extracted through a convolutional neural network to obtain the predicted key point coordinate information of each sample picture; the loss between the predicted key point coordinate information and the actual annotated key point values of the sample picture is calculated with the L2 loss function, and the network parameters are corrected by back propagation until the network converges and stabilizes, yielding a usable key point detection model.
- the more key points the key point detection model determines for a face image, the better the facial information can be characterized; this embodiment does not specifically limit the number of key point coordinates obtained, and it can be adjusted according to the actual application. For the i-th key point, the coordinate information can be expressed as p_i = (x_i, y_i); assuming the number of key points is A, i takes values in [0, A-1].
- the key point coordinate information can be determined based on the above key point detection model, and other methods, such as the supervised descent method, may also be used.
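- as a minimal sketch of the training loop described above, the snippet below shows a convolutional network regressing key point coordinates with an L2 loss and back propagation. The backbone architecture, dataset handling, and hyperparameters are illustrative assumptions, not taken from the patent.

```python
# Minimal sketch of training a key point detection model with an L2 loss.
# The backbone and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    def __init__(self, num_keypoints=40):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        # Regress (x, y) for each key point.
        self.head = nn.Linear(32 * 4 * 4, num_keypoints * 2)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return self.head(f)

def train_step(model, optimizer, images, gt_keypoints):
    """One optimization step: predict key points, apply the L2 loss, back-propagate."""
    optimizer.zero_grad()
    pred = model(images)                                           # (B, num_keypoints * 2)
    loss = nn.functional.mse_loss(pred, gt_keypoints.flatten(1))   # L2-loss against annotations
    loss.backward()                                                # back propagation corrects parameters
    optimizer.step()
    return loss.item()

model = KeypointNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```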
- FIG. 2 shows a schematic diagram of a face image with key point identification after key point detection.
- the total number of key points detected in the face image of FIG. 2 is 40. The key points labeled 200-209 can be used to identify the facial contour of the face image; the key points labeled 210-212, 216, and 217 identify the left eyebrow (relative to the user); the key points labeled 213-215, 218, and 219 identify the right eyebrow (relative to the user); the key points labeled 220-222, 226 and 223-225, 227 identify the left eye and the right eye (relative to the user), respectively; the key points labeled 228-233 identify the nose; and the key points labeled 234-239 identify the mouth. Each key point has its own coordinate information.
- the face state sequence may be understood as a state sequence composed of face states in at least two frames of the acquired face images.
- for each frame of face image, based on the key point coordinate information set corresponding to that frame, the current face state of the person in the face image can be determined; the face states corresponding to the individual face images are combined in chronological order to obtain a face state sequence, which represents the facial expression state of the person currently appearing in the video stream.
- the face state includes at least one of the following: eye state, eyebrow state, mouth state, face swinging left and right, face swinging up and down; the face state sequence includes at least one of the following: an eye opening and closing state sequence, an eyebrow raising state sequence, a mouth opening and closing state sequence, a shaking head state sequence, and a nodding state sequence.
- the face state detected in this embodiment may be one or more of the eye state, the eyebrow state, the mouth state, and the face swing state (such as swinging up and down or left and right). The eye state can be open or closed, the eyebrow state can be raised or normal, the mouth state can be open or closed, and the face swing state can be swinging up and down (nodding) or swinging left and right (shaking the head). Therefore, after the face states of the acquired multi-frame face images are determined, the resulting face state sequence can correspondingly be one or more of an eye opening and closing state sequence, an eyebrow raising state sequence, a mouth opening and closing state sequence, and a swing state sequence (such as a shaking head state sequence or a nodding state sequence).
- the eye state is divided into a left eye state and a right eye state, and the eyebrow state is likewise divided into a left eyebrow state and a right eyebrow state; that is, the left and right eyes each have open and closed states, and the left and right eyebrows each have raised and normal states.
- the state sequences corresponding to the left eye and the right eye, and to the left eyebrow and the right eyebrow, are distinguished by naming the face state sequences differently.
- after the face state sequence is determined, it can be compared with a preset dynamic expression sequence, and whether the person in the video stream currently exhibits a dynamic facial expression can be determined according to the comparison result.
- the dynamic expression sequence can be understood as a set of face states that together realize a facial expression change. For example, the expression change from eyes open to eyes closed can be represented by one dynamic expression sequence, and the change of the mouth from closed to open can also be represented by one dynamic expression sequence.
- the dynamic expression sequence can be preset according to the face states that occur when the facial expression changes; for example, a state sequence containing the two states of mouth open and mouth closed can be set as the dynamic expression sequence representing the mouth opening and closing.
- the face state sequence is a set of face states determined based on at least two frames of face images. The state information contained in the dynamic expression sequence can be matched against the state information in the face state sequence to determine whether all of the state information of the dynamic expression sequence appears in the face state sequence, thereby determining whether the person in the video stream currently exhibits a dynamic facial expression.
- the dynamic expression sequence includes: an eye dynamic change sequence, an eyebrow dynamic change sequence, a mouth dynamic change sequence, a shaking head change sequence, and a nod change sequence.
- when the face state sequence includes one or more of the eye opening and closing state sequence, the eyebrow raising state sequence, the mouth opening and closing state sequence, the shaking head state sequence, and the nodding state sequence, an eye dynamic change sequence, an eyebrow dynamic change sequence, a mouth dynamic change sequence, a shaking head change sequence, and a nodding change sequence are preset as the corresponding dynamic expression sequences.
- for example, if the face state sequence is a mouth opening and closing state sequence, the dynamic expression sequence compared against it in this step is the mouth dynamic change sequence.
- compared with the dynamic facial expression detection methods in the related art, the method provided by the embodiments of the present application avoids the restrictions on collecting the video data to be detected while maintaining the detection speed, thereby ensuring the scalability of the solution in practical applications. In addition, the solution of the present application does not require prior training and learning on training samples; simply by comparing the determined face state sequence with the preset dynamic expression sequence, whether a dynamic facial expression exists can be determined easily and quickly, which effectively reduces the computational complexity and better reflects the real-time nature of dynamic expression detection.
- in the operation of determining the face state sequence, determining the face state is one of the steps. From the above description, the face state can be one or more of the eye state, the eyebrow state, the mouth state, the face swinging left and right, and the face swinging up and down; accordingly, this embodiment provides ways of determining the face state.
- in an optional embodiment of the present application, for each frame of face image, the face state is determined as follows: based on the key point coordinate information identifying the upper eyelid and the lower eyelid in the key point coordinate information set, determine the eyelid distance value from the upper eyelid to the lower eyelid in the face image; based on the key point coordinate information identifying the nose, determine the nose bridge length in the face image, and use the nose bridge length as the eye normalization standard value to obtain a normalized value of the eyelid distance value; when the normalized value is less than a set eye state threshold, the face state is eyes closed; when the normalized value is greater than or equal to the set eye state threshold, the face state is eyes open.
- the above determination manner of this embodiment is applicable to the case where the face state is the eye state (the left eye state and/or the right eye state), and it determines whether the eye state is open or closed based on the distance from the upper eyelid to the lower eyelid.
- for the key point coordinate information set that exists for each frame of face image, the set contains the key point coordinate information of all the key points identifying the face. Taking the face image shown in FIG. 2 as an example, the key point 221 and the key point 224 respectively identify the upper eyelids of the left and right eyes, and their key point coordinate information can be obtained; the key point 226 and the key point 227 respectively identify the lower eyelids of the left and right eyes, and their key point coordinate information can likewise be obtained. Thus, the distance from the key point 221 to the key point 226 can be determined as the eyelid distance value of the left eye, and the distance from the key point 224 to the key point 227 can be determined as the eyelid distance value of the right eye.
- in order to avoid the influence of changes in the size of the acquired face image on the eye state, this embodiment introduces the nose bridge length of the nose in the face image (such as the distance from the key point 228 to the key point 230 in FIG. 2) as the eye normalization standard value, because the change in the size of the face image is proportional to the change in the nose bridge length. The eyelid distance value (of the left eye and/or the right eye) is normalized, and the normalized value (the ratio of the eyelid distance value to the eye normalization standard value) is compared with the eye state threshold, thereby determining whether the face state is eyes open or eyes closed (for the left eye and/or the right eye).
- after the eye states of the multiple frames of face images are determined, the determined eye states can be merged in chronological order, thereby forming a face state sequence with the eye state as the face state.
- for example, if five face images are acquired and the left eye states in the five frames are determined to be eyes open, eyes open, eyes closed, eyes closed, and eyes open, the face state sequence at this time is equivalent to an eye opening and closing state sequence, which can be expressed as {eyes open, eyes open, eyes closed, eyes closed, eyes open}. To facilitate the subsequent comparison with the corresponding dynamic expression sequence, in this embodiment 1 indicates that the eye is open and 0 indicates that the eye is closed, and LE is set to identify the eye opening and closing state sequence of the left eye and RE that of the right eye; the above sequence can therefore be expressed as LE = {1, 1, 0, 0, 1}.
- the dynamic expression sequence compared with the eye opening and closing state sequence is equivalent to the eye dynamic change sequence; in an embodiment it is set as Te = {1, 0, 1} and used as the standard for the dynamic change of the eyes. The determined LE = {1, 1, 0, 0, 1} can then be compared with Te = {1, 0, 1}; if all the elements of Te appear in LE in order, it is determined that the person in the video stream currently exhibits the dynamic expression of blinking the left eye.
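- as a small sketch of the per-frame eye state decision described above, the snippet below uses the key point labels of FIG. 2 and the nose bridge normalization; the threshold value is an illustrative assumption, not a value given in the patent.

```python
import numpy as np

EYE_STATE_THRESHOLD = 0.25   # illustrative assumption

def eye_state(keypoints):
    """Return 1 (open) or 0 (closed) for the left eye of one frame.

    keypoints maps the FIG. 2 label of a key point to its (x, y) coordinates.
    """
    eyelid_dist = np.linalg.norm(np.subtract(keypoints[221], keypoints[226]))  # upper to lower eyelid
    nose_bridge = np.linalg.norm(np.subtract(keypoints[228], keypoints[230]))  # eye normalization standard value
    normalized = eyelid_dist / nose_bridge
    return 1 if normalized >= EYE_STATE_THRESHOLD else 0

# Merging the per-frame states in chronological order yields e.g. LE = [1, 1, 0, 0, 1].
```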
- in an optional embodiment of the present application, for each frame of face image, the face state is determined as follows: based on the key point coordinate information identifying the upper eyebrow root and the eye corner in the key point coordinate information set, determine the connection distance value from the upper eyebrow root to the eye corner on the same side in the face image; based on the key point coordinate information identifying the upper eyebrow root and the lower eyebrow root, determine the eyebrow root width in the face image, and use the eyebrow root width as the eyebrow normalization standard value to obtain a normalized value of the connection distance value; when the normalized value is greater than a set eyebrow state threshold, the face state is eyebrows raised; when the normalized value is less than or equal to the set eyebrow state threshold, the face state is eyebrows normal.
- the above determination manner of this embodiment is applicable to the case where the face state is the eyebrow state (the left eyebrow state and/or the right eyebrow state), and it determines whether the eyebrow state is raised or normal based on the distance from the upper eyebrow root to the eye corner on the same side.
- for the key point coordinate information set that exists for each frame of face image, the set contains the key point coordinate information of all the key points identifying the face. Still taking the face image shown in FIG. 2 as an example, the key point 212 and the key point 213 respectively identify the upper eyebrow roots of the left and right eyebrows, and their key point coordinate information can be obtained; the key point 222 and the key point 223 respectively identify the eye corners of the left and right eyes, and their key point coordinate information can also be obtained. Thus, the distance from the key point 212 to the key point 222 can be determined as the connection distance value from the upper eyebrow root of the left eyebrow to the eye corner on the same side, and the distance from the key point 213 to the key point 223 as the connection distance value from the upper eyebrow root of the right eyebrow to the eye corner on the same side.
- in order to avoid the influence of changes in the size of the acquired face image on the eyebrow state, this embodiment introduces the eyebrow root width in the face image (such as the distance from the key point 212 to the key point 217 in FIG. 2, or from the key point 213 to the key point 218) as the eyebrow normalization standard value, because the change in the size of the face image is also proportional to the change in the eyebrow root width. The connection distance value (of the left eyebrow and/or the right eyebrow) is normalized, optionally using the eyebrow root width on the same side, and the normalized value (the ratio of the connection distance value to the eyebrow root width) is compared with the eyebrow state threshold, thereby determining whether the face state is eyebrows raised (for the left eyebrow and/or the right eyebrow) or eyebrows normal.
- after the eyebrow states of the multiple frames of face images are determined, the determined eyebrow states can likewise be merged in chronological order, thereby forming a face state sequence with the eyebrow state as the face state.
- for example, if five face images are acquired and the left eyebrow states in the five frames are determined to be eyebrows normal, eyebrows normal, eyebrows raised, eyebrows raised, and eyebrows raised, the face state sequence at this time is equivalent to an eyebrow raising state sequence, which can be expressed as {eyebrows normal, eyebrows normal, eyebrows raised, eyebrows raised, eyebrows raised}. To facilitate the subsequent comparison with the corresponding dynamic expression sequence, in this embodiment 1 indicates that the eyebrows are raised and 0 indicates that the eyebrows are normal, and LB is set to identify the eyebrow raising state sequence of the left eyebrow and RB that of the right eyebrow; the above sequence can therefore be expressed as LB = {0, 0, 1, 1, 1}.
- the dynamic expression sequence compared with the eyebrow raising state sequence is equivalent to the eyebrow dynamic change sequence; in an embodiment it is set as Tb = {0, 1, 1} and used as the standard for the dynamic change of the eyebrows. The determined LB = {0, 0, 1, 1, 1} can then be compared with Tb = {0, 1, 1}; if all the elements of Tb appear in LB in order, it is determined that the person in the video stream currently exhibits the dynamic expression of raising the left eyebrow.
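- analogous to the eye state sketch above, a per-frame eyebrow decision can be written as follows, again assuming the FIG. 2 key point labels and an illustrative threshold value:

```python
import numpy as np

EYEBROW_STATE_THRESHOLD = 1.8   # illustrative assumption

def eyebrow_state(keypoints):
    """Return 1 (raised) or 0 (normal) for the left eyebrow of one frame."""
    # Connection distance: upper eyebrow root (212) to same-side eye corner (222).
    connection = np.linalg.norm(np.subtract(keypoints[212], keypoints[222]))
    # Eyebrow root width on the same side (eyebrow normalization standard value): 212 to 217.
    root_width = np.linalg.norm(np.subtract(keypoints[212], keypoints[217]))
    normalized = connection / root_width
    return 1 if normalized > EYEBROW_STATE_THRESHOLD else 0
```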
- in an optional embodiment of the present application, for each frame of face image, the face state is determined as follows: based on the key point coordinate information of the lower edge of the upper lip and the upper edge of the lower lip in the key point coordinate information set, determine the inter-lip distance value from the lower edge of the upper lip to the upper edge of the lower lip in the face image; based on the key point coordinate information of the upper edge and the lower edge of the upper lip, determine the upper lip thickness in the face image, and use the upper lip thickness as the lip normalization standard value to obtain a normalized value of the inter-lip distance value; when the normalized value is greater than a set lip state threshold, the face state is mouth open; when the normalized value is less than or equal to the set lip state threshold, the face state is mouth closed.
- the above determination manner of this embodiment is applicable to the case where the face state is the mouth state, and it determines whether the mouth state is open or closed based on the distance from the lower edge of the upper lip to the upper edge of the lower lip.
- for the key point coordinate information set that exists for each frame of face image, the set contains the key point coordinate information of all the key points identifying the face. Taking the face image shown in FIG. 2 as an example, the key point 237 identifies the lower edge of the upper lip and the key point 238 identifies the upper edge of the lower lip, and their key point coordinate information can be obtained; thus, the distance from the key point 237 to the key point 238 can be determined as the inter-lip distance value of the mouth.
- in order to avoid the influence of changes in the size of the acquired face image on the mouth state, this embodiment introduces the upper lip thickness of the mouth in the face image (such as the distance from the key point 234 to the key point 237 in FIG. 2) as the lip normalization standard value, because the change in the size of the face image is also proportional to the change in the upper lip thickness. The inter-lip distance value is normalized, and the normalized value (the ratio of the inter-lip distance value to the lip normalization standard value) is compared with the lip state threshold, thereby determining whether the face state is mouth open or mouth closed.
- after the mouth states of the multiple frames of face images are determined, the determined mouth states can be merged in chronological order, thereby forming a face state sequence with the mouth state as the face state.
- for example, if five face images are acquired and the mouth states in the five frames are determined to be mouth closed, mouth closed, mouth closed, mouth open, and mouth closed, the face state sequence at this time is equivalent to a mouth opening and closing state sequence, which can be expressed as {mouth closed, mouth closed, mouth closed, mouth open, mouth closed}. To facilitate the subsequent comparison with the corresponding dynamic expression sequence, in this embodiment 1 indicates that the mouth is open and 0 indicates that the mouth is closed, and M is set to identify the mouth opening and closing state sequence; the above sequence can therefore be expressed as M = {0, 0, 0, 1, 0}.
- the dynamic expression sequence compared with the mouth opening and closing state sequence is equivalent to the mouth dynamic change sequence; in an embodiment it is set as Tm = {0, 1} and used as the standard for the dynamic change of the mouth. The determined M = {0, 0, 0, 1, 0} can then be compared with Tm = {0, 1}; if all the elements of Tm appear in M in order, it is determined that the person in the video stream currently exhibits the dynamic expression of opening the mouth.
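- the per-frame mouth decision follows the same pattern; a short sketch, with the FIG. 2 key point labels and an illustrative threshold value:

```python
import numpy as np

LIP_STATE_THRESHOLD = 0.6   # illustrative assumption

def mouth_state(keypoints):
    """Return 1 (open) or 0 (closed) for the mouth of one frame."""
    # Inter-lip distance: lower edge of upper lip (237) to upper edge of lower lip (238).
    inter_lip = np.linalg.norm(np.subtract(keypoints[237], keypoints[238]))
    # Upper lip thickness (lip normalization standard value): key point 234 to key point 237.
    upper_lip = np.linalg.norm(np.subtract(keypoints[234], keypoints[237]))
    return 1 if inter_lip / upper_lip > LIP_STATE_THRESHOLD else 0
```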
- in an optional embodiment of the present application, for each frame of face image, the face state is determined as follows: based on the key point coordinate information set, construct a two-dimensional plane matrix and a three-dimensional space matrix of the face image; determine the rotation matrix that converts the two-dimensional plane matrix into the three-dimensional space matrix; according to the rotation matrix, determine the yaw angle value of the face in the face image and use the yaw angle value as the face state, or determine the pitch angle value of the face in the face image and use the pitch angle value as the face state.
- in an embodiment, the pitch angle value is calculated as pitch = arcsin(R_{2,3}) × π/180, and the yaw angle value is calculated as yaw = -arctan(-R_{1,3}/R_{3,3}) × π/180, where pitch denotes the pitch angle value, yaw denotes the yaw angle value, R_{m,n} denotes the element in the m-th row and n-th column of the rotation matrix R, and m and n are both positive integers.
- the above determination manner of this embodiment is applicable to the case where the face state is the face swinging left and right or the face swinging up and down, and it determines the face swing state based on the rotation matrix that maps the key point coordinate information from the two-dimensional plane to the three-dimensional space.
- the key point coordinate information set corresponding to each frame of face image contains the key point coordinate information of all the key points identifying the face in the two-dimensional plane. In addition, for each frame of face image, the three-dimensional key point coordinate information corresponding to the key points in the two-dimensional plane can also be determined. The key point coordinate information in the two-dimensional plane can be represented by a two-dimensional plane matrix, and the key point coordinate information in three-dimensional space can be represented by a three-dimensional space matrix.
- when the two-dimensional plane matrix and the corresponding three-dimensional space matrix are known, based on a preset rotation matrix calculation model, the rotation matrix that converts the two-dimensional plane matrix into the three-dimensional space matrix can be determined; according to the rotation matrix and the set yaw angle value calculation formula or pitch angle value calculation formula, the yaw angle value or the pitch angle value of the face in the face image can be determined.
- thus, the yaw angle value or the pitch angle value can be used as the face state corresponding to each frame of face image.
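- as a sketch of how the rotation matrix and the angle formulas above might be applied: the patent does not name a specific rotation matrix calculation model, so the snippet below assumes OpenCV's solvePnP as that model, and assumes a 3D face template and a camera matrix are available; the angle formulas follow the patent's statement (its indices are 1-based).

```python
import numpy as np
import cv2

def head_pose_angles(points_2d, points_3d, camera_matrix):
    """Return (yaw, pitch) of the face using the patent's angle formulas.

    points_2d: (N, 2) key point coordinates in the image plane.
    points_3d: (N, 3) corresponding 3D key point coordinates (a face template).
    Using cv2.solvePnP as the rotation matrix calculation model is an assumption.
    """
    dist_coeffs = np.zeros(4)
    ok, rvec, tvec = cv2.solvePnP(points_3d.astype(np.float64),
                                  points_2d.astype(np.float64),
                                  camera_matrix, dist_coeffs)
    R, _ = cv2.Rodrigues(rvec)                       # 3x3 rotation matrix
    pitch = np.arcsin(R[1, 2]) * np.pi / 180.0       # pitch = arcsin(R_{2,3}) x pi/180
    yaw = -np.arctan(-R[0, 2] / R[2, 2]) * np.pi / 180.0  # yaw = -arctan(-R_{1,3}/R_{3,3}) x pi/180
    return yaw, pitch
```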
- when the yaw angle value or the pitch angle value is used as the face state of each frame of face image, the determination of the face state sequence differs from the determination manners of the other face states described above.
- for the face state sequence formed based on the yaw angle values, the face state sequence is equivalent to the shaking head state sequence; Y can be set to identify the shaking head state sequence, and Y_i to identify the i-th shaking head state value in the shaking head state sequence.
- for the face state sequence formed based on the pitch angle values, the face state sequence is equivalent to the nodding state sequence; P can be set to identify the nodding state sequence, and P_i to identify the i-th nodding state value in the nodding state sequence.
- the manner of determining the shaking head state sequence can be described as: merging multiple frames of face images in chronological order, obtaining the yaw angle values corresponding to the multiple frames of face images, and setting the shaking head state value corresponding to the first frame of the merged face images to 0. The shaking head state value corresponding to the i-th face image in the shaking head state sequence can then be determined from the yaw angle values and a set threshold; in the formula, Y_i represents the shaking head state value corresponding to the i-th face image, yaw_i represents the yaw angle value corresponding to the i-th face image, yaw_{i-1} represents the yaw angle value corresponding to the (i-1)-th face image, and yaw_thres represents the set shaking head state threshold.
- the manner of determining the nodding state sequence can be described as: merging multiple frames of face images in chronological order, obtaining the pitch angle values corresponding to the multiple frames of face images, and setting the nodding state value corresponding to the first frame of the merged face images to 0. The nodding state value corresponding to the i-th face image in the nodding state sequence can then be determined from the pitch angle values and a set threshold; in the formula, P_i represents the nodding state value corresponding to the i-th face image, pitch_i represents the pitch angle value corresponding to the i-th face image, pitch_{i-1} represents the pitch angle value corresponding to the (i-1)-th face image, and pitch_thres represents the set nodding state threshold.
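- the state-value formulas themselves appear as images in the original publication and are not reproduced in this text. A plausible reading, consistent with the variables listed above and with the later comparison against Ty_a = {1, -1} / Ty_b = {-1, 1}, is a three-way threshold on the frame-to-frame angle change; the sketch below is that assumed reconstruction, not the patent's literal formula.

```python
def swing_state_sequence(angles, threshold):
    """Assumed reconstruction: build a shaking-head (or nodding) state sequence
    from per-frame yaw (or pitch) angle values.

    The first state value is set to 0, as described above; each subsequent value
    compares the change of angle between consecutive frames with the threshold.
    """
    states = [0]
    for i in range(1, len(angles)):
        delta = angles[i] - angles[i - 1]
        if delta > threshold:
            states.append(1)
        elif delta < -threshold:
            states.append(-1)
        else:
            states.append(0)
    return states

# Example: Y = swing_state_sequence(yaw_values, yaw_thres)
#          P = swing_state_sequence(pitch_values, pitch_thres)
```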
- the dynamic expression sequence compared with the shaking head state sequence in this embodiment is equivalent to the shaking head change sequence; in an embodiment it is set as Ty_a = {1, -1} and Ty_b = {-1, 1}, and if the elements of Ty_a or Ty_b appear in the determined shaking head state sequence Y, it is determined that the person in the video stream currently exhibits the dynamic expression of shaking the head.
- the dynamic expression sequence compared with the nodding state sequence is equivalent to the nodding change sequence; in an embodiment it is set as Tp_a = {0, 1} and Tp_b = {-1, 1}, and if the elements of Tp_a or Tp_b appear in the determined nodding state sequence P, it is determined that the person in the video stream currently exhibits the dynamic expression of nodding.
- on the basis of the above embodiments, optionally, comparing the face state sequence and the preset dynamic expression sequence to determine the dynamic facial expression includes:
- in the case where the element information in the preset dynamic expression sequence appears sequentially in the face state sequence, determining that a dynamic facial expression corresponding to the preset dynamic expression sequence exists; in the case where the element information in the preset dynamic expression sequence does not appear sequentially in the face state sequence, determining that no dynamic facial expression corresponding to the preset dynamic expression sequence exists.
- in an embodiment, comparing the face state sequence and the preset dynamic expression sequence to determine the dynamic facial expression may include: determining the sequence length of the preset dynamic expression sequence, initializing a variable i to 1 and initializing the comparison sequence number of the face state sequence to 1, where i is a positive integer; in the face state sequence, starting from the element information corresponding to the comparison sequence number, searching for target element information that matches the i-th element information in the dynamic expression sequence; in response to the target element information existing in the face state sequence, taking the sequence number corresponding to the target element information as the new comparison sequence number, incrementing the variable i by 1, and continuing to search for target element information matching the current variable; in response to the target element information not existing in the face state sequence, determining that the dynamic facial expression corresponding to the preset dynamic expression sequence exists in the face state sequence if the variable i is greater than the sequence length and consecutive target element information has been found, and determining that it does not exist if the variable i is less than or equal to the sequence length.
- for example, taking the mouth state sequence M = {0, 0, 0, 1, 0} and the preset mouth dynamic change sequence Tm = {0, 1}: the sequence length of the preset dynamic expression sequence is 2, the variable i is initially 1, and the comparison sequence number is initially 1. The element at comparison sequence number 1 is the first element value 0 of M, and the i-th element of Tm is its first element value 0, so they match; the variable i is changed to 2, the comparison continues, and the second element value 1 of Tm is found to match the fourth element value 1 of M. When the value of the variable i equals 3, it is determined that M contains all the elements of Tm in order, so a mouth state sequence corresponding to the mouth dynamic change sequence currently exists, which indicates that the person in the video stream currently exhibits the dynamic expression of opening the mouth.
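- the comparison described above is an in-order subsequence check; a minimal sketch:

```python
def has_dynamic_expression(face_states, expression_seq):
    """Return True if every element of expression_seq appears in face_states
    in order (the comparison described above)."""
    pos = 0
    for target in expression_seq:
        # Search from the current comparison position for the next matching element.
        while pos < len(face_states) and face_states[pos] != target:
            pos += 1
        if pos == len(face_states):
            return False        # the target element was not found
        pos += 1                # matched; move past it and continue with the next element
    return True

# Example: has_dynamic_expression([0, 0, 0, 1, 0], [0, 1]) -> True (mouth opening)
```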
- in summary, the method provided by the embodiments of the present application can determine the dynamic expressions in a video stream by comparing and detecting the mouth state, the eye state, the eyebrow state, and the face swing state. Compared with the dynamic facial expression detection in the related art, it can determine simply and quickly whether a dynamic expression exists on the face of a person in the video stream only by comparing the determined face state sequence with the preset dynamic expression sequence, which effectively reduces the computational complexity and better reflects the real-time nature of dynamic expression detection; in addition, while ensuring the detection speed, the solution avoids the restrictions on collecting the video data to be detected, thereby ensuring the scalability of the solution in practical applications.
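- putting the pieces together, a hedged end-to-end sketch of the detection loop is shown below. It assumes a `detect_keypoints` helper (the key point detection model discussed earlier) and reuses the per-frame state and sequence-matching functions sketched above; frame capture with OpenCV is an illustrative choice, not something mandated by the patent.

```python
import cv2

MOUTH_DYNAMIC_SEQUENCE = [0, 1]   # Tm: mouth changing from closed to open

def detect_mouth_expression(video_source, num_frames=5):
    """Collect a mouth opening/closing state sequence from a video stream and
    compare it with the preset mouth dynamic change sequence.

    detect_keypoints, mouth_state and has_dynamic_expression are the helpers
    sketched earlier; their exact form is an assumption of this example.
    """
    cap = cv2.VideoCapture(video_source)
    state_sequence = []
    while len(state_sequence) < num_frames:
        ok, frame = cap.read()
        if not ok:
            break
        keypoints = detect_keypoints(frame)           # label -> (x, y), per FIG. 2
        state_sequence.append(mouth_state(keypoints))
    cap.release()
    return has_dynamic_expression(state_sequence, MOUTH_DYNAMIC_SEQUENCE)
```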
- for the sake of brevity, the method embodiments are expressed as a series of combinations of actions; however, the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application, certain steps may be performed in other orders or simultaneously.
- FIG. 3 is a structural block diagram of a device for detecting dynamic facial expressions provided by an embodiment of the present application.
- the device may be implemented in software and/or hardware, and may be integrated on a computer device.
- the computer device may be composed of two or more physical entities, or a physical entity.
- the device may be a personal computer (PC), a mobile phone, a tablet device, a personal digital assistant, or the like.
- the device includes a face image acquisition module 31, a state sequence determination module 32, and a dynamic expression determination module 33.
- the face image acquisition module 31 is set to acquire at least two frames of face images in the video stream;
- the state sequence determination module 32 is configured to determine the face state sequence according to the key point coordinate information set in the at least two frames of face images;
- the dynamic expression determination module 33 is configured to determine the facial dynamic expression by comparing the facial state sequence and the preset dynamic expression sequence.
- the device for detecting facial dynamic expressions can execute the method provided in any embodiment of the present application, and has functions and effects corresponding to the execution method.
- an embodiment of the present application further provides a computer device, including: a processor and a memory. At least one instruction is stored in the memory, and the instruction is executed by the processor, so that the computer device executes the method as described in the foregoing method embodiments.
- the computer device may include a processor 40, a storage device 41, a display screen 42 with a touch function, an input device 43, an output device 44, and a communication device 45.
- the number of processors 40 in the computer device may be one or more, and one processor 40 is taken as an example in FIG. 4.
- the number of storage devices 41 in the computer device may be one or more, and one storage device 41 is taken as an example in FIG. 4.
- the processor 40, the storage device 41, the display screen 42, the input device 43, the output device 44, and the communication device 45 of the computer device may be connected via a bus or other means. In FIG. 4, connection via a bus is used as an example.
- when the processor 40 executes the one or more programs stored in the storage device 41, the following operations are implemented: acquiring at least two frames of face images in the video stream; determining a face state sequence according to the key point coordinate information sets in the at least two frames of face images; and comparing the face state sequence with the preset dynamic expression sequence to determine the dynamic facial expression.
- Embodiments of the present application further provide a computer-readable storage medium, and when the program in the storage medium is executed by a processor of a computer device, the computer device can execute the method described in the foregoing method embodiments.
- the method includes: acquiring at least two frames of face images in a video stream; determining a face state sequence according to the key point coordinate information sets in the at least two frames of face images; and comparing the face state sequence with the preset dynamic expression sequence to determine the dynamic facial expression.
- the multiple units and modules included in the above device are only divided according to functional logic, but the division is not limited to the above as long as the corresponding functions can be realized; in addition, the names of the multiple functional units are only used to distinguish them from one another.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
A method, apparatus, device, and storage medium for detecting dynamic facial expressions. The method includes: acquiring at least two frames of face images in a video stream (S1010); determining a face state sequence according to the key point coordinate information sets in the at least two frames of face images (S1020); and comparing the face state sequence with a preset dynamic expression sequence to determine a dynamic facial expression (S1030).
Description
本申请要求在2018年12月30日提交中国专利局、申请号为201811648826.3的中国专利申请的优先权,该申请的全部内容通过引用结合在本申请中。
本申请涉及图像识别技术领域,例如涉及人脸动态表情的检测方法、装置、设备及存储介质。
在人机交互场景(如直播平台及活体检测等应用场景)中,通过面部图像检测面部动态表情,计算机通过对人机交互场景中面部动态表情的检测识别,从而能够更好地理解用户的情感状态,进而提高用户在人机交互过程中的用户体验。
相关技术中的动态表情检测方法需要采集一段时间相对应的动态表情视频数据,因而影响方案的可扩展性。此外,可以通过深度神经网络自适应地提取视频中的特征来实现动态表情检测,该类方案尽管具有较高准确率,且具有较好可扩展性,但是需要大量的视频数据作为训练样本,且计算复杂度高,很难实现动态表情的实时检测。
发明内容
本申请实施例提供了人脸动态表情的检测方法、装置、设备及存储介质,能够实时准确的检测视频流中所出现人物的动态表情。
在一实施例中,本申请实施例提供了一种人脸动态表情的检测方法,包括:
获取视频流中的至少两帧人脸图像;
根据所述至少两帧人脸图像中关键点坐标信息集,确定脸部状态序列;
比较所述脸部状态序列和预设的动态表情序列确定脸部动态表情。
在一实施例中,本申请实施例提供了一种人脸动态表情的检测装置,包括:
人脸图像获取模块,设置为获取视频流中的至少两帧人脸图像;
状态序列确定模块,设置为根据所述至少两帧人脸图像中关键点坐标信息集,确定脸部状态序列;
动态表情确定模块,设置为比较所述脸部状态序列和预设的动态表情序列 确定脸部动态表情。
在一实施例中,本申请实施例提供了一种计算机设备,包括:
一个或多个处理器;
存储装置,设置为存储一个或多个程序;
所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现本申请实施例提供的方法。
在一实施例中,本申请实施例提供了一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器执行时实现本申请实施例提供的方法。
图1为本申请实施例中提供的一种人脸动态表情的检测方法的流程示意图;
图2给出了关键点检测后具备关键点标识的人脸图像示意图;
图3为本申请实施例提供的一种人脸动态表情的检测装置的结构框图;
图4为本申请实施例提供的一种计算机设备的硬件结构示意图。
下面结合附图和实施例对本申请进行说明。此处所描述的实施例仅仅用于解释本申请,而非对本申请的限定。附图中仅示出了与本申请相关的部分而非全部结构。
在一实施例中,本申请实施例适用于提供视频流的直播平台或其他检测动态表情的人机交互场景中,基于本申请实施例提供的方法,能够简单快速的对视频流中的人脸进行动态表情(如眨眼、张嘴、摇头、点头、挑眉等)检测。在一实施例中,该方法可以由人脸动态表情的检测装置实现,其中,该装置可以由软件和/或硬件实现,并一般可作为插件集成在具备人机交互的应用软件中。
在一实施例中,动态表情检测的关键在于检测一段时间内的视频信息中是否存在人脸表情变化,检测方法包括:1)分析一段时间内每帧图片的纹理和几何信息并对多帧图片的特征进行融合,通过分类模型,判别该时间段内的对应的动态表情;2)检测一段时间内人脸图片中的关键点位置信息,并将该段时间内的人脸图片中的关键点作为分类器的输入,通过训练分类模型来预测该时间段内的动态表情。
上述两类方案尽管具有较快的检测速度,但均需要采集相对应的动态表情 视频数据,因而影响方案的可扩展性。本申请实施例提供了一种人脸动态表情的检测方法、装置、设备及存储介质,保证了在实际应用中的可扩展性,能够实时准确的检测视频流中所出现人物的动态表情。
图1为本申请实施例中提供的一种人脸动态表情的检测方法的流程示意图。如图1所示,该方法包括S1010至S1030。
S1010、获取视频流中的至少两帧人脸图像。
在本实施例中,所述视频流可理解为正在实时播放的视频,如直播视频等,所述人脸图像可理解为构成视频流的图像帧中包括人物脸部信息的图像。在一实施例中,本步骤可以实时获取视频流中连续帧的图像,也可以间隔设定时间获取至少两帧图像。在一实施例中,所获取的图像中包含进行人脸检测的人物脸部信息,本实施例将包括人物脸部信息的图像称为人脸图像,此外,所获取的图像中可以包括多个人物的脸部信息,相当于存在多张可以进行动态表情的人脸,基于本实施例提供的方法,可以对人脸图像中出现的多个人物均进行动态表情检测,在一实施例中,人脸动态表情检测是针对所获取多张人脸图像中的同一个人物进行的。
S1020、根据所述至少两帧人脸图像中关键点坐标信息集,确定脸部状态序列。
在本实施例中,针对每帧人脸图像,都可确定出相应的关键点坐标信息集,所述关键点坐标信息集可理解为用于标识人脸图像中人物脸部轮廓及脸部器官的坐标点集合。示例性的,任一帧人脸图像的关键点坐标信息集中均包括了标识脸部轮廓的关键点坐标信息及标识眼睛、眉毛、嘴巴和鼻子的关键点坐标信息。
本实施例中,可以采用预设的关键点检测模型来检测人脸图像,进而获得人脸图像对应的关键点坐标信息集,所采用的关键点检测模型经过预先训练学习来获得。在一实施例中,关键点检测模型的训练学习过程可表述为:给定带有关键点实际标注值的样本图片集,通过卷积神经网络提取样本图片集中样本图片的特征向量,获得样本图片的预测关键点坐标信息,利用损失函数L2-loss计算预测关键点坐标信息与样本图片对应的关键点实际标注值之间的损失,通过反向传播修正网络参数,直至网络收敛稳定,获得可用的关键点检测模型。
- 在一实施例中，利用预设的关键点检测模型确定的人脸图像的关键点个数越多，越能更好的标识人脸图像的脸部信息，本实施对获取的关键点坐标信息的个数没有具体限定，可根据实际应用实际调整。在一实施例中，第i个关键点坐标信息可以表示为p_i=(x_i, y_i)，假设关键点个数为A，则i的取值为[0, A-1]。关键点坐标信息的确定方式包括基于上述关键点检测模型实现的方式，还可以采用其他方式，如采用监督下降法等方式来实现。
参考图2,图2给出了关键点检测后具备关键点标识的人脸图像示意图,在一实施例中,检测出图2中人脸图像的关键点总数为40个,其中,标号为200-209的关键点可用于标识人脸图像的脸部轮廓;标号为210-212、216、217的关键点用于标识人脸图像中相对用户而言的左眉毛;标号为213-215、218、219的关键点用于标识人脸图像中相对用户而言的右眉毛;标号为220-222、226以及标号为223-225、227的关键点分别用于标识人脸图像中相对用户而言的左眼和右眼;标号为228-233的关键点用于标识人脸图像中的鼻子;标号为234-239的关键点用于标识人脸图像中的嘴巴,其中,多个关键点均分别具有各自的坐标信息。
在本实施例中,所述脸部状态序列可理解为基于所获取的至少两帧人脸图像中的脸部状态组成的状态序列。在一实施例中,针对每帧人脸图像,基于该帧人脸图像对应的关键点坐标信息集,可以确定人脸图像中人物当前的脸部状态;将每张人脸图像所对应的脸部状态以时间顺序进行组合,就可获得一个脸部状态序列,所述脸部状态序列可以表征人脸图像中的人物当前在视频流中的脸部表情状态。
在一实施例中,脸部状态包括下述至少一种:眼睛状态、眉毛状态、嘴巴状态、脸部左右摆动、脸部上下摆动;所述脸部状态序列包括下述至少一种:眼睛睁闭状态序列、眉毛收挑状态序列、嘴巴张合状态序列、摇头状态序列、点头状态序列。
在本实施例中,对视频流中的人物进行动态表情检测时,可检测人物脸部的脸部状态是否发生了变化,本实施例所检测的脸部状态可以为眼睛状态、眉毛状态、嘴巴状态、脸部摆动状态(如上下摆动或左右摆动)中的一种或几种,且眼睛状态可以有睁闭、眉毛状态可以有收挑、嘴巴状态可以有张合、脸部摆动状态可以有上下摆动(点头)或左右摆动(摇头),因此,对获取的多帧人脸图像进行脸部状态确定后,所形成的脸部状态序列则相应可以为眼睛睁闭状态序列、眉毛收挑状态序列、嘴巴张合状态序列、摆动状态序列(如摇头状态序列或点头状态序列)中的一种或几种。
在一实施例中,眼睛状态又分为左右眼状态,眉毛状态也分为左右眉状态,即,左右眼可分别具有睁闭状态,同样,左右眉也可分别具有收挑状态。在一实施例中,通过对脸部状态序列的进行不同命名的方式来区分左眼和右眼分别对应的状态序列以及左眉和右眉分别对应的状态序列。
S1030、比较所述脸部状态序列和预设的动态表情序列确定脸部动态表情。
在本实施例中,上述确定出脸部状态序列后,可以将脸部状态序列与预先设定的动态表情序列(即预设的动态表情序列)进行比对,并可根据比对结果确定视频流中的人物当前是否存在脸部动态表情。所述动态表情序列可理解为实现脸部表情变化的一系列脸部状态的集合,如,眼睛由睁到闭的表情变化就可通过一个动态表情序列来表示,又如,嘴巴由闭合到张开的表情变化也可通过一个动态表情序列来表示。所述动态表情序列可根据实现脸部表情变化时脸部具有的状态来预先设定,例如,可以设定一个包含嘴巴张开和嘴巴闭合两个状态的状态序列作为表示嘴巴张合的动态表情序列。
在一实施例中,脸部状态序列中为基于至少两帧人脸图像确定的脸部状态集合,本实施例可以通过将动态表情序列中包含的状态信息与脸部状态序列中的状态信息进行匹配,来确定动态表情序列的状态信息是否均出现在脸部状态序列中,从而确定视频流中的人物当前是否存在脸部动态表情。
在一实施例中,所述动态表情序列包括:眼睛动态变化序列、眉毛动态变化序列、嘴巴动态变化序列、摇头变化序列以及点头变化序列。
在一实施例中,当脸部状态序列包含眼睛睁闭状态序列、眉毛收挑状态序列、嘴巴张合状态序列、摇头状态序列以及点头状态序列中的一种或几种时,也预先设定眼睛动态变化序列、眉毛动态变化序列、嘴巴动态变化序列、摇头变化序列以及点头变化序列来作为相应的动态表情序列。示例性的,假设脸部状态序列为嘴巴张合状态序列,则基于本步骤中与之相比对的动态表情序列实际为嘴巴动态变化序列。
本申请实施例提供的方法,与相关技术中的人脸动态表情检测方法相比,在保证检测速度的前提下,避免了对待检测视频数据的采集限制,由此保证了本方案在实际应用中的可扩展性,此外,本申请的方案无需预先通过训练样本进行训练学习,只通过所确定脸部状态序列与预设的动态表情序列的比对,就能简单快速的确定脸部是否存在动态表情,有效的降低了计算复杂度,更好的体现了动态表情检测的实时性。
在一实施例中,在确定脸部状态序列的操作中,对脸部状态的确定为其中的步骤,由上述表述,可知脸部状态可以为眼睛状态、眉毛状态、嘴巴状态、脸部左右摆动以及脸部上下摆动中的一种或几种,由此,本实施例提供了确定脸部状态的方式。
在本申请的一个可选实施例中,针对每帧人脸图像,所述脸部状态通过如下方式确定:
基于关键点坐标信息集中标识上眼睑和下眼睑的关键点坐标信息,确定所 述人脸图像中上眼睑到下眼睑的眼睑距离值;基于所述关键点坐标信息集中标识鼻子的关键点坐标信息,确定所述人脸图像中鼻子的鼻梁长度,并将所述鼻梁长度作为眼部归一标准值得到所述眼睑距离值的归一值;在所述归一值小于设定的眼部状态阈值的情况下,所述脸部状态为眼睛闭合;在所述归一值大于或等于所述设定的眼部状态阈值的情况下,所述脸部状态为眼睛睁开。
本实施例上述确定方式适用于脸部状态为眼睛状态(左眼状态和/或右眼状态)的情况,上述确定方式基于眼睛的上眼睑到下眼睑的距离,来确定眼睛状态为睁开或闭合。示例性的,对于每帧人脸图像相应存在的关键点坐标信息集,关键点坐标信息集包含了用于标识人脸的所有关键点的关键点坐标信息,以图2所示的人脸图像为例,可认为其中的关键点221和关键点224分别标识了左右眼的上眼睑,并可获取关键点221和关键点224的关键点坐标信息,其中关键点226和关键点227分别标识了左右眼的下眼睑,同样可获取关键点226和关键点227的关键点坐标信息,由此,可将关键点221到关键点226的距离确定为左眼的眼睑距离值,将关键点224到关键点227的距离确定为右眼的眼睑距离值。
为了避免所获取人脸图像的尺寸变化对眼睛状态的影响,本实施例引入人脸图像中鼻子的鼻梁长度(如图2中关键点228到关键点230的距离)作为眼部归一标准值,因为人脸图像的尺寸变化与人脸图像中鼻梁长度的变化成正比,对眼睑距离值(左眼和/或右眼的眼睑距离值)进行归一化,并将归一化后的归一值(眼睑距离值与眼部归一标准值的比值)与眼部状态阈值进行比较,由此确定脸部状态为眼睛(左眼和/或右眼)睁开或眼睛闭合。
在一实施例中,当确定出多帧人脸图像的眼睛状态后,可以基于时间顺序对确定出的眼睛状态进行汇合,由此可形成以眼睛状态为脸部状态的脸部状态序列。示例性的,假设获取5张人脸图像,确定出5帧人脸图像中左眼的眼睛状态分别为眼睛睁开、眼睛睁开、眼睛闭合、眼睛闭合以及眼睛睁开,则此时的脸部状态序列相当于眼睛睁闭状态序列,该序列可表示为{眼睛睁开,眼睛睁开,眼睛闭合,眼睛闭合,眼睛睁开},为便于后续与相应动态表情序列的比对,本实施例中以1表示眼睛睁开,以0表示眼睛闭合,且为便于后续能够识别出当前的脸部状态序列为眼睛睁闭状态序列,在一实施例中,设定LE来标识左眼对应的眼睛睁闭状态序列,设定RE来标识右眼对应的眼睛睁闭状态序列,因此,上述确定的眼睛睁闭状态序列{眼睛睁开,眼睛睁开,眼睛闭合,眼睛闭合,眼睛睁开},实际可表示为:LE={1,1,0,0,1}。
此外,本实施例中与眼睛睁闭状态序列比对的动态表情序列实际相当于眼睛动态变化序列,在一实施例中,设定眼睛动态变化序列表示为:Te={1,0,1}, 并以此作为眼睛的动态变化标准。此时,可以将确定的LE={1,1,0,0,1}和Te={1,0,1}进行比对,如果Te={1,0,1}中的元素均存在于LE中,则确定视频流中的人物当前存在左眼眨眼的动态表情。
在本申请的一个可选实施例中,针对每帧人脸图像,所述脸部状态通过如下方式确定:
基于关键点坐标信息集中标识上眉根及眼角的关键点坐标信息,确定所述人脸图像中上眉根到同侧眼角的连线距离值;基于所述关键点坐标信息集中标识上眉根及下眉根的关键点坐标信息,确定所述人脸图像中的眉根宽度,并将所述眉根宽度作为眉部归一标准值得到所述连线距离值的归一值;在所述归一值大于设定的眉毛状态阈值的情况下,所述脸部状态为眉毛上挑;在所述归一值小于或等于所述设定的眉毛状态阈值的情况下,所述脸部状态为眉毛正常。
本实施例上述确定方式适用于脸部状态为眉毛状态(左眉状态和/或右眉状态)的情况,上述确定方式基于眉毛的上眉根到同侧眼角的连线距离值,来确定眉毛状态为上挑或正常。示例性的,对于每帧人脸图像相应存在的关键点坐标信息集,关键点坐标信息集包含了用于标识人脸的所有关键点的关键点坐标信息,仍以图2所示的人脸图像为例,可认为其中的关键点212和关键点213分别标识了左右眉的上眉根,并可获取关键点212和关键点213的关键点坐标信息,其中的关键点222和关键点223分别标识了左右眼的眼角,并可获取关键点222和关键点223的关键点坐标信息,由此,可将关键点212到关键点222的距离确定为左眉毛的上眉根到同侧眼角的连线距离值,将关键点213到关键点223的距离确定为右眉毛的上眉根到同侧眼角的连线距离值。
为了避免所获取的人脸图像的尺寸变化对眉毛状态的影响,本实施例引入人脸图像中眉毛的眉根宽度(如图2中关键点212到关键点217的距离,或者关键点213到关键点218的距离)作为眉部归一标准值,因为人脸图像的尺寸变化与人脸图像中眉根宽度的变化也成正比,对连线距离值(左眉和/或右眉的连线距离值)进行归一化计算,可选的,每侧眉毛选取同侧的眉根宽度作为眉部归一标准值进行归一化计算,并将归一化后的归一值(连线距离值与眉根宽度的比值)与眉毛状态阈值进行比较,由此确定脸部状态为眉毛上挑(左眉和/或右眉)或眉毛正常。
在一实施例中,当确定出多帧人脸图像的眉毛状态后,同样可基于时间顺序对确定出的眉毛状态进行汇合,由此可形成以眉毛状态为脸部状态的脸部状态序列。示例性的,假设获取5张人脸图像,确定出5帧人脸图像中左眉的眉毛状态分别为眉毛正常、眉毛正常、眉毛上挑、眉毛上挑以及眉毛上挑,则此时的脸部状态序列相当于眉毛收挑状态序列,该序列可表示为{眉毛正常,眉毛 正常,眉毛上挑,眉毛上挑,眉毛上挑},为便于后续与相应动态表情序列的比对,本实施例中以1表示眉毛上挑,以0表示眉毛正常,且为便于后续能够识别出当前的脸部状态序列为眉毛收挑状态序列,在一实施例中,设定LB表示左眉对应的眉毛收挑状态序列,设定RB表示右眉对应的眉毛收挑状态序列,因此,上述确定的眉毛收挑状态序列{眉毛正常,眉毛正常,眉毛上挑,眉毛上挑,眉毛上挑},实际可表示为:LB={0,0,1,1,1}。
此外,本实施例中与眉毛收挑状态序列比对的动态表情序列实际相当于眉毛动态变化序列,在一实施例中,设定眉毛动态变化序列表示为:Tb={0,1,1},并以此作为眉毛的动态变化标准。此时,可以将确定的LB={0,0,1,1,1}和Tb={0,1,1}进行比对,如果Tb={0,1,1}中的元素均存在于LB中,则确定视频流中的人物当前存在左眉挑眉的动态表情。
在本申请的一个可选实施例中,针对每帧人脸图像,所述脸部状态通过如下方式确定:
基于关键点坐标信息集中上唇下边缘及下唇上边缘的关键点坐标信息,确定所述人脸图像中上唇下边缘到下唇上边缘的唇间距离值;基于所述关键点坐标信息集中上唇上边缘及上唇下边缘的关键点坐标信息,确定所述人脸图像中的上嘴唇厚度,并将所述上嘴唇厚度作为唇部归一标准值得到所述唇间距离值的归一值;在所述归一值大于设定的唇部状态阈值的情况下,所述脸部状态为嘴巴张开;在所述归一值小于或等于所述设定的唇部状态阈值的情况下,所述脸部状态为嘴巴闭合。
本实施例上述确定方式适用于脸部状态为嘴巴状态的情况,上述确定方式基于嘴巴上唇下边缘与下唇上边缘的距离,来确定嘴巴状态为张开或闭合。示例性的,对于每帧人脸图像相应存在的关键点坐标信息集,关键点坐标信息集包含了用于标识人脸的所有关键点的关键点坐标信息,以图2所示的人脸图像为例,可认为其中的关键点237标识了嘴巴的上唇下边缘,关键点238标识了嘴巴的下唇上边缘,并可获取关键点237和关键点238的关键点坐标信息,由此,可将关键点237到关键点238的距离确定为嘴巴的唇间距离值。
为了避免所获取人脸图像的尺寸变化对嘴巴状态的影响,本实施例引入人脸图像中嘴巴的上嘴唇厚度(如图2中关键点234到关键点237的距离)作为唇部归一标准值,因为人脸图像的尺寸变化也与人脸图像中上嘴唇厚度的变化成正比,对唇间距离值进行归一化,并将归一化后的归一值(唇间距离值与唇部归一标准值的比值)与唇部状态阈值进行比较,由此确定脸部状态为嘴巴张开或嘴巴闭合。
在一实施例中,当确定出多帧人脸图像的嘴巴状态后,可以基于时间顺序 对确定出的嘴巴状态进行汇合,由此可形成以嘴巴状态为脸部状态的脸部状态序列。示例性的,假设获取5张人脸图像,确定出5帧人脸图像中的嘴巴状态分别为嘴巴闭合、嘴巴闭合、嘴巴闭合、嘴巴张开以及嘴巴闭合,则此时的脸部状态序列相当于嘴巴张合状态序列,该序列可表示为{嘴巴闭合,嘴巴闭合,嘴巴闭合,嘴巴张开,嘴巴闭合},为便于后续与相应动态表情序列的比对,本实施例中以1表示嘴巴张开,以0表示嘴巴闭合,且为便于后续能够识别出当前的脸部状态序列为嘴巴张合状态序列,在一实施例中,设定M标识嘴巴张合状态序列,因此,上述确定的嘴巴张合状态序列{嘴巴闭合,嘴巴闭合,嘴巴闭合,嘴巴张开,嘴巴闭合},实际可表示为:M={0,0,0,1,0}。
此外,本实施例中与嘴巴张合状态序列比对的动态表情序列实际相当于嘴巴动态变化序列,在一实施例中,设定嘴巴动态变化序列表示为:Tm={0,1},并以此作为嘴巴的动态变化标准。此时,可以将确定的M={0,0,0,1,0}和Tm={0,1}进行比对,如果Tm={0,1}中的元素均存在于M中,则确定视频流中的人物当前存在张嘴的动态表情。
在本申请的一个可选实施例中,针对每帧人脸图像,所述脸部状态通过如下方式确定:
基于关键点坐标信息集构成所述人脸图像的二维平面矩阵及三维空间矩阵;确定所述二维平面矩阵转换成所述三维空间矩阵的旋转矩阵;根据所述旋转矩阵确定所述人脸图像中人脸的偏航角度值,并将所述偏航角度值作为脸部状态;或者,根据所述旋转矩阵确定所述人脸图像中人脸的俯仰角度值,并将所述俯仰角度值作为脸部状态。
- 在一实施例中，所述俯仰角度值的计算公式表示为：pitch=arcsin(R_{2,3})×π/180；所述偏航角度值的计算公式表示为：yaw=-arctan(-R_{1,3}/R_{3,3})×π/180；其中，所述pitch表示俯仰角度值，所述yaw表示偏航角度值，所述R_{m,n}表示旋转矩阵R中第m行第n列的元素值，所述m和所述n均为正整数。
本实施例上述确定方式适用于脸部状态为脸部左右摆动或脸部上下摆动的情况,上述确定方式基于关键点坐标信息由二维平面到三维空间的旋转矩阵来确定脸部摆动状态为脸部左右摆动或脸部上下摆动。
示例性的,对于每帧人脸图像相应存在的关键点坐标信息集,关键点坐标信息集包含了二维平面下用于标识人脸的所有关键点的关键点坐标信息,此外,对于每帧人脸图像,还可确定出每帧人脸图像在三维空间下与二维平面中关键点相对应的三维关键点坐标信息,且二维平面下的关键点坐标信息可采用二维平面矩阵表示,三维空间下的关键点坐标信息可采用三维空间矩阵表示,在已 知二维平面矩阵及相应三维空间矩阵的前提下,基于预设的旋转矩阵计算模型,可以确定出二维平面矩阵转换成三维空间矩阵的旋转矩阵,根据旋转矩阵及设定的偏航角度值计算公式或俯仰角度值计算公式,就可以确定出人脸图像中人脸的偏航角度值或俯仰角度值,由此,可将偏航角度值或俯仰角度值作为每帧人脸图像对应的脸部状态。
当偏航角度值或俯仰角度值作为每帧人脸图像的脸部状态时，上述确定方式的脸部状态序列的确定方式与上述其他几种脸部状态的确定方式不同。其中，对于基于偏航角度值形成的脸部状态序列，此时的脸部状态序列相当于摇头状态序列，可设定Y来标识该摇头状态序列，且可设定Y_i来标识摇头状态序列中的第i个摇头状态值；对于基于俯仰角度值形成的脸部状态序列，此时的脸部状态序列相当于点头状态序列，可设定P来标识该点头状态序列，且可设定P_i来标识该点头状态序列中的第i个点头状态值。在一实施例中，摇头状态序列的确定方式可描述为：以时间顺序汇合多帧人脸图像，并获取多帧人脸图像对应偏航角度值，将汇合后的第一帧人脸图像对应的摇头状态值设定为0，对于摇头状态序列中第i帧人脸图像对应的摇头状态值，则可基于下述公式确定：
在一实施例中,点头状态序列的确定方式可描述为:以时间顺序汇合多帧人脸图像,并获取多帧人脸图像对应俯仰角度值,将汇合后的第一帧人脸图像对应的点头状态值设定为0,对于点头状态序列中第i帧人脸图像对应的点头状态值,则可基于下述公式确定:
- 其中，P_i表示第i帧人脸图像对应的点头状态值，pitch_i表示第i帧人脸图像对应的俯仰角度值，pitch_{i-1}表示第i-1帧人脸图像对应的俯仰角度值，pitch_thres表示设定的点头状态阈值。
此外,本实施例中与摇头状态序列比对的动态表情序列实际相当于摇头变化序列,在一实施例中,设定摇头变化序列表示为Ty_a={1,-1},Ty_b={-1,1},此时可将确定的摇头状态序列Y与Ty_a={1,-1}及Ty_b={-1,1}进行比对,如果Ty_a={1,-1}或者Ty_b={-1,1}中的元素存在于Y中,则确定视频流中的人物当前存在摇头的动态表情;本实施例中与点头状态序列比对的动态表情序列实际相 当于点头变化序列,在一实施例中,设定点头变化序列P表示为Tp_a={0,1},Tp_b={-1,1},此时可将确定的摇头状态序列P与Tp_a={0,1}及Tp_b={-1,1}进行比对,如果Tp_a={0,1}或者Tp_b={-1,1}中的元素存在于P中,则确定视频流中的人物当前存在点头的动态表情。
在上述实施例的基础上,可选的,所述比较所述脸部状态序列和预设的动态表情序列确定脸部动态表情,包括:
在所述预设的动态表情序列中的元素信息依次出现在脸部状态序列中的情况下,确定存在对应所述预设的动态表情序列的脸部动态表情;在所述预设的动态表情序列中的元素信息未依次出现在所述脸部状态序列中的情况下,确定不存在对应所述预设的动态表情序列的脸部动态表情。
在一实施例中,本实施例还将比较所述脸部状态序列和预设的动态表情序列确定脸部动态表情,可以包括:
确定所述预设的动态表情序列的序列长度,并将变量i的值初始化为1以及将所述脸部状态序列的比对序列号初始化为1,其中,所述i为正整数;在脸部状态序列中,从所述比对序列号1对应的元素信息开始,查找是否存在与所述动态表情序列中第i个元素信息匹配的目标元素信息;响应于在所述脸部状态序列中存在目标元素信息,将所述目标元素信息对应的序列号作为新的比对序列号,以及对所述变量i加1后继续查找与当前变量匹配的目标元素信息;响应于在所述脸部状态序列中不存在目标元素信息,在所述变量i大于所述序列长度且已查找到连续的目标元素信息的情况下,确定所述脸部状态序列中存在所述预设的动态表情序列对应的脸部动态表情,在所述变量i小于或等于所述序列长度的情况下,确定所述脸部状态序列中不存在所述预设的动态表情序列对应的脸部动态表情。
示例性的,以脸部状态序列为嘴巴状态序列为例,且设定嘴巴状态序列M={0,0,0,1,0},此外,脸部状态序列对应的动态表情序列实际相当于嘴巴动态变化序列Tm,其中,预设的Tm={0,1},基于上述比对方式进行序列比对的过程可表述为:预设的动态表情序列的序列长度为2,变量i初始为1,比对序列号初始为1;在已知的嘴巴状态序列M={0,0,0,1,0}中,比对序列号1对应的元素信息实际为M中的第1个元素值0,Tm的第i个元素实际对应Tm中的第1个元素值0,由此,可以确定M中的第1个元素值0与Tm中的第1个元素值0相匹配,可将变量i的值变更为2,比对序列号仍为1,可返回继续进行元素信息比对,并可确定Tm中的第2个元素值1与M中的第4个元素值1相匹配,在变量i的值等于3时,确定M中依次包含了Tm中的所有元素,由此可认为当前存在对应嘴巴动态变化序列的嘴巴状态序列,进而可说明视频流中的人物当 前存在张嘴的动态表情。
综上,本申请实施例提供的方法,可通过嘴巴状态、眼睛状态、眉毛状态以及脸部摆动状态的比对检测,来确定视频流中的动态表情,与相关技术中的人脸动态表情检测相比,可以仅通过确定脸部状态序列与预设的动态表情序列的比对,来简单快速的确定视频流中人物脸部是否存在动态表情,有效的降低了计算复杂度,更好的体现了动态表情的实时性,此外,本技术的方案,在保证检测速度的前提下,避免了对待检测视频数据的采集限制,由此保证了本方案在实际应用中的可扩展性。
对于方法实施例,为了简单描述,故将方法实施例都表述为一系列的动作组合,但是本申请实施例并不受所描述的动作顺序的限制,因为依据本申请实施例,某些步骤可以采用其他顺序或者同时进行。
本申请实施例还给出了一种人脸动态表情的检测装置,参考图3,为本申请实施例提供的一种人脸动态表情的检测装置的结构框图,该检测装置适用于对人机交互场景中出现的视频流进行人脸动态表情检测的情况,该装置可以软件和/或硬件实现,可集成在计算机设备上。在实现中,该计算机设备可以是两个或多个物理实体构成,也可以是一个物理实体构成,如设备可以是个人计算机(Personal Computer,PC)、电脑、手机、平板设备以及个人数字助理等。
如图3所示,该装置包括:人脸图像获取模块31、状态序列确定模块32以及动态表情确定模块33。
其中,人脸图像获取模块31,设置为获取视频流中的至少两帧人脸图像;
状态序列确定模块32,设置为根据所述至少两帧人脸图像中关键点坐标信息集,确定脸部状态序列;
动态表情确定模块33,设置为比较所述脸部状态序列和预设的动态表情序列确定脸部动态表情。
在一实施例中,上述提供的人脸动态表情的检测装置可执行本申请任意实施例所提供的方法,具备执行方法相应的功能和效果。
此外,本申请实施例还提供一种计算机设备,包括:处理器和存储器。存储器中存储有至少一条指令,且指令由所述处理器执行,使得所述计算机设备执行如上述方法实施例中所述的方法。
参照图4,为本申请实施例提供的一种计算机设备的硬件结构示意图。如图4所示,该计算机设备可以包括:处理器40、存储装置41、具有触摸功能的显示屏42、输入装置43、输出装置44以及通信装置45。该计算机设备中处理器40的数量可以是一个或者多个,图4中以一个处理器40为例。该计算机设备中 存储装置41的数量可以是一个或者多个,图4中以一个存储装置41为例。该计算机设备的处理器40、存储装置41、显示屏42、输入装置43、输出装置44以及通信装置45可以通过总线或者其他方式连接,图4中以通过总线连接为例。
在一实施例中,处理器40执行存储装置41中存储的一个或多个程序时,实现如下操作:获取视频流中的至少两帧人脸图像;根据所述至少两帧人脸图像中关键点坐标信息集,确定脸部状态序列;比较所述脸部状态序列和预设的动态表情序列确定脸部动态表情。
本申请实施例还提供一种计算机可读存储介质,所述存储介质中的程序由计算机设备的处理器执行时,使得计算机设备能够执行如上述方法实施例所述的方法。示例性的,该方法包括:获取视频流中的至少两帧人脸图像;根据所述至少两帧人脸图像中关键点坐标信息集,确定脸部状态序列;比较所述脸部状态序列和预设的动态表情序列确定脸部动态表情。
对于装置、计算机设备、存储介质实施例而言,由于装置、计算机设备、存储介质实施例与方法实施例基本相似,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
上述人脸动态表情的检测装置中,所包括的多个单元和模块只是按照功能逻辑进行划分的,但并不局限于上述的划分,只要能够实现相应的功能即可;另外,多个功能单元的名称也只是为了便于相互区分。
Claims (12)
- 一种人脸动态表情的检测方法,包括:获取至少两帧人脸图像;根据所述至少两帧人脸图像中关键点坐标信息集,确定脸部状态序列;比较所述脸部状态序列和预设的动态表情序列确定脸部动态表情。
- 根据权利要求1所述的方法,其中,脸部状态包括下述至少一种:眼睛状态、眉毛状态、嘴巴状态、脸部左右摆动、脸部上下摆动;所述脸部状态序列包括下述至少一种:眼睛睁闭状态序列、眉毛收挑状态序列、嘴巴张合状态序列、摇头状态序列、点头状态序列;所述动态表情序列包括:眼睛动态变化序列、眉毛动态变化序列、嘴巴动态变化序列、摇头变化序列以及点头变化序列。
- 根据权利要求2所述的方法,其中,针对每帧人脸图像,所述脸部状态通过如下方式确定:基于所述关键点坐标信息集中标识上眼睑和下眼睑的关键点坐标信息,确定所述人脸图像中上眼睑到下眼睑的眼睑距离值;基于所述关键点坐标信息集中标识鼻子的关键点坐标信息,确定所述人脸图像中鼻子的鼻梁长度,并将所述鼻梁长度作为眼部归一标准值得到所述眼睑距离值的归一值;在所述归一值小于设定的眼部状态阈值的情况下,所述脸部状态为眼睛闭合;在所述归一值大于或等于所述设定的眼部状态阈值的情况下,所述脸部状态为眼睛睁开。
- 根据权利要求2所述的方法,其中,针对每帧人脸图像,所述脸部状态通过如下方式确定:基于所述关键点坐标信息集中标识上眉根及眼角的关键点坐标信息,确定所述人脸图像中上眉根到同侧眼角的连线距离值;基于所述关键点坐标信息集中标识上眉根及下眉根的关键点坐标信息,确定所述人脸图像中的眉根宽度,并将所述眉根宽度作为眉部归一标准值得到所述连线距离值的归一值;在所述归一值大于设定的眉毛状态阈值的情况下,所述脸部状态为眉毛上挑;在所述归一值小于或等于所述设定的眉毛状态阈值的情况下,所述脸部状态为眉毛正常。
- 根据权利要求2所述的方法,其中,针对每帧人脸图像,所述脸部状态 通过如下方式确定:基于所述关键点坐标信息集中上唇下边缘及下唇上边缘的关键点坐标信息,确定所述人脸图像中上唇下边缘到下唇上边缘的唇间距离值;基于所述关键点坐标信息集中上唇上边缘及上唇下边缘的关键点坐标信息,确定所述人脸图像中的上嘴唇厚度,并将所述上嘴唇厚度作为唇部归一标准值得到所述唇间距离值的归一值;在所述归一值大于设定的唇部状态阈值的情况下,所述脸部状态为嘴巴张开;在所述归一值小于或等于所述设定的唇部状态阈值的情况下,所述脸部状态为嘴巴闭合。
- 根据权利要求2所述的方法,其中,针对每帧人脸图像,所述脸部状态通过如下方式确定:基于所述关键点坐标信息集构成所述人脸图像的二维平面矩阵及三维空间矩阵;确定所述二维平面矩阵转换成所述三维空间矩阵的旋转矩阵;根据所述旋转矩阵确定所述人脸图像中人脸的偏航角度值,并将所述偏航角度值作为所述脸部状态;或者,根据所述旋转矩阵确定所述人脸图像中人脸的俯仰角度值,并将所述俯仰角度值作为所述脸部状态。
- 根据权利要求6所述的方法，其中，所述俯仰角度值的计算公式表示为：pitch=arcsin(R_{2,3})×π/180；所述偏航角度值的计算公式表示为：yaw=-arctan(-R_{1,3}/R_{3,3})×π/180；其中，所述pitch表示俯仰角度值，所述yaw表示偏航角度值，所述R_{m,n}表示所述旋转矩阵R中第m行第n列的元素值，所述m和所述n均为正整数。
- 根据权利要求1-7任一项所述的方法,其中,所述比较所述脸部状态序列和预设的动态表情序列确定脸部动态表情,包括:在所述预设的动态表情序列中的元素信息依次出现在所述脸部状态序列中的情况下,确定存在对应所述预设的动态表情序列的脸部动态表情;在所述预设的动态表情序列中的元素信息未依次出现在所述脸部状态序列中的情况下,确定不存在对应所述预设的动态表情序列的脸部动态表情。
- 根据权利要求8所述的方法,其中,所述比较所述脸部状态序列和预设的动态表情序列确定脸部动态表情,包括:确定所述预设的动态表情序列的序列长度,并将变量i的值初始化为1以及 将所述脸部状态序列的比对序列号初始化为1,其中,所述i为正整数;在所述脸部状态序列中,从所述比对序列号1对应的元素信息开始,查找是否存在与所述预设的动态表情序列中第i个元素信息匹配的目标元素信息;响应于在所述脸部状态序列中存在所述目标元素信息,将所述目标元素信息对应的序列号作为新的比对序列号,以及对所述变量i加1后继续查找与当前变量匹配的目标元素信息;响应于在所述脸部状态序列中不存在所述目标元素信息,在所述变量i大于所述序列长度且已查找到连续的目标元素信息的情况下,确定所述脸部状态序列中存在所述预设的动态表情序列对应的脸部动态表情,在所述变量i小于或等于所述序列长度的情况下,确定所述脸部状态序列中不存在所述预设的动态表情序列对应的脸部动态表情。
- 一种人脸动态表情的检测装置,包括:人脸图像获取模块,设置为获取视频流中的至少两帧人脸图像;状态序列确定模块,设置为根据所述至少两帧人脸图像中关键点坐标信息集,确定脸部状态序列;动态表情确定模块,设置为比较所述脸部状态序列和预设的动态表情序列确定脸部动态表情。
- 一种计算机设备,包括:一个或多个处理器;存储装置,设置为存储一个或多个程序;所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现如权利要求1-9中任一项所述的方法。
- 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被处理器执行时实现如权利要求1-9中任一项所述的方法。
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811648826.3A CN111382648A (zh) | 2018-12-30 | 2018-12-30 | 人脸动态表情的检测方法、装置、设备及存储介质 |
CN201811648826.3 | 2018-12-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020140723A1 true WO2020140723A1 (zh) | 2020-07-09 |
Family
ID=71221196
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/124928 WO2020140723A1 (zh) | 2018-12-30 | 2019-12-12 | 人脸动态表情的检测方法、装置、设备及存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111382648A (zh) |
WO (1) | WO2020140723A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112069954A (zh) * | 2020-08-26 | 2020-12-11 | 武汉普利商用机器有限公司 | 一种活体微表情检测方法及系统 |
CN112580434A (zh) * | 2020-11-25 | 2021-03-30 | 奥比中光科技集团股份有限公司 | 一种基于深度相机的人脸误检优化方法、系统及人脸检测设备 |
WO2024001539A1 (zh) * | 2022-06-30 | 2024-01-04 | 上海商汤智能科技有限公司 | 说话状态识别方法及模型训练方法、装置、车辆、介质、计算机程序及计算机程序产品 |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111915479B (zh) * | 2020-07-15 | 2024-04-26 | 抖音视界有限公司 | 图像处理方法及装置、电子设备和计算机可读存储介质 |
CN111832512A (zh) * | 2020-07-21 | 2020-10-27 | 虎博网络技术(上海)有限公司 | 表情检测方法和装置 |
CN112183197B (zh) * | 2020-08-21 | 2024-06-25 | 深圳追一科技有限公司 | 基于数字人的工作状态确定方法、装置和存储介质 |
CN112991496B (zh) * | 2021-01-22 | 2022-11-18 | 厦门大学 | 一种基于tps变形算法的中国画动画自动生成方法、设备和介质 |
CN113093106A (zh) * | 2021-04-09 | 2021-07-09 | 北京华捷艾米科技有限公司 | 一种声源定位方法及系统 |
CN114268453B (zh) * | 2021-11-17 | 2024-07-12 | 中国南方电网有限责任公司 | 电力系统解锁方法、装置、计算机设备和存储介质 |
CN114217693A (zh) * | 2021-12-17 | 2022-03-22 | 广州轻游信息科技有限公司 | 一种人脸识别的软件交互方法、系统和存储介质 |
CN115797523B (zh) * | 2023-01-05 | 2023-04-18 | 武汉创研时代科技有限公司 | 一种基于人脸动作捕捉技术的虚拟角色处理系统及方法 |
TWI831582B (zh) * | 2023-01-18 | 2024-02-01 | 瑞昱半導體股份有限公司 | 偵測系統以及偵測方法 |
CN116895090A (zh) * | 2023-07-21 | 2023-10-17 | 无锡无界探索科技有限公司 | 一种基于机器视觉的人脸五官状态检测方法及系统 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110310237A1 (en) * | 2010-06-17 | 2011-12-22 | Institute For Information Industry | Facial Expression Recognition Systems and Methods and Computer Program Products Thereof |
CN106372621A (zh) * | 2016-09-30 | 2017-02-01 | 防城港市港口区高创信息技术有限公司 | 基于人脸识别的疲劳驾驶检测方法 |
CN108460345A (zh) * | 2018-02-08 | 2018-08-28 | 电子科技大学 | 一种基于人脸关键点定位的面部疲劳检测方法 |
CN108958488A (zh) * | 2018-07-20 | 2018-12-07 | 汪若海 | 一种人脸指令识别方法 |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1866270B (zh) * | 2004-05-17 | 2010-09-08 | 香港中文大学 | 基于视频的面部识别方法 |
JP5055166B2 (ja) * | 2008-02-29 | 2012-10-24 | キヤノン株式会社 | 眼の開閉度判定装置、方法及びプログラム、撮像装置 |
CN101908149A (zh) * | 2010-07-06 | 2010-12-08 | 北京理工大学 | 一种从人脸图像序列中识别脸部表情的方法 |
CN102324166B (zh) * | 2011-09-19 | 2013-06-12 | 深圳市汉华安道科技有限责任公司 | 一种疲劳驾驶检测方法及装置 |
US9152847B2 (en) * | 2012-11-27 | 2015-10-06 | Adobe Systems Incorporated | Facial landmark localization by exemplar-based graph matching |
CN103632147A (zh) * | 2013-12-10 | 2014-03-12 | 公安部第三研究所 | 实现面部特征标准化语义描述的系统及方法 |
EP2960862B1 (en) * | 2014-06-24 | 2017-03-22 | Vicarious Perception Technologies B.V. | A method for stabilizing vital sign measurements using parametric facial appearance models via remote sensors |
CN104091150B (zh) * | 2014-06-26 | 2019-02-26 | 浙江捷尚视觉科技股份有限公司 | 一种基于回归的人眼状态判断方法 |
CN104484669A (zh) * | 2014-11-24 | 2015-04-01 | 苏州福丰科技有限公司 | 基于三维人脸识别的手机支付方法 |
CN105159452B (zh) * | 2015-08-28 | 2018-01-12 | 成都通甲优博科技有限责任公司 | 一种基于人脸姿态估计的控制方法与系统 |
CN106127139B (zh) * | 2016-06-21 | 2019-06-25 | 东北大学 | 一种mooc课程中学生面部表情的动态识别方法 |
CN106295549A (zh) * | 2016-08-05 | 2017-01-04 | 深圳市鹰眼在线电子科技有限公司 | 多角度人脸数据采集方法和装置 |
CN107243905A (zh) * | 2017-06-28 | 2017-10-13 | 重庆柚瓣科技有限公司 | 基于养老机器人的情绪自适应系统 |
CN107292299B (zh) * | 2017-08-14 | 2018-10-30 | 河南工程学院 | 基于内核规范相关分析的侧面人脸识别方法 |
CN108345849A (zh) * | 2018-01-31 | 2018-07-31 | 深圳港云科技有限公司 | 一种面部识别方法及其设备 |
CN108364355B (zh) * | 2018-02-12 | 2022-12-09 | 成都睿码科技有限责任公司 | 一种贴合面部表情的ar渲染方法 |
- 2018-12-30 CN CN201811648826.3A patent/CN111382648A/zh active Pending
- 2019-12-12 WO PCT/CN2019/124928 patent/WO2020140723A1/zh active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110310237A1 (en) * | 2010-06-17 | 2011-12-22 | Institute For Information Industry | Facial Expression Recognition Systems and Methods and Computer Program Products Thereof |
CN106372621A (zh) * | 2016-09-30 | 2017-02-01 | 防城港市港口区高创信息技术有限公司 | 基于人脸识别的疲劳驾驶检测方法 |
CN108460345A (zh) * | 2018-02-08 | 2018-08-28 | 电子科技大学 | 一种基于人脸关键点定位的面部疲劳检测方法 |
CN108958488A (zh) * | 2018-07-20 | 2018-12-07 | 汪若海 | 一种人脸指令识别方法 |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112069954A (zh) * | 2020-08-26 | 2020-12-11 | 武汉普利商用机器有限公司 | 一种活体微表情检测方法及系统 |
CN112069954B (zh) * | 2020-08-26 | 2023-12-19 | 武汉普利商用机器有限公司 | 一种活体微表情检测方法及系统 |
CN112580434A (zh) * | 2020-11-25 | 2021-03-30 | 奥比中光科技集团股份有限公司 | 一种基于深度相机的人脸误检优化方法、系统及人脸检测设备 |
CN112580434B (zh) * | 2020-11-25 | 2024-03-15 | 奥比中光科技集团股份有限公司 | 一种基于深度相机的人脸误检优化方法、系统及人脸检测设备 |
WO2024001539A1 (zh) * | 2022-06-30 | 2024-01-04 | 上海商汤智能科技有限公司 | 说话状态识别方法及模型训练方法、装置、车辆、介质、计算机程序及计算机程序产品 |
Also Published As
Publication number | Publication date |
---|---|
CN111382648A (zh) | 2020-07-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020140723A1 (zh) | 人脸动态表情的检测方法、装置、设备及存储介质 | |
WO2019174439A1 (zh) | 图像识别方法、装置、终端和存储介质 | |
WO2020244032A1 (zh) | 用于检测人脸图像的方法和装置 | |
WO2019128508A1 (zh) | 图像处理方法、装置、存储介质及电子设备 | |
CN108829900B (zh) | 一种基于深度学习的人脸图像检索方法、装置及终端 | |
KR102174595B1 (ko) | 비제약형 매체에 있어서 얼굴을 식별하는 시스템 및 방법 | |
WO2019128507A1 (zh) | 图像处理方法、装置、存储介质及电子设备 | |
CN109472198B (zh) | 一种姿态鲁棒的视频笑脸识别方法 | |
WO2020182121A1 (zh) | 表情识别方法及相关装置 | |
Zhou et al. | Cascaded interactional targeting network for egocentric video analysis | |
WO2020103700A1 (zh) | 一种基于微表情的图像识别方法、装置以及相关设备 | |
WO2020001083A1 (zh) | 一种基于特征复用的人脸识别方法 | |
CN108805047A (zh) | 一种活体检测方法、装置、电子设备和计算机可读介质 | |
CN108363973B (zh) | 一种无约束的3d表情迁移方法 | |
Chen et al. | 3D model-based continuous emotion recognition | |
WO2021175071A1 (zh) | 图像处理方法、装置、存储介质及电子设备 | |
CN102375970A (zh) | 一种基于人脸的身份认证方法和认证装置 | |
CN114241379B (zh) | 一种乘客异常行为识别方法、装置、设备及乘客监控系统 | |
CN110427795A (zh) | 一种基于头部照片的属性分析方法、系统和计算机设备 | |
WO2018103416A1 (zh) | 用于人脸图像的检测方法和装置 | |
Cornejo et al. | Emotion recognition from occluded facial expressions using weber local descriptor | |
WO2020124993A1 (zh) | 活体检测方法、装置、电子设备及存储介质 | |
WO2020244160A1 (zh) | 终端设备控制方法、装置、计算机设备及可读存储介质 | |
JPWO2018078857A1 (ja) | 視線推定装置、視線推定方法及びプログラム記録媒体 | |
Xia et al. | Face occlusion detection using deep convolutional neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19906870 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19906870 Country of ref document: EP Kind code of ref document: A1 |