CN112395922A - Face action detection method, device and system
- Publication number: CN112395922A
- Application number: CN201910760634.XA
- Authority: CN (China)
- Prior art keywords: face, images, target, video, color
- Legal status: Pending
Classifications
- G06V40/161 (Human faces, e.g. facial parts, sketches or expressions: Detection; Localisation; Normalisation)
- G06V20/46 (Scenes and scene-specific elements in video content: Extracting features or characteristics, e.g. video fingerprints, representative shots or key frames)
- G06V40/168 (Human faces: Feature extraction; Face representation)
- G06V40/172 (Human faces: Classification, e.g. identification)
Abstract
The application discloses a facial action detection method, device and system, belonging to the technical field of video monitoring. The method comprises: acquiring a plurality of color face images and corresponding depth face images associated with a current video clip; determining the facial action of a target through a target network model based on the color face images and the corresponding depth face images; and acquiring target video data when the facial action of the target is determined to belong to an abnormal action, wherein the target video data comprises the current video clip and/or the video image corresponding to the facial action belonging to the abnormal action. Because the target network model automatically determines the facial action in any video clip and determines whether that action is abnormal, the target video data containing the abnormal action can be located without manually checking a large number of video clips, which improves the efficiency of determining the video clips in which abnormal actions occur.
Description
Technical Field
The present application relates to the field of video surveillance technologies, and in particular, to a method, an apparatus, and a system for detecting facial movements.
Background
In daily life, a target such as a user may exhibit abnormal behavior, for example illegal or criminal behavior. Such behavior often has a negative effect on social stability, and in order to understand the situation on site when it occurs, the scene in which the abnormal behavior takes place usually needs to be located.
In the related art, the scene in which a certain target exhibits abnormal behavior is usually located through video surveillance: a plurality of video segments containing the target are first screened out of the surveillance video, either by visual inspection or by face recognition, and these segments are then examined manually to find the segment in which the abnormal behavior occurs, such as the segment recording the scene of a crime, so as to locate the scene of the abnormal behavior.
However, because many video segments have to be checked manually, the workload is large and the checking speed is slow, so the efficiency of determining the video segments that contain abnormal behavior is low.
Disclosure of Invention
The application provides a facial motion detection method, a facial motion detection device and a facial motion detection system, which can solve the problem of low efficiency of determining video clips including abnormal behaviors in the related art. The technical scheme is as follows:
in one aspect, a facial motion detection method is provided, the method comprising:
acquiring a plurality of color face images associated with a current video clip and corresponding depth face images;
determining the facial action of the target through a target network model based on the multiple color face images and the corresponding depth face images;
and when the facial action of the target is determined to belong to an abnormal action, acquiring target video data, wherein the target video data comprises the current video clip and/or a video image corresponding to the facial action belonging to the abnormal action.
In a possible implementation manner of the present application, the acquiring a plurality of color face images and corresponding depth face images associated with a current video segment includes:
acquiring a plurality of color video images and corresponding depth video images which are associated with the video clip and comprise the target;
respectively carrying out face detection on the obtained multiple color video images and the corresponding depth video images;
determining a plurality of face color area images and corresponding face depth area images from the plurality of color video images and corresponding depth video images according to a face detection result;
and determining the plurality of face color area images and the corresponding face depth area images as the plurality of color face images and the corresponding depth face images.
In a possible implementation manner of the present application, the determining the plurality of face color region images and the corresponding face depth region images as the plurality of color face images and the corresponding depth face images includes:
respectively carrying out face alignment treatment on the plurality of face color area images and the corresponding face depth area images;
adjusting the sizes of the plurality of face color area images and the corresponding face depth area images after the face alignment processing to be the same;
and taking the plurality of face color area images and the corresponding face depth area images after size adjustment as the plurality of color face images and the corresponding depth face images.
In one possible implementation manner of the present application, the determining, by the target network model, a facial action of the target based on the multiple color face images and the corresponding depth face images includes:
inputting the plurality of color face images and the corresponding depth face images into the target network model, extracting key features of each color face image and the corresponding depth face image through a feature fusion network in the target network model, and fusing the key features to obtain a fusion feature corresponding to each color face image;
and analyzing the obtained multiple fusion features through a multi-frame analysis network in the target network model, and determining the facial action of the target.
In one possible implementation manner of the present application, after determining the facial action of the target, the method further includes:
classifying facial movements of the target;
and determining whether the facial action belongs to abnormal actions according to the classification result.
In a possible implementation manner of the present application, after the obtaining the target video data, the method further includes:
extracting video sub-segments corresponding to the facial actions from other video segments comprising the target or extracting video sub-segments corresponding to action categories to which the facial actions belong;
synthesizing the target video data and the extracted video sub-segments into a video according to the shooting time of the target video data and the extracted video sub-segments and/or the image frame numbers of the target video data and the extracted video sub-segments;
and playing the synthesized video.
In one possible implementation manner of the present application, the method further includes:
acquiring image information of a video image corresponding to the facial action belonging to the specified category;
determining the position of a camera for shooting the facial action belonging to the specified category according to the image information;
and sending the determined position of the camera to a designated terminal, and/or adding the determined position of the camera to the image information and then displaying.
In a possible implementation manner of the present application, the target network model is obtained by training a network model to be trained based on a plurality of face color image samples, corresponding face depth image samples, and actual face action categories of faces in the plurality of face image samples.
In another aspect, there is provided a facial motion detection apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a plurality of color face images and corresponding depth face images which are associated with the current video clip;
the determining module is used for determining the facial action of the target through a target network model based on the plurality of color face images and the corresponding depth face images;
and the second acquisition module is used for acquiring target video data when the facial action of the target is determined to belong to an abnormal action, wherein the target video data comprises the current video clip and/or a video image corresponding to the facial action belonging to the abnormal action.
In one possible implementation manner of the present application, the first obtaining module is configured to:
acquiring a plurality of color video images and corresponding depth video images which are associated with the video clip and comprise the target;
respectively carrying out face detection on the obtained multiple color video images and the corresponding depth video images;
determining a plurality of face color area images and corresponding face depth area images from the plurality of color video images and corresponding depth video images according to a face detection result;
and determining the plurality of face color area images and the corresponding face depth area images as the plurality of color face images and the corresponding depth face images.
In one possible implementation manner of the present application, the first obtaining module is configured to:
respectively carrying out face alignment treatment on the plurality of face color area images and the corresponding face depth area images;
adjusting the sizes of the plurality of face color area images and the corresponding face depth area images after the face alignment processing to be the same;
and taking the plurality of face color area images and the corresponding face depth area images after size adjustment as the plurality of color face images and the corresponding depth face images.
In one possible implementation manner of the present application, the target network model includes a feature fusion network and a multi-frame analysis network, and the determining module is configured to:
inputting the multiple color face images and the corresponding depth face images into the target network model, extracting key features of each face image and the corresponding depth face image through a feature fusion network in the target network model, and fusing to obtain fusion features corresponding to each face image;
and analyzing the obtained multiple fusion features through a multi-frame analysis network in the target network model, and determining the facial action of the target.
In one possible implementation manner of the present application, the determining module is further configured to:
classifying facial movements of the target;
and determining whether the facial action belongs to abnormal actions according to the classification result.
In a possible implementation manner of the present application, the second obtaining module is configured to:
extracting video sub-segments corresponding to the facial actions from other video segments comprising the target or extracting video sub-segments corresponding to action categories to which the facial actions belong;
synthesizing the target video data and the extracted video sub-segments into a video according to the shooting time of the target video data and the extracted video sub-segments and/or the image frame numbers of the target video data and the extracted video sub-segments;
and playing the synthesized video.
In a possible implementation manner of the present application, the second obtaining module is further configured to:
acquiring image information of a video image corresponding to the facial action belonging to the specified category;
determining the position of a camera for shooting the facial action belonging to the specified category according to the image information;
and sending the determined position of the camera to a designated terminal, and/or adding the determined position of the camera to the image information and then displaying.
In a possible implementation manner of the present application, the target network model is obtained by training a network model to be trained based on a plurality of face color image samples, corresponding face depth image samples, and actual face action categories of faces in the plurality of face image samples.
In another aspect, a monitoring system is provided, the monitoring system comprising a processor and at least one camera, the processor being configured to:
acquiring a plurality of color face images and corresponding depth face images which are acquired by the at least one camera and are associated with the current video clip;
determining the facial action of the target through a target network model based on the multiple color face images and the corresponding depth face images;
and when the facial action of the target is determined to belong to an abnormal action, acquiring target video data, wherein the target video data comprises the current video clip and/or a video image corresponding to the facial action belonging to the abnormal action.
In one possible implementation manner of the present application, the processor is further configured to:
when the at least one camera comprises a red-green-blue-depth (RGBD) camera, acquiring a color video image through the RGBD camera and acquiring a corresponding depth video image under infrared illumination; or,
when the at least one camera comprises two red, green and blue (RGB) cameras, the two RGB cameras respectively collect color video images, and corresponding depth video images are determined according to the color video images respectively collected by the two RGB cameras; or,
when the at least one camera comprises an RGB camera and a depth camera, acquiring a color video image through the RGB camera, and acquiring a depth video image through the depth camera.
In another aspect, a control device is provided, which includes a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory complete mutual communication through the communication bus, the memory stores a computer program, and the processor executes the program stored in the memory to implement the steps of the facial motion detection method according to the above aspect.
In another aspect, a computer-readable storage medium is provided, in which a computer program is stored, which, when executed by a processor, implements the steps of the facial motion detection method according to one aspect described above.
In another aspect, a computer program product is provided that comprises instructions which, when run on a computer, cause the computer to perform the steps of the facial motion detection method of one aspect described above.
The technical scheme provided by the application can at least bring the following beneficial effects:
and acquiring a plurality of color face images and corresponding depth face images associated with the current video clip. And inputting the colorful face images and the corresponding depth face images into a target network model, and determining the facial action of the target by the target network model based on the colorful face images and the corresponding depth face images. When the facial motion belongs to an abnormal motion, target video data is acquired. Therefore, for any video clip, the facial action of the target can be automatically determined through the target network model according to the color face image and the depth face image, so that whether the facial action belongs to the abnormal action or not is determined, the target video data with the abnormal action is further positioned, the need of manually checking a plurality of video clips is avoided, the efficiency of determining the video clip with the abnormal action is improved, and the detection accuracy is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a facial motion detection method provided in an embodiment of the present application;
fig. 2 is a schematic diagram of a video image and a face area image according to an embodiment of the present application;
fig. 3 is a schematic diagram of a face region image according to an embodiment of the present application;
fig. 4 is a schematic diagram of another face region image provided in the embodiment of the present application;
fig. 5 is a schematic structural diagram of a facial movement detection apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a control device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Before describing the facial motion detection method provided by the embodiment of the present application in detail, terms and implementation environments related to the embodiment of the present application will be briefly described.
First, terms related to the embodiments of the present application will be briefly described.
Depth image: refers to an image constructed with the distance from the camera to each point in the scene as a pixel value.
Color image: refers to an image comprising pixels each consisting of R, G, B components.
Next, a brief description will be given of an implementation environment related to the embodiments of the present application.
Embodiments of the present application may be directed to an implementation environment that includes a monitoring system that may include a processor and at least one camera. The processor may be configured in the control device, and the at least one camera may be connected to the control device, or may be configured on the control device.
As an example, the at least one camera may include an RGBD (Red Green Blue Depth) camera, and a color video image and a corresponding Depth video image may be obtained by exposing the RGBD camera twice in different manners. For example, the RGBD camera can acquire a color video image under a normal exposure condition, and acquire a corresponding depth video image under an infrared exposure condition.
As another example, the at least one camera may include two RGB (Red Green Blue) cameras, so that image acquisition is performed by the two RGB cameras respectively to obtain two color video images, and the obtained color video images can then be processed to obtain a depth video image.
As another example, the at least one camera may also include an RGB camera that may be used to capture color video images and a depth camera that may be used to capture depth video images. As an example, the depth camera may be monocular structured light, TOF (Time Of flight), binocular vision, and the like.
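By way of illustration only, the following Python sketch shows how a depth map could be recovered from two RGB cameras (the second example above) with OpenCV stereo matching; the file names, matcher parameters and calibration values are assumptions rather than part of the disclosure.

```python
import cv2
import numpy as np

# Assumption: "left.png" / "right.png" are already rectified frames from the two RGB cameras.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; numDisparities and blockSize are illustrative values.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # SGBM output is fixed-point (x16)

# With a known focal length f (pixels) and baseline b (metres), depth = f * b / disparity.
f, b = 700.0, 0.06  # hypothetical calibration values
depth = np.where(disparity > 0, f * b / disparity, 0.0)
```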
As an example, the control device may be a computer, a handheld or pocket PC (PPC), a tablet computer, or the like.
After the terms and implementation environments related to the embodiments of the present application are described, a facial motion detection method provided by the embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a flowchart illustrating a facial motion detection method according to an exemplary embodiment, which may be applied in the above implementation environment, and the facial motion detection method may include the following steps:
step 101: and acquiring a plurality of color face images and corresponding depth face images associated with the current video clip.
The current video clip may be any video clip currently shot in real time, or any one of a plurality of video clips stored in advance. In addition, the monitoring scene in the video clip may be preset by the user according to actual conditions; for example, the monitoring scene may be the withdrawal area of an ATM (Automatic Teller Machine), the entrance/exit area of a subway station, the entrance area of a hospital, and the like.
As an example, the control device may be provided with a user interaction interface, and when a certain target needs to be queried, the user may specify the target through the user interaction interface.
In the embodiment of the application, the control device determines the facial action of the target based on face images of the target. At present, only the color face images obtained from a surveillance video are generally processed, but color face images are easily affected by interference such as illumination and pose, so the results obtained in this way are not accurate enough.
The plurality of color face images and the depth face images may be in a one-to-one relationship, or may be in a many-to-one relationship, or may be in a one-to-many relationship.
In addition, since the facial motion of a target tends to be dynamic, the control device typically needs to determine the facial motion of the target by acquiring multiple color face images and corresponding depth face images.
As an example, a specific implementation of acquiring the plurality of color face images and corresponding depth face images associated with the current video segment may include: obtaining a plurality of color video images and corresponding depth video images that are associated with the video clip and include the target, respectively performing face detection on the obtained color video images and the corresponding depth video images, determining a plurality of face color area images and corresponding face depth area images from the color video images and the corresponding depth video images according to the face detection result, and determining the plurality of face color area images and the corresponding face depth area images as the plurality of color face images and the corresponding depth face images.
As described above, the control device may acquire the plurality of color video images and the corresponding depth video images associated with the video clip through, for example, one RGB camera and one depth camera. The control device then performs face detection on the color video images and the corresponding depth video images respectively, determines the color video images and corresponding depth video images that contain the face of the target, and crops out the region where the face is located according to the face detection result, so as to obtain the face region image corresponding to each color video image and each corresponding depth video image.
For example, as shown in fig. 2, a color video image is subjected to face detection to determine an object in the color video image, as shown in fig. 2(a), an area where a face is located is cut according to a face detection result, as shown in fig. 2(b), and a face color area image is obtained.
As an example, the face detection may be implemented by a single CNN (Convolutional Neural Networks) face detection method, a cascade CNN face detection method, an OpenCV face detection method, and the like, so as to determine whether the image contains a face and whether the face is a target face.
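As a hedged illustration of the OpenCV-based option mentioned above, the sketch below detects a face in a color frame and crops the same region from a depth frame assumed to be pixel-aligned with it; the cascade choice is an assumption, and checking whether the detected face is the target's face (face recognition) is omitted.

```python
import cv2

# Haar-cascade face detector shipped with OpenCV; illustrative choice only.
detector = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face_regions(color_img, depth_img):
    """Detect faces in the color frame and crop the same regions from both frames."""
    gray = cv2.cvtColor(color_img, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    regions = []
    for (x, y, w, h) in faces:
        regions.append((color_img[y:y + h, x:x + w],   # face color region image
                        depth_img[y:y + h, x:x + w]))  # corresponding face depth region image
    return regions
```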
Further, after the plurality of determined face color area images and the corresponding face depth area images are obtained, the sizes of the plurality of face color area images and the corresponding face depth area images can be adjusted to be the same, and the adjusted plurality of face color area images and the corresponding face depth area images are determined as the plurality of face color images and the corresponding face depth images of the target. In addition, the size can be adjusted according to a reference size during adjustment, and the reference size can be preset according to actual requirements.
Further, before the plurality of determined face color area images and the corresponding face depth area images are determined as a plurality of color face images and corresponding depth face images, face alignment processing is respectively performed on the plurality of face color area images and the corresponding face depth area images, the sizes of the plurality of face color area images and the corresponding face depth area images after the face alignment processing are adjusted to be the same, and the plurality of face color area images and the corresponding face depth area images after the size adjustment are used as the plurality of color face images and the corresponding depth face images.
Because face region images may vary in size, and the pose of the face may also differ from one face region image to another, the face region images can be adjusted, for example by face alignment processing and image resizing, in order to determine the facial action of the target more accurately.
The face alignment processing means that the face in each face region image is aligned using the coordinates of key points in the image, that is, face region images with different face poses are normalized so that the poses of the faces in all face region images are as consistent as possible.
For example, for any face region image, the coordinates of key parts such as the two eyes, the nose and the mouth can be determined, and based on these coordinates the face can be aligned through a face alignment algorithm to obtain a frontal face region image. As shown in fig. 3, fig. 3(a) is a face region image without face alignment processing, and fig. 3(b) is the face region image after face alignment processing.
Because the position of the target relative to the camera may differ from frame to frame in the captured video, the obtained face region images may have different sizes. To ensure that the judgment of the facial action category of the target is accurate, the face region images after the face alignment processing can be resized so that they all have exactly the same size. Standard data for a face region image, such as a standard length, a standard width and standard key-point coordinates, can be preset; this data can be adjusted according to actual conditions, and all face region images are adjusted according to the set standard data.
As an example, as shown in fig. 4, the width of each face region image is adjusted to a standard width, the height of each face region image is adjusted to a standard height, and the coordinates of the key points in each face region image are adjusted to standard key point coordinates, where fig. 4(a) and 4(b) are images after face alignment processing, and after image size adjustment, the images all become images with the same size in fig. 4 (c).
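A minimal sketch of the alignment and resizing described above, assuming the eye coordinates have already been located and using a simple rotation-based alignment (one of several possible face alignment algorithms); the reference size is a hypothetical value, and the same transform would be applied to the corresponding face depth region image.

```python
import cv2
import numpy as np

REF_SIZE = (112, 112)  # hypothetical reference width and height

def align_and_resize(face_img, left_eye, right_eye):
    """Rotate the face region so the eyes are horizontal, then resize it to the reference size."""
    dy = right_eye[1] - left_eye[1]
    dx = right_eye[0] - left_eye[0]
    angle = np.degrees(np.arctan2(dy, dx))
    center = ((left_eye[0] + right_eye[0]) / 2.0, (left_eye[1] + right_eye[1]) / 2.0)
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)
    h, w = face_img.shape[:2]
    aligned = cv2.warpAffine(face_img, rot, (w, h))
    return cv2.resize(aligned, REF_SIZE)
```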
Step 102: and determining the facial action of the target through a target network model based on the plurality of color face images and the corresponding depth face images.
The facial actions may include smiling, frowning, eyes opening, etc., and the facial actions may generally indicate the mood of the target, so the facial actions of the target may be determined prior to subsequent processing.
As an example, when the target network model includes a feature fusion network and a multi-frame analysis network, determining the facial action of the target through the target network model based on the plurality of color face images and the corresponding depth face images includes: inputting the plurality of color face images and the corresponding depth face images into the target network model, extracting key features of each color face image and the corresponding depth face image through the feature fusion network in the target network model and fusing them to obtain the fusion feature corresponding to each color face image, then analyzing and processing the obtained plurality of fusion features through the multi-frame analysis network in the target network model and determining the facial action of the target.
Feature extraction is performed on the color face image and the corresponding depth face image separately, but a recognition result obtained by analyzing only the depth features, or only the color features, is not accurate enough. To obtain a more accurate recognition result, the extracted color features and depth features need to be fused, and the subsequent processing is performed on the resulting fusion feature; for this reason, the target network model includes a feature fusion network.
The control device can perform convolution operations on the color face image and the corresponding depth face image through the feature fusion network in the target network model, extract the color features in the color face image and the depth features in the corresponding depth face image, concatenate the color features and the depth features, and extract the fusion feature from the concatenated features through a fully connected layer. The feature fusion can be implemented with a neural network, which learns to automatically select and fuse the key features among the color features and depth features so as to achieve the best judgment result.
For example, when facial actions are determined, the most informative parts are usually the eyes, the mouth, the eyebrows and so on, whose different movements represent a rich variety of facial actions; by comparison, the nose moves little and can be analyzed as little as possible, which speeds up the judgment and improves its accuracy. That is, the color and depth features of the eye, mouth and eyebrow regions can be selected as the key features.
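The patent does not fix a concrete architecture, but a feature fusion branch of the kind described above could look like the following sketch: one small convolutional branch per modality, concatenation, and a fully connected layer that outputs the fusion feature. Channel and feature sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeatureFusionNet(nn.Module):
    """Per-frame feature fusion sketch: a CNN for the color face image, a CNN for the
    depth face image, concatenation of the two feature vectors, and a fully connected
    layer producing the fusion feature."""
    def __init__(self, fused_dim=256):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.color_branch = branch(3)   # 3-channel color face image
        self.depth_branch = branch(1)   # 1-channel depth face image
        self.fuse = nn.Linear(64 + 64, fused_dim)

    def forward(self, color_img, depth_img):
        color_feat = self.color_branch(color_img)
        depth_feat = self.depth_branch(depth_img)
        return self.fuse(torch.cat([color_feat, depth_feat], dim=1))  # fusion feature
```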
The control device obtains a plurality of fusion features through the feature fusion network in the target network model, and then inputs these fusion features into the multi-frame analysis network, which determines the facial action of the target by analyzing facial action units.
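Likewise, the multi-frame analysis network is not pinned to a specific architecture in the text; the sketch below uses an LSTM as one possible way to aggregate the per-frame fusion features into facial-action scores, with dimensions chosen purely for illustration.

```python
import torch
import torch.nn as nn

class MultiFrameAnalysisNet(nn.Module):
    """Multi-frame analysis sketch: a recurrent layer aggregates the per-frame fusion
    features and a linear classifier outputs facial-action (action-unit) scores."""
    def __init__(self, fused_dim=256, hidden_dim=128, num_actions=20):
        super().__init__()
        self.temporal = nn.LSTM(fused_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_actions)

    def forward(self, fusion_features):          # shape: (batch, num_frames, fused_dim)
        _, (h_n, _) = self.temporal(fusion_features)
        return self.classifier(h_n[-1])          # logits over facial-action categories
```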
For example, a facial action unit may be used to indicate the motion of a single region of the face, such as the mouth corner, the eyelid or the philtrum. Please refer to Table 1, which lists definitions of facial action units (AUs) according to an exemplary embodiment; combinations of different facial action units can represent different facial actions.
TABLE 1
AU | Definition | AU | Definition
AU1 | Inner brow corner raised | AU14 | Mouth corner tightened
AU2 | Outer brow corner raised | AU15 | Mouth corner pulled down
AU4 | Frown | AU16 | Lower lip pulled down
AU5 | Upper eyelid raised | AU17 | Lower lip pushed up
AU6 | Cheek raised and outer ring of the orbicularis oculi tightened | AU20 | Mouth corner stretched
AU7 | Eyelids tightened | AU23 | Lips tightened
AU9 | Nose wrinkled | AU24 | Lips pressed together firmly
AU10 | Upper lip raised | AU25 | Lips parted
AU11 | Skin of the philtrum region pulled up | AU26 | Mouth opened
AU12 | Mouth corner pulled up obliquely | AU32 | Lip bite
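For illustration, the mapping from action-unit combinations to facial actions could be expressed as a simple lookup; the combinations below are common FACS-style examples chosen for demonstration, not combinations prescribed by the patent.

```python
# Illustrative mapping from combinations of the action units in Table 1 to facial actions.
AU_TO_ACTION = {
    frozenset({"AU6", "AU12"}): "smile",                   # cheek raised + mouth corners pulled up
    frozenset({"AU1", "AU2", "AU5", "AU26"}): "surprise",  # brows and upper lids raised, mouth opened
    frozenset({"AU4", "AU5", "AU7", "AU23"}): "anger",     # frown, raised/tightened lids, lips tightened
    frozenset({"AU4", "AU15"}): "sadness",                 # frown + mouth corners pulled down
}

def decode_action(detected_aus):
    """Return the first facial action whose defining action units are all present."""
    detected = set(detected_aus)
    for combo, action in AU_TO_ACTION.items():
        if combo <= detected:
            return action
    return "neutral"
```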
Further, the target network model is obtained by training the network model to be trained based on a plurality of face color image samples, corresponding face depth image samples and actual face action types of faces in the plurality of face image samples.
If the target network model processed and analyzed the plurality of color face images and corresponding depth face images using only its initial model parameters, the recognition result of the facial action of the target might not be accurate enough; to determine the facial action of the target accurately, the target network model therefore needs to be trained first. As an example, a plurality of face color image samples and corresponding face depth image samples may be prepared in advance, selecting samples that represent different facial actions. The face color image samples and corresponding face depth image samples captured over a continuous period of time are then grouped into one set of face image samples, that is, each group of face image samples corresponds to one video segment.
The actual facial actions of the face color image samples and corresponding face depth image samples are determined, and the samples are input into the network model to be trained. The network model to be trained analyzes the samples based on its initial model parameters and outputs a facial action recognition result; this result is compared with the actual facial action, and if it is wrong, the model parameters are adjusted. Training continues until, after a large number of sample groups (for example 1000 groups) have been input, the accuracy of the facial action recognition results is high (for example greater than or equal to 95%), at which point the network model can be considered trained, and the trained network model obtained at this point is determined to be the target network model. In this way, the target network model can be used to detect the facial action of an arbitrary target based on a plurality of color face images and corresponding depth face images of that target.
Further, the number of face image samples used for training may be limited, for example, N groups may be selected, and correspondingly, when the face image of the target is detected, N groups of face images may be acquired.
Further, the face image samples used for training may also be resized to the same size, for example, according to the above-mentioned reference size.
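A minimal training-loop sketch of the procedure described above, under the stated assumptions that the model wraps the feature fusion and multi-frame analysis networks, that the data set yields (color_frames, depth_frames, action_label) per group of face image samples, and that training stops once the 95% accuracy threshold is reached; the optimizer, loss, batch size and epoch count are illustrative choices.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, train_set, epochs=50, target_accuracy=0.95):
    """Train until the facial-action recognition accuracy reaches the target threshold."""
    loader = DataLoader(train_set, batch_size=8, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        correct = total = 0
        for color_frames, depth_frames, labels in loader:
            logits = model(color_frames, depth_frames)
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            correct += (logits.argmax(dim=1) == labels).sum().item()
            total += labels.numel()
        if total and correct / total >= target_accuracy:  # stop once the threshold is reached
            break
    return model
```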
Further, after determining the facial action of the target, the method further includes: and classifying the facial action of the target, and determining whether the facial action belongs to abnormal action according to the classification result.
Illustratively, facial actions can be classified into different categories according to different classification rules. For example, when facial actions are classified according to the emotion of the target, actions indicating that the target is happy, such as smiling, laughing and relaxed eyebrows, can be placed in a happiness category, while actions indicating that the target is angry, such as frowning, glaring and a drooping lower lip, can be placed in an anger category; in this way, facial actions can be classified into categories such as anger, happiness, fear and tension according to the emotion of the target.
In this embodiment, whether the facial action belongs to an abnormal action can be determined according to the classification result of the facial action. The abnormal action can be set by the user according to actual conditions. In general, because the various facial action categories are correlated, for example the anger and tension categories both represent negative emotions, the abnormal action is not limited to a single category: it may cover one category of facial actions or several categories. For example, the abnormal action may be defined as the anger and tension categories together, or as the anger category alone.
When the classification result of the facial motion is that the facial motion belongs to the facial motion category included in the abnormal motion, the facial motion is determined to belong to the abnormal motion. When the classification result of the facial motion is that the facial motion does not belong to the facial motion category included in the abnormal motion, the facial motion is determined not to belong to the abnormal motion.
For example, when angry facial actions are set as abnormal actions, if the target network model determines that the facial action of the target is glaring, the glaring action is classified, the classification result is that it belongs to the anger category, and it can therefore be determined that the facial action belongs to an abnormal action.
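The classification-based check can be reduced to a membership test, as in the following sketch; the action names and category assignments are illustrative settings rather than values taken from the patent.

```python
# Which emotion categories count as abnormal is a user-configurable, illustrative setting.
ABNORMAL_CATEGORIES = {"anger", "tension"}

ACTION_TO_CATEGORY = {   # hypothetical mapping from detected facial action to emotion category
    "smile": "happiness",
    "frown": "anger",
    "glare": "anger",
    "lips_tightened": "tension",
}

def is_abnormal(facial_action):
    """A facial action is abnormal if its category is one of the configured abnormal categories."""
    return ACTION_TO_CATEGORY.get(facial_action) in ABNORMAL_CATEGORIES
```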
Step 103: when it is determined that the facial action of the target belongs to an abnormal action, target video data is acquired, where the target video data includes the current video clip and/or the video image corresponding to the facial action belonging to the abnormal action.
That is, the target video data may include a video image corresponding to the abnormal motion, may also include a video clip to which the video image corresponding to the abnormal motion belongs, and may also include a video image corresponding to the abnormal motion and a video clip to which the video image belongs.
Further, after the target video data is acquired, video sub-segments corresponding to the facial actions may be extracted from other video segments including the target, or video sub-segments corresponding to action categories to which the facial actions belong may be extracted. And synthesizing the target video data and the extracted video sub-segments into a video according to the shooting time of the target video data and the extracted video sub-segments and/or the image frame number of the target video data and the extracted video sub-segments, and playing the synthesized video.
That is, if the facial action of the target belongs to an abnormal action, it is very likely that, when the video is reviewed later, all video data showing the same facial action will need to be retrieved. Therefore, after the target video data is acquired, video sub-segments containing the facial action can be extracted from other video segments, or video sub-segments corresponding to the action category to which the facial action belongs can be extracted, and the extracted video sub-segments and the acquired target video data are then synthesized and played.
Further, since one type of facial motion indicates the same emotion, it may be considered that all the facial motions in the motion category to which the facial motion belongs belong to abnormal motions, and in some embodiments, there may be a need to retrieve video data of all the motion categories to which the facial motion belongs. Therefore, after the target video data is acquired, video sub-segments corresponding to motion categories to which the facial motion belongs can be extracted from other video segments, and then the extracted video sub-segments and the acquired target video data are synthesized and played.
Therefore, the video data belonging to abnormal actions are synthesized into the video, so that a user can conveniently and quickly locate the abnormal video, a large number of video segments are prevented from being played and checked one by one, and the video searching efficiency is improved.
During synthesis, the target video data and the extracted video sub-segments can be combined in order of shooting time according to their shooting times. Alternatively, they can be combined in shooting order according to their image frame numbers. Alternatively, the target video data and the extracted video sub-segments can be combined according to both their shooting times and their image frame numbers.
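A sketch of the synthesis step ordered by shooting time (the first of the three options above); the segment representation, codec and frame rate are assumptions made for illustration only.

```python
import cv2

def synthesize(segments, out_path="abnormal_actions.mp4", fps=25):
    """Order the target video data and extracted sub-segments by shooting time and write
    them into one video. Each segment is assumed to be a (shooting_time, frames) pair
    whose frames all share a single size."""
    segments = sorted(segments, key=lambda seg: seg[0])        # order by shooting time
    height, width = segments[0][1][0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (width, height))
    for _, frames in segments:
        for frame in frames:
            writer.write(frame)
    writer.release()
    return out_path
```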
Further, image information of a video image corresponding to the facial action belonging to the specified category is acquired, the position of a camera for shooting the facial action belonging to the specified category is determined according to the image information, the determined position of the camera is sent to a specified terminal, and/or the determined position of the camera is added to the image information and then displayed.
The image information may include, but is not limited to, information about the camera, such as the number of the camera and the position of the camera. As an example, the position of the camera may be expressed as coordinates or as specific address information, such as "south section of Street 1, Block 1" or "Gate 1 of Mall 1".
The designated terminal may be a security device, a device held by the police, or an alarm device.
The image information of the video image corresponding to the facial action belonging to the abnormal action, that is, the number and/or position information of the camera that captured the video image, is acquired, and the position of the camera is determined, so that the location where the abnormal action occurred can be determined. After the position of the camera is determined, it can be sent to a designated terminal for display, added to the image information for display, or both. For example, a single piece of location information such as "the shooting location is Gate 1 of Mall 1" may be displayed on the designated terminal.
Furthermore, the image information added with the position information of the camera can be displayed on the video image with the abnormal action, so that the position of the abnormal action can be known according to the video image corresponding to the abnormal action.
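The position lookup, notification and overlay described above could be combined as in the following sketch; the camera registry and the `send_to_terminal` transport are hypothetical placeholders, not components specified by the patent.

```python
import cv2

CAMERA_POSITIONS = {"cam_01": "Gate 1 of Mall 1"}   # hypothetical camera registry

def report_abnormal_frame(image_info, frame, send_to_terminal):
    """Look up the camera position from the image information, push it to a designated
    terminal, and overlay it on the video image showing the abnormal action."""
    position = CAMERA_POSITIONS.get(image_info.get("camera_id"), "unknown")
    send_to_terminal(position)                       # e.g. a message to a security device
    cv2.putText(frame, "shooting location: " + position, (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
    return frame
```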
In the embodiment of the application, a plurality of color face images and corresponding depth face images associated with the current video clip are obtained. The color face images and the corresponding depth face images are input into a target network model, and the target network model determines the facial action of the target based on them. When the facial action belongs to an abnormal action, target video data is acquired. Therefore, for any video clip, the facial action of the target can be automatically determined through the target network model from the color face images and depth face images, so as to determine whether the facial action belongs to an abnormal action and thereby locate the target video data in which an abnormal action occurs. This avoids manually checking a large number of video clips, improves the efficiency of finding video clips in which abnormal actions occur, and improves detection accuracy.
Fig. 5 is a schematic structural diagram illustrating a facial motion detection apparatus according to an exemplary embodiment; the facial motion detection apparatus may be implemented by software, hardware, or a combination of both. The facial motion detection apparatus may include:
a first obtaining module 510, configured to obtain a plurality of color face images and corresponding depth face images associated with a current video segment;
a determining module 520, configured to determine a facial action of the target through a target network model based on the multiple color face images and the corresponding depth face images;
the second obtaining module 530, when it is determined that the facial motion of the target belongs to the abnormal motion, obtains target video data, where the target video data includes the current video segment and/or a video image corresponding to the facial motion belonging to the abnormal motion.
In a possible implementation manner of the present application, the first obtaining module 510 is configured to:
acquiring a plurality of color video images and corresponding depth video images which are associated with the video clip and comprise the target;
respectively carrying out face detection on the obtained multiple color video images and the corresponding depth video images;
determining a plurality of face color area images and corresponding face depth area images from the plurality of color video images and corresponding depth video images according to a face detection result;
and determining the plurality of face color area images and the corresponding face depth area images as the plurality of color face images and the corresponding depth face images.
In a possible implementation manner of the present application, the first obtaining module 510 is configured to:
respectively carrying out face alignment treatment on the plurality of face color area images and the corresponding face depth area images;
adjusting the sizes of the plurality of face color area images and the corresponding face depth area images after the face alignment processing to be the same;
and taking the plurality of face color area images and the corresponding face depth area images after size adjustment as the plurality of color face images and the corresponding depth face images.
In a possible implementation manner of the present application, the target network model includes a feature fusion network and a multi-frame analysis network, and the determining module 520 is configured to:
inputting the multiple color face images and the corresponding depth face images into the target network model, extracting key features of each face image and the corresponding depth face image through a feature fusion network in the target network model, and fusing to obtain fusion features corresponding to each face image;
and analyzing the obtained multiple fusion features through a multi-frame analysis network in the target network model, and determining the facial action of the target.
In one possible implementation manner of the present application, the determining module 520 is further configured to:
classifying facial movements of the target;
and determining whether the facial action belongs to abnormal actions according to the classification result.
In a possible implementation manner of the present application, the second obtaining module 530 is configured to:
extracting video sub-segments corresponding to the facial actions from other video segments comprising the target or extracting video sub-segments corresponding to action categories to which the facial actions belong;
synthesizing the target video data and the extracted video sub-segments into a video according to the shooting time of the target video data and the extracted video sub-segments and/or the image frame numbers of the target video data and the extracted video sub-segments;
and playing the synthesized video.
In a possible implementation manner of the present application, the second obtaining module 530 is further configured to:
acquiring image information of a video image corresponding to the facial action belonging to the specified category;
determining the position of a camera for shooting the facial action belonging to the specified category according to the image information;
and sending the determined position of the camera to a designated terminal, and/or adding the determined position of the camera to the image information and then displaying.
In a possible implementation manner of the present application, the target network model is obtained by training a network model to be trained based on a plurality of face color image samples, corresponding face depth image samples, and actual face action categories of faces in the plurality of face image samples.
In the embodiment of the application, a plurality of color face images and corresponding depth face images associated with the current video clip are obtained. The color face images and the corresponding depth face images are input into a target network model, and the target network model determines the facial action of the target based on them. When the facial action belongs to an abnormal action, target video data is acquired. Therefore, for any video clip, the facial action of the target can be automatically determined through the target network model from the color face images and depth face images, so as to determine whether the facial action belongs to an abnormal action and thereby locate the target video data in which an abnormal action occurs. This avoids manually checking a large number of video clips, improves the efficiency of finding video clips in which abnormal actions occur, and improves detection accuracy.
It should be noted that: in the face motion detection apparatus provided in the foregoing embodiment, when implementing the face motion detection method, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the facial motion detection apparatus provided by the above embodiment and the facial motion detection method embodiment belong to the same concept, and specific implementation processes thereof are detailed in the method embodiment and are not described herein again.
Fig. 6 is a schematic structural diagram of a control device 600 according to an embodiment of the present application. The control device 600 may vary considerably in configuration and performance, and may include one or more processors (CPUs) 601 and one or more memories 602, where the memory 602 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 601 to implement the facial motion detection method provided by each of the above method embodiments.
Of course, the control device 600 may further have components such as a wired or wireless network interface, a keyboard, and an input/output interface, so as to perform input and output, and the control device 600 may further include other components for implementing device functions, which are not described herein again.
Embodiments of the present application further provide a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the facial motion detection method provided in the embodiment shown in fig. 1.
Embodiments of the present application also provide a computer program product containing instructions, which when run on a computer, cause the computer to execute the facial motion detection method provided in the embodiment shown in fig. 1.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (11)
1. A facial motion detection method, the method comprising:
acquiring a plurality of color face images associated with a current video clip and corresponding depth face images;
determining the facial action of the target through a target network model based on the multiple color face images and the corresponding depth face images;
and when the facial action of the target is determined to belong to an abnormal action, acquiring target video data, wherein the target video data comprises the current video clip and/or a video image corresponding to the facial action belonging to the abnormal action.
2. The method of claim 1, wherein the obtaining a plurality of color face images and corresponding depth face images associated with a current video clip comprises:
acquiring a plurality of color video images and corresponding depth video images which are associated with the video clip and comprise the target;
respectively carrying out face detection on the obtained multiple color video images and the corresponding depth video images;
determining a plurality of face color area images and corresponding face depth area images from the plurality of color video images and corresponding depth video images according to a face detection result;
and determining the plurality of face color area images and the corresponding face depth area images as the plurality of color face images and the corresponding depth face images.
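For illustration only and not as part of the claim language, the following is a minimal sketch of the region-determination step of claim 2, assuming OpenCV's bundled Haar cascade face detector and assuming the color and depth frames are pixel-registered so that the same bounding box applies to both:

```python
# Hedged sketch: assumes pixel-registered color/depth frames and OpenCV's
# bundled Haar cascade; any face detector could be substituted.
import cv2
import numpy as np

_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_regions(color_frame: np.ndarray, depth_frame: np.ndarray):
    """Return (face color area image, face depth area image) pairs for one frame."""
    gray = cv2.cvtColor(color_frame, cv2.COLOR_BGR2GRAY)
    boxes = _detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    regions = []
    for (x, y, w, h) in boxes:
        regions.append((color_frame[y:y + h, x:x + w],
                        depth_frame[y:y + h, x:x + w]))
    return regions
```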
3. The method of claim 2, wherein determining the plurality of face color region images and corresponding face depth region images as the plurality of color face images and corresponding depth face images comprises:
respectively performing face alignment processing on the plurality of face color area images and the corresponding face depth area images;
adjusting the sizes of the plurality of face color area images and the corresponding face depth area images after the face alignment processing to be the same;
and taking the plurality of face color area images and the corresponding face depth area images after size adjustment as the plurality of color face images and the corresponding depth face images.
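Purely as an illustration of claim 3, the sketch below resizes aligned color and depth regions to a common size; the 112x112 target size is an assumption, and the alignment step is stubbed out (a real implementation would warp the face by facial landmarks such as the eye centers):

```python
# Hedged sketch: SIZE and the identity "alignment" are assumptions.
import cv2
import numpy as np

SIZE = (112, 112)  # assumed common input size

def align_and_resize(face_color: np.ndarray, face_depth: np.ndarray):
    """Align, then resize color/depth face regions to the same size."""
    aligned_color, aligned_depth = face_color, face_depth  # placeholder alignment
    # Nearest-neighbor interpolation avoids blending depth values across edges.
    return (cv2.resize(aligned_color, SIZE, interpolation=cv2.INTER_LINEAR),
            cv2.resize(aligned_depth, SIZE, interpolation=cv2.INTER_NEAREST))
```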
4. The method of claim 1, wherein the target network model comprises a feature fusion network and a multi-frame analysis network, and wherein determining the facial movements of the target based on the plurality of color face images and the corresponding depth face images through the target network model comprises:
inputting the multiple color face images and the corresponding depth face images into the target network model, extracting key features of each color face image and the corresponding depth face image through the feature fusion network in the target network model, and fusing the key features to obtain a fusion feature corresponding to each color face image;
and analyzing the obtained plurality of fusion features through the multi-frame analysis network in the target network model to determine the facial action of the target.
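As a non-authoritative sketch of the two-stage structure named in claim 4 (a per-frame feature fusion network followed by a multi-frame analysis network), written in PyTorch: the layer sizes, the concatenation-based fusion, and the GRU-based temporal analysis are all assumptions, since the claim does not fix a concrete architecture.

```python
# Hedged sketch in PyTorch; all hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class FeatureFusionNet(nn.Module):
    """Extracts per-frame features from a color face and a depth face, then fuses them."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.color_branch = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fuse = nn.Linear(32 + 32, feat_dim)  # simple concatenation-based fusion

    def forward(self, color, depth):          # color: (N,3,H,W), depth: (N,1,H,W)
        f = torch.cat([self.color_branch(color), self.depth_branch(depth)], dim=1)
        return self.fuse(f)                   # (N, feat_dim) fusion feature per frame

class MultiFrameNet(nn.Module):
    """Analyzes the sequence of fused per-frame features and outputs action logits."""
    def __init__(self, feat_dim: int = 128, num_actions: int = 10):
        super().__init__()
        self.gru = nn.GRU(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64, num_actions)

    def forward(self, frame_feats):           # (B, T, feat_dim)
        _, h = self.gru(frame_feats)
        return self.head(h[-1])               # (B, num_actions)
```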
5. The method of claim 4, wherein the determining the facial action of the target is followed by further comprising:
classifying the facial action of the target;
and determining whether the facial action belongs to an abnormal action according to the classification result.
6. The method of claim 5, wherein after the obtaining the target video data, further comprising:
extracting video sub-segments corresponding to the facial actions from other video segments comprising the target or extracting video sub-segments corresponding to action categories to which the facial actions belong;
synthesizing the target video data and the extracted video sub-segments into a video according to the shooting time of the target video data and the extracted video sub-segments and/or the image frame numbers of the target video data and the extracted video sub-segments;
and playing the synthesized video.
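Purely as an illustration of the ordering described in claim 6, the sketch below concatenates the target video data and the extracted sub-segments in order of their shooting time, falling back to frame number; the segment layout is a hypothetical one, not taken from the application:

```python
# Hedged sketch: the (shooting_time, start_frame, frames) segment layout is assumed.
from typing import List, NamedTuple
import numpy as np

class Segment(NamedTuple):
    shooting_time: float      # e.g. a Unix timestamp
    start_frame: int
    frames: List[np.ndarray]

def synthesize(segments: List[Segment]) -> List[np.ndarray]:
    """Concatenate segments ordered by shooting time, then by frame number."""
    ordered = sorted(segments, key=lambda s: (s.shooting_time, s.start_frame))
    video = []
    for seg in ordered:
        video.extend(seg.frames)
    return video
```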
7. The method of claim 1, wherein the method further comprises:
acquiring image information of a video image corresponding to the facial action belonging to the specified category;
determining the position of a camera for shooting the facial action belonging to the specified category according to the image information;
and sending the determined position of the camera to a designated terminal, and/or adding the determined position of the camera to the image information and then displaying.
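For claim 7, a small illustrative sketch of resolving the camera position from the image information and attaching it before display; the camera_id field and the configuration table are assumptions, not taken from the application:

```python
# Hedged sketch: assumes image_info carries a camera_id and that camera
# positions are available from a configuration table; names are illustrative.
from typing import Dict, Optional

CAMERA_POSITIONS: Dict[str, str] = {"cam-01": "Hall entrance, east wall"}  # assumed config

def locate_camera(image_info: Dict[str, str]) -> Optional[str]:
    """Resolve the position of the camera that shot the image."""
    return CAMERA_POSITIONS.get(image_info.get("camera_id", ""))

def annotate(image_info: Dict[str, str]) -> Dict[str, str]:
    """Add the resolved camera position to the image information before display."""
    position = locate_camera(image_info)
    if position is not None:
        image_info = {**image_info, "camera_position": position}
    return image_info
```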
8. The method of claim 1, wherein the target network model is obtained by training a network model to be trained based on a plurality of face color image samples and corresponding face depth image samples, and actual facial motion classes of faces in the plurality of face image samples.
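A hedged sketch of the training step described in claim 8, reusing the two illustrative networks sketched under claim 4; the optimizer, loss function, and batch layout are assumptions:

```python
# Hedged sketch: assumes a loader yielding (color, depth, label) batches where
# color is (B, T, 3, H, W), depth is (B, T, 1, H, W) and label is (B,).
import torch
import torch.nn as nn

def train_epoch(fusion_net, multi_frame_net, loader, optimizer):
    criterion = nn.CrossEntropyLoss()
    for color, depth, label in loader:
        b, t = color.shape[:2]
        feats = fusion_net(color.flatten(0, 1), depth.flatten(0, 1))  # (B*T, D)
        logits = multi_frame_net(feats.view(b, t, -1))                # (B, num_actions)
        loss = criterion(logits, label)       # compare against actual action classes
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```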
9. A facial motion detection apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring a plurality of color face images and corresponding depth face images which are associated with the current video clip;
the determining module is used for determining the facial action of the target through a target network model based on the multiple color face images and the corresponding depth face images;
and the second acquisition module is used for acquiring target video data when the target face action is determined to belong to an abnormal action, wherein the target video data comprises the current video clip and/or a video image corresponding to the target face action belonging to the abnormal action.
10. A monitoring system, comprising a processor and at least one camera, the processor configured to:
acquiring a plurality of color face images and corresponding depth face images which are acquired by the at least one camera and are associated with the current video clip;
determining the facial action of the target through a target network model based on the multiple color face images and the corresponding depth face images;
and when the facial action of the target is determined to belong to an abnormal action, acquiring target video data, wherein the target video data comprises the current video clip and/or a video image corresponding to the facial action belonging to the abnormal action.
11. The monitoring system of claim 10, wherein the processor is further configured to:
when the at least one camera comprises a red, green and blue depth (RGBD) camera, acquiring a color video image through the RGBD camera and acquiring a corresponding depth video image in the presence of infrared light; or,
when the at least one camera comprises two red, green and blue (RGB) cameras, the two RGB cameras respectively collect color video images, and corresponding depth video images are determined according to the color video images respectively collected by the two RGB cameras; or,
when the at least one camera comprises an RGB camera and a depth camera, acquiring a color video image through the RGB camera, and acquiring a depth video image through the depth camera.
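As an illustration of the second option in claim 11 (deriving the depth video image from two RGB cameras), a minimal sketch using OpenCV block matching on a rectified stereo pair; it assumes the frames are already rectified and returns a raw disparity map, since converting to metric depth would additionally require the camera baseline and focal length:

```python
# Hedged sketch: assumes rectified left/right frames from the two RGB cameras.
import cv2
import numpy as np

_stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)

def depth_from_stereo(left_bgr: np.ndarray, right_bgr: np.ndarray) -> np.ndarray:
    """Compute a disparity map as a proxy for the corresponding depth video image."""
    left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)
    # StereoBM returns fixed-point disparity scaled by 16.
    return _stereo.compute(left, right).astype(np.float32) / 16.0
```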
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910760634.XA CN112395922A (en) | 2019-08-16 | 2019-08-16 | Face action detection method, device and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112395922A true CN112395922A (en) | 2021-02-23 |
Family
ID=74603119
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910760634.XA Pending CN112395922A (en) | 2019-08-16 | 2019-08-16 | Face action detection method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112395922A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115082298A (en) * | 2022-07-15 | 2022-09-20 | 北京百度网讯科技有限公司 | Image generation method, image generation device, electronic device, and storage medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150029297A1 (en) * | 2013-07-25 | 2015-01-29 | Lenovo (Beijing) Co., Ltd. | Data Processing Method And Electronic Device |
CN106774856A (en) * | 2016-08-01 | 2017-05-31 | 深圳奥比中光科技有限公司 | Exchange method and interactive device based on lip reading |
CN106778506A (en) * | 2016-11-24 | 2017-05-31 | 重庆邮电大学 | A kind of expression recognition method for merging depth image and multi-channel feature |
CN106919251A (en) * | 2017-01-09 | 2017-07-04 | 重庆邮电大学 | A kind of collaborative virtual learning environment natural interactive method based on multi-modal emotion recognition |
CN107368778A (en) * | 2017-06-02 | 2017-11-21 | 深圳奥比中光科技有限公司 | Method for catching, device and the storage device of human face expression |
CN107368810A (en) * | 2017-07-20 | 2017-11-21 | 北京小米移动软件有限公司 | Method for detecting human face and device |
CN107491726A (en) * | 2017-07-04 | 2017-12-19 | 重庆邮电大学 | A kind of real-time expression recognition method based on multi-channel parallel convolutional neural networks |
CN108171212A (en) * | 2018-01-19 | 2018-06-15 | 百度在线网络技术(北京)有限公司 | For detecting the method and apparatus of target |
WO2019033573A1 (en) * | 2017-08-17 | 2019-02-21 | 平安科技(深圳)有限公司 | Facial emotion identification method, apparatus and storage medium |
CN109376667A (en) * | 2018-10-29 | 2019-02-22 | 北京旷视科技有限公司 | Object detection method, device and electronic equipment |
WO2019042216A1 (en) * | 2017-08-29 | 2019-03-07 | Oppo广东移动通信有限公司 | Image blurring processing method and device, and photographing terminal |
CN109712105A (en) * | 2018-12-24 | 2019-05-03 | 浙江大学 | A kind of image well-marked target detection method of combination colour and depth information |
GB201909300D0 (en) * | 2019-06-28 | 2019-08-14 | Facesoft Ltd | Facial behaviour analysis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110235138B (en) | System and method for appearance search | |
CN106372662B (en) | Detection method and device for wearing of safety helmet, camera and server | |
CN105809144B (en) | A kind of gesture recognition system and method using movement cutting | |
WO2020215552A1 (en) | Multi-target tracking method, apparatus, computer device, and storage medium | |
US8866931B2 (en) | Apparatus and method for image recognition of facial areas in photographic images from a digital camera | |
CN110232369B (en) | Face recognition method and electronic equipment | |
JP5569990B2 (en) | Attribute determination method, attribute determination apparatus, program, recording medium, and attribute determination system | |
CN109299658B (en) | Face detection method, face image rendering device and storage medium | |
CN103079034A (en) | Perception shooting method and system | |
US20070116364A1 (en) | Apparatus and method for feature recognition | |
US10922531B2 (en) | Face recognition method | |
JP6157165B2 (en) | Gaze detection device and imaging device | |
CN111325133A (en) | Image processing system based on artificial intelligence recognition | |
CN106033539A (en) | Meeting guiding method and system based on video face recognition | |
JPWO2008035411A1 (en) | Mobile object information detection apparatus, mobile object information detection method, and mobile object information detection program | |
CN109986553B (en) | Active interaction robot, system, method and storage device | |
JPH1115979A (en) | Face detection and method and device for tracing face | |
US20160140748A1 (en) | Automated animation for presentation of images | |
Putro et al. | Adult image classifiers based on face detection using Viola-Jones method | |
Yuan et al. | Ear detection based on CenterNet | |
CN112395922A (en) | Face action detection method, device and system | |
Tu et al. | Face and gesture based human computer interaction | |
CN112668357A (en) | Monitoring method and device | |
WO2022149784A1 (en) | Method and electronic device for detecting candid moment in image frame | |
KR102194511B1 (en) | Representative video frame determination system and method using same |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | |