WO2020186867A1 - Method and apparatus for detecting gaze area and electronic device - Google Patents

Method and apparatus for detecting gaze area and electronic device

Info

Publication number
WO2020186867A1
WO2020186867A1 (PCT/CN2019/127833)
Authority
WO
WIPO (PCT)
Prior art keywords
gaze
face image
area
line
information
Prior art date
Application number
PCT/CN2019/127833
Other languages
French (fr)
Chinese (zh)
Inventor
黄诗尧 (Huang Shiyao)
王飞 (Wang Fei)
钱晨 (Qian Chen)
Original Assignee
北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to JP2021540793A (JP7244655B2)
Priority to KR1020217022187A (KR20210104107A)
Publication of WO2020186867A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00 - Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/08 - Estimation or calculation of such parameters related to drivers or passengers
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/59 - Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V20/597 - Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 - Eye characteristics, e.g. of the iris
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2540/00 - Input parameters relating to occupants
    • B60W2540/225 - Direction of gaze

Definitions

  • the present disclosure relates to the field of computer vision technology, and in particular to a method, device and electronic equipment for detecting a gaze area.
  • Gaze area detection can play an important role in applications such as intelligent driving, human-computer interaction, and security monitoring.
  • In human-computer interaction, by determining the three-dimensional position of the human eye in space, combined with the three-dimensional sight-line direction, the position of the human gaze point in three-dimensional space can be obtained and output to the machine for further interactive processing.
  • In attention detection, by estimating the gaze direction of the human eye, the person's gaze direction can be judged and the person's area of interest obtained, and it can then be judged whether the person's attention is concentrated.
  • According to a first aspect, a gaze area detection method is provided, comprising: acquiring a face image collected in a predetermined three-dimensional space; performing sight-line detection based on the face image to obtain a sight-line detection result; and using a gaze area classifier trained in advance for the predetermined three-dimensional space to detect, according to the sight-line detection result, the category of the target gaze area corresponding to the face image, wherein the target gaze area belongs to one of multiple classes of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance.
  • According to a second aspect, a gaze area detection device is provided, comprising: an image acquisition module for acquiring a face image collected in a predetermined three-dimensional space; a sight-line detection module for performing sight-line detection based on the face image to obtain a sight-line detection result; and a gaze area detection module configured to use a gaze area classifier trained in advance for the predetermined three-dimensional space to detect, according to the sight-line detection result, the category of the target gaze area corresponding to the face image, wherein the target gaze area belongs to one of multiple classes of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance.
  • A computer-readable storage medium is provided, having a computer program stored thereon; when the computer program is executed by a processor, the processor implements the method according to the above-mentioned first aspect.
  • An electronic device is provided, including a memory and a processor; the memory stores a computer program, and the processor implements the method according to the above-mentioned first aspect when executing the computer program.
  • For changes in the predetermined three-dimensional space, only the corresponding gaze area classifiers need to be trained for the different three-dimensional spaces. Since training the classifier does not require a large amount of data and is relatively fast, this can significantly reduce the time cost and technical difficulty of migrating the gaze area detection method between different three-dimensional spaces (such as the spaces of different car models).
  • Fig. 1 is a flowchart of a gaze area detection method according to an exemplary embodiment of the present disclosure
  • FIG. 2 is a flowchart of a method for training a gaze area classifier for a predetermined three-dimensional space in real time according to an exemplary embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of multiple types of defined gaze regions according to an exemplary embodiment of the present disclosure.
  • FIG. 4 is a flowchart of a method for determining starting point information of a person's line of sight in a face image according to an exemplary embodiment of the present disclosure
  • FIG. 5 is a flowchart of a method for detecting line-of-sight direction information of a person in a face image according to an exemplary embodiment of the present disclosure
  • FIG. 6 is a flowchart of a method for detecting head posture information of a person in a face image according to an exemplary embodiment of the present disclosure
  • Fig. 7 is a flowchart of a method for detecting line-of-sight direction information of a person in a face image based on head posture information according to an exemplary embodiment of the present disclosure
  • FIG. 8A is a flowchart of a method for normalizing a face image to obtain a normalized face image according to an exemplary embodiment of the present disclosure
  • Fig. 8B is a schematic diagram of normalizing an acquired face image according to an exemplary embodiment of the present disclosure.
  • FIG. 9A is a schematic diagram of a classifier outputting a target gaze area category according to an exemplary embodiment of the present disclosure
  • FIG. 9B is a schematic diagram of the classifier outputting the name of the target gaze area according to an exemplary embodiment of the present disclosure.
  • FIG. 10 is a flowchart of a method of training a neural network for detecting a three-dimensional line of sight direction according to an exemplary embodiment of the present disclosure
  • Fig. 11 is a block diagram of a gaze area detecting device according to an exemplary embodiment of the present disclosure.
  • FIG. 12 is a block diagram of a line of sight detection module of the gaze area detection device according to an exemplary embodiment of the present disclosure
  • FIG. 13 is a block diagram of another line of sight detection module of the gaze area detection device according to an exemplary embodiment of the present disclosure
  • FIG. 14 is a block diagram of the eye position detection sub-module in FIGS. 12 and 13 according to an exemplary embodiment of the present disclosure
  • FIG. 15 is a block diagram of another line of sight detection module of the gaze area detection device according to an exemplary embodiment of the present disclosure.
  • FIG. 16 is a block diagram of a posture detection sub-module of the sight line detection module in FIG. 15 according to an exemplary embodiment of the present disclosure
  • FIG. 17 is a block diagram of a direction detection sub-module of the sight line detection module in FIG. 15 according to an exemplary embodiment of the present disclosure
  • FIG. 18 is a block diagram of an image processing unit of the direction detection sub-module in FIG. 17 according to an exemplary embodiment of the present disclosure
  • FIG. 19 is a block diagram of another gaze area detection device according to an exemplary embodiment of the present disclosure.
  • FIG. 20 is a block diagram of another gaze area detecting device according to an exemplary embodiment of the present disclosure.
  • FIG. 21 is a block diagram of another gaze area detection device according to an exemplary embodiment of the present disclosure.
  • FIG. 22 is a block diagram of another gaze area detection device according to an exemplary embodiment of the present disclosure.
  • FIG. 23 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • The terms "first", "second", "third", etc. may be used in this disclosure to describe various information, but the information should not be limited by these terms; these terms are only used to distinguish information of the same type from each other.
  • For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information.
  • The word "if" as used herein can be interpreted as "when", "while", or "in response to".
  • the present disclosure provides a gaze area detection method, which can be applied to scenarios such as intelligent driving, human-computer interaction, and security monitoring. This disclosure will take the gaze area detection method applied to an intelligent driving scene as an example for detailed description.
  • The execution subject involved may include: a computer system and a camera arranged in a predetermined three-dimensional space.
  • the camera set in the predetermined three-dimensional space can send the collected face image data of the user to the aforementioned computer system.
  • The computer system can use an artificial neural network to process the above face image data and detect which part of the predetermined three-dimensional space the user's attention is focused on, that is, detect the user's target gaze area, so that the computer system can output corresponding operation control information, such as instructions for an intelligent driving vehicle, according to the user's target gaze area.
  • The above-mentioned computer system may be deployed in a server, a server cluster, or a cloud platform, or may be the computer system of an electronic device such as a personal computer, a vehicle-mounted device, or a mobile terminal.
  • the aforementioned camera may be a vehicle-mounted device such as a camera in a driving recorder, a camera of a smart terminal, and the like.
  • the above-mentioned smart terminal may include electronic devices such as smart phones, PDAs (Personal Digital Assistants), tablet computers, and vehicle-mounted devices.
  • the camera and the computer system can be independent of each other, while being connected to each other to jointly implement the gaze area detection method provided by the embodiments of the present disclosure.
  • the following uses a computer system as an example to describe in detail the gaze area detection method provided by the present disclosure.
  • Fig. 1 is a flowchart of a gaze area detection method according to an exemplary embodiment of the present disclosure.
  • the method can be executed by a computer system and can be applied to various smart devices (for example, smart vehicles, smart robots, smart home devices, etc.). As shown in Figure 1, the method may include steps 11-13.
  • In step 11, a face image collected in a predetermined three-dimensional space is acquired.
  • the predetermined three-dimensional space is the space of the vehicle.
  • a camera can be fixedly installed in the internal space of the vehicle such as the center console.
  • The camera can collect the face image of the target object, such as the driver, in real time or at a preset period and provide it to the computer system, so that the computer system obtains the collected face image.
  • In step 12, sight-line detection is performed based on the face image to obtain a sight-line detection result.
  • the computer system can perform the line of sight detection of the human eye based on the aforementioned face image, and obtain the line of sight detection result.
  • the line of sight detection is based on analyzing the position and/or direction of the line of sight of the human eye in the face image to obtain the line of sight detection result.
  • the present disclosure does not limit the method of detecting the human eye. That is, the method mentioned in the embodiment of the present disclosure may be used to detect the human eye, or other traditional methods may be used to detect the human eye.
  • the above-mentioned line of sight detection result may include the starting point information and the line of sight direction information of the person in the face image, and may also include information such as the head posture of the person in the face image.
  • In step 13, a gaze area classifier that has been trained in advance for the predetermined three-dimensional space is used to detect the category of the target gaze area corresponding to the face image according to the sight-line detection result.
  • The target gaze area belongs to one of multiple classes of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance. For example, each area that the driver may look at while the vehicle is traveling, such as the front windshield, the rear-view mirror, or other areas in the vehicle, can be used as a defined gaze area.
  • For example, the computer system can input the sight-line detection result into the gaze area classifier pre-trained for a model-M intelligent driving vehicle, thereby detecting the category of the target gaze area corresponding to the above face image, that is, detecting which area of the vehicle the person in the face image, such as the driver, was looking at when the image was collected.
  • The above gaze area classifier for the predetermined three-dimensional space is pre-trained by the computer system based on a training sample set for the predetermined three-dimensional space. The training sample set includes a plurality of gaze feature samples, each of which includes gaze starting point information, gaze direction information, and annotation information of the gaze area category corresponding to the sample; the annotated gaze area category belongs to one of the multiple classes of defined gaze areas into which the predetermined three-dimensional space is divided.
  • In the embodiments of the present disclosure, the three-dimensional space regions that the human eye may pay attention to in the predetermined three-dimensional space are finely divided to obtain multiple classes of defined gaze areas, and a classifier is trained on the training sample set corresponding to each class of defined gaze area to obtain the gaze area classifier for the predetermined three-dimensional space.
  • Subsequently, the gaze area classifier can accurately detect the target gaze area based on the sight-line detection result; the computation is simple and the misjudgment rate of the target gaze area is effectively reduced, thereby providing more accurate information for subsequent operations. A minimal sketch of the classification stage is shown below.
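  • The following is a minimal sketch of step 13 of this pipeline; the 6-D feature layout (a 3-D gaze start point concatenated with a 3-D gaze direction) and the classifier's predict interface are assumptions for illustration, not the patent's fixed format.

```python
import numpy as np

def classify_gaze_area(gaze_start, gaze_dir, classifier):
    """Step 13: map the sight-line detection result (step 12) to one of
    the defined gaze areas of the predetermined three-dimensional space."""
    feature = np.concatenate([np.asarray(gaze_start, dtype=float),
                              np.asarray(gaze_dir, dtype=float)])  # shape (6,)
    # The classifier was trained in advance for this specific space.
    return int(classifier.predict(feature[None, :])[0])
```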
  • It should be noted that the sight-line detection stage corresponding to step 12 is independent of the distribution of the multiple classes of defined gaze areas in the predetermined three-dimensional space, whereas the gaze area detection stage corresponding to step 13 is related to that distribution.
  • For example, the overall space of different vehicle models may differ in size, and the location of the same type of area, such as the glove box, may differ between vehicle spaces; the division into multiple classes of defined gaze areas may therefore also differ between three-dimensional spaces, for example in the number and types of defined gaze areas. Consequently, different gaze area classifiers need to be trained for different three-dimensional spaces; for example, different gaze area classifiers are trained for model-M and model-N cars with different spatial distributions.
  • In contrast, the same sight-line detection method can be used for different vehicle models; only the gaze area classifier needs to be retrained when changing models.
  • Training the gaze area classifier is relatively simple, requires less data, and is fast, so it can significantly reduce the time cost and technical difficulty of migrating the above gaze area detection method between different vehicle models.
  • the above-mentioned gaze area detection method may further include: before step 11, obtaining a gaze area classifier that has been trained for the predetermined three-dimensional space.
  • The following manner 1 or manner 2 may be used to obtain the gaze area classifier trained for the predetermined three-dimensional space.
  • the first way is to train a gaze area classifier for a predetermined three-dimensional space in real time when gaze area detection is required.
  • As shown in FIG. 2, training a gaze area classifier for a predetermined three-dimensional space in real time may include: step 101, inputting the gaze starting point information and gaze direction information of at least one gaze feature sample into the gaze area classifier to be trained to obtain the gaze area category prediction information corresponding to the gaze feature sample; and step 102, adjusting the parameters of the gaze area classifier according to the deviation between the gaze area category prediction information and the annotation information of the gaze area category corresponding to the gaze feature sample, so as to train the gaze area classifier.
  • the aforementioned predetermined three-dimensional space may be the space of a certain model of vehicle.
  • First, determine the fixed position of the camera used to collect face images; for example, fix the camera at the center console to collect the face image of the driver in the driving area. The face images used in the subsequent classifier training phase and detection phase are all collected by the above camera at this fixed position.
  • Then, the gaze areas are divided over different parts of the vehicle, mainly according to the areas the driver needs to pay attention to while driving: multiple classes of defined gaze areas are divided within the vehicle space, and category information is defined separately for each class of defined gaze area.
  • the multiple types of defined gaze areas obtained by dividing the vehicle space may include at least two of the following: left front windshield area, right front windshield area, instrument panel area, interior rearview mirror area, Center console area, left rearview mirror area, right rearview mirror area, visor area, shift lever area, under the steering wheel, co-pilot area, glove box area in front of the co-pilot.
  • FIG. 3 is a schematic diagram of multiple types of defined gaze areas according to an exemplary embodiment of the present disclosure.
  • the following multiple types of defined gaze areas can be determined: left front windshield, right front windshield, instrument panel, interior rearview mirror, center console, left rearview mirror, right rearview mirror, Sun visor, shift lever, mobile phone.
  • Corresponding category information can be preset for each type of defined gaze area, such as a category value represented by a number. The corresponding relationship between the multiple types of defined gaze areas and the preset category values can be shown in Table 1:
  • The category information can also be represented by preset letters, such as A, B, C, ..., J. An illustrative encoding is sketched below.
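  • The following dictionary is one possible encoding of such a correspondence, assuming the ten areas listed above are numbered 1 to 10 in order; the exact numbering of the patent's Table 1 is not reproduced here.

```python
# Illustrative category values for the defined gaze areas (assumed order).
GAZE_AREA_CATEGORIES = {
    1: "left front windshield",
    2: "right front windshield",
    3: "instrument panel",
    4: "interior rearview mirror",
    5: "center console",
    6: "left rearview mirror",
    7: "right rearview mirror",
    8: "sun visor",
    9: "shift lever",
    10: "mobile phone",
}
```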
  • The training sample set may include a plurality of gaze feature samples, where each gaze feature sample includes gaze starting point information, gaze direction information, and label information of the gaze area category corresponding to the sample; the labeled gaze area category belongs to one of the multiple classes of defined gaze areas divided for the predetermined three-dimensional space. How to determine the person's sight-line starting point information and sight-line direction information based on the face image will be described in detail later.
  • The following steps are performed iteratively to train the classifier for the predetermined three-dimensional space: the gaze starting point information and gaze direction information of a gaze feature sample in the training sample set are input into the gaze area classifier to be trained to obtain the gaze area category prediction information corresponding to that sample; the parameters of the gaze area classifier are then adjusted according to the deviation between the gaze area category prediction information and the gaze area category label information of the sample, so as to train the gaze area classifier.
  • The foregoing step 102 may include: obtaining a loss function value according to the difference between the predicted gaze area category and the labeled gaze area category of the same gaze feature sample; when the loss function value meets the preset training termination condition, terminating the training and taking the classifier at the current training stage as the trained classifier; otherwise, if the loss function value does not meet the preset training termination condition, adjusting the parameters of the gaze area classifier based on the loss function value.
  • the loss function is a mathematical expression used to measure the degree of misclassification of training samples by the classifier model during the training process.
  • The loss function value can be computed over the entire training sample set. The larger the loss function value, the greater the probability that the classifier at the current training stage misclassifies; conversely, the smaller the loss function value, the smaller that probability.
  • the aforementioned preset training termination condition is a condition for terminating the training of the gaze area classifier.
  • the foregoing preset training termination condition may be: the loss function value of the preset loss function is less than the preset threshold.
  • Ideally, the aforementioned preset training termination condition is that the loss function value equals 0, which means that all gaze area categories predicted by the current classifier are correct.
  • the above-mentioned preset threshold may be a preset empirical value.
  • Otherwise, the above loss function value can be used to adjust the relevant parameters of the gaze area classifier. The gaze area classifier with updated parameters is then used to iteratively execute steps 101 and 102 until the preset training termination condition is met, yielding the gaze area classifier trained for the predetermined three-dimensional space.
  • the computer system may use algorithms such as support vector machines, naive Bayes, decision trees, random forests, and K-means to train the above-mentioned gaze area classifier.
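  • A minimal training sketch using one of the listed algorithms (a random forest) is given below; the placeholder data, the 6-D feature layout, and the train/test split are illustrative assumptions, not the patent's training protocol.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))       # placeholder gaze features: start point + direction
y = rng.integers(1, 11, size=1000)   # placeholder area labels (categories 1-10)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```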
  • Because the training of the classifier does not require a large amount of data and the training speed is relatively fast, it can significantly reduce the time cost and technical difficulty of migrating the gaze area detection method between different three-dimensional spaces (such as the spaces of different car models).
  • In manner 2, the computer system may store the gaze area classifier trained for each predetermined three-dimensional space, in association with the space identifier of that predetermined three-dimensional space, in a designated storage resource, such as a cloud server, to form a preset gaze area classifier set.
  • the above-mentioned preset gaze area classifier set may include the correspondence between multiple vehicle models and gaze area classifiers, as shown in Table 2:
  • Before performing gaze area detection, the vehicle can automatically download the corresponding target gaze area classifier program, for example the computer program corresponding to the above-mentioned first classifier, from the cloud server according to its own model (for example, M01), so as to quickly realize gaze area detection. A lookup sketch follows.
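  • The following is a minimal sketch of such a classifier-set lookup; the model identifiers, file paths, and the use of joblib serialization are assumptions for illustration.

```python
import joblib  # commonly used to serialize scikit-learn models

# Hypothetical mapping from vehicle-model identifier to a stored classifier.
CLASSIFIER_SET = {
    "M01": "classifiers/m01_gaze_area.joblib",
    "N02": "classifiers/n02_gaze_area.joblib",
}

def load_gaze_area_classifier(space_id: str):
    """Fetch the gaze area classifier trained for the given space identifier."""
    return joblib.load(CLASSIFIER_SET[space_id])
```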
  • The sight-line detection result obtained in the above step 12 includes at least the sight-line starting point information and sight-line direction information of the person in the face image, and may also include the head posture information of the person in the face image.
  • As shown in FIG. 4, steps 1211-1212 may be executed to determine the starting point information of the person's line of sight in the face image.
  • In step 1211, the position of the eyes in the face image is detected.
  • the aforementioned eye position is the position of the human eye in the face image in the actual camera coordinate system.
  • the aforementioned actual camera coordinate system is a spatial rectangular coordinate system determined by the computer system based on the aforementioned camera.
  • the aforementioned camera is a camera that captures the aforementioned human face image in the aforementioned predetermined three-dimensional space, and may be marked as a camera C0.
  • the Z axis of the actual camera coordinate system is the optical axis of the aforementioned camera, and the optical center of the camera lens is the origin of the preset actual camera coordinate system.
  • The horizontal axis (X axis) and the vertical axis (Y axis) of the actual camera coordinate system are parallel to the lens plane of the camera.
  • the computer system can detect the eye position in the face image in any of the following ways:
  • The first way is to detect the eye position based on at least two frames of face images simultaneously collected by at least two cameras for the same target object, such as the above-mentioned driver, where the at least two cameras include the camera that collects the face image to be measured.
  • The second way is to detect the head posture information of the person in the face image, and to detect the position of the eyes in the face image based on the head posture information.
  • For example, the computer system can determine the above head posture information of the driver from a face image taken by one camera, using head posture estimation methods in the related art, such as flexible model methods and geometric methods.
  • Then, the 3D position of the target object's eyes in the preset actual camera coordinate system is acquired based on the head posture information, where the preset actual camera coordinate system is the camera coordinate system determined based on the camera C0.
  • In this way, the 3D position of the human eye can be determined using the face image collected by a single camera, that is, a monocular camera, so the hardware configuration cost of gaze area detection can be reduced.
  • In step 1212, the starting point information of the line of sight of the person in the face image is determined according to the eye position.
  • The eye position detected from the face image in step 1211 may include the position of a single eye of the target object in the face image, such as the driver, or may include the positions of both eyes (that is, the positions of the driver's left and right eyes).
  • The following manner 1 or manner 2 may be used to determine the starting point information of the person's line of sight in the face image.
  • Manner 1: determine the starting point of the person's line of sight in the face image according to the position of a single eye.
  • If the eye positions determined in step 1211 include the positions of both eyes, the sight-line starting point information of the person in the face image can be determined according to the position of either eye.
  • If the eye position determined in step 1211 includes the position of a single eye, the sight-line starting point information of the person in the face image is determined according to that single eye position.
  • Manner 2: the middle position of the two eyes is determined as the sight-line starting point information, where the middle position may be the midpoint of the line connecting the 3D coordinates of the two eyes, or another position on that line.
  • Using the above manner 2 to determine the sight-line starting point of the person in the face image, compared with manner 1, helps eliminate the inaccuracy of the starting point information caused by monocular detection error, thereby improving the accuracy of the sight-line detection result. Both manners are sketched below.
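  • A minimal sketch of the two manners, assuming the eye positions are 3-D coordinates in the actual camera coordinate system; the function names are illustrative.

```python
import numpy as np

def gaze_start_single(eye_pos):
    """Manner 1: use the 3-D position of one detected eye directly."""
    return np.asarray(eye_pos, dtype=float)

def gaze_start_midpoint(left_eye, right_eye):
    """Manner 2: use the midpoint of the line connecting the two eyes."""
    return (np.asarray(left_eye, dtype=float) +
            np.asarray(right_eye, dtype=float)) / 2.0
```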
  • As shown in FIG. 5, steps 1221-1222 can be executed to detect the sight-line direction information of the person in the face image.
  • In step 1221, the head posture information of the person in the face image is detected.
  • For example, the computer system can determine the head posture information of the driver from a face image taken by one camera, using head posture estimation methods in the related art such as the flexible model method and the geometric method.
  • The above-mentioned flexible model method refers to fitting a flexible model, such as the Active Shape Model (ASM), the Active Appearance Model (AAM), or an elastic graph matching model, to the head image and face structure in the image plane.
  • the geometric method refers to the use of the shape of the head and the accurate morphological information of the local feature points of the face, such as the relative positions of the eyes, nose, and mouth, to estimate the head posture.
  • the head posture of a person in the image can be estimated based on a single frame image collected by a monocular camera.
  • As shown in FIG. 6, the head posture information of the person in the face image can be detected by performing steps 1201 to 1202 (that is, step 1221).
  • In step 1201, multiple key points of the face in the face image are detected.
  • For example, the face key points can be detected by edge detection algorithms such as the Roberts operator and the Sobel operator, or by related models such as active contour models (e.g., the Snake model).
  • face key point detection may be performed by a neural network used for face key point detection.
  • A third-party application, such as the Dlib toolkit, can also be used for face key point detection.
  • a preset number (such as 160) of facial key point positions can be detected, which may include the position coordinates of the key points of the face such as the left eye corner, the right eye corner, the nose tip, the left mouth corner, the right mouth corner, and the lower jaw. It is understandable that the number of face key point position coordinates obtained may be different according to different face key point detection methods. For example, using the Dlib toolkit can detect 68 key points on the face.
  • In step 1202, based on the detected face key points and a preset average face model, the head posture information of the person in the face image is determined. A sketch of this step is given below.
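  • A minimal head-pose sketch from detected 2-D face key points and a generic 3-D "average face" model, using OpenCV's PnP solver. The six landmark coordinates below are illustrative values (in millimetres), not the average face model used by the patent.

```python
import cv2
import numpy as np

# Illustrative average-face landmarks: nose tip, chin, left/right eye
# corners, left/right mouth corners (generic values, in mm).
AVERAGE_FACE_3D = np.array([
    [0.0, 0.0, 0.0], [0.0, -63.6, -12.5],
    [-43.3, 32.7, -26.0], [43.3, 32.7, -26.0],
    [-28.9, -28.9, -24.1], [28.9, -28.9, -24.1]], dtype=np.float64)

def estimate_head_pose(landmarks_2d, camera_matrix):
    """Return the head rotation matrix and translation in camera coordinates."""
    dist_coeffs = np.zeros(4)  # assume an undistorted image
    ok, rvec, tvec = cv2.solvePnP(AVERAGE_FACE_3D,
                                  np.asarray(landmarks_2d, dtype=np.float64),
                                  camera_matrix, dist_coeffs,
                                  flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)  # convert rotation vector to 3x3 matrix
    return R, tvec
```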
  • In step 1222, the sight-line direction information of the person in the face image is detected based on the head posture information.
  • a trained neural network may be used to detect the line of sight direction information of the person in the face image.
  • As shown in FIG. 7, step 1222 may include steps 12221 to 12223.
  • In step 12221, the face image is normalized according to the head posture information to obtain a normalized face image.
  • the position of the face area image in the entire image changes randomly, and the posture of the person's head in the image also changes randomly. If the face image directly collected by the camera is used as the sample image when training the above neural network, the training difficulty and training time of the neural network will be increased due to the randomness of the head posture and the image position of the face area.
  • Therefore, each sample image in the training sample set is normalized, so that the normalized sample image is equivalent to image data taken by a virtual camera facing the person's head; the normalized sample images are then used to train the neural network.
  • As shown in FIG. 8A, step 12221 may include steps 12-1 to 12-3.
  • In step 12-1, the head coordinate system of the person in the face image is determined according to the head posture information.
  • the X axis of the head coordinate system is parallel to the line connecting the left and right eye coordinates;
  • the Y axis of the head coordinate system is perpendicular to the X axis in the face plane;
  • the Z axis of the head coordinate system is perpendicular to the face plane;
  • the starting point of the line of sight of the human eye is the origin of the head coordinate system.
  • the computer system detects the head posture information of the target object based on the aforementioned face image, which is equivalent to the computer system predicting the three-dimensional head model of the target object.
  • the three-dimensional head model may represent the posture information of the head of the target object relative to the camera C0 when the camera C0 collects the aforementioned face image.
  • the computer system can determine the head coordinate system of the target object based on the head posture information.
  • the head coordinate system can be expressed as a spatial rectangular coordinate system.
  • the X axis of the head coordinate system may be parallel to the line connecting the 3D position coordinates of the two eyes in the three-dimensional head model.
  • the midpoint of the line of the 3D position coordinates of the two eyes, that is, the starting point of the line of sight of the human eye can be determined as the origin of the head coordinate system.
  • the Y axis of the head coordinate system is perpendicular to the X axis in the face plane.
  • the Z axis of the head coordinate system is perpendicular to the face plane.
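  • The following is a minimal sketch of this head coordinate system construction, assuming the 3-D eye positions and the face-plane normal are available from the predicted head model (all in the actual camera coordinate system).

```python
import numpy as np

def head_coordinate_system(left_eye, right_eye, face_normal):
    """Return the origin and the X/Y/Z axes (rows) of the head coordinate system."""
    left = np.asarray(left_eye, dtype=float)
    right = np.asarray(right_eye, dtype=float)
    origin = (left + right) / 2.0                 # sight-line starting point
    x_axis = (right - left) / np.linalg.norm(right - left)
    z_axis = np.asarray(face_normal, dtype=float)
    z_axis /= np.linalg.norm(z_axis)              # perpendicular to the face plane
    y_axis = np.cross(z_axis, x_axis)             # perpendicular to X, in the face plane
    return origin, np.stack([x_axis, y_axis, z_axis])
```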
  • In step 12-2, the actual camera coordinate system corresponding to the face image is rotated and translated based on the head coordinate system to obtain a virtual camera coordinate system.
  • the Z axis of the virtual camera coordinate system points to the origin of the head coordinate system
  • the X axis of the virtual camera coordinate system and the X axis of the head coordinate system are in the same plane
  • the origin of the virtual camera coordinate system and the origin of the head coordinate system are separated by a predetermined distance in the Z-axis direction of the virtual camera coordinate system.
  • Specifically, the camera can be rotated and translated with reference to the head coordinate system to determine a virtual camera, and the virtual camera coordinate system corresponding to the above-mentioned virtual camera is established based on that virtual camera.
  • the method for establishing the virtual camera coordinate system is similar to the method for establishing the preset actual camera coordinate system, that is, the Z axis of the virtual camera coordinate system is the optical axis of the virtual camera, and the X and Y axes of the virtual camera coordinate system are parallel to The lens plane of the virtual camera; the optical center of the virtual camera lens is the origin of the virtual camera coordinate system.
  • the positional relationship between the virtual camera coordinate system and the head coordinate system meets the following three conditions:
  • Condition 1: the Z axis of the virtual camera coordinate system points to the origin of the head coordinate system;
  • Condition 2: the X axis of the virtual camera coordinate system and the X axis of the head coordinate system are in the same plane;
  • Condition 3: the origin of the virtual camera coordinate system and the origin of the head coordinate system are separated by a predetermined distance in the Z-axis direction of the virtual camera coordinate system.
  • The above process is equivalent to determining a virtual camera by performing the following operations on the camera C0: rotating the camera C0 so that its Z axis points to the starting point of the person's three-dimensional line of sight in the face image, and making the X axis of the camera C0 lie in the same plane as the X axis of the head coordinate system; and then translating the rotated camera C0 along its Z axis so that the distance between the optical center of the lens and the origin of the head coordinate system is a preset length.
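  • A minimal sketch of a virtual camera rotation satisfying conditions 1 and 2 above; the row-vector axis convention is an assumption, and the head X axis must not be parallel to the direction toward the gaze origin.

```python
import numpy as np

def virtual_camera_rotation(gaze_origin, head_x_axis):
    """Rows of the returned matrix are the virtual camera's X/Y/Z axes."""
    z = np.asarray(gaze_origin, dtype=float)
    z /= np.linalg.norm(z)                  # condition 1: Z points at the gaze origin
    y = np.cross(z, np.asarray(head_x_axis, dtype=float))
    y /= np.linalg.norm(y)
    x = np.cross(y, z)                      # condition 2: X coplanar with the head X axis
    return np.stack([x, y, z])
```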
  • The computer system can determine the position transformation relationship between the actual camera coordinate system and the virtual camera coordinate system based on the positional relationship between the actual camera coordinate system and the head coordinate system, together with the positional relationship between the virtual camera coordinate system and the head coordinate system.
  • the virtual camera coordinate system is related to the head posture of the person in the face image. Therefore, different face images may correspond to different virtual camera coordinate systems.
  • In step 12-3, according to the position transformation relationship between the actual camera coordinate system and the virtual camera coordinate system, normalization processing is performed on the face image to obtain the corrected face image.
  • Specifically, the computer system can use the position transformation relationship between the actual camera coordinate system and the virtual camera coordinate system to perform rotation, affine, and scaling transformations on the face image, so as to obtain the corrected face image in the virtual camera coordinate system.
  • FIG. 8B shows a schematic diagram of the normalization processing of an acquired face image according to an exemplary embodiment, where image P0 is the face image collected by the actual vehicle camera C0 for the driver, and image P1 represents the corrected face image in the virtual camera coordinate system obtained after the above normalization processing, which is equivalent to a face image of the driver collected by a virtual camera C1 facing the driver's head.
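  • One common way to realize this warp, sketched below under a planar approximation, is the homography K_virt · R · K_real^-1, where R is the rotation from the actual to the virtual camera coordinate system; the output size is an illustrative choice.

```python
import cv2
import numpy as np

def normalize_face_image(image, K_real, K_virt, R, out_size=(224, 224)):
    """Warp the captured face image into the virtual camera's view."""
    H = K_virt @ R @ np.linalg.inv(K_real)   # plane-induced homography
    return cv2.warpPerspective(image, H, out_size)
```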
  • In step 12222, sight-line direction detection is performed based on the corrected face image to obtain the first detected sight-line direction.
  • the first detected line of sight direction is the three-dimensional line of sight direction information in the virtual camera coordinate system, and may be a three-dimensional direction vector.
  • the normalized face image that has undergone the above-mentioned normalization processing may be input to a trained neural network for detecting the line of sight direction to detect the three-dimensional line of sight information of the person in the above-mentioned corrected face image.
  • The aforementioned neural network for detecting the sight-line direction may be a deep neural network (DNN), such as a convolutional neural network (CNN).
  • In step 12223, coordinate inverse transformation processing is performed on the first detected sight-line direction to obtain the sight-line direction information of the person in the face image.
  • The sight-line direction information detected by the computer system, that is, the first detected sight-line direction, is expressed in the virtual camera coordinate system; the coordinate inverse transformation processing is therefore used to obtain the sight-line direction information in the actual camera coordinate system, as sketched below.
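  • A minimal sketch of this inverse transformation: with R the rotation from the actual to the virtual camera coordinate system, a direction predicted in virtual coordinates maps back through the inverse rotation (the transpose, since R is orthogonal).

```python
import numpy as np

def gaze_to_actual_coords(gaze_dir_virtual, R):
    """Map the first detected sight-line direction back to the actual camera frame."""
    return R.T @ np.asarray(gaze_dir_virtual, dtype=float)
```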
  • step 12 is equivalent to the process of determining the line of sight feature vector of the person in the face image, and the line of sight feature vector includes the start point information and the line of sight direction information of the person in the face image.
  • The artificial neural networks used in this stage, such as the neural network for detecting face key points and the neural network for detecting the sight-line direction, can be applied to different car models and have good transferability.
  • Then, the gaze starting point information and gaze direction information of the person in the face image determined in step 12 can be input into the gaze area classifier trained in advance for the predetermined three-dimensional space, to detect the category of the target gaze area corresponding to the face image.
  • The above step 13 may include: determining the target gaze area information according to the category of the target gaze area, and outputting the target gaze area information.
  • the classifier may output the category of the target gaze area, as shown in FIG. 9A, or directly output the name of the target gaze area, as shown in FIG. 9B.
  • the above-mentioned gaze area detection method may further include: before the above-mentioned step 11, training a neural network for detecting the direction of the line of sight.
  • This step corresponds to the training process of the 3D line of sight direction estimation model. It should be noted that this step and the process of real-time training of the gaze area classifier shown in FIG. 2 can be executed in different computer systems.
  • FIG. 10 is a flowchart of a method of training a neural network for detecting a three-dimensional line of sight direction according to an exemplary embodiment of the present disclosure.
  • the method may include steps 1001-1005.
  • In step 1001, an original sample set containing at least one face sample is determined, where each face sample includes a face image sample and sight-line direction label information.
  • the above-mentioned neural network may be trained by a supervised learning method.
  • Each sample in the sample set used to train the aforementioned neural network may include: input information used for prediction, that is, a face image sample; and the ground truth corresponding to the input information, that is, the sight-line direction information actually measured in the actual camera coordinate system.
  • The above actually measured sight-line direction information is also referred to as sight-line direction label information.
  • In step 1002, head posture information corresponding to each of the face image samples is determined according to the face key points and the average face model.
  • In step 1003, based on the head posture information and the actual camera coordinate system, the normalized face image sample corresponding to each face image sample and the sight-line direction label information in the virtual camera coordinate system are determined.
  • the implementation process of the foregoing step 1002 and step 1003 is similar to the foregoing step 1202 and steps 12-1 to 12-3, respectively, and will not be repeated here.
  • the computer system can convert the above-mentioned line-of-sight direction labeling information into virtual line-of-sight labeling information according to the position transformation relationship from the actual camera coordinate system to the virtual camera coordinate system.
  • In step 1004, each of the normalized face image samples is input into the three-dimensional sight-line direction detection neural network to be trained, to obtain three-dimensional sight-line direction prediction information; in step 1005, the parameters of the neural network are adjusted according to the deviation between the three-dimensional sight-line direction prediction information and the virtual sight-line direction label information, to obtain the neural network for detecting the sight-line direction.
  • The normalized face images processed into the virtual camera coordinate system are used as training sample data, which can reduce the difficulty of neural network training caused by head posture changes and improve the training efficiency of the neural network for detecting the sight-line direction. A training-loop sketch follows.
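  • A minimal training sketch for steps 1004-1005, assuming a dataloader yielding normalized face images with their virtual-coordinate direction labels; the angular (cosine) loss and the optimizer settings are illustrative choices, since the patent does not fix a specific loss or architecture.

```python
import torch
import torch.nn as nn

def angular_loss(pred, target):
    # Penalize the angle between predicted and labeled 3-D sight directions.
    cos = nn.functional.cosine_similarity(pred, target, dim=1)
    return (1.0 - cos).mean()

def train_gaze_network(model, loader, epochs=10, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, gaze_labels in loader:
            opt.zero_grad()
            loss = angular_loss(model(images), gaze_labels)  # step 1005 deviation
            loss.backward()
            opt.step()  # adjust network parameters
    return model
```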
  • the attention monitoring result of the person corresponding to the face image can be determined according to the detection result of the gaze area category.
  • the gaze area category detection result may be the gaze area detection category within a preset time period.
  • For example, the detection result of the gaze area category may be "during the preset time period, the driver's gaze area has always been area 2". Then, if area 2 is the right front windshield, this indicates that the driver is driving attentively; if area 2 is the glove box area in front of the co-pilot, it means that the driver is likely distracted and unable to concentrate.
  • the attention monitoring result may be output, for example, "driving is very attentive” may be displayed in a certain display area in the vehicle.
  • Alternatively, a distraction prompt message may be generated according to the attention monitoring result, prompting the driver to "please concentrate on driving and ensure driving safety" through a prominent display on the screen or a voice prompt.
  • That is, when specific information is output, at least one of the attention monitoring result and the distraction prompt information may be output. An aggregation sketch is given below.
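  • The following sketches one way to aggregate per-frame gaze-area detections over a time window into an attention monitoring result; the road-facing area numbers follow the illustrative Table 1 encoding given earlier, and the distraction threshold is an assumption.

```python
from collections import Counter

ROAD_AREAS = {1, 2}  # left / right front windshield (illustrative encoding)

def monitor_attention(area_detections, distraction_ratio=0.5):
    """area_detections: per-frame gaze-area categories within a preset time period."""
    counts = Counter(area_detections)
    off_road = sum(n for area, n in counts.items() if area not in ROAD_AREAS)
    if off_road > distraction_ratio * max(len(area_detections), 1):
        return "please concentrate on driving and ensure driving safety"
    return "driving is very attentive"
```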
  • The above takes the monitoring of the driver's attention in the intelligent driving application scenario as an example for description.
  • the detection of the gaze area can also have many other uses.
  • vehicle-machine interactive control based on gaze area detection can be performed.
  • Some electronic equipment, such as a multimedia player, can be installed in the vehicle; by detecting the gaze area of a person in the vehicle, the multimedia player can be automatically controlled to start its playback function according to the gaze area detection result.
  • the face image of the person (such as the driver or passenger) in the vehicle is captured by a camera deployed in the vehicle, and the detection result of the gaze area category is detected through a pre-trained neural network.
  • the detection result may be: within a period of time T, the gaze area of the person in the vehicle has been the area where the "gaze on" option on a certain multimedia player in the vehicle is located. According to the above detection result, it can be determined that the person in the vehicle wants to turn on the multimedia player, so that corresponding control instructions can be output to control the multimedia player to start playing.
  • Similarly, the face image of the controlling person can be collected, and the gaze area category detection result can be obtained through a pre-trained neural network.
  • the detection result may be: within a period of time T, the gaze area of the controller has been the area where the "gaze on" option on the smart air conditioner is located. According to the above detection results, it can be determined that the controller wants to start the smart air conditioner, so that a corresponding control command can be output to control the air conditioner to turn on.
  • the present disclosure may also provide embodiments of devices and electronic equipment corresponding to the foregoing method embodiments.
  • FIG. 11 is a block diagram of a gaze area detecting device 1100 according to an exemplary embodiment of the present disclosure.
  • the gaze area detection device 1100 may include an image acquisition module 21, a gaze detection module 22 and a gaze area detection module 23.
  • the image acquisition module 21 is used to acquire a face image collected in a predetermined three-dimensional space.
  • the sight line detection module 22 is configured to perform sight line detection based on the face image to obtain a sight line detection result.
  • the sight line detection result may include the start point information and the sight direction information of the person in the face image.
  • the gaze area detection module 23 is configured to use a gaze area classifier trained in advance for the predetermined three-dimensional space to detect the category of the target gaze area corresponding to the face image according to the line of sight detection result.
  • the target gaze area belongs to one of multiple types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance.
  • As shown in FIG. 12, a sight-line detection module 22 of the gaze area detection device may include: an eye position detection sub-module 221 for detecting the eye position in the face image; and a first starting point information determining sub-module 222 configured to determine the middle position of the two eyes as the sight-line starting point information when the eye position includes the positions of both eyes.
  • As shown in FIG. 13, another sight-line detection module 22 of the gaze area detection device may include: an eye position detection sub-module 221 for detecting the eye position in the face image; and a second starting point information determining sub-module 223 configured to determine the position of either eye as the sight-line starting point information when the eye position includes the positions of both eyes, or to determine the position of the single eye as the sight-line starting point information when the eye position includes the position of a single eye.
  • As shown in FIG. 14, the eye position detection sub-module 221 in FIGS. 12 and 13 may include: a posture detection unit 2211 for detecting the head posture information of the person in the face image; and a position determining unit 2212 configured to determine the position of the eyes in the face image according to the head posture information.
  • As shown in FIG. 15, another sight-line detection module 22 of the gaze area detection device may include: a posture detection sub-module 22-1 for detecting the head posture information of the person in the face image; and a direction detection sub-module 22-2 for detecting the sight-line direction information of the person in the face image based on the head posture information.
  • As shown in FIG. 16, the posture detection sub-module 22-1 in FIG. 15 may include: a key point detection unit 22-11 for detecting multiple face key points in the face image; and a posture determination unit 22-12 configured to determine the head posture information of the person in the face image based on the face key points and a preset average face model.
  • As shown in FIG. 17, the direction detection sub-module 22-2 in FIG. 15 may include: an image processing unit 22-21 configured to normalize the face image according to the head posture information to obtain the corrected face image; a first direction detection unit 22-22 configured to perform sight-line direction detection based on the corrected face image to obtain the first detected sight-line direction; and a direction determining unit 22-23 configured to perform coordinate inverse transformation processing on the first detected sight-line direction to obtain the sight-line direction information of the person in the face image.
  • As shown in FIG. 18, the image processing unit 22-21 in FIG. 17 may include: a head coordinate determination subunit 22-211 for determining the head coordinate system of the person in the face image according to the head posture information; a coordinate transformation subunit 22-212 for rotating and translating the actual camera coordinate system corresponding to the face image based on the head coordinate system to obtain a virtual camera coordinate system; and a processing subunit 22-213 configured to normalize the face image according to the position transformation relationship between the actual camera coordinate system and the virtual camera coordinate system to obtain the corrected face image.
  • the gaze area classifier may be trained in advance based on a training sample set for the predetermined three-dimensional space.
  • the training sample set may include a plurality of gaze feature samples, each of which includes gaze starting point information, gaze direction information, and annotation information of the gaze area category corresponding to the sample, where the annotated gaze area category belongs to one of the multiple categories of gaze areas defined for the predetermined three-dimensional space.
  • FIG. 19 is a block diagram of another gaze area detecting device 1900 according to an exemplary embodiment of the present disclosure. Compared with the gaze area detection device 1100 shown in FIG. 11, the gaze area detection device 1900 may further include a classifier training module 20.
  • the classifier training module 20 may include: a category prediction sub-module 201, configured to input the gaze starting point information and gaze direction information of at least one gaze feature sample into the gaze area classifier to be trained, so as to obtain gaze area category prediction information corresponding to the gaze feature sample; and a parameter adjustment sub-module 202, configured to adjust the parameters of the gaze area classifier based on the deviation between the gaze area category prediction information and the annotation information of the gaze area category corresponding to the gaze feature sample, so as to train the gaze area classifier.
  • FIG. 20 is a block diagram of another gaze area detecting device 2000 according to an exemplary embodiment of the present disclosure. Compared with the gaze area detection device 1100 shown in FIG. 11, the gaze area detection device 2000 may further include a classifier acquisition module 203.
  • the classifier acquisition module 203 may obtain the gaze area classifier corresponding to the space identifier from a preset gaze area classifier set according to the space identifier of the predetermined three-dimensional space.
  • the preset gaze area classifier set may include: gaze area classifiers respectively corresponding to the spatial identifiers of different three-dimensional spaces.
  • the predetermined three-dimensional space may include a vehicle space.
  • the face image may be determined based on the image collected for the driving area in the vehicle space.
  • the multiple types of defined gaze areas obtained by dividing the predetermined three-dimensional space may include at least two of the following: left front windshield area, right front windshield area, instrument panel area, interior rearview mirror area, center console area, left rearview mirror area, right rearview mirror area, sun visor area, shift lever area, area under the steering wheel, front passenger area, and glove box area in front of the front passenger seat.
  • FIG. 21 is a block diagram of another gaze area detecting device 2100 according to an exemplary embodiment of the present disclosure.
  • the gaze area detection device 2100 may further include: an attention monitoring module 24, configured to determine an attention monitoring result of the person corresponding to the face image according to the gaze area category detection result obtained by the gaze area detection module 23; and a monitoring result output module 25, configured to output the attention monitoring result and/or output distraction prompt information according to the attention monitoring result.
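The attention monitoring logic above can be illustrated with a minimal sketch. The set of "safe" gaze areas and the time threshold below are illustrative assumptions, not values taken from the disclosure:

```python
import time
from typing import Optional

# Hypothetical set of gaze areas considered attentive while driving.
SAFE_AREAS = {"left front windshield", "right front windshield",
              "interior rearview mirror", "left rearview mirror",
              "right rearview mirror", "instrument panel"}
DISTRACTION_SECONDS = 2.0  # assumed threshold, not taken from the disclosure

class AttentionMonitor:
    """Tracks how long the driver's gaze has stayed away from safe areas."""

    def __init__(self) -> None:
        self._off_road_since: Optional[float] = None

    def update(self, gaze_area: str, now: Optional[float] = None) -> str:
        """Return 'attentive' or 'distracted' for the latest detection result."""
        now = time.time() if now is None else now
        if gaze_area in SAFE_AREAS:
            self._off_road_since = None
            return "attentive"
        if self._off_road_since is None:
            self._off_road_since = now
        if now - self._off_road_since >= DISTRACTION_SECONDS:
            return "distracted"  # a real module would emit prompt info here
        return "attentive"
```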
  • FIG. 22 is a block diagram of another gaze area detecting device 2200 according to an exemplary embodiment of the present disclosure.
  • the gaze area detection device 2200 may further include: a control instruction determination module 26 for determining a control instruction corresponding to the gaze area category detection result obtained by the gaze area detection module 23;
  • the operation control module 27 is configured to control the electronic device to perform operations corresponding to the control instructions.
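As a rough illustration of the control instruction determination module 26 and the operation control module 27, the mapping below pairs detected gaze area category values with device operations; the specific pairings are invented for the example and are not prescribed by the disclosure:

```python
from typing import Callable, Dict

# Hypothetical mapping from gaze area category values (see Table 1 in the
# description below) to operations performed by the electronic device.
CONTROL_TABLE: Dict[int, Callable[[], None]] = {
    5: lambda: print("wake up center console screen"),  # center console
    3: lambda: print("brighten instrument panel"),      # instrument panel
}

def dispatch(gaze_category: int) -> None:
    """Execute the operation corresponding to the detected gaze category."""
    action = CONTROL_TABLE.get(gaze_category)
    if action is not None:
        action()
```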
  • for relevant details, reference may be made to the corresponding description in the method embodiments.
  • the device embodiments described above are merely illustrative, where the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units.
  • Those of ordinary skill in the art can select some or all of the modules according to actual needs to implement the embodiments of the present disclosure without creative work.
  • FIG. 23 is a block diagram of an electronic device 2300 according to an exemplary embodiment of the present disclosure.
  • the electronic device 2300 may include a processor, an internal bus, a network interface, a memory, and a non-volatile memory.
  • the processor can read the corresponding computer program from the non-volatile memory to run in the memory, thereby logically forming a gaze area detection device that implements the above gaze area detection method.
  • the present disclosure can be provided as a method, device, system, or computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
  • the present disclosure may also provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the gaze area detection method according to any of the foregoing method embodiments.
  • embodiments of the subject matter described herein can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible non-transitory program carrier to be executed by a data processing device or to control the operation of the data processing device.
  • alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal (such as a machine-generated electrical, optical, or electromagnetic signal) that is generated to encode information for transmission to a suitable receiver device for execution by a data processing device.
  • the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • the processes and logic flows described herein can be performed by one or more programmable computers executing one or more computer programs, which perform the corresponding functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by dedicated logic circuitry such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit), and the device can also be implemented as dedicated logic circuitry.
  • Computers suitable for executing computer programs include, for example, general-purpose or special-purpose microprocessors, or any other type of central processing unit.
  • the central processing unit will receive instructions and data from a read-only memory and/or random access memory.
  • the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • a computer can include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or the computer can be operatively coupled to such mass storage devices to receive data from them or transfer data to them.
  • the computer can be embedded in another device (such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, a universal serial bus (USB) flash drive, a portable storage device, etc.).
  • computer-readable media suitable for storing computer program instructions and data may include various forms of non-volatile memory, such as semiconductor memory devices (for example, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and flash memory), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, compact disc read-only memory (CD-ROM), digital versatile discs (DVD), etc.
  • the processor and the memory can be supplemented by, or incorporated in, dedicated logic circuitry.


Abstract

A method and apparatus for detecting a gaze area and an electronic device. The method comprises: obtaining a face image collected in a predetermined three-dimensional space (11); performing line-of-sight detection based on the face image to obtain the line-of-sight detection result (12); and using a gaze area classifier trained for the predetermined three-dimensional space in advance to detect the category of a target gaze area corresponding to the face image according to the line-of-sight detection result (13).

Description

Gaze area detection method, device and electronic equipment
Cross-reference to related applications
The present disclosure claims priority to Chinese patent application No. 201910204793.1, filed on March 18, 2019 and entitled "Gaze Area Detection Method, Apparatus and Electronic Device", the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the field of computer vision technology, and in particular to a method, device and electronic equipment for detecting a gaze area.
Background
Gaze area detection can play an important role in applications such as intelligent driving, human-computer interaction, and security monitoring. In human-computer interaction, by determining the three-dimensional position of the human eye in space and combining it with the three-dimensional line-of-sight direction, the position of the person's gaze point in three-dimensional space can be obtained and output to a machine for further interactive processing. In attention detection, by estimating the line-of-sight direction of the human eye, the person's gaze direction can be determined and the person's region of interest obtained, so that it can be judged whether the person's attention is focused.
Summary of the invention
According to a first aspect of the present disclosure, there is provided a gaze area detection method. The method includes: acquiring a face image collected in a predetermined three-dimensional space; performing line-of-sight detection based on the face image to obtain a line-of-sight detection result; and using a gaze area classifier trained in advance for the predetermined three-dimensional space to detect, according to the line-of-sight detection result, the category of the target gaze area corresponding to the face image, where the target gaze area belongs to one of multiple types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance.
According to a second aspect of the present disclosure, there is provided a gaze area detection device. The device includes: an image acquisition module, configured to acquire a face image collected in a predetermined three-dimensional space; a line-of-sight detection module, configured to perform line-of-sight detection based on the face image to obtain a line-of-sight detection result; and a gaze area detection module, configured to use a gaze area classifier trained in advance for the predetermined three-dimensional space to detect, according to the line-of-sight detection result, the category of the target gaze area corresponding to the face image, where the target gaze area belongs to one of multiple types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the processor implements the method according to the first aspect.
According to a fourth aspect of the present disclosure, there is provided an electronic device including a memory and a processor, where the memory stores a computer program and the processor implements the method according to the first aspect when executing the computer program.
According to the embodiments of the present disclosure, when the predetermined three-dimensional space changes, only a corresponding gaze area classifier needs to be trained for each different three-dimensional space. Since training the classifier does not require a large amount of data and is relatively fast, the time cost and technical difficulty of migrating the gaze area detection method between different three-dimensional spaces (such as the spaces of different car models) can be significantly reduced.
Description of the drawings
FIG. 1 is a flowchart of a gaze area detection method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for training a gaze area classifier for a predetermined three-dimensional space in real time according to an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram of multiple types of defined gaze areas according to an exemplary embodiment of the present disclosure;
FIG. 4 is a flowchart of a method for determining the line-of-sight starting point information of a person in a face image according to an exemplary embodiment of the present disclosure;
FIG. 5 is a flowchart of a method for detecting the line-of-sight direction information of a person in a face image according to an exemplary embodiment of the present disclosure;
FIG. 6 is a flowchart of a method for detecting the head posture information of a person in a face image according to an exemplary embodiment of the present disclosure;
FIG. 7 is a flowchart of a method for detecting the line-of-sight direction information of a person in a face image based on head posture information according to an exemplary embodiment of the present disclosure;
FIG. 8A is a flowchart of a method for normalizing a face image to obtain a corrected face image according to an exemplary embodiment of the present disclosure;
FIG. 8B is a schematic diagram of normalizing an acquired face image according to an exemplary embodiment of the present disclosure;
FIG. 9A is a schematic diagram of a classifier outputting the category of a target gaze area according to an exemplary embodiment of the present disclosure;
FIG. 9B is a schematic diagram of a classifier outputting the name of a target gaze area according to an exemplary embodiment of the present disclosure;
FIG. 10 is a flowchart of a method for training a neural network for detecting a three-dimensional line-of-sight direction according to an exemplary embodiment of the present disclosure;
FIG. 11 is a block diagram of a gaze area detection device according to an exemplary embodiment of the present disclosure;
FIG. 12 is a block diagram of a line-of-sight detection module of the gaze area detection device according to an exemplary embodiment of the present disclosure;
FIG. 13 is a block diagram of another line-of-sight detection module of the gaze area detection device according to an exemplary embodiment of the present disclosure;
FIG. 14 is a block diagram of the eye position detection sub-module in FIGS. 12 and 13 according to an exemplary embodiment of the present disclosure;
FIG. 15 is a block diagram of another line-of-sight detection module of the gaze area detection device according to an exemplary embodiment of the present disclosure;
FIG. 16 is a block diagram of the posture detection sub-module of the line-of-sight detection module in FIG. 15 according to an exemplary embodiment of the present disclosure;
FIG. 17 is a block diagram of the direction detection sub-module of the line-of-sight detection module in FIG. 15 according to an exemplary embodiment of the present disclosure;
FIG. 18 is a block diagram of the image processing unit of the direction detection sub-module in FIG. 17 according to an exemplary embodiment of the present disclosure;
FIG. 19 is a block diagram of another gaze area detection device according to an exemplary embodiment of the present disclosure;
FIG. 20 is a block diagram of another gaze area detection device according to an exemplary embodiment of the present disclosure;
FIG. 21 is a block diagram of another gaze area detection device according to an exemplary embodiment of the present disclosure;
FIG. 22 is a block diagram of another gaze area detection device according to an exemplary embodiment of the present disclosure;
FIG. 23 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed description
Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings indicate the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as recited in the appended claims.
The terms used in the present disclosure are for the purpose of describing specific embodiments only, and are not intended to limit the present disclosure. The singular forms "a", "said" and "the" used in the present disclosure are also intended to include plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any one or all possible combinations of one or more associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various pieces of information, this information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein can be interpreted as "at the time of", "when", or "in response to".
The present disclosure provides a gaze area detection method, which can be applied in scenarios such as intelligent driving, human-computer interaction, and security monitoring. The present disclosure will take the application of the gaze area detection method to an intelligent driving scenario as an example for detailed description.
In the embodiments of the present disclosure, the execution subjects involved may include a computer system and a camera arranged in a predetermined three-dimensional space. The camera arranged in the predetermined three-dimensional space can send the collected face image data of a user to the computer system. The computer system can process the face image data using an artificial neural network to detect which part of the predetermined three-dimensional space the user's attention is focused on, that is, to detect the user's target gaze area, so that the computer system can output corresponding operation control information, such as instructions for an intelligent driving vehicle, according to the user's target gaze area.
The above computer system may be deployed in a server, a server cluster, or a cloud platform, or may be a computer system in an electronic device such as a personal computer, a vehicle-mounted device, or a mobile terminal. The above camera may be a camera in a vehicle-mounted device such as a driving recorder, a camera of a smart terminal, and the like. The smart terminal may include electronic devices such as smart phones, PDAs (Personal Digital Assistants), tablet computers, and vehicle-mounted devices. In a specific implementation, the camera and the computer system can be independent of each other while remaining connected, so as to jointly implement the gaze area detection method provided by the embodiments of the present disclosure. The following takes a computer system as an example to describe the gaze area detection method provided by the present disclosure in detail.
FIG. 1 is a flowchart of a gaze area detection method according to an exemplary embodiment of the present disclosure. The method can be executed by a computer system and can be applied to various smart devices (for example, smart vehicles, smart robots, smart home devices, etc.). As shown in FIG. 1, the method may include steps 11 to 13.
In step 11, a face image collected in a predetermined three-dimensional space is acquired.
Taking a vehicle of model M as an example, the predetermined three-dimensional space is the space of the vehicle. A camera can be fixedly installed in the interior space of the vehicle, for example at the center console. The camera can collect face images of a target object such as the driver in real time or at preset time intervals and provide them to the computer system, so that the computer system acquires the collected face images.
In step 12, line-of-sight detection is performed based on the face image to obtain a line-of-sight detection result.
In the embodiments of the present disclosure, the computer system can perform human eye line-of-sight detection based on the above face image to obtain a line-of-sight detection result. Human eye line-of-sight detection analyzes the position of the human eye and/or the line-of-sight direction in the face image to obtain the line-of-sight detection result. The present disclosure does not limit the method of line-of-sight detection: the methods mentioned in the embodiments of the present disclosure may be used, or other conventional methods may be used. The line-of-sight detection result may include the line-of-sight starting point information and line-of-sight direction information of the person in the face image, and may also include information such as the head posture of the person in the face image.
In step 13, a gaze area classifier trained in advance for the predetermined three-dimensional space is used to detect the category of the target gaze area corresponding to the face image according to the line-of-sight detection result.
The target gaze area belongs to one of multiple types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance. For example, the spaces that the driver can look at while the vehicle is traveling, such as the front windshield, the rearview mirrors, or other spaces inside the vehicle, can be used as the predetermined three-dimensional space.
As in the above example, after obtaining the line-of-sight detection result of the person in the face image, the computer system can input the result into the pre-trained gaze area classifier for the model M intelligent driving vehicle, thereby detecting the category of the target gaze area corresponding to the face image, that is, detecting which area of the vehicle the person in the face image, such as the driver, was looking at when the image was collected. A minimal sketch of this inference step follows.
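The sketch below assumes the classifier was trained on 6-dimensional features formed by concatenating the 3D gaze starting point and the 3D gaze direction, and that it was saved beforehand with joblib; the file name and feature layout are assumptions for illustration, not specified by the disclosure:

```python
import numpy as np
import joblib  # pip install joblib

# Hypothetical file produced by the training stage (see the training sketches below).
classifier = joblib.load("gaze_area_classifier_M.joblib")

def detect_gaze_area(origin_3d: np.ndarray, direction_3d: np.ndarray) -> int:
    """Return the predicted gaze area category value for one detection result."""
    feature = np.concatenate([origin_3d, direction_3d]).reshape(1, -1)
    return int(classifier.predict(feature)[0])

# Example: gaze origin in the camera coordinate system (meters) and a unit
# direction vector, both coming from the line-of-sight detection stage.
category = detect_gaze_area(np.array([0.1, -0.05, 0.6]),
                            np.array([0.0, 0.1, -0.99]))
```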
In the present disclosure, the above gaze area classifier for the predetermined three-dimensional space is trained in advance by the computer system based on a training sample set for the predetermined three-dimensional space, where the training sample set includes a plurality of gaze feature samples, each of which includes gaze starting point information, gaze direction information, and annotation information of the gaze area category corresponding to the sample; the annotated gaze area category belongs to one of the multiple types of defined gaze areas divided for the predetermined three-dimensional space.
According to the embodiments of the present disclosure, before the gaze area classifier is trained for the predetermined three-dimensional space, the three-dimensional regions that the human eye may pay attention to in the predetermined three-dimensional space are finely classified to obtain multiple types of defined gaze areas, and the classifier is trained based on the training samples corresponding to each type of defined gaze area, yielding a gaze area classifier for the predetermined three-dimensional space. Subsequently, the gaze area classifier can accurately detect the target gaze area information based on the line-of-sight detection result; the computation is simple and the misjudgment rate of the target gaze area is effectively reduced, thereby providing more accurate information for subsequent operations.
The line-of-sight detection stage corresponding to step 12 is independent of the distribution of the multiple types of defined gaze areas in the predetermined three-dimensional space, while the gaze area detection stage corresponding to step 13 is related to that distribution. For example, since the overall space of different vehicle models may differ in size, and an area of the same category, such as the glove box, may be located differently in different vehicle spaces, the division of the multiple types of defined gaze areas may also differ between three-dimensional spaces, for example in the number and categories of defined gaze areas. Therefore, different gaze area classifiers need to be trained for different three-dimensional spaces; for example, different gaze area classifiers are trained for an M-model car and an N-model car with different spatial layouts.
Therefore, the same method can be used for line-of-sight detection on different vehicle models, and only the gaze area classifier needs to be retrained when the vehicle model changes. Compared with retraining an entire convolutional neural network in an end-to-end manner, training the gaze area classifier is relatively simple, requires much less data, and is fast, so the time cost and technical difficulty of migrating the gaze area detection method between different vehicle models can be significantly reduced.
In another embodiment of the present disclosure, the gaze area detection method may further include: before step 11, obtaining the gaze area classifier trained for the predetermined three-dimensional space. In the present disclosure, the trained gaze area classifier may be obtained in the following manner one or manner two.
Manner one: when gaze area detection is required, the gaze area classifier for the predetermined three-dimensional space is trained in real time.
As shown in FIG. 2, training a gaze area classifier for a predetermined three-dimensional space in real time may include: step 101, inputting the gaze starting point information and gaze direction information of at least one gaze feature sample into the gaze area classifier to be trained, to obtain gaze area category prediction information corresponding to the gaze feature sample; and step 102, adjusting the parameters of the gaze area classifier according to the deviation between the gaze area category prediction information and the annotation information of the gaze area category corresponding to the gaze feature sample, so as to train the gaze area classifier.
For example, the predetermined three-dimensional space may be the space of a vehicle of a certain model. First, the fixed position of the camera used to collect face images can be determined; for example, the camera is fixed at the center console to collect face images of the driver in the driving area, and the face images required in the subsequent classifier training stage and detection stage are all collected by this camera at this fixed position.
At the same time, the gaze areas are divided for different parts of the vehicle, mainly according to the regions that the driver's eyes need to pay attention to during driving; multiple types of defined gaze areas are divided in the vehicle space, and corresponding category information is set for each type of defined gaze area.
In an embodiment of the present disclosure, the multiple types of defined gaze areas obtained by dividing the vehicle space may include at least two of the following: left front windshield area, right front windshield area, instrument panel area, interior rearview mirror area, center console area, left rearview mirror area, right rearview mirror area, sun visor area, shift lever area, area under the steering wheel, front passenger area, and glove box area in front of the front passenger seat.
FIG. 3 is a schematic diagram of multiple types of defined gaze areas according to an exemplary embodiment of the present disclosure. For a vehicle of a preset model, the following multiple types of defined gaze areas can be determined: left front windshield, right front windshield, instrument panel, interior rearview mirror, center console, left rearview mirror, right rearview mirror, sun visor, shift lever, mobile phone. Corresponding category information, such as a numeric category value, can be preset for each type of defined gaze area. The correspondence between the above multiple types of defined gaze areas and the preset category values can be as shown in Table 1:
Table 1

Defined gaze area          Category value
Left front windshield      1
Right front windshield     2
Instrument panel           3
Interior rearview mirror   4
Center console             5
Left rearview mirror       6
Right rearview mirror      7
Sun visor                  8
Shift lever                9
Mobile phone               10
It should be noted that the above category information can also be represented by preset English letters such as A, B, C, ..., J. For illustration, a code representation of Table 1 is sketched below.
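The dictionary below simply transcribes the Table 1 correspondence; nothing in the disclosure prescribes this particular representation:

```python
# Category values from Table 1; keys are the defined gaze areas.
GAZE_AREA_CATEGORIES = {
    "left front windshield": 1,
    "right front windshield": 2,
    "instrument panel": 3,
    "interior rearview mirror": 4,
    "center console": 5,
    "left rearview mirror": 6,
    "right rearview mirror": 7,
    "sun visor": 8,
    "shift lever": 9,
    "mobile phone": 10,
}

# Reverse lookup: classifier output value -> human-readable area name.
CATEGORY_NAMES = {v: k for k, v in GAZE_AREA_CATEGORIES.items()}
```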
Afterwards, face image samples are collected to obtain a training sample set. The training sample set may include a plurality of gaze feature samples, where each gaze feature sample includes gaze starting point information, gaze direction information, and annotation information of the gaze area category corresponding to the sample; the annotated gaze area category belongs to one of the multiple types of defined gaze areas divided for the predetermined three-dimensional space. How the gaze starting point information and gaze direction information of a person are determined based on a face image will be described in detail later.
Then, based on the above training sample set, the classifier for the predetermined three-dimensional space is trained by iteratively performing the following steps: inputting the gaze starting point information and gaze direction information of one gaze feature sample in the training sample set into the gaze area classifier to be trained, to obtain prediction information of the gaze area category corresponding to the gaze feature sample; and adjusting the parameters of the gaze area classifier according to the deviation between the prediction information of the gaze area category and the annotation information of the gaze area category for the gaze feature sample, so as to train the gaze area classifier.
In an exemplary embodiment, the above step 102 may include: obtaining a loss function value according to the difference between the predicted gaze area category value and the annotated gaze area category value of the same gaze feature sample; when the loss function value satisfies a preset training termination condition, terminating the training and determining the classifier of the current training stage as the trained classifier; otherwise, if the loss function value does not satisfy the preset training termination condition, adjusting the parameters of the gaze area classifier based on the loss function value.
In the embodiments of the present disclosure, the loss function is a mathematical expression used during training to measure the degree to which the classifier model misclassifies the training samples. The loss function value can be computed over the entire training sample set: the larger the loss function value, the higher the misclassification probability of the classifier at the current training stage; conversely, the smaller the loss function value, the lower the misclassification probability.
The preset training termination condition is the condition for terminating the training of the gaze area classifier. In one embodiment, the preset training termination condition may be that the loss function value of the preset loss function is less than a preset threshold. Ideally, the training termination condition would be that the loss function value equals 0, meaning that all gaze area categories predicted by the current classifier are correct. In practice, considering the training efficiency and cost of the gaze area classifier, the preset threshold may be a preset empirical value.
As in the above example, if the current loss function value is greater than or equal to the preset threshold, the prediction accuracy of the classifier at the current training stage has not yet reached expectations; therefore, the loss function value can be used to adjust the relevant parameters of the gaze area classifier, and then steps 101 and 102 are iteratively executed with the updated classifier until the preset training termination condition is satisfied, yielding the trained gaze area classifier for the predetermined three-dimensional space. A minimal sketch of this loop follows.
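The sketch below implements steps 101 and 102 with a plain softmax classifier trained by gradient descent, terminating when the cross-entropy loss falls below a preset threshold. The classifier family, learning rate, and threshold value are illustrative assumptions rather than choices made by the disclosure:

```python
import numpy as np

def train_gaze_classifier(features, labels, num_classes,
                          lr=0.1, loss_threshold=0.05, max_iters=10000):
    """features: (N, 6) gaze origin + direction; labels: (N,) category indices."""
    n, d = features.shape
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    one_hot = np.eye(num_classes)[labels]

    for _ in range(max_iters):
        # Step 101: predict gaze area category probabilities for the samples.
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)

        # Cross-entropy loss measures the deviation between prediction and label.
        loss = -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))
        if loss < loss_threshold:                    # preset termination condition
            break

        # Step 102: adjust parameters based on the deviation (gradient step).
        grad = (probs - one_hot) / n
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b
```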
In the embodiments of the present disclosure, the computer system may train the above gaze area classifier using algorithms such as support vector machines, naive Bayes, decision trees, random forests, or K-nearest neighbors (KNN); training with an off-the-shelf implementation is sketched below.
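As an example of one of the classifier families named above, the snippet below fits a scikit-learn support vector machine on 6-dimensional gaze features. scikit-learn and the .npy sample files are assumptions made for the example, not tools named by the disclosure:

```python
import numpy as np
import joblib
from sklearn.svm import SVC  # pip install scikit-learn

# Hypothetical training set: each row is (origin_xyz, direction_xyz);
# each label is a category value from Table 1.
X = np.load("gaze_features.npy")  # shape (N, 6), assumed file
y = np.load("gaze_labels.npy")    # shape (N,), assumed file

clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

# Persist the classifier so the detection stage can load it (see earlier sketch).
joblib.dump(clf, "gaze_area_classifier_M.joblib")
```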
In the embodiments of the present application, when the predetermined three-dimensional space changes, it is only necessary to re-determine the training sample set and train the corresponding gaze area classifier. Since training the classifier does not require a large amount of data and is relatively fast, the time cost and technical difficulty of migrating the gaze area detection method between different three-dimensional spaces (such as the spaces of different car models) can be significantly reduced.
Manner two: when gaze area detection is required, the above gaze area classifier for the predetermined three-dimensional space is obtained directly from a preset storage resource.
In an embodiment of the present disclosure, the computer system may store the gaze area classifier trained for each predetermined three-dimensional space in a designated storage resource, such as a cloud server, in association with the space identifier of that predetermined three-dimensional space, forming a preset gaze area classifier set. In the above intelligent driving application scenario, the preset gaze area classifier set may include the correspondence between multiple vehicle models and gaze area classifiers, as shown in Table 2:
车辆型号Vehicle model 分类器Classifier
M01M01 第一分类器First classifier
M02M02 第二分类器Second classifier
M03M03 第三分类器Third classifier
If the computer system of a new car of a known model (for example, model M01) is not provided with a gaze area classifier program, then before performing gaze area detection, the vehicle can automatically download the corresponding target gaze area classifier program (for example, the computer program corresponding to the above first classifier) from the cloud server according to its own model (for example, M01), so as to quickly implement gaze area detection. A sketch of this lookup follows.
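A minimal sketch of manner two: look up (and, if missing, fetch and cache) the classifier by the space identifier. The URL scheme and file layout are invented for the example:

```python
from pathlib import Path
from urllib.request import urlretrieve

import joblib

# Hypothetical cloud endpoint serving the preset gaze area classifier set.
CLASSIFIER_URL = "https://example.com/gaze-classifiers/{model}.joblib"

def load_classifier_for(model: str, cache_dir: str = "classifiers"):
    """Return the gaze area classifier associated with a vehicle model."""
    path = Path(cache_dir) / f"{model}.joblib"
    if not path.exists():  # download once, then reuse the cached copy
        path.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(CLASSIFIER_URL.format(model=model), path)
    return joblib.load(path)

clf_m01 = load_classifier_for("M01")
```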
In the embodiments of the present disclosure, the line-of-sight detection result obtained in step 12 includes at least the line-of-sight starting point information and line-of-sight direction information of the person in the face image, and may further include the head posture information of the person in the face image.
According to the embodiments of the present disclosure, as shown in FIG. 4, the line-of-sight starting point information of the person in the face image can be determined by performing steps 1211 to 1212.
In step 1211, the position of the eyes in the face image is detected.
In the embodiments of the present disclosure, the above eye position is the position of the human eye in the face image in the actual camera coordinate system. The actual camera coordinate system is a spatial rectangular coordinate system determined by the computer system based on the camera that captures the face image in the predetermined three-dimensional space, which may be denoted camera C0.
The Z axis of the actual camera coordinate system is the optical axis of the camera, and the optical center of the camera lens is the origin of the actual camera coordinate system. The horizontal axis (X axis) and vertical axis (Y axis) of the actual camera coordinate system are parallel to the lens plane of the camera.
In the embodiments of the present disclosure, the computer system may detect the eye position in the face image in either of the following ways. In the first way, based on at least two frames of face images simultaneously collected by at least two cameras for the same target object, such as the above driver, the eye position in the face image is obtained using a camera calibration method, where the at least two cameras include the camera that collects the face image to be measured. In the second way, the head posture information of the person in the face image is detected, and the eye position in the face image is detected based on the head posture information.
In an embodiment of the present disclosure, the computer system can determine the driver's head posture information from a face image captured by one camera, using a head posture estimation method from the related art, such as a flexible model method or a geometric method, and obtain, based on the head posture information, the 3D position of the target object's eyes in the preset actual camera coordinate system, that is, the camera coordinate system determined based on the camera C0.
With the second eye position determination way above, the 3D position of the human eye can be determined from the face image collected by a single camera, that is, a monocular camera, which saves hardware configuration cost for gaze area detection.
In step 1212, the line-of-sight starting point information of the person in the face image is determined according to the eye position.
In the present disclosure, the eye position detected from the face image in step 1211 may include the position of a single eye of the target object in the face image, such as the driver, or may include the positions of both eyes (that is, the positions of the driver's left eye and right eye).
Correspondingly, in the embodiments of the present disclosure, the line-of-sight starting point information of the person in the face image can be determined in the following way one or way two.
Way one: the line-of-sight starting point information of the person in the face image is determined according to the position of a single eye. In one embodiment, if the eye position determined in step 1211 includes the positions of both eyes, the line-of-sight starting point information can be determined according to the position of either eye. In another embodiment, if the eye position determined in step 1211 includes the position of a single eye, the line-of-sight starting point information is determined according to the position of that single eye.
Way two: if the eye position determined in step 1211 includes the positions of both eyes, the middle position between the two eyes is determined as the line-of-sight starting point information, where the middle position may be the midpoint of the line connecting the 3D coordinates of the two eyes, or another position on that line.
In the embodiments of the present disclosure, determining the line-of-sight starting point information in way two, compared with way one, helps eliminate inaccuracy of the starting point information caused by single-eye detection errors, thereby improving the accuracy of the line-of-sight detection result. A sketch of way two is given below.
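A minimal sketch of way two, assuming the 3D positions of both eyes in the actual camera coordinate system are already available from step 1211:

```python
import numpy as np

def gaze_origin_from_eyes(left_eye_3d: np.ndarray, right_eye_3d: np.ndarray,
                          alpha: float = 0.5) -> np.ndarray:
    """Line-of-sight starting point on the segment between the two eyes.

    alpha = 0.5 gives the midpoint; other values pick another position
    on the line connecting the 3D coordinates of the two eyes.
    """
    return (1.0 - alpha) * left_eye_3d + alpha * right_eye_3d

# Example with illustrative 3D eye positions (meters, camera coordinates).
origin = gaze_origin_from_eyes(np.array([-0.03, 0.0, 0.55]),
                               np.array([0.03, 0.0, 0.55]))
```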
According to the embodiments of the present disclosure, as shown in FIG. 5, the line-of-sight direction information of the person in the face image can be detected by performing steps 1221 to 1222.
In step 1221, the head posture information of the person in the face image is detected.
As mentioned above, the computer system can determine the driver's head posture information from a face image captured by one camera, using a head posture estimation method from the related art, such as a flexible model method or a geometric method.
The flexible model method matches a flexible model, such as an Active Shape Model (ASM), an Active Appearance Model (AAM), or an Elastic Graph Matching (EGM) model, to the facial structure of the head image in the image plane, and obtains the final head posture estimate through feature comparison or the parameters of the model.
The geometric method estimates the head posture using the shape of the head and precise morphological information of local facial feature points, such as the relative positions of the eyes, nose, and mouth.
According to the embodiments of the present disclosure, the head posture of a person in an image can be estimated based on a single frame collected by a monocular camera.
According to the embodiments of the present disclosure, as shown in FIG. 6, the head posture information of the person in the face image can be detected by performing steps 1201 to 1202 (step 1221).
In step 1201, multiple face key points in the face image are detected.
In an embodiment of the present disclosure, face key point detection can be performed with edge detection algorithms such as the Roberts algorithm or the Sobel algorithm, or with related models such as an active contour model (for example, the Snake model).
In another embodiment of the present disclosure, face key point detection can be performed by a neural network for face key point detection. In addition, a third-party application (for example, the Dlib toolkit) can also be used for face key point detection.
With the above methods, a preset number (for example, 160) of face key point positions can be detected, which may include the position coordinates of face key points such as the left eye corner, right eye corner, nose tip, left mouth corner, right mouth corner, and jaw. It is understandable that different face key point detection methods may yield different numbers of key point position coordinates. For example, 68 face key point positions can be detected with the Dlib toolkit, as sketched below.
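A minimal sketch of detecting the 68 Dlib face key points mentioned above; it assumes the standard pre-trained model file shape_predictor_68_face_landmarks.dat has been downloaded from the Dlib project:

```python
import dlib  # pip install dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def detect_landmarks(gray_image: np.ndarray) -> np.ndarray:
    """Return an array of shape (68, 2) of face key points for the first face."""
    faces = detector(gray_image, 1)  # upsample once to find smaller faces
    shape = predictor(gray_image, faces[0])
    return np.array([[p.x, p.y] for p in shape.parts()])
```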
In step 1202, the head posture information of the person in the face image is determined based on the detected face key points and a preset average face model; one common way to do this is sketched below.
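As a sketch of step 1202 under common assumptions: pair a few detected 2D key points with the corresponding 3D points of an average face model and solve a PnP problem. OpenCV's solvePnP is an assumed tool here, and the 3D model coordinates below are rough illustrative values, not the disclosure's average face model:

```python
import cv2  # pip install opencv-python
import numpy as np

# Rough 3D positions (millimeters) of six landmarks on a generic average face
# model: nose tip, chin, left/right eye corners, left/right mouth corners.
MODEL_POINTS = np.array([
    [0.0,     0.0,    0.0],
    [0.0,   -63.6,  -12.8],
    [-43.3,  32.7,  -26.0],
    [43.3,   32.7,  -26.0],
    [-28.9, -28.9,  -24.1],
    [28.9,  -28.9,  -24.1],
], dtype=np.float64)

def estimate_head_pose(image_points: np.ndarray, camera_matrix: np.ndarray):
    """image_points: (6, 2) pixel coordinates of the same six key points.

    Returns (rvec, tvec): the head's rotation and translation relative to
    the actual camera coordinate system.
    """
    dist_coeffs = np.zeros((4, 1))  # assume negligible lens distortion
    ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS,
                                  image_points.astype(np.float64),
                                  camera_matrix, dist_coeffs)
    if not ok:
        raise RuntimeError("head pose estimation failed")
    return rvec, tvec
```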
Referring back to FIG. 5, in step 1222, the line-of-sight direction information of the person in the face image is detected based on the head posture information.
In the embodiments of the present disclosure, a trained neural network can be used to detect the line-of-sight direction information of the person in the face image based on the head posture information.
Referring to FIG. 7, step 1222 may include steps 12221 to 12223.
在步骤12221,根据所述头部姿态信息对所述人脸图像进行规范化处理,获得转正人脸图像。In step 12221, normalize the face image according to the head posture information to obtain a normalized face image.
在实际操作中,对于摄像头C0在不同时刻采集的人脸图像,人脸区域图像在整个图像中的位置是随机变化的,图像中人的头部姿态也是随机变化的。若在训练上述神经网络时将摄像头直接采集的人脸图像作为样本图像,势必因人头部姿态以及人脸区域图像位置的随机性而增加神经网络的训练难度及训练时长。In actual operation, for the face images collected by the camera C0 at different times, the position of the face area image in the entire image changes randomly, and the posture of the person's head in the image also changes randomly. If the face image directly collected by the camera is used as the sample image when training the above neural network, the training difficulty and training time of the neural network will be increased due to the randomness of the head posture and the image position of the face area.
根据本公开的实施例,在训练上述用于检测视线方向的神经网络时,为了降低训练难度,首先对训练样本集中的各个样本图像数据做了规范化处理,使得规范化处理后的样本图像数据相当于虚拟摄像机正对着人头部拍摄的图像数据,然后利用规范化处理后的样本图像数据训练该神经网络。According to the embodiments of the present disclosure, when training the above neural network for detecting the direction of sight, in order to reduce the difficulty of training, firstly, each sample image data in the training sample set is normalized, so that the normalized sample image data is equivalent to The virtual camera is facing the image data taken by the human head, and then the normalized sample image data is used to train the neural network.
相应的，在该神经网络的应用阶段，为确保视线方向信息检测的准确性，也需要首先对人脸图像进行规范化处理，获得对应的虚拟相机坐标系下的转正人脸图像，以输入上述神经网络来检测视线方向信息。Correspondingly, in the application stage of the neural network, in order to ensure the accuracy of line-of-sight direction detection, the face image also needs to be normalized first to obtain the corrected face image in the corresponding virtual camera coordinate system, which is then input into the aforementioned neural network to detect the line-of-sight direction information.
参见图8A,上述步骤12221可以包括步骤12-1~12-3。Referring to FIG. 8A, the above step 12221 may include steps 12-1 to 12-3.
在步骤12-1，根据所述头部姿态信息确定所述人脸图像中人的头部坐标系。例如，所述头部坐标系的X轴平行于左右眼坐标的连线；所述头部坐标系的Y轴在人脸平面内垂直于所述X轴；所述头部坐标系的Z轴垂直于所述人脸平面；人眼视线的起点为所述头部坐标系的原点。In step 12-1, the head coordinate system of the person in the face image is determined according to the head posture information. For example, the X axis of the head coordinate system is parallel to the line connecting the coordinates of the left and right eyes; the Y axis of the head coordinate system is perpendicular to the X axis within the face plane; the Z axis of the head coordinate system is perpendicular to the face plane; and the starting point of the person's line of sight is the origin of the head coordinate system.
本公开实施例中,计算机系统根据上述人脸图像检测出目标对象的头部姿态信息,相当于计算机系统预测出了目标对象的三维头部模型。该三维头部模型可以表示在摄像头C0采集上述人脸图像时目标对象的头部相对于摄像头C0的姿态信息。在此基础上,计算机系统可以基于头部姿态信息确定目标对象的头部坐标系。In the embodiment of the present disclosure, the computer system detects the head posture information of the target object based on the aforementioned face image, which is equivalent to the computer system predicting the three-dimensional head model of the target object. The three-dimensional head model may represent the posture information of the head of the target object relative to the camera C0 when the camera C0 collects the aforementioned face image. On this basis, the computer system can determine the head coordinate system of the target object based on the head posture information.
该头部坐标系可以表示为空间直角坐标系。上述头部坐标系的X轴可以与上述三维头部模型中两眼的3D位置坐标的连线平行。可以将两眼的3D位置坐标连线的中点即上述人眼视线的起点确定为上述头部坐标系的原点。所述头部坐标系的Y轴在人脸平面内垂直于所述X轴。所述头部坐标系的Z轴垂直于人脸平面。The head coordinate system can be expressed as a spatial rectangular coordinate system. The X axis of the head coordinate system may be parallel to the line connecting the 3D position coordinates of the two eyes in the three-dimensional head model. The midpoint of that line, i.e., the starting point of the person's line of sight, can be determined as the origin of the head coordinate system. The Y axis of the head coordinate system is perpendicular to the X axis within the face plane, and the Z axis of the head coordinate system is perpendicular to the face plane.
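The construction just described can be sketched directly in code. The face-normal input is assumed to come from the head pose estimate (for instance, the third column of the head rotation matrix), and any residual non-orthogonality between the eye line and the normal is ignored for brevity.

```python
# Building the head coordinate system from the two 3D eye positions.
import numpy as np

def head_coordinate_system(left_eye_3d, right_eye_3d, face_normal):
    """Return (origin, axes) where the rows of axes are the X, Y, Z unit vectors."""
    origin = (left_eye_3d + right_eye_3d) / 2.0         # gaze starting point
    x_axis = right_eye_3d - left_eye_3d                 # along the eye line
    x_axis /= np.linalg.norm(x_axis)
    z_axis = face_normal / np.linalg.norm(face_normal)  # out of the face plane
    y_axis = np.cross(z_axis, x_axis)                   # in face plane, perpendicular to X
    y_axis /= np.linalg.norm(y_axis)
    return origin, np.stack([x_axis, y_axis, z_axis])
```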
在步骤12-2，基于所述头部坐标系对所述人脸图像对应的实际相机坐标系进行旋转及平移，获得虚拟相机坐标系。例如，所述虚拟相机坐标系的Z轴指向所述头部坐标系的原点，所述虚拟相机坐标系的X轴与所述头部坐标系的X轴处于同一平面内，所述虚拟相机坐标系的原点与所述头部坐标系的原点之间在所述虚拟相机坐标系的Z轴方向上间隔预设距离。In step 12-2, the actual camera coordinate system corresponding to the face image is rotated and translated based on the head coordinate system to obtain a virtual camera coordinate system. For example, the Z axis of the virtual camera coordinate system points to the origin of the head coordinate system, the X axis of the virtual camera coordinate system and the X axis of the head coordinate system are in the same plane, and the origin of the virtual camera coordinate system and the origin of the head coordinate system are separated by a preset distance in the Z-axis direction of the virtual camera coordinate system.
本公开实施例中，在计算机系统确定了目标对象的头部坐标系后，可以参照上述头部坐标系对上述摄像头进行旋转、平移操作以确定一个虚拟摄像头，并基于上述虚拟摄像头在头部坐标系下的位置，建立上述虚拟摄像头对应的虚拟相机坐标系。该虚拟相机坐标系的建立方法与上述预设实际相机坐标系的建立方法类似，即虚拟相机坐标系的Z轴为上述虚拟摄像头的光轴，上述虚拟相机坐标系的X轴、Y轴平行于该虚拟摄像头的镜头平面；虚拟摄像头镜头的光心为该虚拟相机坐标系的原点。In the embodiments of the present disclosure, after the computer system determines the head coordinate system of the target object, the camera can be rotated and translated with reference to the head coordinate system to determine a virtual camera, and the virtual camera coordinate system corresponding to the virtual camera can be established based on the position of the virtual camera in the head coordinate system. The method for establishing the virtual camera coordinate system is similar to the method for establishing the aforementioned preset actual camera coordinate system: the Z axis of the virtual camera coordinate system is the optical axis of the virtual camera, the X and Y axes of the virtual camera coordinate system are parallel to the lens plane of the virtual camera, and the optical center of the virtual camera lens is the origin of the virtual camera coordinate system.
上述虚拟相机坐标系与头部坐标系的位置关系满足以下三个条件:The positional relationship between the virtual camera coordinate system and the head coordinate system meets the following three conditions:
条件一、所述虚拟相机坐标系的Z轴指向所述头部坐标系的原点;Condition 1: The Z axis of the virtual camera coordinate system points to the origin of the head coordinate system;
条件二、所述虚拟相机坐标系的X轴与所述头部坐标系的X轴处于同一平面内，其中，虚拟相机坐标系的X轴与所述头部坐标系的X轴的相对位置关系包括但不限于平行关系；Condition 2: The X axis of the virtual camera coordinate system and the X axis of the head coordinate system are in the same plane, where the relative positional relationship between the X axis of the virtual camera coordinate system and the X axis of the head coordinate system includes, but is not limited to, a parallel relationship;
条件三、所述虚拟相机坐标系的原点与所述头部坐标系的原点在所述虚拟相机坐标系的Z轴方向上间隔预设距离。Condition 3: The origin of the virtual camera coordinate system and the origin of the head coordinate system are separated by a predetermined distance in the Z-axis direction of the virtual camera coordinate system.
上述过程相当于通过对上述摄像头C0进行以下操作而确定一个虚拟摄像头：旋转所述摄像头C0，使其Z轴指向人眼图像中人的三维视线的起点，同时使摄像头C0的X轴与上述头部坐标系的X轴处于同一平面内；将旋转后的摄像头C0沿其Z轴平移，使其镜头的光心到上述头部坐标系的原点的距离为预设长度。The above process is equivalent to determining a virtual camera by performing the following operations on the camera C0: rotating the camera C0 so that its Z axis points to the starting point of the person's three-dimensional line of sight in the human eye image, while keeping the X axis of the camera C0 in the same plane as the X axis of the aforementioned head coordinate system; and translating the rotated camera C0 along its Z axis so that the distance from the optical center of its lens to the origin of the head coordinate system equals a preset length.
至此，计算机系统可以根据实际相机坐标系与头部坐标系之间的位置关系、虚拟相机坐标系与上述头部坐标系之间的位置关系，确定实际相机坐标系与上述虚拟相机坐标系之间的位置变换关系。At this point, the computer system can determine the position transformation relationship between the actual camera coordinate system and the virtual camera coordinate system, based on the positional relationship between the actual camera coordinate system and the head coordinate system and the positional relationship between the virtual camera coordinate system and the head coordinate system.
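One common way to realize the three conditions above is sketched below, under the assumption that all directions are expressed in the actual camera frame; the disclosure does not prescribe this particular construction.

```python
# Deriving the actual-to-virtual rotation from the three conditions above.
import numpy as np

def virtual_camera_rotation(head_origin, head_x_axis):
    """head_origin: gaze starting point in the actual camera frame.
    Returns a 3x3 rotation whose rows are the virtual camera X, Y, Z axes."""
    z = head_origin / np.linalg.norm(head_origin)  # condition 1: Z aims at the head origin
    y = np.cross(z, head_x_axis)                   # perpendicular to both Z and head X
    y /= np.linalg.norm(y)
    x = np.cross(y, z)                             # condition 2: X coplanar with head X
    return np.stack([x, y, z])

# Condition 3 is met by placing the virtual camera origin a preset distance d
# from the head origin along the virtual Z axis, i.e., a translation of (0, 0, d).
```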
应当理解的是,本公开中,虚拟相机坐标系与人脸图像中人的头部姿态相关,因此,不同的人脸图像可能对应不同的虚拟相机坐标系。It should be understood that in the present disclosure, the virtual camera coordinate system is related to the head posture of the person in the face image. Therefore, different face images may correspond to different virtual camera coordinate systems.
在步骤12-3,根据所述实际相机坐标系与所述虚拟相机坐标系之间的位置变换关系,对所述人脸图像进行规范化处理,获得所述转正人脸图像。In step 12-3, according to the position transformation relationship between the actual camera coordinate system and the virtual camera coordinate system, normalization processing is performed on the face image to obtain the corrected face image.
本公开实施例中，计算机系统可以利用上述实际相机坐标系与虚拟相机坐标系之间的位置变换关系，对上述人脸图像进行旋转、仿射、缩放变换等处理，获得上述虚拟相机坐标系下的转正人脸图像。In the embodiments of the present disclosure, the computer system can use the position transformation relationship between the actual camera coordinate system and the virtual camera coordinate system to perform rotation, affine, and scaling transformations on the face image, so as to obtain the corrected face image in the virtual camera coordinate system.
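In practice, the normalization of step 12-3 can be applied as a single perspective warp. The sketch below assumes known intrinsic matrices K_real for the actual camera and K_virtual for the virtual camera; both names and the output size are illustrative.

```python
# Warping the captured face image into the virtual camera view.
import cv2
import numpy as np

def normalize_face_image(image, R, K_real, K_virtual, out_size=(224, 224)):
    """R: actual-to-virtual rotation from step 12-2.
    Returns the corrected (frontalized) face image."""
    # Homography mapping actual-camera pixels to virtual-camera pixels.
    warp = K_virtual @ R @ np.linalg.inv(K_real)
    return cv2.warpPerspective(image, warp, out_size)
```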
图8B示出了根据一示例性实施例的对所获取的人脸图像进行规范化处理的示意图，其中，图像P0为实际车载摄像头C0针对驾驶员采集的人脸图像，图像P1表示经过上述规范化处理后获得的虚拟相机坐标系下的转正人脸图像，即相当于正对着驾驶员头部的一台虚拟摄像头C1采集的驾驶员人脸图像。FIG. 8B shows a schematic diagram of normalizing an acquired face image according to an exemplary embodiment, where the image P0 is the face image of the driver collected by the actual vehicle-mounted camera C0, and the image P1 represents the corrected face image in the virtual camera coordinate system obtained after the above normalization, which is equivalent to a driver face image collected by a virtual camera C1 directly facing the driver's head.
返回参见图7,在步骤12222,基于所述转正人脸图像进行视线方向检测,获得第一检测视线方向。例如,上述第一检测视线方向为所述虚拟相机坐标系下的三维视线方向信息,可以是三维方向向量。Referring back to FIG. 7, in step 12222, the line of sight direction detection is performed based on the corrected face image to obtain the first detected line of sight direction. For example, the first detected line of sight direction is the three-dimensional line of sight direction information in the virtual camera coordinate system, and may be a three-dimensional direction vector.
本公开实施例中,可以将经过上述规范化处理的转正人脸图像输入已训练好的用于检测视线方向的神经网络,以检测出上述转正人脸图像中人的三维视线方向信息。上述用于检测视线方向的神经网络可以包括深度神经网络(deep neural network,DNN)如卷积神经网络(convolutional neural network,CNN)等。In the embodiment of the present disclosure, the normalized face image that has undergone the above-mentioned normalization processing may be input to a trained neural network for detecting the line of sight direction to detect the three-dimensional line of sight information of the person in the above-mentioned corrected face image. The aforementioned neural network for detecting the direction of the line of sight may include a deep neural network (DNN) such as a convolutional neural network (convolutional neural network, CNN), etc.
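For concreteness, the following is one illustrative CNN regressor of the kind described; the layer sizes and the unit-vector output convention are assumptions, and the disclosure only requires a trained network that maps the corrected face image to a 3D gaze direction.

```python
# An illustrative CNN mapping a corrected face image to a 3D gaze direction.
import torch
import torch.nn as nn

class GazeDirectionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 3)   # 3D direction vector

    def forward(self, x):
        v = self.head(self.features(x).flatten(1))
        return v / v.norm(dim=1, keepdim=True)  # normalize to a unit vector
```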
在步骤12223,对所述第一检测视线方向进行坐标逆变换处理,获得所述人脸图像中人的视线方向信息。In step 12223, perform coordinate inverse transformation processing on the first detected line of sight direction to obtain the person's line of sight direction information in the face image.
在后续注视区域检测阶段，需要向注视区域分类器输入实际相机坐标系下的视线特征向量。因此，本公开中，在计算机系统检测出虚拟相机坐标系下的视线方向信息即上述第一检测视线方向之后，还需要对上述第一检测视线方向进行从虚拟相机坐标系到上述实际相机坐标系的坐标逆变换处理，获得上述实际相机坐标系下的视线方向信息。In the subsequent gaze area detection stage, the gaze feature vector in the actual camera coordinate system needs to be input into the gaze area classifier. Therefore, in the present disclosure, after the computer system detects the line-of-sight direction information in the virtual camera coordinate system, i.e., the above first detected line-of-sight direction, it also needs to perform a coordinate inverse transformation on the first detected line-of-sight direction from the virtual camera coordinate system to the actual camera coordinate system, so as to obtain the line-of-sight direction information in the actual camera coordinate system.
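Because a gaze direction is a free vector, the coordinate inverse transformation of step 12223 only needs the inverse of the rotation from step 12-2. A minimal sketch, assuming R is the orthonormal actual-to-virtual rotation:

```python
# Mapping the first detected gaze direction back to the actual camera frame.
import numpy as np

def to_actual_camera(gaze_virtual, R):
    """R rotates actual-frame vectors into the virtual frame; since R is
    orthonormal, its transpose is its inverse."""
    return R.T @ gaze_virtual
```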
返回参见图1,上述步骤12相当于确定人脸图像中人的视线特征向量的过程,该视线特征向量包括人脸图像中人的视线起点信息和视线方向信息。Referring back to FIG. 1, the above step 12 is equivalent to the process of determining the line of sight feature vector of the person in the face image, and the line of sight feature vector includes the start point information and the line of sight direction information of the person in the face image.
在比如智能驾驶的实际应用中，上述对人脸图像进行视线特征向量提取的过程并不会因车型的改变而改变，该阶段所使用的人工神经网络如用于检测人脸关键点的神经网络、用于检测视线方向的神经网络等可以适用于不同车型中，具有很好的迁移性。In practical applications such as intelligent driving, the above process of extracting the gaze feature vector from the face image does not change with the vehicle model; the artificial neural networks used at this stage, such as the neural network for detecting face key points and the neural network for detecting the line-of-sight direction, can be applied to different vehicle models and therefore have good transferability.
如上所述，根据本公开一实施例，在步骤13，可以将在步骤12确定的人脸图像中人的视线起点信息和视线方向信息输入预先针对预定三维空间训练完成的注视区域分类器中，以检测所述人脸图像对应的目标注视区域的类别。As described above, according to an embodiment of the present disclosure, in step 13, the gaze starting point information and gaze direction information of the person in the face image determined in step 12 can be input into the gaze area classifier trained in advance for the predetermined three-dimensional space, so as to detect the category of the target gaze area corresponding to the face image.
在本公开实施例中，上述步骤13可以包括：根据所述目标注视区域的类别确定目标注视区域信息，并输出所述目标注视区域信息。In the embodiment of the present disclosure, the above step 13 may include: determining target gaze area information according to the category of the target gaze area, and outputting the target gaze area information.
比如,分类器可以输出目标注视区域的类别,如图9A所示,或者,直接输出目标注视区域的名称,如图9B所示。For example, the classifier may output the category of the target gaze area, as shown in FIG. 9A, or directly output the name of the target gaze area, as shown in FIG. 9B.
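The inference of step 13 can be sketched as follows; the 6-dimensional feature layout, the scikit-learn-style classifier interface, and the area-name mapping are illustrative assumptions.

```python
# Feeding the gaze feature vector into a trained gaze area classifier.
import numpy as np

AREA_NAMES = {0: "left front windshield", 1: "right front windshield",
              2: "dashboard", 3: "interior rearview mirror"}  # assumed mapping

def detect_gaze_area(classifier, gaze_origin, gaze_direction):
    """gaze_origin, gaze_direction: 3D vectors in the actual camera frame."""
    feature = np.concatenate([gaze_origin, gaze_direction]).reshape(1, -1)
    category = int(classifier.predict(feature)[0])  # e.g. an sklearn SVC/MLP
    return category, AREA_NAMES.get(category, "unknown")
```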
在本公开另一实施例中,上述注视区域检测方法还可以包括:在上述步骤11之前,训练用于检测视线方向的神经网络。该步骤对应三维视线方向估计模型的训练过程。需要说明的是,该步骤与图2所示的实时训练注视区域分类器的过程可以在不同计算机系统中执行。In another embodiment of the present disclosure, the above-mentioned gaze area detection method may further include: before the above-mentioned step 11, training a neural network for detecting the direction of the line of sight. This step corresponds to the training process of the 3D line of sight direction estimation model. It should be noted that this step and the process of real-time training of the gaze area classifier shown in FIG. 2 can be executed in different computer systems.
图10是根据本公开的示例性实施例的训练用于检测三维视线方向的神经网络的方法的流程图。该方法可以包括步骤1001~1005。FIG. 10 is a flowchart of a method of training a neural network for detecting a three-dimensional line of sight direction according to an exemplary embodiment of the present disclosure. The method may include steps 1001-1005.
在步骤1001,确定包含至少一个人脸样本的原始样本集,其中,每个所述人脸样本包括人脸图像样本和视线方向标注信息。In step 1001, an original sample set containing at least one face sample is determined, where each face sample includes a face image sample and line-of-sight direction label information.
本公开实施例中，可以采用监督学习方法训练上述神经网络。相应的，用于训练上述神经网络的样本集中的每一个样本可以包含：用于预测的输入信息即人脸图像样本；和该输入信息相应的真实值即实际相机坐标系下实际测得的视线方向信息。本公开实施例中，也将上述实际测得的视线方向信息称为视线方向标注信息。In the embodiments of the present disclosure, the above neural network may be trained by a supervised learning method. Correspondingly, each sample in the sample set used to train the neural network may include: the input information used for prediction, i.e., a face image sample; and the ground-truth value corresponding to the input information, i.e., the line-of-sight direction information actually measured in the actual camera coordinate system. In the embodiments of the present disclosure, the actually measured line-of-sight direction information is also referred to as line-of-sight direction label information.
在步骤1002,根据人脸关键点和平均人脸模型,确定每一个所述人脸图像样本对应的头部姿态信息。In step 1002, according to the key points of the face and the average face model, head posture information corresponding to each of the face image samples is determined.
在步骤1003，基于所述头部姿态信息和所述实际相机坐标系，确定每一个所述人脸图像样本对应的转正人脸图像样本和所述视线方向标注信息在所述虚拟坐标系下的虚拟视线方向标注信息；In step 1003, based on the head posture information and the actual camera coordinate system, for each face image sample, the corresponding corrected face image sample and the virtual line-of-sight direction label information, i.e., the line-of-sight direction label information expressed in the virtual camera coordinate system, are determined;
上述步骤1002和步骤1003的实施过程分别与上述步骤1202、步骤12-1~12-3类似,此处不再赘述。同时,计算机系统可以根据实际相机坐标系到虚拟相机坐标系的位置变换关系,将上述视线方向标注信息转换为虚拟视线标注信息。The implementation process of the foregoing step 1002 and step 1003 is similar to the foregoing step 1202 and steps 12-1 to 12-3, respectively, and will not be repeated here. At the same time, the computer system can convert the above-mentioned line-of-sight direction labeling information into virtual line-of-sight labeling information according to the position transformation relationship from the actual camera coordinate system to the virtual camera coordinate system.
至此，获得虚拟相机坐标系下的样本集。然后，可以基于该样本集，通过以下步骤进行迭代训练，直到满足用于检测所述三维视线方向的神经网络的训练要求：在步骤1004，将每个所述转正人脸图像样本输入待训练的三维视线方向检测神经网络，获得三维视线方向预测信息；在步骤1005，根据所述三维视线方向预测信息和所述虚拟视线方向标注信息之间的偏差，对所述神经网络进行参数调整，获得用于检测视线方向信息的神经网络。Thus, a sample set in the virtual camera coordinate system is obtained. Then, based on this sample set, iterative training can be carried out through the following steps until the training requirements of the neural network for detecting the three-dimensional line-of-sight direction are met: in step 1004, each corrected face image sample is input into the three-dimensional line-of-sight direction detection neural network to be trained to obtain three-dimensional line-of-sight direction prediction information; in step 1005, the parameters of the neural network are adjusted according to the deviation between the three-dimensional line-of-sight direction prediction information and the virtual line-of-sight direction label information, so as to obtain the neural network for detecting line-of-sight direction information.
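Steps 1004 and 1005 amount to a standard supervised loop. The sketch below reuses the illustrative GazeDirectionNet above and a cosine-distance loss, which is a common but assumed choice for direction vectors.

```python
# An illustrative training loop for steps 1004-1005.
import torch

def train_gaze_net(model, loader, epochs=10, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, gaze_labels in loader:  # corrected samples + virtual-frame labels
            pred = model(images)                               # step 1004: prediction
            loss = (1 - torch.cosine_similarity(pred, gaze_labels)).mean()
            opt.zero_grad()
            loss.backward()                                    # step 1005:
            opt.step()                                         # adjust parameters
```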
本公开实施例中，采用在虚拟相机坐标系下规范化处理后的转正人脸图像作为训练样本数据，可以降低因头部姿态变化而导致的神经网络训练难度，提高用于检测视线方向的神经网络的训练效率。In the embodiments of the present disclosure, using the corrected face images normalized in the virtual camera coordinate system as training sample data can reduce the difficulty of neural network training caused by head posture changes and improve the training efficiency of the neural network for detecting the line-of-sight direction.
作为一个例子，在识别出驾驶员的注视区域后，可以根据该注视区域执行进一步的操作。例如，可以根据注视区域类别检测结果，确定人脸图像对应的人的注意力监控结果。比如，所述的注视区域类别检测结果可以是预设时间段内的注视区域检测类别。示例性的，该注视区域类别检测结果可以是"在预设时间段内，该驾驶员的注视区域一直是区域2"，那么，如果该区域2是右前挡风玻璃，说明该驾驶员的驾驶较为专心。如果该区域2是副驾驶前方的杂物箱区域，说明该驾驶员很有可能分心了，注意力不集中。As an example, after the driver's gaze area is identified, further operations can be performed according to the gaze area. For example, the attention monitoring result of the person corresponding to the face image can be determined according to the gaze area category detection result. The gaze area category detection result may be the gaze area categories detected within a preset time period. Exemplarily, the gaze area category detection result may be "within the preset time period, the driver's gaze area has always been area 2". Then, if area 2 is the right front windshield, it indicates that the driver is driving rather attentively; if area 2 is the glove box area in front of the front passenger seat, it indicates that the driver is very likely distracted and not concentrating.
在检测出注意力监控结果后,可以输出所述注意力监控结果,例如,可以在车辆内的某个显示区域显示“驾驶很专心”。或者,还可以根据所述注意力监控结果输出分心提示信息,通过显示屏醒目显示或语音提示等方式提示驾驶员“请集中注意力驾驶,确保行车安全”。当然,在具体信息输出时,可以输出注意力监控结果和分心提示信息中的至少一种信息。After the attention monitoring result is detected, the attention monitoring result may be output, for example, "driving is very attentive" may be displayed in a certain display area in the vehicle. Alternatively, it is also possible to output a distraction prompt message according to the attention monitoring result, and prompt the driver to "please concentrate on driving and ensure driving safety" through a prominent display on the display screen or voice prompts. Of course, when specific information is output, at least one of the attention monitoring result and the distraction prompt information may be output.
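A minimal sketch of the monitoring rule described above follows, with assumed area codes and an assumed tolerance threshold.

```python
# Deriving an attention monitoring result from gaze areas in a time window.
ATTENTIVE_AREAS = {0, 1}   # e.g. left/right front windshield (assumed codes)
DISTRACTED_RATIO = 0.5     # assumed tolerance for off-road glances

def monitor_attention(gaze_area_history):
    """gaze_area_history: gaze area categories over the preset time period."""
    if not gaze_area_history:
        return "unknown", None
    off_road = sum(a not in ATTENTIVE_AREAS for a in gaze_area_history)
    if off_road / len(gaze_area_history) > DISTRACTED_RATIO:
        return "distracted", "Please concentrate on driving to ensure safety."
    return "attentive", None
```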
通过根据注视区域类别检测确定人的注意力监控结果或者输出分心提示信息，对于驾驶员注意力监控有着重要的帮助，能够有效检测出驾驶员注意力不集中的情况，及时进行提醒，降低事故发生风险，确保行车安全。Determining the person's attention monitoring result or outputting distraction prompt information according to the gaze area category detection result provides important support for driver attention monitoring: it can effectively detect driver inattention, issue timely reminders, reduce the risk of accidents, and ensure driving safety.
上述示例的描述中,都是以在智能驾驶应用场景下监控驾驶员注意力为例进行说明。除此之外,注视区域的检测还可以有其它许多用途。In the description of the above examples, the monitoring of the driver's attention in the intelligent driving application scenario is taken as an example for description. In addition, the detection of the gaze area can also have many other uses.
例如,可以进行基于注视区域检测的车机交互控制。车辆内可以设置有一些电子设备,如多媒体播放器,可以通过检测车辆内人员的注视区域,根据注视区域的检测结果自动控制该多媒体播放器开启播放功能。For example, vehicle-machine interactive control based on gaze area detection can be performed. Some electronic equipment, such as a multimedia player, can be installed in the vehicle, which can automatically control the multimedia player to start the playback function according to the detection result of the gaze area by detecting the gaze area of the person in the vehicle.
示例性的,通过部署在车辆内的摄像头拍摄得到车内人员(如司机或乘客)的人脸图像,通过预先训练的神经网络检测出注视区域类别检测结果。例如,该检测结果可以是:在一段时间T内,该车内人员的注视区域一直是车辆内的某个多媒体播放器上的“注视开启”选项所在的区域。根据上述检测结果可以确定该车内人员要开启该多媒体播放器,从而可以输出相应的控制指令,控制该多媒体播放器开始进行播放。Exemplarily, the face image of the person (such as the driver or passenger) in the vehicle is captured by a camera deployed in the vehicle, and the detection result of the gaze area category is detected through a pre-trained neural network. For example, the detection result may be: within a period of time T, the gaze area of the person in the vehicle has been the area where the "gaze on" option on a certain multimedia player in the vehicle is located. According to the above detection result, it can be determined that the person in the vehicle wants to turn on the multimedia player, so that corresponding control instructions can be output to control the multimedia player to start playing.
除了与车辆相关的应用之外,还可以包括游戏控制、智能家居设备控制、广告推送等多种应用场景。以智能家居控制为例,可以采集控制人的人脸图像,通过预先训练的神经网络检测出注视区域类别检测结果。例如,该检测结果可以是:在一段时间T内,该控制人的注视区域一直是智能空调上的“注视开启”选项所在的区域。根据上述检测结果可以确定该控制人要启动智能空调,从而可以输出相应的控制指令,控制该空调开启。In addition to vehicle-related applications, it can also include multiple application scenarios such as game control, smart home device control, and advertising push. Taking smart home control as an example, the face image of the control person can be collected, and the gaze area category detection result can be detected through a pre-trained neural network. For example, the detection result may be: within a period of time T, the gaze area of the controller has been the area where the "gaze on" option on the smart air conditioner is located. According to the above detection results, it can be determined that the controller wants to start the smart air conditioner, so that a corresponding control command can be output to control the air conditioner to turn on.
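Both control examples above reduce to a dwell rule: trigger the device instruction when the gaze stays on the "gaze on" area for the whole period T. A sketch with assumed area code, dwell length, and command name:

```python
# A dwell-based rule for gaze-driven device control.
GAZE_ON_AREA = 7     # assumed category of the device's "gaze on" option area
DWELL_FRAMES = 30    # assumed length of the period T, in frames

def maybe_trigger(gaze_area_history, send_command):
    """send_command: callback that issues the corresponding control instruction."""
    recent = gaze_area_history[-DWELL_FRAMES:]
    if len(recent) == DWELL_FRAMES and all(a == GAZE_ON_AREA for a in recent):
        send_command("start_playback")  # e.g. control the player to start playing
```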
为了便于描述,前述的各方法实施例都被描述为一系列的动作组合。本领域技术人员应该知悉,本公开并不受所描述的动作顺序的限制。依据本公开,某些步骤可以采用其他顺序或者同时进行。For ease of description, the foregoing method embodiments are all described as a series of action combinations. Those skilled in the art should know that the present disclosure is not limited by the described sequence of actions. According to the present disclosure, certain steps can be performed in other order or simultaneously.
本公开还可以提供与前述方法实施例相对应的装置及电子设备的实施例。The present disclosure may also provide embodiments of devices and electronic equipment corresponding to the foregoing method embodiments.
图11是根据本公开的示例性实施例的一种注视区域检测装置1100的框图。注视区域检测装置1100可以包括图像获取模块21、视线检测模块22和注视区域检测模块23。FIG. 11 is a block diagram of a gaze area detecting device 1100 according to an exemplary embodiment of the present disclosure. The gaze area detection device 1100 may include an image acquisition module 21, a gaze detection module 22 and a gaze area detection module 23.
图像获取模块21用于获取在预定三维空间内采集到的人脸图像。视线检测模块22用于基于所述人脸图像进行视线检测以得到视线检测结果。在本公开一实施例中,所述视线检测结果可以包括所述人脸图像中人的视线起点信息和视线方向信息。注视区域检测模块23用于利用预先针对所述预定三维空间训练完成的注视区域分类器,根据所述视线检测结果检测所述人脸图像对应的目标注视区域的类别。所述目标注视区域属于预先对所述预定三维空间划分得到的多类定义注视区域之一。The image acquisition module 21 is used to acquire a face image collected in a predetermined three-dimensional space. The sight line detection module 22 is configured to perform sight line detection based on the face image to obtain a sight line detection result. In an embodiment of the present disclosure, the sight line detection result may include the start point information and the sight direction information of the person in the face image. The gaze area detection module 23 is configured to use a gaze area classifier trained in advance for the predetermined three-dimensional space to detect the category of the target gaze area corresponding to the face image according to the line of sight detection result. The target gaze area belongs to one of multiple types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance.
参见图12，根据本公开的示例性实施例的注视区域检测装置的一种视线检测模块22可以包括：眼睛位置检测子模块221，用于检测所述人脸图像中的眼睛位置；第一起点信息确定子模块222，用于在所述眼睛位置包括双眼的位置的情况下，确定所述双眼的中间位置为所述视线起点信息。Referring to FIG. 12, a line-of-sight detection module 22 of a gaze area detection device according to an exemplary embodiment of the present disclosure may include: an eye position detection sub-module 221 for detecting the eye position in the face image; and a first starting point information determining sub-module 222, configured to determine the middle position between the two eyes as the line-of-sight starting point information in a case where the eye position includes the positions of both eyes.
参见图13，根据本公开的示例性实施例的注视区域检测装置的另一种视线检测模块22可以包括：眼睛位置检测子模块221，用于检测所述人脸图像中的眼睛位置；第二起点信息确定子模块223，用于在所述眼睛位置包括双眼的位置的情况下，确定所述双眼中的任一只眼睛的位置为所述视线起点信息，或者，在所述眼睛位置包括单眼的位置的情况下，确定所述单眼的位置为所述视线起点信息。Referring to FIG. 13, another line-of-sight detection module 22 of the gaze area detection device according to an exemplary embodiment of the present disclosure may include: an eye position detection sub-module 221 for detecting the eye position in the face image; and a second starting point information determining sub-module 223, configured to determine, in a case where the eye position includes the positions of both eyes, the position of either one of the two eyes as the line-of-sight starting point information, or, in a case where the eye position includes the position of a single eye, the position of the single eye as the line-of-sight starting point information.
参见图14，根据本公开的示例性实施例的图12和图13中的眼睛位置检测子模块221可以包括：姿态检测单元2211，用于检测所述人脸图像中人的头部姿态信息；位置确定单元2212，用于依据所述头部姿态信息确定所述人脸图像中的眼睛位置。Referring to FIG. 14, the eye position detection sub-module 221 in FIGS. 12 and 13 according to an exemplary embodiment of the present disclosure may include: a posture detection unit 2211 for detecting the head posture information of the person in the face image; and a position determining unit 2212, configured to determine the eye position in the face image according to the head posture information.
参见图15，根据本公开的示例性实施例的注视区域检测装置的另一种视线检测模块22可以包括：姿态检测子模块22-1，用于检测所述人脸图像中人的头部姿态信息；方向检测子模块22-2，用于基于所述头部姿态信息检测所述人脸图像中人的视线方向信息。Referring to FIG. 15, another line-of-sight detection module 22 of the gaze area detection device according to an exemplary embodiment of the present disclosure may include: a posture detection sub-module 22-1 for detecting the head posture information of the person in the face image; and a direction detection sub-module 22-2, configured to detect the line-of-sight direction information of the person in the face image based on the head posture information.
参见图16，根据本公开的示例性实施例的图15中的姿态检测子模块22-1可以包括：关键点检测单元22-11，用于检测所述人脸图像中的多个人脸关键点；姿态确定单元22-12，用于基于所述人脸关键点和预设平均人脸模型，确定所述人脸图像中人的头部姿态信息。Referring to FIG. 16, the posture detection sub-module 22-1 in FIG. 15 according to an exemplary embodiment of the present disclosure may include: a key point detection unit 22-11 for detecting multiple face key points in the face image; and a posture determination unit 22-12, configured to determine the head posture information of the person in the face image based on the face key points and a preset average face model.
参见图17，根据本公开的示例性实施例的图15中的方向检测子模块22-2可以包括：图像处理单元22-21，用于根据所述头部姿态信息对所述人脸图像进行规范化处理，获得转正人脸图像；第一方向检测单元22-22，用于基于所述转正人脸图像进行视线方向检测，获得第一检测视线方向；方向确定单元22-23，用于对所述第一检测视线方向进行坐标逆变换处理，获得所述人脸图像中人的视线方向信息。Referring to FIG. 17, the direction detection sub-module 22-2 in FIG. 15 according to an exemplary embodiment of the present disclosure may include: an image processing unit 22-21, configured to normalize the face image according to the head posture information to obtain a corrected face image; a first direction detection unit 22-22, configured to perform line-of-sight direction detection based on the corrected face image to obtain a first detected line-of-sight direction; and a direction determining unit 22-23, configured to perform coordinate inverse transformation processing on the first detected line-of-sight direction to obtain the line-of-sight direction information of the person in the face image.
参见图18，根据本公开的示例性实施例的图17中的图像处理单元22-21可以包括：头部坐标确定子单元22-211，用于根据所述头部姿态信息确定所述人脸图像中人的头部坐标系；坐标变换子单元22-212，用于基于所述头部坐标系对所述人脸图像对应的实际相机坐标系进行旋转及平移，获得虚拟相机坐标系；图像处理子单元22-213，用于根据所述实际相机坐标系与所述虚拟相机坐标系之间的位置变换关系，对所述人脸图像进行规范化处理，获得所述转正人脸图像。Referring to FIG. 18, the image processing unit 22-21 in FIG. 17 according to an exemplary embodiment of the present disclosure may include: a head coordinate determination subunit 22-211, configured to determine the head coordinate system of the person in the face image according to the head posture information; a coordinate transformation subunit 22-212, configured to rotate and translate the actual camera coordinate system corresponding to the face image based on the head coordinate system to obtain a virtual camera coordinate system; and an image processing subunit 22-213, configured to normalize the face image according to the position transformation relationship between the actual camera coordinate system and the virtual camera coordinate system to obtain the corrected face image.
在本公开上述任一装置实施例中，所述注视区域分类器可以预先基于针对所述预定三维空间的训练样本集训练完成。所述训练样本集可以包括多个视线特征样本，每个所述视线特征样本包括视线起点信息、视线方向信息、以及该视线特征样本对应的注视区域类别的标注信息，标注的注视区域的类别属于针对所述预定三维空间划分的多类定义注视区域之一。In any of the foregoing device embodiments of the present disclosure, the gaze area classifier may be trained in advance based on a training sample set for the predetermined three-dimensional space. The training sample set may include a plurality of gaze feature samples, and each gaze feature sample includes gaze starting point information, gaze direction information, and label information of the gaze area category corresponding to the gaze feature sample, where the category of the labeled gaze area belongs to one of the multiple types of defined gaze areas divided for the predetermined three-dimensional space.
图19是根据本公开的示例性实施例的另一种注视区域检测装置1900的框图。与图11所示的注视区域检测装置1100相比,注视区域检测装置1900还可以包括分类器训练模块20。FIG. 19 is a block diagram of another gaze area detecting device 1900 according to an exemplary embodiment of the present disclosure. Compared with the gaze area detection device 1100 shown in FIG. 11, the gaze area detection device 1900 may further include a classifier training module 20.
分类器训练模块20可以包括：类别预测子模块201，用于将至少一个所述视线特征样本的所述视线起点信息和所述视线方向信息输入待训练的注视区域分类器，获得该视线特征样本对应的注视区域类别预测信息；参数调整子模块202，用于根据所述注视区域类别预测信息和该视线特征样本对应的注视区域类别的标注信息之间的偏差，对所述注视区域分类器进行参数调整，以训练所述注视区域分类器。The classifier training module 20 may include: a category prediction sub-module 201, configured to input the gaze starting point information and the gaze direction information of at least one gaze feature sample into the gaze area classifier to be trained to obtain the gaze area category prediction information corresponding to the gaze feature sample; and a parameter adjustment sub-module 202, configured to adjust the parameters of the gaze area classifier according to the deviation between the gaze area category prediction information and the label information of the gaze area category corresponding to the gaze feature sample, so as to train the gaze area classifier.
图20是根据本公开的示例性实施例的另一种注视区域检测装置2000的框图。与图11所示的注视区域检测装置1100相比,注视区域检测装置2000还可以包括分类器获取模块203。FIG. 20 is a block diagram of another gaze area detecting device 2000 according to an exemplary embodiment of the present disclosure. Compared with the gaze area detection device 1100 shown in FIG. 11, the gaze area detection device 2000 may further include a classifier acquisition module 203.
分类器获取模块203可以根据所述预定三维空间的空间标识从预设注视区域分类器集合中获取所述空间标识对应的注视区域分类器。所述预设注视区域分类器集合可以包括:不同三维空间的空间标识分别对应的注视区域分类器。The classifier obtaining module 203 may obtain the gaze area classifier corresponding to the space identifier from the preset gaze area classifier set according to the space identifier of the predetermined three-dimensional space. The preset gaze area classifier set may include: gaze area classifiers respectively corresponding to the spatial identifiers of different three-dimensional spaces.
在本公开上述任一装置实施例中，所述预定三维空间可以包括车辆空间。相应的，所述人脸图像可以基于针对所述车辆空间中的驾驶区域采集到的图像确定。所述对所述预定三维空间划分得到的多类定义注视区域可以包括下列中至少两类：左前挡风玻璃区域、右前挡风玻璃区域、仪表盘区域、车内后视镜区域、中控台区域、左后视镜区域、右后视镜区域、遮光板区域、换挡杆区域、方向盘下方区域、副驾驶区域、副驾驶前方的杂物箱区域。In any of the foregoing device embodiments of the present disclosure, the predetermined three-dimensional space may include a vehicle space. Correspondingly, the face image may be determined based on an image collected for the driving area in the vehicle space. The multiple types of defined gaze areas obtained by dividing the predetermined three-dimensional space may include at least two of the following: a left front windshield area, a right front windshield area, an instrument panel area, an interior rearview mirror area, a center console area, a left rearview mirror area, a right rearview mirror area, a visor area, a shift lever area, an area under the steering wheel, a front passenger area, and a glove box area in front of the front passenger seat.
图21是根据本公开的示例性实施例的另一种注视区域检测装置2100的框图。与图11所示的注视区域检测装置1100相比，注视区域检测装置2100还可以包括：注意力监控模块24，用于根据注视区域检测模块23得到的注视区域类别检测结果，确定所述人脸图像对应的人的注意力监控结果；监控结果输出模块25，用于输出所述注意力监控结果和/或根据所述注意力监控结果输出分心提示信息。FIG. 21 is a block diagram of another gaze area detecting device 2100 according to an exemplary embodiment of the present disclosure. Compared with the gaze area detection device 1100 shown in FIG. 11, the gaze area detection device 2100 may further include: an attention monitoring module 24, configured to determine the attention monitoring result of the person corresponding to the face image according to the gaze area category detection result obtained by the gaze area detection module 23; and a monitoring result output module 25, configured to output the attention monitoring result and/or output distraction prompt information according to the attention monitoring result.
图22是根据本公开的示例性实施例的另一种注视区域检测装置2200的框图。与图11所示的注视区域检测装置1100相比，注视区域检测装置2200还可以包括：控制指令确定模块26，用于确定与注视区域检测模块23得到的注视区域类别检测结果对应的控制指令；操作控制模块27，用于控制电子设备执行与所述控制指令相应的操作。FIG. 22 is a block diagram of another gaze area detecting device 2200 according to an exemplary embodiment of the present disclosure. Compared with the gaze area detection device 1100 shown in FIG. 11, the gaze area detection device 2200 may further include: a control instruction determination module 26, configured to determine a control instruction corresponding to the gaze area category detection result obtained by the gaze area detection module 23; and an operation control module 27, configured to control an electronic device to perform an operation corresponding to the control instruction.
对于装置实施例而言，由于其基本对应于方法实施例，所以相关之处参见方法实施例的部分说明即可。以上所描述的装置实施例仅仅是示意性的，其中，作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。本领域普通技术人员在不付出创造性劳动的情况下，可以根据实际的需要选择其中的部分或者全部模块来实现本公开的实施例。For the device embodiments, since they basically correspond to the method embodiments, reference may be made to the relevant descriptions of the method embodiments. The device embodiments described above are merely illustrative, where the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Those of ordinary skill in the art can select some or all of the modules according to actual needs to implement the embodiments of the present disclosure without creative work.
本公开还可以提供对应于上述的注视区域检测方法的电子设备。图23是根据本公开的一示例性实施例的电子设备2300的框图。例如,电子设备2300可以包括处理器、内部总线、网络接口、内存以及非易失性存储器。处理器可以从非易失性存储器中读取对应的计算机程序到内存中运行,从而在逻辑上形成实现上述注视区域检测方法的注视区域检测装置。The present disclosure may also provide an electronic device corresponding to the above-mentioned gaze area detection method. FIG. 23 is a block diagram of an electronic device 2300 according to an exemplary embodiment of the present disclosure. For example, the electronic device 2300 may include a processor, an internal bus, a network interface, a memory, and a non-volatile memory. The processor can read the corresponding computer program from the non-volatile memory to run in the memory, thereby logically forming a gaze area detection device that implements the above gaze area detection method.
本领域技术人员应明白,本公开可提供为方法、装置、系统或计算机程序产品。因此,本公开可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。Those skilled in the art should understand that the present disclosure can be provided as a method, device, system, or computer program product. Therefore, the present disclosure may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware.
本公开还可以提供一种计算机可读存储介质,该存储介质上可以存储有计算机程序,所述计算机程序被处理器执行时,使该处理器实现根据上述任一方法实施例的注视区域检测方法。The present disclosure may also provide a computer-readable storage medium, the storage medium may store a computer program, and when the computer program is executed by a processor, the processor realizes the gaze area detection method according to any of the foregoing method embodiments .
本文中描述的主题及功能操作的实施例可以在以下中实现：数字电子电路、有形体现的计算机软件或固件、包括本文中公开的结构及其结构性等同物的计算机硬件、或者它们中的一个或多个的组合。本文中描述的主题的实施例可以实现为一个或多个计算机程序，即编码在有形非暂时性程序载体上以被数据处理装置执行或控制数据处理装置的操作的计算机程序指令中的一个或多个模块。可替代地或附加地，程序指令可以被编码在生成的传播信号(例如机器生成的电、光或电磁信号)上，该信号被生成以将信息编码并传输到合适的接收机装置以由数据处理装置执行。计算机存储介质可以是机器可读存储设备、机器可读存储基板、随机或串行存取存储器设备、或它们中的一个或多个的组合。The embodiments of the subject matter and functional operations described herein can be implemented in: digital electronic circuits, tangibly embodied computer software or firmware, computer hardware including the structures disclosed herein and their structural equivalents, or a combination of one or more of them. Embodiments of the subject matter described herein may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier to be executed by a data processing apparatus or to control the operation of the data processing apparatus. Alternatively or additionally, the program instructions may be encoded on a generated propagated signal (e.g., a machine-generated electrical, optical, or electromagnetic signal) that is generated to encode information and transmit it to a suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
本文中描述的处理及逻辑流程可以由执行一个或多个计算机程序的一个或多个可编程计算机执行，以通过根据输入数据进行操作并生成输出来执行相应的功能。所述处理及逻辑流程还可以由专用逻辑电路例如FPGA(现场可编程门阵列)或ASIC(专用集成电路)来执行，并且装置也可以实现为专用逻辑电路。The processing and logic flows described herein can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processing and logic flows can also be performed by special-purpose logic circuitry such as an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus can also be implemented as special-purpose logic circuitry.
适合用于执行计算机程序的计算机包括例如通用或专用微处理器，或任何其他类型的中央处理单元。通常，中央处理单元将从只读存储器和/或随机存取存储器接收指令和数据。计算机的基本组件包括用于实施或执行指令的中央处理单元以及用于存储指令和数据的一个或多个存储器设备。通常，计算机可以包括用于存储数据的一个或多个大容量存储设备，例如磁盘、磁光盘或光盘等，或者计算机可以可操作地与此大容量存储设备耦接以从其接收数据或向其传送数据。此外，计算机可以嵌入在另一设备(例如移动电话机、个人数字助理(PDA)、移动音频或视频播放器、游戏操纵台、全球定位系统(GPS)接收机、或例如通用串行总线(USB)闪存驱动器的便携式存储设备等)中。Computers suitable for executing a computer program include, for example, general-purpose or special-purpose microprocessors, or any other type of central processing unit. Generally, the central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer may include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or the computer may be operatively coupled to such mass storage devices to receive data from or transfer data to them. In addition, the computer can be embedded in another device, for example, a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive.
适合于存储计算机程序指令和数据的计算机可读介质可以包括各种形式的非易失性存储器，例如半导体存储器设备(例如，可擦可编程只读存储器(Erasable Programmable Read Only Memory，EPROM)、电可擦可编程只读存储器(Electrically Erasable Programmable Read Only Memory，EEPROM)和闪存)、磁盘(例如内部硬盘或可移动盘)、磁光盘、光盘只读存储器(Compact Disc Read Only Memory，CD-ROM)、数字多功能光盘(Digital Versatile Disc，DVD)等。处理器和存储器可由专用逻辑电路补充或并入专用逻辑电路中。Computer-readable media suitable for storing computer program instructions and data may include various forms of non-volatile memory, such as semiconductor memory devices (e.g., erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and flash memory), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, compact disc read-only memory (CD-ROM), digital versatile discs (DVD), and the like. The processor and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.
虽然本文包含许多具体实施细节，但是这些不应被解释为限制本公开的范围或所要求保护的范围，而是主要用于描述本公开的具体实施例的特征。在多个实施例中分别描述的某些特征也可以在单个实施例中被组合实施。另一方面，在单个实施例中描述的各种特征也可以在多个实施例中分开实施或以任何合适的子组合来实施。此外，虽然特征可以如上所述在某些组合中起作用并且甚至最初如此要求保护，但是来自所要求保护的组合中的一个或多个特征在一些情况下可以从该组合中去除，并且所要求保护的组合可以指向子组合或子组合的变型。Although this document contains many specific implementation details, these should not be construed as limiting the scope of the present disclosure or of what is claimed, but are mainly used to describe the features of specific embodiments of the present disclosure. Certain features described separately in multiple embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination. In addition, although features may function in certain combinations as described above and may even be initially claimed as such, one or more features from a claimed combination may in some cases be removed from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination.
类似地，虽然在附图中以特定顺序描绘了操作，但是这不应被理解为要求这些操作以所示的特定顺序执行或顺次执行或者要求所有例示的操作被执行，以实现期望的结果。在某些情况下，多任务和并行处理可能是有利的。此外，上述实施例中的各种系统模块和组件的分离不应被理解为在所有实施例中均需要这样的分离，并且应当理解，所描述的程序组件和系统通常可以一起集成在单个软件产品中，或者封装成多个软件产品。Similarly, although operations are depicted in a specific order in the drawings, this should not be construed as requiring these operations to be performed in the specific order shown, or sequentially, or requiring all the illustrated operations to be performed, to achieve the desired result. In some cases, multitasking and parallel processing may be advantageous. In addition, the separation of various system modules and components in the foregoing embodiments should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can usually be integrated together in a single software product or packaged into multiple software products.
以上所述仅为本公开的一些实施例,并不用以限制本公开。凡在本公开的精神和原则之内所做的任何修改、等同替换、改进等,均应包含在本公开的范围之内。The above descriptions are only some embodiments of the present disclosure, and are not used to limit the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure shall be included in the scope of the present disclosure.

Claims (34)

  1. 一种注视区域检测方法,所述方法包括:A method for detecting a gaze area, the method comprising:
    获取在预定三维空间内采集到的人脸图像;Acquiring a face image collected in a predetermined three-dimensional space;
    基于所述人脸图像进行视线检测以得到视线检测结果;Performing line of sight detection based on the face image to obtain a line of sight detection result;
    利用预先针对所述预定三维空间训练完成的注视区域分类器,根据所述视线检测结果检测所述人脸图像对应的目标注视区域的类别,Using a gaze area classifier trained in advance for the predetermined three-dimensional space to detect the category of the target gaze area corresponding to the face image according to the line of sight detection result,
    其中,所述目标注视区域属于预先对所述预定三维空间划分得到的多类定义注视区域之一。Wherein, the target gaze area belongs to one of multiple types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance.
  2. 根据权利要求1所述的方法，其中，所述视线检测结果包括：所述人脸图像中人的视线起点信息和视线方向信息。The method according to claim 1, wherein the line-of-sight detection result includes: line-of-sight starting point information and line-of-sight direction information of the person in the face image.
  3. 根据权利要求2所述的方法,其中,所述基于所述人脸图像进行视线检测以得到视线检测结果包括:The method according to claim 2, wherein the performing line of sight detection based on the face image to obtain a line of sight detection result comprises:
    检测所述人脸图像中的眼睛位置;Detecting eye positions in the face image;
    在所述眼睛位置包括双眼的位置的情况下,确定所述双眼的中间位置为所述视线起点信息。In a case where the eye position includes the positions of both eyes, it is determined that the middle position of the eyes is the line of sight starting point information.
  4. 根据权利要求2所述的方法,其中,所述基于所述人脸图像进行视线检测以得到视线检测结果包括:The method according to claim 2, wherein the performing line of sight detection based on the face image to obtain a line of sight detection result comprises:
    检测所述人脸图像中的眼睛位置;Detecting eye positions in the face image;
    在所述眼睛位置包括双眼的位置的情况下，确定所述双眼中的任一只眼睛的位置为所述视线起点信息，或者，在所述眼睛位置包括单眼的位置的情况下，确定所述单眼的位置为所述视线起点信息。In a case where the eye position includes the positions of both eyes, determining the position of either one of the two eyes as the line-of-sight starting point information, or, in a case where the eye position includes the position of a single eye, determining the position of the single eye as the line-of-sight starting point information.
  5. 根据权利要求3或4所述的方法,其中,所述检测所述人脸图像中的眼睛位置包括:The method according to claim 3 or 4, wherein the detecting the position of the eyes in the face image comprises:
    检测所述人脸图像中人的头部姿态信息;Detecting head posture information of the person in the face image;
    依据所述头部姿态信息确定所述人脸图像中的眼睛位置。The eye position in the face image is determined according to the head posture information.
  6. 根据权利要求2所述的方法,其中,所述基于所述人脸图像进行视线检测以得到视线检测结果包括:The method according to claim 2, wherein the performing line of sight detection based on the face image to obtain a line of sight detection result comprises:
    检测所述人脸图像中人的头部姿态信息;Detecting head posture information of the person in the face image;
    基于所述头部姿态信息检测所述人脸图像中人的视线方向信息。Detecting the line of sight direction information of the person in the face image based on the head posture information.
  7. 根据权利要求5或6所述的方法,其中,所述检测所述人脸图像中人的头部姿态信息包括:The method according to claim 5 or 6, wherein the detecting the head posture information of the person in the face image comprises:
    检测所述人脸图像中的多个人脸关键点;Detecting multiple face key points in the face image;
    基于所述人脸关键点和预设平均人脸模型,确定所述人脸图像中人的头部姿态信息。Based on the key points of the face and a preset average face model, the head posture information of the person in the face image is determined.
  8. 根据权利要求6或7所述的方法,其中,所述基于所述头部姿态信息检测所述人脸图像中人的视线方向信息包括:The method according to claim 6 or 7, wherein the detecting the line of sight direction information of the person in the face image based on the head posture information comprises:
    根据所述头部姿态信息对所述人脸图像进行规范化处理,获得转正人脸图像;Performing normalization processing on the face image according to the head posture information to obtain a normalized face image;
    基于所述转正人脸图像进行视线方向检测,获得第一检测视线方向;Performing line-of-sight direction detection based on the corrected face image to obtain the first detected line-of-sight direction;
    对所述第一检测视线方向进行坐标逆变换处理,获得所述人脸图像中人的视线方向信息。Perform coordinate inverse transformation processing on the first detected line of sight direction to obtain the line of sight direction information of the person in the face image.
  9. 根据权利要求8所述的方法,其中,所述根据所述头部姿态信息对所述人脸图像进行规范化处理,获得转正人脸图像,包括:8. The method according to claim 8, wherein said normalizing said face image according to said head posture information to obtain a normalized face image comprises:
    根据所述头部姿态信息确定所述人脸图像中人的头部坐标系;Determining the head coordinate system of the person in the face image according to the head posture information;
    基于所述头部坐标系对所述人脸图像对应的实际相机坐标系进行旋转及平移,获得虚拟相机坐标系;Rotate and translate the actual camera coordinate system corresponding to the face image based on the head coordinate system to obtain a virtual camera coordinate system;
    根据所述实际相机坐标系与所述虚拟相机坐标系之间的位置变换关系，对所述人脸图像进行规范化处理，获得所述转正人脸图像。Performing normalization processing on the face image according to the position transformation relationship between the actual camera coordinate system and the virtual camera coordinate system to obtain the corrected face image.
  10. 根据权利要求1-9中任一所述的方法，其中，所述注视区域分类器预先基于针对所述预定三维空间的训练样本集训练完成，其中，所述训练样本集包括多个视线特征样本，每个所述视线特征样本包括视线起点信息、视线方向信息、以及该视线特征样本对应的注视区域类别的标注信息，标注的注视区域的类别属于针对所述预定三维空间划分的所述多类定义注视区域之一。The method according to any one of claims 1-9, wherein the gaze area classifier is trained in advance based on a training sample set for the predetermined three-dimensional space, wherein the training sample set includes a plurality of gaze feature samples, each gaze feature sample includes gaze starting point information, gaze direction information, and label information of the gaze area category corresponding to the gaze feature sample, and the category of the labeled gaze area belongs to one of the multiple types of defined gaze areas divided for the predetermined three-dimensional space.
  11. 根据权利要求10所述的方法,所述方法还包括:在所述获取在预定三维空间内采集到的人脸图像之前,The method according to claim 10, further comprising: before said acquiring a face image collected in a predetermined three-dimensional space,
    将至少一个所述视线特征样本的所述视线起点信息和所述视线方向信息输入待训练的注视区域分类器,获得该视线特征样本对应的注视区域类别预测信息;Input the gaze starting point information and the gaze direction information of at least one gaze feature sample into a gaze area classifier to be trained to obtain the gaze area category prediction information corresponding to the gaze feature sample;
    根据所述注视区域类别预测信息和该视线特征样本对应的注视区域类别的标注信息之间的偏差,对所述注视区域分类器进行参数调整,以训练所述注视区域分类器。According to the deviation between the gaze area category prediction information and the annotation information of the gaze area category corresponding to the gaze feature sample, the gaze area classifier is adjusted to train the gaze area classifier.
  12. 根据权利要求10所述的方法，所述方法还包括：在所述获取在预定三维空间内采集到的人脸图像之前，根据所述预定三维空间的空间标识从预设注视区域分类器集合中获取所述空间标识对应的注视区域分类器，The method according to claim 10, further comprising: before the acquiring of the face image collected in the predetermined three-dimensional space, acquiring, from a preset gaze area classifier set, the gaze area classifier corresponding to the spatial identifier of the predetermined three-dimensional space according to the spatial identifier,
    其中,所述预设注视区域分类器集合包括:不同三维空间的空间标识分别对应的注视区域分类器。Wherein, the preset gaze area classifier set includes: gaze area classifiers respectively corresponding to spatial identifiers of different three-dimensional spaces.
  13. 根据权利要求1~12中任一所述的方法,其中,所述预定三维空间包括:车辆空间。The method according to any one of claims 1-12, wherein the predetermined three-dimensional space includes: a vehicle space.
  14. 根据权利要求13所述的方法,其中,The method according to claim 13, wherein:
    所述人脸图像基于针对所述车辆空间中的驾驶区域采集到的图像确定;The face image is determined based on the image collected for the driving area in the vehicle space;
    所述多类定义注视区域包括下列中至少两类：左前挡风玻璃区域、右前挡风玻璃区域、仪表盘区域、车内后视镜区域、中控台区域、左后视镜区域、右后视镜区域、遮光板区域、换挡杆区域、方向盘下方区域、副驾驶区域、副驾驶前方的杂物箱区域。The multiple types of defined gaze areas include at least two of the following: a left front windshield area, a right front windshield area, an instrument panel area, an interior rearview mirror area, a center console area, a left rearview mirror area, a right rearview mirror area, a visor area, a shift lever area, an area under the steering wheel, a front passenger area, and a glove box area in front of the front passenger seat.
  15. 根据权利要求1~14中任一所述的方法,所述方法还包括:The method according to any one of claims 1-14, the method further comprising:
    根据注视区域类别检测结果,确定所述人脸图像对应的人的注意力监控结果;Determine the attention monitoring result of the person corresponding to the face image according to the detection result of the gaze area category;
    输出所述注意力监控结果,和/或,根据所述注意力监控结果输出分心提示信息。Output the attention monitoring result, and/or output distraction prompt information according to the attention monitoring result.
  16. 根据权利要求1~15中任一所述的方法,所述方法还包括:The method according to any one of claims 1-15, the method further comprising:
    确定与注视区域类别检测结果对应的控制指令;Determine the control instruction corresponding to the detection result of the gaze area category;
    控制电子设备执行与所述控制指令相应的操作。The control electronic device executes the operation corresponding to the control instruction.
  17. 一种注视区域检测装置,所述装置包括:A gaze area detection device, the device comprising:
    图像获取模块,用于获取在预定三维空间内采集到的人脸图像;An image acquisition module for acquiring a face image collected in a predetermined three-dimensional space;
    视线检测模块,用于基于所述人脸图像进行视线检测以得到视线检测结果;A line of sight detection module, configured to perform line of sight detection based on the face image to obtain a line of sight detection result;
    注视区域检测模块,用于利用预先针对所述预定三维空间训练完成的注视区域分类器,根据所述视线检测结果检测所述人脸图像对应的目标注视区域的类别,The gaze area detection module is configured to use a gaze area classifier trained in advance for the predetermined three-dimensional space to detect the category of the target gaze area corresponding to the face image according to the line of sight detection result,
    其中,所述目标注视区域属于预先对所述预定三维空间划分得到的多类定义注视区域之一。Wherein, the target gaze area belongs to one of multiple types of defined gaze areas obtained by dividing the predetermined three-dimensional space in advance.
  18. 根据权利要求17所述的装置，其中，所述视线检测结果包括：所述人脸图像中人的视线起点信息和视线方向信息。The device according to claim 17, wherein the line-of-sight detection result includes: line-of-sight starting point information and line-of-sight direction information of the person in the face image.
  19. 根据权利要求18所述的装置,其中,所述视线检测模块包括:The device according to claim 18, wherein the line of sight detection module comprises:
    眼睛位置检测子模块,用于检测所述人脸图像中的眼睛位置;An eye position detection sub-module for detecting the eye position in the face image;
    第一起点信息确定子模块,用于在所述眼睛位置包括双眼的位置的情况下,确定所述双眼的中间位置为所述视线起点信息。The first starting point information determining sub-module is configured to determine the middle position of the two eyes as the line of sight starting point information when the eye position includes the positions of the two eyes.
  20. 根据权利要求18所述的装置,其中,所述视线检测模块包括:The device according to claim 18, wherein the line of sight detection module comprises:
    眼睛位置检测子模块,用于检测所述人脸图像中的眼睛位置;An eye position detection sub-module for detecting the eye position in the face image;
    第二起点信息确定子模块，用于在所述眼睛位置包括双眼的位置的情况下，确定所述双眼中的任一只眼睛的位置为所述视线起点信息，或者，在所述眼睛位置包括单眼的位置的情况下，确定所述单眼的位置为所述视线起点信息。The second starting point information determining sub-module is configured to determine, in a case where the eye position includes the positions of both eyes, the position of either one of the two eyes as the line-of-sight starting point information, or, in a case where the eye position includes the position of a single eye, the position of the single eye as the line-of-sight starting point information.
  21. 根据权利要求19或20所述的装置,其中,所述眼睛位置检测子模块包括:The device according to claim 19 or 20, wherein the eye position detection sub-module comprises:
    姿态检测单元,用于检测所述人脸图像中人的头部姿态信息;A posture detection unit for detecting head posture information of the person in the face image;
    位置确定单元,用于依据所述头部姿态信息确定所述人脸图像中的眼睛位置。The position determining unit is configured to determine the position of the eyes in the face image according to the head posture information.
  22. 根据权利要求18所述的装置,其中,所述视线检测模块包括:The device according to claim 18, wherein the line of sight detection module comprises:
    姿态检测子模块,用于检测所述人脸图像中人的头部姿态信息;A posture detection sub-module for detecting head posture information of the person in the face image;
    方向检测子模块,用于基于所述头部姿态信息检测所述人脸图像中人的视线方向信息。The direction detection sub-module is configured to detect the line of sight direction information of the person in the face image based on the head posture information.
  23. 根据权利要求22所述的装置,其中,所述姿态检测子模块包括:The device according to claim 22, wherein the posture detection sub-module comprises:
    关键点检测单元,用于检测所述人脸图像中的多个人脸关键点;A key point detection unit for detecting multiple face key points in the face image;
    姿态确定单元,用于基于所述人脸关键点和预设平均人脸模型,确定所述人脸图像中人的头部姿态信息。The posture determination unit is configured to determine the head posture information of the person in the face image based on the key points of the face and a preset average face model.
24. The device according to claim 22 or 23, wherein the direction detection sub-module comprises:
    an image processing unit configured to perform normalization processing on the face image according to the head posture information to obtain a frontalized face image;
    a first direction detection unit configured to perform line-of-sight direction detection based on the frontalized face image to obtain a first detected line-of-sight direction;
    a direction determining unit configured to perform an inverse coordinate transformation on the first detected line-of-sight direction to obtain the line-of-sight direction information of the person in the face image.
25. The device according to claim 24, wherein the image processing unit comprises:
    a head coordinate determination subunit configured to determine the head coordinate system of the person in the face image according to the head posture information;
    a coordinate transformation subunit configured to rotate and translate the actual camera coordinate system corresponding to the face image based on the head coordinate system to obtain a virtual camera coordinate system;
    an image processing subunit configured to perform normalization processing on the face image according to the positional transformation relationship between the actual camera coordinate system and the virtual camera coordinate system to obtain the frontalized face image.
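Claims 24 and 25 together read on the well-known gaze-normalization idea: warp the face as seen by a virtual camera aimed straight at the head, detect the direction in that frame, then rotate the result back. A minimal sketch under stated assumptions: pinhole intrinsics for both cameras, the distance-scaling component some normalization schemes add is omitted, and all names (`normalize_face`, `C_real`, `C_virt`) are inventions of this annotation.

```python
import cv2
import numpy as np

def normalize_face(img, R_head, face_center, C_real, C_virt, out_size=(224, 224)):
    """Warp the face image into the view of a virtual camera that looks
    straight at the face center (claims 24-25).

    R_head      : 3x3 head rotation in real-camera coordinates
    face_center : 3-vector, face center in real-camera coordinates
    C_real/C_virt : 3x3 intrinsics of the real and virtual cameras
    """
    z = face_center / np.linalg.norm(face_center)   # virtual optical axis
    x_head = R_head[:, 0]                           # head x-axis
    y = np.cross(z, x_head); y /= np.linalg.norm(y)
    x = np.cross(y, z)
    R_virt = np.stack([x, y, z])                    # real -> virtual rotation
    W = C_virt @ R_virt @ np.linalg.inv(C_real)     # image-plane homography
    return cv2.warpPerspective(img, W, out_size), R_virt

def gaze_to_real(gaze_dir_virt, R_virt):
    """Claim 24's inverse coordinate transformation: map the direction
    detected in the frontalized image back to real-camera coordinates."""
    return R_virt.T @ np.asarray(gaze_dir_virt, float)
```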
26. The device according to any one of claims 17-25, wherein the gaze area classifier is trained in advance on a training sample set for the predetermined three-dimensional space, the training sample set comprising a plurality of line-of-sight feature samples, each of which includes line-of-sight starting point information, line-of-sight direction information, and annotation information of the gaze area category corresponding to that sample, the annotated gaze area category being one of the multiple classes of defined gaze areas into which the predetermined three-dimensional space is divided.
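A line-of-sight feature sample of claim 26 carries exactly three fields. One illustrative encoding (the field names and the 3D tuple layout are choices made here):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class GazeFeatureSample:
    """One line-of-sight feature sample as described in claim 26."""
    origin: Tuple[float, float, float]      # line-of-sight starting point
    direction: Tuple[float, float, float]   # line-of-sight direction (unit vector)
    region_label: int                       # annotated gaze-area class index
```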
27. The device according to claim 26, further comprising a classifier training module, the classifier training module comprising:
    a category prediction sub-module configured to input the line-of-sight starting point information and line-of-sight direction information of at least one line-of-sight feature sample into the gaze area classifier to be trained, to obtain gaze area category prediction information for that sample;
    a parameter adjustment sub-module configured to adjust the parameters of the gaze area classifier according to the deviation between the gaze area category prediction information and the annotated gaze area category of that sample, so as to train the gaze area classifier.
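Claim 27's predict-measure-adjust loop is ordinary supervised training; the claim does not fix a classifier family. As a sketch, a softmax linear classifier trained with cross-entropy, where the (prediction − label) deviation directly drives the parameter update; the feature layout follows the GazeFeatureSample sketch above (origin and direction concatenated):

```python
import numpy as np

def train_gaze_classifier(X, y, n_classes, lr=0.1, epochs=200, seed=0):
    """Minimal softmax classifier mirroring claim 27.

    X : (N, D) rows of concatenated origin + direction features
    y : (N,)   integer gaze-area labels
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits); p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(X)                  # deviation drives the update
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b
```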
28. The device according to claim 26, further comprising:
    a classifier obtaining module configured to obtain, from a preset set of gaze area classifiers and according to the space identifier of the predetermined three-dimensional space, the gaze area classifier corresponding to that space identifier,
    wherein the preset set of gaze area classifiers comprises gaze area classifiers respectively corresponding to the space identifiers of different three-dimensional spaces.
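Claim 28 amounts to a registry lookup keyed by the space identifier. A trivial sketch; the registry structure and key format (e.g. a vehicle model code) are assumptions:

```python
# One trained classifier per 3D space, keyed by a space identifier.
GAZE_CLASSIFIERS = {}   # space_id -> trained gaze-area classifier

def register_classifier(space_id, classifier):
    GAZE_CLASSIFIERS[space_id] = classifier

def classifier_for_space(space_id):
    try:
        return GAZE_CLASSIFIERS[space_id]
    except KeyError:
        raise KeyError(f"no gaze-area classifier trained for space {space_id!r}")
```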
29. The device according to any one of claims 17-28, wherein the predetermined three-dimensional space comprises a vehicle space.
30. The device according to claim 29, wherein:
    the face image is determined based on an image captured of the driving area in the vehicle space; and
    the multiple classes of defined gaze areas include at least two of the following: left front windshield area, right front windshield area, instrument panel area, interior rearview mirror area, center console area, left rearview mirror area, right rearview mirror area, sun visor area, shift lever area, area below the steering wheel, front passenger area, and glove box area in front of the front passenger.
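One way to encode claim 30's defined areas as classifier labels; the identifier names and index order are arbitrary choices made here, not given by the patent:

```python
from enum import IntEnum

class VehicleGazeRegion(IntEnum):
    """Illustrative label encoding of the vehicle gaze areas of claim 30."""
    LEFT_WINDSHIELD = 0
    RIGHT_WINDSHIELD = 1
    INSTRUMENT_PANEL = 2
    INTERIOR_MIRROR = 3
    CENTER_CONSOLE = 4
    LEFT_MIRROR = 5
    RIGHT_MIRROR = 6
    SUN_VISOR = 7
    SHIFT_LEVER = 8
    BELOW_STEERING_WHEEL = 9
    FRONT_PASSENGER = 10
    GLOVE_BOX = 11
```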
31. The device according to any one of claims 17-30, further comprising:
    an attention monitoring module configured to determine an attention monitoring result for the person corresponding to the face image according to the gaze area category detection result obtained by the gaze area detection module;
    a monitoring result output module configured to output the attention monitoring result, and/or to output distraction prompt information according to the attention monitoring result.
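Claim 31 leaves the monitoring policy open; a common choice is a dwell-time rule over the per-frame gaze-area results. A toy sketch in which the safe-region set and the 2-second threshold are assumptions of this annotation:

```python
import time

class AttentionMonitor:
    """Toy monitor in the spirit of claim 31: report distraction when the
    detected gaze area stays outside driving-relevant areas too long."""

    DEFAULT_SAFE = {"left_windshield", "right_windshield", "instrument_panel",
                    "interior_mirror", "left_mirror", "right_mirror"}

    def __init__(self, safe_regions=None, max_off_road_s=2.0):
        self.safe_regions = set(safe_regions or self.DEFAULT_SAFE)
        self.max_off_road_s = max_off_road_s
        self._off_since = None          # when gaze first left the safe set

    def update(self, region, now=None):
        """Feed one per-frame gaze-area result; returns the monitoring result."""
        now = time.monotonic() if now is None else now
        if region in self.safe_regions:
            self._off_since = None
            return "attentive"
        if self._off_since is None:
            self._off_since = now
        if now - self._off_since > self.max_off_road_s:
            return "distracted"         # caller may emit a distraction prompt
        return "attentive"
```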
32. The device according to any one of claims 17-31, further comprising:
    a control instruction determination module configured to determine a control instruction corresponding to the gaze area category detection result obtained by the gaze area detection module;
    an operation control module configured to control an electronic device to execute the operation corresponding to the control instruction.
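Claim 32's mapping from gaze area to device operation can be as simple as a lookup table. A sketch with invented region keys, command names, and a hypothetical device.execute interface, none of which come from the patent:

```python
# Hypothetical mapping from a detected gaze-area class to a device command.
GAZE_COMMANDS = {
    "center_console": "wake_infotainment",
    "interior_mirror": "show_rear_camera_feed",
}

def on_gaze_region(region, device):
    """Determine the control instruction for a gaze-area result and have the
    electronic device execute the corresponding operation."""
    command = GAZE_COMMANDS.get(region)
    if command is not None:
        device.execute(command)
```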
33. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, causes the processor to implement the method according to any one of claims 1-16.
34. An electronic device comprising a memory and a processor, wherein a computer program is stored on the memory, and the processor, when executing the computer program, implements the method according to any one of claims 1-16.
PCT/CN2019/127833 2019-03-18 2019-12-24 Method and apparatus for detecting gaze area and electronic device WO2020186867A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021540793A JP7244655B2 (en) 2019-03-18 2019-12-24 Gaze Area Detection Method, Apparatus, and Electronic Device
KR1020217022187A KR20210104107A (en) 2019-03-18 2019-12-24 Gaze area detection method, apparatus and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910204793.1 2019-03-18
CN201910204793.1A CN111723828B (en) 2019-03-18 2019-03-18 Gaze area detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
WO2020186867A1

Family

ID=72519550

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/127833 WO2020186867A1 (en) 2019-03-18 2019-12-24 Method and apparatus for detecting gaze area and electronic device

Country Status (4)

Country Link
JP (1) JP7244655B2 (en)
KR (1) KR20210104107A (en)
CN (1) CN111723828B (en)
WO (1) WO2020186867A1 (en)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112308006A (en) * 2020-11-10 2021-02-02 深圳地平线机器人科技有限公司 Sight line area prediction model generation method and device, storage medium and electronic equipment
WO2022141114A1 (en) * 2020-12-29 2022-07-07 深圳市大疆创新科技有限公司 Line-of-sight estimation method and apparatus, vehicle, and computer-readable storage medium
CN112766097B (en) * 2021-01-06 2024-02-13 中国科学院上海微系统与信息技术研究所 Sight line recognition model training method, sight line recognition device and sight line recognition equipment
CN113627267A (en) * 2021-07-15 2021-11-09 中汽创智科技有限公司 Sight line detection method, device, equipment and medium
CN113569785A (en) * 2021-08-04 2021-10-29 上海汽车集团股份有限公司 Driving state sensing method and device
CN113807330B (en) * 2021-11-19 2022-03-08 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Three-dimensional sight estimation method and device for resource-constrained scene
KR20230101580A (en) * 2021-12-29 2023-07-06 삼성전자주식회사 Eye tracking method, apparatus and sensor for determining sensing coverage based on eye model


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106293031B (en) * 2015-06-04 2019-05-21 北京智谷睿拓技术服务有限公司 Information processing method, information processing unit and user equipment
CN107590482A (en) * 2017-09-29 2018-01-16 百度在线网络技术(北京)有限公司 information generating method and device
CN107679490B (en) * 2017-09-29 2019-06-28 百度在线网络技术(北京)有限公司 Method and apparatus for detection image quality
CN108875524B (en) * 2018-01-02 2021-03-02 北京旷视科技有限公司 Sight estimation method, device, system and storage medium
CN108171218A (en) * 2018-01-29 2018-06-15 深圳市唯特视科技有限公司 A kind of gaze estimation method for watching network attentively based on appearance of depth

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107878326A (en) * 2016-09-30 2018-04-06 法乐第(北京)网络科技有限公司 Vehicle parking assistance device and vehicle drive auxiliary control method
CN106891811A (en) * 2017-03-15 2017-06-27 黄建平 A kind of automobile display system
US20180354509A1 (en) * 2017-06-08 2018-12-13 Daqri, Llc Augmented reality (ar) visualization of advanced driver-assistance system
CN109080641A (en) * 2017-06-08 2018-12-25 丰田自动车株式会社 Drive consciousness estimating device

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434741A (en) * 2020-11-25 2021-03-02 杭州盛世传奇标识系统有限公司 Method, system, device and storage medium for using interactive introduction identifier
CN112329718A (en) * 2020-11-26 2021-02-05 北京沃东天骏信息技术有限公司 Method and apparatus for generating information
CN112580522A (en) * 2020-12-22 2021-03-30 北京每日优鲜电子商务有限公司 Method, device and equipment for detecting sleeper and storage medium
CN112733740A (en) * 2021-01-14 2021-04-30 深圳数联天下智能科技有限公司 Attention information generation method and device, terminal equipment and storage medium
CN112733740B (en) * 2021-01-14 2024-05-28 深圳数联天下智能科技有限公司 Attention information generation method and device, terminal equipment and storage medium
CN113115086B (en) * 2021-04-16 2023-09-19 浙江闪链科技有限公司 Method for collecting elevator media viewing information based on video line-of-sight identification
CN113115086A (en) * 2021-04-16 2021-07-13 安乐 Method for collecting elevator media viewing information based on video sight line identification
CN113692371A (en) * 2021-06-30 2021-11-23 华为技术有限公司 Target position determining method, determining device and determining system
CN114677476A (en) * 2022-03-30 2022-06-28 北京字跳网络技术有限公司 Face processing method and device, computer equipment and storage medium
CN114967935B (en) * 2022-06-29 2023-04-07 深圳职业技术学院 Interaction method and device based on sight estimation, terminal equipment and storage medium
CN114967935A (en) * 2022-06-29 2022-08-30 深圳职业技术学院 Interaction method and device based on sight estimation, terminal equipment and storage medium
CN116030512A (en) * 2022-08-04 2023-04-28 荣耀终端有限公司 Gaze point detection method and device
CN116030512B (en) * 2022-08-04 2023-10-31 荣耀终端有限公司 Gaze point detection method and device
CN115761871B (en) * 2022-12-01 2023-08-11 北京中科睿医信息科技有限公司 Detection image generation method, device, equipment and medium based on eye movement detection
CN115761871A (en) * 2022-12-01 2023-03-07 北京中科睿医信息科技有限公司 Detection image generation method, device, equipment and medium based on eye movement detection

Also Published As

Publication number Publication date
JP2022517254A (en) 2022-03-07
CN111723828B (en) 2024-06-11
KR20210104107A (en) 2021-08-24
CN111723828A (en) 2020-09-29
JP7244655B2 (en) 2023-03-22

Similar Documents

Publication Publication Date Title
WO2020186867A1 (en) Method and apparatus for detecting gaze area and electronic device
CN112590794B (en) Method and device for determining an estimated value of the ability of a vehicle driver to take over vehicle control
EP3033999B1 (en) Apparatus and method for determining the state of a driver
Seshadri et al. Driver cell phone usage detection on strategic highway research program (SHRP2) face view videos
CN110765807B (en) Driving behavior analysis and processing method, device, equipment and storage medium
US9881221B2 (en) Method and system for estimating gaze direction of vehicle drivers
CN111566612A (en) Visual data acquisition system based on posture and sight line
García et al. Driver monitoring based on low-cost 3-D sensors
WO2020177480A1 (en) Vehicle accident identification method and apparatus, and electronic device
WO2019184573A1 (en) Passenger-related item loss mitigation
JP2019040465A (en) Behavior recognition device, learning device, and method and program
US9606623B2 (en) Gaze detecting apparatus and method
CN110826370B (en) Method and device for identifying identity of person in vehicle, vehicle and storage medium
WO2020231401A1 (en) A neural network for head pose and gaze estimation using photorealistic synthetic data
US20220180109A1 (en) Devices and methods for monitoring drivers of vehicles
CN111027506B (en) Method and device for determining sight direction, electronic equipment and storage medium
Martin et al. Real time driver body pose estimation for novel assistance systems
US11062141B2 (en) Methods and apparatuses for future trajectory forecast
Shirpour et al. A probabilistic model for visual driver gaze approximation from head pose estimation
CN112926364A (en) Head posture recognition method and system, automobile data recorder and intelligent cabin
US20140368644A1 (en) Apparatus and method for tracking driver attentiveness using vector
US20230109171A1 (en) Operator take-over prediction
WO2023220916A1 (en) Part positioning method and apparatus
Peláez C., G. A., García, F., de la Escalera, A., & Armingol, J. M. (2014). Driver Monitoring Based on Low-Cost 3-D Sensors. IEEE Transactions on Intelligent Transportation Systems, 15(4).
Ordonez-Hurtado et al. Enabling the Evaluation of Driver Physiology Via Vehicle Dynamics

Legal Events

Date Code Title Description

121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 19920034; Country of ref document: EP; Kind code of ref document: A1)

ENP Entry into the national phase (Ref document number: 2021540793; Country of ref document: JP; Kind code of ref document: A) (Ref document number: 20217022187; Country of ref document: KR; Kind code of ref document: A)

NENP Non-entry into the national phase (Ref country code: DE)

122 EP: PCT application non-entry in European phase (Ref document number: 19920034; Country of ref document: EP; Kind code of ref document: A1)