WO2020186883A1 - Method, device and equipment for gaze area detection and neural network training - Google Patents
Method, device and equipment for gaze area detection and neural network training
- Publication number
- WO2020186883A1 (PCT/CN2019/129893)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- area
- image
- category
- neural network
- face image
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/59—Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
- G06V20/597—Recognising the driver's state or behaviour, e.g. attention or drowsiness
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/193—Preprocessing; Feature extraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/197—Matching; Classification
Definitions
- the present disclosure relates to computer vision technology, and in particular to a method, device and equipment for gaze area detection and neural network training.
- An artificial intelligence product that has attracted attention is used to monitor the driving state of a driver, for example, whether the driver is distracted while driving, so as to promptly remind the driver when distraction is detected and reduce the risk of accidents.
- a first aspect of the present disclosure provides a training method of a neural network for gaze area detection, the method comprising: inputting at least a face image as a training sample and its corresponding gaze area category annotation information into the neural network, wherein the annotated gaze area category belongs to one of multiple classes of defined gaze areas obtained by dividing a designated space area in advance; performing feature extraction on the input face image via the neural network, and determining gaze area category prediction information of the face image according to the extracted features; determining the difference between the gaze area category prediction information and the gaze area category annotation information; and adjusting the parameters of the neural network based on the difference.
- a second aspect of the present disclosure provides a gaze area detection method, the method comprising: intercepting a face area in an image collected in a designated space area to obtain a face image; inputting the face image into a neural network, wherein the neural network is trained in advance using a training sample set that includes a plurality of face image samples and their respective corresponding gaze area category annotation information, and each annotated gaze area category belongs to one of multiple classes of defined gaze areas obtained by dividing the designated space area in advance; and performing feature extraction on the input face image via the neural network, and determining the gaze area detection category corresponding to the face image according to the extracted features.
- a third aspect of the present disclosure provides a training device for a neural network for gaze area detection.
- the device includes: a sample input module for inputting at least a face image as a training sample and its corresponding gaze area category annotation information into the neural network, wherein the annotated gaze area category belongs to one of the multiple classes of defined gaze areas obtained by dividing the designated space area in advance; a category prediction module for performing feature extraction on the input face image via the neural network and determining gaze area category prediction information of the face image according to the extracted features; a difference determination module for determining the difference between the gaze area category prediction information and the gaze area category annotation information; and a parameter adjustment module for adjusting the parameters of the neural network based on the difference.
- a fourth aspect of the present disclosure provides a gaze area detection device, the device including: an image acquisition module for intercepting a face area in an image collected in a designated space area to obtain a face image; an image input module for inputting the face image into a neural network, where the neural network is trained in advance using a training sample set including a plurality of face image samples and their respective gaze area category annotation information, and the annotated gaze area category belongs to one of the multiple classes of defined gaze areas obtained by dividing the designated space area in advance; and a category detection module for performing feature extraction on the input face image via the neural network and determining the gaze area detection category corresponding to the face image according to the extracted features.
- a fifth aspect of the present disclosure provides a training device for a neural network for gaze area detection.
- the device includes a memory and a processor, wherein the memory stores computer instructions executable by the processor, and when the processor executes the computer instructions, the training method of the neural network for gaze area detection according to the first aspect of the present disclosure is implemented.
- a sixth aspect of the present disclosure provides a gaze area detection device, the device including a memory and a processor, wherein the memory stores computer instructions executable by the processor, and when the processor executes the computer instructions, the gaze area detection method according to the second aspect of the present disclosure is implemented.
- a seventh aspect of the present disclosure provides a computer-readable storage medium on which a computer program is stored.
- when the program is executed, it enables a processor to implement the training method of the neural network for gaze area detection according to the first aspect of the present disclosure, and/or enables the processor to implement the gaze area detection method according to the second aspect of the present disclosure.
- a neural network is trained by using a face image as a training sample and its corresponding gaze area category annotation information, so that the gaze area corresponding to the face image can be directly predicted according to the neural network.
- Fig. 1 is a flowchart of a training method of a neural network for gaze area detection according to an embodiment of the present disclosure
- FIG. 2 is a schematic diagram of multiple gaze areas predefined in a vehicle driver's attention monitoring scenario according to an embodiment of the present disclosure
- FIG. 3 illustrates an example of a neural network structure to which the embodiment of the present disclosure can be applied
- FIG. 4 illustrates a configuration for training a neural network according to an embodiment of the present disclosure
- FIG. 5 illustrates a configuration for training a neural network according to another embodiment of the present disclosure
- Fig. 6 is a flowchart of a neural network training method corresponding to the configuration in Fig. 5;
- FIG. 7 is a schematic diagram of obtaining an eye image according to an embodiment of the present disclosure.
- Fig. 8 is a flowchart of a neural network training method according to another embodiment of the present disclosure.
- FIG. 9 illustrates a configuration corresponding to the neural network training method shown in FIG. 8.
- FIG. 10 is a flowchart of a method for detecting a gaze area according to an embodiment of the present disclosure
- Fig. 11 is a schematic diagram of a neural network application scenario according to an embodiment of the present disclosure.
- FIG. 12 illustrates an example of the gaze area detection category output by the neural network in the application scenario shown in FIG. 11;
- FIG. 13 is a block diagram of a training device for a neural network for gaze area detection according to an embodiment of the present disclosure
- Fig. 14 is a block diagram of a gaze area detecting device according to an embodiment of the present disclosure.
- FIG. 15 is a block diagram of a gaze area detecting device according to another embodiment of the present disclosure.
- FIG. 16 is a block diagram of a training device for a neural network for gaze area detection according to an embodiment of the present disclosure
- Fig. 17 is a block diagram of a gaze area detecting device according to an embodiment of the present disclosure.
- the terms first, second, third, etc. may be used in this disclosure to describe various information, but the information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
- first information may also be referred to as second information, and similarly, the second information may also be referred to as first information.
- the word "if" as used herein can be interpreted as "when" or "upon" or "in response to".
- the embodiment of the present disclosure provides a training method of a neural network for gaze area detection. As shown in Fig. 1, the training method may include steps 100-106.
- the neural network may include, for example, a convolutional neural network, a deep neural network, and the like.
- the face image may be an image collected in a specific gaze area detection scene.
- gaze area detection has many application scenarios: for example, detecting a person's gaze area to automatically learn the person's intention to control a smart device, detecting a person's gaze area to learn the person's preferences or wishes, and detecting the driver's gaze area to judge the driver's driving concentration, etc.
- the face image of the target person in the scene can be collected.
- the marked gaze area category belongs to one of multiple types of defined gaze areas obtained by dividing the designated space area in advance.
- a space area can be pre-designated.
- the gaze area detection corresponding to the face image is to detect which position in the designated space area the person in the face image is gazing at.
- Different gaze positions may have different meanings. For example, different gaze positions may indicate different driving concentration of the driver; for another example, different gaze positions may indicate different intentions of the target person.
- the designated space area can be divided into a plurality of different sub-areas, and each sub-area can be called a gaze area.
- these gaze areas can also be distinguished by different identifiers, for example, gaze area A, gaze area B; or gaze area 5, gaze area 6, and so on.
- the above-listed A, B, 5, 6, etc. can all be called the gaze area category.
- the definition of the gaze area category can facilitate the training of the neural network, and the pre-labeled category can be used as a label for training and testing.
- in step 102, feature extraction is performed on the input face image via the neural network, and the gaze area category prediction information of the face image is determined according to the extracted features.
- the features extracted by the neural network from the input face image include various image features of the face image.
- the gaze area category prediction information of the face image may be output according to the extracted features, which may be a pre-defined gaze area category.
- the category can be represented by letters or numbers.
- the output gaze area category prediction information is "5", that is, gaze area 5.
- in step 104, the difference between the gaze area category prediction information and the gaze area category label information corresponding to the face image is determined.
- a loss function can be used to determine the difference between the gaze area category prediction information and the gaze area category label information.
- in step 106, the parameters of the neural network are adjusted based on the difference.
- the parameters of the neural network can be adjusted through the gradient back propagation method.
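- as an illustration only, the following minimal sketch shows how the difference of step 104 and the parameter adjustment of step 106 might look in code, using a cross-entropy loss and gradient back-propagation; the tiny linear model, the ten-class setup, and the learning rate are assumptions, not the patent's implementation:

```python
import torch
import torch.nn as nn

# Minimal sketch of steps 104-106: measure the difference between the
# predicted gaze-area category and the annotated category with a loss
# function, then adjust the network parameters by gradient back-propagation.
num_classes = 10                      # e.g. the ten gaze areas of FIG. 2
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, num_classes))
criterion = nn.CrossEntropyLoss()     # one common choice of loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

face_batch = torch.randn(8, 3, 64, 64)        # stand-in face images
labels = torch.randint(0, num_classes, (8,))  # annotated category indices

logits = model(face_batch)            # gaze-area category prediction info
loss = criterion(logits, labels)      # difference vs. annotation (step 104)
optimizer.zero_grad()
loss.backward()                       # gradient back-propagation (step 106)
optimizer.step()                      # parameter adjustment
```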
- the neural network is trained by using the face image as a training sample and its corresponding gaze area category annotation information, so that the gaze area corresponding to the face image can be directly predicted based on the neural network. Even if the driver's line of sight is slightly shifted or changed, it will not affect the detection result, which can improve the fault tolerance of the detection.
- the training method of the neural network for detecting the gaze area will be described in more detail.
- the following describes the training method by taking a vehicle driver attention monitoring scene as an example, where the face image input to the neural network is determined based on the image collected for the driving area in the space area of the vehicle. For example, an image of the driving area can be collected, and the face area in the image can be cropped to obtain the face image of the vehicle driver.
- the pre-defined gaze areas are multiple areas that the driver may gaze at while driving.
- the same training method can also be applied to other scenes.
- the difference is that the face image input to the neural network can vary with the application scene, and the designated space area where the gaze area is located may also differ between scenes.
- the designated space area can be the space area of a vehicle, or another space area, such as the space where a certain smart device is located; even for the space area of a vehicle, in a non-driver-attention-monitoring scene it can be a vehicle space area other than the areas illustrated in FIG. 2.
- the gaze area of the driver may refer to the area currently gazed at by the driver among the multiple classes of defined gaze areas obtained by dividing the designated space area in advance.
- the designated space area can be determined according to the vehicle structure, and can be divided into multiple gaze areas.
- the multiple gaze areas can be defined as different gaze area categories, and each category is represented by a corresponding identifier. For example, the category that defines a certain gaze area is B.
- FIG. 2 illustrates a plurality of pre-defined gaze areas in a vehicle driver's attention monitoring scene according to an embodiment of the present disclosure.
- the multiple gaze areas may include the left front windshield 21, the right front windshield 22, the instrument panel 23, the left rearview mirror 24, the right rearview mirror 25, the interior rearview mirror 26, the center console 27, the sun visor 28, the shift lever 29, the area under the steering wheel 30, the co-pilot area, the glove box area in front of the co-pilot, etc. It should be noted that the above are only exemplary; according to actual needs, the number of gaze areas can be increased or decreased, and the range of each gaze area can be scaled.
- the driver's gaze area is usually mainly on the front windshield 21; if it is detected that within a period of time the driver's gaze area has been concentrated on the instrument panel 23, it can be determined that the driver is distracted.
- an end-to-end neural network for detecting the gaze area can be provided, and the neural network can be used to detect the gaze area of the driver in the vehicle.
- the input of the neural network can be the driver's face image collected by the camera, and the neural network can directly output the identification of the driver's gaze area. For example, if the neural network detects that the driver's gaze area is the right front windshield 22, the neural network can directly output the identification of the right front windshield 22, such as "B". This end-to-end neural network can more quickly detect the driver's gaze area.
- before training the neural network, a sample set may be prepared first, and the sample set may include: training samples for training the neural network and test samples for testing the neural network.
- each gaze area to be detected can be predetermined.
- the ten gaze areas shown in FIG. 2 may be predetermined.
- the purpose of training the neural network is to enable the neural network to automatically detect which of the ten gaze areas the input driver's face image corresponds to.
- corresponding identifications can be assigned to the above ten gaze areas, for example, the shift lever identification "A", the right front windshield identification "B", etc., which are used to facilitate subsequent neural network training and testing.
- the above-mentioned identification may also be referred to as the "category" of the gaze area in the subsequent description.
- the collected person can be instructed to sit in the driver's position in the vehicle and look at the above ten gaze areas in turn. Whenever the collected person gazes at one of the gaze areas, the driver's face image corresponding to the gaze area can be collected through the camera installed in the vehicle. For each gaze area, multiple facial images of the collected person can be collected.
- each face image corresponds to gaze area category annotation information; that is, each face image is an image collected when the driver was looking at the gaze area corresponding to the category annotation information.
- a large number of collected samples can be divided into a training set and a test set.
- the training samples in the training set are used to train the neural network, and the test samples in the test set are used to test the neural network.
- Each training sample may include: a face image of the driver and the gaze area category label information corresponding to the face image.
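- purely as a sketch of how such a sample set might be organized and split, assuming hypothetical file names and an 80/20 split ratio:

```python
import random

# Illustrative organization of the collected samples: each sample pairs a
# face-image path with its gaze-area category annotation ("A".."J").
samples = [
    ("driver_0001.jpg", "A"),   # collected while gazing at the shift lever
    ("driver_0002.jpg", "B"),   # collected while gazing at the right front windshield
    # ... many more (image, category) pairs per gaze area
]

random.shuffle(samples)
split = int(0.8 * len(samples))
train_set = samples[:split]   # used to train the neural network
test_set = samples[split:]    # used to test the trained network
```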
- a neural network for detecting the driver's gaze area can be trained.
- the neural network may be a convolutional neural network (Convolutional Neural Networks, CNN) or a deep neural network.
- the neural network may include a convolutional layer (Convolutional Layer), a pooling layer (Pooling Layer), a rectified linear unit (Rectified Linear Unit, ReLU) layer, a fully connected layer (Fully Connected Layer) and other network units, where the aforementioned network units are stacked in a certain way.
- Fig. 3 illustrates an example of a network structure of CNN 300 to which the embodiments of the present disclosure can be applied.
- the CNN 300 can extract features from the input image 302 through the feature extraction layer 301.
- the feature extraction layer 301 may, for example, include multiple convolutional layers and pooling layers that are alternately connected together. Each convolution layer can extract different features in the image through multiple convolution kernels to obtain a feature map (Feature Map) 303. Each pooling layer is located after the corresponding convolutional layer, and the feature map can be locally averaged and down-sampled to reduce the resolution of the feature map. As the number of convolutional layers and pooling layers increases, the number of feature maps gradually increases, and the resolution of the feature maps gradually decreases.
- a feature vector 304 can be obtained as the input vector of the fully connected layer 305.
- the fully connected layer 305 can convert the feature vector 304 into the input vector 306 of the classifier through multiple hidden layers. Since the CNN is trained to detect which gaze area corresponds to the input image 302, the fully connected layer 305 finally outputs a classification vector 307 through the classifier.
- the classification vector 307 includes the probability that the input image corresponds to each gaze area.
- the number of elements included in the input vector 306 is the same as the number of elements in the classification vector 307, and both are the number of gaze regions to be detected.
- some parameters can be set. For example, the number of convolutional layers and pooling layers included in the feature extraction layer 301 can be set, the number of convolution kernels used by each convolution layer can be set, and the size of the convolution kernel can also be set.
- self-learning can be carried out through the iterative training of the CNN network.
- the specific CNN network training method can adopt the conventional training method, which will not be described in detail.
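- the following is a hedged sketch of a CNN in the spirit of FIG. 3 (alternating convolution and pooling layers, then a fully connected classifier producing one score per gaze-area category); all layer sizes and the 64x64 input resolution are assumptions:

```python
import torch.nn as nn

# Sketch of a CNN like the one in FIG. 3: alternating convolution and
# pooling layers extract feature maps, and a fully connected head produces
# one score per gaze-area category.
class GazeAreaCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(          # feature extraction layer 301
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                    # down-sample the feature maps
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(        # fully connected layer 305
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128), nn.ReLU(),
            nn.Linear(128, num_classes),        # classification vector 307
        )

    def forward(self, x):                       # x: (N, 3, 64, 64) face images
        return self.classifier(self.features(x))
```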
- neural network training can be started. Several example ways of training a neural network for detecting the driver's gaze area will be described below.
- FIG. 4 illustrates a configuration for training a neural network according to an embodiment of the present disclosure, where the structure of the CNN network may be as shown in FIG. 3, and the face image in the training sample may be input to the CNN network.
- the face image may be obtained based on the driver's upper body image collected by a camera installed in the vehicle.
- the upper body image may be an image with a relatively large shooting range, for example, it may involve the face, shoulders, neck and other parts.
- the upper body image can be cropped into a face image mainly including the driver's face through face detection.
- the neural network can extract image features from the input face image, and output the category prediction information of the gaze area corresponding to the face image based on the image features; that is, it predicts which category of gaze area the driver was gazing at when the face image was collected.
- the gaze area corresponding to the face image is one of multiple gaze areas pre-divided according to the structure of the vehicle on which the driver rides, and the category is used as an identifier of the gaze area.
- the CNN network can output a classification vector, which can include the probability that the input image corresponds to each gaze area.
- "A”, “B”, “C”... “J” represent the categories of ten fixation areas, and "0.2” means “the probability of the input image corresponding to the fixation area A is 20%” , “0.4” means “the probability that the input image corresponds to the gaze area J is 40%”. Assuming that J corresponds to the highest probability, then "J" will be the category prediction information of the gaze area obtained by the CNN network of the face image input this time.
- suppose the pre-labeled gaze area category annotation information corresponding to the face image is C.
- the loss value of the loss function can be obtained according to the difference between the category prediction information and the category annotation information.
- the training samples can be divided into multiple image subsets (batches) for iterative training of the neural network. A subset of images is input to the neural network during each iteration of training. For each training sample in the input image subset, the neural network outputs the category prediction result, and the loss value is fed back to the neural network to adjust its parameters, such as the weights of the fully connected layer and the values of the convolution kernels. After this iteration of training is completed, the next image subset can be input to the neural network for the next iteration.
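- a minimal sketch of this batched iterative training, reusing the GazeAreaCNN sketch above; the tensors and hyper-parameters are placeholders:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Sketch of iterative training over image subsets (mini-batches).
images = torch.randn(100, 3, 64, 64)          # face-image training samples
labels = torch.randint(0, 10, (100,))         # gaze-area category annotations
loader = DataLoader(TensorDataset(images, labels), batch_size=16, shuffle=True)

model = GazeAreaCNN(num_classes=10)           # the sketch class shown earlier
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):                        # until a termination condition
    for batch_images, batch_labels in loader: # one image subset per iteration
        loss = criterion(model(batch_images), batch_labels)
        optimizer.zero_grad()
        loss.backward()                       # feed the loss value back
        optimizer.step()                      # adjust weights / kernels
```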
- the training samples included in different image subsets are at least partially different.
- the trained CNN network can be obtained as a neural network for detecting the driver's gaze area.
- the predetermined training termination condition may be, for example, that the loss value is lower than a certain threshold, or that a predetermined number of training iterations is reached.
- the neural network trained according to this embodiment can take the driver's face image as input, and output the gaze area detection category corresponding to the face image, so that the driver's gaze area can be quickly detected, which facilitates subsequent judgment of whether the driver is distracted based on the gaze area.
- the input configuration of the neural network is adjusted in this embodiment.
- the input of the neural network may include: a face image and an eye image.
- the eye image can be cropped from the face image.
- the key points of the face can be detected from the face image, for example, key points of the eyes, key points of the nose, key points of the eyebrows, etc.
- the face image can be cropped according to the detected key points to obtain an eye image, which mainly includes the eyes of the driver.
- the eye image may include at least one of a left eye image and a right eye image.
- the input of the neural network may include a human face image and a left eye image, or a human face image and a right eye image, or a human face image, a left eye image, and a right eye image.
- the simultaneous input of the face image and the left and right eye images is taken as an example.
- the neural network can learn the features of the face and eyes at the same time, increasing the diversity and representation ability of the features, so that the trained neural network can detect the gaze area category more accurately.
- Fig. 6 is a flowchart of a neural network training method corresponding to the configuration in Fig. 5. As shown in FIG. 6, the training method may include steps 600-612.
- in step 600, key points of the face in the face image, such as key points of the eyes, are detected.
- in step 602, the face image is cropped according to the key points of the face to obtain an eye image including the eyes of the person in the face image.
- the eye image includes the eyes of the driver.
- the eye image may include the left eye image and the right eye image of the driver.
- FIG. 7 illustrates the left eye image 72 and the right eye image 73 obtained by cropping the face image 71.
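- a small sketch of the cropping in step 602, assuming eye key-point coordinates have already been detected by a face key-point detector; the coordinates and patch size are illustrative:

```python
import numpy as np

# Sketch of step 602: crop eye images from the face image around detected
# eye key points. The key-point coordinates and the margin are assumptions.
def crop_eye(face_image: np.ndarray, eye_center: tuple, half_size: int = 16):
    """Return a square patch centered on one eye, clamped to the image."""
    cx, cy = eye_center
    h, w = face_image.shape[:2]
    x0, x1 = max(cx - half_size, 0), min(cx + half_size, w)
    y0, y1 = max(cy - half_size, 0), min(cy + half_size, h)
    return face_image[y0:y1, x0:x1]

face = np.zeros((128, 128, 3), dtype=np.uint8)   # stand-in face image 71
left_eye_img = crop_eye(face, (44, 52))          # cf. left eye image 72
right_eye_img = crop_eye(face, (84, 52))         # cf. right eye image 73
```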
- in step 604, the face image and the eye image are adjusted to the same predetermined size.
- in step 606, the resized face image and eye image are simultaneously input to the same feature extraction layer of the same neural network.
- in step 608, the feature extraction layer of the neural network simultaneously extracts the features in the face image and the features in the eye image to obtain an extracted feature vector, which includes the features in the face image and the features in the eye image.
- the feature extraction layer of CNN can learn the features of the face and the features of the left and right eyes at the same time, and extract the feature vector including the features of the face image and the eye image.
- a CNN can extract multiple feature maps through multiple convolutional layers, pooling layers, etc.; the multiple feature maps include face image features and eye image features, and the feature vector is obtained according to the multiple feature maps.
- in step 610, the driver's gaze area category prediction information is determined according to the feature vector.
- the feature vector can be converted into an intermediate vector through the fully connected layer in the CNN, and the number of dimensions of the intermediate vector is the same as the number of categories of the gaze area.
- the probability of the driver's face image corresponding to each category of the gaze area can be calculated by a classification algorithm based on the intermediate vector, and the category corresponding to the maximum probability can be used as the category prediction information.
- the intermediate vector may be the input vector 306 of the classifier, for example.
- in step 612, the parameters of the neural network are adjusted based on the difference between the category prediction information and the category annotation information corresponding to the face image.
- the loss value of the loss function of the training sample can be calculated based on the difference between the category prediction information and the category labeling information, and the parameters of the CNN can be adjusted based on the loss value of each loss function of a set of training samples.
- the face image and the eye image can be used as the input of the neural network at the same time, so that the neural network can learn the features of the face and the eyes simultaneously. Since eye features are highly relevant to attention detection, combining the face image and the eye image can strengthen the ability of the extracted features to characterize attention, thereby improving the accuracy of the neural network's gaze area category detection.
- Fig. 8 is a flowchart of a neural network training method according to another embodiment of the present disclosure
- Fig. 9 illustrates a configuration corresponding to the neural network training method.
- the training method may include steps 800-812.
- in step 800, key points of the face in the face image, such as key points of the eyes, are detected.
- in step 802, the face image is cropped according to the face key points (such as eye key points) to obtain an eye image including the eyes of the person in the face image.
- the obtained eye image may include a left eye image and/or a right eye image.
- in step 804, the face image, the left eye image, and/or the right eye image are simultaneously input to the corresponding feature extraction branches of the neural network.
- the face image and the eye images can be input into the corresponding feature extraction branches of the neural network without size adjustment; that is, the sizes of the face image and the eye images input to the neural network may be different.
- the face image, the left eye image, and the right eye image can be input into the first feature extraction branch, the second feature extraction branch, and the third feature extraction branch, respectively, where the left eye image and the right eye image may have the same size, and the size of the face image is larger than that of the left eye image and the right eye image.
- each of the three feature extraction branches may include multiple convolutional layers, pooling layers, etc. for extracting image features.
- the structures of the three feature extraction branches may be the same or different; for example, they may include different numbers of convolutional layers, or different numbers of convolution kernels.
- in step 806, a feature extraction branch of the neural network extracts the features in the face image to obtain the extracted face feature vector; in addition, the other feature extraction branches of the neural network extract the features in the eye images to obtain the extracted eye feature vectors.
- the above three feature extraction branches can learn the features in each image separately.
- the first feature extraction branch can extract the face feature vector 91 from the face image, the second feature extraction branch can extract the left eye feature vector 92 from the left eye image, and the third feature extraction branch can extract the right eye feature vector 93 from the right eye image.
- Both the left eye feature vector 92 and the right eye feature vector 93 can be called eye feature vectors.
- in step 808, the face feature vector and the eye feature vectors are fused to obtain a fusion feature vector, that is, a fusion feature.
- the face feature vector 91, the left eye feature vector 92, and the right eye feature vector 93 can be fused to obtain the fused feature vector 94.
- the feature vector fusion can be a combination of multiple vectors in any order.
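- a hedged sketch of the three-branch configuration of FIG. 9, fusing the three feature vectors by concatenation (one possible combination); all layer sizes and input resolutions are assumptions:

```python
import torch
import torch.nn as nn

# Sketch of FIG. 9: separate feature extraction branches for the face,
# left eye, and right eye, whose feature vectors are fused by concatenation.
def branch(in_hw: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(8 * (in_hw // 2) ** 2, 64),
    )

class ThreeBranchGazeNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.face_branch = branch(64)       # the face image may be larger
        self.left_branch = branch(32)       # than the eye images
        self.right_branch = branch(32)
        self.head = nn.Linear(3 * 64, num_classes)

    def forward(self, face, left_eye, right_eye):
        fused = torch.cat([                 # fusion feature vector 94
            self.face_branch(face),         # face feature vector 91
            self.left_branch(left_eye),     # left eye feature vector 92
            self.right_branch(right_eye),   # right eye feature vector 93
        ], dim=1)
        return self.head(fused)             # gaze-area category scores
```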
- in step 810, the driver's gaze area category prediction information is obtained according to the fusion feature vector.
- the fusion feature vector can be transformed into an intermediate vector through the fully connected layer in the CNN, and the number of dimensions of the intermediate vector is the same as the number of categories of the gaze area.
- the probability of the driver's face image corresponding to each category of the gaze area can be calculated by a classification algorithm based on the intermediate vector, and the category corresponding to the maximum probability can be used as the category prediction information.
- in step 812, the parameters of the neural network are adjusted based on the difference between the category prediction information and the category annotation information corresponding to the face image.
- the loss value of the loss function of the training sample can be calculated based on the difference between the category prediction information and the category labeling information, and the parameters of the neural network can be adjusted based on the loss value of each loss function of a set of training samples.
- the face image and eye images can be input into the neural network without resizing, and the features in the face image and the eye images can be extracted by different feature extraction branches in the neural network, thereby reducing or even avoiding the image quality loss caused by image resizing, so that face and eye features can be extracted more accurately.
- facial features and eye features can be fused to strengthen the feature's ability to characterize attention, making the category detection of the gaze region based on the fusion feature more accurate.
- the neural network can distinguish feature vectors corresponding to different types of gaze areas in a feature space through a classification algorithm.
- the feature vectors extracted from the training data corresponding to different gaze regions may be very close in the feature space.
- a feature vector extracted from the training data may be farther in the feature space from the center of its true gaze area than from the center of an adjacent gaze area, which may cause judgment errors.
- the image features extracted by the neural network can be dot-multiplied with multiple category weights, respectively.
- the multiple category weights respectively correspond to multiple categories of the gaze area.
- the number of dimensions of the intermediate vector is the same as the number of categories of the gaze area.
- a large margin softmax algorithm can be used to improve the quality of the feature vectors extracted by the neural network and enhance the compactness of the features extracted by the neural network to improve the accuracy of the final gaze region classification.
- the algorithm can be expressed as the following formula (1):

  $$L_i = -\log \frac{e^{\|W_{y_i}\|\|x_i\|\,\psi(\theta_{y_i})}}{e^{\|W_{y_i}\|\|x_i\|\,\psi(\theta_{y_i})} + \sum_{j \neq y_i} e^{\|W_j\|\|x_i\|\cos(\theta_j)}} \tag{1}$$

  where $L_i$ represents the loss value of the loss function of sample $i$; $\theta_j$ is the angle between the category weight $W_j$ and $x_i$, and $\psi(\cdot)$ is a margin function applied to the angle $\theta_{y_i}$ between $W_{y_i}$ and $x_i$; $W_j$ is the category weight corresponding to each gaze area category; $x_i$ is the image feature extracted by the CNN from the feature map; $y_i$ is the gaze area category of the $i$-th training sample; and the vector of dot products $W_j^{\top} x_i$ can be called the intermediate vector.
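- as a loose illustration of the margin idea (not the exact ψ(θ) form of formula (1)), the following sketch subtracts a fixed margin from the target-class cosine logit before softmax, a simplified additive-margin variant:

```python
import torch
import torch.nn.functional as F

# Simplified additive-margin softmax sketch: penalizing the target-class
# logit forces features of the same gaze-area category to be more compact.
def margin_softmax_loss(features, weights, labels, scale=16.0, margin=0.2):
    f = F.normalize(features, dim=1)          # normalize image features x_i
    w = F.normalize(weights, dim=1)           # normalize category weights W_j
    cos = f @ w.t()                           # cos(theta_j) logits
    onehot = F.one_hot(labels, num_classes=cos.size(1)).float()
    cos = cos - margin * onehot               # subtract margin at the target
    return F.cross_entropy(scale * cos, labels)

features = torch.randn(8, 64)     # image features x_i extracted by the CNN
weights = torch.randn(10, 64)     # one category weight W_j per gaze area
labels = torch.randint(0, 10, (8,))
loss = margin_softmax_loss(features, weights, labels)
```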
- Fig. 10 illustrates a flowchart of a method for detecting a gaze area according to an embodiment of the present disclosure. As shown in FIG. 10, the method may include steps 1000-1004.
- in step 1000, the face area in the image collected in the designated space area is intercepted to obtain the face image.
- an image collected in a designated space area may be an image with a larger range including a human face, and the human face area may be cut out from the image to obtain a human face image.
- in step 1002, the face image is input into a neural network, where the neural network is trained in advance using a training sample set including a plurality of face image samples and their respective gaze area category annotation information, and the annotated gaze area category belongs to one of multiple classes of defined gaze areas obtained by dividing the designated space area in advance.
- the neural network according to this embodiment may be a neural network obtained by using the training method shown in FIG. 1, and the face image obtained in step 1000 may be input to the neural network.
- in step 1004, feature extraction is performed on the input face image via the neural network, and the gaze area detection category corresponding to the face image is determined according to the extracted features.
- the gaze area corresponding to the face image can be predicted by the neural network, and the predicted gaze area can be called the gaze area detection category.
- the gaze area detection category can be expressed in different ways such as letters, numbers, and names.
- the gaze area detection category corresponding to the face image can be directly predicted through the pre-trained neural network. Even if the driver's line of sight is slightly shifted or changed, it will not affect the detection result, which can improve the fault tolerance of the detection.
- the following takes the driver attention monitoring scenario as an example to illustrate how a neural network trained for this scenario is applied. It is understandable that neural networks trained for other scenarios can be similarly applied.
- any of the above-trained neural networks can be applied to detect the driver's gaze area.
- a camera 1102 may be installed in the driver's vehicle 1101, and the camera 1102 may collect an image 1103 including the driver's face.
- the image 1103 can be transmitted to the image processing device 1104 in the vehicle, and the pre-trained neural network 1108 can be stored in the image processing device 1104.
- the image processing device 1104 may preprocess the image 1103, and then input the obtained image into the neural network 1108.
- the face area can be cut out from the image 1103 through, for example, face detection, to obtain the face image 1105.
- the left-eye image 1106 and the right-eye image 1107 can also be cropped from the face image 1105.
- the face image 1105, the left eye image 1106, and the right eye image 1107 can be simultaneously input to the pre-trained neural network 1108, so that the neural network 1108 outputs the gaze area detection category of the driver in the vehicle.
- the face image 1105, the left eye image 1106, and the right eye image 1107 can be adjusted to the same predetermined size and then input to the neural network 1108, or they can be input, without size adjustment, into the corresponding feature extraction branches of the neural network 1108.
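- a sketch of this inference flow, reusing the ThreeBranchGazeNet sketch above; the crop tensors and category identifiers are placeholders:

```python
import torch

# Sketch of the inference flow of FIG. 11: feed the face and eye crops to
# the trained three-branch network and map the most probable class index
# to a gaze-area identifier.
CATEGORIES = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]

model = ThreeBranchGazeNet(num_classes=len(CATEGORIES))  # assumed trained
model.eval()

frame = torch.randn(3, 64, 64)          # stand-in for face image 1105
left = torch.randn(3, 32, 32)           # stand-in for left eye image 1106
right = torch.randn(3, 32, 32)          # stand-in for right eye image 1107

with torch.no_grad():
    logits = model(frame.unsqueeze(0), left.unsqueeze(0), right.unsqueeze(0))
    category = CATEGORIES[logits.argmax(dim=1).item()]
print("gaze area detection category:", category)
```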
- FIG. 12 illustrates an example of the gaze area detection category output by the neural network 1108 in the application scenario shown in FIG. 11.
- the driver image shown in FIG. 12 may be collected by a camera 1102 deployed in the vehicle in which the driver rides.
- the image processing device 1104 in the vehicle can intercept the driver's face image 1201 from the driver image.
- the face image 1201 may be input to the neural network 1108 in the image processing device 1104.
- the neural network 1108 can output the driver's gaze area detection category "[5]: center console" in the vehicle corresponding to the face image 1201, as shown in FIG. 12.
- the driver's gaze area detection method has better real-time performance, and can quickly and accurately detect the driver's gaze area.
- the same driver may have different head postures. If only a single camera is used to collect the driver's image, no matter where the camera is installed in the car, it may happen that the driver's head turns so that one eye or even both eyes are invisible, which affects the judgment of the final gaze area. In addition, for drivers wearing glasses, it is often the case that the camera captures the reflection of the lenses at a certain angle, causing the eye area to be partially or completely blocked. To solve the above problems, multiple cameras can be installed at different positions in the car to collect the driver's image.
- multiple cameras 1102 may be installed in the vehicle 1101 of the driver, and the multiple cameras 1102 may respectively collect images of the same driver in the driving area in the vehicle from different angles.
- the acquisition time of multiple cameras can be synchronized, or the acquisition time of each frame of image can be recorded, so that multiple images of the same driver collected by different cameras at the same time can be acquired in subsequent processing.
- multiple cameras can be deployed in a designated space area of the scene to collect images for a specific sub-region of the designated space area.
- the specific sub-area may be the area where the target person controlling the smart device is located.
- the multiple images collected at the same time T k can be used to determine the gaze area of the driver at the time T k , for example, in any of the following ways.
- Manner 1: the image with the highest image quality score among the multiple images can be determined according to the image quality evaluation index, and the face area in the image with the highest image quality score can be intercepted to obtain the driver's face image.
- the image quality evaluation index may include at least one of the following: whether the image includes an eye image, the sharpness of the eye area in the image, the occlusion of the eye area in the image, and the open/close condition of the eyes in the image.
- if a captured image includes a clear image of the eyes, the eye area is not blocked, and the eyes are fully open, it can be determined that this image has the highest image quality score; the driver's face image can be intercepted from this image, and the face image can be input into a pre-trained neural network to determine the gaze area detection category of the driver at the time T k .
- Manner 2: as in Manner 1, the image with the highest image quality score among the multiple images is determined according to the image quality evaluation index.
- the facial images of the driver can be intercepted from the multiple images, and the intercepted facial images can be input into a pre-trained neural network to obtain multiple gaze area detection categories corresponding to the multiple facial images.
- the gaze area detection category corresponding to the face image associated with the image with the highest image quality score may be selected from the plurality of gaze area detection categories as the gaze area detection category of the driver at the time T k .
- Manner 3: the driver's face images can be intercepted from the multiple images respectively, and the intercepted face images can be input into the pre-trained neural network to obtain multiple gaze area detection categories corresponding to the multiple face images. The majority result among the multiple gaze area detection categories may be selected as the gaze area detection category of the driver at the time T k . For example, if 5 of the 6 gaze area detection categories obtained from 6 face images are all "C", then "C" can be selected as the gaze area detection category of the driver at the time T k .
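- a tiny sketch of the majority selection in Manner 3:

```python
from collections import Counter

# Sketch of Manner 3: each camera's face image yields one gaze-area
# detection category; the majority result is taken as the category at T_k.
def majority_gaze_area(categories: list[str]) -> str:
    """Return the most frequent detection category among the cameras."""
    return Counter(categories).most_common(1)[0][0]

# e.g. 5 of 6 per-camera detections agree on "C"
print(majority_gaze_area(["C", "C", "B", "C", "C", "C"]))  # -> "C"
```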
- the attention monitoring result of the person corresponding to the face image can be determined according to the detection result of the gaze area category.
- the gaze area category detection result may be the gaze area detection category within a preset time period.
- the gaze area category detection result may be "the driver's gaze area has always been area B within a preset time period.” If the area B is the front windshield, it means that the driver is more attentive. If the area B is the glove box area in front of the co-pilot, it means that the driver is likely to be distracted and unable to concentrate.
- the attention monitoring result may be output, for example, "driving is very attentive” may be displayed in a certain display area in the vehicle.
- a distraction prompt message can be output according to the attention monitoring result, for example, displaying "Please pay attention to driving risks" on the display screen to prompt the driver.
- at least one of the attention monitoring result and the distraction prompt information can be displayed.
- the driver's attention monitoring scene is taken as an example.
- the detection of the gaze area can also have many other uses.
- vehicle-machine interactive control based on gaze area detection can be performed.
- some electronic equipment, such as a multimedia player, can be installed in the vehicle; by detecting the gaze area of a person in the vehicle, the multimedia player can be automatically controlled to start the playback function according to the gaze area detection result.
- the face image of the person (such as the driver or a passenger) in the vehicle is captured by a camera deployed in the vehicle, and the gaze area category detection result is obtained through a pre-trained neural network.
- the detection result may be: within a period of time T, the gaze area of the person in the vehicle has been the area where the "gaze on" option on a certain multimedia player in the vehicle is located. According to the above detection result, it can be determined that the person in the vehicle wants to turn on the multimedia player, so that corresponding control instructions can be output to control the multimedia player to start playing.
- the face image of the controller can be collected, and the gaze area category detection result can be obtained through a pre-trained neural network.
- the detection result may be: within a period of time T, the gaze area of the controller has been the area where the "gaze on" option on the smart air conditioner is located. According to the above detection results, it can be determined that the controller wants to start the smart air conditioner, so that a corresponding control command can be output to control the air conditioner to turn on.
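- a hedged sketch of such gaze-dwell control logic; the region identifier, dwell duration T, and command name are assumptions:

```python
# Sketch of gaze-based device control: if the detected gaze area stays on a
# device's "gaze to turn on" region for a continuous period T, emit a
# control command.
GAZE_ON_REGION = "K"        # hypothetical category of the "gaze on" option
DWELL_SECONDS = 2.0         # hypothetical required gaze duration T

def update(state: dict, category: str, timestamp: float):
    """Track dwell time on the control region; return a command or None."""
    if category != GAZE_ON_REGION:
        state["since"] = None               # gaze left the region: reset
        return None
    if state.get("since") is None:
        state["since"] = timestamp          # gaze entered the region
    if timestamp - state["since"] >= DWELL_SECONDS:
        state["since"] = None
        return "TURN_ON_PLAYER"             # e.g. start multimedia playback
    return None

state = {}
for t, cat in [(0.0, "K"), (1.0, "K"), (2.1, "K")]:
    cmd = update(state, cat, t)
print(cmd)  # -> "TURN_ON_PLAYER"
```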
- the present disclosure may also provide embodiments of devices and equipment corresponding to the foregoing method embodiments.
- FIG. 13 is a block diagram of a training device 1300 of a neural network for gaze area detection according to an embodiment of the present disclosure.
- the apparatus 1300 may include: a sample input module 1301, a category prediction module 1302, a difference determination module 1303, and a parameter adjustment module 1304.
- the sample input module 1301 is used to input at least the face image as a training sample and its corresponding gaze area category annotation information into the neural network, where the annotated gaze area category belongs to one of multiple classes of defined gaze areas obtained by dividing the designated space area in advance.
- the category prediction module 1302 is configured to perform feature extraction on the input face image via the neural network, and determine the gaze area category prediction information of the face image according to the extracted features.
- the difference determining module 1303 is used to determine the difference between the gaze area category prediction information and the gaze area category label information corresponding to the face image.
- the parameter adjustment module 1304 is configured to adjust the parameters of the neural network based on the difference.
- the sample input module 1301 may, before inputting at least the face image as a training sample and its corresponding gaze area category annotation information into the neural network, crop at least one eye area in the face image to obtain at least one eye image.
- the sample input module 1301 may adjust the face image and the at least one eye image to the same predetermined size and input them into the neural network at the same time.
- the category prediction module 1302 may simultaneously extract features in the face image and features in the at least one eye image via the neural network, and determine the gaze area category prediction information of the face image according to the extracted features .
- the sample input module 1301 may input the face image and the at least one eye image (without resizing) into different feature extraction branches of the neural network, wherein the sizes of the face image and the eye image input to the neural network may be different.
- the category prediction module 1302 can extract the features in the face image and the features in the eye image through the corresponding feature extraction branches of the neural network, fuse the features extracted by the feature extraction branches to obtain the fused features, and determine the gaze area category prediction information of the face image according to the fused features.
- the category prediction module 1302 may perform dot product operations on the extracted features and multiple category weights respectively to obtain an intermediate vector, and determine the gaze area category prediction information of the face image according to the intermediate vector.
- the plurality of category weights respectively correspond to the multiple categories of defined gaze regions, and the number of dimensions of the intermediate vector is the same as the number of the multiple categories of defined gaze regions.
- the designated space area includes: a space area of a car.
- the face image is determined based on an image collected for a driving area in the space area of the vehicle.
- the multiple types of defined gaze areas obtained by dividing the designated space area include at least two of the following: left front windshield area, right front windshield area, instrument panel area, interior rearview mirror area, center console area, left Rearview mirror area, right rearview mirror area, sun visor area, shift lever area, area under the steering wheel, co-pilot area, and glove box area in front of the co-pilot.
- FIG. 14 is a block diagram of a gaze area detecting device 1400 according to an embodiment of the present disclosure.
- the device 1400 may include: an image acquisition module 1401, an image input module 1402, and a category detection module 1403.
- the image acquisition module 1401 is used to intercept a face area in an image collected in a designated space area to obtain a face image.
- the image input module 1402 is used to input the face image into a neural network, where the neural network is trained in advance using a training sample set including a plurality of face image samples and their respective corresponding gaze area category annotation information, and the annotated gaze area category belongs to one of multiple classes of defined gaze areas obtained by dividing the designated space area in advance.
- the category detection module 1403 is configured to perform feature extraction on the input face image via the neural network, and determine the gaze area detection category corresponding to the face image according to the extracted features.
- the training sample set for pre-training the neural network further includes multiple eye image samples intercepted from multiple face image samples.
- the image obtaining module 1401 can crop at least one eye area in the face image to obtain at least one eye image.
- the image input module 1402 may adjust the face image and the at least one eye image to the same predetermined size and input them into the neural network at the same time.
- the category detection module 1403 may simultaneously extract features in the face image and features in the at least one eye image via the neural network, and determine the gaze area detection category corresponding to the face image according to the extracted features .
- the image input module 1402 may separately input the face image and the at least one eye image (without resizing) into different feature extraction branches of the neural network, wherein the sizes of the face image and the eye image input to the neural network may be different.
- the category detection module 1403 can extract the features in the face image and the features in the eye image through the corresponding feature extraction branches of the neural network, and fuse the features extracted by the feature extraction branches to obtain the fused features, and according to The fusion feature determines the gaze area detection category corresponding to the face image.
- the image acquisition module 1401 may acquire, as the images collected in the designated space area, multiple images collected from different angles at the same time T i by multiple cameras deployed in the designated space area for a specific sub-region of the designated space area.
- the image acquisition module 1401 may determine the image with the highest image quality score among the multiple images according to the image quality evaluation index.
- the image quality evaluation index may include at least one of the following: whether the image includes an eye image, the sharpness of the eye area in the image, the occlusion of the eye area in the image, and the open/close condition of the eyes in the image.
- the image acquisition module 1401 can intercept the face region in the image with the highest image quality score to obtain the face image.
- the image input module 1402 can input the face image into the neural network.
- the category detection module 1403 may perform feature extraction on the face image via the neural network, and determine the corresponding gaze area detection category according to the extracted features, as the gaze area detection category at the time T i .
- the image acquisition module 1401 can respectively intercept the face regions in the above multiple images to obtain corresponding multiple face images.
- the image input module 1402 can input the multiple face images into the neural network respectively.
- the category detection module 1403 can determine its corresponding gaze area detection category as described above.
- the category detection module 1403 may select, from the determined multiple gaze area detection categories respectively corresponding to the multiple face images, the gaze area detection category corresponding to the face image associated with the image with the highest image quality score, as the gaze area detection category at the time T i .
- the image acquisition module 1401 can respectively intercept the face regions in the above multiple images to obtain corresponding multiple face images.
- the image input module 1402 can input the multiple face images into the neural network respectively.
- the category detection module 1403 can determine the corresponding gaze area detection category for each face image as described above. The category detection module 1403 may then select the majority result among the multiple gaze area detection categories corresponding to the multiple face images as the gaze area detection category at the time T i .
- the designated space area includes: a space area of a car.
- the above-mentioned images collected in the designated space area include images collected for the driving area in the space area of the vehicle.
- the multiple types of defined gaze areas obtained by dividing the designated space area include at least two of the following: left front windshield area, right front windshield area, instrument panel area, interior rearview mirror area, center console area, left Rearview mirror area, right rearview mirror area, sun visor area, shift lever area, area under the steering wheel, co-pilot area, and glove box area in front of the co-pilot.
- Fig. 15 is a block diagram of a gaze area detecting device 1400' according to another embodiment of the present disclosure.
- the difference between the device 1400' and the gaze area detecting device 1400 shown in FIG. 14 is that the device 1400' may also include at least one of the first category application module 1404 and the second category application module 1405.
- the first category application module 1404 may obtain the gaze area category detection result based on the gaze area detection category obtained by the category detection module 1403, and determine the attention monitoring result of the person corresponding to the face image according to the gaze area category detection result.
- the first category application module 1404 may output the attention monitoring result, and/or output distraction prompt information according to the attention monitoring result.
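One plausible way for the first category application module to turn per-frame gaze area categories into an attention monitoring result is a sliding-window rule like the one below, reusing the `GazeArea` enumeration sketched earlier; the window length, threshold, and the choice of which areas count as attentive are assumptions of this sketch, not values from the disclosure.

```python
from collections import deque

# Assumed: looking at the windshield counts as attentive driving.
ATTENTIVE_AREAS = {GazeArea.LEFT_FRONT_WINDSHIELD, GazeArea.RIGHT_FRONT_WINDSHIELD}

class AttentionMonitor:
    """Flags distraction when too few recent frames fall in attentive areas."""

    def __init__(self, window: int = 30, min_attentive_ratio: float = 0.5):
        self.recent = deque(maxlen=window)
        self.min_attentive_ratio = min_attentive_ratio

    def update(self, category: GazeArea) -> bool:
        """Feed one per-frame category; returns True when distraction
        prompt information should be output."""
        self.recent.append(category in ATTENTIVE_AREAS)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough history to judge yet
        attentive_ratio = sum(self.recent) / len(self.recent)
        return attentive_ratio < self.min_attentive_ratio
```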
- the second category application module 1405 can obtain the gaze area category detection result based on the gaze area detection category obtained by the category detection module 1403, determine the control instruction corresponding to the gaze area category detection result, and control an electronic device to execute the operation corresponding to the control instruction.
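The second category application module's mapping from a gaze area category detection result to a control instruction could be as simple as a lookup table; the instruction names and the `device.execute` interface below are purely illustrative assumptions.

```python
# Hypothetical mapping from detected gaze area to a device control instruction.
CONTROL_INSTRUCTIONS = {
    GazeArea.CENTER_CONSOLE: "wake_infotainment_screen",
    GazeArea.INTERIOR_REARVIEW_MIRROR: "show_rear_camera_feed",
}

def apply_gaze_control(category: GazeArea, device) -> None:
    """Look up the instruction for this gaze area, if any, and have the
    electronic device execute the corresponding operation."""
    instruction = CONTROL_INSTRUCTIONS.get(category)
    if instruction is not None:
        device.execute(instruction)  # assumed electronic-device interface
```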
- FIG. 16 is a block diagram of a training device of a neural network for gaze area detection according to an embodiment of the present disclosure.
- the device may include a memory 1601 and a processor 1602.
- the memory 1601 stores computer instructions executable by the processor 1602.
- when the processor 1602 executes the computer instructions, it can implement any of the aforementioned training methods of a neural network for gaze area detection.
- FIG. 17 is a block diagram of a gaze area detection device according to an embodiment of the present disclosure.
- the device may include a memory 1701 and a processor 1702.
- the memory 1701 stores computer instructions executable by the processor 1702.
- when the processor 1702 executes the computer instructions, it can implement any of the above-mentioned gaze area detection methods.
- an embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored.
- when the computer program is executed by a processor, the processor can implement any of the above-mentioned training methods of a neural network for gaze area detection.
- an embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored.
- when the computer program is executed by a processor, the processor can implement any of the above-mentioned gaze area detection methods.
- the present disclosure can be provided as a method, device, system, or computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
- Embodiments of the subject matter described herein can be implemented as one or more computer programs, that is, one or more modules of computer program instructions encoded on a tangible non-transitory program carrier, to be executed by a data processing device or to control the operation of the data processing device.
- Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal (such as a machine-generated electrical, optical, or electromagnetic signal) that is generated to encode information and transmit it to a suitable receiver device for execution by a data processing device.
- the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
- the processing and logic flow described herein can be executed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating according to input data and generating output.
- the processing and logic flow can also be executed by a dedicated logic circuit such as FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit), and the device can also be implemented as a dedicated logic circuit.
- Computers suitable for executing computer programs include, for example, general-purpose or special-purpose microprocessors, or any other type of central processing unit.
- the central processing unit will receive instructions and data from a read-only memory and/or random access memory.
- the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
- a computer can include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or the computer can be operatively coupled to such a mass storage device to receive data from it, transfer data to it, or both.
- a computer can be embedded in another device (such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, a universal serial bus (USB) flash drive, a portable storage device, etc.).
- Computer-readable media suitable for storing computer program instructions and data may include various forms of non-volatile memory, such as semiconductor memory devices (for example, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and flash memory), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, compact disc read-only memory (CD-ROM), digital versatile disc (DVD), etc.
- the processor and the memory can be supplemented by, or incorporated into, a dedicated logic circuit.
Claims (42)
- A training method of a neural network for gaze area detection, the method comprising: inputting at least a face image serving as a training sample and its corresponding gaze area category annotation information into the neural network, wherein the annotated gaze area category belongs to one of multiple classes of defined gaze areas obtained by dividing a designated space region in advance; performing feature extraction on the input face image via the neural network, and determining gaze area category prediction information of the face image according to the extracted features; determining the difference between the gaze area category prediction information and the gaze area category annotation information; and adjusting parameters of the neural network based on the difference.
- The method according to claim 1, further comprising: before inputting at least the face image and its corresponding gaze area category annotation information into the neural network, cropping at least one eye region in the face image to obtain at least one eye image; wherein inputting at least the face image and its corresponding gaze area category annotation information into the neural network comprises: inputting the face image and the at least one eye image into the neural network simultaneously.
- The method according to claim 2, wherein inputting the face image and the at least one eye image into the neural network simultaneously comprises: adjusting the face image and the at least one eye image to the same predetermined size and then inputting them into the neural network simultaneously; and performing feature extraction on the input face image comprises: simultaneously extracting, via the neural network, the features in the face image and the features in the at least one eye image.
- The method according to claim 2, wherein inputting the face image and the at least one eye image into the neural network simultaneously comprises: inputting the face image and the at least one eye image into different feature extraction branches of the neural network respectively, wherein the face image and the at least one eye image differ in size; and performing feature extraction on the input face image and determining the gaze area category prediction information comprises: extracting, via the corresponding feature extraction branches of the neural network, the features in the face image and the features in the at least one eye image respectively; fusing the features respectively extracted by the corresponding feature extraction branches of the neural network to obtain a fused feature; and determining the gaze area category prediction information of the face image according to the fused feature.
- The method according to any one of claims 1 to 4, wherein determining the gaze area category prediction information according to the extracted features comprises: performing dot-product operations between the extracted feature and multiple category weights respectively to obtain an intermediate vector, wherein the multiple category weights respectively correspond to the multiple classes of defined gaze areas, and the number of dimensions of the intermediate vector is the same as the number of the multiple classes of defined gaze areas; when the dot product between the extracted feature and the category weight corresponding to the gaze area category annotation information is computed, adjusting the cosine of the angle between that feature and that category weight, so as to increase the inter-class distance and reduce the intra-class distance; and determining the gaze area category prediction information of the face image according to the intermediate vector. (A margin-based formulation of this cosine adjustment is sketched after the claims.)
- The method according to any one of claims 1 to 5, wherein the designated space region comprises: a space region of a vehicle.
- The method according to claim 6, wherein the face image is determined based on an image collected for the driving area in the space region of the vehicle; and the multiple classes of defined gaze areas comprise at least two of the following: left front windshield area, right front windshield area, instrument panel area, interior rearview mirror area, center console area, left rearview mirror area, right rearview mirror area, sun visor area, shift lever area, area under the steering wheel, front passenger area, and glove box area in front of the front passenger.
- A gaze area detection method, the method comprising: cropping a face region from an image collected in a designated space region to obtain a face image; inputting the face image into a neural network, wherein the neural network has been trained in advance with a training sample set comprising multiple face image samples and their respectively corresponding gaze area category annotation information, and the annotated gaze area category belongs to one of multiple classes of defined gaze areas obtained by dividing the designated space region in advance; and performing feature extraction on the input face image via the neural network, and determining the gaze area detection category corresponding to the face image according to the extracted features.
- The method according to claim 8, wherein the training sample set used to pre-train the neural network further comprises multiple eye image samples respectively cropped from the multiple face image samples; the method further comprises: after obtaining the face image, cropping at least one eye region in the face image to obtain at least one eye image; and inputting the face image into the neural network comprises: inputting the face image and the at least one eye image into the neural network simultaneously.
- The method according to claim 9, wherein inputting the face image and the at least one eye image into the neural network simultaneously comprises: adjusting the face image and the at least one eye image to the same predetermined size and then inputting them into the neural network simultaneously; and performing feature extraction on the input face image comprises: simultaneously extracting, via the neural network, the features in the face image and the features in the at least one eye image.
- The method according to claim 9, wherein inputting the face image and the at least one eye image into the neural network simultaneously comprises: inputting the face image and the at least one eye image into different feature extraction branches of the neural network respectively, wherein the face image and the at least one eye image differ in size; and performing feature extraction on the input face image and determining the gaze area detection category comprises: extracting, via the corresponding feature extraction branches of the neural network, the features in the face image and the features in the at least one eye image respectively; fusing the features respectively extracted by the corresponding feature extraction branches of the neural network to obtain a fused feature; and determining the gaze area detection category corresponding to the face image according to the fused feature.
- The method according to any one of claims 8 to 11, further comprising: before cropping the face region from the image collected in the designated space region, acquiring multiple images collected, at the same time and from different angles, for a specific sub-region of the designated space region by multiple cameras deployed in the designated space region; and determining, according to an image quality evaluation index, the image with the highest image quality score among the multiple images; wherein cropping the face region from the image collected in the designated space region comprises: cropping the face region from the image with the highest image quality score.
- The method according to any one of claims 8 to 11, further comprising: before cropping the face region from the image collected in the designated space region, acquiring multiple images collected, at the same time and from different angles, for a specific sub-region of the designated space region by multiple cameras deployed in the designated space region; and determining, according to an image quality evaluation index, the image with the highest image quality score among the multiple images; wherein cropping the face region from the image collected in the designated space region to obtain the face image comprises: cropping the face regions from the multiple images respectively to obtain multiple corresponding face images; inputting the face image into the neural network comprises: inputting the multiple face images into the neural network respectively; performing feature extraction on the input face image and determining the gaze area detection category corresponding to the face image comprises: for each of the multiple face images, performing feature extraction on that face image via the neural network, and determining the gaze area detection category corresponding to that face image according to the extracted features; and the method further comprises: selecting, from the determined multiple gaze area detection categories respectively corresponding to the multiple face images, the gaze area detection category corresponding to the face image associated with the image with the highest image quality score, as the gaze area detection category at said time.
- The method according to claim 12 or 13, wherein the image quality evaluation index comprises at least one of the following: whether the image includes an eye image, the sharpness of the eye area in the image, the occlusion of the eye area in the image, and the open/closed state of the eyes in the image.
- The method according to any one of claims 8 to 11, further comprising: before cropping the face region from the image collected in the designated space region, acquiring multiple images collected, at the same time and from different angles, for a specific sub-region of the designated space region by multiple cameras deployed in the designated space region; wherein cropping the face region from the image collected in the designated space region to obtain the face image comprises: cropping the face regions from the multiple images respectively to obtain multiple corresponding face images; inputting the face image into the neural network comprises: inputting the multiple face images into the neural network respectively; performing feature extraction on the input face image and determining the gaze area detection category corresponding to the face image comprises: for each of the multiple face images, performing feature extraction on that face image via the neural network, and determining the gaze area detection category corresponding to that face image according to the extracted features; and the method further comprises: selecting the majority result among the determined multiple gaze area detection categories respectively corresponding to the multiple face images, as the gaze area detection category at said time.
- The method according to any one of claims 8 to 15, wherein the designated space region comprises: a space region of a vehicle.
- The method according to claim 16, wherein the image collected in the designated space region comprises: an image collected for the driving area in the space region of the vehicle; and the multiple classes of defined gaze areas comprise at least two of the following: left front windshield area, right front windshield area, instrument panel area, interior rearview mirror area, center console area, left rearview mirror area, right rearview mirror area, sun visor area, shift lever area, area under the steering wheel, front passenger area, and glove box area in front of the front passenger.
- The method according to any one of claims 8 to 17, further comprising: obtaining a gaze area category detection result based on the gaze area detection category, and determining, according to the gaze area category detection result, an attention monitoring result of the person corresponding to the face image; and outputting the attention monitoring result, and/or outputting distraction prompt information according to the attention monitoring result.
- The method according to any one of claims 8 to 17, further comprising: obtaining a gaze area category detection result based on the gaze area detection category, and determining a control instruction corresponding to the gaze area category detection result; and controlling an electronic device to execute the operation corresponding to the control instruction.
- A training device of a neural network for gaze area detection, the device comprising: a sample input module configured to input at least a face image serving as a training sample and its corresponding gaze area category annotation information into the neural network, wherein the annotated gaze area category belongs to one of multiple classes of defined gaze areas obtained by dividing a designated space region in advance; a category prediction module configured to perform feature extraction on the input face image via the neural network, and determine gaze area category prediction information of the face image according to the extracted features; a difference determination module configured to determine the difference between the gaze area category prediction information and the gaze area category annotation information; and a parameter adjustment module configured to adjust parameters of the neural network based on the difference.
- The device according to claim 20, wherein the sample input module is configured to: before inputting at least the face image and its corresponding gaze area category annotation information into the neural network, crop at least one eye region in the face image to obtain at least one eye image; and input the face image and the at least one eye image into the neural network simultaneously.
- The device according to claim 21, wherein the sample input module is configured to: adjust the face image and the at least one eye image to the same predetermined size and then input them into the neural network simultaneously; and the category prediction module is configured to: simultaneously extract, via the neural network, the features in the face image and the features in the at least one eye image, and determine the gaze area category prediction information of the face image according to the extracted features.
- The device according to claim 21, wherein the sample input module is configured to: input the face image and the at least one eye image into different feature extraction branches of the neural network respectively, wherein the face image and the at least one eye image differ in size; and the category prediction module is configured to: extract, via the corresponding feature extraction branches of the neural network, the features in the face image and the features in the at least one eye image respectively; fuse the features respectively extracted by the corresponding feature extraction branches of the neural network to obtain a fused feature; and determine the gaze area category prediction information of the face image according to the fused feature.
- The device according to any one of claims 20 to 23, wherein the category prediction module is configured to: perform dot-product operations between the extracted feature and multiple category weights respectively to obtain an intermediate vector, wherein the multiple category weights respectively correspond to the multiple classes of defined gaze areas, and the number of dimensions of the intermediate vector is the same as the number of the multiple classes of defined gaze areas; when the dot product between the extracted feature and the category weight corresponding to the gaze area category annotation information is computed, adjust the cosine of the angle between that feature and that category weight, so as to increase the inter-class distance and reduce the intra-class distance; and determine the gaze area category prediction information of the face image according to the intermediate vector.
- The device according to any one of claims 20 to 24, wherein the designated space region comprises: a space region of a vehicle.
- The device according to claim 25, wherein the face image is determined based on an image collected for the driving area in the space region of the vehicle; and the multiple classes of defined gaze areas comprise at least two of the following: left front windshield area, right front windshield area, instrument panel area, interior rearview mirror area, center console area, left rearview mirror area, right rearview mirror area, sun visor area, shift lever area, area under the steering wheel, front passenger area, and glove box area in front of the front passenger.
- A gaze area detection device, the device comprising: an image acquisition module configured to crop a face region from an image collected in a designated space region to obtain a face image; an image input module configured to input the face image into a neural network, wherein the neural network has been trained in advance with a training sample set comprising multiple face image samples and their respectively corresponding gaze area category annotation information, and the annotated gaze area category belongs to one of multiple classes of defined gaze areas obtained by dividing the designated space region in advance; and a category detection module configured to perform feature extraction on the input face image via the neural network, and determine the gaze area detection category corresponding to the face image according to the extracted features.
- The device according to claim 27, wherein the training sample set used to pre-train the neural network further comprises multiple eye image samples respectively cropped from the multiple face image samples; the image acquisition module is further configured to: after obtaining the face image, crop at least one eye region in the face image to obtain at least one eye image; and the image input module is configured to: input the face image and the at least one eye image into the neural network simultaneously.
- The device according to claim 28, wherein the image input module is configured to: adjust the face image and the at least one eye image to the same predetermined size and then input them into the neural network simultaneously; and the category detection module is configured to: simultaneously extract, via the neural network, the features in the face image and the features in the at least one eye image, and determine the gaze area detection category corresponding to the face image according to the extracted features.
- The device according to claim 28, wherein the image input module is configured to: input the face image and the at least one eye image into different feature extraction branches of the neural network respectively, wherein the face image and the at least one eye image differ in size; and the category detection module is configured to: extract, via the corresponding feature extraction branches of the neural network, the features in the face image and the features in the at least one eye image respectively; fuse the features respectively extracted by the corresponding feature extraction branches of the neural network to obtain a fused feature; and determine the gaze area detection category corresponding to the face image according to the fused feature.
- The device according to any one of claims 27 to 30, wherein the image acquisition module is configured to: acquire multiple images collected, at the same time and from different angles, for a specific sub-region of the designated space region by multiple cameras deployed in the designated space region; determine, according to an image quality evaluation index, the image with the highest image quality score among the multiple images; and crop the face region from the image with the highest image quality score to obtain the face image.
- The device according to any one of claims 27 to 30, wherein the image acquisition module is configured to: acquire multiple images collected, at the same time and from different angles, for a specific sub-region of the designated space region by multiple cameras deployed in the designated space region; determine, according to an image quality evaluation index, the image with the highest image quality score among the multiple images; and crop the face regions from the multiple images respectively to obtain multiple corresponding face images; the image input module is configured to: input the multiple face images into the neural network respectively; and the category detection module is configured to: for each of the multiple face images, perform feature extraction on that face image via the neural network, and determine the gaze area detection category corresponding to that face image according to the extracted features; and select, from the determined multiple gaze area detection categories respectively corresponding to the multiple face images, the gaze area detection category corresponding to the face image associated with the image with the highest image quality score, as the gaze area detection category at said time.
- The device according to claim 31 or 32, wherein the image quality evaluation index comprises at least one of the following: whether the image includes an eye image, the sharpness of the eye area in the image, the occlusion of the eye area in the image, and the open/closed state of the eyes in the image.
- The device according to any one of claims 27 to 30, wherein the image acquisition module is configured to: acquire multiple images collected, at the same time and from different angles, for a specific sub-region of the designated space region by multiple cameras deployed in the designated space region; and crop the face regions from the multiple images respectively to obtain multiple corresponding face images; the image input module is configured to: input the multiple face images into the neural network respectively; and the category detection module is configured to: for each of the multiple face images, perform feature extraction on that face image via the neural network, and determine the gaze area detection category corresponding to that face image according to the extracted features; and select the majority result among the determined multiple gaze area detection categories respectively corresponding to the multiple face images, as the gaze area detection category at said time.
- The device according to any one of claims 27 to 34, wherein the designated space region comprises: a space region of a vehicle.
- The device according to claim 35, wherein the image collected in the designated space region comprises: an image collected for the driving area in the space region of the vehicle; and the multiple classes of defined gaze areas comprise at least two of the following: left front windshield area, right front windshield area, instrument panel area, interior rearview mirror area, center console area, left rearview mirror area, right rearview mirror area, sun visor area, shift lever area, area under the steering wheel, front passenger area, and glove box area in front of the front passenger.
- The device according to any one of claims 27 to 36, further comprising: a first category application module configured to: obtain a gaze area category detection result based on the gaze area detection category obtained by the category detection module, and determine, according to the gaze area category detection result, an attention monitoring result of the person corresponding to the face image; and output the attention monitoring result, and/or output distraction prompt information according to the attention monitoring result.
- The device according to any one of claims 27 to 36, further comprising: a second category application module configured to: obtain a gaze area category detection result based on the gaze area detection category obtained by the category detection module, and determine a control instruction corresponding to the gaze area category detection result; and control an electronic device to execute the operation corresponding to the control instruction.
- A training device of a neural network for gaze area detection, comprising a memory and a processor, wherein the memory stores computer instructions executable by the processor, and when executing the computer instructions the processor implements the method according to any one of claims 1 to 7.
- A gaze area detection device, comprising a memory and a processor, wherein the memory stores computer instructions executable by the processor, and when executing the computer instructions the processor implements the method according to any one of claims 8 to 19.
- A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the processor is caused to implement the method according to any one of claims 1 to 7.
- A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the processor is caused to implement the method according to any one of claims 8 to 19.
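Claims 5 and 24 describe adjusting the cosine of the angle between an extracted feature and the category weight of the annotated class when forming the intermediate vector, so that inter-class distances grow and intra-class distances shrink. One common margin-based way to write such an adjustment (an illustrative assumption, not wording taken from the claims) is:

```latex
% z has one dimension per defined gaze area class; x is the extracted
% feature, w_j the weight of class j, and y the annotated class.
\[
  z_j = \lVert w_j \rVert \, \lVert x \rVert \cos\theta_j \quad (j \neq y),
  \qquad
  z_y = \lVert w_y \rVert \, \lVert x \rVert \cos(\theta_y + m), \quad m > 0.
\]
% Adding the angular margin m makes the target logit harder to satisfy,
% which pulls same-class features together and pushes the classes apart.
```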
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021540840A JP7252348B2 (ja) | 2019-03-18 | 2019-12-30 | Gaze area detection method, neural network training method, apparatus, and device |
KR1020217022190A KR20210102413A (ko) | 2019-03-18 | 2019-12-30 | Gaze area detection method, neural network training method, apparatus, and device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910204566.9 | 2019-03-18 | ||
CN201910204566.9A CN111723596B (zh) | 2019-03-18 | 2019-03-18 | Gaze area detection and neural network training method, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020186883A1 true WO2020186883A1 (zh) | 2020-09-24 |
Family
ID=72518968
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/129893 WO2020186883A1 (zh) | Method, apparatus and device for gaze area detection and neural network training | 2019-03-18 | 2019-12-30 |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP7252348B2 (zh) |
KR (1) | KR20210102413A (zh) |
CN (1) | CN111723596B (zh) |
WO (1) | WO2020186883A1 (zh) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112541436A (zh) * | 2020-12-15 | 2021-03-23 | 平安科技(深圳)有限公司 | Concentration analysis method and apparatus, electronic device, and computer storage medium |
CN112560783A (zh) * | 2020-12-25 | 2021-03-26 | 京东数字科技控股股份有限公司 | Method, apparatus, system, medium and product for evaluating attention state |
CN112656431A (zh) * | 2020-12-15 | 2021-04-16 | 中国科学院深圳先进技术研究院 | EEG-based attention recognition method and apparatus, terminal device, and storage medium |
CN113052064A (zh) * | 2021-03-23 | 2021-06-29 | 北京思图场景数据科技服务有限公司 | Attention detection method based on face orientation, facial expression, and pupil tracking |
CN113065997A (zh) * | 2021-02-27 | 2021-07-02 | 华为技术有限公司 | Image processing method, neural network training method, and related devices |
CN113283340A (zh) * | 2021-05-25 | 2021-08-20 | 复旦大学 | Vaccination status detection method, apparatus and system based on ocular surface features |
CN113391699A (zh) * | 2021-06-10 | 2021-09-14 | 昆明理工大学 | Eye-gesture interaction model method based on dynamic eye movement indicators |
US20210366152A1 (en) * | 2018-12-24 | 2021-11-25 | Samsung Electronics Co., Ltd. | Method and apparatus with gaze estimation |
CN114863093A (zh) * | 2022-05-30 | 2022-08-05 | 厦门大学 | Neural network training method based on eye-tracking technology, and architectural design method and system |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113900519A (zh) * | 2021-09-30 | 2022-01-07 | Oppo广东移动通信有限公司 | Gaze point acquisition method and apparatus, and electronic device |
KR20230054982A (ko) * | 2021-10-18 | 2023-04-25 | 삼성전자주식회사 | Electronic apparatus and control method thereof |
CN116048244B (zh) * | 2022-07-29 | 2023-10-20 | 荣耀终端有限公司 | Gaze point estimation method and related devices |
CN116030512B (zh) * | 2022-08-04 | 2023-10-31 | 荣耀终端有限公司 | Gaze point detection method and apparatus |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107697069A (zh) * | 2017-10-31 | 2018-02-16 | 上海汽车集团股份有限公司 | Intelligent control method for driver fatigue driving in automobiles |
CN108229284A (zh) * | 2017-05-26 | 2018-06-29 | 北京市商汤科技开发有限公司 | Gaze tracking and training method and apparatus, system, electronic device, and storage medium |
WO2018167991A1 (ja) * | 2017-03-14 | 2018-09-20 | オムロン株式会社 | Driver monitoring device, driver monitoring method, learning device, and learning method |
CN109446892A (zh) * | 2018-09-14 | 2019-03-08 | 杭州宇泛智能科技有限公司 | Human eye attention localization method and system based on deep neural networks |
CN109460780A (zh) * | 2018-10-17 | 2019-03-12 | 深兰科技(上海)有限公司 | Vehicle safe-driving detection method and apparatus based on artificial neural networks, and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106407935A (zh) * | 2016-09-21 | 2017-02-15 | 俞大海 | Psychological testing method based on face images and eye-movement gaze information |
CN107590482A (zh) * | 2017-09-29 | 2018-01-16 | 百度在线网络技术(北京)有限公司 | Information generation method and apparatus |
CN109002753B (zh) * | 2018-06-01 | 2022-07-08 | 上海大学 | Face detection method for large-scene surveillance images based on cascaded convolutional neural networks |
CN108985181B (zh) * | 2018-06-22 | 2020-07-24 | 华中科技大学 | End-to-end face annotation method based on detection and segmentation |
- 2019-03-18: CN201910204566.9A filed in China; granted as CN111723596B (status: active)
- 2019-12-30: PCT/CN2019/129893 filed (international application; status: application filing)
- 2019-12-30: JP2021540840A filed in Japan; granted as JP7252348B2 (status: active)
- 2019-12-30: KR1020217022190A filed in Korea as KR20210102413A (status: application discontinued)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018167991A1 (ja) * | 2017-03-14 | 2018-09-20 | オムロン株式会社 | Driver monitoring device, driver monitoring method, learning device, and learning method |
CN108229284A (zh) * | 2017-05-26 | 2018-06-29 | 北京市商汤科技开发有限公司 | Gaze tracking and training method and apparatus, system, electronic device, and storage medium |
CN107697069A (zh) * | 2017-10-31 | 2018-02-16 | 上海汽车集团股份有限公司 | Intelligent control method for driver fatigue driving in automobiles |
CN109446892A (zh) * | 2018-09-14 | 2019-03-08 | 杭州宇泛智能科技有限公司 | Human eye attention localization method and system based on deep neural networks |
CN109460780A (zh) * | 2018-10-17 | 2019-03-12 | 深兰科技(上海)有限公司 | Vehicle safe-driving detection method and apparatus based on artificial neural networks, and storage medium |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210366152A1 (en) * | 2018-12-24 | 2021-11-25 | Samsung Electronics Co., Ltd. | Method and apparatus with gaze estimation |
US11747898B2 (en) * | 2018-12-24 | 2023-09-05 | Samsung Electronics Co., Ltd. | Method and apparatus with gaze estimation |
CN112656431A (zh) * | 2020-12-15 | 2021-04-16 | 中国科学院深圳先进技术研究院 | EEG-based attention recognition method and apparatus, terminal device, and storage medium |
CN112541436A (zh) * | 2020-12-15 | 2021-03-23 | 平安科技(深圳)有限公司 | Concentration analysis method and apparatus, electronic device, and computer storage medium |
CN112541436B (zh) * | 2020-12-15 | 2024-05-07 | 平安科技(深圳)有限公司 | Concentration analysis method and apparatus, electronic device, and computer storage medium |
CN112560783A (zh) * | 2020-12-25 | 2021-03-26 | 京东数字科技控股股份有限公司 | Method, apparatus, system, medium and product for evaluating attention state |
CN113065997B (zh) * | 2021-02-27 | 2023-11-17 | 华为技术有限公司 | Image processing method, neural network training method, and related devices |
CN113065997A (zh) * | 2021-02-27 | 2021-07-02 | 华为技术有限公司 | Image processing method, neural network training method, and related devices |
CN113052064A (zh) * | 2021-03-23 | 2021-06-29 | 北京思图场景数据科技服务有限公司 | Attention detection method based on face orientation, facial expression, and pupil tracking |
CN113052064B (zh) * | 2021-03-23 | 2024-04-02 | 北京思图场景数据科技服务有限公司 | Attention detection method based on face orientation, facial expression, and pupil tracking |
CN113283340A (zh) * | 2021-05-25 | 2021-08-20 | 复旦大学 | Vaccination status detection method, apparatus and system based on ocular surface features |
CN113283340B (zh) * | 2021-05-25 | 2022-06-14 | 复旦大学 | Vaccination status detection method, apparatus and system based on ocular surface features |
CN113391699A (zh) * | 2021-06-10 | 2021-09-14 | 昆明理工大学 | Eye-gesture interaction model method based on dynamic eye movement indicators |
CN113391699B (zh) * | 2021-06-10 | 2022-06-21 | 昆明理工大学 | Eye-gesture interaction model method based on dynamic eye movement indicators |
CN114863093A (zh) * | 2022-05-30 | 2022-08-05 | 厦门大学 | Neural network training method based on eye-tracking technology, and architectural design method and system |
Also Published As
Publication number | Publication date |
---|---|
CN111723596B (zh) | 2024-03-22 |
JP2022517121A (ja) | 2022-03-04 |
CN111723596A (zh) | 2020-09-29 |
JP7252348B2 (ja) | 2023-04-04 |
KR20210102413A (ko) | 2021-08-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020186883A1 (zh) | Method, apparatus and device for gaze area detection and neural network training | |
TWI741512B (zh) | Driver attention monitoring method and apparatus, and electronic device | |
US10877485B1 (en) | Handling intersection navigation without traffic lights using computer vision | |
US20210357670A1 (en) | Driver Attention Detection Method | |
Seshadri et al. | Driver cell phone usage detection on strategic highway research program (SHRP2) face view videos | |
US20190377409A1 (en) | Neural network image processing apparatus | |
CN108447303B (zh) | Peripheral visual field danger identification method based on the coupling of human vision and machine vision | |
JP2019533209A (ja) | System and method for driver monitoring | |
JP2020509466A (ja) | System and method of a computational framework for driver visual attention using a fully convolutional architecture | |
WO2021016873A1 (zh) | Attention detection method based on cascaded neural networks, computer apparatus, and computer-readable storage medium | |
JP2022517254A (ja) | Gaze area detection method, apparatus, and electronic device | |
García et al. | Driver monitoring based on low-cost 3-D sensors | |
WO2020231401A1 (en) | A neural network for head pose and gaze estimation using photorealistic synthetic data | |
WO2023272453A1 (zh) | Gaze calibration method and apparatus, device, computer-readable storage medium, system, and vehicle | |
Jha et al. | Probabilistic estimation of the driver's gaze from head orientation and position | |
CN115690750A (zh) | Driver distraction detection method and apparatus | |
Sun et al. | Driver fatigue detection system based on colored and infrared eye features fusion | |
CN113635833A (zh) | Vehicle-mounted display apparatus, method and system based on the automobile A-pillar, and storage medium | |
Doman et al. | Estimation of traffic sign visibility toward smart driver assistance | |
Zabihi et al. | Frame-rate vehicle detection within the attentional visual area of drivers | |
CN116012822A (zh) | Fatigue driving recognition method and apparatus, and electronic device | |
CN113525402B (zh) | Intelligent field-of-view response method and system for advanced driver assistance and unmanned driving | |
TWI758717B (zh) | Vehicle-mounted display apparatus, method and system based on the automobile A-pillar, and storage medium | |
CN112258813A (zh) | Vehicle active safety control method and device | |
CN112506353A (zh) | Vehicle interaction system and method, storage medium, and vehicle | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19920094; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2021540840; Country of ref document: JP; Kind code of ref document: A; Ref document number: 20217022190; Country of ref document: KR; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19920094; Country of ref document: EP; Kind code of ref document: A1 |