NL2030131B1 - Human-machine cooperative sensing method and system for automatic driving - Google Patents
- Publication number
- NL2030131B1 (application NL2030131A)
- Authority
- NL
- Netherlands
- Prior art keywords
- driver
- image
- gaze
- sight
- imaging
- Prior art date
- 2021-12-14
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/193—Preprocessing; Feature extraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/59—Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
- G06V20/597—Recognising the driver's state or behaviour, e.g. attention or drowsiness
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/166—Detection; Localisation; Normalisation using acquisition arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The present invention discloses a human-machine cooperative sensing method and system for automatic driving, the method comprising: photographing an image of a driver's head through binocular infrared CCDs arranged at different positions in a vehicle, extracting facial features and acquiring feature corner points; establishing a mapping relationship between a three-dimensional line of sight of the driver and pixel points on an imaging of an environment sensing camera, obtaining a visual landing point of the driver, and saving the visual landing point to a gaze point cache database; performing, based on the gaze point cache database, eye movement analysis on gaze target frequency and gaze duration to obtain an eye movement state; and constructing a topological environment sensing image database to adjust a distribution weight of an image processing neural network in an automatic driving process. The present invention can quickly locate a region of interest in an image in an intelligent camera through a mechanism of visual tracking and human eye attention, and accelerate the information processing speed of an environment sensing system by using a sensing fusion technology, which can significantly reduce hardware computing requirements of the sensing system, improve real-time performance of the system and provide better economy.
Description
[01] The present invention relates to the technical field of intelligent driving, and in particular to a human-machine cooperative sensing method and system for automatic driving.
[02] Intelligent driver assistance and, ultimately, automatic driving are the inevitable trend of future automobile development, and the sensing system has become the bottleneck in the development of automatic driving technology.
[03] At present, multi-sensor information fusion has become the mainstream solution in sensing. However, whether laser radar, millimeter-wave radar, cameras and ultrasonic radar are combined, or multiple sensors of the same type are combined, such solutions cannot avoid processing a large amount of data per unit time and placing a high demand on hardware resources. It is therefore often difficult to satisfy the real-time requirements of the system, and the system is not economical.
[04] The human-machine cooperative sensing method for automatic driving provided by the present invention comprises the following steps: S100, photographing an image of a driver's head through binocular infrared CCDs arranged at different positions in a vehicle, acquiring a composite image of the driver's face, extracting facial features, acquiring feature corner points, and establishing an eyeball coordinate system through positioning of the feature corner points; S200, acquiring the driver's three-dimensional line of sight based on the eyeball coordinate system, placing the driver's three-dimensional line of sight and the imaging pixel information of an environment sensing camera in the same world coordinate system by means of coordinate matrix transformation, establishing a mapping relationship between the driver's three-dimensional line of sight and pixel points on an imaging of the environment sensing camera, obtaining the driver's visual landing point, and saving the driver's visual landing point to a driver's gaze point cache database; S300, performing eye movement analysis on a gaze target frequency and a gaze duration based on the driver's gaze point cache database, obtaining an eye movement state and marking the eye movement state, marking a pixel region of an intersection point between the driver's visual landing point and the imaging, and constructing a topological environment sensing image database; and S400, adjusting, according to the topological environment sensing image database, a distribution weight of an image processing neural network in an automatic driving process, and self-adaptively adjusting the image pixel traversal fineness and region.
[05] Advantageous effects are as follows. The present invention can quickly locate a region of interest in an image in an intelligent camera through a mechanism of visual tracking and human eye attention, and accelerate an information processing speed of an environment sensing system by using a sensing fusion technology, which can significantly reduce hardware computing requirements of the sensing system, improve real-time performance of the system and have better economy.
[06] FIG.1 is a schematic flow diagram of a method according to an embodiment of the invention;
[07] FIG.2 is a schematic diagram of a data processing procedure in the method according to an embodiment of the invention;
[08] FIG.3 is a block schematic diagram of a system according to an embodiment of the present invention.
[09] Reference numbers:
[10] face image acquisition module 100, corner point positioning module 200, gaze point acquisition module 300, topology sensing and marking module 400, sensing fusion module 500, head movement compensation module 600 and sensed image classification processing module 700.
[11] With reference to FIG.1, a method of an embodiment of the invention comprises: S100, photographing an image of a driver's head through binocular infrared CCDs arranged at different positions in a vehicle, acquiring a composite image of the driver's face, extracting facial features, acquiring feature corner points, and establishing an eyeball coordinate system through positioning of the feature corner points; S200, acquiring the driver's three-dimensional line of sight based on the eyeball coordinate system, placing the driver's three-dimensional line of sight and the imaging pixel information of an environment sensing camera in the same world coordinate system by means of coordinate matrix transformation, establishing a mapping relationship between the driver's three-dimensional line of sight and pixel points on an imaging of the environment sensing camera, obtaining the driver's visual landing point, and saving the driver's visual landing point to a driver's gaze point cache database; S300, performing eye movement analysis on a gaze target frequency and a gaze duration based on the driver's gaze point cache database, obtaining an eye movement state and marking the eye movement state, marking a pixel region of an intersection point between the driver's visual landing point and the imaging, and constructing a topological environment sensing image database; and S400, adjusting, according to the topological environment sensing image database, a distribution weight of an image processing neural network in an automatic driving process, and self-adaptively adjusting the image pixel traversal fineness and region.
[12] The processing of data is roughly divided into the following four steps: acquisition of the face image by the binocular infrared cameras, processing of the face image by the GPU, calculation by the CPU, and caching, storage, fusion and output of the data. Specifically, the steps include: image acquisition, facial feature detection, feature corner point detection, feature three-dimensional coordinate extraction, driving line-of-sight direction calculation, driving line-of-sight landing point calculation, and fusion output, as shown in FIG.2. The binocular infrared CCDs are used to photograph the driver's face under all driving conditions and to collect and cache the video images. The face image processing by the GPU comprises detecting facial features of the driver and extracting feature corner point coordinates. The CPU calculates and locates the ROI (region of interest) of the driver's line of sight in the imaging of the environment camera. According to the history of the driver's visual ROI in the cache, the storage and fusion output stage reconstructs the topology of the sensing data, and fuses and outputs the topological data based on the visual attention mechanism.
[13] Image acquisition, as the input end, mainly acquires and caches video stream images of the driver's face. Video image acquisition should adopt a non-wearable acquisition mode that interferes little with driver behavior. In addition, an infrared camera that is not sensitive to the light environment must be used. Binocular infrared CCDs at different positions are arranged in the vehicle to photograph the image of the driver's head and obtain images of the driver's face from different angles at the same time. Each camera may capture only a part of the face, so the images captured by the binocular cameras are processed with panorama composition and stitching, gray processing and binarization into a complete composite image of the driver's face, which is then transmitted to the next processing flow.
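A minimal sketch of this acquisition-side preprocessing is given below, assuming OpenCV is available and that the camera driver delivers 8-bit BGR frames; the stitching routine and the Otsu binarization are illustrative choices, since the patent does not fix particular algorithms or thresholds.

```python
import cv2
import numpy as np

def compose_face_image(left_ir, right_ir):
    """Build one composite face image from two binocular infrared frames
    (assumed here to be 8-bit BGR frames as delivered by the camera driver)."""
    stitcher = cv2.Stitcher_create(cv2.Stitcher_SCANS)
    status, panorama = stitcher.stitch([left_ir, right_ir])
    if status != cv2.Stitcher_OK:
        # Fall back to a simple side-by-side composition if stitching fails.
        panorama = np.hstack([left_ir, right_ir])
    # Gray processing followed by binarization, as described above.
    gray = cv2.cvtColor(panorama, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```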
[14] The facial feature detection is used for extracting the facial position of the driver in the composite image, performing pre-processing for the subsequent facial feature point extraction, and providing basic data for the line-of-sight direction calculation. Facial feature detection needs to keep continuously tracking the driver's facial features, so as to improve the speed of the system and reduce the false detection rate. The face region in the composite image is separated from the background region by a face skin color model to obtain a candidate region that may contain a face, and then a face model is matched against the candidate region; a matching degree with the face model is obtained through analysis and comparison, and the region that may contain a face is extracted according to the matching degree. On the basis of a successful face detection, the subsequent feature corner point detection is started. If face detection fails, detection is repeated in a loop. During the facial feature detection process, the photographed images are also cyclically stored to build a historical face database, which provides time-sequence information for subsequent driver mental state monitoring.
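The skin-colour segmentation and face-model matching described above could be sketched as follows. The YCrCb bounds and the Haar cascade used as the "face model" are stand-in assumptions (and presuppose that a colour composite frame is available; in a pure-IR pipeline an intensity model would replace the skin model), since the patent leaves the concrete models open.

```python
import cv2
import numpy as np

# Illustrative skin-colour bounds in YCrCb space; the patent does not fix a model.
SKIN_LOW = np.array([0, 133, 77], dtype=np.uint8)
SKIN_HIGH = np.array([255, 173, 127], dtype=np.uint8)
# A Haar cascade stands in for the patent's unspecified "face model".
FACE_MODEL = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face_region(composite_bgr):
    """Return the bounding box (x, y, w, h) of the most face-like candidate, or None."""
    ycrcb = cv2.cvtColor(composite_bgr, cv2.COLOR_BGR2YCrCb)
    skin_mask = cv2.inRange(ycrcb, SKIN_LOW, SKIN_HIGH)
    contours, _ = cv2.findContours(skin_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for contour in sorted(contours, key=cv2.contourArea, reverse=True):
        x, y, w, h = cv2.boundingRect(contour)
        candidate = cv2.cvtColor(composite_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
        # Matching degree: does the face model fire inside the candidate region?
        if len(FACE_MODEL.detectMultiScale(candidate, scaleFactor=1.1, minNeighbors=5)) > 0:
            return (x, y, w, h)
    return None  # detection failed: the caller loops back to face detection
```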
[15] The feature corner point extraction is based on the facial feature detection. A face detection image is cropped from the driver's face image through the facial feature detection. The inner and outer eye corners, the two mouth corners and the center points of the two eyes are extracted from this picture, these feature corner points are positioned, and a face coordinate system is established. The specific procedure is as follows. Firstly, after the face detection image is obtained, a rough positioning of the human eye range is carried out within the face region according to the "three sections and five eyes" facial proportion rule, so as to narrow the human eye detection range and improve the detection accuracy and detection speed. Then, by means of dynamic threshold segmentation, gradient transformation and the like, feature corner points such as the eye corners are extracted, and the face plane is established from the two inner canthi and the two mouth corners. After the face region and eyes are detected, the driver's face can be continuously observed by the infrared CCD cameras, and the driver's mental state (including fatigue state and driving concentration degree) can be assessed by analyzing changes of the driver's facial feature points and visual attention through the video stream; a mental state score is obtained, and the next operation is performed when the driver's mental state is good, otherwise face and eye images continue to be photographed and cyclically observed.
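A rough sketch of this corner-point localization step is shown below, under stated assumptions: the eye-band fractions derived from the "three sections, five eyes" rule and the adaptive-threshold parameters are illustrative, and Shi-Tomasi corners stand in for the patent's unspecified corner extractor.

```python
import cv2
import numpy as np

def rough_eye_band(face_box):
    """Coarse eye region from the 'three sections, five eyes' proportion rule:
    the eyes lie roughly in the second quarter of the face height. The exact
    fractions used here are illustrative assumptions."""
    x, y, w, h = face_box
    return (x, y + h // 4, w, h // 4)

def extract_feature_corners(gray_face, face_box):
    """Candidate canthus corner points inside the coarse eye band."""
    ex, ey, ew, eh = rough_eye_band(face_box)
    band = gray_face[ey:ey + eh, ex:ex + ew]
    # Dynamic threshold segmentation restricts the search to dark (eye) pixels.
    mask = cv2.adaptiveThreshold(band, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY_INV, 21, 10)
    corners = cv2.goodFeaturesToTrack(band, maxCorners=8, qualityLevel=0.01,
                                      minDistance=10, mask=mask)
    if corners is None:
        return []
    # Return coordinates in the full-image frame.
    return [(ex + int(cx), ey + int(cy)) for cx, cy in corners.reshape(-1, 2)]
```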
[16] Three-dimensional coordinate extraction for the feature corner points means extracting the three-dimensional coordinates of the above-mentioned feature corner points. Through the calibrated binocular infrared CCD camera system, the position and orientation of each feature corner point in the binocular camera imaging system can be obtained, and the relative coordinates of the facial corner point positions are then obtained according to the relationship between the camera coordinate system and an ideal coordinate system. A face plane coordinate system is established from the coordinates of the canthus points and mouth corners, wherein the face orientation is perpendicular to the face plane. Then the position and direction of the camera in the world coordinate system are used to solve the three-dimensional space coordinates of the facial corner points, and the world coordinates of each corner point can be obtained through a series of coordinate transformations. It can essentially be assumed that the coordinates of the eyeball centers with respect to the face coordinate system remain constant during eye rotation; therefore, the coordinates of the eyeball centers can be determined from the coordinates of the canthi and mouth corners. Accordingly, the eyeball coordinate system is established from the obtained corner point coordinates.
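The stereo recovery of corner-point coordinates can be sketched with a standard triangulation call, assuming the projection matrices come from the binocular calibration; the fixed eyeball-centre offset along the face normal is an assumed value for illustration, not taken from the patent.

```python
import cv2
import numpy as np

def triangulate_corners(P_left, P_right, pts_left, pts_right):
    """Recover 3-D feature corner coordinates from the calibrated binocular rig.
    P_left / P_right are the 3x4 projection matrices obtained from calibration;
    pts_* are matching (N, 2) pixel coordinates of the corners in each IR image."""
    pts_l = np.asarray(pts_left, dtype=np.float64).T    # shape (2, N)
    pts_r = np.asarray(pts_right, dtype=np.float64).T
    homog = cv2.triangulatePoints(P_left, P_right, pts_l, pts_r)  # (4, N) homogeneous
    return (homog[:3] / homog[3]).T                      # (N, 3) camera-frame points

def eyeball_center(inner_canthi_3d, mouth_corners_3d, depth_offset=0.013):
    """Eyeball centre as a fixed offset from the face plane along its normal,
    following the assumption that it is constant in the face coordinate system.
    The 13 mm offset is an illustrative value, not taken from the patent."""
    origin = np.vstack([inner_canthi_3d, mouth_corners_3d]).mean(axis=0)
    v1 = inner_canthi_3d[1] - inner_canthi_3d[0]         # across the two inner canthi
    v2 = mouth_corners_3d[0] - origin                    # towards a mouth corner
    normal = np.cross(v1, v2)
    normal /= np.linalg.norm(normal)
    return origin + depth_offset * normal
```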
[17] The driving line-of-sight direction calculation is the process of solving the driver's line-of-sight direction and keeping it continuously tracked. The line of sight of the human eye is the direction of the line connecting the fovea at the center of the retina and the center of the lens. Specifically, eyeball region recognition is performed on the face detection image extracted from the face composite image, and an eyeball region image is cropped out. A threshold analysis is then performed on the eyeball region image to obtain a pupil threshold image and a Purkinje spot threshold image, respectively. The pupil and the Purkinje spot are identified, the coordinates of the pupil center and the Purkinje spot center are calculated, and a mapping function of the pupil-Purkinje spot position relationship is established. The calculation of the driving line-of-sight direction further includes using head tracking to compensate the image data: obtaining the spatial position of the head feature points with respect to the camera coordinate system through image recognition, establishing a driver's head coordinate system, caching and recording the head pitch angle, yaw angle, roll angle and three-axis translation data, performing data fusion based on an environment model, using the head movement data to compensate the line-of-sight tracking data, and finally calculating and outputting the three-dimensional spatial line of sight.
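Below is a minimal sketch of the pupil/Purkinje-spot extraction and the mapping function; the two fixed grey-level thresholds and the affine (rather than higher-order polynomial) mapping are simplifying assumptions, and calibration pairs of vectors and gaze angles are presumed to be available.

```python
import cv2
import numpy as np

def blob_center(mask):
    """Centroid of the largest blob in a binary mask (pupil or Purkinje spot)."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    m = cv2.moments(max(contours, key=cv2.contourArea))
    if m["m00"] == 0:
        return None
    return np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]])

def pupil_purkinje_vector(eye_gray, pupil_thr=40, glint_thr=220):
    """Pupil-centre minus Purkinje-spot-centre offset; both fixed grey-level
    thresholds are illustrative."""
    _, pupil_mask = cv2.threshold(eye_gray, pupil_thr, 255, cv2.THRESH_BINARY_INV)
    _, glint_mask = cv2.threshold(eye_gray, glint_thr, 255, cv2.THRESH_BINARY)
    pupil, glint = blob_center(pupil_mask), blob_center(glint_mask)
    if pupil is None or glint is None:
        return None
    return pupil - glint

def fit_gaze_mapping(vectors, gaze_angles):
    """Least-squares affine map from pupil-Purkinje vectors to (yaw, pitch) angles,
    standing in for the patent's unspecified mapping function."""
    A = np.hstack([np.asarray(vectors, float), np.ones((len(vectors), 1))])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(gaze_angles, float), rcond=None)
    return coeffs  # apply with: np.append(v, 1.0) @ coeffs
```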
[18] The driving line-of-sight landing point calculation is mainly used to complete driver attention extraction and tracking, and to establish the mapping relationship between the driver's three-dimensional line of sight and the pixel points on the imaging of the environment sensing camera. The driver's three-dimensional line of sight is expressed in the eyeball coordinate system, and the pixel positions on the imaging of the environment sensing camera are expressed in the imaging coordinate system, so establishing the mapping relationship between the two involves coordinate matrix transformation. The position of the eyeball coordinate system relative to the imaging coordinate system of the binocular infrared CCD cameras is determined, and the position of the imaging coordinate system of the binocular infrared CCD cameras relative to the vehicle body is determined. Likewise, the position of the imaging coordinate system of the environment sensing camera relative to that camera is fixed, and the position of the environment sensing camera relative to the vehicle body is determined. Therefore, by placing the driver's three-dimensional line of sight and the imaging pixel information of the environment sensing camera in the same world coordinate system through coordinate matrix transformation, the mapping relationship between the three-dimensional line of sight and the imaging pixel information of the environment sensing camera can be established. The intersection point between the three-dimensional line of sight and the imaging of the environment sensing camera can then be used to solve the driver's visual landing point. The driver's visual landing point is continuously tracked and saved to the driver's gaze point cache database. It can be understood that if an intersection between the driver's three-dimensional line of sight and the imaging of the environment sensing camera is detected, i.e. the solution of the visual landing point is successful, the next step of data fusion is continued; otherwise, the gaze point solution is continued in a loop.
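A sketch of this landing-point computation follows, assuming the gaze ray has already been expressed in world coordinates and that the environment camera's intrinsic matrix K and its world-to-camera extrinsic matrix are known; intersecting the ray with the camera's normalized image plane stands in for the patent's "intersection with the imaging".

```python
import numpy as np

def gaze_pixel_in_env_camera(eye_center_w, gaze_dir_w, T_world_to_cam, K):
    """Map the driver's 3-D line of sight to a pixel of the environment sensing camera.
    eye_center_w / gaze_dir_w: gaze ray in world coordinates, after the chain of
    eyeball -> IR camera -> vehicle body -> world transforms described above.
    T_world_to_cam: 4x4 extrinsic matrix of the environment camera; K: 3x3 intrinsics.
    Returns the pixel (u, v) where the ray meets the normalized image plane, or None."""
    R, t = T_world_to_cam[:3, :3], T_world_to_cam[:3, 3]
    origin_c = R @ eye_center_w + t            # ray origin in the camera frame
    dir_c = R @ gaze_dir_w                     # ray direction in the camera frame
    if dir_c[2] <= 1e-9:                       # gaze points away from the camera axis
        return None
    s = (1.0 - origin_c[2]) / dir_c[2]         # intersect the plane z = 1
    hit = origin_c + s * dir_c                 # landing point on the imaging plane
    u, v, w = K @ hit                          # apply intrinsics (w equals 1 here)
    return np.array([u / w, v / w])
```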
[19] Information fusion is the effective extraction and combination of information based on registration of the driver's gaze point and the imaging pixels. Based on the driver's gaze point cache database, eye movement analysis is performed on the gaze target frequency and gaze duration, and the driver's eye movement state at this moment (fixation, saccade or smooth pursuit) is marked. At the same time, the pixel region of the intersection point between the driver's visual landing point and the imaging is marked. Based on the above-mentioned marked information, the topological environment sensing image database containing the driver's visual information is constructed.
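The eye-movement analysis over the gaze point cache might look like the following dispersion/velocity classification; the thresholds and window handling are illustrative, as the patent does not specify them.

```python
import numpy as np

def classify_eye_movement(gaze_points, timestamps, disp_thr_px=30.0, vel_thr_px_s=800.0):
    """Label a window of cached gaze points as 'fixation', 'saccade' or 'pursuit'
    using dispersion and velocity thresholds (both values are illustrative)."""
    pts = np.asarray(gaze_points, dtype=float)            # (N, 2) pixel coordinates
    t = np.asarray(timestamps, dtype=float)                # (N,) seconds
    if len(pts) < 2:
        return "fixation"
    dispersion = np.ptp(pts[:, 0]) + np.ptp(pts[:, 1])     # I-DT style spread measure
    step = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    velocity = step / np.maximum(np.diff(t), 1e-6)          # px/s between samples
    if velocity.max() > vel_thr_px_s:
        return "saccade"
    if dispersion < disp_thr_px:
        return "fixation"
    return "pursuit"                                        # slow, drifting gaze

def gaze_statistics(labels, timestamps):
    """Gaze-target frequency and window duration fed into the fusion step."""
    return {
        "fixation_count": sum(1 for label in labels if label == "fixation"),
        "window_duration_s": float(timestamps[-1] - timestamps[0]),
    }
```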
[20] An attention neural network module is trained on the topological environment sensing image data, and this neural network module is used to automatically adjust the distribution weight of part of the conventional image processing neural network, so as to self-adaptively set the image pixel traversal fineness and region, quickly locate the region of interest of the image under the pixel marking, and reduce the pixel traversal time of the algorithm.
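One way to realise the weight adjustment and adaptive traversal fineness is sketched below with a plain NumPy weight map; the weight values, block size and strides are illustrative assumptions, and a learned attention module could replace the hand-built map.

```python
import numpy as np

def attention_weight_map(image_shape, gaze_boxes, base_weight=0.2, gaze_weight=1.0):
    """Spatial weight map: full weight inside the regions marked from the driver's
    gaze, reduced weight elsewhere. The two weight values are illustrative."""
    h, w = image_shape[:2]
    weights = np.full((h, w), base_weight, dtype=np.float32)
    for x, y, bw, bh in gaze_boxes:                 # boxes from the topological database
        weights[y:y + bh, x:x + bw] = gaze_weight
    return weights

def traversal_strides(weights, block=32, fine_stride=1, coarse_stride=4):
    """Per-block pixel traversal fineness: dense scanning where attention weight is
    high, sparse scanning elsewhere, which is how the map cuts traversal time."""
    h, w = weights.shape
    strides = np.empty((h // block, w // block), dtype=int)
    for i in range(h // block):
        for j in range(w // block):
            tile = weights[i * block:(i + 1) * block, j * block:(j + 1) * block]
            strides[i, j] = fine_stride if tile.mean() > 0.5 else coarse_stride
    return strides
```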
For a sensed image acquired by the environment sensing camera, auxiliary classification of its pixel features is also performed via a preset driver eye movement feature database.
By acquiring the current driver eye movement feature, a matching eye movement feature is searched for in the preset driver eye movement feature database, and an image pixel feature classification of the current sensed image is determined, so that a specific image processing method is selected; this can improve the accuracy of environment sensing in severe conditions (poor lighting, rain, night, etc.).

A system of an embodiment of the present invention, with reference to FIG.3, includes: a face image acquisition module 100 for photographing an image of a driver's head through binocular infrared CCDs arranged at different positions in a vehicle and acquiring a composite image of the driver's face; a corner point positioning module 200 for extracting facial features from the composite image of the driver's face, acquiring feature corner points, and establishing an eyeball coordinate system according to positioning of the feature corner points; a gaze point acquisition module 300 for acquiring the driver's three-dimensional line of sight based on the eyeball coordinate system, placing the driver's three-dimensional line of sight and imaging pixel information of an environment sensing camera in the same world coordinate system by means of coordinate matrix transformation, establishing a mapping relationship between the driver's three-dimensional line of sight and pixel points on an imaging of the environment sensing camera, obtaining the driver's visual landing point, and saving the driver's visual landing point to a driver's gaze point cache database; a topology sensing and marking module 400 for performing eye movement analysis on a gaze target frequency and a gaze duration based on the driver's gaze point cache database, obtaining an eye movement state and marking the eye movement state, marking a pixel region of an intersection point between the driver's visual landing point and the imaging, and constructing a topological environment sensing image database; a sensing fusion module 500 for adjusting, according to the topological environment sensing image database, a distribution weight of an image processing neural network in an automatic driving process, and self-adaptively adjusting the image pixel traversal fineness and region; a head movement compensation module 600 for acquiring the spatial position of the driver's head feature points relative to the camera coordinate system via image recognition, establishing a driver's head coordinate system, recording head pitch angle, yaw angle, roll angle and three-axis translation data to obtain head movement data, performing data fusion based on an environment model, compensating the line-of-sight tracking data via the head movement data, and calculating and outputting the driver's three-dimensional line of sight; and a sensed image classification processing module 700 for acquiring the current driver eye movement feature, comparing the current driver eye movement feature with the eye movement features of a preset driver eye movement feature database, acquiring a pixel feature classification of the sensed image acquired by the environment sensing camera, and processing the sensed image according to the pixel feature classification.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
NL2030131A NL2030131B1 (en) | 2021-12-14 | 2021-12-14 | Human-machine cooperative sensing method and system for automatic driving |
Publications (2)
Publication Number | Publication Date |
---|---|
NL2030131A (en) | 2022-10-28 |
NL2030131B1 (en) | 2022-11-04 |
Family
ID=83851929
- 2021-12-14: Application NL2030131A filed in the Netherlands; resulting patent NL2030131B1 is active.
Also Published As
Publication number | Publication date |
---|---|
NL2030131A (en) | 2022-10-28 |