CN115158325A - Method and vehicle for determining a gaze area of a person - Google Patents

Method and vehicle for determining a gaze area of a person

Info

Publication number
CN115158325A
Authority
CN
China
Prior art keywords
camera
probability
region
cameras
person
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210269005.9A
Other languages
Chinese (zh)
Inventor
H-J·比格
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Robert Bosch GmbH
Original Assignee
Robert Bosch GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch GmbH filed Critical Robert Bosch GmbH
Publication of CN115158325A publication Critical patent/CN115158325A/en
Pending legal-status Critical Current

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00: Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models
    • B60W40/08: Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models, related to drivers or passengers
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60R: VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
    • B60R16/00: Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
    • B60R16/02: Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for; electric constitutive elements
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2420/00: Indexing codes relating to the type of sensors based on the principle of their operation
    • B60W2420/40: Photo, light or radio wave sensitive means, e.g. infrared sensors
    • B60W2420/403: Image sensing, e.g. optical camera
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2540/00: Input parameters relating to occupants
    • B60W2540/225: Direction of gaze

Landscapes

  • Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Mathematical Physics (AREA)
  • Transportation (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for determining a gaze area (150) of a person (140) in an environment in which a plurality of different regions (110-118) are each assigned a camera (120-128), wherein a video signal (132) is obtained from each of the plurality of cameras (120-128), wherein, by means of the video signal (132) of each of the cameras (120-128), a probability (W1, W2) is determined that the gaze area (150) of the person (140) comprises the region assigned to the respective camera (120-128), and wherein the gaze area (150) of the person (140) is determined as that region (110-118) of the plurality of different regions (110-118) which is assigned to the camera (120-128) with the highest probability (W2).

Description

Method and vehicle for determining a gaze area of a person
Technical Field
The present invention relates to a method for determining the gaze area of a person, in particular of a driver in a vehicle, to a computing unit and a computer program for carrying it out, and to a vehicle having such a computing unit.
Background
In modern vehicles, the direction of the driver's gaze can be determined, for example in order to assess his attention to the current traffic situation.
Disclosure of Invention
According to the invention, a method for determining the gaze area of a person, a computing unit and a computer program for carrying out the method, and a vehicle, each with the features of the independent patent claims, are proposed. Advantageous embodiments are the subject matter of the dependent claims and of the following description.
The present invention relates to determining the gaze area of a person in an environment, for example the gaze area of a driver in a vehicle. In a camera-based driver attention estimation system, the primary source of the estimate may be one or more camera-based gaze direction estimates (i.e., estimates of the driver's gaze direction). The camera data can be interpreted with the aid of model assumptions, so that the gaze direction vector and the origin at the eyes can be estimated relative to the vehicle interior. On this basis, it can be estimated whether the driver's line of sight is directed at a specific location or a specific region, in particular a so-called "region of interest" (ROI), at a specific point in time, and a corresponding attention rating can be carried out.
The driver's attention can be understood here as a variable that describes how fully the driver is concentrating on the current traffic situation. The value of this variable can be determined, in particular, using measuring devices present in the vehicle, which detect, in particular, the driver's interaction with the vehicle. In this sense, the driver's attention is representative of variables that describe the driver's interaction with the vehicle.
With the typical positioning of a gaze-detection camera in the vehicle, for example near the dashboard, gaze toward other positions or ROIs (for example toward the right exterior rear-view mirror) is difficult to estimate or cannot be recognized at all.
Against this background, it is proposed to assign a camera to each of a plurality of different (local) regions or points in the environment, wherein a video signal is obtained from each of the plurality of cameras. Based on the video signal (which represents the images detected by the camera, ideally with the person in the image), the probability that the respective camera lies in the person's gaze area, in particular in its center or in the area of best vision, is then determined for each camera. In particular, a (statistical) gaze model may be used here, which will be explained in more detail later. The person's gaze area is then determined as the region assigned to the camera with the highest probability.
Preferably, however, the determined gaze area is output as information, for example to a further processing unit such as a driver assistance system in the vehicle, only if the highest probability deviates from the second-highest probability by more than a predefined threshold value (which may be specified absolutely or relatively, for example). Otherwise the estimate may be considered not good enough; in this way the quality of the estimate can also be quantified.
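To make this selection rule concrete, the following is a minimal sketch in Python (not part of the patent; the function name, region labels and the absolute threshold are illustrative) of choosing the gaze area and applying the quality check:

```python
# Minimal sketch of the region-selection rule, assuming the per-camera gaze
# probabilities have already been estimated (e.g., by the models below).

def select_gaze_region(probabilities: dict[str, float],
                       threshold: float = 0.10) -> str | None:
    """Return the region whose camera has the highest gaze probability,
    or None if the estimate is not distinctive enough."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    (best_region, w_best), (_, w_second) = ranked[0], ranked[1]
    # Output the gaze area only if the highest probability exceeds the
    # second-highest by more than the predefined threshold.
    if w_best - w_second > threshold:
        return best_region
    return None  # estimate considered not good enough

# With the example values used later in the description:
print(select_gaze_region({"left_mirror": 0.20, "rear_mirror": 0.80}))
# -> "rear_mirror", since 0.80 - 0.20 > 0.10
```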
Instead of estimating the gaze direction abstractly in the form of a gaze direction vector and from it deriving possible gaze contact with a region or ROI, the attention to one or more local regions is thus estimated directly on the basis of the camera image assigned to that region. The estimate can therefore be more accurate and reliable. It can also be obtained using cameras of lower quality (e.g., lower resolution) or simpler, less computationally demanding algorithms. Such a system can therefore be more cost-effective than a system with a central driver-observation camera, even if more cameras are required for this purpose. The cameras are preferably assigned to the regions such that each camera records images from the direction of its region toward the person.
There are two preferred possibilities for determining the probability that a camera lies in the gaze area. One possibility is to use a statistical model. This involves identifying a face in the camera image and then identifying an eye region. An eye region is to be understood here to mean, in particular, a rectangular area surrounding one or both eyes. The identification in the camera image is performed, for example, as a two-dimensional rectangular area in camera-image coordinates. A concrete possibility is described, for example, in "Viola, P. & Jones, M., Rapid object detection using a boosted cascade of simple features, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2001, 1, 511-518". Alternatively, the eye region may be found by estimating the pose of the face using a facial landmark model, such as described in "Kazemi, V. & Sullivan, J., One millisecond face alignment with an ensemble of regression trees, 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, 1867-1874", and then determined, for example, by means of the eye landmarks.
If the eye region cannot be determined or identified by these steps, i.e. if the identification of the eye region fails, it is assumed that the relevant camera is not being looked at, i.e. the probability for the relevant camera is assumed to be zero.
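By way of illustration, this first stage (face detection, then eye-region detection, with probability zero on failure) could be sketched with OpenCV's bundled Viola-Jones (Haar cascade) detectors; this is an assumed stand-in, not the implementation of the patent:

```python
# Sketch: locate a face and then a rectangular eye region in one camera
# frame, using OpenCV's bundled Viola-Jones (Haar cascade) detectors.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def detect_eye_region(frame):
    """Return an eye rectangle (x, y, w, h) in camera-image coordinates,
    or None if face or eye detection fails (probability is then zero)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5)
    if len(faces) == 0:
        return None  # no face: this camera is assumed not to be looked at
    x, y, w, h = faces[0]
    eyes = eye_cascade.detectMultiScale(gray[y:y + h, x:x + w])
    if len(eyes) == 0:
        return None  # eye-region identification failed
    ex, ey, ew, eh = eyes[0]
    return (x + ex, y + ey, ew, eh)
```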
If the eye region can be determined by these steps, i.e. if the eye region is successfully identified, further steps can be performed. The probability for the relevant camera is then determined, for example, by means of a statistical classification method. This includes, in particular, creating a feature vector based on the pixel information in the eye region and inputting the feature vector into a statistical classification method for assessing the gaze. It also includes, in particular, determining a confidence value, for example based on the following features: successful identification of the eye region; the position and size of the eye region in the camera image; the position and size of the eye region in the camera image compared with the temporally preceding estimate; the probability estimated by the respective statistical (classification) method.
One possibility for the statistical classification method is to calculate feature vectors based on the image content, which are interpreted by means of decision trees, random forests or other statistical learning methods. Such a feature vector may, for example, consist of various statistics formed from the image content of the eye region or from a transformation of it (e.g., a transformation for edge detection). An example of such a statistic is the average luminance of the region. Alternatively, the values of a histogram of the luminance values in the region, or the luminance values of individual or all pixels within the eye region, may be included directly in the feature vector. The last possibility presupposes that the eye region is rescaled to a uniform size. Such learning methods can be trained accordingly before use, for example using labeled images for which it is known whether the eyes of the face shown are looking at the camera.
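A minimal sketch of such a feature-based classification stage, assuming a scikit-learn random forest trained offline on labeled eye patches (the patch size, histogram binning and all names are illustrative):

```python
# Sketch: feature vector from a grayscale eye patch, classified by a random
# forest trained on labeled "looking at camera" / "not looking" examples.
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

PATCH_SIZE = (24, 16)  # eye regions are rescaled to a uniform size

def eye_features(eye_patch_gray: np.ndarray) -> np.ndarray:
    patch = cv2.resize(eye_patch_gray, PATCH_SIZE)
    mean_luminance = patch.mean()
    hist = cv2.calcHist([patch], [0], None, [16], [0, 256]).ravel()
    # Feature vector: mean luminance, 16-bin luminance histogram, raw pixels.
    return np.concatenate([[mean_luminance], hist, patch.ravel()])

# Offline training, with X_train built from labeled eye patches:
#   clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
# At runtime, the gaze probability for this camera would then be, e.g.:
#   w = clf.predict_proba(eye_features(patch).reshape(1, -1))[0, 1]
```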
Another possibility is, in particular, to determine the probability using an artificial neural network, whereby the gaze can be classified directly from the camera image. For this purpose, for example, so-called "convolutional neural networks" are used, into which the respective camera image can be fed directly, i.e. the network receives the camera image or the camera's video signal and outputs the probability that the person is looking in the direction of the camera. Such a neural network can be trained accordingly before use, for example using labeled images for which it is known whether the eyes of the face shown are looking at the camera.
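For illustration only, such a network could be sketched in PyTorch as follows; the architecture, input size and training setup are assumptions and are not taken from the patent:

```python
# Sketch: a small convolutional network that maps one camera frame directly
# to the probability that the person is looking toward this camera.
import torch
import torch.nn as nn

class GazeAtCameraNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # For 64x64 grayscale input frames: 32 channels at 16x16 resolution.
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32 * 16 * 16, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sigmoid turns the logit into a gaze probability in [0, 1].
        return torch.sigmoid(self.head(self.features(x)))

# One frame per camera in, one probability (e.g., W1, W2, ...) out:
net = GazeAtCameraNet()
frame = torch.rand(1, 1, 64, 64)  # placeholder for a video-signal frame
w = net(frame).item()
```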
Possible regions or ROIs are, in particular, the dashboard, the rear-view mirror, the exterior rear-view mirrors (left or right), the infotainment display, the center console and the door frames. In particular, each of these regions relates to the part of the relevant component that is visible to the driver. If desired, a region may also be just a single point. It is easy to see that two or more such regions can be selected and equipped with corresponding cameras. It is also easy to see that these regions are merely exemplary, although they are regions that are particularly important or relevant for determining the driver's attention. The cameras themselves should each be arranged in, or within a predefined vicinity of, the region to which they are assigned. In the case of an arrangement in a mirror, it should be noted that the mirror must be partially transparent in the relevant area (in front of the camera lens).
The described use of multiple region- or ROI-associated cameras or miniature cameras can also be combined with a classical system having a central, high-quality driver-observation camera. This can be advantageous in particular for ROIs that such a central camera alone recognizes only poorly, or that it can distinguish only with difficulty because of small differences in viewing angle (e.g., rear-view mirror, side mirror, instrument cluster/road).
Modern vehicles often have various driver assistance systems which automate the vehicle or its operation to a certain extent. These include, for example, so-called lane departure warning systems and distance and/or speed control systems. However, when the vehicle is operated in a semi-automated manner, it is usually required for safety reasons that the driver remains responsible for and monitors the operation of the vehicle, i.e. pays close attention to the surrounding traffic and remains able to react to unforeseen events at any time. There is a risk here, however, that the driver relies too heavily on these driver assistance systems and then reacts too late or incorrectly to unforeseen events. Independently of this, the problem may also arise that the driver is overwhelmed by a complex driving situation.
The proposed approach for determining the gaze area and, via it, the driver's attention can be used here. It allows checking, for example, whether the driver looks at the side or rear-view mirror when turning, or whether he keeps his eyes on the road at high speed, even when an active cruise control system is engaged.
The described method steps can be performed, for example, by means of a suitable computing system, i.e. the computing system processes the video signals and performs the calculations for the estimate. The computing unit according to the invention, for example a control device of a motor vehicle, is accordingly configured, in particular by programming, to carry out the method according to the invention.
The invention also relates to a system, for example a vehicle, having a plurality of cameras, each assigned to one of a plurality of different regions, and a computing unit according to the invention.
It is easy to see that the proposed method can also be applied outside of vehicles, for example in a monitoring or control station where various monitors or devices must be kept in view.
Implementing the method according to the invention in the form of a computer program or computer program product with program code for executing all method steps is also advantageous, since this results in particularly low costs, in particular when the executing control device is also used for other tasks and is therefore present anyway. Suitable data carriers for supplying the computer program are, in particular, magnetic, optical and electronic memories, such as hard disks, flash memories, EEPROMs, DVDs, etc. The program may also be downloaded via a computer network (Internet, intranet, etc.).
Other advantages and design aspects of the invention will be apparent from the description and drawings.
The invention is schematically illustrated by means of embodiments in the drawings and described below with reference to the drawings.
Drawings
Fig. 1 schematically shows a vehicle in which the method according to the invention can be implemented.
Fig. 2 schematically shows the flow of the method according to the invention in a preferred embodiment.
Detailed Description
In fig. 1, a vehicle 100 is schematically shown as an environment in which the method according to the invention can be performed. The vehicle is shown from the perspective of a driver 140, who is the person in this environment, and comprises, as components, a dashboard 110, an infotainment display 112, a rear-view mirror 114, a left exterior rear-view mirror 116 and a right exterior rear-view mirror 118. In the situation shown, the driver 140 is looking at the rear-view mirror 114; the corresponding gaze area is designated 150. These components 110 to 118 at the same time form regions toward which the line of sight of the driver 140 can typically be directed. In particular, the center of the driver's field of view or the area of best vision can be determined as the gaze area.
Furthermore, one camera is assigned to each of these regions; in particular, the cameras are also arranged in or near the relevant region. In the present case, a camera is assigned to a region in such a way that the camera records images from the direction of the region toward the person. A camera 120 is assigned to the dashboard 110, a camera 122 (arranged slightly above it) to the infotainment display 112, a camera 124 to the rear-view mirror 114, a camera 126 to the left exterior rear-view mirror 116, and a camera 128 to the right exterior rear-view mirror 118. As already mentioned, when arranging a camera in a mirror, it should be ensured that the mirror is partially transparent there. However, it is also conceivable to arrange the camera at the edge of the mirror, for example in or on the housing or the housing edge of the rear-view mirror.
A computing unit 130 is also provided, which may be, for example, a control device or another computing system in the vehicle 100. This computing unit 130 is connected to each of the cameras 120 to 128 and is thus able to obtain and process the video signals of the cameras, in particular in real time. Such a video signal 132 is shown by way of example for the camera 128. It will be readily appreciated that a corresponding power supply should also be provided for the cameras.
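As a minimal sketch, assuming standard webcam-style camera interfaces (device indices and region names are illustrative, not from the patent), the computing unit might poll one frame per region-assigned camera like this:

```python
# Sketch: poll one frame of the video signal from each region-assigned
# camera, so the per-camera gaze probabilities can be computed.
import cv2

cameras = {
    "dashboard": cv2.VideoCapture(0),    # stand-in for camera 120
    "rear_mirror": cv2.VideoCapture(1),  # stand-in for camera 124
}

def grab_frames() -> dict:
    frames = {}
    for region, cap in cameras.items():
        ok, frame = cap.read()  # one frame of the camera's video signal
        if ok:
            frames[region] = frame
    return frames
```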
Fig. 2 schematically shows the sequence of the method according to the invention in a preferred embodiment, as it can be carried out, for example, in the vehicle 100 of fig. 1. By way of example, the video signals of the camera 126 for the region 116 (left exterior rear-view mirror) and of the camera 124 for the region 114 (rear-view mirror) are considered here. The video signal of camera 126 provides a camera image 216, and the video signal of camera 124 provides a camera image 214.
The two camera images 216 and 214 are now analyzed, for example in the computing unit 130 of fig. 1, to determine the probability with which the driver's line of sight is directed toward the relevant camera. This is explained in more detail by way of example using the camera image 214 and one of the possibilities mentioned above (the statistical model).
For this purpose, a face is first recognized in the camera image 214, or it is checked whether a face is present. Here the face is labeled 224. Then an eye region, here indicated at 234, is identified, or an attempt is made to identify it.
If the eye region (i.e. the rectangle containing one or both eyes) cannot be determined or identified with these steps, i.e. if the identification of the eye region fails, it is assumed that the relevant camera is not being looked at, i.e. the probability for the relevant camera, here the probability W2, is assumed to be zero. In the example shown, however, the eye region is identified, in particular because both eyes are visible there.
The probability W2 for the camera is then determined using a statistical classification method M. This includes, for example, creating a feature vector based on the pixel information in the eye region and inputting it into a statistical classification method for assessing gaze, as already explained in more detail above. This ultimately yields a probability that the driver's gaze area includes the camera 124, e.g., W2 = 80%.
This analysis is also performed for the camera image 216 of the camera 126, and the associated probability W1 is determined. A face 226 is recognized there, and the region 236 is likewise identified as an eye region, even though only one eye is visible in it. The line of sight, however, is not directed at the camera, and ultimately the eye is captured only from the side, because the driver is not looking at the left exterior rear-view mirror and hence not at the associated camera. This may be determined, for example, as W1 = 20%. If no eye were visible in the image at all, the region 236 would not be recognized as an eye region, which would then result in a probability of zero.
Subsequently, in step 240, the probabilities of all camera images, here by way of example only W1 and W2, are compared with one another. Basically, the region assigned to the camera with the highest probability is determined as the gaze area (the region toward which the line of sight is directed). In this example, this would be the region 114, i.e. the rear-view mirror.
In addition, however, the quality of the gaze-area determination can be checked. Only when the highest probability deviates from the second-highest probability by more than a threshold ΔW of, for example, 10%, i.e. when W2 > W1 + ΔW holds, is the gaze area determined in this way assumed to correspond to the actual gaze area. Information 250 describing the current gaze area is then output. This information 250 can then be processed, for example, in a further control device or in the context of a driver assistance function.

Claims (13)

1. A method for determining a gaze area (150) of a person (140) in an environment in which a plurality of different regions (110-118) are each assigned a camera (120-128),
wherein a video signal (132) is obtained from each of the plurality of cameras (120-128),
wherein, by means of the video signal (132) of each of the cameras (120-128), a probability (W1, W2) is determined that the gaze area (150) of the person (140) comprises the region assigned to the respective camera (120-128), and
wherein the gaze area (150) of the person (140) is determined as that region (110-118) of the plurality of different regions (110-118) which is assigned to the camera (120-128) with the highest probability (W2).
2. The method of claim 1, wherein the determined gaze area (150) is output as information (250) only if the highest probability (W2) deviates from the second-highest probability (W1) by more than a predetermined threshold value (ΔW).
3. The method according to claim 1 or 2, wherein, for determining the probability (W1, W2) of each camera (120-128), a face (224) in the camera image (210, 214) is identified and then an eye region (234) is identified.
4. The method according to claim 3, wherein if the recognition of the eye region (234) fails, the probability of the associated camera is assumed to be zero.
5. The method according to claim 3 or 4, wherein, if the eye region (234) is successfully recognized, the probability (W2) of the relevant camera (124) is determined by means of a statistical classification method (M).
6. The method of any preceding claim, wherein the probability is determined using an artificial neural network.
7. The method according to any one of the preceding claims, wherein the cameras (120-128) are each arranged in, or within a predefined vicinity of, the region (110-118) to which the respective camera (120-128) is assigned.
8. The method according to any of the preceding claims, wherein the environment is a vehicle (100) and the person (140) is a driver of the vehicle.
9. The method of claim 8, wherein the plurality of regions are selected from a dashboard (110), a rear-view mirror (114), exterior rear-view mirrors (116, 118), an infotainment display (112), a center console, and a door frame.
10. A computing unit (130) arranged to perform all method steps of the method according to one of the preceding claims.
11. A system (100) having a plurality of cameras (120-128), each assigned to one of a plurality of different regions (110-118), and a computing unit (130) according to claim 10.
12. A computer program which, when executed on a computing unit (130), causes the computing unit (130) to carry out all the method steps of the method according to one of claims 1 to 9.
13. A machine readable storage medium having stored thereon a computer program according to claim 12.
CN202210269005.9A 2021-03-19 2022-03-18 Method and vehicle for determining a gaze area of a person Pending CN115158325A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102021202704.0A DE102021202704A1 (en) 2021-03-19 2021-03-19 Method for determining a viewing area of a person and vehicle
DE102021202704.0 2021-03-19

Publications (1)

Publication Number Publication Date
CN115158325A (en) 2022-10-11

Family

ID=83114988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210269005.9A Pending CN115158325A (en) 2021-03-19 2022-03-18 Method and vehicle for determining a gaze area of a person

Country Status (2)

Country Link
CN (1) CN115158325A (en)
DE (1) DE102021202704A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9354718B2 (en) 2010-12-22 2016-05-31 Zspace, Inc. Tightly coupled interactive stereo display
US9652031B1 (en) 2014-06-17 2017-05-16 Amazon Technologies, Inc. Trust shifting for user position detection
WO2017053966A1 (en) 2015-09-24 2017-03-30 Tobii Ab Eye-tracking enabled wearable devices
JP7219041B2 (en) 2018-10-05 2023-02-07 現代自動車株式会社 Gaze detection device and its congestion control method

Also Published As

Publication number Publication date
DE102021202704A1 (en) 2022-09-22

Similar Documents

Publication Publication Date Title
RU2756256C1 (en) System and methods for monitoring the behaviour of the driver for controlling a car fleet in a fleet of vehicles using an imaging apparatus facing the driver
US10088899B2 (en) Eye gaze tracking utilizing surface normal identification
CN111469802B (en) Seat belt state determination system and method
RU2764646C2 (en) System and methods for monitoring the behaviour of the driver for controlling a car fleet in a fleet of vehicles using an imaging apparatus facing the driver
CN110826370B (en) Method and device for identifying identity of person in vehicle, vehicle and storage medium
US9881221B2 (en) Method and system for estimating gaze direction of vehicle drivers
US10521683B2 (en) Glare reduction
Chuang et al. Estimating gaze direction of vehicle drivers using a smartphone camera
US9662977B2 (en) Driver state monitoring system
US10764536B2 (en) System and method for a dynamic human machine interface for video conferencing in a vehicle
CN112016457A (en) Driver distraction and dangerous driving behavior recognition method, device and storage medium
US11458979B2 (en) Information processing system, information processing device, information processing method, and non-transitory computer readable storage medium storing program
WO2018145028A1 (en) Systems and methods of a computational framework for a driver's visual attention using a fully convolutional architecture
US20220180483A1 (en) Image processing device, image processing method, and program
US11455810B2 (en) Driver attention state estimation
CN108860045A (en) Driving support method, driving support device, and storage medium
CN111033559A (en) Image processing for image blur correction, image processing method, and program
EP4009287A1 (en) Devices and methods for monitoring drivers of vehicles
JP7154959B2 (en) Apparatus and method for recognizing driver's state based on driving situation judgment information
US20120189161A1 (en) Visual attention apparatus and control method based on mind awareness and display apparatus using the visual attention apparatus
JP2017129973A (en) Driving support apparatus and driving support method
Xiao et al. Detection of drivers visual attention using smartphone
Shirpour et al. A probabilistic model for visual driver gaze approximation from head pose estimation
CN115158325A (en) Method and vehicle for determining a gaze area of a person
CN115995142A (en) Driving training reminding method based on wearable device and wearable device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination