WO2002025576A1

WO2002025576A1 - System for detecting a line of vision using image data

Info

Publication number: WO2002025576A1
Application number: PCT/EP2001/010820
Authority: WO
Inventors: Matthias Franz; Martin Fritzsche; Matthias Oberländer; Tilo Schwarz; Bernd Woltermann
Original assignee: Daimlerchrysler Ag
Priority date: 2000-09-20
Filing date: 2001-09-19
Publication date: 2002-03-28
Also published as: DE10046859A1; AU2001295572A1; DE10046859B4

Abstract

The invention relates to a novel system for identifying a line of vision of a person under observation using image data. Said system comprises a device for locating the eyes and a unit connected downstream for determining the line of vision of the person under observation. The system is characterised by an inventive device for locating the eyes, which contains a unit that adapts to radii, to which an inventive circle-detection device is connected. A classifier, which evaluates the results of the circle-detection device and determines the position of the eyes within the image data is positioned downstream of the circle-detection device. The invention also relates to a novel device for determining a line of vision, which contains units for segmenting the image data corresponding to the eyes and the nose, a common classifier being connected downstream of said units.

Description

System for detecting the direction of view from image data

The invention relates to a system for detecting the direction of view of an observed person from image data according to the preamble of claim 1, and devices and methods suitable for this system for operating these devices according to the preambles of claims 2, 21 and 24, and 3, 22 and 25.

Eye detection plays a major role in a system for detecting the direction of view. Most of the information about the direction of view is contained in the eyes. From the position of the eyes alone, it is possible to give an approximate indication of the direction of the gaze, provided that the head is not moved. In order to detect the line of sight even when the head is allowed to move, additional facial features must be consulted. These include the nose, the mouth and possibly even the eyebrows. They also make it possible to determine the person's head position. One method for detecting the eyes is based on the so-called difference method with two light sources (Morimoto [20], Morimoto et al. [17], Ebiswana [8]). With this method, the person is illuminated with two light sources, usually LEDs (light-emitting diodes). One light source is positioned on the axis between pupil and camera so that the light is reflected directly from the retina of the eye. The second light source is arranged to the side so that no reflections from the retina reach the camera. Two pictures are then taken, each with one light source active. These two images are identical except for the reflection of the first light source on the retina. By forming a difference image, only this reflection remains in the result image. It is then easy to extract the reflection from the result image using threshold value methods and thus to determine the position of the eyes. However, this procedure cannot be used when the head moves. Since the difference image is formed from two images taken at different times, the pupil reflex and the dark pupil no longer coincide. Furthermore, the movement causes further structures to appear in the difference image which can no longer be distinguished from the pupil reflex by thresholding. These methods are therefore usually used when the person to be observed does not have to move his head. Such applications include face recognition tasks for identifying people, such as those used in ATMs, as well as operating computers with the eyes.

Other methods known from the literature are those of template matching (Xie et al. [31], Chow et al. [7]). For this purpose, a geometric model of the eyes is created, which is adaptively fitted to the image. The disadvantage of this method is that the template has to be fitted adaptively. Furthermore, templates tend to detect the eyebrows instead of the eyes if the starting position of the templates has not been carefully selected (Xie et al. [31], Chow et al. [7]). The most well-known methods for determining the eye positions are threshold and edge-oriented methods. These methods are used in many scientific publications to initialize the starting position of templates (Xie et al. [31], Chow et al. [7]). Other methods of initialization are methods for recognizing the face. Here, statistical or geometric methods are used in an attempt to extract a person's face from images (Edwards et al. [9], Stiefelhagen et al. [26], Chow et al. [7], Zobel et al. [33]). The area in which to look for eyes is then already very limited. What all of these methods have in common is that they are relatively slow (Tian et al. [28], Lam, Yan [16]). Template matching alone is a complex process. In addition, suitable starting positions for the templates must be determined using the above methods. In combination with template matching these methods are therefore also slow, since two methods are used sequentially. In order to minimize the effort involved in eye detection, it is necessary to subdivide the overall image into one or two smaller search areas. Two boxes are used for this, each positioned near an eye. Using these so-called search boxes results in a speed advantage, since it is no longer necessary to examine the entire image, and also in a reduced error rate, because objects similar to the iris that lie outside the boxes, such as buttons or patterns on shirts, are no longer found. This simplifies the actual decision as to whether the circles found are the iris of an eye.

After the position of the eyes is known, the direction of view must be derived. The eyes contain most of the information about the viewing direction.

However, this information alone is not sufficient if the head can also be moved during the line of sight detection. Many applications assume that the head is not moved, which is often sufficient. These include automated teller machines, in which the identity is checked via the face or retina, and a computer mouse controlled by the eyes. If the head movement is not taken into account, the information contained in the eyes is sufficient to recognize the direction of the gaze. In Baluja, Pomerleau [3] and Xu et al. [30], for example, the direction of view is determined from the eyes alone with the help of a neural network. Another approach to direction of view detection uses geometric methods (Arrington [1]). For this purpose, the eye is additionally actively illuminated so that a light reflex can be seen on the eye. The light source is installed directly in front of the person. If the position of the pupil and the position of the light reflex are known, the viewing direction can be derived from them. The position of the light reflex corresponds to the person looking straight ahead. With the help of the vector between reflex and pupil, it is then possible to calculate the direction of view. However, this procedure requires that high-resolution images of the eye region are available.

In order to be able to correctly determine the line of sight even when the head is moving, it is necessary to provide further information. This includes other facial features, such as nose and mouth. In Gee, Cipolla [6] a method is described which detects the viewing direction from the positions of the eyes and the mouth. To do this, the mouth is modeled by a line. With the help of this line and the line connecting both eyes, it is possible to detect the direction of view. However, the mouth must be found for this, which is not easy due to the many possible states of the mouth. When detecting the direction of view in a motor vehicle, moreover, both eyes may not be visible under certain circumstances. When the head is turned to an extreme degree, e.g. when looking over the shoulder, only one eye is visible and therefore only the position of one eye is available.

The object of the invention is to find a novel system, particularly suitable for use in a motor vehicle, for detecting the direction of view of an observed person from image data. Furthermore, suitable devices and methods for operating these devices are to be found for incorporation into this system.

The object is achieved by the features of the system described in claim 1. The device suitable for this system and the methods suitable for its operation are set out in the features of claims 2, 21 and 24, and 3, 22 and 25. Advantageous refinements and developments of the invention are described by the features of the subordinate claims.

In an inventive manner, the system for detecting the line of sight can be divided into two areas:

- The first area contains a device for detecting the eyes.
- The second area contains, downstream of the first area, a device for determining the viewing direction.

The process chain within the device for detecting the eyes is essentially divided into three stages:

- The first stage is formed by a unit for radius adjustment. As part of this radius adjustment, the range used to search for the circles is limited to r_min and r_max, i.e. no circles with smaller or larger radii are detected; the radius of the iris in the images is restricted to this range. The lower limit r_min is preferably set to such a small value (e.g. r_min = 3 pixels) that detection of the iris is still possible with the eyes half closed. If these limits r_min and r_max for the radii are not used, many faulty circles are detected in addition to the iris of the eyes. If, for example, the upper limit r_max is chosen too high, many circle-like structures that do not belong to an iris are detected. These include shadows which extend between the nose and eyebrows; such shadows have a circle-like structure with a large radius. Furthermore, the curves of glasses frames are detected much more easily if large radii are allowed. In addition, if the upper radius limit is too large, the circle algorithm often describes the iris by a circle that is too large. The eyebrows and the lower lid are then regarded as the boundaries of the circle, because there is a very high contrast there. If, on the other hand, the lower radius limit is chosen too small, many small circles are detected. These include small dark objects, mostly consisting of only four to ten pixels. With these small radii it is no longer possible to distinguish between circles and other objects in terms of shape, so many small circles are detected. This can be largely suppressed by restricting the circle search to a small radius range. Furthermore, this limitation also reduces the execution time of the algorithm, since far fewer circle radii now have to be checked.

It would be ideal, however, to search for the desired circle in an even smaller interval of approx. 5 to 6 radii. This means, however, that the radius of the iris must be known in advance. For these reasons, the method according to the invention preferably uses an adaptive radius adaptation which further restricts the range of radii and re-adapts it to every image. For this purpose, the upper and lower bounds of the radii r_min and r_max are reset for each image. The radius r of the circle that circumscribes the iris is used as a reference value. On this basis, the new values of r_min and r_max are set a few image pixels (depending on the image resolution) below or above r (e.g. r_min = r - 2 and r_max = r + 3).

The new ranges for the radii r_min and r_max may not fall below or exceed certain absolute limits (e.g. r_min >= 3 and r_max <= 12 pixels). If r_min or r_max violates one of these limits during operation of the system, r_min and r_max must be corrected (limited) so that they lie within these absolute limits.

The newly chosen radius range r_min to r_max is preferably always applied to the following image. Since the size of the iris does not change much from image to image, the error resulting from this procedure is negligible. If no eye is detected in an image, it is advantageous to leave the values r_min and r_max unchanged and thus to use them when processing the subsequent image.
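The adaptive radius adjustment described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `update_radius_range` and the use of `None` to signal "no eye detected" are assumptions, while the offsets (r - 2, r + 3) and the absolute limits (3 and 12 pixels) are the example values given in the text.

```python
# Example values from the text; a real system would tune them to the image resolution.
R_ABS_MIN = 3   # absolute lower limit for r_min in pixels
R_ABS_MAX = 12  # absolute upper limit for r_max in pixels


def update_radius_range(r_min, r_max, detected_radius):
    """Return the radius range to use for the NEXT image.

    detected_radius is the iris radius found in the current image,
    or None if no eye was detected (hypothetical convention).
    """
    if detected_radius is None:
        # No eye found: keep the previous range unchanged.
        return r_min, r_max
    new_min = max(detected_radius - 2, R_ABS_MIN)  # clamp to the absolute limits
    new_max = min(detected_radius + 3, R_ABS_MAX)
    return new_min, new_max
```

Because the detected radius always lies inside the previous range, the clamped range never becomes empty.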

- The second stage of the process chain within the eye detection device includes a circle detection device. A Hough transform, for example, is conceivable for this purpose. After the intersections or accumulations of intersections have been created in the accumulator field, the center points and radii of the circles must be extracted from them. A threshold value method which isolates the accumulations in the accumulator field is preferably used to determine the circle centers. A method suitable for the line of sight detection system determines the threshold value from a histogram. For this purpose, a histogram containing the brightness distribution in the accumulator field is calculated. The histogram is denoted, for example, by histo[i] with 0 <= i <= 255 and can be viewed as a vector of length 256. It contains 256 entries from 0 to 255, since the image format used as an example has 256 gray levels.

The N brightest points are then searched for, starting with the points belonging to the highest brightness level (here: 255). Once N points have been selected, the brightness value of the N-th point serves as the threshold value for the further process. All points in the accumulator whose brightness values are below this threshold are suppressed. The other points keep their associated brightness. The threshold value method thus forms islands in the accumulator field which indicate the area of a possible circle center. Objects must then be extracted from this threshold image. This is done using an algorithm that works on grayscale images, a so-called color connected components algorithm, CCC for short. In the objects coded with the help of CCC, the centers of circles are determined by calculating the center of gravity of each of these objects. The center points of the circles are thus known, and the associated radii must subsequently be determined. The previously calculated center of gravity is preferably used to determine the circle radii. Starting from it, circles with the radii r = r_min, ..., r_max are calculated and compared with the points from the edge image. All points that lie on the circle of the corresponding radius r and whose normal at this point is directed, within a tolerance, toward the center of the circle are counted. The number of hits is then normalized, i.e. divided by the maximum number of points for this radius. This yields a quality measure for the circle which reflects the number of hits for this circle with this radius, i.e. quality = number of hits / (2πr). Preferably the circle with the best quality is retained, whereas the others are discarded. To correct errors of the Hough transformation that arise from the discretization of the accumulator field and of the gradients, circles are preferably examined not only at the center of gravity of the CCC-coded objects but also in a neighborhood around it; a neighborhood of 5 × 5 image pixels proves to be advantageous. The best of the circles determined from this neighborhood is again kept, and this circle then determines the actual center.
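The histogram-based threshold selection and the quality measure described for the second stage might look like the following sketch. It assumes an accumulator already scaled to 256 gray levels and stored as a nested list; the function names are hypothetical, and the connected-components step and the edge comparison are omitted.

```python
import math


def threshold_n_brightest(acc, n):
    """Keep roughly the n brightest accumulator cells, zero out the rest.

    acc is a 2D list of integers in 0..255 (assumed representation).
    Returns (threshold, thresholded_accumulator).
    """
    histo = [0] * 256                  # histo[i] = number of cells with brightness i
    for row in acc:
        for v in row:
            histo[v] += 1
    threshold, count = 0, 0
    for level in range(255, -1, -1):   # walk down from the brightest level
        count += histo[level]
        if count >= n:                 # the n-th brightest point lies on this level
            threshold = level
            break
    kept = [[v if v >= threshold else 0 for v in row] for row in acc]
    return threshold, kept


def circle_quality(hits, r):
    """Normalized hit count: quality = hits / (2 * pi * r)."""
    return hits / (2 * math.pi * r)
```

Cells that share the threshold brightness are all kept, so slightly more than n points can survive; the text does not say how such ties are handled.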

In a particularly advantageous manner, it is also conceivable within the scope of the invention to use, instead of a threshold-based method for eye detection, an edge-oriented method which works on the basis of polar edge detection. The advantage of edge-oriented methods over the threshold value method is that they are insensitive to light fluctuations, since differences are considered. Edge detectors are generally based on the Cartesian coordinate system. However, since eye detection searches for the iris, which can be easily described by circles, a method that uses this polar property directly is advantageous. One such method is described in Wilson [29]. It is a polar edge detector that is described by Equation 1:

$$ \max_{r}\; \frac{\partial}{\partial r} \oint_{r,\,x_0,\,y_0} \frac{I(x, y)}{2\pi r}\, ds \tag{1} $$

The two-dimensional function to be examined is denoted by I(x, y), where x and y represent the Cartesian coordinates of this function. Based on this function I(x, y), a circular path with radius r is traversed at position (x0, y0), and the intensities I(x, y) are integrated along this circular path and then normalized with the factor 2πr. This process can be compared to averaging along the circular path. The integral along the circular path is differentiated with respect to the radius r, which yields the gradient between the integrals at different radii at the position (x0, y0). The maximum gradient along the radius r is determined by taking the maximum. This process is repeated for all points of the function I(x, y), so that a maximum gradient is formed for each position (x, y). This method therefore provides, for each point of the function I(x, y), a rating of a circle at the position (x, y) and its best radius r. It is irrelevant whether the examined positions are actually circle-like structures. The more the structure at the point (x, y) resembles a circle, the higher the rating.

In order to be able to use the polar edge detection in image processing, it is necessary to discretize equation 1, since the image data f(x, y) are also available in discrete form in the method according to the invention and thus correspond to the discretized function I(x, y). For this reason, the invention advantageously uses a novel concept, described below, for discretizing equation 1. The discretized procedure is shown schematically in FIG. 1.

Figure 1 shows the polar edge detector after discretization of the variables.

A circular path is described for each radius, on which the intensities are added.

The core of the polar edge operator is the path integral, which describes a circular path with radius r at the position (x0, y0). This integration must be converted into a summation when using image data. For a certain radius r, the following value μ_r then results for the path integral, where N denotes the number of positions sampled on the path:

$$ \mu_r = \frac{1}{N} \sum_{i=0}^{N-1} I\big(x_0 + \operatorname{round}(r\cos\varphi_i),\; y_0 + \operatorname{round}(r\sin\varphi_i)\big), \qquad \varphi_i = \frac{2\pi i}{N} \tag{2} $$

Rounding the circle function is necessary, since discrete image data are used. For this reason, all positions (x, y) within an image are represented as integers. The same applies to the radius r. This summation corresponds to forming the mean value μ_r of the gray value distribution of the image along a circular path with radius r. This averaging can be described more generally with the help of a function s_r(i) = (s_{r,x}(i), s_{r,y}(i))^T, which corresponds to a parameterized curve:

$$ \mu_r = \frac{1}{N} \sum_{i=0}^{N-1} I\big(s_r(i)\big) \tag{3} $$

The function s_r(i) can be used to describe any path along which an average is calculated.
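The averaging along a discretized path can be illustrated with a short sketch. It assumes that the image is indexable as `image[y][x]`, that the circle is sampled at roughly one point per pixel of circumference, and that the names `circle_path` and `path_mean` are free to choose; none of these details are specified in the text.

```python
import math


def circle_path(x0, y0, r, samples=None):
    """Integer positions s_r(i) on a circle of radius r around (x0, y0)."""
    n = samples or max(8, int(round(2 * math.pi * r)))  # assumed sampling density
    path = []
    for i in range(n):
        phi = 2 * math.pi * i / n
        # Rounding is required because image positions are integers.
        path.append((x0 + int(round(r * math.cos(phi))),
                     y0 + int(round(r * math.sin(phi)))))
    return path


def path_mean(image, path):
    """Mean gray value mu_r of the image along an arbitrary path."""
    return sum(image[y][x] for x, y in path) / len(path)
```

Because `path_mean` only needs a list of positions, any other path shape (ellipse, open circle, rectangle) can be evaluated by supplying a different path generator.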

In an inventive manner, the polar edge detection is now generalized so that it can also handle paths of any other shape in addition to the usual circular paths. The basic idea here is to find eyes by detecting the iris. In most cases, however, the iris is partially covered by the eyelids at the top and bottom. The covering by the eyelids is more pronounced at the top than at the bottom. Due to this concealment, the iris no longer appears as a perfect circle, but as a circle that is cut off by two arcs at the bottom and top. For this reason, it is necessary to adapt the paths used in the circle search so that this concealment is taken into account.

Figure 2 shows different paths with which the covering of the iris by the eyelids is to be compensated ((a) circle, (b) ellipse, (c) open circle, (d) supplemented circle, (e) rectangle). The thick sections are the positions that are included in the averaging. The thinly drawn sections are guide lines for orientation.

Figure 2 shows different paths used in averaging.

Illustration (a) of Figure 2 shows the circle already mentioned, which does not take the concealment into account but has the fewest additional parameters. The covering by the eyelids is best approximated by the path from illustration (d) in Figure 2. The circle is cut open at the top and bottom. The resulting gaps are bridged using two straight line segments. The two gaps are described by the angles α and β, which define the opening angles, i.e. the sections of the circular path circumscribing the iris which are not included in the circle detection. Another advantageous design of the circular path is shown in illustration (c). The ellipse from illustration (b) in Figure 2 has only one additional parameter compared to the other paths, namely the ratio of the two principal axes of the ellipse. Since the extent of the ellipse is different in the two directions, the same points are included several times in the evaluation for different radii. This is due to the discretization of the image, since only integer positions are permitted. However, this is not a disadvantage in the evaluation. Illustration (e) in Figure 2 is a rectangle. The rectangle is a rough approximation of illustration (d), because many circles are relatively small and the arc can be approximated by a straight line. The paths shown in Figure 2 are of course only examples of possible paths for which the polar edge detector is optimized. It is also conceivable to carry out the method according to the invention in such a way that the optimum contour for the current image data is selected from the examples in Figure 2.
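An open-circle path in the spirit of illustration (c) could be generated as sketched below. The assignment of α to the upper gap and β to the lower gap, the image convention (y grows downward), the fixed number of samples and the function name are all assumptions made for this illustration.

```python
import math


def open_circle_path(x0, y0, r, alpha_deg, beta_deg, samples=64):
    """Circle positions around (x0, y0) with a gap of alpha_deg at the top
    and a gap of beta_deg at the bottom (image coordinates, y downward)."""
    path = []
    for i in range(samples):
        phi = 360.0 * i / samples
        # Angular distance to the top (270 deg) and bottom (90 deg) of the circle.
        d_top = abs((phi - 270.0 + 180.0) % 360.0 - 180.0)
        d_bottom = abs((phi - 90.0 + 180.0) % 360.0 - 180.0)
        if d_top < alpha_deg / 2 or d_bottom < beta_deg / 2:
            continue  # this section is assumed to be hidden by an eyelid
        rad = math.radians(phi)
        path.append((x0 + int(round(r * math.cos(rad))),
                     y0 + int(round(r * math.sin(rad)))))
    return path
```

Choosing α larger than β reflects the observation that the upper eyelid usually covers more of the iris than the lower one.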

Starting from equation 1 of the polar edge operator, it can be seen that it is not the maximum value of a path integral that is sought, but the maximum change between two successive path integrals. This also makes sense, since the largest transition from dark to light is searched for. The iris of the eye generally appears as a black disc in images, whereas the area around the iris is very bright. Exactly this transition from dark to light should be detected, since it provides the best circular path describing the iris. The derivative with respect to the radius r from equation 1 can be realized with differences. To a first approximation this gives the following approach:

$$ V_r = \mu_r - \mu_{r-1} \tag{4} $$

V_r represents the rating for a mean value with radius r. In order to find the best circle, it is necessary to find the largest V_r. The V_r must therefore be generated for a specific range r_min to r_max and compared with one another.

The first approximation of the derivative with respect to the radius, as given in equation 4, is susceptible to fluctuations between the individual mean values. In real conditions, the iris is not an exact black disk, but has fluctuations in brightness that result, for example, from reflections. These fluctuations have a negative effect on the rating of the circle at radius r if they occur at the edge of the circular disk. In order to better compensate for these brightness fluctuations, it is particularly advantageous to form the rating over several mean values. If n is the number of mean values that should be included in the rating, this can be represented as follows:

$$ V_r = \mu_r - \frac{1}{n} \sum_{i=1}^{n} \mu_{r-i} \tag{5} $$

A value of n = 2 has proven to be very reliable for the rating calculation. Equation 5 then simplifies to

$$ V_r = \mu_r - \frac{1}{2}\left(\mu_{r-1} + \mu_{r-2}\right) \tag{6} $$

Values greater than two are usually not sensible, since then the evaluation generally becomes worse. This is particularly the case when areas of high brightness appear in the iris. In this case, the mean is very large even before the edge of the iris has been considered. The jump in the mean value at the edge of the iris is then no longer so large and this circle receives a poor or weak rating or the wrong radius r is determined. The case n = 2 therefore represents a compromise to compensate for such fluctuations within the iris.
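The rating over several mean values can be sketched as follows, assuming the mean values μ_r have already been computed and are stored in a dictionary keyed by radius (a hypothetical data layout, as are the function names). With n = 1 the function reduces to the simple difference of equation 4.

```python
def rating(means, r, n=2):
    """V_r = mu_r - (1/n) * (mu_{r-1} + ... + mu_{r-n})."""
    return means[r] - sum(means[r - i] for i in range(1, n + 1)) / n


def best_radius(means, r_min, r_max, n=2):
    """Radius in r_min..r_max with the largest rating V_r.

    means must contain entries for r_min - n up to r_max.
    """
    return max(range(r_min, r_max + 1), key=lambda r: rating(means, r, n))


# Illustrative mean values: dark iris up to radius 4, bright surroundings beyond.
means = {1: 20, 2: 20, 3: 22, 4: 25, 5: 180, 6: 200, 7: 200}
print(best_radius(means, 3, 7))
```

In this made-up profile the jump from dark to bright occurs between radius 4 and radius 5, which is where the rating peaks.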

With this approach, the iris of an eye can be easily detected. However, circle-like structures are also detected that do not represent the iris of an eye. Above all, these include glasses and eyebrows. There are very large contrasts on glasses, which are also detected as circles although they do not correspond to a circle-like structure. Since the path integrals are evaluated by means of mean values, the difference between the mean values is greater for these false hits than for the iris, which may not have such a large contrast. In order to eliminate these errors, it is particularly advantageous to extend the evaluation of the path integrals in an inventive manner. Use is made here of the observation that the decisive difference between these false hits and correct circles lies in the non-uniformity of the brightness distribution along the circular paths. Eyebrows, for example, exhibit no contrast at all in the vertical direction, whereas the contrast at the top and bottom of the eyebrows is very pronounced. For this reason, the method according to the invention includes the variance of the brightness values along the circular path in the evaluation of the path integrals. In addition to the mean value μ_r, the mean of the squared brightness values along the path is also calculated, where N is the number of positions on the path:

$$ \mu_{r^2} = \frac{1}{N} \sum_{i=0}^{N-1} I\big(s_r(i)\big)^2 \tag{7} $$

The variance can thus be determined using the direct relationship between the mean and the mean square:

$$ \sigma_r^2 = \mu_{r^2} - \mu_r^2 \tag{8} $$

The variance of the path integral is now included in the rating according to equation 9:

$$ V_r = \mu_r - \frac{1}{2}\left(\mu_{r-1} + \mu_{r-2}\right) - c\,\sigma_r^2 \tag{9} $$

Unlike the mean value, the variance is not included in the rating as a difference, but is always taken directly at the respective radius. The difficulty with the variance is to include it in the rating with a suitable weighting factor. Here a value of c = 0.001 has proven to be sufficient. This value must not be chosen too small, otherwise the effect of the variance will disappear. If the weighting factor c is chosen too large, even small irregularities on the circular path of the iris are penalized too heavily and the iris is no longer detected.
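The variance-weighted rating can be illustrated with the following sketch. It assumes that the gray values sampled along each path are already available as lists keyed by radius; this layout and the function names are hypothetical, while c = 0.001 and n = 2 are the values named in the text.

```python
def mean_and_variance(values):
    """Mean and variance of the gray values along one path,
    using variance = mean of squares - square of mean."""
    n = len(values)
    mu = sum(values) / n
    mu_sq = sum(v * v for v in values) / n
    return mu, mu_sq - mu * mu


def rating_with_variance(samples_by_radius, r, c=0.001):
    """V_r = mu_r - (mu_{r-1} + mu_{r-2}) / 2 - c * variance at radius r."""
    mu_r, var_r = mean_and_variance(samples_by_radius[r])
    mu_1, _ = mean_and_variance(samples_by_radius[r - 1])
    mu_2, _ = mean_and_variance(samples_by_radius[r - 2])
    return mu_r - 0.5 * (mu_1 + mu_2) - c * var_r


# Uniformly bright ring (iris-like edge) versus a ring that alternates between
# very bright and dark sections (eyebrow-like); inner rings are dark in both cases.
iris = {3: [20] * 8, 4: [20] * 8, 5: [200] * 8}
brow = {3: [20] * 8, 4: [20] * 8, 5: [255, 255, 20, 20, 255, 255, 20, 20]}
print(rating_with_variance(iris, 5), rating_with_variance(brow, 5))
```

The uniform ring keeps its full rating because its variance is zero, whereas the non-uniform ring is penalized in proportion to c.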

The polar edge operator according to the invention described above, according to equations 1-9, can be used particularly advantageously in the system for eye direction detection and / or eye detection. Of course, it is also conceivable to use the polar edge operator profitably in similar systems; for example in systems for identifying people by comparing the structure of the iris (such as in ATMs and access controls).

The polar edge operator returns a rating for a circle at every point (x, y) of the image. From these ratings, it must be decided which of these circles should be used for further processing. As in the procedure with the Hough transformation, the best circles are selected and passed to the next processing step. For this purpose, in a conceivable embodiment of the method, the entire search area is evaluated with the aid of the polar edge operator and then sorted, so that the N best circles are at the top of a list of all evaluated circles. The search area is run through line by line and each point is evaluated. At the end of a line, the results are sorted to obtain the N best circles, and the next line is processed. After processing this line, the N best circles are again extracted from the list. After the last line, the N best circles of the entire search area are then available in a sorted list.

Another particularly advantageous alternative, which suppresses unwanted neighboring circles, is described by the following algorithm. First, the best circle is taken from the list of all circles. The next best circle must then have a certain distance d from the previous circle before it is taken from the list. This is repeated until the N best circles have been extracted. This procedure increases the likelihood that the circle being sought will be among the N best circles even if it has been rated poorly.
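The distance-based selection can be sketched as below. The tuple layout `(rating, x, y, r)` and the function name are assumptions. The sketch also requires the minimum distance d to every circle already selected, which is one possible reading of "distance from the previous circle".

```python
import math


def select_best_circles(circles, n, d):
    """Pick up to n circles in order of rating, skipping neighbors.

    circles is a list of (rating, x, y, r) tuples (assumed layout).
    A candidate is accepted only if its center is at least d pixels
    away from the centers of all circles accepted so far.
    """
    selected = []
    for cand in sorted(circles, key=lambda c: c[0], reverse=True):
        if all(math.hypot(cand[1] - s[1], cand[2] - s[2]) >= d for s in selected):
            selected.append(cand)
            if len(selected) == n:
                break
    return selected
```

Without this suppression, the N best ratings tend to cluster around a single strong structure, because neighboring positions of the same circle receive similar ratings.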

A particularly inventive alternative to suppressing neighboring circles is a method which does not have to evaluate the entire search area. A corresponding method is described in FIG. 3.

Figure 3 shows a spiral path that is used in evaluating the individual positions within the search box.

In the context of this method, instead of evaluating the search area line by line, a spiral path is traversed. This path is shown in Figure 3. The search begins in the middle of the search area (search box). If the center of the search box is placed close to the eye, the position of the eye is reached after just a few steps. If the eye lies at the edge of the search box, the entire search box must still be examined. But since it can be assumed that the driver looks in the same direction most of the time, placing the box is relatively easy and the eye is usually in the middle of the search box. In order to place the search box over the eye even in the event of violent head movements, suitable algorithms for tracking the eyes, described below, must be used so that the search box is always placed correctly.
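A spiral traversal of the search box can be sketched as follows. The text describes an approximately circular spiral; for simplicity this illustration uses a square spiral on the pixel grid, which is an assumption, as is the function name.

```python
def spiral_positions(width, height):
    """All (x, y) positions of a width x height search box, ordered along
    a square spiral that starts at the center of the box."""
    x, y = width // 2, height // 2
    dx, dy = 1, 0                      # start by moving to the right
    step, total = 1, width * height
    out = [(x, y)]
    while len(out) < total:
        for _ in range(2):             # two legs per step length
            for _ in range(step):
                x += dx
                y += dy
                if 0 <= x < width and 0 <= y < height:
                    out.append((x, y))  # positions outside the box are skipped
            dx, dy = -dy, dx           # turn by 90 degrees
        step += 1
    return out
```

For the 75 × 45 pixel search box used as an example in the text, the function returns all 3375 positions, beginning at the center (37, 22).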

The spiral path that is used to evaluate the circles within the search box results in certain arrangements of the ratings along this path. If an area with high ratings is not in the middle of the search box, but somewhat offset from it, this area is crossed with each revolution of the spiral. Since the spiral is approximately circular, this area recurs with the period of the spiral. From these ratings, the maxima at which circles are highly likely are to be extracted. Here, however, maxima does not mean the peaks of the individual rating maxima, but the maxima of the envelope of all rating maxima over the entire preceding revolution.

For this purpose, an algorithm was designed in an inventive manner, which only extracts the interesting maxima from these arrangements of the evaluations. Since the spiral has a circle-like structure, it has a period accordingly. This period can be used to extract the interesting maxima. For the sake of clarity, a search area (search box) with the size of 75 x 45 pixels is to be assumed below. This corresponds to 3375 points. The individual peaks in the arrangements have a distance of approximately 80 to 120 points. In principle, this distance should change with the radius of the spiral. It turns out, however, that the positions with the maxima of the ratings are usually extended over a small area. This means that the period at the beginning of the spiral does not have much of an impact on small radii, so that a constant period can be assumed over the entire range.

FIG. 4 shows the flow diagram of the algorithm for maximum search along a spiral path.

The advantageous algorithm according to the invention runs according to FIG. 4 in the following steps: First, an index is placed on the beginning of the list with all evaluations. This

Index is called index and is initially initialized with index = 0. Furthermore, two variables are defined, which denote the last found maximum value and the maximum value to be found at the moment. Maxpos is the position of the maximum to be searched for and is initially initialized with maxpos = 0. The individual ratings are stored in val [i], with / the position is within this list. Since a constant period, ie a constant distance between the individual peaks can be assumed, it is only necessary to look for a new maximum up to the next peak. This requires a counter, which is called count. The distance of the peaks is recorded in the constant variable dist, which must be passed to the algorithm as a parameter. Now the next maximum is searched in a vicinity of dist points from the current position index. For this purpose, all values are compared with each other and the highest value is saved in maxvalue. If a value higher than maxvalue is found, then again from the position of maxvalue Dist points are examined until no new value larger than maxvalue appears. If this point is reached, maxvalue is compared with lastvalue. If maxvalue is greater than lastvalue, a maximum has been found and it is stored in a list with maxima (disturbance). Regardless of this output, lastvalue is now set to maxvalue and the search is started again from the position of maxvalue plus an offset offset. The offset should prevent values that are close to the maximum from being included again in the evaluations, since the drop after a maximum only returns to a very small value after a few points. However, since main maxima are to be found, it is necessary to skip these values. The offset value must also be passed to the algorithm as a parameter. If the index pointer has reached the end of the list, the algorithm is terminated. By using a counter, which counts the number of maxima found so far, it can also be terminated after a certain number of found maxima. 
This again fulfills the requirement for the N best circles, except that the first N circles are now taken from a different sort order of the circles. The parameters dist and offset are to be made available to the algorithm for configuration. To simplify matters, the counting of the maxima found so far has been omitted from FIG. 4. The goal of this maximum search is to avoid having to examine all points of the search box and to examine only a part of them. To do this, however, it is necessary to know which circle is the one being sought, i.e. which one describes an iris.
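The search described above can be sketched in Python as follows. This is a minimal illustration and not the implementation shown in FIG. 4: the function name find_main_maxima, the initial value of lastvalue (negative infinity) and the optional max_count argument are assumptions, while val, dist, offset, index, count, maxpos, maxvalue and lastvalue follow the names used in the text.

```python
def find_main_maxima(val, dist, offset, max_count=None):
    """Return (position, value) pairs of the main maxima in val.

    dist   : assumed constant distance between peaks (search window)
    offset : number of points skipped after a maximum
    max_count : optional limit on the number of maxima (assumption)
    """
    if dist < 1 or offset < 1:
        raise ValueError("dist and offset must be at least 1")

    maxima = []
    index = 0
    lastvalue = float("-inf")  # assumed initial value, not given in the text

    while index < len(val):
        maxpos, maxvalue = index, val[index]
        pos, count = index, 0
        # Examine dist points after the best value seen so far; every new
        # best value restarts the counter.
        while count < dist and pos + 1 < len(val):
            pos += 1
            count += 1
            if val[pos] > maxvalue:
                maxpos, maxvalue = pos, val[pos]
                count = 0

        if maxvalue > lastvalue:
            maxima.append((maxpos, maxvalue))
            if max_count is not None and len(maxima) >= max_count:
                break

        # Regardless of the output, continue behind the maximum.
        lastvalue = maxvalue
        index = maxpos + offset

    return maxima
```

Note that, following the text literally, a window maximum is only reported if it exceeds the maximum of the preceding window, so a lower peak directly after a higher one is skipped.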

A classifier described below is used here, which makes the decision "eye" or "not eye". If a maximum is found during the execution of the algorithm, it is passed to the classifier instead of to the list of maxima. If the circle found is the iris of an eye, the algorithm is terminated; if it is not an eye, the next circle must be found, which in turn is then verified using the classifier. This process is repeated until all points in the search box have been examined. However, it is also conceivable that the classifier does not interact with the circle detection in this way, but instead receives the list of all circles once detection is complete and then classifies all circles together (at the same time or in direct succession).
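The two ways of coupling circle detection and classifier can be contrasted in a short Python sketch. The function names first_eye and all_eyes, the representation of a circle as an (x, y, r) tuple and the is_eye callback are assumptions made for illustration; the actual classifier is described further below.

```python
def first_eye(candidates, is_eye):
    """Interleaved mode: classify each circle as soon as it is found and
    stop at the first circle accepted as an eye.

    candidates may be a lazy iterator, so circles after the first hit are
    never computed. Returns (circle or None, number of circles tested).
    """
    tested = 0
    for circle in candidates:
        tested += 1
        if is_eye(circle):
            return circle, tested
    return None, tested


def all_eyes(candidates, is_eye):
    """Batch mode: detect all circles first, then classify the whole list."""
    return [circle for circle in candidates if is_eye(circle)]
```

The interleaved mode can save work because the search stops as soon as an iris has been confirmed; the batch mode always examines every circle.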

- The third stage within eye detection is a classifier, which evaluates the results of the device for circle detection and thus determines the position of the eyes within the image data. When configuring the classifier, it is quite conceivable to configure it not only for a yes/no decision on "eye present", but also to classify whether an eye is open or closed. This in turn makes it possible, in an inventive manner, to realize a system function which, in the event that the eyes of an observed person remain closed for significantly longer than the duration of a blink, triggers a mechanism suitable for warning of falling asleep and/or takes the measures necessary to prevent accidents. This is particularly advantageous when observing vehicle drivers and machine operators.
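The falling-asleep warning can be illustrated with a small Python sketch that works on the per-image open/closed decisions of the classifier. The function name first_alarm_frame and the threshold max_closed_frames are assumptions; the text only requires that the closed period last significantly longer than a blink. At an assumed rate of 25 images per second, a limit of 0.5 s would correspond to roughly 12 images.

```python
def first_alarm_frame(eye_open, max_closed_frames):
    """Return the index of the first image at which the eyes have been
    closed for more than max_closed_frames consecutive images, or None.

    eye_open : sequence of booleans, one classifier decision per image
    """
    closed_run = 0
    for n, is_open in enumerate(eye_open):
        if is_open:
            closed_run = 0       # a blink has ended, restart the count
            continue
        closed_run += 1
        if closed_run > max_closed_frames:
            return n
    return None
```

A short blink resets the counter and never triggers the warning; only an uninterrupted run of closed-eye decisions does.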

With regard to the type of classifier, the method for eye detection according to the invention imposes no particular requirements. In general, a learning classifier is used (polynomial classifier, neural network), to which typical patterns to be classified are presented in a training sequence. Depending on the application, it is conceivable to design the classification process in such a way that each of the two eyes of the observed person is classified individually, or that a classifier is designed so that it classifies the image data of both eyes together. Accordingly, the method for eye detection should be designed with one common search area or two individual, eye-specific search areas (search boxes).

The direction of view detection depends crucially on the detection of the eyes within the image data, since these contain a large part of the information about the driver's line of sight. That is why the search for the eyes is an important step. For the detection of the eyes, search boxes are preferably used which are smaller than the overall image and thus simplify and considerably speed up the finding of the eyes. The use of search boxes is discussed in more detail below. Methods according to the invention are described below which compensate for the disadvantages resulting from the use of search boxes.

As mentioned at the beginning, most of the previously published methods for detecting the direction of view cannot be used for driver observation because of their running time. One of the reasons for this is the number of pixels to be processed. The images from the camera are recorded in the PAL standard. Two fields are always combined into one image, so that the effective image resolution is 768 pixels horizontally and 576 pixels vertically. A total of 442,368 pixels must therefore be examined. If the whole image is used for eye detection, there is a further disadvantage: the images contain many objects similar to the iris, which the eye detection algorithms also describe as circles. As a result, the probability of a wrong decision by the classifier is significantly higher. For this reason, it is advantageous to restrict the search to one or two smaller search areas within the image (two search areas if each eye is to be detected and classified individually; one search area if both eyes are to be detected and classified together). For this purpose, boxes (search boxes) are used, each of which is positioned near an eye. The disadvantage of using search boxes is that they must always be placed near the eyes so that the circle detection algorithms can actually detect the iris. If there is no eye within a search box, circles are still found, but these are classified by the classifier as "no eye present".
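The saving achieved by search boxes can be made concrete with a short Python sketch. The image size of 768 x 576 pixels comes from the text; the box size of 100 x 60 pixels, the function names clamp_box and pixel_fraction and the (left, top, width, height) box layout are purely illustrative assumptions.

```python
PAL_WIDTH, PAL_HEIGHT = 768, 576  # effective resolution stated in the text


def clamp_box(cx, cy, box_w, box_h, img_w=PAL_WIDTH, img_h=PAL_HEIGHT):
    """Place a search box centred on (cx, cy), shifted if necessary so that
    it lies completely inside the image. Returns (left, top, width, height)."""
    left = min(max(cx - box_w // 2, 0), img_w - box_w)
    top = min(max(cy - box_h // 2, 0), img_h - box_h)
    return left, top, box_w, box_h


def pixel_fraction(boxes, img_w=PAL_WIDTH, img_h=PAL_HEIGHT):
    """Fraction of the image pixels covered by non-overlapping boxes."""
    return sum(w * h for _, _, w, h in boxes) / (img_w * img_h)
```

With two boxes of the assumed size, 12,000 of the 442,368 pixels have to be examined, i.e. under 3 % of the image.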

In the method according to the invention, the placement of the search boxes is advantageously implemented with eye tracking. Preferably, the eye tracking repositions the search boxes after each image, so that the iris can be found within the search box and the eye detection algorithms can find it. The starting point for eye tracking is the last eye position found. With the help of this position, an attempt is now made to correctly position the search box in the next picture. To do this, the eyes must be correctly recognized in the search boxes.

In one possible embodiment, eye tracking is implemented in the method according to the invention by means of a linear prediction of the position of the search boxes. It is assumed that the head moves at constant speed; accelerations are therefore not taken into account. If the sampling rate of the image sequence is high enough, this assumption introduces only a small error. The position of the search boxes for the next image n + 1 is calculated from the last two eye positions found, i.e. those from the current image n and the previous image n - 1. The eye positions are denoted by $m_n^{(i)} = (m_{n,x}^{(i)}, m_{n,y}^{(i)})$, with i denoting the left or right eye, i.e. $i \in \{\text{left}, \text{right}\}$. The search box position is denoted by $x_n^{(i)}$. Since this eye tracking method considers the eyes over several images, different combinations of eye states arise. The different states arise when the eyes are viewed as interdependent. If both eyes are viewed independently of one another, the effort is reduced to four states per eye. In addition, the eyes are a certain distance apart. If this is also taken into account, all sixteen states must again be considered, since the eyes are oriented towards each other and the individual states would also have to be taken into account. The disadvantage of a rigid coupling of both eyes is that sometimes no eyes are found when the head is turned far to one side. In this case, one eye is covered by the nose and cannot be detected. If both search boxes are coupled to each other, i.e. the search boxes are oriented towards each other, then when the head is turned further, one of the search boxes is pushed out of the head area and is still outside the head area after the head has turned back. For this reason, it is conceivable and sensible not to couple the eyes rigidly, but also to consider them separately. The problem with eye tracking is that the head movement is mapped from three-dimensional space into a two-dimensional image space.
The distance between the eyes changes depending on the rotation of the head in the picture.
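A linear prediction under the constant-speed assumption can be sketched in Python as follows. The extrapolation formula (current position plus the last displacement) is the standard constant-velocity form and is assumed here, since the text does not spell it out; the function name predict_box and the fallback behaviour when an eye is missing in one or both images are likewise illustrative assumptions covering the four found/not-found combinations for one eye.

```python
def predict_box(m_curr, m_prev, box_curr):
    """Predict the search box position for image n + 1 for one eye.

    m_curr   : eye position (x, y) found in image n, or None
    m_prev   : eye position (x, y) found in image n - 1, or None
    box_curr : search box position used in image n
    """
    if m_curr is not None and m_prev is not None:
        # Constant speed: continue the last displacement m_curr - m_prev.
        return (2 * m_curr[0] - m_prev[0], 2 * m_curr[1] - m_prev[1])
    if m_curr is not None:
        # No speed estimate available: centre the box on the eye (assumption).
        return m_curr
    # Eye not found in the current image: leave the box where it is (assumption).
    return box_curr
```

For example, an eye found at (100, 50) and then at (110, 52) yields a predicted box position of (120, 54).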

The linear estimate of the search box position is only valid if the speed of the eyes is constant. In the event of a sudden change of direction, the assumption of constant speed can no longer be met. Further problems arise if no eyes are found. This is particularly the case if closed eyes are included in the image, since they cannot be detected directly.

In order to minimize these problems, it is advantageous to implement corresponding extensions of the linear eye tracking. One possibility is the coupling of the eyes already mentioned. If no eye is found in one of the search boxes, the position of the search box in which an eye was found can still be estimated well for the next image using the above procedure. The other search box, in which no eye was detected, is then placed relative to the search box with the detected eye, according to their relative position in the previous image. Another conceivable way of designing the method according to the invention is to include certain boundary conditions for the positions of the search boxes. This allows the search boxes to be assigned certain areas within which they can be placed relative to each other. For example, it is very unlikely that the driver's eyes will lie one above the other.

A particularly advantageous embodiment of the eye tracking makes use of the optical flow, whereby the head movement can be estimated from the image sequence and the eye tracking can therefore be carried out correctly even if no eyes were found. Optical flow is a method for finding the displacement between two similar image structures in two successive images. With this displacement it is possible to determine the movement of an image structure from one image to the next. With the help of the optical flow, simple eye tracking can now be set up. The optical flow alone is not accurate enough to place the search boxes. However, it can be used to estimate the next position of the search boxes. For the description of the algorithm it makes sense to define the optical displacement vector $h(n, n+1)$. It denotes the optical flow calculated from images n and n + 1. The same designations apply as were used in the linear estimation of the search box position, i.e. the search box positions are again denoted by $x_n^{(i)}$. The designation $m_n^{(i)}$ is again used for the position of a detected eye. With this method, both search boxes are also considered independently of one another. This means that there are only two cases for each search box that must be taken into account:

- Case 1: An eye was found in search box i.

In this case, the search box can be placed directly over the eye at the position $m_n^{(i)}$. To take movement of the head into account, the displacement vector of the optical flow is also added to the new coordinates. The new search box position $x_{n+1}^{(i)}$ is thus:

$$x_{n+1}^{(i)} = m_n^{(i)} + h(n, n+1) \qquad \text{(Eq. 10)}$$

It is crucial when placing the search boxes that the optical displacement vector is calculated from the current image and the next image. This allows the search box to be placed as well as possible for the next image.

- Case 2: No eye was found in search box i.

In this case, the next position cannot be estimated with the linear method. However, since the movement of the head has been determined with the aid of the optical flow, the search box position in image n + 1 can, under certain circumstances, still be estimated with sufficient accuracy. Only the optical displacement vector $h(n, n+1)$ is used, i.e. the current search box position is shifted by it:

$$x_{n+1}^{(i)} = x_n^{(i)} + h(n, n+1)$$
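The two cases can be sketched in Python as follows. The function estimate_shift is only a brute-force block-matching stand-in that returns one global integer displacement; it is not the optical-flow method of the invention, which the text does not specify. The function names, the list-of-rows image format and the max_shift parameter are assumptions.

```python
def estimate_shift(prev, curr, max_shift):
    """Estimate one global displacement (dx, dy) between two equally sized
    grey-value images by minimising the mean squared difference over the
    overlapping area. Stand-in for the optical displacement vector h."""
    height, width = len(prev), len(prev[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, n = 0, 0
            for y in range(max(0, -dy), min(height, height - dy)):
                for x in range(max(0, -dx), min(width, width - dx)):
                    d = curr[y + dy][x + dx] - prev[y][x]
                    err += d * d
                    n += 1
            if n and err / n < best_err:
                best, best_err = (dx, dy), err / n
    return best


def next_box(eye_pos, box_pos, flow):
    """Search box position for the next image.

    Case 1 (eye found): eye position plus displacement vector.
    Case 2 (no eye found): current box position plus displacement vector.
    """
    base = eye_pos if eye_pos is not None else box_pos
    return (base[0] + flow[0], base[1] + flow[1])
```

In both cases the same displacement vector is applied; only the starting point differs, which is why the tracking continues to work while the eyes are closed or hidden.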

Advantageously, when performing the method according to the invention, the search box position is initialized automatically at the start by means of an algorithm based on the hyperpermutation network, HPN (Mandler, Oberländer [19]). An HPN tries to separate the redundancy from the information in the diversity of data. Depending on the desired result, each image contains a great deal of redundancy. For example, in the case of eye detection, only the information "there is an eye at the location (x, y)" is of interest. This is only a fraction of the information that the entire image contains. For this purpose, the HPN is constructed similarly to a neural network, except that its nodes or neurons have a different structure. In the HPN, these nodes have the same number of inputs and outputs. Furthermore, a line corresponds to exactly one information unit (bit). The inputs are mapped to the outputs by permutations, this mapping being reversible. By interconnecting several nodes, it is now possible to solve the problem of eye detection. After an image has been entered, the HPN produces a probability distribution of the eyes in the image. There are several levels.

At the highest level, the probability of finding an eye at this point is the highest. In practice, this corresponds to a kind of "probability cloud" around the eye area. If bounding boxes are determined for the highest level, these can be used as positions for the search boxes. In an alternative embodiment of the method according to the invention, it is conceivable to adapt the Kalman filter to the task of initializing the position of the search boxes, so that the eye tracking can be made more dynamic, i.e. the position of the eyes relative to each other and the search box size can be handled adaptively. This procedure requires the development of a complete model of the head movement, which takes into account the movement of the head in three-dimensional space and the imaging properties of the camera. Such a method works with statistical methods, with which a dynamic adjustment of the search box size is also possible. It is also conceivable to initialize the search boxes using the generally known method of template matching.
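Initialization by template matching can be illustrated with a minimal Python sketch. It uses an exhaustive sum-of-squared-differences search, which is one common variant of template matching and is assumed here; the function name match_template and the list-of-rows image format are likewise assumptions, and a practical system would typically use a normalized measure and an optimized library routine.

```python
def match_template(image, template):
    """Slide the template over the image and return the top-left position
    (x, y) with the smallest sum of squared differences, plus that value."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    best, best_ssd = None, float("inf")
    for y in range(img_h - tpl_h + 1):
        for x in range(img_w - tpl_w + 1):
            ssd = sum(
                (image[y + j][x + i] - template[j][i]) ** 2
                for j in range(tpl_h)
                for i in range(tpl_w)
            )
            if ssd < best_ssd:
                best, best_ssd = (x, y), ssd
    return best, best_ssd
```

The position returned for an eye template could then serve as the starting position of the corresponding search box.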

The device according to the invention for the detection of eyes in image data described above, and the method suitable for its operation, can be used particularly advantageously as a core element in the system according to the invention for detecting the direction of view. In addition, however, it is also possible to use this device and the method as general elements in a wide variety of applications in which eyes are to be recognized in image data, for example in methods for identifying people by recognizing the structure of the iris, in which the viewing direction of the person to be identified is already predetermined by the system.

Downstream of the device for detecting the eyes, the system for detecting the viewing direction contains the actual device for determining the viewing direction. In contrast to the methods known from the prior art, the method according to the invention which is suitable for operating the device for determining the viewing direction allows the viewing direction of an observed person to be determined even when both eyes are no longer visible. The method even goes so far that a rough estimate of the viewing direction is possible without any eye detection. For this purpose, the method according to the invention makes use of the image information supplied by the nose of the person being observed. The position of the nose is advantageously detected by searching for the nostrils using a polar edge detector, corresponding to that described for use in eye detection. The advantage over finding the iris of the eyes is the good contrast between the nostrils and the surrounding area, so that the fact that nostrils are not circular is relatively unimportant. The nostrils are then tracked in the same way as in the eye tracking described above (e.g. optical flow or Kalman filter). On the basis of the position of the recognized nostrils, a data area is selected from the image of the observed person; based on empirical values, it is chosen large enough that the image of the entire nose is contained in the data area. The position of one recognized nostril, or the arithmetic mean of the positions of two nostrils, can be used as a guide value for the center of the data area. However, it is also conceivable to choose the data area so that the position of the nostril or nostrils comes to lie in its lower part.
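The selection of the nose data area can be sketched in Python as follows. The function name nose_region, the (left, top, width, height) layout and the choice of placing the nostrils at three quarters of the box height in the alternative variant are assumptions; the text only states that the area size is chosen from empirical values.

```python
def nose_region(nostrils, width, height, nostrils_low=False):
    """Return the data area (left, top, width, height) around the nose.

    nostrils     : list with one or two nostril positions (x, y)
    width/height : empirically chosen size of the data area
    nostrils_low : if True, place the nostrils in the lower part of the
                   area instead of at its center
    """
    if not nostrils:
        raise ValueError("at least one nostril position is required")
    # One nostril: use it directly; two nostrils: arithmetic mean.
    cx = sum(p[0] for p in nostrils) / len(nostrils)
    cy = sum(p[1] for p in nostrils) / len(nostrils)
    left = cx - width / 2
    # Image y grows downwards, so a larger upward extent moves the
    # nostrils towards the bottom of the area.
    top = cy - (0.75 * height if nostrils_low else height / 2)
    return (left, top, width, height)
```

With nostrils at (100, 200) and (120, 202) and an assumed area of 80 x 100 pixels, the centered variant starts at (70, 151) and the lowered variant at (70, 126).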

In order to recognize the viewing direction, both the image of the nose and that of the eyes are classified in a first approach. If no images of the eyes are available, the direction of the gaze can be roughly estimated from the alignment of the nose. If the viewing direction of the eyes can also be classified correctly, the viewing direction of the observed person can be estimated from the combination of the viewing direction of the eyes and the alignment of the nose in relation to the camera position.

For this purpose, it is conceivable to use a separate classifier for classifying the nose alignment and a separate classifier for classifying the direction of the eyes. In an advantageous manner, however, it is also conceivable to combine the image data of the eyes and nose and to make them available together to a suitably trained classifier for determining the viewing direction. The device according to the invention for determining the viewing direction described above and the method suitable for its operation can be used particularly advantageously as a core element in the system for detecting the viewing direction according to the invention. In addition, however, it is also possible to use this device and the method as general elements in a wide variety of applications in which the viewing direction of observed people is to be recognized from image data.

bibliography

[1] K. F. Arrington, Arrington Research, November 1997, www.arringtonresearch.com/viewPoint.html

[3] S. Baluja, D. Pomerleau, Non-Intrusive Gaze Tracking Using Artificial Neural Networks, Technical Report CMU-CS-94-102, Carnegie Mellon University, 1994

[6] A. Gee, R. Cipolla, Non-intrusive Gaze Tracking for Human-Computer Interaction, Proceedings on Mechatronics and Machine Vision in Practice, pp. 112-117, Toowoomba, Australia, 1994

[7] G. Chow, X. Li, Towards a System for Automatic Facial Feature Detection, Pattern Recognition, Vol. 26, No. 12, pp. 1739-1755, 1993

[8] Y. Ebisawa, Unconstrained Pupil Detection Technique using Two Light Sources and the Image Difference Method, Visualization and Intelligent Design in Engineering and Architecture II, ed. by S. Hernandez, Southampton: Computational Mechanics Publications, 1995

[9] G. J. Edwards, A. Lanitis, C. J. Taylor, T. F. Cootes, Statistical Models of Face Images - Improving Specificity, Image and Vision Computing 16, 1998, pp. 203-211

[16] K.-M. Lam, H. Yan, Locating and Extracting the Eye in Human Face Images, Pattern Recognition, Vol. 29, No. 5, pp. 771-779, 1996

[17] C. Morimoto, D. Koons, A. Amir, M. Flickner, Pupil Detection and Tracking Using Multiple Light Sources, Image and Vision Computing, Vol. 18, No. 4, March 2000, Elsevier, Netherlands

[19] M. Oberländer, Hyperpermutation Networks - A Discrete Approach to Machine Perception, 3rd Workshop on Weightless Neural Networks, York, 30th March 1999

[20] C. H. Morimoto, Real-Time Multiple Face Detection Using Active Illumination, Proceedings of the Fourth International Conference on Automatic Face and Gesture Recognition, 28-30 March 2000, Grenoble, France

[26] R. Stiefelhagen, J. Yang, A. Waibel, Tracking Eyes and Monitoring Eye Gaze, Proceedings of Perceptual User Interfaces (PUI'97), Banff, Alberta, Canada, 1997 (werner.ira.uka.de/ISL.publications.html)

[28] Y.-L. Tian, T. Kanade, J. F. Cohn, Dual State Parametric Eye Tracking, Proceedings of the Fourth International Conference on Automatic Face and Gesture Recognition, 28-30 March 2000, Grenoble, France

[30] L.-Q. Xu, D. Machin, P. Sheppard, A Novel Approach to Real-time Non-intrusive Gaze Finding, Proceedings of the British Vision Conference, University of Southampton, September 14-17, 1998

[31] X. Xie, R. Sudhakar, H. Zhuang, On Improving Eye Feature Extraction Using Deformable Templates, Pattern Recognition, Vol. 27, No. 6, pp. 791-799, 1994

[33] M. Zobel, A. Gebhard, D. Paulus, J. Denzler, H. Niemann, Robust Facial Feature Localization by Coupled Features, Proceedings of the Fourth International Conference on Automatic Face and Gesture Recognition, 28-30 March 2000, Grenoble, France

Claims

1. System for detecting the direction of view of an observed person from image data, consisting of a device for detecting the eyes and a downstream unit for determining the direction of view of the observed person, characterized in that the device for detecting the eyes contains a unit for adapting the radii, to which a device for circle detection is connected, which is followed by a classifier which evaluates the results of the device for circle detection and thus determines the position of the eyes within the image data, and that the device for determining the viewing direction contains devices for segmenting the image data associated with the eyes and nose, which are followed by a common classifier.

2. Device for the detection of eyes, characterized in that the device contains a unit for radius adjustment, which is followed by a device for circle detection, which is followed by a classifier, which evaluates the results of the device for circle detection and thus determines the position of the eyes within the image data.

3. Method for operating a device according to claim 2, characterized in that within the scope of the radius adjustment, the area in which the device for circle detection searches for circles is limited by a minimum value $r_{\min}$ and a maximum value $r_{\max}$.

4. The method according to claim 3, characterized in that the limitation of the area is adapted from image to image, using as the reference value the radius r of the circle which circumscribes an iris previously detected in the image, and on this basis the new values of $r_{\min}$ and $r_{\max}$ are set to a few image pixels (image resolutions) less or more than r, the values of $r_{\min}$ and $r_{\max}$ not being allowed to fall below or exceed certain absolute limits.

5. The method according to any one of claims 3 to 4, characterized in that, if no eye was detected in an image, the values of $r_{\min}$ and $r_{\max}$ are used unchanged for the evaluation of the subsequent image.

6. The method for operating a device according to claim 2, characterized in that the device for circle detection works on the basis of an edge-oriented detection method, in particular the Hough transformation.

7. The method for operating a device according to claim 2, characterized in that the device for circle detection works on the basis of a method of polar edge detection.

8. The method according to claim 7, characterized in that the device for circle detection can detect not only circles but also other, arbitrarily predefined paths.

9. The method for operating a device according to claim 2, characterized in that the classifier downstream of the device for circle detection classifies selected image areas on the basis of the data supplied to it in order to determine whether these areas are images of an eye.

10. The method according to claim 9, characterized in that the classifier classifies a complete list of all areas selected within a search box by the device for circle detection in one step.

11. The method according to any one of claims 6 to 9, characterized in that the classifier works synchronously, alternating with the device for circle detection, in that after each individual successful circle detection the area selected in this way is classified to determine whether it is the image of an eye, and that one of the termination criteria of this process of circle detection is an eye recognized by the classifier.

12. The method according to any one of claims 3 to 11, characterized in that the classifier is able to recognize whether an eye is closed or open.

13. The method according to any one of claims 3 to 12, characterized in that in the event that the classifier recognizes closed eyes for a period of time which lasts significantly longer than the duration of a blink, a mechanism suitable for warning of falling asleep is triggered and/or the measures necessary for the prevention of accidents are taken.

14. The method according to any one of claims 3 to 13, characterized in that the area in which the device for detecting eyes searches in the image is limited by so-called search boxes, which are equal to or smaller than the entire image area.

15. The method according to claim 14, characterized in that the search boxes are initialized at the beginning of the method with the aid of a pixel-oriented classifier.

16. The method according to claim 15, characterized in that the pixel-oriented classifier is a hyperpermutation network.

17. The method according to claim 14, characterized in that the search boxes are initialized at the beginning of the method with the aid of template matching.

18. The method according to any one of claims 14 to 17, characterized in that a method based on the optical flow is applied from image to image to estimate the changes in position of the search boxes.

19. The method according to any one of claims 14 to 17, characterized in that a Kalman filter adapted to this problem is used from image to image to estimate the changes in position of the search boxes.

20. The method according to any one of claims 14 to 19, characterized in that the search for circles within a search box begins in the middle of the search box and the further expansion of the search area describes a spiral path from there.

21. Device for circle detection, characterized in that the device is implemented on the basis of a polar edge-oriented algorithm which can detect not only circles but also other, arbitrarily predefined paths.

22. A method of operating a device according to claim 21, characterized in that, in order to take account of the occlusion of the iris by the eyelids, two angles α and β are defined which specify the opening angles, i.e. the sections of the circular path circumscribing the iris, that are not included in the circle detection.

23. The method according to any one of claims 21 or 22, characterized in that the variance of the brightnesses along the path is included in the evaluation of the path integrals of the circular orbits.

24. Device for determining the viewing direction, characterized in that the device contains devices for segmenting the image data associated with the eyes and nose, which are followed by a common classifier which, as the classification result, provides the direction of view of an observed person.

25. The method for operating a device according to claim 24, characterized in that the device for segmenting the image data associated with the nose of the person being observed makes the selection based on a detected nostril.

26. The method according to claim 25, characterized in that the detection of a nostril is carried out by means of a polar edge detector with a downstream classifier.

27. The method according to any one of claims 25 or 26, characterized in that a separate classifier for classifying the nose alignment and a separate classifier for classifying the direction of gaze of the eyes are used.

28. The method according to claim 27, characterized in that in the event that no eye was recognized, the classifier for classifying the nose alignment performs an estimation of the viewing direction.

29. The method according to any one of claims 25 or 26, characterized in that a common classifier classifies the combined image data of the eyes and nose in order to estimate the direction of view of an observed person.