EP3097511A1

EP3097511A1 - Method for detecting a movement path of at least one moving object within a detection region, method for detecting gestures while using such a detection method, and device for carrying out such a detection method

Info

Publication number: EP3097511A1
Application number: EP15700309.6A
Authority: EP
Inventors: Friedrich SCHICK
Original assignee: Myestro Interactive GmbH
Current assignee: Tripleye GmbH
Priority date: 2014-01-24
Filing date: 2015-01-14
Publication date: 2016-11-30
Also published as: DE102014201313A1; WO2015110331A1

Abstract

In a method for detecting a movement path (2₂₃; 2₂₁, 2₂₂) of at least one moving object (23, 24) within a detection region, the moving object is first captured by imaging. In the process, a first detection image (11) is generated which reproduces the detection region at a first detection time. After a delay period, a second detection image (12) is generated which reproduces the detection region at a correspondingly later detection time. Correspondences of the image regions (13 to 22) of the detection images (11, 12) are then determined and analyzed. For this purpose, the detection images (11, 12) are compared, a distribution density of image regions (13 to 22) which correspond to one another with respect to their changes in position in the detection image (12) is ascertained, and the ascertained distribution density is analyzed. Pixels (23, 24) with a corresponding movement increment (2₂₃; 2₂₁, 2₂₂) between the detection images (11, 12) are then assigned on the basis of the analyzed distribution density. The pixel movements are then analyzed. The method, and a gesture detection method which uses said method, can be carried out using simple optical means.

Description

Method for detecting a movement path of at least one moving object within a detection area, method for gesture recognition using such a detection method, and device for carrying out such a detection method

The content of German Patent Application 10 2014 201 313.5 is incorporated herein by reference. The invention relates to a method for detecting a movement path of at least one moving object within a detection area. Furthermore, the invention relates to a method for gesture recognition using such a detection method and to a device for carrying out such a detection method or gesture recognition method.

Methods and devices for object recognition are known from WO 2012/095 258 A1 and WO 2013/020 872 A1. The article "A Probabilistic Framework for Matching Temporal Trajectories: CONDENSATION-Based Recognition of Gestures and Expressions" by Black et al., in H. Burckhardt, B. Neumann (Eds.): Computer Vision - ECCV '98, Vol. I, LNCS 1406, pp. 909-924, 1998, © Springer-Verlag Berlin Heidelberg 1998, describes a use of locally rasterized vectors to detect the trajectories of a mouth movement.

It is an object of the present invention to provide a detection method with simple optical means. This object is achieved according to the invention by a recognition method with the features specified in claim 1.

The invention described measures a distribution density of motion correspondences between parts of successive images.

The essential information that is processed is a movement pattern; the moving structure is only deduced in a subsequent step. Pattern recognition performed on one and the same image, for example face recognition, is not required to perform the method. The essential motion information is obtained by comparing successive images. For this purpose, motion correspondences are determined between image sections of two successive images. A correspondence exists when two image parts are similar. In this preprocessing, correspondences which do not correspond to the optical flow are also permitted. The method produces, for small image regions in each case, distributions of correspondence vectors of different direction and length, and thus correspondence distribution profiles over the entire image. These correspondence distribution profiles are converted into a correspondence distribution density. The optical flow then corresponds to the largest values of an ideal correspondence distribution density. Due to measurement errors, image repetitions and measurement gaps, the ideal correspondence distribution density of an optical flow, i.e. a "clean" optical flow, is generally not achieved. The preprocessing can therefore be characterized as a flow-oriented examination (flox), with which correspondence distribution densities are determined. A subset of such correspondence distribution densities is the optical flow. Owing to allocation errors, the distribution density will contain a variety of other correspondences. The distribution densities are checked for potential movements of compact regions. Correspondences between similar pixels or image parts that are not images of the same object, e.g. correspondences between two adjacent file folders, lead to a pseudo-movement that usually does not continue but remains local, comparable to the speedometer indicator on spinning tires. Such apparent movements can be excluded by comparing more than two acquisition images taken in succession.
The concatenation of plausibility-checked motion increments then leads to a movement, which in turn is checked for a gesture.

If objects in the near range are to be measured, such as fingers directly in front of a camera installed in a pair of glasses, several image areas will lead to a similar, i.e. common, movement. In this case, a suitable average of the coordinates of the common movement is used to represent the actual object. As a rule, it is not the position of the object that determines the motion gesture but the shape of the path, which in this case is identical for all common movements. In addition to an average value, the uppermost of all pixels traversing a common path can also be selected and assigned. This is, for example, the fingertip of an upward-pointing finger in the image. The accuracy of the path has to be good enough that the path shapes assigned to the gestures can be distinguished.

In the case of imaging, camera images can be cyclically loaded into an evaluation computer.

The time interval between the images may vary, but must be known. From two successive images, a correspondence distribution density is determined, from which movement increments are calculated per image pair. From the sequence of movement increments, motion sequences which can correspond to selected gesture movements are filtered out. The number of incorrect correspondence distribution densities can be reduced by coarse distance knowledge, by suitable depth sensors, or by sharpness adjustments of the camera or flash lighting, in order to increase the reliability of the recognition.

No object shape detection takes place during pixel assignment. Instead, a check is made as to where, in corresponding pixel groups or image areas, movements, in particular fast movements, are detected with high density, i.e. movements of pixel groups with comparable movement increments. From a detected pixel group, a representative pixel is selected and assigned on the basis of previously defined criteria for the determined distribution density and the associated movement increments. For example, a minimum density of moving pixels can be specified, and among the pixels thus preselected, which lie within the pixel groups having the minimum density, the one with the largest movement increment can be selected. Alternatively, it is possible to preselect according to certain movement increments and, within a pixel group which has this movement increment, to select a pixel which is distinguished by its position within this pixel group. In the selection or assignment of the pixel by evaluation of the determined distribution density, a prediction algorithm can simplify the assignment of a specific pixel. For this purpose, a check is made on the basis of, for example, three successive acquisition images as to whether, in the last acquisition image, a candidate pixel lies in an image area in which it can actually be expected according to its movement in the first two acquisition images. Only pixels which reach a predicted image area correspond to the prediction and thus fulfill this selection criterion. If several pixels remain after passing through these different selection criteria, a simple geometric selection can be made. For example, the uppermost pixel present in the detection area among the selection candidates can be selected and assigned. The image areas may be individual pixels or pixel groups.

In the motion path recognition method explained above, the method steps are executed automatically and computer-aided. The procedure can be performed without operator intervention. The trajectory detection method can be run on a standard computer in real time. The trajectory recognition method also extracts movement increments from "dirty" flow distributions, in particular via a 2D frequency matrix, which will be described below.

The definition of a depth range according to claim 2 can be carried out with the aid of a depth sensor. For this purpose, the depth of field of a front optics of the camera sensor can be used. Autofocus techniques can also be used for this purpose, which can be used in particular for contrast enhancement and thus for improving the result of a comparison of the acquisition images. As soon as a depth of an object whose trajectory has been detected is known, the object speed can also be measured and specified for the object movement.

Accordingly, the correspondence distribution density can be determined not only from objects at the distance of the expected object but also from objects closer to or further away from the sensor. With coarse-resolution depth sensors, based on structured light, time-of-flight or even stereoscopy, image parts can be identified that are not in the distance range and whose distribution densities are ignored. A depth sensor based on structured light is known, for example, from US Pat. No. 4,954,962. A depth sensor based on time-of-flight is known from EP 2 378 310 A1.

Ultrasonic sensors, for example, offer coarser resolutions. Through a combination of several ultrasonic sensors, the directions of objects that are within the expected distance can be determined, and other image areas can be excluded.

Distance information is also provided by so-called light field sensors. If no sharpness can be calculated for the desired distance in certain image areas, these too will be excluded. A depth range definition according to claim 3 is possible with high precision, provided an appropriately controllable light source is present. Alternatively or additionally, a temporal variation of an illumination period relative to an exposure time in the imaging acquisition can also be used for the definition of the depth range.

In addition to direct depth sensors, further distance-dependent effects can also be used. If an IR filter is placed in front of the camera and the surroundings are irradiated with limited IR light power, the range is limited and correspondences of objects lying further back are no longer detected. If objects are very close, they are so strongly illuminated by the IR radiation that no contrasts are recognizable on them. This creates a depth range for measurable correspondences. If the IR radiation power and the exposure time are varied in a short time sequence, the measurable depth ranges can be offset in such a way that chains of movement increments can be made plausible only for those objects that have remained within the measurable ranges throughout.

Another distance-dependent effect is the depth of field. For lenses with a long focal length, the depth of field is smaller than for lenses with a short focal length. Only within this range can correspondences be measured. By varying the focal length in a short time sequence, the measurable depth range can be shifted so that chains of movement increments can be made plausible only for those objects which have remained within the measurable range throughout.

The combination of the two distance-dependent effects together with their temporal variation leads to the desired effect of limiting the measurable range. It is therefore also possible to use combinations of depth of field and / or illuminance and / or illumination duration for the depth range definition. The advantages of a gesture recognition method according to claim 4 correspond to those which have already been explained above with reference to the movement path recognition method.

Gestures are created by the movement of body parts. Direct measurement of motion does not require modeling, such as images of hands or joint models. If the movement of compact, for example fist-sized, objects is measured directly, modeling of, for example, a hand pose or joint models can be dispensed with. In a monocular camera system, the fist-sized object should tend to be moved transversely to the viewing direction of the sensor. Together with a suitable depth sensor, it is also possible to directly measure speeds in the distance direction relative to the sensor. In both cases, neither do hand poses have to be trained nor do joint models have to be taken into account, given an essentially undisturbed environment. In addition, the smaller the object-to-detection-area ratio, the smaller the demands on the texture information of the object, so that many independent objects in a large volume can be measured. The derived gestures can be further checked for plausibility using known methods such as inverse kinematics or template matching. The movement must have been triggered by a specific object-like grayscale distribution. A hand, fingers or artificial objects (gloves, markers) can serve as a basis. With inverse kinematics, movement predictions can be made and the correspondence density distribution can thus be evaluated in a more targeted manner. The correspondence density distribution can also be evaluated better by means of simplified, for example planar, motion models such as the model of constant speed.

An inverse kinematics method is known from CA 2 211 858 C. A template matching method is known from EP 1 203 344 B1. With corresponding image acquisition, a circle symbol can be selected which is generated by an open or closed hand of the user performing a corresponding circular motion within a detection area. Via the imaging detection of such a circle symbol, a circle center and a circle radius of this circle symbol can be detected and stored, for example, in a memory of a control module. Subsequent symbols can then be recognized as relevant for the control insofar as they occur within the circle area thus defined within the detection area, plus, if necessary, an additional surrounding area which can be preset via an enlarged tolerance radius around the center of the circle. Within the circular area, various sub-areas, such as circular sectors, can then be defined via the control; analogously to the keys of a keypad, these can be actuated by the user and can trigger various signals. Dwelling in such a sub-area, or a defined change between predetermined sub-area sequences, can then be recognized as a signal for triggering a specific control sequence. Other gestures which can be recognized after the activation gesture "circle symbol" are, for example, clockwise and counterclockwise rotating gestures, which can be processed, for example, to increase or reduce a signal intensity, comparable to a volume control.
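The storage of a circle center and radius and the later lookup of a circular sector can be sketched as follows. This is a minimal illustration only, under stated assumptions: the circle is estimated from the centroid of the path points (which presumes points spread roughly evenly along the circle), and the names `fit_circle`, `sector_of`, `sectors` and `tolerance` are hypothetical and do not come from the application.

```python
import math


def fit_circle(points):
    """Estimate (center, radius) of a roughly circular path.

    Assumption: the points are spread evenly along the circle, so the
    centroid approximates the center and the mean distance the radius.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    radius = sum(math.hypot(x - cx, y - cy) for x, y in points) / n
    return (cx, cy), radius


def sector_of(point, center, radius, sectors, tolerance=0.0):
    """Return the index of the circular sector containing `point`.

    Returns None if the point lies outside the circle enlarged by the
    tolerance. Sector 0 starts at the positive x axis; indices grow
    with increasing atan2 angle.
    """
    dx, dy = point[0] - center[0], point[1] - center[1]
    if math.hypot(dx, dy) > radius + tolerance:
        return None
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int(angle / (2 * math.pi / sectors)) % sectors


# Four path points of an idealized circle symbol.
center, radius = fit_circle([(10, 0), (0, 10), (-10, 0), (0, -10)])
key = sector_of((-5, 1), center, radius, sectors=4)
```

A point outside the stored radius is only accepted if it falls within the enlarged tolerance radius, which mirrors the additional surrounding area mentioned above.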

The gesture recognition method described here can also be used separately from the movement path recognition method explained above by using a corresponding control module, and is an independent component of the application. For trajectory recognition, a method known from the prior art which deals with an optical flow may alternatively be used, for example the so-called KLT tracker described in Bruce D. Lucas and Takeo Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision", IJCAI, pages 674-679, 1981. Methods known in the context of codec implementations may also be used. The gesture recognition method can be designed so that it runs on a standard computer in real time.

Model pixel movements according to claim 5 result in a gesture set that can be used for a variety of control tasks. The specification of an input area with an area specification gesture according to claim 6 makes it possible to define, within the detection area, a sub-area which can be detected, for example, with high resolution and which can be used for detailed input purposes. The area specification gesture may be a circular motion. Further inputs can then be made in the defined input area.

Face recognition according to claim 7 can be used to enable the gesture recognition. Face recognition can identify a person in the vicinity of the movement. It can then be ensured that only certain people have access.

A selection of provided model pixel movements according to claim 8 makes it possible to specify a user profile.

The advantages of a device according to claim 9 correspond to those which have already been explained above in connection with the movement path recognition method and the gesture recognition method.

The device may include a light source which is in signal communication with the camera sensor and / or the evaluation computer, so that parameters of the light source, for example an exposure intensity or an exposure period, can be preset by the camera sensor and / or the evaluation computer by appropriate control.

With the aid of a projector device according to claim 10, an input field or a plurality of input fields, for example in the form of a keyboard, can be generated by projection, for example in a predetermined input area. By actuating the at least one projected input field, the user can then trigger a defined control action or make an input, for example a yes / no selection or a text input.

An embodiment of the invention will be explained in more detail below with reference to the drawing, in which:

FIG. 1 shows very schematically a device for carrying out a method for detecting a movement path of at least one moving object within a detection area as part of a gesture recognition method;

FIGS. 2 and 3 show snapshots of the detection area, reproducing acquisition images at two consecutive detection times.

FIG. 1 shows schematically a device 1 for carrying out a detection method. With the device, a movement path 2 of at least one moving object 3 within a detection area 4, which is shown in dashed lines in FIG. 1, can be detected. As an example of the movement path 2, FIG. 1 shows the path of a moving hand of the object 3, using the example of a gesticulating user.

The device 1 has a monocular camera sensor 5, which is a high-resolution CCD camera or CMOS camera with an optical attachment 6 capable of capturing a predetermined depth or a depth range T of the detection area 4 with predetermined image sharpness.

Via a signal line 7, the camera sensor 5 is in signal connection with an evaluation computer 8. The latter is in signal connection, via a further signal line 9, with a device 10 to be controlled. Alternatively, the evaluation computer 8 and the device 10 to be controlled can be one and the same unit. The device 10 to be controlled may be a type of tablet PC equipped with the components 5 and 8 for gesture recognition. Alternatively, the device 10 to be controlled may also be a device external to the evaluation computer 8, for example a TV set or another consumer electronics device. A home automation device, such as a lighting system, a shutter control or a heating system, is a further example of the device 10 to be controlled.

For detecting the movement path 2, the detection area 4 is imaged by the camera sensor 5. In this case, an acquisition image reproducing the detection area 4 is generated in the camera sensor 5.

By way of example, FIGS. 2 and 3 show two such acquisition images 11 and 12 at times t = 0 and t = 1, using arbitrary time units. The acquisition image 12 is generated by the camera sensor 5 one delay period later than the acquisition image 11. The two acquisition images 11 and 12 are digitized in real time or quasi real time and stored in the evaluation computer 8. In the evaluation computer 8, correspondences of image areas of the acquisition images 11, 12 are then determined and evaluated. For this purpose, the acquisition images 11 and 12 are compared with each other in the evaluation computer 8. Then, a distribution density of image areas which correspond to one another with respect to their change in position in the acquisition image is determined.

The delay period, ie a time interval between the detection times of the acquisition images 11 and 12, can be variable. The delay period can be in the range between 10 ms and 1 s.

In Figs. 2 and 3, such image areas are exemplified by small squares 13 to 22. These image areas may be individual pixels or groups of pixels.

The acquisition images 11 and 12 show, as pixel groups representing detected objects, for example a raindrop 23, which may be present on a camera lens of the camera sensor 5, and a hand 24 of the user 3. Both the raindrop 23 and the hand 24 have moved between the two detection times t = 0 and t = 1 of the acquisition images 11 and 12. When determining and evaluating correspondences, in particular of the image regions 13 to 22 of the acquisition images 11 and 12, the procedure is as follows, in particular using the evaluation computer 8: First, the first acquisition image 11 is split into overlapping image parts. The acquisition image 11 is a digital image formed overall as an A x B pixel array. The integer values A and B, which represent the numbers of pixels in the respective rows and columns of the array, are in the range between 500 and 10,000, for example. The overlapping image parts are then C x D subpixel arrays. The integer value C is significantly smaller than the value A, and the integer value D is significantly smaller than the value B. C and D may, for example, be in the range between 8 and 30. Adjacent image parts, i.e. adjacent subpixel arrays, have at least one pixel row or at least one pixel column in common.
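The splitting step can be sketched as follows. This is a minimal illustration, assuming a grayscale image stored as a list of rows, `c` counted in rows and `d` in columns, and a hypothetical step size `step` smaller than the part size so that neighbouring parts overlap; none of these names are taken from the application.

```python
def split_into_parts(image, c, d, step):
    """Yield (row, col, part) for overlapping c x d parts of a 2D list.

    Assumption: `step` is smaller than both c and d, so adjacent parts
    share at least one pixel row or one pixel column.
    """
    if not 0 < step < min(c, d):
        raise ValueError("step must be positive and smaller than the part size")
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - c + 1, step):
        for q in range(0, cols - d + 1, step):
            part = [line[q:q + d] for line in image[r:r + c]]
            yield r, q, part


# Toy 6 x 6 image; a real acquisition image would be far larger.
image = [[10 * r + q for q in range(6)] for r in range(6)]
parts = list(split_into_parts(image, c=4, d=4, step=2))
```

With a 6 x 6 image, 4 x 4 parts and a step of 2, four parts result, each sharing two rows or two columns with its neighbour.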

After splitting into overlapping image parts, each of these image parts is assigned an image signature. By way of example, this signature is a bit sequence which represents a brightness distribution and / or a color distribution within the image part. For image signature detection and image signature assignment, each image part is split into overlapping sub-image parts. The sub-image parts may be E x F sub-subpixel arrays. The integer values E and F are smaller than the values C and D of the subpixel arrays. For example, E and F may be in the range of 3 to 7. For the respective image part and for each sub-image part of this image part, a mean gray value is determined by appropriate evaluation of the brightness and / or color values of the associated pixels with the aid of the evaluation computer 8. In addition, a tolerance deviation ε is specified. A difference is determined in each case between the determined mean sub-image gray value and the mean image part gray value. If the resulting difference is smaller than -ε, the value 0 is assigned as the first sub-image signature value. If the difference lies between the values -ε and +ε, the value 1 is assigned as the second sub-image signature value. If the difference is greater than +ε, the value 2 is assigned as the third sub-image signature value. The image part signature to be assigned to the respective image part then results from the assigned sub-image signature values. With the assignment method explained above, the respective image part signatures are determined for the two acquisition images 11 and 12. Subsequently, the image parts of the second acquisition image 12 are assigned to the image parts of the first acquisition image 11 with the same signature. This assignment results in 2D vectors, which can be understood as raw movement increments. These 2D vectors connect image parts, that is to say, for example, the image regions 13 to 22, of the two acquisition images 11, 12 with the same image signature.
Image parts without associated 2D vectors are then discarded, so that the further evaluation is limited exclusively to the assigned image parts. Now, for each remaining image part, the 2D vectors in its surroundings, in particular in a predefined pixel neighborhood, are compared and the frequency of similar vectors in this neighborhood is determined. The result of this frequency determination is the distribution density of the image areas which correspond to one another with respect to their change in position in the acquisition image.
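The signature assignment and the matching of equal signatures can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: gray values are plain numbers, the signature is kept as a tuple of the values 0, 1 and 2 rather than packed into a bit sequence, and the names `part_signature`, `raw_increments`, `step` and `eps` are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def part_signature(part, e, f, step, eps):
    """Ternary signature of an image part from its e x f sub-image parts."""
    part_mean = mean(v for row in part for v in row)
    signature = []
    for r in range(0, len(part) - e + 1, step):
        for q in range(0, len(part[0]) - f + 1, step):
            sub_mean = mean(v for row in part[r:r + e] for v in row[q:q + f])
            diff = sub_mean - part_mean
            # 0: darker than -eps, 1: within tolerance, 2: brighter than +eps
            signature.append(0 if diff < -eps else 2 if diff > eps else 1)
    return tuple(signature)


def raw_increments(first, second):
    """Pair image parts with equal signatures.

    `first` and `second` map (row, col) -> signature for the two images.
    Returns a list of ((row, col), (d_row, d_col)) raw 2D vectors.
    """
    by_signature = {}
    for position, signature in second.items():
        by_signature.setdefault(signature, []).append(position)
    vectors = []
    for (r, q), signature in first.items():
        for r2, q2 in by_signature.get(signature, []):
            vectors.append(((r, q), (r2 - r, q2 - q)))
    return vectors


# 4 x 4 part: dark left half, bright right half.
part = [[0, 0, 100, 100] for _ in range(4)]
sig = part_signature(part, e=2, f=2, step=1, eps=5)
```

Because every part of the second image with a matching signature produces a vector, this step deliberately admits false correspondences; they are thinned out by the subsequent density evaluation.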

Those 2D vectors whose distribution density is below a predefined limit are then discarded.
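The neighborhood frequency count and the subsequent thresholding can be sketched as follows. The square neighborhood, the per-component similarity tolerance and the names `local_density`, `filter_by_density`, `radius`, `tol` and `min_density` are assumptions chosen for illustration; the quadratic loop is kept for clarity rather than speed.

```python
def local_density(vectors, radius, tol):
    """Count, for each vector, the similar vectors in its neighborhood.

    `vectors` is a list of ((row, col), (d_row, d_col)). Two vectors are
    neighbours if their positions differ by at most `radius` pixels per
    axis, and similar if their increments differ by at most `tol` per
    component.
    """
    densities = []
    for (r, q), (dr, dq) in vectors:
        count = 0
        for (r2, q2), (dr2, dq2) in vectors:
            near = max(abs(r2 - r), abs(q2 - q)) <= radius
            similar = max(abs(dr2 - dr), abs(dq2 - dq)) <= tol
            if near and similar:
                count += 1
        densities.append(count - 1)  # exclude the vector itself
    return densities


def filter_by_density(vectors, radius, tol, min_density):
    """Discard vectors whose local density is below `min_density`."""
    densities = local_density(vectors, radius, tol)
    return [v for v, n in zip(vectors, densities) if n >= min_density]


vectors = [
    ((0, 0), (1, 3)),
    ((0, 2), (1, 3)),
    ((2, 0), (1, 2)),
    ((20, 20), (-5, 0)),  # isolated, likely a false correspondence
]
kept = filter_by_density(vectors, radius=4, tol=1, min_density=1)
```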

Now, a 2D frequency distribution of all remaining 2D vectors, ie a 2D frequency matrix, is calculated. Motionless image parts have a vector length 0 in both dimensions and form a central element of the distribution density. Moving parts of the picture increase the frequency of discrete 2D vectors with a certain length and direction.

The central element of the frequency distribution, including 2D vectors with a length below a given limit length, is subsequently rejected. When the camera is stationary, this results in background suppression. When the camera is moving, it is alternatively possible to suppress 2D vectors which correspond to this movement within a predetermined tolerance range.
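The 2D frequency matrix and the rejection of its central element can be sketched as follows. As a simplifying assumption, the stationary and the moving camera are handled by one function: increments closer than `limit` to a hypothetical `camera_motion` vector are dropped, and `camera_motion=(0, 0)` reproduces the stationary case. The sparse `Counter` stands in for the matrix; all names are illustrative.

```python
import math
from collections import Counter


def frequency_matrix(vectors):
    """Count how often each discrete increment (d_row, d_col) occurs."""
    return Counter(increment for _position, increment in vectors)


def suppress_background(freq, limit, camera_motion=(0, 0)):
    """Drop increments whose deviation from the camera motion is below limit."""
    kept = Counter()
    for (dr, dq), count in freq.items():
        deviation = math.hypot(dr - camera_motion[0], dq - camera_motion[1])
        if deviation >= limit:
            kept[(dr, dq)] = count
    return kept


vectors = (
    [((0, 0), (0, 0))] * 5      # motionless background
    + [((1, 1), (0, 1))]        # jitter below the limit length
    + [((5, 5), (3, 4))] * 3    # first moving object
    + [((9, 9), (-6, 0))] * 2   # second moving object
)
freq = frequency_matrix(vectors)
moving = suppress_background(freq, limit=2)
```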

A maximum frequency of a 2D vector swarm, with calculation of its center point and extent in the second acquisition image 12, is now selected. This may be the hand 24. The selection can then be continued for the next most frequent 2D vector swarm, i.e. for at least one sub-swarm. One result of this sub-swarm selection can be, for example, the raindrop 23.
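The selection of the most frequent swarm and of the next most frequent sub-swarm can be sketched as follows. The grouping rule (all vectors within `tol` of the frequency peak), the use of the standard deviation as the extent, and the names `describe_swarm`, `select_swarms` and `max_swarms` are assumptions made for this illustration.

```python
import math
from collections import Counter


def describe_swarm(members):
    """Center and extent of a swarm in the second acquisition image."""
    ends = [(r + dr, q + dq) for (r, q), (dr, dq) in members]
    n = len(ends)
    center = (sum(r for r, _ in ends) / n, sum(q for _, q in ends) / n)
    extent = (
        math.sqrt(sum((r - center[0]) ** 2 for r, _ in ends) / n),
        math.sqrt(sum((q - center[1]) ** 2 for _, q in ends) / n),
    )
    return {"size": n, "center": center, "extent": extent}


def select_swarms(vectors, tol=1, max_swarms=2):
    """Pick swarms around frequency peaks, most frequent first."""
    remaining = list(vectors)
    swarms = []
    while remaining and len(swarms) < max_swarms:
        counts = Counter(increment for _pos, increment in remaining)
        peak, _count = counts.most_common(1)[0]
        members = [
            v for v in remaining
            if max(abs(v[1][0] - peak[0]), abs(v[1][1] - peak[1])) <= tol
        ]
        swarms.append(dict(describe_swarm(members), increment=peak))
        remaining = [v for v in remaining if v not in members]
    return swarms


hand = [((r, q), (0, 5)) for r in (10, 12) for q in (10, 12)]
raindrop = [((40, q), (3, 0)) for q in (40, 42)]
swarms = select_swarms(hand + raindrop)
```

In this toy input the four "hand" vectors form the main swarm and the two "raindrop" vectors the sub-swarm.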

For further acquisition images, a linear prediction of the respective swarm center in the next image can then be carried out in order to track this 2D vector swarm. This can improve the detection accuracy and suppress interference caused by swarms overlapping each other in individual acquisition images.
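A linear prediction of the swarm center can be sketched as a constant-velocity extrapolation. Since the delay period between acquisition images may vary, the sketch scales the last displacement by the ratio of the two time intervals; the names `predict_center`, `matches_prediction`, `dt_prev`, `dt_next` and `tolerance` are hypothetical.

```python
import math


def predict_center(prev_center, curr_center, dt_prev, dt_next):
    """Extrapolate the swarm center assuming constant velocity.

    dt_prev: time between the previous and the current image.
    dt_next: time between the current and the next image.
    """
    scale = dt_next / dt_prev
    return tuple(c + (c - p) * scale for p, c in zip(prev_center, curr_center))


def matches_prediction(predicted, observed, tolerance):
    """True if the observed center lies within `tolerance` of the prediction."""
    dr, dq = observed[0] - predicted[0], observed[1] - predicted[1]
    return math.hypot(dr, dq) <= tolerance


# The swarm moved 5 pixels to the right in one time unit; the next image
# follows after two time units.
predicted = predict_center((11, 16), (11, 21), dt_prev=1, dt_next=2)
```

A swarm candidate in the next image would then be accepted for tracking only if `matches_prediction` holds for a suitably chosen tolerance.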

When determining the distribution density by comparing the acquisition images 11 and 12, it results, for example, that in the region of the raindrop 23 virtually all the pixels run along a comparable trajectory 2₂₃, which is illustrated in FIG. 3 on the basis of the movement of the uppermost image area 13. Here, the position of the image area 13 at the detection time t = 0 is shown in dashed lines in the figure.

FIG. 3 shows a typical (intermediate) result when evaluating the determined distribution density by a corresponding evaluation algorithm. For example, the correspondence determination for the image areas 19 to 22 assigned to the hand 24 has yielded both true correspondences (movement of the image areas 21 and 22) and false correspondences (movement of the image areas 19 and 20). Together with other image areas that can be assigned to the hand 24, which are not shown in FIGS. 2 and 3, there is an increased distribution density of image areas that correspond with the image areas 21 and 22 with respect to their positional change in the acquisition image 12.

Also shown are further detected pixel groups, reproduced as objects in the acquisition images 11 and 12 by associated image areas 14 to 18, together with the corresponding image areas 14 to 18 resulting from the evaluation of the distribution density after the delay period, i.e. at time t = 1 (cf. FIG. 3). From the resulting movement paths or movement increments 2ᵢ (i = 13 to 22) of the image areas 13 to 22, it can be concluded, when evaluating the respectively associated distribution density of the image areas, whether the resulting movement paths 2ᵢ can be real or not. The result of the evaluation is an assignment of individual pixels, from pixel groups evaluated with respect to their distribution density, with an associated movement increment between the acquisition images 11, 12, on the basis of the evaluated distribution density. The evaluation of the acquisition images 11 and 12 yields respectively assigned pixels for the objects "raindrop" and "hand" with the actual trajectories 2₂₃ for the raindrop 23 and 2₂₁ and 2₂₂ for the hand 24. The pixel movements assigned to the assigned pixels 13, 21, 22 and the associated movement increments 2₂₃ and 2₂₁, 2₂₂ can then be evaluated. When determining the distribution density, as explained above, selected sections of the acquisition images 11, 12 which differ between the acquisition images 11, 12 are detected. In the region of the raindrop 23 and in the region of the hand 24, a higher-resolution determination and evaluation of correspondences of the image regions therefore takes place. When evaluating the distribution density, averaging methods and statistical methods are used.

The determination and evaluation of correspondences can, of course, be carried out on the basis of a sequence of individual images of a larger number, for example using a sequence of three, four, five, six, eight, ten, twenty-five, fifty, one hundred or even more individual images.

For pure recognition of the trajectory 2 no gesture model is required.

As already demonstrated by means of the example "Raindrop 23" and "Hand 24", the recognition method makes it possible to detect the trajectories of several independent objects. These can also be more than two independent objects (for example, three, four, five, ten, or even more independent objects).

In the imaging acquisition of the detection area 4, a predefined depth range T, that is to say a range of predetermined distances within which objects, for example the user 3, are to be detected, can be defined. As a depth range, for example, a distance range from the camera sensor 5 of between 0.5 m and 3 m or between 1 m and 2.5 m can be specified. A more tolerant or more specific specification of a depth range is also possible. The predetermined depth range can be defined by means of a depth sensor. Techniques known under the keywords "structured light" and "TOF" can be used for this purpose. A stereo imaging method with two camera sensors can also be used to define the depth range. A light field, ultrasound or radar radiation can likewise be used for this purpose. The depth of field of the optical attachment 6 can also be used to define the depth range T. In this case, for example, autofocus techniques can be used. As soon as the depth of the detected object 3, i.e. its distance from the camera sensor 5, is known with the aid of such a method, it is also possible, after detection of the movement path 2, to measure and indicate a speed of the detected object in its movement.

The depth range can also be defined by setting a lighting intensity of an illumination of the detection area by means of a light source 25 at an exposure time during the imaging acquisition. The light source 25 is in signal connection with the camera sensor 5 and / or the evaluation computer 8 via a signal connection which is not shown. As an alternative or in addition to an illuminance setting, a temporal variation of an illumination period during illumination with the light source 25 in relation to the exposure time of the camera sensor 5 during the imaging acquisition can also be used to define the depth range.

The above-described trajectory recognition method can be used within a method of gesture recognition.

In this case, a plurality of model pixel movements or model object movements are provided as control symbols, and these model pixel movements are compared with the pixel movements which were evaluated by the movement path recognition method. Subsequently, the model pixel movement which has the greatest agreement with the evaluated pixel movement is identified as the selected control symbol. Finally, a control action associated with the selected control symbol is performed. In this gesture recognition method, techniques known in the art as "template matching" and "inverse kinematics" may be used.
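The comparison of an evaluated pixel movement with the provided model pixel movements can be sketched as a simple nearest-template search. This is only one possible similarity measure, chosen here as an assumption: both paths are resampled to the same number of points, centered, scaled, and compared by mean point distance. The names `resample`, `normalize`, `path_distance`, `best_symbol` and the template dictionary are hypothetical.

```python
import math


def resample(path, n=16):
    """Return n points spaced evenly along the polyline `path`."""
    lengths = [math.dist(a, b) for a, b in zip(path, path[1:])]
    total = sum(lengths)
    if total == 0:
        return [path[0]] * n
    out, seg, travelled = [], 0, 0.0
    for k in range(n):
        target = total * k / (n - 1)
        while seg < len(lengths) - 1 and travelled + lengths[seg] < target:
            travelled += lengths[seg]
            seg += 1
        t = min((target - travelled) / lengths[seg], 1.0) if lengths[seg] else 0.0
        (x0, y0), (x1, y1) = path[seg], path[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def normalize(points):
    """Center the points on their centroid and scale them to unit size."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    scale = max(max(abs(x - cx), abs(y - cy)) for x, y in points) or 1.0
    return [((x - cx) / scale, (y - cy) / scale) for x, y in points]


def path_distance(a, b, n=16):
    """Mean distance between corresponding points of two normalized paths."""
    pa, pb = normalize(resample(a, n)), normalize(resample(b, n))
    return sum(math.dist(p, q) for p, q in zip(pa, pb)) / n


def best_symbol(path, templates):
    """Name of the model movement with the greatest agreement."""
    return min(templates, key=lambda name: path_distance(path, templates[name]))


# Image coordinates (x to the right, y downward) are assumed.
templates = {
    "left_to_right": [(0, 0), (1, 0)],
    "right_to_left": [(1, 0), (0, 0)],
    "top_to_bottom": [(0, 0), (0, 1)],
    "bottom_to_top": [(0, 1), (0, 0)],
}
observed = [(100, 50), (140, 52), (180, 49), (230, 51)]
symbol = best_symbol(observed, templates)
```

Because the paths are normalized in position and size, only the shape and direction of the path decide the match, in line with the remark that the path shape rather than the object position determines the gesture.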

The model pixel movements may include at least one of the following motion patterns:

Movement across the detection area from left to right;

Movement across the detection area from right to left;

Movement across the detection area from top to bottom;

Movement across the detection area from bottom to top;

Movement over the detection area in heart shape;

Movement over the detection area in Z-shape;

Movement over the detection area in circular form;

Closing an open hand into a fist;

No movement.

The control action may include specifying an input area 26 within the entire detection area 4 by means of an area specification gesture. This area specification gesture may be performed, for example, by a circular motion of an open or closed hand. The person 3 can thereby define, within the entire detection area 4, the input area 26, which is subsequently detected by the camera sensor 5 at high resolution. For this purpose, the attachment optics 6 can be designed, for example, as a zoom lens. Further, more detailed inputs can then be made in the input area 26 defined in this way. Within the input area 26, an input raster, for example a keyboard layout, can be projected by means of a projection module or a projector device 27. The user can then operate a keyboard projected into the detection area 4 with the projector device 27, and this operation is in turn detected, recognized and evaluated via the camera sensor 5.

The gesture recognition and subsequent gesture control can, in particular, work without distinguishing between different movement path models for symbol gestures. This is explained below with reference to a further example:

From the sequence of movement increments determined as explained above in connection with FIGS. 2 and 3, a circular trajectory is detected. This circular trajectory serves as a circle symbol for activating the gesture control. To activate the system, it is only necessary to distinguish between the result states "circle" and "non-circle" in the movement increment tracking of a 2D vector main swarm. This involves evaluating the movement increments and then assigning them to one of the result states "circle" or "non-circle". The associated circle-symbol gesture then represents a "point to unlock" gesture. All 2D vectors in a neighborhood of the second highest frequency of the vector distribution density describe a vector swarm, which can be characterized by the mean 2D vector length as well as by a mean value and a standard deviation of the positions of the respective swarm vectors in the subsequent image. The mean 2D vector length describes the movement increment. The mean of the vector positions describes a center of the swarm. The position standard deviations are a measure of the size of the swarm.
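The three swarm quantities named above can be computed as in the following minimal sketch. It assumes that the vectors belonging to the swarm have already been selected from the vector distribution density, and that each swarm entry is a pair of position and 2D vector; the function name `describe_swarm` and the dictionary keys are hypothetical.

```python
import math
from statistics import fmean, pstdev


def describe_swarm(swarm):
    """Summarize a vector swarm given as [((x, y), (dx, dy)), ...].

    The positions are those of the swarm vectors in the subsequent image.
    """
    lengths = [math.hypot(dx, dy) for _, (dx, dy) in swarm]
    xs = [x for (x, _), _ in swarm]
    ys = [y for (_, y), _ in swarm]
    return {
        # Mean 2D vector length: the movement increment of the swarm.
        "movement_increment": fmean(lengths),
        # Mean of the vector positions: the center of the swarm.
        "center": (fmean(xs), fmean(ys)),
        # Standard deviation of the positions: a measure of the swarm size.
        "size": (pstdev(xs), pstdev(ys)),
    }


# Hypothetical swarm of four vectors.
swarm = [
    ((10, 20), (3, 4)),
    ((12, 20), (3, 4)),
    ((10, 22), (6, 8)),
    ((12, 22), (6, 8)),
]
summary = describe_swarm(swarm)
print(summary)
```

The population standard deviation is used here for simplicity; the description does not state which estimator is meant.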

From the detected circular trajectory, the gesture control then derives a polar coordinate system in the detection image, having a center and a reference radius. The gesture control divides this polar coordinate system into eight sectors, to which, as in cartography, the compass directions N, NE, E, SE, S, SW, W and NW can be assigned.
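The assignment of a swarm center to one of the eight sectors can be sketched as follows. The sketch assumes image coordinates with the y axis pointing downward, so that "N" corresponds to the upward direction in the image, and it assumes that each sector spans 45 degrees centered on its compass direction. The function name `sector_of` is hypothetical.

```python
import math

SECTORS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")


def sector_of(center, point):
    """Return the compass sector of `point` relative to `center`.

    Both arguments are (x, y) image coordinates with y pointing downward.
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    # Compass bearing: 0 degrees is up in the image (N), increasing clockwise.
    bearing = math.degrees(math.atan2(dx, -dy)) % 360.0
    # Shift by half a sector so that each sector is centered on its direction.
    index = int(((bearing + 22.5) % 360.0) // 45.0)
    return SECTORS[index]


center = (100, 100)
print(sector_of(center, (100, 50)))   # above the center
print(sector_of(center, (150, 100)))  # right of the center
print(sector_of(center, (50, 150)))   # below and left of the center
```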

An outer boundary ring with 1.5 times the reference radius is defined around the detected reference radius.

If a detected swarm center leaves this ring, or if no swarm movement is detected for a prolonged time, the gesture control interprets this as a deactivation of the gesture. If a rotation of the swarm within the ring is detected, a clockwise rotation can be interpreted, for example, as an increase of an intensity signal desired by the operator and, conversely, a counterclockwise rotation as a reduction of the desired intensity signal. In this way, for example, the volume of a terminal operated via the gesture control can be controlled by corresponding rotational gestures.
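The ring check and the rotation direction can be illustrated with the following minimal sketch. It interprets "leaving the ring" as the swarm center moving farther from the center than 1.5 times the reference radius, uses image coordinates with the y axis pointing downward, and omits the time-out for missing swarm movement; these are assumptions for illustration, as are the function name `interpret_swarm_step` and the threshold `min_angle_deg`.

```python
import math

OUTER_RING_FACTOR = 1.5  # outer boundary ring: 1.5 times the reference radius


def interpret_swarm_step(center, reference_radius, previous, current,
                         min_angle_deg=1.0):
    """Interpret the step of a swarm center from `previous` to `current`.

    Returns "deactivate", "increase", "decrease" or "hold".
    """
    px, py = previous[0] - center[0], previous[1] - center[1]
    cx, cy = current[0] - center[0], current[1] - center[1]
    if math.hypot(cx, cy) > OUTER_RING_FACTOR * reference_radius:
        return "deactivate"
    # Signed angle between the two radial vectors. With y pointing downward,
    # a positive angle corresponds to a clockwise rotation as seen in the image.
    cross = px * cy - py * cx
    dot = px * cx + py * cy
    angle = math.degrees(math.atan2(cross, dot))
    if angle > min_angle_deg:
        return "increase"   # clockwise rotation
    if angle < -min_angle_deg:
        return "decrease"   # counterclockwise rotation
    return "hold"


center, radius = (0, 0), 100
print(interpret_swarm_step(center, radius, (100, 0), (0, 100)))
print(interpret_swarm_step(center, radius, (100, 0), (0, -100)))
print(interpret_swarm_step(center, radius, (100, 0), (200, 0)))
```

In a volume control, each "increase" or "decrease" result could raise or lower the volume by a step proportional to the rotation angle.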

Depending on whether the swarm is detected in a specific one of the eight sectors, a specific signal can be triggered. A shift of the swarm into certain sectors can thus trigger associated signals. For example, a switching signal may be triggered by shifting the swarm into a particular sector and holding it in that position. In this way, a control operation similar to operating a touchpad can be performed.

The original, initializing circle-symbol gesture can therefore be used to define a type of keyboard in space, via which the user can trigger desired control signals. Each of the sectors discussed above may then represent a key of that keyboard.

The triggering of desired control signals after the circle symbol initialization is also called "Point to Control".

In the gesture recognition, face recognition may be performed prior to the comparison step, as a prerequisite for performing the further steps of the gesture recognition. Depending on the result of the face recognition, a selection of the provided model pixel movements can take place. As a result, a profile of model pixel movements can be assigned to the user recognized in each case via the face recognition. User profiles can thus be specified.

Claims

1. A method for detecting a movement path (2, 2j) of at least one moving object (3, 23, 24) within a detection area (4), comprising the following steps:

Imaging the detection area (4) and generating a first detection image (11) reproducing the detection area (4) at a first detection time,

Imaging the detection area (4) and generating a second detection image (12) reproducing the detection area (4) at a second detection time that is later by a delay period,

Determining and evaluating correspondences of image areas (13 to 22) of the detection images (11, 12), comprising the following steps:

Comparing the detection images (11, 12),

Determining a distribution density of image areas (13 to 22) that correspond to one another with respect to their change in position in the detection image (11, 12),

Evaluating the determined distribution density,

Assigning at least one pixel of a pixel group (23, 24) and/or an image area (13 to 22) with an associated movement increment (2₂₃; 2₂₁, 2₂₂) between the detection images (11, 12) on the basis of the evaluated distribution density,

Evaluating pixel movements assigned to the associated pixel and its movement increment (2₂₃; 2₂₁, 2₂₂).

2. A method according to claim 1, characterized in that, in the imaging detection of the detection area (4), a predetermined depth range (T) is defined, i.e. a range of predetermined distances within which pixels are to be detected.

3. A method according to claim 2, characterized in that the depth range (T) is defined by matching an illuminance of an illumination of the detection area (4) to an exposure time of the imaging detection.

4. A method for recognizing a gesture using a detection method according to one of claims 1 to 3, with the following further steps:

Providing a plurality of model pixel movements as control symbols,

Comparing the pixel movements evaluated with the detection method with the model pixel movements,

Identifying the model pixel movement that has the greatest agreement with the evaluated pixel movement as the selected control symbol,

Performing a control action associated with the selected control symbol.

5. A method according to claim 4, characterized in that the model pixel movements comprise at least one of the following movement patterns:

Movement across the detection area from left to right;

Movement across the detection area from right to left;

Movement across the detection area from top to bottom;

Movement across the detection area from bottom to top;

Movement over the detection area in heart shape;

Movement over the detection area in Z-shape;

Movement over the detection area in circular form;

Closing an open hand into a fist;

No movement.

6. The method according to claim 4 or 5, characterized in that the control action includes the specification of an input area (26) within the detection area (4) by an area specification gesture.

7. The method according to any one of claims 4 to 6, characterized in that face recognition is performed prior to the comparing.

8. The method according to claim 7, characterized in that, depending on the result of the face recognition, a selection of the provided model pixel movements takes place.

9. Device (1) for carrying out a method according to one of claims 1 to 8,

with a monocular camera sensor (5),

with an evaluation computer (8) which is in signal connection (7) with the camera sensor (5).

10. Apparatus according to claim 9, characterized by a projector device (27).