EP4278329A1

EP4278329A1 - Method and system for recognizing objects, which are represented in an image by means of a point cloud

Info

Publication number: EP4278329A1
Application number: EP21843940.4A
Authority: EP
Inventors: Niklas HERMES; Cornelius REINFELDT
Original assignee: Gestigon GmbH
Current assignee: Gestigon GmbH
Priority date: 2021-01-13
Filing date: 2021-12-21
Publication date: 2023-11-22
Also published as: WO2022152522A1; DE102021100512A1; CN116888637A

Abstract

A method for recognizing one or more objects, which are represented in an image by means of an M-dimensional point cloud, with M > 1, composed of a plurality n of points, comprises: determining, for each of a number m, with m > 0, of specific one-dimensional variables, an associated value of the variable for each of the points on the basis of the position or properties of the point; determining, for each of the variables, a frequency distribution with respect to the respective values of said variable which were determined for the various points; approximating each of the frequency distributions by means of a linear combination of a finite number of one-dimensional probability density functions associated with the variable in question; segmenting the image such that, in the case of m = 1, each of the probability density functions and, in the case of m > 1, each product of m probability density functions, one of the associated probability density functions per variable being represented in the product, is uniquely assigned a segment of the image; assigning each point of the point cloud to the segment, the probability density function associated with which, in the case of m = 1, or the product associated with which, in the case of m > 1, has, at the location which is determined by the values of the m variables which are assigned to the point, relatively the greatest function value among the probability density functions or relatively the greatest product value among the products; and identifying, as a representative of an associated recognized object, at least one of the segments to which at least a predefined minimum number of points was assigned. A corresponding device and a computer program are designed to carry out the method.

Description

METHOD AND SYSTEM FOR RECOGNIZING OBJECTS REPRESENTED IN AN IMAGE BY A CLOUD OF POINTS

The present invention relates to a method and a system for recognizing one or more objects that are represented in an image or in corresponding image data using a point cloud.

In many different technical applications, the task arises of analyzing image data, i.e. data representing an image or a sequence of images, such as a video, to determine whether and, if so, which objects are depicted in the image(s). The detection of movements or changes in such objects on the basis of such images or image data is also regularly of interest.

In addition to the known methods of photography or the recording of "moving images", such as video recordings, the methods for generating images or image data also include methods of scanning, in particular discretely, a real scene with one or more associated real objects (e.g. people or things), where the resulting image data represents a two- or three-dimensional point cloud. Such scanning can be carried out in particular with image sensors that also scan a scene in the depth dimension. Examples of such image sensors are, in particular, stereo cameras, time-of-flight (TOF) sensors, and electro-optical distance sensors (laser range finders, LRF). Alternatively, such point clouds can also be generated by radar, lidar or ultrasonic sensors. Such point clouds can also be generated artificially, without a real scene having to be recorded by sensors, in particular in a computer-aided manner as part of or as the result of simulations, in particular simulations of real scenes.

In some applications, it may be necessary to segment such a point cloud (in the sense of image processing) in order to be able to distinguish or separate different image areas or areas of the point cloud as segments (i.e. image segments), for example in order to separate an image foreground from an image background.

A simple known method for such a foreground/background segmentation of an image given by a point cloud is to evaluate the depth information of the points of the point cloud by means of a threshold value method: all points which, according to their depth information, are closer than a specific depth threshold are assigned to the image foreground, while all other points are assigned to the image background.
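The known threshold value method described above can be sketched as follows. This is only a minimal illustration: the function name `split_by_depth`, the sample points and the threshold of 0.70 are assumptions made for the example and do not come from the source.

```python
def split_by_depth(points, z_threshold):
    """Split (x, y, z) points into foreground and background by a depth threshold."""
    # Points closer than the threshold (smaller z) form the image foreground.
    foreground = [p for p in points if p[2] < z_threshold]
    # All other points are assigned to the image background.
    background = [p for p in points if p[2] >= z_threshold]
    return foreground, background


# Hypothetical sample points (x, y, z), with z as the depth value.
points = [(0.0, 0.0, 0.40), (0.1, 0.0, 0.42), (0.0, 0.2, 0.95), (0.3, 0.1, 1.00)]
fg, bg = split_by_depth(points, z_threshold=0.70)
print(len(fg), len(bg))
```

Such a fixed threshold only works if the depth ranges of the objects do not overlap, which is the limitation discussed in the following paragraphs.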

If a scene represented by the point cloud contains, for example, two different objects, a separation of the two objects in the image or in the point cloud can also be achieved in this way.

However, such a method reaches its limits when the objects are close together, in particular in such a way that they overlap in each spatial dimension considered, so that the individual point clouds representing the objects merge into a common point cloud without any clearly recognizable separation.

The object of the present invention is to further improve the recognition of one or more objects that are represented in an image or in corresponding image data using a cloud of points. In particular, it is desirable to achieve improved separability of different objects.

The solution to this problem is achieved according to the teaching of the independent claims. Various embodiments and developments of the invention are the subject matter of the dependent claims.

A first aspect of the invention relates to a method, in particular a computer-implemented method, for recognizing one or more objects represented in an image by means of an M-dimensional point cloud, with M>1, composed of a plurality n of points, the method comprising: (i) determining, for each of a number m, with m>0, of specific one-dimensional quantities, a respective associated value of the quantity for each of the points on the basis of its position or properties; (ii) determining, for each of the quantities, a respective frequency distribution with respect to the values of this quantity determined for the various points; (iii) approximating each of the frequency distributions by means of a respective linear combination of a finite number of one-dimensional probability density functions associated with the underlying quantity; (iv) segmenting the image such that, in the case of m=1, each of the probability density functions and, in the case of m>1, each product of m probability density functions, in which one, in particular precisely one, of the associated probability density functions is represented per quantity, is uniquely assigned a respective segment of the image; (v) assigning each point of the point cloud to that segment whose assigned probability density function, in the case of m=1, or whose assigned product, in the case of m>1, has, at the location determined by the values of the m quantities assigned to the point, the relatively largest function value among the probability density functions or the relatively largest product value among the products; and (vi) identifying at least one of those segments to which at least a predetermined minimum number of points has been assigned as a representative of a respective recognized object.
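Steps (iv) to (vi) can be illustrated with a short sketch, assuming that the approximation of step (iii) has already produced scaled Gaussian components for each quantity. All names (`gaussian`, `assign_segments`, `recognized_segments`), the component parameters and the sample values (a depth and a temperature per point, i.e. m=2) are hypothetical and chosen only for illustration; the source does not prescribe this implementation.

```python
import math
from collections import Counter
from itertools import product


def gaussian(x, weight, mean, sigma):
    """Scaled Gaussian, used here as a generalized one-dimensional density function."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return weight * math.exp(-0.5 * ((x - mean) / sigma) ** 2) / norm


def assign_segments(values, components):
    """Steps (iv) and (v): one segment per combination of one component per quantity.

    values: one tuple of m quantity values per point.
    components: for each quantity, a list of (weight, mean, sigma) tuples.
    Returns, per point, the index tuple of the combination with the largest product.
    """
    segments = list(product(*(range(len(c)) for c in components)))

    def product_value(point_values, segment):
        return math.prod(
            gaussian(v, *components[q][segment[q]])
            for q, v in enumerate(point_values)
        )

    return [max(segments, key=lambda s: product_value(pv, s)) for pv in values]


def recognized_segments(labels, min_points):
    """Step (vi): segments to which at least min_points points were assigned."""
    counts = Counter(labels)
    return {segment for segment, n in counts.items() if n >= min_points}


# Assumed components: quantity 0 is a depth, quantity 1 a temperature.
components = [
    [(0.5, 0.40, 0.05), (0.5, 0.60, 0.05)],
    [(0.5, 34.0, 1.0), (0.5, 22.0, 1.0)],
]
values = [(0.41, 33.5), (0.43, 34.2), (0.58, 22.3), (0.61, 21.8), (0.49, 33.9)]
labels = assign_segments(values, components)
print(labels)
print(recognized_segments(labels, min_points=2))
```

For m=1 the same functions apply with a single list of components, so that each segment corresponds to exactly one density function.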

A “cloud of points” in the sense of the invention is a set of points of a vector space of any given dimension M>1 (unless restricted to specific dimensions below for individual embodiments), which can in particular have an organized or an unorganized spatial structure. A point cloud is described by the points it contains, each of which can be recorded in particular by its position specified using spatial coordinates. In addition, attributes can be recorded for the points, such as geometric normals, color values, temperature values, recording times, measurement accuracies or other information.

A “one-dimensional quantity” within the meaning of the invention is to be understood as any selected quantity that can be completely determined one-dimensionally, i.e. as a number (with or without a unit), and that characterizes a property of a point in a point cloud. In particular, the property can be position information, such as a spatial coordinate, or an attribute of the point, or can be derived therefrom. In the case of position information, the quantity can correspond in particular, but without being limited thereto, to an assignment of the position to a specific point on a directional line (e.g. a coordinate axis). In another example, however, it could also correspond to a distance of the respective point of the point cloud from a specific reference point, so that, for example, points lying concentrically at the same distance from this reference point have the same value for the quantity.

Let X be a continuous random variable (here a continuous variable representing one of the one-dimensional quantities). A “one-dimensional probability density function” within the meaning of the invention is then to be understood as a mathematical function f(x) of the one-dimensional random variable X for which the following applies: $f(x) \ge 0$ for all x and $\int_a^b f(x)\,dx = c \cdot P(a < X \le b)$ with a scaling factor $c > 0$, where $P(a < X \le b)$ stands for the probability or actual frequency of the occurrence of a value for x from the value interval ]a;b] specified by a and b. Especially for the value c = 1 of the scaling factor c, this definition of f(x) agrees with the usual mathematical definition of a probability density function of a one-dimensional continuous random variable. Compared with that usual definition, the concept of a “one-dimensional probability density function” within the meaning of the invention is therefore generalized, since c can also assume values other than 1 here.

A “segment” of an image (or of a point cloud) in the sense of the invention is a region of the image (or of the point cloud) that is contiguous in terms of content and is defined by combining adjacent pixels (or points of a point cloud) according to a specific homogeneity criterion. The homogeneity criterion can relate in particular to a position or coordinate or to an attribute of the points, without being limited thereto. The coherence of the region can thus in some cases be understood in particular spatially, while in other cases it can relate in particular to points having the same or similar attributes in the sense of the homogeneity criterion.

As used herein, the terms "comprises," "includes," "has," "having," or any other variant thereof are intended to cover non-exclusive inclusion. For example, a method or apparatus that includes or has a list of elements is not necessarily limited to those elements, but may include other elements that are not expressly listed or that are inherent in such method or apparatus.

Further, unless expressly stated to the contrary, "or" refers to an inclusive or and not to an exclusive "or". For example, a condition A or B is satisfied by any one of the following conditions: A is true (or present) and B is false (or absent), A is false (or absent) and B is true (or present), and both A and B are true (or present). As used herein, the terms "a" or "an" are defined to mean "one or more". The terms "another" and "a further" and any other variant thereof shall be construed to mean "at least one other".

The term "plurality" as used herein means "two or more".

The term "configured" or "set up" to perform a specific function (and respective modifications thereof) is to be understood within the meaning of the invention to mean that the corresponding device is either already in a configuration or setting in which it can perform the function, or is at least adjustable - i.e. configurable - so that it can carry out the function after appropriate setting. The configuration can take place, for example, via a corresponding setting of parameters of a process flow or of switches or the like for activating or deactivating functionalities or settings. In particular, the device can have a plurality of predetermined configurations or operating modes, so that the configuration can take place by selecting one of these configurations or operating modes.

The aforementioned method according to the first aspect is therefore based in particular on describing the point cloud using one or more selected one-dimensional quantities that characterize each point of the point cloud on the basis of its position or properties, and on approximating a frequency distribution of the values of the respective quantity by means of one-dimensional probability density functions (in the sense of approximation or curve fitting). On the basis of this approximation, in particular of the respective function values of the various probability density functions at the values of the respective quantity associated with a point under consideration, this point can then be unambiguously assigned to a segment of the image or the point cloud. In many cases, this is possible even if the point cloud portions of different objects, or of one object and the image background, are close to each other. This can be used in particular to separate the images of multiple objects represented by a point cloud from one another. In particular, the accuracy of the separation can be increased and the error rate reduced. Particularly high accuracies or low error rates can be achieved in the case of m>1, since different, mutually independent quantities then interact to provide even stricter separation criteria for assigning the points to an image segment and thus, where applicable, to an associated object. In many cases, it is also possible to separate well from one another images of objects which, if only one quantity were used, could not be separated or could only be separated with a higher error rate with regard to the assignment of points.

Preferred embodiments of the method are described below, each of which, unless expressly excluded or technically impossible, can be combined as desired with one another and with the other aspects of the invention described further below.

In some embodiments for the case m=1, the points of the point cloud are each assigned to a segment (segmentation criterion) in such a way that each point to be assigned is assigned to a segment of the image depending on the result of a comparison of the value of the quantity for this point with at least one threshold value. At least one of the threshold values is defined as a function of a value of the quantity at which one of the intersection points of at least two of these probability density functions occurs, such that the threshold value corresponds to the value of the quantity at this intersection point.

This procedure can also be illustrated in particular as follows: in the M-dimensional space in which the point cloud is defined, the threshold value is used to define a separation line in the case of M=2, a separation plane in the case of M=3 and a separation hyperplane in the case of M>3, which separates the points to be assigned to the different segments. If there are more than two segments and therefore two or more different threshold values, then several such separation lines or (hyper)planes occur.

The above-mentioned segmentation criterion can thus be defined in a simple manner and used efficiently, without great computational effort, to allocate the individual points to a segment in each case. Defining the threshold value(s) as a function of the point(s) of intersection of the probability density functions is particularly advantageous with regard to the goal of an assignment that is as reliable as possible (with few or no errors). Namely, if the probability density functions for the linear combination are determined by the approximation in such a way that they each approximate well the frequency distribution of the quantity for a specific object, then their integral over a specific value interval containing the value of the quantity for a certain point can be associated with a respective probability that the point belongs to the object approximated by the respective probability density function. Thus, if a point is assigned to a particular segment on the basis of its value of the quantity as a result of the comparison with the threshold value, this means that it has a higher probability of belonging to the object associated with this segment than to the other object whose associated segment is separated from this segment by means of the threshold value.
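A threshold value at the intersection point of two density functions can be found numerically, for example by bisection. The sketch below assumes two scaled Gaussian components given as (weight, mean, sigma) tuples, that the first component has the smaller mean, and that exactly one crossing lies between the two means. The function names and parameter values are hypothetical.

```python
import math


def gaussian(x, weight, mean, sigma):
    """Scaled Gaussian, used here as a generalized one-dimensional density function."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return weight * math.exp(-0.5 * ((x - mean) / sigma) ** 2) / norm


def intersection_threshold(comp_a, comp_b, iterations=60):
    """Bisection between the two means for the value where both functions are equal.

    Assumes comp_a has the smaller mean and dominates at its own mean,
    while comp_b dominates at its own mean.
    """
    lo, hi = comp_a[1], comp_b[1]
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gaussian(mid, *comp_a) > gaussian(mid, *comp_b):
            lo = mid  # crossing lies to the right of mid
        else:
            hi = mid  # crossing lies at or to the left of mid
    return 0.5 * (lo + hi)


# Assumed components for the depth quantity (m = 1).
near = (0.5, 0.40, 0.05)
far = (0.5, 0.60, 0.05)
threshold = intersection_threshold(near, far)

# Comparing each depth value with the threshold yields the segment index.
depths = [0.38, 0.47, 0.52, 0.63]
segments = [0 if z < threshold else 1 for z in depths]
print(round(threshold, 6), segments)
```

With equal weights and widths the crossing lies midway between the means; with unequal parameters it shifts towards the narrower or weaker component.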

In some embodiments, at least one of the m quantities indicates, for each of the points in the point cloud, the position of this point projected onto a selected fixed spatial direction, i.e. its position along this spatial direction. In this way, in particular, a separation of different objects or of object and background on the basis of the spatial position of the points (along the spatial direction) is made possible. This can be used, for example, to achieve a segmentation of the image or point cloud in a two- or three-dimensional point cloud (M ∈ {2;3}) with depth dimension z on the basis of the depth information given by the point positions, in particular also in the sense of a foreground/background segmentation. The spatial direction can in particular correspond to the direction of a coordinate axis of a coordinate system used to define the positions of the points in the M-dimensional space.

In some embodiments, the fixed spatial direction is selected to be orthogonal to a first principal component resulting from a principal component analysis applied to the point cloud. This is particularly advantageous for the detection of objects that are to be separated from the background or from other objects with regard to a spatial direction that does not coincide with the direction of the first principal component and preferably is even, at least essentially, perpendicular thereto. Since the first principal component from a principal component analysis represents the dominant component for objects that are not spherically symmetric, it is consequently particularly easy to separate those objects whose dominant component runs at least largely transversely to the fixed spatial direction under consideration. If, for example, the selected fixed spatial direction corresponds to the depth direction (e.g. "z" direction) of a depth image, then an arm that is shown transverse to the depth direction in the image, and whose principal component corresponding to the longitudinal direction of the arm thus also runs transverse (e.g. in the x- or y-direction orthogonal to the z-direction) to the selected fixed spatial direction, can be recognized or separated particularly well. Specifically, in some embodiments for which M ∈ {2;3} applies, the fixed spatial direction can be selected such that it corresponds, in the case of M=2, to the second principal component and, in the case of M=3, to the third principal component resulting from the principal component analysis. The least dominant of the principal components is thus selected as the fixed spatial direction, so that objects whose more dominant first or second principal components are transverse, in particular orthogonal, to the fixed spatial direction can be recognized or separated particularly well.
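For the case M=2, the least dominant principal component can be computed in closed form from the 2x2 covariance matrix. The following sketch is an illustration only: the function names and the sample data (two parallel rows of points elongated along x) are assumptions, and for M=3 an eigendecomposition of the 3x3 covariance matrix would be needed instead.

```python
import math


def least_dominant_direction_2d(points):
    """Unit vector of the second principal component of a 2D point set."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Orientation angle of the first principal component (largest variance).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # The second principal component is orthogonal to the first one.
    return (-math.sin(theta), math.cos(theta))


def project(points, direction):
    """Position of each point along the given unit direction (scalar product)."""
    return [p[0] * direction[0] + p[1] * direction[1] for p in points]


# Hypothetical data: two rows elongated along x, separated along y.
points = [(float(x), 0.0) for x in range(10)] + [(float(x), 1.0) for x in range(10)]
direction = least_dominant_direction_2d(points)
values = project(points, direction)
print(direction, sorted(set(round(v, 9) for v in values)))
```

In this example the projected values collapse onto two clearly separated positions, which is the situation in which a one-dimensional frequency distribution separates the two rows well.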

In some embodiments, the method further includes: filtering the image such that, after filtering, it only contains those points of the point cloud that have been assigned to one of the segments that have each been identified as representing a respective recognized object. In this way, a filter function can be implemented in particular, which has the effect that only the object or objects of interest are recognized or identified, while other objects or the image background are at least largely ignored (except for those points that may have been mistakenly assigned to the object or objects of interest).

Specifically, in some of these embodiments, the image can be filtered in such a way that, after filtering, it only contains those points of the point cloud that have been assigned exactly to a specific selected one of those segments that has been identified as representing an assigned recognized object. A result can thus be achieved in which at most or in particular only exactly one single object is identified.
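The two filtering variants described above can be sketched as follows, assuming that every point already carries a segment label. The function name `filter_points`, the labels and the minimum number of points are hypothetical values chosen for the example.

```python
from collections import Counter


def filter_points(points, labels, min_points, selected=None):
    """Keep only points of segments that reached min_points.

    If selected is given, keep only the points of that one segment,
    provided it was identified as representing a recognized object.
    """
    counts = Counter(labels)
    recognized = {segment for segment, n in counts.items() if n >= min_points}
    if selected is not None:
        recognized &= {selected}
    return [p for p, segment in zip(points, labels) if segment in recognized]


# Hypothetical points with previously assigned segment labels.
points = [(0, 0, 0.40), (1, 0, 0.41), (0, 1, 0.42), (5, 5, 0.60), (6, 5, 0.61), (9, 9, 0.90)]
labels = [0, 0, 0, 1, 1, 2]

all_objects = filter_points(points, labels, min_points=2)
single_object = filter_points(points, labels, min_points=2, selected=0)
print(len(all_objects), len(single_object))
```

Segment 2 contains only one point and is therefore dropped in both variants.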

In some embodiments in which, for m=1, the quantity indicates for each of the points of the point cloud the position of this point projected onto a selected fixed spatial direction, i.e. its position along this spatial direction, that segment is selected, from the set of segments identified as representing a respective recognized object, whose assigned points are, on average, closer than the points assigned to any other of the identified segments, according to their positions projected onto the selected fixed spatial direction and viewed in the viewing direction along this spatial direction. This can be used advantageously in particular for the purpose of foreground/background segmentation if only one (or the) foremost object is to be recognized as the foreground.

In some embodiments, m>1 applies and at least one of the m quantities indicates a temperature value or a color value for each of the points of the point cloud. Another of the m quantities can relate in particular to the position of the respective point. In this way, a particularly reliable, i.e. selective, segmentation can be achieved in particular if the object(s) to be identified typically have a surface temperature that deviates from their ambient temperature, as is usually the case with living objects, in particular people or animals.

In some embodiments, output data is generated (and preferably output, in particular via an interface) that represents the result of the assignment of the points to segments or the identification of at least one recognized object in one or more of the following ways: (i) the output data represent, for at least one of the objects, an image of this object based on one or more, in particular all, of those points in the point cloud which have been assigned to the segment belonging to this object; (ii) the output data represents information indicating how many different objects were recognized by the segment assignment of the points in the image; (iii) the output data represent information which indicates to which respective segment or object the points were assigned in each case; (iv) the output data represent information which, for at least a subset of the points, specifies the respective function value of one or more of the probability density functions at the point which is determined by the values of the m quantities assigned to the point. In the case of option (i), the image can be determined in particular by a specific point from the set of points assigned to the segment or as a specific, in particular calculated point depending on these points, for example as the center point of the distribution of the points in the set. Instead, the image can in particular also be defined as a spatial area or body spanned by the points of the set.

In some embodiments, for at least one (in particular for all) of the m quantities, the associated probability density functions each have a profile in which the function value increases as a function of the value of the quantity up to a maximum and then falls again, this maximum being the only maximum occurring in the profile of the probability density function. Such a function profile, which can be bell-shaped (symmetrical or asymmetrical), is particularly well suited to the method, and in particular to approximating frequency distributions for point clouds generated by scanning objects, if the object or objects each have a convex shape.

In particular, in some of these embodiments, at least one (in particular each) of the respective probability density functions for at least one of the m quantities can be a Gaussian function. The Gaussian function or Gaussian functions can, in particular, be normalized or can be normalized by means of a parameter (e.g. such that c=1 in formula (3) above). In addition to the above-mentioned good suitability for approximating frequency distributions for the point clouds generated by scanning convex objects, the choice of Gaussian functions also has the advantage that a large number of known, efficient and robust approximation methods are available for this purpose.
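One of many possible ways to obtain a linear combination of Gaussian functions is expectation-maximization (EM) for a Gaussian mixture. The sketch below is not taken from the source, which does not name a specific approximation method: it fits two components directly to the raw values of one quantity rather than to a histogram, and the function name `fit_two_gaussians`, the initialization and the sample depths are assumptions.

```python
import math


def fit_two_gaussians(values, iterations=50):
    """EM for a two-component 1D Gaussian mixture.

    Returns [(weight, mean, sigma), (weight, mean, sigma)].
    """
    lo, hi = min(values), max(values)
    spread = (hi - lo) / 4 or 1.0
    # Deterministic start: one component at each end of the value range.
    comps = [[0.5, lo, spread], [0.5, hi, spread]]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        # The common factor 1/sqrt(2*pi) cancels and is omitted.
        resp = []
        for v in values:
            dens = [w * math.exp(-0.5 * ((v - m) / s) ** 2) / s for w, m, s in comps]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and sigma of each component.
        for k in range(2):
            rk = [r[k] for r in resp]
            nk = sum(rk) or 1e-300
            mean = sum(r * v for r, v in zip(rk, values)) / nk
            var = sum(r * (v - mean) ** 2 for r, v in zip(rk, values)) / nk
            comps[k] = [nk / len(values), mean, max(math.sqrt(var), 1e-6)]
    return [tuple(c) for c in comps]


# Hypothetical depth values of two objects at about 0.40 and 0.60.
depths = [0.38, 0.39, 0.40, 0.41, 0.42, 0.58, 0.59, 0.60, 0.61, 0.62]
fitted = fit_two_gaussians(depths)
print([(round(w, 3), round(m, 3), round(s, 3)) for w, m, s in fitted])
```

The number of components is fixed at two here; in practice it would have to be chosen or estimated according to the application.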

In some embodiments, at least one of the frequency distributions is subjected to a respective smoothing process and the approximation with regard to this at least one frequency distribution takes place with respect to the corresponding frequency distribution smoothed by means of the smoothing process. In this way, the quality of the approximation and thus the quality and reliability of the recognition or separation of objects represented by the point cloud based thereon can be further increased.
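The source does not specify the smoothing process. As one simple possibility, a centered moving average over the histogram bins is sketched here; the function name `smooth_histogram`, the window radius and the sample counts are assumptions.

```python
def smooth_histogram(counts, radius=1):
    """Centered moving average; bins beyond the borders are treated as missing."""
    smoothed = []
    for i in range(len(counts)):
        window = counts[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical bin counts with an isolated spike at index 1.
counts = [0, 4, 0, 5, 9, 5, 0]
smoothed = smooth_histogram(counts, radius=1)
print(smoothed)
```

The subsequent approximation would then be fitted to `smoothed` instead of `counts`, which reduces the influence of isolated spikes.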

In some embodiments, based on the respective points of one or more of the segments identified as representing a respective object, a gesture recognition process is performed to recognize a gesture of a person represented in the image by means of the point cloud. This can be done in particular in the context of an automotive application, in particular in connection with a gesture recognition with regard to gestures performed by an occupant of a vehicle to control a functionality of the vehicle.

A second aspect of the invention relates to a system for data processing, having at least one processor which is configured in such a way that it executes the method according to the first aspect of the invention.

In particular, the system can be a computer or a control unit for another or higher-level system, such as for a vehicle or for a production machine or line.

A third aspect of the invention relates to a computer program with instructions which, when executed on a system according to the second aspect, cause the latter to carry out the method according to the first aspect.

The computer program can in particular be stored on a non-volatile data medium. This is preferably a data carrier in the form of an optical data carrier or a flash memory module. This can be advantageous if the computer program as such is to be traded independently of a processor platform on which the one or more programs are to be executed. In another implementation, the computer program can be present as a file on a data processing unit, in particular on a server, and can be downloaded via a data connection, for example the Internet or a dedicated data connection, such as a proprietary or local network. In addition, the computer program can have a plurality of interacting individual program modules.

The system according to the second aspect can accordingly have a program memory in which the computer program is stored. Alternatively, the system can also be set up to access a computer program available externally, for example on one or more servers or other data processing units, via a communication connection, in particular in order to exchange with it data that are used during the course of the method or computer program or that represent outputs of the computer program.

The features and advantages explained in relation to the first aspect of the invention also apply correspondingly to the further aspects of the invention.

Further advantages, features and application possibilities of the present invention result from the following detailed description in connection with the figures.

The figures show:

Fig. 1 shows schematically various exemplary scenes, each with an object arrangement of two objects to be separated from one another, and in each case a sectional image of a corresponding point cloud detected by sensors by scanning the scene;

Fig. 2 shows a diagram to illustrate an exemplary embodiment of the method according to the invention for the case m=1;

Fig. 3 shows an illustration of the dependence of the approximation on the choice of a one-dimensional quantity; and

Fig. 4 shows a diagram to illustrate the assignment of points to a specific segment in an exemplary embodiment of the method according to the invention for the case m=2, in which, in addition to the depth coordinate, a local temperature value of the object recorded at the location of the respective point is used for each point as an attribute and as a basis for the assignment.

Throughout the figures, the same reference numbers are used for the same or corresponding elements of the invention.

In Fig. 1, to illustrate an exemplary problem addressed by the invention, an overview 100 of various exemplary scenes 105a, 110a, 115a and 120a is shown, each with a corresponding sectional view 105b, 110b, 115b or 120b through a point cloud P that was generated by scanning the respective scene using a depth image sensor, in particular a TOF camera (time-of-flight sensor). The depth direction, to which the detected depth image relates and along which the distance from the depth image sensor to the respective object is measured, is selected here as the "z" direction by way of example. One can thus think of the TOF camera as mounted above the scene such that its viewing direction, the z-direction, points vertically downward. A point $p_i$ in the point cloud is given by its (x,y,z) coordinates, where (x,y) spans a (horizontal) plane perpendicular to the sensor's line of sight, and z is the depth value, i.e. the distance from the point to the sensor.

Each of the scenes shows a first object O1, which is formed by a human hand of a person, and a further object O2, which can be, for example, another part of the person's body or a body belonging to the interior of a vehicle.

In the case of scene 105a, the two objects O1 and O2 are laterally adjacent in a direction perpendicular to the z-direction (e.g. the x-direction), with a gap between them along this direction. Due to this gap, the point cloud portions corresponding to the two objects O1 and O2 can, as shown in sectional view 105b, easily be separated from one another and each assigned to a separate image segment. This assignment is essentially error-free, at least when the gap is larger than the average point spacing within the point cloud P.

In the case of scene 110a, the two objects O1 and O2 are offset from one another in the z-direction, with a gap between them in the z-direction. Due to this gap, the point cloud portions corresponding to the two objects O1 and O2 can, as shown in sectional view 110b, also easily be separated from one another on the basis of their clearly different depth values (z-coordinates) and each assigned to its own image segment and thus to object O1 or O2. This assignment is also essentially error-free, at least when the gap is larger than the average point spacing within the point cloud P.

In the case of scene 115a, on the other hand, the two objects O1 and O2 are offset from one another in the z-direction, separated only by a very small gap, and they overlap in the direction perpendicular to the z-direction. The corresponding point cloud P in view 115b no longer allows a division of the point cloud P into point cloud portions or segments corresponding to the two objects O1 and O2 in a similarly simple and error-free manner as in scenes 105a and 110a on the basis of a recognized gap, because the average point spacing within the point cloud P is similar in size to the gap.

The starting position for an object separation is even more difficult in the case of scene 120a, in which the two objects O1 and O2 overlap or touch both in the z-direction and in a direction perpendicular thereto, so that a gap that can be imaged by the point cloud P no longer occurs here, and an object separation or segmentation with simple means, as explained for the scenes 105a and 110a, thus becomes unreliable or fails completely.

In the exemplary embodiment 200 of a method according to the invention illustrated in FIG. 2, a scene 205 containing a plurality of objects, in this case two objects O1 and O2, is scanned by image sensors, in particular by means of a depth image sensor such as a TOF camera, in order to obtain an image of the scene in the form of a point cloud P, as shown in view 210. The image data output by the depth image sensor can in particular represent, for each of the points p in the point cloud P, its respective coordinate in the depth direction, selected here as the z-direction, and optionally further coordinates or additionally measured properties of the objects. The following explanations of the method 200 focus on the z-coordinate, which is initially taken into account as the only one-dimensional quantity used within the scope of the method 200, so that the case m=1 is considered first. The case m>1 will be addressed below with reference to FIG.

Starting from the point cloud P, a frequency distribution h(k) is determined with respect to the z coordinates of the points that occur in the point cloud, where k=k(z) stands for discrete values of z, as will be explained in detail below. In view 220, the resulting frequency distribution h(k) is illustrated using a histogram that represents it.

Mathematically, this can be expressed as follows, for example for the frequent case M=3 and for arbitrary depth values (one-dimensional quantities): Let $P = \{p_1, \ldots, p_n\}$ be a three-dimensional point cloud and $d \in \mathbb{R}^3$ a given unit vector in a specific direction, referred to herein as the "depth direction". In the present example, this is the z-direction. Furthermore, let $d_i := \langle p_i, d \rangle \in \mathbb{R}$ be the directed depth (depth value) of the point $p_i$, where $\langle p_i, d \rangle$ denotes the scalar product of the two vectors $p_i$ and $d$. The set of depth values $\{d_1, \ldots, d_n\}$ (in this example equivalent to the set of z coordinates of the points $\{p_1, \ldots, p_n\}$) serves as the basis for the further steps of object separation or segmentation.

A frequency distribution with regard to the depth values $\{d_1, \ldots, d_n\}$ can now be determined as follows, in particular as a histogram: Such a (depth) histogram has a specific granularity $\gamma > 0$. For example, $\gamma = 1$ cm could be chosen. In order to achieve a good compromise between the quality of the segmentation or object identification result on the one hand and the efficiency of the method, in particular its computational effort, on the other, the choice of $\gamma$ should be based on the requirements of the respective application. For each depth value $d_i$ let $k_i := \lfloor d_i / \gamma \rfloor \in \mathbb{Z}$, where $\lfloor \cdot \rfloor$ symbolizes rounding down. For each $j \in \mathbb{Z}$ let $n_j$ be the number of those indices $i$ for which $j = k_i$ holds. Then the mapping $h_P: \mathbb{Z} \to \mathbb{R},\ j \mapsto n_j$ defines such a histogram for the frequency distribution.

This can be described as follows: the range of possible depth values is divided into a sequence of sections of length $\gamma$, and each point $p_i$ of the point cloud P, or at least each point to be assigned to a segment, is assigned to one of the sections according to its depth value $d_i$. The histogram then indicates, for each value $j \in \mathbb{Z}$, the number of those points whose depth value corresponds approximately (i.e. rounded down in the present example) to $j \cdot \gamma$. The finite granularity entails the aforementioned discretization, since all values of $d_i$ within the same section are assigned the same value $k_i$ for k.
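The histogram construction described above can be sketched as follows. This is a minimal illustration only: the function names, the sample points and the choice of a granularity of 1 cm expressed in metres are assumptions made for the example and are not taken from the described method.

```python
import math
from collections import Counter

def depth_values(points, direction):
    """Directed depth d_i = <p_i, d> of each point p_i for a unit vector d."""
    return [sum(pc * dc for pc, dc in zip(p, direction)) for p in points]

def depth_histogram(depths, gamma):
    """Map each bin index j to n_j, the number of depths with floor(d_i / gamma) == j."""
    if gamma <= 0:
        raise ValueError("granularity gamma must be positive")
    return Counter(math.floor(d / gamma) for d in depths)

# Hypothetical sample: five 3D points (in metres), depth direction = z-axis.
points = [(0.1, 0.2, 0.503), (0.0, 0.1, 0.508), (0.3, 0.1, 0.512),
          (0.2, 0.0, 0.791), (0.1, 0.1, 0.799)]
depths = depth_values(points, (0.0, 0.0, 1.0))
hist = depth_histogram(depths, gamma=0.01)  # gamma = 1 cm
```

A `Counter` returns 0 for bins that received no point, which matches the idea that the histogram is defined for every integer bin index.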

Referring now again to the specific example from FIG. 2, the frequency distribution is approximated by a linear combination of probability density functions, for which normalized Gaussian functions were selected here. This results in a corresponding approximation function $F(h(z)) = f_1(z) + f_2(z)$ formed by means of the Gaussian functions, in the present case by means of two different Gaussian functions $f_1(k)$ and $f_2(k)$, as illustrated in view 230. $F(h(z))$ is thus an approximation of the (smoothed) frequency distribution from view 225.
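How the frequency distribution is smoothed is not specified in this passage. As one possible illustration, the sketch below assumes a centred moving average over neighbouring bins; the function name, the window radius and the sample histogram are hypothetical.

```python
def smooth_histogram(hist, radius=1):
    """Centred moving average over bin indices; bins without an entry count as zero."""
    if not hist:
        return {}
    lo, hi = min(hist) - radius, max(hist) + radius
    width = 2 * radius + 1
    return {j: sum(hist.get(j + o, 0) for o in range(-radius, radius + 1)) / width
            for j in range(lo, hi + 1)}

# Hypothetical histogram: bin index -> number of points.
hist = {50: 2, 51: 1, 79: 2}
smoothed = smooth_histogram(hist, radius=1)
```

Because the output range is widened by the radius on both sides, the total number of points is preserved by the smoothing.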

A normalized Gaussian function is, as usual, to be understood as a function $f: \mathbb{R} \to \mathbb{R}$ whose parameters are the mean $\mu$ of the distribution, the standard deviation $\sigma$ and the normalization factor $c$ (different spellings of the same symbol are used synonymously here, and the same applies to other symbols), and for which, with regard to the method 200, z is selected as the independent variable.
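For orientation, one common way of writing such a function is given below. The exact placement of the factor $c$ is an assumption made here; it is shown as a weight on a unit-area Gaussian.

```latex
f(z) = \frac{c}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)
```

Written this way, the area under $f$ equals $c$, so that $c$ can be read as the weight of the function within a linear combination.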

The approximation task is to find the number N of different Gaussian functions $f_q$ and, for each of them, the respective set of parameters $\{\mu_q, \sigma_q, c_q\}$ with $q = 1, \ldots, N$, such that the (smoothed) frequency distribution h(k) is approximated for each value of k (i.e. the corresponding discrete z-value) by the sum of these Gaussian functions:

$$h(k) \approx \sum_{q=1}^{N} f_q(k) \qquad (5)$$

The choice of Gaussian functions for the approximation is advantageous in several respects. In particular, it has been shown that such functions can provide a very good approximation for frequency distributions such as those found when sampling convex bodies, in particular many parts of the human body, such as arms, legs or the head, using a depth image sensor. If each point-wise distance measurement during sampling is regarded as an independent random variable, the good suitability of Gaussian functions for this approximation can also be justified mathematically on the basis of the central limit theorem.

Furthermore, various efficient methods for a function approximation using Gaussian curves are available. These include, for example, an approximation method described in A. Goshtasby, W.D. O'Neill, "Curve Fitting by a Sum of Gaussians", CVGIP: Graphical Models and Image Processing, Vol. 56, No. 4, July 1994, pp. 281-288. Further examples of applicable approximation methods can be found in particular on the Internet at: https://www.researchgate.net/publication/252062037_A_Simple_Algorithm_for_Fitting_a_Gaussian_Function_DSP_Tips_and_Tricks/link/544732410cf22b3c14e0c0c8/download or at https://stats.stackexchange.com/questions/92748/multi-peak-gaussian-fit-in-r.
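As a rough illustration of such an approximation, the sketch below fits a fixed number of weighted Gaussians to one-dimensional depth values with plain expectation-maximisation. This is a standard technique swapped in for illustration and is not the method of the references cited above. It works on the raw values rather than on the histogram, assumes the number of functions is known in advance, and uses hypothetical function names and sample data. The placement of the factor c is likewise an assumption.

```python
import math

def gauss(z, mu, sigma, c=1.0):
    """Weighted unit-area Gaussian (the placement of c is an assumption)."""
    return c / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * ((z - mu) / sigma) ** 2)

def fit_gaussian_mixture(values, n_components, iterations=100, min_sigma=1e-3):
    """Fit n_components weighted Gaussians to 1D values with plain EM.

    Returns a list of (mu, sigma, c) tuples; the weights c sum to 1.
    """
    data = sorted(values)
    n = len(data)
    # Deterministic start: means at evenly spaced quantiles, one shared wide sigma.
    mus = [data[int((q + 0.5) * n / n_components)] for q in range(n_components)]
    spread = (data[-1] - data[0]) / (2.0 * n_components)
    sigmas = [max(spread, min_sigma)] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for z in data:
            dens = [gauss(z, mus[q], sigmas[q], weights[q]) for q in range(n_components)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and standard deviation per component.
        for q in range(n_components):
            rq = sum(r[q] for r in resp)
            if rq < 1e-12:
                continue  # component lost all support; keep its old parameters
            weights[q] = rq / n
            mus[q] = sum(r[q] * z for r, z in zip(resp, data)) / rq
            var = sum(r[q] * (z - mus[q]) ** 2 for r, z in zip(resp, data)) / rq
            sigmas[q] = max(math.sqrt(var), min_sigma)
    return list(zip(mus, sigmas, weights))

# Hypothetical depth values (metres): two clusters near 0.50 m and 0.80 m.
depths = [0.48, 0.49, 0.50, 0.50, 0.51, 0.52, 0.78, 0.79, 0.80, 0.80, 0.81, 0.82]
components = fit_gaussian_mixture(depths, n_components=2)
```

The fitted functions describe a normalised density. To compare their sum with histogram counts, it would have to be scaled by the number of points times the bin width.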

If the Gaussian functions $f_q(z)$ have been determined by means of the approximation, each of these Gaussian functions can define a segment of the image or of the point cloud P that is represented by it. Then, for each point $p_i \in P$, the probability that this point belongs to a particular segment q can be interpreted as being proportional to $f_q(d_i)$. In the present example, for each point $p_i \in P$ the associated function value $f_1(d_i)$ indicates the probability that this point belongs to a first segment of the image, and correspondingly the associated function value $f_2(d_i)$ indicates the probability that this point belongs to a second segment of the image different from the first segment.

The two segments can thus be separated, as shown, in such a way that each point $p_i$ is unambiguously assigned to that segment q whose function value $f_q(d_i)$ for this point is the highest among the various function values for this point. This assignment rule is illustrated in view 235, where the dashed dividing line runs exactly through the intersection of the two functions $f_1$ and $f_2$, and all points above this dividing line are assigned to the first segment (q = 1) represented by $f_1$ and all points below this dividing line are assigned to the second segment (q = 2) represented by $f_2$. Should a point $p_i$ actually lie on the dividing line (within the accuracy of the representation of $d_i$), a predetermined assignment to a selected one of the segments can be provided for this case in order to avoid ambiguities. However, if $d_i$ is represented with sufficiently high accuracy, this case will generally not occur or will occur only very rarely.
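The assignment rule, including a predetermined choice for points lying exactly on the dividing line, can be sketched as follows. The function names, the parameter values and the convention of numbering segments from 0 are assumptions made for this example.

```python
import math

def gauss(z, mu, sigma, c=1.0):
    """Weighted unit-area Gaussian (the placement of c is an assumption)."""
    return c / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * ((z - mu) / sigma) ** 2)

def assign_segment(d_i, components, tie_break=0):
    """Index q of the function f_q with the largest value at depth d_i.

    components is a list of (mu, sigma, c) tuples, indexed from 0.
    On an exact tie, the predetermined segment tie_break is used if it is
    among the tied segments; otherwise the lowest tied index is returned.
    """
    values = [gauss(d_i, mu, sigma, c) for mu, sigma, c in components]
    best = max(values)
    winners = [q for q, v in enumerate(values) if v == best]
    if len(winners) > 1 and tie_break in winners:
        return tie_break
    return winners[0]

# Hypothetical fitted functions: a near segment (index 0) and a far one (index 1).
components = [(0.50, 0.013, 0.5), (0.80, 0.013, 0.5)]
labels = [assign_segment(d, components) for d in [0.49, 0.52, 0.79, 0.83]]
```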

Based on this segment assignment, as illustrated in view 240, one or, in this case, two objects O1 and O2 can now be identified by assigning all points of a respective segment to exactly one of these objects O1 or O2. The respective segment is thus determined as a representative of the respective associated object.

Alternatively, however, it is also possible to filter the point cloud on the basis of the segmentation before the object assignment, so that (except in the limiting case in which all points have been assigned to the same object) only a proper subset of the segments remains after filtering and serves as the basis for the object assignment. In the present example, the segment for q = 2, which corresponds to the larger depth values z, can be filtered out in this way. Thus, the first segment for q = 1 can be determined as a representative of an identified object O1 (the only one in this example) in the image foreground (nearest segment in the z-direction), while the second segment for q = 2 is not interpreted as an identified object, but instead is either not interpreted at all or is interpreted as the image background B.
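A minimal sketch of such a filtering step is given below. It keeps only segments with at least a minimum number of points and then selects the segment with the smallest mean depth as the foreground. The function names and sample values are hypothetical, and the sketch assumes that smaller depth values mean closer to the sensor.

```python
from collections import defaultdict

def group_by_segment(depths, labels):
    """Collect the depth values of all points per segment label."""
    segments = defaultdict(list)
    for d, q in zip(depths, labels):
        segments[q].append(d)
    return dict(segments)

def recognised_segments(segments, min_points):
    """Keep only segments to which at least min_points points were assigned."""
    return {q: ds for q, ds in segments.items() if len(ds) >= min_points}

def nearest_segment(segments):
    """Label of the segment with the smallest mean depth (assumed foreground)."""
    return min(segments, key=lambda q: sum(segments[q]) / len(segments[q]))

# Hypothetical depths (metres) with segment labels from a previous assignment.
depths = [0.49, 0.50, 0.52, 0.79, 0.80, 0.81, 1.40]
labels = [0, 0, 0, 1, 1, 1, 2]
kept = recognised_segments(group_by_segment(depths, labels), min_points=2)
foreground = nearest_segment(kept)
foreground_depths = kept[foreground]
```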

As illustrated in FIG. 3 using an exemplary comparison 300 of two different scenarios, the choice of the one-dimensional variable, particularly if it corresponds to a position along a specific direction (here the z-direction, for example), can influence the resulting frequency distribution, thus the functions determined from it by approximation, and finally also the quality of the segment assignment and object identification.

In a first scenario, which is illustrated in view 305, the z-direction is selected such that it runs orthogonally to a main extension direction, represented by direction vector A, of a person's hand to be identified as object O1 within the scope of the method. Within the framework of the approximation, here for example again using Gaussian functions, the situation shown in view 310 results, in which the frequency distribution can be approximated well even using a single Gaussian function, which in turn leads to a simple, very reliable and accurate identification of the object O1.

In the second scenario, which is illustrated in view 315, the z-direction is selected such that it no longer runs orthogonally, but rather at a smaller angle, to the main extension direction, represented by the direction vector A, of the depicted hand of a person that is to be identified as object O1 within the scope of the method. Within the framework of the approximation using Gaussian functions, the situation shown in view 320 results here, in which the frequency distribution can only be approximated well using a linear combination of several Gaussian functions, which in turn leads to a more difficult and possibly less reliable or less precise identification of the object O1.

The choice of the one-dimensional variable is therefore clearly preferable in the case of the first scenario. Accordingly, the method 200 can in particular provide that a fixed spatial direction is selected for the one-dimensional variable on the basis of the result of a principal component analysis applied to the point cloud, such that this direction runs orthogonally to the first principal component resulting from that analysis. In particular, in the present example, the second principal component resulting from the principal component analysis can be selected in the case of M=2 and the third principal component in the case of M=3 (cf. direction vector A in view 305). In this way, the least dominant principal component (here along the z-direction) is selected, which usually maximizes the probability that the most dominant principal component is at least predominantly perpendicular to it and thus to the scanning direction (here the z-direction), so that a scenario approximating the first scenario results, with optimized segment assignment and object assignment.
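The selection of the least dominant principal component can be illustrated with the following sketch. It computes the covariance matrix of the points and finds the eigenvector with the smallest eigenvalue by power iteration on a shifted matrix. This particular numerical approach, the function names and the sample points are assumptions made for the example. In practice, a library eigen-decomposition would usually be preferred.

```python
import math

def covariance(points):
    """Covariance matrix of M-dimensional points as a nested list."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[a] for p in points) / n for a in range(dim)]
    return [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / n
             for b in range(dim)] for a in range(dim)]

def least_dominant_direction(points, iterations=200):
    """Unit vector along the principal component with the smallest variance."""
    cov = covariance(points)
    dim = len(cov)
    trace = sum(cov[a][a] for a in range(dim))
    # The dominant eigenvector of trace*I - C is the least dominant one of C.
    shifted = [[(trace if a == b else 0.0) - cov[a][b] for b in range(dim)]
               for a in range(dim)]
    # Arbitrary start vector; it must not be orthogonal to the sought direction.
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [sum(shifted[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

# Hypothetical flat, elongated object: long in x, narrower in y, thin in z.
points = [(x, y, 0.1 * ((x + y) % 2 * 2 - 1))
          for x in range(-4, 5) for y in (-1, 0, 1)]
direction = least_dominant_direction(points)
```

For this sample, the resulting direction lies along the z-axis, which is orthogonal to the main extension direction along x.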

FIG. 4 relates to an extension of the method, in particular also of method 200, to the case m>1. Diagram 400 serves as an example to illustrate the assignment of points to a specific segment in an exemplary embodiment of the method according to the invention for the case m=2.

Consider once again the exemplary problem of discriminating a hand O1 from a background B. This problem can be addressed as follows. So far, only the depth information of the points has been used in the method 200, but even this advanced approach can have limitations: if, for example, in the context of an image recording in a motor vehicle, the hand (of the driver) is held next to the gear stick at a certain point in time, at about the same depth level from the point of view of the image sensor, so that identical or very similar depth values z result for the points of a point cloud resulting from scanning the scene, then a segmentation of the image or the point cloud into a segment for the hand and a segment for the background B (or the gear stick as the second object O2) based on depth values alone may fail.

In general, for certain scenes, a situation may arise in which points that are distinguished by the method for m=1 (i.e. that are attributed to different Gaussian curves) do belong to different objects, but it is not guaranteed that points that are not distinguished in this way belong to the same object. In other words, in such a case each function, in particular each Gaussian function, may represent only an object category (i.e. a set of multiple objects that is not further discriminated by the chosen feature) and not necessarily exactly one single object.

One approach to improving the method with regard to its selectivity includes adding at least one additional one-dimensional variable so that m>1 applies. In particular, as illustrated in FIG. 4, for each point $p_i$, in addition to the depth coordinate z, a local temperature value T recorded for the respective point can also be used as a second variable and thus as an additional basis for the assignment.

It is now assumed, for example, that the hand has a higher (surface) temperature than the background. The points $p_i$ can then be classified according to their respective local temperature value $T_i$ by means of a second frequency distribution h'(k'(T)), or h'(T) for short, which relates to the temperature as the independent variable and which in turn can be approximated by a linear combination of probability density functions $g_r$ in accordance with method 200, only this time with respect to the temperature instead of the z-coordinate.

A purely temperature-based segmentation and an object identification based thereon (corresponding to view 240) can now be carried out by applying the segmentation according to view 235 from FIG. 2 correspondingly. This still corresponds to the case m=1, only with a temperature-based segmentation instead of a segmentation based on depth values (z-coordinates). However, as illustrated in FIG. 4, it is even more effective to use both variables z and T in combination as the basis for the segmentation. Here, the variable z enables the point cloud to be subdivided into the categories of near object and distant object or image background. In parallel, the thermal variable (temperature) T can divide the points into the categories "warm objects" and "cold objects". In the present example, a distinction can thus be made between at least four categories (or corresponding segments): (i) a warm and at the same time close object, (ii) a warm and at the same time distant object, (iii) a cold and at the same time close object and (iv) a cold and at the same time distant object. The image background B can optionally also be regarded as a distant object.

Mathematically, such a generalization can be represented in particular as follows:

Let $P = \{p_1, \ldots, p_n\}$ be a point cloud generated by the sensory scanning of the scene, with each point $p_i$ being assigned a depth value z and a local temperature value T measured at the location of the measured position of the respective point $p_i$.

As described above, an approximation according to equation (5) is carried out for the depth z of the points, initially considered as a single variable, in order to determine a linear combination of functions $f_q(z)$ which approximates the depth value distribution of the points. Each of the functions $f_q(z)$ again represents a depth segment.

In the same way, for the temperature (local temperature values T) of the points, which is also initially considered as a single variable, an approximation is made according to equation (5) in order to determine a linear combination of functions, in particular Gaussian functions, $g_r(T)$ which approximates the temperature value distribution of the points. Each of the functions $g_r(T)$ represents a temperature segment.

Then one can interpret the value of the product $f_q(z(p_i)) \cdot g_r(T(p_i))$, or in abbreviated notation $f_q(p_i) \cdot g_r(p_i)$, as proportional to the probability that the point $p_i$ belongs to the combined segment (q, r) formed as the intersection of the depth segment for q and the temperature segment for r, where q and r are subscripts for enumerating the functions $f_q$ and $g_r$, respectively. The value of this product is now used to assign the respective point $p_i$ to that one of the combined segments for which the product is relatively largest, which corresponds to selecting the most likely assignment.

Specifically, in the example of FIG. 4, the product $f_1(p_i) \cdot g_2(p_i)$ is the largest among all combinations for the selected point $p_i$, so that this point is assigned to the combined segment (1;2), which here corresponds to the closest and at the same time warmest object. The points of this combined segment can then be identified as points of an object to be recognized, here the hand O1.
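The combined assignment for m=2 can be sketched as follows. All parameter values, the function names and the sample points are hypothetical. Segments are numbered from 1 here so that the result can be compared directly with the notation (q; r) used above, and the placement of the factor c in the Gaussian is an assumption.

```python
import math
from itertools import product

def gauss(x, mu, sigma, c=1.0):
    """Weighted unit-area Gaussian (the placement of c is an assumption)."""
    return c / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def assign_combined_segment(z, t, depth_components, temp_components):
    """Pair (q, r) maximising f_q(z) * g_r(t); indices start at 1.

    On an exact tie, the first pair encountered is kept.
    """
    best, best_value = None, -1.0
    pairs = product(enumerate(depth_components, start=1),
                    enumerate(temp_components, start=1))
    for (q, f), (r, g) in pairs:
        value = gauss(z, *f) * gauss(t, *g)
        if value > best_value:
            best, best_value = (q, r), value
    return best

# Hypothetical fitted functions as (mu, sigma, c):
depth_components = [(0.50, 0.05, 0.5), (0.90, 0.05, 0.5)]  # near, far (metres)
temp_components = [(22.0, 2.0, 0.6), (33.0, 2.0, 0.4)]     # cold, warm (deg C)

hand = assign_combined_segment(0.52, 32.0, depth_components, temp_components)
gear_stick = assign_combined_segment(0.51, 23.0, depth_components, temp_components)
background = assign_combined_segment(0.88, 21.0, depth_components, temp_components)
```

In this sample, the hand and the gear stick share the same depth segment and are separated only by the temperature segment, which is the situation that motivates the case m>1.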

The method according to the invention can be used in its various variants for a wide variety of applications. Such applications include, in particular, the separation of images of different body parts of a person, of different people, or of one or more people on the one hand and one or more other objects on the other hand, in each case from one another or from a background. In particular, the method can be used to separate one or more body parts of a person in an image captured by sensors, in order then, depending on the result of such a separation or segmentation and a subsequent identification of the body parts as objects, to carry out gesture recognition with regard to any gestures performed by the person.

While at least one exemplary embodiment has been described above, it should be appreciated that a large number of variations thereon exist. It should also be noted that the example embodiments described are intended to be non-limiting examples only, and are not intended to limit the scope, applicability, or configuration of the devices and methods described herein. Rather, the foregoing description will provide those skilled in the art with guidance for implementing at least one example embodiment, it being understood that various changes in the operation and arrangement of the elements described in an example embodiment may be made without departing from the subject matter defined in the appended claims and its legal equivalents.

REFERENCE LIST

100 Overview of various exemplary scenes

105a-120a different scenes

105b-120b point clouds for the various scenes 105a-120a

200 exemplary method for detecting objects

205-240 views of intermediate stages of the process 200

300 Comparison of two different scenarios

305 first scenario

310 Approximation function for the first scenario

315 second scenario

320 approximation function for the second scenario

400 Diagram to illustrate an example assignment of points in the case of m=2

A Direction vector of the first principal component of an object

B background

f_q Set of probability density functions, in particular Gaussian functions, for approximating a frequency distribution of depth values

g_r Set of probability density functions, in particular Gaussian functions, for approximating a frequency distribution of temperature values

h(z) frequency distribution

P point cloud

p_i single point of the point cloud

O1; O2 objects

T temperature

z depth

Claims

1. A method (200) for recognizing one or more objects (O1; O2) represented in an image by means of an M-dimensional point cloud (P), with M>1, made up of a plurality n of points (p_i), wherein the method (200) comprises:

determining, for each of a number m, with m > 0, of specific one-dimensional variables (z; T), a respective associated value of the variable (z; T) for each of the points (p_i) on the basis of its position or properties;

determining, for each of the variables (z; T), a respective frequency distribution (h) with respect to the values of this variable (z; T) determined for the different points (p_i);

approximating each of the frequency distributions (h) by means of a respective linear combination of a finite number of one-dimensional probability density functions (f_q; g_r) associated with the underlying variable (z; T);

segmenting the image such that, in the case of m = 1, each of the probability density functions (f_q; g_r) and, in the case of m > 1, each product of m probability density functions (f_q; g_r), one of the associated probability density functions (f_q; g_r) per variable (z; T) being represented in the product, is uniquely assigned a respective segment of the image;

assigning each point (p_i) of the point cloud (P) to that segment whose associated probability density function, in the case of m = 1, or whose associated product, in the case of m > 1, has, at the location determined by the values of the m variables (z; T) assigned to the point, the relatively greatest function value among the probability density functions (f_q; g_r) or the relatively greatest product value among the products; and

identifying, as a representative of a respective recognized object (O1; O2), at least one of those segments to which at least a predetermined minimum number of points (p_i) has been assigned.

2. The method (200) according to one of the preceding claims, wherein at least one of the m variables (z) indicates, for each of the points (p_i) of the point cloud (P), a position of this point (p_i) along a selected fixed spatial direction, projected onto this spatial direction.


3. The method (200) of claim 2, wherein the fixed spatial direction is selected to be orthogonal to a first principal component (A) resulting from a principal component analysis applied to the point cloud (P).

4. The method (200) according to claim 3, wherein M ∈ {2; 3} and the fixed spatial direction is selected such that it corresponds, in the case of M=2, to the second principal component resulting from the principal component analysis and, in the case of M=3, to the third principal component resulting from the principal component analysis.

5. The method (200) of any preceding claim, further comprising:

filtering the image in such a way that, after filtering, it only contains those points (p_i) of the point cloud (P) which have been assigned to one of the segments which have each been identified as representing a respective recognized object (O1; O2).

6. The method (200) of claim 5, wherein the image is filtered in such a way that, after filtering, it only contains those points (p_i) of the point cloud (P) that have been assigned to exactly one specific selected segment among those segments that were identified as a representative of an associated recognized object (O1; O2).

7. The method (200) according to any one of claims 2 to 4 in conjunction with claim 6, wherein m=1 and that segment is selected from the set of segments identified as representing a respective recognized object (O1; O2) whose assigned points (p_i), according to their positions projected onto the selected fixed spatial direction and viewed along this spatial direction, are on average closer than the points assigned to any other of the identified segments.

8. The method (200) according to any one of the preceding claims, wherein m>1 and at least one of the m variables (z; T) indicates a temperature value (T) or a color value for each of the points (p_i) of the point cloud (P).

9. The method (200) according to any one of the preceding claims, wherein output data are generated which represent the result of the assignment of the points (p_i) to segments or of the identification of at least one recognized object in one or more of the following ways:

- for at least one of the objects (O1; O2), the output data represent an image of this object (O1; O2) based on one or more of those points (p_i) of the point cloud (P) that have been assigned to the segment belonging to the object (O1; O2);

- the output data represent information indicating how many different objects were recognized in the image by means of the segment assignment of the points (p_i);

- the output data represent information which indicates to which respective segment or object (O1; O2) the points (p_i) were assigned in each case;

- the output data represent information which, for at least a subset of the points (p_i), indicates the respective function value of one or more of the probability density functions (f_q; g_r) at the location determined by the values of the m variables (z; T) assigned to the point (p_i).

10. The method (200) according to one of the preceding claims, wherein, for at least one of the m variables (z; T), the associated probability density functions (f_q; g_r) each have a profile in which the function value, as a function of the value of the variable (z; T), rises to a maximum and then falls again, the maximum being the only maximum that occurs in the profile of the probability density function.

11. The method (200) according to claim 10, wherein at least one of the respective probability density functions (f_q; g_r) for at least one of the m variables (z; T) is a Gaussian function.

12. The method (200) according to one of the preceding claims, wherein at least one of the frequency distributions (h) is subjected to a respective smoothing process and the approximation with regard to this at least one frequency distribution (h) is carried out on the corresponding frequency distribution (h) smoothed by means of the smoothing process.

13. The method (200) according to one of the preceding claims, wherein, on the basis of the respective points (p_i) of one or more of the segments identified as representing a respective object (O1; O2), a gesture recognition process is performed in order to recognize a gesture of a person depicted in the image by means of the point cloud (P).

14. A data processing system comprising at least one processor configured to perform the method (200) of any preceding claim.

15. A computer program having instructions which, when executed on a system according to claim 14, cause it to carry out the method (200) according to any one of claims 1 to 13.
