WO2024088935A1 - System and method for object depth estimation - Google Patents

System and method for object depth estimation

Info

Publication number
WO2024088935A1
Authority
WO
WIPO (PCT)
Prior art keywords
pattern
patterns
shapes
dots
depth
Prior art date
Application number
PCT/EP2023/079419
Other languages
French (fr)
Inventor
Sébastien PICCAND
Original Assignee
Ams International Ag
Priority date
Filing date
Publication date
Application filed by Ams International Ag filed Critical Ams International Ag
Publication of WO2024088935A1 publication Critical patent/WO2024088935A1/en

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01BMEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00Measuring arrangements characterised by the use of optical techniques
    • G01B11/24Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures
    • G01B11/25Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures by projecting a pattern, e.g. one or more lines, moiré fringes on the object
    • G01B11/2513Measuring arrangements characterised by the use of optical techniques for measuring contours or curvatures by projecting a pattern, e.g. one or more lines, moiré fringes on the object with several lines being projected in more than one direction, e.g. grids, patterns

Abstract

System (120) for depth estimation of an object (30), said system comprising: a light projector (124) for projecting a pattern (50) of light onto an object (30), said pattern (50) comprising shapes (60, 62, 64) arranged in a plurality of horizontal and/or vertical lines; a sensor (128) for detecting said projected pattern (50) on said object (30); and a processing unit (134) for computing the distance of said object (30) employing the sensor data, whereby said pattern (50) is built as a multi-scale pattern comprising an overlap of at least two different single patterns (54, 56, 58) with different densities of said shapes (60, 62, 64), whereby said different single patterns (54, 56, 58) do not overlap in said lines.

Description

System and method for object depth estimation
DESCRIPTION
Technical Background of the invention
The invention relates to a system for depth estimation of an object. It more particularly relates to a system with a light projector, a sensor and a processing unit.
Background
Estimating the depth of an object using a structured light approach is usually computationally expensive. A way to accelerate this is to use only the dot positions. However, this places different requirements on the pattern design at short or long range, and a trade-off must be made to focus on long range or short range.
Structured light is usually achieved using an irregular pattern which encodes a specific position. The pattern is optimized to be unique on a line and requires an algorithm to decode the pattern. The pattern is composed of multiple dots and only one depth measurement is obtained for the whole pattern.
The document US 11,297,300 B2 describes a structured-light pattern for a structured-light system which includes a base light pattern having a row of a plurality of sub-patterns extending in a first direction. The pattern is created from sub-patterns and lies on a grid. The pattern created is rather dense and requires a decoding algorithm, whereas our proposed pattern focuses on detecting the center of the dots only.
The document US 2020/0309916 A1 discloses a method which includes causing a light projecting system of a distance sensor to project a pattern of light onto an object. The pattern of light includes a plurality of projection artifacts arranged in a grid. The approach describes the "inverted light technology" based on one depth measurement per dot. It uses a single regular pattern. This pattern must be sparse enough to handle close objects. If the object is moved further away, the density of depth measurements becomes scarce.
Summary
The object of the invention is therefore to improve a system for depth estimation of an object to allow depth estimations for close and far objects with approximately the same accuracy. Furthermore, a method for depth estimation and a pattern are provided.
With respect to the system, this object is solved by a system according to claim 1. The system for depth estimation of an object comprises a light projector for projecting a pattern of light onto an object, the pattern comprising shapes arranged in a plurality of horizontal and/or vertical lines, a sensor for detecting said projected pattern on the object, and a processing unit for computing the distance of the object employing the sensor data. The pattern is built as a multi-scale pattern comprising an overlap of at least two different single patterns with different densities of said shapes or markers, whereby said different single patterns do not overlap in said lines. The purpose of the shapes, which can be dots, is to easily identify a specific position of the marker/shape (e.g., the center of a Gaussian shape).
Preferred embodiments are specified in the dependent claims.
The invention is based on the consideration that a problem of known methods of depth estimation is that the resolution of the depth estimation depends strongly on the distance of the object to the projector, as the number of shapes or markers that lie on the object depends on this distance. If an object is far away, only a few points may lie on the object, allowing only a coarse depth estimation. The Applicant has found that this dependency of the accuracy on the object distance can be remedied by employing a multi-scale pattern consisting of several single patterns or sub-patterns of successively larger density. In this way, enough shapes always lie on the object for an accurate depth estimation.
The term "multi-scale" especially means that a plurality of shapes is arranged in patterns of di f ferent scales , thereby di f fering in the ( average ) distances of their shapes .
The pattern is illuminated as structured light with a plurality of shapes or markers.
The proposed approach enables a consistent depth coverage of the object at different ranges using the same depth-from-dot-position approach on a single frame. The proposed approach does not need a pattern decoder (only detecting the center of the shapes or dots is enough) and allows one depth measurement per dot.
The invention therefore provides a new way to design patterns for depth estimation of single objects and essentially consists of estimating the expected depth coverage and the closest position of the object to consider, in order to generate a first regular sparse pattern; estimating the furthest position of the object to consider, in order to generate a regular dense pattern that fits within the tolerance to the calibration error; creating intermediate patterns as needed to control the minimum density of depth coverage on the object while still fitting within the tolerance to the calibration error; and combining these patterns to compose a single pattern, with no overlap per line, to be able to handle the patterns differently.
Combining the shape/dot patterns of different densities with no overlap per line makes it possible to choose which dots are used to estimate the depth of the object, optimizing the depth coverage of the object. Preferably, all shapes of all single patterns have the same wavelength, i.e., they have the same color if the light with which they are projected is visible light. In this way, the sensor does not need to be sensitive to various wavelengths and/or a monochrome projector can be used. The design of the projector itself is simplified, and energy consumption is reduced as it can focus on only one wavelength.
The wavelength advantageously is 940 nm. This specific wavelength is usually a preferred trade-off: it is less affected by outdoor conditions than visible light, cameras are still sensitive enough to it, and it does not physically affect the object (e.g., by burning).
The shapes of the pattern are preferably dots.
The shapes should be small enough to be distinguished from one another when they are close together (especially for objects at long range), and should allow a specific position within the shape to be identified precisely (e.g., the center of a dot).
In a preferred embodiment, the single patterns are built as a cascade of single patterns, whereby from one single pattern to the next single pattern the density of shapes is respectively increased.
Advantageously, the density is increased by a factor between 1.5 and 2.5, especially by 2.
The lines are preferably horizontal lines. The direction of the lines should follow the positioning of the camera and projector, i.e., if they are horizontally aligned, the shapes are aligned in lines; if they are vertically aligned, the shapes are aligned in columns. Preferably, during the calibration step, the patterns can be mapped to horizontal lines by a transformation. In this way, the lines are preferably horizontal up to a transformation matrix factor.
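For illustration, mapping detected dot positions to horizontal lines amounts to applying a rectifying homography to the coordinates; the following is a minimal numpy sketch under that assumption (the matrix H and the helper name are hypothetical, standing in for the output of a prior calibration step):

```python
import numpy as np

def rectify_dots(dots: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Apply a 3x3 rectifying homography H to an (N, 2) array of dot
    positions so that the pattern lines become horizontal image lines."""
    pts = np.hstack([dots, np.ones((len(dots), 1))])  # to homogeneous coords
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to pixel coords

# With the identity as homography, the dot positions are unchanged.
print(rectify_dots(np.array([[10.0, 20.0]]), np.eye(3)))  # [[10. 20.]]
```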
Preferably, each single pattern is arranged shifted along the lines with respect to all other single patterns. The advantage of this arrangement is to reduce the surface of the areas without dots, and to maximize the uniformity of the density on the whole image.
In a preferred embodiment, the pattern comprises exactly three single patterns. Advantageously, it covers a range of 4 times the minimum distance, e.g., from 25 cm to 1 m, which is a typical range for face recognition, for example.
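As a quick plausibility check of these numbers (a sketch, assuming the density factor of 2 named above): each additional single pattern doubles the usable distance range, so three single patterns cover a factor of 2^2 = 4 in distance.

```python
# Range covered by n cascaded single patterns with density factor f_density.
n, f_density, z_min = 3, 2, 0.25          # three patterns, factor 2, 25 cm
z_max = z_min * f_density ** (n - 1)      # each extra pattern doubles the range
print(z_max)                              # 1.0 -> covers 25 cm to 1 m
```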
Preferably, the sensor comprises or is built as a camera. Another preferred embodiment is an event-based camera. The camera must be sensitive to the wavelength of the projector. The higher the resolution, the further the depth can be estimated. Additionally, the field-of-view must fit the pattern width/height.
Preferably, the processing unit is configured to combine these patterns to compose a single pattern, with no overlap per line to be able to handle the patterns differently.
The processing unit can be realized by hardware and/or software .
The projector preferably is a VCSEL or VCSEL array.
With respect to the method, the object of the invention is solved by a method according to claim 10. The method comprises the steps of projecting a pattern of shapes onto an object by a light projector, sensing the pattern on the object by a sensor, and determining the object depth by employing the sensed pattern, whereby a multi-scale pattern is projected onto the object, the pattern comprising a plurality of single patterns with different shape densities. Preferably, the expected depth coverage and closest position of the object are estimated and used to generate a first regular sparse pattern. The estimation is preferably conducted as follows. Let zmin (in meters) be the closest distance between the object and the sensor, let b (in meters) be the distance between the projector and the camera of the sensor, let f (in pixels) be the focal length of the camera, and let dmax (in pixels) be the distance required between the dots on the same line in the camera image to handle distance zmin.
The equation dmax = b · f / zmin holds.
Now let R be the horizontal resolution of the camera (in pixels). The dot spacing as a fraction of a line is then dmax / R; equivalently, R / dmax dots fit on one line.
If zmax and the density are known, R, b and f can be adjusted whenever possible to fit the present use case.
One can use dmax to create a pattern where dots are spaced in a diamond shape: dots are placed dmax pixels apart on a line (horizontal space between dots), and only lines which are dmax pixels apart are filled with dots (vertical space between lines of dots). The diamond shape is not absolutely necessary, but it reduces the size of the areas without dots.
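The following Python sketch combines the steps above into a simple pattern generator; it is an illustrative reading of the method, not the patented implementation. One plausible interpretation of "no overlap per line" is that every horizontal line carries dots of exactly one density level, so a dot's line directly identifies its level; the function name, the vertical line step, and the staggering rule are assumptions.

```python
def multi_scale_pattern(width, height, b, f, z_min,
                        levels=3, f_density=2, v_step=8):
    """Generate dot positions per density level (0 = sparse).

    Lines are v_step pixels apart and assigned to levels in round-robin
    fashion, so no line mixes two single patterns; level k uses a
    horizontal dot spacing of d_max / f_density**k, and lines of the
    same level are staggered to approximate the diamond layout."""
    d_max = b * f / z_min                        # max disparity at z_min (px)
    dots = {k: [] for k in range(levels)}
    for i, y in enumerate(range(0, height, v_step)):
        k = i % levels                           # this line's density level
        spacing = max(1, int(d_max / f_density ** k))
        x0 = ((i // levels) * spacing // 2) % spacing  # stagger same-level lines
        for x in range(x0, width, spacing):
            dots[k].append((x, y))
    return d_max, dots

# Example: 640x480 pattern, 5 cm baseline, 500 px focal length, z_min = 25 cm.
d_max, dots = multi_scale_pattern(640, 480, b=0.05, f=500.0, z_min=0.25)
print(d_max, {k: len(v) for k, v in dots.items()})   # d_max = 100.0 px
```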
Advantageously, the furthest position of the object is estimated to generate a regular dense single pattern to fit within the tolerance to the calibration error.
In a preferred variant of the method, intermediate patterns are created to control the minimum density of depth coverage on the object and still fit within the tolerance to the calibration error. Preferably, these patterns are combined to compose a single pattern, with no overlap per line, to be able to handle the patterns differently.
Advantageously, the pattern on said object is sensed by a camera.
With respect to the pattern, the object of the invention is solved by a pattern according to claim 15. The pattern is built as a multi-scale pattern comprising a plurality of single patterns of shapes with different shape densities.
The invention also relates to a light projector projecting such a pattern.
The advantages of the invention are especially as follows. The pattern design approach described allows the depth of an object to be estimated at different distances with almost the same density of measurement points. Only one pattern is required, therefore only one illuminator is needed, saving power. The depth can be estimated with a single shot, and the algorithm to compute the depth is very fast; this allows depth computation at a very high framerate. The combination of patterns allows the depth of the object to be estimated at close and far range with a consistent depth coverage.
Brief Description of the Preferred Embodiments
A preferred embodiment of the invention is described in connection with a drawing. In the drawing,
FIG. 1 shows a dot pattern from an illuminator and a dot frame captured by a camera;
FIG. 2 shows an object with a projected pattern;
FIG. 3 shows a pattern;
FIG. 4 shows the contour of a close object with a pattern projected onto it;
FIG. 5 shows the contour of a far object with a pattern projected onto it;
FIG. 6 shows a pattern projected on an object at three different distances;
FIG. 7 shows a diagram indicating the number of dots of a pattern on an object, and
FIG. 8 shows a system for depth estimation of an object.
Identical parts are labelled with the same reference signs.
Detailed Description of the Preferred Embodiments
In FIG. 1, an overlay 2 of two optical structures or patterns is shown very schematically. A first pattern is an illuminated pattern 6 which serves as a reference pattern and is projected onto an object by a projector or illuminator. A second pattern 10 is the dot frame or pattern captured by a camera. For both patterns 6, 10, only one dot is respectively denoted by a reference sign.
A maximum disparity dmax, denoted by reference sign 14, is the maximum disparity before overlap, whereby dmax = f · b / zmin, where f is the focal length in pixels of the camera, b is the baseline, i.e., the distance between the camera and the illuminator, and zmin is the (minimum) distance between camera and object. With reference sign 20, a distance d between a pixel of the illuminated pattern 6 and the captured pattern 10 is denoted. With reference sign 24, a tolerance for calibration is denoted.

In FIG. 2, an exemplary object 30 in the shape of a human head is shown on which a pattern 34 with dots 38 is projected. The pattern 34 leads to a sparse depth map: one depth measurement per dot is possible. The computing time for depth measurement is faster than with block-matching SL (Structured Light). Intermediate values between the dots 38 can be interpolated if required.
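Since dmax = f · b / zmin is just the triangulation relation, each detected dot yields one depth value from its measured disparity. A minimal sketch of this per-dot computation (variable names follow FIG. 1; the function name is an assumption):

```python
def depth_from_disparity(d_px: float, f_px: float, b_m: float) -> float:
    """Per-dot depth by triangulation: z = f * b / d, where d is the
    horizontal offset in pixels between a dot of the reference pattern 6
    and the matching dot of the captured pattern 10."""
    return f_px * b_m / d_px

# e.g. f = 500 px, b = 0.05 m: a dot shifted by 50 px lies at 0.5 m depth.
print(depth_from_disparity(50.0, 500.0, 0.05))  # 0.5
```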
Known methods typically conduct the estimation of depth on one object of typically known size, e.g., a face or head as shown in FIG. 2. These methods encounter problems when dealing with far and close objects. When the object is close, the pattern 34 appears large on the image and can contain many dots. The average disparity is large, and dots on the same line must be far from one another. When the object is far, the pattern 34 appears small on the image and contains fewer dots. The average disparity is small, therefore dots on the same line can be closer to one another. The accuracy with which depth estimations can be conducted is therefore quite different depending on whether the object is close or far away.
The invention comprises a pattern which provides approximately the same high accuracy for depth estimation for both close and far objects and which will be described in connection with the following figures.
In FIG. 3, a single pattern 42 with dots 38, which is part of pattern 34 according to the invention, is shown for an object 30 which is close at a distance zclose. Of object 30, only the outer contour is drawn.
Again, with reference sign 14 the maximum disparity dmax is shown, for which dmax = b · f / zclose holds, and with reference sign 24 the tolerance for calibration is shown. As the object 30 is close, many dots 38 cover the object 30, and a precise and detailed depth estimation of object 30 is possible.
In FIG. 4, the object 30 is shown at a distance zfar which is four times larger than zclose, i.e., zfar = 4 · zclose. The object appears 4 times smaller than at distance zclose. A distance 48 corresponds to one fourth of dmax. Compared with the pattern of FIG. 3, the dot density in the vertical direction is the same, but is limited by the tolerance for calibration. In the horizontal direction, the density can be 4 times higher when the object is far. The pattern 34 shown in FIG. 4 comprises the single pattern 42 shown in FIG. 3 usable at long range.
In FIG. 5, a multi-scale pattern 50 according to the invention is shown. The pattern 50 comprises a first single pattern 54 with dots 60, which can be called a sparse pattern, a second single pattern 56 with dots 62, which can be called a regular or intermediate/medium-density pattern, and a third single pattern 58 with dots 64, which can be called a dense pattern. The density of dots 62 of the second single pattern 56 is twice as large as the density of dots 60 of the first single pattern 54. The density of dots 64 of the third single pattern 58 is twice as large as the density of dots 62 of the second single pattern 56.
The maximum disparity 14 is shown, as well as a distance 70 which corresponds to half of the maximum disparity and a distance 74 which corresponds to one fourth of the maximum disparity. Also, in FIG. 5 tolerances for calibration 24 are shown .
The pattern 50 is one pattern for all object distances and provides an almost constant dot density on the object for all distances from close to far. The dots 60, 62, 64 of all single patterns 54, 56, 58 originate from one single illuminator/projector and have the same wavelength, especially 940 nm. The identification of sparse/regular/dense dots can be done in a calibration phase of the pattern. The sparse single pattern 54 can be used to determine the average distance of the object 30; then the dot density can be used to compute the depth of the object 30. FIG. 5 shows an example with 3 levels (sparse/regular/dense), but the invention covers two levels or more depending on the application.
The pattern 50 combines single patterns 54, 56, 58 optimized at different ranges and reduces the vertical density for closer objects 30, which will be discussed in connection with FIG. 6.
In the leftmost part 80 of FIG. 6, the object 30 is at a far distance. For the depth estimation of object 30, all dots 60, 62, 64 of single patterns 54, 56, 58 are used. In this way, a large dot density on the object 30 is realized and a precise estimation is possible. In the example shown, of the sparse single pattern 54 approx. 2 dots, of the regular single pattern 56 approx. 8 dots, and of the dense single pattern 58 approx. 32 dots are projected onto object 30. Therefore, a total number of approx. 42 dots is projected onto the object 30.
In the middle part 82 of FIG. 6, the object 30 is two times closer than in the left part 80 of FIG. 6. The dots 64 of the dense single pattern 58 cannot be used for the depth estimation, as they are too close to each other. Of the sparse single pattern 54, approx. 8 dots are projected onto object 30, while of the regular single pattern 56, approx. 32 dots are projected onto the object 30.
In the right part 84 of FIG. 6, the object 30 is four times closer than in the left part 80 of FIG. 6. The dots 62 of the regular single pattern 56 and the dots 64 of the dense single pattern 58 are not usable in this case, as they are too close to each other. The dots 60 of the sparse single pattern 54 can be used. Approx. 32 dots 60 of this single pattern 54 are projected onto object 30. As can be inferred from the comparison, for all three different distances of object 30, approx. 40 dots are projected onto object 30.
The number of dots 60, 62, 64 available for depth estimation therefore is approximately independent of the distance of the object from the projector. There is therefore almost a constant dot density on the object independent of the object distance .
The sparse dots 60 are always present on the object 30 to identify which dots 60, 62, 64 should be used for depth estimation. Shown in parts 80, 82, 84 of FIG. 6 are also maximum disparity 14, a distance 90 corresponding to twice the maximum disparity, and a distance 94 corresponding to 4 times the maximum disparity.
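The selection logic of FIG. 6 can be summarized numerically: a level k (0 = sparse) stays separable only while the object is at least f_density^k times the minimum distance away, while the number of its dots on the object grows as the object comes closer. The sketch below reproduces the approximate counts given above; the per-level density scaling (a factor f_density in each image direction) and the normalization constant are assumptions chosen to match those counts.

```python
def dots_on_object(z, z_min, n_sparse_close=32, levels=3, f_density=2):
    """Approximate number of dots usable for depth estimation on an
    object at distance z. Level k (0 = sparse) is separable only for
    z >= z_min * f_density**k; its dot count on the object scales with
    the apparent object area (~1/z**2) times the level's density."""
    total = 0.0
    for k in range(levels):
        if z >= z_min * f_density ** k:            # dots still separable?
            total += n_sparse_close * (f_density ** 2) ** k * (z_min / z) ** 2
    return round(total)

for z in (0.25, 0.5, 1.0):                         # z_min, 2*z_min, 4*z_min
    print(z, dots_on_object(z, z_min=0.25))        # -> 32, 40, 42 dots
```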
In FIG. 7, a diagram is shown in which the x-axis 100 indicates the average distance of the object 30 and the y-axis 104 indicates the number of dots on the object 30. Column 110 corresponds to the case in which only dots 60 of the sparse single pattern 54 are used. Column 112 shows the case in which dots 60, 62 of the sparse single pattern 54 and the regular single pattern 56 are used. Column 114 shows the case in which dots 60, 62, 64 of all single patterns 54, 56, 58 are used.
The rate of reduction between dot densities can be adapted to control the minimum number of dots to be present on the object. This is, however, limited by the expected range of the object depth. In FIG. 6, with a density increase factor of 2 (fdensity = 2), the number of dots decreases by fdensity each time the depth of the object increases by fdensity, and then recovers by exploiting the sub-dots with higher density. Depending on the average depth of the object, the number of dots is always above N/fdensity, where N is the number of sparse dots on the object at its closest position, up to a maximum distance.

A system 120 for depth estimation is shown in FIG. 8. The system 120 comprises a light projector 124 or illuminator and a camera 128. Shown is also the object 30 of which a depth estimation is to be conducted. The light projector 124 projects a pattern 50 onto object 30, which is represented by a light cone 130. The system 120 also comprises a processing unit 134 which is connected on its input side to the camera 128. From the measured positions of the dots 60, 62, 64, the processing unit computes the distance of portions of the object 30 and/or computes an average depth/distance of object 30.
LIST OF REFERENCE SIGNS
2 overlay
6 illuminated pattern
10 captured pattern
14 maximum disparity
20 distance
24 tolerance for calibration
30 object
34 pattern
38 dot
42 single pattern
44 single pattern
48 distance
50 pattern
54 single pattern
56 single pattern
58 single pattern
60 dot
62 dot
64 dot
70 distance
74 distance
80 left part
82 middle part
84 right part
90 distance
94 distance
100 x-axis
104 y-axis
110 column
112 column
114 column
120 system
124 light projector
128 camera
130 light cone
134 processing unit

Claims

1. System (120) for depth estimation of an object (30) , said system comprising
• a light projector (124) for projecting a pattern (50) of light onto an object (30) , said pattern (50) comprises shapes (60, 62, 64) arranged in a plurality of horizontal and/or vertical lines;
• a sensor (128) for detecting said projected pattern (50) on said object (30) ;
• a processing unit (134) for computing the distance of said object (30) employing the sensor data; characterized in that said pattern (50) is built as a multi-scale pattern comprising an overlap of at least two different single patterns (54, 56, 58) with different densities of said shapes (60, 62, 64), whereby said different single patterns (54, 56, 58) do not overlap in said lines.
2. System (120) according to claim 1, whereby all shapes (60, 62, 64) of all single patterns (54, 56, 58) have the same wavelength.
3. System (120) according to claim 2, whereby said wavelength is 940 nm.
4. System (120) according to one of the claims 1 to 3, whereby said shapes (60, 62, 64) are dots.
5. System (120) according to one of the claims 1 to 4, whereby said single patterns (54, 56, 58) are built as a cascade of single patterns (54, 56, 58), whereby from one single pattern (54, 56, 58) to the next single pattern (54, 56, 58) the density of shapes (60, 62, 64) is respectively increased.
6. System (120) according to claim 5, whereby the density is increased by a factor between 1.5 and 2.5, especially by 2.
7. System (120) according to one of the claims 1 to 6, whereby each single pattern (54, 56, 58) is arranged shifted along said lines with respect to all other single patterns (54, 56, 58) .
8. System (120) according to one of the claims 1 to 7, whereby said pattern (50) comprises exactly three single patterns (54, 56, 58) .
9. System (120) according to one of the claims 1 to 8, whereby said sensor (128) comprises or is built as a camera.
10. Method for depth estimation of an object (30) , comprising the steps of
• projecting a pattern (50) of shapes (60, 62, 64) on an object (30) by a light projector (124) ;
• sensing said pattern on said object (30) by a sensor (128) ;
• determining the object depth by employing said sensed pattern (50), characterized in that a multi-scale pattern is projected onto said object (30) which comprises a plurality of single patterns (54, 56, 58) with different shape densities.
11. Method according to claim 10, whereby the expected depth coverage and closest position of the object are estimated and used to generate a first regular sparse pattern.
12. Method according to claim 11, whereby the furthest position of the object (30) is estimated to generate a regular dense single pattern (58) to fit within the tolerance to the calibration error.
13. Method according to claim 12, whereby intermediate patterns are created to control the minimum density of depth coverage on the object and still fit within the tolerance to calibration error.
14. Method according to claim 13, whereby these patterns are combined to compose a single pattern (50) , with no overlap per line to be able to handle the patterns differently.
15. Method according to one of the claims 10 to 14, whereby the pattern on said object (30) is sensed by a camera.
16. Pattern (50) for depth estimation of an object (30), characterized in that the pattern (50) is built as a multi-scale pattern comprising a plurality of single patterns (54, 56, 58) of shapes (60, 62, 64) with different shape densities.
PCT/EP2023/079419 2022-10-28 2023-10-23 System and method for object depth estimation WO2024088935A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102022128621.5 2022-10-28
DE102022128621 2022-10-28

Publications (1)

Publication Number Publication Date
WO2024088935A1 true WO2024088935A1 (en) 2024-05-02

Family

ID=88792996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/079419 WO2024088935A1 (en) 2022-10-28 2023-10-23 System and method for object depth estimation

Country Status (1)

Country Link
WO (1) WO2024088935A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060044546A1 (en) * 2002-11-11 2006-03-02 Qinetiq Limited Ranging apparatus
US20180063390A1 (en) * 2016-08-29 2018-03-01 Oculus Vr, Llc Spatially tiled structured light projector
US20190347813A1 (en) * 2017-08-01 2019-11-14 Apple Inc. Determining sparse versus dense pattern illumination
US20200309916A1 (en) 2019-03-25 2020-10-01 Magik Eye Inc. Distance measurement using high density projection patterns
US20210313778A1 (en) * 2019-04-18 2021-10-07 Facebook Technologies, Llc Addressable vertical cavity surface emitting laser array for generating structured light patterns
US11297300B2 (en) 2018-01-29 2022-04-05 Samsung Electronics Co., Ltd. Robust structured-light patterns for 3D camera system

