WO2021239941A1

WO2021239941A1 - Method for estimating a distance between an imaging device and a subject represented in at least one 2d image provided by said device

Info

Publication number: WO2021239941A1
Application number: PCT/EP2021/064335
Authority: WO
Inventors: Vincent DEBORDES
Original assignee: Lacroix Electronics Cesson
Priority date: 2020-05-28
Filing date: 2021-05-28
Publication date: 2021-12-02
Also published as: FR3110997B1; FR3110997A1

Abstract

A method for estimating a distance D between an imaging device, comprising a sensor and providing at least one 2D image, and a subject represented in said at least one 2D image. The method comprises a calculation (80) according to the following equation: D = l * (w_m * f_w) / (w_p * s_w) with: l a focal length of the imaging device; w_m a real linear dimension of the subject, having a linear dimension type; s_w an actual linear dimension of the sensor, according to the linear dimension type; w_p an apparent linear dimension, in pixels, of the subject in the at least one image, according to the linear dimension type; and f_w an apparent linear dimension, in pixels, of the at least one image, according to the given linear dimension type. Estimating (20) the parameter w_p comprises: a) applying at least one filter for extracting contours; b) dividing to form a plurality of strips; c) calculating an intensity for each of the strips; d) selecting two strips; and e) calculating a pixel distance d between the two selected strips, constituting a first estimate of the parameter w_p.

Description

TITLE: Method for estimating a distance between a camera and a subject represented in at least one 2D image provided by this camera.

1. TECHNICAL FIELD

The field of the invention is that of distance calculation.

More precisely, the invention relates to a method for estimating, by a computing machine, a distance between a camera, comprising a sensor and providing one or more 2D image(s), and a subject represented in this or these 2D image(s).

The 2D images considered are digital images comprising pixels.

By camera is meant in particular, but not exclusively, a still camera (providing 2D images) or a video camera (providing a video comprising a sequence of 2D images).

By subject is meant in particular, but not exclusively, a face or an object.

Measuring such a distance (between the camera and the subject) is not a new problem and is involved in a large number of technical fields. We can, for example, mention the advantage of measuring the distance between a driver and his steering wheel, in order to adapt the moment of deployment of the airbag in the event of an accident.

2. TECHNOLOGICAL BACKGROUND

The distance D between a camera and a subject can be estimated easily and with good precision using the following equation (geometric formula): D = I * (w_m * f_w) / (w_p * s_w), with:

• I a focal length of the camera;

• w_m a real linear dimension of the subject, having a type of linear dimension (typically, width or length);

• s_w a real linear dimension of the sensor, according to the type of linear dimension;

• w_p an apparent linear dimension, in pixels, of the subject in the image, according to the type of linear dimension; and

• f_w an apparent linear dimension, in pixels, of the image, according to the given type of linear dimension.

In a first particular implementation: w_m is the real width in meters of the subject, s_w is the real width in meters of the sensor, w_p is the apparent width in pixels of the subject in the image and f_w is the apparent width in pixels of the image (also called the camera's horizontal pixel resolution).

In a second particular implementation: w_m is the real height in meters of the subject, s_w is the real height in meters of the sensor, w_p is the apparent height in pixels of the subject in the image and f_w is the apparent height in pixels of the image (also called the camera's vertical pixel resolution).

The parameters I, s_w and f_w are intrinsic parameters of the camera. It is considered in the remainder of the description that they are known.

The parameter w_m is a parameter of the subject which is also considered to be known in the remainder of the description.

The use of the aforementioned equation also requires knowledge of the parameter w_p, that is to say the size of the subject in at least one dimension (width or height) in pixels on the image supplied by the camera.
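The geometric formula can be recovered from a simple pinhole-camera model. The derivation below is a sketch under that assumption, which the text does not state explicitly; the intermediate symbol w_s (width, in meters, of the subject's projection on the sensor) is introduced here for illustration only. The result agrees with the form of the equation given in the abstract and in claim 1.

```latex
% Similar triangles between the subject (at distance D) and its
% projection on the sensor (at focal length I):
\frac{w_s}{I} = \frac{w_m}{D}
\quad\Longrightarrow\quad
w_s = \frac{I \, w_m}{D}

% One pixel spans s_w / f_w meters on the sensor, so the apparent
% size of the subject in pixels is:
w_p = w_s \cdot \frac{f_w}{s_w} = \frac{I \, w_m \, f_w}{D \, s_w}

% Solving for the distance:
D = I \cdot \frac{w_m \, f_w}{w_p \, s_w}
```

Since I, w_m, f_w and s_w are fixed, D is inversely proportional to w_p: halving the apparent size of the subject in pixels doubles the estimated distance.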

A known solution (current state of the art) for estimating the parameter w_p, that is to say for measuring the dimension of a subject (face or object) on an image, involves the use of a neural network. Once the subject has been detected, a “box regression” adjusts the edges of an imaginary rectangle so that they are as close as possible to the subject. The dimensions of this imaginary rectangle then give a measure of the dimensions of the subject, expressed in pixels.

A drawback of the known solution is that the use of a neural network requires prior training of the network with a sufficiently large database comprising different images of the same type of subject and precisely adjusted rectangles.

Another drawback of the known solution is that a neural network requires significant resources to produce results in a limited time. It is therefore difficult to integrate into an on-board system, which reduces its field of possible applications.

3. OBJECTIVES

The invention, in at least one embodiment, aims in particular to overcome these various drawbacks of the state of the art.

More specifically, in at least one embodiment of the invention, an objective is to provide a solution for measuring a distance between a camera and a subject, which does not require prior learning.

Another objective of at least one embodiment of the invention is to provide such a solution which uses few resources and is easily integrated into an on-board system.

Another objective of at least one embodiment of the invention is to provide such a solution which is simple to implement and inexpensive.

4. SUMMARY

In a particular embodiment of the invention, there is proposed a method for estimating, by a computing machine, a distance D between a camera, comprising a sensor and providing at least one 2D image, and a subject represented in said at least one 2D image, said method comprising a calculation according to the following equation: D = I * (w_m * f_w) / (w_p * s_w), with:

- I a focal length of the camera;

- w_m a real linear dimension of the subject, having a linear dimension type;

- s_w a real linear dimension of the sensor, according to the type of linear dimension;

- w_p an apparent linear dimension, in pixels, of the subject in said at least one image, according to the type of linear dimension; and

- f_w an apparent linear dimension, in pixels, of said at least one image, according to the given type of linear dimension.

The method comprises an estimation of the parameter w_p comprising:

- a) application to said at least one image of at least one contour extraction filter, to obtain at least one processed image;

- b) dividing said at least one processed image into a plurality of bands of the same width and oriented perpendicular to an axis along which the type of linear dimension is measured, the plurality of bands comprising a central band and first and second batches of bands located respectively on a first and a second side of the central band;

- c) calculation of an intensity for each of the bands;

- d) selecting, from each of the first and second batches of bands, a band of the highest intensity; and

- e) calculation of a distance d in pixels between the two selected bands, constituting a first estimate of the parameter w_p.

Thus, the proposed solution takes a completely new and inventive approach to estimating the parameter w_p (apparent linear dimension of the subject in pixels), consisting in applying to the image at least one contour extraction filter, in dividing the resulting image into a plurality of bands, in selecting two of these bands and in calculating a distance d in pixels between these two selected bands (constituting a first estimate of the parameter w_p). The proposed solution therefore requires neither a neural network nor, a fortiori, prior learning. In addition, it uses few resources and is therefore easily integrated into an on-board system. In short, it is simple to implement and inexpensive.

In a first particular implementation, the type of linear dimension is width and the bands are vertical.

In other words, in this first particular implementation: w_m is the real width in meters of the subject, s_w is the real width in meters of the sensor, w_p is the apparent width in pixels of the subject in the image and f_w is the apparent width in pixels of the image (also called the horizontal resolution in pixels of the camera).

In a second particular implementation, the linear dimension type is height and the bands are horizontal.

In other words, in this second particular implementation: w_m is the real height in meters of the subject, s_w is the real height in meters of the sensor, w_p is the apparent height in pixels of the subject in the image and f_w is the apparent height in pixels of the image (also called the vertical resolution in pixels of the camera).

According to one particular characteristic, each of the bands is composed of a plurality of rows of pixels extending parallel to said axis, and step c) of calculating an intensity for each of the bands comprises:

- for each row of each band, calculating a sum of the values of the pixels of said row and keeping the sum only if it is greater than or equal to a predetermined threshold; and

- calculation of the intensity of each of the bands by adding the kept sums of the rows of said band.

Thus, an accumulation and a thresholding by bands are carried out. This allows a certain tolerance to be applied to the tilt of the subject (face for example). This tolerance can be configured via the width of the bands. The wider the bands, the more tolerant the algorithm is. Conversely, the narrower the bands, the more precise the algorithm. There is therefore a tradeoff to be made when selecting the band width.
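The accumulation and thresholding by bands can be sketched as follows for vertical bands. This is a minimal illustration and not the implementation of the invention: the function name `band_intensities`, its parameters, and the choice to ignore trailing columns that do not fill a complete band are all assumptions made for the example.

```python
from typing import List, Sequence


def band_intensities(
    image: Sequence[Sequence[float]],
    band_width: int,
    row_threshold: float,
) -> List[float]:
    """Return one intensity per vertical band of a contour image.

    For each row of each band, the pixel values are summed; the sum is
    kept only if it is greater than or equal to row_threshold. The band
    intensity is the total of the kept sums.
    """
    if band_width < 1:
        raise ValueError("band_width must be at least 1")
    height = len(image)
    width = len(image[0]) if height else 0
    # Assumption: columns beyond the last complete band are ignored.
    n_bands = width // band_width
    intensities = []
    for b in range(n_bands):
        start = b * band_width
        total = 0.0
        for row in image:
            row_sum = sum(row[start:start + band_width])
            if row_sum >= row_threshold:  # thresholding per row
                total += row_sum          # accumulation per band
        intensities.append(total)
    return intensities
```

With a larger `band_width`, a slightly tilted contour still falls inside a single band and contributes to one intensity, at the cost of a coarser position estimate, which is the tradeoff described above.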

According to one particular characteristic, the camera provides a sequence of successive 2D images. For each of the 2D images, the method comprises a current iteration of steps a) to e) and of the following step f), executed after step e): calculation of an average distance d resulting from an average of the distances d calculated in the current iteration and in a determined number of previous iterations of step e). The average distance d calculated in the current iteration of step f) constitutes a second estimate of the parameter w_p.

Thus, the averaging of the calculated distances d acts as a low-pass filter which smooths the output so as not to take into account any outliers due to noise. This averaging helps to eliminate the noise generated by the background of the image and the elements of the subject (in the case of a face: eyes, nose, mouth, etc.).

According to one particular characteristic, the camera provides a sequence of successive 2D images. For each of the 2D images, the method comprises a current iteration of steps a) to e) or f) and of the following step a'), executed between steps a) and b): calculation of a filtered image by multiplying the processed image, resulting from the execution of step a), by a probability mask. Step b) is performed with the filtered image resulting from performing step a').

Thus, applying a probability mask also helps to eliminate noise generated by the background of the image and elements of the subject.

According to one particular characteristic, the probability mask is a matrix of the same size as the processed image, each row of the probability mask, if the bands are vertical, or each column of the probability mask, if the bands are horizontal, containing coefficients whose values are given, except for the first iteration of step a'), by a double Gaussian-type probability function, having two peaks centered:

- on the positions of the two bands selected in the previous iteration of step d); or

- on average positions calculated from the positions of the two bands selected in a determined number of previous iterations of step d).

In this way, the probability mask is adaptive and improves the quality of the estimate of the distance D between the camera and the subject.

According to a particular characteristic, a default probability mask is used for the first iteration of step a').

In this way, bad initialization is avoided and the number of iterations necessary to obtain a good estimate of the distance D between the camera and the subject is reduced.

In a particular application, the subject is a face or an object.

Other subjects, corresponding to other applications of the proposed solution, can be envisaged without departing from the scope of the present invention.

In another embodiment of the invention, there is provided a computer program product comprising program code instructions which, when they are executed by a computing machine, cause the computing machine to perform the aforementioned method (in any of its various embodiments).

In another embodiment of the invention, there is provided a non-transient, computer readable storage medium storing the aforementioned computer program product.

In another embodiment of the invention, there is provided a computing machine configured to perform the aforementioned method (in any one of its various embodiments).

5. LIST OF FIGURES

Other characteristics and advantages of the invention will become apparent on reading the following description, given by way of indicative and non-limiting example, and the accompanying drawings, in which:

[Fig. 1] shows a simplified flowchart of the method according to the invention;

[Fig. 2] illustrates a particular embodiment of step 20 of Figure 1;

[Fig. 3] illustrates the generation of the probability mask, performed in step 22 of Figure 2;

[Fig. 4] illustrates a particular embodiment of step 21 of Figure 2;

[Fig. 5] illustrates a particular embodiment of step 22 of Figure 2;

[Fig. 6] illustrates a particular embodiment of step 23 of Figure 2;

[Fig. 7] illustrates a particular embodiment of step 24 of Figure 2;

[Fig. 8] illustrates a particular embodiment of step 80 of Figure 1; and

[Fig. 9] presents the structure of a computing machine, according to a particular embodiment, configured to carry out the method of the invention.

6. DETAILED DESCRIPTION

In all the figures of this document, identical elements and steps are designated by the same numerical reference.

We now present, in relation to the flowchart of FIG. 1, the method according to the invention for estimating, by a computing machine, a distance D between a camera, comprising a sensor and supplying at least one 2D image, and a subject represented in the at least one 2D image.

The process (block referenced 1 and with the acronym MDR for "Rapid Distance Measurement") includes:

• a step 20 of estimating the parameter w_p, as a function of the at least one 2D image; and

• a step 80 of calculating the distance D as a function of the parameter w_p (estimated at step 20) and of the parameters (assumed to be known) I, w_m, f_w and s_w, according to the equation (illustrated in FIG. 8): D = I * (w_m * f_w) / (w_p * s_w), hence the acronym of step 80 in FIGS. 1 and 8: EDFG for “Estimation of Distance by Geometric Formula”.

As already explained above (in relation to the prior art), the parameters of the aforementioned equation are defined, generically, as follows:

• I a focal length of the camera;

• w_m a real linear dimension of the subject, having a type of linear dimension;

• s_w an actual linear dimension of the sensor, according to the type of linear dimension;

• w_p an apparent linear dimension, in pixels, of the subject in the image, according to the type of linear dimension; and

• f_w an apparent linear dimension, in pixels, of the image, according to the given type of linear dimension.

In the remainder of the description, we adopt the first particular implementation (also already explained above) and we consider that: w_m is the real width in meters of the subject, s_w is the real width in meters of the sensor, w_p is the apparent width in pixels of the subject in the image (hence the acronym of step 20 in Figures 1 and 2: AMLI for "Imaginary Width Measurement Algorithm") and f_w is the apparent width in pixels of the image (also called the horizontal resolution in pixels of the camera). The parameters I, s_w and f_w are intrinsic parameters of the camera and it is considered in the remainder of the description that they are known.

In the remainder of the description, it is also considered, by way of illustrative example, that the subject is a face. In other words, we consider that the parameter w_p is the apparent width (in pixels) of the face in the image.

In this case, for the parameter w_m, it is possible to take an average value of the absolute width in meters of a human face. According to the Wikipedia page dedicated to the average proportions of a human face (https://en.wikipedia.org/wiki/Human_head), the average edge-to-edge width of a face, across males and females, is 13.9 cm.
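As a worked illustration of the geometric formula with this face width, the following sketch computes D for one set of values. Only w_m = 0.139 m comes from the text; the camera parameters (4 mm focal length, 4.8 mm sensor width, 1280-pixel image width) and the measured face width of 200 pixels are hypothetical values chosen for the example, and the function name is an assumption.

```python
def estimate_distance(
    focal_length_m: float,    # I
    subject_width_m: float,   # w_m
    image_width_px: float,    # f_w
    subject_width_px: float,  # w_p
    sensor_width_m: float,    # s_w
) -> float:
    """Return D = I * (w_m * f_w) / (w_p * s_w), in meters."""
    if subject_width_px <= 0 or sensor_width_m <= 0:
        raise ValueError("w_p and s_w must be strictly positive")
    return (
        focal_length_m * (subject_width_m * image_width_px)
        / (subject_width_px * sensor_width_m)
    )


# Hypothetical camera: I = 4 mm, s_w = 4.8 mm, f_w = 1280 px.
# Face: w_m = 0.139 m (from the text), measured w_p = 200 px (hypothetical).
distance_m = estimate_distance(0.004, 0.139, 1280, 200, 0.0048)
print(f"Estimated distance: {distance_m:.3f} m")  # about 0.741 m
```

All lengths are expressed in meters so that the result is in meters; mixing millimeters for I and meters for s_w would scale D by a factor of 1000.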

It is recalled that the present invention is not limited to this first implementation and also applies (in particular but not exclusively) in the second particular implementation (also already explained above) in which: w_m is the real height in meters of the subject, s_w is the real height in meters of the sensor, w_p is the apparent height in pixels of the subject in the image and f_w is the apparent height in pixels of the image (also called the vertical resolution in pixels of the camera).

The present invention is also not limited to the case where the subject is a face. In variants, the subject is for example an object.

FIG. 2 illustrates a particular embodiment of step 20 of FIG. 1. In this particular embodiment of step 20, the parameter w_p (apparent width in pixels of the face in the image) is estimated as a function of a sequence of successive 2D images (sequence referenced 10). This is for example a video provided by a video camera.

Step 20 itself comprises, for a given image (hereinafter called "current image"), an iteration (hereinafter called "current iteration") of the following steps:

• a contour extraction step 21, by applying to the current image at least one contour extraction filter, to obtain a processed image;

• a step 22 of calculating a probability mask and of applying it (by multiplication) to the processed image (resulting from the execution of step 21), to obtain a filtered image;

• a step 23 of applying to the filtered image (resulting from the execution of step 22) a “band algorithm”, making it possible to calculate a distance d in pixels between two selected bands (see detailed description below), this distance d constituting a first estimate of the parameter w_p; and

• a step 24 of calculating an average distance d resulting from an average of the distances d (that is to say of the first estimates of the parameter w_p) calculated on the one hand in the current iteration of step 23 and on the other hand in a determined number of previous iterations of step 23; the average distance d calculated in the current iteration of step 24 constitutes a second estimate of the parameter w_p, which is supplied as an input to step 80 for calculating the distance D.

As illustrated in FIG. 4, in a particular embodiment, step 21 of extracting contours consists in applying one or more contour extraction filters (extraction of vertical or horizontal contours, depending on whether the parameter w_p is the apparent width or height in pixels of the subject in the image), for example one or more Sobel filters, to a source image 41 (the resulting image is referenced 42), then applying a binary thresholding to binarize the image and keep only the most salient contours (by applying an absolute value to the output, one can extract both the rising edges and the falling edges of the vertical contours of an object). The processed image obtained at the end of step 21 is referenced 43.
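Step 21 can be sketched with a single 3x3 Sobel kernel followed by an absolute value and a binary threshold. The example below is written in plain Python for self-containment rather than with an image-processing library; the function name, the threshold parameter and the choice to leave border pixels at 0 are assumptions made for the illustration and are not taken from the text.

```python
from typing import List, Sequence

# Sobel kernel responding to horizontal intensity changes,
# that is, to vertical contours.
SOBEL_X = (
    (-1, 0, 1),
    (-2, 0, 2),
    (-1, 0, 1),
)


def extract_vertical_contours(
    image: Sequence[Sequence[float]],
    threshold: float,
) -> List[List[int]]:
    """Return a binary image (0 or 1) of the salient vertical contours."""
    height = len(image)
    width = len(image[0]) if height else 0
    # Assumption: border pixels, where the kernel does not fit, stay at 0.
    output = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gradient = 0.0
            for ky in range(3):
                for kx in range(3):
                    gradient += SOBEL_X[ky][kx] * image[y + ky - 1][x + kx - 1]
            # The absolute value keeps both rising and falling edges;
            # the threshold keeps only the most salient contours.
            if abs(gradient) >= threshold:
                output[y][x] = 1
    return output
```

For the variant in which w_p is an apparent height, the transposed kernel would be used instead, so that horizontal contours are extracted.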

As illustrated in FIG. 5, in a particular embodiment, step 22 comprises a weighting of the contours contained in the processed image 43 (resulting from step 21) with a probability mask 50 (obtained as detailed below), in order to filter out unwanted contours and preserve the edges of the face. The filtered image obtained at the end of step 22 is referenced 51.

In a particular implementation, the probability mask 50 is a matrix of the same size as the processed image 43. Each row of the probability mask contains coefficients whose values are given, except for the first iteration of step 22 (in which a default probability mask is used), by a double Gaussian type probability function (referenced 52 in Figure 5), having two peaks centered on average positions calculated from the positions of the two bands selected in a determined number of previous iterations of step 23.

FIG. 3 shows schematically the first two iterations of step 20 of FIG. 2: the first iteration is carried out with as input the first image of the sequence (referenced "Image 0") and the second iteration is carried out with the second image of the sequence (referenced "Image 1"). It can be seen that during the first iteration, step 24 provides an estimate of the parameter w_p as well as information 30 which is used as input to step 22 during the second iteration. The information 30 is used to calculate (in step 22 during the second iteration) the probability mask which is applied to the second image ("Image 1") after it has been processed in step 21 (of the second iteration).

In a particular implementation, the information 30 includes:

• a first item of information on the relative positioning (that is to say the distance between them) of the two peaks of the double Gaussian type probability function: this first item of information is the estimate of the parameter w_p (mean distance d); and

• a second item of information on the absolute positioning of the two peaks on each row of the probability mask: this second item of information corresponds to the positions of the two bands selected in step 23 and it is for example supplied in the form of the position P of the center of the two bands selected in step 23 (to define the position of the center of symmetry of the masking function, that is to say the position of the point of symmetry between the two peaks).

In a variant, it is assumed that the subject (face) is substantially centered over the width of the source image 41 (with one edge of the face to the right of the image and the other edge to the left of the image). In this variant, only one item of information 30 is transmitted, namely the first aforementioned item of information (that is to say the estimate of the parameter w_p). Indeed, it follows from the aforementioned assumption that the position P of the center of the two bands selected in step 23 is substantially located in the middle of the width of the image; therefore the second item of information is not necessary.
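The construction of one row of the probability mask from the two items of information (mean distance d and center position P), and its application by multiplication, can be sketched as follows. Several choices here are assumptions that the text does not specify: the standard deviation `sigma`, the combination of the two Gaussians by taking their maximum (so that each peak has value 1), and the function names.

```python
import math
from typing import List, Sequence


def double_gaussian_row(
    width: int,
    center: float,      # position P of the point of symmetry, in pixels
    separation: float,  # mean distance d between the two peaks, in pixels
    sigma: float,       # assumed spread of each peak, in pixels
) -> List[float]:
    """Return mask coefficients with two peaks at center -/+ separation/2."""
    left_peak = center - separation / 2.0
    right_peak = center + separation / 2.0
    row = []
    for x in range(width):
        g_left = math.exp(-((x - left_peak) ** 2) / (2.0 * sigma ** 2))
        g_right = math.exp(-((x - right_peak) ** 2) / (2.0 * sigma ** 2))
        row.append(max(g_left, g_right))  # assumption: peaks combined by max
    return row


def apply_mask(
    processed: Sequence[Sequence[float]],
    mask_row: Sequence[float],
) -> List[List[float]]:
    """Multiply every row of the processed image by the same mask row."""
    return [[p * m for p, m in zip(row, mask_row)] for row in processed]
```

In the centered variant described above, `center` would simply be set to half the image width, so that only the separation needs to be passed from one iteration to the next.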

As illustrated in FIG. 6, in a particular embodiment, step 23 consists in applying a “band algorithm” to the filtered image 51 obtained at the end of step 22. This “band algorithm” itself includes, for example, the following substeps:

• division of the filtered image 51 into a plurality of vertical bands (in the present example where the parameter w_p is the apparent width in pixels of the subject in the image; or else horizontal bands in the variant where the parameter w_p is the apparent height in pixels of the subject in the image) of the same width, comprising a central band and first and second batches of bands located respectively on a first and a second side of the central band (in FIG. 6, the element referenced 63 makes it possible to view the bands, with the central band 630 and the first and second batches of bands referenced 631 and 632 respectively);

• calculation of an intensity for each of the bands, by accumulation by bands and thresholding: each of the bands is made up of a plurality of rows of pixels (extending horizontally, that is to say perpendicular to the axis of the bands and parallel to the axis along which the width of the face is measured), and the calculation of an intensity for each of the bands includes:

o for each row of each band, calculation of a sum of the values of the pixels of the row and keeping the sum only if it is greater than or equal to a predetermined threshold (the image thus obtained is referenced 61 in FIG. 6); and

o calculating the intensity of each of the bands by adding the kept sums of the rows of the band;

• selecting, from each of the first and second batches of bands, a band with the highest intensity; thus, in the example illustrated to the right of FIG. 6, the two selected bands (in the aforementioned image 61) are referenced 62a and 62b; and

• calculation of a distance d in pixels (referenced 63) between the two selected bands (62a, 62b), this distance d constituting the first estimate of the parameter w_p.
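The last two substeps (selection of one band on each side of the central band, then calculation of d) can be sketched as below, taking the per-band intensities as already computed. The text does not say between which points of the bands the distance is measured; measuring between band centers, taking index n // 2 as the central band, and resolving ties in favor of the first band found are assumptions of this example, as is the function name.

```python
from typing import Sequence, Tuple


def select_bands(
    intensities: Sequence[float],
    band_width: int,
) -> Tuple[int, int, int, float]:
    """Return (left index, right index, distance d in px, center P in px)."""
    n = len(intensities)
    if n < 3:
        raise ValueError("need a central band and at least one band per side")
    central = n // 2  # assumption: index of the central band
    # Highest-intensity band in each batch, excluding the central band.
    left = max(range(0, central), key=lambda i: intensities[i])
    right = max(range(central + 1, n), key=lambda i: intensities[i])
    # Assumption: d is measured between the centers of the two bands.
    distance_px = (right - left) * band_width
    # Position P: midpoint between the centers of the two selected bands.
    center_px = (left + right + 1) * band_width / 2.0
    return left, right, distance_px, center_px
```

The resolution of d obtained this way is one band width, which is another way of seeing the tolerance versus precision tradeoff governed by the band width.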

As illustrated in Figure 7, in a particular embodiment, step 24 comprises:

• adding the distance d (also referenced 63 and calculated in step 23) to a first buffer memory 70, which is a FIFO-type queue of size N containing the N distances d calculated during the N previous iterations of step 23; followed by a calculation (block referenced 71) of the average of the N distances d contained in the first buffer memory 70; and

• adding the position P of the center of the two selected bands (this position P is also calculated in step 23) to a second buffer memory 70', which is a FIFO-type queue of size N containing the N positions P calculated during the previous N iterations of step 23; followed by a calculation (block referenced 71') of the average of the N positions P contained in the second buffer memory 70'.

The operations performed in step 24 act as a low-pass filter which smooths the outputs (distance d and position P) so as not to take into account any outliers due to noise. If N is too small, the noise will not be smoothed. Conversely, if N is too large, the temporal evolution of the output width (parameter w_p) will be affected by a latency induced by the filter.
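The FIFO buffer and averaging of step 24 can be sketched with a fixed-length queue. The class name is hypothetical, and averaging over the values available so far while the buffer is not yet full is an assumption, since the text does not describe the start-up behavior.

```python
from collections import deque


class MovingAverage:
    """Average of the last `size` values, held in a FIFO buffer."""

    def __init__(self, size: int) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        # deque(maxlen=size) drops the oldest value when a new one is added.
        self._values = deque(maxlen=size)

    def update(self, value: float) -> float:
        """Add a new value and return the current average."""
        self._values.append(value)
        # Assumption: before the buffer is full, average what is available.
        return sum(self._values) / len(self._values)


# One buffer for the distances d and one for the positions P, both of size N.
distance_filter = MovingAverage(size=3)
for d in (10, 20, 30, 100):
    smoothed = distance_filter.update(d)
print(smoothed)  # 50.0: average of the last three values 20, 30 and 100
```

The last value in the example shows the latency effect mentioned above: after a jump to 100, the smoothed output only reaches 50 and needs further iterations to catch up.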

The mean distance d resulting from this averaging constitutes the second estimate of the parameter w_p, and it is the output result of step 24. The mean distance d 71 is therefore supplied as input to step 80 of calculating the distance D, as symbolized by the arrow referenced 72. It is also provided (as symbolized by the arrow referenced 73), together with the mean position P (as symbolized by the arrow referenced 73'), for the calculation of the probability mask during the next iteration of step 22 (for the next image in the sequence).

FIG. 9 presents an example of the structure of a computing machine 90 for carrying out (executing) the method presented above in relation to FIGS. 1 to 8.

This structure comprises a random access memory 92 (for example a RAM memory), a read only memory 93 (for example a ROM memory or a hard disk) and a processing unit 91 (equipped for example with a processor, and controlled by a computer program 930 stored in the read only memory 93). On initialization, the code instructions of the computer program 930 are for example loaded into the random access memory 92 before being executed by the processor of the processing unit 91. The processor receives as input the sequence of images and the parameters I, w_m, f_w and s_w, and outputs an estimate of the distance D.

This FIG. 9 illustrates only one particular way, among several possible, of implementing a computing machine to carry out (execute) the method. Indeed, the computing machine can be implemented indifferently in the form of a reprogrammable computing machine (a PC computer, a DSP processor or a microcontroller) executing a program comprising a sequence of instructions, or in the form of a dedicated computing machine (for example a set of logic gates such as an FPGA or an ASIC, or any other hardware module).

In the case of an implementation in the form of a reprogrammable computing machine, the corresponding program (i.e. the sequence of instructions) may or may not be stored in a removable storage medium (such as for example a floppy disk, a CD-ROM or a DVD-ROM), this storage medium being partially or totally readable by a computer or a processor.

Claims

1) Method for estimating, by a computing machine (90), a distance D between a camera, comprising a sensor and providing at least one 2D image, and a subject represented in said at least one 2D image, said method comprising a calculation (80) according to the following equation: D = I * (w_m * f_w) / (w_p * s_w), with:

- I a focal length of the camera;

- w_m a real linear dimension of the subject, having a linear dimension type;

- s_w a real linear dimension of the sensor, according to the type of linear dimension;

- w_p an apparent linear dimension, in pixels, of the subject in said at least one image, according to the type of linear dimension; and

- f_w an apparent linear dimension, in pixels, of said at least one image, according to the given type of linear dimension; characterized in that it comprises an estimate (20) of the parameter w_p comprising:

- a) application (21) to said at least one image of at least one contour extraction filter, to obtain at least one processed image;

- b) division (23) of said at least one processed image into a plurality of bands of the same width and oriented perpendicular to an axis along which the type of linear dimension is measured, the plurality of bands comprising a central band and first and second batches of bands located respectively on a first and a second side of the central band;

- c) calculation (23) of an intensity for each of the bands;

- d) selecting (23), from each of the first and second batches of bands, a band of the highest intensity; and

- e) calculation (23) of a distance d in pixels between the two selected bands, constituting a first estimate of the parameter w_p.

2) Method according to claim 1, characterized in that the type of linear dimension is a width and in that the bands are vertical.

3) Method according to claim 1, characterized in that the type of linear dimension is a height and in that the bands are horizontal.

4) Method according to any one of claims 1 to 3, characterized in that each of the bands is composed of a plurality of lines of pixels extending parallel to said axis, and in that step c) of calculating an intensity for each of the bands includes:

- for each line of each band, calculating a sum of the values of the pixels of said line and keeping the sum only if it is greater than or equal to a predetermined threshold; and

- calculation of the intensity of each of the bands by adding the kept sums of the lines of said band.

5) Method according to any one of claims 1 to 4, characterized in that the camera provides a sequence of successive 2D images, in that, for each of the 2D images, the method comprises a current iteration of steps a) to e) and of the following step f), executed after step e): calculation (24) of an average distance d resulting from an average of the distances d calculated in the current iteration and in a determined number of previous iterations of step e), and in that the average distance d calculated in the current iteration of step f) constitutes a second estimate of the parameter w_p.

6) Method according to any one of claims 1 to 5, characterized in that the camera provides a sequence of successive 2D images, in that, for each of the 2D images, the method comprises a current iteration of steps a) to e) or f) and of the following step a'), executed between steps a) and b): calculation (22) of a filtered image by multiplying the processed image, resulting from the execution of step a), by a probability mask, and in that step b) is performed with the filtered image resulting from the execution of step a').

7) Method according to claim 6, characterized in that the probability mask is a matrix of the same size as the processed image, each row of the probability mask, if the bands are vertical, or each column of the probability mask, if the bands are horizontal, containing coefficients whose values are given, except for the first iteration of step a'), by a double Gaussian type probability function, having two peaks centered:

- on the positions of the two bands selected in the previous iteration of step d); or

- on average positions calculated from the positions of the two bands selected in a determined number of previous iterations of step d).

8) Method according to claim 7, characterized in that a default probability mask is used for the first iteration of step a').

9) Method according to any one of claims 1 to 8, characterized in that the subject is a face or an object.

10) Computer program product (930), comprising program code instructions which, when they are executed by a computing machine (90), cause the computing machine to carry out the method according to any one of the claims 1 to 9.

11) A non-transient, computer readable storage medium (93) storing the computer program product (930) of claim 10.

12) Computing machine (90) configured to perform the method according to any one of claims 1 to 9.