WO2010130964A1

WO2010130964A1 - System for counting people

Info

Publication number: WO2010130964A1
Application number: PCT/FR2010/050938
Authority: WO
Inventors: Louahdi Khoudour; Tarek Yahiaoui
Original assignee: INRETS - Institut National de Recherche sur les Transports et leur Sécurité
Priority date: 2009-05-14
Filing date: 2010-05-14
Publication date: 2010-11-18
Also published as: FR2945652B1; EP2430618A1; FR2945652A1

Abstract

System (1) for counting people, comprising image acquisition means (4) and data processing means (5), characterized in that the image acquisition means (4) include a stereoscopic sensor, said data processing means being able to receive stereoscopic images from said stereoscopic sensor, to infer whether or not one or more persons are present in the field of vision of said stereoscopic sensor (5), and, for each person whose presence has been detected, to calculate the movement path of said person in order to determine whether said person should be counted or not.

Description

SYSTEM FOR COUNTING PEOPLE

The present invention relates to a people counting system. The system is particularly intended to count passengers boarding and leaving a means of transport, for example a bus. Passenger counting is a primary need for transport operators. They seek accurate and reliable counting information in order to plan and manage human, financial and material resources in the most appropriate manner. In particular, the sharing of revenue between operators serving the same areas and the evaluation of the fraud rate are two objectives that require very precise counting information. For these two objectives, approximations and statistical estimates are not sufficient.

Many people counting systems for public transport exist and are based on various technologies, such as infrared sensors, ultrasound, contact mats or light beams. However, existing systems have very high error rates and cannot handle complex situations such as a crowd of people exiting. In addition, existing systems are generally expensive and require complex data processing. The counting system can also be used in a shop, a museum, a cinema, a sports center, or generally in any public or private place.

The present invention aims to provide a people counting system that avoids at least some of the aforementioned drawbacks, which is economical and reliable.

For this purpose, the subject of the invention is a people counting system comprising image acquisition means and data processing means, characterized in that the image acquisition means comprise a stereoscopic sensor, said data processing means being adapted to receive stereoscopic images from said stereoscopic sensor, to deduce therefrom the presence or absence of one or more persons in the field of view of said stereoscopic sensor, and, for each person whose presence has been detected, to calculate the trajectory of said person to determine whether said person should be counted or not. Preferably, the stereoscopic sensor has a resolution of less than or equal to 320 pixels × 240 pixels.

Advantageously, said stereoscopic sensor comprises two webcams, the acquisition frequency of each of said webcams being about thirty images per second. According to one embodiment of the invention, said image acquisition means and said data processing means are connected to each other via a USB2 interface.

According to another embodiment of the invention, said image acquisition means and said data processing means are connected to each other by a serial input-output bus.

Preferably, said processing means execute a counting method comprising the steps, for each pair of stereoscopic images, of:

(a) calculating a disparity map,

(b) using said disparity map to calculate a height map representing the distance from the ground of each point of the scene, and

(c) using said height map to highlight the presence of one or more persons.

Advantageously, step (c) comprises sub-steps consisting of:

(d) segmenting said height map to highlight the heads of the people, the result being a binary image called a kernel map, a kernel corresponding to a set of connected pixels presumed to represent a person's head,

(e) using said kernel map to obtain an attribute vector characterizing each person detected,

(f) using said attribute vector to calculate the trajectory of said person, and

(g) analyzing said trajectory in a series of successive images to determine whether said person is to be counted or not.

According to one embodiment of the invention, said counting method comprises steps of testing similarity criteria chosen from: the similarity of the gray levels of the centers of the calculation neighborhoods, the similarity of membership of a contour, the similarity of the shapes of the gray-level curves of the central lines of the calculation neighborhoods, and the similarity of membership of a region affected by movement.

According to one embodiment of the invention, said processing means execute a counting method, said counting method being executed in real time.

According to another embodiment of the invention, said processing means execute a counting method, said counting method being executed in deferred time.

The invention also relates to the use of the counting system for counting passengers entering and exiting through a door of a bus, said stereoscopic sensor being placed vertically above the door of the bus.

The invention will be better understood, and other objects, details, features and advantages thereof will become more clearly apparent, in the following detailed explanatory description of several embodiments of the invention, given as purely illustrative and non-limiting examples, with reference to the attached schematic drawings.

In these drawings:

- FIG. 1 is a simplified schematic perspective view of the counting system according to one embodiment of the invention;
- FIG. 2 is a block diagram showing the steps of the people counting method;
- FIGS. 3a to 3c are curves representing variables used by the data processing means of the counting system: FIG. 3a is a curve representing the dissimilarity criterion as a function of the offset s before the use of a weighting coefficient; FIG. 3b is a curve representing the variation of the value of the weighting coefficient as a function of the offset s; FIG. 3c is a curve representing the dissimilarity criterion as a function of the offset s after the integration of the weighting coefficient;
- FIG. 4a is a schematic view of a counting zone used by the people counting method, showing examples of valid trajectories; and
- FIG. 4b is a schematic view of the counting zone, showing examples of invalid trajectories.

The subject of the invention is a people counting system based on stereoscopic vision. In the example shown in FIG. 1, the counting system is used to count the passengers 2 entering and leaving a bus via a door 3.

One objective is to provide a counting system 1 that is accurate and adapted to the environment in which it is used, in the example the bus. The counting system 1 comprises image acquisition means, or sensor 4, and data processing means, for example a computer 5, which are for example connected to one another via a USB2 (Universal Serial Bus) interface. Alternatively, the sensor 4 and the computer 5 are connected to each other by a serial input-output bus (IEEE). This makes it possible to increase the data rate and the transmission speed.

The basic idea is to isolate and separate people's heads in order to count them. The most favorable way to do this is to install the image acquisition means 4 so as to have a top view of the scene observed.

The stereoscopic sensor 4 comprises a left webcam 6a, for producing left images, and a right webcam 6b, for producing right images. The webcams 6a, 6b are synchronized, calibrated and mounted in a housing. The sensor 4 is powered by power supply means 7 of the system 1. The acquisition frequency of each webcam 6a, 6b is for example about thirty images per second. The webcams 6a, 6b have a low resolution, less than or equal to 320 pixels x 240 pixels. Alternatively, the resolution may be 160 pixels x 120 pixels or 80 pixels x 60 pixels.

The use of a low-resolution sensor 4 makes it possible on the one hand to reduce the price of the sensor 4, and therefore of the system 1, and, on the other hand, to limit the quantity of data to be processed. It is thus possible to perform a real-time processing using simple processing means, for example a standard computer (PC) 5, without loss of accuracy of counting. Note that the use of low resolution image acquisition means 4 involves using a suitable processing method, for example that described below.

The two webcams 6a, 6b are set up in the bus so that their image planes are parallel and they share the same baseline (epipolar configuration). In the example, the sensor 4 is placed vertically above the door 3 of the bus to have a top view of the heads of the passengers entering and leaving through this door.

The processing means 5 exploit the stereoscopic images acquired using the sensor 4 and save the counting results. The processing means 5 comprise four processing blocks: a detection block, a segmentation block, an attribute extraction block and a tracking and counting block.

The detection block calculates a disparity map for each pair of stereoscopic images and uses the disparity map to calculate a map of the heights representing the distance from the ground of each point of the scene.

The segmentation block uses the height map to highlight the heads of the passengers, which correspond to connected regions of significant height. The result is a binary image on which kernels are represented. The goal is to obtain, for each head in the observed scene, one kernel on the binary image.

The attribute extraction block measures a number of parameters from the left real image, the height map and the binary image produced by the segmentation block. As a result, we obtain an attribute vector for each kernel that defines the coordinates of the kernel, its size, its shape, the average gray level of the region which corresponds to it in the real image, and the average height of the region which corresponds to it in the height map.

The tracking and counting block uses the preceding attribute vectors to reconstruct the trajectories of the kernels, that is to say the trajectories of the heads of the passengers. The passengers are counted by analyzing each of the trajectories of the kernels in the successive images.

Referring to FIG. 2, the method of counting people will now be described in more detail. The process is executed in the computer 5.

According to a first embodiment of the invention, the method is executed in real time. In this embodiment, data from webcams 6a, 6b are temporarily stored in a buffer memory for the time necessary for processing, but are not recorded.

According to a second embodiment of the invention, the method is executed in deferred time. In this embodiment, the data from the webcams 6a, 6b are first stored in a memory and then retrieved from that memory to perform the processing. For example, the images from the webcam 6a are recorded in a data file "left images" and the images from the webcam 6b are recorded in a data file "right images".

In step 1, the computer 5 receives two stereoscopic images, respectively from the two webcams 6a, 6b. In the case of the first embodiment, the two images come directly from the two webcams 6a, 6b. In the case of the second embodiment, the two images come from the two webcams 6a, 6b via the data files. Step 1 is for example periodically performed with a period of 1/30 second.

Steps 2 to 5 are similarity criterion test steps.

A similarity criterion is a discriminant resemblance criterion that can exist between the pixels to be matched or between their neighborhoods (for example: the gray level, the sign of the gradient, membership of a specific geometric shape, or other). The objectives of exploiting these similarity criteria are, on the one hand, to reduce the calculation time (because only the pixels satisfying the criteria are taken into account) and, on the other hand, to improve the matching and the choice of the homologous pixels.

Here, we particularly want to improve the quality of matching. For that, we weight a dissimilarity criterion C_SAD in order to refine the choice of the homologous pixels. This is done by introducing a weighting coefficient whose value depends on whether or not a similarity criterion is verified. When the similarity criterion is verified, the weighting coefficient takes a value that reduces the dissimilarity criterion, which favors the matching of the pixels satisfying this similarity criterion. The value that the weighting coefficient takes when the similarity criterion is not verified in no way affects the dissimilarity criterion. This is modeled by multiplying the dissimilarity criterion C_SAD by a weighting coefficient.

The criterion of initial dissimilarity is written as follows:

$$C_{SAD}(x, y, s) = \sum_{i,\,j} \left| G(x+i+s,\; y+j) - D(x+i,\; y+j) \right|$$

With:

G(x, y): the gray level of the pixel (x, y) of the left image that we want to match,

D(x, y): the gray level of the candidate pixel (x, y) in the right image, and

s: the offset between the two left and right pixels.

The measure of dissimilarity after the introduction of the similarity criterion is then written as follows:

$$C_{Sim}(x, y, s) = \mathrm{coef} \cdot \sum_{i,\,j} \left| G(x+i+s,\; y+j) - D(x+i,\; y+j) \right|$$

coef denoting the weighting coefficient, with:

$$\mathrm{coef} = \begin{cases} 1 & \text{if the similarity criterion is not verified} \\ \mathrm{coef}_0, \quad 0 < \mathrm{coef}_0 < 1 & \text{otherwise} \end{cases}$$

By way of example, FIG. 3a is a curve representing the dissimilarity criterion as a function of the offset s before the use of a weighting coefficient, FIG. 3b is a curve representing the variation of the value of the weighting coefficient as a function of the offset s, and FIG. 3c is a curve representing the dissimilarity criterion as a function of the offset s after the integration of the weighting coefficient. The disparity d corresponds to the offset for which the dissimilarity criterion is minimal.

In the example presented, relying solely on the comparison of the neighborhoods of the pixels to be matched leads to a matching error. Multiplying by a weighting coefficient makes it possible to favor one of the minima of the dissimilarity curve over the others. Thus, the new value of the disparity corresponds to a point for which the dissimilarity is minimal and the similarity criterion is verified.
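The weighting principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the images are plain lists of gray-level rows, the similarity test `similar_gray`, its tolerance `tol`, the window half-size `half` and the value `coef0 = 0.5` are hypothetical choices that do not come from the description.

```python
def sad(left, right, x, y, s, half=1):
    """C_SAD(x, y, s): sum of |G(x+i+s, y+j) - D(x+i, y+j)| over the window."""
    total = 0
    for j in range(-half, half + 1):
        for i in range(-half, half + 1):
            total += abs(left[y + j][x + i + s] - right[y + j][x + i])
    return total


def similar_gray(left, right, x, y, s, tol=10):
    """Hypothetical similarity test: the two centre pixels have close gray levels."""
    return abs(left[y][x + s] - right[y][x]) <= tol


def disparity(left, right, x, y, max_s, coef0=0.5, half=1):
    """Offset s that minimises the weighted criterion C_Sim = coef * C_SAD."""
    best_s, best_cost = 0, float("inf")
    for s in range(max_s + 1):
        # coef = coef0 (< 1) when the similarity criterion is verified, else 1
        coef = coef0 if similar_gray(left, right, x, y, s) else 1.0
        cost = coef * sad(left, right, x, y, s, half)
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s


# Toy example: the left image is the right image shifted by two pixels.
right_row = [10, 50, 90, 30, 70, 20, 60, 40, 80, 15]
left_row = [0, 0] + right_row[:8]
right_img = [right_row] * 3
left_img = [left_row] * 3
print(disparity(left_img, right_img, x=3, y=1, max_s=3))
```

Because the coefficient only ever lowers the cost of offsets that satisfy the criterion, it can tip the choice between several comparable minima of the raw SAD curve without changing the cost of the other offsets.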

Based on this principle, four similarity criteria have been defined. The similarity information provided by each of these criteria is discriminant, is not redundant with respect to the others and does not require a prohibitive calculation time. The similarity criteria used are the similarity of the gray levels of the centers of the calculation neighborhoods, the similarity of membership of a contour, the similarity of the shapes of the gray-level curves of the central lines of the calculation neighborhoods, and the similarity of membership of a region affected by movement. Coefficients α, β, γ and μ have been associated respectively with these similarity criteria. The values of these coefficients vary depending on whether the criteria are verified or not.

$$\alpha = \begin{cases} 1 & \text{if } G(x+s, y) \text{ and } D(x, y) \text{ do not have similar gray levels} \\ \alpha_0, \quad 0 < \alpha_0 < 1 & \text{if } G(x+s, y) \text{ and } D(x, y) \text{ have similar gray levels} \end{cases}$$

$$\beta = \begin{cases} 1 & \text{if } G(x+s, y) \text{ and } D(x, y) \text{ do not both correspond to contour points} \\ \beta_0, \quad 0 < \beta_0 < 1 & \text{if } G(x+s, y) \text{ and } D(x, y) \text{ both correspond to contour points} \end{cases}$$

$$\gamma = \begin{cases} 1 & \text{if the gray-level curves of the central lines of the left and right calculation neighborhoods do not have the same shape} \\ \gamma_0, \quad 0 < \gamma_0 < 1 & \text{if these gray-level curves have the same shape} \end{cases}$$

$$\mu = \begin{cases} 1 & \text{if } G(x+s, y) \text{ and } D(x, y) \text{ do not both correspond to a moving object or both to a static object} \\ \mu_0, \quad 0 < \mu_0 < 1 & \text{if } G(x+s, y) \text{ and } D(x, y) \text{ both correspond to a moving object or both to a static object} \end{cases}$$

The optimal values α₀, β₀, γ₀ and μ₀ have been determined experimentally by choosing the values that minimize the matching error rate.

Variations in the measurement of the dissimilarity criterion are possible. For example, to improve the accuracy of pixel matching, the different similarity criteria can be combined. In this case, it is considered that the proposed similarity criteria are of a different nature and therefore more or less independent. This means that the effect of one criterion is not redundant or opposite to that of others. It is therefore possible to combine the different criteria to constitute a global criterion. An additive model was chosen for the calculation of the dissimilarity, which corresponds to the weighting of the dissimilarity criterion by a coefficient encompassing the four criteria. The overall formulation becomes:

$$C(x, y, s) = (\alpha + \beta + \gamma + \mu) \cdot \sum_{i,\,j} \left| G(x+i+s,\; y+j) - D(x+i,\; y+j) \right|$$

This value is used in step 6.

In step 2, the method tests a first similarity criterion, relating to the gray levels of the pixels to be matched. The coefficient α is associated with this criterion, as previously described.

If the similarity criterion is verified, α = α₀; otherwise α = 1. After calculating the coefficient α, the process proceeds to step 3.

In step 3, the method tests a second similarity criterion, relating to membership of contours. The coefficient β is associated with this criterion, as described previously.

If the similarity criterion is verified, β = β₀; otherwise β = 1. After calculating the coefficient β, the process proceeds to step 4.

In step 4, the method tests a third similarity criterion, relating to the gray levels of the central lines. The coefficient γ is associated with this criterion, as previously described.

If the similarity criterion is verified, γ = γ₀; otherwise γ = 1. After calculating the coefficient γ, the process proceeds to step 5.

In step 5, the method tests a fourth similarity criterion, relating to membership of a region affected by movement. The coefficient μ is associated with this criterion, as previously described. If the similarity criterion is verified, μ = μ₀; otherwise μ = 1. After calculating the coefficient μ, the process proceeds to step 6.
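Steps 2 to 5 and the additive combination of the four coefficients can be summarised by the following sketch. The outcome of each similarity test is passed in as a boolean, and the optimal values in `OPTIMAL` are placeholders chosen for illustration; the description states only that the real values lie strictly between 0 and 1 and were found experimentally.

```python
# Hypothetical optimal values alpha_0, beta_0, gamma_0, mu_0 (each in ]0, 1[).
OPTIMAL = {"alpha": 0.5, "beta": 0.5, "gamma": 0.5, "mu": 0.5}
CRITERIA = ("alpha", "beta", "gamma", "mu")


def coefficient(verified, optimal_value):
    """Steps 2 to 5: optimal value if the criterion is verified, otherwise 1."""
    return optimal_value if verified else 1.0


def combined_dissimilarity(c_sad, checks, optimal=OPTIMAL):
    """Additive model: C = (alpha + beta + gamma + mu) * C_SAD."""
    total = sum(coefficient(checks[name], optimal[name]) for name in CRITERIA)
    return total * c_sad


all_verified = {name: True for name in CRITERIA}
none_verified = {name: False for name in CRITERIA}
print(combined_dissimilarity(100, all_verified))
print(combined_dissimilarity(100, none_verified))
```

With this model, a candidate offset that satisfies more criteria receives a smaller multiplier, so it is favored when the minimum over s is taken in step 6.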

In step 6, the method determines a dense disparity map by matching the pixels of the right and left images, taking into account the similarity criteria tested in steps 2 to 5. The higher the disparity value at a point, the closer the corresponding point is to the sensor.

Various techniques for calculating disparities are known.

They propose solutions for finding homologous pixels, that is, the pixels in the left and right images of a stereoscopic sensor that correspond to the two projections of the same point in the scene. Here, a fast matching technique is preferably used. The sum of absolute differences (SAD), a technique based on measuring the dissimilarity between the neighborhoods of the pixels to be matched, offers a compromise between the robustness of the disparity calculation and the processing time. In the example of counting passengers boarding and leaving the bus, the sensor is close to the objects observed. The main difficulty in this case is the high number of occlusions in the stereoscopic images, that is, regions of the scene that appear in one image and not in the other.

To solve this problem, we use a technique based on the dissimilarity measure, but which integrates similarity criteria between the pixels to be matched and between their neighborhoods, in order to improve the matching globally over the whole image and in particular in the occluded regions and in the regions close to them.

The disparity, in this specific case, is the difference between the abscissae of the two projections of one point of the scene on the two images. Dense stereovision provides dense disparity maps, in which each point of the scene (pixel) is represented by the corresponding disparity value. The goal is to determine, at each point in the scene, the distance from the sensor. This distance is inversely proportional to the disparity value.
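The inverse proportionality can be made explicit under the usual model of a rectified, parallel-axis stereo pair. The symbols f (focal length expressed in pixels), B (distance between the optical centers of the two webcams), x_G and x_D (abscissae of the two projections) and Z (distance from the sensor) are assumptions introduced here for illustration; the description does not name them.

$$d = x_G - x_D, \qquad Z = \frac{f \, B}{d}$$

For instance, with the hypothetical values f = 200 pixels and B = 10 cm, a disparity of 10 pixels gives Z = 200 cm, while a disparity of 40 pixels gives Z = 50 cm.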

In step 7, the method transforms the disparity information into height information using a mathematical transformation based on a geometric model of the stereoscopic sensor. The result at the end of this step is a height map.
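A minimal sketch of such a transformation is given below, assuming a sensor that looks straight down from a known mounting height and the standard triangulation relation for a rectified stereo pair (distance = focal length × baseline / disparity). The parameter names `focal_px`, `baseline_cm` and `sensor_height_cm` and the numeric values are hypothetical; the description does not give the actual geometric model.

```python
def height_map(disparity_map, focal_px, baseline_cm, sensor_height_cm):
    """Convert a disparity map (pixels) into heights above the ground (cm)."""
    heights = []
    for row in disparity_map:
        out = []
        for d in row:
            if d <= 0:
                # No valid match: treated as ground level in this sketch.
                out.append(0.0)
            else:
                distance = focal_px * baseline_cm / d  # distance from the sensor
                out.append(max(0.0, sensor_height_cm - distance))
        heights.append(out)
    return heights


# Sensor assumed 250 cm above the ground, f = 200 px, baseline = 10 cm.
print(height_map([[10, 40], [8, 0]], focal_px=200, baseline_cm=10, sensor_height_cm=250))
```

Points with a large disparity are close to the sensor and therefore high above the ground, which is why heads stand out in the resulting map.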

In step 8, the method segments the height map to highlight the kernels, which represent the heads of the people who pass under the sensor, so that they can be tracked. These kernels have characteristics (average gray level, width, height, among others) that can be tracked in the counting zone. Vectors of these characteristics are therefore used for tracking.

In other words, the calculated disparity maps are transformed into height maps by a simple triangulation; then, to highlight the heads of the passengers, the method defines a number of height intervals. For each interval, the method produces a binary image on which only the regions having a height between the bounds of the interval appear. Applying opening-type morphological filters to each of these binary images, and removing the neighborhoods of the connected components already identified in the upper intervals, makes it possible to identify the heads of persons whose heights fall within these intervals. At the end of the processing, we obtain a binary image representing the union of the intermediate results, on which each kernel corresponds to the head of a person.

The segmentation of the height map consists in identifying, from this map, the upper part of the body of each person moving in the scene. The goal is to highlight the heads of these people. The principle of the proposed approach is based on the definition of a certain number of threshold levels in the height map. For a certain number of predefined height intervals, the objective is to identify and recognize zones corresponding to heads in the images thresholded according to these intervals. Step 8 will now be described in more detail.

The number of intervals is for example set at four, the bounds of the intervals corresponding to heights of people. This amounts to saying that five thresholds Si are predefined, i being an integer index between 1 and 5, to define the different intervals. The threshold corresponding to the greatest height is S5 and the one corresponding to the lowest height is S1, which corresponds to the average height of a four-year-old child, since children of this age represent a category of users who do not pay for transportation.

Step 8 includes the substeps described below. A first substep is to binarize the image with the thresholds Si corresponding to the interval. An image is obtained in which a pixel of value 1 indicates that the corresponding point of the scene has a height between Si and Si-1. The first substep is performed for each interval, which yields four binary images.

A second substep is executed for all intervals except the upper interval. The method eliminates the elements of the image corresponding to persons already identified during the processing of the previous intervals. These elements are identified because they are connected to, or located in the neighborhood of, markers already defined during an earlier pass through the fourth substep.

A third substep is to eliminate small areas, which are considered to be detection errors. To do this, the method uses a morphological opening operation, which eliminates the connected components of the image whose surface is small.

A fourth substep consists in defining markers on the connected elements of the image that have not been eliminated during the second substep. These markers indicate the heads of people whose height is between Si and Si-1.

A fifth substep consists in uniting the resulting binary images to highlight, on the initial image, all the identified nuclei.
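The five substeps can be sketched as follows in pure Python. This is a simplified illustration, not the patented implementation: small regions are removed with a minimum-area test (`min_area`) instead of a true morphological opening, "neighborhood of a marker" is interpreted as the 8 surrounding pixels, and the threshold values in the example are hypothetical.

```python
def components(mask):
    """4-connected components of a binary image, as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    found = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                stack, comp = [(r, c)], []
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                found.append(comp)
    return found


def touches(comp, kernel_map):
    """True if any pixel of comp lies on or next to an existing marker."""
    rows, cols = len(kernel_map), len(kernel_map[0])
    for y, x in comp:
        for ny in range(max(0, y - 1), min(rows, y + 2)):
            for nx in range(max(0, x - 1), min(cols, x + 2)):
                if kernel_map[ny][nx]:
                    return True
    return False


def segment_heads(heights, thresholds, min_area=2):
    """Return a binary kernel map; thresholds are S1..S5 in ascending order."""
    rows, cols = len(heights), len(heights[0])
    kernel_map = [[0] * cols for _ in range(rows)]
    for i in range(len(thresholds) - 1, 0, -1):           # highest interval first
        low, high = thresholds[i - 1], thresholds[i]
        mask = [[low <= h < high for h in row] for row in heights]  # substep 1
        kept = [comp for comp in components(mask)
                if not touches(comp, kernel_map)           # substep 2
                and len(comp) >= min_area]                 # substep 3 (simplified)
        for comp in kept:                                  # substep 4: markers
            for y, x in comp:
                kernel_map[y][x] = 1                       # substep 5: union
    return kernel_map


# Toy height map (cm): an adult head with shoulders, a child, one noisy pixel.
H = [[0] * 8 for _ in range(6)]
for y, x in ((1, 1), (1, 2), (2, 1), (2, 2)):
    H[y][x] = 175
for y, x in ((1, 0), (2, 0), (1, 3), (2, 3)):
    H[y][x] = 140
H[4][6] = H[4][7] = 110
H[5][0] = 160
kernels = segment_heads(H, thresholds=[100, 130, 150, 170, 200])
print(len(components(kernels)))
```

Processing the intervals from the highest to the lowest is what lets the shoulders of a tall person be discarded: they fall in a lower interval but touch a marker that was already placed on the head.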

In step 9, the method calculates a number of parameters for each detected kernel. These parameters describe the position and shape of the kernel, the gray level of the zone corresponding to the kernel in the real image, and the height above the ground of the zone corresponding to the kernel in the height map. This set of parameters is called an attribute vector. In step 9 the method thus computes, for each kernel, an attribute vector which makes it possible to identify that kernel. Step 9 will now be described in greater detail. This step makes it possible, in particular, to identify people of small size, for example children.

As described above, the method associates with each detected kernel a vector called an attribute vector. This vector represents a set of properties that distinguish each kernel from the other kernels present in the same scene, that is to say in the same image. In the example, kernels are people's heads, and attributes are properties that vary from head to head. Attributes enabling the kernels to be processed are, for example:

- the average value of the gray levels of the points of the kernel, calculated from the real images,
- the average height of the points corresponding to the pixels constituting the kernel, calculated from the height map,
- the number of pixels forming the kernel, calculated from the segmented height map (which corresponds to the size of the kernel), and
- the width and length of the kernel, as well as the coordinates of the center of gravity of the kernel, calculated from the segmented height map.

It should be noted that the greater the number of discriminant attributes between kernels, the more reliable the tracking procedure will be. Thus, alternatively, the attribute vectors can comprise seven components, namely: the size of the kernel in pixels, the width of the kernel in pixels, the length of the kernel in pixels, the average height of the kernel in centimeters, the average gray level of the kernel, and the two horizontal coordinates of the kernel.
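The seven-component attribute vector can be represented, for illustration, by the small data structure below. The class name `AttributeVector`, its field names and the helper `attribute_vector` are hypothetical; the kernel is assumed to be given as a list of (row, column) pixel positions, with the gray-level image and the height map given as lists of rows.

```python
from dataclasses import dataclass


@dataclass
class AttributeVector:
    size: int              # number of pixels in the kernel
    width: int             # extent along x, in pixels
    length: int            # extent along y, in pixels
    mean_height_cm: float  # average height read from the height map
    mean_gray: float       # average gray level read from the real image
    cx: float              # x coordinate of the center of gravity
    cy: float              # y coordinate of the center of gravity


def attribute_vector(pixels, gray, heights):
    """Compute the attributes of one kernel from its (row, col) pixels."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    n = len(pixels)
    return AttributeVector(
        size=n,
        width=max(xs) - min(xs) + 1,
        length=max(ys) - min(ys) + 1,
        mean_height_cm=sum(heights[y][x] for y, x in pixels) / n,
        mean_gray=sum(gray[y][x] for y, x in pixels) / n,
        cx=sum(xs) / n,
        cy=sum(ys) / n,
    )


kernel = [(0, 0), (0, 1), (1, 0), (1, 1)]
gray_image = [[10, 20], [30, 40]]
height_values = [[170, 172], [174, 176]]
print(attribute_vector(kernel, gray_image, height_values))
```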

In step 10, the method exploits the attribute vectors to reconstruct the trajectories of all the detected heads. The procedure consists in taking the measurements corresponding to the attribute vectors at time t, predicting how they will vary, and comparing the results of the prediction with the measurements corresponding to the attribute vectors at time t + 1. The tracking procedure then consists in measuring, in a combinatorial manner, a probability of correspondence, which makes it possible to associate the kernels of time t with those of time t + 1 and thus reconstruct their trajectories. All tracking data are saved and are used at the end of the sequence by the counting procedure.

At the end of step 10, the method tests the end of the sequence. If the sequence is not finished, the process returns to step 1 to acquire a new pair of stereoscopic images and perform a new processing loop, otherwise the process proceeds to step 11.

In step 11, the method analyzes the trajectories, i.e., the method tests the validity of the trajectory of each kernel representing a person whose presence has been detected. Step 11 is repeated for each trajectory. If the method considers that the trajectory is valid, it goes to step 12; otherwise it loops back to step 11 to analyze a new trajectory.

To track a trajectory, the method associates with each segmented object a vector called an attribute vector. This vector is a set of properties that distinguish each object from the others in the same scene. In our case, the objects of interest are the kernels, that is to say the heads of people, and the attributes are properties that vary from one kernel to another, for example: kernel size, kernel width, kernel length, average height in centimeters of the region corresponding to the kernel in the height map, average gray level of the region corresponding to the kernel in the real image, and horizontal coordinates (x, y) of the center of gravity of the kernel. Other attributes can be introduced in the tracking phase.

Tracking can be done using a Kalman filter. This consists in making a prediction of the positions of the kernels and in comparing the predicted positions with the positions given by the current measurement.
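The predict-then-compare idea can be illustrated with the sketch below. It is deliberately simpler than a Kalman filter: the prediction is a plain constant-velocity extrapolation, only the center-of-gravity coordinates are used rather than the full attribute vector, and the association is a greedy nearest-neighbor match with a hypothetical gating distance `max_dist`. All function names are illustrative.

```python
import math


def predict(track):
    """Constant-velocity prediction of the next position of a track."""
    if len(track) < 2:
        return track[-1]
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def associate(tracks, detections, max_dist=20.0):
    """Greedy matching of predicted positions to detections: {track: detection}."""
    pairs = []
    for t, track in enumerate(tracks):
        px, py = predict(track)
        for d, (x, y) in enumerate(detections):
            dist = math.hypot(x - px, y - py)
            if dist <= max_dist:
                pairs.append((dist, t, d))
    pairs.sort()  # closest pairs are matched first
    used_tracks, used_detections, matches = set(), set(), {}
    for _, t, d in pairs:
        if t not in used_tracks and d not in used_detections:
            matches[t] = d
            used_tracks.add(t)
            used_detections.add(d)
    return matches


def update(tracks, detections, max_dist=20.0):
    """Extend matched tracks and start a new track for each unmatched detection."""
    matches = associate(tracks, detections, max_dist)
    for t, d in matches.items():
        tracks[t].append(detections[d])
    matched = set(matches.values())
    for d, detection in enumerate(detections):
        if d not in matched:
            tracks.append([detection])
    return tracks


tracks = [[(10, 0), (10, 10)], [(50, 100), (50, 90)]]
detections = [(50, 81), (11, 19), (100, 50)]
print(associate(tracks, detections))
```

A Kalman filter would replace `predict` with a state estimate that also carries an uncertainty, which can then be used to size the gate instead of a fixed `max_dist`.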

Figure 4a shows a rectangular counting area.

The counting zone 20 corresponds to an area under the sensor 4.

The counting zone 20 has an upper line 20a and a lower line 20b.

A first valid trajectory, symbolized by the arrow 21, corresponds to an entry with an appearance in the vicinity of the upper line 20a and a disappearance in the counting zone 20, which means that the person enters and remains in the counting zone 20. Such a person is taken into account.

A second valid trajectory, symbolized by the arrow 22, corresponds to an entry with an appearance in the vicinity of the upper line 20a and a disappearance in the vicinity of the lower line 20b, which means that the person enters and passes through the counting zone 20. Such a person is also taken into account.

A third valid trajectory, symbolized by the arrow 23, corresponds to an exit with an appearance in the vicinity of the lower line 20b and a disappearance in the vicinity of the upper line 20a, which means that the person passes through the counting zone 20 while exiting. Such a person is also taken into account.

A fourth type of valid trajectory, symbolized by the arrows 24a and 24b, corresponds to an exit with an appearance in the counting zone 20 and a disappearance in the vicinity of the upper line 20a, which means that the person is in the counting zone 20 before exiting. Such a person is also taken into account.

FIG. 4b, which also represents the counting zone 20, shows examples of invalid trajectories.

A first invalid trajectory, symbolized by the arrow 25, corresponds to an appearance in the vicinity of the upper line 20a and a disappearance in the vicinity of the same line 20a, which means that the person entered and immediately turned back.

A second invalid trajectory, symbolized by the arrow 26, corresponds to an appearance in the vicinity of the lower line 20b and a disappearance in the vicinity of the same line 20b, which corresponds to an aborted exit intention.

A third invalid trajectory, symbolized by the arrow 27, corresponds to an appearance and a disappearance in the counting zone 20, which corresponds to a wandering under the sensor 4 without precise intentions.

A fourth invalid trajectory, symbolized by the arrow 28, corresponds to an appearance in the vicinity of the lower line 20b and a disappearance in the counting zone 20.

When, in step 11, the method considers that the trajectory tested should not result in the counting of a person, the process ends for the trajectory considered.

In step 12, the method considers that the trajectory tested at step 11 must result in the counting of a person. In this case, the method increments the entry counter or the exit counter by one unit, depending on the direction of the trajectory.

At the end of step 12, the method performs a test (step 13) to check whether all the trajectories have been analyzed. If not, the process returns to step 11; otherwise the process stops.
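Steps 11 to 13, together with the validity rules of FIGS. 4a and 4b, can be sketched as follows. The image coordinates, the positions `y_upper` and `y_lower` of lines 20a and 20b, and the tolerance `margin` defining "in the vicinity of" a line are hypothetical. A trajectory that appears inside the zone and disappears near the lower line is not covered by the figures; it is treated as invalid here, which is an assumption.

```python
def region(y, y_upper, y_lower, margin):
    """Locate a point relative to the upper line 20a and the lower line 20b."""
    if abs(y - y_upper) <= margin:
        return "upper"
    if abs(y - y_lower) <= margin:
        return "lower"
    return "inside"


# (appearance, disappearance) pairs counted as valid, per FIG. 4a.
VALID = {
    ("upper", "inside"): "entry",   # arrow 21
    ("upper", "lower"): "entry",    # arrow 22
    ("lower", "upper"): "exit",     # arrow 23
    ("inside", "upper"): "exit",    # arrows 24a, 24b
}


def classify(trajectory, y_upper=0, y_lower=100, margin=10):
    """Step 11: return 'entry', 'exit', or None for an invalid trajectory."""
    start = region(trajectory[0][1], y_upper, y_lower, margin)
    end = region(trajectory[-1][1], y_upper, y_lower, margin)
    return VALID.get((start, end))


def count(trajectories, **zone):
    """Steps 11 to 13: analyze every trajectory and update the two counters."""
    counters = {"entry": 0, "exit": 0}
    for trajectory in trajectories:
        direction = classify(trajectory, **zone)
        if direction is not None:
            counters[direction] += 1   # step 12
    return counters


trajectories = [
    [(5, 2), (5, 50)],            # arrow 21
    [(5, 3), (6, 97)],            # arrow 22
    [(5, 98), (5, 4)],            # arrow 23
    [(5, 50), (5, 1)],            # arrow 24
    [(5, 2), (5, 30), (5, 5)],    # arrow 25: entered and turned back
    [(5, 95), (5, 99)],           # arrow 26: aborted exit
    [(5, 40), (5, 60)],           # arrow 27: wandering under the sensor
    [(5, 95), (5, 50)],           # arrow 28
]
print(count(trajectories))
```

Only the first and last points of a trajectory are examined in this sketch, which matches the description of valid and invalid cases in terms of where a kernel appears and where it disappears.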

The counting system has been evaluated on real data and the results obtained have a maximum error rate of the order of 1%. In particular, the system makes it possible to count passengers by size class. Indeed, the segmentation of the height map being performed by height class, it is possible to have the count according to this class of heights. Thus, it is possible to discriminate flows of adults and children, for example.

The system also makes it possible to take into account cases of temporary descents.

In the case of buses, for example, the data corresponding to each person getting off are kept in buffer memory for a certain time, that is to say that the attributes of the kernel corresponding to the head of this person are stored temporarily, in order to check whether the person's descent is only temporary. This information is very important for bus operators who want to know the exact number of alightings and boardings per stop.

The use of stereoscopic vision and the highlighting of the passengers' heads by a process that does not depend on the color information (gray levels) make it possible to lower the resolution of the images to be processed without significantly degrading the counting accuracy. This advantage has been demonstrated by several evaluations of the counting system. Thus, the loss of information due to the lower resolution is compensated by the height information and by the segmentation of the scene according to geometrical parameters. Moreover, the height relative to the ground is insensitive to changes in resolution. For example, a person who is 1.80 m tall will correspond to a kernel with a height of 1.80 m regardless of the resolution of the images. As a result, the counting method makes it possible to obtain good counting accuracy despite the constraints of variable illumination, reduced resolution, juxtaposition and the deformable nature of the objects observed.

Although the invention has been described in connection with several particular embodiments, it is obvious that it is not limited thereto and that it comprises all the technical equivalents of the means described and their combinations if they are within the scope of the invention.
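The statement above that the height relative to the ground does not depend on the image resolution can be illustrated with the standard rectified-stereo relation, depth = focal length x baseline / disparity. The sketch below is a hedged illustration, not the computation described in this document: it assumes a pinhole model, a sensor looking straight down from a known mounting height, and ignores disparity quantization. All numerical values and the name `height_above_ground` are hypothetical.

```python
def height_above_ground(disparity_px: float, focal_px: float,
                        baseline_m: float, sensor_height_m: float) -> float:
    """Height of a scene point above the ground, from its stereo disparity.

    Uses depth = focal * baseline / disparity, then subtracts the depth
    from the mounting height of the downward-looking sensor.
    """
    depth_m = focal_px * baseline_m / disparity_px
    return sensor_height_m - depth_m


# Hypothetical setup: 10 cm baseline, sensor mounted 2.50 m above the ground.
BASELINE_M = 0.10
SENSOR_HEIGHT_M = 2.50

# Halving the resolution halves both the focal length expressed in pixels
# and the disparity expressed in pixels, so their ratio is unchanged.
full_res = height_above_ground(114.0, 800.0, BASELINE_M, SENSOR_HEIGHT_M)
half_res = height_above_ground(57.0, 400.0, BASELINE_M, SENSOR_HEIGHT_M)
```

Under these assumptions `full_res` and `half_res` agree up to floating-point rounding. In practice, coarser disparity steps at low resolution would add some error to the height estimate.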

Claims

1. System (1) for counting people, comprising image acquisition means (4) and data processing means (5), the image acquisition means (4) comprising a stereoscopic sensor, said data processing means being able to receive stereoscopic images from said stereoscopic sensor, to deduce the presence or absence of one or more persons in the field of vision of said stereoscopic sensor (5), and, for each person whose presence has been detected, to calculate the trajectory of said person to determine whether said person should be counted or not, characterized in that said stereoscopic sensor has a resolution less than or equal to 320 pixels x 240 pixels.

2. Counting system according to claim 1, characterized in that said stereoscopic sensor (4) comprises two webcams (6a, 6b), the acquisition frequency of each of said webcams (6a, 6b) being about thirty frames per second.

3. Counting system according to claim 1 or 2, characterized in that said image acquisition means (4) and said data processing means (5) are connected to each other via a USB2 interface.

4. Counting system according to claim 1 or 2, characterized in that said image acquisition means (4) and said data processing means (5) are connected to each other by a serial-type input/output bus.

5. A counting method that can be executed by the processing means of the counting system according to any one of claims 1 to 4, characterized in that it comprises the steps of:

(a) receiving two stereoscopic images originating from a stereoscopic sensor having a resolution less than or equal to 320 pixels x 240 pixels,

(b) computing, for said pair of stereoscopic images, a disparity map,

(c) exploiting said disparity map to calculate a height map representing the distance from the ground of each point of the scene, and

(d) using said height map to highlight the presence of one or more persons.

6. Counting method according to claim 5, characterized in that step (d) comprises sub-steps consisting of:

(e) segmenting said height map to highlight the heads of the persons, the result being a binary image called a kernel map, a kernel corresponding to a set of connected pixels presumably representing a person's head,

(f) using said kernel map to obtain an attribute vector characterizing each person detected,

(g) using said attribute vector to calculate the trajectory of said person, and

(h) analyzing said trajectory in a series of successive images to determine whether said person should be counted or not.

7. Counting method according to claim 5 or 6, characterized in that it comprises the steps of testing similarity criteria selected from the similarity of the gray levels of the centers of the calculation neighborhoods, the similarity of belonging to a contour, the similarity of the curvatures of the gray-level curves of the central lines of the calculation neighborhoods, and the similarity of belonging to a region affected by motion.

8. Counting method according to any one of claims 5 to 7, characterized in that it is executed in real time.

9. Counting method according to any one of claims 5 to 7, characterized in that it is executed in deferred time.

10. Use of the counting system according to any one of claims 1 to 4 to count the passengers entering and exiting through a door of a bus, said stereoscopic sensor (4) being placed vertically above the door (3) of the bus.