CN106709432B - Human head detection counting method based on binocular stereo vision - Google Patents

Publication number
CN106709432B
CN106709432B (application CN201611108304.5A)
Authority
CN
China
Prior art keywords
image
head
contour
human head
detected
Prior art date
Legal status
Active
Application number
CN201611108304.5A
Other languages
Chinese (zh)
Other versions
CN106709432A (en)
Inventor
周剑
龙学军
姜艾佳
谷瑞翔
Current Assignee
Chengdu Topplusvision Science & Technology Co ltd
Original Assignee
Chengdu Topplusvision Science & Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Topplusvision Science & Technology Co ltd filed Critical Chengdu Topplusvision Science & Technology Co ltd
Priority to CN201611108304.5A
Publication of CN106709432A
Application granted
Publication of CN106709432B

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06V: Image or Video Recognition or Understanding
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53: Recognition of crowd images, e.g. recognition of crowd congestion

Abstract

The invention relates to the technical field of computer vision and discloses a human head detection and counting method based on binocular stereo vision, which solves the problems of heavy computation, inaccurate counting and low precision in prior-art head detection and counting schemes. The method comprises the following steps: step a, calibrating a binocular image acquisition system; step b, capturing images of a monitored area with the calibrated binocular image acquisition system to acquire a passenger-flow scene image; step c, preprocessing the acquired passenger-flow scene image; step d, obtaining a depth map of the image; step e, detecting heads at different depths in the depth map using a contour-line search method; step f, tracking the detected heads; step g, detecting new heads in new video frames and updating the head count; and step h, outputting the current head count. The invention is suitable for head detection and counting in passenger-flow scenes.

Description

Human head detection counting method based on binocular stereo vision
Technical Field
The invention relates to the technical field of computer vision, in particular to a human head detection counting method based on binocular stereo vision.
Background
With population growth and frequent travel, large crowds surge through transportation systems, meeting venues, shopping malls, exhibition halls, airport terminals and similar places, and counting these people is very important from both a business and a safety perspective. Manual counting, however, is impractical in such high-traffic areas, so research into automatic people counting is of great significance.
Currently, the proposed automatic people-counting methods mainly include those based on active infrared sensing, passive infrared sensing, pedal pressure sensing and video image processing. Active infrared sensing is mature technology with strong anti-interference capability, but whether single-beam or multi-beam infrared light is used it cannot effectively count crowds. Passive infrared counting distinguishes living objects from inanimate ones by detecting the thermal infrared emitted by the human body, but it is easily affected by clothing, ambient temperature and the like and cannot handle crowds. People counting based on pedal pressure sensors is usually used on buses, but it requires passengers to board and alight in sequence without crowding, and it cannot reliably judge the direction of passenger flow.
Methods based on video image processing are the most recently developed counting methods and fall into two categories: monocular and binocular. The former segments moving targets using grayscale and chromaticity information, but it is sensitive to lighting changes in the counting scene, and shadows and interfering objects strongly affect target extraction, making accurate counting difficult. The latter exploits the three-dimensional information of moving targets and better handles the illumination and shadow problems of the former, but its computation is heavy and its circle-based head detection is inaccurate, so head-counting precision in complex scenes is low.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a human head detection and counting method based on binocular stereo vision that solves the problems of heavy computation, inaccurate counting and low precision in conventional head detection and counting schemes.
The scheme adopted by the invention for solving the technical problems is as follows:
the human head detection counting method based on binocular stereo vision comprises the following steps:
a, calibrating a binocular image acquisition system;
b, acquiring images of a monitoring area by using a calibrated binocular image acquisition system to acquire a passenger flow scene image;
c, preprocessing the acquired passenger flow scene image;
step d, obtaining a depth map of the image;
e, detecting the heads of different depths in the depth map by using a contour line searching method;
f, tracking the detected human head;
step g, detecting a new head in a new video frame, and updating the number of the heads;
and h, outputting the number of the current heads.
As a further optimization, step d specifically includes:
d1, extracting and matching the characteristic points of the left image and the right image;
d2, extracting the sub-pixel coordinates of the matched left and right image sequences;
d3, obtaining the three-dimensional coordinates of the image by using the parallax principle and combining the calibration parameters:
the left-image pixel coordinates (x_l, y_l), the right-image pixel coordinates (x_r, y_r) and the three-dimensional space coordinates (X_W, Y_W, Z_W) are related as follows:

X_W = B·x_l / (x_l - x_r),  Y_W = B·y_l / (x_l - x_r),  Z_W = B·f / (x_l - x_r)

where x_l and x_r are the abscissae of the matched left and right image points in the pixel coordinate system, and y_l is the ordinate of the matched point in the left image; B is the baseline distance between the left and right cameras and f is the left camera focal length, both obtained from camera calibration.
As a further optimization, step e specifically includes:
e1, gridding the discrete depth image data;
e2, filling different contour intervals with different colors on the basis of the grid data to generate a contour-fill map;
e3, generating a contour map on the basis of the contour-fill map;
e4, detecting circles in the contour map;
e5, realizing human head detection based on circle detection.
As a further optimization, in step e1, depth image data is subjected to gridding processing by adopting an interpolation method, and a regular grid structure is formed after interpolation.
As a further optimization, step e2 specifically includes:
e21, obtain the maximum value Z_max and the minimum value Z_min of the gridded depth values Z, together with the contour interval ΔZ and the contour step length k;
e22, determine the size of the contour-fill image to be formed according to the contour interval ΔZ and the density of the gridded data, and densify the gridded data by bilinear interpolation:
according to the values z_1, z_2, z_3, z_4 at the points P_1(x_1, y_1, z_1), P_2(x_2, y_2, z_2), P_3(x_3, y_3, z_3), P_4(x_4, y_4, z_4), the Z value of the densified point P(x, y) is calculated with the following (standard bilinear) formulas:

z_a = z_1 + (z_2 - z_1)(x - x_1)/(x_2 - x_1)
z_b = z_3 + (z_4 - z_3)(x - x_3)/(x_4 - x_3)
z = z_a + (z_b - z_a)(y - y_1)/(y_3 - y_1)
e23, establish the mapping f: Z → C between the Z-value interval and the color interval, with Z ∈ [Z_min, Z_max] and C ∈ {c_1, c_2, …, c_n}, and determine the value range of the color variable C. The number of color values is obtained as

n = [(Z_max - Z_min) / ΔZ] + 1

and the corresponding n color values color(n) are selected; the mapping is then

f(Z) = color([(Z - Z_min) / ΔZ] + 1)

where ΔZ is the contour interval and [·] is the rounding symbol;
e24, for the densified grid data, fill colors according to the mapping f: Z → C to generate the contour-fill image.
As a further optimization, step e3 specifically includes:
e31, contour boundary discrimination: set a 2 × 2 contour discrimination template and convolve it over the contour-fill map; if the template condition 2P(i, j) ≠ P(i, j+1) + P(i+1, j+1) is met, the pixel is marked as a point on a contour line, otherwise it is an interior point of a contour region;
e32, setting of contour color values: set the color of contour points to 255 and reduce the color pixel values of interior points by a suitable amount; after the contour colors of the whole image are set, binarize the image with the following criterion: if the pixel value Z_ij of the point P(i, j) is 255, the pixel remains unchanged, otherwise it is set to 0, i.e. Z_ij = 0.
As a further optimization, step e4 specifically includes:
e41, detecting contour line edge contour points in the contour map of the depth image, storing the coordinate positions of the contour line edge contour points, and setting the change range and the step length of the angle theta and the change range and the step length of the radius r;
e42, coordinate transformation: obtain the values of a and b from the formulas x = a + r·cos(θ) and y = b + r·sin(θ); if the values of a and b are within a reasonable range, increment the count of that candidate position by 1;
e43, after the candidate counts are accumulated, search for the maximum radius and solve for the circle center coordinates and radius. The maximum radius is determined as follows: find the parameter-space cell h(a, b, r) with the most votes within the reasonable range, and take the largest r and its corresponding center (a, b), recorded as (a, b, maxr); this yields the circle with center (a, b) and radius maxr. In this way all circles in the contour map can be detected;
e44, drawing the detected circle, and recording the position information of the circle.
As a further optimization, in step e5, assume the standard human head corresponds to a circle of radius r_man; the step specifically comprises:
e51, if the radius R_maxi of the largest circle in a cluster of concentric circles satisfies R_maxi ∈ U(r_man, ε), where ε is a radius deviation value, the detected concentric circles are a human head and the head count is incremented by 1;
e52, if R_maxi ∉ U(r_man, ε), scale the concentric-circle group by a certain proportion; if after scaling R_maxi still cannot satisfy R_maxi ∈ U(r_man, ε), the concentric-circle group is not a human head.
As a further optimization, in step f, the specific method for tracking the detected head is as follows:
let the center position of the head A detected in the current-frame depth image be (x_a, y_a, z_a); if a head B of the same size is detected in the next frame of the monitored depth video and its center position (x_b, y_b, z_b) lies within the σ-neighborhood of (x_a, y_a, z_a), i.e. B ∈ U(A, σ), where σ is the neighborhood radius, taken as σ = vt with v the head moving speed and t the time interval between two video frames, then head B is judged to be head A; this is iterated until head A disappears from the camera field of view, and all detected heads are tracked by this method.
As a further optimization, in step g, the specific method for detecting a new head in a new video frame and updating the number of heads includes:
let the center position of the head A detected in the current-frame depth image be (x_a, y_a, z_a); if a head B of the same size is detected in the next frame of the monitored depth video and its center position (x_b, y_b, z_b) lies outside the σ-neighborhood of (x_a, y_a, z_a), i.e. B ∉ U(A, σ),
where σ is the neighborhood radius, taken as σ = vt with v the crowd moving speed and t the time interval between two video frames, then head B is judged to be a head different from the detected head A of the previous frame;
in this way head B is matched against every same-size head detected in the previous frame; if all matches fail, head B is judged to be a new head and the number of detected heads is updated, N = N + 1; this is iterated until no new head is detected.
The invention has the beneficial effects that:
1. a binocular image acquisition system is used to monitor the head count: on the one hand accurate head-size information can be obtained, and on the other hand the influence of factors such as illumination and hairstyle on head detection is avoided;
2. a fixed step length is used as the contour interval in the depth map, reducing the complexity of the contour search;
3. a contour map is built from the depth map, heads of people of different heights are detected with the contour-search method, and heads of different sizes are detected as circles of different sizes through image scaling, greatly improving the accuracy of head detection and recognition and hence the counting precision;
4. heads are tracked, and whether a head detected in a video frame has already been counted is judged within a neighborhood of the center of its concentric circles, improving counting accuracy;
5. the crowd moving speed is taken into account when sizing the detection neighborhood, improving discrimination efficiency.
Drawings
FIG. 1 is a flow chart of a human head detecting and counting method based on binocular stereo vision according to the present invention;
FIG. 2 is a flow chart of using a contour search method to detect heads at different depths in a depth map;
FIG. 3 is a result of gridding discrete depth image data;
FIG. 4 is a diagram illustrating the result of encrypting grid data by bilinear interpolation;
FIG. 5 is an equivalent fill map boundary pixel;
FIG. 6 is a diagram of a contour discrimination template.
Detailed Description
The invention aims to provide a human head detection counting method based on binocular stereo vision, and solves the problems of large calculation amount, inaccurate counting and low precision of a human head detection counting scheme in the prior art.
The technical scheme of the invention will be more clearly and completely described with reference to the accompanying drawings; it should be understood that the following description is only a few examples of the present invention, not all examples, and is not intended to limit the scope of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the method for detecting and counting human heads based on binocular stereo vision in the present invention comprises:
step 1, calibrating a binocular image acquisition system:
firstly, a binocular stereo vision hardware system is built: two cameras with the same model are fixed on the optical platform at a certain baseline distance, so that an observation target is ensured to be within the imaging range of the two cameras, and the relative position between the two cameras is fixed after the two cameras are built.
Then, a calibration plate image group is photographed: the chessboard grids calibration board is placed in front of the binocular platform, so that the calibration board can be completely imaged in the two cameras. And shooting a plurality of groups of calibration plate images in different postures in the modes of rotating and translating the calibration plate and the like.
Step 2, acquiring a human head scene image: and shooting a video image of the monitoring area by using a calibrated binocular image acquisition system. In order to reduce the head occlusion, the installation position of the detection camera is just above the passenger flow inlet or the exit, and the angle of the shooting monitoring area is a depression angle. The image collected by the left image collection system is an original left image, and the image collected by the right image collection system is an original right image. And carrying out distortion elimination and epipolar line correction processing on the left image and the right image according to the calibration parameters, so that the two images after distortion elimination strictly correspond to each other.
Step 3, image preprocessing: and carrying out noise reduction and enhancement pretreatment on the original left and right images.
Step 4, obtaining an image depth map:
the main objective in this step is to calculate the three-dimensional coordinates of the image. The method specifically comprises the following steps:
step 4.1, respectively extracting the characteristics of the left image and the right image;
4.2, matching the characteristic points of the left image and the right image;
and 4.3, solving the three-dimensional coordinates of the image by using the binocular stereo vision measurement model, and after obtaining a plurality of groups of matching point pairs, converting the pixel coordinates into a world coordinate system according to the corresponding pixel coordinates of the matching point pairs in the left image and the right image so as to complete the three-dimensional coordinate measurement of the image. The method comprises the following specific steps:
Step 4.3.1, extract the sub-pixel coordinates of the matched left and right image sequences. In spatial positioning the measured distances are large, and a small change in pixel coordinates can cause a huge measurement error, so sub-pixel accuracy is required.
Step 4.3.2, obtain the three-dimensional coordinates of the image from the parallax principle combined with the calibration parameters. The left-image pixel coordinates (x_l, y_l), the right-image pixel coordinates (x_r, y_r) and the three-dimensional space coordinates (X_W, Y_W, Z_W) are related as follows:

X_W = B·x_l / (x_l - x_r),  Y_W = B·y_l / (x_l - x_r),  Z_W = B·f / (x_l - x_r)

where x_l and x_r are the abscissae of the matched left and right image points in the pixel coordinate system, and y_l is the ordinate of the matched point in the left image; B is the baseline distance between the left and right cameras and f is the left camera focal length, both obtained from camera calibration. The three-dimensional coordinates of the image are thus obtained, and the Z_W values form the depth map.
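As an illustrative sketch (not part of the patent), the parallax relation above can be coded directly; the baseline, focal length and pixel values below are made-up numbers:

```python
def triangulate(xl, yl, xr, B, f):
    """Recover (Xw, Yw, Zw) from a matched, rectified pixel pair.

    B is the baseline, f the left-camera focal length (both from
    calibration); the disparity is d = xl - xr.
    """
    d = xl - xr
    if d <= 0:
        raise ValueError("non-positive disparity: point at or beyond infinity")
    return (B * xl / d, B * yl / d, B * f / d)

# Illustrative values: baseline 120 mm, focal length 800 px, disparity 16 px
Xw, Yw, Zw = triangulate(xl=400.0, yl=240.0, xr=384.0, B=120.0, f=800.0)
# depth Zw = 120 * 800 / 16 = 6000 mm, i.e. the point is 6 m from the rig
```

Note how depth grows as disparity shrinks, which is why the sub-pixel matching of step 4.3.1 matters at long range.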
Step 5, find heads at different depths with the contour-search method: since the depth map encodes the distance between targets and the sensing system, targets at different distances correspond to different depth values. The contour lines of the depth image are therefore drawn first, and circular contour lines are then detected with the Hough transform; if a circle's diameter is within the range of human head sizes it is judged to be a head, while circles larger or smaller are not. The specific process is shown in fig. 2:
step 5.1, gridding the discrete depth image data: the gridding of the discrete data is to interpolate the position area of the discrete data area according to a certain principle to form a regular rectangular gridding data area. After interpolation, a regular network structure as shown in fig. 3 is formed, and the unit transverse side length of each grid is dx, and the longitudinal side length is dy. Grid point coordinate xij=j*dx,yij=i*dy。
Step 5.2, generating the contour-fill map: a contour-fill map is formed by filling different contour intervals with different colors. On the basis of the grid data, the contour-fill map is formed by the following algorithm steps:
step 5.2.1, obtaining the maximum value and the minimum value Z of the gridding datamax,ZminAnd the values of the isoline spacing delta Z, the isoline step length k and the isoline spacing delta Z are continuously adjusted manually by experiments to obtain the optimal value.
Step 5.2.2, grid densification: to guarantee the precision of the resulting contour map, the size of the contour-fill image to be formed is determined from the contour interval ΔZ and the density of the gridded data, and the gridded data are densified by bilinear interpolation. As shown in fig. 4, according to the values z_1, z_2, z_3, z_4 at the points P_1(x_1, y_1, z_1), P_2(x_2, y_2, z_2), P_3(x_3, y_3, z_3), P_4(x_4, y_4, z_4), the Z value of the densified point P(x, y) is calculated with the following (standard bilinear) formulas:

z_a = z_1 + (z_2 - z_1)(x - x_1)/(x_2 - x_1)
z_b = z_3 + (z_4 - z_3)(x - x_3)/(x_4 - x_3)
z = z_a + (z_b - z_a)(y - y_1)/(y_3 - y_1)
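Since the patent's equation images are unavailable, the standard two-stage bilinear scheme can be sketched as follows; the cell corners and query point are illustrative:

```python
def bilinear_z(p1, p2, p3, p4, x, y):
    """Densify the grid: interpolate z at (x, y) inside a cell whose corners
    are p1=(x1,y1,z1), p2=(x2,y1,z2) on the bottom edge and
    p3=(x1,y3,z3), p4=(x2,y3,z4) on the top edge."""
    x1, y1, z1 = p1
    x2, _, z2 = p2
    _, y3, z3 = p3
    _, _, z4 = p4
    t = (x - x1) / (x2 - x1)      # horizontal weight in [0, 1]
    za = z1 + (z2 - z1) * t       # interpolate along the bottom edge
    zb = z3 + (z4 - z3) * t       # interpolate along the top edge
    return za + (zb - za) * (y - y1) / (y3 - y1)

# Corner depths 0, 1, 1, 2: the cell centre interpolates to 1.0
z_mid = bilinear_z((0, 0, 0.0), (1, 0, 1.0), (0, 1, 1.0), (1, 1, 2.0), 0.5, 0.5)
```

The interpolant reproduces the corner values exactly and varies linearly along each grid edge, which keeps the filled contours smooth.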
Step 5.2.3, establish the mapping f: Z → C between the Z-value interval and the color interval, with Z ∈ [Z_min, Z_max] and C ∈ {c_1, c_2, …, c_n}, and determine the value range of the color variable C. The number of color values is obtained as

n = [(Z_max - Z_min) / ΔZ] + 1

and the corresponding n color values color(n) are selected; the mapping is then

f(Z) = color([(Z - Z_min) / ΔZ] + 1)

where ΔZ is the contour interval and [·] is the rounding symbol.
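A minimal sketch of the band mapping above; the "+1" convention and the palette are assumptions, since the patent's formula images are lost (band indices stand in for a hypothetical palette color(n)):

```python
def band_count(zmin, zmax, dz):
    """n = [(Zmax - Zmin)/dz] + 1 color bands, one per contour interval
    (the '+1' convention is an assumption)."""
    return int((zmax - zmin) // dz) + 1

def z_to_band(z, zmin, dz):
    """The rounding mapping f: Z -> C, returned as a 0-based band index."""
    return int((z - zmin) // dz)

n = band_count(zmin=0.0, zmax=10.0, dz=2.5)  # 4 full intervals -> 5 bands
band = z_to_band(6.0, zmin=0.0, dz=2.5)      # 6.0 lies in the third band (index 2)
```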
Step 5.2.4, fill colors in the densified grid data according to the mapping f: Z → C to generate the contour-fill image.
And 5.3, generating an isoline graph: the process of generating the contour map on the basis of solving the contour filling image is the process of selecting a proper image edge detection template for image processing, so that contour boundary judgment and contour color value setting are required. The following were used:
step 5.3.1, contour line boundary discrimination: the depth image by the contour filling method has obvious image boundaries in different Z value intervals, so that for any point in the image, if the condition shown in FIG. 5 is met, the depth image can be judged as a contour line boundary. Therefore, a 2 × 2 contour discrimination template as shown in fig. 6 is set. And (3) carrying out convolution judgment on the contour filling image by using the template, if the template condition is met, namely 2P (i, j) ≠ P (i, j +1) + P (i +1, j +1), setting the pixel point as a value point on the contour, and otherwise, setting the pixel point as a point in the contour region.
Step 5.3.2, setting of contour color values: the color of contour points is set to 255, and the color pixel values of interior points are reduced by a suitable amount so that interior points are better distinguished from boundary points. After the contour colors of the whole image are set, the image is binarized with the following criterion: if the pixel value Z_ij of the point P(i, j) is 255, the pixel remains unchanged, otherwise it is set to 0, i.e. Z_ij = 0.
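The 2 × 2 template test and the binarization rule can be sketched on a toy patch (values are illustrative, not from the patent):

```python
def is_contour_point(P, i, j):
    """2x2 template of step 5.3.1: (i, j) is on a contour boundary when
    2*P[i][j] != P[i][j+1] + P[i+1][j+1]."""
    return 2 * P[i][j] != P[i][j + 1] + P[i + 1][j + 1]

def binarize(img):
    """Step 5.3.2: keep 255 (contour) pixels, zero all interior pixels."""
    return [[255 if v == 255 else 0 for v in row] for row in img]

# Toy contour-fill patch: a band of value 10 meeting a band of value 20
P = [[10, 10, 20],
     [10, 10, 20],
     [20, 20, 20]]
flat = is_contour_point(P, 0, 0)   # inside the 10-band: not a boundary
edge = is_contour_point(P, 0, 1)   # seam between bands: boundary point
```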
And 5.4, detecting a circle in the contour map by using hough transformation because the contour map is a depth map edge map. The method specifically comprises the following steps:
and 5.4.1, detecting contour line edge contour points in the contour line graph of the depth image, and storing the coordinate positions of the contour line edge contour points. The variation range and step length of the angle theta and the variation range and step length of the radius r are set.
Step 5.4.2, coordinate transformation: obtain the values of a and b from the formulas x = a + r·cos(θ) and y = b + r·sin(θ). If the values of a and b are within a reasonable range, increment the count of that candidate position by 1.
Step 5.4.3, after the candidate counts are accumulated, search for the maximum radius and solve for the circle center coordinates and radius. The maximum radius is determined as follows: find the parameter-space cell h(a, b, r) with the most votes within the reasonable range, and take the largest r and its corresponding center (a, b), recorded as (a, b, maxr); this yields the circle with center (a, b) and radius maxr, which is the circle detected by the Hough transform. In this way all circles in the contour map can be detected.
And 5.4.4, drawing the detected circle and recording the position information of the circle.
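A toy accumulator for the circle Hough transform of steps 5.4.1 to 5.4.3 (deliberately brute-force; the integer vote quantization and sampling are illustrative, not the patent's implementation):

```python
import math
from collections import Counter

def hough_circles(edge_points, radii, n_theta=360):
    """Vote over (a, b, r): for each edge point (x, y), candidate radius r and
    angle theta, the candidate centre is a = x - r*cos(theta),
    b = y - r*sin(theta); the accumulator cell with the most votes wins."""
    acc = Counter()
    for x, y in edge_points:
        for r in radii:
            for k in range(n_theta):
                th = 2 * math.pi * k / n_theta
                a = round(x - r * math.cos(th))
                b = round(y - r * math.sin(th))
                acc[(a, b, r)] += 1
    return acc.most_common(1)[0][0]

# Synthetic edge: a circle of radius 4 centred at (10, 10), sampled at 60 angles
pts = {(round(10 + 4 * math.cos(2 * math.pi * k / 60)),
        round(10 + 4 * math.sin(2 * math.pi * k / 60))) for k in range(60)}
a, b, r = hough_circles(pts, radii=(3, 4, 5))
```

All edge points' vote circles intersect near the true centre, so that cell dominates the accumulator even with rounding noise.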
Step 5.5, head detection. Because the monitoring camera looks down, the contour map corresponding to a photographed head's depth map is one or more concentric circles: the depth signature of a human head. Since real head sizes differ, the circles (concentric circles) detected in the contour map of the depth map also differ in size; by suitably scaling the contour map, heads of different sizes can be detected, reducing the miss rate and improving passenger-flow detection precision. Specifically, assume the standard human head corresponds to a circle of radius r_man.
Step 5.5.1, if the radius R_maxi of the largest circle in a cluster of concentric circles satisfies R_maxi ∈ U(r_man, ε), where ε is a radius deviation value, the detected concentric circles are a human head and the head count becomes N = N + 1.
Step 5.5.2, if R_maxi ∉ U(r_man, ε), scale the concentric-circle group by a certain proportion; if after scaling R_maxi still cannot satisfy R_maxi ∈ U(r_man, ε), the concentric-circle group is not a human head.
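The radius test of steps 5.5.1 and 5.5.2 can be sketched as follows; ε and the scale factors are illustrative assumptions, since the patent only says "a certain proportion":

```python
def is_head(r_max, r_man, eps, scales=(0.8, 0.9, 1.1, 1.25)):
    """A concentric-circle group is a head when its largest radius lies in
    U(r_man, eps); otherwise it is rescaled a few times before rejection."""
    if abs(r_max - r_man) <= eps:
        return True
    return any(abs(r_max * s - r_man) <= eps for s in scales)

heads = 0
for r_max in (18.0, 25.0, 80.0):      # candidate largest radii, in pixels
    if is_head(r_max, r_man=20.0, eps=2.0):
        heads += 1                    # N = N + 1
```

Here 18.0 passes directly, 25.0 passes after scaling by 0.8, and 80.0 is rejected even after all rescalings.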
Step 6, head tracking: detect heads in subsequent video frames with the head-detection method of steps 3 to 5; when the center position of a head detected in a subsequent frame lies within a certain range of the center position of a head detected in the current frame, the two frames are considered to show the same head, iterating until the head disappears from the camera field of view. Specifically: let the center position of the head A detected in the current-frame depth image be (x_a, y_a, z_a); a head B of the same size detected in the next frame whose center position (x_b, y_b, z_b) lies within the σ-neighborhood of (x_a, y_a, z_a), i.e. B ∈ U(A, σ), is judged to be head A, where σ is the neighborhood radius, taken as σ = vt with v the head moving speed and t the time interval between two video frames.
Step 7, detect new heads in new video frames and update the head count. If a head is detected in a later video frame and its center position lies outside a certain range of the center positions of heads detected in the previous frame, a new head has appeared and the count is updated, N = N + 1. Specifically: let the center position of the head A detected in the current-frame depth image be (x_a, y_a, z_a); if a head B of the same size is detected in the next frame of the monitored depth video and its center position (x_b, y_b, z_b) lies outside the σ-neighborhood of (x_a, y_a, z_a), i.e. B ∉ U(A, σ),
where σ is the neighborhood radius, taken as σ = vt with v the crowd moving speed and t the time interval between two video frames, then head B is considered a new head different from the detected head A of the previous frame. Head B is matched against every head of the same (or similar) size detected in the previous frame, using the same matching method as for A; if all matches fail, B is a new head and N = N + 1. This is iterated until no new head is detected.
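The neighborhood test of steps 6 and 7 can be sketched together; the centres, speed and interval below are made-up values:

```python
def same_head(a, b, v, t):
    """B is the same head as A when its centre lies in U(A, sigma),
    sigma = v*t (v: moving speed, t: inter-frame interval)."""
    sigma = v * t
    dist = sum((pa - pb) ** 2 for pa, pb in zip(a, b)) ** 0.5
    return dist <= sigma

def update_count(prev_heads, new_heads, v, t, n):
    """Steps 6-7: a detection increments the count N only when it matches
    no head from the previous frame."""
    for b in new_heads:
        if not any(same_head(a, b, v, t) for a in prev_heads):
            n += 1                     # N = N + 1
    return n

prev = [(0.0, 0.0, 3.0)]               # head A's centre in the current frame
cur = [(0.1, 0.0, 3.0),                # drifted slightly: tracked as A
       (2.0, 2.0, 3.0)]                # outside U(A, sigma): a new head
n = update_count(prev, cur, v=1.0, t=0.5, n=1)
```

Tying σ to v·t means a faster crowd or a lower frame rate widens the search neighborhood automatically.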
Step 8, output the current head count N. Through the above steps the number of people in the passenger-flow scene is counted, completing passenger-flow detection.

Claims (7)

1. The binocular stereo vision-based human head detection and counting method is characterized by comprising the following steps:
a, calibrating a binocular image acquisition system;
b, acquiring images of a monitoring area by using a calibrated binocular image acquisition system to acquire a passenger flow scene image;
c, preprocessing the acquired passenger flow scene image;
step d, obtaining a depth map of the image;
e, detecting the heads of different depths in the depth map by using a contour line searching method;
step f, tracking the detected human head:
let the center position of the head A detected in the current-frame depth image be (x_a, y_a, z_a); if a head B of the same size is detected in the next frame of the monitored depth video and its center position (x_b, y_b, z_b) lies within the σ-neighborhood of (x_a, y_a, z_a), i.e. B ∈ U(A, σ), where σ is the neighborhood radius, taken as σ = vt with v the head moving speed and t the time interval between two video frames, then head B is judged to be head A; this is iterated until head A disappears from the camera field of view, and all detected heads are tracked by this method;
step g, detecting a new head in a new video frame, and updating the number of the heads;
step h, outputting the number of the current heads;
the step e specifically comprises the following steps:
e1, gridding the discrete depth image data;
e2, filling different contour intervals with different colors on the basis of the grid data to generate a contour-fill map;
e3, generating a contour map on the basis of the contour-fill map;
e4, detecting circles in the contour map;
e5, realizing human head detection based on circle detection;
step e2 specifically includes:
e21, obtaining the maximum value Z_max and the minimum value Z_min of the gridded depth values Z, the contour interval ΔZ, and the contour step k;
e22, determining the size of the contour-fill image to be formed according to the contour interval ΔZ and the density of the gridded data, and densifying the gridded data by bilinear interpolation:
according to the values z_1, z_2, z_3, z_4 of the points P_1(x_1, y_1, z_1), P_2(x_2, y_2, z_2), P_3(x_3, y_3, z_3), P_4(x_4, y_4, z_4), the z value of the densified point P is calculated by bilinear interpolation (the three interpolation formulas appear as images FDA0002457608980000021 to FDA0002457608980000023 in the original document);
e23, establishing the mapping f: Z → C between the z-value interval and the color set, with z ∈ [Z_min, Z_max] and C ∈ {c_1, c_2, ..., c_n}; determining the range of the color variable C, obtaining the number n of color values from a formula in ΔZ (shown as image FDA0002457608980000024 in the original document), and selecting the corresponding n color values color(n); each z value is then assigned its color by the mapping formula (shown as image FDA0002457608980000025 in the original document), where ΔZ is the contour interval and [ ] is the rounding symbol;
e24, for the densified grid data, performing graphic color filling according to the mapping f: Z → C to generate the contour-fill image.
2. The binocular stereo vision-based human head detection and counting method according to claim 1, wherein the step d specifically comprises:
d1, extracting and matching feature points of the left and right images;
d2, extracting the sub-pixel coordinates of the matched left and right image point sequences;
d3, obtaining the three-dimensional coordinates of the image using the disparity principle together with the calibration parameters:
the relationship between the left-image pixel coordinates (x_l, y_l), the right-image pixel coordinates (x_r, y_r) and the three-dimensional space coordinates (X_W, Y_W, Z_W) is
Z_W = b·f / (x_l − x_r), X_W = x_l·Z_W / f, Y_W = y_l·Z_W / f,
where x_l and x_r are the abscissas of the matched left and right points in the pixel coordinate system, and y_l is the ordinate of the matched point in the left image; b is the baseline distance between the left and right cameras, and f is the focal length of the left camera; b and f are obtained from camera calibration.
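The triangulation of step d3 can be sketched as follows. This is a minimal illustration assuming rectified images and pixel coordinates measured relative to the principal point (the patent's own formula is given only as an image); Z_W = b·f/(x_l − x_r) is the standard disparity relation, and the helper name is hypothetical.

```python
def triangulate(xl, yl, xr, b, f):
    """Recover 3D coordinates from a matched point pair (step d3).
    xl, yl, xr: pixel coordinates; b: baseline; f: focal length in pixels."""
    disparity = xl - xr
    if disparity == 0:
        raise ValueError("zero disparity: point at infinity")
    zw = b * f / disparity   # depth from disparity
    xw = xl * zw / f         # back-project through the left camera
    yw = yl * zw / f
    return xw, yw, zw
```

Note that depth is inversely proportional to disparity, so nearby heads (large disparity) are localized more precisely than distant ones.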
3. The binocular stereo vision based human head detecting and counting method according to claim 1, wherein in step e1, depth image data is gridded by interpolation, and a regular grid structure is formed after interpolation.
4. The binocular stereo vision-based human head detection and counting method according to claim 3, wherein the step e3 specifically comprises:
e31, contour boundary judgment: a 2 × 2 contour discrimination template is set and convolved over the contour-fill map; if the template condition 2P(i, j) ≠ P(i, j+1) + P(i+1, j+1) is satisfied, the pixel is marked as a point on a contour line; otherwise it is an interior region point;
e32, setting of contour color values: the color of contour points is set to 255, and the pixel values of interior points are reduced by a suitable amount; after the contour colors of the whole image have been set, the image is binarized with the criterion: if the pixel value Z_ij of point P(i, j) is 255, it remains unchanged; otherwise it is set to 0 (Z_ij = 0).
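Steps e31 and e32 can be sketched on a band-filled image as follows (pure Python, names illustrative): the 2 × 2 template marks a pixel as a contour point when 2P(i, j) ≠ P(i, j+1) + P(i+1, j+1), and the binarized output keeps 255 on contour points and 0 elsewhere.

```python
def contour_binarize(filled):
    """Apply the 2x2 template of step e31 and the binarization of step e32.
    filled: 2D list of band indices; returns a 0/255 image of the same size."""
    h, w = len(filled), len(filled[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h - 1):
        for j in range(w - 1):
            # template condition: 2*P(i,j) != P(i,j+1) + P(i+1,j+1)
            if 2 * filled[i][j] != filled[i][j + 1] + filled[i + 1][j + 1]:
                out[i][j] = 255  # contour point
    return out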
5. The binocular stereo vision-based human head detection and counting method according to claim 4, wherein the step e4 specifically comprises:
e41, detecting contour-line edge points in the contour map of the depth image, storing their coordinate positions, and setting the range and step of the angle θ and the range and step of the radius r;
e42, coordinate transformation: computing a and b from x = a + r·cos(θ) and y = b + r·sin(θ); if a and b fall within a reasonable range, the count for that position is incremented by 1;
e43, after the position counts have been accumulated, searching for the maximum radius and solving for the circle center coordinates and radius; the maximum radius is determined by finding the parameter-space cell h(a, b, r) with the most votes within the reasonable range, then taking the largest r and its corresponding center (a, b), recorded as (a, b, maxr), which gives a circle of center (a, b) and radius maxr; in this way all circles in the contour map can be detected;
e44, drawing the detected circles and recording their position information.
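The accumulation in steps e41 to e43 can be sketched as a basic Hough circle transform. In this illustrative sketch the vote threshold, the 10° angle step, and the rearrangement a = x − r·cos(θ), b = y − r·sin(θ) of the claim's equations are assumed choices, not taken from the patent.

```python
import math
from collections import defaultdict

def hough_circles(points, radii, threshold):
    """Vote in (a, b, r) parameter space for every edge point (e41-e42),
    then keep the well-supported circles (e43)."""
    acc = defaultdict(int)
    for x, y in points:
        for r in radii:
            for deg in range(0, 360, 10):
                th = math.radians(deg)
                a = round(x - r * math.cos(th))  # from x = a + r*cos(theta)
                b = round(y - r * math.sin(th))  # from y = b + r*sin(theta)
                acc[(a, b, r)] += 1
    return [cell for cell, votes in acc.items() if votes >= threshold]

# synthetic edge points on a circle of radius 5 around (10, 10)
pts = [(round(10 + 5 * math.cos(math.radians(d))),
        round(10 + 5 * math.sin(math.radians(d)))) for d in range(0, 360, 10)]
found = hough_circles(pts, [4, 5, 6], 30)
```

Only the true (center, radius) cell collects votes from nearly all edge points; spurious cells stay far below the threshold, which is what makes the maximum-vote search of e43 well posed.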
6. The binocular stereo vision based human head detection and counting method according to claim 5, wherein in step e5 the standard human head size is assumed to be a circle of radius r_man, and the step specifically comprises:
e51, if the radius R_maxi of the largest circle in a group of concentric circles lies within the neighborhood U(r_man) defined by the allowed radius deviation, the detected concentric circles are a human head, and the head count is incremented by 1;
e52, if the condition shown as image FDA0002457608980000041 in the original document holds, the group of concentric circles is scaled by a certain ratio; if after scaling R_maxi still cannot satisfy R_maxi ∈ U(r_man), the group of concentric circles is not a human head.
7. The binocular stereo vision based human head detection and counting method according to claim 6, wherein in the step g, the specific method for detecting a new human head in a new video frame and updating the number of human heads comprises:
let the center position of head A detected in the current-frame depth image be (x_a, y_a, z_a); if a head B of the same size is detected in the frame following the current frame of the surveillance-video depth image, and its center position (x_b, y_b, z_b) does not lie within the neighborhood of (x_a, y_a, z_a), i.e. B ∉ U(A, σ), where σ is the neighborhood radius, taken as σ = vt, with v the crowd movement speed and t the time interval between two video frames, head B is determined to be a head different from head A detected in the previous frame;
in the same way, head B is matched against heads of the same size detected in the previous frame image; if the matching fails, head B is judged to be a new head and the detected head count N is updated to N + 1; this is iterated until no new head is detected.
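The σ-neighborhood test shared by the tracking step (f) and the new-head step (g) can be sketched as follows. The helper names are hypothetical; the patent gives no code, only the rule σ = vt.

```python
import math

def is_same_head(center_a, center_b, v, t):
    """Head B matches head A if B's center lies within the sigma-neighborhood
    of A's center, where sigma = v * t (speed times inter-frame interval)."""
    sigma = v * t
    return math.dist(center_a, center_b) <= sigma

def update_count(count, matched_existing_head):
    """Step g: a head matching no previously detected head is new, N = N + 1."""
    return count if matched_existing_head else count + 1
```

Because σ scales with the inter-frame interval t, the same rule works for any frame rate: a slower camera simply allows a head to travel farther between frames before it is declared new.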
CN201611108304.5A 2016-12-06 2016-12-06 Human head detection counting method based on binocular stereo vision Active CN106709432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611108304.5A CN106709432B (en) 2016-12-06 2016-12-06 Human head detection counting method based on binocular stereo vision


Publications (2)

Publication Number Publication Date
CN106709432A CN106709432A (en) 2017-05-24
CN106709432B true CN106709432B (en) 2020-09-11

Family

ID=58935963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611108304.5A Active CN106709432B (en) 2016-12-06 2016-12-06 Human head detection counting method based on binocular stereo vision

Country Status (1)

Country Link
CN (1) CN106709432B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986064B (en) * 2017-05-31 2022-05-06 杭州海康威视数字技术股份有限公司 People flow statistical method, equipment and system
CN109426786A (en) * 2017-08-31 2019-03-05 爱唯秀股份有限公司 Number detection system and number detection method
CN110555419B (en) * 2019-09-09 2023-05-26 江苏慧眼数据科技股份有限公司 Passenger flow counting method based on binocular stereoscopic vision
CN112633096A (en) * 2020-12-14 2021-04-09 深圳云天励飞技术股份有限公司 Passenger flow monitoring method and device, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102129709A (en) * 2009-11-12 2011-07-20 Microsoft Corporation Visualizing depth
CN104185857A (en) * 2011-10-10 2014-12-03 Koninklijke Philips N.V. Depth map processing
CN104331897A (en) * 2014-11-21 2015-02-04 Tianjin Polytechnic University Polar correction based sub-pixel level phase three-dimensional matching method
CN105096259A (en) * 2014-05-09 2015-11-25 Ricoh Co., Ltd. Depth value restoration method and system for depth image


Non-Patent Citations (2)

Title
Zhang Hua; Research on Head Detection and Counting Methods Based on Stereo Vision; Wanfang Dissertation Database; 2015-12-28; abstract, pp. 1, 14, 42-43, 48 *
Zhang Hua. Research on Head Detection and Counting Methods Based on Stereo Vision. Wanfang Dissertation Database. 2015 *


Similar Documents

Publication Publication Date Title
CN110415342B (en) Three-dimensional point cloud reconstruction device and method based on multi-fusion sensor
CN110569704B (en) Multi-strategy self-adaptive lane line detection method based on stereoscopic vision
CN107463918B (en) Lane line extraction method based on fusion of laser point cloud and image data
Zhou et al. Seamless fusion of LiDAR and aerial imagery for building extraction
CN110349221A (en) A kind of three-dimensional laser radar merges scaling method with binocular visible light sensor
CN111460922A (en) System for recognizing whether parking is reasonable or not by sharing three-dimensional model image of vehicle parking area
CN107560592B (en) Precise distance measurement method for photoelectric tracker linkage target
CN106709432B (en) Human head detection counting method based on binocular stereo vision
CN112801074B (en) Depth map estimation method based on traffic camera
CN102176243A (en) Target ranging method based on visible light and infrared camera
CN107392929B (en) Intelligent target detection and size measurement method based on human eye vision model
WO2015096507A1 (en) Method for recognizing and locating building using constraint of mountain contour region
CN109087323A (en) A kind of image three-dimensional vehicle Attitude estimation method based on fine CAD model
CN113221648B (en) Fusion point cloud sequence image guideboard detection method based on mobile measurement system
CN104517095A (en) Head division method based on depth image
CN112818925A (en) Urban building and crown identification method
CN113034586B (en) Road inclination angle detection method and detection system
CN113205604A (en) Feasible region detection method based on camera and laser radar
Li et al. Automatic parking slot detection based on around view monitor (AVM) systems
CN113920183A (en) Monocular vision-based vehicle front obstacle distance measurement method
CN115909025A (en) Terrain vision autonomous detection and identification method for small celestial body surface sampling point
CN111723778A (en) Vehicle distance measuring system and method based on MobileNet-SSD
CN114639115A (en) 3D pedestrian detection method based on fusion of human body key points and laser radar
CN108399630B (en) Method for quickly measuring distance of target in region of interest in complex scene
CN109443319A (en) Barrier range-measurement system and its distance measuring method based on monocular vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant