CN112257520A - People flow statistical method, device and system - Google Patents

People flow statistical method, device and system

Info

Publication number
CN112257520A
Authority
CN
China
Prior art keywords
head
image
depth
pixel
area
Prior art date
Legal status
Pending
Application number
CN202011062769.8A
Other languages
Chinese (zh)
Inventor
罗凤鸣
李勇基
杜晨光
李燕妮
Current Assignee
Lorentech Beijing Technology Co ltd
Original Assignee
Lorentech Beijing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Lorentech Beijing Technology Co ltd filed Critical Lorentech Beijing Technology Co ltd
Priority to CN202011062769.8A
Publication of CN112257520A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques

Abstract

The invention provides a people flow statistics method, device and system, relating to the technical field of computer vision. The people flow statistics method comprises the following steps: acquiring a depth image and an amplitude image of a target area based on an image sensor, the image sensor being arranged directly above the target area; performing extreme point detection on the depth distances in the depth image based on a sliding window, and determining each candidate head region in the depth image; and acquiring each image to be detected in the amplitude image based on each candidate head region, and performing target classification on the images to be detected to obtain a people flow statistics result for the target area, wherein each image to be detected is the corresponding candidate head region in the amplitude image. The invention improves the accuracy and efficiency of people flow statistics.

Description

People flow statistical method, device and system
Technical Field
The invention relates to the technical field of computer vision, in particular to a people flow statistics method, device and system.
Background
With the rapid development of the new retail industry, people flow statistics is shifting from manual observation to smart-sensor-based counting. Statistical methods based on computer vision can analyze passenger flow in time and space scientifically and effectively, enabling fast and timely operational decisions. Passenger flow statistics over different time intervals allow managers to accurately plan future activities and allocate time, staffing and other resources, thereby improving service quality.
At present, people flow statistics mainly relies on 2D vision algorithms or 3D point cloud data. However, when a 2D target recognition algorithm detects people in RGB images, it is easily affected by changes in ambient light; when crowds are dense, target segmentation is difficult even with a sufficiently complex model, the computation load is large, and both the efficiency and accuracy of people flow statistics drop. People flow statistics based on 3D point cloud data is affected by people's limbs and prone to false detections. Existing people flow statistics techniques therefore suffer from low accuracy and low computational efficiency.
Disclosure of Invention
In view of this, the present invention provides a people flow statistics method, device and system, which can improve the accuracy and efficiency of people flow statistics.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical solutions:
in a first aspect, an embodiment of the present invention provides a people flow statistics method, including: acquiring a depth image and an amplitude image of a target area based on an image sensor, wherein the image sensor is arranged directly above the target area; performing extreme point detection on the depth distances in the depth image based on a sliding window, and determining each candidate head region in the depth image; and acquiring each image to be detected in the amplitude image based on each candidate head region, and performing target classification on the images to be detected to obtain a people flow statistics result for the target area, wherein each image to be detected is the corresponding candidate head region in the amplitude image.
Further, an embodiment of the present invention provides a first possible implementation manner of the first aspect, wherein the step of performing extreme point detection on the depth distances in the depth image based on the sliding window and determining each candidate head region in the depth image includes: acquiring the depth distance corresponding to each pixel point in the depth image; determining the window size of the sliding window based on the internal parameters of the image sensor; locating each extremum region in the depth image according to the window size of the sliding window and the depth distance corresponding to each pixel point, wherein an extremum region is a minimum region of the depth distances corresponding to the pixel points in the depth image; and calculating the area intersection ratio between the extremum regions, and determining each candidate head region according to the area intersection ratio.
Further, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the step of locating each extremum region in the depth image according to the window size of the sliding window and the depth distance corresponding to each pixel point includes: determining the average depth distance in each sliding window based on the depth distance corresponding to each pixel point, the window size of the sliding window and a preset sliding step to obtain a sliding window mean value matrix; determining the coordinates of the central point of each sliding window in the depth image based on the window size of the sliding window and a preset sliding step to obtain a coordinate index matrix corresponding to the sliding window mean matrix; determining the coordinate size of each extreme value area in the depth image based on the sliding window mean matrix and the coordinate index matrix; and the coordinate size comprises the center point coordinate and the pixel size of the extreme value area.
Further, an embodiment of the present invention provides a third possible implementation manner of the first aspect, wherein the step of determining the coordinate size of each extremum area in the depth image based on the sliding window mean matrix and the coordinate index matrix includes: determining each minimum depth distance in the sliding window mean value matrix; wherein the minimum depth distance is less than each adjacent average depth distance value; obtaining the center point coordinates corresponding to the sliding window where each minimum depth distance is located from the coordinate index matrix to obtain the center point coordinates of each extremum region; obtaining the pixel length of each extremum region based on each minimum depth distance, the internal parameters of the image sensor and a first calculation formula; wherein the pixel length is the number of pixel columns of the extremum region, and the first calculation formula is:
$$W_i = \operatorname{round}\!\left(\frac{Head_w \cdot I_{cols}}{2\,H_i \tan(\alpha/2)}\right)$$
obtaining the pixel width of each extremum region based on each minimum depth distance, the internal reference of the image sensor and a second calculation formula; wherein the pixel width is the number of pixel rows of the extremum region, and the second calculation formula is:
$$L_i = \operatorname{round}\!\left(\frac{Head_l \cdot I_{rows}}{2\,H_i \tan(\beta/2)}\right)$$
wherein $W_i$ is the pixel length of the i-th extremum region, $L_i$ is the pixel width of the i-th extremum region, $Head_w$ is the preset physical length of the head, $Head_l$ is the preset physical width of the head, $I_{cols}$ is the number of pixel columns of the depth image, $I_{rows}$ is the number of pixel rows of the depth image, $\alpha$ is the horizontal field angle of the image sensor, $\beta$ is the vertical field angle of the image sensor, $H_i$ is the i-th minimum depth distance, and round is the rounding operation.
Further, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, wherein the step of determining the window size of the sliding window based on the internal reference of the image sensor includes: acquiring a preset head physical size; wherein the head physical dimensions comprise a head physical length and a head physical width; determining the number of pixels corresponding to the physical length of the head based on the internal reference of the image sensor and a third calculation formula; wherein the third calculation formula is:
$$W_{pixel} = \operatorname{round}\!\left(\frac{Head_w \cdot I_{cols}}{2\,(H_{install}-H_{avg}) \tan(\alpha/2)}\right)$$
determining the number of pixels corresponding to the physical width of the head based on the internal parameters of the image sensor and a fourth calculation formula; wherein the fourth calculation formula is:
$$L_{pixel} = \operatorname{round}\!\left(\frac{Head_l \cdot I_{rows}}{2\,(H_{install}-H_{avg}) \tan(\beta/2)}\right)$$
wherein $W_{pixel}$ is the number of pixels corresponding to the physical length of the head, $L_{pixel}$ is the number of pixels corresponding to the physical width of the head, $Head_w$ is the physical length of the head, $Head_l$ is the physical width of the head, $I_{cols}$ is the number of pixel columns of the depth image, $I_{rows}$ is the number of pixel rows of the depth image, $\alpha$ is the horizontal field angle of the image sensor, $\beta$ is the vertical field angle of the image sensor, $H_{install}$ is the vertical distance between the image sensor and the plane on which people walk, $H_{avg}$ is the average pedestrian height, and round is the rounding operation; and determining the window size of the sliding window based on the number of pixels corresponding to the physical length of the head and the number of pixels corresponding to the physical width of the head.
Further, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, wherein the step of obtaining each image to be detected in the amplitude image based on each candidate head region and performing target classification on the images to be detected to obtain the people flow statistics result of the target area includes: acquiring the image inside each coordinate size area from the amplitude image based on the coordinate size of each candidate head region, obtaining each image to be detected; inputting each image to be detected into a pre-trained neural network model to obtain a classification result for each target to be detected, wherein the neural network model is trained on pre-labeled person amplitude image samples; and determining the people flow statistics result in the target area based on the classification result of each target to be detected.
Further, an embodiment of the present invention provides a sixth possible implementation manner of the first aspect, wherein the step of determining a people flow rate statistical result in the target region based on the classification result of each target to be detected includes: determining the number of people in the target area based on the classification result of each target to be detected; and taking the number of the people as a people flow statistical result in the target area.
Further, an embodiment of the present invention provides a seventh possible implementation manner of the first aspect, where the people flow statistics result includes a people entering amount and a people leaving amount; a warning line is arranged in the depth image; the step of determining the people flow statistic result in the target area based on the classification result of each target to be detected comprises the following steps: determining each human head area in the depth image based on the classification result of each target to be detected; determining the moving track of each human head area in the target area based on each human head area in continuous multi-frame depth images; and determining the person entering amount and the person leaving amount of the target area based on the moving track of each person head area in the target area and the position of the warning line.
Further, an embodiment of the present invention provides an eighth possible implementation manner of the first aspect, wherein a first warning line, a second warning line, and a third warning line are disposed in the depth image; the step of determining the person entering amount and the person leaving amount of the target area based on the moving track of each person head area in the target area and the position of each warning line comprises the following steps: when the moving track of the head area of the person sequentially generates intersection points with the first warning line, the second warning line and the third warning line, adding one to the count value of the entering amount of the person; when the moving track of the head area of the person sequentially generates intersection points with the third warning line, the second warning line and the first warning line, adding one to the count value of the person leaving amount; and obtaining the person entering amount and the person leaving amount of the target area based on the counting value of the person entering amount and the counting value of the person leaving amount in the preset time.
Further, an embodiment of the present invention provides a ninth possible implementation manner of the first aspect, where the method further includes: and when the moving track of the head area of the person and the warning line generate an intersection point, triggering a preset alarm device to alarm.
In a second aspect, an embodiment of the present invention further provides a people flow statistics apparatus, including: an image acquisition module, configured to acquire a depth image and an amplitude image of the target area based on an image sensor, the image sensor being arranged directly above the target area; a region determining module, configured to perform extreme point detection on the depth image based on a sliding window and determine each candidate head region in the depth image; and a flow statistics module, configured to acquire each image to be detected in the amplitude image based on each candidate head region and perform target classification on the images to be detected to obtain the people flow statistics result of the target area, wherein each image to be detected is the corresponding candidate head region in the amplitude image.
In a third aspect, an embodiment of the present invention provides a people flow rate statistical system, including: an image sensor and a controller, the controller comprising a processor and a storage device; the storage means has stored thereon a computer program which, when executed by a processor, performs the method of any of the first aspects.
In a fourth aspect, the present invention provides a computer program stored on a computer-readable storage medium, wherein the computer program, when executed by a processor, performs the steps of the method according to any one of the first aspect.
The embodiments of the invention provide a people flow statistics method, device and system. The method includes: acquiring a depth image and an amplitude image of a target area based on an image sensor arranged directly above the target area; performing extreme point detection on the depth distances in the depth image based on a sliding window, and determining each candidate head region in the depth image; and acquiring each image to be detected in the amplitude image based on each candidate head region and performing target classification on it to obtain the people flow statistics result of the target area, each image to be detected being the corresponding candidate head region in the amplitude image. Acquiring the depth image and the amplitude image of the target area avoids the influence of overexposed or overly dark light on imaging. Because the depth image carries the depth distance corresponding to each pixel point, each candidate head region can be obtained by sliding-window extreme point detection without target segmentation, and further target classification of the corresponding candidate head regions in the amplitude image determines the number of head regions in the target area. The people flow statistics result is thus obtained quickly and accurately, improving both the accuracy and the efficiency of people flow statistics.
Additional features and advantages of embodiments of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the invention.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a people flow statistical method according to an embodiment of the present invention;
FIG. 2 illustrates a depth image of a target area provided by an embodiment of the invention;
FIG. 3 illustrates an amplitude image of a target region provided by an embodiment of the present invention;
FIG. 4 is a schematic view of a cordline provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a people flow rate statistic device according to an embodiment of the present invention;
fig. 6 is a schematic diagram illustrating a controller structure provided by an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, not all, embodiments of the present invention.
At present, people flow statistics based on 2D vision algorithms generally takes RGB images as the processed data. RGB images are easily affected by ambient light changes, which degrades the accuracy of the statistics; 2D target recognition algorithms use deep learning models with high training and hardware costs, and when the people in the image are too dense, target segmentation is difficult and computationally heavy even with a sufficiently complex model. People flow statistics based on 3D point clouds judges head regions by searching for extreme points and fitting circles or ellipses; this judgment is easily disturbed by limbs and, since it depends on the size characteristics of the point cloud, prone to false detections. In view of the low accuracy and computational efficiency of existing people flow statistics techniques, embodiments of the present invention provide a people flow statistics method, device and system, which can be applied to improve the accuracy and efficiency of people flow statistics. Embodiments of the present invention are described in detail below.
The present embodiment provides a people flow rate statistical method, referring to the flow chart of the people flow rate statistical method shown in fig. 1, the method can be applied to a controller (such as a computer or a mobile terminal) in communication connection with an image sensor, and the method mainly includes the following steps S102 to S106:
step S102, acquiring a depth image and an amplitude image of the target area based on the image sensor.
The image sensor can simultaneously acquire a depth image and an amplitude image of the target area; the two images have the same resolution and acquisition range (that is, the pixel coordinates of each target object in the depth image equal its pixel coordinates in the amplitude image). The amplitude image is an infrared amplitude map obtained by acquiring the infrared light amplitude reflected by each person in the target area. The image sensor may be a solid-state lidar, through which the depth image and the amplitude image of the target area can be acquired at the same time.
The target area may be any area in which people flow statistics is required. FIG. 2 shows a depth image of the target area; the gray value of a pixel point in the depth image reflects the depth distance corresponding to that pixel point. FIG. 3 shows an amplitude image of the target area; the imaged content of the amplitude image is the same as that of FIG. 2, and the pixel sizes of FIG. 2 and FIG. 3 are also the same.
To facilitate people flow statistics and prevent people in the image from being occluded, the image sensor can be arranged directly above the target area. From directly above, the image sensor can capture the head of every person in the target area, which avoids occlusions that make counting difficult and improves the accuracy of the statistics.
To improve the computational efficiency of people flow statistics, hole filling and denoising can be applied to the depth image and the amplitude image through median filtering, reducing the interference of noise points. Since the tracked targets in a people flow statistics task are pedestrians, the ground and low objects only increase the computation load and interfere with recognition. A depth threshold range (for example, 1 m to 2 m) can therefore be set according to the height range of children and adults, and pass-through filtering is applied to the depth image and the amplitude image based on this threshold: pixel points whose depth distances fall outside the threshold range are filtered out, eliminating the interference of the ground and low objects and improving computational efficiency.
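A minimal preprocessing sketch of the above step, assuming OpenCV and NumPy; the function name, kernel size and threshold values are illustrative assumptions, not specified by the patent:

```python
import cv2
import numpy as np

def preprocess(depth: np.ndarray, amplitude: np.ndarray,
               depth_min: float = 1.0, depth_max: float = 2.0):
    """Median-filter both images to fill holes and suppress noise, then
    keep only pixels whose depth distance (meters) lies inside
    [depth_min, depth_max], suppressing the floor and low objects."""
    depth_f = cv2.medianBlur(depth.astype(np.float32), 5)
    amp_f = cv2.medianBlur(amplitude.astype(np.float32), 5)
    mask = (depth_f >= depth_min) & (depth_f <= depth_max)
    depth_f[~mask] = 0.0   # filtered-out pixels carry no depth
    amp_f[~mask] = 0.0
    return depth_f, amp_f, mask
```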
And step S104, carrying out extreme point detection on the depth distance in the depth image based on the sliding window, and determining each candidate head region in the depth image.
Because the position of the image sensor is fixed, in the captured image the distance between a person's head region and the image sensor is the smallest. The depth image carries the depth distance information of each pixel point, so candidate head regions, i.e., regions that may be head regions, can be determined by detecting minima of the depth distances corresponding to the pixel points in the depth image.
The sliding window traverses every pixel point in the depth image. Detecting the minima of the depth distances with a sliding window prevents minima from being missed and avoids the influence of noise (directly searching for extrema in the depth image is noise-sensitive), and thus avoids missing head regions. Compared with recognizing head regions with a target recognition algorithm, sliding-window detection of candidate head regions requires no target segmentation of the depth image and can accurately detect each candidate head region even when people in the depth image are dense, avoiding the drop in segmentation accuracy, and hence in people flow statistics accuracy, that dense crowds would otherwise cause.
And S106, acquiring each image to be detected in the amplitude image based on each candidate head region, and performing target classification on the image to be detected to obtain a people flow statistical result of the target region.
Images of the same size are obtained from the same positions in the amplitude image based on the position and size of each candidate head region in the depth image, yielding the images to be detected. Target classification is performed on each image to be detected to identify whether it is a head region. The number of head regions in the amplitude image, and hence in the target area, is obtained from the classification results, which yields the people flow statistics result (including the number of people in the target area).
For example, suppose the depth image includes a first, a second and a third candidate head region. A first, a second and a third image to be detected, one per candidate head region, are obtained from the amplitude image according to the center point coordinates and pixel sizes (length and width, in pixels) of the three candidate head regions. The first candidate head region and the first image to be detected have the same center point coordinates and pixel size, contain the same target object, and that object has the same position and size in the depth image and the amplitude image; the same holds for the second and third candidate head regions and their corresponding images to be detected.
In addition, compared with a color RGB image, only person features can be detected in the depth image and the amplitude image, which protects user privacy; the people flow statistics method can therefore also be applied in areas where RGB image acquisition is prohibited and privacy requirements are high. Because the depth image carries the depth distance corresponding to each pixel point, each candidate head region can be obtained by sliding-window extreme point detection on the depth distances, without target segmentation. Further target classification of the corresponding candidate head regions in the amplitude image determines the number of head regions in the target area, so the people flow statistics result is obtained quickly and accurately, improving both the accuracy and the efficiency of the computation.
In order to accurately determine each candidate head region in the depth image, this embodiment provides an implementation manner for performing extreme point detection on a depth distance in the depth image based on a sliding window, and determining each candidate head region in the depth image, which may be specifically performed with reference to the following steps (1) to (4):
step (1): and acquiring the depth distance corresponding to each pixel point in the depth image.
The depth image carries depth distances corresponding to the pixel points, and the depth distances are vertical distances from the image sensor to the pixel points on the surface of the shot person.
Step (2): the window size of the sliding window is determined based on the internal parameters of the image sensor.
When designing the window size of the sliding window, parameters such as the installation height of the image sensor, its field angles, its resolution and the average height of a person need to be considered. To determine a suitable window size, a preset head physical size is first obtained, comprising a preset head physical length and a preset head physical width; it may be an average head size obtained from multiple measurement experiments, such as 15 cm × 15 cm. The head physical size is expressed in meters or centimeters, and for convenience of calculation it can be converted into a pixel size according to the internal parameters of the image sensor, that is, the unit of the head size is converted from meters or centimeters into pixels.
And determining the number of pixels corresponding to the physical length of the head based on the internal reference of the image sensor and the third calculation formula. Wherein the third calculation formula is:
$$W_{pixel} = \operatorname{round}\!\left(\frac{Head_w \cdot I_{cols}}{2\,(H_{install}-H_{avg}) \tan(\alpha/2)}\right)$$
and determining the number of pixels corresponding to the physical width of the head based on the internal reference of the image sensor and a fourth calculation formula. Wherein the fourth calculation formula is:
$$L_{pixel} = \operatorname{round}\!\left(\frac{Head_l \cdot I_{rows}}{2\,(H_{install}-H_{avg}) \tan(\beta/2)}\right)$$
wherein $W_{pixel}$ is the number of pixels corresponding to the physical length of the head, $L_{pixel}$ is the number of pixels corresponding to the physical width of the head, $Head_w$ is the physical length of the head, $Head_l$ is the physical width of the head, $I_{cols}$ is the number of pixel columns of the depth image, $I_{rows}$ is the number of pixel rows of the depth image, $\alpha$ is the horizontal field angle of the image sensor, $\beta$ is the vertical field angle of the image sensor, and $H_{install}$ is the vertical distance between the image sensor and the plane on which people walk. round is the rounding operation; since the number of pixels is usually an integer, round is applied so that the computed number of pixels is an integer. $H_{avg}$ is the average pedestrian height, which can be obtained from the average height of a number of pedestrians counted in the target area. According to the triangle theorem, the preset head pixel size corresponding to the preset head physical size can be calculated; it comprises the number of pixels corresponding to the head physical length and the number of pixels corresponding to the head physical width, i.e., its pixel length is $W_{pixel}$ and its pixel width is $L_{pixel}$.
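A sketch of this head-size-to-pixels conversion, assuming the pinhole-geometry reading of the third and fourth formulas reconstructed above; all parameter names and the example values are illustrative assumptions:

```python
import math

def head_pixel_size(head_w: float, head_l: float,
                    i_cols: int, i_rows: int,
                    alpha: float, beta: float,
                    h_install: float, h_avg: float):
    """Return (W_pixel, L_pixel): the pixel footprint of an average head
    seen from h_install - h_avg meters away."""
    h = h_install - h_avg  # vertical distance from sensor to head plane
    w_pixel = round(head_w * i_cols / (2.0 * h * math.tan(alpha / 2.0)))
    l_pixel = round(head_l * i_rows / (2.0 * h * math.tan(beta / 2.0)))
    return w_pixel, l_pixel

# Example: 0.15 m head, 640x480 sensor with 90 x 70 degree field angles,
# mounted at 3 m, average pedestrian height 1.7 m (all assumed values):
# head_pixel_size(0.15, 0.15, 640, 480,
#                 math.radians(90), math.radians(70), 3.0, 1.7)
```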
The window size of the sliding window is then determined based on the number of pixels corresponding to the physical length of the head and the number of pixels corresponding to the physical width of the head. The sliding window is a rectangular frame of a certain size. To locate extremum regions in the depth image, the window size can be set based on the preset head pixel size, for example to 1/2 to 1/3 of the preset head pixel size, with the preset sliding step of the window set to 1/2 of the window size. The upper-left corner of the sliding window rectangle is first aligned with the upper-left corner of the depth image to obtain the first window; the window is then moved to the right by the preset sliding step, and when it reaches the rightmost end of the depth image it is moved down by the preset sliding step, and so on, until the sliding window has traversed every pixel point in the depth image.
And (3): and positioning each extremum region in the depth image according to the window size of the sliding window and the depth distance corresponding to each pixel point.
Pedestrians walking in the target area come into bodily contact, which makes pedestrian segmentation difficult, whereas head regions are mutually independent; head region detection is therefore the basis of the people flow statistics task. Since head regions are extrema within the imaged area, heads can be located by searching for the positions of extreme points based on this assumption.
An extremum region is a minimum region of the depth distances corresponding to the pixel points in the depth image. Minimum regions in the depth image are determined from the depth distances of the pixel points inside each sliding window, as detailed in the following steps 1) to 3):
step 1): and determining the average depth distance in each sliding window based on the depth distance corresponding to each pixel point, the window size of the sliding window and the preset sliding step to obtain a sliding window mean value matrix.
Based on the window size of the sliding window and the preset sliding step, the position of each sliding window in the depth image can be calculated. The average of the depth distances of the pixel points inside the window at each position is computed, giving the average depth distance in each sliding window; arranging these averages according to the window positions yields the sliding window mean matrix corresponding to the depth image, i.e., each element of the sliding window mean matrix is the average depth distance in one sliding window.
Step 2): and determining the coordinates of the central point of each sliding window in the depth image based on the window size of the sliding window and the preset sliding step to obtain a coordinate index matrix corresponding to the sliding window mean matrix.
The center point coordinates of each sliding window in the depth image are obtained from the window size of the sliding window and the preset sliding step. For example, when the sliding window has w pixels per row and l pixels per column and the preset sliding step is x, the center point of the first sliding window in the first row is at (w/2, l/2), the center point of the second window in the first row is at ((w/2) + x, l/2), the center point of the n-th window in the first row is at ((w/2) + (n - 1)x, l/2), and the center point of the n-th window in the m-th row is at ((w/2) + (n - 1)x, (l/2) + (m - 1)x).
The coordinate index matrix corresponding to the sliding window mean matrix is then obtained from the center point coordinates of the sliding windows; the elements of the sliding window mean matrix correspond one-to-one with the elements of the coordinate index matrix. For example, the center point coordinates of the sliding window holding the first average depth distance in the mean matrix are the coordinate value of the first element of the coordinate index matrix, and in general the center point coordinates of the window holding the element in row i, column j of the sliding window mean matrix are the coordinate value in row i, column j of the coordinate index matrix.
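A sketch of steps 1) and 2) together, assuming NumPy; the function and variable names are illustrative:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_mean_and_index(depth: np.ndarray,
                          win_h: int, win_w: int, step: int):
    """Return (mean_matrix, center_rows, center_cols) for all windows of
    size (win_h, win_w) slid over `depth` with stride `step`."""
    views = sliding_window_view(depth, (win_h, win_w))[::step, ::step]
    mean_matrix = views.mean(axis=(2, 3))   # sliding window mean matrix
    n_rows, n_cols = mean_matrix.shape
    # Coordinate index: center of the window at grid position (i, j)
    center_rows = win_h // 2 + step * np.arange(n_rows)
    center_cols = win_w // 2 + step * np.arange(n_cols)
    return mean_matrix, center_rows, center_cols
```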
Because directly searching the depth image for extreme points of the depth distance is easily affected by noise, a sliding-window local-area mean replaces single extreme points, and the windows overlap. With the resulting sliding window mean matrix and coordinate index matrix, every pixel point in the depth image is traversed, which improves the accuracy of extremum region localization.
Step 3): and determining the coordinate size of each extreme value area in the depth image based on the sliding window mean matrix and the coordinate index matrix.
The coordinate size includes the center point coordinates and the pixel size of an extremum region (the pixel size being its numbers of pixel columns and pixel rows). Because the image sensor is arranged directly above the target area, a person's head is closest to the sensor, so the depth distances of head pixels in the depth image are the smallest; the minimum depth distances in the sliding window mean matrix are therefore determined first, a minimum depth distance being one smaller than each adjacent average depth distance value. Each average depth distance in the mean matrix is compared with its 8 adjacent elements (edge elements are compared with their 3 adjacent elements); when an element is less than or equal to all of its adjacent elements, it is taken as a minimum depth distance. Considering that the preset sliding step may make the extremum region localization inaccurate, a local hill-climbing search is performed in the neighborhood of each average depth distance according to the coordinate index, further improving the accuracy of the extremum region positions.
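A sketch of the 8-neighborhood minimum test just described, assuming NumPy; padding with infinity is an implementation shortcut so edge elements are compared only with their existing neighbors:

```python
import numpy as np

def local_minima(mean_matrix: np.ndarray) -> np.ndarray:
    """Boolean mask of elements <= every adjacent element
    (8-neighborhood; edges use their existing neighbors only)."""
    rows, cols = mean_matrix.shape
    padded = np.pad(mean_matrix, 1, mode="constant",
                    constant_values=np.inf)
    is_min = np.ones_like(mean_matrix, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbor = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            is_min &= mean_matrix <= neighbor
    return is_min
```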
The center point coordinates corresponding to the sliding window holding each minimum depth distance are obtained from the coordinate index matrix, giving the center point coordinates of each extremum region. For example, when the first minimum depth distance is the element in row i, column j of the sliding window mean matrix, the element in row i, column j of the coordinate index matrix is the center point coordinate of the sliding window where that minimum depth distance lies, and is also the center point coordinate of the first extremum region.
Considering that head regions image at different sizes for people of different heights, the pixel size (including pixel length and pixel width) of each extremum region is calculated from the preset head physical size, the installation height of the image sensor, the field angles of the image sensor and each minimum depth distance.
Obtaining a pixel length of each extremum region based on each minimum depth distance, an internal parameter of the image sensor, and a first calculation formula, where the pixel length is a number of pixel columns of the extremum region (i.e., a number of pixels per row of the extremum region), and the first calculation formula is:
$$W_i = \operatorname{round}\!\left(\frac{Head_w \cdot I_{cols}}{2\,H_i \tan(\alpha/2)}\right)$$
obtaining the pixel width of each extremum region based on each minimum depth distance, the internal parameters of the image sensor, and a second calculation formula, where the pixel width is the number of pixel rows of the extremum region (i.e. the number of pixels in each column of the extremum region), and the second calculation formula is:
$$L_i = \operatorname{round}\!\left(\frac{Head_l \cdot I_{rows}}{2\,H_i \tan(\beta/2)}\right)$$
wherein $W_i$ is the pixel length of the i-th extremum region, $L_i$ is the pixel width of the i-th extremum region, $Head_w$ is the preset physical length of the head, $Head_l$ is the preset physical width of the head, $I_{cols}$ is the number of pixel columns of the depth image, $I_{rows}$ is the number of pixel rows of the depth image, $\alpha$ is the horizontal field angle of the image sensor, $\beta$ is the vertical field angle of the image sensor, $H_i$ is the i-th minimum depth distance, and round is the rounding operation.
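A per-region sketch of the first and second formulas as reconstructed above: the head footprint in pixels shrinks as the minimum depth distance grows. Names are illustrative assumptions:

```python
import math

def extremum_pixel_size(h_i: float, head_w: float, head_l: float,
                        i_cols: int, i_rows: int,
                        alpha: float, beta: float):
    """Pixel length W_i and pixel width L_i of an extremum region whose
    minimum depth distance is h_i (meters)."""
    w_i = round(head_w * i_cols / (2.0 * h_i * math.tan(alpha / 2.0)))
    l_i = round(head_l * i_rows / (2.0 * h_i * math.tan(beta / 2.0)))
    return w_i, l_i
```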
And (4): and calculating the area intersection ratio among the extreme value regions, and determining the candidate head region according to the area intersection ratio.
To avoid repeatedly selecting the same region among the extremum regions, the area intersection ratio of every two extremum regions is calculated from their pixel sizes; the calculation formula is:
$$IOU = \frac{I_s}{U_s}$$
wherein IOU is the area intersection ratio, $I_s$ is the intersection area of the two extremum regions, and $U_s$ is the union area of the two extremum regions.
When the intersection ratio of two extremum regions is greater than a preset threshold, the two regions are considered a repeated selection: the region with the smaller average depth distance is taken as a candidate head region and the extremum region with the larger average depth distance is deleted. When the intersection ratio of the two extremum regions is smaller than the preset threshold, both extremum regions are taken as candidate head regions.
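A sketch of step (4): intersection-over-union of two candidate boxes and the duplicate-suppression rule of keeping the region with the smaller average depth distance. The box format (cx, cy, w, l) and the threshold value are assumptions:

```python
def iou(box_a, box_b):
    """Area intersection ratio of two boxes given as (cx, cy, w, l)."""
    ax, ay, aw, al = box_a
    bx, by, bw, bl = box_b
    ix = max(0.0, min(ax + aw / 2, bx + bw / 2) - max(ax - aw / 2, bx - bw / 2))
    iy = max(0.0, min(ay + al / 2, by + bl / 2) - max(ay - al / 2, by - bl / 2))
    inter = ix * iy
    union = aw * al + bw * bl - inter
    return inter / union if union > 0 else 0.0

def suppress_duplicates(regions, iou_threshold=0.5):
    """regions: list of (box, avg_depth). Of any overlapping pair,
    keep the region with the smaller average depth distance."""
    regions = sorted(regions, key=lambda r: r[1])  # closest heads first
    kept = []
    for box, depth in regions:
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, depth))
    return kept
```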
Since each candidate head region is only a suspected head region, in order to more accurately determine the head regions in the depth image and then count the people flow, this embodiment provides an implementation of obtaining each image to be detected in the amplitude image based on each candidate head region and performing target classification to obtain the people flow statistics result of the target area, which can be executed with reference to the following steps 1 to 3:
step 1: and acquiring images in the coordinate size areas from the amplitude images based on the coordinate sizes of the candidate head areas to obtain the images to be detected.
The coordinate size of each candidate head region in the depth image is obtained, and the corresponding rectangular image is cut from the amplitude image as an image to be detected. For example, when the center point coordinates of the first candidate head region are (a, b), its pixel length is w1 and its pixel width is l1, a rectangular frame with center point (a, b), pixel length w1 and pixel width l1 is placed on the amplitude image and the image inside it is cropped out as the first image to be detected; the image to be detected corresponding to each remaining candidate head region is cropped from the amplitude image in the same way.
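A sketch of this cropping step; since the depth and amplitude images share resolution, coordinates transfer directly. Clamping to the image bounds is an added safety assumption:

```python
import numpy as np

def crop_candidates(amplitude: np.ndarray, candidates):
    """candidates: iterable of (cy, cx, l, w) center coordinates and
    pixel sizes found in the depth image. Returns the list of images
    to be detected, cut from the amplitude image."""
    h, w_img = amplitude.shape[:2]
    crops = []
    for cy, cx, l, w in candidates:
        r0 = max(0, cy - l // 2); r1 = min(h, cy + l // 2)
        c0 = max(0, cx - w // 2); c1 = min(w_img, cx + w // 2)
        crops.append(amplitude[r0:r1, c0:c1])
    return crops
```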
Step 2: and inputting each image to be detected into a neural network model obtained by pre-training to obtain a classification result of each target to be detected.
The neural network model is trained on pre-labeled person amplitude image samples. The amplitude image is a gray-scale image acquired by the image sensor under laser illumination; it is not affected by ambient light changes and yields valid data in both darkness and strong light. By replacing RGB images with amplitude images as the training and detection data of the neural network model, head regions in the images to be detected can be recognized accurately, improving the accuracy of target classification.
In a specific implementation, the neural network model may be a convolutional neural network. Since the target classification only needs a binary decision on the image to be detected (head region or non-head region), a deep model is unnecessary; however, a single shallow model is a weak classifier and prone to false detection. To ensure both the recognition accuracy and the real-time performance of the classifier, the neural network model may adopt shallow convolutional neural networks based on the Boosting idea. Under the Boosting idea the weak classifiers are effectively connected in parallel: an input picture runs independently through each weak classifier to obtain a prediction, and the predictions are weighted and summed according to each weak classifier's error rate to obtain the final result. The model may cascade three shallow convolutional neural networks with the same structure. For example, when the pixel size of the input picture is 48 × 48 × 1, the operation parameters of each layer of the convolutional neural network are as shown in Table 1 below:
table-convolution neural network each network layer operation parameter table
Network layer type Network layer structure Step length of sliding Activating a function Output size
Convolutional layer 3×3×8 1 Relu6 46×46×8
Pooling layer 2×2 2 Relu6 23×23×8
Convolutional layer 3×3×16 2 Relu6 11×11×16
Pooling layer 2×2 2 Relu6 5×5×16
Full connection layer 400×128 -- Tanh 1×128
Full connection layer 128×2 -- Tanh 1×2
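A sketch of one such weak classifier matching Table 1 (48×48×1 input, head / non-head output). PyTorch is an assumed framework, not named by the patent; padding choices follow the output sizes in the table:

```python
import torch
import torch.nn as nn

class ShallowHeadNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=1),   # 48 -> 46
            nn.ReLU6(),
            nn.MaxPool2d(kernel_size=2, stride=2),      # 46 -> 23
            nn.ReLU6(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2),  # 23 -> 11
            nn.ReLU6(),
            nn.MaxPool2d(kernel_size=2, stride=2),      # 11 -> 5
            nn.ReLU6(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),            # 5 * 5 * 16 = 400
            nn.Linear(400, 128),
            nn.Tanh(),
            nn.Linear(128, 2),       # head / non-head scores
            nn.Tanh(),
        )

    def forward(self, x):            # x: (N, 1, 48, 48)
        return self.classifier(self.features(x))
```

Per the Boosting scheme described above, three such networks would run in parallel on each image to be detected and their outputs be weighted by each classifier's error rate.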
Considering that head extreme points are affected by body posture and prone to being missed, observation shows that when a head is missed an extreme point region usually appears in its neighborhood. Therefore, when the recognition result for the image to be detected corresponding to a candidate head region is a non-head region, target recognition can be further performed on the extremum regions in that candidate head region's neighborhood to judge whether it is in fact a head region.
And step 3: and determining a people flow statistical result in the target area based on the classification result of each target to be detected.
In one embodiment, the people flow statistics result includes the current number of people in the target area. The image sensor may acquire a panoramic image of the target area; the number of people in the target area is determined from the classification results of the targets to be detected, and this number is taken as the people flow statistics result of the target area. Specifically, the number of recognized head regions in the target area is determined from the classification result of each target to be detected; this number of head regions is the number of people in the target area, which yields the people flow statistics result.
In another embodiment, the statistical result of the people flow rate includes the people entering amount and the people leaving amount of the target area, and warning lines are set in each depth image collected by the image sensor; when determining the people flow statistic result in the target area based on the classification result of each target to be detected, the following steps a to c may be specifically referred to:
step a: and determining each human head area in the depth image based on the classification result of each target to be detected.
When the target classification result of a target to be detected is a head region, the center point coordinates and pixel size of that target in the amplitude image are obtained, and each person's head region is framed in the depth image according to those center point coordinates and pixel size.
Step b: and determining the moving track of each human head area in the target area based on each human head area in the continuous multi-frame depth images.
When counting the person entering amount and person leaving amount of a target area, an image sensor is arranged at each entrance and exit of the target area, depth images and amplitude images are acquired in real time, each person's head region in each depth image is determined, and the moving track of each head region is calculated from the center point coordinates of the head regions in every two consecutive depth frames. When determining the moving track, the head regions across frames must be matched to the same person. Assuming a person moves approximately uniformly while walking, the head regions whose center point coordinates are closest across the two frames belong to the same person; the distance calculation formula is:
$$dis = \sqrt{(x_{1,i} - x_{2,j})^2 + (y_{1,i} - y_{2,j})^2}$$
where dis is the distance between the center points of two head regions, $x_{1,i}$ is the center point row coordinate of the i-th person head region in the previous depth frame, $y_{1,i}$ is its center point column coordinate, $x_{2,j}$ is the center point row coordinate of the j-th person head region in the next depth frame, and $y_{2,j}$ is its center point column coordinate. The moving track of a head region is obtained from the center point coordinates of the same person's head region across consecutive depth frames.
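A sketch of the nearest-center matching used to chain head regions into trajectories. The greedy strategy and all names are illustrative assumptions:

```python
import math

def match_heads(prev_centers, curr_centers):
    """Pair each head center in the previous frame with the closest
    unmatched center in the current frame. Centers are (row, col)."""
    matches, used = [], set()
    for i, (x1, y1) in enumerate(prev_centers):
        best_j, best_dis = None, float("inf")
        for j, (x2, y2) in enumerate(curr_centers):
            if j in used:
                continue
            dis = math.hypot(x1 - x2, y1 - y2)  # the formula above
            if dis < best_dis:
                best_j, best_dis = j, dis
        if best_j is not None:
            matches.append((i, best_j))
            used.add(best_j)
    return matches
```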
Step c: and determining the person entering amount and the person leaving amount of the target area based on the moving track of the head area of each person in the target area and the position of the warning line.
Head region line-crossing detection is the basis of bidirectional people flow statistics. By arranging a plurality of warning lines, when a person's head region passes the warning lines, whether the person is entering or leaving can be judged from the order in which the head region crosses them.
In a specific embodiment, a first warning line, a second warning line and a third warning line may be set in the depth image. When the moving track of a person's head region intersects the first, second and third warning lines in that order, the count of the person entering amount is incremented by one. The first warning line is the warning line farthest from the center of the target area, the third warning line is the one closest to the center, the second warning line lies between them, and the three lines are parallel to each other. The order in which a head region passes the warning lines is obtained from the moving track of the head region and the coordinates of each warning line. Considering that head region localization may be inaccurate, a single warning line could be triggered multiple times by one head; three warning lines are therefore set according to the pixel size of the head region, with the spacing between lines greater than the pixel width of a head region, to avoid miscounting. When the moving track of a head region intersects only the first warning line, the person has not entered the target area.
When the moving track of a person's head region intersects the third, second and first warning lines in that order, the count of the person leaving amount is incremented by one. Since the first warning line is farthest from the center of the target area and the third is closest, a head region passing the third, second and first warning lines in sequence indicates that the person is moving away from, and about to leave, the target area, so the leaving count is incremented by one. Referring to the warning line schematic in FIG. 4, which shows an image of one entrance of the target area, the three warning lines from top to bottom are the third, second and first warning lines; the moving track of the head region in FIG. 4 intersects the third and then the second warning line, so it is determined that the person is about to leave the target area and the count of the person leaving amount is incremented by one.
The person entering amount and person leaving amount of the target area are obtained from the entering count and leaving count accumulated within a preset time. The preset time may be the opening hours of the target area; the people flow statistics result of the target area is then given by the entering count and leaving count accumulated over those opening hours.
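A sketch of the three-line counting rule, assuming horizontal warning lines (so a crossing is a sign change of the row coordinate relative to a line) and at most one line crossed per frame step, which the line spacing described above makes plausible; all names are illustrative:

```python
def crossing_order(track_rows, lines):
    """Order in which a trajectory of row coordinates crosses each
    warning line; `lines` holds the row positions (line1, line2, line3),
    line1 farthest from the target area's center."""
    order = []
    for a, b in zip(track_rows, track_rows[1:]):
        for idx, y in enumerate(lines, start=1):
            if (a - y) * (b - y) < 0 and idx not in order:
                order.append(idx)
    return order

def update_counts(track_rows, lines, counts):
    """counts: dict with 'enter' and 'leave' accumulated over the
    preset time."""
    order = crossing_order(track_rows, lines)
    if order == [1, 2, 3]:
        counts["enter"] += 1   # moved toward the target area's center
    elif order == [3, 2, 1]:
        counts["leave"] += 1   # moved away from the center
    return counts
```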
In practical applications, image sensors can be arranged both inside the target area and at its entrances and exits, and the number of people in the target area, the person entering amount and the person leaving amount can be counted comprehensively from the depth images and amplitude images they acquire.
In a specific implementation, an alarm line is also set in the depth image, and this embodiment further provides a way of raising a security alarm according to the moving track of a person's head region: when the moving track of a head region intersects the alarm line, a preset alarm device is triggered. When a pedestrian crosses the alarm line, the controller triggers the preset alarm device to warn the pedestrian off or issue a safety warning. The alarm line can be set according to the actual requirements of the target area: an alarm region that pedestrians are forbidden to enter (of arbitrary shape) is designated inside the target area, and alarm lines are placed along the edge contour of the alarm region, from which a functional expression of the alarm line is obtained. Whether the moving track of a pedestrian's head region intersects the alarm line curve is judged from the track and the alarm line's functional expression; when the head region touches the alarm line, i.e., when the moving track and the alarm line produce an intersection point, the pedestrian is about to enter the alarm region and the alarm is triggered. The preset alarm device can alarm by sound, light or text prompts, for example a warning tone, an announcement not to enter, or a flashing red light.
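A sketch of the alarm-line check, treating the alarm region's edge as a polygon and firing when any trajectory segment crosses a polygon edge; the segment-intersection test is standard 2D geometry and all names are illustrative:

```python
def _ccw(a, b, c):
    """True if points a, b, c make a counter-clockwise turn."""
    return (c[1] - a[1]) * (b[0] - a[0]) > (b[1] - a[1]) * (c[0] - a[0])

def segments_intersect(p1, p2, p3, p4):
    """True if segment p1-p2 properly crosses segment p3-p4."""
    return (_ccw(p1, p3, p4) != _ccw(p2, p3, p4)
            and _ccw(p1, p2, p3) != _ccw(p1, p2, p4))

def hits_alarm_line(track, polygon):
    """track: successive head center points; polygon: vertices of the
    alarm region's edge contour."""
    edges = list(zip(polygon, polygon[1:] + polygon[:1]))
    return any(segments_intersect(a, b, e1, e2)
               for a, b in zip(track, track[1:])
               for e1, e2 in edges)
```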
In the people flow statistics method provided by this embodiment, collecting depth and amplitude images avoids the influence of ambient light on imaging, head regions are detected and recognized quickly, the people flow of the target area is computed in real time, and the person entering amount and person leaving amount of the target area can be computed accurately from the moving tracks of the head regions, improving the accuracy of the people flow computation.
Corresponding to the people flow rate statistical method provided by the above embodiment, an embodiment of the present invention provides a people flow rate statistical device, which is shown in fig. 5 as a schematic structural diagram of the people flow rate statistical device, and the device includes the following modules:
an image acquisition module 51 for acquiring a depth image and an amplitude image of the target area based on the image sensor; the image sensor is arranged right above the target area.
And the region determining module 52 is configured to perform extreme point detection on the depth image based on the sliding window, and determine each candidate head region in the depth image.
The flow rate statistic module 53 is configured to obtain each to-be-detected image in the amplitude image based on each candidate head region, perform target classification on the to-be-detected image, and obtain a people flow rate statistic result of the target region; wherein the image to be detected comprises a candidate head region in the amplitude image.
In the people flow statistical device provided by this embodiment, acquiring the depth image and the amplitude image of the target area avoids the influence of overexposure and darkness on imaging. Because the depth image carries the depth distance corresponding to each pixel point, detecting extreme points of the depth distance with a sliding window yields the candidate head regions in the depth image without any target segmentation; further classifying the corresponding candidate head regions in the amplitude image determines the number of head regions in the target area, so the people flow statistical result is obtained quickly and accurately, improving both the calculation accuracy and the computational efficiency of the people flow statistics.
In an embodiment, the area determining module 52 is further configured to: acquire the depth distance corresponding to each pixel point in the depth image; determine the window size of the sliding window based on the internal parameters of the image sensor; locate each extremum region in the depth image according to the window size of the sliding window and the depth distance corresponding to each pixel point, wherein an extremum region is a minimum value region of the depth distance corresponding to the pixel points in the depth image; and calculate the area intersection ratio among the extremum regions and determine the candidate head regions according to the area intersection ratio.
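As a minimal illustration of the area intersection-over-union step, assuming regions are given as (center x, center y, pixel width, pixel height) tuples and an illustrative 0.5 overlap threshold (the disclosure fixes neither):

    def iou(a, b):
        # a, b: (cx, cy, w, h); area intersection ratio of the two regions.
        ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
        ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
        bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
        bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
        iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        ih = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = iw * ih
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    def candidate_head_regions(extremum_regions, threshold=0.5):
        # Keep one region out of any heavily overlapping pair (an assumed merge
        # rule; the disclosure only states that candidates follow from the ratio).
        kept = []
        for region in extremum_regions:
            if all(iou(region, k) < threshold for k in kept):
                kept.append(region)
        return kept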
In an embodiment, the area determining module 52 is further configured to determine an average depth distance in each sliding window based on a depth distance corresponding to each pixel point, a window size of the sliding window, and a preset sliding step, so as to obtain a sliding window mean matrix; determining the coordinates of the center point of each sliding window in the depth image based on the window size of the sliding window and the preset sliding step to obtain a coordinate index matrix corresponding to the sliding window mean matrix; determining the coordinate size of each extreme value area in the depth image based on the sliding window mean value matrix and the coordinate index matrix; the coordinate size comprises the center point coordinate of the extreme value area and the pixel size.
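A straightforward sketch of the sliding window mean matrix and its coordinate index matrix, assuming a NumPy depth image; the loop form is chosen for clarity, not efficiency:

    import numpy as np

    def sliding_window_stats(depth, win_h, win_w, step):
        # Mean depth of every window position, plus the window-center coordinates
        # (row, col) that index each mean back into the depth image.
        rows = (depth.shape[0] - win_h) // step + 1
        cols = (depth.shape[1] - win_w) // step + 1
        means = np.empty((rows, cols))
        centers = np.empty((rows, cols, 2), dtype=int)
        for i in range(rows):
            for j in range(cols):
                r0, c0 = i * step, j * step
                means[i, j] = depth[r0:r0 + win_h, c0:c0 + win_w].mean()
                centers[i, j] = (r0 + win_h // 2, c0 + win_w // 2)
        return means, centers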
In one embodiment, the area determination module 52 is further configured to: determine each minimum depth distance in the sliding window mean matrix, wherein a minimum depth distance is less than each adjacent average depth distance value; acquire, from the coordinate index matrix, the center point coordinates corresponding to the sliding window where each minimum depth distance is located, to obtain the center point coordinates of each extremum region; and obtain the pixel length of each extremum region based on each minimum depth distance, the internal parameters of the image sensor and a first calculation formula; wherein the pixel length is the number of pixel columns of the extremum region, and the first calculation formula is:
$$W_i = \operatorname{round}\left(\frac{\mathrm{Head}_w \cdot I_{\mathrm{cols}}}{2\, H_i \tan(\alpha/2)}\right)$$
and obtain the pixel width of each extremum region based on each minimum depth distance, the internal parameters of the image sensor and a second calculation formula; wherein the pixel width is the number of pixel rows of the extremum region, and the second calculation formula is:
$$L_i = \operatorname{round}\left(\frac{\mathrm{Head}_l \cdot I_{\mathrm{rows}}}{2\, H_i \tan(\beta/2)}\right)$$
wherein W_i is the pixel length of the i-th extremum region, L_i is the pixel width of the i-th extremum region, Head_w is the preset physical length of the head, Head_l is the preset physical width of the head, I_cols is the number of pixel columns of the depth image, I_rows is the number of pixel rows of the depth image, α is the horizontal field angle of the image sensor, β is the vertical field angle of the image sensor, H_i is the i-th minimum depth distance, and round is the rounding operation.
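Reading the two formulas above as the usual pinhole projection of a preset head size onto the image plane (an interpretation, since the original typeset equations are not recoverable from this text), the pixel size of the i-th extremum region could be computed as:

    import math

    def extremum_region_pixel_size(h_i, head_w, head_l, i_cols, i_rows, alpha, beta):
        # h_i: i-th minimum depth distance, in the same unit as head_w / head_l;
        # alpha, beta: horizontal / vertical field angles of the sensor, in radians.
        w_i = round(head_w * i_cols / (2.0 * h_i * math.tan(alpha / 2.0)))
        l_i = round(head_l * i_rows / (2.0 * h_i * math.tan(beta / 2.0)))
        return w_i, l_i  # pixel column count and pixel row count of the region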
In one embodiment, the area determination module 52 is further configured to: obtain a preset physical size of the head, wherein the head physical dimensions comprise a head physical length and a head physical width; determine the number of pixels corresponding to the physical length of the head based on the internal parameters of the image sensor and a third calculation formula; wherein the third calculation formula is:
$$W_{\mathrm{pixel}} = \operatorname{round}\left(\frac{\mathrm{Head}_w \cdot I_{\mathrm{cols}}}{2\,(H_{\mathrm{install}} - H_{\mathrm{avg}}) \tan(\alpha/2)}\right)$$
and determine the number of pixels corresponding to the physical width of the head based on the internal parameters of the image sensor and a fourth calculation formula; wherein the fourth calculation formula is:
$$L_{\mathrm{pixel}} = \operatorname{round}\left(\frac{\mathrm{Head}_l \cdot I_{\mathrm{rows}}}{2\,(H_{\mathrm{install}} - H_{\mathrm{avg}}) \tan(\beta/2)}\right)$$
wherein W_pixel is the number of pixels corresponding to the physical length of the head, L_pixel is the number of pixels corresponding to the physical width of the head, Head_w is the physical length of the head, Head_l is the physical width of the head, I_cols is the number of pixel columns of the depth image, I_rows is the number of pixel rows of the depth image, α is the horizontal field angle of the image sensor, β is the vertical field angle of the image sensor, H_install is the vertical distance between the image sensor and the plane on which the persons stand, H_avg is the average pedestrian height, and round is the rounding operation. The window size of the sliding window is then determined based on the number of pixels corresponding to the physical length of the head and the number of pixels corresponding to the physical width of the head.
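Under the same assumed pinhole reading, the window size computation differs only in using the fixed sensor-to-head distance H_install − H_avg instead of the per-region minimum depth distance:

    import math

    def sliding_window_size(head_w, head_l, i_cols, i_rows, alpha, beta,
                            h_install, h_avg):
        # h_install - h_avg: vertical distance from the overhead sensor
        # to the average head plane.
        h = h_install - h_avg
        w_pixel = round(head_w * i_cols / (2.0 * h * math.tan(alpha / 2.0)))
        l_pixel = round(head_l * i_rows / (2.0 * h * math.tan(beta / 2.0)))
        return w_pixel, l_pixel  # window width and height in pixels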
In an embodiment, the flow rate statistic module 53 is further configured to: obtain the image within each coordinate size region from the amplitude image based on the coordinate size of each candidate head region, so as to obtain each image to be detected; input each image to be detected into a neural network model obtained by pre-training to obtain a classification result for each target to be detected, wherein the neural network model is trained on pre-labeled person amplitude image samples; and determine the people flow statistical result in the target area based on the classification results of the targets to be detected.
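A sketch of this cropping-and-classification step, assuming a NumPy amplitude image; classify_head is a hypothetical stand-in for the pre-trained neural network model, whose architecture the disclosure does not specify:

    def crop_candidates(amplitude, candidates):
        # Cut each candidate head region (cx, cy, w, h) out of the amplitude image.
        crops = []
        for cx, cy, w, h in candidates:
            r0, r1 = max(0, int(cy - h // 2)), min(amplitude.shape[0], int(cy + h // 2))
            c0, c1 = max(0, int(cx - w // 2)), min(amplitude.shape[1], int(cx + w // 2))
            crops.append(amplitude[r0:r1, c0:c1])
        return crops

    def count_heads(amplitude, candidates, classify_head):
        # classify_head: hypothetical callable returning True if a crop is a head.
        return sum(1 for crop in crop_candidates(amplitude, candidates)
                   if classify_head(crop))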
In an embodiment, the traffic statistic module 53 is further configured to determine the number of people in the target area based on the classification results of the targets to be detected, and take that number as the people flow statistical result of the target area.
In one embodiment, the people flow statistics include the person entering amount and the person leaving amount, and a warning line is provided in the depth image. The flow rate statistic module 53 is further configured to: determine each person head area in the depth image based on the classification results of the targets to be detected; determine the moving track of each person head area in the target area based on the person head areas in continuous multi-frame depth images; and determine the person entering amount and the person leaving amount of the target area based on the moving track of each person head area in the target area and the position of the warning line.
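One simple way to form per-head moving tracks across consecutive frames is greedy nearest-centroid association; this matcher and its distance threshold are illustrative assumptions, not the disclosed method:

    import math

    def update_tracks(tracks, detections, max_dist=50.0):
        # tracks: dict track_id -> list of (x, y) head centers, one per frame;
        # detections: (x, y) head centers classified in the current frame.
        unmatched = list(detections)
        for history in tracks.values():
            if not unmatched:
                break
            last = history[-1]
            best = min(unmatched, key=lambda d: math.dist(last, d))
            if math.dist(last, best) <= max_dist:
                history.append(best)
                unmatched.remove(best)
        next_id = max(tracks, default=-1) + 1
        for d in unmatched:          # unmatched detections start new tracks
            tracks[next_id] = [d]
            next_id += 1
        return tracks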
In one embodiment, the depth image is provided with a first warning line, a second warning line and a third warning line. The flow rate counting module 53 is further configured to: increment the count value of the person entering amount by one when the moving track of a person head area intersects the first warning line, the second warning line and the third warning line in sequence; increment the count value of the person leaving amount by one when the moving track of a person head area intersects the third warning line, the second warning line and the first warning line in sequence; and obtain the person entering amount and the person leaving amount of the target area from the count values accumulated within the preset time.
In one embodiment, the depth image further includes a warning line, and the apparatus further includes:
and the alarm module is used for triggering a preset alarm device to alarm when the moving track of the head area of the person intersects with the alarm line.
The people flow statistical device provided by this embodiment avoids the influence of ambient light on imaging by collecting depth images and amplitude images, realizes rapid detection and identification of head regions, and thereby achieves real-time statistics of the people flow in the target area; it can also accurately count the person entering amount and the person leaving amount of the target area from the moving tracks of head areas, improving the accuracy of the people flow statistics.
The device provided by the embodiment has the same implementation principle and technical effect as the foregoing embodiment, and for the sake of brief description, reference may be made to the corresponding contents in the foregoing method embodiment for the portion of the embodiment of the device that is not mentioned.
Corresponding to the method and device provided by the foregoing embodiments, an embodiment of the present invention further provides a people flow statistics system, including an image sensor and a controller; the image sensor is arranged right above the target area.
The controller includes a processor and a storage device. As shown in the schematic controller structure diagram of fig. 6, the controller includes a processor 61 and a memory 62; the memory stores a computer program that can run on the processor, and the processor implements the steps of the method provided by the above-mentioned embodiments when it executes the computer program.
Referring to fig. 6, the controller further includes: a bus 64 and a communication interface 63, and the processor 61, the communication interface 63 and the memory 62 are connected by the bus 64. The processor 61 is for executing executable modules, such as computer programs, stored in the memory 62.
The memory 62 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The communication connection between a network element of the system and at least one other network element is realized through at least one communication interface 63 (which may be wired or wireless), using the internet, a wide area network, a local area network, a metropolitan area network or the like.
The bus 64 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 6, but that does not indicate only one bus or one type of bus.
The memory 62 is configured to store a program, and the processor 61 executes the program after receiving an execution instruction; the method executed by the apparatus according to the flow disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 61.
The processor 61 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 61. The processor 61 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP) and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in a decoding processor. The software module may be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers or other storage media well known in the art. The storage medium is located in the memory 62, and the processor 61 reads the information in the memory 62 and completes the steps of the above method in combination with its hardware.
Embodiments of the present invention provide a computer-readable medium, wherein the computer-readable medium stores computer-executable instructions, which, when invoked and executed by a processor, cause the processor to implement the method of the above-mentioned embodiments.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system described above may refer to the corresponding process in the foregoing embodiments, and is not described herein again.
The computer program product of the people flow rate statistical method, the device and the system provided by the embodiment of the invention comprises a computer readable storage medium storing a program code, wherein instructions included in the program code can be used for executing the method described in the foregoing method embodiment, and specific implementation can refer to the method embodiment, which is not described herein again.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly, e.g., as a fixed connection, a removable connection, or an integral connection; as a mechanical or electrical connection; as a direct connection or an indirect connection through an intermediate medium; or as internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention and not to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art may still modify, or readily conceive of changes to, the technical solutions described in the foregoing embodiments, or substitute equivalents for some of their technical features, within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (12)

1. A people flow statistical method is characterized by comprising the following steps:
acquiring a depth image and an amplitude image of a target area based on an image sensor; wherein the image sensor is arranged right above the target area;
carrying out extreme point detection on the depth distance in the depth image based on a sliding window, and determining each candidate head region in the depth image;
acquiring each image to be detected in the amplitude image based on each candidate head region, and carrying out target classification on the image to be detected to obtain a people flow statistical result of the target region; and the image to be detected is a corresponding candidate head area in the amplitude image.
2. The method of claim 1, wherein the step of performing extreme point detection on the depth distance in the depth image based on the sliding window to determine each candidate head region in the depth image comprises:
acquiring a depth distance corresponding to each pixel point in the depth image;
determining a window size of the sliding window based on internal parameters of the image sensor;
positioning each extremum region in the depth image according to the window size of the sliding window and the depth distance corresponding to each pixel point; the extreme value region is a minimum value region of a depth distance corresponding to each pixel point in the depth image;
and calculating the area intersection ratio among the extreme value regions, and determining each candidate head region according to the area intersection ratio.
3. The method of claim 2, wherein the step of locating each extremum region in the depth image according to the window size of the sliding window and the depth distance corresponding to each pixel point comprises:
determining the average depth distance in each sliding window based on the depth distance corresponding to each pixel point, the window size of the sliding window and a preset sliding step to obtain a sliding window mean value matrix;
determining the coordinates of the central point of each sliding window in the depth image based on the window size of the sliding window and a preset sliding step to obtain a coordinate index matrix corresponding to the sliding window mean matrix;
determining the coordinate size of each extreme value area in the depth image based on the sliding window mean matrix and the coordinate index matrix; and the coordinate size comprises the center point coordinate and the pixel size of the extreme value area.
4. The method of claim 3, wherein the step of determining the coordinate size of each extremum region in the depth image based on the sliding window mean matrix and the coordinate index matrix comprises:
determining each minimum depth distance in the sliding window mean value matrix; wherein the minimum depth distance is less than each adjacent average depth distance value;
obtaining the center point coordinates corresponding to the sliding window where each minimum depth distance is located from the coordinate index matrix to obtain the center point coordinates of each extremum region;
obtaining the pixel length of each extremum region based on each minimum depth distance, the internal parameters of the image sensor and a first calculation formula; wherein the pixel length is the number of pixel columns of the extremum region, and the first calculation formula is:
$$W_i = \operatorname{round}\left(\frac{\mathrm{Head}_w \cdot I_{\mathrm{cols}}}{2\, H_i \tan(\alpha/2)}\right)$$
obtaining the pixel width of each extremum region based on each minimum depth distance, the internal parameters of the image sensor and a second calculation formula; wherein the pixel width is the number of pixel rows of the extremum region, and the second calculation formula is:
$$L_i = \operatorname{round}\left(\frac{\mathrm{Head}_l \cdot I_{\mathrm{rows}}}{2\, H_i \tan(\beta/2)}\right)$$
wherein W_i is the pixel length of the i-th extremum region, L_i is the pixel width of the i-th extremum region, Head_w is a preset physical length of the head, Head_l is a preset physical width of the head, I_cols is the number of pixel columns of the depth image, I_rows is the number of pixel rows of the depth image, α is the horizontal field angle of the image sensor, β is the vertical field angle of the image sensor, H_i is the i-th minimum depth distance, and round is the rounding operation.
5. The method of claim 2, wherein the step of determining the window size of the sliding window based on the internal parameters of the image sensor comprises:
acquiring a preset head physical size; wherein the head physical dimensions comprise a head physical length and a head physical width;
determining the number of pixels corresponding to the physical length of the head based on the internal parameters of the image sensor and a third calculation formula; wherein the third calculation formula is:
$$W_{\mathrm{pixel}} = \operatorname{round}\left(\frac{\mathrm{Head}_w \cdot I_{\mathrm{cols}}}{2\,(H_{\mathrm{install}} - H_{\mathrm{avg}}) \tan(\alpha/2)}\right)$$
determining the number of pixels corresponding to the physical width of the head based on the internal parameters of the image sensor and a fourth calculation formula; wherein the fourth calculation formula is:
$$L_{\mathrm{pixel}} = \operatorname{round}\left(\frac{\mathrm{Head}_l \cdot I_{\mathrm{rows}}}{2\,(H_{\mathrm{install}} - H_{\mathrm{avg}}) \tan(\beta/2)}\right)$$
wherein W_pixel is the number of pixels corresponding to the physical length of the head, L_pixel is the number of pixels corresponding to the physical width of the head, Head_w is the physical length of the head, Head_l is the physical width of the head, I_cols is the number of pixel columns of the depth image, I_rows is the number of pixel rows of the depth image, α is the horizontal field angle of the image sensor, β is the vertical field angle of the image sensor, H_install is the vertical distance between the image sensor and the plane on which the persons stand, H_avg is the average pedestrian height, and round is the rounding operation;
and determining the window size of the sliding window based on the number of pixels corresponding to the physical length of the head and the number of pixels corresponding to the physical width of the head.
6. The method according to claim 1, wherein the step of obtaining each image to be detected in the amplitude image based on each head candidate region, performing target classification on the image to be detected, and obtaining the statistical result of the human flow rate of the target region comprises:
acquiring images in the coordinate size areas from the amplitude images based on the coordinate sizes of the candidate head areas to obtain images to be detected;
inputting each image to be detected into a neural network model obtained by pre-training to obtain a classification result of each target to be detected; the neural network model is obtained by training based on pre-labeled figure amplitude image samples;
and determining a people flow statistical result in the target area based on the classification result of each target to be detected.
7. The method according to claim 6, wherein the step of determining the statistical result of the human traffic in the target area based on the classification result of each target to be detected comprises:
determining the number of people in the target area based on the classification result of each target to be detected;
and taking the number of the people as a people flow statistical result in the target area.
8. The method of claim 6, wherein the people flow statistics include people entering volume and people leaving volume; a warning line is arranged in the depth image;
the step of determining the people flow statistic result in the target area based on the classification result of each target to be detected comprises the following steps:
determining each human head area in the depth image based on the classification result of each target to be detected;
determining the moving track of each human head area in the target area based on each human head area in continuous multi-frame depth images;
and determining the person entering amount and the person leaving amount of the target area based on the moving track of each person head area in the target area and the position of the warning line.
9. The method according to claim 8, wherein a first fence, a second fence and a third fence are provided in the depth image;
the step of determining the person entering amount and the person leaving amount of the target area based on the moving track of each person head area in the target area and the position of the warning line comprises the following steps:
when the moving track of the head area of the person sequentially generates intersection points with the first warning line, the second warning line and the third warning line, adding one to the count value of the entering amount of the person;
when the moving track of the head area of the person sequentially generates intersection points with the third warning line, the second warning line and the first warning line, adding one to the count value of the person leaving amount;
and obtaining the person entering amount and the person leaving amount of the target area based on the counting value of the person entering amount and the counting value of the person leaving amount in the preset time.
10. The method of claim 8, wherein an alarm line is further provided in the depth image, the method further comprising:
and when the moving track of the head area of the person and the alarm line form an intersection point, triggering a preset alarm device to alarm.
11. A people flow statistics system, comprising: an image sensor and a controller, the controller comprising a processor and a storage device;
the storage device has stored thereon a computer program which, when executed by the processor, performs the method of any one of claims 1 to 10.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of the preceding claims 1 to 10.
CN202011062769.8A 2020-09-30 2020-09-30 People flow statistical method, device and system Pending CN112257520A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011062769.8A CN112257520A (en) 2020-09-30 2020-09-30 People flow statistical method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011062769.8A CN112257520A (en) 2020-09-30 2020-09-30 People flow statistical method, device and system

Publications (1)

Publication Number Publication Date
CN112257520A true CN112257520A (en) 2021-01-22

Family

ID=74234918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011062769.8A Pending CN112257520A (en) 2020-09-30 2020-09-30 People flow statistical method, device and system

Country Status (1)

Country Link
CN (1) CN112257520A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023029574A1 (en) * 2021-08-30 2023-03-09 上海商汤智能科技有限公司 Method and apparatus for acquiring passenger flow information, and computer device and storage medium
CN116188357A (en) * 2022-09-27 2023-05-30 珠海视熙科技有限公司 Entrance and exit human body detection method, imaging equipment, device and storage medium

Similar Documents

Publication Publication Date Title
US9607402B1 (en) Calibration of pedestrian speed with detection zone for traffic intersection control
CN109284674B (en) Method and device for determining lane line
JP6458734B2 (en) Passenger number measuring device, passenger number measuring method, and passenger number measuring program
US9875405B2 (en) Video monitoring method, video monitoring system and computer program product
EP1725975B1 (en) Method, apparatus and program for detecting an object
CN201255897Y (en) Human flow monitoring device for bus
CN111382704B (en) Vehicle line pressing violation judging method and device based on deep learning and storage medium
CN103699905B (en) Method and device for positioning license plate
JP5019375B2 (en) Object detection apparatus and object detection method
CN110298300B (en) Method for detecting vehicle illegal line pressing
CN110502983B (en) Method and device for detecting obstacles in expressway and computer equipment
US11605177B2 (en) System and method for refining dimensions of a generally cuboidal 3D object imaged by 3D vision system and controls for the same
CN112115904A (en) License plate detection and identification method and device and computer readable storage medium
CN110717400A (en) Passenger flow statistical method, device and system
CN112257520A (en) People flow statistical method, device and system
KR20160035121A (en) Method and Apparatus for Counting Entity by Using Location Information Extracted from Depth Image
CN113792586A (en) Vehicle accident detection method and device and electronic equipment
KR20210097782A (en) Indicator light detection method, apparatus, device and computer-readable recording medium
CN114998317B (en) Lens occlusion detection method and device, camera device and storage medium
US20170053172A1 (en) Image processing apparatus, and image processing method
Di et al. Forward Collision Warning system based on vehicle detection and tracking
CN117037103A (en) Road detection method and device
CN109726750B (en) Passenger fall detection device, passenger fall detection method and passenger conveying device
CN108573497A (en) Passenger flow statistic device and method
CN112634299A (en) Remnant detection method for eliminating interference of winged insects

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination