CN113034544A - People flow analysis method and device based on depth camera - Google Patents


Info

Publication number
CN113034544A
Authority
CN
China
Prior art keywords
pedestrian
human body
image
images
body detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110297860.6A
Other languages
Chinese (zh)
Inventor
王卫芳
龚国基
朱毅博
胡正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orbbec Inc
Original Assignee
Orbbec Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Orbbec Inc filed Critical Orbbec Inc
Priority to CN202110297860.6A
Publication of CN113034544A
Priority to PCT/CN2021/107938
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • G06T2207/10021Stereoscopic video; Stereoscopic image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the field of machine vision and provides a people flow analysis method and device based on a depth camera. The method comprises the following steps: acquiring a pedestrian image set of a target detection area through a depth camera, the pedestrian image set comprising multiple frames of pedestrian images; importing the pedestrian images into a human body detection network and outputting a human body detection image corresponding to each frame of pedestrian image; determining motion tracking data of a plurality of pedestrian objects contained in the target detection area based on the human body detection images corresponding to multiple consecutive frames of pedestrian images; and generating people flow analysis information about the target detection area based on the motion tracking data of the plurality of pedestrian objects. Because the human body detection network is obtained through deep learning training, pedestrians can be tracked accurately; and because the pedestrian images are depth images rather than color images, the privacy of pedestrians is protected and information beyond the plane dimensions is captured.

Description

People flow analysis method and device based on depth camera
Technical Field
The application belongs to the field of machine vision, and particularly relates to a people stream analysis method and device based on a depth camera.
Background
With growing awareness of public safety, public safety management and early warning have become indispensable. People flow analysis is widely applied in public places with heavy pedestrian traffic, such as shopping malls, scenic spots and airports, and provides an important basis for public safety management, early warning and business decisions.
The mainstream approach to people flow analysis at present is to acquire a pedestrian image set of the area to be analyzed with an RGB (three-primary-color) camera, detect pedestrians by recognizing a single human body feature (such as a face or a head) contained in the images, and then perform people flow analysis on the detected pedestrian information. However, the color images acquired by an RGB camera may violate the privacy of pedestrians, and because a color image only represents information in the plane dimensions, pedestrians cannot be tracked accurately.
Disclosure of Invention
The embodiments of the application provide a people flow analysis method and device based on a depth camera. A pedestrian image set is acquired through the depth camera, and the multiple frames of pedestrian images in the set are imported into a human body detection network to determine motion tracking data of the pedestrian objects in the target detection area of the depth camera, from which people flow analysis information is generated. Because the human body detection network is obtained through deep learning training, pedestrians can be tracked accurately; and because the input of the human body detection network is the pedestrian images acquired by the depth camera, namely depth images, the problem of color images invading the privacy of pedestrians is avoided.
In a first aspect, an embodiment of the present application provides a people flow analysis method based on a depth camera, including: acquiring a pedestrian image set of a target detection area through a depth camera, the pedestrian image set comprising multiple frames of pedestrian images; importing the pedestrian images into a human body detection network and outputting a human body detection image corresponding to each frame of pedestrian image, the human body detection network being obtained through deep learning training; determining motion tracking data of a plurality of pedestrian objects contained in the target detection area based on the human body detection images corresponding to multiple consecutive frames of pedestrian images; and generating people flow analysis information about the target detection area based on the motion tracking data of the plurality of pedestrian objects.
In a second aspect, an embodiment of the present application provides a people flow analysis apparatus based on a depth camera, including: a pedestrian image acquisition module, configured to acquire a pedestrian image set of a target detection area through the depth camera, the pedestrian image set comprising multiple frames of pedestrian images; a human body detection network module, configured to import the pedestrian images into a human body detection network and output a human body detection image corresponding to each frame of pedestrian image, the human body detection network being obtained through deep learning training; a motion tracking data determining module, configured to determine motion tracking data of a plurality of pedestrian objects contained in the target detection area based on the human body detection images corresponding to multiple consecutive frames of pedestrian images; and a people flow analysis information generation module, configured to generate people flow analysis information about the target detection area based on the motion tracking data of the plurality of pedestrian objects.
In a third aspect, an embodiment of the present application provides a terminal device, including: a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method of any of the above first aspects when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, including: the computer readable storage medium stores a computer program which, when executed by a processor, implements the method of any of the first aspects described above.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the method of any one of the above first aspects.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Compared with the prior art, the embodiment of the application has the advantages that:
Compared with the prior art, the people flow analysis method provided by the application acquires a pedestrian image set through a depth camera and imports the multiple frames of pedestrian images in the set into a human body detection network to determine the motion tracking data of the pedestrian objects in the target detection area of the depth camera, from which the people flow analysis information is generated. The human body detection network is obtained through deep learning training and can track pedestrians accurately. Because the input of the human body detection network is the pedestrian images acquired by the depth camera, namely depth images, the privacy problem caused by using color images for people flow analysis is avoided; and because depth images carry depth information, the prior-art problems that a color image only reflects information in the plane dimensions and that pedestrians cannot be tracked accurately are also solved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart of an implementation of a people stream analysis method provided in a first embodiment of the present application;
FIG. 2 is a schematic diagram of an application scenario provided in an embodiment of the present application;
fig. 3 is a flowchart of an implementation of a people flow analysis method according to a second embodiment of the present application;
FIG. 4 is a schematic diagram of a human body detection network according to an embodiment of the present application;
fig. 5 is a flowchart of an implementation of a people flow analysis method according to a third embodiment of the present application;
fig. 6 is a flowchart of an implementation of a people flow analysis method according to a fourth embodiment of the present application;
fig. 7 is a flowchart of an implementation of a people flow analysis method according to a fifth embodiment of the present application;
fig. 8 is a flowchart of an implementation of a people flow analysis method according to a sixth embodiment of the present application;
fig. 9 is a schematic view of an effect of a monitoring screen provided in a sixth embodiment of the present application;
fig. 10 is a schematic flow chart of a people flow analysis method according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a human flow analysis apparatus according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In the embodiments of the present application, the execution body of the flow is a terminal device. The terminal device includes, but is not limited to, a server, a computer, a smart phone, a tablet computer and the like capable of executing the people flow analysis method provided by the application. Fig. 1 shows a flowchart of an implementation of the people flow analysis method provided in the first embodiment of the present application, detailed as follows:
in S101, a pedestrian image set of a target detection region is acquired by a depth camera.
In the present embodiment, the pedestrian image set includes a plurality of frames of pedestrian images.
In the present embodiment, the depth camera is generally disposed at a position from which the target detection area can be observed. For example, the depth camera may shoot the target detection area where pedestrian flow statistics are needed from an overhead angle, so that the moving tracks of pedestrians appearing in the target detection area can be captured from that angle for pedestrian flow statistics and analysis. The picture shot by the depth camera is thus a picture of the target detection area. Since multiple consecutive frames of the target detection area are needed to detect the pedestrian objects in the area and to generate the people flow analysis information from the detection results, the pedestrian image set is acquired through the depth camera.
In a possible implementation manner, the pedestrian image set may be acquired as follows: the depth camera is controlled to acquire multiple frames of pedestrian images of the target detection area at a preset acquisition interval, and a preset number of consecutive pedestrian images are packaged, in order of their acquisition times, into the pedestrian image set; for example, all pedestrian images acquired within the same minute may be packaged into one pedestrian image set. A sketch of this batching step is given below.
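The following sketch illustrates, under assumptions outside this disclosure (the `camera` object and its `is_running`/`read_depth_frame` methods are hypothetical), how depth frames might be grouped into pedestrian image sets by acquisition time:

```python
import time

def collect_pedestrian_image_sets(camera, interval_s=0.1, set_duration_s=60.0):
    """Group depth frames from a hypothetical `camera` object into pedestrian
    image sets, one set per `set_duration_s` window of acquisition time."""
    image_sets = []
    current_set = []
    set_start = time.time()
    while camera.is_running():                     # hypothetical method
        frame = camera.read_depth_frame()          # hypothetical: H x W depth map
        current_set.append(frame)
        if time.time() - set_start >= set_duration_s:
            image_sets.append(current_set)         # one pedestrian image set
            current_set = []
            set_start = time.time()
        time.sleep(interval_s)                     # preset acquisition interval
    if current_set:
        image_sets.append(current_set)
    return image_sets
```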
It should be understood that the pedestrian image in the pedestrian image set is a depth image, and the pedestrian image only contains the depth value corresponding to each pixel, but does not contain the three-primary-color pixel value of each pixel, so as to reduce the information amount of the image input by the subsequent human body detection network and improve the efficiency of subsequently generating the people stream analysis information.
In S102, the pedestrian images are imported into a human body detection network, and a human body detection image corresponding to each frame of the pedestrian images is output.
In this embodiment, the human body detection network is obtained through deep learning training. Specifically, multiple frames of training images are collected and a ground-truth human body region is marked on each frame of training image; each training image is imported into the human body detection network, a predicted human body region is segmented from the training image based on the internal parameters of the network, a mask loss value is calculated from the predicted human body region and the ground-truth human body region, and the internal parameters are adjusted based on the mask loss value. When the mask loss value is smaller than a preset threshold, training of the human body detection network is considered complete, and in S102 the human body detection images are obtained through the trained network. It should be understood that in S102 the input of the human body detection network is the pedestrian image and the output is the human body detection image, i.e., an image in which the human body regions found in the pedestrian image are marked. The human body detection image may contain at least one human body region, meaning the network has detected pedestrian objects in the target detection area, or it may contain no human body region, meaning the network has detected no pedestrian object in the target detection area.
In a possible implementation manner, the human body detection image may be obtained through the human body detection network as follows. The human body detection network comprises a feature extraction layer, a region division layer and a region identification layer. The pedestrian image is imported into the feature extraction layer to obtain a feature image; the feature image is imported into the region division layer to obtain the mask regions of all objects in the pedestrian image; and the feature image is imported into the region identification layer to obtain the classification information of each object in the pedestrian image (to improve the output efficiency of the network, the classification information only takes the values human body or non-human body). If the classification information of an object is human body, the mask region of that object is identified as a human body region, and all human body regions are marked in the pedestrian image to obtain the human body detection image. For example, marking all the human body regions in the pedestrian image may specifically mean setting the depth values of all pixels outside the human body regions to null, so that only the pixels inside the human body regions of the human body detection image carry depth values; when the motion tracking data is later determined from the human body detection image, interference terms are reduced because only the human body regions need to be considered, which improves efficiency.
It should be understood that the feature extraction layer may be a Convolutional Neural Network (CNN), and the region division layer and the region identification layer may be part of a Mask R-CNN (Mask Region-based Convolutional Neural Network) model: the output of the region division layer corresponds to the mask branch of the Mask R-CNN model, and the output of the region identification layer corresponds to its classification branch. A sketch of such a detection step is given below.
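As an illustrative sketch only: the patent trains its own human body detection network on depth images, but the shape of the inference step can be approximated with torchvision's COCO-pretrained Mask R-CNN by replicating the depth frame into three channels and keeping only "person" detections; the score threshold and the zeroing of non-person pixels mirror the marking scheme described above and are otherwise assumptions.

```python
import numpy as np
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Stand-in for the trained human body detection network described above;
# torchvision's COCO-pretrained Mask R-CNN is used only for illustration.
model = maskrcnn_resnet50_fpn(pretrained=True).eval()
PERSON_CLASS_ID = 1  # "person" in the COCO label set

def detect_human_regions(depth_frame, score_thresh=0.5):
    """Return a human body detection image: a copy of the depth frame in which
    pixels outside detected human regions are set to 0 (treated as null)."""
    norm = depth_frame.astype(np.float32) / max(float(depth_frame.max()), 1e-6)
    image = torch.from_numpy(np.stack([norm] * 3))        # 3 x H x W pseudo-RGB
    with torch.no_grad():
        out = model([image])[0]
    human_mask = np.zeros(depth_frame.shape, dtype=bool)
    for label, score, mask in zip(out["labels"], out["scores"], out["masks"]):
        if label.item() == PERSON_CLASS_ID and score.item() >= score_thresh:
            human_mask |= mask[0].numpy() > 0.5
    detection_image = np.where(human_mask, depth_frame, 0)
    return detection_image, human_mask
```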
In S103, based on the human body detection images corresponding to the multiple frames of continuous pedestrian images, motion tracking data of a plurality of pedestrian objects included in the target detection region is determined.
In the present embodiment, the motion tracking data includes the coordinate information of each pedestrian object within each frame of pedestrian image. Taking the human body detection image corresponding to one frame of pedestrian image as an example, the tracked point may be the region center of gravity of a human body region. In this case, the terminal device may set the weights of all pixels in the human body region to the same value and thereby determine the pixel coordinates of the region center of gravity, i.e., its coordinates in the image pixel coordinate system of the pedestrian image. The world coordinates of the region center of gravity, i.e., its coordinates in the world coordinate system of the target detection area, are then determined from its pixel coordinates and the corresponding depth value: first, the camera coordinates of the region center of gravity (its coordinates in the camera coordinate system of the depth camera) are calculated from the pixel coordinates, the depth value and the internal parameters of the depth camera; then the world coordinates are calculated from the camera coordinates and the external parameters of the depth camera (i.e., the transformation matrix from the camera coordinate system to the world coordinate system). A sketch of this conversion is given below.
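A minimal sketch of this conversion, assuming the intrinsic matrix K and the camera-to-world rotation R and translation t are available from calibration, and using the mean depth over the region as the depth value of the region center of gravity (an assumption):

```python
import numpy as np

def region_centroid_world(human_mask, depth_frame, K, R, t):
    """Compute the region center of gravity of a human region (boolean mask) and
    convert it from pixel coordinates to world coordinates.
    K: 3x3 intrinsic matrix; R, t: camera-to-world rotation and translation."""
    vs, us = np.nonzero(human_mask)                 # pixel rows (v) and columns (u)
    u_c, v_c = us.mean(), vs.mean()                 # equal-weight center of gravity
    z_c = depth_frame[vs, us].mean()                # depth value for the centroid (assumed: mean)
    # Back-project: Zc * [u, v, 1]^T = K * [Xc, Yc, Zc]^T
    pixel_h = np.array([u_c, v_c, 1.0])
    cam = z_c * np.linalg.inv(K) @ pixel_h          # camera coordinates (Xc, Yc, Zc)
    world = R @ cam + t                             # world coordinates
    return world
```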
Determining the motion tracking data of the pedestrian objects contained in the target detection area based on the human body detection images corresponding to multiple consecutive frames of pedestrian images may further include the following. Based on the obtained world coordinates, the region centers of gravity of the human body regions in a first pedestrian image are matched with those in a second pedestrian image, where the first and second pedestrian images are two consecutive frames, i.e., their acquisition times are closest to each other. If a pair of centers of gravity matches successfully, the human body feature information of the two corresponding human body regions (for example the contour information of the regions) is also matched; if that matches as well, the two human body regions are associated with the same pedestrian object. The world coordinates of the region center of gravity of one of the successfully matched human body regions are then identified as the coordinate information of that pedestrian object in the pedestrian image.
It should be understood that the position of the depth camera is unchanged in the process of acquiring the pedestrian image set, and both the internal parameters and the external parameters of the depth camera can be obtained by calibrating the depth camera.
In S104, based on the motion tracking data of the several pedestrian objects, pedestrian flow analysis information about the target detection area is generated.
In this embodiment, the people stream analysis information includes coordinate information of the pedestrian objects in each frame of pedestrian image, which is used to represent positions of the pedestrian objects in each frame of pedestrian image, and may further include walking speeds of the pedestrian objects in each frame of pedestrian image, which may be calculated based on the coordinate information of the pedestrian objects in each frame of pedestrian image.
In a possible implementation manner, generating the people flow analysis information about the target detection area based on the motion tracking data of the pedestrian objects may specifically include calculating, from the motion tracking data of a pedestrian object, its walking speed in each frame of pedestrian image. Taking one example pedestrian image, the total displacement of the pedestrian object over the three adjacent pedestrian images containing that example image is calculated from the motion tracking data (including the coordinate information) of the pedestrian object in those three frames; the average speed of the pedestrian object over the three adjacent frames is then calculated from the total displacement and the largest acquisition-time difference among the three frames, and this average speed is identified as the walking speed of the pedestrian object in the example pedestrian image.
Specifically, if the walking speed of the pedestrian object in the example pedestrian image is zero, the pedestrian object is identified as being in a stay state in that image, and the largest acquisition-time difference over the consecutive frames of pedestrian images in which the pedestrian object remains in the stay state is identified as the stay duration of the pedestrian object in the target detection area. It should be understood that the people flow analysis information includes the stay durations of the pedestrian objects in the target detection area.
It should be understood that the people flow analysis information may also include the number of pedestrians within each frame of pedestrian image, determined from the number of pedestrian objects within that pedestrian image; it may further include the pedestrian flow count of the target detection area, i.e., the total number of distinct pedestrian objects in the pedestrian image set.
In this embodiment, a pedestrian image set is acquired by a depth camera and the multiple frames of pedestrian images in the set are imported into a human body detection network, thereby determining the motion tracking data of the pedestrian objects within the target detection area of the depth camera and generating the people flow analysis information. The human body detection network is obtained through deep learning training and can track pedestrians accurately. Because the input of the human body detection network is the pedestrian images acquired by the depth camera, namely depth images, the privacy problem caused by using color images for people flow analysis is avoided; and because depth images reflect depth information, the prior-art problems that a color image only reflects information in the plane dimensions and that pedestrians cannot be tracked accurately are also solved.
Fig. 2 is a schematic diagram of an application scenario provided by an embodiment of the present application, referring to fig. 2, where the depth camera shoots towards the target detection area; the target detection area comprises a pedestrian object; acquiring a pedestrian image set related to the target detection area through the depth camera; then leading the multi-frame pedestrian images in the pedestrian image set into the human body detection network, enabling the human body detection network to output corresponding human body detection images based on the pedestrian images, and analyzing the human body detection images to obtain motion tracking data corresponding to the pedestrian object; and generating people flow analysis information about the target detection area according to the motion tracking data of a plurality of pedestrian objects in the target detection area.
Fig. 3 shows a flowchart of an implementation of a people stream analysis method according to a second embodiment of the present application. Referring to fig. 3, with respect to the embodiment shown in fig. 1, the people flow analysis method S102 provided in this embodiment includes steps S1021 to S1022, which are detailed as follows:
further, the importing the pedestrian image into a human body detection network, and outputting a human body detection image corresponding to each frame of the pedestrian image includes:
in S1021, the pedestrian image is imported into a human body detection network, so that the human body detection network divides a plurality of human body regions in the pedestrian image based on the depth values of the pixels in the pedestrian image.
In this embodiment, the human body detection network is a neural network model obtained through deep learning training, specifically, the human body detection network is a classification model, and classifies each region in the pedestrian image, and identifies a human body region of which the category is a human body, which may specifically refer to the above description of S102, and is not repeated herein.
In one possible implementation, the human body detection network includes a region division layer and a region identification layer; how the pedestrian image is imported into the human body detection network so that the network divides human body regions based on the depth values of the pixels in the pedestrian image is shown in fig. 4, which is a schematic diagram of the human body detection network provided in an embodiment of the present application. Referring to fig. 4, the pedestrian image is imported into the region division layer of the network, and edge detection is performed on the pedestrian image based on the depth value of each pixel using an edge detection algorithm, yielding a number of edge contour lines; the pedestrian image is then partitioned along these edge contour lines into several mask regions, each corresponding to one object (the edge detection algorithm may be a depth-difference or gradient-difference detection algorithm). Whether the object corresponding to each mask region is a human body is then identified based on the parameters of the region identification layer of the human body detection network; if the object is a pedestrian object, its mask region is identified as a human body region, and all human body regions are marked in the pedestrian image.
In S1022, different region identifiers are configured for the human body regions based on the positions of the human body regions in the pedestrian image, and the pedestrian image marked with the human body regions and the region identifiers corresponding to the human body regions is used as the human body detection image.
In this embodiment, different region identifiers are configured for the marked human body regions, so that the same pedestrian object can be identified in the multi-frame pedestrian images in the following process. In a possible implementation manner, configuring different area identifiers for each human body area based on the positions of the human body areas in the pedestrian image, and taking the pedestrian image marked with the human body area and the area identifier corresponding to the human body area as the human body detection image may specifically be: according to the sequence of the pedestrian object from left to right and from top to bottom, configuring a serial number for each human body area, wherein the serial number is the area identifier; and identifying the pedestrian image marked with the human body area and the area identifier corresponding to the human body area as the human body detection image.
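A possible sketch of the identifier assignment, assuming the left-to-right, top-to-bottom ordering above is applied to the region centroids:

```python
import numpy as np

def assign_region_identifiers(human_masks):
    """Assign serial-number region identifiers to human regions, ordered
    left-to-right and then top-to-bottom by region centroid."""
    centroids = []
    for mask in human_masks:
        vs, us = np.nonzero(mask)
        centroids.append((us.mean(), vs.mean()))    # (column, row) of the centroid
    order = sorted(range(len(human_masks)),
                   key=lambda i: (centroids[i][0], centroids[i][1]))
    return {region_id + 1: human_masks[idx] for region_id, idx in enumerate(order)}
```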
In this embodiment, when the pedestrian image is imported into the human body detection network, each mask region is divided based on the depth value of each pixel in the pedestrian object, whether each mask region is a human body region is identified, and each human body region is numbered to obtain the human body detection image, so that the same pedestrian object is identified in the following multi-frame pedestrian images.
Fig. 5 shows a flowchart of an implementation of a people flow analysis method according to a third embodiment of the present application. Referring to fig. 5, relative to the embodiment shown in fig. 3, step S103 of the people flow analysis method provided in this embodiment includes steps S501 to S505, detailed as follows:
further, the determining, based on the human body detection images corresponding to the multiple frames of continuous pedestrian images, motion tracking data of a plurality of pedestrian objects included in the target detection region includes:
in S502, based on the human body region in the human body detection image, the position of the center of gravity of the human body corresponding to the region identifier is determined.
In this embodiment, it should be noted that the difference from the embodiment described in S103 above is that the present embodiment needs to determine the position of the center of gravity of the human body, rather than the position of the center of gravity of the region, and it should be understood that when the pedestrian object has no other obstruction to block it in the pedestrian image, the position of the center of gravity of the region should be the same as the position of the center of gravity of the human body.
In a possible implementation manner, the determining, based on the human body region in the human body detection image, the position of the center of gravity of the human body corresponding to the region identifier may specifically be: taking a human body detection image corresponding to one frame of human body image as an example for explanation, determining the human body gravity center position of each human body area in the human body detection image, specifically, importing the human body area into a human body gravity center identification network to obtain the human body gravity center pixel coordinates and the human body gravity center depth value of the human body area; based on the pixel coordinates of the center of gravity of the human body and the depth value of the center of gravity of the human body, the world coordinates of the center of gravity of the human body in the human body region, that is, the position of the center of gravity of the human body, is determined, and the specific method for determining the world coordinates of the center of gravity of the human body may refer to the related step of determining the world coordinates of the center of gravity of the region in S103, which is not described herein again.
Exemplarily, the human body barycentric pixel coordinates and the human body barycentric depth values of the human body region are determined based on the contour information of the human body region and the depth values of the respective pixels in the human body region; specifically, the human body gravity center of the human body region is determined based on the contour information of the human body region, and the pixel coordinates of the human body gravity center are obtained; and identifying the average value of the depth values of all pixels in the human body area as the human body gravity center depth value.
It should be understood that, the above-mentioned determining the human body gravity center of the human body region based on the contour information of the human body region may specifically be: by analyzing the contour information of the human body region, at least one human body feature point in the human body region is identified, and the human body gravity center of the human body region is determined according to the human body feature point (such as a head). The more human feature points are identified in the body region, the more accurate the determination of the center of gravity of the body.
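A brief sketch of this step, using the image moments of the region mask as a stand-in for the contour-based center of gravity and the mean region depth as the center-of-gravity depth value, per the example above:

```python
import cv2
import numpy as np

def body_center_of_gravity(human_mask, depth_frame):
    """Estimate the human body center of gravity of one human region:
    pixel coordinates from the image moments of the region mask, and the
    center-of-gravity depth value as the mean depth over the region."""
    m = cv2.moments(human_mask.astype(np.uint8), binaryImage=True)
    if m["m00"] == 0:
        return None
    u_g, v_g = m["m10"] / m["m00"], m["m01"] / m["m00"]   # center-of-gravity pixel coords
    z_g = float(depth_frame[human_mask].mean())            # center-of-gravity depth value
    return (u_g, v_g, z_g)
```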
In S503, any two adjacent frames of human body detection images are matched based on the position of the center of gravity of the human body identified by the region and the human body feature information.
In this embodiment, two adjacent frames of human body detection images are the two human body detection images whose acquisition times are closest. In a possible implementation manner, matching any two adjacent frames of human body detection images based on the human body center-of-gravity positions and the human body feature information of the region identifiers means the following: the human body center-of-gravity positions of all region identifiers in one frame of human body detection image are compared one by one with those in the other frame; if two region identifiers, one in each of the two human body detection images, have human body center-of-gravity positions whose distance is smaller than a preset distance threshold, the human body feature information of the corresponding human body regions is compared, and if the difference between the two pieces of human body feature information is smaller than a preset difference threshold, the two region identifiers are identified as successfully matched. The human body feature information generally refers to the contour information of the human body region. A sketch of this matching criterion is given below.
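A sketch of the matching criterion under simplifying assumptions: the contour information is reduced here to a single scalar feature value per region, and both thresholds are placeholders.

```python
import numpy as np

def match_region_identifiers(regions_a, regions_b, dist_thresh=0.3, feat_thresh=0.2):
    """Match region identifiers between two adjacent human body detection images.
    Each input maps region_id -> (center_of_gravity_xyz, feature_value), where the
    feature value stands in for the contour information described above.
    Returns a list of (id_in_a, id_in_b) pairs that matched successfully."""
    matches = []
    used_b = set()
    for id_a, (pos_a, feat_a) in regions_a.items():
        best, best_dist = None, dist_thresh
        for id_b, (pos_b, feat_b) in regions_b.items():
            if id_b in used_b:
                continue
            dist = np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b))
            if dist < best_dist and abs(feat_a - feat_b) < feat_thresh:
                best, best_dist = id_b, dist
        if best is not None:
            matches.append((id_a, best))            # same pedestrian object in both frames
            used_b.add(best)
    return matches
```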
In S504, if two successfully matched region identifiers exist in the two adjacent frames of human body detection images, the human body regions corresponding to the two region identifiers are identified as being associated with the same pedestrian object.
In this embodiment, if two successfully matched region identifiers exist in the two adjacent frames of human body detection images, the human body regions corresponding to the two region identifiers may be considered as human body regions of the same pedestrian object, that is, the pedestrian objects associated with the human body regions corresponding to the two region identifiers are identified as the same pedestrian object.
Specifically, a human body region with a region identifier of 1 in a first frame of human body detection image is associated with a first pedestrian object, a human body gravity center position with the region identifier of 1 in the first frame of human body detection image is taken as a reference, a region identifier matched with the region identifier of 1 in the first frame of human body detection image is inquired in the other human body detection image, and the matched region identifier is associated with the first pedestrian object. And analogizing until all the human body areas in all the human body detection images are associated with corresponding pedestrian objects.
Taking one match as an example: suppose the human body region with region identifier 1 in the first frame of human body detection image and the human body region with region identifier 2 in the second frame are associated with the same pedestrian object. The pedestrian object associated with these two human body regions can be identified as a first pedestrian object, i.e., between the first and second frames of human body detection images the first pedestrian object moved from the human body region with region identifier 1 to the human body region with region identifier 2.
In S505, motion tracking data of the pedestrian object is determined according to the human body region associated with the human body detection image of the pedestrian object in each frame.
In this embodiment, after step S504 is executed, the human body region associated with each pedestrian object in each frame of human body detection image is obtained; to determine the motion of a pedestrian object, it only remains to determine, from these data, the specific position of the human body region associated with the pedestrian object in each frame of human body detection image.
In a possible implementation manner, the determining the motion tracking data of the pedestrian object according to the human body region associated with the human body detection image of the pedestrian object in each frame may specifically be: and extracting the human body gravity center position of the human body region associated with the same pedestrian object in each frame of human body detection image, and converting the human body gravity center position into the world coordinate of the pedestrian object at the acquisition time of each frame of human body detection image, namely obtaining the motion trail of the pedestrian object in the acquisition time period of the pedestrian image set in the target detection region, namely the motion tracking data of the pedestrian object.
In this embodiment, the human body barycenter and the human body feature information of two adjacent frames of human body detection images are compared to determine whether the same pedestrian object is associated between the region identifiers of the two adjacent frames of human body detection images, so as to determine the human body region corresponding to the same pedestrian object in each frame of human body detection image, and determine the motion tracking data of the pedestrian object.
Further, the people stream analysis method S502 provided in this embodiment further includes S501 before, which is detailed as follows:
before determining the position of the center of gravity of the human body corresponding to the region identifier based on the human body region in the human body detection image, the method further includes:
in S501, the depth camera is calibrated to obtain internal parameters of the depth camera.
In this embodiment, generally, the depth camera is calibrated to obtain an internal parameter of the depth camera, where the internal parameter can be used to convert a position of a point in a human body detection image into a relative position of the point with respect to the depth camera, and is specifically represented as: according to the internal parameters, the pixel coordinates of a point on the pixel coordinate system corresponding to the human body detection image can be converted into the camera coordinates on the camera coordinate system corresponding to the depth camera. In the process of calibrating the depth camera, the pixel coordinates of a plurality of feature points and the camera coordinates need to be obtained to obtain a conversion model for converting the pixel coordinates into the camera coordinates, and the parameters of the conversion model are the internal parameters of the depth camera. It should be appreciated that the depth camera is stationary when it is calibrated and subsequent human detection images are acquired by the depth camera.
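One conventional way to obtain the intrinsic parameters (not prescribed by this disclosure) is checkerboard calibration with OpenCV; the board size, square size and the use of grayscale views of the board are assumptions in this sketch:

```python
import cv2
import numpy as np

def calibrate_intrinsics(gray_images, board_size=(9, 6), square_size=0.025):
    """Estimate the intrinsic matrix K of the (fixed) depth camera from several
    grayscale views of a checkerboard; board geometry here is an assumption."""
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square_size
    obj_points, img_points = [], []
    for gray in gray_images:
        found, corners = cv2.findChessboardCorners(gray, board_size, None)
        if found:
            obj_points.append(objp)
            img_points.append(corners)
    _, K, dist, _, _ = cv2.calibrateCamera(
        obj_points, img_points, gray_images[0].shape[::-1], None, None)
    return K, dist
```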
Further, the people stream analysis method S502 provided in this embodiment includes S5021 to S5023, which are detailed as follows:
the determining the position of the center of gravity of the human body corresponding to the region identifier based on the human body region in the human body detection image includes:
in S5021, pixel coordinates of the center of gravity of the human body are calculated based on the depth values of the respective pixels in the human body region.
In this embodiment, the pixel coordinates of the human body center of gravity refer to the coordinates of the center of gravity of the human body corresponding to the human body region in the pixel coordinate system of the human body detection image. For the implementation of S5021, reference may be made to the related step of determining the pixel coordinates of the human body center of gravity in S502, which is not repeated here.
In S5022, the pixel coordinates are converted into three-dimensional coordinates based on the internal parameters of the depth camera and the depth value of the human body center of gravity in the human body detection image.
In this embodiment, the three-dimensional coordinates may refer to camera coordinates of the center of gravity of the human body on a camera coordinate system of the depth camera, and the pixel coordinates are converted into the three-dimensional coordinates based on the internal parameters of the depth camera and the depth value of the center of gravity of the human body in the human body detection image, specifically referring to the following formula:
\[ Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} \]
wherein, K is an internal parameter of the camera, Zc is a depth value of the center of gravity of the human body in the human body detection image, and (u, v) and (Xc, Yc, Zc) are pixel coordinates and camera coordinates of the center of gravity of the human body; it should be understood that the camera coordinates (Xc, Yc, Zc) of the center of gravity of the human body, i.e., the three-dimensional coordinates described above, can be determined by solving the above formula.
In S5023, the three-dimensional coordinates of the center of gravity of the human body are recognized as the position of the center of gravity of the human body corresponding to the region identifier.
In this embodiment, the three-dimensional coordinates of the center of gravity of the human body obtained in step S5022 are the camera coordinates of the center of gravity of the human body on the camera coordinate system of the depth camera, and the position of the center of gravity of the human body in this embodiment may refer to the camera coordinates of the center of gravity of the human body on the camera coordinate system of the depth camera, or may refer to the world coordinates of the center of gravity of the human body on the world coordinate system, which may be specifically set according to the user' S requirements. If the human body center of gravity position refers to the camera coordinates of the human body center of gravity on the camera coordinate system of the depth camera, the camera coordinates (Xc, Yc, Zc) of the human body center of gravity are identified as the human body center of gravity position. If the position of the center of gravity of the human body refers to the world coordinates of the center of gravity of the human body on the world coordinate system, the camera coordinates of the center of gravity of the human body are converted into the world coordinates according to the external parameters of the depth camera, which can be determined by the relative position between the depth camera and the world coordinate system.
In this embodiment, the depth camera is calibrated to obtain an internal parameter of the depth camera, and the pixel coordinates of the center of gravity of the human body are converted into three-dimensional coordinates by the internal parameter, so as to determine the position of the center of gravity of the human body.
Fig. 6 shows a flowchart of an implementation of a people flow analysis method according to a fourth embodiment of the present application. Referring to fig. 6, relative to the embodiment shown in fig. 1, the people flow analysis method provided in this embodiment includes steps S601 to S603, detailed as follows:
further, before the step of importing the pedestrian image into a human body detection network and outputting a human body detection image corresponding to each frame of the pedestrian image, the method further includes: preprocessing the pedestrian image, including:
in S601, contour extraction is performed on the pedestrian image based on the depth value of each pixel in the pedestrian image, so as to obtain a contour line of the pedestrian image.
In this embodiment, the above-mentioned performing contour extraction on the pedestrian image based on the depth value of each pixel in the pedestrian image to obtain the contour line of the pedestrian image may specifically be: the method includes the steps of carrying out edge detection on the pedestrian image based on the depth value of each pixel in the pedestrian image, exemplarily, processing the depth value of each pixel of the pedestrian image based on a Prewitt operator, identifying edge pixels of the pedestrian image, and determining and extracting a contour line of the pedestrian image according to all the edge pixels. The contour line is a line made up of a plurality of edge pixels.
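A minimal sketch of the contour-extraction step with the Prewitt operator mentioned above; the gradient-magnitude threshold is an assumption:

```python
import numpy as np
from scipy import ndimage

def extract_contour(depth_frame, grad_thresh=50.0):
    """Mark edge pixels of a depth frame with the Prewitt operator; the set of
    marked pixels forms the contour line(s) of the pedestrian image."""
    d = depth_frame.astype(np.float32)
    gx = ndimage.prewitt(d, axis=1)                 # horizontal depth gradient
    gy = ndimage.prewitt(d, axis=0)                 # vertical depth gradient
    magnitude = np.hypot(gx, gy)
    edge_pixels = magnitude > grad_thresh           # assumed threshold on depth difference
    return edge_pixels
```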
In S602, foreground extraction is performed on the pedestrian image based on a background image and the contour line, so as to obtain a foreground region of the pedestrian image.
In this embodiment, the background image is determined after the pedestrian image set is imported into a background modeling algorithm, or is acquired based on the depth camera; generally, the background image may be captured by the depth camera when there is no pedestrian in the target detection area; in particular, the background image may be obtained by introducing a plurality of pedestrian image sets into a background modeling algorithm to identify background pixels corresponding to respective pixel coordinates (for representing the positions of pixels in the pedestrian image) in each pedestrian image, and constructing the background image based on all the identified background pixels.
In this embodiment, the foreground extraction of the pedestrian image based on the background image and the contour line to obtain the foreground region of the pedestrian image may specifically be: comparing the background image with the pedestrian image, and if the difference of the depth values of the same pixel coordinate in the pedestrian image and the background image is greater than a preset comparison threshold value, identifying the pixel corresponding to the pixel coordinate as a foreground pixel; and (4) enclosing all foreground pixels into at least one foreground area through the contour line, namely obtaining the foreground area of the pedestrian image.
In S603, the depth values of the pixels in the foreground region are normalized, and a preprocessed pedestrian image is obtained according to the normalized depth values of the pixels in the foreground region.
In this embodiment, the foreground region is used by the human body detection network to determine the human body regions from the normalized depth values of the pixels in the foreground region. The pedestrian image is a depth image acquired by the depth camera, and the depth values of its pixels span a large range; to improve the output efficiency of the subsequent human body detection network, the depth values of the pixels in the pedestrian image can be normalized. Since the data that actually needs to be processed by the subsequent human body detection network is only the foreground of the pedestrian image, only the depth values of the pixels in the foreground region are normalized, and the preprocessed pedestrian image is obtained from the normalized depth values of those pixels. Specifically, the depth values of the pixels outside the foreground region in the preprocessed pedestrian image are zero or null, so that when the preprocessed pedestrian image is subsequently imported into the human body detection network, the network does not need to process the pixels outside the foreground region.
In this embodiment, the pedestrian image is preprocessed before being imported into the human body detection network, so as to reduce the data at the input end of the network and improve efficiency. Specifically, the contour line of the pedestrian image is obtained by contour extraction, which makes it convenient to determine the foreground region of the pedestrian image; only the depth values of the pixels in the foreground region are normalized, and the preprocessed pedestrian image is obtained from the normalized depth values of those pixels. The subsequent human body detection network therefore processes the preprocessed pedestrian image with a reduced input size, which improves efficiency and also makes the network converge faster during training. It should be understood that when the preprocessed pedestrian image is imported into the human body detection network, the network only needs to identify the human body regions within the foreground region. A sketch of this preprocessing is given below.
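A sketch of the preprocessing in S602-S603, assuming a background depth image is already available and using a placeholder comparison threshold; non-foreground pixels are zeroed as described:

```python
import numpy as np

def preprocess_pedestrian_image(depth_frame, background_depth, diff_thresh=150.0):
    """Foreground extraction by background differencing, followed by
    normalization of the depth values inside the foreground region;
    non-foreground pixels are set to zero (null)."""
    foreground = np.abs(depth_frame.astype(np.float32)
                        - background_depth.astype(np.float32)) > diff_thresh
    preprocessed = np.zeros_like(depth_frame, dtype=np.float32)
    if foreground.any():
        fg_vals = depth_frame[foreground].astype(np.float32)
        lo, hi = fg_vals.min(), fg_vals.max()
        scale = (hi - lo) if hi > lo else 1.0
        preprocessed[foreground] = (fg_vals - lo) / scale   # normalized to [0, 1]
    return preprocessed, foreground
```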
Fig. 7 shows a flowchart of an implementation of a people flow analysis method according to a fifth embodiment of the present application. Referring to fig. 7, relative to any of the above method embodiments, step S104 of the people flow analysis method provided in this embodiment includes S1041 to S1042, detailed as follows:
Further, the generating of the people flow analysis information about the target detection area based on the motion tracking data of the plurality of pedestrian objects comprises:
In S1041, based on the motion tracking data of the pedestrian objects, the pedestrian coordinates and the pedestrian speed of each pedestrian object in each frame of the pedestrian image in the target detection area are respectively determined.
In this embodiment, the pedestrian coordinates may specifically be world coordinates of the pedestrian object at the time of acquisition of each frame of the pedestrian image; the pedestrian speed specifically refers to the walking speed of the pedestrian object at the acquisition moment of the pedestrian image; the motion tracking data is the motion data of the pedestrian object in the target detection area, and records the position of the pedestrian object at the acquisition time of each frame of pedestrian image and the motion track during the acquisition period of the pedestrian image set.
In this embodiment, determining the pedestrian coordinates and the pedestrian speed of each pedestrian object in each frame of the pedestrian image in the target detection area based on the motion tracking data of the pedestrian objects may specifically be as follows. Taking one pedestrian object as an example, the world coordinates of the pedestrian object in each frame of the pedestrian image are obtained from the motion tracking data of that pedestrian object and identified as the above-mentioned pedestrian coordinates; these world coordinates are obtained in S103 and are not described here again. The pedestrian speed of the pedestrian object in each frame of the pedestrian image is calculated as follows:
v_i = |s_i - s_{i-1}| / dt

where v_i is the pedestrian speed of the pedestrian object in the i-th frame of the pedestrian image; |s_i - s_{i-1}| is the distance between the position (x_i, y_i, z_i) of the pedestrian object in the i-th frame of the pedestrian image and its position (x_{i-1}, y_{i-1}, z_{i-1}) in the (i-1)-th frame; and dt is the acquisition interval at which the depth camera acquires the pedestrian images. It should be understood that the pedestrian speed of the pedestrian object in the last frame of the pedestrian image is generally calculated from the distance between the position of the pedestrian object in the last frame and its position in the frame preceding the last frame.
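As a hedged illustration of this formula, the following sketch computes the per-frame pedestrian speed from tracked world coordinates; the data layout (a list of (x, y, z) positions, one per frame) and the convention of setting the first frame's speed to zero, since it has no preceding frame, are assumptions made for the example.

```python
import math
from typing import List, Tuple

Position = Tuple[float, float, float]  # world coordinates (x, y, z), e.g. in metres

def pedestrian_speeds(track: List[Position], dt: float) -> List[float]:
    """Compute v_i = |s_i - s_{i-1}| / dt for each frame of a tracked pedestrian.

    `track` holds the pedestrian object's world coordinates at each frame's
    acquisition time; `dt` is the camera's acquisition interval in seconds.
    The first frame has no preceding frame, so its speed is set to 0.0 here.
    """
    speeds = [0.0]
    for prev, curr in zip(track, track[1:]):
        distance = math.dist(curr, prev)  # Euclidean distance between the two positions
        speeds.append(distance / dt)
    return speeds

# Example: a pedestrian tracked over four frames at 10 Hz (dt = 0.1 s)
track = [(0.0, 0.0, 3.0), (0.12, 0.0, 3.0), (0.25, 0.0, 3.0), (0.25, 0.0, 3.0)]
print(pedestrian_speeds(track, dt=0.1))  # approximately [0.0, 1.2, 1.3, 0.0]
```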
In S1042, the pedestrian coordinates and the pedestrian speed are imported into a preset pedestrian flow analysis template, so as to obtain the pedestrian flow analysis information.
In this embodiment, the people flow analysis template is preset and can be adjusted according to requirements. The pedestrian flow analysis information includes the pedestrian coordinates and the pedestrian speed of each pedestrian object at the acquisition time of each frame of the pedestrian image.

The pedestrian flow analysis information may further include the total number of pedestrian objects within the target detection area during the acquisition period of the pedestrian image set, which may be determined by traversing the pedestrian objects in all the pedestrian images. The pedestrian flow analysis information may further include an abnormally staying pedestrian object, together with the abnormal staying position and abnormal staying duration corresponding to that pedestrian object. The abnormally staying pedestrian object is a pedestrian object output after the pedestrian coordinates and the pedestrian speed of each pedestrian object in each frame of the pedestrian image are imported into the people flow analysis template. Specifically, a pedestrian object whose pedestrian speed is zero over a run of consecutive pedestrian images whose frame count is greater than a preset threshold is identified as an abnormally staying pedestrian object; the position of that pedestrian object in the pedestrian images where its speed is zero is identified as the abnormal staying position; and the abnormal staying duration is calculated from the frame count of the consecutive pedestrian images.
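A minimal sketch of this abnormal-stay rule is shown below; treating "speed is zero" as "speed below a small tolerance", and the specific frame-count threshold, are illustrative assumptions rather than values fixed by this embodiment.

```python
from typing import List, Optional, Tuple

def detect_abnormal_stay(speeds: List[float],
                         positions: List[Tuple[float, float, float]],
                         dt: float,
                         frame_threshold: int = 50,
                         speed_tolerance: float = 0.05) -> Optional[dict]:
    """Flag a pedestrian object that stays still for too many consecutive frames.

    Returns the abnormal staying position and duration when the number of
    consecutive near-zero-speed frames exceeds `frame_threshold`, else None.
    """
    run_start, run_length = None, 0
    for i, v in enumerate(speeds):
        if v <= speed_tolerance:
            run_start = i if run_length == 0 else run_start
            run_length += 1
            if run_length > frame_threshold:
                return {
                    "stay_position": positions[run_start],  # position where the speed dropped to zero
                    "stay_duration_s": run_length * dt,     # duration derived from the frame count
                }
        else:
            run_start, run_length = None, 0
    return None
```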
Fig. 8 shows a flowchart of an implementation of a people flow analysis method according to a second embodiment of the present application. Referring to fig. 8, in comparison with the embodiment shown in fig. 7, the people flow analysis method provided in this embodiment includes steps S801 to S802, which are detailed as follows:
Further, after generating the people flow analysis information about the target detection area based on the motion tracking data of the pedestrian objects, the method further includes:
In S801, the first monitoring schematic diagram corresponding to each frame of the pedestrian image is displayed in sequence, so as to generate a monitoring picture of the target detection area.

In this embodiment, the first monitoring schematic diagram includes the pedestrian image and the pedestrian flow analysis information corresponding to the pedestrian image.
In a possible implementation manner, sequentially displaying the first monitoring schematic diagram corresponding to each frame of the pedestrian image to generate the monitoring picture of the target detection area may specifically be as follows. Taking the first monitoring schematic diagram corresponding to one frame of the pedestrian image as an example, the pedestrian flow analysis information obtained in S1042 above is displayed on the pedestrian image and encapsulated together with it into the first monitoring schematic diagram. Referring to fig. 10, which shows a schematic diagram of a monitoring picture effect provided in a sixth embodiment of the present application, one frame of the first monitoring schematic diagram is displayed on the monitoring picture, and each item of the pedestrian flow analysis information is placed at its corresponding position in the pedestrian image. Exemplarily, the pedestrian coordinates and pedestrian speed of each pedestrian object in the pedestrian image are marked (it should be understood that the pedestrian number of the pedestrian object may also be marked), and the total number of pedestrian objects in the pedestrian image is marked at the upper left corner of the pedestrian image. The first monitoring schematic diagrams corresponding to all the pedestrian images in the pedestrian image set then form the monitoring picture of the target detection area.
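The overlay described above could be drawn roughly as follows, assuming OpenCV; the colour-mapping of the depth image, the text format, the assumed structure of `detections`, and the pixel offsets are all illustrative assumptions.

```python
import cv2
import numpy as np

def render_monitoring_frame(depth_frame, detections, total_count):
    """Draw people flow analysis information onto one pedestrian image.

    `detections` is assumed to be a list of dicts with keys
    'pixel' (u, v), 'world' (x, y, z) and 'speed' for each pedestrian object.
    """
    # Colour-map the depth image so the annotations remain visible
    vis = cv2.applyColorMap(
        cv2.convertScaleAbs(depth_frame, alpha=255.0 / max(depth_frame.max(), 1)),
        cv2.COLORMAP_JET)

    for det in detections:
        u, v = det["pixel"]
        x, y, z = det["world"]
        label = f"({x:.2f},{y:.2f},{z:.2f}) {det['speed']:.2f} m/s"
        cv2.circle(vis, (u, v), 4, (255, 255, 255), -1)
        cv2.putText(vis, label, (u + 5, v - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 255, 255), 1)

    # Total number of pedestrian objects marked at the upper left corner
    cv2.putText(vis, f"total: {total_count}", (10, 25),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
    return vis
```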
In S802, the monitoring picture is updated in real time based on the second monitoring schematic diagram corresponding to the newly acquired pedestrian image.
In this embodiment, the newly acquired pedestrian image refers to a new pedestrian image continuously acquired by the depth camera after the pedestrian image set is acquired in S101, and the second monitoring schematic diagram corresponding to the newly acquired pedestrian image is determined. Updating the monitoring picture in real time based on this second monitoring schematic diagram keeps the target detection area under real-time monitoring, so that the people flow analysis information of the target detection area can be obtained for each time period.
Fig. 11 is a schematic flowchart of a pedestrian flow analysis method according to an embodiment of the present application. Referring to fig. 11, the pedestrian flow analysis method of this embodiment specifically includes: acquiring a pedestrian image set of a target detection area by the depth camera; preprocessing the pedestrian images in the pedestrian image set; importing the preprocessed pedestrian images into the human body detection network, detecting and marking the human body regions in the pedestrian images, and obtaining the human body detection images; converting the pixel coordinates of each pedestrian object in the human body detection images into world coordinates to obtain motion tracking data of the pedestrian objects; obtaining, based on the motion tracking data of the pedestrian objects, the attribute characteristics of each pedestrian object (including position and speed) according to the preset pedestrian flow analysis template, and generating the pedestrian flow analysis information of the target detection area; and generating a monitoring picture of the target detection area based on the people flow analysis information and the pedestrian image set of the target detection area, so as to intuitively display the people flow condition of the target detection area.
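To make the overall flow concrete, the following end-to-end sketch strings the earlier helper sketches together; the frame source, the `detect_humans` and `track_pedestrians` callables (placeholders for the human body detection network and the cross-frame matching step), and all parameter values are assumptions, not the patented implementation.

```python
def analyse_people_flow(depth_frames, background_depth, dt,
                        detect_humans, track_pedestrians):
    """Sketch of the pipeline: preprocessing, detection, tracking, analysis."""
    preprocessed = []
    for frame in depth_frames:
        fg_mask = extract_foreground(frame, background_depth)            # foreground extraction
        preprocessed.append(preprocess_pedestrian_image(frame, fg_mask)) # depth normalization

    detections_per_frame = [detect_humans(img) for img in preprocessed]  # human body detection network
    tracks = track_pedestrians(detections_per_frame)                     # motion tracking data per pedestrian object

    analysis = {"pedestrians": {}}
    for pid, track in tracks.items():       # track: list of world coordinates, one per frame
        speeds = pedestrian_speeds(track, dt)
        analysis["pedestrians"][pid] = {
            "coordinates": track,
            "speeds": speeds,
            "abnormal_stay": detect_abnormal_stay(speeds, track, dt),
        }
    analysis["total_count"] = len(tracks)
    return analysis
```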
It should be understood that the people flow analysis information of the target detection area can be uploaded to a server, so as to conveniently count the people flow condition of the target detection area.
Corresponding to the method described in the above embodiments, fig. 12 shows a schematic structural diagram of a people flow analysis apparatus provided in an embodiment of the present application; for convenience of description, only the parts related to the embodiments of the present application are shown.

Referring to fig. 12, the people flow analysis apparatus includes: the pedestrian image acquisition module, which is used for acquiring a pedestrian image set of the target detection area through the depth camera, where the pedestrian image set comprises a plurality of frames of pedestrian images; the human body detection network module, which is used for importing the pedestrian images into a human body detection network and outputting the human body detection image corresponding to each frame of the pedestrian image, where the human body detection network is obtained through deep learning training; the motion tracking data determining module, which is used for determining motion tracking data of a plurality of pedestrian objects contained in the target detection area based on the human body detection images corresponding to a plurality of consecutive frames of pedestrian images; and the pedestrian flow analysis information generation module, which is used for generating pedestrian flow analysis information about the target detection area based on the motion tracking data of the plurality of pedestrian objects.
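As an illustration only, the module division above might be expressed in code along the following lines; the class, attribute, and method names are assumptions introduced for this sketch, and the method bodies are deliberately left as stubs.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class PeopleFlowAnalysisApparatus:
    """Sketch of the apparatus: one attribute or method per functional module."""
    camera: object                    # used by the pedestrian image acquisition module
    detector: Callable                # human body detection network (deep learning trained)
    frames: List = field(default_factory=list)

    def acquire(self, n_frames: int):
        # Pedestrian image acquisition module
        self.frames = [self.camera.capture() for _ in range(n_frames)]

    def detect(self):
        # Human body detection network module
        return [self.detector(frame) for frame in self.frames]

    def track(self, detections):
        # Motion tracking data determining module (cross-frame matching)
        ...

    def analyse(self, tracks):
        # Pedestrian flow analysis information generation module
        ...
```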
It should be noted that the information interaction between and the execution processes of the above modules are based on the same concept as the method embodiments of the present application; for their specific functions and technical effects, reference may be made to the method embodiment sections, which are not described here again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 12 shows a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 12, the terminal device 12 of this embodiment includes: at least one processor 120 (only one processor is shown in fig. 12), a memory 121, and a computer program 122 stored in the memory 121 and executable on the at least one processor 120, the steps of any of the various method embodiments described above being implemented when the computer program 122 is executed by the processor 120.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and can implement the steps of the embodiments of the methods described above when the computer program is executed by a processor.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A people flow analysis method based on a depth camera is characterized by comprising the following steps:
acquiring a pedestrian image set of a target detection area through a depth camera; the pedestrian image set comprises a plurality of frames of pedestrian images;
importing the pedestrian images into a human body detection network, and outputting human body detection images corresponding to each frame of pedestrian images; the human body detection network is obtained through deep learning training;
determining motion tracking data of a plurality of pedestrian objects contained in the target detection area based on the human body detection images corresponding to the plurality of continuous pedestrian images;
generating pedestrian flow analysis information about the target detection area based on the motion tracking data of the plurality of pedestrian objects.
2. The method of claim 1, wherein the importing the pedestrian images into a human detection network and outputting human detection images corresponding to each frame of the pedestrian images comprises:
importing the pedestrian image into a human body detection network so that the human body detection network divides a plurality of human body areas in the pedestrian image based on the depth value of each pixel in the pedestrian image;
configuring different region identifiers for the human body regions based on the positions of the human body regions in the pedestrian image, and taking the pedestrian image marked with the human body regions and the region identifiers corresponding to the human body regions as the human body detection image.
3. The method according to claim 2, wherein the determining motion tracking data of a plurality of pedestrian objects contained in the target detection area based on the human detection images corresponding to the plurality of frames of consecutive pedestrian images comprises:
determining a human body gravity center position corresponding to the region identification based on a human body region in the human body detection image;
matching any two adjacent frames of human body detection images based on the human body gravity center position of the region identification and the human body characteristic information;
if two successfully matched region identifications exist in the two adjacent frames of human body detection images respectively, identifying the pedestrian objects which are associated with the same human body region and corresponding to the two region identifications;
and determining motion tracking data of the pedestrian object according to the human body area associated with the human body detection image of the pedestrian object in each frame.
4. The method of claim 3, wherein before determining the position of the center of gravity of the human body corresponding to the region identifier based on the region of the human body in the human body detection image, further comprising:
calibrating the depth camera to obtain internal parameters of the depth camera;
the determining the position of the center of gravity of the human body corresponding to the region identifier based on the human body region in the human body detection image includes:
calculating pixel coordinates of the gravity center of the human body based on the depth values of the pixels in the human body area;
converting the pixel coordinates into three-dimensional coordinates based on internal parameters of the depth camera and depth values of the human body gravity center within the human body detection image;
and identifying the three-dimensional coordinates of the gravity center of the human body as the position of the gravity center of the human body corresponding to the region identification.
5. The method of claim 1, wherein before the step of importing the pedestrian images into a human body detection network and outputting the human body detection image corresponding to each frame of the pedestrian images, the method further comprises:
preprocessing the pedestrian image, including:
carrying out contour extraction on the pedestrian image based on the depth value of each pixel in the pedestrian image to obtain a contour line of the pedestrian image;
performing foreground extraction on the pedestrian image based on a background image and the contour line to obtain a foreground region of the pedestrian image; the background image is determined after the pedestrian image set is imported into a background modeling algorithm or acquired based on the depth camera;
normalizing the depth value of each pixel in the foreground area, and obtaining a preprocessed pedestrian image according to the normalized depth value of each pixel in the foreground area; the foreground area is used for the human body detection network to determine the human body area according to the depth value of each pixel in the foreground area after normalization.
6. The method of any one of claims 1 to 5, wherein the generating pedestrian flow analysis information about the target detection area based on the motion tracking data of the plurality of pedestrian objects comprises:
respectively determining the pedestrian coordinates and the pedestrian speed of the pedestrian objects in each frame of the pedestrian image in the target detection area based on the motion tracking data of the pedestrian objects;
and leading the pedestrian coordinates and the pedestrian speed into a preset pedestrian flow analysis template to obtain the pedestrian flow analysis information.
7. The method of claim 6, wherein after the generating the pedestrian flow analysis information about the target detection area based on the motion tracking data of the plurality of pedestrian objects, the method further comprises:
sequentially displaying a first monitoring schematic diagram corresponding to each frame of pedestrian image to generate a monitoring picture of the target detection area; the first monitoring schematic diagram comprises the pedestrian image and the pedestrian flow analysis information corresponding to the pedestrian image;
and updating the monitoring picture in real time based on a second monitoring schematic diagram corresponding to the newly acquired pedestrian image.
8. A people flow analysis device based on a depth camera, comprising:
the pedestrian image acquisition module is used for acquiring a pedestrian image set of a target detection area through the depth camera; the pedestrian image set comprises a plurality of frames of pedestrian images;
the human body detection network module is used for importing the pedestrian images into a human body detection network and outputting human body detection images corresponding to the pedestrian images of each frame; the human body detection network is obtained through deep learning training;
the motion tracking data determining module is used for determining motion tracking data of a plurality of pedestrian objects contained in the target detection area based on the human body detection images corresponding to the plurality of continuous pedestrian images;
a pedestrian flow analysis information generation module for generating pedestrian flow analysis information about the target detection area based on the motion tracking data of the plurality of pedestrian objects.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202110297860.6A 2021-03-19 2021-03-19 People flow analysis method and device based on depth camera Pending CN113034544A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110297860.6A CN113034544A (en) 2021-03-19 2021-03-19 People flow analysis method and device based on depth camera
PCT/CN2021/107938 WO2022193516A1 (en) 2021-03-19 2021-07-22 Depth camera-based pedestrian flow analysis method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110297860.6A CN113034544A (en) 2021-03-19 2021-03-19 People flow analysis method and device based on depth camera

Publications (1)

Publication Number Publication Date
CN113034544A true CN113034544A (en) 2021-06-25

Family

ID=76472084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110297860.6A Pending CN113034544A (en) 2021-03-19 2021-03-19 People flow analysis method and device based on depth camera

Country Status (2)

Country Link
CN (1) CN113034544A (en)
WO (1) WO2022193516A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114821676A (en) * 2022-06-29 2022-07-29 珠海视熙科技有限公司 Passenger flow human body detection method and device, storage medium and passenger flow statistical camera
WO2022193516A1 (en) * 2021-03-19 2022-09-22 奥比中光科技集团股份有限公司 Depth camera-based pedestrian flow analysis method and apparatus

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104835147A (en) * 2015-04-15 2015-08-12 中国科学院上海微系统与信息技术研究所 Method for detecting crowded people flow in real time based on three-dimensional depth map data
WO2017000115A1 (en) * 2015-06-29 2017-01-05 北京旷视科技有限公司 Person re-identification method and device
CN107563347A (en) * 2017-09-20 2018-01-09 南京行者易智能交通科技有限公司 A kind of passenger flow counting method and apparatus based on TOF camera
CN108875588A (en) * 2018-05-25 2018-11-23 武汉大学 Across camera pedestrian detection tracking based on deep learning
CN108985265A (en) * 2018-08-10 2018-12-11 南京华捷艾米软件科技有限公司 Volume of the flow of passengers monitoring system and volume of the flow of passengers monitoring method
CN108986064A (en) * 2017-05-31 2018-12-11 杭州海康威视数字技术股份有限公司 A kind of people flow rate statistical method, equipment and system
CN109344690A (en) * 2018-08-09 2019-02-15 上海青识智能科技有限公司 A kind of demographic method based on depth camera
CN110717408A (en) * 2019-09-20 2020-01-21 台州智必安科技有限责任公司 People flow counting method based on TOF camera
CN111192293A (en) * 2019-12-27 2020-05-22 深圳市越疆科技有限公司 Moving target pose tracking method and device
CN111753638A (en) * 2020-05-03 2020-10-09 深圳奥比中光科技有限公司 Pedestrian tracking method and system based on RGBD image

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10679369B2 (en) * 2018-06-12 2020-06-09 Chiral Software, Inc. System and method for object recognition using depth mapping
CN111709974B (en) * 2020-06-22 2022-08-02 苏宁云计算有限公司 Human body tracking method and device based on RGB-D image
CN113034544A (en) * 2021-03-19 2021-06-25 奥比中光科技集团股份有限公司 People flow analysis method and device based on depth camera


Also Published As

Publication number Publication date
WO2022193516A1 (en) 2022-09-22

Similar Documents

Publication Publication Date Title
CN111797653B (en) Image labeling method and device based on high-dimensional image
WO2019128507A1 (en) Image processing method and apparatus, storage medium and electronic device
US11928800B2 (en) Image coordinate system transformation method and apparatus, device, and storage medium
CN104268583B (en) Pedestrian re-recognition method and system based on color area features
CN107909081B (en) Method for quickly acquiring and quickly calibrating image data set in deep learning
CN111833340B (en) Image detection method, device, electronic equipment and storage medium
US20190130165A1 (en) System and method for selecting a part of a video image for a face detection operation
CN109086724B (en) Accelerated human face detection method and storage medium
CN107679503A (en) A kind of crowd's counting algorithm based on deep learning
CN107330386A (en) A kind of people flow rate statistical method and terminal device
CN104811660A (en) Control apparatus and control method
CN109359577B (en) System for detecting number of people under complex background based on machine learning
CN113034544A (en) People flow analysis method and device based on depth camera
CN110969644A (en) Personnel trajectory tracking method, device and system
CN110443247A (en) A kind of unmanned aerial vehicle moving small target real-time detecting system and method
CN110176024A (en) Method, apparatus, equipment and the storage medium that target is detected in video
Zaidi et al. Video anomaly detection and classification for human activity recognition
CN111444555B (en) Temperature measurement information display method and device and terminal equipment
CN113781526A (en) Domestic animal count identification system
CN106612385A (en) Video detection method and video detection device
CN114022531A (en) Image processing method, electronic device, and storage medium
CN110738607A (en) Method, device and equipment for shooting driving license based on artificial intelligence and storage medium
WO2012153868A1 (en) Information processing device, information processing method and information processing program
CN110321782A (en) A kind of system detecting characteristics of human body's signal
KR20230166840A (en) Method for tracking object movement path based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination