CN108986064B - People flow statistical method, equipment and system - Google Patents

People flow statistical method, equipment and system

Info

Publication number
CN108986064B
CN108986064B (application number CN201710399814.0A)
Authority
CN
China
Prior art keywords
human head
target
tracking
frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710399814.0A
Other languages
Chinese (zh)
Other versions
CN108986064A (en)
Inventor
宋涛 (Song Tao)
谢迪 (Xie Di)
浦世亮 (Pu Shiliang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201710399814.0A
Publication of CN108986064A
Application granted
Publication of CN108986064B
Legal status: Active

Classifications

    • G - PHYSICS; G06 - COMPUTING; CALCULATING OR COUNTING (common ancestors of all classes below)
    • G06T 7/0002 - Image analysis; inspection of images, e.g. flaw detection
    • G06T 7/246 - Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/277 - Image analysis; analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06V 20/53 - Scenes; surveillance or monitoring of activities; recognition of crowd images, e.g. recognition of crowd congestion
    • G06T 2207/10016 - Image acquisition modality; video; image sequence
    • G06T 2207/20076 - Special algorithmic details; probabilistic image processing
    • G06T 2207/20081 - Special algorithmic details; training; learning
    • G06T 2207/20084 - Special algorithmic details; artificial neural networks [ANN]
    • G06T 2207/30196 - Subject of image; human being; person
    • G06T 2207/30242 - Subject of image; counting objects in image

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a people flow statistics method, device and system, wherein the people flow statistics method comprises the following steps: acquiring consecutive frame images captured by an image acquisition device; inputting the consecutive frame images into a trained fully convolutional neural network to generate a head confidence distribution map for each frame image; determining, from the head confidence distribution map of each frame image, at least one head detection target in the frame image by a preset target determination method; associating targets between the current frame and the previous frame according to the feature matching result and motion smoothness of each head detection target to obtain a tracking target, and assigning a tracking identifier to the tracking target; and counting the number of all tracking identifiers to obtain the people flow statistics result. The invention can improve the accuracy and computational efficiency of people flow statistics.

Description

People flow statistical method, equipment and system
Technical Field
The invention relates to the technical field of machine vision, and in particular to a people flow statistics method, device and system.
Background
With the continuous progress of society, video surveillance systems are applied ever more widely. The flow of people entering and leaving supermarkets, shopping malls, gymnasiums, airports, stations and similar venues is of great significance to the operators or managers of those venues, and counting the people flow allows the operation of public activity areas to be monitored and organized effectively in real time. In traditional video surveillance, people flow statistics are mainly obtained by manual counting by monitoring personnel. This approach is reliable when the monitoring period is short and the people flow is sparse; however, owing to the biological limitations of the human eye, the accuracy of the statistics drops sharply when the monitoring period is long and the people flow is dense, and manual counting also consumes a great deal of labor cost.
To address these problems, a related people flow statistics method uses multiple classifiers connected in parallel to perform head detection on the current image and determine each head in it; each determined head is then tracked to form a head target motion trajectory, and the people flow is counted along the direction of the head target motion trajectory.
Because the detection order of the various classifiers must be set for this parallel multi-classifier structure, and the classifiers are applied to the current image in that order, the choice of classifiers directly affects the accuracy of head counting. Moreover, training the parallel multi-class classifiers requires separately collecting and labeling positive and negative samples of multiple classes, and annotating head target boxes according to the class of each classifier and the specific scene. The complexity of head target recognition is therefore too high, which affects the computational efficiency of head counting.
Disclosure of Invention
The embodiment of the invention aims to provide a people flow statistics method, device and system that improve the accuracy and computational efficiency of people flow statistics. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a people flow statistics method, the method comprising:
acquiring consecutive frame images captured by an image acquisition device;
inputting the consecutive frame images into a trained fully convolutional neural network to generate a head confidence distribution map for each frame image;
determining, from the head confidence distribution map of each frame image, at least one head detection target in the frame image by a preset target determination method;
associating targets between the current frame and the previous frame according to the feature matching result and motion smoothness of each head detection target in each frame image to obtain a tracking target, and assigning a tracking identifier to the tracking target;
and counting the number of all tracking identifiers to obtain the people flow statistics result.
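The five steps recited above can be sketched as a pipeline skeleton. This is a minimal illustration, not the claimed implementation: `detect_heads` and `associate` are hypothetical stand-ins for the network-based detection step and the feature-matching/smoothness association step.

```python
def count_people(frames, detect_heads, associate):
    """Hypothetical pipeline skeleton for the five steps: detect heads in
    each frame, associate detections with the previous frame to obtain
    tracking identifiers, and count how many distinct identifiers were
    ever assigned."""
    seen_ids = set()        # every tracking identifier ever assigned
    prev_detections = []    # head detection targets of the previous frame
    for frame in frames:
        detections = detect_heads(frame)              # detection stand-in
        ids = associate(prev_detections, detections)  # association stand-in
        seen_ids.update(ids)
        prev_detections = detections
    return len(seen_ids)    # people flow = number of tracking identifiers
```

A target that appears in several frames keeps one identifier, so it contributes one count.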
In a second aspect, an embodiment of the present invention provides a people flow statistics device, the device comprising:
a first acquisition module, configured to acquire consecutive frame images captured by an image acquisition device;
a convolution module, configured to input the consecutive frame images into a trained fully convolutional neural network to generate a head confidence distribution map for each frame image;
a head detection target determination module, configured to determine, from the head confidence distribution map of each frame image, at least one head detection target in the frame image by a preset target determination method;
a tracking identifier assignment module, configured to associate targets between the current frame and the previous frame according to the feature matching result and motion smoothness of each head detection target, obtain a tracking target, and assign a tracking identifier to the tracking target;
and a counting module, configured to count the number of all tracking identifiers to obtain the people flow statistics result.
In a third aspect, an embodiment of the present invention provides a people flow statistics system, the system comprising:
an image acquisition device, configured to capture consecutive frame images;
a processor, configured to acquire the consecutive frame images captured by the image acquisition device; input the consecutive frame images into a trained fully convolutional neural network to generate a head confidence distribution map for each frame image; determine, from the head confidence distribution map of each frame image, at least one head detection target in the frame image by a preset target determination method; associate targets between the current frame and the previous frame according to the feature matching result and motion smoothness of each head detection target to obtain a tracking target, and assign a tracking identifier to the tracking target; and count the number of all tracking identifiers to obtain the people flow statistics result.
With the people flow statistics method, device and system of the embodiments of the invention, the captured consecutive video frame images are input into a trained fully convolutional neural network to generate a head confidence distribution map for each frame image; the head detection targets in each frame are determined from the map; tracking identifiers are assigned to the tracking targets obtained by associating targets between the current frame and the previous frame; and finally the number of all tracking identifiers is counted to obtain the people flow statistics result. The trained fully convolutional neural network can extract the essential features of the human head and thereby improve the accuracy of the statistics, and because a single network suffices to generate the head confidence distribution map from which the head detection targets are determined, the complexity of head target recognition is reduced and the computational efficiency of people flow statistics is improved. Compared with methods based on feature-point tracking, the embodiments of the invention only need to record the tracking identifiers, so head targets can be tracked stably and counting precision is improved; compared with methods based on human body segmentation and tracking, the recording of tracking identifiers is unaffected by occlusion, giving higher precision.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flow chart of a people flow rate statistical method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a full convolution neural network according to an embodiment of the present invention;
FIG. 3 is a head confidence distribution map according to an embodiment of the present invention;
FIG. 4 is another schematic flow chart of a people flow statistical method according to an embodiment of the present invention;
FIG. 5 is a ground-truth head confidence distribution map according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of another fully convolutional neural network according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a human head target of the embodiment of the invention;
FIG. 8 is a schematic view of a tracking area according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a people flow rate statistic device according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a people flow rate statistic device according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a people flow rate statistical system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
In order to improve accuracy and operational efficiency of people flow statistics, the embodiment of the invention provides a people flow statistics method, equipment and a system.
First, a people flow rate statistical method provided by the embodiment of the present invention is described below.
The execution subject of the people flow statistics method provided in the embodiment of the present invention may be a processor with a core processing chip, for example a DSP (Digital Signal Processor), an ARM (Advanced RISC Machine) processor, or an FPGA (Field-Programmable Gate Array); the execution subject may also be an image capture device that includes such a processor. The people flow statistics method provided by the embodiment of the invention may be implemented by software, hardware circuits and/or logic circuits arranged in the execution subject.
As shown in fig. 1, a people flow rate statistical method provided in the embodiment of the present invention may include the following steps:
s101, acquiring continuous frame images acquired by image acquisition equipment.
The image capture device may be a video camera with a video shooting function, or a camera with a continuous shooting function; of course, the image capture device is not limited thereto. When the image capture device is a video camera, it shoots a video over a period of time, and the video consists of a number of consecutive frame images; when the image capture device is a still camera, it can shoot continuously, each shot yielding one image, and the series of images, ordered by shooting time, can be used as the consecutive frame images. If the image capture device acquired only one image, the people flow in that image could be counted by head recognition alone; however, a single image may suffer from occlusion, blur and similar conditions, so the counted people flow would deviate from the actual situation with a certain error.
And S102, inputting the continuous frame images into the trained full convolution neural network to generate a human head confidence distribution graph of each frame image in the continuous frame images.
The fully convolutional neural network has the capability of automatically extracting the essential features of the human head, and its network parameters are obtained through sample training. The trained fully convolutional neural network can therefore quickly recognize the head targets of new samples with varied characteristics, such as dark hair, light hair, or wearing a hat or not, so that real head targets are obtained to a greater extent and the accuracy of people flow statistics is improved. As shown in fig. 2, in the embodiment of the present invention the fully convolutional neural network is formed by arranging a plurality of convolution layers and a plurality of down-sampling layers alternately; the acquired consecutive frame images are input into the network, which performs feature extraction on the head features of each frame image to obtain a head confidence distribution map for each frame image, as shown in fig. 3, where the bright spots in the figure indicate head confidence. The head confidence distribution map can be understood as a distribution map of the probability that each detected target is a head target.
The parameter in the head confidence distribution map may be a specific probability value that the target in each identified region is a head target, where an identified region is a region related to the position and size of a target, and its area is generally greater than or equal to the actual size of the target. Alternatively, pixel values may be used to represent the probability: the larger the value of each pixel in a region, the greater the probability that the target in that region is a head target. Of course, the specific parameters of the head confidence distribution map in the embodiment of the present invention are not limited thereto.
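As a rough illustration of the alternating convolution and down-sampling structure described above, the toy network below stacks one 'valid' convolution and one 2x2 max-pooling per stage and squashes the result to per-pixel confidences. The kernel contents, layer count and sigmoid squashing are assumptions made for this sketch; the patent does not specify them.

```python
import math

def conv2d(img, kernel):
    """'Valid' 2-D cross-correlation (what deep-learning convolution
    layers compute), standing in for one convolution layer."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]

def downsample2x(img):
    """2x2 max-pooling layer halving each spatial dimension."""
    return [[max(img[2 * i][2 * j], img[2 * i][2 * j + 1],
                 img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1])
             for j in range(len(img[0]) // 2)]
            for i in range(len(img) // 2)]

def toy_fcn(frame, kernels):
    """Alternate convolution and down-sampling layers, then squash the
    final feature map to [0, 1] per-pixel head confidences."""
    x = frame
    for k in kernels:
        x = downsample2x(conv2d(x, k))
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in x]
```

Because the network is fully convolutional, the output stays a spatial map, which is what allows it to be read as a confidence distribution map rather than a single classification score.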
S103, aiming at the head confidence degree distribution graph of each frame of image, a preset target determining method is adopted to determine at least one head detection target in each frame of image.
Because the head confidence distribution map of each frame image obtained by the fully convolutional neural network contains, for every identified region, the probability that its target is a head target, and the identified targets may include non-head targets, a preset target determination method is applied to the head confidence distribution map of each frame image to determine the accurate head detection targets of that frame. The preset target determination method may set a threshold and determine the region corresponding to a probability as a head detection target when that probability in the map is greater than the threshold; or, working from pixel values, determine a region as a head detection target when every pixel value in the region is greater than a preset pixel value; or determine a region as a head detection target when the confidence of every pixel in the region is greater than a preset confidence threshold, or when the average confidence of the pixels in the region is greater than a preset confidence threshold. Of course, the specific method for determining head detection targets is not limited to these; for ease of implementation, a thresholding method may be adopted.
Optionally, the step of determining at least one human head detection target in each frame of image by using a preset target determination method for the human head confidence level distribution map of each frame of image may include:
the method comprises the following steps of firstly, aiming at a head confidence distribution diagram of each frame of image, determining the position of the central point of at least one detection target by adopting a non-maximum suppression method.
In the human head confidence distribution diagram of each frame of image, the confidence maximum value point represents the position of the central point of each detection target, the non-zero points spatially gathered on the distribution diagram represent the area where the detection target is located, the human head confidence distribution diagram adopts non-maximum value inhibition, and the maximum value in the area is searched by inhibiting the elements which are not the maximum value, so that the position of the central point of each detection target can be obtained. The formation of the region is related to the confidence of each pixel point, and because two targets are close to each other and influence of factors such as background objects may exist, the region may have deviation with an actual detection target, but the maximum point of the confidence represents a detection target central point, and the human head is a circular target, so that after the position of the central point is determined, the human head can be determined as a detection target in a certain neighborhood of the central point, and the accuracy of human head detection can be improved by determining the position of the central point.
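A minimal sketch of this non-maximum suppression step, reading it as: keep a pixel as a detection target center point only if no neighbor within an assumed radius has an equal or greater confidence (zero background pixels are skipped). The neighborhood radius is a hypothetical parameter.

```python
def nms_centers(conf, radius=1):
    """Find center points: non-zero pixels that are strict local maxima
    of the 2-D confidence map within a (2*radius+1)-square neighborhood.
    Non-maximum values inside each neighborhood are suppressed, leaving
    one center per confidence peak."""
    h, w = len(conf), len(conf[0])
    centers = []
    for i in range(h):
        for j in range(w):
            v = conf[i][j]
            if v == 0:
                continue  # zero background can never be a center point
            is_max = True
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if (di or dj) and 0 <= ni < h and 0 <= nj < w \
                            and conf[ni][nj] >= v:
                        is_max = False  # a neighbor ties or beats it
            if is_max:
                centers.append((i, j))
    return centers
```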
In a second step, the confidences of all pixels in the neighborhood of the center point of each detection target are obtained.
The neighborhood of the center point of a detection target can be taken as a detection target, and the size of the neighborhood can be determined from a statistical analysis of head radii, i.e. an average of actual head radii or a value obeying a preset distribution. The higher the confidences of all the pixels in the neighborhood of the center point, the higher the probability that the detection target is a head detection target; therefore, in this embodiment, the confidences of all pixels in the neighborhood need to be obtained.
In a third step, a detection target in which the confidence of every pixel is greater than a preset confidence threshold is determined to be a head detection target of the frame image.
Since the greater the confidences of all the pixels in the neighborhood of the detection target's center point, the greater the probability that the detection target is a head detection target, a confidence threshold is preset in this embodiment, and when the confidences of all the pixels are greater than the preset confidence threshold, the detection target can be determined to be a head detection target of the frame image. The preset confidence threshold may be set according to experience, requirements, or repeated test results; for example, it may be set to 85%, so that a detection target is confirmed as a head detection target if the confidence of every pixel in the neighborhood of its center point is greater than 85%. It may equally be set to 91% or another value, which is not limited here.
Compared with other methods for determining head detection targets, requiring the confidence of every pixel to be greater than the preset confidence threshold further guarantees the accuracy of the head detection targets.
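The third-step check can be sketched as follows; the 0.85 default mirrors the 85% example, and the neighborhood radius stands in for the hypothetical head-radius statistic of the second step.

```python
def is_head_target(conf, center, radius, threshold=0.85):
    """Confirm a candidate center point as a head detection target:
    every pixel in the center's neighborhood must have a confidence
    greater than the preset threshold (0.85 here mirrors the 85%
    example; radius would come from a statistical average head radius)."""
    ci, cj = center
    h, w = len(conf), len(conf[0])
    for i in range(max(0, ci - radius), min(h, ci + radius + 1)):
        for j in range(max(0, cj - radius), min(w, cj + radius + 1)):
            if conf[i][j] <= threshold:
                return False  # one low-confidence pixel rejects the target
    return True
```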
S104, targets are associated between the current frame and the previous frame according to the feature matching result and motion smoothness of each head detection target in each frame image to obtain a tracking target, and a tracking identifier is assigned to the tracking target.
After the head detection targets in each frame image are determined, feature matching of the head detection targets is first performed on each frame image: the features of each head detection target in each frame image are obtained through feature matching, so that head detection targets with the same features in different frame images can be determined and tracked. Meanwhile, a smoothness analysis of the head detection targets is performed on each frame image: the motion smoothness of each head detection target is obtained, where motion smoothness refers to the motion trend of a head detection target with the same features across the consecutive frame images; if the motion trend contains large jumps, the head detection target may be a false detection. Then, according to the feature matching result and motion smoothness of each head detection target, target tracking is achieved for each head detection target by associating the targets of the current frame with those of the previous frame, the tracking targets among the head detection targets are determined, and tracking identifiers are assigned to the tracking targets so that they can be tracked by their identifiers.
All head targets may be determined as tracking targets, or only those head detection targets with high motion smoothness; screening once by motion smoothness guarantees the accuracy of target tracking and improves the efficiency of people flow statistics. For each head detection target, the step of associating the targets of the current frame with those of the previous frame may or may not be performed synchronously; this does not noticeably affect the target tracking result and is not specifically limited here.
Optionally, the step of associating targets between the current frame and the previous frame according to the feature matching result and motion smoothness of each head detection target to obtain a tracking target and assign a tracking identifier to it may comprise:
first, performing feature matching and smoothness analysis on each head detection target of each frame image in the consecutive video frame images to obtain the feature matching result and motion smoothness of the head detection target.
After the head detection targets in each frame image are determined, feature matching can be performed on them, i.e. the features of each head detection target are determined and head detection targets with the same features are matched across the frame images; a smoothness analysis of the head detection targets is then performed, i.e. the motion smoothness of each head detection target is determined, where motion smoothness refers to the motion trend of a head detection target with the same features across the consecutive frame images, large jumps in which suggest a false detection. Therefore, in order to track the different head detection targets in the consecutive frame images separately and improve tracking accuracy, in this embodiment the feature matching result and motion smoothness of each head detection target need to be obtained through feature matching and smoothness analysis.
Second, target association is performed between the current frame and the previous frame according to the feature matching result and the motion smoothness, and a head detection target is determined to be a tracking target when its feature matching degree is higher than a preset matching degree threshold and its motion smoothness is higher than a preset smoothness threshold.
After feature matching and smoothness analysis are performed on the head detection targets, based on the obtained feature matching results and motion smoothness, head detection targets in successive frame images with a high feature matching degree and high motion smoothness can be determined to be the same head target, and that head detection target can be determined to be a tracking target. For example, the preset matching degree threshold may be set to 88% or 83%. The motion smoothness is the inner product of the head detection target's velocity vector and motion direction vector and represents the motion consistency of the head detection target; the larger the preset smoothness threshold, the stricter the requirement on the motion consistency of the head detection target.
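A sketch of the association test, with motion smoothness taken as the inner product of the velocity vector and the motion direction vector as described above; the threshold values are illustrative only.

```python
def is_same_target(match_degree, velocity, direction,
                   match_thresh=0.88, smooth_thresh=0.0):
    """Associate a current-frame detection with a previous-frame one.

    `match_degree` is the feature matching score in [0, 1]. Motion
    smoothness is the inner product of the velocity vector and the
    historical motion-direction vector, so a positive value means the
    target keeps moving the same way; a sudden reversal gives a
    negative inner product and the association is rejected."""
    smoothness = sum(v * d for v, d in zip(velocity, direction))
    return match_degree > match_thresh and smoothness > smooth_thresh
```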
Third, a tracking identifier is assigned to the tracking target.
After the tracking target is determined, a tracking identifier can be assigned to it, so that the targets with different tracking identifiers are tracked accurately and separately. It can be understood that the above steps are performed for every head detection target, determining the different tracking targets in the consecutive frame images and thereby tracking each of them.
And S105, counting the number of all tracking identifications to obtain a people flow counting result.
Because different tracking identifiers represent different tracking targets, and each tracking target is an accurately detected human head, the people flow in the currently collected consecutive frame images can be determined by counting the number of all the tracking identifiers.
Optionally, the step of counting the number of all tracking identifiers to obtain a people flow statistic result may include:
firstly, determining the motion track direction of the people stream according to the continuous frame images.
In the consecutive frame images, the position of each tracking target may change from frame to frame, and this change traces out a motion track for the tracking target across the frames. In a fixed scene, the motion track directions of different tracking targets are substantially consistent; for example, in a street scene, targets generally move along the street direction. Therefore, by analyzing the positions of the tracking targets in the consecutive frame images, the motion track direction of the people flow can be determined.
And secondly, determining a detection line which is perpendicular to the motion track direction of the people flow in the consecutive frame images.
In a fixed scene, the targets are generally in motion. In order to reduce the tracking error caused when two targets cross paths during tracking, a detection line can be set in the consecutive frame images; its direction is generally perpendicular to the direction of the people-flow motion track, and crossing the detection line can be used as the condition for the people flow statistics.
And thirdly, recording the tracking identification corresponding to any tracking target when the tracking target passes through the detection line.
And fourthly, counting the number of the tracking identifications corresponding to all the tracking targets passing through the detection line to obtain a people flow counting result.
When a tracking target passes through the detection line, the tracking identification corresponding to the tracking target can be recorded, and the total number of tracking targets passing through the detection line indicates the number of people flowing in the continuous frame images.
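The last two counting steps can be sketched as follows. For simplicity this sketch assumes a vertical people-flow direction, so the perpendicular detection line is the horizontal line y = line_y; the trajectory layout is hypothetical.

```python
def count_crossings(trajectories, line_y):
    """Count the distinct tracking identifiers whose trajectory crosses the
    horizontal detection line y = line_y.  `trajectories` maps a tracking
    identifier to the ordered list of (x, y) positions of that target."""
    crossed = set()
    for track_id, points in trajectories.items():
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            # a sign change of (y - line_y) between consecutive frames
            # means the target stepped over the detection line
            if (y0 - line_y) * (y1 - line_y) < 0:
                crossed.add(track_id)
                break
    return len(crossed)
```

Recording the set of identifiers (rather than incrementing a counter) means a target that lingers near the line is still counted only once.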
Optionally, before the step of obtaining and performing target association between the current frame and the previous frame of the current frame according to the feature matching result and the motion smoothness of any human head detection target in each frame of image to obtain a tracking target, and allocating a tracking identifier to the tracking target, the method may further include:
and according to the preset statistical conditions of the pedestrian flow, at least one tracking area is defined for the continuous frame images.
The preset people flow statistical conditions can be the application requirements of the people flow statistics, such as counting the people flow in and out of a gate, the people flow in a waiting hall, or the people flow in a ticket hall. For example, the tracking area divided according to the preset people flow statistical conditions may be within 5 meters in front of and behind a gate, within 2 meters of the entrance and ticket gates of a waiting hall, or within 10 meters in front of each window in a ticket hall. In scenes with large and complex pedestrian traffic, after the detection line is set, some targets may never pass through it, making the error of the people flow statistical result too large. Therefore, in order to reduce this error, one or more tracking areas can be defined according to the different preset people flow statistical conditions, and the people flow is counted within those tracking areas, so that each tracking target in the consecutive frame images passes through a corresponding detection line. The tracking areas can be defined according to engineering experience; a specific tracking area with a simple background can be selected for detection, tracking, and statistics, which eliminates the interference of a complex background and further improves the accuracy of the people flow statistics. In addition, dividing tracking areas reduces the amount of matching and tracking computation and improves the real-time performance of the method.
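A minimal sketch of dividing detections among tracking areas. The axis-aligned rectangular region representation is an assumption for illustration; the patent does not prescribe a region shape.

```python
def in_region(point, region):
    """region = (x_min, y_min, x_max, y_max); True if the (x, y) point
    falls inside the axis-aligned tracking area."""
    x, y = point
    x0, y0, x1, y1 = region
    return x0 <= x <= x1 and y0 <= y <= y1

def split_by_region(detections, regions):
    """Assign each detection to the tracking areas that contain it, so that
    detection, tracking, and counting can run independently per area."""
    return {i: [d for d in detections if in_region(d, r)]
            for i, r in enumerate(regions)}
```

Detections outside every tracking area are simply dropped, which is what removes complex-background interference and shrinks the matching workload.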
Optionally, the step of obtaining and performing target association between the current frame and the previous frame of the current frame according to the feature matching result and the motion smoothness of any human head detection target in each frame of image to obtain a tracking target, and allocating a tracking identifier to the tracking target may include:
in any tracking area, acquiring and carrying out target association on the current frame and the previous frame of the current frame according to the feature matching result and the motion smoothness of any human head detection target in each frame of image to obtain a tracking target, and respectively allocating tracking identification to the tracking target.
After the tracking areas are defined for the consecutive frame images, the target association between the current frame and the previous frame can be carried out for each human head detection target within each tracking area: feature matching is performed, that is, the features of each human head detection target are determined and targets with the same features are matched across frames; and smoothness analysis is performed, that is, the motion smoothness of each human head detection target is determined, where motion smoothness describes the motion trend of targets with the same features in consecutive frame images, and a target whose motion trend shows large jumps may be a false detection. Through the feature matching and smoothness analysis, target association between the current frame and the previous frame is performed for the human head detection targets in each frame; a human head detection target with high matching degree and high smoothness is determined to be a tracking target, that is, it can be regarded as a true human head target, and tracking identifiers are allocated to the tracking targets so that they can be tracked within the tracking area according to their identifiers.
Optionally, the step of counting the number of all tracking identifiers to obtain a people flow statistic result may further include:
in any tracking area, determining a detection line perpendicular to the direction of the people-flow motion track;
then, counting the number of the tracking identifications corresponding to all the tracking targets passing through the detection line in the tracking area to obtain a people flow counting result.
In order to reduce the tracking error caused by two targets crossing during tracking, a detection line can be set in each tracking area of the consecutive frame images, giving one or more detection lines, the same number as the defined tracking areas. The direction of each detection line is generally perpendicular to the direction of the people-flow motion track in the tracking area where it is located, and crossing a detection line is used as the condition for the people flow statistics. When a tracking target in any tracking area passes through the detection line, the tracking identifier corresponding to that target is recorded, and the total number of tracking targets passing through the detection line indicates the people flow in that tracking area. The total people flow in the consecutive frame images is determined by adding up the statistical results of all the tracking areas.
By applying the embodiment, the acquired continuous frame images of the video are input into the trained full convolution neural network to generate the head confidence distribution map corresponding to each frame image, the head detection target in each frame image is determined according to the head confidence distribution map, the tracking identification is distributed to the associated tracking target through the association of the current frame and the target of the previous frame, and finally the number of all the tracking identifications is counted, so that the people flow counting result is obtained. The full convolution neural network obtained through training can be used for extracting essential features of the human head and improving the accuracy of the human flow statistics, and the human head detection target can be determined by generating the human head confidence coefficient distribution map only through one full convolution neural network, so that the complexity of human head target recognition is reduced, and the computational efficiency of the human flow statistics is improved. Compared with a method based on feature point tracking, the embodiment can stably track the human head target only by recording the tracking identification, thereby improving the counting precision; compared with a method based on human body segmentation and tracking, the method has the advantages that the tracking identification can be recorded, the recording of the tracking identification is not influenced by shielding, and the precision is higher.
Based on the embodiment shown in fig. 1, as shown in fig. 4, another people flow rate statistical method provided in the embodiment of the present invention may further include, before S102, the following steps:
S401, acquiring a preset training set sample image and the central position of each human head target in the preset training set sample image.
Before the full convolution neural network can be used, it needs to be constructed and trained. Since the network parameters are obtained through training, the training process can be understood as learning human head targets with various preset appearances, such as dark-colored hair, light-colored hair, whether a hat is worn, and the like. Preset training set sample images therefore need to be constructed covering these different human head features. Because the confidence of a human head target generally obeys a circular Gaussian distribution centered on the head, the center position of each human head target needs to be obtained and calibrated.
S402, generating a human head confidence coefficient distribution true value image of the preset training set sample image according to a preset distribution law and the central position of each human head target in the preset training set sample image.
The preset distribution law is the probability distribution obeyed by the confidence of a human head target, which generally obeys a circular Gaussian distribution. As shown in fig. 5, the preset training set sample image shown in the left diagram is convolved with a Gaussian kernel to obtain the human head confidence distribution truth map shown in the right diagram; each bright point in the truth map corresponds to a human head target in the sample image. Assuming that the calibrated center position of each human head target in the image is P_h and that the confidence of a human head target obeys the circular Gaussian distribution N_h, the human head confidence distribution truth map is obtained according to formulas (1) and (2).
\[ D(p) = \sum_{P_h} N_h(p) \qquad (1) \]

\[ N_h(p) = \exp\!\left(-\frac{\|p - P_h\|^2}{2\sigma_h^2}\right) \qquad (2) \]
Wherein p represents any pixel position coordinate on the human head confidence distribution truth map; D(p) represents the human head confidence at position p on the truth map; σ_h represents the variance of the circular Gaussian distribution N_h; h denotes a human head; P_h represents the center position of each human head target; and N_h represents the circular Gaussian distribution obeyed by the confidence of a human head target. Formula (2) indicates that the calibrated center position of a human head target has the highest confidence, 1.0, and that the confidence decreases toward 0 at the edges.
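A sketch of generating the truth map from formulas (1) and (2). The brute-force per-pixel loop and the choice to sum (rather than max-combine) overlapping Gaussians are illustrative assumptions.

```python
import math

def head_confidence_truth(shape, centers, sigma):
    """Build the head-confidence ground-truth map of formulas (1)-(2):
    each calibrated head center (cx, cy) contributes a circular Gaussian
    whose peak value is 1.0 at the center and decays toward 0 at the edges."""
    height, width = shape
    truth = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for (cx, cy) in centers:
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                truth[y][x] += math.exp(-d2 / (2 * sigma ** 2))
    return truth
```

With a single calibrated head at (2, 2) on a 5×5 map, the center pixel carries confidence 1.0 and the corners are close to 0.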
And S403, inputting the preset training set sample image into the initial full-convolution neural network to obtain a human head confidence coefficient distribution map of the preset training set sample image.
The network parameters of the initial full convolution neural network are preset values. The human head confidence distribution map of the preset training set sample image obtained through the initial full convolution neural network is compared with the truth map; through continuous training, learning, and network-parameter updating, the output map is made to approach the truth map, and when it is close enough, the full convolution neural network is determined to be the trained full convolution neural network that can perform the people flow statistics.
Optionally, the fully convolutional neural network may further include: convolutional layers, downsampling layers, and deconvolution layers.
A full convolution neural network usually includes at least one convolution layer and at least one down-sampling layer; the deconvolution layer is optional. In order to make the resolution of the obtained feature map the same as that of the input preset training set sample image, so that no extra image-scale conversion step is needed and the human head confidence can be computed directly, a deconvolution layer can be set after the last convolution layer.
Optionally, the step of inputting the preset training set sample image into the initial fully-convolutional neural network to obtain the human head confidence level distribution map of the preset training set sample image may include:
firstly, inputting a preset training set sample image into an initial full convolution neural network, and extracting the characteristics of the preset training set sample image through a network structure in which convolution layers and down-sampling layers are arranged alternately.
And secondly, upsampling the features through the deconvolution layer until the resolution is the same as that of the preset training set sample image, and obtaining an upsampled result.
Inputting a preset training set sample image into an initial full-convolution neural network, as shown in fig. 6, sequentially extracting features from a lower layer to a higher layer by using a series of convolutional layers and downsampling layers, wherein the series of convolutional layers and downsampling layers are arranged at intervals. And then connecting the deconvolution layer to up-sample the features to the size of the input preset training set sample image.
And thirdly, calculating the result by using the 1 multiplied by 1 convolutional layer to obtain a human head confidence distribution graph with the same resolution as the preset training set sample image.
In order to ensure that the human head confidence distribution map has the same resolution as the input preset training set sample image, the up-sampled result can be operated on by one more convolution layer. The convolution kernel size of this layer could be 1×1, 3×3, 5×5, and so on; but in order to accurately extract the feature of a single pixel, a 1×1 kernel can be chosen. Through this convolution layer, the human head confidence distribution map is obtained, and each pixel on it represents the human head confidence at the corresponding image position.
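The resolution bookkeeping described above can be checked with the standard convolution and transposed-convolution output-size formulas. The layer configuration below (3×3 'same' convolutions, 2× down-sampling, a single 4× deconvolution) is hypothetical, chosen only to show how the deconvolution and 1×1 layers restore the input resolution.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output size of a convolution or down-sampling (pooling) layer."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel, stride=1, pad=0):
    """Output size of a deconvolution (transposed convolution) layer,
    which inverts the convolution size formula."""
    return (size - 1) * stride - 2 * pad + kernel

size = 64                                    # width of the input sample image
size = conv_out(size, kernel=3, pad=1)       # convolution layer ('same' padding): 64
size = conv_out(size, kernel=2, stride=2)    # down-sampling layer: 32
size = conv_out(size, kernel=3, pad=1)       # convolution layer: 32
size = conv_out(size, kernel=2, stride=2)    # down-sampling layer: 16
size = deconv_out(size, kernel=4, stride=4)  # deconvolution restores resolution: 64
size = conv_out(size, kernel=1)              # 1x1 convolution keeps resolution: 64
```

The 1×1 layer changes only the channel dimension, so the confidence map it emits is pixel-aligned with the input image.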
S404, calculating the average error between the human head confidence coefficient distribution graph of the preset training set sample image and the human head confidence coefficient distribution true value graph of the preset training set sample image.
S405, when the average error is larger than a preset error threshold, updating the network parameters according to the average error and a preset gradient operation strategy to obtain an updated full convolution neural network; calculating the average error between the human head confidence distribution map of the preset training set sample image obtained by the updated full convolution neural network and the truth map of the preset training set sample image; and repeating until the average error is less than or equal to the preset error threshold, at which point the corresponding full convolution neural network is determined to be the trained full convolution neural network.
The full convolution neural network is trained with the classical back propagation algorithm. The preset gradient operation strategy can be the common gradient descent method or the stochastic gradient descent method: gradient descent uses the negative gradient direction as the search direction, with the step length becoming smaller and the forward speed slower the closer it gets to the target value, while stochastic gradient descent uses only one sample per update, so one iteration is much faster than a full gradient-descent iteration. Therefore, in order to improve operation efficiency, this embodiment may adopt the stochastic gradient descent method to update the network parameters. During training, the average error between the human head confidence distribution map output by the full convolution neural network for a preset training set sample image and the corresponding truth map is calculated, as shown in formula (3); the network parameters of the full convolution neural network, including the convolution kernel parameters and bias parameters of the convolution layers, are updated from this error, and the process is iterated until the average error no longer decreases.
\[ L_D(\Theta) = \frac{1}{2N} \sum_{i=1}^{N} \left\| F_d(X_i;\Theta) - D_i \right\|^2 \qquad (3) \]
Wherein L isD(theta) represents the average error between the human head confidence distribution graph output by the network and the human head confidence distribution true value graph; d represents a human head confidence coefficient distribution true value graph obtained by the formula (1); theta represents a network parameter of the full convolution neural network; n represents the number of sample images of a preset training set; fd(Xi(ii) a Theta) represents a human head confidence distribution diagram output by forward calculation of the full convolution neural network obtained by training; xiAn input image, numbered i, representing input to a network; i represents an image number; diRepresents XiAnd (4) a corresponding human head confidence coefficient distribution true value graph.
By applying this embodiment, the same benefits as the embodiment shown in fig. 1 are obtained. In addition, because the preset training set sample images are set for human head targets with different features, the full convolution neural network obtained through training and iteration on them has strong generalization capability, avoids a complex classifier cascade, and has a simpler structure.
The people flow rate statistical method provided by the embodiment of the invention is described below by combining specific application examples.
For an intersection scene, video images are collected by a camera, and one frame is input into the trained full convolution neural network to obtain the head confidence distribution map of that frame. On this map, non-maximum suppression is used to determine the center-point position of each detection target; when the confidence of the pixels in the neighborhood of a center point is greater than the preset confidence threshold, a human head detection target is determined, as shown by the black boxes in the schematic diagram of human head detection targets in fig. 7.
Then, a tracking area bounded by line A, line B, and line C as shown in fig. 8 is defined for the frame image. As can be seen from fig. 8, a detection line D is set in the tracking area, perpendicular to the movement direction of the people flow; when a tracking target passes through the detection line, its tracking identifier is recorded. In this tracking area, eight people have currently passed through the detection line, so the counted people flow is 8.
Compared with the related art, this scheme inputs the collected consecutive video frames into the trained full convolution neural network to generate a head confidence distribution map for each frame, determines the human head detection targets in each frame from that map, assigns tracking identifiers to the tracking targets associated between the current frame and the previous frame, and finally counts all the tracking identifiers to obtain the people flow statistical result, with the accuracy, low complexity, and occlusion robustness described in the foregoing embodiments.
Corresponding to the above embodiments, an embodiment of the present invention provides a people flow rate statistics device, and as shown in fig. 9, the people flow rate statistics device may include:
a first obtaining module 910, configured to obtain consecutive frame images collected by an image collecting device;
a convolution module 920, configured to input the continuous frame image into a trained full convolution neural network, and generate a human head confidence distribution map of each frame image in the continuous frame image;
a human head detection target determining module 930, configured to determine, by using a preset target determining method, at least one human head detection target in each frame of image according to the human head confidence level distribution map of each frame of image;
a tracking identifier allocating module 940, configured to acquire and perform target association between a current frame and a previous frame of the current frame according to a feature matching result and motion smoothness of any human head detection target in each frame of image, to obtain a tracking target, and allocate a tracking identifier to the tracking target;
and the counting module 950 is configured to count the number of all tracking identifiers to obtain a people flow counting result.
By applying this embodiment, the same benefits as the corresponding method embodiment are obtained: the trained full convolution neural network extracts the essential features of the human head and improves the accuracy of the people flow statistics; a single network generating the head confidence distribution map reduces the complexity of human head target recognition and improves computational efficiency; and because only the tracking identifiers need to be recorded, tracking is stable and unaffected by occlusion, giving higher counting precision than methods based on feature-point tracking or on human body segmentation and tracking.
Optionally, the human head detection target determining module 930 may be specifically configured to:
aiming at the human head confidence degree distribution graph of each frame of image, determining the position of the central point of at least one detection target by adopting a non-maximum suppression method;
obtaining the confidence of all pixel points in the neighborhood of the central point of each detection target;
and determining the detection target with the confidence coefficient of each pixel point being larger than the preset confidence coefficient threshold value as the human head detection target of the frame image.
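The module's three steps can be sketched as a simple peak-picking non-maximum suppression over a 2-D confidence map. The neighbourhood radius and the tiny test map are illustrative choices.

```python
def nms_head_centers(conf, threshold, radius=1):
    """Non-maximum suppression on a 2-D confidence map: a pixel is taken as
    a human head detection target when it is the maximum of its neighbourhood
    within `radius` AND every pixel in that neighbourhood exceeds the preset
    confidence threshold.  Returns (x, y) center positions."""
    h, w = len(conf), len(conf[0])
    centers = []
    for y in range(h):
        for x in range(w):
            neighbourhood = [conf[j][i]
                             for j in range(max(0, y - radius), min(h, y + radius + 1))
                             for i in range(max(0, x - radius), min(w, x + radius + 1))]
            if conf[y][x] == max(neighbourhood) and min(neighbourhood) > threshold:
                centers.append((x, y))
    return centers
```

Requiring the whole neighbourhood, not just the peak, to clear the threshold filters out isolated noisy pixels that happen to be local maxima.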
Optionally, the tracking identifier allocating module 940 may be specifically configured to:
performing feature matching and smoothness analysis on any human head detection target of each frame of image in the continuous frame images of the video to obtain a feature matching result and motion smoothness of the human head detection target;
performing target association on the current frame and the previous frame of the current frame according to the feature matching result and the motion smoothness, and determining the human head detection target as a tracking target when the feature matching degree is higher than a preset matching degree threshold value and the motion smoothness is higher than a preset smoothness threshold value;
and allocating a tracking identifier to the tracking target.
Optionally, the statistical module 950 may be specifically configured to:
determining the direction of the people stream motion track according to the continuous frame images;
determining a detection line which is perpendicular to the direction of the people-flow motion track in the consecutive frame images;
when any tracking target passes through the detection line, recording a tracking identifier corresponding to the tracking target;
and counting the number of the tracking identifications corresponding to all the tracking targets passing through the detection line to obtain a people flow counting result.
Optionally, the apparatus may further include:
the defining module is used for defining at least one tracking area for the continuous frame images according to a preset pedestrian flow statistical condition;
the tracking identifier allocating module 940 may be further configured to:
in any tracking area, acquiring and carrying out target association on a current frame and a previous frame of the current frame according to a feature matching result and motion smoothness of any human head detection target in each frame of image to obtain a tracking target, and respectively allocating tracking identifications to the tracking target;
the statistical module 950 may be specifically configured to:
in any tracking area, determining a detection line perpendicular to the direction of the people-flow motion track;
and counting the number of tracking identifications corresponding to all tracking targets passing through the detection line in the tracking area to obtain a people flow counting result.
It should be noted that the people flow rate statistical apparatus according to the embodiment of the present invention is an apparatus applying the people flow rate statistical method, and all embodiments of the people flow rate statistical method are applicable to the apparatus and can achieve the same or similar beneficial effects.
Further, on the basis of the first obtaining module 910, the convolution module 920, the human head detection target determining module 930, the tracking identifier allocating module 940 and the statistic module 950, as shown in fig. 10, the human traffic statistic device provided in the embodiment of the present invention may further include:
a second obtaining module 1010, configured to obtain a preset training set sample image and a center position of each human head target in the preset training set sample image;
a generating module 1020, configured to generate a ground-truth human head confidence distribution map of the preset training set sample image according to a preset distribution and the center position of each human head target in the preset training set sample image;
an extracting module 1030, configured to input the preset training set sample image into an initial full-convolution neural network, so as to obtain a human head confidence distribution map of the preset training set sample image, where a network parameter of the initial full-convolution neural network is a preset value;
a calculating module 1040, configured to calculate an average error between the human head confidence distribution map of the preset training set sample image and the ground-truth human head confidence distribution map of the preset training set sample image;
a cycle module 1050, configured to, when the average error is greater than a preset error threshold, update the network parameters according to the average error and a preset gradient operation policy to obtain an updated full convolution neural network, and recalculate the average error between the human head confidence distribution map produced by the updated network and the ground-truth map; this repeats until the average error is less than or equal to the preset error threshold, at which point the corresponding full convolution neural network is determined as the trained full convolution neural network.
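As a non-limiting illustration, the iterative training criterion implemented by the calculating module 1040 and the cycle module 1050 can be sketched in Python. Here the full convolution neural network is replaced by a trivial one-parameter stand-in (`forward`), and the "preset gradient operation policy" — which the text does not specify — is assumed to be a gradient step on a mean-absolute-error criterion; all names are illustrative:

```python
import numpy as np

def mean_error(pred, truth):
    # Average error between the predicted confidence map and the ground truth
    return float(np.mean(np.abs(pred - truth)))

def train(forward, param, truth, x, lr=0.1, err_threshold=1e-3, max_iters=1000):
    """Predict, measure the average error, update the parameter by one
    gradient step, and stop once the error falls to the preset threshold."""
    err = mean_error(forward(param, x), truth)
    for _ in range(max_iters):
        pred = forward(param, x)
        err = mean_error(pred, truth)
        if err <= err_threshold:
            break
        # Gradient of mean |pred - truth| w.r.t. param, for forward = param * x
        grad = np.mean(np.sign(pred - truth) * x)
        param -= lr * grad
    return param, err

x = np.ones((4, 4))
truth = 0.5 * x                          # stand-in ground-truth confidence map
param, err = train(lambda w, xx: w * xx, 0.0, truth, x)
```

In the actual embodiment, `forward` would be the full convolution neural network and `param` its full set of network parameters; the loop structure is the same.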
By applying this embodiment, the acquired continuous frame images of a video are input into the trained full convolution neural network to generate a human head confidence distribution map for each frame image, the human head detection targets in each frame image are determined from that distribution map, tracking identifiers are allocated to associated tracking targets by associating targets between the current frame and the previous frame, and finally the number of all tracking identifiers is counted to obtain the people flow statistical result. The trained full convolution neural network can extract the essential features of the human head, which improves the accuracy of the people flow statistics, and the human head detection targets can be determined from the confidence distribution map generated by a single full convolution neural network, which reduces the complexity of human head target recognition and improves the computational efficiency of the statistics. Compared with methods based on feature-point tracking, this embodiment can stably track a human head target merely by recording its tracking identifier, improving counting precision; compared with methods based on human body segmentation and tracking, the recording of tracking identifiers is not affected by occlusion, so the precision is higher. During training of the full convolution neural network, the preset training set sample images are prepared for human head targets with different characteristics, so the network obtained by training and iterating over these samples has strong generalization capability, avoids a complex cascade of classifiers, and has a simpler structure.
Optionally, the full convolution neural network further includes: a convolutional layer, a down-sampling layer and a deconvolution layer;
the extracting module 1030 may be specifically configured to:
inputting the preset training set sample image into the initial full convolution neural network, and extracting features of the sample image through a network structure in which convolutional layers and down-sampling layers alternate;
up-sampling the features through the deconvolution layer until their resolution equals that of the preset training set sample image, to obtain an up-sampling result;
and processing the up-sampling result with a 1 × 1 convolutional layer to obtain a human head confidence distribution map with the same resolution as the preset training set sample image.
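The resolution bookkeeping of these three steps can be illustrated with a minimal NumPy sketch: max pooling stands in for the down-sampling layers, nearest-neighbour repetition stands in for the deconvolution (transposed-convolution) layer, and the 1 × 1 convolution is a per-pixel linear map. The specific operations are assumptions for illustration, not the network described by the text:

```python
import numpy as np

def downsample2x(x):
    # 2x2 max pooling: halves the resolution, as a down-sampling layer does
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def deconv_upsample(x, factor):
    # Nearest-neighbour stand-in for a deconvolution (transposed-conv) layer
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def conv1x1(x, weight, bias=0.0):
    # A 1x1 convolution is a per-pixel linear map; the resolution is unchanged
    return weight * x + bias

img = np.random.rand(64, 64)
feat = downsample2x(downsample2x(img))   # 64x64 -> 16x16 feature map
up = deconv_upsample(feat, 4)            # deconvolution: back to 64x64
conf = conv1x1(up, weight=1.0)           # head-confidence map, same 64x64 size
```

The point of the sketch is that after two 2× down-samplings, a 4× up-sampling restores exactly the input resolution, so the output confidence map aligns pixel-for-pixel with the sample image.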
It should be noted that the people flow statistics apparatus according to the embodiment of the present invention is an apparatus applying the above people flow statistics method, so all embodiments of the people flow statistics method are applicable to the apparatus and achieve the same or similar beneficial effects.
Corresponding to the above embodiments, an embodiment of the present invention provides a people flow rate statistical system, as shown in fig. 11, the people flow rate statistical system may include:
an image capturing device 1110 for capturing successive frame images;
a processor 1120, configured to: acquire the continuous frame images collected by the image capturing device 1110; input the continuous frame images into a trained full convolution neural network to generate a human head confidence distribution map of each frame image in the continuous frame images; determine, from the human head confidence distribution map of each frame image, at least one human head detection target in that frame image by a preset target determination method; perform target association between the current frame and the previous frame of the current frame according to the feature matching result and motion smoothness of each human head detection target in each frame image to obtain tracking targets, and allocate a tracking identifier to each tracking target; and count the number of all tracking identifiers to obtain a people flow statistical result.
By applying this embodiment, the acquired continuous frame images of a video are input into the trained full convolution neural network to generate a human head confidence distribution map for each frame image, the human head detection targets in each frame image are determined from that distribution map, tracking identifiers are allocated to associated tracking targets by associating targets between the current frame and the previous frame, and finally the number of all tracking identifiers is counted to obtain the people flow statistical result. The trained full convolution neural network can extract the essential features of the human head, which improves the accuracy of the people flow statistics, and the human head detection targets can be determined from the confidence distribution map generated by a single full convolution neural network, which reduces the complexity of human head target recognition and improves the computational efficiency of the statistics. Compared with methods based on feature-point tracking, this embodiment can stably track a human head target merely by recording its tracking identifier, improving counting precision; compared with methods based on human body segmentation and tracking, the recording of tracking identifiers is not affected by occlusion, so the precision is higher.
Optionally, the processor 1120 may be specifically further configured to:
acquiring a preset training set sample image and the central position of each human head target in the preset training set sample image;
generating a ground-truth human head confidence distribution map of the preset training set sample image according to a preset distribution and the center position of each human head target in the preset training set sample image;
inputting the preset training set sample image into an initial full-convolution neural network to obtain a human head confidence coefficient distribution map of the preset training set sample image, wherein the network parameter of the initial full-convolution neural network is a preset value;
calculating an average error between the human head confidence distribution map of the preset training set sample image and the ground-truth human head confidence distribution map of the preset training set sample image;
when the average error is greater than a preset error threshold, updating the network parameters according to the average error and a preset gradient operation strategy to obtain an updated full convolution neural network, and recalculating the average error between the human head confidence distribution map produced by the updated network and the ground-truth map; repeating until the average error is less than or equal to the preset error threshold, and determining the corresponding full convolution neural network as the trained full convolution neural network.
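The ground-truth map generation step can be sketched as follows. The text only says a "preset distribution law"; a 2-D Gaussian placed at each annotated head centre is assumed here purely for illustration, and `truth_map` and `sigma` are hypothetical names:

```python
import numpy as np

def truth_map(shape, centers, sigma=3.0):
    """Ground-truth head-confidence map: place a 2-D Gaussian (one possible
    "preset distribution") at every annotated head centre and take the
    per-pixel maximum, so each centre itself has confidence 1."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    out = np.zeros(shape)
    for cy, cx in centers:
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
        out = np.maximum(out, g)
    return out

# Two annotated head centres in a 32x32 sample image
m = truth_map((32, 32), [(8, 8), (20, 24)])
```

The network's predicted confidence map is then compared against such a map to compute the average error of the training step above.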
Optionally, the full convolution neural network includes: a convolutional layer, a down-sampling layer and a deconvolution layer;
the processor 1120 inputs the preset training set sample image into an initial full-convolution neural network to obtain a human head confidence coefficient distribution map of the preset training set sample image, which may specifically be:
inputting the preset training set sample image into the initial full convolution neural network, and extracting features of the sample image through a network structure in which convolutional layers and down-sampling layers alternate;
up-sampling the features through the deconvolution layer until their resolution equals that of the preset training set sample image, to obtain an up-sampling result;
and processing the up-sampling result with a 1 × 1 convolutional layer to obtain a human head confidence distribution map with the same resolution as the preset training set sample image.
Optionally, the process by which the processor 1120 determines, using a preset target determination method, at least one human head detection target in each frame image according to the human head confidence distribution map of that frame image may specifically be:
for the human head confidence distribution map of each frame image, determining the center point position of at least one detection target by non-maximum suppression;
obtaining the confidence of all pixel points in the neighborhood of the center point of each detection target;
and determining a detection target whose neighborhood pixels all have confidence greater than a preset confidence threshold as a human head detection target of the frame image.
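A minimal sketch of this two-step rule — local maxima of the confidence map, then a confidence check over each maximum's neighbourhood — is shown below. The brute-force scan, the fixed neighbourhood radius (standing in for the statistically averaged head radius mentioned in the claims) and the function name are illustrative assumptions:

```python
import numpy as np

def head_detections(conf, nms_radius=2, conf_threshold=0.5):
    """Non-maximum suppression on a confidence map: keep a pixel as a
    detection-target centre only if it is the maximum of its neighbourhood
    AND every pixel in that neighbourhood exceeds the preset threshold."""
    h, w = conf.shape
    peaks = []
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - nms_radius), min(h, y + nms_radius + 1)
            x0, x1 = max(0, x - nms_radius), min(w, x + nms_radius + 1)
            patch = conf[y0:y1, x0:x1]
            if conf[y, x] == patch.max() and patch.min() > conf_threshold:
                peaks.append((y, x))
    return peaks

# A single head-like Gaussian bump centred at (7, 7)
ys, xs = np.mgrid[0:15, 0:15]
conf = np.exp(-((ys - 7) ** 2 + (xs - 7) ** 2) / 32.0)
peaks = head_detections(conf)
```

A real implementation would vectorize the local-maximum test (e.g. with a maximum filter), but the acceptance rule is the same.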
Optionally, the process by which the processor 1120 performs target association between the current frame and the previous frame of the current frame according to the feature matching result and motion smoothness of each human head detection target in each frame image, obtains tracking targets and allocates tracking identifiers to them may specifically be:
performing feature matching and smoothness analysis on each human head detection target of each frame image in the continuous frame images of the video to obtain the feature matching result and motion smoothness of that human head detection target;
performing target association between the current frame and the previous frame of the current frame according to the feature matching result and the motion smoothness, and determining a human head detection target as a tracking target when its feature matching degree is higher than a preset matching degree threshold and its motion smoothness is higher than a preset smoothness threshold;
and allocating a tracking identifier to the tracking target.
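The association-and-identifier logic can be sketched as follows. `match_fn` and `smooth_fn` are deliberately left as caller-supplied stand-ins, since the text does not define the feature-matching or smoothness measures; the double-threshold rule and the identifier handling are the part being illustrated:

```python
def associate(prev_targets, curr_detections, match_fn, smooth_fn,
              match_threshold=0.7, smooth_threshold=0.7):
    """Associate current-frame detections with previous-frame targets.
    A detection is confirmed as a tracking target when both its feature-
    matching score and its motion smoothness exceed the preset thresholds;
    matched detections inherit the previous tracking identifier, unmatched
    ones are allocated a new identifier."""
    tracks = {}
    next_id = max(prev_targets, default=0) + 1   # prev_targets: {id: state}
    for det in curr_detections:
        best_id, best_score = None, 0.0
        for tid, state in prev_targets.items():
            score = match_fn(state, det)
            if score > best_score:
                best_id, best_score = tid, score
        if (best_id is not None and best_score > match_threshold
                and smooth_fn(prev_targets[best_id], det) > smooth_threshold):
            tracks[best_id] = det            # keep the existing identifier
        else:
            tracks[next_id] = det            # allocate a new identifier
            next_id += 1
    return tracks

prev = {1: (10, 10)}                          # one known target at (10, 10)
near = lambda a, b: 1.0 if abs(a[0] - b[0]) + abs(a[1] - b[1]) <= 2 else 0.0
tracks = associate(prev, [(11, 10), (40, 40)], match_fn=near, smooth_fn=near)
```

The nearby detection inherits identifier 1; the distant one is treated as a newly appearing head and receives identifier 2.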
Optionally, the process by which the processor 1120 counts the number of all tracking identifiers to obtain the people flow statistical result includes:
determining the direction of the people flow motion trajectory from the continuous frame images;
determining, in the continuous frame images, a detection line perpendicular to the direction of the people flow motion trajectory;
when any tracking target passes through the detection line, recording the tracking identifier corresponding to that tracking target;
and counting the number of tracking identifiers corresponding to all tracking targets passing through the detection line to obtain the people flow statistical result.
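The detection-line counting step can be sketched as below, assuming a horizontal line (perpendicular to a vertical people flow) for simplicity; the trajectory format and the function name are illustrative. A point lying exactly on the line is not counted as a crossing in this simplified sketch:

```python
def count_crossings(tracks, line_y):
    """Count the distinct tracking identifiers whose trajectory crosses the
    detection line y = line_y. `tracks` maps a tracking identifier to its
    list of per-frame (x, y) positions; each identifier is counted once."""
    crossed = set()
    for tid, points in tracks.items():
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if (y0 - line_y) * (y1 - line_y) < 0:   # sign change: crossed
                crossed.add(tid)
                break
    return len(crossed)

tracks = {1: [(0, 0), (0, 5), (0, 12)],   # crosses y = 10
          2: [(3, 2), (3, 4)],            # stays on one side
          3: [(5, 8), (5, 11)]}           # crosses y = 10
n = count_crossings(tracks, line_y=10)
```

Because the count is over identifiers rather than raw crossings, a head briefly occluded and re-associated under the same identifier is still counted only once, which is the occlusion-robustness property the embodiment emphasizes.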
Optionally, the processor 1120 may be specifically further configured to:
delimiting at least one tracking area in the continuous frame images according to a preset people flow statistics condition;
the processor 1120 obtains and performs target association between a current frame and a previous frame of the current frame according to a feature matching result and a motion smoothness of any human head detection target in each frame of image to obtain a tracking target, and allocates a tracking identifier to the tracking target, which may specifically be:
in any tracking area, performing target association between the current frame and the previous frame of the current frame according to the feature matching result and motion smoothness of each human head detection target in each frame image to obtain tracking targets, and allocating a tracking identifier to each tracking target respectively;
the processor 1120 counts the number of all tracking identifiers to obtain a people flow rate statistical result, which may specifically be:
in any tracking area, determining a detection line perpendicular to the direction of the people flow motion trajectory;
and counting the number of tracking identifiers corresponding to all tracking targets passing through the detection line in that tracking area to obtain a people flow statistical result.
It should be noted that the people flow statistics system according to the embodiment of the present invention is a system applying the above people flow statistics method, so all embodiments of the people flow statistics method are applicable to the system and achieve the same or similar beneficial effects.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (13)

1. A people flow statistical method, characterized in that the method comprises:
acquiring continuous frame images acquired by image acquisition equipment;
inputting the continuous frame images into a trained full convolution neural network to generate a human head confidence coefficient distribution map of each frame image in the continuous frame images, wherein the human head confidence coefficient distribution map represents a distribution map of the probability that the detected target is a human head target;
aiming at the human head confidence coefficient distribution graph of each frame of image, a preset target determining method is adopted to determine at least one human head detection target in each frame of image;
acquiring and carrying out target association on a current frame and a previous frame of the current frame according to a feature matching result and motion smoothness of any human head detection target in each frame of image to obtain a tracking target, and distributing a tracking identifier for the tracking target;
counting the number of all tracking identifications to obtain a people flow counting result;
the method for determining at least one human head detection target in each frame of image by adopting a preset target determination method aiming at the human head confidence coefficient distribution graph of each frame of image comprises the following steps:
aiming at the human head confidence degree distribution graph of each frame of image, determining the position of the central point of at least one detection target by adopting a non-maximum suppression method;
obtaining the confidence of all pixel points in the neighborhood of the central point of each detection target; wherein the size of the neighborhood is an average value obtained by performing statistics on actual human head radii;
and determining the detection target of which the confidence coefficient of each pixel point is greater than a preset confidence coefficient threshold value as the human head detection target of the frame image.
2. The method of claim 1, wherein before inputting the continuous frame images into a trained full convolution neural network to generate a head confidence distribution map of each frame of image in the continuous frame images, the method further comprises:
acquiring a preset training set sample image and the central position of each human head target in the preset training set sample image;
generating a human head confidence coefficient distribution true value graph of the preset training set sample image according to a preset distribution law and the central position of each human head target in the preset training set sample image;
inputting the preset training set sample image into an initial full-convolution neural network to obtain a human head confidence coefficient distribution map of the preset training set sample image, wherein the network parameter of the initial full-convolution neural network is a preset value;
calculating the average error between the human head confidence coefficient distribution graph of the preset training set sample image and the human head confidence coefficient distribution true value graph of the preset training set sample image;
when the average error is larger than a preset error threshold value, updating network parameters according to the average error and a preset gradient operation strategy to obtain an updated full convolution neural network; calculating the average error between the human head confidence coefficient distribution diagram of the preset training set sample image obtained by the updated full convolution neural network and the human head confidence coefficient distribution true value diagram of the preset training set sample image until the average error is less than or equal to the preset error threshold value, and determining the corresponding full convolution neural network as the trained full convolution neural network.
3. The people flow statistical method according to claim 2, wherein the full convolution neural network comprises: a convolutional layer, a down-sampling layer and a deconvolution layer;
inputting the preset training set sample image into an initial full-convolution neural network to obtain a human head confidence coefficient distribution map of the preset training set sample image, wherein the human head confidence coefficient distribution map comprises the following steps:
inputting the preset training set sample image into an initial full convolution neural network, and extracting the characteristics of the preset training set sample image through a network structure in which convolution layers and down-sampling layers are arranged alternately;
up-sampling the features through the deconvolution layer until their resolution equals that of the preset training set sample image, to obtain an up-sampling result;
and processing the up-sampling result with a 1 × 1 convolutional layer to obtain a human head confidence distribution map with the same resolution as the preset training set sample image.
4. The people flow statistical method according to claim 1, wherein the obtaining and performing target association between a current frame and a previous frame of the current frame according to a feature matching result and motion smoothness of any human head detection target in each frame of image to obtain a tracking target, and allocating a tracking identifier to the tracking target comprises:
performing feature matching and smoothness analysis on any human head detection target of each frame of image in the continuous frame of images to obtain a feature matching result and motion smoothness of the human head detection target;
performing target association on the current frame and the previous frame of the current frame according to the feature matching result and the motion smoothness, and determining the human head detection target as a tracking target when the feature matching degree is higher than a preset matching degree threshold value and the motion smoothness is higher than a preset smoothness threshold value;
and allocating a tracking identifier to the tracking target.
5. The people flow rate statistical method according to claim 1, wherein the counting the number of all tracking identifiers to obtain the people flow rate statistical result comprises:
determining the direction of the people stream motion track according to the continuous frame images;
determining, in the continuous frame images, a detection line perpendicular to the direction of the people flow motion trajectory;
when any tracking target passes through the detection line, recording a tracking identifier corresponding to the tracking target;
and counting the number of the tracking identifications corresponding to all the tracking targets passing through the detection line to obtain a people flow counting result.
6. The method according to claim 5, wherein before obtaining and performing target association between a current frame and a previous frame of the current frame according to a feature matching result and a motion smoothness of any human head detection target in each frame of image to obtain a tracking target, and allocating a tracking identifier to the tracking target, the method further comprises:
according to a preset pedestrian flow statistical condition, at least one tracking area is defined for the continuous frame images;
the acquiring and performing target association on a current frame and a previous frame of the current frame according to a feature matching result and motion smoothness of any human head detection target in each frame of image to obtain a tracking target, and allocating a tracking identifier to the tracking target, includes:
in any tracking area, performing target association between the current frame and the previous frame of the current frame according to the feature matching result and motion smoothness of each human head detection target in each frame image to obtain tracking targets, and allocating a tracking identifier to each tracking target respectively;
the counting of the number of all tracking identifications to obtain a people flow counting result comprises the following steps:
in any tracking area, determining a detection line perpendicular to the direction of the people flow motion trajectory;
and counting the number of tracking identifications corresponding to all tracking targets passing through the detection line in the tracking area to obtain a people flow counting result.
7. A people flow statistics device, characterized in that the device comprises:
the first acquisition module is used for acquiring continuous frame images acquired by the image acquisition equipment;
the convolution module is used for inputting the continuous frame images into a trained full convolution neural network and generating a human head confidence coefficient distribution map of each frame image in the continuous frame images, wherein the human head confidence coefficient distribution map represents a distribution map of the probability that the detected target is a human head target;
the human head detection target determining module is used for determining at least one human head detection target in each frame of image by adopting a preset target determining method according to the human head confidence coefficient distribution diagram of each frame of image;
the tracking identifier distribution module is used for acquiring and carrying out target association on the current frame and the previous frame of the current frame according to the feature matching result and the motion smoothness of any human head detection target in each frame of image to obtain a tracking target and distributing a tracking identifier to the tracking target;
the statistical module is used for counting the number of all tracking identifications to obtain a people flow statistical result;
wherein, the human head detection target determining module is specifically used for:
aiming at the human head confidence degree distribution graph of each frame of image, determining the position of the central point of at least one detection target by adopting a non-maximum suppression method;
obtaining the confidence of all pixel points in the neighborhood of the central point of each detection target; wherein the size of the neighborhood is an average value obtained by performing statistics on actual human head radii;
and determining the detection target with the confidence coefficient of each pixel point being larger than the preset confidence coefficient threshold value as the human head detection target of the frame image.
8. The people flow statistics apparatus according to claim 7, characterized in that the apparatus further comprises:
the second acquisition module is used for acquiring a preset training set sample image and the central position of each human head target in the preset training set sample image;
the generating module is used for generating a human head confidence coefficient distribution true value image of the preset training set sample image according to a preset distribution law and the central position of each human head target in the preset training set sample image;
the extraction module is used for inputting the preset training set sample image into an initial full convolution neural network to obtain a human head confidence coefficient distribution map of the preset training set sample image, wherein the network parameter of the initial full convolution neural network is a preset value;
the calculation module is used for calculating the average error between the human head confidence coefficient distribution graph of the preset training set sample image and the human head confidence coefficient distribution true value graph of the preset training set sample image;
the circulation module is used for updating network parameters according to the average error and a preset gradient operation strategy when the average error is larger than a preset error threshold value to obtain an updated full convolution neural network; calculating the average error between the human head confidence coefficient distribution diagram of the preset training set sample image obtained by the updated full convolution neural network and the human head confidence coefficient distribution true value diagram of the preset training set sample image until the average error is less than or equal to the preset error threshold value, and determining the corresponding full convolution neural network as the trained full convolution neural network.
9. The people flow statistics device of claim 8, wherein the full convolution neural network further comprises: a convolutional layer, a down-sampling layer and a deconvolution layer;
the extraction module is specifically configured to:
inputting the preset training set sample image into an initial full convolution neural network, and extracting the characteristics of the preset training set sample image through a network structure in which convolution layers and down-sampling layers are arranged alternately;
up-sampling the features through the deconvolution layer until their resolution equals that of the preset training set sample image, to obtain an up-sampling result;
and processing the up-sampling result with a 1 × 1 convolutional layer to obtain a human head confidence distribution map with the same resolution as the preset training set sample image.
10. The people flow rate statistic device according to claim 7, wherein the tracking identifier assigning module is specifically configured to:
performing feature matching and smoothness analysis on any human head detection target of each frame of image in the continuous frame of images to obtain a feature matching result and motion smoothness of the human head detection target;
performing target association on the current frame and the previous frame of the current frame according to the feature matching result and the motion smoothness, and determining the human head detection target as a tracking target when the feature matching degree is higher than a preset matching degree threshold value and the motion smoothness is higher than a preset smoothness threshold value;
and allocating a tracking identifier to the tracking target.
11. The people flow rate statistic device according to claim 7, wherein the statistic module is specifically configured to:
determining the direction of the people stream motion track according to the continuous frame images;
determining, in the continuous frame images, a detection line perpendicular to the direction of the people flow motion trajectory;
when any tracking target passes through the detection line, recording a tracking identifier corresponding to the tracking target;
and counting the number of the tracking identifications corresponding to all the tracking targets passing through the detection line to obtain a people flow counting result.
12. The people flow statistics apparatus according to claim 11, characterized in that the apparatus further comprises:
the defining module is used for defining at least one tracking area for the continuous frame images according to a preset pedestrian flow statistical condition;
the tracking identifier assigning module is specifically further configured to:
in any tracking area, performing target association between the current frame and the previous frame of the current frame according to the feature matching result and motion smoothness of each human head detection target in each frame image to obtain tracking targets, and allocating a tracking identifier to each tracking target respectively;
the statistics module is specifically further configured to:
in any tracking area, determining a detection line vertical to the direction of the people stream motion trail;
and counting the number of tracking identifications corresponding to all tracking targets passing through the detection line in the tracking area to obtain a people flow counting result.
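The per-area variant in this claim amounts to filtering trajectories into each tracking area before counting. A minimal sketch, assuming axis-aligned rectangular areas (the claim does not fix the area shape) and a caller-supplied `crossed_fn` that decides whether a trajectory passed that area's detection line:

```python
def in_region(point, region):
    """region = (x_min, y_min, x_max, y_max), an assumed rectangular area."""
    x, y = point
    x0, y0, x1, y1 = region
    return x0 <= x <= x1 and y0 <= y <= y1

def count_per_region(trajectories, regions, crossed_fn):
    """trajectories: {track_id: [(x, y), ...]}; regions: {name: rectangle}.
    Returns a separate people-flow count per tracking area."""
    counts = {}
    for name, rect in regions.items():
        ids = {tid for tid, pts in trajectories.items()
               if any(in_region(p, rect) for p in pts) and crossed_fn(pts)}
        counts[name] = len(ids)
    return counts
```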
13. A people flow statistics system, the system comprising:
an image acquisition device, configured to acquire continuous frame images;
a processor, configured to: acquire the continuous frame images acquired by the image acquisition device; input the continuous frame images into a trained fully convolutional neural network to generate a human head confidence distribution map of each frame image in the continuous frame images, wherein the human head confidence distribution map represents a distribution map of the probability that a detected target is a human head target; for the human head confidence distribution map of each frame image, determine at least one human head detection target in each frame image by using a preset target determination method; perform target association between the current frame and the previous frame of the current frame according to the feature matching result and motion smoothness of each human head detection target in each frame image to obtain tracking targets, and allocate a tracking identifier to each tracking target; and count the number of all tracking identifiers to obtain a people flow statistics result;
wherein the process by which the processor determines at least one human head detection target in each frame image by using the preset target determination method according to the human head confidence distribution map of each frame image specifically comprises:
for the human head confidence distribution map of each frame image, determining the center point position of at least one detection target by using a non-maximum suppression method;
obtaining the confidence of all pixel points in a neighborhood of the center point of each detection target, wherein the size of the neighborhood is an average value obtained by statistics on actual human head radii;
and determining a detection target as a human head detection target of the frame image when the confidence of each pixel point in the neighborhood is greater than a preset confidence threshold.
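The detection steps above can be sketched as follows. This is an illustrative simplification, not the patented method: the confidence threshold and head radius are assumed values, and non-maximum suppression is reduced to keeping local maxima in a 3x3 window.

```python
import numpy as np

CONF_THRESHOLD = 0.5  # preset confidence threshold (assumed value)
HEAD_RADIUS = 1       # average human head radius in pixels (assumed value)

def detect_heads(conf_map):
    """conf_map: 2-D array of per-pixel head probabilities from the network.
    Returns (row, col) center points that survive both checks."""
    h, w = conf_map.shape
    heads = []
    for y in range(h):
        for x in range(w):
            v = conf_map[y, x]
            # non-maximum suppression: keep only local maxima in a 3x3 window
            window = conf_map[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
            if v < window.max() or v <= CONF_THRESHOLD:
                continue
            # neighborhood check: every pixel within the average head radius
            # must also clear the confidence threshold
            nb = conf_map[max(0, y - HEAD_RADIUS):y + HEAD_RADIUS + 1,
                          max(0, x - HEAD_RADIUS):x + HEAD_RADIUS + 1]
            if (nb > CONF_THRESHOLD).all():
                heads.append((y, x))
    return heads
```

A single confident peak surrounded by above-threshold pixels yields one detection; an isolated high pixel whose neighborhood dips below the threshold is rejected, which is what suppresses spurious single-pixel responses.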
CN201710399814.0A 2017-05-31 2017-05-31 People flow statistical method, equipment and system Active CN108986064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710399814.0A CN108986064B (en) 2017-05-31 2017-05-31 People flow statistical method, equipment and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710399814.0A CN108986064B (en) 2017-05-31 2017-05-31 People flow statistical method, equipment and system

Publications (2)

Publication Number Publication Date
CN108986064A CN108986064A (en) 2018-12-11
CN108986064B true CN108986064B (en) 2022-05-06

Family

ID=64502214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710399814.0A Active CN108986064B (en) 2017-05-31 2017-05-31 People flow statistical method, equipment and system

Country Status (1)

Country Link
CN (1) CN108986064B (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740721B (en) * 2018-12-19 2021-06-29 中国农业大学 Wheat ear counting method and device
CN111489284B (en) * 2019-01-29 2024-02-06 北京搜狗科技发展有限公司 Image processing method and device for image processing
CN111553180B (en) * 2019-02-12 2023-08-29 阿里巴巴集团控股有限公司 Garment counting method, garment counting method and device and electronic equipment
CN109903561B (en) * 2019-03-14 2021-05-07 智慧足迹数据科技有限公司 Method and device for calculating pedestrian flow between road sections and electronic equipment
CN111815671B (en) * 2019-04-10 2023-09-15 曜科智能科技(上海)有限公司 Target quantity counting method, system, computer device and storage medium
CN112149457A (en) * 2019-06-27 2020-12-29 西安光启未来技术研究院 People flow statistical method, device, server and computer readable storage medium
CN110490099B (en) * 2019-07-31 2022-10-21 武汉大学 Subway public place pedestrian flow analysis method based on machine vision
CN110795998B (en) * 2019-09-19 2023-03-24 深圳云天励飞技术有限公司 People flow detection method and device, electronic equipment and readable storage medium
CN110738137A (en) * 2019-09-26 2020-01-31 中移物联网有限公司 people flow rate statistical method and device
CN110765940B (en) * 2019-10-22 2022-12-30 杭州姿感科技有限公司 Target object statistical method and device
CN110992305A (en) * 2019-10-31 2020-04-10 中山大学 Package counting method and system based on deep learning and multi-target tracking technology
CN111160410B (en) * 2019-12-11 2023-08-08 北京京东乾石科技有限公司 Object detection method and device
CN111145214A (en) * 2019-12-17 2020-05-12 深圳云天励飞技术有限公司 Target tracking method, device, terminal equipment and medium
CN113051975B (en) * 2019-12-27 2024-04-02 深圳云天励飞技术有限公司 People flow statistics method and related products
CN111160243A (en) * 2019-12-27 2020-05-15 深圳云天励飞技术有限公司 Passenger flow volume statistical method and related product
CN111291646A (en) * 2020-01-20 2020-06-16 北京市商汤科技开发有限公司 People flow statistical method, device, equipment and storage medium
CN111680569B (en) * 2020-05-13 2024-04-19 北京中广上洋科技股份有限公司 Attendance rate detection method, device, equipment and storage medium based on image analysis
CN112232210B (en) * 2020-10-16 2024-06-28 京东方科技集团股份有限公司 Personnel flow analysis method and system, electronic equipment and readable storage medium
CN112614154B (en) * 2020-12-08 2024-01-19 深圳市优必选科技股份有限公司 Target tracking track acquisition method and device and computer equipment
CN112561971A (en) * 2020-12-16 2021-03-26 珠海格力电器股份有限公司 People flow statistical method, device, equipment and storage medium
CN113034544A (en) * 2021-03-19 2021-06-25 奥比中光科技集团股份有限公司 People flow analysis method and device based on depth camera
CN113592785A (en) * 2021-07-09 2021-11-02 浙江大华技术股份有限公司 Target flow statistical method and device
CN113326830B (en) * 2021-08-04 2021-11-30 北京文安智能技术股份有限公司 Passenger flow statistical model training method and passenger flow statistical method based on overlook images
CN113762169A (en) * 2021-09-09 2021-12-07 北京市商汤科技开发有限公司 People flow statistical method and device, electronic equipment and storage medium
CN115330756B (en) * 2022-10-11 2023-02-28 天津恒宇医疗科技有限公司 Light and shadow feature-based guide wire identification method and system in OCT image
CN116311084B (en) * 2023-05-22 2024-02-23 青岛海信网络科技股份有限公司 Crowd gathering detection method and video monitoring equipment
CN116895047B (en) * 2023-07-24 2024-01-30 北京全景优图科技有限公司 Rapid people flow monitoring method and system

Citations (2)

Publication number Priority date Publication date Assignee Title
CN101877058A (en) * 2010-02-10 2010-11-03 杭州海康威视软件有限公司 People flow rate statistical method and system
CN102799935A (en) * 2012-06-21 2012-11-28 武汉烽火众智数字技术有限责任公司 Human flow counting method based on video analysis technology

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
JP2011188444A (en) * 2010-03-11 2011-09-22 Kddi Corp Head tracking device and control program
CN106295460B (en) * 2015-05-12 2019-05-03 株式会社理光 The detection method and equipment of people
CN105184258B (en) * 2015-09-09 2019-04-30 苏州科达科技股份有限公司 Method for tracking target and system, human behavior analysis method and system
CN105447458B (en) * 2015-11-17 2018-02-27 深圳市商汤科技有限公司 A kind of large-scale crowd video analytic system and method
CN105512720B (en) * 2015-12-15 2018-05-08 广州通达汽车电气股份有限公司 A kind of public transit vehicle passenger flow statistics method and system
CN105512640B (en) * 2015-12-30 2019-04-02 重庆邮电大学 A kind of people flow rate statistical method based on video sequence
CN106022237B (en) * 2016-05-13 2019-07-12 电子科技大学 A kind of pedestrian detection method of convolutional neural networks end to end
CN106127812B (en) * 2016-06-28 2018-10-12 中山大学 A kind of passenger flow statistical method of the non-gate area in passenger station based on video monitoring
CN106372570A (en) * 2016-08-19 2017-02-01 云赛智联股份有限公司 Visitor flowrate statistic method
CN106326937B (en) * 2016-08-31 2019-08-09 郑州金惠计算机系统工程有限公司 Crowd density distribution estimation method based on convolutional neural networks
CN106709432B (en) * 2016-12-06 2020-09-11 成都通甲优博科技有限责任公司 Human head detection counting method based on binocular stereo vision

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN101877058A (en) * 2010-02-10 2010-11-03 杭州海康威视软件有限公司 People flow rate statistical method and system
CN102799935A (en) * 2012-06-21 2012-11-28 武汉烽火众智数字技术有限责任公司 Human flow counting method based on video analysis technology

Non-Patent Citations (2)

Title
Jianyong Wang et al., "Counting Crowd with Fully Convolutional Networks", 2017 2nd International Conference on Multimedia and Image Processing, 2017-03-19, Sections I, III and IV *
Hao Liu et al., "Cross-Scene Crowd Counting via FCN and Gaussian Model", 2016 International Conference on Virtual Reality and Visualization, 2016-12-31, Abstract, Section IV, Fig. 5 *

Also Published As

Publication number Publication date
CN108986064A (en) 2018-12-11

Similar Documents

Publication Publication Date Title
CN108986064B (en) People flow statistical method, equipment and system
CN102542289B (en) Pedestrian volume statistical method based on plurality of Gaussian counting models
CN105069472B (en) A kind of vehicle checking method adaptive based on convolutional neural networks
CN108021848B (en) Passenger flow volume statistical method and device
JP6549797B2 (en) Method and system for identifying head of passerby
Wu et al. Detection and counting of banana bunches by integrating deep learning and classic image-processing algorithms
CN106447680B (en) The object detecting and tracking method that radar is merged with vision under dynamic background environment
CN109360226A (en) A kind of multi-object tracking method based on time series multiple features fusion
CN201255897Y (en) Human flow monitoring device for bus
CN108960404B (en) Image-based crowd counting method and device
Amirgholipour et al. A-CCNN: adaptive CCNN for density estimation and crowd counting
CN105654516B (en) Satellite image based on target conspicuousness is to ground weak moving target detection method
CN106709938B (en) Based on the multi-target tracking method for improving TLD
CN109685045A (en) A kind of Moving Targets Based on Video Streams tracking and system
CN103577875A (en) CAD (computer-aided design) people counting method based on FAST (features from accelerated segment test)
CN109948725A (en) Based on address-event representation neural network object detecting device
CN109684986B (en) Vehicle analysis method and system based on vehicle detection and tracking
CN111781600A (en) Vehicle queuing length detection method suitable for signalized intersection scene
KR102199252B1 (en) Method and Apparatus for Analyzing Traffic Situation
CN111797738A (en) Multi-target traffic behavior fast extraction method based on video identification
CN106570449A (en) Visitor flow rate and popularity index detection method based on area definition and detection system thereof
CN114463390A (en) Multi-twin-countermeasure network cross-camera vehicle tracking method with coupled motorcade following strengthening
KR101690050B1 (en) Intelligent video security system
CN113920585A (en) Behavior recognition method and device, equipment and storage medium
CN111950507B (en) Data processing and model training method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant