CN112883768B

CN112883768B - Object counting method and device, equipment and storage medium

Info

Publication number: CN112883768B
Application number: CN201911219292.7A
Authority: CN
Inventors: 谢奕; 陆瑞智; 喻晓源; 陈普
Original assignee: Huawei Cloud Computing Technologies Co Ltd
Current assignee: Huawei Cloud Computing Technologies Co Ltd
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2024-02-09
Anticipated expiration: 2039-11-29
Also published as: CN112883768A

Abstract

The application provides an object counting method, which relates to the field of artificial intelligence and comprises the following steps: obtaining a target image, inputting the target image to a density estimation network to obtain an initial density map, inputting the target image to an object detection network to obtain a first detection result map, wherein the first detection result map comprises at least one detection frame, each detection frame comprises part or all parts of an object, and processing density values in the initial density map according to the first detection result map to obtain a first density map; and counting the plurality of objects according to the first detection result graph and the first density graph to obtain the number of objects included in the target image. The method improves the accuracy of object counting and can be suitable for more object counting scenes.

Description

Object counting method and device, equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence (artificial intelligence, AI), and more particularly to an object counting method and corresponding apparatus, device, storage medium.

Background

There is a need for object counting in many scenarios in different fields. The object refers to an entity needing to be counted, and can be a pedestrian, a vehicle or the like. For example, in the security field, statistics is often required on the number of people in a key place, so as to control a region with higher density, and avoid occurrence of malignant events such as trampling or abnormal aggregation, which endanger public security. For another example, in the traffic control field, it is often necessary to count vehicles on a key road section, determine a congestion road section based on the result of the statistics, prompt the vehicles to avoid the congestion road section, and implement diversion of the vehicles, so as to avoid traffic congestion.

With the development of computer vision technology, a method for directly detecting objects in a picture through a target detection algorithm and then counting the number of detection frames is proposed in the industry. The target detection algorithm comprises a One-step (One-stage) target detection algorithm and a Two-step (Two-stage) target detection algorithm based on deep learning.

However, the method based on object detection is mainly suitable for sparse crowd counting, and as crowd density is increased, occlusion between objects becomes more serious, and the existing object counting method can cause a larger deviation between a counting result and an actual number.

Disclosure of Invention

The object counting method is used for solving the problem that the counting result is inaccurate due to shielding among objects when the object density is improved, and the accuracy of object counting is improved. The application also provides corresponding apparatus, devices, readable storage media and computer program products.

According to the object counting method, on the basis of a target detection algorithm, the density estimation algorithm is utilized to make up for missed detection of the target detection algorithm, the target detection algorithm is utilized to correct false detection of the density estimation algorithm, the advantages of the target detection algorithm and the density estimation algorithm are fully exerted, and the object counting precision is improved.

In specific implementation, a target image is firstly obtained, the target image is obtained by shooting a geographic area through a camera, the target image comprises a plurality of objects to be counted, then the target image is input into a density estimation network, the density estimation network is used for estimating the density of the objects in the target image, an initial density image can be obtained, the target image is input into an object detection network, a first detection result image is obtained, the first detection result image comprises at least one detection frame, each detection frame comprises part or all parts of one object, then the density value in the initial density image is processed according to the first detection result image to obtain a first density image, and finally a plurality of objects are counted according to the first detection result image and the first density image to obtain the number of the objects contained in the target image.

In addition, the method utilizes two different counting modes to count the objects, can be suitable for sparse scenes and dense scenes, and has good compatibility. Moreover, when the sparse scene and the dense scene exist in the target image at the same time, the sparse scene and the dense scene in the target image are not required to be recognized and cut, but the target detection and the density estimation are carried out on the whole target image, so that the counting loss caused by the recognition precision is avoided, and the counting precision is further improved.

With reference to the first aspect, in a first implementation manner of the first aspect, considering a scene in which the object flows relatively severely, the stationary region may also be located based on a correlation between initial density maps corresponding to the multi-frame images. Specifically, K frame reference images are acquired, wherein the K frame reference images and the target image are continuous K+1 frame images in a video stream, K is a positive integer, then the K frame reference images are input to a density estimation network, an initial density map of each frame of reference image in the K frame reference images is obtained, and then a static area in the target image is determined according to the initial density map of each frame of reference image and the initial density map of the target image.

With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, considering that the number of objects in the static area may reflect the aggregation situation of the objects, the static area may also be counted, and an alarm may be given according to the result of the object counting.

In specific implementation, the first detection result graph and the counting area of the first density graph are determined according to the position of the static area in the target image, then objects in the static area are counted according to the first detection result graph and the counting area of the first density graph, the number of the objects included in the static area in the target image is obtained, and when the number of the objects included in the static area is not smaller than a number alarm threshold, an alarm signal is generated, wherein the alarm signal is used for indicating object aggregation.

With reference to the first implementation manner of the first aspect, in a third implementation manner of the first aspect, considering that the distance of the object in the static area may reflect the aggregation situation of the object, the distance of the object in the static area may also be determined, and an alarm is performed according to the distance of the object, so as to avoid false alarm caused by difference between the pixel position and the geographic position, and reduce false alarm rate.

In specific implementation, the geographic position of each object in the static area is determined according to the pixel position of each object in the target image and the calibration parameter of the camera, then the distance from each object to other objects is determined according to the geographic position of each object, the comprehensive distance of the objects in the static area is determined based on the distance from each object to other objects, and when the comprehensive distance is not more than a distance alarm threshold value, an alarm signal is generated, wherein the alarm signal is used for indicating object aggregation.

With reference to the first aspect or any implementation manner of the first to the third implementation manners of the first aspect, in a fourth implementation manner of the first aspect, considering that there may be a false detection of the density map, further optimization may be performed on the basis of the first density map, so that an influence of the false detection of the density map on counting is reduced, and counting accuracy is improved.

When the method is specifically implemented, the first density image is divided into a plurality of density block images, whether each density block image meets preset processing conditions or not is judged according to the density value in each density block image, the density value in the density block image meeting the preset processing conditions is set to zero, the density block image which does not meet the preset processing conditions is not processed, a second density image is obtained according to the processed density block image and the unprocessed density block image, and the objects are counted according to the first detection result image and the second density image.

With reference to the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the preset processing condition includes: the number of objects determined from the density values in the density block image is less than the image processing threshold; or the number of the objects determined according to the density value in the density block image is not smaller than the image processing threshold, and the number of the detection frames of the detection area corresponding to the density block image in the first detection result image is smaller than the image processing threshold, and the detection area is determined according to the position of the density block image in the first density image.

With reference to the first aspect or any implementation manner of the first to the fifth implementation manners of the first aspect, in a sixth implementation manner of the first aspect, processing the initial density map according to the first detection result map may specifically include determining a region to be processed in the initial density map according to a position of each detection frame in the first detection result map, and then setting a density value in the region to be processed to zero to obtain the first density map.

Of course, the density value in the region to be processed may be an illegal value or a negative value or any other value that can be distinguished from other regions in practical use. In this way, when counting objects based on the density map, the processed area may not be counted, and the repetition of the counting is avoided, thereby affecting the counting accuracy.

With reference to the first aspect or any one implementation manner of the first to the sixth implementation manners of the first aspect, in a seventh implementation manner of the first aspect, the plurality of objects are counted according to the first detection result graph and the first density graph, specifically, a first number is determined according to the first detection result graph, and a second number is determined according to the first density graph, where the first number is a number of detection frames in the first detection result graph, the second number is determined by a density value in the first density graph, and then a sum of the first number and the second number is calculated to obtain a number of objects included in the target image.

With reference to the first aspect or any implementation manner of the first to seven implementation manners of the first aspect, in an eighth implementation manner of the first aspect, a visual result graph may be further generated according to the first detection result graph and the first density graph, where the visual result graph is used to show a distribution number and a density of objects in the geographic area.

In a second aspect, the present application provides an object counting apparatus, the apparatus comprising:

the acquisition module is used for acquiring a target image, wherein the target image is obtained by shooting a geographic area by a camera, and the target image comprises a plurality of objects to be counted;

the density estimation module is used for inputting the target image into a density estimation network to obtain an initial density map, and the density estimation network is used for estimating the density of an object in the target image;

the object detection module is used for inputting the target image into an object detection network to obtain a first detection result diagram, wherein the first detection result diagram comprises at least one detection frame, and each detection frame comprises part or all parts of an object;

the processing module is used for processing the density values in the initial density map according to the first detection result map to obtain a first density map;

And the counting module is used for counting the plurality of objects according to the first detection result graph and the first density graph to obtain the number of objects included in the target image.

With reference to the second aspect, in a first implementation manner of the second aspect, the obtaining module is further configured to:

obtaining K frame reference images, wherein the K frame reference images and the target image are continuous K+1 frame images in a video stream, and K is a positive integer;

the density estimation module is further configured to:

inputting the K frame reference images to the density estimation network to obtain an initial density image of each frame of reference images in the K frame reference images;

the processing module is further configured to:

and determining a static area in the target image according to the initial density map of each frame of reference image and the initial density map of the target image.

With reference to the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the counting module is further configured to:

determining counting areas of the first detection result graph and the first density graph according to the position of the static area in the target image;

counting objects in the static area according to the first detection result graph and the counting area of the first density graph to obtain the number of objects included in the static area in the target image;

The apparatus further comprises:

and the first alarm module is used for generating an alarm signal when the number of the objects included in the static area is not smaller than a number alarm threshold value, wherein the alarm signal is used for indicating the aggregation of the objects.

With reference to the first implementation manner of the second aspect, in a third implementation manner of the second aspect, the processing module is further configured to:

determining the geographic position of each object in the static area according to the pixel position of each object in the target image and the calibration parameters of the camera;

determining the distance from each object to other objects according to the geographic position of each object, and determining the comprehensive distance of the objects in the static area based on the distance from each object to other objects;

the apparatus further comprises:

and the second alarm module is used for generating an alarm signal when the comprehensive distance is not greater than the distance alarm threshold value, and the alarm signal is used for indicating the aggregation of objects.

With reference to the second aspect or any implementation manner of the first to third implementation manners of the second aspect, the counting module is specifically configured to:

dividing the first density map into a plurality of density block images;

Judging whether each density block image meets preset processing conditions or not according to the density value in each density block image;

setting the density value in the density block image meeting the preset processing conditions to be zero, and not processing the density block image which does not meet the preset processing conditions;

obtaining a second density map according to the processed density block image and the unprocessed density block image;

and counting the plurality of objects according to the first detection result graph and the second density graph.

With reference to the fourth implementation manner of the second aspect, in a fifth implementation manner of the second aspect, the preset processing condition includes:

the number of objects determined from the density values in the density block image is less than the image processing threshold; or,

the number of objects determined according to the density value in the density block image is not smaller than the image processing threshold, the number of detection frames of the corresponding detection area of the density block image in the first detection result image is smaller than the image processing threshold, and the detection area is determined according to the position of the density block image in the first density image.

With reference to the second aspect or any implementation manner of the first to fifth implementation manners of the second aspect, the processing module is specifically configured to:

And determining a region to be processed in the initial density map according to the position of each detection frame in the first detection result map, and setting zero for the density value in the region to be processed to obtain the first density map.

With reference to the second aspect or any implementation manner of the first to sixth implementation manners of the second aspect, the counting module is specifically configured to:

determining a first number according to the first detection result diagram and determining a second number according to the first density diagram, wherein the first number is the number of detection frames in the first detection result diagram, and the second number is determined by a density value in the first density diagram;

and calculating the sum of the first number and the second number to obtain the number of objects included in the target image.

With reference to the second aspect or any implementation manner of the first to seven implementation manners of the second aspect, the apparatus further includes:

the generation module is used for generating a visual result diagram according to the first detection result diagram and the first density diagram, and the visual result diagram is used for displaying the distribution number and density of the objects in the geographic area.

In a third aspect, the present application increases an object counting device comprising a processor and a memory; the memory is used for storing computer instructions; the processor is configured to perform the object counting method as in the first aspect or any one of the possible implementation manners of the first aspect according to the computer instructions.

In a fourth aspect, the present application provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method described in the first aspect or any one of the possible implementations of the first aspect.

In a fifth aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method as described in the first aspect or any one of the possible implementations of the first aspect.

Further combinations of the present application may be made to provide further implementations based on the implementations provided in the above aspects.

Drawings

FIG. 1 is a scene architecture diagram of an object counting method in an embodiment of the present application;

FIG. 2 is a flow chart of an object counting method in an embodiment of the present application;

FIG. 3 is a schematic view of the density distribution in the initial density map according to the embodiment of the present application;

FIG. 4 is a flow chart of a second round of optimization of a first density map in an embodiment of the present application;

FIG. 5 is a schematic diagram of a visual result diagram in an embodiment of the present application;

FIG. 6 is a flow chart of a method for locating a stationary region in an embodiment of the present application;

FIG. 7 is a schematic diagram of determining a still region mask in an embodiment of the present application;

FIG. 8 is a schematic diagram of a still region determination based on a still region mask in an embodiment of the present application;

FIG. 9 is a flow chart of a method for locating a stationary region in an embodiment of the present application;

FIG. 10 is a flow chart of a method for locating a stationary region in an embodiment of the present application;

FIG. 11 is a schematic illustration of a pedestrian imaged by a camera in an embodiment of the present application;

fig. 12 is a schematic structural diagram of an object counting apparatus in an embodiment of the present application.

Detailed Description

The application provides a method for counting objects by combining target detection and density estimation technologies, which can improve the accuracy of object counting. The method can also adapt to dense scenes and sparse scenes, and has good compatibility.

Embodiments of the present application are described below with reference to the accompanying drawings. As one of ordinary skill in the art can appreciate, with the development of technology and the appearance of new scenes, the technical solutions provided in the embodiments of the present application are applicable to similar technical problems.

The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely illustrative of the manner in which the embodiments of the application described herein have been described for objects of the same nature. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

It is understood that the object counting method provided in the present application can be applied to any processing device having image processing capability. The processing device may be a terminal having a central processor (Central Processing Unit, CPU) and/or a graphics processor (Graphics Processing Unit, GPU), including but not limited to a personal computer (Personal Computer, PC), workstation, etc. The processing device may also be a server with a CPU and/or GPU, and the server may be a stand-alone server or a cluster of servers. In some cases, the terminal and the server may also cooperatively implement the above-described object counting method.

In practical applications, the object counting method provided by the application includes, but is not limited to, application in an application environment as shown in fig. 1.

As shown in fig. 1, the server 120 is connected to the camera 140 and the terminal 160 via a network. The camera 140 may capture an image of a target in a geographic area, such as a square, a station, etc. The target image includes a plurality of objects to be counted, and the server 120 is provided with an object counting device 1200, where the object counting device 1200 includes an acquisition module 1201, a density estimation module 1202, an object detection module 1203, a processing module 1204, and a counting module 1205. The acquisition module 1201 may acquire the target image from the camera 140, the density estimation module 1202 includes a density estimation network, and the density estimation module 1202 may input the target image acquired by the acquisition module 1201 to the density estimation network to obtain an initial density map. The object detection module 1203 includes an object detection network, and the object detection module 1203 may input the target image acquired by the acquisition module 1201 to the object detection network to obtain a first detection result map. The processing module 1204 may process the density values in the initial density map output by the density estimation module 1202 according to the first detection result map output by the object detection module 1203, to obtain a first density map. The counting module 1205 may count a plurality of objects according to the first detection result graph output by the object detecting module 1203 and the first density graph output by the processing module 1204, to obtain the number of objects included in the target image, and further, the server 120 may output the number of objects included in the target image obtained by the counting module 1205 in the object counting device 1200 to the terminal 160 for display.

Next, each step of the object counting method provided in the embodiment of the present application will be described in detail from the perspective of the server.

Referring to the flow chart of the object counting method shown in fig. 2, the method comprises:

s201, acquiring a target image.

The target image comprises a plurality of objects to be counted. Wherein the object may be any countable entity, and in different application scenarios, the object may be a different entity. For example, in a security scene, the object may be a pedestrian, and the number of pedestrians is counted based on the target image, so that the crowd gathering condition can be monitored, and the occurrence of malignant events such as trampling is avoided. For another example, in the traffic control scenario, the object may be a vehicle, and by counting the number of vehicles on the road, the road congestion condition can be monitored, and congestion information is timely prompted to other vehicles, so as to avoid congestion aggravation.

It can be understood that a camera is generally deployed in a geographical area such as a square, a road and the like where the aggregation condition of objects needs to be monitored, the camera at least comprises a camera, the camera can shoot the geographical area through the camera to obtain a target image, and the server can acquire the target image from the camera. The server may receive the target image sent by the camera, or may actively obtain the target image from the camera. The server may acquire the target image in real time, or may acquire the target image according to a preset period, which is not limited in this embodiment.

S202, inputting the target image to a density estimation network to obtain an initial density map.

The density estimation network is specifically a network for estimating the density of an object in an image, and takes the image as an input, takes a density map corresponding to the image as an output, and in order to distinguish the density map from other density maps, the density map directly output by the density estimation network is recorded as an initial density map. The size of the initial density map is the same as the size of an image (hereinafter referred to as an input image) of the input density estimation network, each pixel point in the initial density map corresponds to a pixel point in the input image one by one, and the pixel value of each pixel point in the initial density map can represent the density value of the corresponding pixel point in the input image.

Referring specifically to fig. 3, the pixel values of each pixel point in the initial density map are shown, where each pixel value is used to represent the density of the corresponding pixel point in the input image, so that the initial density map may show the density distribution, and thus the object distribution. Specifically, the region where the pixel value is 0 in the initial density map is clustered can represent that no object exists in the corresponding region of the input image, the region where the pixel value is not 0 in the initial density map is clustered can represent that an object exists in the corresponding region of the input image, and the object in the input image can be represented by a plurality of pixels whose pixel values are in gaussian distribution in the initial density map, so that the server can count the objects according to the pixel values of the plurality of pixels.

Taking into account the difference in size of objects in the distant view and the close view, different variances of gaussian distributions may be used to characterize objects of different sizes. In the example of fig. 3, a gaussian distribution with variance 3*3 may be used to characterize objects in the distant view, as shown at 301, and a gaussian distribution with variance 5*5 and a gaussian distribution with variance 7*7 may be used to characterize objects in the close view, as shown at 302 and 303.

In this embodiment, the density estimation network may be obtained by training a deep learning model. The deep learning model may be a convolutional neural network model, for example, a model such as multi-column convolutional neural network (multi-column convolutional neural network, MCNN), scale-adaptive convolutional neural network (SaCNN), or congestion scene recognition network (congested scene recognition net, CSRNet).

Taking CSRNet as an example, the network includes at least a feature extraction layer and a feature mapping layer. The feature extraction layer can be divided into a front end and a rear end, and the front end and the rear end are respectively used for extracting semantic features of different layers. In specific implementation, the front end may be a visual geometry group (visual geometry group, VGG) network with all connection layers removed, the VGG network may be 16-layer VGG16 or 19-layer VGG19, taking VGG16 as an example, VGG16 with all connection layers removed mainly includes convolution layers and pooling layers, for example, 10-layer convolution layers and 3-layer pooling layers, and the rear end may be a cavity convolution neural network dilated convolution, for example, a cavity convolution neural network composed of 6-layer cavity convolution layers, where compared with ordinary convolution, cavity convolution can expand receptive fields through sparse convolution kernels, and extract semantic features of deeper layers. The feature mapping layer is mainly used for mapping high-dimensional features to a low-dimensional mode space, and specifically, the feature mapping layer can be a 1*1 convolution layer, and can map multi-channel features extracted by the feature extraction layer into single-channel features, so that a high-quality density map is obtained.

When training the CSRNet, a plurality of pixel points with Gaussian pixel values can be used as labels of objects in the image, so that a training sample can be obtained. And inputting the training samples into the CSRNet in batches, wherein the CSRNet can predict the pixel values of all pixel points in the training samples, calculate the loss according to the predicted pixel values and pixel value labels, and update the parameters of the CSRNet based on the loss so as to realize model training. When the loss meets the training end condition, such as the loss tends to converge, training may be stopped, and the trained CSRNet may be used as a density estimation network to estimate the density of the object in the target image.

It should be noted that the image is reduced after passing through the pooling layer, for example, the image is reduced by half every time it passes through one pooling layer, for this purpose, an expansion layer may be added after CSRNet, and this expansion layer may expand the resolution of the density map output by the 1*1 convolution layer in CSRNet, to obtain an initial density map with the same size as the target image. For example, where the CSRNet front-end includes a 3-layer pooling layer, the size of the density map output by the 1*1 convolution layer in CSRNet is 1/8 of the target image, and the expansion layer may expand the density map resolution by a factor of 8 and then divide the pixel values by 64, thereby obtaining the initial density map.

S203, inputting the target image to an object detection network to obtain a first detection result graph.

The object detection network is in particular a network that detects objects in an image. The object detection network takes a target image as input and takes a first detection result diagram as output, wherein the first detection result diagram comprises at least one detection frame, and each detection frame comprises part or all parts of an object. For example, in a crowd counting scenario, the detection frame may include a person or a head-shoulder portion of a person, and in a vehicle counting scenario, the detection frame may include a whole vehicle or a head-head portion, for example.

In particular implementations, the object detection network may be generated by any training of the object detection algorithm. Among other algorithms, the target detection algorithm may be a look-only-once (you only look once, yolo), a single-point multi-anchor detector (single shot multi-box detector, SSD), a Faster area-generation network based convolutional neural network (Faster region proposal network-convolutional neural network, fast-RCNN), or a feature pyramid network (feature pyramid networks, FPN).

Taking the third version of the Yolo algorithm, yolo v3 as an example, the Yolo v3 network mainly comprises a feature extraction layer, a feature conversion layer and a feature mapping layer. The feature extraction layer may be implemented by dark network dark net, specifically, dark net53 with the full connection layer removed may be used as the feature extraction layer, where the feature extraction layer includes 52 layers of convolution layers, the 52 layers of convolution layers are divided into three segments (i.e. 3 stages) according to depth, 1-26 layers of convolution layers are stage1, 27-43 layers of convolution layers are stage2, 44-52 layers of convolution layers are stage3, and the convolution layers of different stages have different receptive fields, so that detection of objects with different scales can be implemented by the convolution layers of different stages. The feature conversion layer is used for converting the features extracted by the feature extraction layer into low-dimensional features and inputting the low-dimensional features into the feature mapping layer for mapping. In order to improve the detection rate of objects with different scales, the feature conversion layer and the feature mapping layer are realized by adopting a plurality of branches, for example, the objects with different scales can be predicted through three branches.

In the training stage, all or part of the object in the image, such as the head and shoulder of a person, can be marked to obtain a training sample, the training sample is input into the Yolo v3 in batches to train, the Yolo v3 can detect all or part of the object in the training sample, such as the head and shoulder, and then the loss is calculated according to the detection result and marking information, and the parameters of the Yolo v3 are updated based on the loss, so that the Yolo v3 training is realized, when the Yolo v3 meets the training ending condition, such as the fact that the loss of the Yolo v3 tends to converge, the trained Yolo v3 can be used as an object detection network to detect the object in the image.

Considering that some objects with smaller sizes, namely small targets, possibly exist in the target image, the server can also adopt a pyramid image input mode, so that the detection rate of the small targets is improved. In this embodiment, the pyramid image input mode specifically refers to that an input target image is segmented to obtain a plurality of sub-images as input of an object detection network, and then a server may detect objects in the plurality of sub-images by using the object detection network to obtain a detection result sub-image corresponding to each sub-image.

The detection result graph obtained by the object detection network detecting the object in the target image can be recorded as an initial detection result graph, and the server can superimpose the plurality of detection result subgraphs with the initial detection result graph according to the positions of the sub-images in the target image, so that fusion of the detection result subgraphs and the initial detection result graph is realized.

Considering that the fused detection result graphs may have a situation that the detection frames overlap, the server may further perform non-maximum suppression (Non Maximum Suppression, NMS) on the fused detection result graphs, specifically, for a plurality of detection frames with an overlapping rate, that is, an intersection ratio (Intersection over Union, ioU), greater than an overlapping rate threshold, a detection frame with the highest probability is reserved, and a detection frame with a smaller suppression probability is reserved, so that redundant detection frames are removed, and a first detection result graph is obtained. The first detection result graph fuses the detection result of the object in the sub-image, so that the omission of a small object is avoided, the detection result of overlapping of the object image and the sub-image is filtered through the NMS, repeated counting is avoided, and therefore counting accuracy is improved.

It should be noted that S202 and S203 may be executed in parallel, or may be executed according to a set sequence, which is not limited in this embodiment.

S204, processing the density values in the initial density map according to the first detection result map to obtain a first density map.

And S205, counting a plurality of objects according to the first detection result graph and the first density graph to obtain the number of objects included in the target image.

The first detection result diagram comprises at least one detection frame, each detection frame comprises part or all parts of an object, and object counting can be realized by counting the detection frames. The pixel value of each pixel point in the initial density map represents the density value of the corresponding pixel point in the target image, and the number of objects in the target image can be estimated based on the density value, so that object counting is realized.

The first detection result graph has higher detection rate for the objects in the sparse area, and the object in the dense area is likely to have missed detection, and the accuracy of object counting is not high directly based on the first detection result graph.

When the server counts the objects by combining the initial density map, the server needs to process the initial density map first, so as to avoid that the first detection result map and the initial density map repeatedly count the same object. Specifically, the server may determine the area to be processed in the initial density map according to the position of each detection frame in the first detection result map, set the density value in the area to be processed to zero, so as to obtain the first density map, and then determine the number of detection frames in the first detection result map, that is, the first number, according to the first detection result map, determine the second number according to the density value in the first density map, calculate the sum of the first number and the second number, and obtain the number of objects included in the target image.

Wherein, determining the second number according to the density values in the first density map may be achieved by summing the density values in the first density map. Specifically, the server may sum the density values in the first density map, and determine the second number based on the sum value, where it is noted that when the sum value is not an integer, the sum value may be rounded to obtain the second number according to a rounding or the like principle. The zeroing of the density values of the areas to be processed in the initial density map is only one exemplary implementation manner of processing the density values in the initial density map, and in other possible implementations of the application, the server may also set the density values of the areas to be processed in the initial density map to illegal values or negative values, and so on, and correspondingly, when determining the second number according to the first density map, the server may sum the legal values or sum the positive values, and then round the sum value, thereby obtaining the second number.

As can be seen from the foregoing, the embodiment of the present application provides an object counting method, which obtains an initial density map of a target image through a density estimation network, obtains a first detection result map of the target image through an object detection network, processes a density value in the initial density map according to the first detection result map to obtain a first density map, and counts objects according to the first detection result map and the first density map, so as to implement that the object counting method uses a density estimation algorithm to make up for missed detection of the target detection algorithm, uses the target detection algorithm to correct false detection of the density estimation algorithm, fully plays respective advantages of the target detection algorithm and the density estimation algorithm, and improves object counting accuracy. Moreover, the method can be suitable for sparse scenes or dense scenes, and has good compatibility.

It will be appreciated that S204 in the embodiment of FIG. 2 corresponds to a first round of optimization of the initial density map, and thus the first density map may also be referred to as a first round of optimized density mapWherein t represents the frame number of the target image in the video stream, and t is a positive integer. Considering that the density estimation network may misdetect complex texture regions such as shrub regions as dense objects in a sparse scene, the server may also perform +. >Further optimization is performed on the basis of the density noise filtered out, so that a second density map, namely a second round of optimized density map +.>Correspondingly, in S205, the server may be based on the first detection result map and by the first density map +.>Second density map +.obtained by performing the second round of optimization>The plurality of objects are counted, and thus, the counting accuracy can be further improved. The second round of optimization of the density map is described in detail below.

Referring to the flow chart of the second round of optimization of the density map shown in fig. 4, the method specifically comprises the following steps:

s401, dividing the first density map into a plurality of density block images.

In a specific implementation, the server may divide the first density map into a plurality of density block images in an equally divided manner, so that each density block image may determine whether it belongs to density noise using the same criteria. The number of divided density block images may be set according to an empirical value, and in one example, the server may divide the first density map into 16 density block images of the same size in a 4x4 division manner.

S402, judging whether each density block image meets preset processing conditions according to the density value in each density block image.

S403, setting the density value in the density block image meeting the preset processing condition to zero, and not processing the density block image which does not meet the preset processing condition.

For any density block image, when the number of objects determined based on the density value in the density block image is small, the density block image can be considered to reflect the density value of the sparse scene, and the object detection network trained based on the target detection algorithm has a higher detection rate for the objects in the sparse scene, so that when the objects in the sparse scene are counted, the density value in the density block image can be set to zero, the first detection result image output by the first object detection network is directly utilized for counting, and the false detection of the density block image is prevented from influencing the counting precision. Based on this, the preset processing condition may include that the number of objects determined from the density values in the density block image is smaller than the image processing threshold.

In some possible implementations, when the number of objects determined based on the density value in the density block image is larger, the detection result of the object detection network may be further combined to determine whether the density block image is misdetected, for example, a region with complex texture such as a shrub region is misdetected as an object gathering region, if so, the density value in the density block image may be set to zero, and the misdetection is corrected by using the first detection result graph, so as to avoid affecting the counting precision. Based on this, the preset processing condition may also include that the number of objects determined according to the density value in the density block image is not less than the image processing threshold, and the number of detection frames of the corresponding detection area of the density block image in the first detection result map is less than the image processing threshold.

Wherein the detection area may be determined from the position of the density patch image in the first density map. For example, when the first density map is divided into 16 density block images according to the 4*4 division manner, the detection areas corresponding to the density block images located in the first row and the first column in the first detection result map are specifically the upper left corner area of the first detection result map, and the side length of the detection area is 1/4 of the side length of the first detection result map.

Further, when determining whether the density block image is falsely detected in combination with the detection result of the object detection network, in order to improve the accuracy, the server may further process the density block image according to the detection frame of the corresponding detection area in the first detection result graph when the number of objects determined according to the density value in the density block image is greater than or equal to the image processing threshold, specifically, set the density value in the area where the area in the detection frame is mapped to the density block image to zero, so as to obtain a processed density block image, and if the number of objects determined according to the processed density block image is still greater, determine that the density block image is falsely detected. Based on this, the preset processing condition may also include that the number of detection frames of the detection area corresponding to the density block image in the first detection result map is smaller than the image processing threshold, and the number of objects determined according to the density value in the processed density block image is not smaller than the image processing threshold.

When the density block image is processed, the server may also set the density values in the area where the inner area of the detection frame and the partial area outside the detection frame are mapped to the density block image to zero, where the partial area outside the detection frame may be an annular area with the detection frame as an inner edge and the distance from the outer edge to the inner edge being half of the side length of the detection frame.

When the density block image meets any one or more of the preset processing conditions, setting the density value in the density block image to zero; when the density block image does not meet any of the preset processing conditions, the density value in the density block image is not processed.

The image processing threshold may be set according to an empirical value, where the image processing threshold has a certain correlation with a size of the density block image, a smaller preset threshold may be set when the size of the density block image is smaller, and a larger preset threshold may be set when the size of the density block image is larger. In one example, where the first density map is divided into 16 density block images, each density block image is relatively small in size, the image processing threshold may be set to 3.

S404, obtaining a second density map according to the processed density block image and the unprocessed density block image.

Specifically, the server splices the processed density block image and the unprocessed density block image according to the positions of the density block images in the first density map to obtain a second density map. On the basis of the first density map, the second density map also sets the density value of the region with sparse objects to zero, and the density value of the region with false detection is set to zero, so that object counting is performed according to the first detection result map and the second density map, and the counting precision can be further improved.

After the object counting is realized, the server can also present the counting result in a visual mode. Specifically, the server may generate a visualized result graph according to the first detection result graph and the processed density graph. As shown in fig. 5, the visualization result chart 500 may be used to show the distribution number and density of the objects in the geographic area, specifically, a detection frame 501 is displayed in the visualization result chart 500, a part of the parts including the objects in each detection frame 501, for example, the head and shoulder of a person, further, a detection block 502 is also displayed in the visualization result chart 500, the detection block 502 is a pixel block composed of a plurality of pixels with gaussian distribution of pixel values, the number of the objects not detected in the first detection result chart can be determined based on the number of the detection frames 501 and the number of the detection blocks 502, and the distribution of the objects in the geographic area can be determined based on the distribution of the detection frames 501 and the distribution of the detection blocks 502. It should be noted that, there is a difference between the object sizes in the distant view and the close view, so when the detection block 502 is used to identify the object, the detection block 502 composed of different numbers of pixels may be used to identify the object with different sizes.

The processed density map may be the first density map or the second density map. In specific implementation, the first detection result graph, the first density graph and the second density graph have the same size, the pixel points in the first detection result graph are in one-to-one correspondence with the pixel points in the first density graph or the pixel points in the second density graph, and the server can perform weighted operation on the pixel values of the first detection result graph and the processed density graph (the first density graph or the second density graph) according to the pixel points, wherein the operation result is used as the pixel value of the corresponding pixel point in the visual result graph.

The single frame image can reflect the number of objects at a certain moment, and in consideration of the situation that the object flow is severe, such as a scene of a station, a street and other pedestrians flowing is severe, the server can also locate the static area based on the relevance between initial density maps corresponding to the multi-frame images, so that the server can further determine the distribution situation of the objects in the static area based on the static area, and when the distribution situation of the objects in the static area represents the object aggregation, the server can timely alarm to avoid adverse effects.

Referring to the flow chart of the still region locating method shown in fig. 6, on the basis of the embodiment shown in fig. 2, the method further includes:

S601, acquiring a K frame reference image.

Wherein the K frame reference image and the target image are videosK+1 frames of images are consecutive in the stream, K being a positive integer. In specific implementation, the server may obtain a video frame sampling sequence i= { I from a video stream captured by the camera _t-K ,...I _t-1 ,I _t In this way, in the acquisition of the target image I _t At the same time, acquire I _t Previous K frame reference image I _t-K ,...I _t-1 。

S602, inputting K frame reference images into a density estimation network, and obtaining an initial density map of each frame of reference images in the K frame reference images.

The specific implementation of S602 may be described in reference to S202, which is not described herein.

S603, determining a static area in the target image according to the initial density map of each frame of reference image and the initial density map of the target image.

In particular implementations, referring to FIG. 7, the server may map the initial density { D ] of the K-frame reference image and the target image _t-K ,D _t-(K-1) ,...D _t And processing is performed sequentially, wherein the process of performing the and processing on the initial density map can be seen in the following formula:

wherein i is any one of t-K and … t, m and n are used for identifying pixel positions in the initial density map, and m and n are positive integers.

The server can then binarize and expand the processed results to obtain a still region mask Wherein, the static area mask->It can be divided into two parts, one part being a block of highlighted pixels for characterizing the stationary region and the other part being a block of non-highlighted pixels for characterizing the non-stationary region.

The pixel value of the highlight pixel block is the objectThe target pixel value may be 1 or 255, and based on this, the above-mentioned high bright pixel block may also be called target pixel block, which is used to characterize the still region maskThe middle pixel value is the pixel block of the target pixel value. In particular implementations, the server may determine the minimum circumscribed area of the target pixel block as the stationary area.

Referring to FIG. 8, a server may mask from a static region using findContours functions in an open source computer vision library (Open Computer Vision, openCV)And then determining the minimum circumscribed rectangle R of the outline by using a boundingRect function in OpenCV, so that the server can determine the minimum circumscribed rectangle R as a candidate region.

It should be noted that, fig. 8 is an example of the minimum circumscribed rectangle, and in other possible implementations, the minimum circumscribed area may also be other shapes, for example, a minimum circumscribed circle, etc., which is not limited in this embodiment.

Further, the number of objects in the static area may reflect the object aggregation situation, and the server may also count the objects in the static area to determine the object aggregation situation in the static area, and alarm according to the object aggregation situation.

Referring to fig. 9, on the basis of the embodiment shown in fig. 6, the method further includes:

and S901, determining a counting area of a first detection result diagram and a first density diagram according to the position of the static area in the target image.

It can be understood that the sizes of the target image and the first detection result graph and the first density graph are the same, the pixel points in the target image and the first detection result graph and the pixel points in the first density graph are in one-to-one correspondence, and the server can obtain a pixel point m "corresponding to the pixel coordinate in the first detection result graph and a pixel point m" corresponding to the first density graph according to the pixel coordinate of the pixel point m of the static area in the target image, wherein the area formed by the pixel point m "in the first detection result graph is the counting area of the first detection result graph, and the area formed by the pixel point m" in the first density graph is the counting area of the first density graph.

And S902, counting objects in the static area according to the first detection result graph and the counting area of the first density graph, and obtaining the number of the objects included in the static area in the target image.

Specifically, the server may sum the number of detection frames included in the count region of the first detection result map and the number of objects determined from the count region of the first density map, to obtain the number of objects included in the stationary region in the target image.

In some possible implementations, the server may also determine a counting area of the second density map according to the position of the stationary area in the target image, and count the objects in the stationary area according to the first detection result map and the counting area that the second density map can count, so as to obtain the number of the objects included in the stationary area in the target image.

S903, generating an alarm signal when the number of objects included in the still region is not less than the number alarm threshold.

When the number of the objects included in the static area is not smaller than the number alarm threshold, the static area is indicated to have more object aggregation, and the server can generate an alarm signal to indicate the object aggregation. Depending on the type of alarm, the server may generate different types of alarm signals. In some possible implementations, the server may generate a voice alarm signal, the terminal may play the alarm voice through the speaker to remind when receiving the voice alarm signal, and of course, the server may also generate an image-text alarm signal, the terminal may display an alarm text and/or an alarm image in the display to remind when receiving the image-text alarm signal, and in addition, the server may also generate an indicator alarm signal, and control the on/off, flashing or color of the indicator to remind based on the indicator alarm signal.

The embodiments shown in fig. 6 to 9 are to locate a still region for a target image and count objects in the still region, and in practical application, the server may also locate the still region for each frame image and count objects for the still region of each frame image, so that the flow trend of the objects in the still region can be determined.

In some possible implementations, the server may further determine a distance between objects included in the static area in the target image, determine an object aggregation situation according to the distance between the objects, and alarm the objects according to the object aggregation situation, so as to avoid false alarms caused by differences between the pixel position and the geographic position.

Referring to fig. 10, on the basis of the embodiment shown in fig. 6, the method further includes:

s1001, determining the geographic position of each object according to the pixel position of each object in the static area in the target image and the calibration parameters of the camera. In particular implementations, the server may determine the pixel position of each object in the target image based on the stationary focus region and camera calibration parameters such as the depression angle α and focal length f _x And f _y The geographic location of each object is determined.

Specifically, the server may set up a coordinate system with the camera as the origin of the spatial coordinate system and the direction of the focal length as the coordinate axis, and then calculate the geographic position of each object based on the similar triangle relationship among the object, the plane of the camera, and the plane of the target image.

For ease of understanding, a crowd counting scenario is illustrated as an example. Referring to the schematic view of pedestrian imaging by the camera shown in fig. 11, the pixel positions of the pedestrian head and shoulder in the target imageWhere N is the number of pedestrians. Let the pixel position of the mth pedestrian head and shoulder in the target image be +.>x _m 、y _m Respectively representing the abscissa and the ordinate of the head and the shoulder of the mth pedestrian on the plane of the target image, h _m 、w _m Characterizing the height and width of the mth pedestrian head and shoulder in the target image respectively, the server can solve the world coordinate system X centered on the camera based on the similar triangle simultaneous equations ^r Y ^r Z ^r In the geographical position of pedestrians->

As shown in fig. 11, assuming that the target image has undergone distortion correction, based on a general rule, the actual height of the head and shoulder of the pedestrian is assumedAccording to the rule of similar triangles, the distance of a pedestrian from the camera is +.>The method comprises the following steps:

assume that the coordinates of the center point of the picture are (x ₉ ,y ₉ ) The geographical location of the pedestrianThe method comprises the following steps:

/>

combining the above formulas (2) to (4) can obtain:

and S1002, determining the distance from each object to other objects according to the geographic position of each object, and determining the comprehensive distance of the objects in the static area based on the distance from each object to other objects.

In particular implementations, the server may calculate a distance from each object to the other objects by coordinates of each object in the spatial coordinate system, and may determine a composite distance of the objects in the stationary region from the distances from each object to the other objects in the stationary region.

In some possible implementations, the server may determine the neighboring object of each object from the distance of each object to other objects in the stationary region, and then determine the composite distance of the objects in the stationary region from the distances of the respective objects to the neighboring objects, e.g., the distances of the respective objects to the neighboring objects may be maximized or averaged to obtain the composite distance of the objects in the stationary region. The integrated distance is used to characterize the distribution of objects within the stationary region. Specifically, a smaller comprehensive distance indicates that the whole objects in the static area are gathered in a comparison way, and a larger comprehensive distance indicates that the whole objects in the static area are scattered in a comparison way.

Wherein the adjacent object of each object may be the object closest to each object. In some cases, for example, the static area includes at least two groups of objects, the objects in the groups are closer, and the objects between the groups are farther, in order to avoid that the objects in the groups are adjacent to each other, resulting in the distortion of the comprehensive distance of the objects in the static area, the matched objects can be eliminated when determining the adjacent objects. I.e. the neighboring object of each object may also be the closest object to each object among the unmatched objects. For ease of understanding, the specific examples are described in connection. In this example, if the object with the smallest distance is determined as the object b, it is determined that the object a is matched, and the object b is a neighboring object of the object a, and further, when determining the neighboring object, it is possible to determine the object with the smallest distance from objects other than the object a as the neighboring object of the object b.

The process of determining the integrated distance is described in detail below in connection with crowd counting scenarios. For each pedestrian i, the distance of that pedestrian i to the adjacent pedestriansThe method comprises the following steps:

wherein i and j take any two integers from 1 to L, L represents the number of pedestrians in the static aggregation area, and ψ (i) is an object set which is matched with the pedestrian i currently.

Correspondingly, the comprehensive distance may specifically be:

and S1003, when the integrated distance is not greater than a distance alarm threshold, generating an alarm signal, wherein the alarm signal is used for indicating object aggregation.

When the comprehensive distance is smaller than or equal to the distance alarm threshold, the distance of the objects in the static area is smaller as a whole, the distribution of the objects is more dense, and the server can generate an alarm signal to indicate the aggregation of the objects. Wherein the distance alarm threshold may be set according to an empirical value, which may be set to 1 meter in one example, which is not limited in this embodiment.

In this embodiment, the distance from the object to the adjacent object is taken as the maximum value or the average value to be illustrated as the integrated distance, other indexes may be adopted as the integrated distance in practical application, for example, the minimum value of the distance from each object to the adjacent object may be adopted as the integrated distance of the objects in the static area, when the integrated distance is greater than the distance alarm threshold, it is indicated that the distance between the objects in the static area is greater as a whole, it may be determined that the distribution of the objects in the static area is relatively sparse, and the server may continuously monitor the integrated distance between the objects in the static area.

It should be noted that, for simplicity of description, the above method embodiments are all described as a series of action combinations, but those skilled in the art should appreciate that the present application is not limited by the described order of actions.

Other reasonable combinations of steps that can be conceived by those skilled in the art from the foregoing description are also within the scope of the present application. Further, those skilled in the art will also be familiar with the preferred embodiments, and the description of the embodiments is not necessarily a requirement of the present application.

The object counting method provided in the present application is described in detail above with reference to fig. 1 to 11, and the object counting apparatus and device provided in accordance with the present application will be described below with reference to the accompanying drawings.

Referring to a schematic structural diagram of an object counting apparatus in a system architecture diagram shown in fig. 1, an object counting apparatus 1200 includes:

an acquisition module 1201, configured to acquire a target image, where the target image is obtained by capturing a photograph of a geographic area with a camera, and the target image includes a plurality of objects to be counted;

a density estimation module 1202 for inputting the target image to a density estimation network, to obtain an initial density map, the density estimation network being used for estimating the density of the object in the target image;

The object detection module 1203 is configured to input the target image to an object detection network, and obtain a first detection result graph, where the first detection result graph includes at least one detection frame, and each detection frame includes a part or all of a part of an object;

the processing module 1204 is configured to process the density values in the initial density map according to the first detection result map, so as to obtain a first density map;

and a counting module 1205, configured to count the plurality of objects according to the first detection result graph and the first density graph, to obtain the number of objects included in the target image.

Optionally, the obtaining module 1201 is further configured to:

the density estimation module 1202 is further configured to:

the processing module 1204 is further configured to:

Optionally, the counting module 1205 is further configured to:

the apparatus further comprises:

Optionally, the processing module 1204 is further configured to:

the apparatus further comprises:

Optionally, the processing module 1204 is further configured to:

dividing the first density map into a plurality of density block images;

the counting module 1205 is specifically configured to:

Optionally, the preset processing conditions include:

Optionally, the processing module 1204 is specifically configured to:

Optionally, the counting module 1205 is specifically configured to:

Optionally, the apparatus further comprises:

The object counting apparatus according to the embodiments of the present application may correspond to performing the methods described in the embodiments of the present application, and the above and other operations and/or functions of each module in the object counting apparatus are respectively for implementing the corresponding flows of each method in fig. 2, fig. 4, fig. 6, fig. 9, and fig. 10, which are not described herein for brevity.

It should be further noted that the embodiments described above are merely illustrative, and that the modules described as separate components may or may not be physically separate, and that components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the application, the connection relation between the modules represents that the modules have communication connection therebetween, and can be specifically implemented as one or more communication buses or signal lines.

Fig. 12 is a schematic diagram of an object counting device 100 according to an embodiment of the present application, where the object counting device 100 includes a processor 101, a memory 102, a communication interface 103, and a bus 104. The processor 101, the memory 102, and the communication interface 103 communicate via the bus 104, or may communicate via other means such as wireless transmission. The memory 102 stores executable program code and the processor 101 may invoke the program code stored in the memory 102 to perform the object counting method in the method embodiments described above.

It should be appreciated that in the embodiments of the present application, the processor 101 may be a central processing unit CPU, and the processor 101 may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), field programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or any conventional processor or the like.

The memory 102 may include read only memory and random access memory and provides instructions and data to the processor 101. The memory 102 may also include non-volatile random access memory. For example, the memory 102 may also store training data sets.

The memory 102 may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. The volatile memory may be random access memory (random access memory, RAM) which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), dynamic Random Access Memory (DRAM), synchronous Dynamic Random Access Memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), and direct memory bus RAM (DR RAM).

The bus 104 may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. But for clarity of illustration, the various buses are labeled as bus 104 in the figures.

It should be understood that the object counting apparatus 100 according to the embodiment of the present application may correspond to the object counting device in the embodiment of the present application and may correspond to the execution of the respective subject matters in the methods shown in fig. 2, 4, 6, 9, and 10 according to the embodiment of the present application, and that the foregoing and other operations and/or functions of the respective devices in the object counting apparatus 100 are respectively for realizing the respective flows of the respective methods in fig. 2, 4, 6, 9, and 10, and are not repeated herein for brevity.

From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general purpose hardware, or of course may be implemented by dedicated hardware including application specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components and the like. Generally, functions performed by computer programs can be easily implemented by corresponding hardware, and specific hardware structures for implementing the same functions can be varied, such as analog circuits, digital circuits, or dedicated circuits.

However, a software program implementation is a preferred embodiment in many cases for the present application. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a readable storage medium, such as a floppy disk, a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk or an optical disk of a computer, etc., including several instructions for causing a computer device (which may be a personal computer, a training device, or a network device, etc.) to perform the method described in the embodiments of the present application.

In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.

The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus.

The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via a wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be stored by a computer or a data storage device such as a training device, a data center, or the like that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy Disk, a hard Disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.

Claims

1. An object counting method, comprising:

acquiring a target image, wherein the target image is obtained by shooting a geographic area by a camera, and the target image comprises a plurality of objects to be counted;

inputting the target image to a density estimation network to obtain an initial density map, wherein the density estimation network is used for estimating the density of an object in the target image;

Inputting the target image to an object detection network to obtain a first detection result diagram, wherein the first detection result diagram comprises at least one detection frame, and each detection frame comprises part or all parts of an object;

processing the density values in the initial density map according to the first detection result map to obtain a first density map;

correcting the first density map according to the first detection result map, counting a plurality of objects according to the first detection result map and the corrected first density map to obtain the number of objects included in the target image, and setting a false detection region in the corrected first density map to zero.

2. The method according to claim 1, wherein the method further comprises:

and determining a static area in the target image according to the initial density map of each frame of reference image and the initial density map of the target image, wherein the static area is an area determined by performing AND processing on the initial density map of the reference image and the initial density map of the target image.

3. The method according to claim 2, wherein the method further comprises:

and when the number of the objects included in the static area is not less than the number alarm threshold, generating an alarm signal, wherein the alarm signal is used for indicating the aggregation of the objects.

4. A method according to claim 2 or 3, characterized in that the method further comprises:

determining the distance from each object to other objects according to the geographic position of each object, and determining the comprehensive distance of the objects in the static area based on the distance from each object to other objects, wherein the comprehensive distance is used for representing the distribution situation of the objects in the static area;

And when the comprehensive distance is not greater than the distance alarm threshold value, generating an alarm signal, wherein the alarm signal is used for indicating the aggregation of objects.

5. The method of any one of claims 1 to 4, wherein said modifying said first density map from said first test result map comprises:

dividing the first density map into a plurality of density block images;

and obtaining a second density map according to the processed density block image and the unprocessed density block image, wherein the corrected first density map comprises the second density map.

6. The method of claim 5, wherein the preset processing conditions comprise:

7. The method according to any one of claims 1 to 6, wherein processing the initial density map according to the first detection result map to obtain a first density map includes:

8. The method according to any one of claims 1 to 7, wherein the counting the plurality of objects from the first detection result map and the corrected first density map to obtain the number of objects included in the target image includes:

determining a first number according to the first detection result diagram and determining a second number according to the corrected first density diagram, wherein the first number is the number of detection frames in the first detection result diagram, and the second number is determined by a density value in the corrected first density diagram;

9. The method according to any one of claims 1 to 8, further comprising:

And generating a visual result diagram according to the first detection result diagram and the corrected first density diagram, wherein the visual result diagram is used for displaying the distribution number and density of the objects in the geographic area.

10. An object counting apparatus, comprising:

the acquisition module is used for acquiring a target image, wherein the target image is obtained by shooting a geographic area through a camera and comprises a plurality of objects to be counted;

the counting module is used for correcting the first density map according to the first detection result map, counting a plurality of objects according to the first detection result map and the corrected first density map to obtain the number of objects included in the target image, and setting the false detection area in the corrected first density map to zero.

11. The apparatus of claim 10, wherein the acquisition module is further configured to:

the density estimation module is further configured to:

the processing module is further configured to:

12. The apparatus of claim 10, wherein the counting module is further to:

The apparatus further comprises:

13. The apparatus of claim 11 or 12, wherein the processing module is further configured to:

the apparatus further comprises:

14. The apparatus according to any one of claims 10 to 13, wherein the counting module is specifically configured to:

dividing the first density map into a plurality of density block images;

15. The apparatus of claim 14, wherein the preset processing conditions comprise:

16. The apparatus according to any one of claims 10 to 15, wherein the processing module is specifically configured to:

17. The apparatus according to any one of claims 10 to 16, wherein the counting module is specifically configured to:

18. The apparatus according to any one of claims 10 to 17, further comprising:

the generation module is used for generating a visual result diagram according to the first detection result diagram and the corrected first density diagram, and the visual result diagram is used for displaying the distribution number and density of the objects in the geographic area.

19. An object processing apparatus, characterized by comprising:

a processor and a memory;

the memory is used for storing computer instructions;

the processor being configured to perform the method of any one of claims 1 to 9 in accordance with the computer instructions.

20. A computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 9.