CN112883768A - Object counting method and device, equipment and storage medium - Google Patents


Info

Publication number
CN112883768A
Authority: CN (China)
Prior art keywords: density, map, image, objects, detection
Legal status: Granted
Application number: CN201911219292.7A
Other languages: Chinese (zh)
Other versions: CN112883768B (en)
Inventors: 谢奕, 陆瑞智, 喻晓源, 陈普
Current Assignee: Huawei Cloud Computing Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Filed by Huawei Technologies Co Ltd; priority to CN201911219292.7A
Publication of CN112883768A; application granted; publication of CN112883768B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 - Recognition of crowd images, e.g. recognition of crowd congestion
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides an object counting method relating to the field of artificial intelligence. The method includes: acquiring a target image; inputting the target image to a density estimation network to obtain an initial density map; inputting the target image to an object detection network to obtain a first detection result map, where the first detection result map includes at least one detection frame and each detection frame includes part or all of an object; processing the density values in the initial density map according to the first detection result map to obtain a first density map; and counting the plurality of objects according to the first detection result map and the first density map to obtain the number of objects included in the target image. The method improves the accuracy of object counting and can be applied to a wider range of object counting scenarios.

Description

Object counting method and device, equipment and storage medium
Technical Field
The present application relates to the field of Artificial Intelligence (AI), and in particular, to an object counting method, and a corresponding apparatus, device, and storage medium.
Background
Object counting is needed in many scenarios across different fields. An object is an entity to be counted, such as a pedestrian or a vehicle. For example, in the security field, it is often necessary to count the number of people at key locations so that high-density areas can be controlled and serious events endangering public safety, such as trampling or abnormal gathering, can be avoided. For another example, in the field of traffic control, vehicles on important road sections often need to be counted; congested sections are determined based on the statistics, vehicles are prompted to avoid them, traffic is diverted, and congestion is prevented from worsening.
With the development of computer vision technology, the industry has proposed directly detecting the objects in an image with a target detection algorithm and then counting the number of detection frames. Target detection algorithms include deep-learning-based one-stage and two-stage algorithms.
However, methods based on target detection are mainly suitable for counting sparse crowds. As crowd density increases, occlusion between objects becomes more and more severe, and existing object counting methods produce a large deviation between the counting result and the actual number.
Disclosure of Invention
The present application provides an object counting method that addresses the inaccurate counting caused by occlusion between objects as object density increases, thereby improving object counting accuracy. Corresponding apparatuses, devices, readable storage media, and computer program products are also provided.
In a first aspect, on the basis of a target detection algorithm, a density estimation algorithm is used to compensate for the missed detections of the target detection algorithm, and the target detection algorithm is used to correct the false detections of the density estimation algorithm, so that the respective advantages of the two algorithms are fully exploited and object counting precision is improved.
In a specific implementation, a target image is obtained, where the target image is captured by a camera shooting a geographic area and includes a plurality of objects to be counted. The target image is input to a density estimation network, which estimates the density of the objects in the target image, to obtain an initial density map. The target image is also input to an object detection network to obtain a first detection result map, where the first detection result map includes at least one detection frame and each detection frame includes part or all of an object. The density values in the initial density map are then processed according to the first detection result map to obtain a first density map. Finally, the plurality of objects are counted according to the first detection result map and the first density map to obtain the number of objects included in the target image. A minimal sketch of this flow is given below.
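The sketch below assumes density_net and detection_net are already-trained callables that return, respectively, a 2-D NumPy density map and a list of (x1, y1, x2, y2) detection frames; all names are illustrative, not the patent's:

```python
def count_objects(target_image, density_net, detection_net):
    """Sketch of the first-aspect counting flow."""
    initial_density = density_net(target_image)      # initial density map
    boxes = detection_net(target_image)              # first detection result map

    # Zero the density values covered by detection frames so the two
    # counting modes never count the same object twice.
    first_density = initial_density.copy()
    for (x1, y1, x2, y2) in boxes:
        first_density[y1:y2, x1:x2] = 0.0

    first_number = len(boxes)                                 # detection-based count
    second_number = int(round(float(first_density.sum())))    # density-based count
    return first_number + second_number                       # objects in the target image
```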
In addition, the method counts objects in two different ways, is applicable to both sparse and dense scenes, and has good compatibility. Moreover, when sparse and dense scenes coexist in the target image, they do not need to be identified and cropped separately; instead, target detection and density estimation are performed on the whole target image, which avoids the counting loss caused by limited identification precision and further improves counting precision.
With reference to the first aspect, in a first implementation manner of the first aspect, for scenes in which objects flow heavily, a static area may further be located based on the correlation between the initial density maps corresponding to multiple frames of images. Specifically, K frames of reference images and the target image are obtained, where the K frames of reference images and the target image are K+1 consecutive frames in a video stream and K is a positive integer; the K frames of reference images are input to the density estimation network to obtain an initial density map of each of the K reference frames; and a static area in the target image is then determined according to the initial density map of each reference frame and the initial density map of the target image.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, considering that the number of objects in the static area may reflect the object aggregation condition, object counting may further be performed on the static area, and an alarm may be raised according to the counting result.
In a specific implementation, the count areas of the first detection result map and the first density map are determined according to the position of the static area in the target image; the objects in the static area are then counted according to these count areas to obtain the number of objects included in the static area of the target image; and when the number of objects included in the static area is not less than a number alarm threshold, an alarm signal is generated, where the alarm signal indicates that objects are aggregating.
With reference to the first implementation manner of the first aspect, in a third implementation manner of the first aspect, considering that the distances between objects in the static area may also reflect the object aggregation condition, those distances may also be determined and an alarm raised accordingly, which avoids false alarms caused by the difference between pixel positions and geographic positions and reduces the false alarm rate.
In a specific implementation, the geographic position of each object in the static area is determined according to its pixel position in the target image and the calibration parameters of the camera; the distance from each object to the other objects is determined from these geographic positions; a comprehensive distance for the objects in the static area is determined from those distances; and when the comprehensive distance is not greater than a distance alarm threshold, an alarm signal indicating object aggregation is generated. A sketch of this check follows.
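A minimal sketch of the distance-based alarm, assuming the pixel-to-ground projection has already been done with the camera calibration parameters and reading the "comprehensive distance" as the mean nearest-neighbour distance (the patent does not fix a formula; the function name is illustrative):

```python
import numpy as np

def aggregation_alarm(geo_positions, distance_alarm_threshold):
    """geo_positions: (N, 2) array of ground-plane coordinates recovered from
    pixel positions via camera calibration. Returns True when the assumed
    comprehensive distance indicates aggregation."""
    pts = np.asarray(geo_positions, dtype=float)
    if len(pts) < 2:
        return False
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # ignore self-distances
    comprehensive = dist.min(axis=1).mean()   # mean nearest-neighbour distance
    return comprehensive <= distance_alarm_threshold
```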
With reference to the first aspect or any one of the first to third implementation manners of the first aspect, in a fourth implementation manner of the first aspect, considering that the density map may contain false detections, the first density map may be further optimized, so that the influence of density-map false detections on counting is reduced and counting precision is improved.
In a specific implementation, the first density map is divided into a plurality of density block images, and whether each density block image satisfies a preset processing condition is judged according to the density values in it. The density values in density block images that satisfy the preset processing condition are set to zero, while density block images that do not satisfy the condition are left unprocessed. A second density map is then obtained from the processed and unprocessed density block images, and the plurality of objects are counted according to the first detection result map and the second density map.
With reference to the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the preset processing condition includes: the number of objects determined from the density values in the density block image is less than an image processing threshold; or, the number of objects determined according to the density values in the density block image is not less than the image processing threshold, and the number of detection frames of a corresponding detection area of the density block image in the first detection result map is less than the image processing threshold, the detection area being determined according to the position of the density block image in the first density map.
With reference to the first aspect or any one implementation manner of the first to fifth implementation manners of the first aspect, in a sixth implementation manner of the first aspect, according to the first detection result map, processing the initial density map may specifically be determining a to-be-processed area in the initial density map according to a position of each detection frame in the first detection result map, and then setting a density value in the to-be-processed area to zero, so as to obtain the first density map.
Of course, in practical applications, the density values in the region to be processed may instead be set to an illegal or negative value that can be distinguished from other regions. In this way, the processed region is not counted when counting objects based on the density map, preventing repeated counting from affecting counting precision.
With reference to the first aspect or any one of the first to sixth implementation manners of the first aspect, in a seventh implementation manner of the first aspect, the plurality of objects are counted according to the first detection result map and the first density map, specifically, a first number is determined according to the first detection result map, and a second number is determined according to the first density map, where the first number is the number of detection frames in the first detection result map, and the second number is determined by density values in the first density map, and then the sum of the first number and the second number is calculated to obtain the number of objects included in the target image.
With reference to the first aspect or any one implementation manner of the first to seventh implementation manners of the first aspect, in an eighth implementation manner of the first aspect, a visualization result graph may further be generated according to the first detection result graph and the first density graph, where the visualization result graph is used to show the distribution number and density of the objects in the geographic area.
In a second aspect, the present application provides an object counting apparatus, the apparatus comprising:
an acquisition module, configured to acquire a target image, where the target image is obtained by a camera shooting a geographic area and includes a plurality of objects to be counted;
a density estimation module, configured to input the target image to a density estimation network to obtain an initial density map, where the density estimation network is used to estimate a density of an object in the target image;
an object detection module, configured to input the target image to an object detection network to obtain a first detection result map, where the first detection result map includes at least one detection frame and each detection frame includes part or all of an object;
the processing module is used for processing the density value in the initial density map according to the first detection result map to obtain a first density map;
and the counting module is used for counting the plurality of objects according to the first detection result graph and the first density graph to obtain the number of the objects in the target image.
With reference to the second aspect, in a first implementation manner of the second aspect, the obtaining module is further configured to:
acquire K frames of reference images, where the K frames of reference images and the target image are K+1 consecutive frames in a video stream, and K is a positive integer;
the density estimation module is further to:
inputting the K frames of reference images to the density estimation network to obtain an initial density map of each frame of reference image in the K frames of reference images;
the processing module is further configured to:
and determining a static area in the target image according to the initial density map of each frame of reference image and the initial density map of the target image.
With reference to the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the counting module is further configured to:
determining a counting area of the first detection result map and the first density map according to the position of a static area in the target image;
counting the objects in the static area according to the first detection result image and the counting area of the first density image to obtain the number of the objects included in the static area in the target image;
the device further comprises:
the device comprises a first alarm module and a second alarm module, wherein the first alarm module is used for generating an alarm signal when the number of the objects included in the static area is not less than a number alarm threshold value, and the alarm signal is used for indicating the aggregation of the objects.
With reference to the first implementation manner of the second aspect, in a third implementation manner of the second aspect, the processing module is further configured to:
determining the geographic position of each object in the static area according to the pixel position of each object in the target image and the calibration parameters of the camera;
determining the distance from each object to other objects according to the geographic position of each object, and determining the comprehensive distance of the objects in the static area based on the distance from each object to other objects;
the device further comprises:
and the second alarm module is used for generating an alarm signal when the comprehensive distance is not greater than a distance alarm threshold value, wherein the alarm signal is used for indicating the aggregation of the objects.
With reference to the second aspect or any one of the first to third implementation manners of the second aspect, in a fourth implementation manner of the second aspect, the counting module is specifically configured to:
dividing the first density map into a plurality of density block images;
judging whether each density block image meets a preset processing condition or not according to the density value in each density block image;
setting the density value in the density block image which meets the preset processing condition to zero, and not processing the density block image which does not meet the preset processing condition;
obtaining a second density map according to the processed density block image and the unprocessed density block image;
counting the plurality of objects according to the first detection result map and the second density map.
With reference to the fourth implementation manner of the second aspect, in a fifth implementation manner of the second aspect, the preset processing condition includes:
the number of objects determined from the density values in the density block image is less than an image processing threshold; or,
the number of objects determined according to the density values in the density block image is not less than the image processing threshold, and the number of detection frames of a corresponding detection area of the density block image in the first detection result map is less than the image processing threshold, the detection area being determined according to the position of the density block image in the first density map.
With reference to the second aspect or any one of the first to fifth implementation manners of the second aspect, in a sixth implementation manner of the second aspect, the processing module is specifically configured to:
and determining a region to be processed in the initial density map according to the position of each detection frame in the first detection result map, and setting the density value in the region to be processed to zero to obtain the first density map.
With reference to the second aspect or any one of the first to sixth implementation manners of the second aspect, in a seventh implementation manner of the second aspect, the counting module is specifically configured to:
determining a first number according to the first detection result map, wherein the first number is the number of detection frames in the first detection result map, and determining a second number according to the first density map, and the second number is determined by density values in the first density map;
and calculating the sum of the first number and the second number to obtain the number of the objects included in the target image.
With reference to the second aspect or any one of the first to seventh implementation manners of the second aspect, in an eighth implementation manner of the second aspect, the apparatus further includes:
and the generation module is used for generating a visualization result graph according to the first detection result graph and the first density graph, and the visualization result graph is used for displaying the distribution number and density of the objects in the geographic area.
In a third aspect, the present application provides an object counting device, including a processor and a memory, where the memory is configured to store computer instructions, and the processor is configured to execute, according to the computer instructions, the object counting method described in the first aspect or any one of its possible implementations.
In a fourth aspect, the present application provides a computer-readable storage medium having stored therein instructions, which, when executed on a computer, cause the computer to perform the method of the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or any one of the possible implementations of the first aspect.
On the basis of the implementations provided by the above aspects, the present application can further combine them to provide more implementations.
Drawings
Fig. 1 is a scene architecture diagram of a subject counting method in an embodiment of the present application;
FIG. 2 is a flow chart of an object counting method in an embodiment of the present application;
FIG. 3 is a schematic diagram of a density distribution in an initial density map in an example of the present application;
FIG. 4 is a flow chart of a second round of optimization of the first density map in an embodiment of the present application;
FIG. 5 is a schematic diagram of a visualization result graph in an embodiment of the present application;
FIG. 6 is a flowchart of a method for locating a static area in an embodiment of the present application;
FIG. 7 is a diagram illustrating the determination of a static area mask in an embodiment of the present application;
FIG. 8 is a diagram illustrating the determination of a static area based on the static area mask in an embodiment of the present application;
FIG. 9 is a flowchart of a method for locating a static area in an embodiment of the present application;
FIG. 10 is a flowchart of a method for locating a static area in an embodiment of the present application;
FIG. 11 is a schematic diagram of a pedestrian imaged by a camera in an embodiment of the present application;
fig. 12 is a schematic structural diagram of an object counting apparatus in an embodiment of the present application.
Detailed Description
The application provides a method for counting objects by combining a target detection and density estimation technology, which can improve the accuracy of object counting. The method can also be self-adaptive to dense scenes and sparse scenes, and has good compatibility.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
It is to be understood that the object counting method provided herein can be applied to any processing device having image processing capabilities. The Processing device may be a terminal having a Central Processing Unit (CPU) and/or a Graphics Processing Unit (GPU), wherein the terminal includes, but is not limited to, a Personal Computer (PC), a workstation, and the like. The processing device may also be a server with a CPU and/or a GPU, which may be independent or a cluster of servers. In some cases, the terminal and the server may also implement the above object counting method in cooperation.
In practical applications, the object counting method provided by the present application includes, but is not limited to, applications in the application environment as shown in fig. 1.
As shown in fig. 1, the server 120 is connected to the camera 140 and the terminal 160 via a network. The camera 140 may capture images of objects in a geographic area, such as a square, a station, etc. The target image includes a plurality of objects to be counted, an object counting apparatus 1200 is disposed in the server 120, and the object counting apparatus 1200 includes an obtaining module 1201, a density estimating module 1202, an object detecting module 1203, a processing module 1204, and a counting module 1205. The obtaining module 1201 may obtain the target image from the camera 140, the density estimation module 1202 includes a density estimation network, and the density estimation module 1202 may input the target image obtained by the obtaining module 1201 into the density estimation network to obtain an initial density map. The object detection module 1203 includes an object detection network, and the object detection module 1203 may input the target image acquired by the acquisition module 1201 into the object detection network to obtain a first detection result map. The processing module 1204 may process the density value in the initial density map output by the density estimation module 1202 according to the first detection result map output by the object detection module 1203 to obtain a first density map. The counting module 1205 may count a plurality of objects according to the first detection result map output by the object detection module 1203 and the first density map output by the processing module 1204, to obtain the number of objects included in the target image, and further, the server 120 may output the number of objects included in the target image obtained by the counting module 1205 in the object counting apparatus 1200 to the terminal 160 for displaying.
Next, each step of the object counting method provided in the embodiment of the present application will be described in detail from the perspective of the server.
Referring to the flowchart of the object counting method shown in fig. 2, the method includes:
S201: Acquiring a target image.
The target image includes a plurality of objects to be counted. An object may be any countable entity, and in different application scenarios the object may be a different entity. For example, in a security scenario the object can be a pedestrian; by counting pedestrians in the target image, crowd gathering can be monitored and severe events such as trampling avoided. For another example, in a traffic control scenario the object may be a vehicle; by counting the vehicles on a road, congestion can be monitored and congestion information promptly sent to other vehicles to prevent it from worsening.
It can be understood that a camera is generally deployed in a geographic area that needs to be monitored for object aggregation, such as a square or a road. The camera shoots the geographic area to obtain the target image, and the server can obtain the target image from the camera, either by receiving a target image sent by the camera or by actively fetching it. The server may obtain the target image in real time or according to a preset period, which is not limited in this embodiment.
S202: Inputting the target image to a density estimation network to obtain an initial density map.
The density estimation network is a network for estimating the density of the objects in an image; it takes an image as input and outputs a density map corresponding to that image. The density map directly output by the density estimation network is referred to as the initial density map, to distinguish it from other density maps. The initial density map has the same size as the image input to the density estimation network (hereinafter, the input image), each pixel point in the initial density map corresponds one-to-one to a pixel point in the input image, and the pixel value of each pixel point in the initial density map represents the density value of the corresponding pixel point in the input image.
Specifically, refer to fig. 3, which shows the pixel value of each pixel point in the initial density map; each pixel value represents the density of the corresponding pixel point in the input image, so the initial density map can show the density distribution and thus the object distribution. A region of the initial density map where the pixel values are 0 indicates that no object exists in the corresponding region of the input image, while a region where the pixel values are non-zero indicates that an object exists there. An object in the input image is represented in the initial density map by a group of pixel points whose pixel values follow a Gaussian distribution, so the server can count objects according to the pixel values of these pixel points.
Given the different sizes of objects in the distant and near views, Gaussian distributions with different variances may be used to characterize objects of different sizes. In the example of fig. 3, objects in the distant view may be characterized by a Gaussian distribution over a 3 x 3 support, as shown at 301, and objects in the near view by Gaussian distributions over 5 x 5 and 7 x 7 supports, as shown at 302 and 303. An illustrative sketch of this representation follows.
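A minimal sketch of building such a density map from annotated object points (the helper name and the use of SciPy are assumptions; the sigmas loosely play the role of the per-size variances of fig. 3):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_density_map(shape, points, sigmas):
    """Each annotated object point becomes a small Gaussian blob whose spread
    matches the object's apparent size (tighter for distant objects, wider
    for near ones). The map approximately sums to the object count."""
    density = np.zeros(shape, dtype=np.float32)
    for (row, col), sigma in zip(points, sigmas):
        impulse = np.zeros(shape, dtype=np.float32)
        impulse[row, col] = 1.0
        density += gaussian_filter(impulse, sigma=sigma)  # mass-preserving blur
    return density

# Two distant objects (tight blobs) and one near object (wide blob).
dm = make_density_map((64, 64), [(10, 10), (12, 40), (50, 30)], [1.0, 1.0, 2.5])
print(round(float(dm.sum())))  # -> 3, the object count
```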
In this embodiment, the density estimation network can be obtained by training a deep learning model. The deep learning model may be a convolutional neural network model, and may be, for example, a multi-column convolutional neural network (MCNN), a scale-adaptive convolutional neural network (SaCNN), a crowded scene recognition network (CSRNet), or the like.
Taking CSRNet as an example, the network includes at least a feature extraction layer and a feature mapping layer. The feature extraction layer can be divided into a front end and a back end, which extract semantic features of different levels. In a specific implementation, the front end may be a Visual Geometry Group (VGG) network with the fully connected layers removed; the VGG network may be the 16-layer VGG16 or the 19-layer VGG19. Taking VGG16 as an example, VGG16 with the fully connected layers removed mainly consists of convolution layers and pooling layers, for example 10 convolution layers and 3 pooling layers. The back end may be a dilated (hole) convolutional neural network, for example one composed of 6 dilated convolution layers. The feature mapping layer is mainly used to map high-dimensional features to a low-dimensional map space; specifically, it may be a 1 x 1 convolution layer that maps the multi-channel features extracted by the feature extraction layer into single-channel features, thereby obtaining a high-quality density map.
When training CSRNet, a plurality of pixel points with Gaussian pixel values can be used as the labels of the objects in an image to obtain training samples. The training samples are input to CSRNet in batches; CSRNet predicts the pixel value of each pixel point in the samples, a loss is calculated from the predicted pixel values and the pixel value labels, and the parameters of CSRNet are updated based on the loss, thereby realizing model training. When the loss satisfies a training end condition, for example when the loss converges, training can be stopped, and the trained CSRNet can serve as the density estimation network for estimating the density of the objects in the target image.
It should be noted that an image is downscaled by each pooling layer, for example halved each time it passes through a pooling layer. For this reason, an expansion layer may be added after CSRNet to expand the resolution of the density map output by the 1 x 1 convolution layer, obtaining an initial density map of the same size as the target image. For example, when the CSRNet front end includes 3 pooling layers, the density map output by the 1 x 1 convolution layer is 1/8 the size of the target image; the expansion layer can expand the resolution of the density map by 8 times and then divide the pixel values by 64, so that the summed density, i.e., the count, is unchanged. A sketch of such a network follows.
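The following PyTorch sketch illustrates this structure under simplifying assumptions (fewer and narrower layers than the real VGG16/CSRNet, an illustrative class name, and a bilinear expansion layer):

```python
import torch.nn as nn
import torch.nn.functional as F

class CSRNetLike(nn.Module):
    """Simplified sketch: a VGG-style front end with 3 pooling stages,
    a dilated-convolution back end, a 1x1 feature-mapping layer, and an
    expansion step restoring the 1/8-size map to input resolution."""
    def __init__(self):
        super().__init__()
        def stage(cin, cout, n):
            layers = []
            for _ in range(n):
                layers += [nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True)]
                cin = cout
            return layers + [nn.MaxPool2d(2)]
        self.frontend = nn.Sequential(*stage(3, 64, 2), *stage(64, 128, 2),
                                      *stage(128, 256, 3))    # 3 poolings -> 1/8 size
        self.backend = nn.Sequential(*[
            m for _ in range(3) for m in
            (nn.Conv2d(256, 256, 3, padding=2, dilation=2), nn.ReLU(inplace=True))
        ])                                                     # dilated ("hole") convolutions
        self.mapper = nn.Conv2d(256, 1, 1)                     # 1x1 feature mapping layer

    def forward(self, x):
        d = self.mapper(self.backend(self.frontend(x)))
        # Expansion layer: upsample by 8 and divide by 8*8 = 64 so the
        # summed density (the object count) is preserved.
        d = F.interpolate(d, scale_factor=8, mode='bilinear', align_corners=False)
        return d / 64.0
```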
S203: Inputting the target image to an object detection network to obtain a first detection result map.
The object detection network is a network that detects the objects in an image. It takes the target image as input and the first detection result map as output, where the first detection result map includes at least one detection frame and each detection frame includes part or all of an object. For example, in a crowd counting scenario, a detection frame may contain a whole person or a part such as the head and shoulders; in a vehicle counting scenario, it may contain a whole vehicle or a part such as the front of the vehicle.
In a specific implementation, the object detection network may be generated by training any target detection algorithm. The target detection algorithm may be You Only Look Once (Yolo), the Single Shot multibox Detector (SSD), the Faster Region-based Convolutional Neural Network (Faster-RCNN), or a Feature Pyramid Network (FPN).
Taking the third version of the Yolo algorithm, Yolo v3, as an example, the Yolo v3 network mainly includes a feature extraction layer, a feature conversion layer, and a feature mapping layer. The feature extraction layer can be implemented with Darknet; specifically, darknet53 with the fully connected layer removed can be adopted, which comprises 52 convolution layers divided by depth into three stages: layers 1-26 form stage 1, layers 27-43 form stage 2, and layers 44-52 form stage 3. Convolution layers at different stages have different receptive fields, so objects of different scales can be detected through different stages. The feature conversion layer converts the features extracted by the feature extraction layer into low-dimensional features and inputs them to the feature mapping layer for mapping. To improve the detection rate for objects of different scales, the feature conversion layer and the feature mapping layer are implemented with multiple branches; for example, objects of different scales can be predicted through three branches.
In the training stage, all or part of each object in an image, such as a person's head and shoulders, may be labeled to obtain training samples. The training samples are input to Yolo v3 in batches; Yolo v3 detects all or part of each object in the samples, such as the head and shoulders, a loss is calculated from the detection results and the labeling information, and the parameters of Yolo v3 are updated based on the loss, thereby training Yolo v3. When a training end condition is satisfied, for example when the loss of Yolo v3 converges, training stops, and the trained Yolo v3 can serve as the object detection network for detecting the objects in an image.
Considering that some objects of smaller size, i.e., small targets, may exist in the target image, the server may also adopt a pyramid image input mode to improve the detection rate of small targets. In this embodiment, the pyramid image input mode means that the input target image is divided into a plurality of sub-images that serve as input to the object detection network; the server then detects the objects in the sub-images with the object detection network to obtain a detection result sub-map for each sub-image.
The detection result map obtained by the object detection network directly on the whole target image can be recorded as the initial detection result map, and the server can superpose the detection result sub-maps onto the initial detection result map according to the positions of the sub-images in the target image, thereby fusing the detection result sub-maps with the initial detection result map.
Considering that overlapping detection frames may exist in the fused detection result map, the server may further perform Non-Maximum Suppression (NMS) on it: among detection frames whose overlap rate, i.e., Intersection over Union (IoU), is greater than an overlap-rate threshold, the frame with the highest probability is retained and the lower-probability frames are suppressed, thereby removing redundant detection frames and obtaining the first detection result map. The first detection result map fuses the detection results of the objects in the sub-images, avoiding missed detection of small targets, while the NMS filters the overlapping detections from the target image and the sub-images, avoiding repeated counting and improving counting precision. A sketch of this fusion step follows.
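A minimal sketch of the fusion and NMS step, assuming each detection frame is an (x1, y1, x2, y2, score) tuple and each sub-image's detections carry the sub-image's pixel offset in the target image (function names are illustrative):

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def nms(boxes, iou_threshold=0.5):
    """Keep the highest-probability frame among overlapping ones."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept

def fuse_detections(initial_boxes, subimage_results, iou_threshold=0.5):
    """Shift each sub-image detection by its sub-image offset, superpose it
    onto the initial detection result, then remove duplicates with NMS."""
    merged = list(initial_boxes)
    for boxes, (ox, oy) in subimage_results:
        merged += [(x1 + ox, y1 + oy, x2 + ox, y2 + oy, s)
                   for (x1, y1, x2, y2, s) in boxes]
    return nms(merged, iou_threshold)
```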
It should be noted that S202 and S203 may be executed in parallel, or may be executed according to a set sequence, which is not limited in this embodiment.
S204: Processing the density values in the initial density map according to the first detection result map to obtain a first density map.
S205: Counting the plurality of objects according to the first detection result map and the first density map to obtain the number of objects included in the target image.
The first detection result graph comprises at least one detection frame, each detection frame comprises part or all parts of an object, and object counting can be achieved by counting the detection frames. The pixel value of each pixel point in the initial density map represents the density value of the corresponding pixel point in the target image, and the number of objects in the target image can be estimated based on the density value, so that the object counting is realized.
The first detection result map has a high detection rate for objects in sparse areas, but objects in dense areas may be missed, so counting objects based on the first detection result map alone is not accurate.
When counting objects in combination with the initial density map, the server first needs to process the initial density map so that the first detection result map and the initial density map do not count the same object twice. Specifically, the server may determine the to-be-processed areas in the initial density map according to the position of each detection frame in the first detection result map and set the density values in those areas to zero to obtain the first density map. The server may then determine, from the first detection result map, the number of detection frames in it, i.e., the first number; determine the second number from the density values in the first density map; and calculate the sum of the first number and the second number to obtain the number of objects included in the target image.
Determining the second number from the density values in the first density map may be done by summing those density values. Specifically, the server may sum the density values in the first density map and determine the second number from the sum; when the sum is not an integer, it may be rounded, for example to the nearest integer, to obtain the second number. Zeroing the density values of the to-be-processed areas in the initial density map is only one exemplary implementation; in other possible implementations, the server may instead set those density values to an illegal value, a negative value, or the like, and correspondingly, when determining the second number from the first density map, the server sums only the legal (or positive) values and then rounds the sum to obtain the second number. A sketch of this variant follows.
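A minimal sketch of the negative-sentinel variant (the function name is illustrative; the patent only requires that marked regions be distinguishable and excluded from the sum):

```python
import numpy as np

def second_number_from(first_density_map):
    """If to-be-processed regions were marked with a negative sentinel rather
    than zero, sum only the legal (non-negative) density values before
    rounding, so marked regions are never counted."""
    legal = np.where(first_density_map >= 0, first_density_map, 0.0)
    return int(round(float(legal.sum())))
```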
Therefore, the embodiment of the present application provides an object counting method in which an initial density map of the target image is obtained through a density estimation network, a first detection result map of the target image is obtained through an object detection network, the density values in the initial density map are processed according to the first detection result map to obtain the first density map, and objects are counted according to the first detection result map and the first density map. The density estimation algorithm thus compensates for the missed detections of the target detection algorithm, the target detection algorithm corrects the false detections of the density estimation algorithm, the respective advantages of the two algorithms are fully exploited, and object counting accuracy is improved. Moreover, the method is suitable for both sparse and dense scenes and has good compatibility.
It is to be understood that S204 in the embodiment shown in fig. 2 corresponds to a first round of optimization of the initial density map, and thus the first density map may also be referred to as the first-round optimized density map, denoted $D_t^{(1)}$, where $t$ represents the frame number of the target image in the video stream and $t$ is a positive integer. Considering that in sparse scenes the density estimation network may falsely detect complex texture areas, such as shrub areas, as dense objects, the server can further optimize on the basis of the first density map $D_t^{(1)}$ to filter out density noise, obtaining a second density map, i.e., the second-round optimized density map $D_t^{(2)}$. Correspondingly, in S205, the server may count the plurality of objects according to the first detection result map and the second density map $D_t^{(2)}$ obtained by the second round of optimization, which can further improve counting accuracy. The second round of optimization of the density map is explained in detail next.
Referring to the flowchart of the second round of optimization of the density map shown in fig. 4, the method specifically includes the following steps:
S401: Dividing the first density map into a plurality of density block images.
In a specific implementation, the server may divide the first density map into a plurality of density block images in a uniform manner, and thus each density block image may adopt the same criterion to judge whether it belongs to density noise. The number of divided density block images may be set based on empirical values, and in one example, the server may divide the first density map into 16 density block images of the same size in a division manner of 4 × 4.
S402: Judging, according to the density values in each density block image, whether each density block image satisfies a preset processing condition.
S403: Setting to zero the density values in the density block images that satisfy the preset processing condition, and leaving unprocessed those that do not.
For any density block image, when the number of objects determined from its density values is small, the density block image can be considered to reflect the density values of a sparse scene. Since an object detection network trained with a target detection algorithm has a high detection rate for objects in sparse scenes, the density values in such a density block image can be set to zero when counting, and the first detection result map output by the object detection network can be used directly, avoiding any influence of false detections in the density block image on counting precision. Based on this, the preset processing condition may include that the number of objects determined from the density values in the density block image is smaller than an image processing threshold.
In some possible implementations, when the number of objects determined from the density values in the density block image is large, whether the density block image is a false detection may further be judged in combination with the detection results of the object detection network, for example a region with complex texture, such as a shrub area, falsely detected as an object aggregation region. If so, the density values in the density block image may be set to zero and the false detection corrected using the first detection result map, avoiding an impact on counting precision. Based on this, the preset processing condition may also include that the number of objects determined from the density values in the density block image is not less than the image processing threshold while the number of detection frames in the corresponding detection area of the first detection result map is less than the image processing threshold.
Wherein the detection area may be determined based on the position of the density patch image in the first density map. For example, when the first density map is divided into 16 density block images according to a 4 × 4 division manner, the detection region corresponding to the density block images in the first row and the first column in the first detection result map is specifically the upper left corner region of the first detection result map, and the side length of the detection region is 1/4 of the side length of the first detection result map.
Further, when judging from the detection results of the object detection network whether the density block image is a false detection, the server may improve accuracy as follows: when the number of objects determined from the density values in the density block image is greater than or equal to the image processing threshold, the server first processes the density block image according to the detection frames of the corresponding detection area in the first detection result map, specifically setting to zero the density values in the areas of the density block image onto which the detection frames map, to obtain a processed density block image; if the number of objects determined from the processed density block image is still large, the density block image is judged to be a false detection. Based on this, the preset processing condition may also include that the number of detection frames of the corresponding detection area in the first detection result map is less than the image processing threshold and the number of objects determined from the density values in the processed density block image is not less than the image processing threshold.
When processing the density block image, the server may set to zero the density values of the areas mapped into the density block image both from inside the detection frame and from a partial area outside it, where the partial area outside the detection frame may be an annular region whose inner edge is the detection frame and whose outer edge lies at a distance of half the detection frame's side length from the inner edge.
When the density block image meets any one or more of the preset processing conditions, setting the density value in the density block image to zero; and when the density block image does not meet any one of the preset processing conditions, the density value in the density block image is not processed.
The image processing threshold can be set according to empirical values and correlates with the size of the density block images: a smaller threshold can be set when the density block images are small, and a larger threshold when they are large. In one example, when the first density map is divided into 16 density block images, each density block image is relatively small, and the image processing threshold may be set to 3.
S404: Obtaining a second density map from the processed density block images and the unprocessed density block images.
Specifically, the server splices the processed and unprocessed density block images according to the positions of the density block images in the first density map to obtain the second density map. On the basis of the first density map, the second density map additionally zeroes the density values of sparsely populated areas and of falsely detected areas, so counting the objects according to the first detection result map and the second density map can further improve counting precision. A sketch of this second-round optimization follows.
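A minimal sketch of S401-S404 under the 4 x 4 split and the image processing threshold of 3 from the examples (assigning a detection frame to a block by its center is an assumption; because the blocks are zeroed in place, the stitching of S404 is implicit):

```python
def second_round_optimize(first_density_map, boxes, grid=4, threshold=3):
    """Split the first density map into grid x grid density block images,
    zero the blocks that satisfy the preset processing conditions, and
    return the resulting second density map. boxes are (x1, y1, x2, y2)
    detection frames from the first detection result map."""
    h, w = first_density_map.shape
    bh, bw = h // grid, w // grid
    second = first_density_map.copy()
    for r in range(grid):
        for c in range(grid):
            ys, xs = slice(r * bh, (r + 1) * bh), slice(c * bw, (c + 1) * bw)
            block = second[ys, xs]                  # view into `second`
            estimated = block.sum()                 # objects implied by density
            frames_here = sum(
                1 for (x1, y1, x2, y2) in boxes
                if xs.start <= (x1 + x2) / 2 < xs.stop
                and ys.start <= (y1 + y2) / 2 < ys.stop
            )
            # Sparse block: trust the detection result instead.
            # Dense block with few detection frames: likely a false
            # detection of complex texture (e.g. shrubs).
            if estimated < threshold or frames_here < threshold:
                block[...] = 0.0
    return second
```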
After object counting is completed, the server can also present the counting result visually. Specifically, the server may generate a visualization result map from the first detection result map and the processed density map. As shown in fig. 5, the visualization result map 500 can show the distribution number and density of the objects in the geographic area. The visualization result map 500 displays detection boxes 501, each containing part of an object, such as a person's head and shoulders; it also displays detection blocks 502, which are pixel blocks composed of pixel points with Gaussian pixel values and which identify objects not detected in the first detection result map. The number of objects distributed in the geographic area can be determined from the numbers of detection boxes 501 and detection blocks 502, and their distribution from the distributions of the boxes and blocks. It should be noted that the sizes of objects differ between the distant and near views, so when detection blocks 502 are used to identify objects, blocks composed of different numbers of pixel points can identify objects of different sizes.
The processed density map may be the first density map or the second density map. In a specific implementation, the first detection result map, the first density map, and the second density map have the same size, and the pixel points in the first detection result map correspond one-to-one to those in the first or second density map. The server can therefore perform a pixel-wise weighted operation on the pixel values of the first detection result map and the processed density map (the first or second density map) and use the result as the pixel value of the corresponding pixel point in the visualization result map. A sketch of this overlay follows.
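A minimal sketch of the pixel-wise weighted overlay, assuming both maps have been rendered as single-channel images of the same size (the weights are illustrative, not specified by the patent):

```python
import numpy as np

def visualize(detection_result_img, processed_density_img, w_det=0.6, w_den=0.4):
    """Weighted pixel-wise combination of the rendered first detection
    result map and the processed (first or second) density map."""
    vis = (w_det * detection_result_img.astype(np.float32)
           + w_den * processed_density_img.astype(np.float32))
    return np.clip(vis, 0, 255).astype(np.uint8)
```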
A single-frame image reflects the number of objects at one moment. Considering situations where objects flow heavily, for example stations and streets with heavy pedestrian flow, the server can also locate static areas based on the correlation between the initial density maps corresponding to multiple frames of images. It can then determine the distribution of objects within a static area and raise an alarm in time when that distribution indicates object aggregation, avoiding adverse consequences.
Referring to the flowchart of the method for locating a static area shown in fig. 6, on the basis of the embodiment shown in fig. 2, the method further includes:
S601: Acquiring K frames of reference images.
The K frames of reference images and the target image are K+1 consecutive frames in the video stream, and K is a positive integer. In a specific implementation, the server may obtain a video frame sample sequence $I = \{I_{t-K}, \ldots, I_{t-1}, I_t\}$ from the video stream captured by the camera; thus, when acquiring the target image $I_t$, it also obtains the previous K frames of reference images $I_{t-K}, \ldots, I_{t-1}$.
S602: inputting the K frames of reference images to a density estimation network to obtain an initial density map of each frame of reference image in the K frames of reference images.
The specific implementation of S602 may refer to the description of related contents of S202, which is not described herein again.
S603: Determining a static area in the target image according to the initial density map of each reference frame and the initial density map of the target image.
In particular implementations, referring to FIG. 7, the server may process the initial density maps of the K frames of reference images and the target image, $\{D_{t-K}, D_{t-(K-1)}, \ldots, D_t\}$, with an AND operation. Reading the AND operation as a pixel-wise minimum (one concrete realization), the processing can be expressed as:

$$\bar{D}(m, n) = \min_{i \in \{t-K, \ldots, t\}} D_i(m, n) \qquad (1)$$

where $i$ takes any value from $t-K$ to $t$, and $m$ and $n$ identify the pixel position in the initial density map; $m$ and $n$ are positive integers.
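A sketch of this AND-processing in NumPy, under the pixel-wise-minimum reading of equation (1) stated above:

    import numpy as np

    def and_process(density_maps):
        """Pixel-wise AND (here: minimum) over the K+1 initial density maps.

        density_maps: list of H x W arrays [D_{t-K}, ..., D_t].
        A pixel keeps a high response only if it responds in every frame,
        which suppresses moving objects and preserves static ones.
        """
        stack = np.stack(density_maps, axis=0)  # shape (K+1, H, W)
        return stack.min(axis=0)                # minimum over frames at each (m, n)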
Then, the server can binarize and dilate the processed result to obtain a static-area mask $M$. The static-area mask $M$ can be divided into two parts: highlight pixel blocks used to represent static areas, and non-highlight pixel blocks used to represent non-static areas. The pixel value of a highlight pixel block is the target pixel value, which may be 1 or 255; on this basis, a highlight pixel block may also be called a target pixel block, that is, a pixel block in the static-area mask $M$ whose pixel value equals the target pixel value. In particular implementations, the server may determine the minimum bounding region of a target pixel block as the static area.
Referring to FIG. 8, the server may extract the contour of the target pixel block from the static-area mask $M$ using the findContours function in the Open Source Computer Vision library (OpenCV), and then determine the bounding rectangle R of the contour using the boundingRect function in OpenCV, so that the server can determine the minimum bounding rectangle R as the candidate area.
It should be noted that fig. 8 illustrates an example of a minimum circumscribed rectangle, and in other possible implementations, the minimum circumscribed area may also be in other shapes, such as a minimum circumscribed circle, which is not limited in this embodiment.
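A minimal sketch of this localization step with OpenCV, assuming the AND-processed result is a single-channel float array; the binarization threshold and dilation parameters are illustrative assumptions:

    import cv2
    import numpy as np

    def locate_static_areas(fused, thresh=0.2, dilate_iter=2):
        """Binarize and dilate the AND-processed map, then return the minimum
        bounding rectangle of each target pixel block as a static area.

        thresh and dilate_iter are illustrative values, not fixed by the text.
        """
        fused = fused.astype(np.float32)
        # Binarize: highlight pixel blocks (value 255) mark candidate static areas.
        _, mask = cv2.threshold(fused, thresh, 255, cv2.THRESH_BINARY)
        mask = mask.astype(np.uint8)
        # Dilate to merge fragmented highlight blocks.
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
        mask = cv2.dilate(mask, kernel, iterations=dilate_iter)
        # Extract the contours of the target pixel blocks (findContours) and
        # take the minimum bounding rectangle of each (boundingRect).
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        return [cv2.boundingRect(c) for c in contours]  # each rectangle: (x, y, w, h)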
Further, the number of objects in the static area may reflect the object aggregation condition. The server may count the objects in the static area to determine the aggregation condition in the static area, and raise an alarm according to that condition.
Referring to fig. 9, on the basis of the embodiment shown in fig. 6, the method further includes:
S901: Determining counting areas of the first detection result map and the first density map according to the position of the static area in the target image.
It can be understood that the target image, the first detection result map and the first density map have the same size, and the pixel points in the target image correspond one-to-one to the pixel points in the first detection result map and the first density map. According to the pixel coordinates of a pixel point m in the static area of the target image, the server can obtain the corresponding pixel point m' in the first detection result map and the corresponding pixel point m'' in the first density map. The region formed by the pixel points m' in the first detection result map is the counting area of the first detection result map, and the region formed by the pixel points m'' in the first density map is the counting area of the first density map.
S902: Counting the objects in the static area according to the counting areas of the first detection result map and the first density map, to obtain the number of objects included in the static area of the target image.
Specifically, the server may sum the number of detection frames included in the counting area of the first detection result map and the number of objects determined from the counting area of the first density map, to obtain the number of objects included in the static area of the target image.
In some possible implementations, the server may also determine a counting area of the second density map according to the position of the static area in the target image, count the objects in the static area according to the counting areas of the first detection result map and the second density map, and obtain the number of objects included in the static area of the target image.
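A sketch of S901/S902 under the following assumptions: detection boxes are (x, y, w, h) tuples in pixel coordinates, a box is attributed to the counting area when its center falls inside, and the density values over the counting area sum to an object count:

    import numpy as np

    def count_in_static_area(area, boxes, density_map):
        """Count objects inside a static area given as (x, y, w, h).

        First number: detection boxes whose centers fall in the counting area.
        Second number: sum of the density values over the counting area.
        """
        ax, ay, aw, ah = area
        first = sum(1 for (x, y, w, h) in boxes
                    if ax <= x + w / 2 < ax + aw and ay <= y + h / 2 < ay + ah)
        second = float(density_map[ay:ay + ah, ax:ax + aw].sum())
        return first + int(round(second))

The area tuple here can be taken directly from the bounding rectangles returned by the localization sketch above.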
S903: when the number of objects included in the static area is not less than the number alarm threshold, an alarm signal is generated.
When the number of objects included in the static area is not less than the number alarm threshold, it indicates that many objects have gathered in the static area, and the server can generate an alarm signal to indicate object aggregation. The server may generate different types of alarm signals depending on the alarm type. In some possible implementations, the server may generate a voice alarm signal, and on receiving it the terminal plays a warning voice through its speaker. The server may also generate a text alarm signal, and on receiving it the terminal displays warning text and/or warning images on its display. In addition, the server may generate an indicator-light alarm signal and control the on/off state, flashing or color of an indicator light accordingly.
In the embodiments shown in fig. 6 to 9, the static area is located for the target image and the objects in it are counted. In practical applications, the server may also locate a static area in each frame of image and count the objects in the static area of each frame, so as to determine the trend of object flow in the static area.
In some possible implementations, the server may further determine the distances between the objects included in the static area of the target image, determine the object aggregation condition according to those distances, and raise an alarm accordingly; this avoids false alarms caused by the difference between pixel positions and geographic positions.
Referring to fig. 10, on the basis of the embodiment shown in fig. 6, the method further includes:
S1001: Determining the geographic position of each object according to the pixel position of each object in the static area of the target image and the calibration parameters of the camera. In particular, the server can determine the geographic position of each object according to the pixel position of each object in the static area of the target image and the calibration parameters of the camera, such as the depression angle α and the focal lengths $f_x$ and $f_y$.
Specifically, the server may take the camera as the origin of a spatial coordinate system, take the direction of the optical axis as a coordinate axis to establish the coordinate system, and then calculate the geographic position of each object based on the similar-triangle relationship among the object, the plane where the camera is located, and the plane where the target image is located.
For ease of understanding, the crowd counting scenario is described as an example. Referring to the schematic diagram of pedestrian imaging by the camera shown in fig. 11, the pixel positions of the pedestrians' heads and shoulders in the target image are denoted $\{(x_m, y_m, h_m, w_m)\}_{m=1}^{N}$, where $N$ is the number of pedestrians. Let the pixel position of the m-th pedestrian's head and shoulders in the target image be $(x_m, y_m, h_m, w_m)$, where $x_m$ and $y_m$ respectively represent the horizontal and vertical coordinates of the head and shoulders on the plane of the target image, and $h_m$ and $w_m$ respectively represent the height and width of the head and shoulders in the target image. Based on simultaneous similar-triangle equations, the server can solve for the geographic location $(X_m, Y_m, Z_m)$ of the pedestrian in the world coordinate system $X_r Y_r Z_r$ centered on the camera.

As shown in fig. 11, assume that the target image has been distortion-corrected and that, as a general prior, the actual height of a pedestrian's head and shoulders is a constant $H$. According to the similar-triangle theorem, the distance $Z_m$ from the pedestrian to the camera is:

$$Z_m = \frac{f_y H}{h_m} \qquad (2)$$

Suppose the coordinates of the image center point are $(x_o, y_o)$. Then the geographic location $(X_m, Y_m)$ of the pedestrian is:

$$X_m = \frac{(x_m - x_o) Z_m}{f_x} \qquad (3)$$

$$Y_m = \frac{(y_m - y_o) Z_m}{f_y} \qquad (4)$$

Combining the above equations (2) to (4), one can obtain:

$$Z_m = \frac{f_y H}{h_m} \qquad (5)$$

$$X_m = \frac{(x_m - x_o) f_y H}{f_x h_m} \qquad (6)$$

$$Y_m = \frac{(y_m - y_o) H}{h_m} \qquad (7)$$
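The closed-form relations (5) to (7) translate directly into code. The following is a minimal sketch under the assumptions above; the head-shoulder height prior H (0.5 m here) and the calibration values are illustrative assumptions, not values fixed by the text:

    def pixel_to_geo(x, y, h, fx, fy, xo, yo, H=0.5):
        """Map a head-shoulder detection (pixel coordinates x, y and pixel height h)
        to camera-centered world coordinates (X, Y, Z).

        Assumes a distortion-corrected image and a constant real-world
        head-shoulder height H (illustrative prior, in meters).
        """
        Z = fy * H / h          # depth from similar triangles, eq. (2)/(5)
        X = (x - xo) * Z / fx   # horizontal offset, eq. (3)/(6)
        Y = (y - yo) * Z / fy   # vertical offset, eq. (4)/(7)
        return X, Y, Z

For example, with $f_x = f_y = 1000$ pixels, image center (960, 540) and a head-shoulder box 100 pixels high, the sketch places the pedestrian 5 meters from the camera.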
S1002: Determining the distance from each object to the other objects according to the geographic position of each object, and determining the comprehensive distance of the objects in the static area based on those distances.
In a specific implementation, the server may calculate the distance from each object to the other objects according to the coordinates of each object in the spatial coordinate system, and may then determine the comprehensive distance of the objects in the static area according to those distances.
In some possible implementations, the server may determine the neighboring object of each object according to the distances from each object to the other objects in the static area, and then determine the comprehensive distance of the objects in the static area from the distances from each object to its neighboring object; for example, the maximum or the average of these distances may be taken as the comprehensive distance. The comprehensive distance characterizes the distribution of the objects in the static area: a smaller comprehensive distance indicates that the objects in the static area are relatively aggregated as a whole, and a larger comprehensive distance indicates that they are relatively dispersed as a whole.
The neighboring object of each object may be the object closest to it. In some cases, for example when the static area includes at least two groups of objects, objects within a group are close together while objects across groups are far apart. To prevent objects within a group from always pairing with each other, which would distort the comprehensive distance of the objects in the static area, objects that have already been matched may be excluded when determining neighboring objects. That is, the neighboring object of each object may be the closest object among those not yet matched. For example, if the object closest to object a is object b, object b is determined to be the neighboring object of object a, and the pair is recorded as matched so that it is excluded in subsequent matching.
The process of determining the comprehensive distance is described in detail below in conjunction with the crowd counting scenario. For each pedestrian $i$, the distance $d_i$ from pedestrian $i$ to its neighboring pedestrian is:

$$d_i = \min_{j \neq i,\ j \notin \Psi(i)} \left\lVert (X_i, Y_i, Z_i) - (X_j, Y_j, Z_j) \right\rVert_2 \qquad (8)$$

where $i$ and $j$ take any two integers from 1 to $L$, $L$ represents the number of pedestrians in the static area, and $\Psi(i)$ is the set of objects currently matched with pedestrian $i$.

Correspondingly, the comprehensive distance $d$ may specifically be the maximum or the average of the per-pedestrian distances:

$$d = \max_{1 \le i \le L} d_i \quad \text{or} \quad d = \frac{1}{L} \sum_{i=1}^{L} d_i \qquad (9)$$
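A sketch of the neighbor matching and aggregation, assuming camera-centered coordinates from S1001; reading $\Psi(i)$ as a shared set of already-matched objects is one interpretation of the matching rule described above:

    import math

    def comprehensive_distance(positions, use_max=True):
        """Comprehensive distance of the objects in a static area.

        positions: list of camera-centered (X, Y, Z) coordinates from S1001.
        Each object is matched to its nearest not-yet-matched neighbor (the
        already-matched set plays the role of Psi(i)), and the per-object
        distances are then aggregated by maximum or average, as in eq. (9).
        """
        matched = set()
        nn_dists = []
        for i, p in enumerate(positions):
            candidates = [(math.dist(p, q), j) for j, q in enumerate(positions)
                          if j != i and j not in matched]
            if not candidates:
                break
            d, j = min(candidates)
            matched.add(j)        # exclude j when matching subsequent objects
            nn_dists.append(d)
        if not nn_dists:
            return float("inf")   # fewer than two objects: nothing to aggregate
        return max(nn_dists) if use_max else sum(nn_dists) / len(nn_dists)

An alarm signal can then be generated when the returned value is not greater than the distance alarm threshold, for example 1 meter as suggested below.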
S1003: When the comprehensive distance is not greater than the distance alarm threshold, generating an alarm signal, where the alarm signal is used to indicate object aggregation.
When the comprehensive distance is less than or equal to the distance alarm threshold, it indicates that the distances between objects in the static area are small overall and the objects are densely distributed, and the server may generate an alarm signal indicating object aggregation. The distance alarm threshold may be set according to an empirical value; in one example, it may be set to 1 meter, which is not limited in this embodiment.
In the present embodiment, the maximum or average of the distances from each object to its neighboring object is taken as the comprehensive distance. In practical applications, other indicators may also be used; for example, the minimum of the distances from each object to its neighboring object may be used as the comprehensive distance of the objects in the static area. In that case, when the comprehensive distance is greater than the distance alarm threshold, the distances between objects in the static area are large overall, the objects can be determined to be relatively sparsely distributed, and the server may continue to monitor the comprehensive distance of the objects in the static area.
It should be noted that, for simplicity of description, the above method embodiments are expressed as a series of action combinations, but those skilled in the art should understand that the present application is not limited by the described order of actions. Other reasonable combinations of the steps that can be conceived by those skilled in the art from the above description also fall within the protection scope of the present application. Further, those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and that not every action described is necessarily required by the present application.
The object counting method provided by the present application is described in detail above with reference to fig. 1 to 11, and the object counting apparatus and device provided by the present application will be described below with reference to the accompanying drawings.
Referring to a schematic structural diagram of the object counting apparatus in the system architecture diagram shown in fig. 1, an object counting apparatus 1200 includes:
an obtaining module 1201, configured to obtain a target image, where the target image is obtained by shooting a geographic area with a camera, and the target image includes a plurality of objects to be counted;
a density estimation module 1202, configured to input the target image to a density estimation network to obtain an initial density map, where the density estimation network is used to estimate a density of an object in the target image;
an object detection module 1203, configured to input the target image to an object detection network, to obtain a first detection result graph, where the first detection result graph includes at least one detection frame, and each detection frame includes a part or all of an object;
a processing module 1204, configured to process, according to the first detection result map, a density value in the initial density map to obtain a first density map;
a counting module 1205, configured to count the multiple objects according to the first detection result map and the first density map, so as to obtain the number of objects included in the target image.
Optionally, the obtaining module 1201 is further configured to:
acquiring a K frame reference image, wherein the K frame reference image and the target image are continuous K +1 frame images in a video stream, and K is a positive integer;
the density estimation module 1202 is further configured to:
inputting the K frames of reference images to the density estimation network to obtain an initial density map of each frame of reference image in the K frames of reference images;
the processing module 1204 is further configured to:
and determining a static area in the target image according to the initial density map of each frame of reference image and the initial density map of the target image.
Optionally, the counting module 1205 is further configured to:
determining a counting area of the first detection result map and the first density map according to the position of a static area in the target image;
counting the objects in the static area according to the first detection result image and the counting area of the first density image to obtain the number of the objects included in the static area in the target image;
the device further comprises:
the device comprises a first alarm module and a second alarm module, wherein the first alarm module is used for generating an alarm signal when the number of the objects included in the static area is not less than a number alarm threshold value, and the alarm signal is used for indicating the aggregation of the objects.
Optionally, the processing module 1204 is further configured to:
determining the geographic position of each object in the static area according to the pixel position of each object in the target image and the calibration parameters of the camera;
determining the distance from each object to other objects according to the geographic position of each object, and determining the comprehensive distance of the objects in the static area based on the distance from each object to other objects;
the device further comprises:
and the second alarm module is used for generating an alarm signal when the comprehensive distance is not greater than a distance alarm threshold value, wherein the alarm signal is used for indicating the aggregation of the objects.
Optionally, the processing module 1204 is further configured to:
dividing the first density map into a plurality of density block images;
judging whether each density block image meets a preset processing condition or not according to the density value in each density block image;
setting the density value in the density block image which meets the preset processing condition to zero, and not processing the density block image which does not meet the preset processing condition;
obtaining a second density map according to the processed density block image and the unprocessed density block image;
the counting module 1205 is specifically configured to:
counting the plurality of objects according to the first detection result map and the second density map.
Optionally, the preset processing conditions include:
the number of objects determined from the density values in the density block image is less than an image processing threshold; alternatively,
the number of objects determined according to the density values in the density block image is not less than the image processing threshold, and the number of detection frames of a corresponding detection area of the density block image in the first detection result map is less than the image processing threshold, the detection area being determined according to the position of the density block image in the first density map.
Optionally, the processing module 1204 is specifically configured to:
and determining a region to be processed in the initial density map according to the position of each detection frame in the first detection result map, and setting the density value in the region to be processed to zero to obtain the first density map.
Optionally, the counting module 1205 is specifically configured to:
determining a first number according to the first detection result map, wherein the first number is the number of detection frames in the first detection result map, and determining a second number according to the first density map, and the second number is determined by density values in the first density map;
and calculating the sum of the first number and the second number to obtain the number of the objects included in the target image.
Optionally, the apparatus further comprises:
and the generation module is used for generating a visualization result graph according to the first detection result graph and the first density graph, and the visualization result graph is used for displaying the distribution number and density of the objects in the geographic area.
The object counting apparatus according to the embodiment of the present application may correspond to performing the method described in the embodiment of the present application, and the above and other operations and/or functions of each module in the object counting apparatus are respectively for implementing corresponding flows of each method in fig. 2, fig. 4, fig. 6, fig. 9, and fig. 10, and are not described herein again for brevity.
It should be noted that the above-described embodiments are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Fig. 12 is a schematic diagram of an object counting device 100 according to an embodiment of the present disclosure, and as shown in the figure, the object counting device 100 includes a processor 101, a memory 102, a communication interface 103, and a bus 104. The processor 101, the memory 102, and the communication interface 103 communicate with each other via the bus 104, or may communicate with each other via other means such as wireless transmission. The memory 102 stores executable program code and the processor 101 may call the program code stored in the memory 102 to perform the object counting method in the aforementioned method embodiments.
It should be understood that, in this embodiment of the present application, the processor 101 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 102 may include both read-only memory and random access memory and provides instructions and data to the processor 101. The memory 102 may also include non-volatile random access memory. For example, the memory 102 may also store a training data set.
The memory 102 may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The bus 104 may include a power bus, a control bus, a status signal bus, and the like, in addition to a data bus. But for clarity of illustration the various buses are labeled as bus 104 in the figures.
It should be understood that the object counting apparatus 100 according to the embodiment of the present application may correspond to an object counting device in the embodiment of the present application, and may correspond to a corresponding main body for executing the methods shown in fig. 2, fig. 4, fig. 6, fig. 9, and fig. 10 according to the embodiment of the present application, and the above and other operations and/or functions of each device in the object counting apparatus 100 are respectively to implement corresponding flows of the methods in fig. 2, fig. 4, fig. 6, fig. 9, and fig. 10, and are not described herein again for brevity.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by special-purpose hardware including special-purpose integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components and the like. Generally, functions performed by computer programs can be easily implemented by corresponding hardware, and specific hardware structures for implementing the same functions may be various, such as analog circuits, digital circuits, or dedicated circuits.
However, for the present application, implementation by a software program is preferable. Based on such an understanding, the technical solutions of the present application may be substantially embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to execute the methods described in the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device.
The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a training device or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).

Claims (20)

1. An object counting method, comprising:
acquiring a target image, wherein the target image is obtained by shooting a geographical area by a camera and comprises a plurality of objects to be counted;
inputting the target image to a density estimation network to obtain an initial density map, wherein the density estimation network is used for estimating the density of an object in the target image;
inputting the target image to an object detection network to obtain a first detection result graph, wherein the first detection result graph comprises at least one detection frame, and each detection frame comprises a part or all of an object;
processing the density value in the initial density map according to the first detection result map to obtain a first density map;
and counting the plurality of objects according to the first detection result image and the first density image to obtain the number of the objects included in the target image.
2. The method of claim 1, further comprising:
acquiring a K frame reference image, wherein the K frame reference image and the target image are continuous K +1 frame images in a video stream, and K is a positive integer;
inputting the K frames of reference images to the density estimation network to obtain an initial density map of each frame of reference image in the K frames of reference images;
and determining a static area in the target image according to the initial density map of each frame of reference image and the initial density map of the target image.
3. The method of claim 2, further comprising:
determining a counting area of the first detection result map and the first density map according to the position of a static area in the target image;
counting the objects in the static area according to the first detection result image and the counting area of the first density image to obtain the number of the objects included in the static area in the target image;
when the number of objects included in the static area is not less than a number alarm threshold, generating an alarm signal, wherein the alarm signal is used for indicating the aggregation of the objects.
4. A method according to claim 2 or 3, characterized in that the method further comprises:
determining the geographic position of each object in the static area according to the pixel position of each object in the target image and the calibration parameters of the camera;
determining the distance from each object to other objects according to the geographic position of each object, and determining the comprehensive distance of the objects in the static area based on the distance from each object to other objects;
and when the comprehensive distance is not greater than the distance alarm threshold value, generating an alarm signal, wherein the alarm signal is used for indicating the aggregation of the objects.
5. The method of any of claims 1 to 4, wherein said counting the plurality of objects from the first detection result map and the first density map comprises:
dividing the first density map into a plurality of density block images;
judging whether each density block image meets a preset processing condition or not according to the density value in each density block image;
setting the density value in the density block image which meets the preset processing condition to zero, and not processing the density block image which does not meet the preset processing condition;
obtaining a second density map according to the processed density block image and the unprocessed density block image;
counting the plurality of objects according to the first detection result map and the second density map.
6. The method of claim 5, wherein the preset processing conditions comprise:
the number of objects determined from the density values in the density block image is less than an image processing threshold; alternatively,
the number of objects determined according to the density values in the density block image is not less than the image processing threshold, and the number of detection frames of a corresponding detection area of the density block image in the first detection result map is less than the image processing threshold, the detection area being determined according to the position of the density block image in the first density map.
7. The method according to any one of claims 1 to 6, wherein the processing the initial density map according to the first detection result map to obtain a first density map comprises:
and determining a region to be processed in the initial density map according to the position of each detection frame in the first detection result map, and setting the density value in the region to be processed to zero to obtain the first density map.
8. The method according to any one of claims 1 to 7, wherein the counting the plurality of objects according to the first detection result map and the first density map to obtain the number of objects included in the target image comprises:
determining a first number according to the first detection result map, wherein the first number is the number of detection frames in the first detection result map, and determining a second number according to the first density map, and the second number is determined by density values in the first density map;
and calculating the sum of the first number and the second number to obtain the number of the objects included in the target image.
9. The method according to any one of claims 1 to 8, further comprising:
and generating a visualization result graph according to the first detection result graph and the first density graph, wherein the visualization result graph is used for showing the distribution number and density of the objects in the geographic area.
10. An object counting apparatus, comprising:
the system comprises an acquisition module, a counting module and a counting module, wherein the acquisition module is used for acquiring a target image, the target image is obtained by shooting a geographical area by a camera, and the target image comprises a plurality of objects to be counted;
a density estimation module, configured to input the target image to a density estimation network to obtain an initial density map, where the density estimation network is used to estimate a density of an object in the target image;
the object detection module is used for inputting the target image to an object detection network to obtain a first detection result graph, wherein the first detection result graph comprises at least one detection frame, and each detection frame comprises a part or all of an object;
the processing module is used for processing the density value in the initial density map according to the first detection result map to obtain a first density map;
and the counting module is used for counting the plurality of objects according to the first detection result graph and the first density graph to obtain the number of the objects in the target image.
11. The apparatus of claim 10, wherein the obtaining module is further configured to:
acquiring a K frame reference image, wherein the K frame reference image and the target image are continuous K +1 frame images in a video stream, and K is a positive integer;
the density estimation module is further to:
inputting the K frames of reference images to the density estimation network to obtain an initial density map of each frame of reference image in the K frames of reference images;
the processing module is further configured to:
and determining a static area in the target image according to the initial density map of each frame of reference image and the initial density map of the target image.
12. The apparatus of claim 10, wherein the counting module is further configured to:
determining a counting area of the first detection result map and the first density map according to the position of a static area in the target image;
counting the objects in the static area according to the first detection result image and the counting area of the first density image to obtain the number of the objects included in the static area in the target image;
the device further comprises:
the device comprises a first alarm module and a second alarm module, wherein the first alarm module is used for generating an alarm signal when the number of the objects included in the static area is not less than a number alarm threshold value, and the alarm signal is used for indicating the aggregation of the objects.
13. The apparatus of claim 11 or 12, wherein the processing module is further configured to:
determining the geographic position of each object in the static area according to the pixel position of each object in the target image and the calibration parameters of the camera;
determining the distance from each object to other objects according to the geographic position of each object, and determining the comprehensive distance of the objects in the static area based on the distance from each object to other objects;
the device further comprises:
and the second alarm module is used for generating an alarm signal when the comprehensive distance is not greater than a distance alarm threshold value, wherein the alarm signal is used for indicating the aggregation of the objects.
14. The apparatus according to any one of claims 10 to 13, wherein the counting module is specifically configured to:
dividing the first density map into a plurality of density block images;
judging whether each density block image meets a preset processing condition or not according to the density value in each density block image;
setting the density value in the density block image which meets the preset processing condition to zero, and not processing the density block image which does not meet the preset processing condition;
obtaining a second density map according to the processed density block image and the unprocessed density block image;
counting the plurality of objects according to the first detection result map and the second density map.
15. The apparatus of claim 14, wherein the preset processing condition comprises:
the number of objects determined from the density values in the density block image is less than an image processing threshold; alternatively,
the number of objects determined according to the density values in the density block image is not less than the image processing threshold, and the number of detection frames of a corresponding detection area of the density block image in the first detection result map is less than the image processing threshold, the detection area being determined according to the position of the density block image in the first density map.
16. The apparatus according to any one of claims 10 to 15, wherein the processing module is specifically configured to:
and determining a region to be processed in the initial density map according to the position of each detection frame in the first detection result map, and setting the density value in the region to be processed to zero to obtain the first density map.
17. The apparatus according to any one of claims 10 to 16, wherein the counting module is specifically configured to:
determining a first number according to the first detection result map, wherein the first number is the number of detection frames in the first detection result map, and determining a second number according to the first density map, and the second number is determined by density values in the first density map;
and calculating the sum of the first number and the second number to obtain the number of the objects included in the target image.
18. The apparatus of any one of claims 10 to 17, further comprising:
and the generation module is used for generating a visualization result graph according to the first detection result graph and the first density graph, and the visualization result graph is used for displaying the distribution number and density of the objects in the geographic area.
19. An object counting device, characterized by comprising:
a processor and a memory;
the memory to store computer instructions;
the processor configured to perform the method of any one of claims 1 to 9 according to the computer instructions.
20. A computer-readable storage medium having stored therein instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 9.
CN201911219292.7A 2019-11-29 2019-11-29 Object counting method and device, equipment and storage medium Active CN112883768B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911219292.7A CN112883768B (en) 2019-11-29 2019-11-29 Object counting method and device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911219292.7A CN112883768B (en) 2019-11-29 2019-11-29 Object counting method and device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112883768A true CN112883768A (en) 2021-06-01
CN112883768B CN112883768B (en) 2024-02-09

Family

ID=76039487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911219292.7A Active CN112883768B (en) 2019-11-29 2019-11-29 Object counting method and device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112883768B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341480A (en) * 2017-07-12 2017-11-10 中国电子科技集团公司第二十八研究所 A kind of crowd massing detection method of modified PCCNN neural network models
CN108717528A (en) * 2018-05-15 2018-10-30 苏州平江历史街区保护整治有限责任公司 A kind of global population analysis method of more strategies based on depth network
CN109446989A (en) * 2018-10-29 2019-03-08 上海七牛信息技术有限公司 Crowd massing detection method, device and storage medium
CN109726658A (en) * 2018-12-21 2019-05-07 上海科技大学 Crowd counts and localization method, system, electric terminal and storage medium
CN110059581A (en) * 2019-03-28 2019-07-26 常熟理工学院 People counting method based on depth information of scene
CN110501278A (en) * 2019-07-10 2019-11-26 同济大学 A kind of method for cell count based on YOLOv3 and density estimation

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926409A (en) * 2022-04-29 2022-08-19 贵州航天云网科技有限公司 Intelligent industrial component data acquisition method
CN114926409B (en) * 2022-04-29 2024-05-28 贵州航天云网科技有限公司 Intelligent industrial component data acquisition method
CN115100703A (en) * 2022-05-17 2022-09-23 交通运输部水运科学研究所 Method and device for determining sneak behavior
CN115100703B (en) * 2022-05-17 2024-05-14 交通运输部水运科学研究所 Method and device for determining stealing behavior
CN114782412A (en) * 2022-05-26 2022-07-22 马上消费金融股份有限公司 Image detection method, and training method and device of target detection model

Also Published As

Publication number Publication date
CN112883768B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN112883768A (en) Object counting method and device, equipment and storage medium
CN110781756A (en) Urban road extraction method and device based on remote sensing image
US8995714B2 (en) Information creation device for estimating object position and information creation method and program for estimating object position
WO2013186662A1 (en) Multi-cue object detection and analysis
CN112446316B (en) Accident detection method, electronic device, and storage medium
CN109815787B (en) Target identification method and device, storage medium and electronic equipment
CN112668480A (en) Head attitude angle detection method and device, electronic equipment and storage medium
CN112766137B (en) Dynamic scene foreign matter intrusion detection method based on deep learning
CN110942456B (en) Tamper image detection method, device, equipment and storage medium
CN112084892B (en) Road abnormal event detection management device and method thereof
CN112634369A (en) Space and or graph model generation method and device, electronic equipment and storage medium
CN112634368A (en) Method and device for generating space and OR graph model of scene target and electronic equipment
US20170053172A1 (en) Image processing apparatus, and image processing method
CN110636281A (en) Real-time monitoring camera shielding detection method based on background model
CN113051980A (en) Video processing method, device, system and computer readable storage medium
CN116052026A (en) Unmanned aerial vehicle aerial image target detection method, system and storage medium
CN111539360A (en) Safety belt wearing identification method and device and electronic equipment
CN113505643A (en) Violation target detection method and related device
CN110765875B (en) Method, equipment and device for detecting boundary of traffic target
CN113450459A (en) Method and device for constructing three-dimensional model of target object
CN116543333A (en) Target recognition method, training method, device, equipment and medium of power system
CN114004876A (en) Dimension calibration method, dimension calibration device and computer readable storage medium
CN114373162A (en) Dangerous area personnel intrusion detection method and system for transformer substation video monitoring
CN114999183B (en) Traffic intersection vehicle flow detection method
CN117576634B (en) Anomaly analysis method, device and storage medium based on density detection

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220209

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Applicant after: Huawei Cloud Computing Technologies Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.

GR01 Patent grant
GR01 Patent grant