CN113239743A

CN113239743A - Crowd density detection method, device, equipment and storage medium

Info

Publication number: CN113239743A
Application number: CN202110445331.6A
Authority: CN
Inventors: 肖传利
Original assignee: Pulian International Co ltd
Current assignee: Pulian International Co ltd
Priority date: 2021-04-23
Filing date: 2021-04-23
Publication date: 2021-08-10

Abstract

The invention provides a crowd density detection method, device, equipment and storage medium. The method comprises: acquiring an image to be trained; performing human head detection on the image to be trained to obtain an initial detection result; acquiring the undetected human heads on the image to be trained according to the initial detection result; classifying the undetected human heads to obtain a classification result; separating the image to be trained according to the classification result; constructing a mask according to the separated image to be trained to obtain a mask image; filling the image to be trained according to the mask image to obtain a filled image; generating a crowd density image from the filled image; training on the filled image and the crowd density image to obtain a crowd density estimation network; and performing crowd density detection on an image to be detected with the crowd density estimation network. The embodiments of the invention can effectively improve the accuracy of the crowd density detection result.

Description

Crowd density detection method, device, equipment and storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for crowd density detection.

Background

With the development of artificial intelligence, crowd density estimation technology has emerged. It can automatically infer the total number of people in an image and plays an important role in fields such as video surveillance and public safety. Crowd density can be estimated from a crowd density map. Existing crowd density map generation methods have two main limitations. First, the density map is generated as if all human head targets had the same size; the actual size of each head target is not considered, so the difference between the density contributed by large and small targets cannot be represented effectively. Second, the size of a head target is not estimated accurately; for small targets in particular, the manually annotated length and width are imprecise because of limited resolution. These two limitations make existing crowd density maps inaccurate, and the crowd density detection results derived from them are inaccurate as well.

Disclosure of Invention

The invention aims to provide a crowd density detection method, device, equipment and storage medium, so as to improve the accuracy of crowd density detection results.

In order to solve the above technical problem, in a first aspect, an embodiment of the present invention provides a crowd density detection method, including:

acquiring an image to be trained;

carrying out human head detection on the image to be trained to obtain an initial detection result;

acquiring undetected heads on the image to be trained according to the initial detection result;

classifying the undetected human head to obtain a classification result;

separating the images to be trained according to the classification result;

constructing a mask according to the separated image to be trained to obtain a mask image;

filling the image to be trained according to the mask image to obtain a filled image;

generating a crowd density image from the filled image;

training according to the filled image and the crowd density image to obtain a crowd density estimation network;

and carrying out crowd density detection on the image to be detected according to the crowd density estimation network.

Further, the human head detection is performed on the image to be trained to obtain an initial detection result, and the method specifically includes:

marking human heads on the image to be trained;

and carrying out human head detection on the marked image to be trained, and taking the size of the target with the highest response as the human head size of the corresponding position to obtain the detected human head as the initial detection result.

Further, the classifying the undetected human head to obtain a classification result specifically includes:

acquiring a plurality of human head targets around an undetected human head;

when at least one of the plurality of head targets is a detected head, assigning the size of that head target to the undetected head, and classifying the undetected head together with the detected heads into one class, which is marked as a normal detection class;

and when none of the plurality of head targets is a detected head, marking the undetected head as an interference class.

Further, the separating the images to be trained according to the classification result specifically includes:

and separating the normal detection class from the interference class on the image to be trained.

Further, the constructing a mask according to the separated image to be trained to obtain a mask image specifically includes:

constructing masks with the same size according to the images to be trained;

and setting the value of the mask at the position corresponding to the normal detection class as 1, and setting the value of the mask at the position corresponding to the interference class as 0 to obtain the mask image.

Further, the filling processing is performed on the image to be trained according to the mask image to obtain a filled image, and the method specifically includes:

acquiring a region corresponding to the mask image with the value of 0 on the image to be trained, and recording the region as a filling region;

and filling the filling area by adopting a negative sample image to obtain a filling image.

Further, the crowd density detection method further includes:

scaling the image to be trained and the mask image in the same proportion to obtain a scaled image to be trained and a scaled mask image;

filling the scaled image to be trained according to the scaled mask image to obtain a scaled filled image;

generating a scaled crowd density image from the scaled filled image;

and training according to the filled image, the crowd density image, the scaled filled image and the scaled crowd density image to obtain the crowd density estimation network.

In a second aspect, an embodiment of the present invention provides a crowd density detecting device, including:

the image to be trained acquiring unit is used for acquiring an image to be trained;

the human head detection unit is used for carrying out human head detection on the image to be trained to obtain an initial detection result;

the undetected head acquisition unit is used for acquiring undetected heads on the image to be trained according to the initial detection result;

the classification unit is used for classifying the undetected heads to obtain a classification result;

the separation unit is used for separating the images to be trained according to the classification result;

the mask construction unit is used for constructing a mask according to the separated image to be trained to obtain a mask image;

the filling unit is used for filling the image to be trained according to the mask image to obtain a filled image;

a crowd density image generating unit for generating a crowd density image from the filled image;

the training unit is used for training according to the filled image and the crowd density image to obtain a crowd density estimation network;

and the crowd density detection unit is used for carrying out crowd density detection on the image to be detected according to the crowd density estimation network.

In a third aspect, an embodiment of the present invention provides a crowd density detection apparatus, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the crowd density detection method according to any one of the above.

In a fourth aspect, the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where when the computer program runs, the apparatus in which the computer-readable storage medium is located is controlled to execute the crowd density detection method according to any one of the above.

Compared with the prior art, the embodiment of the invention obtains the actual size of each human head by performing human head detection on the image to be trained, so the generated crowd density map is more accurate than one generated on the assumption that all human heads have the same size. In addition, the embodiment of the invention classifies the undetected human heads and applies a mask, which prevents heads that cannot be detected from disturbing the training of the crowd density estimation network. For example, undersized heads are removed, which avoids the prior-art problem that the annotated size of small head targets has low precision. The embodiment of the invention therefore improves the accuracy of the crowd density detection result.

Drawings

To illustrate the technical solution of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flow chart of a crowd density detection method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a plurality of detection frames detecting the same human head when the human head detector is a classifier, according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a crowd density detection apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be understood that the step numbers used herein are for convenience of description only and are not intended as limitations on the order in which the steps are performed.

It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The term "and/or" refers to and includes any and all possible combinations of one or more of the associated listed items.

Example 1:

Referring to Fig. 1, an embodiment of the invention provides a crowd density detection method, including steps S1 to S10:

S1, acquiring an image to be trained.

It should be noted that the execution subject in the embodiment of the present invention may be a server or a terminal device, and the execution subject in the embodiment of the present invention is not limited as long as the crowd density detection method in the embodiment of the present invention can be implemented.

S2, performing human head detection on the image to be trained to obtain an initial detection result.

In the embodiment of the invention, specifically, a human head detector is used to detect human heads in the image to be trained. The human head detector is obtained by training an object detection algorithm such as Fast R-CNN, YOLO or SSD on a human head data set.

S3, acquiring the undetected human heads on the image to be trained according to the initial detection result.

In the embodiment of the present invention, it should be noted that if a human head on the image to be trained is so small that it falls outside the detection range of the human head detector, the head may not be detected. A head of normal size may also go undetected because of its pose or because of limitations of the head detector. The undetected heads therefore fall into two main types: the first type has a normal head size but is missed because of head pose or the head detector; the second type is undersized and falls outside the detection range of the head detector.

S4, classifying the undetected human heads to obtain a classification result.

In the embodiment of the invention, an undetected head of the first type has a normal size, so leaving it out would reduce the accuracy of the crowd density map. A head of the second type is too small for its size to be estimated and therefore needs to be removed. The undetected heads are therefore classified so that the two types can be processed separately.

S5, separating the image to be trained according to the classification result.

S6, constructing a mask according to the separated image to be trained to obtain a mask image.

S7, filling the image to be trained according to the mask image to obtain a filled image. In the embodiment of the invention, it should be understood that the mask image is a binary image composed of 0s and 1s. Filling the image to be trained according to the mask image removes the undersized heads and keeps the heads of normal size, which improves the accuracy of the crowd density map.

S8, generating a crowd density image from the filled image.

In the embodiment of the present invention, specifically, the crowd density map is generated according to the center point coordinates and the head size of each human head in the filled image.
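As a non-limiting illustration, step S8 may be sketched as follows. The description states only that the head center coordinates and head sizes are used; the normalized Gaussian kernel, the function name `density_map` and the factor `sigma_scale` that links head size to kernel spread are assumptions made for this sketch.

```python
import math


def density_map(height, width, heads, sigma_scale=0.25):
    """Build a crowd density map as a list of rows.

    heads: list of (cx, cy, head_w, head_h) in pixels.
    Each head adds a 2-D Gaussian normalized to sum to 1 over the image,
    so the whole map sums to the number of heads.
    sigma_scale is a hypothetical factor linking head size to kernel spread.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for cx, cy, head_w, head_h in heads:
        sx = max(head_w * sigma_scale, 1e-6)
        sy = max(head_h * sigma_scale, 1e-6)
        kernel = [
            [
                math.exp(-((x - cx) ** 2 / (2 * sx ** 2)
                           + (y - cy) ** 2 / (2 * sy ** 2)))
                for x in range(width)
            ]
            for y in range(height)
        ]
        total = sum(map(sum, kernel))
        if total == 0.0:
            continue  # kernel vanished numerically; nothing to add
        for y in range(height):
            for x in range(width):
                dmap[y][x] += kernel[y][x] / total
    return dmap
```

Under these assumptions a small head produces a narrow, high peak and a large head a wide, low one, which is the size-dependent behavior the description aims for.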

S9, training on the filled image and the crowd density image to obtain a crowd density estimation network.

In the embodiment of the invention, because the crowd density map is generated from the filled image, the trained crowd density estimation network has higher accuracy.

S10, performing crowd density detection on an image to be detected with the crowd density estimation network.

In the embodiment of the present invention, as can be seen from the above description, because the crowd density estimation network is more accurate, the crowd density detection result is more accurate as well.
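As a non-limiting illustration of how a detection result may be read from the network output in step S10, the sketch below assumes that the network returns a two-dimensional density map whose values sum to the number of people. The function name `estimate_count` and the optional `region` argument are hypothetical.

```python
def estimate_count(density, region=None):
    """Sum a density map (list of rows) over the image or over a region.

    region: optional (x0, y0, x1, y1) in pixels, upper bounds exclusive.
    Assumes the map was trained so that its values sum to the head count.
    """
    h, w = len(density), len(density[0])
    x0, y0, x1, y1 = region if region is not None else (0, 0, w, h)
    return sum(sum(row[x0:x1]) for row in density[y0:y1])
```

Summing over a sub-region gives a local count, which may be useful when only part of the scene is monitored.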

As an example of the embodiment of the present invention, the performing human head detection on the image to be trained to obtain an initial detection result specifically includes:

marking human heads on the image to be trained;

Specifically, the human heads are first marked on the image to be trained according to the center coordinates of each human head. The human head detector is then applied at each head center coordinate to obtain the size of the corresponding head. The head detector does not necessarily detect the head at every head center coordinate, so no head size can be obtained for an undetected head.

In the embodiment of the present invention, if the human head detector is a classifier, then as shown in Fig. 2 the dashed box represents the size of the human head detector, the circle represents the human head, and the solid boxes represent detection frames of different sizes centered on the head center coordinates. When several detection frames detect the same head, the size of the detection frame closest to the size of the head is taken as the head size; that is, the size of the target with the highest response is used as the head size at the corresponding position. The length and width of each detection frame are proportional to those of the human head detector: the ratio of the length and width of the i-th detection frame to the length and width of the human head detector is $r_i$, where $r_i$ is a preset value.
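As a non-limiting illustration, the selection of the highest-response detection frame and the resulting split into detected and undetected heads may be sketched as follows. The scoring callable `score_fn`, the acceptance `threshold` and the function names are hypothetical stand-ins for the classifier-based head detector, which the description does not specify further.

```python
def head_size_at(center, base_size, ratios, score_fn, threshold=0.5):
    """Return the (w, h) of the best-scoring frame centered on `center`.

    base_size: (w, h) of the head detector; ratios: the preset r_i values.
    score_fn(center, size) is a hypothetical classifier response.
    Returns None when no frame scores above `threshold` (an assumption).
    """
    base_w, base_h = base_size
    best_score, best_size = threshold, None
    for r in ratios:
        size = (r * base_w, r * base_h)
        score = score_fn(center, size)
        if score > best_score:
            best_score, best_size = score, size
    return best_size


def split_heads(centers, base_size, ratios, score_fn, threshold=0.5):
    """Split annotated head centers into detected (with size) and undetected."""
    detected, undetected = {}, []
    for center in centers:
        size = head_size_at(center, base_size, ratios, score_fn, threshold)
        if size is None:
            undetected.append(center)
        else:
            detected[center] = size
    return detected, undetected
```

The dictionary of detected heads corresponds to the initial detection result, and the list of undetected centers is what the classification step then processes.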

As an example of the embodiment of the present invention, the classifying the undetected human head to obtain a classification result specifically includes:

acquiring a plurality of human head targets around an undetected human head;

Specifically, after the plurality of human head targets around the undetected human head are acquired, the distance between the undetected head and each head target is calculated from their center point coordinates, and the distances are sorted in ascending order. The head targets are then queried one by one in that order to check whether each is a detected head. If the queried head target is a detected head, its size is assigned to the undetected head; otherwise the query continues until all of the head targets have been checked.

It should be noted that the normal detection class consists of heads of normal size, while the interference class consists of heads that are too small, with a head size beyond the range of the detector.

The prior art generates the density map with a preset fixed kernel, regardless of the actual size of each human head. By contrast, the embodiment of the invention determines the actual size of each human head, so the generated crowd density map is closer to the actual crowd density; that is, the crowd density detection result is more accurate.
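As a non-limiting illustration, the nearest-neighbor classification of undetected heads may be sketched as follows. The description does not state how the surrounding head targets are chosen; taking the `k` nearest annotated heads is an assumption, and the function and parameter names are hypothetical.

```python
import math


def classify_undetected(undetected, all_heads, detected_sizes, k=3):
    """Classify undetected head centers into normal and interference classes.

    undetected: list of (x, y) centers with no detection.
    all_heads: list of all annotated (x, y) centers.
    detected_sizes: dict mapping detected centers to (w, h).
    k: hypothetical number of surrounding head targets to query.
    """
    normal, interference = {}, []
    for u in undetected:
        # Sort the other heads by distance to u, nearest first.
        neighbors = sorted((h for h in all_heads if h != u),
                           key=lambda h: math.dist(u, h))
        for h in neighbors[:k]:
            if h in detected_sizes:
                normal[u] = detected_sizes[h]  # inherit the neighbor's size
                break
        else:
            interference.append(u)  # no detected head among the neighbors
    return normal, interference
```

An undetected head next to a detected one inherits that size and is kept, whereas a cluster made up only of undetected heads ends up in the interference class.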

As an example of the embodiment of the present invention, the separating the image to be trained according to the classification result specifically includes:

Specifically, the normal detection class and the interference class are separated by using an image boundary.

As an example of the embodiment of the present invention, the constructing a mask according to the separated image to be trained, and obtaining the mask image specifically includes:

constructing masks with the same size according to the images to be trained;

In the embodiment of the invention, the value of the mask at the position corresponding to the interference class is set to be 0 so as to eliminate the interference class, namely, the undersized head in the image to be trained is eliminated; the value of the mask at the corresponding position of the normal detection class is set to be 1 so as to reserve the normal detection class, so that a crowd density map can be generated according to the head of the normal detection class, and the accuracy of the crowd density map is improved.

In specific implementation, there are many construction modes of the mask, and one of the construction modes is specifically as follows:

Let $(x_{p_i}, y_{p_i})$ be the coordinates of the head center point $p_i$ of the i-th person in the normal detection class, and let $width_i$ and $height_i$ be the head width and height of the i-th person in the normal detection class. Let $(x_{q_i}, y_{q_i})$ be the coordinates of the head center point $q_i$ of the i-th person in the interference class, and let $width_{min}$ and $height_{min}$ be the head width and height of the minimum head size detectable by the head detector. Condition 1 and condition 2 are set as follows:

Condition 1: for a coordinate point $(x, y)$, there exists a point $p_i$ that satisfies [formula not reproduced] and [formula not reproduced].

Condition 2: for a coordinate point $(x, y)$, there exists a point $q_i$ that satisfies [formula not reproduced] and [formula not reproduced].

If the coordinate point $(x, y)$ satisfies condition 1 and does not satisfy condition 2, then $Mask(x, y) = 1$, where $Mask(x, y)$ is the mask value at the coordinate point $(x, y)$;

if the coordinate point $(x, y)$ does not satisfy condition 1 and satisfies condition 2, then $Mask(x, y) = 0$;

if the coordinate point $(x, y)$ satisfies both condition 1 and condition 2, then [formula not reproduced], where $L2(x, y) = x^2 + y^2$ and $\min_i(z_i)$ denotes the smallest value of $z_i$ over all values of $i$;

if the coordinate point $(x, y)$ satisfies neither condition 1 nor condition 2, then [formula not reproduced].

All coordinates are in units of pixels.
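As a non-limiting illustration, one possible mask construction is sketched below. Because the formulas for condition 1, condition 2 and the two remaining cases are not reproduced in this text, the sketch relies on assumptions: each condition is taken to be a box test around the head center (the head's own width and height for the normal detection class, the minimum detectable size for the interference class), and a point that satisfies both or neither condition takes the class of the nearest head center under the squared distance $L2$. The function name `build_mask` is hypothetical.

```python
def build_mask(height, width, normal, interference, min_size):
    """Build a binary mask as a list of rows (1 = keep, 0 = fill).

    normal: list of (cx, cy, w, h) for the normal detection class.
    interference: list of (cx, cy) for the interference class.
    min_size: (w_min, h_min), the minimum detectable head size.
    The box tests and the nearest-center rule are assumptions of this sketch.
    """
    def in_box(x, y, cx, cy, w, h):
        return abs(x - cx) <= w / 2 and abs(y - cy) <= h / 2

    def nearest_sq(x, y, points):
        # Smallest squared distance L2 to any center; infinity if none exist.
        return min(((x - cx) ** 2 + (y - cy) ** 2 for cx, cy in points),
                   default=float("inf"))

    w_min, h_min = min_size
    p_centers = [(cx, cy) for cx, cy, _, _ in normal]
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            cond1 = any(in_box(x, y, *head) for head in normal)
            cond2 = any(in_box(x, y, cx, cy, w_min, h_min)
                        for cx, cy in interference)
            if cond1 != cond2:
                mask[y][x] = 1 if cond1 else 0
            else:
                # Both or neither: assumed nearest-center rule.
                keep = nearest_sq(x, y, p_centers) <= nearest_sq(x, y, interference)
                mask[y][x] = 1 if keep else 0
    return mask
```

With this rule every pixel is assigned to one class or the other, so the regions with mask value 0 form the filling region used in the next step.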

As an example of the embodiment of the present invention, the filling the image to be trained according to the mask image to obtain a filled image specifically includes:

In the embodiment of the present invention, the filling region is filled with a negative sample image to remove the influence of the interference class, that is, to prevent undersized human heads from affecting the accurate generation of the density map. The negative sample image is an image that contains no human figures, for example an image of trees.
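As a non-limiting illustration, the filling step may be sketched as follows. The sketch assumes that the negative sample image has already been cropped or resized to the size of the image to be trained; the function name `fill_with_negative` is hypothetical.

```python
def fill_with_negative(image, mask, negative):
    """Replace pixels where mask == 0 with pixels from a negative sample.

    image, mask, negative: lists of rows with identical dimensions
    (the matching size of the negative sample is an assumption here).
    Pixels where mask == 1 are kept unchanged.
    """
    return [
        [img_px if m == 1 else neg_px
         for img_px, m, neg_px in zip(img_row, mask_row, neg_row)]
        for img_row, mask_row, neg_row in zip(image, mask, negative)
    ]
```

For example, with `image = [[1, 2], [3, 4]]`, `mask = [[1, 0], [0, 1]]` and a negative sample of all 9s, the two masked-out pixels are replaced and the other two are kept.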

In order to further improve the accuracy of the crowd density detection result, as an example of the embodiment of the present invention, the crowd density detection method further includes:

generating a scaled crowd density image from the scaled filled image;

After the image to be trained and the mask image are scaled and the corresponding crowd density estimation network is trained, the network can detect both heads of normal size and smaller heads, which improves head detection precision. The scaling ratio and the number of scaling passes can be set as needed.
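As a non-limiting illustration, scaling the image to be trained and the mask image by the same ratio may be sketched as follows. Nearest-neighbor resampling is an assumption chosen because it keeps the mask binary; the description does not name an interpolation method, and the function names are hypothetical.

```python
def resize_nearest(grid, scale):
    """Resize a 2-D grid (list of rows) by `scale` with nearest-neighbor sampling."""
    src_h, src_w = len(grid), len(grid[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    return [
        [grid[min(src_h - 1, int(y / scale))][min(src_w - 1, int(x / scale))]
         for x in range(dst_w)]
        for y in range(dst_h)
    ]


def scale_heads(heads, scale):
    """Scale (cx, cy, w, h) head annotations by the same ratio as the image."""
    return [(cx * scale, cy * scale, w * scale, h * scale)
            for cx, cy, w, h in heads]
```

Applying `resize_nearest` with one ratio to both the image and the mask keeps them aligned, and `scale_heads` keeps the annotations consistent so that a scaled crowd density image can be generated from the scaled filled image.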

Example 2:

Referring to Fig. 3, an embodiment of the invention provides a crowd density detecting device, including:

the training image acquisition unit 1 is used for acquiring images to be trained;

the human head detection unit 2 is used for performing human head detection on the image to be trained to obtain an initial detection result;

the undetected human head acquisition unit 3 is used for acquiring the undetected human head on the image to be trained according to the initial detection result;

the classification unit 4 is used for classifying the undetected heads to obtain a classification result;

the separation unit 5 is used for separating the images to be trained according to the classification result;

the mask construction unit 6 is used for constructing a mask according to the separated image to be trained to obtain a mask image;

the filling unit 7 is used for performing filling processing on the image to be trained according to the mask image to obtain a filled image;

a crowd density image generating unit 8 for generating a crowd density image from the filled image;

a training unit 9, configured to train according to the filled image and the crowd density image to obtain a crowd density estimation network;

and the crowd density detection unit 10 is used for carrying out crowd density detection on the image to be detected according to the crowd density estimation network.

As an example of the embodiment of the present invention, the constructing a mask according to the separated image to be trained to obtain a mask image specifically includes:

constructing masks with the same size according to the images to be trained;

As an example of the embodiment of the present invention, the crowd density detecting apparatus further includes:

the scaling unit is used for scaling the image to be trained and the mask image in the same proportion to obtain a scaled image to be trained and a scaled mask image;

the second filling unit is used for filling the scaled image to be trained according to the scaled mask image to obtain a scaled filled image;

a scaled crowd density image generating unit for generating a scaled crowd density image from the scaled filled image;

and the second training unit is used for training according to the filled image, the crowd density image, the scaled filled image and the scaled crowd density image to obtain the crowd density estimation network.

Example 3:

An embodiment of the present invention further provides a crowd density detection apparatus, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor. When the processor executes the computer program, the crowd density detection apparatus implements the crowd density detection method according to any of the above embodiments.

Example 4:

An embodiment of the present invention provides a computer-readable storage medium that includes a stored computer program. When the computer program runs, the device in which the computer-readable storage medium is located is controlled to execute the crowd density detection method according to any one of the above embodiments.

It should be noted that all or part of the flow of the methods in the above embodiments may also be implemented by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, USB disk, removable hard disk, magnetic disk, optical disk, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier signal, telecommunications signal, or software distribution medium. It should be further noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A method for crowd density detection, comprising:

acquiring an image to be trained;

classifying the undetected human head to obtain a classification result;

separating the images to be trained according to the classification result;

generating a crowd density image from the filled image;

2. The crowd density detection method according to claim 1, wherein the detecting the head of the image to be trained to obtain an initial detection result specifically comprises:

marking human heads on the image to be trained;

3. The method according to claim 2, wherein the classifying the undetected human head to obtain a classification result specifically comprises:

acquiring a plurality of human head targets around an undetected human head;

4. The crowd density detection method according to claim 3, wherein the separating the image to be trained according to the classification result specifically comprises:

5. The crowd density detection method according to claim 4, wherein the constructing a mask according to the separated image to be trained to obtain a mask image specifically comprises:

constructing masks with the same size according to the images to be trained;

6. The crowd density detection method according to claim 5, wherein the filling processing is performed on the image to be trained according to the mask image to obtain a filled image, and specifically includes:

7. The crowd density detection method according to claim 1, further comprising:

generating a scaled crowd density image from the scaled filled image;

8. A crowd density detection device, comprising:

9. A crowd density detection device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor when executing the computer program implementing the crowd density detection method according to any one of claims 1 to 7.

10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the crowd density detection method according to any one of claims 1 to 7.