CN113902772A

CN113902772A - Crowd counting method and device, computer storage medium and processor

Info

Publication number: CN113902772A
Application number: CN202111401821.2A
Authority: CN
Inventors: 张少杰
Original assignee: Shenzhen Lianzhou International Technology Co Ltd
Current assignee: Shenzhen Lianzhou International Technology Co Ltd
Priority date: 2021-11-19
Filing date: 2021-11-19
Publication date: 2022-01-07

Abstract

The application provides a crowd counting method and device, a computer storage medium and a processor. The method comprises: acquiring a target scene image with depth information, wherein the target scene contains pedestrians and the depth information is the distance between each pixel in the target scene image and the camera; performing predetermined processing on the target scene image to obtain a pedestrian motion region image, which contains no objects other than pedestrians; dividing the pedestrian motion region image into a plurality of sub-region images according to the magnitude of the depth information; acquiring the number of pedestrians contained in each sub-region image; and summing the pedestrian counts of all sub-region images to obtain the number of pedestrians in the target scene. The method and device address the technical problem of low counting precision in prior-art crowd counting methods.

Description

Crowd counting method and device, computer storage medium and processor

Technical Field

The present application relates to the field of image processing, and in particular to a crowd counting method and device, a computer storage medium, and a processor.

Background

Crowd counting methods in the prior art fall roughly into the following two types:

1) Top-down counting: heads or pedestrians in the crowd are first detected with, for example, an object detection model, and the detections are then counted;

2) Bottom-up counting: a deep learning model directly regresses the crowd density of the image, predicting a crowd density heat map, which is then integrated to estimate the number of people.
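The Python sketch below makes the integration step of the second method concrete. It is an illustration, not part of the claimed method. It builds a density map from hypothetical ground-truth head positions by placing one normalized Gaussian per head, which is the usual way training targets for such models are formed. A trained network would predict this map instead. Summing the map recovers the head count. The sigma value and the head coordinates are assumptions chosen only for illustration.

```python
import numpy as np

def density_map(points, shape, sigma=2.0):
    """Place one Gaussian per head point, each normalized to sum to 1 over the image."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dmap = np.zeros(shape, dtype=float)
    for py, px in points:
        g = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        dmap += g / g.sum()
    return dmap

def count_from_density(dmap):
    """Integrating (summing) the density map gives the estimated number of people."""
    return float(dmap.sum())

heads = [(10, 10), (20, 35), (40, 15)]  # hypothetical head positions (row, col)
dmap = density_map(heads, shape=(50, 50))
print(round(count_from_density(dmap), 6))  # 3.0
```

A small sigma concentrates each person's unit mass in a few pixels. That is the small Gaussian kernel situation the next paragraphs associate with missed small-scale targets.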

For the first type of method: target bounding boxes are obtained by head detection, pedestrian detection or a similar approach, and are then counted to obtain the number of people. The estimation precision is therefore entirely determined by the precision of the head or pedestrian detection, and whenever the detection network misses a target, the counting precision inevitably drops. Moreover, in scenes with dense pedestrian flow, targets are small, poses vary and occlusion is severe, so pedestrian or head detection is often inaccurate and the precision of this method is low;

For the second type of method: the crowd count is obtained by directly estimating the crowd density heat map. This method also suffers from poor multi-scale performance; in high-density scenes in particular, small-scale targets are easily missed because their Gaussian kernel distribution is small. In addition, because the method considers only a single overall quantity, the position information associated with each target is lost, a large number of training samples is required, and accuracy is difficult to guarantee.

Disclosure of Invention

The main purpose of the present application is to provide a crowd counting method and device, a computer storage medium and a processor, so as to solve the technical problem of low counting accuracy of prior-art crowd counting methods.

To achieve the above object, according to one aspect of the present application, a crowd counting method is provided, comprising: acquiring a target scene image with depth information, wherein the target scene contains pedestrians and the depth information is the distance between each pixel in the target scene image and the camera; performing predetermined processing on the target scene image to obtain a pedestrian motion region image, which contains no objects other than pedestrians; dividing the pedestrian motion region image into a plurality of sub-region images according to the magnitude of the depth information; acquiring the number of pedestrians contained in each sub-region image; and summing the pedestrian counts of all sub-region images to obtain the number of pedestrians in the target scene.

Further, acquiring the number of pedestrians contained in each sub-region image comprises: performing connected-component processing on the sub-region image to obtain a plurality of independent connected domains; determining the minimum bounding rectangle of each connected domain; determining the number of pedestrians in each minimum bounding rectangle according to its width information, depth information and foreground density; and summing the pedestrian counts of all minimum bounding rectangles of the sub-region image to obtain the number of pedestrians contained in the sub-region image.

Further, determining the number of pedestrians contained in the minimum bounding rectangle according to its width information, depth information and foreground density comprises: constructing a machine learning model trained on a plurality of groups of training data, each group comprising the width information, the depth information and the foreground density of a minimum bounding rectangle together with the number of pedestrians contained in it; and determining the number of pedestrians in the minimum bounding rectangle through the machine learning model.

Further, dividing the pedestrian motion region image into a plurality of sub-region images according to the magnitude of the depth information comprises: acquiring the maximum and minimum values of the depth information; determining the number of groups and the group distance according to the maximum and minimum values; and performing foreground segmentation N times on the pedestrian motion region image, using the starting and ending thresholds of each group of depth information, to obtain N sub-region images, where N is the number of groups.

Further, performing the predetermined processing on the target scene image to obtain the pedestrian motion region image comprises: performing frame difference processing on the target scene image to obtain pedestrian edge contours; performing morphological dilation and connected-domain segmentation on the pedestrian edge contours to obtain scattered motion regions; and removing non-pedestrian noise regions from the scattered motion regions according to position information and area information to obtain the pedestrian motion region image.

Further, before the predetermined processing is performed on the target scene image, the method further comprises: performing binarization processing on the target scene image.

According to another aspect of the present application, a crowd counting device is provided, comprising: a first acquisition unit, configured to acquire a target scene image with depth information, wherein the target scene contains pedestrians and the depth information is the distance between each pixel in the target scene image and the camera; a processing unit, configured to perform predetermined processing on the target scene image to obtain a pedestrian motion region image, which contains no objects other than pedestrians; a dividing unit, configured to divide the pedestrian motion region image into a plurality of sub-region images according to the magnitude of the depth information; a second acquisition unit, configured to acquire the number of pedestrians contained in each sub-region image; and a calculating unit, configured to sum the pedestrian counts of all sub-region images to obtain the number of pedestrians in the target scene.

Further, the second acquisition unit comprises: a processing module, configured to perform connected-component processing on the sub-region image to obtain a plurality of independent connected domains; a first determining module, configured to determine the minimum bounding rectangle of each connected domain; a second determining module, configured to determine the number of pedestrians in each minimum bounding rectangle according to its width information, depth information and foreground density; and a calculation module, configured to sum the pedestrian counts of all minimum bounding rectangles of the sub-region image to obtain the number of pedestrians contained in the sub-region image.

According to another aspect of the present application, there is also provided a computer readable storage medium comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer readable storage medium is located to perform any one of the above-mentioned people counting methods.

According to another aspect of the application, there is also provided a processor for executing a program, wherein the program when executed performs any one of the above-mentioned people counting methods.

With this technical solution, a target scene image with depth information is first acquired, wherein the target scene contains pedestrians and the depth information is the distance between each pixel in the target scene image and the camera. Predetermined processing is then performed on the target scene image to obtain a pedestrian motion region image containing no objects other than pedestrians. The pedestrian motion region image is divided into a plurality of sub-region images according to the depth information, and the number of pedestrians contained in each sub-region image is acquired. Finally, the pedestrian counts of all sub-region images are summed to obtain the number of pedestrians in the target scene. Because the pedestrian motion region image with depth information is divided into sub-region images and pedestrians are counted within each sub-region image, the summed count is more accurate, so the number of pedestrians in the whole target scene is obtained accurately. This improves the accuracy of crowd counting, particularly in dense scenes, and solves the technical problem of low counting accuracy of prior-art crowd counting methods.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:

Fig. 1 is a flow chart of a crowd counting method according to an embodiment of the present application;

Fig. 2 is a histogram of depth data used for crowd counting according to an optional embodiment of the present application;

Fig. 3 is a schematic diagram of a crowd counting device according to an embodiment of the present application.

Detailed Description

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the terms so used may be interchanged under appropriate circumstances, so that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It will be understood that when an element such as a layer, film, region, or substrate is referred to as being "on" another element, it can be directly on the other element or intervening elements may also be present. Also, in the specification and claims, when an element is described as being "connected" to another element, the element may be "directly connected" to the other element or "connected" to the other element through a third element.

Example 1

According to an embodiment of the present application, a crowd counting method is provided. It should be noted that the steps illustrated in the flowcharts of the figures may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that given here.

Fig. 1 is a flow chart of a crowd counting method according to an embodiment of the present application. As shown in Fig. 1, the method comprises the following steps:

Step S101, acquiring a target scene image with depth information, wherein the target scene contains pedestrians and the depth information is the distance between each pixel in the target scene image and the camera.

The target scene in the above step may be any of various public areas, such as a shopping mall, a tourist attraction or a subway station. The depth information of the target scene image may be captured directly by a depth camera or a similar device, or estimated by a depth estimation algorithm, for example monocular or binocular depth estimation; the manner of obtaining it is not specifically limited here.

In an optional embodiment, an image I(R, G, B, D) of the scene with depth information is acquired, where R, G and B are the red, green and blue color channels of the image and D is the depth information channel of the image.
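The following is a minimal sketch with two assumptions: the RGB-D frame is an H x W x 4 NumPy array in (R, G, B, D) order, and depth is in millimetres. Under those assumptions, the color and depth channels can be separated and the color part converted to grayscale for the later frame-difference step. The ITU-R BT.601 luma weights are a common choice; the application does not specify them.

```python
import numpy as np

def split_rgbd(image):
    """Split an H x W x 4 array laid out as (R, G, B, D) into color and depth parts."""
    if image.ndim != 3 or image.shape[2] != 4:
        raise ValueError("expected an H x W x 4 (R, G, B, D) array")
    rgb = image[..., :3].astype(float)
    depth = image[..., 3].astype(float)
    return rgb, depth

def to_gray(rgb):
    """Grayscale from RGB using BT.601 luma weights (an assumed choice)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

frame = np.zeros((2, 3, 4))
frame[..., :3] = 100.0   # uniform gray color
frame[..., 3] = 2500.0   # every pixel 2.5 m from the camera (assumed millimetres)
rgb, depth = split_rgbd(frame)
gray = to_gray(rgb)      # shape (2, 3), all values close to 100.0
```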

Step S102, performing predetermined processing on the target scene image to obtain a pedestrian motion region image, which contains no objects other than pedestrians.

The predetermined processing in the above step may consist of three parts: converting the image into a grayscale image using its RGB three-channel information, obtaining a frame difference result with a frame difference method, and refining that result. This processing yields the pedestrian motion region image.

Step S103, dividing the pedestrian motion region image into a plurality of sub-region images according to the magnitude of the depth information.

The depth information in the above step can be read from the depth information channel. Its maximum and minimum values are determined, and the pedestrian motion region image is divided into a plurality of sub-region images by processing based on these maximum and minimum values.

Step S104, acquiring the number of pedestrians contained in each sub-region image.

To obtain the number of pedestrians contained in a sub-region image, connected-component processing is applied to it to obtain a plurality of independent connected domains. These connected domains are then processed further to extract several pieces of data, from which the number of pedestrians contained in the sub-region image is determined.

Step S105, summing the pedestrian counts of all sub-region images to obtain the number of pedestrians in the target scene.

By traversing the sub-region images and summing their pedestrian counts, the above step obtains the number of pedestrians in the scene accurately.
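Step S105 can be written as a double sum. The notation here is introduced for illustration only: $N_i$ is the count of the $i$-th sub-region image, and $P_{ij}$ is the count of the $j$-th of its $m_i$ bounding rectangles.

$$N_{\text{scene}} = \sum_{i=1}^{N} N_i = \sum_{i=1}^{N} \sum_{j=1}^{m_i} P_{ij}$$

Here $N$ is the number of sub-region images. For example, three sub-region images with counts 4, 7 and 2 give a scene total of $4 + 7 + 2 = 13$.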

Further, acquiring the number of pedestrians contained in each sub-region image comprises: performing connected-component processing on the sub-region image to obtain a plurality of independent connected domains; determining the minimum bounding rectangle of each connected domain; determining the number of pedestrians in each minimum bounding rectangle according to its width information, depth information and foreground density; and summing the pedestrian counts of all minimum bounding rectangles of each sub-region image to obtain the number of pedestrians contained in that sub-region image.

In an optional embodiment, after the $i$-th segmentation ($i = 1, \ldots, N$) is completed, that is, once a sub-region image has been obtained, connected-component processing is applied to the foreground region at that depth (the sub-region image). This splits the region into $m$ independent connected domains, and the minimum bounding rectangle of each connected domain is then computed. Prior knowledge indicates that the number of pedestrians $P_j$ ($j = 1, 2, \ldots, m$) contained in a box depends on three quantities: the box width $w$, the box depth $d$, and the foreground density $p$ of the box. In general:

- The larger the box width $w$, the larger $P_j$.
- The larger the depth $d$, the larger $P_j$. Here $d$ is the distance of the pedestrians from the lens, and a pedestrian of a given size appears smaller the farther it is from the lens, so a box of a given width holds more people at greater depth.
- The larger the foreground density $p$, the larger $P_j$. A higher foreground density indicates more severe mutual occlusion between pedestrians, which is how the occlusion problem is taken into account.
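The quantities $w$, $d$ and $p$ can be extracted from a sub-region mask roughly as follows. This sketch rests on four assumptions that the application does not define:

- connected domains use 4-connectivity;
- the "minimum bounding rectangle" is axis-aligned;
- the box depth $d$ is the median depth of the domain's foreground pixels;
- the foreground density $p$ is the number of foreground pixels divided by the box area.

```python
import numpy as np
from collections import deque

def connected_domains(mask):
    """Return each 4-connected domain of a boolean mask as (ys, xs) pixel index arrays."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    domains = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            seen[y, x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill of one domain
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            ys, xs = np.array(pixels).T
            domains.append((ys, xs))
    return domains

def box_features(sub_region, depth):
    """(w, d, p) for the axis-aligned bounding rectangle of every connected domain."""
    feats = []
    for ys, xs in connected_domains(sub_region):
        box_w = int(xs.max() - xs.min() + 1)      # box width in pixels
        box_h = int(ys.max() - ys.min() + 1)
        d = float(np.median(depth[ys, xs]))       # assumed: median foreground depth
        p = len(ys) / float(box_w * box_h)        # assumed: foreground pixels / box area
        feats.append((box_w, d, p))
    return feats

sub_region = np.zeros((10, 10), dtype=bool)
sub_region[2:6, 1] = True      # vertical stroke of an L shape
sub_region[5, 1:4] = True      # horizontal stroke
sub_region[7:9, 6:10] = True   # a separate, solid 2 x 4 blob
depth = np.full((10, 10), 3000.0)
feats = box_features(sub_region, depth)  # [(3, 3000.0, 0.5), (4, 3000.0, 1.0)]
```

The L shape fills only half of its 4 x 3 box, so its density is 0.5. The solid blob fills its box completely. A lower density like this is the kind of signal the text uses to reason about occlusion.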

Further, determining the number of pedestrians contained in the minimum bounding rectangle according to its width information, depth information and foreground density comprises: constructing a machine learning model trained on a plurality of groups of training data, each group comprising the width information, the depth information and the foreground density of a minimum bounding rectangle together with the number of pedestrians contained in it; and determining the number of pedestrians in the minimum bounding rectangle through the machine learning model.

In an optional embodiment, a machine learning algorithm such as linear regression is used to construct the mapping between the number of pedestrians $P_j$ and the three parameters, as shown in formula (1):

$$P_j = f(w, d, p) \tag{1}$$

The computed value of $P_j$ is taken as the number of pedestrians contained in the box. Traversing all independent connected domains at this depth gives the number of pedestrians at this depth:

$$N_i = \sum_{j=1}^{m} P_j$$

where $N_i$ is the number of pedestrians at the depth corresponding to the $i$-th segmentation.
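One possible realization of formula (1) assumes a plain linear form, f(w, d, p) = a*w + b*d + c*p + e, fitted by ordinary least squares. The application names linear regression only as an example. Under that assumption, the mapping, the per-depth sum $N_i$ and the scene total can be sketched as below. Two further points are assumptions too. First, negative predictions are clipped to zero. Second, the training labels in the usage lines are synthetic: they come from a known linear rule and serve only to check that the fit recovers it.

```python
import numpy as np

def fit_f(features, counts):
    """Fit P = a*w + b*d + c*p + e by ordinary least squares; returns [a, b, c, e]."""
    X = np.column_stack([np.asarray(features, dtype=float), np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(X, np.asarray(counts, dtype=float), rcond=None)
    return coef

def predict_p(coef, w, d, p):
    """P_j = f(w, d, p) for one bounding rectangle, clipped at zero (an assumption)."""
    return max(0.0, float(coef @ np.array([w, d, p, 1.0])))

def depth_count(coef, boxes):
    """N_i: sum of P_j over the m bounding rectangles found at one depth."""
    return sum(predict_p(coef, w, d, p) for (w, d, p) in boxes)

def scene_count(coef, boxes_per_depth):
    """Step S105: add up N_i over all N depth groups."""
    return sum(depth_count(coef, boxes) for boxes in boxes_per_depth)

# Synthetic labels from a known rule, P = 0.05*w + 0.001*d + 2*p - 1, to check the fit.
train = [(20, 2000, 0.5), (40, 3000, 0.8), (60, 2500, 0.3), (30, 4000, 0.9), (50, 1500, 0.6)]
labels = [0.05 * w + 0.001 * d + 2 * p - 1 for (w, d, p) in train]
coef = fit_f(train, labels)
n_1 = depth_count(coef, [(40, 3000, 0.8), (20, 2000, 0.5)])                        # about 8.6
total = scene_count(coef, [[(40, 3000, 0.8), (20, 2000, 0.5)], [(60, 2500, 0.3)]])  # about 13.7
```

In practice, the labels would be hand-counted pedestrians per box. The application says the computed value itself is taken as the count, and this sketch follows that literally. A deployment might also round the final total to an integer, but that is a design choice rather than something the text specifies.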

Further, dividing the pedestrian motion region image into a plurality of sub-region images according to the magnitude of the depth information comprises: acquiring the maximum and minimum values of the depth information; determining the number of groups and the group distance according to the maximum and minimum values; and performing foreground segmentation N times on the pedestrian motion region image, using the starting and ending thresholds of each group of depth information, to obtain N sub-region images, where N is the number of groups.

In an optional embodiment, the values of the depth information channel within the pedestrian motion region of the image are collected, and a histogram of the depth data is generated with group distance (bin width) m and number of groups N. The value of m can be set according to the required counting precision: the smaller m is, the higher the final counting precision and the smaller the effect of phenomena such as pedestrian occlusion on the precision. The histogram is shown schematically in Fig. 2, where:

- m is the group distance, i.e. the grouping interval;
- N is the number of groups, i.e. the number of bars (N = 8 in Fig. 2);
- Min is the minimum of the depth data;
- Max is the maximum of the depth data.

Foreground segmentation is then performed N times on the pedestrian motion region, once per group. The threshold range of each segmentation is the starting and ending threshold of the corresponding group, so each segmentation yields the pedestrian region within one depth range.
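Below is a minimal sketch of the depth grouping. It assumes the depth map and the motion-region mask are NumPy arrays of the same shape, and that N is derived as ceil((Max - Min) / m); the application states only that the grouping follows from Max, Min and the group distance m. Each group's interval is half-open, [start, end), except the last, which is closed so that Max itself is kept. That boundary convention is also an assumption. The per-group pixel counts returned alongside the masks are the bar heights of a histogram like Fig. 2.

```python
import math
import numpy as np

def split_by_depth(depth, motion_mask, m):
    """Segment the motion region N times, once per depth group of width m."""
    values = depth[motion_mask]
    d_min, d_max = float(values.min()), float(values.max())
    n_groups = max(1, math.ceil((d_max - d_min) / m))   # N (assumed derivation)
    sub_regions, bar_heights = [], []
    for i in range(n_groups):
        start = d_min + i * m   # starting threshold of group i
        end = start + m         # ending threshold of group i
        if i == n_groups - 1:
            in_group = (depth >= start) & (depth <= end)   # keep Max in the last group
        else:
            in_group = (depth >= start) & (depth < end)
        sub = motion_mask & in_group
        sub_regions.append(sub)
        bar_heights.append(int(sub.sum()))
    return sub_regions, bar_heights

depth = np.arange(16, dtype=float).reshape(4, 4) * 100 + 1000   # 1000 ... 2500
motion = np.ones((4, 4), dtype=bool)
subs, heights = split_by_depth(depth, motion, m=500)            # heights == [5, 5, 6]
```

`np.histogram` with the same bin edges follows the same convention (every bin half-open except the last). It can therefore be used to draw the Fig. 2-style histogram from the same data.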

Further, performing the predetermined processing on the target scene image to obtain the pedestrian motion region image comprises: performing frame difference processing on the target scene image to obtain pedestrian edge contours; performing morphological dilation and connected-domain segmentation on the pedestrian edge contours to obtain scattered motion regions; and removing non-pedestrian noise regions from the scattered motion regions according to position information and area information to obtain the pedestrian motion region image.

In an optional embodiment, the predetermined processing on the target scene image may proceed as follows. The image is converted into a grayscale image using its RGB three-channel information. Grayscale subtraction is then performed on two or three adjacent frames of the video stream using a frame difference method, for example the adjacent-frame difference method or the three-frame difference method. With the three-frame difference method, the two difference results are intersected (combined with a logical AND) to form the final result, which generally gives better results than the adjacent-frame difference. The difference preliminarily extracts the edge contours of moving pedestrians, but it also contains some non-pedestrian noise regions. Morphological dilation and connected-domain segmentation are therefore applied to the frame difference result to obtain scattered motion regions, and non-pedestrian noise regions are removed according to prior indicators such as position information and area information. This refines the frame difference result into a more accurate pedestrian motion region, yielding the pedestrian motion region image.
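The three-frame-difference branch and the clean-up steps can be sketched with NumPy alone, as shown below. Several details are illustrative assumptions that the application does not fix: the binarization threshold, the 3 x 3 square dilation element, 4-connectivity, the minimum-area rule, and an optional region-of-interest mask standing in for the "position information" prior.

```python
import numpy as np
from collections import deque

def three_frame_diff(f1, f2, f3, thresh=15.0):
    """Binarize |f2 - f1| and |f3 - f2|, then intersect them (logical AND)."""
    d1 = np.abs(f2.astype(float) - f1.astype(float)) > thresh
    d2 = np.abs(f3.astype(float) - f2.astype(float)) > thresh
    return d1 & d2

def dilate(mask, radius=1):
    """Binary dilation with a (2r+1) x (2r+1) square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def label_components(mask):
    """4-connected component labeling by breadth-first search; returns (labels, count)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and labels[y, x] == 0:
                count += 1
                labels[y, x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count

def motion_region(f1, f2, f3, min_area=20, roi=None):
    """Frame difference -> dilation -> connected domains -> drop non-pedestrian noise."""
    mask = dilate(three_frame_diff(f1, f2, f3))
    labels, count = label_components(mask)
    keep = np.zeros_like(mask)
    for k in range(1, count + 1):
        domain = labels == k
        if domain.sum() < min_area:                     # area prior: tiny blobs are noise
            continue
        if roi is not None and not roi[domain].all():   # position prior: must lie in the ROI
            continue
        keep |= domain
    return keep

# Toy grayscale frames: a 10 x 4 "pedestrian" moving right, plus one flickering pixel.
f1, f2, f3 = (np.zeros((30, 30)) for _ in range(3))
f1[10:20, 2:6] = 200
f2[10:20, 10:14] = 200
f3[10:20, 18:22] = 200
f2[2, 25] = 200
mask = motion_region(f1, f2, f3)
```

In the toy input, the kept region is the pedestrian's position in the middle frame `f2`, grown by one pixel by the dilation. The single flickering pixel survives the frame difference but is discarded by the area rule.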

Example 2

The embodiment of the present application further provides a crowd counting apparatus, and it should be noted that the crowd counting apparatus according to the embodiment of the present application may be used to execute the crowd counting method according to the embodiment of the present application. The crowd counting device provided by the embodiment of the application is introduced below.

Fig. 3 is a schematic diagram of a crowd counting device according to an embodiment of the present application. As shown in Fig. 3, the device comprises:

a first acquiring unit 31, configured to acquire a target scene image with depth information, wherein the target scene contains pedestrians and the depth information is the distance between each pixel in the target scene image and the camera;

a processing unit 32, configured to perform predetermined processing on the target scene image to obtain a pedestrian motion region image, which contains no objects other than pedestrians;

a dividing unit 33, configured to divide the pedestrian motion region image into a plurality of sub-region images according to the magnitude of the depth information;

a second acquiring unit 34, configured to acquire the number of pedestrians contained in each sub-region image;

and a calculating unit 35, configured to sum the pedestrian counts of all sub-region images to obtain the number of pedestrians in the target scene.
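The hypothetical Python skeleton below shows how units 31 to 35 could fit together. Each unit is passed in as a callable, so concrete implementations of the individual steps can be plugged in without changing the composition. The class and field names are invented for illustration. Summing is the default calculating unit because that is what unit 35 is described as doing.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class CrowdCountingDevice:
    """Hypothetical composition of units 31 to 35; each unit is injected as a callable."""
    first_acquiring_unit: Callable[[], Any]              # 31: returns an RGB-D scene image
    processing_unit: Callable[[Any], Any]                # 32: image -> pedestrian motion region
    dividing_unit: Callable[[Any, Any], List[Any]]       # 33: (image, region) -> sub-region images
    second_acquiring_unit: Callable[[Any], float]        # 34: sub-region image -> pedestrian count
    calculating_unit: Callable[[List[float]], float] = sum  # 35: total over sub-regions

    def count(self) -> float:
        image = self.first_acquiring_unit()
        region = self.processing_unit(image)
        sub_regions = self.dividing_unit(image, region)
        per_region = [self.second_acquiring_unit(s) for s in sub_regions]
        return self.calculating_unit(per_region)

# Dummy units for illustration only: three "sub-regions" holding 2, 4 and 6 pedestrians.
device = CrowdCountingDevice(
    first_acquiring_unit=lambda: "rgbd-frame",
    processing_unit=lambda image: "motion-region",
    dividing_unit=lambda image, region: [1, 2, 3],
    second_acquiring_unit=lambda sub_region: 2 * sub_region,
)
print(device.count())  # 12
```

Injecting callables keeps each unit testable in isolation, which mirrors the device's split into independent program units stored in memory.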

Further, the second acquiring unit comprises: a processing module, configured to perform connected-component processing on the sub-region image to obtain a plurality of independent connected domains; a first determining module, configured to determine the minimum bounding rectangle of each connected domain; a second determining module, configured to determine the number of pedestrians in each minimum bounding rectangle according to its width information, depth information and foreground density; and a calculation module, configured to sum the pedestrian counts of all minimum bounding rectangles of the sub-region image to obtain the number of pedestrians contained in the sub-region image.

Further, the second determining module comprises: a construction submodule, configured to construct a machine learning model trained on a plurality of groups of training data, each group comprising the width information, the depth information and the foreground density of a minimum bounding rectangle together with the number of pedestrians contained in it; and a determining submodule, configured to determine the number of pedestrians in the minimum bounding rectangle through the machine learning model.

Further, the dividing unit comprises: an acquisition module, configured to acquire the maximum and minimum values of the depth information; a third determining module, configured to determine the number of groups and the group distance according to the maximum and minimum values; and a segmentation module, configured to perform foreground segmentation N times on the pedestrian motion region image, using the starting and ending thresholds of each group of depth information, to obtain N sub-region images, where N is the number of groups.

Further, the processing unit comprises: a first processing module, configured to perform frame difference processing on the target scene image to obtain pedestrian edge contours; a second processing module, configured to perform morphological dilation and connected-domain segmentation on the pedestrian edge contours to obtain scattered motion regions; and a removing module, configured to remove non-pedestrian noise regions from the scattered motion regions according to position information and area information to obtain the pedestrian motion region image.

Further, the processing unit further comprises: a third processing module, configured to perform binarization processing on the target scene image.

With this technical solution, the device operates as follows. The first acquiring unit acquires a target scene image with depth information, wherein the target scene contains pedestrians and the depth information is the distance between each pixel in the target scene image and the camera. The processing unit performs predetermined processing on the target scene image to obtain a pedestrian motion region image containing no objects other than pedestrians. The dividing unit divides the pedestrian motion region image into a plurality of sub-region images according to the magnitude of the depth information. The second acquiring unit acquires the number of pedestrians contained in each sub-region image, and the calculating unit sums the pedestrian counts of all sub-region images to obtain the number of pedestrians in the target scene. Because the pedestrian motion region image is divided into sub-region images and pedestrians are counted within each of them, the summed count is more accurate, so the number of pedestrians in the whole target scene is obtained accurately. This improves the accuracy of crowd counting, particularly in dense scenes, and solves the technical problem of low counting accuracy of prior-art crowd counting methods.

The crowd counting device comprises a processor and a memory, wherein the first acquiring unit, the processing unit, the dividing unit, the second acquiring unit, the calculating unit and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.

The processor contains a kernel, which calls the corresponding program unit from the memory. One or more kernels may be provided. By adjusting the kernel parameters, the number of pedestrians in the whole target scene is obtained accurately, which improves the accuracy of crowd counting, particularly in dense scenes, and solves the technical problem of low counting accuracy of prior-art crowd counting methods.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.

An embodiment of the present invention provides a computer-readable storage medium comprising a stored program. When the program runs, it controls the device on which the computer-readable storage medium is located to perform the crowd counting method.

An embodiment of the present invention provides a processor for running a program, where the program performs the crowd counting method when it runs.

An embodiment of the present invention provides a device comprising a processor, a memory and a program stored in the memory and executable on the processor. When executing the program, the processor implements at least the following steps: step S101, acquiring a target scene image with depth information, wherein the target scene contains pedestrians and the depth information is the distance between each pixel in the target scene image and the camera; step S102, performing predetermined processing on the target scene image to obtain a pedestrian motion region image that contains no objects other than pedestrians; step S103, dividing the pedestrian motion region image into a plurality of sub-region images according to the depth information; step S104, acquiring the number of pedestrians contained in each sub-region image; and step S105, summing the numbers of pedestrians in all the sub-region images to obtain the number of pedestrians contained in the target scene. The device may be a server, a PC, a tablet (PAD), a mobile phone, etc.

The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program that performs at least the following method steps: step S101, acquiring a target scene image with depth information, wherein the target scene contains pedestrians and the depth information is the distance between each pixel in the target scene image and the camera; step S102, performing predetermined processing on the target scene image to obtain a pedestrian motion region image that contains no objects other than pedestrians; step S103, dividing the pedestrian motion region image into a plurality of sub-region images according to the depth information; step S104, acquiring the number of pedestrians contained in each sub-region image; and step S105, summing the numbers of pedestrians in all the sub-region images to obtain the number of pedestrians contained in the target scene.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.

Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

Examples

This example describes a specific crowd counting method, which comprises the following steps.

Crowd counting refers to counting the number of pedestrians in a monitored area. It is significant in many scenarios: it can help public areas such as shopping malls, tourist attractions and subway stations monitor the pedestrian flow density of specific areas in real time and evacuate and guide crowds in time, thereby improving queuing efficiency and preventing dangerous incidents such as stampedes. The following crowd counting method based on depth information is intended to count the number of pedestrians in a crowd accurately.

Step one: obtain a scene image I(R, G, B, D) with depth information, where R, G and B are the red, green and blue color channels of the image and D is the depth information channel of the image.
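A minimal sketch of how the input of step one and the grayscale conversion of step two might be handled, assuming the RGB-D frame arrives as an H x W x 4 NumPy array in R, G, B, D channel order (the storage format is not specified in this document). The grayscale conversion uses the common ITU-R BT.601 luma weights; the text only says the RGB channels are used, so the exact weights are an assumption, and the function names are hypothetical.

```python
import numpy as np


def split_rgbd(image: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split an H x W x 4 array I(R, G, B, D) into an RGB image and a depth map.

    Assumes channel order R, G, B, D; D holds the camera-to-pixel distance
    (units are not specified in the source and are left unchanged here).
    """
    if image.ndim != 3 or image.shape[2] != 4:
        raise ValueError("expected an H x W x 4 RGB-D array")
    rgb = image[..., :3].astype(np.float64)
    depth = image[..., 3].astype(np.float64)
    return rgb, depth


def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image to grayscale with BT.601 luma weights (assumed)."""
    weights = np.array([0.299, 0.587, 0.114])
    # Matrix product over the last axis: (H, W, 3) @ (3,) -> (H, W)
    return rgb @ weights
```

Keeping the depth channel as a separate array lets the later steps index it with the same boolean masks produced from the grayscale frames.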

Step two: convert the image into a grayscale image using its R, G and B channels, then use a frame difference method, such as the adjacent-frame difference method or the three-frame difference method, to compute the gray-level difference between two or three adjacent frames of the video stream. The difference result gives a preliminary extraction of the edge contours of moving pedestrians, which still contains some non-pedestrian noise regions.
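The two frame difference variants named in step two can be sketched as follows. The threshold of 25 gray levels is a hypothetical default, and combining the two difference masks of the three-frame method with a logical AND is the conventional formulation of that technique rather than a detail stated in this document.

```python
import numpy as np


def two_frame_difference(prev_gray, curr_gray, threshold=25.0):
    """Adjacent-frame difference: True where the gray level changed by more than threshold.

    threshold is an assumed tuning parameter, not a value from the source.
    """
    diff = np.abs(curr_gray.astype(np.float64) - prev_gray.astype(np.float64))
    return diff > threshold


def three_frame_difference(f1, f2, f3, threshold=25.0):
    """Three-frame difference for the middle frame f2.

    A pixel is marked as moving only if it differs from both the previous
    frame f1 and the next frame f3, which suppresses the "ghost" left behind
    by a single two-frame difference.
    """
    d12 = two_frame_difference(f1, f2, threshold)
    d23 = two_frame_difference(f2, f3, threshold)
    return d12 & d23
```

The resulting boolean mask plays the role of the preliminary pedestrian edge contours; it is refined in step three.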

Step three: perform morphological dilation and connected-domain segmentation on the frame difference result to obtain scattered motion regions. Non-pedestrian noise regions are then removed using prior indicators such as position information and area information, which refines the frame difference result into a more accurate pedestrian motion region.
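Step three can be approximated with SciPy's `ndimage` module. The dilation iteration count, the minimum-area prior (`min_area`) and the optional region-of-interest mask used as the position prior (`roi`) are hypothetical parameters: the document names position and area priors but does not give their form or values.

```python
import numpy as np
from scipy import ndimage


def extract_motion_regions(diff_mask, min_area=50, roi=None, dilate_iters=2):
    """Dilate a frame-difference mask, split it into connected regions, and
    drop regions that fail simple prior checks.

    min_area:     assumed area prior in pixels; smaller blobs are treated as noise.
    roi:          optional boolean mask of where pedestrians may appear (assumed
                  position prior); a region is kept only if its centroid is inside it.
    dilate_iters: number of binary dilation passes (assumed).
    """
    dilated = ndimage.binary_dilation(diff_mask, iterations=dilate_iters)
    labels, count = ndimage.label(dilated)
    keep = np.zeros_like(dilated, dtype=bool)
    for k in range(1, count + 1):
        region = labels == k
        if region.sum() < min_area:
            continue  # area prior: too small to be a pedestrian
        if roi is not None:
            rows, cols = np.nonzero(region)
            cy, cx = int(rows.mean()), int(cols.mean())
            if not roi[cy, cx]:
                continue  # position prior: centroid outside the walkable area
        keep |= region
    return keep
```

Dilation first closes small gaps in the edge contours, so each pedestrian tends to form one connected region before the priors are applied.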

Step four: collect the D-channel depth values inside the pedestrian motion region of the image and generate the corresponding depth histogram, where the group width (bin width) of the histogram is m and the number of groups is n.
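A sketch of the depth histogram in step four. Here the bin width is treated as a given parameter and the number of bins is derived from the minimum and maximum depth inside the motion region, which is one reading of claim 4; the function name and the choice to keep the last bin closed are assumptions.

```python
import numpy as np


def depth_histogram(depth, motion_mask, bin_width):
    """Histogram of the depth values inside the pedestrian motion region.

    bin_width corresponds to the group width m of step four (an assumed input);
    the number of groups n is derived from the min and max depth, as in claim 4.
    Returns (counts, edges): edges[i] and edges[i + 1] are the start and end
    thresholds of group i.
    """
    values = depth[motion_mask]
    if values.size == 0:
        return np.zeros(0, dtype=int), np.zeros(1)
    d_min, d_max = float(values.min()), float(values.max())
    n_bins = max(1, int(np.ceil((d_max - d_min) / bin_width)))
    edges = d_min + bin_width * np.arange(n_bins + 1)
    edges[-1] = max(edges[-1], d_max)  # guard against floating-point round-off at the top edge
    # np.histogram treats the last bin as closed, so d_max is always counted
    counts, edges = np.histogram(values, bins=edges)
    return counts, edges
```

A smaller bin width yields more, thinner depth slices; this is one of the "algorithm parameters" through which the counting accuracy could be tuned.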

Step five: after the i-th (i = 1, …, n) depth-based foreground segmentation is completed (using the start and end thresholds of the i-th histogram group), connected-domain processing is performed on the foreground region at that depth, splitting it into several independent connected domains. The number of connected domains is recorded as m, and the minimum bounding rectangle of each connected domain is calculated. The number of pedestrians P_j contained in the j-th rectangle (j = 1, 2, …, m) depends on the width w of the rectangle, its depth d and its foreground density p. In general: the larger the rectangle width w, the larger P_j; the larger the depth d, the larger P_j, because d is the distance between the pedestrians and the lens, and a pedestrian of a given size appears smaller in the image the farther it is from the lens; and the larger the foreground density p, the larger P_j, because a higher foreground density indicates more severe mutual occlusion between pedestrians, so this term accounts for occlusion. A machine learning algorithm such as linear regression is used to construct the mapping between the pedestrian number P_j and these three parameters, P_j = f(w, d, p), and the calculated value of P_j is taken as the number of pedestrians contained in the rectangle. All independent connected domains at this depth are traversed, and the number of pedestrians P^{(i)} at the i-th depth is obtained by the following formula:

$$P^{(i)} = \sum_{j=1}^{m} P_j$$
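Step five could be implemented along the following lines. The sketch assumes an axis-aligned minimum bounding rectangle, takes d as the mean depth of the connected domain's pixels and p as the fraction of the rectangle covered by the domain; none of these definitions is fixed by the text. Ordinary least squares stands in for "a machine learning algorithm such as linear regression", and the training counts would come from labeled rectangles as claim 3 describes. All function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def box_features(slice_mask, depth):
    """Return one (w, d, p) row per connected domain of a depth-slice mask.

    Assumptions: the bounding rectangle is axis-aligned, d is the mean depth
    of the domain's pixels, and p is the fraction of the rectangle the domain covers.
    """
    labels, _ = ndimage.label(slice_mask)
    rows = []
    for k, sl in enumerate(ndimage.find_objects(labels), start=1):
        if sl is None:
            continue
        region = labels[sl] == k           # pixels of domain k inside its rectangle
        w = sl[1].stop - sl[1].start       # rectangle width in pixels
        d = float(depth[sl][region].mean())
        p = float(region.mean())           # foreground density
        rows.append((w, d, p))
    return np.array(rows, dtype=np.float64).reshape(-1, 3)


def fit_count_model(features, counts):
    """Least-squares fit of P_j ~ a*w + b*d + c*p + e (one possible choice of f)."""
    A = np.hstack([features, np.ones((len(features), 1))])
    coef, *_ = np.linalg.lstsq(A, np.asarray(counts, dtype=np.float64), rcond=None)
    return coef


def count_in_slice(slice_mask, depth, coef):
    """P^(i): sum of the predicted P_j over all connected domains of one depth slice."""
    feats = box_features(slice_mask, depth)
    if len(feats) == 0:
        return 0.0
    A = np.hstack([feats, np.ones((len(feats), 1))])
    return float(np.clip(A @ coef, 0.0, None).sum())  # negative predictions clipped to 0
```

Predicted P_j values are left unrounded here so that fractional contributions across slices are not lost; rounding only the final total is one option.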

Step six: traverse the pedestrian counts of all n depth segmentations and sum them to obtain the number N of pedestrians in the scene, where P^{(i)} is the number of pedestrians obtained at the i-th depth in step five:

$$N = \sum_{i=1}^{n} P^{(i)}$$
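The per-depth foreground segmentation (claim 4) and the summation of step six can be combined as below. `count_slice` is a hypothetical hook for whichever per-slice counter is used, for example a regression-based count as in step five; each `slice_mask` corresponds to one sub-region image. Treating each group as half-open, with the last one closed, is an assumption so that every depth value falls into exactly one slice.

```python
import numpy as np
from typing import Callable


def total_pedestrians(
    depth: np.ndarray,
    motion_mask: np.ndarray,
    edges: np.ndarray,
    count_slice: Callable[[np.ndarray], float],
) -> float:
    """Segment the motion region once per depth group and sum the per-slice counts.

    edges:       the n + 1 group thresholds from the depth histogram; group i spans
                 [edges[i], edges[i + 1]), and the last group is closed.
    count_slice: assumed interface returning the pedestrian count of one slice mask.
    """
    n = len(edges) - 1
    total = 0.0
    for i in range(n):
        lo, hi = edges[i], edges[i + 1]
        upper = (depth <= hi) if i == n - 1 else (depth < hi)
        slice_mask = motion_mask & (depth >= lo) & upper  # i-th sub-region image
        total += count_slice(slice_mask)
    return total
```

Passing the counter in as a function keeps the depth slicing independent of how P^(i) is estimated, so the regression model can be swapped without touching this loop.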

As described above, the method of the present application avoids the heavy computation of deep learning algorithms, which makes it easier to deploy on end-side devices, and the counting accuracy can be adjusted simply by controlling algorithm parameters. Image depth information is incorporated into crowd counting and pedestrian counting to improve counting accuracy. When the number of pedestrians is calculated, relevant factors such as foreground density, depth and bounding-box width are considered together, which effectively reduces the influence of phenomena such as pedestrian occlusion on the count and further improves counting precision. The method can therefore meet different requirements and, in particular, improves precision in dense scenes.

From the above description, it can be seen that the above-described embodiments of the present application achieve the following technical effects:

1) In the above technical solution, a target scene image with depth information is first acquired, where the target scene contains pedestrians and the depth information is the distance between each pixel in the target scene image and the camera. Predetermined processing is then performed on the target scene image to obtain a pedestrian motion region image that contains no objects other than pedestrians; the pedestrian motion region image is divided into a plurality of sub-region images according to the depth information; the number of pedestrians contained in each sub-region image is acquired; and finally the pedestrian counts of all the sub-region images are summed to obtain the number of pedestrians contained in the target scene. Because the pedestrian motion region image with depth information is divided into a plurality of sub-region images and pedestrians are counted per sub-region, the summed count is more accurate, so the number of pedestrians in the whole target scene is obtained accurately. This improves crowd counting accuracy, particularly in dense scenes, and solves the technical problem of low counting accuracy in prior-art crowd counting methods.

2) By applying the above technical solution, the first acquisition unit acquires a target scene image with depth information, where the target scene contains pedestrians and the depth information is the distance between each pixel in the target scene image and the camera; the processing unit performs predetermined processing on the target scene image to obtain a pedestrian motion region image that contains no objects other than pedestrians; the dividing unit divides the pedestrian motion region image into a plurality of sub-region images according to the magnitude of the depth information; the second acquisition unit acquires the number of pedestrians contained in each sub-region image; and the calculating unit sums the pedestrian counts of all the sub-region images to obtain the number of pedestrians contained in the target scene. Because the pedestrian motion region image with depth information is divided into a plurality of sub-region images and pedestrians are counted per sub-region, the summed count is more accurate, so the number of pedestrians in the whole target scene is obtained accurately. This improves crowd counting accuracy, particularly in dense scenes, and solves the technical problem of low counting accuracy in prior-art crowd counting methods.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A crowd counting method, comprising:

acquiring a target scene image with depth information, wherein the target scene comprises pedestrians, and the depth information refers to the distance between each pixel point in the target scene image and a camera;

performing predetermined processing on the target scene image to obtain a pedestrian motion region image, wherein the pedestrian motion region image contains no objects other than pedestrians;

dividing the pedestrian motion region image into a plurality of sub-region images according to the magnitude of the depth information;

acquiring the number of pedestrians contained in each sub-region image;

and calculating the sum of the numbers of pedestrians in all the sub-region images to obtain the number of pedestrians contained in the target scene.

2. The method of claim 1, wherein acquiring the number of pedestrians contained in each of the sub-region images comprises:

performing connected-domain processing on the sub-region image to obtain a plurality of independent connected domains;

determining the minimum bounding rectangle of each connected domain;

determining the number of pedestrians contained in the minimum bounding rectangle according to the width information, the depth information and the foreground density of the minimum bounding rectangle;

and calculating the sum of the numbers of pedestrians contained in all the minimum bounding rectangles of the sub-region image to obtain the number of pedestrians contained in the sub-region image.

3. The method according to claim 2, wherein determining the number of pedestrians contained in the minimum bounding rectangle according to the width information, the depth information and the foreground density of the minimum bounding rectangle comprises:

constructing a machine learning model, wherein the machine learning model is obtained through machine learning training using a plurality of groups of training data, and each group of the training data comprises: the width information of a minimum bounding rectangle, the depth information of the minimum bounding rectangle, the foreground density of the minimum bounding rectangle and the number of pedestrians contained in the minimum bounding rectangle;

and determining the number of pedestrians contained in the minimum bounding rectangle through the machine learning model.

4. The method according to any one of claims 1 to 3, wherein dividing the pedestrian motion region image into a plurality of sub-region images according to the magnitude of the depth information comprises:

acquiring a maximum value and a minimum value of the depth information;

determining the number of groups and the group width according to the maximum value and the minimum value;

and performing foreground segmentation N times on the pedestrian motion region image according to the start threshold and the end threshold of each group of depth information to obtain N sub-region images, wherein N represents the number of groups.

5. The method of claim 1, wherein performing the predetermined processing on the target scene image to obtain the pedestrian motion region image comprises:

performing frame difference processing on the target scene image to obtain pedestrian edge contours;

performing morphological dilation and connected-domain segmentation on the pedestrian edge contours to obtain scattered motion regions;

and removing non-pedestrian noise regions from the scattered motion regions according to position information and area information to obtain the pedestrian motion region image.

6. The method of claim 1, wherein prior to the predetermined processing of the target scene image, the method further comprises:

performing binarization processing on the target scene image.

7. A crowd counting device, comprising:

a first acquisition unit, configured to acquire a target scene image with depth information, wherein the target scene contains pedestrians and the depth information is the distance between each pixel in the target scene image and a camera;

a processing unit, configured to perform predetermined processing on the target scene image to obtain a pedestrian motion region image, wherein the pedestrian motion region image contains no objects other than pedestrians;

a dividing unit, configured to divide the pedestrian motion region image into a plurality of sub-region images according to the magnitude of the depth information;

a second acquisition unit, configured to acquire the number of pedestrians contained in each sub-region image;

and a calculating unit, configured to calculate the sum of the numbers of pedestrians in all the sub-region images to obtain the number of pedestrians contained in the target scene.

8. The device of claim 7, wherein the second acquisition unit comprises:

a processing module, configured to perform connected-domain processing on the sub-region image to obtain a plurality of independent connected domains;

a first determining module, configured to determine the minimum bounding rectangle of each connected domain;

a second determining module, configured to determine the number of pedestrians contained in the minimum bounding rectangle according to the width information, the depth information and the foreground density of the minimum bounding rectangle;

and a calculation module, configured to calculate the sum of the numbers of pedestrians contained in all the minimum bounding rectangles of the sub-region image to obtain the number of pedestrians contained in the sub-region image.

9. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls a device in which the computer-readable storage medium is located to perform the crowd counting method according to any one of claims 1 to 6.

10. A processor for running a program, wherein the program, when running, performs the crowd counting method according to any one of claims 1 to 6.