WO2019114145A1

WO2019114145A1 - Head count detection method and device in surveillance video

Info

Publication number: WO2019114145A1
Application number: PCT/CN2018/079856
Authority: WO
Inventors: 刘若鹏; 钟凯宇
Original assignee: 深圳光启合众科技有限公司; 深圳光启创新技术有限公司
Priority date: 2017-12-12
Filing date: 2018-03-21
Publication date: 2019-06-20
Also published as: CN109918971B; CN109918971A

Abstract

Disclosed is a head count detection method and device in a surveillance video. The method comprises: histogram equalization is performed on each frame of image captured in the surveillance video to obtain an equalized image after histogram equalization; a human body in the equalized image is recognized by means of a cascade classifier; the cascade classifier is used for recognizing the human body according to human body features in the equalized image after histogram equalization; and the identified human body is counted. The present invention solves the technical problems in the prior art that the safety in public places cannot be guaranteed due to the inability to realize detection of an abnormal state of the crowd under remote monitoring.

Description

Method and device for detecting number of people in surveillance video

Technical field

The present invention relates to the field of video detection, and in particular to a method and apparatus for detecting a number of people in a surveillance video.

Background art

Public demand for safety in public places is increasing. China has experienced a number of violent terrorist incidents that caused heavy casualties and property losses, so establishing an efficient and complete intelligent video surveillance system has become an urgent need. Although the video surveillance systems widely used today provide a large amount of video information, they cannot raise early warnings for emergencies and abnormal situations, and the monitoring work still requires human involvement. With the continuous advancement of machine vision and image processing technology, traditional video surveillance systems that require a lot of manpower can no longer meet the needs of social development. Highly automated, intelligent next-generation video surveillance systems will gradually replace traditional systems in the security field, freeing manpower and reducing costs while maintaining system performance.

Video-based detection of abnormal crowd states refers to methods that intelligently analyze the behavior of crowds in large public places and judge whether abnormal events such as stampedes, fights, or riots are occurring. At present, research on intelligent monitoring systems at home and abroad is at an early stage of development, and few products can truly be applied in real life. After extensive research by experts and scholars in recent years, certain results have been achieved in video content analysis and understanding, mainly in crowd density estimation and in the analysis of a small number of individuals.

The research results obtained so far apply only to close-range monitoring, such as street and indoor scenes. In such environments the target is displayed at high resolution and occupies a large area, so recognition is relatively easy. Under long-distance detection conditions, however, the captured scene is large and the target persons are very small and blurred, so detection is more difficult and the above results are not applicable.

In view of the above difficulty in detecting abnormal crowd states under remote monitoring, no effective solution has yet been proposed.

Summary of the invention

The embodiments of the present invention provide a method and a device for detecting a number of people in a surveillance video, so as to at least solve the technical problem that the security of a public place cannot be guaranteed due to the failure of the prior art to detect the abnormal state of the crowd under remote monitoring.

According to an aspect of the embodiments of the present invention, a method for detecting a number of people in a surveillance video is provided, including: performing histogram equalization on each frame image of the collected surveillance video to obtain an equalized image; identifying a human body in the equalized image by a cascade classifier, wherein the cascade classifier is configured to identify the human body according to human body features of the equalized image; and counting the identified human bodies.

Optionally, the cascade classifier is formed by superimposing at least two weak classifiers, and identifies the human body according to the human body features of the equalized image by means of the at least two superimposed weak classifiers.

Optionally, in the case that the collected surveillance video is in color, before performing histogram equalization on each frame image of the collected surveillance video, the method further includes: converting each frame image of the color surveillance video to grayscale; and performing histogram equalization on each frame image of the collected surveillance video includes: performing histogram equalization on each grayscale frame image.

Optionally, before the human body in the equalized image is identified by the cascade classifier, the method further includes: extracting high-frequency components in each frame image by using a Laplacian operator, and applying a weight to the high-frequency components to obtain enhanced high-frequency components; and superimposing the enhanced high-frequency components on the histogram-equalized image to obtain an enhanced equalized image. Identifying the human body in the equalized image by the cascade classifier then includes: identifying, by the cascade classifier, a human body in the enhanced equalized image.

Optionally, before the human body in the equalized image is identified by the cascade classifier, the method further includes: performing edge detection on the equalized image by using a Canny operator to obtain the contours included in the equalized image; determining the area of each contour included in the equalized image; and filling, by flood fill, each non-target contour whose area is larger than a predetermined threshold to obtain an equalized image in which the non-target contours are filled, wherein a non-target contour is a region that is not a target for human body identification. Identifying the human body in the equalized image by the cascade classifier then includes: identifying, by the cascade classifier, a human body in the equalized image in which the non-target contours are filled.

Optionally, before the human body in the equalized image is identified by the cascade classifier, the method further includes: obtaining the cascade classifier by training on a plurality of sets of data, wherein each set of data includes: a sample image, and a human body recognition result identifying whether the sample image includes a human body.

Optionally, the recognition function of the weak classifier under the Haar-like rectangular feature is:

$$ g_{haar}(x) = \begin{cases} \alpha, & f_j(x) < \theta_j \\ \beta, & \text{otherwise} \end{cases} $$

where g_haar(x) is the recognition result indicating whether the equalized image includes a human body, based on the human body feature x; f_j(x) is the feature value; θ_j is the threshold of the weak classifier; j identifies the j-th weak classifier; and α and β are the confidences of the classification result, each in the range [-1, +1], where a negative value indicates a non-human body and a positive value indicates a human body.

According to another aspect of the present invention, a device for detecting a number of people in a surveillance video is provided, including: a first obtaining module, configured to perform histogram equalization on each frame image of the collected surveillance video to obtain an equalized image; an identification module, configured to identify a human body in the equalized image by using a cascade classifier, wherein the cascade classifier is configured to identify the human body according to human body features of the equalized image; and a statistics module, configured to count the identified human bodies.

Optionally, the device further includes: a graying module, configured to, in the case that the collected surveillance video is in color, convert each frame image of the color surveillance video to grayscale before histogram equalization is performed on each frame image of the collected surveillance video; and the first obtaining module is configured to perform histogram equalization on each grayscale frame image.

Optionally, the device further includes: a second obtaining module, configured to, before the human body in the equalized image is identified by the cascade classifier, extract high-frequency components in each frame image by using a Laplacian operator, assign a weight to the high-frequency components to obtain enhanced high-frequency components, and superimpose the enhanced high-frequency components on the histogram-equalized image to obtain an enhanced equalized image; and the identification module is configured to identify, by the cascade classifier, a human body in the enhanced equalized image.

Optionally, the device further includes: an obtaining module, configured to, before the human body in the equalized image is identified by the cascade classifier, perform edge detection on the equalized image by using a Canny operator to obtain the contours included in the equalized image, determine the area of each contour, and fill, by flood fill, each non-target contour whose area is larger than a predetermined threshold to obtain an equalized image in which the non-target contours are filled, wherein a non-target contour is a region that is not a target for human body identification; and the identification module is configured to identify, by the cascade classifier, a human body in the equalized image in which the non-target contours are filled.

Optionally, the device further includes: a third obtaining module, configured to obtain the cascade classifier by training on a plurality of sets of data before the human body in the equalized image is identified by the cascade classifier, wherein each set of data includes: a sample image, and a human body recognition result identifying whether the sample image includes a human body.

According to another aspect of the embodiments of the present invention, there is also provided a robot comprising the device for detecting a number of people in a surveillance video according to any one of the above.

According to another aspect of an embodiment of the present invention, a storage medium is provided, the storage medium including a stored program, wherein, when the program runs, a device in which the storage medium is located is controlled to perform the method for detecting a number of people in a surveillance video according to any one of the above.

According to another aspect of the embodiments of the present invention, there is further provided a processor, wherein the processor is configured to execute a program, wherein the program is executed to perform a method for detecting a number of people in a monitoring video according to any one of the above.

In the embodiments of the present invention, histogram equalization is performed on each frame image of the collected surveillance video to obtain an equalized image, the human body in the equalized image is identified by a cascade classifier, and the human bodies identified in each frame image of the surveillance video are counted. By designing filtering and enhancement algorithms, testing several commonly used classifier algorithms, and continuously optimizing the Haar classifier, which gave the best detection results, the purpose of detecting people under remote monitoring is achieved and the technical effect of detecting and counting people more accurately under remote monitoring is realized. This solves the technical problem that the safety of public places cannot be guaranteed because the prior art cannot detect abnormal crowd states under remote monitoring.

Brief description of the drawings

The drawings described herein are intended to provide a further understanding of the invention and constitute a part of this application. In the drawings:

FIG. 1 is a flowchart of a method for detecting a number of people in a surveillance video according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a 3*3 region of a Laplacian operator according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a Haar feature template according to an embodiment of the present invention;

FIG. 4 is a structural block diagram of a device for detecting a number of people in a surveillance video according to an embodiment of the present invention;

FIG. 5 is optimized structural block diagram 1 of a device for detecting a number of people in a surveillance video according to an embodiment of the present invention;

FIG. 6 is optimized structural block diagram 2 of a device for detecting a number of people in a surveillance video according to an embodiment of the present invention;

FIG. 7 is optimized structural block diagram 3 of a device for detecting a number of people in a surveillance video according to an embodiment of the present invention;

FIG. 8 is optimized structural block diagram 4 of a device for detecting a number of people in a surveillance video according to an embodiment of the present invention;

FIG. 9 is a flowchart of an optimized method for detecting a number of people in a surveillance video according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a process for detecting a number of people based on a cascade classifier according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

It is to be understood that the terms "first", "second" and the like in the specification and claims of the present invention are used to distinguish similar objects, and are not necessarily used to describe a particular order or sequence. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments of the invention described herein can be implemented in a sequence other than those illustrated or described herein. In addition, the terms "comprise" and "have" and any variations thereof are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or devices.

In accordance with an embodiment of the present invention, an embodiment of a method for detecting a number of people in a surveillance video is provided. It is noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions. Also, although logical sequences are shown in the flowcharts, in some cases the steps shown or described may be performed in a different order than the one described herein.

FIG. 1 is a flowchart of a method for monitoring number of people in a video according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

Step S102, performing histogram equalization on each frame image in the collected monitoring video to obtain a histogram equalized equalized image;

Step S104, identifying a human body in the equalized image by using a cascade classifier, wherein the cascade classifier is configured to identify the human body according to the human body feature of the equalized image after the histogram equalization;

Step S106, counting the identified human bodies.

In the embodiment of the present invention, histogram equalization is performed on each frame image of the collected surveillance video to obtain an equalized image, the human body in the equalized image is identified by the cascade classifier, and the human bodies identified in each frame image of the surveillance video are counted. By designing an enhanced histogram equalization algorithm and a cascade classifier with the best human detection performance, and by counting the number of people, the purpose of detecting people under remote monitoring is achieved and the technical effect of detecting and counting people more accurately under remote monitoring is realized. This solves the technical problem that the safety of public places cannot be guaranteed because the prior art cannot detect abnormal crowd states under remote monitoring.

The cascade classifier may be formed by superimposing at least two weak classifiers, wherein the cascade classifier identifies the human body according to the human body features of the equalized image by means of the at least two superimposed weak classifiers.

Preferably, in the case that the collected surveillance video is in color, before histogram equalization is performed on each frame image of the collected surveillance video, the method may further include: converting each frame image of the color surveillance video to grayscale; and performing histogram equalization on each frame image of the collected surveillance video includes: performing histogram equalization on each grayscale frame image.

Regarding grayscale: in the RGB model, if R=G=B, the color is a gray, and the common value of R, G, and B is called the gray value. Each pixel of a grayscale image therefore needs only one byte to store its gray value, which ranges from 0 to 255. Graying is needed because the cascade classifier takes a grayscale version of the original image as input. For example, it can be implemented by calling OpenCV's cvCvtColor function.
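The graying step can be illustrated with a minimal standard-library sketch. The weights 0.299, 0.587, and 0.114 are an assumption (the widely used BT.601 luma weights, which OpenCV's RGB-to-gray conversion also uses); the text above does not state which weights are applied, and the function name `to_gray` is hypothetical.

```python
def to_gray(frame):
    """Convert rows of (R, G, B) tuples to rows of gray values in 0..255.

    The weights are the BT.601 luma weights (an assumption, not taken
    from the patent text).
    """
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in frame
    ]


# A 2x2 color frame: white, black, pure red, mid gray.
frame = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (128, 128, 128)]]
gray = to_gray(frame)
```

Note that a pixel with R=G=B keeps that common value as its gray value, which matches the description above.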

OpenCV is a cross-platform computer vision library distributed under the BSD (open source) license. It is lightweight and efficient, consists of a series of C functions and a small number of C++ classes, provides interfaces for languages such as Python, Ruby, and MATLAB, and implements many general algorithms in image processing and computer vision.

It should be noted that, since the shooting distance is relatively long, the human bodies in the image are relatively small and blurred, so the image needs to be enhanced before detection. Histogram equalization enhances image contrast and improves image quality, which helps detection. For example, an embodiment of the present invention uses an improved histogram equalization method: before the human body in the equalized image is identified by the cascade classifier, a Laplacian operator is used to extract high-frequency components in each frame image, and a weight is assigned to the high-frequency components to obtain enhanced high-frequency components; the enhanced high-frequency components are superimposed on the histogram-equalized image to obtain an enhanced equalized image. Correspondingly, identifying the human body in the equalized image by the cascade classifier includes: identifying the human body in the enhanced equalized image by the cascade classifier.

The central idea of histogram equalization is to change the gray histogram of the original image from a distribution concentrated in a relatively narrow gray interval to a uniform distribution over the entire gray range. Histogram equalization stretches the image nonlinearly and redistributes the pixel values so that the number of pixels in each gray range is approximately the same; that is, it transforms the histogram of a given image into a "uniform" histogram. But it has two disadvantages:

1) The number of gray levels in the transformed image is reduced, and some details disappear;

2) For some images, such as those whose histograms have peaks, the contrast is unnaturally over-enhanced after processing.
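The conventional equalization described above can be sketched in plain Python as follows. This is an illustrative sketch of the textbook cumulative-histogram mapping, not the patent's own code; the function name `equalize_hist` and the exact rounding rule are assumptions.

```python
def equalize_hist(gray, levels=256):
    """Histogram-equalize a grayscale image given as rows of ints in 0..levels-1."""
    # Histogram of gray values.
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)

    # Cumulative distribution of the histogram.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in gray]

    # Map each gray level so that the occupied range spreads over 0..levels-1.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[v] for v in row] for row in gray]


# Gray values concentrated in the narrow interval 100..103.
narrow = [[100, 100, 101, 102],
          [100, 101, 102, 103]]
equalized = equalize_hist(narrow)
```

In this small example the four occupied gray levels are spread across the full 0 to 255 range, which shows the contrast stretch, and also why few distinct gray levels remain after the transform.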

Details of the human body tend to lie in the low-gray-value part of the image. To remedy the shortcomings of histogram equalization and strengthen the rendering of detail, the algorithm is improved by introducing edge information into histogram equalization. The Laplacian operator enables fast edge detection and responds well to high-frequency edges. The Laplacian is a second-order differential operator, defined as:

$$ \nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} $$

where, in the discrete case,

$$ \frac{\partial^2 f}{\partial x^2} = f(x+1, y) + f(x-1, y) - 2f(x, y), \qquad \frac{\partial^2 f}{\partial y^2} = f(x, y+1) + f(x, y-1) - 2f(x, y) $$

It can be represented in digital form in a variety of ways. For a 3*3 region, the form most often recommended in practice is:

$$ \nabla^2 f = (z_2 + z_4 + z_6 + z_8) - 4z_5 $$

where z_5 is the center pixel of the region and z_2, z_4, z_6, z_8 are its four direct neighbors.

FIG. 2 is a schematic diagram of a 3*3 region of the Laplacian operator according to an embodiment of the present invention; the 3*3 region is as shown in FIG. 2.

In summary, the improved histogram equalization steps are as follows:

(1) Use the Laplacian operator to extract the high-frequency component of the original image and assign it the corresponding weight λ (in this embodiment, λ=3 is selected), obtaining the enhanced high-frequency component λ|∇²f(x, y)|;

(2) Obtain another image using conventional histogram equalization;

(3) Add the images obtained in steps (1) and (2), clipping any pixel value that exceeds 255 to 255, to obtain the final enhanced image.

The improved histogram equalization method greatly enhances the details of the image and facilitates subsequent detection.
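The three steps of the improved method can be sketched as below. This is a hedged illustration, not the patent's implementation: the 4-neighbor Laplacian kernel, the border handling by edge replication, and the names `laplacian_abs` and `enhance` are assumptions; only λ=3 and the clipping at 255 come from the text. The histogram-equalized image is taken as an input so the sketch stays self-contained.

```python
def laplacian_abs(gray):
    """Return |z2 + z4 + z6 + z8 - 4*z5| per pixel (4-neighbor Laplacian).

    Pixels outside the image are replaced by the nearest border pixel
    (an assumption; the text does not specify border handling).
    """
    h, w = len(gray), len(gray[0])

    def px(y, x):
        return gray[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [
        [abs(px(y - 1, x) + px(y + 1, x) + px(y, x - 1) + px(y, x + 1)
             - 4 * px(y, x))
         for x in range(w)]
        for y in range(h)
    ]


def enhance(gray, equalized, lam=3):
    """Add lam * |Laplacian(gray)| to the equalized image, clipping at 255."""
    high = laplacian_abs(gray)
    return [
        [min(255, e + lam * hf) for e, hf in zip(e_row, h_row)]
        for e_row, h_row in zip(equalized, high)
    ]


original = [[10, 10, 10],
            [10, 20, 10],
            [10, 10, 10]]
# Stand-in for the output of a conventional histogram equalization.
equalized_input = [[100, 100, 100],
                   [100, 200, 100],
                   [100, 100, 100]]
enhanced = enhance(original, equalized_input)
```

The center pixel, where the Laplacian response is strongest, saturates at 255, while flat corners are left unchanged.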

It should be noted that, because people appear small in the image, the embodiment of the present invention can search for contours and fill contour regions with a larger area, thereby eliminating non-target regions and improving detection accuracy. Before the human body in the equalized image is identified by the cascade classifier, the Canny operator is used to perform edge detection on the equalized image to obtain the contours it contains; the area of each contour is determined; and non-target contours whose area is larger than a predetermined threshold are filled by flood fill to obtain an equalized image in which the non-target contours are filled, wherein a non-target contour is a region that is not a target for human body identification. Correspondingly, identifying the human body in the equalized image by the cascade classifier includes: identifying, by the cascade classifier, the human body in the equalized image in which the non-target contours are filled.

It can be implemented by the following specific processes:

(1) Edge detection using the Canny operator;

(2) call findContours() of the OpenCV library to find the outline in the binary image;

(3) Call the OpenCV library's drawContours() to draw each contour, and use contourArea() to calculate the area of each contour. Fill each contour whose area is larger than the threshold by flood fill, that is, by calling cvFloodFill in OpenCV.

At this point, most of the non-target areas in the figure can be removed.
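The idea of removing large regions can be illustrated without OpenCV. The sketch below is a simplified stand-in, not the Canny / findContours() / contourArea() / cvFloodFill sequence itself: it assumes a binary mask of candidate regions is already available, measures area as a pixel count of each 4-connected region, and paints regions above the threshold with a uniform value. The name `fill_large_regions` and its parameters are hypothetical.

```python
from collections import deque


def fill_large_regions(mask, image, max_area, fill_value=0):
    """Paint every 4-connected region of `mask` larger than `max_area`.

    `mask` holds 1 for pixels inside a candidate region and 0 elsewhere.
    Returns a copy of `image` with the large regions set to `fill_value`.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected region.
            region, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Regions larger than the threshold are treated as non-targets.
            if len(region) > max_area:
                for ry, rx in region:
                    out[ry][rx] = fill_value
    return out


mask = [[1, 1, 1, 0, 0],
        [1, 1, 1, 0, 1],
        [1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0]]
image = [[9] * 5 for _ in range(4)]
filled = fill_large_regions(mask, image, max_area=4)
```

Here the 9-pixel block is blanked out as a non-target region, while the single-pixel region, small enough to be a distant person, is kept.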

Preferably, before the human body in the equalized image is identified by the cascade classifier, the method further includes: obtaining the cascade classifier by training on a plurality of sets of data, wherein each set of data includes: a sample image, and a human body recognition result identifying whether the sample image includes a human body. It should be noted that, when selecting sample images for training the cascade classifier, images of specific representative scenes may be selected, so that the trained cascade classifier identifies human bodies in images more accurately.

It should be noted that the core of obtaining the cascade classifier is to iteratively select a small number of critical features from a large number of Haar-like features and use them to build an effective classifier: a large number of weak classifiers with ordinary classification ability are superimposed by a certain method to form a strong classifier with strong classification ability, and several such strong classifiers are then cascaded to obtain the final classifier.

The Haar-like rectangle feature is a digital image feature used for object detection. FIG. 3 is a schematic diagram of a Haar feature template according to an embodiment of the present invention. As shown in FIG. 3, such a rectangular feature template is composed of two or more congruent black and white rectangles adjacent to each other, and the rectangular feature value is the sum of the gray values in the white rectangle minus the sum of the gray values in the black rectangle. The rectangle feature is sensitive to simple graphic structures such as line segments and edges. If such a rectangle is placed in a non-face area, the calculated feature value should differ from the feature value for a face, so these rectangles quantify face features in order to distinguish faces from non-faces.
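A two-rectangle Haar-like feature value can be computed as in the following sketch. It assumes the usual convention of white-rectangle sum minus black-rectangle sum and uses an integral image (summed-area table), a common technique for this purpose that the text above does not itself describe; the function names and the left-white, right-black layout are illustrative assumptions.

```python
def integral_image(gray):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of gray values in the rectangle with top-left (x, y), size w x h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left half treated as white, right half as black; w must be even."""
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return white - black


# A vertical edge: dark on the left, bright on the right.
patch = [[10, 10, 200, 200],
         [10, 10, 200, 200]]
ii = integral_image(patch)
edge_value = two_rect_feature(ii, 0, 0, 4, 2)
flat_value = two_rect_feature(integral_image([[50] * 4, [50] * 4]), 0, 0, 4, 2)
```

The feature responds strongly to the edge and gives zero on a flat patch, which is the sensitivity to edges and line segments described above.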

A feature-based approach was chosen rather than a pixel-based approach because, given a finite data sample, feature-based detection can encode the state of a particular region, and a feature-based system is much faster than a pixel-based system.

Preferably, the recognition function of the weak classifier under the Haar-like rectangular feature may be:

$$ g_{haar}(x) = \begin{cases} \alpha, & f_j(x) < \theta_j \\ \beta, & \text{otherwise} \end{cases} $$

where g_haar(x) is the recognition result indicating whether the equalized image includes a human body, based on the human body feature x; f_j(x) is the feature value; θ_j is the threshold of the weak classifier; j identifies the j-th weak classifier; and α and β are the confidences of the classification result, each in the range [-1, +1], where a negative value indicates a non-human body and a positive value indicates a human body.

It should be noted that, for Haar-like weak classifiers, one weak classifier corresponds to one Haar-like rectangle feature and takes the form described above. The classifier trained by OpenCV contains a series of feature thresholds. To determine whether an intercepted image window passes the classifier, the Haar-like feature values of the window under all Haar-like feature templates are calculated and compared with the thresholds of the corresponding feature templates in the classifier.
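How thresholded weak classifiers combine into a stronger decision can be sketched as follows. The sketch assumes that a weak classifier outputs the confidence α when the feature value is below its threshold θ and β otherwise, and that a stage accepts a window when the summed confidences reach a stage threshold; both the direction of the comparison and the summation rule are assumptions made for illustration, and all names and parameter values are hypothetical.

```python
def weak_classify(feature_value, theta, alpha, beta):
    """Return confidence alpha if the feature is below theta, else beta."""
    return alpha if feature_value < theta else beta


def strong_classify(feature_values, weak_params, stage_threshold=0.0):
    """Sum weak-classifier confidences and compare with a stage threshold.

    `weak_params` is a list of (theta, alpha, beta) tuples, one per feature.
    Returns (passed, score).
    """
    score = sum(
        weak_classify(f, theta, alpha, beta)
        for f, (theta, alpha, beta) in zip(feature_values, weak_params)
    )
    return score >= stage_threshold, score


# Two hypothetical weak classifiers; confidences lie in [-1, +1].
params = [(0, 0.8, -0.6), (50, -0.4, 0.7)]
passed_a, score_a = strong_classify([-760, 120], params)
passed_b, score_b = strong_classify([10, 10], params)
```

In a cascade, a window would have to pass every stage of this kind in sequence, so most non-human windows are rejected early by the first few stages.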

According to another aspect of the present invention, a device for detecting a number of people in a surveillance video is also provided. FIG. 4 is a structural block diagram of the device for detecting a number of people in a surveillance video according to an embodiment of the present invention. As shown in FIG. 4, the device includes a first obtaining module 44, an identification module 46, and a statistics module 48. The device is described in detail below.

The first obtaining module 44 is configured to perform histogram equalization on each frame image of the collected monitoring video to obtain a histogram equalized equalized image;

The identification module 46 is connected to the first obtaining module 44 and is configured to identify the human body in the equalized image by using a cascade classifier, wherein the cascade classifier is configured to identify the human body according to the human body features of the equalized image;

The statistics module 48 is connected to the identification module 46 for counting the identified human body.

FIG. 5 is optimized structural block diagram 1 of the device for detecting a number of people in a surveillance video according to an embodiment of the present invention. As shown in FIG. 5, in addition to all the structures in FIG. 4, the device includes a graying module 52. The graying module 52 is described in detail below.

The graying module 52 is connected to the first obtaining module 44 and is configured to, in the case that the collected surveillance video is in color, convert each frame image of the color surveillance video to grayscale before histogram equalization is performed on each frame image of the collected surveillance video.

FIG. 6 is optimized structural block diagram 2 of the device for detecting a number of people in a surveillance video according to an embodiment of the present invention. As shown in FIG. 6, in addition to all the structures in FIG. 4, the device includes a second obtaining module 62. The second obtaining module 62 is described in detail below.

The second obtaining module 62 is connected to the first obtaining module 44 and the identification module 46, and is configured to, before the human body in the equalized image is identified by the cascade classifier, extract high-frequency components in each frame image by using a Laplacian operator, assign a weight to the high-frequency components to obtain enhanced high-frequency components, and superimpose the enhanced high-frequency components on the histogram-equalized image to obtain the enhanced equalized image.

FIG. 7 is optimized structural block diagram 3 of the device for detecting a number of people in a surveillance video according to an embodiment of the present invention. As shown in FIG. 7, in addition to all the structures in FIG. 4, the device includes an obtaining module 72. The obtaining module 72 is described in detail below.

The obtaining module 72 is connected to the first obtaining module 44 and the identification module 46, and is configured to, before the human body in the equalized image is identified by the cascade classifier, perform edge detection on the equalized image by using the Canny operator to obtain the contours included in the equalized image, determine the area of each contour, and fill, by flood fill, each non-target contour whose area is larger than a predetermined threshold to obtain an equalized image in which the non-target contours are filled, wherein a non-target contour is a region that is not a target for human body identification.

FIG. 8 is optimized structural block diagram 4 of the device for detecting a number of people in a surveillance video according to an embodiment of the present invention. As shown in FIG. 8, in addition to all the structures in FIG. 4, the device includes a third obtaining module 82. The third obtaining module 82 is described in detail below.

The third obtaining module 82 is connected to the identification module 46 and is configured to obtain the cascade classifier by training on a plurality of sets of data before the human body in the equalized image is identified by the cascade classifier, wherein each set of data includes: a sample image, and a human body recognition result identifying whether the sample image includes a human body.

FIG. 9 is a flowchart of a preferred method for detecting the number of people in a surveillance video according to an embodiment of the present invention. As shown in FIG. 9, the method includes the following steps:

Step S902: cascade classifier training.

To train a cascade classifier based on Haar-like features with OpenCV, positive and negative sample pictures of the features to be identified must be provided. A positive sample is a picture of a human body; a negative sample is a background picture that must contain no people, and the aspect ratio is 1:2. The training program provided with OpenCV extracts the features and trains the classifier, and the resulting classifier model can then identify the target objects.
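For readers unfamiliar with Haar-like features, the following sketch shows how one such feature value can be computed with an integral image, which is the usual way these features are evaluated. It is only an illustration: the function names are hypothetical, images are lists of rows of integers, and the single two-rectangle feature shown (left half minus right half) stands in for the much larger feature set that the OpenCV training program evaluates.

```python
def integral_image(gray):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of all pixels
    above and to the left of (x, y), exclusive."""
    h, w = len(gray), len(gray[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half sum minus right half sum."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Because every rectangle sum costs four table lookups regardless of its size, a classifier can evaluate many such features per detection window.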

Step S904: pedestrian detection and people counting.

The specific process of this step is:

(1) The input to be detected is a real-time video; each frame of the input video is converted to grayscale;

(2) The image is enhanced using an improved histogram equalization method;

(3) Regions with large contours are filled using the flood fill method;

(4) Features are then extracted from the preprocessed image by calling the relevant methods of the CascadeClassifier class in OpenCV to extract Haar-like features;

(5) The trained Haar feature classifier is loaded, each frame of the input is detected, and each detected pedestrian is marked with a yellow rectangular frame;

(6) The classifier's detection results are screened, the number of people is counted, the pedestrian positions are marked in the original image, and finally the number of people is displayed.
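Step (2) builds on histogram equalization. The improvement mentioned there is not detailed in this passage, so the sketch below shows only classic equalization of an 8-bit grayscale image, with the image assumed to be a non-empty list of rows; the function name `equalize_histogram` is hypothetical. In an OpenCV program, `cv2.equalizeHist` performs the same kind of operation.

```python
def equalize_histogram(gray):
    """Classic histogram equalization for an 8-bit grayscale image."""
    # Count how often each gray level occurs.
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Only one gray level is present: nothing to stretch.
        return [row[:] for row in gray]

    # Map each level so that the cumulative distribution becomes linear.
    scale = 255 / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[value] for value in row] for row in gray]
```

The lookup table stretches the occupied gray levels over the full 0 to 255 range, which raises contrast in the dim, low-contrast frames typical of long-distance monitoring.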

During target screening, because true targets occupy a small area, classifier detections with a larger area are likely to be false. The classifier's detection results (rectangular frames) are therefore traversed, and any frame whose height is greater than a threshold is eliminated, thereby improving detection accuracy.
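The screening and counting described above can be sketched as a simple filter. The sketch assumes each detection is an `(x, y, w, h)` tuple, the rectangle layout returned by OpenCV's `CascadeClassifier.detectMultiScale`; the function name `screen_detections` and the parameter `max_height` are hypothetical, and the threshold value would have to be chosen for the camera and scene.

```python
def screen_detections(boxes, max_height):
    """Discard rectangles taller than `max_height` and count the rest.

    `boxes` is an iterable of (x, y, w, h) tuples.
    Returns the kept rectangles and the resulting people count.
    """
    kept = [(x, y, w, h) for (x, y, w, h) in boxes if h <= max_height]
    return kept, len(kept)


# Example: the 90-pixel-tall frame is too large to be a distant pedestrian.
detections = [(0, 0, 10, 20), (5, 5, 40, 90), (8, 1, 12, 24)]
kept, count = screen_detections(detections, max_height=30)
```

The returned count is the value that would be displayed in step (6), and the kept rectangles are the positions marked in the original image.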

By comparison, in one related art, a real-time video stream is collected, and multiple original sample maps and velocity sample maps are obtained from the video stream by line sampling; space-time correction is performed on the velocity sample maps; a deep learning model, comprising a classification model and a statistical model, is obtained by offline training on the original sample maps and the velocity sample maps; and the resulting deep learning model is used to analyze the crowd state in the real-time video stream. That approach is described as adapting well to different environments, light intensities, weather conditions and camera angles; as maintaining high accuracy in crowded environments with heavy foot traffic; and as requiring little computation, so that it can meet the requirements of real-time video processing and be widely applied to the monitoring and management of densely populated public places such as buses, subways and plazas.

In the related art, some research has also been done on crowd density estimation and motion analysis. For crowd density estimation, the pixel density method is used for low-density crowds, while for high-density crowds the crowd image is analyzed at multiple scales using wavelet packet decomposition; the crowd density level is then classified by an SVM (Support Vector Machine). For crowd motion analysis, a block matching method based on a full search algorithm, with mean absolute error as the matching criterion, is used to estimate the speed of the crowd.

In other related art, analysis of the spectrograms of crowd images shows that images of different crowd densities have markedly different spectra. The spectrogram of a crowd image is therefore treated as a texture image, texture features are extracted from it by texture analysis, and an AdaBoost classifier is finally used to classify the crowd density level.

Current research on crowd density estimation and motion analysis focuses on the problem that, when there are many pedestrians, heavy occlusion within the crowd makes it difficult to detect, segment and track individual pedestrians accurately. Without detecting and tracking single targets, such methods use the characteristics of the foreground image as a whole, together with an effective statistical learning method, to establish reasonable decision rules that directly estimate the number of pedestrians, determine the motion state of the targets, and detect the occurrence of abnormal events.

However, the solutions in the above related art are applicable only to close-range monitoring, such as on a street or indoors. In such environments the targets are displayed at high resolution and occupy a large area, so recognition is relatively easy. These methods are not applicable to long-distance detection, where the captured scenes are large and the target persons are very small and blurry, making detection more difficult.

According to the foregoing embodiment and preferred embodiments, histogram equalization is performed on each frame image of the collected surveillance video to obtain a histogram-equalized image, the human body in the equalized image is identified by the cascade classifier, and the human bodies identified in each frame are counted. By designing filtering and enhancement algorithms and testing several commonly used classifier algorithms, the Haar classifier, which gave the best detection results, was selected and continuously optimized to detect people in remote monitoring. This achieves the technical effect of detecting and counting people more accurately in remote monitoring, thereby solving the technical problem in the prior art that the safety of public places cannot be guaranteed because an abnormal crowd state cannot be detected under remote monitoring.

The problem addressed by the embodiment of the present invention is as follows: while the cloud number is in operation, its surveillance camera focuses on certain important places on the ground, and the number of people there is counted to determine the security status of the place and to provide an early warning when the crowd becomes dense, so as to ensure the effectiveness and accuracy of the warning.

The embodiment of the present invention can be applied to a people-counting system and used in a remote monitoring environment (such as the cloud number or a drone) to detect pedestrians, count them, and issue warnings where appropriate.

FIG. 10 is a schematic diagram of a process for detecting the number of people based on a cascade classifier according to an embodiment of the present invention. As shown in FIG. 10, detecting the number of people in an image requires a feature classifier for human body features; the classifier detects the parts of the image that contain a human body, and each time a feature containing a person is detected, the people counter is automatically incremented by one. In the embodiment of the present invention, a cascade classifier is used for human body detection, and the corresponding program is built on the preprocessing and cascade classifier interface functions of the OpenCV library.

According to another aspect of an embodiment of the present invention, there is also provided a robot comprising the device for detecting the number of people in a surveillance video according to any one of the above.

According to another aspect of the embodiments of the present invention, there is further provided a storage medium comprising a stored program, wherein, when the program runs, the device in which the storage medium is located is controlled to perform the method for detecting the number of people in a surveillance video according to any one of the above.

According to another aspect of an embodiment of the present invention, there is further provided a processor configured to run a program, wherein the program, when run, performs the method for detecting the number of people in a surveillance video according to any one of the above.

The serial numbers of the embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.

In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of the other embodiments.

In the several embodiments provided herein, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, unit or module, and may be electrical or of another form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention. The foregoing storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and the like.

The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements should also be considered within the scope of protection of the present invention.

Claims

A method for detecting the number of people in a surveillance video, characterized in that it comprises:

performing histogram equalization on each frame image of the collected surveillance video to obtain a histogram-equalized image;

identifying a human body in the equalized image by a cascade classifier, wherein the cascade classifier is configured to identify the human body according to human body features in the histogram-equalized image; and

counting the identified human bodies.
The method according to claim 1, wherein the cascade classifier is formed by superimposing at least two weak classifiers, and the cascade classifier identifies the human body according to the human body features of the histogram-equalized image on the basis of the superimposed at least two weak classifiers.
The method of claim 1, wherein:

in the case that the collected surveillance video is in color, before histogram equalization is performed on each frame image of the collected surveillance video, the method further comprises: converting each frame image of the color surveillance video to grayscale; and

performing histogram equalization on each frame image of the collected surveillance video comprises: performing histogram equalization on each grayscaled frame image.
The method of claim 1, wherein:

before the human body in the equalized image is identified by the cascade classifier, the method further comprises: extracting the high-frequency component of each frame image using a Laplacian operator, and assigning a weight to the high-frequency component to obtain an enhanced high-frequency component; and superimposing the enhanced high-frequency component on the histogram-equalized image to obtain an enhanced equalized image; and

identifying the human body in the equalized image by the cascade classifier comprises: identifying, by the cascade classifier, the human body in the enhanced equalized image.
The method of claim 1, wherein:

before the human body in the equalized image is identified by the cascade classifier, the method further comprises: performing edge detection on the histogram-equalized image using a Canny operator to obtain the contours included in the histogram-equalized image; determining the area of each contour included in the histogram-equalized image; and applying flood fill to non-target contours whose area is larger than a predetermined threshold to obtain an equalized image including the filled non-target contours, wherein a non-target contour is a region that is not a target for human body identification; and

identifying the human body in the equalized image by the cascade classifier comprises: identifying, by the cascade classifier, the human body in the equalized image including the non-target contours.
The method according to claim 1, wherein before the human body in the equalized image is identified by the cascade classifier, the method further comprises:

obtaining the cascade classifier by training on multiple sets of data, wherein each of the multiple sets of data includes: a sample image, and a human body recognition result indicating whether the sample image includes a human body.
The method according to any one of claims 2 to 6, wherein the identification function of the weak classifier under the Haar-like rectangular feature is:

where g_haar(x) is the recognition result indicating, based on the human body feature x, whether the equalized image includes a human body; f_j(x) is a feature value; θ_j is the threshold of the weak classifier; j identifies the j-th weak classifier; and α and β are the confidences of the classification result, with values in the range [-1, +1], where a negative value indicates a non-human body and a positive value indicates a human body.
A device for detecting the number of people in a surveillance video, comprising:

a first obtaining module, configured to perform histogram equalization on each frame image of the collected surveillance video to obtain a histogram-equalized image;

an identification module, configured to identify a human body in the equalized image by a cascade classifier, wherein the cascade classifier is configured to identify the human body according to human body features in the histogram-equalized image; and

a statistical module, configured to count the identified human bodies.
The device according to claim 8, wherein the cascade classifier is formed by superimposing at least two weak classifiers, and the cascade classifier identifies the human body according to the human body features of the histogram-equalized image on the basis of the superimposed at least two weak classifiers.
The device of claim 8, wherein:

the device further includes: a graying module, configured to, in the case that the collected surveillance video is in color, convert each frame image of the color surveillance video to grayscale before histogram equalization is performed on each frame image of the collected surveillance video; and

the first obtaining module is configured to perform histogram equalization on each grayscaled frame image.
The device of claim 8, wherein:

the device further includes: a second obtaining module, configured to, before the human body in the equalized image is identified by the cascade classifier, extract the high-frequency component of each frame image using a Laplacian operator, assign a weight to the high-frequency component to obtain an enhanced high-frequency component, and superimpose the enhanced high-frequency component on the histogram-equalized image to obtain an enhanced equalized image; and

the identification module is configured to identify the human body in the enhanced equalized image by the cascade classifier.
The device of claim 8, wherein:

the device further includes: an obtaining module, configured to, before the human body in the equalized image is identified by the cascade classifier, perform edge detection on the histogram-equalized image using a Canny operator to obtain the contours included in the histogram-equalized image; determine the area of each contour included in the histogram-equalized image; and apply flood fill to non-target contours whose area is larger than a predetermined threshold to obtain an equalized image including the filled non-target contours, wherein a non-target contour is a region that is not a target for human body identification; and

the identification module is configured to identify, by the cascade classifier, the human body in the equalized image including the non-target contours.
The device according to claim 8, wherein the device further comprises:

a third obtaining module, configured to obtain the cascade classifier by training on multiple sets of data before the human body in the equalized image is identified by the cascade classifier, wherein each of the multiple sets of data includes: a sample image, and a human body recognition result indicating whether the sample image includes a human body.
The device according to any one of claims 9 to 13, wherein the identification function of the weak classifier under the Haar-like rectangular feature is:

where g_haar(x) is the recognition result indicating, based on the human body feature x, whether the equalized image includes a human body; f_j(x) is a feature value; θ_j is the threshold of the weak classifier; j identifies the j-th weak classifier; and α and β are the confidences of the classification result, with values in the range [-1, +1], where a negative value indicates a non-human body and a positive value indicates a human body.
A robot, characterized in that the robot comprises the device for detecting the number of people in a surveillance video according to any one of claims 8 to 14.
A storage medium, comprising a stored program, wherein, when the program runs, the device in which the storage medium is located is controlled to perform the method for detecting the number of people in a surveillance video according to any one of claims 1 to 7.
A processor, wherein the processor is configured to run a program, and the program, when run, performs the method for detecting the number of people in a surveillance video according to any one of claims 1 to 7.