CN111325048A - Personnel gathering detection method and device - Google Patents

Personnel gathering detection method and device

Info

Publication number
CN111325048A
Authority
CN
China
Prior art keywords
detected
target
image
tracking
moving speed
Prior art date
Legal status
Granted
Application number
CN201811523516.9A
Other languages
Chinese (zh)
Other versions
CN111325048B (en)
Inventor
曾钦清
童超
车军
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811523516.9A
Publication of CN111325048A
Application granted
Publication of CN111325048B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a personnel gathering detection method and device. The method comprises the following steps: extracting targets to be detected from a collected video stream and tracking them to obtain a tracking linked list; determining the moving speed of each target to be detected according to the tracking linked list; judging whether each moving speed is less than a first preset threshold, and determining a static area of a frame of image according to the areas corresponding to the targets to be detected in that frame whose moving speeds are less than the first preset threshold; taking each image in the video stream that contains a static area as a candidate aggregation image; when the number N of candidate aggregation images reaches a second preset threshold, judging through a preset personnel gathering prediction model whether each candidate aggregation image is a personnel gathering image, and counting the number M of images judged to be personnel gathering images; and when M is determined to be greater than a third preset threshold, determining that a people gathering event has occurred. The method can improve the accuracy of personnel gathering detection.

Description

Personnel gathering detection method and device
Technical Field
The invention relates to the technical field of video monitoring, and in particular to a personnel gathering detection method and device.
Background
In prior-art video monitoring, when people gather in a monitored scene, the management risk and control difficulty of the monitored area increase, and a scheme different from the normal one has to be adopted to manage the scene.
Because modern video monitoring systems are deployed on a huge scale with numerous cameras, discovering gathering phenomena in all monitored scenes by watching every camera for long periods requires a large amount of manpower, consumes labor cost, and easily leads to missed reports. Therefore, discovering people gathering in monitored scenes by means of video analysis has become a requirement of intelligent monitoring systems.
An existing implementation provides a video-based people gathering detection method: the monitored area is learned from continuous video images to obtain its current background image; the foreground image is threshold-segmented to obtain a segmented image, pixel statistics are performed on the connected regions of the target image, and whether a people gathering region exists is judged from the area of each connected region in the target image and a preset area threshold.
This method obtains the gathering region by threshold segmentation of the static-area image. Its gathering judgment condition is simple, it does not make full use of the information in multiple video frames, and its scene applicability is poor: in more complex scenes false detections arise, so its detection accuracy is poor.
Disclosure of Invention
In view of this, the present application provides a method and an apparatus for detecting people gathering, which can improve the accuracy of people gathering detection.
In order to solve the above technical problem, a first aspect of the present application provides a people gathering detection method, comprising:
extracting targets to be detected from the collected video stream, and tracking the targets to be detected to obtain a tracking linked list;
determining the moving speed of each target to be detected according to the tracking linked list;
judging whether each moving speed is less than a first preset threshold, and determining a static area of a frame of image according to the areas corresponding to the targets to be detected in that frame whose moving speeds are less than the first preset threshold;
taking each image in the video stream that contains a static area as a candidate aggregation image; when the number N of candidate aggregation images reaches a second preset threshold, judging through a preset personnel gathering prediction model whether each candidate aggregation image is a personnel gathering image, and counting the number M of images judged to be personnel gathering images;
and when M is determined to be greater than a third preset threshold, determining that a people gathering event has occurred.
A second aspect of the present application provides a people gathering detection apparatus, comprising: an acquisition unit, a first determining unit, a second determining unit, a third determining unit, a fourth determining unit, a statistics unit and a fifth determining unit;
the acquisition unit is used for extracting targets to be detected from the collected video stream and tracking them to obtain a tracking linked list;
the first determining unit is used for determining the moving speed of each target to be detected according to the tracking linked list obtained by the acquisition unit;
the second determining unit is used for judging whether each moving speed determined by the first determining unit is less than a first preset threshold, and determining a static area of a frame of image according to the areas corresponding to the targets to be detected in that frame whose moving speeds are less than the first preset threshold;
the third determining unit is used for taking each image in the video stream containing a static area determined by the second determining unit as a candidate aggregation image, and determining whether the number N of candidate aggregation images reaches a second preset threshold;
the fourth determining unit is used for judging, when the third determining unit determines that the number N of candidate aggregation images reaches the second preset threshold, whether each candidate aggregation image is a personnel gathering image through a preset personnel gathering prediction model;
the statistics unit is used for counting the number M of images judged by the fourth determining unit to be personnel gathering images;
and the fifth determining unit is used for determining that a people gathering event has occurred when the M counted by the statistics unit is determined to be greater than a third preset threshold.
A third aspect of the application provides a non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the people gathering detection method described above.
An electronic device is also provided, comprising the non-transitory computer readable storage medium described above and a processor having access to it.
By combining static area recognition with a preset personnel gathering prediction model obtained through deep learning, and performing secondary recognition on multiple frames of the video, the method and device make full use of the process information of personnel gathering events and can improve the accuracy of personnel gathering detection.
Drawings
FIG. 1 is a schematic diagram of a process for detecting people gathering in an embodiment of the present application;
FIG. 2 is a schematic diagram of a location area where an aggregation event occurs in an embodiment of the present application;
fig. 3 is a schematic structural diagram of an apparatus applied to the above-described technology in the embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the invention are described in detail below with reference to the accompanying drawings and embodiments.
The embodiment of the application provides a personnel gathering detection method that combines static area recognition with a preset personnel gathering prediction model obtained through deep learning, performs secondary recognition on multiple frames of the video, makes full use of the process information of personnel gathering events, and can improve the accuracy of personnel gathering detection.
The method and device are applied to detecting personnel gathering events in public places and important areas. The process of personnel gathering detection in the embodiments of the application is described in detail below with reference to the accompanying drawings.
For convenience of description, the device that implements the personnel gathering detection is hereinafter simply referred to as the detection device.
Referring to fig. 1, fig. 1 is a schematic diagram of a flow of detecting people gathering in the embodiment of the present application. The method comprises the following specific steps:
Step 101: the detection device extracts targets to be detected from the collected video stream and tracks them to obtain a tracking linked list.
The collected video stream may be obtained by video monitoring equipment, such as a camera, acquiring video images of the monitored scene in real time and transmitting them to the detection device; the detection device receives and stores the video stream sent by the video monitoring equipment, so the video stream is acquired in real time.
In this step, extracting the targets to be detected from the collected video stream and tracking them to obtain a tracking linked list comprises the following two steps.
First, a foreground image of each frame of the video stream is obtained, and the targets to be detected are obtained from the foreground image.
This step can be implemented in, but is not limited to, the following two ways:
the first method comprises the following steps:
foreground objects can be extracted from the video stream through a foreground model for foreground detection, so that the foreground objects are used as detection targets of target personnel. The background modeling method may include a gaussian mixture Model (gaussian mixture Model), a ViBe (visual background extraction) algorithm, and the like.
The second way:
Feature targets can be extracted from the video stream through a trained Convolutional Neural Network (CNN), and these feature targets are taken as the detection targets, i.e. the target personnel. By being trained on personnel features in advance, the convolutional neural network can identify feature targets appearing in a frame of the video. As an embodiment, the convolutional neural network can be trained on human limbs, so that the trained network extracts limb targets of personnel from the video stream, thereby obtaining the targets to be detected.
Second, each target to be detected in the foreground image of each frame is tracked to obtain the tracking linked list.
After the targets to be detected are obtained, they can be tracked, and the tracking results are recorded in the tracking linked list.
In specific implementation, the detection targets can be tracked through Kalman filtering, particle filtering, multi-target tracking techniques, and the like.
In specific implementation, a single tracking linked list may hold one tracking entry per detected target, or a separate tracking linked list may be generated for each detection target.
The tracking linked list in the embodiment of the application at least comprises the mapping relation among the identifier of the target to be detected, the video frame identifiers of the video frames in which the target appears, and the historical coordinates of the target.
The historical coordinates of a target to be detected are the coordinates of the center point of the circumscribed rectangular frame of the target's outline.
In specific implementation, the tracking linked list can take the form of the following table. Referring to Table 1, Table 1 shows the contents of the tracking linked list in the embodiment of the application.
Target identifier | Video frame identifiers | Historical coordinates
1                 | 2, 3, 8                 | t1: (3,5); t2: (6,9); …
TABLE 1
Table 1 shows that target 1 to be detected appears in video frames 2, 3 and 8, together with the coordinate information, i.e. two-dimensional coordinates, corresponding to each time; for example, the coordinate at time t1 may be (3,5).
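The tracking linked list can be kept as a simple per-target history whose fields mirror Table 1. The sketch below uses a greedy nearest-centroid association rule purely for illustration; the application itself names Kalman filtering, particle filtering and multi-target tracking as the intended techniques, and the distance bound is an assumption of this example.

```python
import math
from collections import defaultdict

# Tracking linked list: target identifier -> time-ordered entries of
# (video frame identifier, timestamp, centre of the circumscribed rectangle).
tracks = defaultdict(list)
_next_id = 0

def update_tracks(frame_id, timestamp, boxes, max_dist=50.0):
    """Greedily attach each detection to the nearest existing track,
    opening a new track when none is within max_dist pixels. A production
    tracker would use Kalman filtering plus one-to-one assignment."""
    global _next_id
    for (x, y, w, h) in boxes:
        c = (x + w / 2.0, y + h / 2.0)  # Centre of the bounding rectangle.
        best, best_d = None, max_dist
        for tid, hist in tracks.items():
            px, py = hist[-1][2]
            d = math.hypot(c[0] - px, c[1] - py)
            if d < best_d:
                best, best_d = tid, d
        if best is None:
            best = _next_id
            _next_id += 1
        tracks[best].append((frame_id, timestamp, c))
```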
Step 102: the detection device determines the moving speed of each target to be detected according to the tracking linked list.
In this step, determining the moving speed of each target to be detected according to the tracking linked list comprises:
calculating the moving speed of each target to be detected within a preset duration according to the historical coordinates of each target in the tracking linked list.
In specific implementation, the moving speed of a target to be detected can be calculated according to the following formula:
v[t] = \frac{\sqrt{(x[t] - x[t-T])^2 + (y[t] - y[t-T])^2}}{T}
where (x[t], y[t]) are the historical coordinates of the target to be detected in the tracking linked list at time t, (x[t-T], y[t-T]) are its historical coordinates at time t-T, and T is the preset duration.
Taking Table 1 as an example, and assuming that the interval between t1 and t2 is the preset duration T, the moving speed v[t2] of target 1 at time t2 is:
v[t_2] = \frac{\sqrt{(x[t_2] - x[t_1])^2 + (y[t_2] - y[t_1])^2}}{T}
Assuming that the coordinates of target 1 at time t2 (13:45:38) are (6,9) and its coordinates at time t1 (13:45:36) are (3,5), the moving speed at time t2 is sqrt((6-3)^2 + (9-5)^2) / 2 = 5/2 = 2.5 cm/s. Here the displacement is measured in cm; in practical applications the unit of movement can be determined according to actual needs.
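For concreteness, here is a short sketch of this speed computation over a track history of (frame identifier, timestamp, coordinates) entries, reproducing the 2.5 cm/s worked example; the track layout follows the illustrative structure sketched after Table 1.

```python
import math

def moving_speed(track, T):
    """track: time-ordered list of (frame_id, timestamp, (x, y)) entries."""
    t_now = track[-1][1]
    # Most recent sample recorded at least T seconds earlier.
    past = next((p for p in reversed(track) if t_now - p[1] >= T), None)
    if past is None:
        return None  # Not enough history within the preset duration yet.
    (x0, y0), (x1, y1) = past[2], track[-1][2]
    return math.hypot(x1 - x0, y1 - y0) / T

# Target 1 from Table 1: (3,5) at t1, then (6,9) two seconds later.
track_1 = [(2, 0.0, (3, 5)), (3, 2.0, (6, 9))]
print(moving_speed(track_1, T=2.0))  # -> 2.5, i.e. 2.5 cm/s in the example
```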
In the embodiment of the application, the moving speed of a detection target is calculated over the most recent duration T, and its movement before that time can be ignored. T can be configured according to actual needs, and can be set smaller for a more accurate speed calculation.
Step 103: the detection device judges whether each moving speed is less than the first preset threshold, and determines a static area of a frame of image according to the areas corresponding to the targets to be detected in that frame whose moving speeds are less than the first preset threshold.
In this step, determining the static area of a frame of image according to the areas corresponding to the targets to be detected in that frame whose moving speeds are less than the first preset threshold comprises:
when there are K targets to be detected in a frame whose moving speeds are less than the first preset threshold, taking the maximum circumscribed rectangular frame of the union of the areas of these K targets in the frame as the static area of the frame, or directly taking the union of the areas of the K targets as the static area. K is greater than a fourth preset value, whose value is determined according to actual requirements, e.g. 5 or 8.
Referring to fig. 2, fig. 2 is a schematic view of a static area in the embodiment of the application. In fig. 2, the circumscribed rectangular frame of the union of the target areas of 7 detection targets is taken as the static area; the area of each detection target is also marked by a rectangular frame, and the areas of detection targets 6 and 7 overlap.
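A minimal sketch of this static-area computation, assuming axis-aligned (x, y, w, h) boxes; the minimum count k standing in for the fourth-preset-value bound on K is an illustrative choice.

```python
def static_area(slow_boxes, k=5):
    """slow_boxes: (x, y, w, h) boxes of the targets in one frame whose
    moving speed is below the first preset threshold."""
    if len(slow_boxes) < k:
        return None  # Too few slow targets: no static area in this frame.
    x1 = min(x for x, y, w, h in slow_boxes)
    y1 = min(y for x, y, w, h in slow_boxes)
    x2 = max(x + w for x, y, w, h in slow_boxes)
    y2 = max(y + h for x, y, w, h in slow_boxes)
    return (x1, y1, x2 - x1, y2 - y1)  # Maximum circumscribed rectangle.
```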
In the embodiment of the application, when multiple targets have moving speeds below the first preset threshold, the area corresponding to the union of the areas of those targets is taken as a static area, and only images containing a static area are taken as candidate aggregation images for secondary recognition; this avoids performing secondary recognition of personnel gathering images too frequently.
Step 104: the detection device takes each image in the video stream that contains a static area as a candidate aggregation image; when the number N of candidate aggregation images reaches the second preset threshold, it judges through the preset personnel gathering prediction model whether each candidate aggregation image is a personnel gathering image, and counts the number M of images judged to be personnel gathering images.
The personnel gathering prediction model in the embodiment of the application is obtained by training on images of a plurality of gathering events and images of non-gathering events as sample images. The training process is as follows:
The personnel gathering prediction model consists of a convolutional neural network model and a regression learning model. Given an input image, the convolutional neural network model outputs confidences for gathering and non-gathering. A gathering confidence threshold is set in the regression learning model; the regression learning model takes as input the output of the convolutional neural network model, i.e. the gathering and non-gathering confidences, and outputs an identifier of whether the image is a personnel gathering image: when the gathering confidence is greater than the gathering confidence threshold, the personnel gathering image identifier is output; otherwise, the non-personnel-gathering image identifier is output.
The gathering confidence threshold may be set according to actual needs and is not limited in this embodiment.
The convolutional neural network model is established as follows:
A images of gathering events and B images of non-gathering events are used as sample data for learning in a convolutional neural network, which thereby gains the ability to distinguish gathering images from non-gathering images, establishing the convolutional neural network model. The network learns from the input data and the class labels. Deep learning networks such as GoogLeNet, ResNet, VGG and AlexNet can be adopted.
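Purely as an illustration, the following sketch pairs a small CNN with the threshold decision described above. The backbone, layer sizes and the 0.8 gathering confidence threshold are assumptions of this example; the application names GoogLeNet, ResNet, VGG and AlexNet as candidate backbones and leaves the threshold to be set according to actual needs.

```python
import torch
import torch.nn as nn

class CrowdNet(nn.Module):
    """Tiny stand-in for the gathering / non-gathering CNN."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, 2)  # logits: [non-gathering, gathering]

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def is_gathering_image(model, image, conf_threshold=0.8):
    """Threshold decision standing in for the regression learning model:
    returns (identifier, gathering confidence) for one CHW image tensor."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(image.unsqueeze(0)), dim=1)[0]
    return bool(probs[1] >= conf_threshold), float(probs[1])
```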
The detection device continues to acquire candidate aggregation images when it determines that the number N of candidate aggregation images has not reached the second preset threshold.
M and N are set according to the practical application environment and are not specifically limited; M is an integer not greater than N, and N is an integer greater than 0.
Given an input image, the trained personnel gathering prediction model outputs an identifier of whether the image is a personnel gathering image.
Step 105: when determining that M is greater than the third preset threshold, the detection device determines that a people gathering event has occurred.
When M is not greater than the third preset threshold, the current candidate aggregation images are cleared, and candidate aggregation images are acquired again from the video stream collected in real time. A sketch of this overall decision logic follows.
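This is a minimal sketch of the N/M counting logic of steps 104-105; the two threshold values and the judge callback (the prediction model's decision) are illustrative assumptions.

```python
def detect_gathering(candidate_stream, judge, n_threshold=10, m_threshold=6):
    """candidate_stream yields images that contain a static area;
    judge(image) -> True if the prediction model labels it a personnel
    gathering image. n_threshold / m_threshold stand in for the second
    and third preset thresholds."""
    candidates = []
    for image in candidate_stream:
        candidates.append(image)
        if len(candidates) < n_threshold:  # N not yet reached: keep collecting.
            continue
        m = sum(1 for img in candidates if judge(img))
        if m > m_threshold:
            return True                    # People gathering event occurs.
        candidates.clear()                 # M too small: restart collection.
    return False
```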
In the embodiment of the application, when the detection device determines that a people gathering event has occurred, the method further comprises:
the detection device sends out alarm information, which comprises an identifier that a gathering event has occurred and the location area where the gathering event occurred.
The identifier that a gathering event has occurred indicates the occurrence of the current gathering event; it can be realized with characters and symbols, or with patterns such as red or yellow, and is not limited to these implementations.
The location area where the gathering event occurred can be the area in the image marking the gathered personnel. When the application is embodied, the following expression can be used, but is not limited to:
the image with the highest gathering confidence is selected as the image to be displayed, and the static area in that image is displayed.
When an alarm is raised, the administrator can take corresponding early-warning measures, such as alarming, warning and crowd guidance, according to the current actual environment.
Alternatively, processing strategies can be configured on the device in advance for alarm information, i.e. a correspondence between alarm information and processing strategies is configured.
When an alarm occurs, the alarm information is matched against the configured alarm information; if the match succeeds, the processing strategy corresponding to that alarm information is applied, such as alarming, warning by broadcast, or crowd guidance, as in the sketch below.
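A toy sketch of such a pre-configured correspondence between alarm information and processing strategies; the keys and handler actions are invented for illustration.

```python
# Pre-configured correspondence between alarm information and strategies.
strategies = {
    "gathering": ["sound_alarm", "broadcast_warning", "dispatch_guidance"],
}

def handle_alarm(alarm):
    """alarm: dict with the event identifier and the static area (location
    area) of the event; applies the matching pre-configured strategy."""
    for action in strategies.get(alarm["event"], []):
        print(f"{action}: gathering at region {alarm['region']}")

handle_alarm({"event": "gathering", "region": (120, 80, 200, 150)})
```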
This timely alarming can be carried out quickly and informs the gathered personnel in time, thereby safeguarding personnel safety.
Based on the same inventive concept, the application also provides a personnel gathering detection device. Referring to fig. 3, fig. 3 is a schematic structural diagram of the device applying the above technique in the embodiment of the application. The device comprises: an acquisition unit 301, a first determining unit 302, a second determining unit 303, a third determining unit 304, a fourth determining unit 305, a statistics unit 306 and a fifth determining unit 307;
an acquisition unit 301, configured to extract targets to be detected from the collected video stream and track them to obtain a tracking linked list;
a first determining unit 302, configured to determine the moving speed of each target to be detected according to the tracking linked list obtained by the acquisition unit 301;
a second determining unit 303, configured to judge whether each moving speed determined by the first determining unit 302 is less than a first preset threshold, and to determine a static area of a frame of image according to the areas corresponding to the targets to be detected in that frame whose moving speeds are less than the first preset threshold;
a third determining unit 304, configured to take each image in the video stream containing a static area determined by the second determining unit 303 as a candidate aggregation image, and to determine whether the number N of candidate aggregation images reaches a second preset threshold;
a fourth determining unit 305, configured to judge, when the third determining unit 304 determines that the number N of candidate aggregation images reaches the second preset threshold, whether each candidate aggregation image is a personnel gathering image through a preset personnel gathering prediction model;
a statistics unit 306, configured to count the number M of images judged by the fourth determining unit 305 to be personnel gathering images;
and a fifth determining unit 307, configured to determine that a people gathering event has occurred when the M counted by the statistics unit 306 is determined to be greater than a third preset threshold.
Preferably,
the acquisition unit 301 is specifically configured, when extracting the targets to be detected from the collected video stream and tracking them to obtain a tracking linked list, to: obtain a foreground image of each frame of the video stream; and track each target to be detected in the foreground image of each frame to obtain the tracking linked list.
Preferably,
the first determining unit 302 is specifically configured, when determining the moving speed of each target to be detected according to the tracking linked list, to: calculate the moving speed of each target to be detected within a preset duration according to the historical coordinates of each target in the tracking linked list; the tracking linked list comprises the mapping relation among the identifier of the target to be detected, the video frame identifiers of the video frames in which the target appears, and the historical coordinates of the target.
Preferably,
the first determining unit 302 is specifically configured, when calculating the moving speed of each target to be detected within a preset duration according to the historical coordinates of each target in the tracking linked list, to calculate the moving speed v[t] of the target according to the following formula:
v[t] = \frac{\sqrt{(x[t] - x[t-T])^2 + (y[t] - y[t-T])^2}}{T}
where (x[t], y[t]) are the historical coordinates of the target to be detected in the tracking linked list at time t, (x[t-T], y[t-T]) are its historical coordinates at time t-T, and T is the preset duration.
Preferably,
the fifth determining unit 307 is further configured to, when determining that M is not greater than the third preset threshold, clear the current candidate aggregation images and trigger the acquisition unit 301 to acquire candidate aggregation images again from the video stream collected in real time.
Preferably,
the people gathering prediction model is obtained by training a plurality of images of gathering events and images of non-gathering events as sample images.
The units of the above embodiments may be integrated into one body or deployed separately; they may be combined into one unit or further divided into a plurality of sub-units.
Further, an embodiment of the application also provides a non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the people gathering detection method.
In addition, an electronic device is also provided, comprising the non-transitory computer readable storage medium described above and a processor having access to it.
To sum up, the application combines static area recognition with deep learning, performs secondary recognition on multiple frames of the video, makes full use of the process information of personnel gathering events, and can improve the accuracy of personnel gathering detection.
In the embodiment of the application, an offline deep learning method with multi-frame secondary recognition is adopted, so the process information of the event is fully learned and the event judgment is more accurate.
Combining deep learning with the auxiliary judgment of static area recognition differs from judging with basic information alone or with deep learning alone, and can effectively reduce false alarms. Meanwhile, the position of the gathering area can be output, so the output information is richer, which facilitates post-alarm processing.
The above description contains only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the present invention shall fall within the scope of the present invention.

Claims (14)

1. A people gathering detection method, the method comprising:
extracting targets to be detected from the collected video stream, and tracking the targets to be detected to obtain a tracking linked list;
determining the moving speed of each target to be detected according to the tracking linked list;
judging whether each moving speed is less than a first preset threshold, and determining a static area of a frame of image according to the areas corresponding to the targets to be detected in that frame whose moving speeds are less than the first preset threshold;
taking each image in the video stream that contains a static area as a candidate aggregation image; when the number N of candidate aggregation images reaches a second preset threshold, judging through a preset personnel gathering prediction model whether each candidate aggregation image is a personnel gathering image, and counting the number M of images judged to be personnel gathering images;
and when M is determined to be greater than a third preset threshold, determining that a people gathering event has occurred.
2. The method according to claim 1, wherein extracting the targets to be detected from the collected video stream and tracking them to obtain a tracking linked list comprises:
obtaining a foreground image of each frame of the video stream;
and tracking each target to be detected in the foreground image of each frame to obtain the tracking linked list.
3. The method according to claim 1, wherein the tracking linked list comprises a mapping relation among an identifier of the target to be detected, video frame identifiers of the video frames in which the target appears, and historical coordinates of the target;
and determining the moving speed of each target to be detected according to the tracking linked list comprises:
calculating the moving speed of each target to be detected within a preset duration according to the historical coordinates of each target in the tracking linked list.
4. The method according to claim 3, wherein calculating the moving speed of each target to be detected within a preset duration according to the historical coordinates of each target in the tracking linked list comprises:
calculating the moving speed v[t] of the target to be detected according to the following formula:
v[t] = \frac{\sqrt{(x[t] - x[t-T])^2 + (y[t] - y[t-T])^2}}{T}
where (x[t], y[t]) are the historical coordinates of the target to be detected in the tracking linked list at time t, (x[t-T], y[t-T]) are its historical coordinates at time t-T, and T is the preset duration.
5. The method of claim 1, further comprising:
and when M is not greater than the third preset threshold, clearing the current candidate aggregation images and acquiring candidate aggregation images again from the video stream collected in real time.
6. The method according to any one of claims 1 to 5, wherein the personnel gathering prediction model is obtained by training on images of a plurality of gathering events and images of non-gathering events as sample images.
7. A people gathering detection apparatus, characterized in that the apparatus comprises:
an acquisition unit, used for extracting targets to be detected from the collected video stream and tracking them to obtain a tracking linked list;
a first determining unit, used for determining the moving speed of each target to be detected according to the tracking linked list obtained by the acquisition unit;
a second determining unit, used for judging whether each moving speed determined by the first determining unit is less than a first preset threshold, and determining a static area of a frame of image according to the areas corresponding to the targets to be detected in that frame whose moving speeds are less than the first preset threshold;
a third determining unit, used for taking each image in the video stream containing a static area determined by the second determining unit as a candidate aggregation image, and determining whether the number N of candidate aggregation images reaches a second preset threshold;
a fourth determining unit, used for judging, when the third determining unit determines that the number N of candidate aggregation images reaches the second preset threshold, whether each candidate aggregation image is a personnel gathering image through a preset personnel gathering prediction model;
a statistics unit, used for counting the number M of images judged by the fourth determining unit to be personnel gathering images;
and a fifth determining unit, used for determining that a people gathering event has occurred when the M counted by the statistics unit is determined to be greater than a third preset threshold.
8. The apparatus of claim 7,
the acquisition unit is specifically configured, when extracting the targets to be detected from the collected video stream and tracking them to obtain a tracking linked list, to: obtain a foreground image of each frame of the video stream; and track each target to be detected in the foreground image of each frame to obtain the tracking linked list.
9. The apparatus of claim 7,
the first determining unit is specifically configured, when determining the moving speed of each target to be detected according to the tracking linked list, to: calculate the moving speed of each target to be detected within a preset duration according to the historical coordinates of each target in the tracking linked list; the tracking linked list comprises the mapping relation among the identifier of the target to be detected, the video frame identifiers of the video frames in which the target appears, and the historical coordinates of the target.
10. The apparatus of claim 9,
the first determining unit is specifically configured, when calculating the moving speed of each target to be detected within a preset duration according to the historical coordinates of each target in the tracking linked list, to calculate the moving speed v[t] of the target according to the following formula:
v[t] = \frac{\sqrt{(x[t] - x[t-T])^2 + (y[t] - y[t-T])^2}}{T}
where (x[t], y[t]) are the historical coordinates of the target to be detected in the tracking linked list at time t, (x[t-T], y[t-T]) are its historical coordinates at time t-T, and T is the preset duration.
11. The apparatus of claim 7,
the fifth determining unit is further configured to, when determining that M is not greater than the third preset threshold, clear the current candidate aggregation images and trigger the acquisition unit to acquire candidate aggregation images again from the video stream collected in real time.
12. The apparatus according to any one of claims 7-11, wherein the personnel gathering prediction model is obtained by training on images of a plurality of gathering events and images of non-gathering events as sample images.
13. A non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the people gathering detection method of any one of claims 1 to 6.
14. An electronic device comprising the non-transitory computer readable storage medium of claim 13, and a processor having access to the non-transitory computer readable storage medium.
CN201811523516.9A 2018-12-13 2018-12-13 Personnel gathering detection method and device Active CN111325048B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811523516.9A CN111325048B (en) 2018-12-13 2018-12-13 Personnel gathering detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811523516.9A CN111325048B (en) 2018-12-13 2018-12-13 Personnel gathering detection method and device

Publications (2)

Publication Number Publication Date
CN111325048A true CN111325048A (en) 2020-06-23
CN111325048B CN111325048B (en) 2023-05-26

Family

ID=71166516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811523516.9A Active CN111325048B (en) 2018-12-13 2018-12-13 Personnel gathering detection method and device

Country Status (1)

Country Link
CN (1) CN111325048B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101325690A (en) * 2007-06-12 2008-12-17 上海正电科技发展有限公司 Method and system for detecting human flow analysis and crowd accumulation process of monitoring video flow
CN102164270A (en) * 2011-01-24 2011-08-24 浙江工业大学 Intelligent video monitoring method and system capable of exploring abnormal events
CN103473791A (en) * 2013-09-10 2013-12-25 惠州学院 Method for automatically recognizing abnormal velocity event in surveillance video
CN103839065A (en) * 2014-02-14 2014-06-04 南京航空航天大学 Extraction method for dynamic crowd gathering characteristics
US20140152836A1 (en) * 2012-11-30 2014-06-05 Stephen Jeffrey Morris Tracking people and objects using multiple live and recorded surveillance camera video feeds
US20140241574A1 (en) * 2011-04-11 2014-08-28 Tao Wang Tracking and recognition of faces using selected region classification
US20140348382A1 (en) * 2013-05-22 2014-11-27 Hitachi, Ltd. People counting device and people trajectory analysis device
WO2016014724A1 (en) * 2014-07-23 2016-01-28 Gopro, Inc. Scene and activity identification in video summary generation
CN105447458A (en) * 2015-11-17 2016-03-30 深圳市商汤科技有限公司 Large scale crowd video analysis system and method thereof
US20170345181A1 (en) * 2016-05-27 2017-11-30 Beijing Kuangshi Technology Co., Ltd. Video monitoring method and video monitoring system
WO2018133666A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Method and apparatus for tracking video target
CN108810616A (en) * 2018-05-31 2018-11-13 广州虎牙信息科技有限公司 Object localization method, image display method, device, equipment and storage medium

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985385A (en) * 2020-08-14 2020-11-24 杭州海康威视数字技术股份有限公司 Behavior detection method, device and equipment
CN111985385B (en) * 2020-08-14 2023-08-29 杭州海康威视数字技术股份有限公司 Behavior detection method, device and equipment
CN112270671A (en) * 2020-11-10 2021-01-26 杭州海康威视数字技术股份有限公司 Image detection method, image detection device, electronic equipment and storage medium
CN112270671B (en) * 2020-11-10 2023-06-02 杭州海康威视数字技术股份有限公司 Image detection method, device, electronic equipment and storage medium
CN113536932A (en) * 2021-06-16 2021-10-22 中科曙光国际信息产业有限公司 Crowd gathering prediction method and device, computer equipment and storage medium
CN113837034A (en) * 2021-09-08 2021-12-24 云从科技集团股份有限公司 Aggregated population monitoring method, device and computer storage medium
CN114494350A (en) * 2022-01-28 2022-05-13 北京中电兴发科技有限公司 Personnel gathering detection method and device
CN114494350B (en) * 2022-01-28 2022-10-14 北京中电兴发科技有限公司 Personnel gathering detection method and device
CN117079192A (en) * 2023-10-12 2023-11-17 东莞先知大数据有限公司 Method, device, equipment and medium for estimating number of rope skipping when personnel are shielded
CN117079192B (en) * 2023-10-12 2024-01-02 东莞先知大数据有限公司 Method, device, equipment and medium for estimating number of rope skipping when personnel are shielded

Also Published As

Publication number Publication date
CN111325048B (en) 2023-05-26

Similar Documents

Publication Publication Date Title
CN111325048B (en) Personnel gathering detection method and device
CN102542289B (en) Pedestrian volume statistical method based on plurality of Gaussian counting models
KR101735365B1 (en) The robust object tracking method for environment change and detecting an object of interest in images based on learning
CN104008371B (en) Regional suspicious target tracking and recognizing method based on multiple cameras
CN103854292B (en) A kind of number and the computational methods and device in crowd movement direction
CN108062349A (en) Video frequency monitoring method and system based on video structural data and deep learning
CN103986910A (en) Method and system for passenger flow statistics based on cameras with intelligent analysis function
CN106951885A (en) A kind of people flow rate statistical method based on video analysis
CN108052859A (en) A kind of anomaly detection method, system and device based on cluster Optical-flow Feature
CN111091098B (en) Training method of detection model, detection method and related device
Bedruz et al. Real-time vehicle detection and tracking using a mean-shift based blob analysis and tracking approach
WO2008086293A2 (en) A system and method for measuring the speed of vehicles or other objects
CN102855508B (en) Opening type campus anti-following system
CN109635758A (en) Wisdom building site detection method is dressed based on the high altitude operation personnel safety band of video
CN112287823A (en) Facial mask identification method based on video monitoring
CN111489380B (en) Target object track analysis method
CN111476160A (en) Loss function optimization method, model training method, target detection method, and medium
CN114648748A (en) Motor vehicle illegal parking intelligent identification method and system based on deep learning
CN112733598A (en) Vehicle law violation determination method and device, computer equipment and storage medium
CN113965733A (en) Binocular video monitoring method, system, computer equipment and storage medium
CN109977796A (en) Trail current detection method and device
CN111383248A (en) Method and device for judging red light running of pedestrian and electronic equipment
CN108322710B (en) Crowd evacuation device and system
CN116311166A (en) Traffic obstacle recognition method and device and electronic equipment
CN113947795B (en) Mask wearing detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant