CN110390226B - Crowd event identification method and device, electronic equipment and system - Google Patents


Info

Publication number
CN110390226B
CN110390226B (application CN201810340168.5A)
Authority
CN
China
Prior art keywords
crowd
image group
pedestrian
crowd event
monitoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810340168.5A
Other languages
Chinese (zh)
Other versions
CN110390226A (en)
Inventor
曾钦清
童超
车军
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201810340168.5A
Publication of CN110390226A
Application granted
Publication of CN110390226B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/50: Context or environment of the image
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53: Recognition of crowd images, e.g. recognition of crowd congestion

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a crowd event identification method, apparatus, electronic device, and system, belonging to the field of crowd event identification. The method comprises the following steps: acquiring a monitoring video of a target area; acquiring at least one target monitoring image group from the monitoring video, where each target monitoring image group comprises a plurality of consecutive monitoring video frames; performing crowd event recognition on each target monitoring image group through a preset crowd event model, where the crowd event model is trained on a plurality of crowd event image groups, each comprising a plurality of crowd event images reflecting the occurrence process of a crowd event; and determining whether a crowd event occurs in the target area according to the recognition result. The method and device can improve the accuracy of crowd event identification.

Description

Crowd event identification method and device, electronic equipment and system
Technical Field
The invention relates to the field of crowd event identification, and in particular to a crowd event identification method, apparatus, electronic device, and system.
Background
With the rapid development of the economy and the continuous increase in people's social activities, the probability of crowd events in public places such as transportation hubs, large malls, and large event venues is increasingly high. A crowd event is an event in which a crowd suddenly gathers in large numbers or suddenly disperses in large numbers. Crowd events may be caused by sudden public safety incidents and may in turn lead to accidents such as trampling. For example, when a public safety incident such as a terrorist attack occurs, a crowd event in which the crowd suddenly disperses in large numbers is likely to follow; conversely, when a crowd event in which the crowd suddenly gathers in large numbers occurs, it may escalate into a safety accident such as trampling. Therefore, identifying crowd events is of great significance for rapidly handling sudden public safety incidents and avoiding safety accidents such as trampling.
In the related art, when identifying crowd events, a monitoring video frame is obtained from the monitoring video at intervals, and a crowd density map corresponding to that frame is computed. The crowd density map reflects the density of pedestrians; when the pedestrian density reflected by the map is high enough, a crowd event is considered to have occurred.
In the process of implementing the invention, the inventors found that the prior art has at least the following problem:
the information reflected by the crowd density map corresponding to a single monitoring video frame is limited, so identifying crowd events from a crowd density map alone tends to yield low identification accuracy.
Disclosure of Invention
Embodiments of the invention provide a crowd event identification method, apparatus, electronic device, and system, which can improve the accuracy of crowd event identification. The technical solution is as follows:
in a first aspect, a crowd event identification method is provided, the method comprising:
acquiring a monitoring video of a target area;
acquiring at least one target monitoring image group from the monitoring video, wherein each target monitoring image group comprises a plurality of continuous monitoring video frames;
performing crowd event recognition on each target monitoring image group through a preset crowd event model, wherein the crowd event model is obtained by training according to a plurality of crowd event image groups, and each crowd event image group comprises a plurality of crowd event images reflecting the occurrence process of a crowd event;
and determining whether the target area has a crowd event according to the recognition result.
Optionally, the monitoring video includes a plurality of monitoring image groups, each monitoring image group includes a plurality of continuous monitoring video frames, each monitoring video frame includes an image of a pedestrian, and the acquiring at least one target monitoring image group from the monitoring video includes:
for each monitoring image group, acquiring an image of a pedestrian in each monitoring video frame of the monitoring image group;
judging whether the position change of the pedestrian meets a preset condition according to the image of the pedestrian in each monitoring video frame, wherein the preset condition is a position change condition reflecting convergence or dispersion of the pedestrian;
and when the position change meets the preset condition, acquiring the monitoring image group as the target monitoring image group.
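The selection step above can be sketched as follows. This is a minimal illustration, not the patented implementation; `meets_preset_condition` is a hypothetical predicate standing in for the position-change test that the subsequent claims describe in terms of pedestrian histograms.

```python
def select_target_groups(monitoring_image_groups, meets_preset_condition):
    """Keep only the monitoring image groups whose pedestrian position
    change reflects gathering or dispersing (the preset condition);
    these become the target monitoring image groups."""
    return [group for group in monitoring_image_groups
            if meets_preset_condition(group)]
```

Any callable taking a group and returning a boolean can be plugged in as the predicate, which keeps the grouping logic separate from the histogram-based test.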
Optionally, the determining, according to the image of the pedestrian in each monitored video frame, whether the position change of the pedestrian meets a preset condition includes:
respectively generating pedestrian histograms for a plurality of monitoring video frames of the monitoring image group by adopting the same generation mode, wherein each monitoring video frame is divided into a plurality of regions at equal intervals in a first direction, each pedestrian histogram comprises a plurality of rectangular bars, the positions of the rectangular bars on a first axis correspond one-to-one to the regions of the corresponding monitoring video frame, the length of each rectangular bar on a second axis represents the sum of pixel values of pedestrian images in the corresponding region, and the first axis and the second axis are perpendicular to each other;
determining a central rectangular strip and an edge rectangular strip from each pedestrian histogram to obtain a plurality of central rectangular strips and a plurality of edge rectangular strips;
and judging whether the position change of the pedestrian meets the preset condition according to the length change conditions of the central rectangular strips and the edge rectangular strips.
Optionally, the arrangement sequence of the central rectangular strips and the edge rectangular strips is consistent with the acquisition sequence of the monitoring video frames of the monitoring image group, and the determining whether the position change of the pedestrian meets the preset condition according to the length change condition of the central rectangular strips and the edge rectangular strips includes:
and when the lengths of the central rectangular strips are gradually increased and the lengths of the edge rectangular strips are gradually decreased, determining that the position change of the pedestrian meets the preset condition.
Optionally, the arrangement sequence of the central rectangular strips and the edge rectangular strips is consistent with the acquisition sequence of the monitoring video frames of the monitoring image group, and the determining whether the position change of the pedestrians meets the preset condition according to the length change condition of the central rectangular strips and the edge rectangular strips includes:
and when the lengths of the edge rectangular strips are gradually increased and the lengths of the central rectangular strips are gradually decreased, determining that the position change of the pedestrian meets the preset condition.
Optionally, each of the surveillance video frames is divided into a plurality of regions at equal intervals in a first direction by using a target length as a dividing unit, where the target length is a length of a pixel in the first direction, and the first direction is a row direction or a column direction.
Optionally, the performing crowd event recognition on each target monitoring image group through a preset crowd event model includes:
for each target monitoring image group, acquiring a first crowd density map corresponding to each monitoring video frame in the target monitoring image group to obtain a plurality of first crowd density maps;
acquiring a first optical flow map corresponding to the images of pedestrians in the plurality of monitoring video frames included in the target monitoring image group;
performing crowd event recognition on the target monitoring image group through the crowd event model based on the plurality of first crowd density maps and the first optical flow map.
Optionally, before performing crowd event recognition on each target monitoring image group through a preset crowd event model, the method further includes:
for each group of crowd event images in the plurality of groups of crowd event images, acquiring a second crowd density map corresponding to each crowd event image in the group of crowd event images to obtain a plurality of second crowd density maps;
acquiring a second optical flow map corresponding to the pedestrian images in the plurality of crowd event images included in the crowd event image group;
performing model training based on the plurality of second crowd density maps and the second optical flow map to obtain the crowd event model.
Optionally, the crowd event model is a convolutional neural network model.
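Since the model consumes both the per-frame crowd density maps and the group's optical flow, those inputs are naturally assembled into a single multi-channel tensor before being fed to the convolutional network. A minimal sketch follows, assuming K density maps of size H x W and a two-channel flow field; the exact input layout is an assumption, as the patent does not specify it.

```python
import numpy as np

def build_model_input(density_maps, optical_flow):
    """Stack K per-frame crowd density maps (each H x W) with the
    horizontal and vertical components of the group's optical flow
    (H x W x 2) into a (K + 2, H, W) tensor for the CNN."""
    channels = [m.astype(np.float32) for m in density_maps]
    channels.append(optical_flow[..., 0].astype(np.float32))  # horizontal flow
    channels.append(optical_flow[..., 1].astype(np.float32))  # vertical flow
    return np.stack(channels, axis=0)
```

The same assembly can be used at training time (second density maps and second optical flow map) and at recognition time (first density maps and first optical flow map), so the model sees identically shaped inputs in both phases.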
In a second aspect, there is provided a crowd event identification apparatus, the apparatus comprising:
the video acquisition module is used for acquiring a monitoring video of a target area;
the image group acquisition module is used for acquiring at least one target monitoring image group from the monitoring video, wherein each target monitoring image group comprises a plurality of continuous monitoring video frames;
the identification module is used for carrying out crowd event identification on each target monitoring image group through a preset crowd event model, the crowd event model is obtained by training according to a plurality of crowd event image groups, and each crowd event image group comprises a plurality of crowd event images reflecting the occurrence process of crowd events;
and the determining module is used for determining whether the target area has the crowd event according to the recognition result.
Optionally, the surveillance video includes a plurality of surveillance image groups, each of the surveillance image groups includes a plurality of consecutive surveillance video frames, each of the surveillance video frames includes an image of a pedestrian, and the image group acquiring module is configured to:
for each monitoring image group, acquiring an image of a pedestrian in each monitoring video frame of the monitoring image group;
judging whether the position change of the pedestrian meets a preset condition according to the image of the pedestrian in each monitoring video frame, wherein the preset condition is a position change condition reflecting convergence or dispersion of the pedestrian;
and when the position change meets the preset condition, acquiring the monitoring image group as the target monitoring image group.
Optionally, the image group acquiring module is configured to:
respectively generating pedestrian histograms for a plurality of monitoring video frames of the monitoring image group by adopting the same generation mode, wherein each monitoring video frame is divided into a plurality of regions at equal intervals in a first direction, each pedestrian histogram comprises a plurality of rectangular bars, the positions of the rectangular bars on a first axis correspond one-to-one to the regions of the corresponding monitoring video frame, the length of each rectangular bar on a second axis represents the sum of pixel values of pedestrian images in the corresponding region, and the first axis and the second axis are perpendicular to each other;
determining a central rectangular strip and an edge rectangular strip from each pedestrian histogram to obtain a plurality of central rectangular strips and a plurality of edge rectangular strips;
and judging whether the position change of the pedestrian meets the preset condition according to the length change conditions of the central rectangular strips and the edge rectangular strips.
Optionally, the arrangement order of the central rectangular strips and the edge rectangular strips is consistent with the acquisition order of the monitoring video frames of the monitoring image group, and the image group acquisition module is configured to:
and when the lengths of the central rectangular strips are gradually increased and the lengths of the edge rectangular strips are gradually decreased, determining that the position change of the pedestrian meets the preset condition.
Optionally, the arrangement order of the central rectangular strips and the edge rectangular strips is consistent with the acquisition order of the monitoring video frames of the monitoring image group, and the image group acquisition module is configured to:
and when the lengths of the edge rectangular strips are gradually increased and the lengths of the central rectangular strips are gradually decreased, determining that the position change of the pedestrian meets the preset condition.
Optionally, each of the surveillance video frames is divided into a plurality of regions at equal intervals in a first direction by using a target length as a dividing unit, where the target length is a length of a pixel in the first direction, and the first direction is a row direction or a column direction.
Optionally, the identification module is configured to:
for each target monitoring image group, acquiring a first crowd density map corresponding to each monitoring video frame in the target monitoring image group to obtain a plurality of first crowd density maps;
acquiring a first optical flow map corresponding to the images of pedestrians in the plurality of monitoring video frames included in the target monitoring image group;
performing crowd event recognition on the target monitoring image group through the crowd event model based on the plurality of first crowd density maps and the first optical flow map.
Optionally, the apparatus further includes a training module, where the training module is configured to:
for each group of crowd event images in the plurality of groups of crowd event images, acquiring a second crowd density map corresponding to each crowd event image in the group of crowd event images to obtain a plurality of second crowd density maps;
acquiring a second optical flow map corresponding to the pedestrian images in the plurality of crowd event images included in the crowd event image group;
performing model training based on the plurality of second crowd density maps and the second optical flow map to obtain the crowd event model.
Optionally, the crowd event model is a convolutional neural network model.
In a third aspect, an electronic device is provided, comprising a processor and a memory, wherein the memory is configured to store a computer program;
the processor is configured to execute the program stored in the memory to implement the crowd event identification method according to any one of the first aspect.
In a fourth aspect, there is provided a crowd event identification system, the crowd event identification system comprising the crowd event identification apparatus according to any one of the second aspect and a monitoring device.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
the method comprises the steps of obtaining at least one target monitoring image group from monitoring videos of a target area, carrying out crowd event identification on each target monitoring image group by using a preset crowd event model, and determining whether crowd events occur in the target area according to an identification result, wherein each target monitoring image group comprises a plurality of continuous monitoring video frames.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a crowd density map according to an embodiment of the present invention.
Fig. 2A is a schematic diagram of an implementation environment provided by an embodiment of the invention.
Fig. 2B is a schematic diagram of another implementation environment provided by an embodiment of the invention.
Fig. 3 is a flowchart of a crowd event identification method according to an embodiment of the present invention.
Fig. 4A is a flowchart of a crowd event identification method according to an embodiment of the present invention.
Fig. 4B is a pedestrian histogram provided by an embodiment of the present invention.
Fig. 5 is a block diagram of a crowd event recognition apparatus according to an embodiment of the present invention.
Fig. 6 is a block diagram of a crowd event recognition apparatus according to an embodiment of the present invention.
Fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention.
Fig. 8 is a block diagram of a crowd event identification system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The crowd event identification has important significance in rapidly handling sudden public safety events and avoiding safety accidents such as trampling and the like.
In the prior art, when performing crowd event identification for a certain area, a server may obtain a surveillance video frame from the surveillance video of that area at intervals. The server then obtains a crowd density map corresponding to the surveillance video frame, which reflects the density of pedestrians in the area. Fig. 1 is an exemplary crowd density map. As shown in fig. 1, the crowd density map is a grayscale image that uses the gray value of each pixel to represent the density of the crowd: the larger the gray value, the higher the crowd density, and the smaller the gray value, the lower the crowd density. After obtaining the crowd density map corresponding to the surveillance video frame, the server determines whether the pedestrian density reflected by the map is higher than a preset density threshold; when it is, the server determines that a crowd event has occurred in the area.
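The single-frame baseline can be sketched as follows. This is a minimal illustration; reducing the density map to its mean gray value is an assumption, since the text does not say how the map is compared against the threshold.

```python
import numpy as np

def crowd_event_by_single_frame(density_map, density_threshold):
    """Prior-art baseline: the crowd density map is a grayscale image in
    which larger pixel values mean denser crowds. A crowd event is
    reported when the overall density exceeds a preset threshold; here
    the map is reduced to its mean gray value (an assumption)."""
    return float(np.asarray(density_map).mean()) > density_threshold
```

A per-region maximum or a count of pixels above a level would work equally well as the reduction; the key property criticized in the next paragraph is that only one frame's information is consulted.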
However, the information reflected by the crowd density map corresponding to a single surveillance video frame is relatively limited, so the identification accuracy is relatively low when a crowd density map alone is used to identify crowd events.
Fig. 2A is a schematic diagram of an implementation environment related to the crowd event identification method. As shown in fig. 2A, the implementation environment may include a monitoring device 101 and a server 102, which may communicate with each other in a wired or wireless manner. The monitoring device 101 may be a monitoring camera; the server 102 may be a single server or a server cluster composed of a plurality of servers, which is not specifically limited in the embodiments of the present invention.
Fig. 2B is a schematic diagram of another implementation environment related to the crowd event identification method, as shown in fig. 2B, the implementation environment may include a monitoring device 103, and the monitoring device 103 may be a monitoring camera.
Fig. 3 is a flowchart of a crowd event identification method according to an embodiment of the present invention. The method may be applied to the implementation environment shown in fig. 2A or fig. 2B, and may be executed by the server 102 or the monitoring device 101 in fig. 2A, or by the monitoring device 103 in fig. 2B. For brevity, the embodiment is described with the server as the executing entity; the technical process when executed by a monitoring device is the same and is not repeated here. As shown in fig. 3, the crowd event identification method may include the following steps:
step 301, the server obtains a monitoring video of the target area.
Step 302, the server obtains at least one target monitoring image group from the monitoring video, wherein each target monitoring image group comprises a plurality of continuous monitoring video frames.
Step 303, the server performs crowd event recognition on each target monitoring image group through a preset crowd event model, wherein the crowd event model is obtained by training according to a plurality of crowd event image groups, and each crowd event image group comprises a plurality of crowd event images reflecting the occurrence process of the crowd event.
And step 304, the server determines whether the target area has a crowd event according to the identification result.
In summary, the crowd event identification method provided in the embodiments of the present invention obtains at least one target monitoring image group from the surveillance video of the target area, performs crowd event identification on each target monitoring image group using a preset crowd event model, and determines whether a crowd event occurs in the target area according to the identification result, where each target monitoring image group comprises a plurality of consecutive surveillance video frames. Because the occurrence of a crowd event is related to the motion of the crowd over a period of time, a plurality of consecutive surveillance video frames can reflect the changes in pedestrian positions in the target area over a period of time, and therefore the motion state of the crowd over that period. Identifying crowd events using a plurality of consecutive surveillance video frames, that is, using target monitoring image groups, can therefore improve the accuracy of crowd event identification.
Fig. 4A is a flowchart of another crowd event identification method according to an embodiment of the present invention. The method may be applied to the implementation environment shown in fig. 2A or fig. 2B, and may be executed by the server 102 or the monitoring device 101 in fig. 2A, or by the monitoring device 103 in fig. 2B. For brevity, the embodiment is described with the server as the executing entity; the technical process when executed by a monitoring device is the same and is not repeated here. As shown in fig. 4A, the crowd event identification method may include the following steps:
step 401, the server obtains a monitoring video of a target area.
In the embodiment of the invention, the server can communicate with the monitoring equipment erected in the target area in real time or periodically to obtain the monitoring video shot by the monitoring equipment, so that the crowd event identification is carried out in the subsequent steps according to the monitoring video to determine whether the crowd event occurs in the target area.
It should be noted that the target area may be an area preset by a technician, and is typically a public place with heavy foot traffic, such as a transportation hub, a large mall, or a large event venue.
Step 402, the server acquires at least one target monitoring image group from the monitoring video of the target area.
The surveillance video of the target area may include a plurality of surveillance image sets, wherein each surveillance image set may include a plurality of surveillance video frames in succession, and each surveillance video frame may include an image of a pedestrian (which may also be referred to as a foreground image or a moving object image).
In step 402, the server may acquire all monitoring image groups included in the monitoring video as target monitoring image groups, thereby acquiring a plurality of target monitoring image groups.
In an embodiment of the present invention, the server may instead determine at least one monitoring image group from the plurality of monitoring image groups included in the surveillance video and acquire the determined group or groups as the target monitoring image groups. In this case, the consecutive surveillance video frames included in a target monitoring image group are frames corresponding to a suspected crowd event. In the subsequent steps, the server then performs crowd event identification only on these target monitoring image groups, rather than on all monitoring image groups included in the surveillance video, thereby reducing the computational burden of the server.
Next, an embodiment of the present invention will describe a technical process in which a server acquires at least one target monitoring image group from a plurality of monitoring image groups included in the monitoring video:
for each monitoring image group included in the monitoring video, the server may acquire an image of a pedestrian in each monitoring video frame of the monitoring image group by using a frame difference method, a background modeling method, or the like, and then the server may determine whether the position change of the pedestrian satisfies a preset condition according to the acquired image of the pedestrian in each monitoring video frame, and when the position change of the pedestrian satisfies the preset condition, the server may acquire the monitoring image group as a target monitoring image group. The preset condition is a position change condition reflecting convergence or dispersion of pedestrians.
For each monitoring image group included in the monitoring video, the technical process of judging whether the position change of the pedestrian meets the preset condition or not by the server according to the image of the pedestrian in each monitoring video frame of the monitoring image group may include the following sub-steps:
a1, the server divides each monitoring video frame of the monitoring image group into a plurality of areas.
Alternatively, the server may divide each surveillance video frame into a plurality of regions at equal intervals in the first direction, where the divided intervals may be a target length, the target length may be a length of one pixel or a plurality of pixels in the first direction, and the first direction may be a row direction or a column direction of the surveillance video frame. In other words, the server may divide each surveillance video frame into a plurality of striped areas, wherein each striped area may include at least one pixel row or at least one pixel column.
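The equal-interval division can be sketched as follows, assuming for simplicity that the target length divides the frame size evenly.

```python
import numpy as np

def divide_into_regions(frame, target_length=1, column_direction=True):
    """Split a surveillance video frame into equal striped regions along
    the first direction: each region is target_length pixel columns
    (column_direction=True) or target_length pixel rows otherwise."""
    axis = 1 if column_direction else 0
    num_regions = frame.shape[axis] // target_length
    return np.split(frame, num_regions, axis=axis)
```

With `target_length=1` each region is a single pixel column or row, which matches the fig. 4B example discussed below in sub-step C1.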
B1, for each surveillance video frame, the server may obtain the sum of pixel values of images of pedestrians in each area included in the surveillance video frame.
For example, suppose a region of a surveillance video frame is pixel row a, which contains n pixels (n is a positive integer greater than 1), of which m pixels (m is a natural number, 0 ≤ m ≤ n) belong to the pedestrian image and the rest belong to the background image. In step B1, the server adds up the pixel values of those m pixels to obtain the sum of pixel values of the pedestrian image in that region.
And C1, the server correspondingly generates a pedestrian histogram for each monitoring video frame by adopting the same generation mode.
Each pedestrian histogram may include a plurality of rectangular bars. The positions of the rectangular bars on the first axis correspond one-to-one to the regions of the corresponding surveillance video frame, the arrangement order of the bars on the first axis is the same as the arrangement order of the regions of the corresponding frame in the first direction, and the length of each bar on the second axis represents the sum of pixel values of the pedestrian image in the corresponding region. The first axis and the second axis are perpendicular to each other: the first axis may be the x-axis and the second axis the y-axis, or the first axis may be the y-axis and the second axis the x-axis.
Fig. 4B shows an exemplary pedestrian histogram corresponding to a surveillance video frame z, where the surveillance video frame z may include p regions, each region being one pixel column of the surveillance video frame z. As shown in fig. 4B, the pedestrian histogram includes p rectangular bars; the positions of the p rectangular bars on the x-axis correspond one-to-one to the p regions included in the surveillance video frame z, the arrangement order of the p rectangular bars on the x-axis is the same as the arrangement order of the p regions of the surveillance video frame z in the row direction, and the length of each of the p rectangular bars on the y-axis represents the sum of the pixel values of the pedestrian images in the corresponding region. For example, the k-th rectangular bar of the p rectangular bars corresponds to the k-th region of the p regions included in the surveillance video frame z (the k-th region is the k-th pixel column of the surveillance video frame z), and the length of the k-th rectangular bar on the y-axis represents the sum of the pixel values of the pedestrian images in the k-th pixel column of the surveillance video frame z.
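For the column-per-region case of Fig. 4B, the histogram generation of steps B1 and C1 can be sketched as below. This is an illustrative sketch only: `frame` is a list of pixel rows of equal width, and `mask` is an assumed per-pixel pedestrian segmentation; neither name comes from the patent.

```python
def pedestrian_histogram(frame, mask):
    """Compute one rectangular-bar length per pixel column: the sum of the
    pedestrian pixel values in that column. `frame` and `mask` are lists of
    rows of equal width; `mask` marks pedestrian pixels as True."""
    bars = [0] * len(frame[0])
    for row, mask_row in zip(frame, mask):
        for j, (p, is_ped) in enumerate(zip(row, mask_row)):
            if is_ped:
                bars[j] += p
    return bars
```

The returned list plays the role of the p rectangular bars: index j is the bar's position on the x-axis, and the value at index j is the bar's length on the y-axis.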
D1, the server judges whether the position change of the pedestrian meets the preset condition according to a plurality of pedestrian histograms corresponding to a plurality of monitoring video frames included in the monitoring image group.
The server may determine a central rectangular bar and an edge rectangular bar from each of the plurality of pedestrian histograms, thereby obtaining a plurality of central rectangular bars and a plurality of edge rectangular bars, where a central rectangular bar is a rectangular bar located in the central region of the first axis and an edge rectangular bar is a rectangular bar located in an edge region of the first axis. It should be noted that the central rectangular bars determined by the server from the pedestrian histograms are all located at the same position on the first axis, and the edge rectangular bars determined by the server from the pedestrian histograms are likewise all located at the same position on the first axis. After obtaining the plurality of central rectangular bars and the plurality of edge rectangular bars, the server may judge whether the position change of the pedestrians meets the preset condition according to the length change of the central rectangular bars and the edge rectangular bars.
Optionally, the technical process of determining, by the server, whether the position change of the pedestrian meets the preset condition according to the length change condition of the central rectangular bars and the edge rectangular bars may include the following sub-steps:
a2, the server arranges the plurality of central rectangular strips and the plurality of edge rectangular strips according to the arrangement sequence of the plurality of monitoring video frames in the monitoring image group.
After the arrangement, the position of each central rectangular bar in the sequence of central rectangular bars is the same as the position, among the monitoring video frames included in the monitoring image group, of the monitoring video frame whose pedestrian histogram contains that central rectangular bar; likewise, the position of each edge rectangular bar in the sequence of edge rectangular bars is the same as the position of the monitoring video frame whose pedestrian histogram contains that edge rectangular bar.
B2, the server determines the length change condition of the central rectangular bars according to the sequence of the central rectangular bars.
C2, the server determines the length change condition of the edge rectangle strips according to the sequence of the edge rectangle strips.
D2, the server judges whether the position change of the pedestrian meets the preset condition according to the length change conditions of the central rectangular strips and the edge rectangular strips.
Optionally, when the lengths of the plurality of central rectangular bars gradually increase and the lengths of the plurality of edge rectangular bars gradually decrease, the pedestrians are gradually converging, and the server may determine that the position change of the pedestrians satisfies a preset condition reflecting pedestrian convergence. Conversely, when the lengths of the plurality of edge rectangular bars gradually increase and the lengths of the plurality of central rectangular bars gradually decrease, the pedestrians are gradually dispersing, and the server may determine that the position change of the pedestrians satisfies a preset condition reflecting pedestrian dispersion.
And step 403, the server identifies the crowd event for each target monitoring image group through a preset crowd event model.
It should be noted that the crowd event model may be a convolutional neural network model; for example, the crowd event model may be a GoogLeNet-style convolutional neural network or a residual convolutional neural network (ResNet).
In the following, the technical process of step 403 is described by taking as an example the server performing crowd event identification on a first target monitoring image group of the at least one target monitoring image group. The technical process may include the following sub-steps:
a3, the server obtains a first crowd density map corresponding to each monitoring video frame in the first target monitoring image group to obtain a plurality of first crowd density maps.
The server may obtain the crowd density map corresponding to a surveillance video frame in many ways. In one possible way, the server may divide the surveillance video frame into different regions, count the number of pedestrian images contained in each region, determine a gray value or color value corresponding to each region based on that count, and generate the crowd density map from the gray values or color values of the regions. In another possible way, the server may determine each pedestrian image included in the surveillance video frame; for each pedestrian image, the server may determine a heat region centered on a point of that pedestrian image with a preset value as its radius, thereby obtaining a plurality of heat regions, and set an initial gray value or initial pixel value for each heat region. The server may then determine each overlap region, that is, a region where at least two heat regions coincide, and change the pixel value or color value of the overlap region according to the number of heat regions coinciding there, finally obtaining the crowd density map.
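The second, heat-region way can be sketched as below. This is an illustrative sketch only: it accumulates a uniform `heat` value inside a circular region around each pedestrian point, so overlapping heat regions naturally get larger values; the parameter names and the use of plain nested lists for the map are assumptions, not the patent's implementation.

```python
def crowd_density_map(h, w, pedestrian_points, radius=3, heat=1):
    """Build an h x w density map. Each pedestrian point (cy, cx) contributes
    `heat` to every pixel within `radius` of it; pixels covered by several
    heat regions accumulate proportionally larger values."""
    density = [[0] * w for _ in range(h)]
    for cy, cx in pedestrian_points:
        for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
                if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2:
                    density[y][x] += heat
    return density
```

In a real system the uniform circle would typically be replaced by a Gaussian kernel, but the accumulation in overlap regions works the same way.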
B3, the server acquires a first optical flow map corresponding to the images of pedestrians in the plurality of monitoring video frames included in the first target monitoring image group.
The first optical flow map may be a map of the motion vector of each pixel in the images of pedestrians.
Optionally, the server may obtain each pixel included in an image of a pedestrian in the t-th surveillance video frame of the plurality of surveillance video frames. For each such pixel, the server may traverse the images of pedestrians in the (t+1)-th surveillance video frame and obtain from them a target pixel whose similarity with the pixel is greater than a preset threshold. The server may then determine the motion vector of the pixel between the t-th surveillance video frame and the (t+1)-th surveillance video frame based on the coordinate value of the pixel in the image coordinate system and the coordinate value of the target pixel in the image coordinate system, and generate the first optical flow map from the motion vectors of the pixels in the images of pedestrians.
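The per-pixel matching above can be sketched as follows. This is a deliberately simplified illustration: it matches each frame-t pedestrian pixel to the closest-valued pedestrian pixel in frame t+1, whereas the patent describes a similarity threshold (and practical systems compare neighborhoods, e.g. via block matching or a dense optical flow algorithm); all names here are hypothetical.

```python
def motion_vectors(pixels_t, pixels_t1):
    """pixels_t / pixels_t1: lists of ((y, x), value) pairs for the pedestrian
    pixels of frames t and t+1. For each pixel in frame t, pick the pixel in
    frame t+1 whose value is most similar, and record the coordinate
    difference as that pixel's motion vector."""
    vectors = []
    for (y, x), v in pixels_t:
        (ty, tx), _ = min(pixels_t1, key=lambda item: abs(item[1] - v))
        vectors.append(((y, x), (ty - y, tx - x)))
    return vectors
```

The collection of per-pixel motion vectors is what the patent calls the first optical flow map.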
C3, the server performs crowd event identification based on the plurality of first crowd density maps and the first optical flow map through the crowd event model.
The server may input the first optical flow map and the plurality of first crowd density maps into the crowd event model; the crowd event model performs the corresponding identification operation on them and outputs an operation result, and the server may determine from the operation result whether the plurality of monitoring video frames included in the first target monitoring image group are monitoring video frames corresponding to a crowd event.
It should be noted that the crowd event model may be obtained by training according to a plurality of crowd event image groups, where each crowd event image group includes a plurality of crowd event images reflecting the occurrence process of the crowd event.
The training process of the crowd event model is briefly described below:
For each crowd event image group in the plurality of crowd event image groups, the server may obtain a second crowd density map corresponding to each crowd event image in the crowd event image group, thereby obtaining a plurality of second crowd density maps; the technical process of obtaining the plurality of second crowd density maps is the same as the technical process of obtaining the plurality of first crowd density maps described above and is not repeated here. The server may also obtain a second optical flow map corresponding to the images of pedestrians in the plurality of crowd event images included in the crowd event image group; the technical process of obtaining the second optical flow map is the same as the technical process of obtaining the first optical flow map described above and is not repeated here. The server then performs model training based on the plurality of second crowd density maps and the second optical flow map to obtain the crowd event model.
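Assembling one training sample from a crowd event image group can be sketched as stacking its density maps and optical flow map into a multi-channel input. This is only a data-layout sketch under assumed names; the patent does not specify the channel layout, and the downstream network (e.g. a GoogLeNet- or ResNet-style CNN) is omitted here.

```python
def build_training_sample(density_maps, flow_map, has_crowd_event):
    """Stack the per-image crowd density maps and the group's optical flow map
    into one multi-channel input, paired with a binary crowd-event label.
    A convolutional network would then be trained on many such samples."""
    return {"input": list(density_maps) + [flow_map],
            "label": 1 if has_crowd_event else 0}
```

For a group with N crowd event images, the input has N density channels plus one flow channel.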
And step 404, the server determines whether the target area has a crowd event according to the identification result.
When a plurality of surveillance video frames included in any one of the at least one target surveillance image group are surveillance video frames corresponding to a crowd event, the server may determine that the crowd event occurs in the target area.
Step 405, when it is determined that the crowd event occurs in the target area, the server responds.
In an embodiment of the invention, after the server determines that the crowd event occurs in the target area, the server may send alarm information to the designated terminal, so that a worker who owns the designated terminal can quickly respond to the crowd event.
It should be noted that, when the monitoring device executes the crowd event identification method, in step 405, the monitoring device may send a response instruction to the server when determining that a crowd event occurs in the target area, and the server may respond accordingly after receiving the response instruction, for example, the server may send an alarm message to a designated terminal, so that a worker who owns the designated terminal may quickly respond to the crowd event.
Alternatively, in step 405, the monitoring device may respond accordingly when it is determined that a crowd event occurs in the target area, for example, the monitoring device may send alarm information to a designated terminal, so that a worker who owns the designated terminal may respond to the crowd event quickly. In this case, a communication module may be installed in the monitoring device, so that the monitoring device may transmit alarm information to the specified terminal through the communication module.
In summary, in the crowd event identification method provided by the embodiments of the present invention, at least one target monitoring image group is acquired from the monitoring video of the target area, and crowd event identification is performed on each target monitoring image group through a preset crowd event model, so as to determine whether a crowd event occurs in the target area according to the identification result, where each target monitoring image group includes a plurality of consecutive monitoring video frames. Because the occurrence of a crowd event is related to the motion of the crowd over a period of time, and a plurality of consecutive monitoring video frames can reflect the changes in the positions of pedestrians in the target area over a period of time and thus the motion state of the crowd in that period, performing crowd event identification on a plurality of consecutive monitoring video frames, that is, on the target monitoring image group, can improve the accuracy of crowd event identification.
Fig. 5 is a block diagram of a crowd event identification apparatus 500 according to an embodiment of the present invention, as shown in fig. 5, the crowd event identification apparatus 500 may include: a video acquisition module 501, an image group acquisition module 502, a recognition module 503, and a determination module 504.
The video obtaining module 501 is configured to obtain a monitoring video of a target area.
The image group acquiring module 502 is configured to acquire at least one target monitoring image group from the monitoring video, where each target monitoring image group includes a plurality of consecutive monitoring video frames.
The identifying module 503 is configured to perform crowd event identification on each target monitoring image group through a preset crowd event model, where the crowd event model is obtained by training according to a plurality of crowd event image groups, and each crowd event image group includes a plurality of crowd event images reflecting a crowd event occurrence process.
The determining module 504 is configured to determine whether a crowd event occurs in the target area according to the recognition result.
In an embodiment of the present invention, the surveillance video includes a plurality of surveillance image groups, each of the surveillance image groups includes a plurality of consecutive surveillance video frames, each of the surveillance video frames includes an image of a pedestrian, and the image group acquiring module 502 is configured to: for each monitoring image group, acquiring an image of a pedestrian in each monitoring video frame of the monitoring image group; judging whether the position change of the pedestrian meets a preset condition according to the image of the pedestrian in each monitoring video frame, wherein the preset condition is a position change condition reflecting convergence or dispersion of the pedestrian; and when the position change meets the preset condition, acquiring the monitoring image group as the target monitoring image group.
In an embodiment of the present invention, the image group obtaining module 502 is configured to generate pedestrian histograms for a plurality of surveillance video frames of the surveillance image group respectively in the same generation manner, where each surveillance video frame is divided into a plurality of regions at equal intervals in a first direction, each pedestrian histogram includes a plurality of rectangular bars, positions of the rectangular bars on a first axis are in one-to-one correspondence with the plurality of regions of the corresponding surveillance video frame, a length of each rectangular bar on a second axis represents a sum of pixel values of a pedestrian image in the corresponding region, and the first axis and the second axis are perpendicular to each other; determining a central rectangular strip and an edge rectangular strip from each pedestrian histogram to obtain a plurality of central rectangular strips and a plurality of edge rectangular strips; and judging whether the position change of the pedestrian meets the preset condition according to the length change conditions of the central rectangular strips and the edge rectangular strips.
In an embodiment of the present invention, the arrangement order of the central rectangular bars and the edge rectangular bars is consistent with the acquiring order of the monitoring video frames of the monitoring image group, and the image group acquiring module 502 is configured to determine that the position change of the pedestrian satisfies the preset condition when the lengths of the central rectangular bars gradually increase and the lengths of the edge rectangular bars gradually decrease.
In an embodiment of the present invention, the image group acquiring module 502 is configured to determine that the position change of the pedestrian satisfies the preset condition when the lengths of the edge rectangular bars gradually increase and the lengths of the center rectangular bars gradually decrease.
In one embodiment of the present invention, each of the surveillance video frames is equally divided into a plurality of regions in a first direction at a division unit of a target length, the target length being a length of a pixel in the first direction, the first direction being a row direction or a column direction.
In an embodiment of the present invention, the identifying module 503 is configured to: for each target monitoring image group, acquire a first crowd density map corresponding to each monitoring video frame in the target monitoring image group to obtain a plurality of first crowd density maps; acquire a first optical flow map corresponding to the images of pedestrians in the plurality of monitoring video frames included in the target monitoring image group; and perform crowd event identification on the target monitoring image group based on the plurality of first crowd density maps and the first optical flow map through the crowd event model.
In one embodiment of the invention, the crowd event model is a convolutional neural network model.
The embodiment of the present invention further provides another crowd event identifying apparatus 600, where the crowd event identifying apparatus 600 may further include a training module 505 in addition to the modules included in the crowd event identifying apparatus 500.
Wherein the training module 505 is configured to: for each crowd event image group in the plurality of crowd event image groups, acquire a second crowd density map corresponding to each crowd event image in the crowd event image group to obtain a plurality of second crowd density maps; acquire a second optical flow map corresponding to the pedestrian images in the plurality of crowd event images included in the crowd event image group; and perform model training based on the plurality of second crowd density maps and the second optical flow map to obtain the crowd event model.
In summary, the crowd event identification apparatus provided by the embodiments of the present invention acquires at least one target monitoring image group from the monitoring video of the target area and performs crowd event identification on each target monitoring image group through a preset crowd event model, so as to determine whether a crowd event occurs in the target area according to the identification result, where each target monitoring image group includes a plurality of consecutive monitoring video frames. Because the occurrence of a crowd event is related to the motion of the crowd over a period of time, and a plurality of consecutive monitoring video frames can reflect the changes in the positions of pedestrians in the target area over a period of time and thus the motion state of the crowd in that period, performing crowd event identification on a plurality of consecutive monitoring video frames, that is, on the target monitoring image group, can improve the accuracy of crowd event identification.
It should be noted that: in the crowd event recognition device provided in the above embodiment, when the crowd event is recognized, only the division of the above functional modules is taken as an example, and in practical application, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the above described functions. In addition, the crowd event identification device provided by the above embodiment and the embodiment of the crowd event identification method belong to the same concept, and specific implementation processes thereof are detailed in the embodiment of the method and are not described herein again.
Fig. 7 is a schematic diagram illustrating a structure of an electronic device, which may be a server, according to an example embodiment. The electronic device is configured to perform the crowd event recognition method provided in the embodiment shown in fig. 3 or fig. 4A. The electronic device 700 includes a Central Processing Unit (CPU)701, a system memory 704 including a Random Access Memory (RAM)702 and a Read Only Memory (ROM)703, and a system bus 705 connecting the system memory 704 and the central processing unit 701. The electronic device 700 also includes a basic input/output system (I/O system) 706 that facilitates transfer of information between devices within the computer, and a mass storage device 707 for storing an operating system 713, application programs 714, and other program modules 715.
The basic input/output system 706 comprises a display 708 for displaying information and an input device 709, such as a mouse, keyboard, etc., for a user to input information. Wherein the display 708 and input device 709 are connected to the central processing unit 701 through an input output controller 710 coupled to the system bus 705. The basic input/output system 706 may also include an input/output controller 710 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 710 may also provide output to a display screen, a printer, or other type of output device.
The mass storage device 707 is connected to the central processing unit 701 through a mass storage controller (not shown) connected to the system bus 705. The mass storage device 707 and its associated computer-readable media provide non-volatile storage for the electronic device 700. That is, the mass storage device 707 may include a computer-readable medium (not shown), such as a hard disk or CD-ROM drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 704 and mass storage device 707 described above may be collectively referred to as memory.
According to various embodiments of the present invention, the electronic device 700 may also run through a remote computer connected to a network, such as the Internet. That is, the electronic device 700 may be connected to the network 712 through the network interface unit 711 connected to the system bus 705, or the network interface unit 711 may be used to connect to another type of network or a remote computer system (not shown).
The memory further includes one or more programs, the one or more programs are stored in the memory, and the central processing unit 701 implements the crowd event identification method shown in fig. 3 or 4A by executing the one or more programs.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as a memory, including instructions executable by a processor of an electronic device to perform the crowd event identification method illustrated by the various embodiments of the present invention is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 8 is a block diagram of a crowd event identification system 800 according to an embodiment of the present invention, and as shown in fig. 8, the crowd event identification system 800 may include: a server 801 and a monitoring device 802.
The server 801 is configured to execute the crowd event identification method provided in the embodiment shown in fig. 4A.
The monitoring device 802 is configured to capture a monitoring video of a target area and send the captured monitoring video to the server 801.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (18)

1. A method for crowd event identification, the method comprising:
acquiring a monitoring video of a target area; acquiring at least one target monitoring image group from the monitoring video, wherein each target monitoring image group comprises a plurality of continuous monitoring video frames;
performing crowd event recognition on each target monitoring image group through a preset crowd event model, wherein the crowd event model is obtained by training according to a crowd density map corresponding to each crowd event image of each crowd event image group in a plurality of crowd event image groups and an optical flow map corresponding to a pedestrian image in a plurality of crowd event images of each crowd event image group, and each crowd event image group comprises a plurality of crowd event images reflecting the occurrence process of a crowd event;
determining whether a crowd event occurs in the target area according to the recognition result;
the monitoring video comprises a plurality of monitoring image groups, each monitoring image group comprises a plurality of continuous monitoring video frames, each monitoring video frame comprises an image of a pedestrian, and the method for acquiring at least one target monitoring image group from the monitoring video comprises the following steps:
for each monitoring image group, acquiring an image of a pedestrian in each monitoring video frame of the monitoring image group; judging whether the position change of the pedestrian meets a preset condition according to the image of the pedestrian in each monitoring video frame, wherein the preset condition is a position change condition reflecting convergence or dispersion of the pedestrian;
and when the position change meets the preset condition, acquiring the monitoring image group as the target monitoring image group.
2. The method according to claim 1, wherein the determining whether the position change of the pedestrian meets a preset condition according to the image of the pedestrian in each monitoring video frame comprises:
respectively generating pedestrian histograms for a plurality of monitoring video frames of the monitoring image group by adopting the same generation mode, wherein each monitoring video frame is divided into a plurality of regions at equal intervals in a first direction, each pedestrian histogram comprises a plurality of rectangular bars, the positions of the rectangular bars on a first axis correspond to the regions of the corresponding monitoring video frame one by one, the length of each rectangular bar on a second axis represents the sum of pixel values of pedestrian images in the corresponding region, and the first axis and the second axis are vertical to each other;
determining a central rectangular strip and an edge rectangular strip from each pedestrian histogram to obtain a plurality of central rectangular strips and a plurality of edge rectangular strips;
and judging whether the position change of the pedestrian meets the preset condition according to the length change conditions of the central rectangular strips and the edge rectangular strips.
3. The method according to claim 2, wherein the arrangement sequence of the central rectangular strips and the edge rectangular strips is consistent with the acquisition sequence of the monitoring video frames of the monitoring image group, and the determining whether the position change of the pedestrian meets the preset condition according to the length change condition of the central rectangular strips and the edge rectangular strips comprises:
and when the lengths of the central rectangular strips are gradually increased and the lengths of the edge rectangular strips are gradually decreased, determining that the position change of the pedestrian meets the preset condition.
4. The method according to claim 2, wherein the arrangement sequence of the central rectangular strips and the edge rectangular strips is consistent with the acquisition sequence of the monitoring video frames of the monitoring image group, and the determining whether the position change of the pedestrian meets the preset condition according to the length change condition of the central rectangular strips and the edge rectangular strips comprises:
and when the lengths of the edge rectangular strips are gradually increased and the lengths of the central rectangular strips are gradually decreased, determining that the position change of the pedestrian meets the preset condition.
5. The method according to any one of claims 2 to 4,
each monitoring video frame is divided into a plurality of areas at equal intervals in a first direction by taking a target length as a division unit, the target length is the length of a pixel in the first direction, and the first direction is a row direction or a column direction.
6. The method according to claim 1, wherein the performing crowd event recognition on each target monitoring image group through a preset crowd event model comprises:
for each target monitoring image group, acquiring a first crowd density map corresponding to each monitoring video frame in the target monitoring image group to obtain a plurality of first crowd density maps;
acquiring a first optical flow map corresponding to images of pedestrians in a plurality of monitoring video frames included in the target monitoring image group;
performing crowd event recognition on the target monitoring image group based on the plurality of first crowd density maps and the first optical flow map through the crowd event model.
7. The method according to claim 1, wherein before performing crowd event recognition on each of the target monitoring image groups through a preset crowd event model, the method further comprises:
for each group of crowd event images in the plurality of groups of crowd event images, acquiring a second crowd density map corresponding to each crowd event image in the group of crowd event images to obtain a plurality of second crowd density maps;
acquiring a second optical flow map corresponding to the pedestrian images in the plurality of crowd event images included in the crowd event image group;
performing model training based on the plurality of second crowd density maps and the second optical flow map to obtain the crowd event model.
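Claims 6 and 7 both require a crowd density map per image but leave its construction open. A common choice in the crowd-counting literature (an assumption here, not the patent's definition) is to place a normalized 2-D Gaussian at each annotated head position, so that the map sums to the head count:

```python
# Sketch of a Gaussian-kernel crowd density map. The function name, the
# head-annotation input, and the sigma value are illustrative assumptions.
import math
from typing import List, Tuple

def density_map(shape: Tuple[int, int],
                heads: List[Tuple[int, int]],
                sigma: float = 1.5) -> List[List[float]]:
    """Return an h x w map with one unit-mass Gaussian per head position,
    so the map integrates (sums) to approximately the number of heads."""
    h, w = shape
    out = [[0.0] * w for _ in range(h)]
    norm = 1.0 / (2.0 * math.pi * sigma * sigma)
    for hy, hx in heads:
        for y in range(h):
            for x in range(w):
                d2 = (y - hy) ** 2 + (x - hx) ** 2
                out[y][x] += norm * math.exp(-d2 / (2.0 * sigma * sigma))
    return out
```

Because the map's sum approximates the head count, the same representation serves both counting and, stacked over consecutive frames, the spatial side of the crowd event model's training input.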
8. The method of claim 1, wherein the crowd event model is a convolutional neural network model.
9. A crowd event identification device, the device comprising:
a video acquisition module configured to acquire a monitoring video of a target area;
an image group acquisition module configured to acquire at least one target monitoring image group from the monitoring video, wherein each target monitoring image group comprises a plurality of continuous monitoring video frames;
an identification module configured to perform crowd event recognition on each target monitoring image group through a preset crowd event model, wherein the crowd event model is obtained by training on a crowd density map corresponding to each crowd event image of each crowd event image group in a plurality of crowd event image groups and an optical flow map corresponding to the pedestrian images in the plurality of crowd event images of each crowd event image group, and each crowd event image group comprises a plurality of crowd event images reflecting the occurrence process of a crowd event;
a determining module configured to determine whether a crowd event occurs in the target area according to the recognition result;
wherein the monitoring video comprises a plurality of monitoring image groups, each monitoring image group comprises a plurality of continuous monitoring video frames, each monitoring video frame comprises an image of a pedestrian, and the image group acquisition module is configured to:
for each monitoring image group, acquire the image of the pedestrian in each monitoring video frame of the monitoring image group; determine, according to the image of the pedestrian in each monitoring video frame, whether the position change of the pedestrian meets a preset condition, the preset condition being a position change condition reflecting convergence or dispersion of pedestrians; and
when the position change meets the preset condition, acquire the monitoring image group as the target monitoring image group.
10. The apparatus of claim 9, wherein the image group acquisition module is configured to:
generating pedestrian histograms for the plurality of monitoring video frames of the monitoring image group in the same manner, wherein each monitoring video frame is divided into a plurality of regions at equal intervals in a first direction, each pedestrian histogram comprises a plurality of rectangular strips, the positions of the rectangular strips on a first axis correspond one-to-one to the regions of the corresponding monitoring video frame, the length of each rectangular strip on a second axis represents the sum of the pixel values of the pedestrian image in the corresponding region, and the first axis and the second axis are perpendicular to each other;
determining a central rectangular strip and an edge rectangular strip from each pedestrian histogram to obtain a plurality of central rectangular strips and a plurality of edge rectangular strips;
and judging whether the position change of the pedestrian meets the preset condition according to the length change conditions of the central rectangular strips and the edge rectangular strips.
11. The apparatus of claim 10, wherein the arrangement order of the central rectangular strips and the edge rectangular strips is consistent with the acquisition order of the surveillance video frames of the group of surveillance images, and the image group acquisition module is configured to:
when the lengths of the central rectangular strips gradually increase and the lengths of the edge rectangular strips gradually decrease, determining that the position change of the pedestrian meets the preset condition.
12. The apparatus of claim 10, wherein the arrangement order of the central rectangular strips and the edge rectangular strips is consistent with the acquisition order of the surveillance video frames of the group of surveillance images, and the image group acquisition module is configured to:
when the lengths of the edge rectangular strips gradually increase and the lengths of the central rectangular strips gradually decrease, determining that the position change of the pedestrian meets the preset condition.
13. The apparatus according to any one of claims 10 to 12, wherein each monitoring video frame is divided into the plurality of regions at equal intervals in a first direction by taking a target length as a division unit, the target length being the length of one pixel in the first direction, and the first direction being a row direction or a column direction.
14. The apparatus of claim 9, wherein the identification module is configured to:
for each target monitoring image group, acquiring a first crowd density map corresponding to each monitoring video frame in the target monitoring image group to obtain a plurality of first crowd density maps;
acquiring a first optical flow map corresponding to the images of the pedestrians in the plurality of monitoring video frames included in the target monitoring image group;
performing crowd event recognition on the target monitoring image group through the crowd event model based on the plurality of first crowd density maps and the first optical flow map.
15. The apparatus of claim 9, further comprising a training module to:
for each group of crowd event images in the plurality of groups of crowd event images, acquiring a second crowd density map corresponding to each crowd event image in the group of crowd event images to obtain a plurality of second crowd density maps;
acquiring a second optical flow map corresponding to the pedestrian images in the plurality of crowd event images included in the crowd event image group;
performing model training based on the plurality of second crowd density maps and the second optical flow map to obtain the crowd event model.
16. The apparatus of claim 9, wherein the crowd event model is a convolutional neural network model.
17. An electronic device, characterized in that the electronic device comprises a processor and a memory,
wherein the memory is configured to store a computer program;
the processor is configured to execute the program stored in the memory to implement the crowd event identification method according to any one of claims 1 to 8.
18. A crowd event identification system, comprising a crowd event identification device according to any one of claims 9 to 16 and a monitoring apparatus.
CN201810340168.5A 2018-04-16 2018-04-16 Crowd event identification method and device, electronic equipment and system Active CN110390226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810340168.5A CN110390226B (en) 2018-04-16 2018-04-16 Crowd event identification method and device, electronic equipment and system

Publications (2)

Publication Number Publication Date
CN110390226A CN110390226A (en) 2019-10-29
CN110390226B true CN110390226B (en) 2021-09-21

Family

ID=68283893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810340168.5A Active CN110390226B (en) 2018-04-16 2018-04-16 Crowd event identification method and device, electronic equipment and system

Country Status (1)

Country Link
CN (1) CN110390226B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666821B (en) * 2020-05-12 2023-07-28 深圳力维智联技术有限公司 Method, device and equipment for detecting personnel aggregation
CN112364788B (en) * 2020-11-13 2021-08-03 润联软件系统(深圳)有限公司 Monitoring video crowd quantity monitoring method based on deep learning and related components thereof
CN112464898A (en) * 2020-12-15 2021-03-09 北京市商汤科技开发有限公司 Event detection method and device, electronic equipment and storage medium

Citations (10)

Publication number Priority date Publication date Assignee Title
CN103854027A (en) * 2013-10-23 2014-06-11 Beijing University of Posts and Telecommunications Crowd behavior identification method
CN104813339A (en) * 2012-09-12 2015-07-29 Avigilon Fortress Corp. Methods, devices and systems for detecting objects in a video
CN104820824A (en) * 2015-04-23 2015-08-05 Nanjing University of Posts and Telecommunications Local abnormal behavior detection method based on optical flow and space-time gradient
CN105447458A (en) * 2015-11-17 2016-03-30 Shenzhen SenseTime Technology Co., Ltd. Large scale crowd video analysis system and method thereof
CN106022244A (en) * 2016-05-16 2016-10-12 Guangdong University of Technology Unsupervised crowd abnormity monitoring and positioning method based on recurrent neural network modeling
US9489582B2 (en) * 2014-01-27 2016-11-08 Xerox Corporation Video anomaly detection based upon a sparsity model
CN107229894A (en) * 2016-03-24 2017-10-03 Shanghai Baosight Software Co., Ltd. Intelligent video monitoring method and system based on computer vision analysis technology
CN107480578A (en) * 2016-06-08 2017-12-15 National Computer Network and Information Security Administration Center Video detection system and method using crowd behavior analysis
CN107729799A (en) * 2017-06-13 2018-02-23 Enjoyor Co., Ltd. Vision-based detection and alarm analysis system for abnormal crowd behavior based on deep convolutional neural networks
CN107911653A (en) * 2017-11-16 2018-04-13 Wang Lei Intelligent video monitoring module, system, method and storage medium for institutions

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20080031491A1 (en) * 2006-08-03 2008-02-07 Honeywell International Inc. Anomaly detection in a video system

Non-Patent Citations (3)

Title
"Abnormal event detection based on analysis of movement information of video sequence";Tian Wang 等;《Optik》;20180131;第152卷;50-60 *
"基于视频的人群异常事件检测综述";吴新宇 等;《电子测量与仪器学报》;20140615;第28卷(第06期);575-584 *
"视频中的异常事件检测算法研究";冯亚闯;《中国博士学位论文全文数据库 信息科技辑》;20180415(第4期);I136-10 *


Similar Documents

Publication Publication Date Title
KR102635987B1 (en) Method, apparatus, device and storage medium for training an image semantic segmentation network
CN112052837A (en) Target detection method and device based on artificial intelligence
JP6678246B2 (en) Semantic segmentation based on global optimization
CN112419368A (en) Method, device and equipment for tracking track of moving target and storage medium
CN110390226B (en) Crowd event identification method and device, electronic equipment and system
US11900676B2 (en) Method and apparatus for detecting target in video, computing device, and storage medium
CN112200081A (en) Abnormal behavior identification method and device, electronic equipment and storage medium
US10945888B2 (en) Intelligent blind guide method and apparatus
CN111667001B (en) Target re-identification method, device, computer equipment and storage medium
CN112149615A (en) Face living body detection method, device, medium and electronic equipment
CN113343779B (en) Environment abnormality detection method, device, computer equipment and storage medium
CN111666821A (en) Personnel gathering detection method, device and equipment
JP2016200971A (en) Learning apparatus, identification apparatus, learning method, identification method and program
CN114677754A (en) Behavior recognition method and device, electronic equipment and computer readable storage medium
JP7211428B2 (en) Information processing device, control method, and program
KR102218255B1 (en) System and method for analyzing image based on artificial intelligence through learning of updated areas and computer program for the same
CN112528825A (en) Station passenger recruitment service method based on image recognition
CN116543333A (en) Target recognition method, training method, device, equipment and medium of power system
CN116128922A (en) Object drop detection method, device, medium and equipment based on event camera
CN112819859B (en) Multi-target tracking method and device applied to intelligent security
CN112257666B (en) Target image content aggregation method, device, equipment and readable storage medium
CN114782883A (en) Abnormal behavior detection method, device and equipment based on group intelligence
KR102143031B1 (en) Method and system for predicting future motion of object in still image
CN113592902A (en) Target tracking method and device, computer equipment and storage medium
CN112270257A (en) Motion trajectory determination method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant