CN111291597B - Crowd situation analysis method, device, equipment and system based on image - Google Patents


Info

Publication number
CN111291597B
Authority
CN
China
Prior art keywords
crowd
image
analyzed
neural network
density
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811494297.6A
Other languages
Chinese (zh)
Other versions
CN111291597A (en)
Inventor
童超
车军
任烨
朱江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811494297.6A priority Critical patent/CN111291597B/en
Publication of CN111291597A publication Critical patent/CN111291597A/en
Application granted granted Critical
Publication of CN111291597B publication Critical patent/CN111291597B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

Embodiments of the invention provide an image-based crowd situation analysis method, device, equipment, and system. The method comprises: inputting an image into a neural network model trained in advance; counting the crowd density in the image using a crowd density statistics branch of the neural network model; analyzing crowd behavior in the image using a crowd behavior analysis branch of the neural network model; and determining, based on the crowd density and the crowd behavior, whether a crowd event has occurred. In this scheme, the neural network model comprises at least two branches: one counts the crowd density in the image and the other analyzes crowd behavior in the image, so whether a crowd event has occurred is analyzed from both density and behavior, improving the accuracy of the analysis result.

Description

Crowd situation analysis method, device, equipment and system based on image
Technical Field
The present invention relates to the field of monitoring technologies, and in particular, to a crowd situation analysis method, device, equipment, and system based on images.
Background
Crowd events in public places, such as crowd gathering, fighting, and trampling, greatly affect public safety. At present, the crowd situation can be analyzed based on monitoring images, so that crowd events are handled in a timely manner and public safety is improved.
In some related schemes, monitoring images of public places are acquired and the number of people in them is counted; a large count is taken to indicate that a crowd event has occurred. However, in places that are frequently crowded, such as a subway or railway station at morning or evening peak, a large number of people does not necessarily mean that such a crowd event has occurred. The analysis results of this approach are therefore less accurate.
Disclosure of Invention
The embodiment of the invention aims to provide a crowd situation analysis method, device, equipment and system based on images so as to improve the accuracy of analysis results.
In order to achieve the above objective, the embodiment of the present invention provides an image-based crowd situation analysis method, including:
acquiring an image to be analyzed;
inputting the image to be analyzed into a neural network model obtained by training in advance;
counting the crowd density in the image to be analyzed by using crowd density statistics branches in the neural network model;
analyzing crowd behaviors in the image to be analyzed by utilizing crowd behavior analysis branches in the neural network model;
and judging whether a crowd event occurs or not based on the crowd density and the crowd behaviors.
Optionally, the method further comprises:
identifying crowd density values at each pixel point in the image to be analyzed by utilizing density distribution prediction branches in the neural network model;
the determining whether a crowd event occurs based on the crowd density and the crowd behavior includes:
and judging whether a crowd event occurs or not based on the crowd density, the crowd behaviors and the crowd density value at each pixel point in the image to be analyzed.
Optionally, the training to obtain the neural network model includes:
inputting the sample image into a neural network with a preset structure;
carrying out convolution processing on the sample image by utilizing a convolution layer in the neural network, and respectively inputting data obtained after the convolution processing into crowd density statistics branches, crowd behavior analysis branches and density distribution prediction branches in the neural network;
iteratively adjusting parameters in the convolutional layer based on a first loss function of the crowd density statistics branch, a first output result of the crowd density statistics branch, a second loss function of the crowd behavior analysis branch, a second output result of the crowd behavior analysis branch, a third loss function of the density distribution prediction branch, and a third output result of the density distribution prediction branch;
And when the adjustment of the parameters in the convolution layer meets the convergence condition, obtaining the neural network model after training.
Optionally, the determining whether a crowd event occurs based on the crowd density and the crowd behavior includes:
if the crowd density exceeds a preset threshold value and the crowd behaviors are abnormal, judging that a crowd event occurs;
in the event of a determination that a crowd event has occurred, the method further comprises: and outputting alarm information.
Optionally, the acquiring the image to be analyzed includes:
acquiring the image to be analyzed through a pan-tilt of an unmanned aerial vehicle;
in the event of a determination that a crowd event has occurred, the method further comprises:
based on the position information of the unmanned aerial vehicle, adjusting the pan-tilt to acquire images of the area where the crowd event occurs;
or, under the condition that the occurrence of the crowd event is determined, determining the moving direction and moving speed of the crowd based on the image to be analyzed; and controlling the unmanned aerial vehicle to track and capture images of the crowd according to the moving direction and the moving speed.
Optionally, the adjusting the pan-tilt to perform image acquisition for the area where the crowd event occurs based on the position information of the unmanned aerial vehicle includes:
determining image coordinates of the crowd in the image to be analyzed; calculating PTZ adjustment information of the pan-tilt based on the determined image coordinates and the conversion relation between the PTZ coordinate system of the unmanned aerial vehicle's pan-tilt and the coordinate system of the image to be analyzed; and adjusting the pan-tilt based on the PTZ adjustment information to acquire images of the area where the crowd event occurs.
Optionally, in the case of determining that a crowd event occurs, the method further includes:
identifying a human body target with abnormal behaviors in the image to be analyzed;
extracting features of the human body target, and determining identity information and/or a movement track of the human body target based on the extracted features.
In order to achieve the above object, the embodiment of the present invention further provides an image-based crowd situation analysis device, including:
the acquisition module is used for acquiring an image to be analyzed;
the input module is used for inputting the image to be analyzed into a neural network model obtained by training in advance;
the statistics module is used for counting the crowd density in the image to be analyzed by using the crowd density statistics branch in the neural network model;
the analysis module is used for analyzing the crowd behaviors in the image to be analyzed by utilizing crowd behavior analysis branches in the neural network model;
And the judging module is used for judging whether a crowd event occurs or not based on the crowd density and the crowd behaviors.
Optionally, the apparatus further includes:
the identification module is used for utilizing the density distribution prediction branches in the neural network model to identify crowd density values at each pixel point in the image to be analyzed;
the judging module is specifically configured to: and judging whether a crowd event occurs or not based on the crowd density, the crowd behaviors and the crowd density value at each pixel point in the image to be analyzed.
Optionally, the apparatus further includes:
the training module is used for inputting the sample image into a neural network with a preset structure; carrying out convolution processing on the sample image by utilizing a convolution layer in the neural network, and respectively inputting data obtained after the convolution processing into crowd density statistics branches, crowd behavior analysis branches and density distribution prediction branches in the neural network; iteratively adjusting parameters in the convolutional layer based on a first loss function of the crowd density statistics branch, a first output result of the crowd density statistics branch, a second loss function of the crowd behavior analysis branch, a second output result of the crowd behavior analysis branch, a third loss function of the density distribution prediction branch, and a third output result of the density distribution prediction branch; and when the adjustment of the parameters in the convolution layer meets the convergence condition, obtaining the neural network model after training.
Optionally, the judging module is specifically configured to:
if the crowd density exceeds a preset threshold value and the crowd behaviors are abnormal, judging that a crowd event occurs; the apparatus further comprises:
and the alarm module is used for outputting alarm information under the condition that a crowd event is determined to have occurred.
Optionally, the acquiring module is specifically configured to:
acquiring the image to be analyzed through a pan-tilt of an unmanned aerial vehicle; the apparatus further comprises:
the control module is used for adjusting the pan-tilt, based on the position information of the unmanned aerial vehicle, to acquire images of the area where the crowd event occurs;
or, under the condition that the occurrence of the crowd event is determined, determining the moving direction and moving speed of the crowd based on the image to be analyzed; and controlling the unmanned aerial vehicle to track and capture images of the crowd according to the moving direction and the moving speed.
Optionally, the control module is further configured to:
determining image coordinates of the crowd in the image to be analyzed; calculating PTZ adjustment information of the pan-tilt based on the determined image coordinates and the conversion relation between the PTZ coordinate system of the unmanned aerial vehicle's pan-tilt and the coordinate system of the image to be analyzed; and adjusting the pan-tilt based on the PTZ adjustment information to acquire images of the area where the crowd event occurs.
Optionally, the apparatus further includes:
the determining module is used for identifying a human body target with abnormal behavior in the image to be analyzed under the condition that a crowd event is determined to have occurred; extracting features of the human body target, and determining identity information and/or a movement track of the human body target based on the extracted features.
In order to achieve the above objective, the embodiment of the present invention further provides an image-based crowd situation analysis system, including: unmanned aerial vehicles and ground stations; wherein,
the unmanned aerial vehicle is used for collecting images and sending the collected images to the ground station;
the ground station is used for receiving the image sent by the unmanned aerial vehicle and taking the image as an image to be analyzed; inputting the image to be analyzed into a neural network model obtained by training in advance; counting the crowd density in the image to be analyzed by using crowd density statistics branches in the neural network model; analyzing crowd behaviors in the image to be analyzed by utilizing crowd behavior analysis branches in the neural network model; and judging whether a crowd event occurs or not based on the crowd density and the crowd behaviors.
Optionally, the ground station is further configured to:
under the condition that a crowd event is determined to have occurred, adjusting a pan-tilt of the unmanned aerial vehicle, based on the position information of the unmanned aerial vehicle, to acquire images of the area where the crowd event occurs;
or, under the condition that the occurrence of the crowd event is determined, determining the moving direction and moving speed of the crowd based on the image to be analyzed; and controlling the unmanned aerial vehicle to track and capture images of the crowd according to the moving direction and the moving speed.
Optionally, the ground station is further configured to:
identifying a human body target with abnormal behaviors in the image to be analyzed;
extracting features of the human body target, and determining identity information and/or a movement track of the human body target based on the extracted features.
In order to achieve the above object, an embodiment of the present invention further provides an electronic device, including a processor and a memory;
a memory for storing a computer program;
and the processor is used for realizing any one of the crowd situation analysis methods based on the image when executing the program stored in the memory.
In order to achieve the above object, an embodiment of the present invention further provides a computer readable storage medium, in which a computer program is stored, the computer program implementing any one of the above image-based crowd situation analysis methods when executed by a processor.
In the embodiments of the invention, an image is input into a neural network model trained in advance; the crowd density in the image is counted using a crowd density statistics branch of the neural network model; crowd behavior in the image is analyzed using a crowd behavior analysis branch of the neural network model; and whether a crowd event has occurred is determined based on the crowd density and the crowd behavior. In this scheme, the neural network model comprises at least two branches: one counts the crowd density in the image and the other analyzes crowd behavior in the image, so whether a crowd event has occurred is analyzed from both density and behavior, improving the accuracy of the analysis result.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the invention; other drawings may be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a flow chart of a crowd situation analysis method based on images provided by an embodiment of the invention;
Fig. 2 is a schematic diagram of a first structure of a neural network according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a second structure of a neural network according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a first structure of a neural network model according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a second structure of a neural network model according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a crowd situation analysis device based on an image according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a crowd situation analysis system based on images according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Embodiments of the invention provide an image-based crowd situation analysis method, device, equipment, and system. The method and device may be applied to various electronic devices, such as unmanned aerial vehicles, ground stations, personal computers, and servers, without particular limitation. The image-based crowd situation analysis method provided by the embodiments of the invention is first described in detail below.
Fig. 1 is a flow chart of a crowd situation analysis method based on an image, which includes:
s101: and acquiring an image to be analyzed.
For convenience of description, in this embodiment an image that requires crowd situation analysis is referred to as an image to be analyzed. If the execution subject is an unmanned aerial vehicle, the unmanned aerial vehicle can acquire images through its pan-tilt and use the acquired images as images to be analyzed. If the execution subject is a ground station, the ground station can receive an image transmitted by the unmanned aerial vehicle as the image to be analyzed. If the execution subject is another electronic device, that device can receive an image sent by monitoring equipment as the image to be analyzed. There are many ways to acquire the image, which are not listed one by one.
S102: and inputting the image to be analyzed into a neural network model obtained by training in advance.
S103: and counting the crowd density in the image to be analyzed by using crowd density counting branches in the neural network model.
S104: and analyzing the crowd behaviors in the image to be analyzed by using crowd behavior analysis branches in the neural network model.
In the embodiment of the invention, the neural network model obtained by training at least comprises two branches: one branch is used for counting the crowd density in the image, and is called crowd density counting branch; the other branch is used for analyzing crowd behaviors in the image, and is called crowd behavior analysis branch.
As an embodiment, the training to obtain the neural network model may include:
acquiring a sample image required by training, and inputting the sample image into a neural network with a preset structure; carrying out convolution processing on the sample image by utilizing a convolution layer in the neural network, and respectively inputting data obtained after the convolution processing into a crowd density statistics branch and a crowd behavior analysis branch in the neural network; iteratively adjusting parameters in the convolution layer based on a first loss function of the crowd density statistics branch, a first output result of the crowd density statistics branch, a second loss function of the crowd behavior analysis branch, and a second output result of the crowd behavior analysis branch; and when the adjustment of the parameters in the convolution layer meets the convergence condition, obtaining the neural network model after training.
The neural network has the same structure as the neural network model; the training process is the process of adjusting the parameters of the neural network, and when the parameter adjustment is completed, the trained neural network model is obtained.
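To make the joint-training idea concrete, here is a minimal, hypothetical pure-Python sketch (not the patent's actual network): a single shared parameter stands in for the shared convolution layer, two scalar heads stand in for the two branches, and the shared parameter is adjusted iteratively from the sum of both branch losses.

```python
# Toy sketch under stated assumptions: shared weight w feeds two branch
# heads (a for density, b for behavior); each branch contributes a loss,
# and all parameters are updated from the combined loss, mirroring how the
# shared convolution layers are adjusted from both loss functions jointly.

def branch_losses(w, a, b, x, y_density, y_behave):
    shared = w * x                       # stand-in for the shared convolution
    density_pred = a * shared            # density-statistics branch output
    behave_pred = b * shared             # behavior-analysis branch output
    l1 = (density_pred - y_density) ** 2
    l2 = (behave_pred - y_behave) ** 2
    return l1 + l2                       # joint loss over both branches

def numeric_grad(f, params, eps=1e-6):
    # central-difference gradient of f with respect to each parameter
    grads = []
    for i in range(len(params)):
        p = list(params); p[i] += eps
        m = list(params); m[i] -= eps
        grads.append((f(p) - f(m)) / (2 * eps))
    return grads

x, y_density, y_behave = 1.0, 2.0, -1.0  # one hypothetical training sample
params = [0.5, 0.5, 0.5]                 # [w, a, b]
f = lambda p: branch_losses(p[0], p[1], p[2], x, y_density, y_behave)

start = f(params)
for _ in range(200):                     # iterative adjustment toward convergence
    g = numeric_grad(f, params)
    params = [p - 0.05 * gi for p, gi in zip(params, g)]
end = f(params)
```

The joint loss `end` should be much smaller than `start` after the iterative adjustment, which is the convergence behavior the training procedure relies on.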
In this embodiment, for ease of description, the loss function of the crowd density statistics branch is referred to as the first loss function, and the loss function of the crowd behavior analysis branch is referred to as the second loss function.
For example, the second loss function, for the crowd behavior analysis branch, may be a binary cross-entropy (logistic) loss, such as:

$$L_{\mathrm{Behave}} = \frac{1}{N}\sum_{i=1}^{N}\log\left(1 + e^{-S_i \hat{S}_i}\right) \quad (1)$$

where $L_{\mathrm{Behave}}$ denotes the loss value, $i$ indexes the sample images during training, $N$ is the total number of sample images, $\hat{S}_i$ is the predicted value of the crowd behavior label for sample image $i$, and $S_i$ is the true value of the crowd behavior label for sample image $i$. For example, the crowd behavior label may cover the two cases of abnormal behavior present and absent: the label may be 1 or -1, with 1 indicating no abnormal behavior and -1 indicating abnormal behavior.
The first loss function, for the crowd density statistics branch, may be a multi-class cross-entropy loss, such as:

$$L_{\mathrm{Density}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{5} S_{ij}\log \hat{S}_{ij} \quad (2)$$

where $L_{\mathrm{Density}}$ denotes the loss value, $i$ and $N$ are as above, and $j$ indexes the crowd density levels. In formula (2), $j$ runs over 5 levels; the specific number of levels is not limited, and the 5 in formula (2) may be changed to another value. $\hat{S}_{ij}$ is the predicted probability that sample image $i$ has density level $j$, and $S_{ij}$ is the true indicator of whether sample image $i$ has density level $j$: for example, if the density level of sample image $i$ is 1, $S_{i1}$ may be 1 and $S_{i2}$ through $S_{i5}$ may be 0. For example, the crowd density may be divided into 5 classes: very sparse (0-20 persons), sparse (21-50), medium (51-100), dense (101-500), and very dense (over 500). The classification can be made according to actual conditions; the specific classification is not limited.
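As a concrete check of the two branch losses, a small pure-Python computation with hypothetical toy predictions and labels (the function names and values are illustrative only, not taken from the patent):

```python
import math

# Hedged sketch: a logistic loss with labels in {1, -1} for the behavior
# branch, and a multi-class cross-entropy over 5 density levels for the
# density branch, evaluated on two hypothetical sample images.

def behave_loss(scores, labels):
    # labels: 1 (no abnormal behavior) or -1 (abnormal behavior)
    return sum(math.log(1 + math.exp(-s * y))
               for s, y in zip(scores, labels)) / len(labels)

def density_loss(probs, levels):
    # probs[i][j]: predicted probability that image i has density level j (0..4)
    # levels[i]: true density level index of image i
    return -sum(math.log(p[l]) for p, l in zip(probs, levels)) / len(levels)

scores = [2.0, -1.5]          # behavior-branch scores for two toy images
labels = [1, -1]              # ground-truth behavior labels
probs = [[0.7, 0.1, 0.1, 0.05, 0.05],
         [0.1, 0.6, 0.1, 0.1, 0.1]]
levels = [0, 1]               # ground-truth density levels

lb = behave_loss(scores, labels)
ld = density_loss(probs, levels)
```

Both losses shrink toward 0 as the predictions agree more strongly with the labels, which is what drives the parameter adjustment during training.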
Referring to fig. 2, a sample image to be trained is input to a convolution layer in a neural network, and data output by the convolution layer is respectively input to a full-connection layer of a crowd density statistics branch and a full-connection layer of a crowd behavior analysis branch; the neural network is trained, i.e. parameters in the neural network are adjusted, using the first and second loss functions.
In one case, the neural network with the preset structure may be a residual neural network (ResNet). In this case, the convolution layer in fig. 2 may be a ResNet-18, which comprises five stages of convolution. A residual neural network has good recognition performance. Alternatively, the neural network may be another type of neural network, which is not particularly limited.
As another embodiment, a density distribution prediction branch may be further included in the neural network model, in which case the density distribution prediction branch may be used to identify a population density value at each pixel point in the image to be analyzed.
In this case, the training to obtain the neural network model may include:
inputting the sample image into a neural network with a preset structure;
carrying out convolution processing on the sample image by utilizing a convolution layer in the neural network, and respectively inputting data obtained after the convolution processing into crowd density statistics branches, crowd behavior analysis branches and density distribution prediction branches in the neural network;
iteratively adjusting parameters in the convolutional layer based on a first loss function of the crowd density statistics branch, a first output result of the crowd density statistics branch, a second loss function of the crowd behavior analysis branch, a second output result of the crowd behavior analysis branch, a third loss function of the density distribution prediction branch, and a third output result of the density distribution prediction branch;
and when the adjustment of the parameters in the convolution layer meets the convergence condition, obtaining the neural network model after training.
The first and second loss functions may be as described above, and the third loss function may be, for example, a mean squared error over pixels (the exact form is not limited):

$$L_{\mathrm{CrowdHeatmap}} = \frac{1}{M}\sum_{k=1}^{M}\left(\hat{S}_k - S_k\right)^2 \quad (3)$$

where $L_{\mathrm{CrowdHeatmap}}$ denotes the loss value, $k$ indexes the pixels of the sample image, $M$ is the total number of pixels in the sample image, $S_k$ is the true crowd density value at pixel $k$, and $\hat{S}_k$ is the predicted crowd density value at pixel $k$.
For example, if no person is present at pixel $k$, the crowd density value at pixel $k$ is 0. If region A contains 3 persons and spans 1000 pixels, the crowd density value at each pixel in region A is 3/1000.
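The per-pixel ground-truth construction in the example above, together with a mean-squared-error density-map loss (an assumed form; the source only names a third loss function), can be sketched as:

```python
# Sketch under stated assumptions: each pixel of a region containing
# num_people persons over num_pixels pixels gets the ground-truth value
# num_people / num_pixels, and the density-map loss averages the squared
# per-pixel prediction error.

def region_density_map(num_people, num_pixels):
    # uniform per-pixel density value for the region
    return [num_people / num_pixels] * num_pixels

def heatmap_loss(pred, true):
    m = len(true)  # total number of pixels
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / m

true_map = region_density_map(3, 1000)   # the 3-persons / 1000-pixels example
pred_map = [0.002] * 1000                # hypothetical prediction
loss = heatmap_loss(pred_map, true_map)
```

With a per-pixel error of 0.001, the loss is on the order of 1e-6; a perfect prediction would give exactly 0.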
Referring to fig. 3, a sample image to be trained is input to a convolution layer in a neural network, and data output by the convolution layer is respectively input to a full-connection layer of a crowd density statistics branch, a full-connection layer of a crowd behavior analysis branch and a full-connection layer of a density distribution prediction branch; the neural network is trained, i.e. parameters in the neural network are adjusted, using the first, second and third loss functions.
In one case, the neural network with the preset structure may be a residual neural network (ResNet). In this case, the convolution layer in fig. 3 may be a ResNet-18, which comprises five stages of convolution. In the density distribution prediction branch, the data output by the ResNet-18 may be fed into a 1×1 convolution to obtain a density distribution heat map, which contains the crowd density value at each pixel of the sample image.
S105: based on the crowd density and the crowd behavior, whether a crowd event occurs is determined.
Referring to fig. 4, fig. 4 may be understood as a neural network model obtained after the neural network training in fig. 2 is completed, where a crowd density statistical branch in the neural network model may output crowd density in an image, and a crowd behavior analysis branch may output whether a crowd in the image has abnormal behaviors.
In this case, if the output of the neural network model indicates that the crowd density exceeds a preset threshold and the crowd behavior is abnormal, it is determined that a crowd event has occurred.
Alternatively, in another case, the crowd density statistics branch may output the density level corresponding to the crowd density in the image; in this case, if the output of the neural network model indicates that the crowd density level reaches a preset level and the crowd behavior is abnormal, it is determined that a crowd event has occurred.
As described above, in another embodiment, the neural network model further includes a density distribution prediction branch. Referring to fig. 5, fig. 5 may be understood as a neural network model obtained after the neural network training in fig. 3 is completed, the crowd density statistics branch may output the crowd density (or the crowd density level) in the image, the crowd behavior analysis branch may output whether the crowd in the image has abnormal behaviors, and the density distribution prediction branch may output the crowd density value at each pixel point in the image.
In one case, the convolution layer in fig. 5 may be a ResNet-18, which comprises five stages of convolution. In the density distribution prediction branch, the data output by the ResNet-18 may be fed into a 1×1 convolution to obtain a density distribution heat map, which contains the crowd density value at each pixel of the image to be analyzed.
In this embodiment, it may be determined whether a swarm event occurs according to the output results of the three branches, that is, S105 may include: and judging whether a crowd event occurs or not based on the crowd density, the crowd behaviors and the crowd density value at each pixel point in the image to be analyzed.
For example, three judgment conditions may be set for the three branches: 1. crowd density exceeds a first preset threshold; 2. the number of the target pixel points exceeds a second preset threshold, and the target pixel points are as follows: pixels with crowd density values exceeding a third preset threshold; 3. abnormal crowd behavior exists. Specific values of the first preset threshold, the second preset threshold, the third preset threshold, and the other preset thresholds mentioned in this embodiment are not limited.
If the output result of the neural network model indicates that any two of the above conditions are satisfied, it may be determined that a crowd event has occurred. Alternatively, it may be required that all three conditions are satisfied before a crowd event is determined to have occurred.
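As a sketch of this decision logic, the following function combines the three branch outputs under the two-of-three (or all-three) rule. All threshold values and the flattened density-map representation are illustrative placeholders, since the patent leaves the specific thresholds open.

```python
def crowd_event_occurred(density, density_map, behavior_abnormal,
                         density_thresh=3.0, pixel_count_thresh=500,
                         pixel_value_thresh=0.5, required=2):
    """Combine the three branch outputs; thresholds are illustrative placeholders."""
    # Condition 1: overall crowd density exceeds the first preset threshold.
    cond1 = density > density_thresh
    # Condition 2: the number of target pixels (density value above the third
    # preset threshold) exceeds the second preset threshold.
    target_pixels = sum(1 for v in density_map if v > pixel_value_thresh)
    cond2 = target_pixels > pixel_count_thresh
    # Condition 3: the behavior branch reports abnormal crowd behavior.
    cond3 = behavior_abnormal
    # A crowd event is reported when at least `required` conditions hold
    # (set required=3 for the stricter all-three variant).
    return (cond1 + cond2 + cond3) >= required
```

Here `density_map` is the per-pixel density heat map flattened into a list of values.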
In one embodiment, in the event that a crowd event is determined to have occurred, alarm information may be output. The alarm information may be text information, voice information, a flashing-light alarm, and the like; the specific alarm mode is not limited.
As described above, in one embodiment, the image to be analyzed may be acquired by the gimbal camera of an unmanned aerial vehicle; in this case, when it is determined that a crowd event has occurred, the gimbal may be adjusted, based on the position information of the unmanned aerial vehicle, to acquire images of the area where the crowd event occurs.
For example, the image coordinates of the crowd in the image to be analyzed can be determined; PTZ adjustment information for the gimbal is then calculated based on those image coordinates and the conversion relationship between the PTZ coordinate system of the gimbal and the image coordinate system of the image to be analyzed, and the gimbal is adjusted based on the PTZ adjustment information to acquire images of the area where the crowd event occurs.
In the PTZ coordinate system, P, T, and Z denote pan, tilt, and zoom, respectively; the conversion relationship between the image coordinate system and the PTZ coordinate system of the gimbal can be obtained in advance. Assume the image coordinates of the crowd in the image to be analyzed are (X1, Y1) and the coordinates of the image center point are (X0, Y0). The positional offset from (X1, Y1) to (X0, Y0) is converted into PTZ adjustment information for the gimbal according to the conversion relationship between the image coordinate system and the PTZ coordinate system. Assuming the PTZ coordinates of the gimbal were (P1, T1, Z1) when the image to be analyzed was acquired, the gimbal is adjusted by applying the PTZ adjustment information on top of (P1, T1, Z1). In this way, the adjusted gimbal is aimed at the crowd for image acquisition, and the crowd area in the image can be magnified by adjusting the zoom to obtain more detail about the crowd.
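The offset-to-PTZ conversion can be sketched as follows. Since the patent only assumes that an image-to-PTZ conversion relationship is obtained in advance, a simple linear degrees-per-pixel mapping stands in for that calibrated relationship here, and all numeric parameters are assumptions.

```python
def ptz_adjustment(x1, y1, x0, y0, p1, t1, z1,
                   deg_per_px_pan=0.05, deg_per_px_tilt=0.05, zoom_step=1.0):
    """Convert the crowd's image offset from the image center (X0, Y0) into
    PTZ adjustment information applied on top of the current pose (P1, T1, Z1).
    A linear degrees-per-pixel mapping is assumed in place of a calibrated
    image-to-PTZ conversion relationship."""
    dp = (x1 - x0) * deg_per_px_pan    # horizontal offset -> pan delta
    dt = (y1 - y0) * deg_per_px_tilt   # vertical offset   -> tilt delta
    dz = zoom_step                     # zoom in to capture crowd detail
    return p1 + dp, t1 + dt, z1 + dz
```

After this adjustment the gimbal points at the crowd's image position, and the zoom increment magnifies the crowd area.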
Alternatively, the conversion relationship between the image coordinate system and the world coordinate system may also be acquired in advance. If the unmanned aerial vehicle is far from the crowd, the coordinates of the crowd in the image coordinate system can be converted into coordinates in the world coordinate system; the unmanned aerial vehicle is then directed to fly toward the crowd according to the world coordinates of the crowd and of the unmanned aerial vehicle, and once the unmanned aerial vehicle is close to the crowd, the gimbal is adjusted to acquire images of the area where the crowd event occurs.
In another embodiment, after the gimbal camera of the unmanned aerial vehicle acquires the image to be analyzed, the moving direction and moving speed of the crowd can be determined based on the image to be analyzed when it is determined that a crowd event has occurred, and the unmanned aerial vehicle is then controlled to track and film the crowd according to that moving direction and moving speed.
For example, the image to be analyzed may be a video frame, and the moving direction and moving speed of the crowd in the image coordinate system may be determined from successive video frames. The conversion relationship between the image coordinate system and the world coordinate system can be obtained in advance, and the moving direction and moving speed of the crowd in the world coordinate system are determined from that conversion relationship together with the crowd's moving direction and moving speed in the image coordinate system. The flight direction and flight speed of the unmanned aerial vehicle are then adjusted according to the crowd's moving direction and moving speed in the world coordinate system, so that the crowd is tracked and filmed.
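A minimal sketch of this tracking step, assuming the image-to-world conversion reduces to a single meters-per-pixel scale factor (a stand-in for the calibrated conversion relationship):

```python
import math

def crowd_velocity(track, dt, scale):
    """Estimate the crowd's moving direction and speed from the last two
    video-frame centroids. `scale` (meters per pixel) stands in for the
    image-to-world conversion relationship, assumed known in advance."""
    (x_prev, y_prev), (x_cur, y_cur) = track[-2], track[-1]
    vx = (x_cur - x_prev) * scale / dt   # world-frame velocity components
    vy = (y_cur - y_prev) * scale / dt
    speed = math.hypot(vx, vy)
    heading = math.degrees(math.atan2(vy, vx))  # direction of motion
    return heading, speed

def follow_crowd(drone_speed_limit, heading, speed):
    """Set the drone's flight direction to the crowd heading and cap its
    speed, so that it tracks and keeps filming the crowd."""
    return heading, min(speed, drone_speed_limit)
```

The heading/speed pair would then drive the flight controller's direction and velocity setpoints.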
As an implementation, when it is determined that a crowd event has occurred, human targets exhibiting abnormal behavior may be identified in the image to be analyzed; features of each human target are extracted, and the identity information and/or movement track of the target is determined based on the extracted features.
The extracted features may be facial features; they may also be clothing features, such as clothing color or whether the person carries a backpack; or they may be attribute features, such as gender and age. No limitation is imposed here. The extracted features may be stored in a preset structure, i.e., as structured information.
In one case, assume the device executing this scheme (the execution body) is a ground station, the extracted features are facial features, and the ground station is connected to a database storing the identity information and facial features of a number of people. The ground station matches the extracted facial features against the facial features in the database and determines the person's identity information from the matching result. Follow-up investigation of the people involved in the crowd event can then be carried out according to this identity information.
In another case, assume the device executing this scheme (the execution body) is a ground station connected to multiple monitoring devices. The ground station identifies, in the images acquired by these monitoring devices, human targets matching the extracted features, and determines the movement track of each target according to the temporal order in which the target appears across the monitoring devices. Such a target is a human target exhibiting abnormal behavior in the image to be analyzed, i.e., a person involved in the crowd event; determining the person's movement track facilitates tracking and subsequent handling.
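The feature-matching step might look like the following sketch; cosine similarity over plain Python lists and the match threshold are illustrative assumptions, not the patented matching method.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify(query_feature, database, threshold=0.8):
    """Match an extracted facial feature against the database and return the
    identity of the best match above the threshold, or None if no entry
    matches. The similarity measure and threshold are assumptions."""
    best_id, best_score = None, threshold
    for person_id, feature in database.items():
        score = cosine_similarity(query_feature, feature)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id
```

In practice the database would hold embeddings produced by a face-recognition network rather than hand-written vectors.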
In the embodiments of the invention, in the first aspect, the neural network model comprises at least two branches: one branch counts the crowd density in the image and the other analyzes the crowd behavior in the image, so that whether a crowd event occurs is analyzed from both crowd density and crowd behavior, improving the accuracy of the analysis result. In the second aspect, the neural network model is obtained by training based on a residual neural network, giving it good recognition performance. In the third aspect, when a crowd event is determined to have occurred, an alarm is automatically raised to prompt the relevant personnel to respond in time. In the fourth aspect, when a crowd event is determined to have occurred, detail images of the crowd can be acquired, the unmanned aerial vehicle can be controlled to track and film the crowd, or the identity information or movement tracks of the people involved in the event can be determined, facilitating subsequent handling by the relevant personnel.
Corresponding to the above method embodiment, the embodiment of the present invention further provides an image-based crowd situation analysis device, as shown in fig. 6, including:
an acquisition module 601, configured to acquire an image to be analyzed;
the input module 602 is configured to input the image to be analyzed into a neural network model obtained by training in advance;
a statistics module 603, configured to count the crowd density in the image to be analyzed by using the crowd density statistics branch in the neural network model;
the analysis module 604 is configured to analyze the crowd behavior in the image to be analyzed by using the crowd behavior analysis branch in the neural network model;
the judging module 605 is configured to judge whether a crowd event occurs based on the crowd density and the crowd behavior.
As an embodiment, the apparatus further comprises:
an identification module (not shown in the figure) for identifying a crowd density value at each pixel point in the image to be analyzed by using a density distribution prediction branch in the neural network model;
the judging module 605 is specifically configured to: and judging whether a crowd event occurs or not based on the crowd density, the crowd behaviors and the crowd density value at each pixel point in the image to be analyzed.
As an embodiment, the apparatus further comprises:
a training module (not shown in the figure) for inputting the sample image into a neural network of a preset structure; carrying out convolution processing on the sample image by utilizing a convolution layer in the neural network, and respectively inputting data obtained after the convolution processing into crowd density statistics branches, crowd behavior analysis branches and density distribution prediction branches in the neural network; iteratively adjusting parameters in the convolutional layer based on a first loss function of the crowd density statistics branch, a first output result of the crowd density statistics branch, a second loss function of the crowd behavior analysis branch, a second output result of the crowd behavior analysis branch, a third loss function of the density distribution prediction branch, and a third output result of the density distribution prediction branch; and when the adjustment of the parameters in the convolution layer meets the convergence condition, obtaining the neural network model after training.
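The joint training scheme described for the training module — one shared convolutional backbone whose parameters are iteratively adjusted from the combined losses of three branches — can be illustrated with a deliberately tiny model. Here a single shared parameter `w` stands in for the convolutional layers, and three squared-error terms stand in for the three branch loss functions; every detail of this sketch is a simplifying assumption.

```python
def train_shared_backbone(samples, lr=0.05, epochs=200):
    """Toy illustration of multi-task training: the shared parameter `w` is
    updated by the summed gradients of three branch losses, mirroring how the
    convolutional layers receive gradients from all three branches."""
    w = 0.0
    for _ in range(epochs):
        grad = 0.0
        for x, (t1, t2, t3) in samples:
            y = w * x  # shared "feature"; each branch reads the same output
            # L1: density-statistics loss, L2: behavior-analysis loss,
            # L3: density-distribution loss -- all squared errors here.
            grad += 2 * x * (y - t1) + 2 * x * (y - t2) + 2 * x * (y - t3)
        w -= lr * grad / len(samples)  # iterative parameter adjustment
    return w
```

With consistent branch targets the shared parameter converges to fit all three; with conflicting targets it settles on the compromise that minimizes the summed loss, which is the point of joint training.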
As one embodiment, the judging module 605 is specifically configured to:
if the crowd density exceeds a preset threshold value and the crowd behaviors are abnormal, judging that a crowd event occurs; the apparatus further comprises:
an alarm module (not shown in the figure), configured to output alarm information when it is determined that a crowd event has occurred.
As an embodiment, the obtaining module 601 is specifically configured to:
acquiring the image to be analyzed through the gimbal camera of an unmanned aerial vehicle; the apparatus further comprises:
a control module (not shown in the figure), configured to adjust the gimbal, based on the position information of the unmanned aerial vehicle, to acquire images of the area where the crowd event occurs;
or, when it is determined that a crowd event has occurred, to determine the moving direction and moving speed of the crowd based on the image to be analyzed, and to control the unmanned aerial vehicle to track and film the crowd according to that moving direction and moving speed.
As an embodiment, the control module is further configured to:
determining the image coordinates of the crowd in the image to be analyzed; calculating PTZ adjustment information for the gimbal based on those image coordinates and the conversion relationship between the PTZ coordinate system of the gimbal and the image coordinate system of the image to be analyzed; and adjusting the gimbal based on the PTZ adjustment information to acquire images of the area where the crowd event occurs.
As an embodiment, the apparatus further comprises:
a determining module (not shown in the figure), configured to identify, when it is determined that a crowd event has occurred, human targets exhibiting abnormal behavior in the image to be analyzed; to extract features of each human target; and to determine the identity information and/or movement track of the target based on the extracted features.
In the embodiment of the invention, an image is input into a neural network model which is obtained by training in advance; counting the population density in the image by using population density statistics branches in the neural network model; analyzing crowd behaviors in the image by using crowd behavior analysis branches in the neural network model; based on crowd density and crowd behavior, it is determined whether a crowd event has occurred. Therefore, in the scheme, the neural network model at least comprises two branches, wherein one branch counts the crowd density in the image, the other branch analyzes the crowd behaviors in the image, and whether the crowd events occur or not is analyzed from the crowd density and the behaviors, so that the accuracy of analysis results is improved.
The embodiment of the invention also provides an image-based crowd situation analysis system, as shown in fig. 7, which comprises: unmanned aerial vehicles and ground stations; wherein,
The unmanned aerial vehicle is used for collecting images and sending the collected images to the ground station;
the ground station is used for receiving the image sent by the unmanned aerial vehicle and taking the image as an image to be analyzed; inputting the image to be analyzed into a neural network model obtained by training in advance; counting the crowd density in the image to be analyzed by using crowd density statistics branches in the neural network model; analyzing crowd behaviors in the image to be analyzed by utilizing crowd behavior analysis branches in the neural network model; and judging whether a crowd event occurs or not based on the crowd density and the crowd behaviors.
Alternatively, in another embodiment, the unmanned aerial vehicle itself takes the acquired image as the image to be analyzed; inputs the image to be analyzed into a neural network model obtained by training in advance; counts the crowd density in the image to be analyzed by using the crowd density statistics branch in the neural network model; analyzes the crowd behavior in the image to be analyzed by using the crowd behavior analysis branch in the neural network model; and determines whether a crowd event occurs based on the crowd density and the crowd behavior. The unmanned aerial vehicle then sends the acquired image and the determination result to the ground station.
In this embodiment, when it determines that a crowd event has occurred, the unmanned aerial vehicle can convert the coordinates of the crowd in the image coordinate system into coordinates in the world coordinate system and send the crowd's world coordinates to the ground station, to facilitate subsequent handling by the relevant personnel at the ground station.
As an implementation, when it is determined that a crowd event has occurred, the ground station may adjust the gimbal of the unmanned aerial vehicle, based on the position information of the unmanned aerial vehicle, to acquire images of the area where the crowd event occurs.
For example, the ground station may determine the image coordinates of the crowd in the image to be analyzed; calculate PTZ adjustment information for the gimbal based on those image coordinates and the conversion relationship between the PTZ coordinate system of the gimbal and the image coordinate system of the image to be analyzed; and adjust the gimbal based on the PTZ adjustment information to acquire images of the area where the crowd event occurs.
In the PTZ coordinate system, P, T, and Z denote pan, tilt, and zoom, respectively; the conversion relationship between the image coordinate system and the PTZ coordinate system of the gimbal can be obtained in advance. Assume the image coordinates of the crowd in the image to be analyzed are (X1, Y1) and the coordinates of the image center point are (X0, Y0). The positional offset from (X1, Y1) to (X0, Y0) is converted into PTZ adjustment information for the gimbal according to the conversion relationship between the image coordinate system and the PTZ coordinate system. Assuming the PTZ coordinates of the gimbal were (P1, T1, Z1) when the image to be analyzed was acquired, the gimbal is adjusted by applying the PTZ adjustment information on top of (P1, T1, Z1). In this way, the adjusted gimbal is aimed at the crowd for image acquisition, and the crowd area in the image can be magnified by adjusting the zoom to obtain more detail about the crowd.
Alternatively, the conversion relationship between the image coordinate system and the world coordinate system may also be acquired in advance. If the unmanned aerial vehicle is far from the crowd, the coordinates of the crowd in the image coordinate system can be converted into coordinates in the world coordinate system; the unmanned aerial vehicle is then directed to fly toward the crowd according to the world coordinates of the crowd and of the unmanned aerial vehicle, and once the unmanned aerial vehicle is close to the crowd, the gimbal is adjusted to acquire images of the area where the crowd event occurs.
As another implementation, when it is determined that a crowd event has occurred, the ground station can determine the moving direction and moving speed of the crowd based on the image to be analyzed, and control the unmanned aerial vehicle to track and film the crowd according to that moving direction and moving speed.
For example, the image to be analyzed may be a video frame, and the moving direction and moving speed of the crowd in the image coordinate system may be determined from successive video frames. The conversion relationship between the image coordinate system and the world coordinate system can be obtained in advance, and the moving direction and moving speed of the crowd in the world coordinate system are determined from that conversion relationship together with the crowd's moving direction and moving speed in the image coordinate system. The flight direction and flight speed of the unmanned aerial vehicle are then adjusted according to the crowd's moving direction and moving speed in the world coordinate system, so that the crowd is tracked and filmed.
As an embodiment, the ground station may identify, in the image to be analyzed, human targets exhibiting abnormal behavior; extract features of each human target; and determine the identity information and/or movement track of the target based on the extracted features.
The extracted features may be facial features; they may also be clothing features, such as clothing color or whether the person carries a backpack; or they may be attribute features, such as gender and age. No limitation is imposed here. The extracted features may be stored in a preset structure, i.e., as structured information.
In one case, the extracted features are facial features, and the ground station is connected to a database storing the identity information and facial features of a number of people. The ground station matches the extracted facial features against the facial features in the database and determines the person's identity information from the matching result. Follow-up investigation of the people involved in the crowd event can then be carried out according to this identity information.
In another case, the ground station is connected to multiple monitoring devices. The ground station identifies, in the images acquired by these monitoring devices, human targets matching the extracted features, and determines the movement track of each target according to the temporal order in which the target appears across the monitoring devices. Such a target is a human target exhibiting abnormal behavior in the image to be analyzed, i.e., a person involved in the crowd event; determining the person's movement track facilitates tracking and subsequent handling.
In the embodiments of the invention, in the first aspect, the neural network model comprises at least two branches: one branch counts the crowd density in the image and the other analyzes the crowd behavior in the image, so that whether a crowd event occurs is analyzed from both crowd density and crowd behavior, improving the accuracy of the analysis result. In the second aspect, when a crowd event is determined to have occurred, the ground station can control the unmanned aerial vehicle to acquire detail images of the crowd, control the unmanned aerial vehicle to track and film the crowd, or determine the identity information or movement tracks of the people involved in the event, facilitating subsequent handling by the relevant personnel.
The embodiment of the invention also provides an electronic device, as shown in fig. 8, comprising a processor 801 and a memory 802,
a memory 802 for storing a computer program;
the processor 801 is configured to implement any one of the above-described image-based crowd situation analysis methods when executing the program stored in the memory 802.
The Memory mentioned in the electronic device may include a random access Memory (Random Access Memory, RAM) or may include a Non-Volatile Memory (NVM), such as at least one magnetic disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processing, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
The electronic device may be any of various electronic devices, such as an unmanned aerial vehicle, a ground station, a personal computer, or a server, and is not particularly limited.
The embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and the computer program realizes any one of the crowd situation analysis methods based on the image when being executed by a processor.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments refer to one another, and each embodiment focuses on its differences from the others. In particular, since the apparatus, system, electronic device, and computer-readable storage medium embodiments described above are substantially similar to the method embodiments, their description is relatively brief; refer to the description of the method embodiments for relevant details.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (17)

1. The crowd situation analysis method based on the image is characterized by comprising the following steps of:
acquiring an image to be analyzed;
inputting the image to be analyzed into a neural network model obtained by training in advance;
counting the crowd density in the image to be analyzed by using crowd density statistics branches in the neural network model;
analyzing crowd behaviors in the image to be analyzed by utilizing crowd behavior analysis branches in the neural network model;
Judging whether a crowd event occurs or not based on the crowd density and the crowd behaviors;
the training mode of the neural network model comprises the following steps:
inputting the sample image into a neural network with a preset structure; carrying out convolution processing on the sample image by utilizing a convolution layer in the neural network, and respectively inputting data obtained after the convolution processing into crowd density statistics branches, crowd behavior analysis branches and density distribution prediction branches in the neural network; iteratively adjusting parameters in the convolutional layer based on a first loss function of the crowd density statistics branch, a first output result of the crowd density statistics branch, a second loss function of the crowd behavior analysis branch, a second output result of the crowd behavior analysis branch, a third loss function of the density distribution prediction branch, and a third output result of the density distribution prediction branch; and when the adjustment of the parameters in the convolution layer meets the convergence condition, obtaining the neural network model after training.
2. The method according to claim 1, wherein the method further comprises:
identifying crowd density values at each pixel point in the image to be analyzed by utilizing density distribution prediction branches in the neural network model;
The determining whether a crowd event occurs based on the crowd density and the crowd behavior includes:
and judging whether a crowd event occurs or not based on the crowd density, the crowd behaviors and the crowd density value at each pixel point in the image to be analyzed.
3. The method of claim 1, wherein the determining whether a crowd event occurs based on the crowd density and the crowd behavior comprises:
if the crowd density exceeds a preset threshold value and the crowd behaviors are abnormal, judging that a crowd event occurs;
in the event of a determination that a crowd event has occurred, the method further comprises: and outputting alarm information.
4. The method of claim 1, wherein the acquiring the image to be analyzed comprises:
acquiring the image to be analyzed through the gimbal camera of an unmanned aerial vehicle;
in the event of a determination that a crowd event has occurred, the method further comprises:
adjusting the gimbal, based on the position information of the unmanned aerial vehicle, to acquire images of the area where the crowd event occurs;
or, when it is determined that a crowd event has occurred, determining the moving direction and moving speed of the crowd based on the image to be analyzed; and controlling the unmanned aerial vehicle to track and film the crowd according to the moving direction and the moving speed.
5. The method of claim 4, wherein adjusting the gimbal to acquire images of the area where the crowd event occurs based on the position information of the unmanned aerial vehicle comprises:
determining the image coordinates of the crowd in the image to be analyzed; calculating PTZ adjustment information for the gimbal based on those image coordinates and the conversion relationship between the PTZ coordinate system of the gimbal and the image coordinate system of the image to be analyzed; and adjusting the gimbal based on the PTZ adjustment information to acquire images of the area where the crowd event occurs.
6. The method of claim 1, wherein in the event of a determination that a crowd event has occurred, the method further comprises:
identifying a human body target with abnormal behaviors in the image to be analyzed;
extracting characteristics of the human body target, and determining identity information and/or movement tracks of the human body target based on the extracted characteristics.
7. An image-based crowd situation analysis device, comprising:
the acquisition module is used for acquiring an image to be analyzed;
the input module is used for inputting the image to be analyzed into a neural network model obtained by training in advance;
The statistics module is used for counting the population density in the image to be analyzed by using the population density statistics branches in the neural network model;
the analysis module is used for analyzing the crowd behaviors in the image to be analyzed by utilizing crowd behavior analysis branches in the neural network model;
the judging module is used for judging whether a crowd event occurs or not based on the crowd density and the crowd behaviors;
the apparatus further comprises:
the training module is used for inputting the sample image into a neural network with a preset structure; carrying out convolution processing on the sample image by utilizing a convolution layer in the neural network, and respectively inputting data obtained after the convolution processing into crowd density statistics branches, crowd behavior analysis branches and density distribution prediction branches in the neural network; iteratively adjusting parameters in the convolutional layer based on a first loss function of the crowd density statistics branch, a first output result of the crowd density statistics branch, a second loss function of the crowd behavior analysis branch, a second output result of the crowd behavior analysis branch, a third loss function of the density distribution prediction branch, and a third output result of the density distribution prediction branch; and when the adjustment of the parameters in the convolution layer meets the convergence condition, obtaining the neural network model after training.
8. The apparatus of claim 7, wherein the apparatus further comprises:
the identification module is used for utilizing the density distribution prediction branches in the neural network model to identify crowd density values at each pixel point in the image to be analyzed;
the judging module is specifically configured to: and judging whether a crowd event occurs or not based on the crowd density, the crowd behaviors and the crowd density value at each pixel point in the image to be analyzed.
9. The apparatus of claim 7, wherein the determining module is specifically configured to:
if the crowd density exceeds a preset threshold value and the crowd behaviors are abnormal, judging that a crowd event occurs; the apparatus further comprises:
and the alarm module, configured to output alarm information when it is determined that a crowd event has occurred.
10. The apparatus of claim 7, wherein the obtaining module is specifically configured to:
acquiring an image to be analyzed through the unmanned aerial vehicle cradle head; the apparatus further comprises:
the control module is used for adjusting the cradle head to acquire images aiming at the area where the swarm event occurs based on the position information of the unmanned aerial vehicle;
or, when it is determined that a crowd event has occurred, determining the moving direction and moving speed of the crowd based on the image to be analyzed, and controlling the unmanned aerial vehicle to track the crowd and capture images according to the moving direction and the moving speed.
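The moving direction and speed could be estimated from the crowd centroid in consecutive frames. The centroid representation, frame interval, and ground-scale factor below are assumptions, since the claim does not fix a method:

```python
import math

def crowd_motion(centroid_prev, centroid_curr, dt_seconds, metres_per_pixel):
    """Direction (degrees, 0 = +x image axis) and ground speed (m/s) of the
    crowd centroid between two frames."""
    dx = centroid_curr[0] - centroid_prev[0]
    dy = centroid_curr[1] - centroid_prev[1]
    direction = math.degrees(math.atan2(dy, dx)) % 360.0
    speed = math.hypot(dx, dy) * metres_per_pixel / dt_seconds
    return direction, speed
```

The returned heading and speed would then feed the drone's tracking controller.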
11. The apparatus of claim 10, wherein the control module is further configured to:
determining the image coordinates of the crowd in the image to be analyzed; calculating PTZ adjustment information of the gimbal based on the conversion relationship between the PTZ coordinate system of the gimbal of the unmanned aerial vehicle and the coordinate system of the image to be analyzed, together with the determined image coordinates; and adjusting the gimbal based on the PTZ adjustment information to acquire images of the area where the crowd event occurs.
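Under a pinhole-camera assumption, the image-to-PTZ conversion in this claim reduces to two small angles that re-center the crowd; the pixel focal length is a hypothetical calibration input, and zoom is left unchanged in this sketch:

```python
import math

def ptz_adjustment(u, v, image_width, image_height, focal_px):
    """Pan/tilt deltas (degrees) that move the image point (u, v)
    to the image centre."""
    dx = u - image_width / 2.0   # horizontal pixel offset from the centre
    dy = v - image_height / 2.0  # vertical pixel offset from the centre
    d_pan = math.degrees(math.atan2(dx, focal_px))
    d_tilt = math.degrees(math.atan2(dy, focal_px))
    return d_pan, d_tilt
```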
12. The apparatus of claim 7, wherein the apparatus further comprises:
the determining module is used for identifying a human body target with abnormal behavior in the image to be analyzed when it is determined that a crowd event has occurred; extracting features of the human body target, and determining identity information and/or a movement trajectory of the human body target based on the extracted features.
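Feature-based identity determination could be sketched as nearest-neighbour matching of the extracted feature vector against a gallery of known identities; the cosine-similarity metric, gallery structure, and threshold are assumptions, not specified by the patent:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def match_identity(target_feature, gallery, min_similarity=0.7):
    """Return the gallery identity most similar to the extracted target
    feature, or None when no match clears the threshold."""
    best = max(gallery, key=lambda name: cosine_similarity(target_feature, gallery[name]))
    return best if cosine_similarity(target_feature, gallery[best]) >= min_similarity else None
```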
13. An image-based crowd situation analysis system, comprising: unmanned aerial vehicles and ground stations; wherein,
The unmanned aerial vehicle is used for collecting images and sending the collected images to the ground station;
the ground station is used for receiving the image sent by the unmanned aerial vehicle and taking the image as the image to be analyzed; inputting the image to be analyzed into a neural network model obtained by training in advance; counting the crowd density in the image to be analyzed by using a crowd density statistics branch in the neural network model; analyzing the crowd behavior in the image to be analyzed by using a crowd behavior analysis branch in the neural network model; and judging whether a crowd event occurs based on the crowd density and the crowd behavior;
the training mode of the neural network model comprises the following steps:
inputting the sample image into a neural network with a preset structure; performing convolution processing on the sample image by using a convolution layer in the neural network, and inputting the data obtained after the convolution processing into a crowd density statistics branch, a crowd behavior analysis branch and a density distribution prediction branch in the neural network respectively; iteratively adjusting parameters of the convolution layer based on a first loss function and a first output result of the crowd density statistics branch, a second loss function and a second output result of the crowd behavior analysis branch, and a third loss function and a third output result of the density distribution prediction branch; and obtaining the trained neural network model when the adjustment of the parameters of the convolution layer meets a convergence condition.
14. The system of claim 13, wherein the ground station is further configured to:
when it is determined that a crowd event has occurred, adjusting the gimbal of the unmanned aerial vehicle, based on the position information of the unmanned aerial vehicle, to acquire images of the area where the crowd event occurs;
or, when it is determined that a crowd event has occurred, determining the moving direction and moving speed of the crowd based on the image to be analyzed, and controlling the unmanned aerial vehicle to track the crowd and capture images according to the moving direction and the moving speed.
15. The system of claim 13, wherein the ground station is further configured to:
identifying a human body target with abnormal behavior in the image to be analyzed;
extracting features of the human body target, and determining identity information and/or a movement trajectory of the human body target based on the extracted features.
16. An electronic device comprising a processor and a memory;
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-6 when executing the program stored in the memory.
17. A computer-readable storage medium, characterized in that a computer program is stored therein which, when executed by a processor, implements the method steps of any one of claims 1-6.
CN201811494297.6A 2018-12-07 2018-12-07 Crowd situation analysis method, device, equipment and system based on image Active CN111291597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811494297.6A CN111291597B (en) 2018-12-07 2018-12-07 Crowd situation analysis method, device, equipment and system based on image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811494297.6A CN111291597B (en) 2018-12-07 2018-12-07 Crowd situation analysis method, device, equipment and system based on image

Publications (2)

Publication Number Publication Date
CN111291597A CN111291597A (en) 2020-06-16
CN111291597B true CN111291597B (en) 2023-10-13

Family

ID=71021286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811494297.6A Active CN111291597B (en) 2018-12-07 2018-12-07 Crowd situation analysis method, device, equipment and system based on image

Country Status (1)

Country Link
CN (1) CN111291597B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232124A (en) * 2020-09-11 2021-01-15 浙江大华技术股份有限公司 Crowd situation analysis method, video processing device and device with storage function

Citations (11)

Publication number Priority date Publication date Assignee Title
CN105447458A (en) * 2015-11-17 2016-03-30 深圳市商汤科技有限公司 Large scale crowd video analysis system and method thereof
WO2017156443A1 (en) * 2016-03-10 2017-09-14 Rutgers, The State University Of New Jersey Global optimization-based method for improving human crowd trajectory estimation and tracking
CN107729799A (en) * 2017-06-13 2018-02-23 银江股份有限公司 Crowd's abnormal behaviour vision-based detection and analyzing and alarming system based on depth convolutional neural networks
CN107944327A (en) * 2016-10-10 2018-04-20 杭州海康威视数字技术股份有限公司 A kind of demographic method and device
WO2018121690A1 (en) * 2016-12-29 2018-07-05 北京市商汤科技开发有限公司 Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN108256447A (en) * 2017-12-29 2018-07-06 广州海昇计算机科技有限公司 A kind of unmanned plane video analysis method based on deep neural network
CN108549835A (en) * 2018-03-08 2018-09-18 深圳市深网视界科技有限公司 Crowd counts and its method, terminal device and the storage medium of model construction
CN108647592A (en) * 2018-04-26 2018-10-12 长沙学院 Group abnormality event detecting method and system based on full convolutional neural networks
CN108848348A (en) * 2018-07-12 2018-11-20 西南科技大学 A kind of crowd's abnormal behaviour monitoring device and method based on unmanned plane
CN108897342A (en) * 2018-08-22 2018-11-27 江西理工大学 For the positioning and tracing method and system of the civilian multi-rotor unmanned aerial vehicle fast moved
CN108921137A (en) * 2018-08-01 2018-11-30 深圳市旭发智能科技有限公司 A kind of unmanned plane and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US8195598B2 (en) * 2007-11-16 2012-06-05 Agilence, Inc. Method of and system for hierarchical human/crowd behavior detection
CN107566781B (en) * 2016-06-30 2019-06-21 北京旷视科技有限公司 Video monitoring method and video monitoring equipment
US10706289B2 (en) * 2017-03-03 2020-07-07 International Business Machines Corporation Crowd detection, analysis, and categorization


Non-Patent Citations (2)

Title
Ye Zhipeng et al. Abnormal-state detection of crowd motion in video scenes. Intelligent Computer and Applications, 2017, Vol. 7, No. 4, pp. 45-47. *
Li Baiping; Han Xinyi; Wu Dongmei. Real-time crowd density estimation based on a convolutional neural network. Journal of Graphics, No. 4; full text. *

Also Published As

Publication number Publication date
CN111291597A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
CN108052859B (en) Abnormal behavior detection method, system and device based on clustering optical flow characteristics
Budiharto et al. Fast object detection for quadcopter drone using deep learning
CN110163236B (en) Model training method and device, storage medium and electronic device
CN108319926A (en) A kind of the safety cap wearing detecting system and detection method of building-site
CN103986910A (en) Method and system for passenger flow statistics based on cameras with intelligent analysis function
CN111353338B (en) Energy efficiency improvement method based on business hall video monitoring
KR102333143B1 (en) System for providing people counting service
CN112770265B (en) Pedestrian identity information acquisition method, system, server and storage medium
CN108334831A (en) A kind of monitoring image processing method, monitoring terminal and system
KR20210062256A (en) Method, program and system to judge abnormal behavior based on behavior sequence
EP3765995A1 (en) Systems and methods for inter-camera recognition of individuals and their properties
CN111291597B (en) Crowd situation analysis method, device, equipment and system based on image
CN108416632B (en) Dynamic video identification method
CN111950507B (en) Data processing and model training method, device, equipment and medium
CN113538513A (en) Method, device and equipment for controlling access of monitored object and storage medium
US20240193946A1 (en) Bird detection and species determination
CN108470392B (en) Video data processing method
CN110705971A (en) Attendance management system and method based on deep learning
CN113761263A (en) Similarity determination method and device and computer readable storage medium
Sinha et al. A Survey and Analysis of Crowd Anomaly Detection Techniques
CN111079617A (en) Poultry identification method and device, readable storage medium and electronic equipment
KR102603396B1 (en) Method and system for entity recognition and behavior pattern analysis based on video surveillance using artificial intelligence
CN115798045A (en) Depth-adaptive Transformer-based human body posture identification method and system
Park et al. Object Trajectory Prediction with Scarce Environment Information
Nikzad et al. Human interaction recognition from distance signature of body centers during time

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant