CN111291597A - Image-based crowd situation analysis method, device, equipment and system - Google Patents


Info

Publication number
CN111291597A
CN111291597A
Authority
CN
China
Prior art keywords
crowd
image
analyzed
branch
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811494297.6A
Other languages
Chinese (zh)
Other versions
CN111291597B (en)
Inventor
童超
车军
任烨
朱江
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811494297.6A priority Critical patent/CN111291597B/en
Publication of CN111291597A publication Critical patent/CN111291597A/en
Application granted granted Critical
Publication of CN111291597B publication Critical patent/CN111291597B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/53: Recognition of crowd images, e.g. recognition of crowd congestion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides an image-based crowd situation analysis method, device, equipment, and system, wherein the method comprises the following steps: inputting an image into a pre-trained neural network model; counting the crowd density in the image by using a crowd density statistics branch in the neural network model; analyzing the crowd behavior in the image by using a crowd behavior analysis branch in the neural network model; and judging, based on the crowd density and the crowd behavior, whether a group event occurs. In this scheme, the neural network model comprises at least two branches: one branch counts the crowd density in the image and the other analyzes the crowd behavior in the image, so whether a group event occurs is analyzed in terms of both crowd density and crowd behavior, which improves the accuracy of the analysis result.

Description

Image-based crowd situation analysis method, device, equipment and system
Technical Field
The invention relates to the technical field of monitoring, and in particular to an image-based crowd situation analysis method, device, equipment, and system.
Background
Group events occurring in public places, such as crowd gathering, fighting, and crowd stampedes, greatly affect public safety. At present, the crowd situation can be analyzed based on monitoring images so that group events can be handled in time and public safety improved.
In some related schemes, monitoring images of public places are acquired and the number of people in them is counted; if the number is large, it is determined that a group event has occurred. However, in places that are often crowded, such as subway stations and railway stations during morning and evening rush hours, a large number of people does not necessarily indicate that a group event is occurring. The analysis results of such schemes are therefore of poor accuracy.
Disclosure of Invention
The embodiment of the invention aims to provide a method, a device, equipment and a system for analyzing the crowd situation based on an image so as to improve the accuracy of an analysis result.
In order to achieve the above object, an embodiment of the present invention provides a method for analyzing a crowd situation based on an image, including:
acquiring an image to be analyzed;
inputting the image to be analyzed into a neural network model obtained by pre-training;
counting the crowd density in the image to be analyzed by using a crowd density statistics branch in the neural network model;
analyzing the crowd behaviors in the image to be analyzed by utilizing the crowd behavior analysis branch in the neural network model;
and judging whether a group event occurs or not based on the crowd density and the crowd behavior.
Optionally, the method further includes:
identifying, by using a density distribution prediction branch in the neural network model, the crowd density value at each pixel point in the image to be analyzed;
the judging whether a group event occurs based on the crowd density and the crowd behavior includes:
judging whether a group event occurs based on the crowd density, the crowd behavior, and the crowd density value at each pixel point in the image to be analyzed.
Optionally, the process of training to obtain the neural network model includes:
inputting the sample image into a neural network with a preset structure;
carrying out convolution processing on the sample image by using a convolution layer in the neural network, and respectively inputting the data obtained after the convolution processing into a crowd density statistics branch, a crowd behavior analysis branch, and a density distribution prediction branch in the neural network;
iteratively adjusting parameters in the convolutional layer based on a first loss function of the crowd density statistics branch, a first output result of the crowd density statistics branch, a second loss function of the crowd behavior analysis branch, a second output result of the crowd behavior analysis branch, a third loss function of the density distribution prediction branch, and a third output result of the density distribution prediction branch;
and when the adjustment of the parameters in the convolutional layer meets the convergence condition, obtaining the trained neural network model.
Optionally, the judging whether a group event occurs based on the crowd density and the crowd behavior includes:
if the crowd density exceeds a preset threshold and the crowd behavior is abnormal, judging that a group event occurs;
in the case where it is determined that a group event has occurred, the method further comprises: outputting alarm information.
Optionally, the acquiring an image to be analyzed includes:
acquiring the image to be analyzed through a pan-tilt head of an unmanned aerial vehicle;
in the case where it is determined that a group event has occurred, the method further comprises:
adjusting the pan-tilt head, based on position information of the unmanned aerial vehicle, to perform image acquisition for the area where the group event occurs;
or, in the case where it is determined that a group event has occurred, determining the moving direction and moving speed of the crowd based on the image to be analyzed, and controlling the unmanned aerial vehicle to track and capture images of the crowd according to the moving direction and moving speed.
Optionally, the adjusting the pan-tilt head, based on the position information of the unmanned aerial vehicle, to perform image acquisition for the area where the group event occurs includes:
determining image coordinates of the crowd in the image to be analyzed; calculating PTZ adjustment information of the pan-tilt head based on the determined image coordinates and the conversion relationship between the PTZ coordinate system of the pan-tilt head of the unmanned aerial vehicle and the coordinate system of the image to be analyzed; and adjusting the pan-tilt head based on the PTZ adjustment information to perform image acquisition for the area where the group event occurs.
Optionally, in the case where it is determined that a group event has occurred, the method further includes:
identifying a human body target with abnormal behaviors in the image to be analyzed;
extracting the characteristics of the human body target, and determining the identity information and/or the motion trail of the human body target based on the extracted characteristics.
In order to achieve the above object, an embodiment of the present invention further provides an image-based crowd situation analyzing apparatus, including:
the acquisition module is used for acquiring an image to be analyzed;
the input module is used for inputting the image to be analyzed into a neural network model obtained by pre-training;
the statistical module is used for counting the crowd density in the image to be analyzed by utilizing the crowd density statistical branch in the neural network model;
the analysis module is used for analyzing the crowd behaviors in the image to be analyzed by utilizing the crowd behavior analysis branch in the neural network model;
and the judging module is used for judging whether a group event occurs or not based on the crowd density and the crowd behavior.
Optionally, the apparatus further comprises:
the identification module is used for identifying the crowd density value at each pixel point in the image to be analyzed by using the density distribution prediction branch in the neural network model;
the judgment module is specifically configured to: judge whether a group event occurs based on the crowd density, the crowd behavior, and the crowd density value at each pixel point in the image to be analyzed.
Optionally, the apparatus further comprises:
the training module is used for inputting the sample image into a neural network with a preset structure; carrying out convolution processing on the sample image by utilizing a convolution layer in the neural network, and respectively inputting data obtained after the convolution processing into a crowd density statistic branch, a crowd behavior analysis branch and a density distribution prediction branch in the neural network; iteratively adjusting parameters in the convolutional layer based on a first loss function of the crowd density statistics branch, a first output result of the crowd density statistics branch, a second loss function of the crowd behavior analysis branch, a second output result of the crowd behavior analysis branch, a third loss function of the density distribution prediction branch, and a third output result of the density distribution prediction branch; and when the adjustment of the parameters in the convolutional layer meets the convergence condition, obtaining the trained neural network model.
Optionally, the determining module is specifically configured to:
if the crowd density exceeds a preset threshold value and the crowd behavior is abnormal, judging that a group event occurs; the device further comprises:
and the alarm module is used for outputting alarm information under the condition of judging that the group event occurs.
Optionally, the obtaining module is specifically configured to:
acquire the image to be analyzed through a pan-tilt head of an unmanned aerial vehicle; the device further comprises:
a control module, configured to adjust the pan-tilt head, based on position information of the unmanned aerial vehicle, to perform image acquisition for the area where the group event occurs;
or, in the case where it is determined that a group event has occurred, determine the moving direction and moving speed of the crowd based on the image to be analyzed, and control the unmanned aerial vehicle to track and capture images of the crowd according to the moving direction and moving speed.
Optionally, the control module is further configured to:
determine the image coordinates of the crowd in the image to be analyzed; calculate PTZ adjustment information of the pan-tilt head based on the determined image coordinates and the conversion relationship between the PTZ coordinate system of the pan-tilt head of the unmanned aerial vehicle and the coordinate system of the image to be analyzed; and adjust the pan-tilt head, based on the PTZ adjustment information, to perform image acquisition for the area where the group event occurs.
Optionally, the apparatus further comprises:
the determining module is configured to, in the case where it is determined that a group event has occurred, identify a human body target with abnormal behavior in the image to be analyzed, extract features of the human body target, and determine identity information and/or a motion trail of the human body target based on the extracted features.
In order to achieve the above object, an embodiment of the present invention further provides an image-based crowd situation analysis system, including: an unmanned aerial vehicle and a ground station; wherein,
the unmanned aerial vehicle is used for acquiring images and sending the acquired images to the ground station;
the ground station is used for receiving the image sent by the unmanned aerial vehicle as an image to be analyzed; inputting the image to be analyzed into a neural network model obtained by pre-training; counting the crowd density in the image to be analyzed by using the crowd density counting branch in the neural network model; analyzing the crowd behaviors in the image to be analyzed by utilizing the crowd behavior analysis branch in the neural network model; and judging whether a group event occurs or not based on the crowd density and the crowd behavior.
Optionally, the ground station is further configured to:
in the case where it is determined that a group event has occurred, adjust a pan-tilt head of the unmanned aerial vehicle, based on position information of the unmanned aerial vehicle, to perform image acquisition for the area where the group event occurs;
or, in the case where it is determined that a group event has occurred, determine the moving direction and moving speed of the crowd based on the image to be analyzed, and control the unmanned aerial vehicle to track and capture images of the crowd according to the moving direction and moving speed.
Optionally, the ground station is further configured to:
identifying a human body target with abnormal behaviors in the image to be analyzed;
extracting the characteristics of the human body target, and determining the identity information and/or the motion trail of the human body target based on the extracted characteristics.
In order to achieve the above object, an embodiment of the present invention further provides an electronic device, including a processor and a memory;
a memory for storing a computer program;
and the processor is used for realizing any one of the image-based crowd situation analysis methods when executing the program stored in the memory.
In order to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium in which a computer program is stored, and the computer program, when executed by a processor, implements any one of the above image-based crowd situation analysis methods.
In the embodiment of the invention, an image is input into a pre-trained neural network model; the crowd density in the image is counted by using a crowd density statistics branch in the model; the crowd behavior in the image is analyzed by using a crowd behavior analysis branch in the model; and whether a group event occurs is judged based on the crowd density and the crowd behavior. In this scheme, the neural network model comprises at least two branches: one counts the crowd density in the image and the other analyzes the crowd behavior, so whether a group event occurs is analyzed in terms of both density and behavior, which improves the accuracy of the analysis result.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for analyzing a crowd situation based on an image according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a first structure of a neural network according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a second structure of a neural network according to an embodiment of the present invention;
FIG. 4 is a first structural diagram of a neural network model according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a second structure of a neural network model according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an image-based crowd situation analyzing apparatus according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a system for analyzing a crowd situation based on an image according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Corresponding to the above method embodiments, embodiments of the present invention provide a method, an apparatus, a device, and a system for analyzing a crowd situation based on an image, where the method and the apparatus may be applied to various electronic devices such as an unmanned aerial vehicle, a ground station, a personal computer, and a server, and are not limited specifically. First, the image-based crowd situation analysis method provided by the embodiment of the present invention is explained in detail below.
Fig. 1 is a schematic flow chart of a method for analyzing a crowd situation based on an image according to an embodiment of the present invention, including:
s101: and acquiring an image to be analyzed.
For convenience of description, in this embodiment, an image that needs to undergo crowd situation analysis is referred to as an image to be analyzed. If the execution subject is an unmanned aerial vehicle, the unmanned aerial vehicle can capture images through its pan-tilt head and use the captured images as images to be analyzed. If the execution subject is a ground station, the ground station can receive images sent by the unmanned aerial vehicle as images to be analyzed. If the execution subject is another electronic device, that device can receive images sent by monitoring equipment as images to be analyzed. There are many ways of acquiring images, which are not listed one by one here.
S102: and inputting the image to be analyzed into a neural network model obtained by pre-training.
S103: and counting the crowd density in the image to be analyzed by using the crowd density statistics branch in the neural network model.
S104: and analyzing the crowd behaviors in the image to be analyzed by utilizing the crowd behavior analysis branch in the neural network model.
In the embodiment of the invention, the trained neural network model at least comprises two branches: one branch is used for counting the crowd density in the image and is called as a crowd density counting branch; the other branch is used for analyzing the crowd behavior in the image and is called a crowd behavior analysis branch.
As an embodiment, the process of training the neural network model may include:
acquiring a sample image required by training, and inputting the sample image into a neural network with a preset structure; carrying out convolution processing on the sample image by utilizing a convolution layer in the neural network, and respectively inputting data obtained after the convolution processing into a crowd density statistical branch and a crowd behavior analysis branch in the neural network; iteratively adjusting parameters in the convolutional layer based on a first loss function of the crowd density statistics branch, a first output result of the crowd density statistics branch, a second loss function of the crowd behavior analysis branch, and a second output result of the crowd behavior analysis branch; and when the adjustment of the parameters in the convolutional layer meets the convergence condition, obtaining the trained neural network model.
The neural network has the same structure as the neural network model, the training process is the process of adjusting the parameters of the neural network, and the trained neural network model is obtained when the parameter adjustment is completed.
In the present embodiment, for ease of distinction, the loss function of the crowd density statistics branch is referred to as the first loss function, and the loss function of the crowd behavior analysis branch is referred to as the second loss function.
For example, the second loss function may be a binary cross-entropy function, such as:
L_Behave = -(1/N) · Σ_{i=1}^{N} [ ((1+S_i)/2) · log((1+Ŝ_i)/2) + ((1-S_i)/2) · log((1-Ŝ_i)/2) ]    (Formula 1)
where L_Behave represents the loss value of the second loss function, i denotes a sample image in the training process, N denotes the total number of sample images, Ŝ_i denotes the predicted value of the crowd behavior label corresponding to sample image i, and S_i denotes the true value of the crowd behavior label corresponding to sample image i. For example, the crowd behavior label may take two values, "abnormal behavior present" and "no abnormal behavior": the label may be 1 or -1, with 1 indicating no abnormal behavior and -1 indicating abnormal behavior.
The first loss function may be a multi-class cross-entropy loss function, such as:
L_Density = -(1/N) · Σ_{i=1}^{N} Σ_{j=1}^{5} S_ij · log Ŝ_ij    (Formula 2)
where L_Density represents the loss value of the first loss function, i denotes a sample image in the training process, N denotes the total number of sample images, and j denotes the crowd density level. In Formula 2, j ranges over 5 levels; the specific number of levels is not limited, and the 5 in Formula 2 may be changed to another value. Ŝ_ij denotes the probability of predicting sample image i as density level j, and S_ij indicates whether density level j is the true level of sample image i: for example, if the density level corresponding to sample image i is 1, S_i1 may be 1 and S_i2 through S_i5 may be 0. For example, the crowd density may be divided into 5 levels by head count: very rare (0-20), sparse (21-50), medium (51-100), dense (101-500), and very dense (501 or more). The levels may be divided according to the actual situation, and the specific division is not limited.
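As an illustration, the two loss functions and the level division above can be sketched in numpy. The level boundaries follow the head-count ranges quoted in the text; the `eps` constant and the mapping of ±1 labels to probabilities are implementation assumptions, not details given in the patent.

```python
import numpy as np

def behavior_loss(pred, label, eps=1e-12):
    """Binary cross-entropy over +/-1 crowd-behavior labels (Formula 1 style).

    pred  : predicted label values in (-1, 1), shape (N,)
    label : true labels in {-1, +1}, shape (N,)
    """
    p = (1.0 + pred) / 2.0    # map predictions to (0, 1) probabilities
    t = (1.0 + label) / 2.0   # map {-1, +1} labels to {0, 1}
    return -np.mean(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps))

def density_loss(probs, true_level, eps=1e-12):
    """Multi-class cross-entropy over density levels (Formula 2 style).

    probs      : predicted probabilities per level, shape (N, 5)
    true_level : true level index in 1..5 per sample
    """
    onehot = np.eye(probs.shape[1])[np.asarray(true_level) - 1]  # S_ij
    return -np.sum(onehot * np.log(probs + eps)) / probs.shape[0]

# Head-count ranges for the five density levels quoted in the text.
LEVELS = [(0, 20), (21, 50), (51, 100), (101, 500), (501, float("inf"))]

def density_level(head_count):
    """Map a head count to a density level index (1 = very rare .. 5 = very dense)."""
    for idx, (lo, hi) in enumerate(LEVELS, start=1):
        if lo <= head_count <= hi:
            return idx
    raise ValueError(head_count)
```

For instance, a head count of 75 falls in the medium range, so `density_level(75)` returns 3.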
Referring to fig. 2, a sample image to be trained is input to a convolutional layer in a neural network, and data output by the convolutional layer is respectively input to a fully connected layer of a crowd density statistics branch and a fully connected layer of a crowd behavior analysis branch; the neural network is trained, i.e. parameters in the neural network are adjusted, using the first loss function and the second loss function.
In one case, the neural network of the preset structure may be a residual neural network (ResNet). In this case, the convolutional layer in fig. 2 may be resnet18, which contains five stages of convolutions. The recognition performance of a residual neural network is better. Alternatively, another type of neural network may be used; the type is not specifically limited.
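The key idea of a residual network is the identity shortcut. A minimal sketch in plain numpy, with illustrative shapes and fully connected transforms standing in for convolutions:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Compute relu(x + F(x)); the shortcut lets the block learn a residual F."""
    h = relu(x @ w1)          # first transform of the residual branch F
    return relu(x + h @ w2)   # add the identity shortcut, then activate

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))         # a batch of 4 feature vectors
w1 = 0.1 * rng.standard_normal((16, 16))
w2 = 0.1 * rng.standard_normal((16, 16))
y = residual_block(x, w1, w2)            # output keeps the input shape
```

With all-zero weights the branch F vanishes and the block reduces to relu(x), which is why deep stacks of such blocks remain trainable.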
As another embodiment, a density distribution prediction branch may be further included in the neural network model, in which case, the density distribution prediction branch may be used to identify a population density value at each pixel point in the image to be analyzed.
In this case, the process of training the neural network model may include:
inputting the sample image into a neural network with a preset structure;
carrying out convolution processing on the sample image by using a convolution layer in the neural network, and respectively inputting the data obtained after the convolution processing into a crowd density statistics branch, a crowd behavior analysis branch, and a density distribution prediction branch in the neural network;
iteratively adjusting parameters in the convolutional layer based on a first loss function of the crowd density statistics branch, a first output result of the crowd density statistics branch, a second loss function of the crowd behavior analysis branch, a second output result of the crowd behavior analysis branch, a third loss function of the density distribution prediction branch, and a third output result of the density distribution prediction branch;
and when the adjustment of the parameters in the convolutional layer meets the convergence condition, obtaining the trained neural network model.
The first loss function and the second loss function may be as described above, and the third loss function may be:
L_CrowdHeatmap = (1/M) · Σ_{k=1}^{M} (S_k - Ŝ_k)²    (Formula 3)
where L_CrowdHeatmap represents the loss value of the third loss function, k denotes the k-th pixel point in the sample image, M denotes the total number of pixel points in the sample image, S_k denotes the true crowd density value at pixel point k, and Ŝ_k denotes the predicted crowd density value at pixel point k.
For example, if no person is present at pixel point k, the crowd density value at pixel point k is 0. If area A contains 3 people and 1000 pixel points, the crowd density value at each pixel point in area A is 3/1000.
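The per-pixel density convention above can be checked numerically: summing the density map over a region recovers the head count. The region shape and the squared-error comparison below are illustrative:

```python
import numpy as np

# A hypothetical 25 x 40 region (1000 pixel points) containing 3 people:
# every pixel point in the region carries the value 3/1000.
region = np.full((25, 40), 3 / 1000.0)

def heatmap_loss(pred, true):
    """Mean squared error between predicted and true per-pixel density values."""
    return np.mean((pred - true) ** 2)

head_count = region.sum()   # summing the density map recovers the 3 people
```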
Referring to fig. 3, a sample image to be trained is input to a convolutional layer in a neural network, and data output by the convolutional layer is respectively input to a fully connected layer of a crowd density statistics branch, a fully connected layer of a crowd behavior analysis branch, and a fully connected layer of a density distribution prediction branch; the neural network is trained, i.e. parameters in the neural network are adjusted, using the first loss function, the second loss function and the third loss function.
In one case, the neural network of the preset structure may be a residual neural network (ResNet), and the convolutional layer in fig. 3 may be resnet18, which contains five stages of convolutions. In the density distribution prediction branch, the data output by resnet18 may be passed through a 1×1 convolution to obtain a density distribution heat map, which contains the crowd density value at each pixel point in the sample image.
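A 1×1 convolution is simply a per-pixel linear map across channels, which is how such a branch can collapse a C-channel feature map into a single-channel heat map. A numpy sketch (sizes and weights are illustrative):

```python
import numpy as np

def conv1x1(features, weights, bias=0.0):
    """Apply a 1x1 convolution: a linear map over the channel axis.

    features : (H, W, C) feature map output by the backbone
    weights  : (C,) kernel collapsing the C channels into one
    """
    return features @ weights + bias    # result: (H, W) density heat map

rng = np.random.default_rng(1)
feat = rng.random((8, 8, 32))           # toy 8x8 feature map with 32 channels
heat = conv1x1(feat, rng.random(32) / 32)
```

Because the kernel covers a single pixel, the spatial resolution of the heat map matches the input feature map exactly.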
S105: and judging whether a group event occurs or not based on the crowd density and the crowd behavior.
Referring to fig. 4, fig. 4 may be understood as the neural network model obtained after the neural network in fig. 2 is trained; the crowd density statistics branch in the model may output the crowd density in the image, and the crowd behavior analysis branch may output whether the crowd in the image exhibits abnormal behavior.
In this case, if the output result of the neural network model indicates that the crowd density exceeds the preset threshold and the crowd behavior is abnormal, it is judged that a group event occurs.
Or, in another case, the output of the crowd density statistics branch may be the density level corresponding to the crowd density in the image; in this case, if the output result of the neural network model indicates that the crowd density level reaches a preset level and the crowd behavior is abnormal, it is judged that a group event occurs.
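The two-branch judgment described above amounts to a conjunction of a density test and a behavior test. A minimal sketch (the threshold value is an assumption; the patent leaves the preset threshold unspecified):

```python
def group_event(crowd_density, behavior_abnormal, threshold=100):
    """Judge a group event: crowd density above the preset threshold
    AND abnormal crowd behavior reported by the behavior branch."""
    return crowd_density > threshold and behavior_abnormal
```

A dense but calm crowd (e.g. rush hour) fails the behavior test, which is exactly the false-alarm case the background section criticizes in count-only schemes.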
As described above, in another embodiment, the neural network model further includes a density distribution prediction branch. Referring to fig. 5, fig. 5 may be understood as the neural network model obtained after the training of the neural network in fig. 3 is completed; the crowd density statistics branch may output the crowd density (or the crowd density level) in the image, the crowd behavior analysis branch may output whether the crowd in the image exhibits abnormal behavior, and the density distribution prediction branch may output the crowd density value at each pixel point in the image.
In one case, the convolutional layer in fig. 5 may be resnet18, which contains five stages of convolutions. In the density distribution prediction branch, the data output by resnet18 may be passed through a 1×1 convolution to obtain a density distribution heat map, which contains the crowd density value at each pixel point in the image to be analyzed.
In this embodiment, whether a group event occurs may be determined according to the output results of the three branches; that is, S105 may include: determining whether a group event occurs based on the crowd density, the crowd behavior, and the crowd density value at each pixel point in the image to be analyzed.
For example, three determination conditions may be set for the three branches: first, the crowd density exceeds a first preset threshold; second, the number of target pixel points exceeds a second preset threshold, where the target pixel points are pixel points whose crowd density values exceed a third preset threshold; third, the crowd behavior is abnormal. The specific values of the first preset threshold, the second preset threshold, the third preset threshold, and the other preset thresholds mentioned in this embodiment are not limited.
If the output result of the neural network model indicates that any two of the three conditions are satisfied, it may be determined that a group event occurs. Alternatively, it may be determined that a group event occurs only when the output result of the neural network model indicates that all three conditions are satisfied.
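The two-of-three (or three-of-three) judgment can be sketched as follows; all threshold values are illustrative assumptions, since the embodiment does not limit them.

```python
# Illustrative three-branch judgment. The three thresholds are hypothetical
# stand-ins for the first, second, and third preset thresholds.

FIRST_THRESHOLD = 3.0    # crowd density threshold
SECOND_THRESHOLD = 50    # minimum count of target pixel points
THIRD_THRESHOLD = 0.5    # per-pixel density defining a target pixel point

def group_event_by_three_branches(crowd_density, behavior_abnormal,
                                  density_map, require_all=False):
    """density_map: H x W crowd density values from the prediction branch.
    Returns True when at least two (or, if require_all, all three)
    branch conditions hold."""
    cond_density = crowd_density > FIRST_THRESHOLD
    target_pixels = sum(1 for row in density_map for v in row
                        if v > THIRD_THRESHOLD)
    cond_pixels = target_pixels > SECOND_THRESHOLD
    cond_behavior = behavior_abnormal
    satisfied = sum([cond_density, cond_pixels, cond_behavior])
    return satisfied == 3 if require_all else satisfied >= 2
```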
In one embodiment, in the case that it is determined that a group event occurs, alarm information may be output. The alarm information may be text information or voice information, or may be a flashing-lamp alarm, etc.; the specific alarm mode is not limited.
As described above, in one embodiment, the image to be analyzed may be acquired through the pan-tilt head of an unmanned aerial vehicle; in this case, when it is determined that a group event occurs, the pan-tilt head may be adjusted, based on the position information of the unmanned aerial vehicle, to perform image acquisition for the area where the group event occurs.
For example, the image coordinates of the crowd in the image to be analyzed may be determined; PTZ adjustment information of the pan-tilt head is calculated based on the conversion relationship between the PTZ coordinate system of the unmanned aerial vehicle pan-tilt head and the coordinate system of the image to be analyzed, together with the determined image coordinates, and the pan-tilt head is adjusted based on the PTZ adjustment information to perform image acquisition for the area where the group event occurs.
The PTZ coordinate system is the Pan-Tilt-Zoom coordinate system; the conversion relationship between the image coordinate system and the PTZ coordinate system of the unmanned aerial vehicle pan-tilt head may be acquired in advance. Assume that the image coordinates of the crowd in the image to be analyzed are determined to be (X1, Y1) and the coordinates of the center point of the image are (X0, Y0); according to the conversion relationship between the image coordinate system and the PTZ coordinate system, the positional deviation from (X1, Y1) to (X0, Y0) is converted into PTZ adjustment information for the unmanned aerial vehicle pan-tilt head. Assuming that the PTZ coordinates of the pan-tilt head when the image to be analyzed is acquired are (P1, T1, Z1), the pan-tilt head is adjusted from (P1, T1, Z1) based on the PTZ adjustment information. In this way, the adjusted pan-tilt head is aimed at the crowd for image acquisition, and the crowd region in the image can be enlarged by adjusting the Zoom, so as to obtain more details about the crowd.
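The PTZ adjustment described above can be sketched as follows. The linear degrees-per-pixel mapping stands in for the pre-acquired conversion relationship between the image coordinate system and the PTZ coordinate system, and the calibration constants are hypothetical.

```python
# Illustrative computation of PTZ adjustment information from the
# (X1, Y1) -> (X0, Y0) positional deviation. The degrees-per-pixel
# constants are assumed calibration values, not part of the embodiment.

DEG_PER_PIXEL_PAN = 0.05   # hypothetical pan calibration constant
DEG_PER_PIXEL_TILT = 0.05  # hypothetical tilt calibration constant

def ptz_adjustment(crowd_xy, center_xy, current_ptz):
    """Return the new (P, T, Z) that centers the crowd in the frame,
    starting from the current pan-tilt state (P1, T1, Z1)."""
    dx = crowd_xy[0] - center_xy[0]
    dy = crowd_xy[1] - center_xy[1]
    p1, t1, z1 = current_ptz
    return (p1 + dx * DEG_PER_PIXEL_PAN,
            t1 + dy * DEG_PER_PIXEL_TILT,
            z1)  # Zoom is unchanged here
```

Zoom is left unchanged in this sketch; as noted above, it may additionally be increased to enlarge the crowd region in the image.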
Alternatively, the conversion relationship between the image coordinate system and the world coordinate system may also be acquired in advance. If the unmanned aerial vehicle is far away from the crowd, the coordinates of the crowd in the image coordinate system can be converted into coordinates in the world coordinate system; the unmanned aerial vehicle is then adjusted to fly toward the crowd according to the coordinates of the crowd and of the unmanned aerial vehicle in the world coordinate system, and when the unmanned aerial vehicle has flown closer to the crowd, the pan-tilt head is adjusted to acquire images of the area where the group event occurs.
In another embodiment, after the image to be analyzed is acquired through the pan-tilt head of the unmanned aerial vehicle, and in the case that it is determined that a group event occurs, the moving direction and the moving speed of the crowd may be determined based on the image to be analyzed, and the unmanned aerial vehicle may be controlled to track and capture the crowd according to the moving direction and the moving speed.
For example, the image to be analyzed may be a video frame image, and based on the video frame images, the moving direction and the moving speed of the crowd in the image coordinate system may be determined. The conversion relationship between the image coordinate system and the world coordinate system may be acquired in advance, and the moving direction and the moving speed of the crowd in the world coordinate system are determined according to this conversion relationship and the moving direction and moving speed of the crowd in the image coordinate system. The flight direction and flight speed of the unmanned aerial vehicle are then adjusted according to the moving direction and moving speed of the crowd in the world coordinate system, so as to track and capture the crowd.
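The motion estimation described above can be sketched as follows, using the crowd centroid in two consecutive video frames; the meters-per-pixel scale factor is a hypothetical stand-in for the pre-acquired conversion relationship between the image coordinate system and the world coordinate system.

```python
# Illustrative crowd-motion estimate from two frame centroids.
# METERS_PER_PIXEL is an assumed ground sampling distance.

import math

METERS_PER_PIXEL = 0.1   # hypothetical scale factor

def crowd_motion(centroid_t0, centroid_t1, dt_seconds):
    """Return (direction_degrees, speed_m_per_s) of the crowd in the
    world coordinate system, from image-coordinate centroids."""
    dx = (centroid_t1[0] - centroid_t0[0]) * METERS_PER_PIXEL
    dy = (centroid_t1[1] - centroid_t0[1]) * METERS_PER_PIXEL
    direction = math.degrees(math.atan2(dy, dx))
    speed = math.hypot(dx, dy) / dt_seconds
    return direction, speed
```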
As an embodiment, in the case that it is determined that a group event occurs, a human target with abnormal behavior may be identified in the image to be analyzed; features of the human target are extracted, and the identity information and/or the motion trail of the human target is determined based on the extracted features.
The extracted features may be face features, clothing features such as clothing color or whether a backpack is carried, or attribute features such as gender and age; the features are not specifically limited. The extracted features may be stored in the form of a preset structure, that is, as structured information.
In one case, assume that the device executing the scheme (the execution main body) is a ground station, the features extracted by the ground station are face features, and the ground station is connected with a database that stores the identity information and face features of a plurality of persons; the ground station matches the extracted face features with the face features in the database, and determines the identity information of the person according to the matching result. Further, the persons participating in the group event can subsequently be investigated according to the identity information.
In one case, assume that the device executing the scheme (the execution main body) is a ground station connected with a plurality of monitoring devices; the ground station identifies, in the images acquired by the plurality of monitoring devices, the human target matching the extracted features, and determines the motion trail of the human target according to the time order in which the human target appears at each monitoring device. This human target, that is, the human target with abnormal behavior in the image to be analyzed, is a person participating in the group event; determining the motion trail of this person facilitates tracking and subsequent processing of the person.
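The motion trail reconstruction can be sketched as follows: sightings of the matched human target are ordered by time across the monitoring devices. The device identifiers and timestamps in the example are hypothetical.

```python
# Illustrative motion trail reconstruction: sort sightings of one target
# by timestamp and return the device sequence it passed through.

def motion_trail(sightings):
    """sightings: list of (timestamp, device_id) pairs for one target.
    Returns device ids in the order the target appeared at them."""
    return [device for _, device in sorted(sightings)]
```

For example, `motion_trail([(1020, "cam_east"), (1000, "cam_gate"), (1080, "cam_plaza")])` returns `["cam_gate", "cam_east", "cam_plaza"]`.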
In the embodiments of the present invention, in the first aspect, the neural network model includes at least two branches: one branch counts the crowd density in the image and the other analyzes the crowd behavior in the image, so that whether a group event occurs is analyzed from the two aspects of crowd density and behavior, improving the accuracy of the analysis result. In the second aspect, the neural network model obtained based on residual neural network training has better recognition performance. In the third aspect, in the case that it is determined that a group event occurs, an alarm is automatically raised to prompt relevant personnel to handle it in time. In the fourth aspect, in the case that it is determined that a group event occurs, detailed images of the crowd can be acquired, the unmanned aerial vehicle can be controlled to track and capture the crowd, or the identity information or motion trail of the persons participating in the event can be determined, so as to facilitate subsequent processing of the relevant persons.
Corresponding to the above method embodiment, an embodiment of the present invention further provides an image-based crowd situation analysis apparatus, as shown in fig. 6, including:
an obtaining module 601, configured to obtain an image to be analyzed;
an input module 602, configured to input the image to be analyzed into a neural network model obtained through pre-training;
a statistics module 603, configured to count the crowd density in the image to be analyzed by using the crowd density statistics branch in the neural network model;
an analysis module 604, configured to analyze the crowd behavior in the image to be analyzed by using the crowd behavior analysis branch in the neural network model;
a determining module 605, configured to determine whether a group event occurs based on the crowd density and the crowd behavior.
As an embodiment, the apparatus further comprises:
an identification module (not shown in the figure) for identifying a crowd density value at each pixel point in the image to be analyzed by using a density distribution prediction branch in the neural network model;
the determining module 605 is specifically configured to: determine whether a group event occurs based on the crowd density, the crowd behavior, and the crowd density value at each pixel point in the image to be analyzed.
As an embodiment, the apparatus further comprises:
a training module (not shown in the figure), configured to input a sample image into a neural network of a preset structure; perform convolution processing on the sample image by using a convolutional layer in the neural network, and input the data obtained after the convolution processing into the crowd density statistics branch, the crowd behavior analysis branch, and the density distribution prediction branch in the neural network, respectively; iteratively adjust the parameters in the convolutional layer based on a first loss function and a first output result of the crowd density statistics branch, a second loss function and a second output result of the crowd behavior analysis branch, and a third loss function and a third output result of the density distribution prediction branch; and obtain the trained neural network model when the adjustment of the parameters in the convolutional layer satisfies a convergence condition.
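One common way to drive a shared convolutional layer from several branch losses is to minimize their weighted sum, stopping when the loss stabilizes. The following sketch assumes such a weighted combination and a simple loss-change convergence test, neither of which is mandated by the embodiment, which only states that all three loss functions and output results drive the parameter updates.

```python
# Illustrative combination of the three branch losses into one training
# objective, plus an assumed convergence condition. The equal default
# weights and the tolerance value are hypothetical choices.

def total_loss(density_loss, behavior_loss, distribution_loss,
               weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the losses of the three branches."""
    w1, w2, w3 = weights
    return w1 * density_loss + w2 * behavior_loss + w3 * distribution_loss

def converged(loss_history, tolerance=1e-4):
    """Assumed convergence condition: the change in total loss between
    the two most recent iterations falls below the tolerance."""
    return (len(loss_history) >= 2
            and abs(loss_history[-1] - loss_history[-2]) < tolerance)
```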
As an embodiment, the determining module 605 is specifically configured to:
determine that a group event occurs if the crowd density exceeds a preset threshold and the crowd behavior is abnormal; the apparatus further comprises:
an alarm module (not shown in the figure), configured to output alarm information in the case where it is determined that a group event occurs.
As an embodiment, the obtaining module 601 is specifically configured to:
acquire the image to be analyzed through the pan-tilt head of an unmanned aerial vehicle; the apparatus further comprises:
a control module (not shown in the figure), configured to adjust the pan-tilt head, based on the position information of the unmanned aerial vehicle, to perform image acquisition for the area where a group event occurs;
or, in the case where it is determined that a group event occurs, determine the moving direction and the moving speed of the crowd based on the image to be analyzed, and control the unmanned aerial vehicle to track and capture the crowd according to the moving direction and the moving speed.
As an embodiment, the control module is further configured to:
determine the image coordinates of the crowd in the image to be analyzed; calculate PTZ adjustment information of the pan-tilt head based on the conversion relationship between the PTZ coordinate system of the unmanned aerial vehicle pan-tilt head and the coordinate system of the image to be analyzed, together with the determined image coordinates; and adjust the pan-tilt head based on the PTZ adjustment information to perform image acquisition for the area where the group event occurs.
As an embodiment, the apparatus further comprises:
a determining module (not shown in the figure), configured to identify, in the case where it is determined that a group event occurs, a human target with abnormal behavior in the image to be analyzed; extract features of the human target; and determine identity information and/or a motion trail of the human target based on the extracted features.
In the embodiments of the present invention, an image is input into a neural network model obtained by pre-training; the crowd density in the image is counted by using the crowd density statistics branch in the neural network model; the crowd behavior in the image is analyzed by using the crowd behavior analysis branch in the neural network model; and whether a group event occurs is determined based on the crowd density and the crowd behavior. Thus, in this scheme, the neural network model includes at least two branches, one of which counts the crowd density in the image while the other analyzes the crowd behavior in the image, so that whether a group event occurs is analyzed from both the crowd density and the behavior, improving the accuracy of the analysis result.
An embodiment of the present invention further provides an image-based crowd situation analysis system. As shown in fig. 7, the system includes: an unmanned aerial vehicle and a ground station; wherein,
the unmanned aerial vehicle is configured to acquire images and send the acquired images to the ground station; and
the ground station is configured to receive an image sent by the unmanned aerial vehicle as the image to be analyzed; input the image to be analyzed into a neural network model obtained by pre-training; count the crowd density in the image to be analyzed by using the crowd density statistics branch in the neural network model; analyze the crowd behavior in the image to be analyzed by using the crowd behavior analysis branch in the neural network model; and determine whether a group event occurs based on the crowd density and the crowd behavior.
Alternatively, in another embodiment, the unmanned aerial vehicle takes the image it acquires as the image to be analyzed; inputs the image to be analyzed into a neural network model obtained by pre-training; counts the crowd density in the image to be analyzed by using the crowd density statistics branch in the neural network model; analyzes the crowd behavior in the image to be analyzed by using the crowd behavior analysis branch in the neural network model; and determines whether a group event occurs based on the crowd density and the crowd behavior. The unmanned aerial vehicle then sends the acquired image and the determination result to the ground station.
In this embodiment, in the case where it is determined that a group event occurs, the unmanned aerial vehicle may further convert the coordinates of the crowd in the image coordinate system into coordinates in the world coordinate system, and transmit the coordinates of the crowd in the world coordinate system to the ground station, so as to facilitate subsequent processing by ground station personnel.
As an embodiment, in the case where it is determined that a group event occurs, the ground station may adjust the pan-tilt head of the unmanned aerial vehicle, based on the position information of the unmanned aerial vehicle, to perform image acquisition for the area where the group event occurs.
For example, the ground station may determine the image coordinates of the crowd in the image to be analyzed; calculate PTZ adjustment information of the pan-tilt head based on the conversion relationship between the PTZ coordinate system of the unmanned aerial vehicle pan-tilt head and the coordinate system of the image to be analyzed, together with the determined image coordinates; and adjust the pan-tilt head based on the PTZ adjustment information to perform image acquisition for the area where the group event occurs.
The PTZ coordinate system is the Pan-Tilt-Zoom coordinate system; the conversion relationship between the image coordinate system and the PTZ coordinate system of the unmanned aerial vehicle pan-tilt head may be acquired in advance. Assume that the image coordinates of the crowd in the image to be analyzed are determined to be (X1, Y1) and the coordinates of the center point of the image are (X0, Y0); according to the conversion relationship between the image coordinate system and the PTZ coordinate system, the positional deviation from (X1, Y1) to (X0, Y0) is converted into PTZ adjustment information for the unmanned aerial vehicle pan-tilt head. Assuming that the PTZ coordinates of the pan-tilt head when the image to be analyzed is acquired are (P1, T1, Z1), the pan-tilt head is adjusted from (P1, T1, Z1) based on the PTZ adjustment information. In this way, the adjusted pan-tilt head is aimed at the crowd for image acquisition, and the crowd region in the image can be enlarged by adjusting the Zoom, so as to obtain more details about the crowd.
Alternatively, the conversion relationship between the image coordinate system and the world coordinate system may also be acquired in advance. If the unmanned aerial vehicle is far away from the crowd, the coordinates of the crowd in the image coordinate system can be converted into coordinates in the world coordinate system; the unmanned aerial vehicle is then adjusted to fly toward the crowd according to the coordinates of the crowd and of the unmanned aerial vehicle in the world coordinate system, and when the unmanned aerial vehicle has flown closer to the crowd, the pan-tilt head is adjusted to acquire images of the area where the group event occurs.
As another embodiment, in the case where it is determined that a group event occurs, the ground station may determine the moving direction and the moving speed of the crowd based on the image to be analyzed, and control the unmanned aerial vehicle to track and capture the crowd according to the moving direction and the moving speed.
For example, the image to be analyzed may be a video frame image, and based on the video frame images, the moving direction and the moving speed of the crowd in the image coordinate system may be determined. The conversion relationship between the image coordinate system and the world coordinate system may be acquired in advance, and the moving direction and the moving speed of the crowd in the world coordinate system are determined according to this conversion relationship and the moving direction and moving speed of the crowd in the image coordinate system. The flight direction and flight speed of the unmanned aerial vehicle are then adjusted according to the moving direction and moving speed of the crowd in the world coordinate system, so as to track and capture the crowd.
As an embodiment, the ground station may identify a human target with abnormal behavior in the image to be analyzed; extract features of the human target; and determine the identity information and/or the motion trail of the human target based on the extracted features.
The extracted features may be face features, clothing features such as clothing color or whether a backpack is carried, or attribute features such as gender and age; the features are not specifically limited. The extracted features may be stored in the form of a preset structure, that is, as structured information.
In one case, the features extracted by the ground station are face features, and the ground station is connected with a database that stores the identity information and face features of a plurality of persons; the ground station matches the extracted face features with the face features in the database, and determines the identity information of the person according to the matching result. Further, the persons participating in the group event can subsequently be investigated according to the identity information.
In one case, the ground station is connected with a plurality of monitoring devices; the ground station identifies, in the images acquired by the plurality of monitoring devices, the human target matching the extracted features, and determines the motion trail of the human target according to the time order in which the human target appears at each monitoring device. This human target, that is, the human target with abnormal behavior in the image to be analyzed, is a person participating in the group event; determining the motion trail of this person facilitates tracking and subsequent processing of the person.
In the embodiments of the present invention, in the first aspect, the neural network model includes at least two branches: one branch counts the crowd density in the image and the other analyzes the crowd behavior in the image, so that whether a group event occurs is analyzed from the two aspects of crowd density and behavior, improving the accuracy of the analysis result. In the second aspect, in the case that it is determined that a group event occurs, the ground station can control the unmanned aerial vehicle to acquire detailed images of the crowd, control the unmanned aerial vehicle to track and capture the crowd, or determine the identity information or motion trail of the persons participating in the event, so as to facilitate subsequent processing of the relevant persons.
An embodiment of the present invention further provides an electronic device, as shown in fig. 8, including a processor 801 and a memory 802, wherein
the memory 802 is configured to store a computer program; and
the processor 801 is configured to implement any of the above image-based crowd situation analysis methods when executing the program stored in the memory 802.
The Memory mentioned in the above electronic device may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
The electronic device may be any of various electronic devices such as an unmanned aerial vehicle, a ground station, a personal computer, or a server, and is not specifically limited.
An embodiment of the present invention further discloses a computer-readable storage medium in which a computer program is stored; when the computer program is executed by a processor, the image-based crowd situation analysis method described above is implemented.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, apparatus embodiments, system embodiments, device embodiments, and computer-readable storage medium embodiments described above are substantially similar to method embodiments, so that the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (19)

1. An image-based crowd situation analysis method, characterized by comprising:
acquiring an image to be analyzed;
inputting the image to be analyzed into a neural network model obtained by pre-training;
counting the crowd density in the image to be analyzed by using a crowd density statistics branch in the neural network model;
analyzing the crowd behavior in the image to be analyzed by using a crowd behavior analysis branch in the neural network model; and
determining whether a group event occurs based on the crowd density and the crowd behavior.
2. The method of claim 1, further comprising:
identifying the crowd density value at each pixel point in the image to be analyzed by using a density distribution prediction branch in the neural network model;
wherein the determining whether a group event occurs based on the crowd density and the crowd behavior comprises:
determining whether a group event occurs based on the crowd density, the crowd behavior, and the crowd density value at each pixel point in the image to be analyzed.
3. The method of claim 2, wherein training the neural network model comprises:
inputting a sample image into a neural network of a preset structure;
performing convolution processing on the sample image by using a convolutional layer in the neural network, and inputting the data obtained after the convolution processing into a crowd density statistics branch, a crowd behavior analysis branch, and a density distribution prediction branch in the neural network, respectively;
iteratively adjusting parameters in the convolutional layer based on a first loss function of the crowd density statistics branch, a first output result of the crowd density statistics branch, a second loss function of the crowd behavior analysis branch, a second output result of the crowd behavior analysis branch, a third loss function of the density distribution prediction branch, and a third output result of the density distribution prediction branch; and
obtaining the trained neural network model when the adjustment of the parameters in the convolutional layer satisfies a convergence condition.
4. The method of claim 1, wherein the determining whether a group event occurs based on the crowd density and the crowd behavior comprises:
determining that a group event occurs if the crowd density exceeds a preset threshold and the crowd behavior is abnormal;
wherein, in the case where it is determined that a group event occurs, the method further comprises: outputting alarm information.
5. The method of claim 1, wherein the acquiring an image to be analyzed comprises:
acquiring the image to be analyzed through a pan-tilt head of an unmanned aerial vehicle;
wherein, in the case where it is determined that a group event occurs, the method further comprises:
adjusting the pan-tilt head, based on position information of the unmanned aerial vehicle, to perform image acquisition for an area where the group event occurs;
or, in the case where it is determined that a group event occurs, determining a moving direction and a moving speed of the crowd based on the image to be analyzed, and controlling the unmanned aerial vehicle to track and capture the crowd according to the moving direction and the moving speed.
6. The method of claim 5, wherein the adjusting the pan-tilt head, based on the position information of the unmanned aerial vehicle, to perform image acquisition for the area where the group event occurs comprises:
determining image coordinates of the crowd in the image to be analyzed; calculating PTZ adjustment information of the pan-tilt head based on the conversion relationship between the PTZ coordinate system of the unmanned aerial vehicle pan-tilt head and the coordinate system of the image to be analyzed, together with the determined image coordinates; and adjusting the pan-tilt head based on the PTZ adjustment information to perform image acquisition for the area where the group event occurs.
7. The method of claim 1, wherein, in the case where it is determined that a group event occurs, the method further comprises:
identifying a human target with abnormal behavior in the image to be analyzed; and
extracting features of the human target, and determining identity information and/or a motion trail of the human target based on the extracted features.
8. An image-based crowd situation analysis apparatus, characterized by comprising:
an obtaining module, configured to obtain an image to be analyzed;
an input module, configured to input the image to be analyzed into a neural network model obtained by pre-training;
a statistics module, configured to count the crowd density in the image to be analyzed by using a crowd density statistics branch in the neural network model;
an analysis module, configured to analyze the crowd behavior in the image to be analyzed by using a crowd behavior analysis branch in the neural network model; and
a judging module, configured to determine whether a group event occurs based on the crowd density and the crowd behavior.
9. The apparatus of claim 8, further comprising:
an identification module, configured to identify the crowd density value at each pixel point in the image to be analyzed by using a density distribution prediction branch in the neural network model;
wherein the judging module is specifically configured to: determine whether a group event occurs based on the crowd density, the crowd behavior, and the crowd density value at each pixel point in the image to be analyzed.
10. The apparatus of claim 9, further comprising:
a training module, configured to input a sample image into a neural network of a preset structure; perform convolution processing on the sample image by using a convolutional layer in the neural network, and input the data obtained after the convolution processing into a crowd density statistics branch, a crowd behavior analysis branch, and a density distribution prediction branch in the neural network, respectively; iteratively adjust parameters in the convolutional layer based on a first loss function of the crowd density statistics branch, a first output result of the crowd density statistics branch, a second loss function of the crowd behavior analysis branch, a second output result of the crowd behavior analysis branch, a third loss function of the density distribution prediction branch, and a third output result of the density distribution prediction branch; and obtain the trained neural network model when the adjustment of the parameters in the convolutional layer satisfies a convergence condition.
11. The apparatus of claim 8, wherein the judging module is specifically configured to:
judge that a group event occurs if the crowd density exceeds a preset threshold and the crowd behavior is abnormal; and the apparatus further comprises:
an alarm module, which is used for outputting alarm information when it is judged that the group event occurs.
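Claim 11's decision rule is a conjunction of two signals: density over a preset threshold and an abnormal behavior flag. A minimal sketch (the threshold value, message text and callback-style alarm output are assumed, not specified by the patent):

```python
def judge_and_alarm(crowd_density, behavior_abnormal,
                    density_threshold=0.5, alarm=print):
    """Judge whether a group event occurs and, if so, output alarm info.

    density_threshold and the message format are illustrative assumptions;
    alarm is any callable accepting the alarm message (print by default).
    """
    event = crowd_density > density_threshold and behavior_abnormal
    if event:
        alarm(f"ALARM: group event (crowd density {crowd_density:.2f} "
              f"exceeds threshold {density_threshold:.2f}, behavior abnormal)")
    return event
```

Requiring both conditions keeps a dense but orderly queue, or a single erratic individual in a sparse scene, from triggering the alarm.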
12. The apparatus of claim 8, wherein the obtaining module is specifically configured to:
acquire the image to be analyzed through a gimbal of an unmanned aerial vehicle; and the apparatus further comprises:
a control module, which is used for adjusting the gimbal, based on the position information of the unmanned aerial vehicle, to acquire images of the region where the group event occurs;
or, when it is judged that a group event occurs, determining the moving direction and the moving speed of the crowd based on the image to be analyzed, and controlling the unmanned aerial vehicle to track the crowd and acquire images according to the moving direction and the moving speed.
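The moving direction and speed in claim 12 can be estimated from the crowd centroid in two consecutive analyzed frames. The pixel-to-metre scale and the frame interval below are assumed calibration values, and the function name is illustrative:

```python
import math

def crowd_motion(centroid_prev, centroid_curr, dt, metres_per_pixel=0.05):
    """Estimate crowd moving direction and speed from two frame centroids.

    Returns (direction in degrees, with 0 along the image +x axis and
    angles measured toward +y; speed in m/s). dt is the time between the
    two analyzed frames; metres_per_pixel is an assumed ground-sampling
    calibration for the aerial view.
    """
    dx = centroid_curr[0] - centroid_prev[0]
    dy = centroid_curr[1] - centroid_prev[1]
    direction = math.degrees(math.atan2(dy, dx)) % 360.0
    speed = math.hypot(dx, dy) * metres_per_pixel / dt
    return direction, speed
```

A tracker would feed these estimates forward to predict where the crowd will be when the unmanned aerial vehicle arrives, rather than steering toward the crowd's last observed position.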
13. The apparatus of claim 12, wherein the control module is further configured to:
determine the image coordinates of the crowd in the image to be analyzed; calculate PTZ adjustment information for the gimbal based on the determined image coordinates and the conversion relationship between the PTZ coordinate system of the unmanned aerial vehicle's gimbal and the coordinate system of the image to be analyzed; and adjust the gimbal based on the PTZ adjustment information to acquire images of the region where the group event occurs.
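Claim 13's conversion from image coordinates to PTZ adjustment can be sketched under a pinhole-camera assumption: the angular offset of the target pixel from the image centre gives the pan/tilt deltas. The focal length in pixels is an assumed calibration value, and a real gimbal controller would also account for zoom and the gimbal's current pose:

```python
import math

def ptz_adjustment(target_xy, image_size, focal_px):
    """Pan/tilt deltas (degrees) that would center the target pixel.

    Pinhole-camera assumption: angular offset = atan(pixel offset /
    focal length in pixels). target_xy is the crowd's image coordinate,
    image_size is (width, height), focal_px an assumed calibration value.
    """
    cx, cy = image_size[0] / 2.0, image_size[1] / 2.0
    delta_pan = math.degrees(math.atan2(target_xy[0] - cx, focal_px))
    delta_tilt = math.degrees(math.atan2(target_xy[1] - cy, focal_px))
    return delta_pan, delta_tilt
```

Using `atan2` rather than a linear pixels-to-degrees factor keeps the mapping accurate for targets far from the image centre, where the small-angle approximation breaks down.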
14. The apparatus of claim 8, further comprising:
the determining module is used for identifying a human body target with abnormal behavior in the image to be analyzed when it is judged that a group event occurs; and for extracting features of the human body target and determining the identity information and/or the motion trajectory of the human body target based on the extracted features.
15. An image-based crowd situation analysis system, comprising: an unmanned aerial vehicle and a ground station; wherein,
the unmanned aerial vehicle is used for acquiring images and sending the acquired images to the ground station;
the ground station is used for receiving the image sent by the unmanned aerial vehicle as the image to be analyzed; inputting the image to be analyzed into a neural network model obtained by pre-training; counting the crowd density in the image to be analyzed by using the crowd density statistics branch in the neural network model; analyzing the crowd behavior in the image to be analyzed by using the crowd behavior analysis branch in the neural network model; and judging whether a group event occurs based on the crowd density and the crowd behavior.
16. The system of claim 15, wherein the ground station is further configured to:
when it is judged that a group event occurs, adjust a gimbal of the unmanned aerial vehicle, based on the position information of the unmanned aerial vehicle, to acquire images of the region where the group event occurs;
or, when it is judged that a group event occurs, determine the moving direction and the moving speed of the crowd based on the image to be analyzed, and control the unmanned aerial vehicle to track the crowd and acquire images according to the moving direction and the moving speed.
17. The system of claim 15, wherein the ground station is further configured to:
identify a human body target with abnormal behavior in the image to be analyzed;
extract features of the human body target, and determine the identity information and/or the motion trajectory of the human body target based on the extracted features.
18. An electronic device comprising a processor and a memory;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 7 when executing a program stored in the memory.
19. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 7.
CN201811494297.6A 2018-12-07 2018-12-07 Crowd situation analysis method, device, equipment and system based on image Active CN111291597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811494297.6A CN111291597B (en) 2018-12-07 2018-12-07 Crowd situation analysis method, device, equipment and system based on image

Publications (2)

Publication Number Publication Date
CN111291597A true CN111291597A (en) 2020-06-16
CN111291597B CN111291597B (en) 2023-10-13

Family

ID=71021286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811494297.6A Active CN111291597B (en) 2018-12-07 2018-12-07 Crowd situation analysis method, device, equipment and system based on image

Country Status (1)

Country Link
CN (1) CN111291597B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232124A (en) * 2020-09-11 2021-01-15 浙江大华技术股份有限公司 Crowd situation analysis method, video processing device and device with storage function

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090222388A1 (en) * 2007-11-16 2009-09-03 Wei Hua Method of and system for hierarchical human/crowd behavior detection
CN105447458A (en) * 2015-11-17 2016-03-30 深圳市商汤科技有限公司 Large scale crowd video analysis system and method thereof
WO2017156443A1 (en) * 2016-03-10 2017-09-14 Rutgers, The State University Of New Jersey Global optimization-based method for improving human crowd trajectory estimation and tracking
US20180005047A1 (en) * 2016-06-30 2018-01-04 Beijing Kuangshi Technology Co., Ltd. Video monitoring method and video monitoring device
CN107729799A * 2017-06-13 2018-02-23 银江股份有限公司 Vision-based crowd abnormal behavior detection and alarm analysis system based on deep convolutional neural networks
CN107944327A * 2016-10-10 2018-04-20 杭州海康威视数字技术股份有限公司 People counting method and device
WO2018121690A1 * 2016-12-29 2018-07-05 北京市商汤科技开发有限公司 Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN108256447A * 2017-12-29 2018-07-06 广州海昇计算机科技有限公司 Unmanned aerial vehicle video analysis method based on deep neural networks
US20180253605A1 * 2017-03-03 2018-09-06 International Business Machines Corporation Crowd detection, analysis, and categorization
CN108549835A * 2018-03-08 2018-09-18 深圳市深网视界科技有限公司 Crowd counting and model construction method, terminal device and storage medium
CN108647592A * 2018-04-26 2018-10-12 长沙学院 Group abnormal event detection method and system based on fully convolutional neural networks
CN108848348A * 2018-07-12 2018-11-20 西南科技大学 Crowd abnormal behavior monitoring device and method based on unmanned aerial vehicle
CN108897342A * 2018-08-22 2018-11-27 江西理工大学 Positioning and tracking method and system for fast-moving civilian multi-rotor unmanned aerial vehicles
CN108921137A * 2018-08-01 2018-11-30 深圳市旭发智能科技有限公司 Unmanned aerial vehicle and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ye Zhipeng et al.: "Detection of abnormal crowd motion states in video scenes", vol. 7, no. 4, pages 45-47 *
Li Baiping; Han Xinyi; Wu Dongmei: "Real-time crowd density estimation based on convolutional neural networks", Journal of Graphics, no. 04 *

Also Published As

Publication number Publication date
CN111291597B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN108062349B (en) Video monitoring method and system based on video structured data and deep learning
US10366595B2 (en) Surveillance method and system based on human behavior recognition
CN109214280B (en) Shop identification method and device based on street view, electronic equipment and storage medium
CN108009473A (en) Based on goal behavior attribute video structural processing method, system and storage device
US20180107182A1 (en) Detection of drones
CN112307868B (en) Image recognition method, electronic device, and computer-readable medium
EP4035070B1 (en) Method and server for facilitating improved training of a supervised machine learning process
US11429820B2 (en) Methods for inter-camera recognition of individuals and their properties
CN111353338B (en) Energy efficiency improvement method based on business hall video monitoring
CN112183166A (en) Method and device for determining training sample and electronic equipment
CN110969215A (en) Clustering method and device, storage medium and electronic device
CN108334831A (en) A kind of monitoring image processing method, monitoring terminal and system
KR102333143B1 (en) System for providing people counting service
CN111488803A (en) Airport target behavior understanding system integrating target detection and target tracking
KR20210062256A (en) Method, program and system to judge abnormal behavior based on behavior sequence
CN111191507A (en) Safety early warning analysis method and system for smart community
CN111291646A (en) People flow statistical method, device, equipment and storage medium
CN114092877A (en) Garbage can unattended system design method based on machine vision
Leonid et al. Human wildlife conflict mitigation using YOLO algorithm
CN117201733B (en) Real-time unmanned aerial vehicle monitoring and sharing system
CN111291597B (en) Crowd situation analysis method, device, equipment and system based on image
KR102411209B1 (en) System and Method for Image Classification Based on Object Detection Events by Edge Device
CN108596068B (en) Method and device for recognizing actions
CN114724011B (en) Behavior determination method and device, storage medium and electronic device
KR20210048271A (en) Apparatus and method for performing automatic audio focusing to multiple objects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant