CN113673360A - Human body distribution detection method, aerial photography device, electronic device, and storage medium - Google Patents


Info

Publication number
CN113673360A
Authority
CN
China
Prior art keywords
human body
feature point
determining
video stream
feature
Prior art date
Legal status
Pending
Application number
CN202110855695.1A
Other languages
Chinese (zh)
Inventor
宋子昂
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110855695.1A
Publication of CN113673360A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a human body distribution detection method, an aerial photography device, an electronic apparatus, and a storage medium. Human body detection is performed on a first video stream captured at a top-down angle by a device moving in the air, to obtain human body position information for each of a plurality of human bodies. Each human body is classified according to its position information, and the state of each human body is determined from the classification result, the state being either a non-isolated state or an isolated state. Route information of the device moving in the air is acquired, a target area in each video frame of the first video stream is determined according to the route information, and the human body distribution characteristics in the target area are determined from the position information and state of each human body. This solves the problem of low accuracy of human body distribution characteristics obtained by top-down detection and improves the accuracy of detecting human body distribution characteristics.

Description

Human body distribution detection method, aerial photography device, electronic device, and storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to a human body distribution detection method, an aerial photography device, an electronic apparatus, and a storage medium.
Background
Dense crowd detection refers to visually detecting and counting the small head targets of a dense crowd by means of an aerial robot.
The related art provides a method for detecting dense crowds based on an unmanned aerial vehicle, in which an unmanned aerial vehicle carrying an embedded GPU (Graphics Processing Unit) computing module replaces a fixed server and fixed monitoring equipment, and a deep learning algorithm computes a crowd density distribution map from which the crowd count is derived. The disadvantages are as follows: counting the crowd from a density distribution map requires a stable shooting height and shooting angle, which can be achieved when the unmanned aerial vehicle hovers with a high-quality gimbal, but this sacrifices the flexible mobility of the unmanned aerial vehicle; moreover, a density distribution map cannot count individual information, so the computed result is likely to deviate significantly from the actual situation.
For the problem in the related art that the accuracy of pedestrian distribution characteristics obtained by top-down detection is low, no effective solution has yet been proposed.
Disclosure of Invention
This embodiment provides a human body distribution detection method, an aerial photography device, an electronic apparatus, and a storage medium, so as to solve the problem in the related art that the accuracy of human body distribution characteristics obtained by top-down detection is low.
In a first aspect, in this embodiment, a human body distribution detection method is provided, including:
acquiring a first video stream captured at a top-down angle by a device moving in the air, and performing human body detection on the first video stream to obtain human body position information of each of a plurality of human bodies;
classifying the human bodies according to the human body position information of the human bodies, and determining the human body states of the human bodies according to the classification result, wherein the human body states comprise a non-isolated state and an isolated state;
acquiring route information of the equipment moving in the air, determining a target area in each video frame in the first video stream according to the route information, and determining human body distribution characteristics in the target area according to the human body position information of each human body and the human body state of each human body.
In some of these embodiments, the route information includes a corresponding timestamp and route location, and determining the target region in each video frame in the first video stream from the route information includes:
dividing the first video stream according to the timestamps to obtain each video frame;
and determining a geographic position corresponding to the target area according to the route position, and mapping the geographic position to each video frame to obtain the target area in each video frame.
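As an illustrative sketch of the timestamp-based division above (the record layout, frame rate, and timestamp units are assumptions, not specified by this disclosure), each route record can be mapped to the index of the video frame captured at that instant:

```python
def frames_for_route(route, fps, start_ts):
    """Map each (timestamp, route_position) record to the index of the
    video frame captured at that instant. `route` is a list of
    (timestamp_seconds, position) pairs; `fps` is the frame rate and
    `start_ts` the timestamp of the first frame (both assumed)."""
    return [(round((ts - start_ts) * fps), pos) for ts, pos in route]
```

The resulting (frame index, route position) pairs let the target area be re-localized in every frame as the device moves.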
In some embodiments, the target area includes a safe area and an unsafe area, and after determining the target area in each video frame in the first video stream according to the route information and determining the human body distribution characteristics in the target area according to the human body position information of each human body and the human body state of each human body, the method further includes:
determining that a human body whose human body state is the non-isolated state and whose human body position information belongs to the safe area is a safe object; and
determining that a human body whose human body state is the isolated state and whose human body position information belongs to the unsafe area is a dangerous object.
In some embodiments, after determining that the human body in the non-isolated state within the safe area is a safe object, and that the human body in the isolated state within the unsafe area is a dangerous object, the method further includes:
determining the number of the safety objects and acquiring the position information of the dangerous objects;
and reporting the number of the safety objects and the position information of the dangerous objects.
In some embodiments, performing human body detection on the first video stream to obtain human body position information of each of a plurality of human bodies includes:
inputting the first video stream to a trained deep convolutional neural network, performing target detection on the first video stream with the heads of the human bodies as targets based on the deep convolutional neural network, determining each head of the detected multiple heads as each human body, and determining the position information of each detected head as the human body position information of each human body.
In some embodiments, inputting the first video stream to a trained deep convolutional neural network, and performing target detection on the first video stream based on the deep convolutional neural network by using a human head as a target comprises:
introducing a channel attention mechanism in the deep convolutional neural network, and determining the weight of each characteristic channel through the channel attention mechanism;
according to the weight of each characteristic channel, carrying out down-sampling processing on the first video stream for multiple times to obtain a multi-scale characteristic map with channel attention;
and performing feature fusion on the multi-scale feature map, and outputting feature information of a plurality of human heads.
In some embodiments, before inputting the first video stream to a trained deep convolutional neural network, and performing target detection on the first video stream based on the deep convolutional neural network by using the human head as a target, the method further includes:
acquiring a second video stream captured at a top-down angle by a device moving in the air;
marking the head of the human body in the second video stream, and dividing the marked video stream into a training set and a test set according to a preset proportion;
constructing a deep convolutional neural network, wherein the deep convolutional neural network comprises N downsampling units, channel attention units are arranged in the first M downsampling units, and N is greater than M;
and training the deep convolutional neural network according to the training set and the test set to obtain the trained deep convolutional neural network.
In some embodiments, classifying the human bodies according to the human body position information of the human bodies, and determining the human body states of the human bodies according to the classification result includes:
acquiring height information of the equipment relative to the ground, and determining a neighborhood radius according to the height information and a preset minimum contained point number;
acquiring a first feature point cluster which takes a human head as a feature point, selecting a first feature point which is not accessed in the first feature point cluster, determining a neighboring feature point within a neighborhood range of the first feature point according to the neighborhood radius and the positions of other feature points except the first feature point, and generating a second feature point cluster according to the first feature point and the neighboring feature point within the neighborhood range of the first feature point;
determining that the feature points in the first feature point cluster and the second feature point cluster belong to a first category, and determining that the feature points outside the first feature point cluster and the second feature point cluster belong to a second category;
determining the human body state corresponding to the feature points of the first category as the non-isolated state, and determining the human body state corresponding to the feature points of the second category as the isolated state.
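A minimal sketch of the height-adaptive neighborhood radius described above (the inverse-proportional scaling, reference height, and base radius are assumptions; the embodiment states only that the radius is determined from the height information and the preset minimum contained point number):

```python
def neighborhood_radius(height_m, base_radius_px=40.0, ref_height_m=10.0):
    """Shrink the pixel-space neighborhood radius as the device flies higher,
    since heads appear closer together in the image at greater altitude.
    All parameter values are illustrative."""
    return base_radius_px * ref_height_m / height_m
```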
In some embodiments, obtaining a first feature point cluster, selecting an unvisited first feature point in the first feature point cluster, determining a neighboring feature point within a neighborhood range of the first feature point according to the neighborhood radius and the positions of other feature points except the first feature point, and generating a second feature point cluster according to the first feature point and the neighboring feature point within the neighborhood range of the first feature point includes:
judging whether the number of adjacent feature points in the neighborhood range of the first feature point reaches the preset minimum contained point number or not;
and under the condition that the number of the adjacent feature points in the first feature point neighborhood range reaches the preset minimum contained point number, generating the second feature point cluster according to the first feature point and the adjacent feature points in the first feature point neighborhood range, and marking the accessed first feature point.
In some embodiments, in the case that it is determined that the number of neighboring feature points within the first feature point neighborhood range does not reach the preset minimum inclusion point number, the method further includes:
and determining that the first characteristic point is a noise point, and determining that the first characteristic point belongs to the second category.
In some embodiments, obtaining a first feature point cluster having a human head as a feature point comprises:
selecting, from the plurality of feature points, a first feature point that has not been visited;
determining neighboring feature points within the neighborhood range of the first feature point according to the neighborhood radius and the positions of the feature points other than the first feature point;
and generating the first feature point cluster according to the first feature point and the neighboring feature points within its neighborhood range.
In a second aspect, the present embodiment provides an aerial photographing apparatus, including a flying body that carries a camera and a processor connected to each other; the camera is configured to capture a video stream, and the processor is configured to execute the human body distribution detection method according to the first aspect.
In a third aspect, in this embodiment, there is provided an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the human body distribution detection method according to the first aspect is implemented.
In a fourth aspect, in the present embodiment, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the human body distribution detection method described in the first aspect above.
Compared with the related art, the human body distribution detection method, aerial photography device, electronic apparatus, and storage medium provided in this embodiment acquire a first video stream captured at a top-down angle by a device moving in the air and perform human body detection on it to obtain human body position information for each of a plurality of human bodies; classify each human body according to its position information and determine each human body's state (a non-isolated state or an isolated state) from the classification result; and acquire route information of the device, determine a target area in each video frame of the first video stream according to the route information, and determine the human body distribution characteristics in the target area from the position information and state of each human body, thereby solving the problem of low accuracy of human body distribution characteristics obtained by top-down detection and improving detection accuracy.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a block diagram of a hardware configuration of a terminal of the human body distribution detection method of the present embodiment;
fig. 2 is a flowchart of a human body distribution detection method of the present embodiment;
FIG. 3 is a training flow chart of the YOLOv3 deep convolutional neural network of the present embodiment;
FIG. 4 is a schematic structural diagram of a YOLOv3 deep convolutional neural network of the present embodiment;
FIG. 5 is a flowchart of the device-height-based adaptive density clustering method of an embodiment;
FIG. 6 is a diagram illustrating the adaptive density clustering result based on the device height according to this embodiment;
fig. 7 is a flowchart of the human body distribution detection method based on the unmanned aerial vehicle according to the preferred embodiment.
Detailed Description
For a clearer understanding of the objects, aspects and advantages of the present application, reference is made to the following description and accompanying drawings.
Unless defined otherwise, technical or scientific terms used herein shall have the same general meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The use of the terms "a," "an," "the," and similar referents in this application does not denote a limitation of quantity, whether singular or plural. The terms "comprises," "comprising," "has," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or modules, but may include other steps or modules not listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" in this application means two or more. "And/or" describes an association relationship between associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In general, the character "/" indicates an "or" relationship between the associated objects. The terms "first," "second," "third," and the like in this application are used to distinguish between similar items and do not necessarily describe a particular sequential or chronological order.
The method embodiments provided in the present embodiment may be executed in a terminal, a computer, or a similar computing device. For example, the method is executed on a terminal, and fig. 1 is a block diagram of a hardware structure of the terminal of the human body distribution detection method according to the embodiment. As shown in fig. 1, the terminal may include one or more processors 102 (only one shown in fig. 1) and a memory 104 for storing data, wherein the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA. The terminal may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those of ordinary skill in the art that the structure shown in fig. 1 is merely an illustration and is not intended to limit the structure of the terminal described above. For example, the terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to the human body distribution detection method in the embodiment, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. The network described above includes a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
In this embodiment, a human body distribution detection method is provided, and fig. 2 is a flowchart of the human body distribution detection method of this embodiment, as shown in fig. 2, the flowchart includes the following steps:
step S1, acquiring a first video stream acquired by the aerial moving device from a top view angle, and performing human body detection on the first video stream to obtain human body position information of each of a plurality of human bodies.
The aerial mobile device may be an aerial vehicle such as a drone or a radar-equipped aircraft, which shoots the land and/or rivers from a top-down view while flying in the air, producing the first video stream.
When detecting human bodies in the first video stream, detection may use hand-crafted features or deep-network feature learning; this embodiment is not limited in this respect. Note that the human body position information is a position relative to the individual video frames of the video stream, not an actual geographic position.
And step S2, classifying each human body according to the human body position information of each human body, and determining the human body state of each human body according to the classification result, wherein the human body state comprises a non-isolated state and an isolated state.
In this embodiment, when the distances between a plurality of human bodies in a group are smaller than a first threshold and/or the number of human bodies in the group is larger than a second threshold, the state of each human body in the group is defined as the non-isolated state, and the state of each human body not in any group is defined as the isolated state.
In a specific implementation, a certain part of the human body may be used as a feature point, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) may be used to cluster the feature points: feature points that can form clusters with other feature points are classified as non-isolated, and feature points that cannot are classified as isolated. Feature points that can form clusters with each other satisfy a preset condition, for example, the distance between them is smaller than a first threshold and/or the number of feature points belonging to the same cluster is larger than a second threshold.
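The clustering step can be sketched with a minimal pure-Python DBSCAN (`eps` and `min_pts` correspond to the neighborhood radius and preset minimum contained point number; the coordinates and thresholds below are illustrative, not from the disclosure):

```python
import math

def dbscan_states(points, eps, min_pts):
    """Label each feature point 'non-isolated' if it joins a density cluster,
    else 'isolated' (noise), following the DBSCAN scheme described above."""
    n = len(points)
    labels = [None] * n                      # cluster id per point, or -1 for noise

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:              # too sparse: tentatively noise
            labels[i] = -1
            continue
        labels[i] = cluster
        seeds = [j for j in nbrs if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster          # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:       # core point: expand the cluster
                seeds.extend(j_nbrs)
        cluster += 1
    return ["non-isolated" if c >= 0 else "isolated" for c in labels]
```

For example, three head points within the radius of one another form a cluster (non-isolated), while a distant head remains noise (isolated).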
Step S3, acquiring route information of the equipment moving in the air, determining a target area in each video frame in the first video stream according to the route information, and determining human body distribution characteristics in the target area according to the human body position information of each human body and the human body state of each human body.
The route information guides the device's patrol and includes corresponding timestamps and route positions, i.e., the planned flight time and flight path of the device. In some embodiments, the first video stream may be divided according to the timestamps to obtain the individual video frames; a geographic position corresponding to the target area is determined from the route position, and that geographic position is mapped to each video frame (that is, converted into a position relative to the video frame) to obtain the target area in each video frame. When the geographic position of the target area is known, the target area in each video frame can be updated in real time from the route information; whether a human body is inside the target area follows from the positional relationship between the human body and the target area, and the human body distribution characteristics in the target area are obtained from the current target area and the human body states.
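A simplified sketch of mapping a geographic position into frame coordinates, assuming a nadir (straight-down) camera aligned north-up; the field of view, image size, and local equirectangular approximation are assumptions not stated in the embodiment:

```python
import math

def geo_to_pixel(lat, lon, cam_lat, cam_lon, height_m,
                 img_w=1920, img_h=1080, fov_deg=84.0):
    """Project a ground point (lat, lon) into the image captured by a camera
    hovering at (cam_lat, cam_lon) at height_m metres, looking straight down."""
    # ground width covered by the horizontal field of view
    ground_w = 2 * height_m * math.tan(math.radians(fov_deg) / 2)
    m_per_px = ground_w / img_w
    # local equirectangular approximation: metres east/north of the camera
    east = (lon - cam_lon) * 111320 * math.cos(math.radians(cam_lat))
    north = (lat - cam_lat) * 110540
    x = img_w / 2 + east / m_per_px
    y = img_h / 2 - north / m_per_px   # image y grows downward
    return x, y
```

Applying this to the corners of the target area's geographic boundary yields its pixel-space polygon in each frame.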
The human body distribution detection method can be applied to monitoring scenes such as traffic road sections or scenic spots, providing technical support for the governance work of the relevant management departments.
Compared with the density-distribution-map approach to crowd detection in the related art, this method distinguishes human bodies in the non-isolated state from those in the isolated state by detecting each body's state, so individual information can be counted and the error between the computed result and the actual situation is reduced. In addition, the target area in each video frame is updated from the route information, and the human body distribution characteristics in the target area are obtained from the current target area and the human body states; by detecting these two dimensions of information, the real-time distribution of human bodies in the target area is obtained, improving the timeliness of the human body distribution characteristics. Through the above steps, the problem of low accuracy of human body distribution characteristics obtained by top-down detection is solved, and the detection accuracy is improved.
In some embodiments, the target area may be divided into a plurality of areas according to actual needs, such as a safe area and an unsafe area, and the human body distribution characteristics in the target area fall into the following cases: a human body in the non-isolated state exists in the safe area; a human body in the isolated state exists in the safe area; a human body in the non-isolated state exists in the unsafe area; or a human body in the isolated state exists in the unsafe area.
After the human body distribution characteristics in the target area are determined, a human body whose state is the non-isolated state and whose position information belongs to the safe area is determined to be a safe object, and a human body whose state is the isolated state and whose position information belongs to the unsafe area is determined to be a dangerous object.
Further, after the human body distribution characteristics are obtained, the number of safe objects is determined, the position information of the dangerous objects is acquired, and both are reported.
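The decision rule above can be sketched as follows (the "other" fallback for the two remaining state/area combinations is an assumption; the embodiment defines only safe and dangerous objects):

```python
def classify_object(body_state, position, in_safe_area):
    """Apply the embodiment's rule: non-isolated bodies inside the safe area
    are safe objects; isolated bodies outside it are dangerous objects.
    `in_safe_area` is a predicate over frame coordinates."""
    if body_state == "non-isolated" and in_safe_area(position):
        return "safe"
    if body_state == "isolated" and not in_safe_area(position):
        return "dangerous"
    return "other"
```

In practice `in_safe_area` could be a point-in-polygon test against the safe area mapped into the current frame.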
When reporting the number of safe objects and the position information of the dangerous objects, the current position may be obtained through the device's Global Positioning System (GPS) and reported to a user as the position information of the dangerous object. The user may be a server, such as a ground command center, or a portable device, such as a smartphone. On-duty personnel can learn the human body distribution characteristics of the target area through the ground command center or smartphone and then drive dangerous objects away from the unsafe area. This combined mode of aerial-device patrol plus ground-personnel intervention improves human-machine coordination and strengthens pedestrian monitoring and management on traffic road sections and in scenic spots.
In some preferred embodiments, a multi-rotor unmanned aerial vehicle may fly along a preset route, patrolling and photographing road sections where dense crowds gather. According to the division characteristics of the road section, the area where human bodies may move is designated the safe area, and the area where they may not is designated the unsafe area. For the video stream captured by the multi-rotor unmanned aerial vehicle, the coordinate information of boundary lines and key areas in each video frame is determined from the route information, and the division between safe and unsafe areas in the images is continuously updated to obtain real-time human body distribution characteristics.
In step S1, the first video stream may be input into the trained deep convolutional neural network, target detection targeting the human head may be performed on the first video stream based on the deep convolutional neural network, each of the detected heads may be determined as a human body, and the position information of each detected head may be determined as the human body position information of that human body. Compared with the manually designed feature extraction used in the related art, extracting human head features with a deep convolutional neural network avoids the complexity and low efficiency of hand-crafted feature design and improves scene adaptability.
Further, to address the low target recall rate caused by head targets of dense crowds being small, jitter-prone, and occupying few pixels under the unmanned aerial vehicle's viewing angle, a channel attention mechanism may be introduced into the deep convolutional neural network in step S1, and the weight of each feature channel is determined through the channel attention mechanism; the first video stream is downsampled multiple times according to the weight of each feature channel to obtain a multi-scale feature map with channel attention; and feature fusion is performed on the multi-scale feature map to output the feature information of multiple human heads.
The training process of the deep convolutional neural network is as follows: acquiring a second video stream acquired by equipment moving in the air in a downward angle; marking the head of the human body in the second video stream, and dividing the marked video stream into a training set and a test set according to a preset proportion; constructing a deep convolutional neural network, wherein the deep convolutional neural network comprises N down-sampling units, wherein the first M down-sampling units are provided with channel attention units, and N is greater than M; and training the deep convolutional neural network according to the training set and the test set to obtain the trained deep convolutional neural network.
The training process of the deep convolutional neural network will be described below by a preferred embodiment.
In some preferred embodiments, a deep convolutional neural network may be constructed using YOLOv3 (You Only Look Once, version 3, a target detection algorithm) as the base structure, and the feature information of feature points is extracted through the YOLOv3 deep convolutional neural network. Fig. 3 is a training flowchart of the YOLOv3 deep convolutional neural network of this embodiment; as shown in Fig. 3, the flow includes the following steps:
Step S11, acquiring 5000 video frames shot from the unmanned aerial vehicle's viewing angle, labeling the human heads, and randomly dividing the labeled video frames into a training set and a test set at a ratio of 8:2, wherein each video frame contains human bodies of different numbers, postures, and degrees of occlusion.
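The random 8:2 division of step S11 can be sketched with the Python standard library as follows; the frame identifiers and the seed are placeholders, not details of this application:

```python
# A minimal sketch of the random 8:2 train/test split described in step S11.
# Frame identifiers (here just integers) and the seed are placeholders.
import random

def split_dataset(frames, train_ratio=0.8, seed=42):
    frames = list(frames)
    random.Random(seed).shuffle(frames)       # random division
    cut = int(len(frames) * train_ratio)
    return frames[:cut], frames[cut:]         # training set, test set

train, test = split_dataset(range(5000))
```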
Step S12, constructing the YOLOv3 deep convolutional neural network. Fig. 4 is a schematic structural diagram of the YOLOv3 deep convolutional neural network of this embodiment. As shown in Fig. 4, a channel attention module is provided in the YOLOv3 deep convolutional neural network, and the channel attention module includes: a Darknetconv2D_BN_Leaky unit, an SE_Res_unit, and an SE_Resblock_body unit.
The Darknetconv2D_BN_Leaky unit includes: a convolutional layer, a batch normalization layer, and a Leaky ReLU activation function.
The SE_Res_unit is a basic attention-residual unit consisting of 2 DBLs, 1 channel attention unit (SE), and a shortcut (skip) connection.
The SE_Resblock_body unit includes: a padding layer, a convolutional layer, and n SE_Res_unit units.
Here DBL is the abbreviation of Darknetconv2D_BN_Leaky.
SE_Res1 represents an SE_Resblock_body (attention-residual block) containing 1 SE_Res_unit, SE_Res2 represents an SE_Resblock_body containing 2 SE_Res_units, and SE_Res8 represents an SE_Resblock_body containing 8 SE_Res_units.
Res4 represents a Resblock_body (residual block) containing 4 Res_units, and Res8 represents a Resblock_body containing 8 Res_units.
y1, y2, and y3 represent the feature layers of the three scales output by the network.
zero_padding represents a zero-padding layer.
SE_resn represents a residual block containing n SE_Res_units.
BN represents batch normalization (BatchNormalization).
leaky_relu represents the Leaky ReLU activation function.
concat represents the feature concatenation function.
add represents the shortcut (element-wise addition) function.
SE represents the attention mechanism unit.
Channel attention units are introduced at specific positions in the first 3 downsampling units of the Darknet-53 feature extraction network (the feature extractor of YOLOv3, containing 53 convolutional layers); for example, SENet (Squeeze-and-Excitation Network) is used as the channel attention unit. Squeeze denotes the following operation: feature compression is performed along the spatial dimension so that each two-dimensional feature channel becomes a single real number; this real number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels. Excitation denotes the following operation: a mechanism similar to the gate in a recurrent neural network generates a weight for each feature channel through a learned parameter w, which explicitly models the correlation between feature channels.
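As a purely numerical illustration of the Squeeze and Excitation operations just described, the following plain-Python sketch pools each channel to one real number and then produces a per-channel weight; the tiny feature map and the weights w1/w2 are made-up values, not parameters of the network in this application:

```python
# Illustrative squeeze-and-excitation computation on a tiny 2-channel,
# 2x2 feature map. All numbers are made up for demonstration.
import math

def squeeze(feature_maps):
    # Squeeze: global average pooling turns each HxW channel into one real
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]

def excitation(z, w1, w2):
    # Excitation: two tiny "fully connected" layers, ReLU then sigmoid,
    # producing one weight in (0, 1) per feature channel
    h = [max(0.0, sum(w1[i][j] * z[j] for j in range(len(z))))
         for i in range(len(w1))]
    return [1.0 / (1.0 + math.exp(-sum(w2[i][j] * h[j] for j in range(len(h)))))
            for i in range(len(w2))]

fmap = [[[1.0, 3.0], [5.0, 7.0]],     # channel 0 (mean 4.0)
        [[0.0, 0.0], [0.0, 2.0]]]     # channel 1 (mean 0.5)
z = squeeze(fmap)                     # one real number per channel
s = excitation(z, w1=[[0.5, 0.0], [0.0, 0.5]], w2=[[1.0, 0.0], [0.0, 1.0]])
# Rescale: each channel is multiplied by its learned attention weight
scaled = [[[v * s[c] for v in row] for row in fmap[c]] for c in range(2)]
```

The channel with the stronger pooled response (channel 0) receives the larger weight here, which is the "increase the weight of effective feature channels, suppress unimportant ones" behavior described for the head-detection network.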
Step S13, after the video frame is downsampled five times by the Darknet-53 feature extraction network, feature maps of 3 scales are output, and after feature fusion through the FPN (Feature Pyramid Network), a feature map for direct prediction is output.
Step S14, outputting the feature information of each feature point according to the feature map for direct prediction.
In this embodiment, an attention mechanism is introduced into the Darknet-53 feature extraction network to increase the weights of the effective feature channels of small head targets in dense crowds and suppress the weights of unimportant features, thereby increasing the recall rate of small head targets at the top-down viewing angle. The recall rate increased by more than 20% with only a 0.8% increase in parameter count and no significant drop in detection speed.
Based on the feature points detected by the deep convolutional neural network, in step S2 the feature information of the feature points may be placed into a two-dimensional rectangular coordinate system and clustered with a density clustering algorithm (DBSCAN). The density clustering algorithm requires two preset hyperparameters: the neighborhood radius ε and the minimum contained point number MinPts. Fig. 5 is a flowchart of the height-adaptive density clustering method based on device height according to an embodiment; as shown in Fig. 5, the process includes the following steps:
Step S21, acquiring the height information H of the device relative to the ground, and determining the neighborhood radius ε according to the height information H and the preset minimum contained point number MinPts. When MinPts is 10, the relationship between the neighborhood radius ε and the device height H can be expressed as follows:
[Formula image (BDA0003184071320000101) in the original publication: ε expressed as a function of H; not reproduced in this text.]
step S22, a first feature point cluster which takes the head of a human body as a feature point is obtained, a first feature point which is not accessed in the first feature point cluster is selected, a neighboring feature point in the neighborhood range of the first feature point is determined according to the neighborhood radius and the positions of other feature points except the first feature point, and a second feature point cluster is generated according to the first feature point and the neighboring feature point in the neighborhood range of the first feature point.
Step S23, determining that the feature points in the first feature point cluster and the second feature point cluster belong to the first category, and determining that the feature points outside the first feature point cluster and the second feature point cluster belong to the second category.
In step S24, the human body state corresponding to the feature points of the first category is determined to be a non-isolated state, and the human body state corresponding to the feature points of the second category is determined to be an isolated state.
In this embodiment, a height-adaptive parameter setting mode is adopted, making full use of the advantages of the density clustering algorithm, so that human bodies in the non-isolated state and human bodies in the isolated state can be effectively distinguished.
In step S22, it is further determined whether the number of neighboring feature points within the first feature point neighborhood range reaches a preset minimum inclusion point number. And under the condition that the number of the adjacent feature points in the neighborhood range of the first feature point is judged to reach the preset minimum contained point number, generating a second feature point cluster according to the first feature point and the adjacent feature points in the neighborhood range of the first feature point, and marking the accessed first feature point. And under the condition that the number of the adjacent characteristic points in the neighborhood range of the first characteristic point is judged not to reach the preset minimum contained point number, determining the first characteristic point as a noise point, and determining that the first characteristic point belongs to the second category.
The first feature point cluster can be acquired by the following method: selecting a first feature point from a plurality of feature points which are not accessed by any feature point, determining adjacent feature points in the neighborhood range of the first feature point according to the neighborhood radius and the positions of other feature points except the first feature point, and generating a first feature point cluster according to the first feature point and the adjacent feature points in the neighborhood range of the first feature point.
In some embodiments, the cluster is expanded by recursively applying step S22 to the unvisited points within it. Once the cluster is fully expanded, that is, all points in the cluster have been visited, steps S21-S24 are repeated for unvisited points outside the cluster until all detected feature points of the video frame are included in the visited set. The clustering process is illustrated in Fig. 6, a schematic diagram of the height-adaptive density clustering result of this embodiment, in which small black dots represent detected feature points, circle centers are visited feature points, circle radii equal the neighborhood radius ε, and arrow directions and connections represent the visiting order and paths of the feature points.
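A condensed plain-Python sketch of the clustering of steps S21-S24 follows; the function and variable names are illustrative, and because the height-to-radius formula appears only as an image in the original publication, the neighborhood radius ε is passed in directly rather than computed from H:

```python
# DBSCAN-style sketch of steps S21-S24. Head feature points whose
# neighborhood contains at least min_pts neighbors seed a cluster
# ("non-isolated"); points in no cluster remain "isolated" (noise).
import math

def neighbors(points, i, eps):
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) <= eps]

def classify_states(points, eps, min_pts=10):
    """Return 'non-isolated' / 'isolated' per head feature point."""
    visited, state = set(), ["isolated"] * len(points)
    for i in range(len(points)):
        if i in visited:
            continue
        visited.add(i)
        nbrs = neighbors(points, i, eps)
        if len(nbrs) < min_pts:
            continue                      # noise point -> second category
        cluster = [i] + nbrs              # first + second feature point cluster
        for j in cluster:                 # recursive expansion (step S22)
            state[j] = "non-isolated"
            if j in visited:
                continue
            visited.add(j)
            more = neighbors(points, j, eps)
            if len(more) >= min_pts:
                cluster.extend(k for k in more if k not in cluster)
    return state
```

The default `min_pts=10` mirrors the MinPts value given in the embodiment; smaller values are used below only to keep the example tiny.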
Fig. 7 is a flowchart of the human body distribution detection method based on the unmanned aerial vehicle according to the preferred embodiment, and as shown in fig. 7, the flowchart includes the following steps:
Step S71, the unmanned aerial vehicle performs patrol shooting to obtain a video stream.
Step S72, detecting human heads from the video stream, processing the feature points with the density clustering algorithm, distinguishing human body states, and determining the human bodies in the non-isolated state and the human bodies in the isolated state.
Step S73, performing area identification on the video stream and determining the safe area and the unsafe area.
Step S74, determining a human body in the non-isolated state and within the safe area as a safe object, and a human body in the isolated state and within the unsafe area as a dangerous object.
Step S75, counting the pedestrian volume of safe objects and acquiring the positions of dangerous objects.
Step S76, integrating the pedestrian volume of safe objects and the positions of dangerous objects, and sending the integrated information to the ground command center.
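Steps S75 and S76 amount to counting and packaging; a hypothetical sketch (the field names are illustrative, not from this application):

```python
# Hedged sketch of steps S75-S76: count safe objects and package
# dangerous-object positions plus the device GPS for reporting.
def build_report(safe_objects, dangerous_positions, gps_position):
    return {
        "pedestrian_volume": len(safe_objects),       # step S75: counting
        "dangerous_objects": list(dangerous_positions),
        "device_gps": gps_position,                   # position to report
    }

report = build_report(["p1", "p2", "p3"], [(120.1, 30.2)], (120.0, 30.0))
```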
With reference to the human body distribution detection method in the foregoing embodiments, this embodiment also provides an aerial photography device, including a flight body provided with a camera and a processor connected to each other; the camera is used for shooting human body images, and the processor is used for executing the human body distribution detection method of any of the above embodiments.
The aerial photography device can effectively detect dense crowds and raise an alarm when an outlier individual enters a forbidden zone. The aerial photography device includes, but is not limited to, an unmanned aerial vehicle with camera and positioning functions, and a radar.
There is also provided in this embodiment an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring a first video stream acquired at a top-down (overlooking) angle by a device moving in the air, and performing human body detection on the first video stream to obtain the human body position information of each of a plurality of human bodies.
S2, classifying the human bodies according to their human body position information, and determining the human body state of each human body according to the classification result, wherein the human body states include a non-isolated state and an isolated state.
S3, acquiring the route information of the device moving in the air, determining the target area in each video frame of the first video stream according to the route information, and determining the human body distribution characteristics in the target area according to the human body position information and the human body state of each human body.
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations, and details are not described again in this embodiment.
In addition, in combination with the human body distribution detection method provided in the above embodiment, a storage medium may also be provided to implement the method in this embodiment. The storage medium having stored thereon a computer program; the computer program, when executed by a processor, implements any of the human distribution detection methods in the above embodiments.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be derived by a person skilled in the art from the examples provided herein without any inventive step, shall fall within the scope of protection of the present application.
It is obvious that the drawings are only examples or embodiments of the present application, and it is obvious to those skilled in the art that the present application can be applied to other similar cases according to the drawings without creative efforts. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
The term "embodiment" is used herein to mean that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly or implicitly understood by one of ordinary skill in the art that the embodiments described in this application may be combined with other embodiments without conflict.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the patent protection. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (14)

1. A human body distribution detection method is characterized by comprising the following steps:
acquiring a first video stream acquired by equipment moving in the air at an overlooking angle, and carrying out human body detection on the first video stream to obtain human body position information of each human body in a plurality of human bodies;
classifying the human bodies according to the human body position information of the human bodies, and determining the human body states of the human bodies according to the classification result, wherein the human body states comprise a non-isolated state and an isolated state;
acquiring route information of the equipment moving in the air, determining a target area in each video frame in the first video stream according to the route information, and determining human body distribution characteristics in the target area according to the human body position information of each human body and the human body state of each human body.
2. The human body distribution detection method of claim 1, wherein the course information includes a corresponding timestamp and a course position, and determining the target area in each video frame in the first video stream according to the course information comprises:
dividing the first video stream according to the timestamps to obtain each video frame;
and determining a geographic position corresponding to the target area according to the route position, and mapping the geographic position to each video frame to obtain the target area in each video frame.
3. The human body distribution detection method according to claim 1, wherein the target areas include a safe area and an unsafe area, and after determining the target areas in the respective video frames in the first video stream according to the route information and determining the human body distribution characteristics in the target areas according to the human body position information of the respective human bodies and the human body states of the respective human bodies, the method further comprises:
determining that the human body with the human body state as the non-isolated state and the human body position information belonging to the safety region is a safety object; and,
and determining that the human body with the human body state as the isolated state and the human body position information belonging to the unsafe area is a dangerous object.
4. The human body distribution detection method according to claim 3, wherein a human body whose human body state is determined to be the non-isolated state and whose human body position information belongs to the safety area is a safety object; and after determining that the human body with the human body state being the isolated state and the human body position information belonging to the unsafe area is a dangerous object, the method further comprises:
determining the number of the safety objects and acquiring the position information of the dangerous objects;
and reporting the number of the safety objects and the position information of the dangerous objects.
5. The method of claim 1, wherein performing human body detection on the first video stream to obtain human body position information of each of a plurality of human bodies comprises:
inputting the first video stream to a trained deep convolutional neural network, performing target detection on the first video stream with the heads of the human bodies as targets based on the deep convolutional neural network, determining each head of the detected multiple heads as each human body, and determining the position information of each detected head as the human body position information of each human body.
6. The method according to claim 5, wherein the first video stream is input to a trained deep convolutional neural network, and the target detection of the first video stream based on the deep convolutional neural network by targeting at the head of the human body comprises:
introducing a channel attention mechanism in the deep convolutional neural network, and determining the weight of each characteristic channel through the channel attention mechanism;
according to the weight of each characteristic channel, carrying out down-sampling processing on the first video stream for multiple times to obtain a multi-scale characteristic map with channel attention;
and performing feature fusion on the multi-scale feature map, and outputting feature information of a plurality of human heads.
7. The method according to claim 5, wherein before inputting the first video stream to a trained deep convolutional neural network and performing target detection on the first video stream based on the deep convolutional neural network, the method further comprises:
acquiring a second video stream acquired by equipment moving in the air in a downward angle;
marking the head of the human body in the second video stream, and dividing the marked video stream into a training set and a test set according to a preset proportion;
constructing a deep convolutional neural network, wherein the deep convolutional neural network comprises N downsampling units, channel attention units are arranged in the first M downsampling units, and N is greater than M;
and training the deep convolutional neural network according to the training set and the test set to obtain the trained deep convolutional neural network.
8. The method according to claim 1, wherein classifying the human bodies according to the human body position information of the human bodies, and determining the human body states of the human bodies according to the classification result comprises:
acquiring height information of the equipment relative to the ground, and determining a neighborhood radius according to the height information and a preset minimum contained point number;
acquiring a first feature point cluster which takes a human head as a feature point, selecting a first feature point which is not accessed in the first feature point cluster, determining a neighboring feature point within a neighborhood range of the first feature point according to the neighborhood radius and the positions of other feature points except the first feature point, and generating a second feature point cluster according to the first feature point and the neighboring feature point within the neighborhood range of the first feature point;
determining that the feature points in the first feature point cluster and the second feature point cluster belong to a first category, and determining that the feature points outside the first feature point cluster and the second feature point cluster belong to a second category;
determining the human body state corresponding to the feature points of the first category as the non-isolated state, and determining the human body state corresponding to the feature points of the second category as the isolated state.
9. The method of claim 8, wherein the steps of obtaining a first feature point cluster, selecting an unvisited first feature point in the first feature point cluster, determining a neighboring feature point within a neighborhood range of the first feature point according to the neighborhood radius and positions of other feature points except the first feature point, and generating a second feature point cluster according to the first feature point and the neighboring feature point within the neighborhood range of the first feature point comprise:
judging whether the number of adjacent feature points in the neighborhood range of the first feature point reaches the preset minimum contained point number or not;
and under the condition that the number of the adjacent feature points in the first feature point neighborhood range reaches the preset minimum contained point number, generating the second feature point cluster according to the first feature point and the adjacent feature points in the first feature point neighborhood range, and marking the accessed first feature point.
10. The method of claim 8, wherein when determining that the number of neighboring feature points within the neighborhood of the first feature point does not reach the predetermined minimum inclusion point number, the method further comprises:
and determining that the first characteristic point is a noise point, and determining that the first characteristic point belongs to the second category.
11. The human body distribution detection method according to claim 8, wherein acquiring the first feature point cluster having the human head as a feature point comprises:
selecting a first characteristic point from the plurality of characteristic points which are not accessed by any characteristic point;
determining adjacent feature points in the neighborhood range of the first feature point according to the neighborhood radius and the positions of other feature points except the first feature point;
and generating the first characteristic point cluster according to the first characteristic point and the adjacent characteristic points in the neighborhood range of the first characteristic point.
12. An aerial device, comprising: the flight device comprises a flight machine body, a control unit and a control unit, wherein the flight machine body is provided with a camera and a processor which are connected with each other; the camera is used for shooting a video stream; the processor is configured to perform the human body distribution detection method of any one of claims 1 to 11.
13. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to perform the human body distribution detection method of any one of claims 1 to 11.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the human distribution detection method of any one of claims 1 to 11.
CN202110855695.1A 2021-07-28 2021-07-28 Human body distribution detection method, aerial photography device, electronic device, and storage medium Pending CN113673360A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110855695.1A CN113673360A (en) 2021-07-28 2021-07-28 Human body distribution detection method, aerial photography device, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110855695.1A CN113673360A (en) 2021-07-28 2021-07-28 Human body distribution detection method, aerial photography device, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN113673360A true CN113673360A (en) 2021-11-19

Family

ID=78540418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110855695.1A Pending CN113673360A (en) 2021-07-28 2021-07-28 Human body distribution detection method, aerial photography device, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN113673360A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100254614A1 (en) * 2009-04-01 2010-10-07 Microsoft Corporation Clustering videos by location
WO2011010598A1 (en) * 2009-07-23 2011-01-27 オリンパス株式会社 Image processing device, image processing program and image processing method
CN102364944A (en) * 2011-11-22 2012-02-29 电子科技大学 Video monitoring method for preventing gathering of people
WO2012058024A1 (en) * 2010-10-28 2012-05-03 Eastman Kodak Company Method of locating nearby picture hotspots
US20120233000A1 (en) * 2011-03-07 2012-09-13 Jon Fisher Systems and methods for analytic data gathering from image providers at an event or geographic location
CN103903276A (en) * 2014-04-23 2014-07-02 吉林大学 Driver fixation point clustering method based on density clustering method and morphology clustering method
CN109618134A (en) * 2018-12-10 2019-04-12 北京智汇云舟科技有限公司 A kind of unmanned plane dynamic video three-dimensional geographic information real time fusion system and method
EP3525131A1 (en) * 2018-02-09 2019-08-14 Bayerische Motoren Werke Aktiengesellschaft Methods and apparatuses for object detection in a scene represented by depth data of a range detection sensor and image data of a camera
CN110458854A (en) * 2018-05-02 2019-11-15 北京图森未来科技有限公司 A kind of road edge detection method and device
CN111191637A (en) * 2020-02-26 2020-05-22 电子科技大学中山学院 Crowd concentration detection and presentation method based on unmanned aerial vehicle video acquisition
CN111339945A (en) * 2020-02-26 2020-06-26 贵州安防工程技术研究中心有限公司 Video-based people group and scatter inspection method and system
US20210182604A1 (en) * 2017-07-05 2021-06-17 Perceptive Automata, Inc. System and method of predicting human interaction with vehicles
CN112986982A (en) * 2021-05-12 2021-06-18 长沙万为机器人有限公司 Environment map reference positioning method and device and mobile robot

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100254614A1 (en) * 2009-04-01 2010-10-07 Microsoft Corporation Clustering videos by location
WO2011010598A1 (en) * 2009-07-23 2011-01-27 オリンパス株式会社 Image processing device, image processing program and image processing method
WO2012058024A1 (en) * 2010-10-28 2012-05-03 Eastman Kodak Company Method of locating nearby picture hotspots
US20120233000A1 (en) * 2011-03-07 2012-09-13 Jon Fisher Systems and methods for analytic data gathering from image providers at an event or geographic location
CN102364944A (en) * 2011-11-22 2012-02-29 电子科技大学 Video monitoring method for preventing gathering of people
CN103903276A (en) * 2014-04-23 2014-07-02 吉林大学 Driver fixation point clustering method based on density clustering method and morphology clustering method
US20210182604A1 (en) * 2017-07-05 2021-06-17 Perceptive Automata, Inc. System and method of predicting human interaction with vehicles
EP3525131A1 (en) * 2018-02-09 2019-08-14 Bayerische Motoren Werke Aktiengesellschaft Methods and apparatuses for object detection in a scene represented by depth data of a range detection sensor and image data of a camera
CN110458854A (en) * 2018-05-02 2019-11-15 北京图森未来科技有限公司 Road edge detection method and device
CN109618134A (en) * 2018-12-10 2019-04-12 北京智汇云舟科技有限公司 Real-time fusion system and method for UAV dynamic video and three-dimensional geographic information
CN111339945A (en) * 2020-02-26 2020-06-26 贵州安防工程技术研究中心有限公司 Video-based people group and scatter inspection method and system
CN111191637A (en) * 2020-02-26 2020-05-22 电子科技大学中山学院 Crowd concentration detection and presentation method based on unmanned aerial vehicle video acquisition
CN112986982A (en) * 2021-05-12 2021-06-18 长沙万为机器人有限公司 Environment map reference positioning method and device and mobile robot

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhao Shuang; Huang Huaiyu; Hu Yiming; Lou Xiaoping; Wang Xingang: "Vehicle Detection in UAV Aerial Images Based on Deep Learning", Computer Applications (计算机应用), no. 2, 30 December 2019 (2019-12-30) *

Similar Documents

Publication Publication Date Title
Chen et al. An edge traffic flow detection scheme based on deep learning in an intelligent transportation system
Wu et al. Deep learning for unmanned aerial vehicle-based object detection and tracking: A survey
US11941887B2 (en) Scenario recreation through object detection and 3D visualization in a multi-sensor environment
EP3766044B1 (en) Three-dimensional environment modeling based on a multicamera convolver system
CN111295689B (en) Depth aware object counting
WO2020052678A1 (en) Method and system for generating synthetic point cloud data using a generative model
CN103609178B (en) The identification of place auxiliary
CN102834843B (en) Method and apparatus for face detection
CN112070807B (en) Multi-target tracking method and electronic device
CN109643489A (en) Three-dimensional information processing method and three-dimensional information processing unit
CN110533700A (en) Object tracking method and device, storage medium, and electronic device
CN109063549A (en) Moving object detection method for high-resolution aerial video based on deep neural networks
CN110087041B (en) Video data processing and transmitting method and system based on 5G base station
CN110555378A (en) Weather prediction method, system, and device based on live video
CN111899279A (en) Method and device for detecting motion speed of target object
Huang et al. V2X cooperative perception for autonomous driving: Recent advances and challenges
CN110738076A (en) People counting method and system in images
CN112907972B (en) Road vehicle flow detection method and system based on unmanned aerial vehicle and computer readable storage medium
CN112668675B (en) Image processing method and device, computer equipment and storage medium
CN113743151A (en) Method and device for detecting objects spilled on a road surface, and storage medium
CN113673360A (en) Human body distribution detection method, aerial photography device, electronic device, and storage medium
Delleji et al. An Improved YOLOv5 for Real-time Mini-UAV Detection in No Fly Zones.
CN115880538A (en) Domain generalization method and device for image processing models, and image processing
CN114862952A (en) Unmanned aerial vehicle detection and defense method and system
CN112446355B (en) Pedestrian recognition method and people flow statistics system for public places

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination