CN113673360A - Human body distribution detection method, aerial photography device, electronic device, and storage medium - Google Patents


Info

Publication number
CN113673360A
Authority
CN
China
Prior art keywords
human body
feature point
determining
video stream
feature
Prior art date
Legal status
Pending
Application number
CN202110855695.1A
Other languages
Chinese (zh)
Inventor
宋子昂
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110855695.1A
Publication of CN113673360A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a human body distribution detection method, an aerial photography device, an electronic apparatus, and a storage medium. Human body detection is performed on a first video stream captured at a top-down angle by a device moving in the air, to obtain human body position information for each of a plurality of human bodies. Each human body is classified according to its position information, and the state of each human body is determined from the classification result, the state being either a non-isolated state or an isolated state. Route information of the device moving in the air is acquired, a target area in each video frame of the first video stream is determined according to the route information, and the human body distribution characteristics in the target area are determined from the position information and state of each human body. This solves the problem of low accuracy of human body distribution characteristics obtained by top-down detection and improves the accuracy of detecting human body distribution characteristics.

Description

Human body distribution detection method, aerial photography device, electronic device, and storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to a human body distribution detection method, an aerial photography device, an electronic apparatus, and a storage medium.
Background
Dense crowd detection refers to visually detecting and counting the small head targets of a dense crowd by means of an aerial robot.
The related art provides a method for detecting dense crowds based on an unmanned aerial vehicle, in which an unmanned aerial vehicle carrying an embedded GPU (Graphics Processing Unit) computing module replaces a fixed server and fixed monitoring equipment, and a deep learning algorithm computes a crowd density distribution map from which the crowd count is derived. The disadvantages are as follows: counting the crowd from a density distribution map requires a stable shooting height and shooting angle, which can be achieved when the unmanned aerial vehicle hovers with a high-quality gimbal, but this sacrifices the flexible mobility of the unmanned aerial vehicle; moreover, a density distribution map cannot count individual information, so the computed result is likely to deviate significantly from the actual situation.
For the problem in the related art that the accuracy of pedestrian distribution characteristics obtained by top-down detection is low, no effective solution has yet been proposed.
Disclosure of Invention
This embodiment provides a human body distribution detection method, an aerial photography device, an electronic apparatus, and a storage medium, so as to solve the problem in the related art that the accuracy of human body distribution characteristics obtained by top-down detection is low.
In a first aspect, in this embodiment, a human body distribution detection method is provided, including:
acquiring a first video stream captured at a top-down angle by a device moving in the air, and performing human body detection on the first video stream to obtain human body position information of each of a plurality of human bodies;
classifying the human bodies according to the human body position information of the human bodies, and determining the human body states of the human bodies according to the classification result, wherein the human body states comprise a non-isolated state and an isolated state;
acquiring route information of the equipment moving in the air, determining a target area in each video frame in the first video stream according to the route information, and determining human body distribution characteristics in the target area according to the human body position information of each human body and the human body state of each human body.
In some of these embodiments, the route information includes a corresponding timestamp and route location, and determining the target region in each video frame in the first video stream from the route information includes:
dividing the first video stream according to the timestamps to obtain each video frame;
and determining a geographic position corresponding to the target area according to the route position, and mapping the geographic position to each video frame to obtain the target area in each video frame.
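As an illustrative sketch of the timestamp-based division above (the record layout, frame rate, and timestamp units are assumptions, not specified by this disclosure), each route record can be mapped to the index of the video frame captured at that instant:

```python
def frames_for_route(route, fps, start_ts):
    """Map each (timestamp, route_position) record to the index of the
    video frame captured at that instant. `route` is a list of
    (timestamp_seconds, position) pairs; `fps` is the frame rate and
    `start_ts` the timestamp of the first frame (both assumed)."""
    return [(round((ts - start_ts) * fps), pos) for ts, pos in route]
```

The resulting (frame index, route position) pairs let the target area be re-localized in every frame as the device moves.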
In some embodiments, the target area includes a safe area and an unsafe area, and after determining the target area in each video frame in the first video stream according to the route information and determining the human body distribution characteristics in the target area according to the human body position information of each human body and the human body state of each human body, the method further includes:
determining that a human body whose human body state is the non-isolated state and whose human body position information belongs to the safe area is a safe object; and
determining that a human body whose human body state is the isolated state and whose human body position information belongs to the unsafe area is a dangerous object.
In some embodiments, after determining that the human body in the non-isolated state within the safe area is a safe object, and that the human body in the isolated state within the unsafe area is a dangerous object, the method further includes:
determining the number of the safety objects and acquiring the position information of the dangerous objects;
and reporting the number of the safety objects and the position information of the dangerous objects.
In some embodiments, performing human body detection on the first video stream to obtain human body position information of each of a plurality of human bodies includes:
inputting the first video stream to a trained deep convolutional neural network, performing target detection on the first video stream with the heads of the human bodies as targets based on the deep convolutional neural network, determining each head of the detected multiple heads as each human body, and determining the position information of each detected head as the human body position information of each human body.
In some embodiments, inputting the first video stream to a trained deep convolutional neural network, and performing target detection on the first video stream based on the deep convolutional neural network by using a human head as a target comprises:
introducing a channel attention mechanism in the deep convolutional neural network, and determining the weight of each characteristic channel through the channel attention mechanism;
according to the weight of each characteristic channel, carrying out down-sampling processing on the first video stream for multiple times to obtain a multi-scale characteristic map with channel attention;
and performing feature fusion on the multi-scale feature map, and outputting feature information of a plurality of human heads.
In some embodiments, before inputting the first video stream to a trained deep convolutional neural network, and performing target detection on the first video stream based on the deep convolutional neural network by using the human head as a target, the method further includes:
acquiring a second video stream captured at a top-down angle by a device moving in the air;
marking the head of the human body in the second video stream, and dividing the marked video stream into a training set and a test set according to a preset proportion;
constructing a deep convolutional neural network, wherein the deep convolutional neural network comprises N downsampling units, channel attention units are arranged in the first M downsampling units, and N is greater than M;
and training the deep convolutional neural network according to the training set and the test set to obtain the trained deep convolutional neural network.
In some embodiments, classifying the human bodies according to the human body position information of the human bodies, and determining the human body states of the human bodies according to the classification result includes:
acquiring height information of the equipment relative to the ground, and determining a neighborhood radius according to the height information and a preset minimum contained point number;
acquiring a first feature point cluster which takes a human head as a feature point, selecting a first feature point which is not accessed in the first feature point cluster, determining a neighboring feature point within a neighborhood range of the first feature point according to the neighborhood radius and the positions of other feature points except the first feature point, and generating a second feature point cluster according to the first feature point and the neighboring feature point within the neighborhood range of the first feature point;
determining that the feature points in the first feature point cluster and the second feature point cluster belong to a first category, and determining that the feature points outside the first feature point cluster and the second feature point cluster belong to a second category;
determining the human body state corresponding to the feature points of the first category as the non-isolated state, and determining the human body state corresponding to the feature points of the second category as the isolated state.
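A minimal sketch of the height-adaptive neighborhood radius described above (the inverse-proportional scaling, reference height, and base radius are assumptions; the embodiment states only that the radius is determined from the height information and the preset minimum contained point number):

```python
def neighborhood_radius(height_m, base_radius_px=40.0, ref_height_m=10.0):
    """Shrink the pixel-space neighborhood radius as the device flies higher,
    since heads appear closer together in the image at greater altitude.
    All parameter values are illustrative."""
    return base_radius_px * ref_height_m / height_m
```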
In some embodiments, obtaining a first feature point cluster, selecting an unvisited first feature point in the first feature point cluster, determining a neighboring feature point within a neighborhood range of the first feature point according to the neighborhood radius and the positions of other feature points except the first feature point, and generating a second feature point cluster according to the first feature point and the neighboring feature point within the neighborhood range of the first feature point includes:
judging whether the number of adjacent feature points in the neighborhood range of the first feature point reaches the preset minimum contained point number or not;
and under the condition that the number of the adjacent feature points in the first feature point neighborhood range reaches the preset minimum contained point number, generating the second feature point cluster according to the first feature point and the adjacent feature points in the first feature point neighborhood range, and marking the accessed first feature point.
In some embodiments, in the case that it is determined that the number of neighboring feature points within the first feature point neighborhood range does not reach the preset minimum inclusion point number, the method further includes:
and determining that the first characteristic point is a noise point, and determining that the first characteristic point belongs to the second category.
In some embodiments, obtaining a first feature point cluster having a human head as a feature point comprises:
selecting, from the plurality of feature points, a first feature point that has not been visited;
determining neighboring feature points within the neighborhood range of the first feature point according to the neighborhood radius and the positions of the feature points other than the first feature point;
and generating the first feature point cluster according to the first feature point and the neighboring feature points within its neighborhood range.
In a second aspect, the present embodiment provides an aerial photographing apparatus, including a flying body that carries a camera and a processor connected to each other; the camera is configured to capture a video stream, and the processor is configured to execute the human body distribution detection method according to the first aspect.
In a third aspect, in this embodiment, there is provided an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the human body distribution detection method according to the first aspect is implemented.
In a fourth aspect, in the present embodiment, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the human body distribution detection method described in the first aspect above.
Compared with the related art, the human body distribution detection method, aerial photography device, electronic apparatus, and storage medium provided in this embodiment acquire a first video stream captured at a top-down angle by a device moving in the air and perform human body detection on it to obtain human body position information for each of a plurality of human bodies; classify each human body according to its position information and determine each human body's state (a non-isolated state or an isolated state) from the classification result; and acquire route information of the device, determine a target area in each video frame of the first video stream according to the route information, and determine the human body distribution characteristics in the target area from the position information and state of each human body, thereby solving the problem of low accuracy of human body distribution characteristics obtained by top-down detection and improving detection accuracy.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a block diagram of a hardware configuration of a terminal of the human body distribution detection method of the present embodiment;
fig. 2 is a flowchart of a human body distribution detection method of the present embodiment;
FIG. 3 is a training flow chart of the YOLOv3 deep convolutional neural network of the present embodiment;
FIG. 4 is a schematic structural diagram of a YOLOv3 deep convolutional neural network of the present embodiment;
FIG. 5 is a flowchart of the device-height-based adaptive density clustering method of an embodiment;
FIG. 6 is a diagram illustrating the adaptive density clustering result based on the device height according to this embodiment;
fig. 7 is a flowchart of the human body distribution detection method based on the unmanned aerial vehicle according to the preferred embodiment.
Detailed Description
For a clearer understanding of the objects, aspects and advantages of the present application, reference is made to the following description and accompanying drawings.
Unless defined otherwise, technical or scientific terms used herein shall have the same general meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The use of the terms "a," "an," "the," and similar referents in this application does not denote a limitation of quantity, whether singular or plural. The terms "comprises," "comprising," "has," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or modules, but may include other steps or modules not listed or inherent to such process, method, article, or apparatus. References to "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" in this application means two or more. "And/or" describes an association relationship between associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In general, the character "/" indicates an "or" relationship between the associated objects. The terms "first," "second," "third," and the like in this application are used to distinguish between similar items and do not necessarily describe a particular sequential or chronological order.
The method embodiments provided in the present embodiment may be executed in a terminal, a computer, or a similar computing device. For example, the method is executed on a terminal, and fig. 1 is a block diagram of a hardware structure of the terminal of the human body distribution detection method according to the embodiment. As shown in fig. 1, the terminal may include one or more processors 102 (only one shown in fig. 1) and a memory 104 for storing data, wherein the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA. The terminal may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those of ordinary skill in the art that the structure shown in fig. 1 is merely an illustration and is not intended to limit the structure of the terminal described above. For example, the terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to the human body distribution detection method in the embodiment, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. The network described above includes a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
In this embodiment, a human body distribution detection method is provided, and fig. 2 is a flowchart of the human body distribution detection method of this embodiment, as shown in fig. 2, the flowchart includes the following steps:
step S1, acquiring a first video stream acquired by the aerial moving device from a top view angle, and performing human body detection on the first video stream to obtain human body position information of each of a plurality of human bodies.
The aerial mobile device may be an aerial vehicle such as a drone or a radar-equipped aircraft, which shoots the land and/or rivers from a top-down view while flying in the air, producing the first video stream.
When detecting human bodies in the first video stream, detection may use hand-crafted features or deep-network feature learning; this embodiment is not limited in this respect. Note that the human body position information is a position relative to the individual video frames of the video stream, not an actual geographic position.
And step S2, classifying each human body according to the human body position information of each human body, and determining the human body state of each human body according to the classification result, wherein the human body state comprises a non-isolated state and an isolated state.
In this embodiment, when the distances between a plurality of human bodies in a group are smaller than a first threshold and/or the number of human bodies in the group is larger than a second threshold, the state of each human body in the group is defined as the non-isolated state, and the state of each human body not in any group is defined as the isolated state.
In a specific implementation, a certain part of the human body may be used as a feature point, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) may be used to cluster the feature points: feature points that can form clusters with other feature points are classified as non-isolated, and feature points that cannot are classified as isolated. Feature points that can form clusters with each other satisfy a preset condition, for example, the distance between them is smaller than a first threshold and/or the number of feature points belonging to the same cluster is larger than a second threshold.
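The clustering step can be sketched with a minimal pure-Python DBSCAN (`eps` and `min_pts` correspond to the neighborhood radius and preset minimum contained point number; the coordinates and thresholds below are illustrative, not from the disclosure):

```python
import math

def dbscan_states(points, eps, min_pts):
    """Label each feature point 'non-isolated' if it joins a density cluster,
    else 'isolated' (noise), following the DBSCAN scheme described above."""
    n = len(points)
    labels = [None] * n                      # cluster id per point, or -1 for noise

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:              # too sparse: tentatively noise
            labels[i] = -1
            continue
        labels[i] = cluster
        seeds = [j for j in nbrs if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster          # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:       # core point: expand the cluster
                seeds.extend(j_nbrs)
        cluster += 1
    return ["non-isolated" if c >= 0 else "isolated" for c in labels]
```

For example, three head points within the radius of one another form a cluster (non-isolated), while a distant head remains noise (isolated).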
Step S3, acquiring route information of the equipment moving in the air, determining a target area in each video frame in the first video stream according to the route information, and determining human body distribution characteristics in the target area according to the human body position information of each human body and the human body state of each human body.
The route information guides the device's patrol and includes corresponding timestamps and route positions, i.e., the planned flight time and flight path of the device. In some embodiments, the first video stream may be divided according to the timestamps to obtain the individual video frames; a geographic position corresponding to the target area is determined from the route position, and that geographic position is mapped to each video frame (that is, converted into a position relative to the video frame) to obtain the target area in each video frame. When the geographic position of the target area is known, the target area in each video frame can be updated in real time from the route information; whether a human body is inside the target area follows from the positional relationship between the human body and the target area, and the human body distribution characteristics in the target area are obtained from the current target area and the human body states.
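A simplified sketch of mapping a geographic position into frame coordinates, assuming a nadir (straight-down) camera aligned north-up; the field of view, image size, and local equirectangular approximation are assumptions not stated in the embodiment:

```python
import math

def geo_to_pixel(lat, lon, cam_lat, cam_lon, height_m,
                 img_w=1920, img_h=1080, fov_deg=84.0):
    """Project a ground point (lat, lon) into the image captured by a camera
    hovering at (cam_lat, cam_lon) at height_m metres, looking straight down."""
    # ground width covered by the horizontal field of view
    ground_w = 2 * height_m * math.tan(math.radians(fov_deg) / 2)
    m_per_px = ground_w / img_w
    # local equirectangular approximation: metres east/north of the camera
    east = (lon - cam_lon) * 111320 * math.cos(math.radians(cam_lat))
    north = (lat - cam_lat) * 110540
    x = img_w / 2 + east / m_per_px
    y = img_h / 2 - north / m_per_px   # image y grows downward
    return x, y
```

Applying this to the corners of the target area's geographic boundary yields its pixel-space polygon in each frame.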
The human body distribution detection method can be applied to monitoring scenes such as traffic road sections or scenic spots, providing technical support for the governance work of the relevant management departments.
Compared with the density-distribution-map approach to crowd detection in the related art, this method distinguishes human bodies in the non-isolated state from those in the isolated state by detecting each body's state, so individual information can be counted and the error between the computed result and the actual situation is reduced. In addition, the target area in each video frame is updated from the route information, and the human body distribution characteristics in the target area are obtained from the current target area and the human body states; by detecting these two dimensions of information, the real-time distribution of human bodies in the target area is obtained, improving the timeliness of the human body distribution characteristics. Through the above steps, the problem of low accuracy of human body distribution characteristics obtained by top-down detection is solved, and the detection accuracy is improved.
In some embodiments, the target area may be divided into a plurality of areas according to actual needs, such as a safe area and an unsafe area, and the human body distribution characteristics in the target area fall into the following cases: a human body in the non-isolated state exists in the safe area; a human body in the isolated state exists in the safe area; a human body in the non-isolated state exists in the unsafe area; or a human body in the isolated state exists in the unsafe area.
After the human body distribution characteristics in the target area are determined, a human body whose state is the non-isolated state and whose position information belongs to the safe area is determined to be a safe object, and a human body whose state is the isolated state and whose position information belongs to the unsafe area is determined to be a dangerous object.
Further, after the human body distribution characteristics are obtained, the number of safe objects is determined, the position information of the dangerous objects is acquired, and both are reported.
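The decision rule above can be sketched as follows (the "other" fallback for the two remaining state/area combinations is an assumption; the embodiment defines only safe and dangerous objects):

```python
def classify_object(body_state, position, in_safe_area):
    """Apply the embodiment's rule: non-isolated bodies inside the safe area
    are safe objects; isolated bodies outside it are dangerous objects.
    `in_safe_area` is a predicate over frame coordinates."""
    if body_state == "non-isolated" and in_safe_area(position):
        return "safe"
    if body_state == "isolated" and not in_safe_area(position):
        return "dangerous"
    return "other"
```

In practice `in_safe_area` could be a point-in-polygon test against the safe area mapped into the current frame.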
When reporting the number of safe objects and the position information of the dangerous objects, the current position may be obtained through the device's Global Positioning System (GPS) and reported to a user as the position information of the dangerous object. The user may be a server, such as a ground command center, or a portable device, such as a smartphone. On-duty personnel can learn the human body distribution characteristics of the target area through the ground command center or smartphone and then drive dangerous objects away from the unsafe area. This combined mode of aerial-device patrol plus ground-personnel intervention improves human-machine coordination and strengthens pedestrian monitoring and management on traffic road sections and in scenic spots.
In some preferred embodiments, a multi-rotor unmanned aerial vehicle may fly along a preset route, patrolling and photographing road sections where dense crowds gather. According to the division characteristics of the road section, the area where human bodies may move is designated the safe area, and the area where they may not is designated the unsafe area. For the video stream captured by the multi-rotor unmanned aerial vehicle, the coordinate information of boundary lines and key areas in each video frame is determined from the route information, and the division between safe and unsafe areas in the images is continuously updated to obtain real-time human body distribution characteristics.
In step S1, the first video stream may be input into the trained deep convolutional neural network, target detection targeting the human head may be performed on the first video stream based on the deep convolutional neural network, each of the detected heads may be determined as a human body, and the position information of each detected head may be determined as the human body position information of that human body. Compared with the manually designed feature extraction used in the related art, extracting human head features with a deep convolutional neural network avoids the complexity and low efficiency of hand-crafted feature design and improves scene adaptability.
Further, to address the low target recall rate caused by head targets of dense crowds being small, jitter-prone, and occupying few pixels under the unmanned aerial vehicle's viewing angle, a channel attention mechanism may be introduced into the deep convolutional neural network in step S1, and the weight of each feature channel is determined through the channel attention mechanism; the first video stream is downsampled multiple times according to the weight of each feature channel to obtain a multi-scale feature map with channel attention; and feature fusion is performed on the multi-scale feature map to output the feature information of multiple human heads.
The training process of the deep convolutional neural network is as follows: acquiring a second video stream acquired by equipment moving in the air in a downward angle; marking the head of the human body in the second video stream, and dividing the marked video stream into a training set and a test set according to a preset proportion; constructing a deep convolutional neural network, wherein the deep convolutional neural network comprises N down-sampling units, wherein the first M down-sampling units are provided with channel attention units, and N is greater than M; and training the deep convolutional neural network according to the training set and the test set to obtain the trained deep convolutional neural network.
The training process of the deep convolutional neural network will be described below by a preferred embodiment.
In some preferred embodiments, a deep convolutional neural network may be constructed using YOLOv3 (You Only Look Once, version 3, a target detection algorithm) as the base structure, and the feature information of feature points is extracted through the YOLOv3 deep convolutional neural network. Fig. 3 is a training flowchart of the YOLOv3 deep convolutional neural network of this embodiment; as shown in Fig. 3, the flow includes the following steps:
Step S11, acquiring 5000 video frames shot from the unmanned aerial vehicle's viewing angle, labeling the human heads, and randomly dividing the labeled video frames into a training set and a test set at a ratio of 8:2, wherein each video frame contains human bodies of different numbers, postures, and degrees of occlusion.
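The random 8:2 division of step S11 can be sketched with the Python standard library as follows; the frame identifiers and the seed are placeholders, not details of this application:

```python
# A minimal sketch of the random 8:2 train/test split described in step S11.
# Frame identifiers (here just integers) and the seed are placeholders.
import random

def split_dataset(frames, train_ratio=0.8, seed=42):
    frames = list(frames)
    random.Random(seed).shuffle(frames)       # random division
    cut = int(len(frames) * train_ratio)
    return frames[:cut], frames[cut:]         # training set, test set

train, test = split_dataset(range(5000))
```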
Step S12, constructing the YOLOv3 deep convolutional neural network. Fig. 4 is a schematic structural diagram of the YOLOv3 deep convolutional neural network of this embodiment. As shown in Fig. 4, a channel attention module is provided in the YOLOv3 deep convolutional neural network, and the channel attention module includes: a Darknetconv2D_BN_Leaky unit, an SE_Res_unit, and an SE_Resblock_body unit.
The Darknetconv2D_BN_Leaky unit includes: a convolutional layer, a batch normalization layer, and a Leaky ReLU activation function.
The SE_Res_unit is a basic attention-residual unit consisting of 2 DBLs, 1 channel attention unit (SE), and a shortcut (skip) connection.
The SE_Resblock_body unit includes: a padding layer, a convolutional layer, and n SE_Res_unit units.
Here DBL is the abbreviation of Darknetconv2D_BN_Leaky.
SE_Res1 represents an SE_Resblock_body (attention-residual block) containing 1 SE_Res_unit, SE_Res2 represents an SE_Resblock_body containing 2 SE_Res_units, and SE_Res8 represents an SE_Resblock_body containing 8 SE_Res_units.
Res4 represents a Resblock_body (residual block) containing 4 Res_units, and Res8 represents a Resblock_body containing 8 Res_units.
y1, y2, and y3 represent the feature layers of the three scales output by the network.
zero_padding represents a zero-padding layer.
SE_resn represents a residual block containing n SE_Res_units.
BN represents batch normalization (BatchNormalization).
leaky_relu represents the Leaky ReLU activation function.
concat represents the feature concatenation function.
add represents the shortcut (element-wise addition) function.
SE represents the attention mechanism unit.
Channel attention units are introduced at specific positions in the first 3 downsampling units of the Darknet-53 feature extraction network (the feature extractor of YOLOv3, containing 53 convolutional layers); for example, SENet (Squeeze-and-Excitation Network) is used as the channel attention unit. Squeeze denotes the following operation: feature compression is performed along the spatial dimension so that each two-dimensional feature channel becomes a single real number; this real number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels. Excitation denotes the following operation: a mechanism similar to the gate in a recurrent neural network generates a weight for each feature channel through a learned parameter w, which explicitly models the correlation between feature channels.
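As a purely numerical illustration of the Squeeze and Excitation operations just described, the following plain-Python sketch pools each channel to one real number and then produces a per-channel weight; the tiny feature map and the weights w1/w2 are made-up values, not parameters of the network in this application:

```python
# Illustrative squeeze-and-excitation computation on a tiny 2-channel,
# 2x2 feature map. All numbers are made up for demonstration.
import math

def squeeze(feature_maps):
    # Squeeze: global average pooling turns each HxW channel into one real
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]

def excitation(z, w1, w2):
    # Excitation: two tiny "fully connected" layers, ReLU then sigmoid,
    # producing one weight in (0, 1) per feature channel
    h = [max(0.0, sum(w1[i][j] * z[j] for j in range(len(z))))
         for i in range(len(w1))]
    return [1.0 / (1.0 + math.exp(-sum(w2[i][j] * h[j] for j in range(len(h)))))
            for i in range(len(w2))]

fmap = [[[1.0, 3.0], [5.0, 7.0]],     # channel 0 (mean 4.0)
        [[0.0, 0.0], [0.0, 2.0]]]     # channel 1 (mean 0.5)
z = squeeze(fmap)                     # one real number per channel
s = excitation(z, w1=[[0.5, 0.0], [0.0, 0.5]], w2=[[1.0, 0.0], [0.0, 1.0]])
# Rescale: each channel is multiplied by its learned attention weight
scaled = [[[v * s[c] for v in row] for row in fmap[c]] for c in range(2)]
```

The channel with the stronger pooled response (channel 0) receives the larger weight here, which is the "increase the weight of effective feature channels, suppress unimportant ones" behavior described for the head-detection network.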
Step S13, after the video frame is downsampled five times by the Darknet-53 feature extraction network, feature maps of 3 scales are output, and after feature fusion through the FPN (Feature Pyramid Network), a feature map for direct prediction is output.
Step S14, outputting the feature information of each feature point according to the feature map for direct prediction.
In this embodiment, an attention mechanism is introduced into the Darknet-53 feature extraction network to increase the weights of the effective feature channels of small head targets in dense crowds and suppress the weights of unimportant features, thereby increasing the recall rate of small head targets at the top-down viewing angle. The recall rate increased by more than 20% with only a 0.8% increase in parameter count and no significant drop in detection speed.
Based on the feature points detected by the deep convolutional neural network, in step S2 the feature information of the feature points may be placed into a two-dimensional rectangular coordinate system and clustered with a density clustering algorithm (DBSCAN). The density clustering algorithm requires two preset hyperparameters: the neighborhood radius ε and the minimum contained point number MinPts. Fig. 5 is a flowchart of the height-adaptive density clustering method based on device height according to an embodiment; as shown in Fig. 5, the process includes the following steps:
Step S21, acquiring the height information H of the device relative to the ground, and determining the neighborhood radius ε according to the height information H and the preset minimum contained point number MinPts. When MinPts is 10, the relationship between the neighborhood radius ε and the device height H can be expressed as follows:
[Formula image (BDA0003184071320000101) in the original publication: ε expressed as a function of H; not reproduced in this text.]
step S22, a first feature point cluster which takes the head of a human body as a feature point is obtained, a first feature point which is not accessed in the first feature point cluster is selected, a neighboring feature point in the neighborhood range of the first feature point is determined according to the neighborhood radius and the positions of other feature points except the first feature point, and a second feature point cluster is generated according to the first feature point and the neighboring feature point in the neighborhood range of the first feature point.
Step S23, determining that the feature points in the first feature point cluster and the second feature point cluster belong to the first category, and determining that the feature points outside the first feature point cluster and the second feature point cluster belong to the second category.
In step S24, the human body state corresponding to the feature points of the first category is determined to be a non-isolated state, and the human body state corresponding to the feature points of the second category is determined to be an isolated state.
In this embodiment, a height-adaptive parameter setting mode is adopted, making full use of the advantages of the density clustering algorithm, so that human bodies in the non-isolated state and human bodies in the isolated state can be effectively distinguished.
In step S22, it is further determined whether the number of neighboring feature points within the first feature point neighborhood range reaches a preset minimum inclusion point number. And under the condition that the number of the adjacent feature points in the neighborhood range of the first feature point is judged to reach the preset minimum contained point number, generating a second feature point cluster according to the first feature point and the adjacent feature points in the neighborhood range of the first feature point, and marking the accessed first feature point. And under the condition that the number of the adjacent characteristic points in the neighborhood range of the first characteristic point is judged not to reach the preset minimum contained point number, determining the first characteristic point as a noise point, and determining that the first characteristic point belongs to the second category.
The first feature point cluster can be acquired by the following method: selecting a first feature point from a plurality of feature points which are not accessed by any feature point, determining adjacent feature points in the neighborhood range of the first feature point according to the neighborhood radius and the positions of other feature points except the first feature point, and generating a first feature point cluster according to the first feature point and the adjacent feature points in the neighborhood range of the first feature point.
In some embodiments, the cluster is expanded by recursively applying step S22 to the unvisited points within it. Once the cluster is fully expanded, that is, all points in the cluster have been visited, steps S21-S24 are repeated for unvisited points outside the cluster until all detected feature points of the video frame are included in the visited set. The clustering process is illustrated in Fig. 6, a schematic diagram of the height-adaptive density clustering result of this embodiment, in which small black dots represent detected feature points, circle centers are visited feature points, circle radii equal the neighborhood radius ε, and arrow directions and connections represent the visiting order and paths of the feature points.
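A condensed plain-Python sketch of the clustering of steps S21-S24 follows; the function and variable names are illustrative, and because the height-to-radius formula appears only as an image in the original publication, the neighborhood radius ε is passed in directly rather than computed from H:

```python
# DBSCAN-style sketch of steps S21-S24. Head feature points whose
# neighborhood contains at least min_pts neighbors seed a cluster
# ("non-isolated"); points in no cluster remain "isolated" (noise).
import math

def neighbors(points, i, eps):
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) <= eps]

def classify_states(points, eps, min_pts=10):
    """Return 'non-isolated' / 'isolated' per head feature point."""
    visited, state = set(), ["isolated"] * len(points)
    for i in range(len(points)):
        if i in visited:
            continue
        visited.add(i)
        nbrs = neighbors(points, i, eps)
        if len(nbrs) < min_pts:
            continue                      # noise point -> second category
        cluster = [i] + nbrs              # first + second feature point cluster
        for j in cluster:                 # recursive expansion (step S22)
            state[j] = "non-isolated"
            if j in visited:
                continue
            visited.add(j)
            more = neighbors(points, j, eps)
            if len(more) >= min_pts:
                cluster.extend(k for k in more if k not in cluster)
    return state
```

The default `min_pts=10` mirrors the MinPts value given in the embodiment; smaller values are used below only to keep the example tiny.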
Fig. 7 is a flowchart of the human body distribution detection method based on the unmanned aerial vehicle according to the preferred embodiment, and as shown in fig. 7, the flowchart includes the following steps:
Step S71, the unmanned aerial vehicle performs patrol shooting to obtain a video stream.
Step S72, detecting human heads from the video stream, processing the feature points with the density clustering algorithm, distinguishing human body states, and determining the human bodies in the non-isolated state and the human bodies in the isolated state.
Step S73, performing area identification on the video stream and determining the safe area and the unsafe area.
Step S74, determining a human body in the non-isolated state and within the safe area as a safe object, and a human body in the isolated state and within the unsafe area as a dangerous object.
Step S75, counting the pedestrian volume of safe objects and acquiring the positions of dangerous objects.
Step S76, integrating the pedestrian volume of safe objects and the positions of dangerous objects, and sending the integrated information to the ground command center.
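Steps S75 and S76 amount to counting and packaging; a hypothetical sketch (the field names are illustrative, not from this application):

```python
# Hedged sketch of steps S75-S76: count safe objects and package
# dangerous-object positions plus the device GPS for reporting.
def build_report(safe_objects, dangerous_positions, gps_position):
    return {
        "pedestrian_volume": len(safe_objects),       # step S75: counting
        "dangerous_objects": list(dangerous_positions),
        "device_gps": gps_position,                   # position to report
    }

report = build_report(["p1", "p2", "p3"], [(120.1, 30.2)], (120.0, 30.0))
```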
With reference to the human body distribution detection method in the foregoing embodiments, this embodiment also provides an aerial photography device, including a flight body provided with a camera and a processor connected to each other; the camera is used for shooting human body images, and the processor is used for executing the human body distribution detection method of any of the above embodiments.
The aerial photography device can effectively detect dense crowds and raise an alarm when an outlier individual enters a forbidden zone. The aerial photography device includes, but is not limited to, an unmanned aerial vehicle with camera and positioning functions, and a radar.
There is also provided in this embodiment an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring a first video stream acquired at a top-down (overlooking) angle by a device moving in the air, and performing human body detection on the first video stream to obtain the human body position information of each of a plurality of human bodies.
S2, classifying the human bodies according to their human body position information, and determining the human body state of each human body according to the classification result, wherein the human body states include a non-isolated state and an isolated state.
S3, acquiring the route information of the device moving in the air, determining the target area in each video frame of the first video stream according to the route information, and determining the human body distribution characteristics in the target area according to the human body position information and the human body state of each human body.
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations, and details are not described again in this embodiment.
In addition, in combination with the human body distribution detection method provided in the above embodiment, a storage medium may also be provided to implement the method in this embodiment. The storage medium having stored thereon a computer program; the computer program, when executed by a processor, implements any of the human distribution detection methods in the above embodiments.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be derived by a person skilled in the art from the examples provided herein without any inventive step, shall fall within the scope of protection of the present application.
It is obvious that the drawings are only examples or embodiments of the present application, and it is obvious to those skilled in the art that the present application can be applied to other similar cases according to the drawings without creative efforts. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
The term "embodiment" is used herein to mean that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly or implicitly understood by one of ordinary skill in the art that the embodiments described in this application may be combined with other embodiments without conflict.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the patent protection. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (14)

1. A human body distribution detection method is characterized by comprising the following steps:
acquiring a first video stream acquired by equipment moving in the air at an overlooking angle, and carrying out human body detection on the first video stream to obtain human body position information of each human body in a plurality of human bodies;
classifying the human bodies according to the human body position information of the human bodies, and determining the human body states of the human bodies according to the classification result, wherein the human body states comprise a non-isolated state and an isolated state;
acquiring route information of the equipment moving in the air, determining a target area in each video frame in the first video stream according to the route information, and determining human body distribution characteristics in the target area according to the human body position information of each human body and the human body state of each human body.
2. The human body distribution detection method of claim 1, wherein the course information includes a corresponding timestamp and a course position, and determining the target area in each video frame in the first video stream according to the course information comprises:
dividing the first video stream according to the timestamps to obtain each video frame;
and determining a geographic position corresponding to the target area according to the route position, and mapping the geographic position to each video frame to obtain the target area in each video frame.
3. The human body distribution detection method according to claim 1, wherein the target areas include a safe area and an unsafe area, and after determining the target areas in the respective video frames in the first video stream according to the route information and determining the human body distribution characteristics in the target areas according to the human body position information of the respective human bodies and the human body states of the respective human bodies, the method further comprises:
determining that the human body with the human body state as the non-isolated state and the human body position information belonging to the safety region is a safety object; and,
and determining that the human body with the human body state as the isolated state and the human body position information belonging to the unsafe area is a dangerous object.
4. The human body distribution detection method according to claim 3, wherein a human body whose human body state is determined to be the non-isolated state and whose human body position information belongs to the safety area is a safety object; and after determining that the human body with the human body state being the isolated state and the human body position information belonging to the unsafe area is a dangerous object, the method further comprises:
determining the number of the safety objects and acquiring the position information of the dangerous objects;
and reporting the number of the safety objects and the position information of the dangerous objects.
5. The method of claim 1, wherein performing human body detection on the first video stream to obtain human body position information of each of a plurality of human bodies comprises:
inputting the first video stream to a trained deep convolutional neural network, performing target detection on the first video stream with the heads of the human bodies as targets based on the deep convolutional neural network, determining each head of the detected multiple heads as each human body, and determining the position information of each detected head as the human body position information of each human body.
6. The method according to claim 5, wherein the first video stream is input to a trained deep convolutional neural network, and the target detection of the first video stream based on the deep convolutional neural network by targeting at the head of the human body comprises:
introducing a channel attention mechanism in the deep convolutional neural network, and determining the weight of each characteristic channel through the channel attention mechanism;
according to the weight of each characteristic channel, carrying out down-sampling processing on the first video stream for multiple times to obtain a multi-scale characteristic map with channel attention;
and performing feature fusion on the multi-scale feature map, and outputting feature information of a plurality of human heads.
7. The method according to claim 5, wherein before inputting the first video stream to a trained deep convolutional neural network and performing target detection on the first video stream based on the deep convolutional neural network, the method further comprises:
acquiring a second video stream acquired by equipment moving in the air in a downward angle;
marking the head of the human body in the second video stream, and dividing the marked video stream into a training set and a test set according to a preset proportion;
constructing a deep convolutional neural network, wherein the deep convolutional neural network comprises N downsampling units, channel attention units are arranged in the first M downsampling units, and N is greater than M;
and training the deep convolutional neural network according to the training set and the test set to obtain the trained deep convolutional neural network.
8. The method according to claim 1, wherein classifying the human bodies according to the human body position information of the human bodies, and determining the human body states of the human bodies according to the classification result comprises:
acquiring height information of the equipment relative to the ground, and determining a neighborhood radius according to the height information and a preset minimum contained point number;
acquiring a first feature point cluster which takes a human head as a feature point, selecting a first feature point which is not accessed in the first feature point cluster, determining a neighboring feature point within a neighborhood range of the first feature point according to the neighborhood radius and the positions of other feature points except the first feature point, and generating a second feature point cluster according to the first feature point and the neighboring feature point within the neighborhood range of the first feature point;
determining that the feature points in the first feature point cluster and the second feature point cluster belong to a first category, and determining that the feature points outside the first feature point cluster and the second feature point cluster belong to a second category;
determining the human body state corresponding to the feature points of the first category as the non-isolated state, and determining the human body state corresponding to the feature points of the second category as the isolated state.
9. The method of claim 8, wherein the steps of obtaining a first feature point cluster, selecting an unvisited first feature point in the first feature point cluster, determining a neighboring feature point within a neighborhood range of the first feature point according to the neighborhood radius and positions of other feature points except the first feature point, and generating a second feature point cluster according to the first feature point and the neighboring feature point within the neighborhood range of the first feature point comprise:
judging whether the number of adjacent feature points in the neighborhood range of the first feature point reaches the preset minimum contained point number or not;
and under the condition that the number of the adjacent feature points in the first feature point neighborhood range reaches the preset minimum contained point number, generating the second feature point cluster according to the first feature point and the adjacent feature points in the first feature point neighborhood range, and marking the accessed first feature point.
10. The method of claim 8, wherein when determining that the number of neighboring feature points within the neighborhood of the first feature point does not reach the predetermined minimum inclusion point number, the method further comprises:
and determining that the first characteristic point is a noise point, and determining that the first characteristic point belongs to the second category.
11. The human body distribution detection method according to claim 8, wherein acquiring the first feature point cluster having the human head as a feature point comprises:
selecting a first characteristic point from the plurality of characteristic points which are not accessed by any characteristic point;
determining adjacent feature points in the neighborhood range of the first feature point according to the neighborhood radius and the positions of other feature points except the first feature point;
and generating the first characteristic point cluster according to the first characteristic point and the adjacent characteristic points in the neighborhood range of the first characteristic point.
12. An aerial device, comprising: the flight device comprises a flight machine body, a control unit and a control unit, wherein the flight machine body is provided with a camera and a processor which are connected with each other; the camera is used for shooting a video stream; the processor is configured to perform the human body distribution detection method of any one of claims 1 to 11.
13. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to perform the human body distribution detection method of any one of claims 1 to 11.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the human distribution detection method of any one of claims 1 to 11.
CN202110855695.1A 2021-07-28 2021-07-28 Human body distribution detection method, aerial photography device, electronic device, and storage medium Pending CN113673360A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110855695.1A CN113673360A (en) 2021-07-28 2021-07-28 Human body distribution detection method, aerial photography device, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110855695.1A CN113673360A (en) 2021-07-28 2021-07-28 Human body distribution detection method, aerial photography device, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN113673360A true CN113673360A (en) 2021-11-19

Family

ID=78540418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110855695.1A Pending CN113673360A (en) 2021-07-28 2021-07-28 Human body distribution detection method, aerial photography device, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN113673360A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100254614A1 (en) * 2009-04-01 2010-10-07 Microsoft Corporation Clustering videos by location
WO2011010598A1 (en) * 2009-07-23 2011-01-27 オリンパス株式会社 Image processing device, image processing program and image processing method
CN102364944A (en) * 2011-11-22 2012-02-29 电子科技大学 Video monitoring method for preventing gathering of people
WO2012058024A1 (en) * 2010-10-28 2012-05-03 Eastman Kodak Company Method of locating nearby picture hotspots
US20120233000A1 (en) * 2011-03-07 2012-09-13 Jon Fisher Systems and methods for analytic data gathering from image providers at an event or geographic location
CN103903276A (en) * 2014-04-23 2014-07-02 吉林大学 Driver fixation point clustering method based on density clustering method and morphology clustering method
CN109618134A (en) * 2018-12-10 2019-04-12 北京智汇云舟科技有限公司 A kind of unmanned plane dynamic video three-dimensional geographic information real time fusion system and method
EP3525131A1 (en) * 2018-02-09 2019-08-14 Bayerische Motoren Werke Aktiengesellschaft Methods and apparatuses for object detection in a scene represented by depth data of a range detection sensor and image data of a camera
CN110458854A (en) * 2018-05-02 2019-11-15 北京图森未来科技有限公司 A kind of road edge detection method and device
CN111191637A (en) * 2020-02-26 2020-05-22 电子科技大学中山学院 Crowd concentration detection and presentation method based on unmanned aerial vehicle video acquisition
CN111339945A (en) * 2020-02-26 2020-06-26 贵州安防工程技术研究中心有限公司 Video-based people group and scatter inspection method and system
US20210182604A1 (en) * 2017-07-05 2021-06-17 Perceptive Automata, Inc. System and method of predicting human interaction with vehicles
CN112986982A (en) * 2021-05-12 2021-06-18 长沙万为机器人有限公司 Environment map reference positioning method and device and mobile robot

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100254614A1 (en) * 2009-04-01 2010-10-07 Microsoft Corporation Clustering videos by location
WO2011010598A1 (en) * 2009-07-23 2011-01-27 オリンパス株式会社 Image processing device, image processing program and image processing method
WO2012058024A1 (en) * 2010-10-28 2012-05-03 Eastman Kodak Company Method of locating nearby picture hotspots
US20120233000A1 (en) * 2011-03-07 2012-09-13 Jon Fisher Systems and methods for analytic data gathering from image providers at an event or geographic location
CN102364944A (en) * 2011-11-22 2012-02-29 电子科技大学 Video monitoring method for preventing gathering of people
CN103903276A (en) * 2014-04-23 2014-07-02 吉林大学 Driver fixation point clustering method based on density clustering method and morphology clustering method
US20210182604A1 (en) * 2017-07-05 2021-06-17 Perceptive Automata, Inc. System and method of predicting human interaction with vehicles
EP3525131A1 (en) * 2018-02-09 2019-08-14 Bayerische Motoren Werke Aktiengesellschaft Methods and apparatuses for object detection in a scene represented by depth data of a range detection sensor and image data of a camera
CN110458854A (en) * 2018-05-02 2019-11-15 北京图森未来科技有限公司 Road edge detection method and device
CN109618134A (en) * 2018-12-10 2019-04-12 北京智汇云舟科技有限公司 Real-time fusion system and method for UAV dynamic video and three-dimensional geographic information
CN111339945A (en) * 2020-02-26 2020-06-26 贵州安防工程技术研究中心有限公司 Video-based people group and scatter inspection method and system
CN111191637A (en) * 2020-02-26 2020-05-22 电子科技大学中山学院 Crowd concentration detection and presentation method based on unmanned aerial vehicle video acquisition
CN112986982A (en) * 2021-05-12 2021-06-18 长沙万为机器人有限公司 Environment map reference positioning method and device and mobile robot

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhao Shuang; Huang Huaiyu; Hu Yiming; Lou Xiaoping; Wang Xingang: "Vehicle Detection in UAV Aerial Images Based on Deep Learning", Computer Applications (计算机应用), no. 2, 30 December 2019 (2019-12-30) *

Similar Documents

Publication Publication Date Title
Chen et al. An edge traffic flow detection scheme based on deep learning in an intelligent transportation system
Wu et al. Deep learning for unmanned aerial vehicle-based object detection and tracking: A survey
US11941887B2 (en) Scenario recreation through object detection and 3D visualization in a multi-sensor environment
EP3766044B1 (en) Three-dimensional environment modeling based on a multicamera convolver system
CN111295689B (en) Depth aware object counting
WO2020052678A1 (en) Method and system for generating synthetic point cloud data using a generative model
CN103609178B (en) The identification of place auxiliary
CN102834843B (en) Method and apparatus for face detection
CN112070807B (en) Multi-target tracking method and electronic device
CN109643489A (en) Three-dimensional information processing method and three-dimensional information processing unit
CN110533700A (en) Object tracking method and device, storage medium, and electronic device
CN109063549A (en) Moving object detection method for high-resolution aerial video based on deep neural networks
CN110087041B (en) Video data processing and transmitting method and system based on 5G base station
CN110555378A (en) Weather prediction method, system, and device based on live video
CN111899279A (en) Method and device for detecting motion speed of target object
Huang et al. V2X cooperative perception for autonomous driving: Recent advances and challenges
CN110738076A (en) People counting method and system in images
CN112907972B (en) Road vehicle flow detection method and system based on unmanned aerial vehicle and computer readable storage medium
CN112668675B (en) Image processing method and device, computer equipment and storage medium
CN113743151A (en) Method and device for detecting objects spilled on a road surface, and storage medium
CN113673360A (en) Human body distribution detection method, aerial photography device, electronic device, and storage medium
Delleji et al. An Improved YOLOv5 for Real-time Mini-UAV Detection in No Fly Zones.
CN115880538A (en) Domain generalization method and device for image processing models, and image processing
CN114862952A (en) Unmanned aerial vehicle detection and defense method and system
CN112446355B (en) Pedestrian recognition method and people flow statistics system for public places

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination