CN112733624B - People flow density detection method, system, storage medium and terminal for indoor dense scenes


Info

Publication number: CN112733624B
Application number: CN202011570465.2A
Authority: CN (China)
Prior art keywords: target, image, people flow density, density estimation
Legal status: Active (granted)
Other versions: CN112733624A
Other languages: Chinese (zh)
Inventors: 匡平, 刘晨阳, 李凡, 彭江艳, 段其鹏, 高宇, 黄泓毓
Assignee (original and current): University of Electronic Science and Technology of China
Priority and filing date: 2020-12-26
Publication date of CN112733624A: 2021-04-30
Grant date of CN112733624B: 2023-02-03

Classifications

    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53: Recognition of crowd images, e.g. recognition of crowd congestion
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045: Combinations of networks
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V2201/07: Target detection


Abstract

The invention discloses a people flow density detection method, system, storage medium and terminal for indoor dense scenes. The method comprises: preprocessing images in a random combination mode; predicting the target center points of an input image; down-sampling the target center point ground truth; splatting the valid ground-truth values with a Gaussian kernel to form a target center point heatmap; and determining the local peak points of the heatmap, regressing the target size with the heatmap as confidence, and outputting target bounding boxes. The preprocessing greatly increases the variability of the images, making the whole detection method more robust. Because the target bounding boxes are regressed from predicted center points, key points are assigned directly on the targets to be detected: no anchor boxes are needed, no mutually overlapping pre-selection boxes arise, the loss of true targets to post-processing in dense-head scenes is reduced, and detection accuracy is high.

Description

People flow density detection method, system, storage medium and terminal for indoor dense scenes
Technical Field
The invention relates to the technical field of people flow density detection, and in particular to a people flow density detection method, system, storage medium and terminal for indoor dense scenes.
Background
With the rapid development of China's economy and society, interactions between people have become frequent, and highly crowded spaces easily give rise to safety incidents, so people flow density detection has strong practical value. As image processing technology has developed, it can be applied to the field of people flow density detection. In the mainstream approach, a people flow density estimation model generates a heatmap of the corresponding image, and the people flow density is then estimated from that heatmap. Further, to address the prediction errors caused by the multiple scales of image samples, existing methods train the density estimation model on features combined from multiple scales so as to identify and count pictures accurately, which effectively reduces the prediction error and improves prediction accuracy. However, current people flow density detection methods offer no effective solution to the low prediction accuracy caused by losing true samples to overlap, occlusion and small target size in indoor dense scenes.
Disclosure of Invention
The invention aims to overcome the prior art's inability to address the low prediction accuracy caused by losing true samples to overlap, occlusion and small target size in indoor dense scenes, and provides a people flow density detection method, system, storage medium and terminal for indoor dense scenes.
The purpose of the invention is realized by the following technical solution. A people flow density detection method for indoor dense scenes comprises a target bounding box regression step:
predicting the target center points of an input image; down-sampling the target center point ground truth; splatting the valid ground-truth values with a Gaussian kernel to form a target center point heatmap; and determining the local peak points of the heatmap, regressing the target size with the heatmap as confidence, and outputting target bounding boxes.
As an option, down-sampling the target center point ground truth further includes penalizing and correcting the sampled result: a local offset loss function $L_O$ is used to calculate the local offset, where

$$L_O = \frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right)\right|$$

where $N$ is the number of target center points, $\hat{O}_{\tilde{p}}$ is the local offset, $p$ is the target center point ground truth, $\tilde{p}$ is the valid value of the target center ground truth, and $R$ is the output stride.
As an option, the step of determining the local peak points of the heatmap further includes: when two Gaussian kernels of the same target overlap, the element-wise maximum key point of the target is taken as the local peak point.
As an option, after the step of regressing the target size, the method further includes: calculating the target size offset with a size regression loss function, calculated as

$$L_S = \frac{1}{N}\sum_{h=1}^{N}\left|\hat{s}_h - s_h\right|$$

where $s_h$ is the true target size and $\hat{s}_h$ is the predicted target size.
As an option, the method further comprises an image preprocessing step:
uniformly scaling the sample images and rotating some of the scaled images; rescaling the rotated images in equal proportion so that their length equals the width of the non-rotated images, stitching the images together, and using the stitched images as the training set of the deep aggregation detection network.
As an option, the method further comprises a people flow density estimation step, including direct people flow density estimation and indirect people flow density estimation;
the direct people flow density estimation comprises: determining the number of targets from the target bounding boxes, and combining the area of the delimited region to estimate the people flow density directly. The indirect people flow density estimation comprises: inputting the neighborhood parameters (ε, MinPts), a sample distance metric and the sample set of target center points, clustering to obtain a cluster division, and forming a regional people flow density heatmap from the cluster division to realize indirect people flow density estimation.
It should be further noted that the technical features of the above method options can be combined with or substituted for one another to form new technical solutions.
The invention also comprises a people flow density detection system for indoor dense scenes. The system comprises a deep aggregation detection network for realizing target bounding box regression; the deep aggregation detection network comprises a 34-layer deep layer aggregation network and a plurality of newly added aggregation nodes. The newly added aggregation nodes are arranged at the bottom of the 34-layer deep aggregation network and are connected by skip connections to the output aggregation nodes of the corresponding resolution levels of the 34-layer deep aggregation network.
As an option, the system further comprises a preprocessing unit and a people flow density estimation unit;
the preprocessing unit is used to uniformly scale the sample images and rotate some of the scaled images; rescale the rotated images in equal proportion so that their length equals the width of the non-rotated images to realize image stitching; and input the stitched images into the deep aggregation detection network as the training set;
the people flow density estimation unit comprises a direct people flow density estimation module and an indirect people flow density estimation module, and estimates the people flow density from the target bounding boxes output by the deep aggregation detection network;
the direct people flow density estimation module determines the number of targets from the target bounding boxes and combines the area of the delimited region to estimate the people flow density directly; the indirect people flow density estimation module clusters the sample set of target center points according to the input neighborhood parameters (ε, MinPts) and sample distance metric to obtain a cluster division, and forms a regional people flow density heatmap from the cluster division to realize indirect people flow density estimation.
The invention also includes a storage medium on which computer instructions are stored; when executed, the computer instructions perform the steps of the people flow density detection method for indoor dense scenes.
The invention further comprises a terminal comprising a memory and a processor, the memory storing computer instructions runnable on the processor; when executing the computer instructions, the processor performs the steps of the people flow density detection method for indoor dense scenes.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention splats valid ground-truth values with a Gaussian kernel to form a heatmap and then determines its peak points, i.e. obtains the true target center points. On this basis, with the heatmap as confidence, target bounding boxes are regressed from the predicted center points, so key points are assigned directly on the targets to be detected. Target detection and counting are realized without anchor boxes, no threshold has to be set manually to separate target foreground from background, no mutually overlapping pre-selection boxes arise, and no non-maximum suppression post-processing is needed to pick one of several overlapping pre-selection boxes. The loss of true targets to post-processing in dense-head scenes is thereby reduced; the method suits indoor dense scenes and has high detection accuracy.
(2) By uniformly scaling, rotating, rescaling and stitching the images in sequence, the invention greatly increases the variability of the images; different regions of a single image carry samples of different scales, and the preprocessed images are sample-balanced, so the deep aggregation detection network is more robust to images of different environments and different crowding degrees.
(3) The method realizes direct people flow density estimation from the target bounding boxes output by the deep aggregation detection network together with the area of the delimited region; meanwhile, the sample set of target center points is clustered, and the crowd density of the indoor region is corrected by the proximity density of the target center coordinates, further improving the accuracy of the density detection result in indoor dense scenes.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention.
FIG. 1 is a flowchart of the method of embodiment 1 of the present invention;
FIG. 2 is a schematic comparison simulation of the method of embodiment 1 of the present invention with the non-maximum suppression method;
FIG. 3 is a comparison chart before and after preprocessing in embodiment 1 of the present invention;
FIG. 4 is a simulation diagram of the method of embodiment 1 of the present invention;
FIG. 5 is a schematic comparison between simulations of the method of embodiment 1 of the present invention and of the prior art;
FIG. 6 is a system block diagram of embodiment 2 of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that directions or positional relationships indicated by "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like are directions or positional relationships described based on the drawings, and are only for convenience of description and simplification of description, but do not indicate or imply that the device or the element referred to must have a specific orientation, be constructed in a specific orientation, and operate, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that unless otherwise explicitly stated or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention adopts a randomly combined data preprocessing mode to balance the number of samples in each region of the captured flow images and to construct dense-head scenes as the training set of the deep aggregation detection network. The deep aggregation detection network regresses target bounding boxes from predicted target center points to realize target detection and counting; the people flow density is then estimated from the target count combined with the delimited region, and the density of the indoor region is corrected by the proximity density of the target center point coordinates. Head detection is taken as the embodiment for the detailed description below.
Embodiment 1
As shown in fig. 1, embodiment 1 provides a people flow density detection method for indoor dense scenes, comprising a target bounding box regression step:
S111: predicting the target center points of the input image; specifically, a target center point is a head center point predicted by the deep aggregation detection network model of the present application.
S112: down-sampling the target center point ground truth;
S113: splatting the valid ground-truth values with a Gaussian kernel to form the target center point heatmap;
S114: determining the local peak points of the heatmap, regressing the target size with the heatmap as confidence, and outputting the target bounding boxes.
Furthermore, when down-sampling the target center point ground truth in step S112, the valid value of the predicted head center point deviates because the data are discrete. A local offset loss function $L_O$ is therefore used to calculate the local offset $\hat{O}_{\tilde{p}}$ and thereby penalize and correct the down-sampled result; the loss function $L_O$ is calculated as

$$L_O = \frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right)\right|$$

where $N$ is the number of target center points, $\hat{O}_{\tilde{p}}$ is the local offset, $p$ is the target center point ground truth, $\tilde{p}$ is the valid value of the target center ground truth, and $R$ is the output stride.
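For illustration, a minimal PyTorch sketch of this offset penalty follows; the tensor names and layout (pred_offset gathered at the N center cells, stride R) are assumptions made for the sketch, not the reference implementation of the patent.

```python
import torch

def local_offset_loss(pred_offset, centers, stride):
    """L_O: L1 penalty between the offset predicted at each down-sampled
    center and the fractional offset lost by down-sampling.

    pred_offset: (N, 2) offsets gathered at the N predicted center cells
    centers:     (N, 2) ground-truth center points p in input-image pixels
    stride:      output stride R (e.g. 4)
    """
    p_low = centers / stride        # p / R, continuous low-resolution coords
    p_tilde = torch.floor(p_low)    # valid (integer) down-sampled centers
    target = p_low - p_tilde        # fractional offset to be recovered
    return torch.abs(pred_offset - target).sum() / centers.shape[0]
```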
Further, the Gaussian kernel adopted in step S113 is

$$Y_{xy} = \exp\left(-\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2}\right)$$

where $x$ and $y$ are head center point coordinates, $(\tilde{p}_x, \tilde{p}_y)$ are the valid coordinates of the head center ground truth, and $\sigma_p$ is a variance adaptive to the target scale. Splatting the valid ground-truth values with the above Gaussian kernel forms the head center point heatmap $\hat{Y} \in [0,1]^{\frac{W}{R}\times\frac{H}{R}\times C_k}$, where $W$ and $H$ are the width and height of the input image and $C_k$ is the number of key point classes.
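A small NumPy sketch of splatting one valid ground-truth center onto the down-sampled heatmap follows; the function name, the 3σ truncation radius and the per-head sigma value are illustrative assumptions:

```python
import numpy as np

def splat_gaussian(heatmap, center, sigma):
    """Scatter one valid ground-truth head center onto the heatmap with a
    Gaussian kernel, keeping the element-wise maximum where the kernels of
    nearby heads overlap."""
    h, w = heatmap.shape
    cx, cy = int(center[0]), int(center[1])   # valid (down-sampled) center
    r = max(1, int(3 * sigma))                # truncate the kernel at ~3 sigma
    x0, x1 = max(0, cx - r), min(w, cx + r + 1)
    y0, y1 = max(0, cy - r), min(h, cy + r + 1)
    ys, xs = np.mgrid[y0:y1, x0:x1]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    heatmap[y0:y1, x0:x1] = np.maximum(heatmap[y0:y1, x0:x1], g)
    return heatmap
```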
Further, determining the local peak points of the heatmap in step S114 also includes deciding between overlapping responses: when two Gaussian kernels of the same key point (the head center point of a target to be detected) or of the same head target overlap, the element-wise maximum key point is retained, giving the finally formed heatmap. In this heatmap, $\hat{Y}_{xyc} = 1$ is the key point of a detected head center, while $\hat{Y}_{xyc} = 0$ is defined as background.
Further, after the step of regressing the target size in step S114, the method continues as follows:
The geometric center of a head target obtained from the heatmap is $\tilde{p} = \left(\frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2}\right)$, assuming the head coordinates are $H = (x_1, y_1, x_2, y_2)$. The same neural network model (the deep aggregation detection network) is then reused to regress the size $s_h = (x_2 - x_1,\ y_2 - y_1)$ of head target $H$; the head target size is not normalized, and the raw pixel coordinates of the target are used directly. Meanwhile, a size regression loss function $L_S$ designed on L1 Loss is used to calculate the size offset and evaluate the error produced by the size regression:

$$L_S = \frac{1}{N}\sum_{h=1}^{N}\left|\hat{s}_h - s_h\right|$$

where $s_h$ is the true head target size and $\hat{s}_h$ is the predicted head target size. More specifically, the deep aggregation detection network outputs five values: the number of head key points, the x-coordinate offset, the y-coordinate offset, the size box length, and the size box width. Further, combining the heatmap value as the confidence, the bounding box of a head target instance is calculated from the regression of the deep aggregation detection network as

$$\left(\hat{x} + \delta\hat{x} - \frac{\hat{w}}{2},\ \hat{y} + \delta\hat{y} - \frac{\hat{h}}{2},\ \hat{x} + \delta\hat{x} + \frac{\hat{w}}{2},\ \hat{y} + \delta\hat{y} + \frac{\hat{h}}{2}\right)$$

where $(\hat{x}, \hat{y})$ is the detected center key point, $(\delta\hat{x}, \delta\hat{y})$ is the predicted local offset, and $(\hat{w}, \hat{h})$ is the predicted size.
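A hedged PyTorch sketch of this decoding step follows: local peaks are kept with a 3×3 max-pool comparison instead of non-maximum suppression, and boxes are assembled from the offset and size heads. The tensor shapes and the top-k cutoff are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

def decode_boxes(heat, offset, size, k=100, stride=4):
    """heat: (1, H, W) head-center heatmap; offset, size: (2, H, W) heads.
    Returns (k, 5) boxes [x1, y1, x2, y2, confidence] in input pixels."""
    # A cell is a local peak if it survives 3x3 max pooling unchanged.
    peaks = heat * (F.max_pool2d(heat[None], 3, stride=1, padding=1)[0] == heat)
    conf, idx = peaks.flatten().topk(k)
    ys, xs = idx // heat.shape[2], idx % heat.shape[2]
    dx, dy = offset[0].flatten()[idx], offset[1].flatten()[idx]
    # Sizes assumed regressed in raw input pixels, per the description.
    w, h = size[0].flatten()[idx], size[1].flatten()[idx]
    cx, cy = (xs + dx) * stride, (ys + dy) * stride   # refined centers
    return torch.stack([cx - w / 2, cy - h / 2,
                        cx + w / 2, cy + h / 2, conf], dim=1)
```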
The invention splats valid ground-truth values with a Gaussian kernel to form a heatmap and then determines its peak points, i.e. obtains the true target center points. On this basis, with the heatmap as confidence, the target bounding boxes are regressed from the predicted center points, so key points are assigned directly on the targets to be detected. Target detection and counting are realized without anchor boxes, no threshold has to be set manually to separate target foreground from background, no mutually overlapping pre-selection boxes arise, and no non-maximum suppression post-processing is needed to pick one of several overlapping pre-selection boxes. The loss of true targets to post-processing in dense-head scenes is thereby reduced; the method suits indoor dense scenes and has high detection accuracy.
To further illustrate the technical effect of the target bounding box regression method, fig. 2 compares simulations of head target prediction by the method of the invention and by non-maximum suppression. Compared with the head detection result of the non-maximum suppression method (lower left of fig. 2), the head detection result of the method of the invention (lower right of fig. 2) loses no true targets, and the prediction accuracy is effectively improved. It should be further noted that all simulation diagrams of the invention serve to explain its technical effects and do not limit the scope of the claims.
Further, aiming at the uneven sample distribution and uneven head sizes within a single image in existing head data sets, the invention provides an image preprocessing method comprising the following steps:
S101: uniformly scale the sample images and rotate some of the scaled images. As a specific example, as shown in fig. 3, four images are randomly selected from the data set for random combination. With the length and width of each image denoted $l$ and $w$, the aspect ratio of an original input image is $r_{original} = l/w \geq 1$. A common scaling factor $k \in (0, 10]$ is randomly set for the four selected images; after the four images are uniformly scaled by this factor, two of them are randomly selected and rotated by 90°, the rotation direction being random, clockwise or counterclockwise.
S102: rescale the rotated images in equal proportion so that their length equals the width of the non-rotated images, realizing image stitching; the stitched images serve as the training set of the deep layer aggregation detection network. As a specific example, the two images rotated by 90° are scaled again, and their length is combined with the width of the non-rotated images into the mosaic shown in fig. 3 (b). The aspect ratio of the newly generated image is calculated as

$$r_{new} = \frac{kl + \frac{kw^2}{l}}{2kw} = \frac{l^2 + w^2}{2lw}$$
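A hedged Python sketch of this random-combination preprocessing follows, using PIL; the helper name, the sampling range for k and the two-row mosaic layout are assumptions made for the sketch:

```python
import random
from PIL import Image

def random_mosaic(paths):
    """Randomly combine four images: apply a common scaling factor k, rotate
    two images by 90 degrees in a random direction, rescale the rotated pair
    in equal proportion to the height of the unrotated pair, and stitch the
    four into a two-row mosaic."""
    k = random.uniform(0.1, 10.0)                    # common scaling factor k
    imgs = [Image.open(p).convert("RGB") for p in paths]
    imgs = [im.resize((max(1, int(im.width * k)),
                       max(1, int(im.height * k)))) for im in imgs]
    random.shuffle(imgs)                             # pick two images to rotate
    plain, rot = imgs[:2], imgs[2:]
    rot = [im.rotate(random.choice([90, -90]), expand=True) for im in rot]
    rows = []
    for p, r in zip(plain, rot):
        t = p.height / r.height                      # equal-proportion rescale
        r = r.resize((max(1, int(r.width * t)), p.height))
        row = Image.new("RGB", (p.width + r.width, p.height))
        row.paste(p, (0, 0))
        row.paste(r, (p.width, 0))
        rows.append(row)
    out = Image.new("RGB", (max(r.width for r in rows),
                            sum(r.height for r in rows)))
    y = 0
    for row in rows:
        out.paste(row, (0, y))
        y += row.height
    return out
```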
Still further, the image preprocessing method also includes an image enhancement step:
S103: apply data enhancement to the newly generated images. Specifically, traditional geometric distortion enhancements such as mirroring and flipping are applied randomly to each image combination, and illumination distortion enhancements such as adjustments of brightness, contrast, hue and saturation are applied randomly, finally forming the preprocessed images, which are sent to the deep aggregation detection network for training.
By uniformly scaling, rotating, rescaling and stitching the images in sequence, the invention greatly increases the variability of the images; different regions of a single image carry samples of different scales, and the preprocessed images are sample-balanced, so the deep aggregation detection network is more robust to images of different environments and different crowding degrees, further ensuring the accuracy of target detection.
Furthermore, the method also comprises a people flow density estimation step, including direct people flow density estimation and indirect people flow density estimation; the indirect estimation assists the direct estimation to correct the head detection data and ensure detection accuracy.
Specifically, the direct people flow density estimation includes:
S121: determining the number of targets from the target bounding boxes, and combining the area of the delimited region to estimate the people flow density directly.
Specifically, the indirect people flow density estimation includes:
S122: inputting the neighborhood parameters (ε, MinPts), a sample distance metric and the sample set of target center points $D = \{x_1, x_2, \ldots, x_m\}$, clustering to obtain a cluster division, and forming a regional people flow density heatmap from the cluster division to realize indirect people flow density estimation.
More specifically, the clustering process is as follows (a code sketch follows these steps):
Initialize the core object sample set $\Omega = \varnothing$, the cluster number $k = 0$, the unvisited sample set $\Gamma = D$ and the cluster division $C = \varnothing$. For $j = 1, 2, \ldots, m$, find all core objects as follows:
S1221: find the ε-neighborhood subsample set $N_\epsilon(x_j)$ of sample $x_j$ using the distance metric;
S1222: if the subsample set satisfies $|N_\epsilon(x_j)| \geq MinPts$, add sample $x_j$ to the core object sample set: $\Omega = \Omega \cup \{x_j\}$;
S1223: if the core object sample set $\Omega = \varnothing$, end; otherwise, go to step S1224;
S1224: randomly select a core object $o$ from the core object sample set $\Omega$; initialize the current cluster core object queue $\Omega_{cur} = \{o\}$, the class number $k = k + 1$ and the current cluster sample set $C_k = \{o\}$; update the unvisited sample set $\Gamma = \Gamma - \{o\}$;
S1225: if the current cluster core object queue $\Omega_{cur} = \varnothing$, the current cluster $C_k$ is generated; update the cluster division $C = \{C_1, C_2, \ldots, C_k\}$, update the core object sample set $\Omega = \Omega - C_k$, and go to step S1223; otherwise, go to step S1226;
S1226: take a core object $o'$ out of the current cluster core object queue $\Omega_{cur}$, find all of its ε-neighborhood subsample set $N_\epsilon(o')$ through the neighborhood distance threshold ε, let $\Delta = N_\epsilon(o') \cap \Gamma$, update the current cluster sample set $C_k = C_k \cup \Delta$, update the unvisited sample set $\Gamma = \Gamma - \Delta$, and update $\Omega_{cur} = \Omega_{cur} \cup (\Delta \cap \Omega) - \{o'\}$; return to step S1225;
S1227: output the cluster division $C = \{C_1, C_2, \ldots, C_k\}$;
S1228: finally, form the regional people flow density heatmap from the cluster division to assist the direct estimation result.
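For illustration, a hedged scikit-learn sketch of the two estimates follows: the direct count-over-area figure, and the indirect clustering of head centers into a regional density heatmap. The function names, grid size and parameter values are assumptions, and sklearn's DBSCAN stands in for the step-by-step procedure above.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def direct_density(boxes, area_m2):
    """Direct estimate: detected head count divided by the delimited area."""
    return len(boxes) / area_m2

def indirect_density(centers, eps=30.0, min_pts=3, shape=(1080, 1920), cell=60):
    """Indirect estimate: cluster the head center points with neighborhood
    parameters (eps, min_pts) and a Euclidean distance metric, then rasterize
    the cluster division into a regional people flow density heatmap."""
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(centers)
    heat = np.zeros((shape[0] // cell, shape[1] // cell))
    for (x, y), lab in zip(centers, labels):
        if lab < 0:
            continue                                  # skip noise points
        weight = np.sum(labels == lab)                # larger cluster, hotter cell
        heat[min(int(y) // cell, heat.shape[0] - 1),
             min(int(x) // cell, heat.shape[1] - 1)] += weight
    return heat
```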
To further illustrate the technical effects of the deep aggregation network model of the invention, fig. 4 shows a simulation of head detection using the deep aggregation detection model, and fig. 5 compares the deep aggregation detection model of the invention (fig. 5 (c)) with the conventional neural network models Overfeat-AlexNet (fig. 5 (a)) and End-to-End (fig. 5 (b)). Each head target detected in fig. 4 is annotated with its confidence probability, and only head targets with a confidence probability above 0.55 are output in the simulation of fig. 4. As can be seen from figs. 4-5, when the target bounding box regression method is applied to head detection in an indoor dense scene, no true head targets are lost, and the prediction accuracy is better than that of the Overfeat-AlexNet and End-to-End models.
Embodiment 2
This embodiment shares the inventive concept of embodiment 1. On its basis, a people flow density detection system for indoor dense scenes comprises a deep aggregation detection network for realizing target bounding box regression; the deep aggregation detection network comprises a 34-layer deep layer aggregation network and a plurality of newly added aggregation nodes. The newly added aggregation nodes are arranged at the bottom of the 34-layer deep aggregation network and are connected by skip connections to the output aggregation nodes of the corresponding resolution levels of the 34-layer deep aggregation network.
As a specific embodiment, a system block diagram of the deep aggregation detection network of the invention is shown in fig. 6; more skip connections and more aggregation nodes are added so that the network attends to context details in indoor dense scenes. Specifically, on the basis of the 34-layer deep layer aggregation network (DLA-34), 3×3 aggregation nodes with 256 channels are added at the bottom of DLA-34, skip connections are added before the aggregation nodes and the stage (level) output header of each resolution, and finally a 1×1 output convolution realizes the prediction of head center points. The deep aggregation detection network performs dense prediction with full-convolution up-sampling and hierarchical skip connections, and raises the feature map resolution symmetrically by iterative deep aggregation. It should be noted that a hierarchy (stage) in the present application is a set of convolution modules of the same resolution; each convolution module consists of a batch normalization layer, a convolution layer, a pooling layer and an activation layer, and this modular design overcomes excessive network complexity through grouping and replication. Several network layers are combined into a convolution module, and convolution modules are combined into hierarchies according to feature resolution; semantic fusion generally occurs within a hierarchy, while spatial fusion generally occurs between hierarchies. The stacked convolution modules of the network are grouped into hierarchies by resolution: deeper hierarchies carry more semantic information but coarser spatial information, and skip connections from shallow to deep hierarchies are added to fuse scale and resolution; these are called iterative deep aggregation connections in this patent. The main function of an aggregation node is to merge and compress its inputs: through training it selects the important information of interest to project, and then outputs features of the same scale as the input dimension. The aggregation node structure used in the invention is a sequentially connected convolution layer + batch normalization (BN) layer + nonlinear activation function ReLU matching the stage output resolution. The up-sampling (Upsample 2x) of the invention uses 2×2 full convolution layers.
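A minimal PyTorch sketch of the aggregation-node structure just described follows (3×3 convolution + BN + ReLU over concatenated skip inputs, plus the 2×2 full-convolution up-sampling); the channel wiring is an illustrative assumption, not the full DLA-34 network:

```python
import torch
import torch.nn as nn

class AggregationNode(nn.Module):
    """Merge-and-compress node: concatenate the skip inputs, then project
    with conv3x3 + BN + ReLU to features matching the stage resolution."""
    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, *features):
        return self.proj(torch.cat(features, dim=1))

def upsample2x(channels):
    """2x up-sampling with a full (transposed) convolution, as in Upsample 2x."""
    return nn.ConvTranspose2d(channels, channels, kernel_size=2, stride=2)
```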
Further, the system also comprises a preprocessing unit and a people flow density estimation unit.
The preprocessing unit uniformly scales the sample images and rotates some of the scaled images; it rescales the rotated images in equal proportion so that their length equals the width of the non-rotated images to realize image stitching, and inputs the stitched images into the deep aggregation detection network as the training set.
Furthermore, the preprocessing unit also comprises an image enhancement module for applying data enhancement to the newly generated stitched images; specifically, traditional geometric distortion enhancements such as mirroring and flipping and illumination distortion enhancements such as adjustments of image brightness, contrast, hue and saturation are applied randomly, finally forming the preprocessed images, which are sent to the deep aggregation detection network for training.
Specifically, the people flow density estimation unit comprises a direct people flow density estimation module and an indirect people flow density estimation module, and estimates the people flow density from the target bounding boxes output by the deep aggregation detection network.
The direct people flow density estimation module determines the number of targets from the target bounding boxes and combines the area of the delimited region to estimate the people flow density directly; the indirect people flow density estimation module clusters the sample set of target center points according to the input neighborhood parameters (ε, MinPts) and sample distance metric to obtain a cluster division, and forms a regional people flow density heatmap from the cluster division to realize indirect people flow density estimation.
Embodiment 3
This embodiment, under the same inventive concept as embodiment 1, provides a storage medium on which computer instructions are stored; when executed, the computer instructions perform the steps of the people flow density detection method for indoor dense scenes of embodiment 1.
Based on such understanding, the technical solution of the present embodiment or parts of the technical solution may be essentially implemented in the form of a software product, which is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
Embodiment 4
This embodiment, under the same inventive concept as embodiment 1, also provides a terminal comprising a memory and a processor, the memory storing computer instructions runnable on the processor; when executing the computer instructions, the processor performs the steps of the people flow density detection method for indoor dense scenes of embodiment 1. The processor may be a single-core or multi-core central processing unit or a specific integrated circuit, or one or more integrated circuits configured to implement the invention.
Each functional unit in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above detailed description is for the purpose of describing the invention in detail, and it is not intended that the invention be limited to the specific embodiments described, and it should be understood that various modifications and substitutions can be made by one skilled in the art without departing from the spirit of the invention.

Claims (7)

1. A people flow density detection method for an indoor dense scene, characterized in that the method comprises a target bounding box regression step:
predicting the target center points of an input image;
down-sampling the target center point ground truth;
splatting the valid ground-truth values with a Gaussian kernel to form a target center point heatmap;
determining the local peak points of the heatmap, regressing the target size with the heatmap as confidence, and outputting target bounding boxes;
the method also comprises a people flow density estimation step, including direct people flow density estimation and indirect people flow density estimation;
the direct people flow density estimation comprises: determining the number of targets from the target bounding boxes, and combining the area of the delimited region to estimate the people flow density directly;
the indirect people flow density estimation comprises: inputting the neighborhood parameters (ε, MinPts), a sample distance metric and the sample set of target center points, clustering to obtain a cluster division, and forming a regional people flow density heatmap from the cluster division to realize indirect people flow density estimation;
the clustering process specifically comprises:
initializing the core object sample set $\Omega = \varnothing$, the cluster number $k = 0$, the unvisited sample set $\Gamma = D$ and the cluster division $C = \varnothing$; for $j = 1, 2, \ldots, m$, finding all core objects as follows:
S1221: finding the ε-neighborhood subsample set $N_\epsilon(x_j)$ of sample $x_j$ using the distance metric;
S1222: if the subsample set satisfies $|N_\epsilon(x_j)| \geq MinPts$, adding sample $x_j$ to the core object sample set: $\Omega = \Omega \cup \{x_j\}$;
S1223: if the core object sample set $\Omega = \varnothing$, ending; otherwise, going to step S1224;
S1224: randomly selecting a core object $o$ from the core object sample set $\Omega$; initializing the current cluster core object queue $\Omega_{cur} = \{o\}$, the class number $k = k + 1$ and the current cluster sample set $C_k = \{o\}$; updating the unvisited sample set $\Gamma = \Gamma - \{o\}$;
S1225: if the current cluster core object queue $\Omega_{cur} = \varnothing$, the current cluster $C_k$ is generated; updating the cluster division $C = \{C_1, C_2, \ldots, C_k\}$, updating the core object sample set $\Omega = \Omega - C_k$, and going to step S1223; otherwise, going to step S1226;
S1226: taking a core object $o'$ out of the current cluster core object queue $\Omega_{cur}$, finding all of its ε-neighborhood subsample set $N_\epsilon(o')$ through the neighborhood distance threshold ε, letting $\Delta = N_\epsilon(o') \cap \Gamma$, updating the current cluster sample set $C_k = C_k \cup \Delta$, updating the unvisited sample set $\Gamma = \Gamma - \Delta$, and updating $\Omega_{cur} = \Omega_{cur} \cup (\Delta \cap \Omega) - \{o'\}$; returning to step S1225;
S1227: outputting the cluster division $C = \{C_1, C_2, \ldots, C_k\}$;
S1228: finally, forming the regional people flow density heatmap from the cluster division to assist the direct estimation result;
the step of determining the local peak points of the heatmap further comprises determining a local peak point:
when two Gaussian kernels of the same target overlap, taking the element-wise maximum key point of the target as the local peak point;
the method further comprises an image preprocessing step:
uniformly scaling the sample images and rotating some of the scaled images;
rescaling the rotated images in equal proportion so that their length equals the width of the non-rotated images to realize image stitching, the stitched images serving as the training set of the deep aggregation detection network;
applying data enhancement to the stitched images;
the aspect ratio $r_{new}$ of a stitched image is calculated as:

$$r_{new} = \frac{kl + \frac{kw^2}{l}}{2kw} = \frac{l^2 + w^2}{2lw}$$

wherein $k$ represents the common scaling factor, $l$ represents the length of an image, and $w$ represents the width of an image.
2. The people flow density detection method for an indoor dense scene according to claim 1, characterized in that down-sampling the target center point ground truth further comprises penalizing and correcting the sampled result:
a local offset loss function $L_O$ is used to calculate the local offset, and the loss function $L_O$ is calculated as

$$L_O = \frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right)\right|$$

wherein $N$ represents the number of target center points, $\hat{O}_{\tilde{p}}$ represents the local offset, $p$ represents the target center point ground truth, $\tilde{p}$ represents the valid value of the target center ground truth, and $R$ represents the output stride.
3. The people flow density detection method for an indoor dense scene according to claim 1, characterized in that the step of regressing the target size further comprises:
calculating the target size offset with a size regression loss function, calculated as

$$L_S = \frac{1}{N}\sum_{h=1}^{N}\left|\hat{s}_h - s_h\right|$$

wherein $s_h$ represents the true target size and $\hat{s}_h$ represents the predicted target size.
4. A people flow density detection system for an indoor dense scene implementing the method according to any one of claims 1 to 3, characterized in that the system comprises a deep aggregation detection network for realizing target bounding box regression, the deep aggregation detection network comprising a 34-layer deep layer aggregation network and a plurality of newly added aggregation nodes;
the newly added aggregation nodes are arranged at the bottom of the 34-layer deep aggregation network and are connected by skip connections to the output aggregation nodes of the corresponding resolution levels of the 34-layer deep aggregation network.
5. The people flow density detection system for an indoor dense scene according to claim 4, characterized in that the system further comprises a preprocessing unit and a people flow density estimation unit;
the preprocessing unit is used to uniformly scale the sample images and rotate some of the scaled images, rescale the rotated images in equal proportion so that their length equals the width of the non-rotated images to realize image stitching, and input the stitched images into the deep aggregation detection network as the training set;
the people flow density estimation unit comprises a direct people flow density estimation module and an indirect people flow density estimation module, and estimates the people flow density from the target bounding boxes output by the deep aggregation detection network;
the direct people flow density estimation module determines the number of targets from the target bounding boxes and combines the area of the delimited region to estimate the people flow density directly; the indirect people flow density estimation module clusters the sample set of target center points according to the input neighborhood parameters (ε, MinPts) and sample distance metric to obtain a cluster division, and forms a regional people flow density heatmap from the cluster division to realize indirect people flow density estimation.
6. A storage medium having computer instructions stored thereon, characterized in that the computer instructions, when executed, perform the steps of the people flow density detection method for an indoor dense scene according to any one of claims 1 to 3.
7. A terminal comprising a memory and a processor, the memory storing computer instructions runnable on the processor, characterized in that the processor, when executing the computer instructions, performs the steps of the people flow density detection method for an indoor dense scene according to any one of claims 1 to 3.
CN202011570465.2A 2020-12-26 2020-12-26 People flow density detection method, system, storage medium and terminal for indoor dense scenes Active CN112733624B (en)

Priority Applications (1)

CN202011570465.2A (priority date 2020-12-26, filing date 2020-12-26): People flow density detection method, system, storage medium and terminal for indoor dense scenes; granted as CN112733624B

Publications (2)

CN112733624A, published 2021-04-30
CN112733624B, granted 2023-02-03





Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant