CN112733624B - People stream density detection method, system, storage medium and terminal for indoor dense scene - Google Patents
- Publication number: CN112733624B
- Application number: CN202011570465.2A
- Authority: CN (China)
- Legal status: Active
Classifications
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N3/045—Combinations of networks
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V2201/07—Target detection
Abstract
The invention discloses a people stream density detection method, system, storage medium and terminal for indoor dense scenes. The method comprises: preprocessing images in a random combination mode; predicting the target center points of an input image; down-sampling the target center point truth values; scattering the valid truth values with a Gaussian kernel function to form a target center point heatmap; and determining the local peak points of the heatmap, regressing the target size with the heatmap values as confidence, and outputting target bounding boxes. The preprocessing greatly increases image variability, making the whole detection method more robust. Because the target bounding box is regressed from the predicted target center point, key points are assigned directly to the targets to be detected, no anchor boxes are needed, and no mutually overlapping candidate boxes arise; this reduces the loss of real targets to post-processing in scenes with dense heads, and the detection accuracy is high.
Description
Technical Field
The invention relates to the technical field of people stream density detection, and in particular to a people stream density detection method, system, storage medium and terminal for indoor dense scenes.
Background
With the rapid development of China's economy and society, interactions between people are frequent, and highly crowded gatherings can easily lead to safety incidents, so people stream density detection has strong practical value. As image processing technology has developed, it has been applied to people stream density detection. In the mainstream methods, a people stream density estimation model generates a heatmap of the corresponding image, and the people stream density is then estimated from the heatmap. Further, to address the prediction errors caused by the multiple scales of image samples, existing methods train the density estimation model on features of multiple scales, which effectively reduces the prediction error and improves accuracy. However, current people stream density detection methods do not offer an effective solution to the low prediction accuracy caused by the loss of real samples due to overlap, occlusion and small target size in indoor dense scenes.
Disclosure of Invention
The invention aims to overcome the low prediction accuracy of the prior art caused by the loss of real samples due to overlap, occlusion and small target size in indoor dense scenes, and provides a people stream density detection method, system, storage medium and terminal for such scenes.
The purpose of the invention is realized by the following technical scheme: a people stream density detection method for an indoor dense scene comprises a target bounding box regression step:
predicting the target center points of an input image; down-sampling the target center point truth values; scattering the valid truth values with a Gaussian kernel function to form a target center point heatmap; and determining the local peak points of the heatmap, regressing the target size with the heatmap values as confidence, and outputting target bounding boxes.
As an option, down-sampling the target center point truth values further includes penalizing and correcting the sampling result: a local offset loss function $L_O$ is used to calculate the local offset. The calculation formula of $L_O$ is:

$$L_O = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left( \frac{p}{R} - \tilde{p} \right) \right|$$

where $N$ is the number of target center points, $\hat{O}_{\tilde{p}}$ is the predicted local offset, $p$ is the target center point truth value, $\tilde{p} = \lfloor p/R \rfloor$ is the valid value of the target center truth value, and $R$ is the output stride.
As an option, the step of determining the local peak points of the heatmap further includes: when two Gaussian kernels of the same target overlap, the maximum key point of the target is taken as the local peak point.
As an option, after the step of regressing the target size, the method further includes: calculating the target size offset with a size regression loss function, whose calculation formula is:

$$L_S = \frac{1}{N} \sum_{h=1}^{N} \left| \hat{s}_h - s_h \right|$$

where $s_h$ is the real target size and $\hat{s}_h$ is the predicted target size.
As an option, the method further comprises an image preprocessing step:
uniformly scaling the sample images and rotating some of the scaled images; rescaling the rotated images proportionally so that their length equals the width of the unrotated images; splicing the images; and using the spliced images as the training set of the deep aggregation detection network.
As an option, the method further comprises a people stream density estimation step, including direct people stream density estimation and indirect people stream density estimation.
The direct people stream density estimation comprises: determining the number of targets from the target bounding boxes and, combined with the area of the delimited region, directly estimating the people stream density. The indirect people stream density estimation comprises: inputting the neighborhood parameters (ε, MinPts), a sample distance metric and the sample set of target center points; clustering to obtain a cluster division; and forming a regional people stream density heatmap from the cluster division to realize indirect people stream density estimation.
It should be further noted that the technical features corresponding to the above method options can be combined with one another or substituted to form new technical solutions.
The invention also comprises a people stream density detection system for indoor dense scenes. The system comprises a deep aggregation detection network for target bounding box regression; the network comprises a 34-layer deep layer aggregation network and several newly added aggregation nodes. The new aggregation nodes are arranged at the bottom of the 34-layer deep aggregation network and are connected by skip connections to the output aggregation nodes of the corresponding resolution levels of that network.
As an option, the system further comprises a preprocessing unit and a people stream density estimation unit.
The preprocessing unit uniformly scales the sample images and rotates some of the scaled images; it then rescales the rotated images proportionally so that their length equals the width of the unrotated images, realizing image splicing, and inputs the spliced images into the deep aggregation detection network as the training set.
The people stream density estimation unit comprises a direct people stream density estimation module and an indirect people stream density estimation module, and estimates the people stream density from the target bounding boxes output by the deep aggregation detection network.
The direct people stream density estimation module determines the number of targets from the target bounding boxes and, combined with the area of the delimited region, directly estimates the people stream density. The indirect people stream density estimation module obtains a cluster division by clustering according to the input neighborhood parameters (ε, MinPts), a sample distance metric and the sample set of target center points, and forms a regional people stream density heatmap from the cluster division to realize indirect people stream density estimation.
The invention also includes a storage medium, on which computer instructions are stored, which when executed perform the steps of the people stream density detection method for the indoor dense scene.
The invention further comprises a terminal which comprises a memory and a processor, wherein the memory stores computer instructions capable of running on the processor, and the processor executes the computer instructions to execute the steps of the people stream density detection method for the indoor dense scene.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention scatters the valid truth values with a Gaussian kernel function to form a heatmap and then determines the heatmap peak points, i.e., the real target center points. On this basis, with the heatmap values as confidence, the target bounding box is regressed from the predicted target center point, and key points are assigned directly to the targets to be detected. Target detection and counting are achieved without anchor boxes; no threshold needs to be set manually for target foreground/background classification; no mutually overlapping candidate boxes arise, so no non-maximum suppression post-processing is needed to select one of several overlapping candidate boxes. This reduces the loss of real targets to post-processing in scenes with dense heads, suits indoor dense scenes, and yields high detection accuracy.
(2) By uniformly scaling, rotating, rescaling and splicing images in sequence, the invention greatly increases image variability: different regions of a single image contain samples of different scales, and the preprocessed images are sample-balanced, so the deep aggregation detection network is more robust to images of different environments and degrees of crowding.
(3) The method directly estimates people stream density from the target bounding boxes output by the deep aggregation detection network combined with the area of the delimited region; meanwhile, the target center point sample set is clustered, and the crowd density of the indoor region is corrected according to how closely the target center coordinates are packed, further improving the accuracy of the people stream density detection result in indoor dense scenes.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention.
FIG. 1 is a flowchart of a method of example 1 of the present invention;
FIG. 2 is a schematic diagram of a comparison simulation of the method of embodiment 1 of the present invention with a non-maximum suppression method;
FIG. 3 is a comparison chart before and after preprocessing in example 1 of the present invention;
FIG. 4 is a schematic diagram of simulation of the method of embodiment 1 of the present invention;
FIG. 5 is a schematic comparison between the simulation of the method of embodiment 1 of the present invention and that of the prior art;
fig. 6 is a system block diagram of embodiment 2 of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that directions or positional relationships indicated by "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like are directions or positional relationships described based on the drawings, and are only for convenience of description and simplification of description, but do not indicate or imply that the device or the element referred to must have a specific orientation, be constructed in a specific orientation, and operate, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that unless otherwise explicitly stated or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention adopts a randomly combined data preprocessing mode to balance the number of samples in each region of the captured images and to construct a dense-head environment as the training set of the deep aggregation detection network. The deep aggregation detection network regresses target bounding boxes from predicted target center points to realize target detection and counting; the people stream density is then estimated from the target count combined with the delimited region, and the people stream density of the indoor region is corrected according to how densely the target center coordinates are packed. Head detection is taken as the embodiment for the specific description below.
Example 1
As shown in fig. 1, in embodiment 1, a people stream density detection method for an indoor dense scene comprises a target bounding box regression step:
S111: predicting the target center points of an input image; specifically, a target center point is a head center point predicted by the deep aggregation detection network model of the present application.
S112: down-sampling the target center point truth values;
S113: scattering the valid truth values with a Gaussian kernel function to form a target center point heatmap;
S114: determining the local peak points of the heatmap, regressing the target size with the heatmap values as confidence, and outputting target bounding boxes.
Furthermore, in the down-sampling of the target center point truth values in step S112, the valid value of the predicted head center point deviates because of data dispersion, so a local offset loss function $L_O$ is used to calculate the local offset and thereby penalize and correct the down-sampling result. The calculation formula of $L_O$ is:

$$L_O = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left( \frac{p}{R} - \tilde{p} \right) \right|$$

where $N$ is the number of target center points, $\hat{O}_{\tilde{p}}$ is the predicted local offset, $p$ is the target center point truth value, $\tilde{p} = \lfloor p/R \rfloor$ is the valid value of the target center truth value, and $R$ is the output stride.
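This offset penalty can be sketched in plain Python; the function name and the toy coordinates below are illustrative, not taken from the patent:

```python
import math

def local_offset_loss(centers, predicted_offsets, stride):
    """Mean L1 error between the predicted local offset and the
    fractional part lost when a centre p is down-sampled by stride R."""
    total = 0.0
    for (px, py), (ox, oy) in zip(centers, predicted_offsets):
        true_ox = px / stride - math.floor(px / stride)
        true_oy = py / stride - math.floor(py / stride)
        total += abs(ox - true_ox) + abs(oy - true_oy)
    return total / len(centers)

# A perfect prediction recovers exactly the fractional part, so the loss is 0.
centers = [(13.0, 7.0)]   # ground-truth head centre in input pixels
offsets = [(0.25, 0.75)]  # 13/4 = 3.25 and 7/4 = 1.75 for stride R = 4
print(local_offset_loss(centers, offsets, 4))  # -> 0.0
```

In training, the predicted offsets would come from the network's offset head; here they are hard-coded to show the target value the loss drives them toward.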
Further, the Gaussian kernel function adopted in step S113 is:

$$Y_{xy} = \exp\left( -\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2} \right)$$

where $x$ and $y$ are the coordinates of the head center point, $(\tilde{p}_x, \tilde{p}_y)$ are the valid-value coordinates of the head center truth value, and $\sigma_p$ is the target-scale-adaptive variance. Scattering the valid truth values with this Gaussian kernel forms a head center point heatmap $\hat{Y} \in [0, 1]^{\frac{W}{R} \times \frac{H}{R} \times C}$, where $W$ and $H$ are the length and width of the input image, $R$ is the output stride, and $C$ is the number of key point classes.
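Scattering centre truth values onto the heatmap can be sketched as follows; the grid size and σ are illustrative, and where two kernels overlap the element-wise maximum is kept:

```python
import math

def splat_gaussian(heat, cx, cy, sigma):
    """Scatter one centre truth value onto the heatmap; where kernels
    overlap, the element-wise maximum is kept."""
    for y in range(len(heat)):
        for x in range(len(heat[0])):
            g = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            heat[y][x] = max(heat[y][x], g)
    return heat

heat = [[0.0] * 8 for _ in range(8)]   # an 8x8 down-sampled grid
splat_gaussian(heat, 3, 3, 1.0)        # two nearby head centres
splat_gaussian(heat, 5, 3, 1.0)
print(round(heat[3][3], 3))  # -> 1.0 (each centre keeps its own unit peak)
```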
Further, step S114 of establishing the local peak points of the heatmap further includes determining the local peak points: when two Gaussian kernels overlap for the same key point (the head center point of a target to be measured) or the same head target, the maximum key point is retained, yielding the final heatmap. A position with $\hat{Y} = 1$ is a detected head center key point, while $\hat{Y} = 0$ is defined as background.
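Local peak extraction can be sketched as keeping cells that are maximal in their 3 × 3 neighbourhood, a common choice for centre-point detectors; the 0.5 confidence threshold below is an illustrative value, not one from the patent:

```python
def local_peaks(heat, threshold=0.5):
    """Return (x, y, score) for heatmap cells that are 3x3 local maxima."""
    h, w = len(heat), len(heat[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heat[y][x]
            if v < threshold:
                continue
            neighbourhood = [heat[j][i]
                             for j in range(max(0, y - 1), min(h, y + 2))
                             for i in range(max(0, x - 1), min(w, x + 2))]
            if v >= max(neighbourhood):
                peaks.append((x, y, v))
    return peaks

heat = [[0.0, 0.0, 0.0, 0.0],
        [0.0, 0.9, 0.2, 0.0],
        [0.0, 0.2, 0.1, 0.0],
        [0.0, 0.0, 0.0, 0.8]]
print(local_peaks(heat))  # -> [(1, 1, 0.9), (3, 3, 0.8)]
```

Because each peak is read off the heatmap directly, no candidate boxes overlap and no non-maximum suppression pass is needed afterwards.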
Further, after the step of regressing the target size in step S114, the method further includes: the geometric center of a head target obtained from the heatmap is $\left( \frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2} \right)$, assuming the head coordinates are $H = (x_1, y_1, x_2, y_2)$. The same neural network model (the deep aggregation detection network) is then reused to regress the size of the head target $H$, $s_h = (x_2 - x_1, y_2 - y_1)$; the head target size is not normalized, and the original pixel coordinates of the target are used directly. Meanwhile, a size regression loss function $L_S$ designed on the basis of the L1 loss is used to calculate the size offset and evaluate the error of the size regression:

$$L_S = \frac{1}{N} \sum_{h=1}^{N} \left| \hat{s}_h - s_h \right|$$

where $s_h$ is the real target head size and $\hat{s}_h$ is the predicted head size. More specifically, the deep aggregation detection network outputs five values: the head key point score, the x-coordinate offset, the y-coordinate offset, the box length, and the box width. Finally, combining the heatmap truth value as the confidence, the bounding box of each head target instance is computed from the deep aggregation detection network regression.
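How the five regressed outputs are assembled into a pixel-space bounding box can be sketched as follows; representing the offset and size heads as dictionaries keyed by grid cell is an illustrative simplification:

```python
def decode_boxes(peaks, offsets, sizes, stride):
    """Combine heatmap peaks, regressed offsets and regressed sizes
    into (x1, y1, x2, y2, score) boxes in input-image pixels."""
    boxes = []
    for x, y, score in peaks:
        ox, oy = offsets[(x, y)]   # local offset head
        w, h = sizes[(x, y)]       # size head, in raw pixel units
        cx, cy = (x + ox) * stride, (y + oy) * stride
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score))
    return boxes

peaks = [(3, 2, 0.9)]              # one head centre on the stride-4 grid
offsets = {(3, 2): (0.25, 0.5)}
sizes = {(3, 2): (16.0, 16.0)}
print(decode_boxes(peaks, offsets, sizes, 4))
# -> [(5.0, 2.0, 21.0, 18.0, 0.9)]
```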
The invention scatters the valid truth values with a Gaussian kernel function to form a heatmap and then determines the heatmap peak points, i.e., the real target center points. On this basis, with the heatmap values as confidence, the target bounding box is regressed from the predicted target center point, and key points are assigned directly to the targets to be detected. Target detection and counting are achieved without anchor boxes, no threshold needs to be set manually for target foreground/background classification, no mutually overlapping candidate boxes arise, and no non-maximum suppression post-processing is needed to select one of several overlapping candidate boxes. This reduces the loss of real targets to post-processing in scenes with dense heads, suits indoor dense scenes, and yields high detection accuracy.
To further illustrate the technical effect of the target bounding box regression method, a simulation comparing it with a non-maximum suppression method for head target prediction is shown in fig. 2. Compared with the head detection result of the non-maximum suppression method (bottom left of fig. 2), the detection result of the method of the invention (bottom right of fig. 2) loses no real targets, and the prediction accuracy is effectively improved. It should be further noted that all simulation diagrams of the present invention are used to assist in explaining its technical effects and are not used to limit the scope of the claimed invention.
Further, aiming at the problems of uneven sample distribution and uneven head sizes within a single image in existing head data sets, the invention provides an image preprocessing method comprising the following steps:
S101: uniformly scaling the sample images and rotating some of the scaled images. As a specific example, as shown in fig. 3, four images are randomly selected from the data set for random combination. The length and width of each image are denoted l and w, so the aspect ratio of an original input image is r_original = l/w ≥ 1. A common scaling factor k ∈ (0, 10] is set randomly for the four selected images; after the four images are uniformly scaled by this factor, two of them are randomly selected and rotated by 90°, the rotation direction being random (clockwise or counterclockwise).
S102: rescaling the rotated images proportionally so that their length equals the width of the unrotated images, realizing image splicing; the spliced images serve as the training set of the deep aggregation detection network. As a specific example, the two images rotated by 90° are scaled again, their length is matched to the width of the unrotated images, and the four are combined into a mosaic as shown in fig. 3(b); the aspect ratio of the newly generated image is calculated accordingly.
still further, the image preprocessing method further includes an image enhancement processing step of:
s103: and carrying out data enhancement processing on the newly generated image. Specifically, the traditional geometric distortion enhancement methods such as symmetry and turning are randomly adopted for each image combination, the illumination distortion enhancement methods such as brightness, contrast, hue and saturation of the image are randomly adjusted, the preprocessed image is finally formed, and the image is sent to a deep aggregation detection network for training.
By uniformly scaling, rotating, rescaling and splicing images in sequence, the invention greatly increases image variability; different regions of a single image contain samples of different scales, and the preprocessed images are sample-balanced, so the deep aggregation detection network is more robust to images of different environments and degrees of crowding, further ensuring the accuracy of target detection.
Furthermore, the method also comprises a people stream density estimation step, which includes direct people stream density estimation and indirect people stream density estimation; the indirect estimation assists the direct estimation to correct the head detection data and ensure detection accuracy.
Specifically, the direct people stream density estimation includes:
S121: determining the number of targets from the target bounding boxes and, combined with the area of the delimited region, directly estimating the people stream density.
Specifically, the indirect people stream density estimation includes:
S122: inputting the neighborhood parameters (ε, MinPts), a sample distance metric and the sample set of target center points D = {x_1, x_2, ..., x_m}; clustering to obtain a cluster division; and forming a regional people stream density heatmap from the cluster division to realize indirect people stream density estimation.
More specifically, the clustering process includes:
initializing the core object sample set Ω = ∅, the cluster number k = 0, the unvisited sample set Γ = D, and the cluster division C = ∅; for j = 1, 2, ..., m, all core objects are found as follows:
S1221: finding the ε-neighborhood subsample set Nε(xj) of the sample xj according to the distance measurement mode;
S1222: if the number of samples in the subsample set satisfies |Nε(xj)| ≥ MinPts, adding the sample xj to the core object sample set: Ω = Ω ∪ {xj};
S1223: if the core object sample set Ω = ∅, the process ends; otherwise, go to step S1224;
S1224: randomly selecting a core object o from the core object sample set Ω, initializing the current cluster core object queue Ωcur = {o}, the cluster index k = k + 1, and the current cluster sample set Ck = {o}, and updating the unvisited sample set Γ = Γ − {o};
S1225: if the current cluster core object queue Ωcur = ∅, the current cluster Ck is complete; update the cluster division C = {C1, C2, ..., Ck}, update the core object sample set Ω = Ω − Ck, and go to step S1223; otherwise, update the core object sample set Ω = Ω − Ck and go to step S1226;
S1226: taking a core object o′ out of the current cluster core object queue Ωcur, finding its ε-neighborhood subsample set Nε(o′) through the neighborhood distance threshold ε, letting Δ = Nε(o′) ∩ Γ, updating the current cluster sample set Ck = Ck ∪ Δ, updating the unvisited sample set Γ = Γ − Δ, and updating Ωcur = Ωcur ∪ (Δ ∩ Ω) − {o′}, then returning to step S1225;
S1227: the output result is the cluster division C = {C1, C2, ..., Ck};
S1228: finally, a regional people stream density thermodynamic diagram is formed according to the cluster division to assist the direct estimation result.
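The clustering steps above are the standard DBSCAN procedure. A minimal Python sketch follows; it uses a deterministic pick in place of the random core-object selection, and the ε, MinPts, and sample points in the usage example are illustrative values, with head center points taken as 2-D pixel coordinates:

```python
import math

def dbscan(D, eps, min_pts):
    """DBSCAN clustering following the steps above: find all core
    objects first, then grow one cluster at a time from an unused
    core object (picked deterministically here, not randomly)."""
    # Find the eps-neighborhood of every sample (self included) and
    # collect core objects, i.e. samples with >= min_pts neighbors.
    neigh = [[j for j, y in enumerate(D) if math.dist(x, y) <= eps]
             for x in D]
    omega = {j for j, n in enumerate(neigh) if len(n) >= min_pts}
    gamma = set(range(len(D)))        # unvisited sample set
    clusters = []                     # cluster division C
    while omega:                      # stop when no core object is left
        o = next(iter(omega))         # pick a core object
        omega_cur, ck = {o}, {o}
        gamma -= {o}
        while omega_cur:              # grow the current cluster Ck
            o2 = omega_cur.pop()
            delta = set(neigh[o2]) & gamma
            ck |= delta
            gamma -= delta
            omega_cur |= (delta & omega)
        omega -= ck                   # remove Ck's core objects
        clusters.append(sorted(ck))   # cluster Ck is complete
    return clusters
```

Samples that belong to no cluster (never reached from any core object) remain outside the output and act as noise, which is why isolated false detections do not distort the regional density thermodynamic diagram.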
To further illustrate the technical effects of the deep aggregation network model of the present invention, a simulation diagram of human head detection using the deep aggregation detection model of the present invention is shown in fig. 4, and comparative simulation results of the deep aggregation detection model of the present invention (fig. 5 (c)), the conventional neural network model Overfeat-AlexNet (fig. 5 (a)), and the End-to-End model (fig. 5 (b)) are shown in fig. 5. Each head target detected in fig. 4 is annotated with its confidence probability, and only head targets with a confidence probability greater than 0.55 are output in the simulation of fig. 4. As can be seen from fig. 4 to fig. 5, when the target edge frame regression method is applied to human head detection in an indoor dense scene, no real head target is lost, and the prediction accuracy is better than that of the Overfeat-AlexNet and End-to-End models.
Example 2
This embodiment shares the same inventive concept as embodiment 1. On the basis of embodiment 1, the system for detecting people stream density in an indoor dense scene comprises a deep aggregation detection network for realizing target edge frame regression, wherein the deep aggregation detection network comprises a 34-layer deep aggregation network and a plurality of newly added aggregation nodes; the newly added aggregation nodes are arranged at the bottom layer of the 34-layer deep aggregation network and are respectively connected, via skip connections, to the output aggregation nodes of the different resolution levels in the 34-layer deep aggregation network.
As a specific embodiment, a system block diagram of the deep aggregation detection network of the present invention is shown in fig. 6; more skip connections and more aggregation nodes are added so that the network focuses on context details in an indoor dense scene. Specifically, on the basis of the 34-layer deep aggregation network (DLA-34), 3 × 3 aggregation nodes with 256 channels are added at the bottom layer of DLA-34, skip connections are added between these aggregation nodes and the output head of each resolution stage (level), and finally a 1 × 1 output convolution is adopted to predict the head center point. In the deep aggregation detection network, dense prediction is performed by full-convolution upsampling and hierarchical skip connections, and the resolution of the feature map is raised symmetrically by iterative deep aggregation.
It should be noted that a stage (level) in the present application is a set of convolution modules with the same resolution; each convolution module is composed of a batch normalization layer, a convolution layer, a pooling layer, and an activation layer, and the modular design keeps the network from becoming overly complex through grouping and replication. Several network layers are combined into a convolution module, and convolution modules are combined into stages according to feature resolution; semantic fusion generally occurs within a stage, and spatial fusion generally occurs between stages. The stacked convolution modules in the network are grouped into stages by resolution: deeper stages carry more semantic information but coarser spatial information, and skip connections from shallow stages to deep stages are added to fuse scale and resolution; these skip connections are called iterative deep aggregation connections in this patent.
The main function of an aggregation node is to combine and compress its inputs: through training it learns to select and project the important information of interest, and then outputs features of the same scale as the input dimension. The aggregation node structure used in the present invention is a sequentially connected convolution layer + batch normalization (BN) layer + nonlinear ReLU activation, matching the stage output resolution. Upsampling (Upsample 2x) in the present invention uses a 2 × 2 full convolution layer.
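As an illustration of the aggregation node and upsampling structure just described, the following PyTorch sketch builds a 3 × 3 convolution + BN + ReLU node and a 2 × 2 transposed-convolution 2× upsampler. This is a sketch under stated assumptions, not the patented network itself; the input channel counts in the usage example are illustrative, while the 256 output channels, 3 × 3 node kernel, and 2 × 2 upsampling kernel follow the description above.

```python
import torch
import torch.nn as nn

class AggregationNode(nn.Module):
    """One added aggregation node: concatenate same-resolution skip
    inputs, then merge and compress them with a 3x3 convolution
    (256 output channels) + BatchNorm + ReLU."""
    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, *feats):
        # feats: several feature maps of identical spatial resolution
        return self.block(torch.cat(feats, dim=1))

class Upsample2x(nn.Module):
    """2x upsampling via a 2x2 transposed ('full') convolution."""
    def __init__(self, channels):
        super().__init__()
        self.up = nn.ConvTranspose2d(channels, channels, 2, stride=2)

    def forward(self, x):
        return self.up(x)
```

A head-center prediction layer would then be a 1 × 1 convolution applied to the node output, per the 1 × 1 output convolution mentioned above.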
Further, the system also comprises a preprocessing unit and a people stream density estimation unit;
the preprocessing unit is used for uniformly scaling the sample images and rotating some of the scaled images; the rotated images are then scaled proportionally so that the length of a rotated image equals the width of an unrotated image, realizing image stitching, and the stitched images are input to the deep aggregation detection network as the training set.
Furthermore, the preprocessing unit further comprises an image enhancement module for performing data enhancement on the newly generated stitched images; specifically, conventional geometric distortion enhancement (such as mirroring and flipping) and randomly applied illumination distortion enhancement (adjusting image brightness, contrast, hue, and saturation) finally form the preprocessed images, which are sent to the deep aggregation detection network for training.
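A minimal sketch of such a random enhancement step using Pillow; the application probabilities and the 0.7-1.3 strength range are illustrative assumptions, and hue jitter is omitted for brevity:

```python
import random
from PIL import Image, ImageEnhance, ImageOps

def augment(img: Image.Image) -> Image.Image:
    """Random geometric distortion (mirror/flip) plus illumination
    distortion (brightness, contrast, saturation), as described above."""
    if random.random() < 0.5:
        img = ImageOps.mirror(img)      # horizontal symmetry
    if random.random() < 0.5:
        img = ImageOps.flip(img)        # vertical flip
    # Photometric jitter; ImageEnhance.Color adjusts saturation.
    for enhancer in (ImageEnhance.Brightness,
                     ImageEnhance.Contrast,
                     ImageEnhance.Color):
        img = enhancer(img).enhance(random.uniform(0.7, 1.3))
    return img
```

Because the output keeps the input size and mode, the enhanced image can be fed to the detection network with the same annotations (flips aside) as the stitched source image.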
Specifically, the people stream density estimation unit comprises a direct people stream density estimation module and an indirect people stream density estimation module, and is used for carrying out people stream density estimation according to a target edge frame output by the deep aggregation detection network;
the direct people stream density estimation module determines the number of targets according to the target edge frame and realizes direct people stream density estimation by combining the area of the delimited area; the indirect people flow density estimation module is used for obtaining cluster division after clustering according to an input neighborhood parameter (epsilon, minPts), a sample distance measurement mode and a sample set of a target central point, and forming a regional people flow density thermodynamic diagram according to the cluster division to realize indirect people flow density estimation.
Example 3
The present embodiment provides a storage medium, which has the same inventive concept as embodiment 1, and has stored thereon computer instructions, and when the computer instructions are executed, the steps of the method for detecting people stream density in an indoor dense scene in embodiment 1 are performed.
Based on such understanding, the technical solution of the present embodiment or parts of the technical solution may be essentially implemented in the form of a software product, which is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
Example 4
The present embodiment also provides a terminal having the same inventive concept as embodiment 1, comprising a memory and a processor, the memory storing computer instructions executable on the processor; when executing the computer instructions, the processor performs the steps of the people stream density detection method for an indoor dense scene in embodiment 1. The processor may be a single-core or multi-core central processing unit, an application-specific integrated circuit, or one or more integrated circuits configured to implement the present invention.
Each functional unit in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above detailed description is for the purpose of describing the invention in detail, and it is not intended that the invention be limited to the specific embodiments described, and it should be understood that various modifications and substitutions can be made by one skilled in the art without departing from the spirit of the invention.
Claims (7)
1. A people stream density detection method for an indoor dense scene is characterized by comprising the following steps: the method comprises a target edge frame regression step:
predicting a target center point of an input image;
performing down-sampling processing on the true value of the target center point;
spreading the valid truth values by means of a Gaussian kernel function to form a target center point thermodynamic diagram;
establishing a local peak point of the thermodynamic diagram, performing regression processing on the target size by taking the thermodynamic diagram as a confidence coefficient, and outputting a target edge frame;
the method also comprises a people stream density estimation step, which comprises direct people stream density estimation and indirect people stream density estimation;
the direct people stream density estimation comprises: determining the number of targets according to the target edge frame, and realizing direct people flow density estimation by combining the area of a delimited area;
the indirect people flow density estimation comprises: inputting neighborhood parameters (epsilon, minPts), a sample distance measurement mode and a sample set of a target central point, clustering to obtain cluster division, forming a regional pedestrian flow density thermodynamic diagram according to the cluster division, and realizing indirect pedestrian flow density estimation;
the clustering process specifically includes:
initializing a core object sample setInitializing cluster number k =0, initializing unvisited sample set Γ = D, clusteringFor j =1,2.. M, all core objects are found as follows:
s1221: finding an element-neighborhood subsample set N element (xj) of the sample xj in a distance measurement mode;
s1222: if the number of the samples in the subsample set meets N epsilon (xj) | more than or equal to MinPts, adding the samples xj into the core object sample set: Ω = Ω { × j };
s1223: if core object sample setThen the process is ended, otherwise, the process goes to step S1224;
s1224: randomly selecting a core object o in a core object sample set omega, initializing a current cluster core object queue omega cur = { o }, initializing a class serial number k = k +1, initializing a current cluster sample set Ck = { o }, and updating an unaccessed sample set Γ = Γ - { o };
s1225: if the current cluster core object queueAfter the current cluster Ck is generated, updating cluster division C = { C1, C2, ·, ck }, updating a core object sample set Ω = Ω -Ck, and going to step S1224; otherwise, updating the core object sample set omega = omega-Ck;
s1226: taking out a core object o 'from a current cluster core object queue Ω cur, finding out all an e-neighborhood subsample set N e (o') through a neighborhood distance threshold e, letting Δ = N e (o ') nΓ, updating a current cluster sample set Ck = Ck ≧ Ck ^ Δ, updating an unvisited sample set Γ = Γ - Δ, and updating Ω cur = Ω cur § u (Δ ≧ Ω) -o' until the unvisited sample is an empty set;
s1226: the output result is cluster division C = { C1, C2,.., ck };
s1227: finally, forming a regional people flow density thermodynamic diagram according to cluster division so as to assist in directly estimating a result;
the step of establishing the local peak point of the thermodynamic diagram further comprises the step of determining the local peak point:
when two Gaussian kernel functions of the same target are overlapped, taking the maximum key point of the target as a local peak point;
the method further comprises an image preprocessing step:
uniformly scaling the sample images, and rotating some of the scaled images;
scaling the image subjected to the rotation processing in an equal proportion to enable the length of the image subjected to the rotation processing to be equal to the width of the image not subjected to the rotation processing so as to realize image splicing, wherein the spliced image is used as a training set of a deep aggregation detection network;
performing data enhancement processing on the spliced images;
the aspect ratio r_new of the stitched image is calculated as follows:
wherein k represents a common scaling factor; l represents the length of the image; w represents the width of the image.
2. The people stream density detection method for an indoor dense scene according to claim 1, characterized in that: the down-sampling of the target center point truth values further comprises penalty correction of the sampling result:
using a local offset loss function L_O to calculate the local offset, the calculation formula of the loss function L_O being:
3. The people stream density detection method for an indoor dense scene according to claim 1, characterized in that: the step of performing regression processing on the target size further comprises:
calculating the target size offset by using a size regression loss function, wherein the calculation formula of the size regression loss function is as follows:
4. The system for detecting people stream density in indoor dense scenes according to any one of claims 1 to 3, characterized in that: the system comprises a deep aggregation detection network for realizing target edge frame regression, wherein the deep aggregation detection network comprises a 34-layer structure deep aggregation network and a plurality of newly added aggregation nodes;
and the newly added aggregation nodes are arranged at the bottom layer of the 34-layer structure deep aggregation network and are respectively in corresponding jumping connection with the output aggregation nodes corresponding to different resolution levels in the 34-layer structure deep aggregation network.
5. The people stream density detection system of the indoor dense scene as claimed in claim 4, wherein: the system also comprises a preprocessing unit and a people stream density estimation unit;
the preprocessing unit is used for uniformly zooming the sample image and rotating the zoomed partial image; scaling the images subjected to rotation processing in an equal proportion manner to enable the length of the images subjected to rotation processing to be equal to the width of the images not subjected to rotation processing so as to realize image splicing, and inputting the spliced images serving as training sets into a deep aggregation detection network;
the people stream density estimation unit comprises a direct people stream density estimation module and an indirect people stream density estimation module and is used for estimating the people stream density according to a target edge frame output by the deep aggregation detection network;
the direct people stream density estimation module determines the number of targets according to the target edge frame and realizes direct people stream density estimation by combining the area of a delimited area; the indirect people flow density estimation module is used for obtaining cluster division after clustering according to an input neighborhood parameter (epsilon, minPts), a sample distance measurement mode and a sample set of a target central point, and forming a regional people flow density thermodynamic diagram according to the cluster division to realize indirect people flow density estimation.
6. A storage medium having computer instructions stored thereon, characterized in that: the computer instructions when executed perform the steps of the people stream density detection method of the indoor dense scene as claimed in any one of claims 1 to 3.
7. A terminal comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, characterized in that: the processor executes the computer instructions to execute the steps of the people stream density detection method of the indoor dense scene in any one of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011570465.2A CN112733624B (en) | 2020-12-26 | 2020-12-26 | People stream density detection method, system storage medium and terminal for indoor dense scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011570465.2A CN112733624B (en) | 2020-12-26 | 2020-12-26 | People stream density detection method, system storage medium and terminal for indoor dense scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112733624A CN112733624A (en) | 2021-04-30 |
CN112733624B true CN112733624B (en) | 2023-02-03 |
Family
ID=75616785
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011570465.2A Active CN112733624B (en) | 2020-12-26 | 2020-12-26 | People stream density detection method, system storage medium and terminal for indoor dense scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112733624B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114373191A (en) * | 2022-01-04 | 2022-04-19 | 北京沃东天骏信息技术有限公司 | Hand condyle positioning method and device |
CN114612767B (en) * | 2022-03-11 | 2022-11-15 | 电子科技大学 | Scene graph-based image understanding and expressing method, system and storage medium |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106056078A (en) * | 2016-05-31 | 2016-10-26 | 武汉大学深圳研究院 | Crowd density estimation method based on multi-feature regression ensemble learning |
CN106054184A (en) * | 2016-05-23 | 2016-10-26 | 北京环境特性研究所 | Method of estimating target scattering center position parameters |
CN108399388A (en) * | 2018-02-28 | 2018-08-14 | 福州大学 | A kind of middle-high density crowd quantity statistics method |
CN108510502A (en) * | 2018-03-08 | 2018-09-07 | 华南理工大学 | Melanoma picture tissue segmentation methods based on deep neural network and system |
CN110459301A (en) * | 2019-07-29 | 2019-11-15 | 清华大学 | Brain neuroblastoma surgical navigation method for registering based on thermodynamic chart and facial key point |
WO2019239162A1 (en) * | 2018-06-16 | 2019-12-19 | Oxsight Ltd | Hand held device for controlling digital magnification on a portable display |
CN110766728A (en) * | 2019-10-16 | 2020-02-07 | 南京航空航天大学 | Combined image feature accurate matching algorithm based on deep learning |
CN111161181A (en) * | 2019-12-26 | 2020-05-15 | 深圳市优必选科技股份有限公司 | Image data enhancement method, model training method, device and storage medium |
WO2020102988A1 (en) * | 2018-11-20 | 2020-05-28 | 西安电子科技大学 | Feature fusion and dense connection based infrared plane target detection method |
CN111368673A (en) * | 2020-02-26 | 2020-07-03 | 华南理工大学 | Method for quickly extracting human body key points based on neural network |
CN111460993A (en) * | 2020-03-31 | 2020-07-28 | 西安电子科技大学 | Human image generation method based on AND-OR graph AOG |
CN111539957A (en) * | 2020-07-07 | 2020-08-14 | 浙江啄云智能科技有限公司 | Image sample generation method, system and detection method for target detection |
CN111815592A (en) * | 2020-06-29 | 2020-10-23 | 郑州大学 | Training method of pulmonary nodule detection model |
CN111898578A (en) * | 2020-08-10 | 2020-11-06 | 腾讯科技(深圳)有限公司 | Crowd density acquisition method and device, electronic equipment and computer program |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102044073B (en) * | 2009-10-09 | 2013-05-29 | 汉王科技股份有限公司 | Method and system for judging crowd density in image |
CN103226860B (en) * | 2013-04-12 | 2015-05-20 | 中国民航大学 | Passage passenger traffic density estimation method |
US11176409B2 (en) * | 2016-12-20 | 2021-11-16 | Sony Depthsensing Solutions Sa/Nv | Distance-independent keypoint detection |
WO2019074938A1 (en) * | 2017-10-09 | 2019-04-18 | The Board Of Trustees Of The Leland Stanford Junior University | Contrast dose reduction for medical imaging using deep learning |
CN109035292B (en) * | 2018-08-31 | 2021-01-01 | 北京智芯原动科技有限公司 | Moving target detection method and device based on deep learning |
CN110598785B (en) * | 2019-09-11 | 2021-09-07 | 腾讯科技(深圳)有限公司 | Training sample image generation method and device |
CN111797697B (en) * | 2020-06-10 | 2022-08-05 | 河海大学 | Angle high-resolution remote sensing image target detection method based on improved CenterNet |
CN111832489A (en) * | 2020-07-15 | 2020-10-27 | 中国电子科技集团公司第三十八研究所 | Subway crowd density estimation method and system based on target detection |
CN112036332A (en) * | 2020-09-03 | 2020-12-04 | 深兰科技(上海)有限公司 | Passenger density detection system and detection method for public transport |
CN112070158B (en) * | 2020-09-08 | 2022-11-15 | 哈尔滨工业大学(威海) | Facial flaw detection method based on convolutional neural network and bilateral filtering |
-
2020
- 2020-12-26 CN CN202011570465.2A patent/CN112733624B/en active Active
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106054184A (en) * | 2016-05-23 | 2016-10-26 | 北京环境特性研究所 | Method of estimating target scattering center position parameters |
CN106056078A (en) * | 2016-05-31 | 2016-10-26 | 武汉大学深圳研究院 | Crowd density estimation method based on multi-feature regression ensemble learning |
CN108399388A (en) * | 2018-02-28 | 2018-08-14 | 福州大学 | A kind of middle-high density crowd quantity statistics method |
CN108510502A (en) * | 2018-03-08 | 2018-09-07 | 华南理工大学 | Melanoma picture tissue segmentation methods based on deep neural network and system |
WO2019239162A1 (en) * | 2018-06-16 | 2019-12-19 | Oxsight Ltd | Hand held device for controlling digital magnification on a portable display |
WO2020102988A1 (en) * | 2018-11-20 | 2020-05-28 | 西安电子科技大学 | Feature fusion and dense connection based infrared plane target detection method |
CN110459301A (en) * | 2019-07-29 | 2019-11-15 | 清华大学 | Brain neuroblastoma surgical navigation method for registering based on thermodynamic chart and facial key point |
CN110766728A (en) * | 2019-10-16 | 2020-02-07 | 南京航空航天大学 | Combined image feature accurate matching algorithm based on deep learning |
CN111161181A (en) * | 2019-12-26 | 2020-05-15 | 深圳市优必选科技股份有限公司 | Image data enhancement method, model training method, device and storage medium |
CN111368673A (en) * | 2020-02-26 | 2020-07-03 | 华南理工大学 | Method for quickly extracting human body key points based on neural network |
CN111460993A (en) * | 2020-03-31 | 2020-07-28 | 西安电子科技大学 | Human image generation method based on AND-OR graph AOG |
CN111815592A (en) * | 2020-06-29 | 2020-10-23 | 郑州大学 | Training method of pulmonary nodule detection model |
CN111539957A (en) * | 2020-07-07 | 2020-08-14 | 浙江啄云智能科技有限公司 | Image sample generation method, system and detection method for target detection |
CN111898578A (en) * | 2020-08-10 | 2020-11-06 | 腾讯科技(深圳)有限公司 | Crowd density acquisition method and device, electronic equipment and computer program |
Non-Patent Citations (4)
Title |
---|
"End-to-end people detection in crowded scenes";Russell Stewart等;《Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition》;20161231;第2325-2333页 * |
"Research on central issues of crowd density eatimation";kuang ping等;《2013 10th International Computer Conference on Wavelet Active Media Technology and Information Processing》;20140123;第2781-2790页 * |
"基于CenterNet-GYolov3的车辆检测方法";徐仲谋等;《软件》;20200531;第41卷(第5期);第25-30页 * |
"基于YOLOv4卷积神经网络的口罩佩戴检测方法";管军霖等;《现代信息科技》;20200610;第4卷(第11期);第9-12页 * |
Also Published As
Publication number | Publication date |
---|---|
CN112733624A (en) | 2021-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101643607B1 (en) | Method and apparatus for generating of image data | |
US9152926B2 (en) | Systems, methods, and media for updating a classifier | |
CN101341733B (en) | Single-image vignetting correction | |
CN111723693B (en) | Crowd counting method based on small sample learning | |
Manzanera et al. | Line and circle detection using dense one-to-one Hough transforms on greyscale images | |
CN102025959B (en) | The System and method for of high definition video is produced from low definition video | |
CN112733624B (en) | People stream density detection method, system storage medium and terminal for indoor dense scene | |
CN111368717B (en) | Line-of-sight determination method, line-of-sight determination device, electronic apparatus, and computer-readable storage medium | |
CN111291768B (en) | Image feature matching method and device, equipment and storage medium | |
CN110809788B (en) | Depth image fusion method and device and computer readable storage medium | |
CN114782507B (en) | Asymmetric binocular stereo matching method and system based on unsupervised learning | |
CN113850748B (en) | Evaluation system and method for point cloud quality | |
CN110544268A (en) | Multi-target tracking method based on structured light and SiamMask network | |
CN107155100B (en) | A kind of solid matching method and device based on image | |
US20140301639A1 (en) | Method and apparatus for determining an alpha value | |
CN115937552A (en) | Image matching method based on fusion of manual features and depth features | |
CN112529006B (en) | Panoramic picture detection method, device, terminal and storage medium | |
CN118229980A (en) | Image segmentation method and system based on attribute network random block model | |
CN117635875A (en) | Three-dimensional reconstruction method, device and terminal | |
CN106778822B (en) | Image straight line detection method based on funnel transformation | |
CN115797453B (en) | Positioning method and device for infrared weak target and readable storage medium | |
CN105574844A (en) | Radiation response function estimation method and device | |
CN113066165B (en) | Three-dimensional reconstruction method and device for multi-stage unsupervised learning and electronic equipment | |
CN114445458A (en) | Target tracking method and device, electronic equipment and storage medium | |
CN114399532A (en) | Camera position and posture determining method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |