CN112733624B - People flow density detection method, system, storage medium and terminal for indoor dense scenes


Info

Publication number: CN112733624B
Application number: CN202011570465.2A
Authority: CN (China)
Prior art keywords: target, image, people flow density, density estimation
Legal status: Active (granted)
Other versions: CN112733624A
Other languages: Chinese (zh)
Inventors: 匡平, 刘晨阳, 李凡, 彭江艳, 段其鹏, 高宇, 黄泓毓
Assignee (original and current): University of Electronic Science and Technology of China
Priority and filing date: 2020-12-26
Publication date of CN112733624A: 2021-04-30
Grant date of CN112733624B: 2023-02-03

Classifications

    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53: Recognition of crowd images, e.g. recognition of crowd congestion
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045: Combinations of networks
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V2201/07: Target detection


Abstract

The invention discloses a people flow density detection method, system, storage medium and terminal for indoor dense scenes. The method comprises: preprocessing images in a random combination mode; predicting the target center points of an input image; down-sampling the target center point ground truth; splatting the valid ground-truth values with a Gaussian kernel to form a target center point heatmap; and determining the local peak points of the heatmap, regressing the target size with the heatmap as confidence, and outputting target bounding boxes. The preprocessing greatly increases the variability of the images, making the whole detection method more robust. Because the target bounding boxes are regressed from predicted center points, key points are assigned directly on the targets to be detected: no anchor boxes are needed, no mutually overlapping pre-selection boxes arise, the loss of true targets to post-processing in dense-head scenes is reduced, and detection accuracy is high.

Description

People flow density detection method, system, storage medium and terminal for indoor dense scenes
Technical Field
The invention relates to the technical field of people flow density detection, and in particular to a people flow density detection method, system, storage medium and terminal for indoor dense scenes.
Background
With the rapid development of China's economy and society, interactions between people have become frequent, and highly crowded spaces easily give rise to safety incidents, so people flow density detection has strong practical value. As image processing technology has developed, it can be applied to the field of people flow density detection. In the mainstream approach, a people flow density estimation model generates a heatmap of the corresponding image, and the people flow density is then estimated from that heatmap. Further, to address the prediction errors caused by the multiple scales of image samples, existing methods train the density estimation model on features combined from multiple scales so as to identify and count pictures accurately, which effectively reduces the prediction error and improves prediction accuracy. However, current people flow density detection methods offer no effective solution to the low prediction accuracy caused by losing true samples to overlap, occlusion and small target size in indoor dense scenes.
Disclosure of Invention
The invention aims to overcome the prior art's inability to address the low prediction accuracy caused by losing true samples to overlap, occlusion and small target size in indoor dense scenes, and provides a people flow density detection method, system, storage medium and terminal for indoor dense scenes.
The purpose of the invention is realized by the following technical solution. A people flow density detection method for indoor dense scenes comprises a target bounding box regression step:
predicting the target center points of an input image; down-sampling the target center point ground truth; splatting the valid ground-truth values with a Gaussian kernel to form a target center point heatmap; and determining the local peak points of the heatmap, regressing the target size with the heatmap as confidence, and outputting target bounding boxes.
As an option, down-sampling the target center point ground truth further includes penalizing and correcting the sampled result: a local offset loss function $L_O$ is used to calculate the local offset, where

$$L_O = \frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right)\right|$$

where $N$ is the number of target center points, $\hat{O}_{\tilde{p}}$ is the local offset, $p$ is the target center point ground truth, $\tilde{p}$ is the valid value of the target center ground truth, and $R$ is the output stride.
As an option, the step of determining the local peak points of the heatmap further includes: when two Gaussian kernels of the same target overlap, the element-wise maximum key point of the target is taken as the local peak point.
As an option, after the step of regressing the target size, the method further includes: calculating the target size offset with a size regression loss function, calculated as

$$L_S = \frac{1}{N}\sum_{h=1}^{N}\left|\hat{s}_h - s_h\right|$$

where $s_h$ is the true target size and $\hat{s}_h$ is the predicted target size.
As an option, the method further comprises an image preprocessing step:
uniformly scaling the sample images and rotating some of the scaled images; rescaling the rotated images in equal proportion so that their length equals the width of the non-rotated images, stitching the images together, and using the stitched images as the training set of the deep aggregation detection network.
As an option, the method further comprises a people flow density estimation step, including direct people flow density estimation and indirect people flow density estimation;
the direct people flow density estimation comprises: determining the number of targets from the target bounding boxes, and combining the area of the delimited region to estimate the people flow density directly. The indirect people flow density estimation comprises: inputting the neighborhood parameters (ε, MinPts), a sample distance metric and the sample set of target center points, clustering to obtain a cluster division, and forming a regional people flow density heatmap from the cluster division to realize indirect people flow density estimation.
It should be further noted that the technical features of the above method options can be combined with or substituted for one another to form new technical solutions.
The invention also comprises a people flow density detection system for indoor dense scenes. The system comprises a deep aggregation detection network for realizing target bounding box regression; the deep aggregation detection network comprises a 34-layer deep layer aggregation network and a plurality of newly added aggregation nodes. The newly added aggregation nodes are arranged at the bottom of the 34-layer deep aggregation network and are connected by skip connections to the output aggregation nodes of the corresponding resolution levels of the 34-layer deep aggregation network.
As an option, the system further comprises a preprocessing unit and a people flow density estimation unit;
the preprocessing unit is used to uniformly scale the sample images and rotate some of the scaled images; rescale the rotated images in equal proportion so that their length equals the width of the non-rotated images to realize image stitching; and input the stitched images into the deep aggregation detection network as the training set;
the people flow density estimation unit comprises a direct people flow density estimation module and an indirect people flow density estimation module, and estimates the people flow density from the target bounding boxes output by the deep aggregation detection network;
the direct people flow density estimation module determines the number of targets from the target bounding boxes and combines the area of the delimited region to estimate the people flow density directly; the indirect people flow density estimation module clusters the sample set of target center points according to the input neighborhood parameters (ε, MinPts) and sample distance metric to obtain a cluster division, and forms a regional people flow density heatmap from the cluster division to realize indirect people flow density estimation.
The invention also includes a storage medium on which computer instructions are stored; when executed, the computer instructions perform the steps of the people flow density detection method for indoor dense scenes.
The invention further comprises a terminal comprising a memory and a processor, the memory storing computer instructions runnable on the processor; when executing the computer instructions, the processor performs the steps of the people flow density detection method for indoor dense scenes.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention splats valid ground-truth values with a Gaussian kernel to form a heatmap and then determines its peak points, i.e. obtains the true target center points. On this basis, with the heatmap as confidence, target bounding boxes are regressed from the predicted center points, so key points are assigned directly on the targets to be detected. Target detection and counting are realized without anchor boxes, no threshold has to be set manually to separate target foreground from background, no mutually overlapping pre-selection boxes arise, and no non-maximum suppression post-processing is needed to pick one of several overlapping pre-selection boxes. The loss of true targets to post-processing in dense-head scenes is thereby reduced; the method suits indoor dense scenes and has high detection accuracy.
(2) By uniformly scaling, rotating, rescaling and stitching the images in sequence, the invention greatly increases the variability of the images; different regions of a single image carry samples of different scales, and the preprocessed images are sample-balanced, so the deep aggregation detection network is more robust to images of different environments and different crowding degrees.
(3) The method realizes direct people flow density estimation from the target bounding boxes output by the deep aggregation detection network together with the area of the delimited region; meanwhile, the sample set of target center points is clustered, and the crowd density of the indoor region is corrected by the proximity density of the target center coordinates, further improving the accuracy of the density detection result in indoor dense scenes.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention.
FIG. 1 is a flowchart of the method of embodiment 1 of the present invention;
FIG. 2 is a schematic comparison simulation of the method of embodiment 1 of the present invention with the non-maximum suppression method;
FIG. 3 is a comparison chart before and after preprocessing in embodiment 1 of the present invention;
FIG. 4 is a simulation diagram of the method of embodiment 1 of the present invention;
FIG. 5 is a schematic comparison between simulations of the method of embodiment 1 of the present invention and of the prior art;
FIG. 6 is a system block diagram of embodiment 2 of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that directions or positional relationships indicated by "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like are directions or positional relationships described based on the drawings, and are only for convenience of description and simplification of description, but do not indicate or imply that the device or the element referred to must have a specific orientation, be constructed in a specific orientation, and operate, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that unless otherwise explicitly stated or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention adopts a randomly combined data preprocessing mode to balance the number of samples in each region of the captured flow images and to construct dense-head scenes as the training set of the deep aggregation detection network. The deep aggregation detection network regresses target bounding boxes from predicted target center points to realize target detection and counting; the people flow density is then estimated from the target count combined with the delimited region, and the density of the indoor region is corrected by the proximity density of the target center point coordinates. Head detection is taken as the embodiment for the detailed description below.
Embodiment 1
As shown in fig. 1, embodiment 1 provides a people flow density detection method for indoor dense scenes, comprising a target bounding box regression step:
S111: predicting the target center points of the input image; specifically, a target center point is a head center point predicted by the deep aggregation detection network model of the present application.
S112: down-sampling the target center point ground truth;
S113: splatting the valid ground-truth values with a Gaussian kernel to form the target center point heatmap;
S114: determining the local peak points of the heatmap, regressing the target size with the heatmap as confidence, and outputting the target bounding boxes.
Furthermore, when down-sampling the target center point ground truth in step S112, the valid value of the predicted head center point deviates because the data are discrete. A local offset loss function $L_O$ is therefore used to calculate the local offset $\hat{O}_{\tilde{p}}$ and thereby penalize and correct the down-sampled result; the loss function $L_O$ is calculated as

$$L_O = \frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right)\right|$$

where $N$ is the number of target center points, $\hat{O}_{\tilde{p}}$ is the local offset, $p$ is the target center point ground truth, $\tilde{p}$ is the valid value of the target center ground truth, and $R$ is the output stride.
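For illustration, a minimal PyTorch sketch of this offset penalty follows; the tensor names and layout (pred_offset gathered at the N center cells, stride R) are assumptions made for the sketch, not the reference implementation of the patent.

```python
import torch

def local_offset_loss(pred_offset, centers, stride):
    """L_O: L1 penalty between the offset predicted at each down-sampled
    center and the fractional offset lost by down-sampling.

    pred_offset: (N, 2) offsets gathered at the N predicted center cells
    centers:     (N, 2) ground-truth center points p in input-image pixels
    stride:      output stride R (e.g. 4)
    """
    p_low = centers / stride        # p / R, continuous low-resolution coords
    p_tilde = torch.floor(p_low)    # valid (integer) down-sampled centers
    target = p_low - p_tilde        # fractional offset to be recovered
    return torch.abs(pred_offset - target).sum() / centers.shape[0]
```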
Further, the Gaussian kernel adopted in step S113 is

$$Y_{xy} = \exp\left(-\frac{(x - \tilde{p}_x)^2 + (y - \tilde{p}_y)^2}{2\sigma_p^2}\right)$$

where $x$ and $y$ are head center point coordinates, $(\tilde{p}_x, \tilde{p}_y)$ are the valid coordinates of the head center ground truth, and $\sigma_p$ is a variance adaptive to the target scale. Splatting the valid ground-truth values with the above Gaussian kernel forms the head center point heatmap $\hat{Y} \in [0,1]^{\frac{W}{R}\times\frac{H}{R}\times C_k}$, where $W$ and $H$ are the width and height of the input image and $C_k$ is the number of key point classes.
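A small NumPy sketch of splatting one valid ground-truth center onto the down-sampled heatmap follows; the function name, the 3σ truncation radius and the per-head sigma value are illustrative assumptions:

```python
import numpy as np

def splat_gaussian(heatmap, center, sigma):
    """Scatter one valid ground-truth head center onto the heatmap with a
    Gaussian kernel, keeping the element-wise maximum where the kernels of
    nearby heads overlap."""
    h, w = heatmap.shape
    cx, cy = int(center[0]), int(center[1])   # valid (down-sampled) center
    r = max(1, int(3 * sigma))                # truncate the kernel at ~3 sigma
    x0, x1 = max(0, cx - r), min(w, cx + r + 1)
    y0, y1 = max(0, cy - r), min(h, cy + r + 1)
    ys, xs = np.mgrid[y0:y1, x0:x1]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    heatmap[y0:y1, x0:x1] = np.maximum(heatmap[y0:y1, x0:x1], g)
    return heatmap
```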
Further, determining the local peak points of the heatmap in step S114 also includes deciding between overlapping responses: when two Gaussian kernels of the same key point (the head center point of a target to be detected) or of the same head target overlap, the element-wise maximum key point is retained, giving the finally formed heatmap. In this heatmap, $\hat{Y}_{xyc} = 1$ is the key point of a detected head center, while $\hat{Y}_{xyc} = 0$ is defined as background.
Further, after the step of regressing the target size in step S114, the method continues as follows:
The geometric center of a head target obtained from the heatmap is $\tilde{p} = \left(\frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2}\right)$, assuming the head coordinates are $H = (x_1, y_1, x_2, y_2)$. The same neural network model (the deep aggregation detection network) is then reused to regress the size $s_h = (x_2 - x_1,\ y_2 - y_1)$ of head target $H$; the head target size is not normalized, and the raw pixel coordinates of the target are used directly. Meanwhile, a size regression loss function $L_S$ designed on L1 Loss is used to calculate the size offset and evaluate the error produced by the size regression:

$$L_S = \frac{1}{N}\sum_{h=1}^{N}\left|\hat{s}_h - s_h\right|$$

where $s_h$ is the true head target size and $\hat{s}_h$ is the predicted head target size. More specifically, the deep aggregation detection network outputs five values: the number of head key points, the x-coordinate offset, the y-coordinate offset, the size box length, and the size box width. Further, combining the heatmap value as the confidence, the bounding box of a head target instance is calculated from the regression of the deep aggregation detection network as

$$\left(\hat{x} + \delta\hat{x} - \frac{\hat{w}}{2},\ \hat{y} + \delta\hat{y} - \frac{\hat{h}}{2},\ \hat{x} + \delta\hat{x} + \frac{\hat{w}}{2},\ \hat{y} + \delta\hat{y} + \frac{\hat{h}}{2}\right)$$

where $(\hat{x}, \hat{y})$ is the detected center key point, $(\delta\hat{x}, \delta\hat{y})$ is the predicted local offset, and $(\hat{w}, \hat{h})$ is the predicted size.
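A hedged PyTorch sketch of this decoding step follows: local peaks are kept with a 3×3 max-pool comparison instead of non-maximum suppression, and boxes are assembled from the offset and size heads. The tensor shapes and the top-k cutoff are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

def decode_boxes(heat, offset, size, k=100, stride=4):
    """heat: (1, H, W) head-center heatmap; offset, size: (2, H, W) heads.
    Returns (k, 5) boxes [x1, y1, x2, y2, confidence] in input pixels."""
    # A cell is a local peak if it survives 3x3 max pooling unchanged.
    peaks = heat * (F.max_pool2d(heat[None], 3, stride=1, padding=1)[0] == heat)
    conf, idx = peaks.flatten().topk(k)
    ys, xs = idx // heat.shape[2], idx % heat.shape[2]
    dx, dy = offset[0].flatten()[idx], offset[1].flatten()[idx]
    # Sizes assumed regressed in raw input pixels, per the description.
    w, h = size[0].flatten()[idx], size[1].flatten()[idx]
    cx, cy = (xs + dx) * stride, (ys + dy) * stride   # refined centers
    return torch.stack([cx - w / 2, cy - h / 2,
                        cx + w / 2, cy + h / 2, conf], dim=1)
```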
The invention splats valid ground-truth values with a Gaussian kernel to form a heatmap and then determines its peak points, i.e. obtains the true target center points. On this basis, with the heatmap as confidence, the target bounding boxes are regressed from the predicted center points, so key points are assigned directly on the targets to be detected. Target detection and counting are realized without anchor boxes, no threshold has to be set manually to separate target foreground from background, no mutually overlapping pre-selection boxes arise, and no non-maximum suppression post-processing is needed to pick one of several overlapping pre-selection boxes. The loss of true targets to post-processing in dense-head scenes is thereby reduced; the method suits indoor dense scenes and has high detection accuracy.
To further illustrate the technical effect of the target bounding box regression method, fig. 2 compares simulations of head target prediction by the method of the invention and by non-maximum suppression. Compared with the head detection result of the non-maximum suppression method (lower left of fig. 2), the head detection result of the method of the invention (lower right of fig. 2) loses no true targets, and the prediction accuracy is effectively improved. It should be further noted that all simulation diagrams of the invention serve to explain its technical effects and do not limit the scope of the claims.
Further, aiming at the uneven sample distribution and uneven head sizes within a single image in existing head data sets, the invention provides an image preprocessing method comprising the following steps:
S101: uniformly scale the sample images and rotate some of the scaled images. As a specific example, as shown in fig. 3, four images are randomly selected from the data set for random combination. With the length and width of each image denoted $l$ and $w$, the aspect ratio of an original input image is $r_{original} = l/w \geq 1$. A common scaling factor $k \in (0, 10]$ is randomly set for the four selected images; after the four images are uniformly scaled by this factor, two of them are randomly selected and rotated by 90°, the rotation direction being random, clockwise or counterclockwise.
S102: rescale the rotated images in equal proportion so that their length equals the width of the non-rotated images, realizing image stitching; the stitched images serve as the training set of the deep layer aggregation detection network. As a specific example, the two images rotated by 90° are scaled again, and their length is combined with the width of the non-rotated images into the mosaic shown in fig. 3 (b). The aspect ratio of the newly generated image is calculated as

$$r_{new} = \frac{kl + \frac{kw^2}{l}}{2kw} = \frac{l^2 + w^2}{2lw}$$
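A hedged Python sketch of this random-combination preprocessing follows, using PIL; the helper name, the sampling range for k and the two-row mosaic layout are assumptions made for the sketch:

```python
import random
from PIL import Image

def random_mosaic(paths):
    """Randomly combine four images: apply a common scaling factor k, rotate
    two images by 90 degrees in a random direction, rescale the rotated pair
    in equal proportion to the height of the unrotated pair, and stitch the
    four into a two-row mosaic."""
    k = random.uniform(0.1, 10.0)                    # common scaling factor k
    imgs = [Image.open(p).convert("RGB") for p in paths]
    imgs = [im.resize((max(1, int(im.width * k)),
                       max(1, int(im.height * k)))) for im in imgs]
    random.shuffle(imgs)                             # pick two images to rotate
    plain, rot = imgs[:2], imgs[2:]
    rot = [im.rotate(random.choice([90, -90]), expand=True) for im in rot]
    rows = []
    for p, r in zip(plain, rot):
        t = p.height / r.height                      # equal-proportion rescale
        r = r.resize((max(1, int(r.width * t)), p.height))
        row = Image.new("RGB", (p.width + r.width, p.height))
        row.paste(p, (0, 0))
        row.paste(r, (p.width, 0))
        rows.append(row)
    out = Image.new("RGB", (max(r.width for r in rows),
                            sum(r.height for r in rows)))
    y = 0
    for row in rows:
        out.paste(row, (0, y))
        y += row.height
    return out
```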
Still further, the image preprocessing method also includes an image enhancement step:
S103: apply data enhancement to the newly generated images. Specifically, traditional geometric distortion enhancements such as mirroring and flipping are applied randomly to each image combination, and illumination distortion enhancements such as adjustments of brightness, contrast, hue and saturation are applied randomly, finally forming the preprocessed images, which are sent to the deep aggregation detection network for training.
By uniformly scaling, rotating, rescaling and stitching the images in sequence, the invention greatly increases the variability of the images; different regions of a single image carry samples of different scales, and the preprocessed images are sample-balanced, so the deep aggregation detection network is more robust to images of different environments and different crowding degrees, further ensuring the accuracy of target detection.
Furthermore, the method also comprises a people flow density estimation step, including direct people flow density estimation and indirect people flow density estimation; the indirect estimation assists the direct estimation to correct the head detection data and ensure detection accuracy.
Specifically, the direct people flow density estimation includes:
S121: determining the number of targets from the target bounding boxes, and combining the area of the delimited region to estimate the people flow density directly.
Specifically, the indirect people flow density estimation includes:
S122: inputting the neighborhood parameters (ε, MinPts), a sample distance metric and the sample set of target center points $D = \{x_1, x_2, \ldots, x_m\}$, clustering to obtain a cluster division, and forming a regional people flow density heatmap from the cluster division to realize indirect people flow density estimation.
More specifically, the clustering process is as follows (a code sketch follows these steps):
Initialize the core object sample set $\Omega = \varnothing$, the cluster number $k = 0$, the unvisited sample set $\Gamma = D$ and the cluster division $C = \varnothing$. For $j = 1, 2, \ldots, m$, find all core objects as follows:
S1221: find the ε-neighborhood subsample set $N_\epsilon(x_j)$ of sample $x_j$ using the distance metric;
S1222: if the subsample set satisfies $|N_\epsilon(x_j)| \geq MinPts$, add sample $x_j$ to the core object sample set: $\Omega = \Omega \cup \{x_j\}$;
S1223: if the core object sample set $\Omega = \varnothing$, end; otherwise, go to step S1224;
S1224: randomly select a core object $o$ from the core object sample set $\Omega$; initialize the current cluster core object queue $\Omega_{cur} = \{o\}$, the class number $k = k + 1$ and the current cluster sample set $C_k = \{o\}$; update the unvisited sample set $\Gamma = \Gamma - \{o\}$;
S1225: if the current cluster core object queue $\Omega_{cur} = \varnothing$, the current cluster $C_k$ is generated; update the cluster division $C = \{C_1, C_2, \ldots, C_k\}$, update the core object sample set $\Omega = \Omega - C_k$, and go to step S1223; otherwise, go to step S1226;
S1226: take a core object $o'$ out of the current cluster core object queue $\Omega_{cur}$, find all of its ε-neighborhood subsample set $N_\epsilon(o')$ through the neighborhood distance threshold ε, let $\Delta = N_\epsilon(o') \cap \Gamma$, update the current cluster sample set $C_k = C_k \cup \Delta$, update the unvisited sample set $\Gamma = \Gamma - \Delta$, and update $\Omega_{cur} = \Omega_{cur} \cup (\Delta \cap \Omega) - \{o'\}$; return to step S1225;
S1227: output the cluster division $C = \{C_1, C_2, \ldots, C_k\}$;
S1228: finally, form the regional people flow density heatmap from the cluster division to assist the direct estimation result.
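For illustration, a hedged scikit-learn sketch of the two estimates follows: the direct count-over-area figure, and the indirect clustering of head centers into a regional density heatmap. The function names, grid size and parameter values are assumptions, and sklearn's DBSCAN stands in for the step-by-step procedure above.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def direct_density(boxes, area_m2):
    """Direct estimate: detected head count divided by the delimited area."""
    return len(boxes) / area_m2

def indirect_density(centers, eps=30.0, min_pts=3, shape=(1080, 1920), cell=60):
    """Indirect estimate: cluster the head center points with neighborhood
    parameters (eps, min_pts) and a Euclidean distance metric, then rasterize
    the cluster division into a regional people flow density heatmap."""
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(centers)
    heat = np.zeros((shape[0] // cell, shape[1] // cell))
    for (x, y), lab in zip(centers, labels):
        if lab < 0:
            continue                                  # skip noise points
        weight = np.sum(labels == lab)                # larger cluster, hotter cell
        heat[min(int(y) // cell, heat.shape[0] - 1),
             min(int(x) // cell, heat.shape[1] - 1)] += weight
    return heat
```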
To further illustrate the technical effects of the deep aggregation network model of the invention, fig. 4 shows a simulation of head detection using the deep aggregation detection model, and fig. 5 compares the deep aggregation detection model of the invention (fig. 5 (c)) with the conventional neural network models Overfeat-AlexNet (fig. 5 (a)) and End-to-End (fig. 5 (b)). Each head target detected in fig. 4 is annotated with its confidence probability, and only head targets with a confidence probability above 0.55 are output in the simulation of fig. 4. As can be seen from figs. 4-5, when the target bounding box regression method is applied to head detection in an indoor dense scene, no true head targets are lost, and the prediction accuracy is better than that of the Overfeat-AlexNet and End-to-End models.
Embodiment 2
This embodiment shares the inventive concept of embodiment 1. On its basis, a people flow density detection system for indoor dense scenes comprises a deep aggregation detection network for realizing target bounding box regression; the deep aggregation detection network comprises a 34-layer deep layer aggregation network and a plurality of newly added aggregation nodes. The newly added aggregation nodes are arranged at the bottom of the 34-layer deep aggregation network and are connected by skip connections to the output aggregation nodes of the corresponding resolution levels of the 34-layer deep aggregation network.
As a specific embodiment, a system block diagram of the deep aggregation detection network of the invention is shown in fig. 6; more skip connections and more aggregation nodes are added so that the network attends to context details in indoor dense scenes. Specifically, on the basis of the 34-layer deep layer aggregation network (DLA-34), 3×3 aggregation nodes with 256 channels are added at the bottom of DLA-34, skip connections are added before the aggregation nodes and the stage (level) output header of each resolution, and finally a 1×1 output convolution realizes the prediction of head center points. The deep aggregation detection network performs dense prediction with full-convolution up-sampling and hierarchical skip connections, and raises the feature map resolution symmetrically by iterative deep aggregation. It should be noted that a hierarchy (stage) in the present application is a set of convolution modules of the same resolution; each convolution module consists of a batch normalization layer, a convolution layer, a pooling layer and an activation layer, and this modular design overcomes excessive network complexity through grouping and replication. Several network layers are combined into a convolution module, and convolution modules are combined into hierarchies according to feature resolution; semantic fusion generally occurs within a hierarchy, while spatial fusion generally occurs between hierarchies. The stacked convolution modules of the network are grouped into hierarchies by resolution: deeper hierarchies carry more semantic information but coarser spatial information, and skip connections from shallow to deep hierarchies are added to fuse scale and resolution; these are called iterative deep aggregation connections in this patent. The main function of an aggregation node is to merge and compress its inputs: through training it selects the important information of interest to project, and then outputs features of the same scale as the input dimension. The aggregation node structure used in the invention is a sequentially connected convolution layer + batch normalization (BN) layer + nonlinear activation function ReLU matching the stage output resolution. The up-sampling (Upsample 2x) of the invention uses 2×2 full convolution layers.
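A minimal PyTorch sketch of the aggregation-node structure just described follows (3×3 convolution + BN + ReLU over concatenated skip inputs, plus the 2×2 full-convolution up-sampling); the channel wiring is an illustrative assumption, not the full DLA-34 network:

```python
import torch
import torch.nn as nn

class AggregationNode(nn.Module):
    """Merge-and-compress node: concatenate the skip inputs, then project
    with conv3x3 + BN + ReLU to features matching the stage resolution."""
    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, *features):
        return self.proj(torch.cat(features, dim=1))

def upsample2x(channels):
    """2x up-sampling with a full (transposed) convolution, as in Upsample 2x."""
    return nn.ConvTranspose2d(channels, channels, kernel_size=2, stride=2)
```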
Further, the system also comprises a preprocessing unit and a people flow density estimation unit.
The preprocessing unit uniformly scales the sample images and rotates some of the scaled images; it rescales the rotated images in equal proportion so that their length equals the width of the non-rotated images to realize image stitching, and inputs the stitched images into the deep aggregation detection network as the training set.
Furthermore, the preprocessing unit also comprises an image enhancement module for applying data enhancement to the newly generated stitched images; specifically, traditional geometric distortion enhancements such as mirroring and flipping and illumination distortion enhancements such as adjustments of image brightness, contrast, hue and saturation are applied randomly, finally forming the preprocessed images, which are sent to the deep aggregation detection network for training.
Specifically, the people flow density estimation unit comprises a direct people flow density estimation module and an indirect people flow density estimation module, and estimates the people flow density from the target bounding boxes output by the deep aggregation detection network.
The direct people flow density estimation module determines the number of targets from the target bounding boxes and combines the area of the delimited region to estimate the people flow density directly; the indirect people flow density estimation module clusters the sample set of target center points according to the input neighborhood parameters (ε, MinPts) and sample distance metric to obtain a cluster division, and forms a regional people flow density heatmap from the cluster division to realize indirect people flow density estimation.
Embodiment 3
This embodiment, under the same inventive concept as embodiment 1, provides a storage medium on which computer instructions are stored; when executed, the computer instructions perform the steps of the people flow density detection method for indoor dense scenes of embodiment 1.
Based on such understanding, the technical solution of the present embodiment or parts of the technical solution may be essentially implemented in the form of a software product, which is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
Embodiment 4
This embodiment, under the same inventive concept as embodiment 1, also provides a terminal comprising a memory and a processor, the memory storing computer instructions runnable on the processor; when executing the computer instructions, the processor performs the steps of the people flow density detection method for indoor dense scenes of embodiment 1. The processor may be a single-core or multi-core central processing unit or a specific integrated circuit, or one or more integrated circuits configured to implement the invention.
Each functional unit in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above detailed description is for the purpose of describing the invention in detail, and it is not intended that the invention be limited to the specific embodiments described, and it should be understood that various modifications and substitutions can be made by one skilled in the art without departing from the spirit of the invention.

Claims (7)

1. A people flow density detection method for an indoor dense scene, characterized in that the method comprises a target bounding box regression step:
predicting the target center points of an input image;
down-sampling the target center point ground truth;
splatting the valid ground-truth values with a Gaussian kernel to form a target center point heatmap;
determining the local peak points of the heatmap, regressing the target size with the heatmap as confidence, and outputting target bounding boxes;
the method also comprises a people flow density estimation step, including direct people flow density estimation and indirect people flow density estimation;
the direct people flow density estimation comprises: determining the number of targets from the target bounding boxes, and combining the area of the delimited region to estimate the people flow density directly;
the indirect people flow density estimation comprises: inputting the neighborhood parameters (ε, MinPts), a sample distance metric and the sample set of target center points, clustering to obtain a cluster division, and forming a regional people flow density heatmap from the cluster division to realize indirect people flow density estimation;
the clustering process specifically comprises:
initializing the core object sample set $\Omega = \varnothing$, the cluster number $k = 0$, the unvisited sample set $\Gamma = D$ and the cluster division $C = \varnothing$; for $j = 1, 2, \ldots, m$, finding all core objects as follows:
S1221: finding the ε-neighborhood subsample set $N_\epsilon(x_j)$ of sample $x_j$ using the distance metric;
S1222: if the subsample set satisfies $|N_\epsilon(x_j)| \geq MinPts$, adding sample $x_j$ to the core object sample set: $\Omega = \Omega \cup \{x_j\}$;
S1223: if the core object sample set $\Omega = \varnothing$, ending; otherwise, going to step S1224;
S1224: randomly selecting a core object $o$ from the core object sample set $\Omega$; initializing the current cluster core object queue $\Omega_{cur} = \{o\}$, the class number $k = k + 1$ and the current cluster sample set $C_k = \{o\}$; updating the unvisited sample set $\Gamma = \Gamma - \{o\}$;
S1225: if the current cluster core object queue $\Omega_{cur} = \varnothing$, the current cluster $C_k$ is generated; updating the cluster division $C = \{C_1, C_2, \ldots, C_k\}$, updating the core object sample set $\Omega = \Omega - C_k$, and going to step S1223; otherwise, going to step S1226;
S1226: taking a core object $o'$ out of the current cluster core object queue $\Omega_{cur}$, finding all of its ε-neighborhood subsample set $N_\epsilon(o')$ through the neighborhood distance threshold ε, letting $\Delta = N_\epsilon(o') \cap \Gamma$, updating the current cluster sample set $C_k = C_k \cup \Delta$, updating the unvisited sample set $\Gamma = \Gamma - \Delta$, and updating $\Omega_{cur} = \Omega_{cur} \cup (\Delta \cap \Omega) - \{o'\}$; returning to step S1225;
S1227: outputting the cluster division $C = \{C_1, C_2, \ldots, C_k\}$;
S1228: finally, forming the regional people flow density heatmap from the cluster division to assist the direct estimation result;
the step of determining the local peak points of the heatmap further comprises determining a local peak point:
when two Gaussian kernels of the same target overlap, taking the element-wise maximum key point of the target as the local peak point;
the method further comprises an image preprocessing step:
uniformly scaling the sample images and rotating some of the scaled images;
rescaling the rotated images in equal proportion so that their length equals the width of the non-rotated images to realize image stitching, the stitched images serving as the training set of the deep aggregation detection network;
applying data enhancement to the stitched images;
the aspect ratio $r_{new}$ of a stitched image is calculated as:

$$r_{new} = \frac{kl + \frac{kw^2}{l}}{2kw} = \frac{l^2 + w^2}{2lw}$$

wherein $k$ represents the common scaling factor, $l$ represents the length of an image, and $w$ represents the width of an image.
2. The people flow density detection method for an indoor dense scene according to claim 1, characterized in that down-sampling the target center point ground truth further comprises penalizing and correcting the sampled result:
a local offset loss function $L_O$ is used to calculate the local offset, and the loss function $L_O$ is calculated as

$$L_O = \frac{1}{N}\sum_{p}\left|\hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right)\right|$$

wherein $N$ represents the number of target center points, $\hat{O}_{\tilde{p}}$ represents the local offset, $p$ represents the target center point ground truth, $\tilde{p}$ represents the valid value of the target center ground truth, and $R$ represents the output stride.
3. The people flow density detection method for an indoor dense scene according to claim 1, characterized in that the step of regressing the target size further comprises:
calculating the target size offset with a size regression loss function, calculated as

$$L_S = \frac{1}{N}\sum_{h=1}^{N}\left|\hat{s}_h - s_h\right|$$

wherein $s_h$ represents the true target size and $\hat{s}_h$ represents the predicted target size.
4. A people flow density detection system for an indoor dense scene implementing the method according to any one of claims 1 to 3, characterized in that the system comprises a deep aggregation detection network for realizing target bounding box regression, the deep aggregation detection network comprising a 34-layer deep layer aggregation network and a plurality of newly added aggregation nodes;
the newly added aggregation nodes are arranged at the bottom of the 34-layer deep aggregation network and are connected by skip connections to the output aggregation nodes of the corresponding resolution levels of the 34-layer deep aggregation network.
5. The people flow density detection system for an indoor dense scene according to claim 4, characterized in that the system further comprises a preprocessing unit and a people flow density estimation unit;
the preprocessing unit is used to uniformly scale the sample images and rotate some of the scaled images, rescale the rotated images in equal proportion so that their length equals the width of the non-rotated images to realize image stitching, and input the stitched images into the deep aggregation detection network as the training set;
the people flow density estimation unit comprises a direct people flow density estimation module and an indirect people flow density estimation module, and estimates the people flow density from the target bounding boxes output by the deep aggregation detection network;
the direct people flow density estimation module determines the number of targets from the target bounding boxes and combines the area of the delimited region to estimate the people flow density directly; the indirect people flow density estimation module clusters the sample set of target center points according to the input neighborhood parameters (ε, MinPts) and sample distance metric to obtain a cluster division, and forms a regional people flow density heatmap from the cluster division to realize indirect people flow density estimation.
6. A storage medium having computer instructions stored thereon, characterized in that the computer instructions, when executed, perform the steps of the people flow density detection method for an indoor dense scene according to any one of claims 1 to 3.
7. A terminal comprising a memory and a processor, the memory storing computer instructions runnable on the processor, characterized in that the processor, when executing the computer instructions, performs the steps of the people flow density detection method for an indoor dense scene according to any one of claims 1 to 3.
CN202011570465.2A 2020-12-26 2020-12-26 People flow density detection method, system, storage medium and terminal for indoor dense scenes Active CN112733624B (en)

Priority Applications (1)

CN202011570465.2A (priority date 2020-12-26, filing date 2020-12-26): People flow density detection method, system, storage medium and terminal for indoor dense scenes; granted as CN112733624B

Publications (2)

CN112733624A, published 2021-04-30
CN112733624B, granted 2023-02-03





Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant