CN112052833B - Object density monitoring system, method, video analysis server and storage medium - Google Patents

Object density monitoring system, method, video analysis server and storage medium

Info

Publication number
CN112052833B
CN112052833B (Application CN202011033404.2A)
Authority
CN
China
Prior art keywords
model
monitoring
object density
convolution kernel
training
Prior art date
Legal status
Active
Application number
CN202011033404.2A
Other languages
Chinese (zh)
Other versions
CN112052833A (en)
Inventor
徐海俊
孙新
曹李军
章勇
张琦
蒲一超
Current Assignee
Shanghai Lingshi Communication Technology Development Co ltd
Suzhou Keda Technology Co Ltd
Original Assignee
Shanghai Lingshi Communication Technology Development Co ltd
Suzhou Keda Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Lingshi Communication Technology Development Co ltd, Suzhou Keda Technology Co Ltd filed Critical Shanghai Lingshi Communication Technology Development Co ltd
Priority to CN202011033404.2A priority Critical patent/CN112052833B/en
Publication of CN112052833A publication Critical patent/CN112052833A/en
Application granted granted Critical
Publication of CN112052833B publication Critical patent/CN112052833B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources

Abstract

The application relates to an object density monitoring system, a method, a video analysis server and a storage medium, belonging to the technical field of the Internet of Things. The system comprises: a plurality of cameras; a gateway, in communication connection with each camera, which transmits the monitoring pictures acquired by the cameras to a video analysis server; the video analysis server, in communication connection with the gateway, which receives the monitoring pictures transmitted by the gateway and inputs each monitoring picture into a prestored object density monitoring model to output object density information, the object density monitoring model being obtained through model clipping; and a service platform, in communication connection with the video analysis server, which receives and displays the object density information from the video analysis server. The application can solve the problem that, when a convolutional neural network is used for object density monitoring, the system places high requirements on equipment and can hardly meet the demand of real-time multi-channel video analysis on low-end, low-power devices; the hardware requirements on the equipment can be reduced, and the monitoring requirements of multiple video channels can be met.

Description

Object density monitoring system, method, video analysis server and storage medium
Technical Field
The application relates to an object density monitoring system and method, a video analysis server and a storage medium, and belongs to the technical field of Internet of things.
Background
High crowd density has become a major safety threat in modern cities. Monitoring crowd density by technical means and giving early warning for high-density crowds has therefore become a research hotspot in the field of intelligent security.
Existing crowd density monitoring systems usually use a convolutional neural network to regress a crowd density map.
However, a convolutional neural network has a large number of network parameters and consumes considerable device resources during operation, so the crowd density monitoring system places high requirements on devices.
Disclosure of Invention
The application provides an object density monitoring system, an object density monitoring method, a video analysis server and a storage medium, which can solve the problem that when an existing crowd density monitoring system uses a convolutional neural network to monitor object density, the system places high requirements on equipment and can hardly meet the demand of real-time multi-channel video analysis on low-end, low-power devices. The application provides the following technical solution:
in a first aspect, there is provided an object density monitoring system, the system comprising:
the cameras are used for collecting monitoring pictures in a monitoring scene;
the gateway is in communication connection with each camera and transmits the monitoring pictures acquired by the cameras to the video analysis server;
the video analysis server is in communication connection with the gateway and receives the monitoring picture transmitted by the gateway; inputting the monitoring picture into a prestored object density monitoring model to output object density information; the object density monitoring model is obtained by model cutting;
and the service platform is in communication connection with the video analysis server and receives and displays the object density information of the video analysis server.
Optionally, the training process of the object density monitoring model includes:
training a reference network by using training data to obtain an initial network model, wherein the training data comprises a plurality of object images with objects and object labeling information of each object image;
clipping the initial network model;
training the cut model by using the training data until the difference between the accuracy of the trained model and the accuracy of the model before cutting is smaller than an accuracy threshold, to obtain a trained cut model;
if the parameter number of the cut model is larger than a preset threshold, cutting the trained cut model again, and training the re-cut model by using the training data until the difference between the accuracy of the trained model and the accuracy of the model before this cutting is smaller than the accuracy threshold, to obtain the trained cut model;
and if the parameter number of the cut model is less than or equal to the preset threshold value, determining the trained cut model as the object density monitoring model.
Optionally, the training data comprises raw acquisition data and augmented data of the raw acquisition data;
the augmentation data refers to data obtained after data augmentation operation is carried out on the original collected data.
Optionally, the data augmentation operation comprises an image pyramid based scaling operation.
Optionally, the object density monitoring model is used for identifying an object in the monitoring picture to obtain a density heat value map of the object; the object density information comprises the density heat value map;
the object labeling information comprises an object density heat value graph obtained by convolving an object label in each object image with a Gaussian kernel function;
the Gaussian kernel function is used for convolving the object label with the Gaussian kernel to obtain the object density heat value map.
Optionally, the clipping the initial network model includes:
determining a specific convolution kernel parameter among all convolution kernel parameters of the initial network model, wherein the specific convolution kernel parameter is a convolution kernel parameter which minimizes loss variation of the initial network model;
clipping the particular convolution kernel parameter.
Optionally, the determining a specific convolution kernel parameter among all convolution kernel parameters of the initial network model includes:
for each convolution kernel parameter, calculating a first loss value before clipping the convolution kernel parameter based on Taylor expansion;
calculating a second loss value after clipping the convolution kernel parameter based on a Taylor expansion;
determining a loss value change amount based on the first loss value and the second loss value;
and determining the specific convolution kernel parameter with the minimum loss value change amount from the convolution kernel parameters.
Optionally, the preset threshold is obtained by regularization calculation based on the floating point operation times FLOPs.
Optionally, the reference network supports processing of input images of different scales.
In a second aspect, an object density monitoring method is provided, which is used in a video resolution server in the object density monitoring system provided in the first aspect, and the method includes:
receiving a monitoring picture transmitted by the gateway;
inputting the monitoring picture into a prestored object density monitoring model to output object density information; the object density monitoring model is obtained by model cutting;
and sending the object density information to the service platform for display.
In a third aspect, a video parsing server is provided, the video parsing server comprising a processor and a memory; the memory stores a program that is loaded and executed by the processor to implement the object density monitoring method of the second aspect.
In a fourth aspect, there is provided a computer-readable storage medium having a program stored therein, the program being loaded and executed by the processor to implement the object density monitoring method of the second aspect.
The beneficial effect of this application lies in: the monitoring picture is analyzed by the object density monitoring model in the video analysis server to obtain object density information, and the object density information is sent to the service platform for display. This solves the problem that, when a conventional crowd density monitoring system uses a convolutional neural network for object density monitoring, the system places high requirements on equipment and can hardly meet the demand of real-time multi-channel video analysis on low-end, low-power devices. Because the object density monitoring model is obtained after model clipping, the device resources consumed when the model runs can be reduced, the hardware requirements of the system on the equipment are lowered, and the monitoring requirements of multiple video channels are met.
In addition, model cutting is carried out on the initial network model obtained through training, and the model after cutting is trained, so that the model precision can be ensured while the number of parameters of the model is reduced.
In addition, the image-pyramid-based scale transformation operation is well suited to CSRnet combined with model clipping; the augmented data obtained by this scale transformation can make the accuracy of the clipped model almost consistent with the accuracy of the network model before clipping, so the modeling efficiency of the object density monitoring model can be improved.
In addition, the object density monitoring model established based on the basic network does not limit the size of the input image, supports the analysis of monitoring pictures with different sizes, and improves the universality of image analysis.
In addition, the object density is displayed through the density heat value map, so the display of the object density is more intuitive.
The foregoing description is only an overview of the technical solutions of the present application. In order to make the technical solutions of the present application clearer and to implement them according to the content of the description, the preferred embodiments of the present application are described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a schematic diagram of an object density monitoring system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an image pyramid provided in one embodiment of the present application;
FIG. 3 is a schematic diagram of a modeling process of an object density monitoring model provided by an embodiment of the present application;
FIG. 4 is a flowchart of an object density monitoring method according to an embodiment of the present application;
FIG. 5 is a block diagram of an object density monitoring apparatus provided in one embodiment of the present application;
fig. 6 is a block diagram of a video parsing server provided in an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below in conjunction with the accompanying drawings and examples. The following examples are intended to illustrate the present application but not to limit its scope.
Fig. 1 is a schematic structural diagram of an object density monitoring system according to an embodiment of the present application, and as shown in fig. 1, the system at least includes: a plurality of cameras 11, a gateway 12 communicatively coupled to each camera 11, a video analytics server 13 communicatively coupled to the gateway 12, and a service platform 14 communicatively coupled to the video analytics server 13.
The camera 11 is used for collecting monitoring pictures in a monitoring scene. Optionally, the camera 11 is a network high definition camera; in other implementation manners, the camera 11 may also be a coaxial high-definition camera, and the present embodiment does not limit the device type of the camera 11.
The gateway 12 is configured to transmit the monitoring picture acquired by the camera 11 to the video parsing server 13. The gateway 12 implements network interconnection above the network layer, and is a complex network interconnection device. In this embodiment, the gateway 12 may be used for interconnection of both wan and lan; in other words, the camera 11 and the video resolution server 13 may be interconnected based on a wide area network, or may be interconnected based on a local area network.
The video resolution server 13 may be a server cluster formed by a plurality of server hosts; alternatively, a single server host may be used, and the implementation of the video parsing server 13 is not limited in this embodiment. In this embodiment, the video parsing server 13 is configured to receive a monitoring picture transmitted by the gateway 12; the monitoring screen is input to a pre-stored object density monitoring model 131 to output object density information. The object density monitoring model 131 is obtained by model cutting.
Model clipping refers to a process of reducing model structure and/or model parameters.
The object density information is used to reflect the density of objects within the current monitored scene. Alternatively, the object may be a person, an animal, or the like that can move within the monitored scene, and the present embodiment does not limit the type of the object.
In one example, the object density monitoring model is obtained by pre-training on the video parsing server 13, and the training process of the object density monitoring model includes: training a reference network by using training data to obtain an initial network model, wherein the training data comprises a plurality of object images with objects and object labeling information of each object image; cutting the initial network model; training the cut model by using the training data until the difference between the accuracy of the trained model and the accuracy of the model before cutting is smaller than an accuracy threshold, to obtain a trained cut model; if the parameter number of the cut model is larger than a preset threshold, cutting the trained cut model again, and training the re-cut model by using the training data until the difference between the accuracy of the trained model and the accuracy of the model before this cutting is smaller than the accuracy threshold, to obtain the trained cut model; and if the parameter number of the cut model is less than or equal to the preset threshold, determining the trained cut model as the object density monitoring model.
The accuracy of the model is the percentage of the number of correct predictions of the model to the total number of samples. In the model training process, the accuracy of the model can be obtained based on the prediction result of the model every time training data is input into the model.
The training data comprises a plurality of groups, and each group of training data comprises an object image with an object and object labeling information of the object image.
In one example, the object density monitoring model is used for identifying an object in a monitoring picture to obtain a density heat value map of the object; the object density information includes a density heat value map. At this time, the object labeling information includes an object density hot value map obtained by convolving the object label in each object image with a gaussian kernel function. And the Gaussian kernel function is used for convolving the object label and the Gaussian kernel to obtain an object density heat value map.
In one example, the Gaussian kernel function is the adaptive Gaussian kernel function for high-density crowds proposed together with the Multi-column Convolutional Neural Network (MCNN), and the formula is as follows:

F(x) = \sum_{i=1}^{N} \delta(x - x_i) * G_{\sigma_i}(x), \quad \sigma_i = \beta \bar{d}_i

wherein x_i is the position of the i-th labeled object (namely the object label) in the current monitoring picture; N is the total number of objects in the object image; G_{\sigma_i} denotes a Gaussian convolution kernel with standard deviation parameter \sigma_i; \bar{d}_i is the average distance from x_i to its m nearest objects; and \beta is a constant, such as 3.
According to the adaptive Gaussian kernel function, after the object image is labeled, the object labels obtained by labeling are input into the adaptive Gaussian kernel function to obtain the corresponding object density heat value map.
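To make the labeling step concrete, the following is a minimal sketch (not code from the patent) of turning point labels into a density heat value map with the adaptive Gaussian kernel above; the function name, the beta and m defaults, and the use of NumPy/SciPy are illustrative assumptions.

```python
# Minimal sketch: build an object density heat value map from point labels using
# the adaptive Gaussian kernel described above. Function name and the beta/m
# defaults are illustrative, not values fixed by the patent.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.spatial import cKDTree

def density_map(points, height, width, beta=0.3, m=3):
    """points: array of shape (N, 2) holding (x, y) object labels."""
    dmap = np.zeros((height, width), dtype=np.float32)
    if len(points) == 0:
        return dmap
    tree = cKDTree(points)
    # distances to the m nearest neighbours (the query also returns the point itself)
    dists, _ = tree.query(points, k=min(m + 1, len(points)))
    for (x, y), d in zip(points, dists):
        iy = min(max(int(round(y)), 0), height - 1)
        ix = min(max(int(round(x)), 0), width - 1)
        impulse = np.zeros_like(dmap)
        impulse[iy, ix] = 1.0
        mean_dist = d[1:].mean() if len(points) > 1 else 1.0   # average distance to m nearest objects
        sigma = beta * mean_dist                               # adaptive sigma_i = beta * d_i
        dmap += gaussian_filter(impulse, sigma)                # delta(x - x_i) convolved with G_sigma_i
    return dmap
```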
In this embodiment, the reference network of the object density monitoring model supports processing of input images of different scales. In one example, the reference network is CSRnet, whose front-end network is VGG-16; the feature map output by the front-end network is 1/8 the size of the original image; the back-end network uses dilated (atrous) convolution, which enlarges the receptive field while keeping the resolution.
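For orientation, here is a sketch of a reference network of the kind described: a truncated VGG-16 front end followed by a dilated-convolution back end. The layer counts and channel widths follow the published CSRNet configuration, which the patent names but does not reproduce, so treat them as assumptions; being fully convolutional, the network accepts input images of different sizes.

```python
# Sketch of a CSRNet-style reference network (assumption: layer configuration as
# in the published CSRNet paper). Fully convolutional, so the input size is not fixed.
import torch
import torch.nn as nn
from torchvision import models

class CSRNetLike(nn.Module):
    def __init__(self):
        super().__init__()
        vgg = models.vgg16(weights=None)
        # VGG-16 front end up to conv4_3: three pooling layers -> 1/8 resolution
        self.frontend = nn.Sequential(*list(vgg.features.children())[:23])
        # Dilated (atrous) back end: enlarges the receptive field without further pooling
        def dilated(in_c, out_c):
            return nn.Sequential(
                nn.Conv2d(in_c, out_c, kernel_size=3, padding=2, dilation=2),
                nn.ReLU(inplace=True))
        self.backend = nn.Sequential(
            dilated(512, 512), dilated(512, 512), dilated(512, 512),
            dilated(512, 256), dilated(256, 128), dilated(128, 64))
        self.output = nn.Conv2d(64, 1, kernel_size=1)  # 1-channel density heat value map

    def forward(self, x):
        return self.output(self.backend(self.frontend(x)))

# density = CSRNetLike()(torch.randn(1, 3, 480, 640))  # -> shape (1, 1, 60, 80)
```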
Optionally, the plurality of sets of training data include raw collected data and augmented data of the raw collected data; the augmentation data refers to data obtained by performing data augmentation operation on the original acquired data.
The original collected data refers to data sent by other devices and received by the video analysis server 13, or data read from a storage medium; in other words, the raw acquisition data is not subjected to the data augmentation operation. The original acquisition data can also be a plurality of groups, and each group of original acquisition data comprises an object image and object labeling information of the object image.
The augmentation data may be a plurality of sets, and each set of augmentation data includes an object image and object labeling information of the object image. Optionally, the data augmentation operation comprises an image-pyramid-based scale transformation operation. For example, the same monitoring picture is shrunk and enlarged by bilinear interpolation to obtain image data of different sizes, and the image data of different sizes form a pyramid. Referring to the image pyramid shown in fig. 2, the size of the image data decreases layer by layer from bottom to top.
In other embodiments, the data augmentation operation may further include a rotation operation, a lighting adjustment operation, and/or a contrast adjustment operation, and the like, and the present embodiment does not limit the type of the data augmentation operation.
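As an illustration of the image-pyramid augmentation, the sketch below shrinks and enlarges a frame by bilinear interpolation; the scale factors and the count-preserving renormalization of the density map are assumptions for illustration, not values specified by the patent.

```python
# Sketch of image-pyramid scale augmentation: the same frame is shrunk and
# enlarged by bilinear interpolation; scale factors are illustrative.
import cv2

def pyramid_augment(image, density_map, scales=(0.5, 0.75, 1.0, 1.25, 1.5)):
    samples = []
    for s in scales:
        img_s = cv2.resize(image, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
        # Rescale the density map too, and renormalize so that the object count
        # (the sum over the map) is preserved.
        dm_s = cv2.resize(density_map, None, fx=s, fy=s, interpolation=cv2.INTER_LINEAR)
        total = density_map.sum()
        if dm_s.sum() > 0:
            dm_s *= total / dm_s.sum()
        samples.append((img_s, dm_s))
    return samples
```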
In this embodiment, model clipping refers to: determining, among all the convolution kernel parameters of the initial network model, the specific convolution kernel parameter whose removal minimizes the loss variation of the initial network model, and clipping it. Based on this, clipping the initial network model comprises the following steps: determining a specific convolution kernel parameter among all convolution kernel parameters of the initial network model, wherein the specific convolution kernel parameter is the convolution kernel parameter that minimizes the loss change of the initial network model; and clipping the specific convolution kernel parameter.
Correspondingly, the trained clipping model is clipped again, and the method comprises the following steps: determining a specific convolution kernel parameter in all convolution kernel parameters of the trained clipping model, wherein the specific convolution kernel parameter is the convolution kernel parameter which enables the loss change of the trained clipping model to be minimum; the particular convolution kernel parameter is clipped.
In one example, determining a particular convolution kernel parameter among all convolution kernel parameters of the initial network model (or the trained clipping model) includes: for each convolution kernel parameter, calculating a first loss value before clipping the convolution kernel parameter based on Taylor expansion; calculating a second loss value after clipping the convolution kernel parameter based on a Taylor expansion; determining a loss value variation based on the first loss value and the second loss value; the specific convolution kernel parameter with the minimum loss value variation amount is determined from the various convolution kernel parameters.
For example, let h_i denote the output of convolution kernel parameter i, i.e., the feature map produced by that convolution kernel.
The loss value variation is represented by the following formula:

|\Delta C(h_i)| = |C(D \mid h_i = 0) - C(D, h_i)|

wherein C(D \mid h_i = 0) denotes the second loss value after h_i is clipped (set to zero), C(D, h_i) denotes the first loss value before h_i is clipped, and D denotes the training data.
The Taylor expansion of a function f(x) at x = a is:

f(x) = \sum_{p=0}^{P} \frac{f^{(p)}(a)}{p!} (x - a)^p + R_P(x)

wherein f^{(p)}(a) is the p-th order derivative of f(x) at a and R_P(x) is the remainder term. Based on this, the first-order Taylor expansion of C(D \mid h_i = 0) at h_i = 0 is approximated as:

C(D \mid h_i = 0) \approx C(D, h_i) - \frac{\partial C}{\partial h_i} h_i
Accordingly, using the first-order Taylor expansion, the loss value variation |\Delta C(h_i)| becomes:

|\Delta C(h_i)| \approx \left| \frac{\partial C}{\partial h_i} h_i \right|
For an output with multiple feature variables, the above loss value variation formula is converted into:

|\Delta C(h_i)| \approx \left| \frac{1}{M} \sum_{m=1}^{M} \frac{\partial C}{\partial h_{i,m}} h_{i,m} \right|

where M is the length of the feature variable (the number of elements of the feature map h_i).
Optionally, since the feature weights of some network layers are larger, in order to prevent parameter clipping from aggravating over-fitting, L2 normalization may be performed within each layer of the network. The normalization process is expressed by the following formula:

\hat{\Theta}(h_i^{(l)}) = \frac{\Theta(h_i^{(l)})}{\sqrt{\sum_j \left(\Theta(h_j^{(l)})\right)^2}}

where \Theta(h_i^{(l)}) = |\Delta C(h_i^{(l)})| is the loss value variation of the i-th convolution kernel in layer l.
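The criterion above can be summarized in a short sketch: after one backward pass, each convolution kernel's importance is the absolute mean of gradient times activation over its output feature map, L2-normalized within the layer. The hook-based structure and the function names are illustrative (PyTorch assumed), not the patent's implementation.

```python
# Sketch of the first-order Taylor pruning criterion described above: after a
# backward pass, the importance of kernel i is |(1/M) * sum_m (dC/dh_i,m) * h_i,m|,
# L2-normalized per layer. Structure and names are illustrative.
import torch
import torch.nn as nn

def taylor_scores(model, images, targets, loss_fn):
    acts, handles = {}, []

    def save_act(name):
        def hook(module, inputs, output):
            output.retain_grad()          # keep dC/dh for this non-leaf tensor
            acts[name] = output
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            handles.append(module.register_forward_hook(save_act(name)))

    loss = loss_fn(model(images), targets)
    loss.backward()
    for handle in handles:
        handle.remove()

    scores = {}
    for name, act in acts.items():
        grad = act.grad                                    # dC/dh for this layer's output
        s = (grad * act).mean(dim=(0, 2, 3)).abs()         # per output channel: shape (C_out,)
        s = s / (s.norm(p=2) + 1e-8)                       # L2 normalization within the layer
        scores[name] = s
    return scores  # the kernel with the smallest score changes the loss least
```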
optionally, the preset threshold is calculated based on floating point operation times FLOPs regularization. Such as: and calculating the quantity of the parameters needing to be cut down based on the FLOPs regularization, and subtracting the quantity of the parameters needing to be cut down by using the quantity of the parameters of the initial network model to obtain a preset threshold value.
The number of parameters to be cut from each layer of the convolutional network is determined from that layer's floating point operations, where the FLOPs of a convolutional layer are:

\mathrm{FLOPs} = H \times W \times C_{in} \times K^2 \times C_{out}

wherein H is the height and W is the width of the feature map input to the current layer of the convolutional network; C_{in} is the number of channels of the feature map input to the current layer; K is the size of the convolution kernel of the current layer; and C_{out} is the number of channels of the feature map output by the current layer.
In other embodiments, the preset threshold may also be a fixed value, such as: 2/3 of the model parameters, etc., the setting mode of the preset threshold is not limited in this embodiment.
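A small sketch of the FLOPs bookkeeping follows. The per-layer FLOPs use the formula above; how those FLOPs are converted into a per-layer number of kernels to cut (here, proportional allocation against a target reduction) is an assumption for illustration, since the patent does not spell that rule out at this point.

```python
# Sketch of FLOPs-based bookkeeping behind the preset threshold. The per-layer
# FLOPs follow the formula above; the proportional allocation of the pruning
# budget is an illustrative assumption, not the patent's exact rule.
def conv_flops(h, w, c_in, k, c_out):
    """FLOPs of one convolutional layer: H * W * C_in * K^2 * C_out."""
    return h * w * c_in * k * k * c_out

def kernels_to_cut(layers, target_reduction=0.5):
    """layers: list of dicts with keys h, w, c_in, k, c_out."""
    flops = [conv_flops(**layer) for layer in layers]
    total = sum(flops)
    budget = total * target_reduction        # FLOPs to remove overall
    plan = []
    for layer, f in zip(layers, flops):
        share = budget * f / total           # this layer's share of the cut
        per_kernel = f / layer["c_out"]      # FLOPs contributed by one output kernel
        plan.append(int(share // per_kernel))
    return plan

# Example: a 512-channel 3x3 layer on a 60x80 feature map
# print(conv_flops(60, 80, 512, 3, 512))  # about 11.3 GFLOPs
```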
In order to more clearly understand the modeling process of the object density monitoring model provided in the present application, the modeling process is described below taking a person as the monitored object. Referring to fig. 3, the modeling process includes at least the following steps:
step 31, collecting crowd images, and marking heads in the crowd images to obtain original collected data;
it should be added that, this example is described by taking the example of marking human heads, and in actual implementation, other positions may also be marked, for example: the position of the whole person, the upper body of the person, and the like, and the labeling position is not limited in this embodiment.
Step 32, performing data augmentation operation on the original collected data to obtain augmented data;
the data augmentation operation comprises image scale transformation based on an image pyramid;
of course, the data augmentation operation may also include a rotation operation, an illumination intensity adjustment operation, and/or a contrast adjustment operation.
Step 33, training a basic network based on training data to obtain an initial network model, wherein the training data comprises original acquisition data and augmentation data;
the initial network model includes an initial network structure and initial network parameters.
Step 34, cutting the initial network model to obtain a cut model;
the clipped model comprises a clipped network structure and clipped network parameters.
Step 35, training the cut model by using the training data until the difference between the accuracy of the cut model and the accuracy of the network model before the cutting is smaller than an accuracy threshold value, and obtaining the trained cut model;
it should be added that, in this embodiment, the applicability of the scale transformation operation based on the image pyramid to the combination of CSRnet and the above model clipping is strong, and the augmented data obtained by the scale transformation operation based on the image pyramid can make the accuracy of the clipped model almost consistent with the accuracy of the network model before clipping.
Step 36, determining whether the parameter number of the cut model is less than or equal to a preset threshold value; if so, determining the trained cutting model as an object density monitoring model, and ending the process; if not, go to step 37.
Step 37, cutting the trained cutting model again, and executing step 35 again; in this case, the clipped model in step 35 is a clipped model obtained by clipping again.
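Steps 33-37 amount to the following loop; train(), clip_least_important(), accuracy() and param_count() are placeholders for the operations described above, not functions defined by the patent.

```python
# Minimal sketch of the iterative clip-and-retrain loop of steps 33-37.
# train(), clip_least_important(), accuracy() and param_count() are placeholders.
def build_density_model(base_net, train_data, acc_threshold, param_threshold):
    model = train(base_net, train_data)                 # step 33: initial network model
    while param_count(model) > param_threshold:         # step 36: parameter-number check
        ref_acc = accuracy(model, train_data)           # accuracy before this clipping
        model = clip_least_important(model)             # steps 34/37: cut the least important kernel(s)
        # step 35: retrain until the accuracy gap to the pre-clipping model is small enough
        while ref_acc - accuracy(model, train_data) >= acc_threshold:
            model = train(model, train_data)
    return model                                        # object density monitoring model
```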
The service platform 14 may be implemented on a computer, a mobile phone, a tablet computer, and the like, and the implementation manner of the service platform 14 is not limited in this embodiment. In this embodiment, the service platform 14 is configured to receive and display object density information of the video parsing server 13.
Optionally, the service platform 14 includes an alarm device 141, and outputs an alarm signal through the alarm device 141.
In one example, an alarm rule is configured in the service platform 14, and the alarm device 141 is triggered to output an alarm when the object density information satisfies the alarm rule.
Optionally, the alarm rules include but are not limited to: setting an alarm threshold for an alarm area that requires alarming in the monitored scene, and outputting an alarm when the number of objects in the alarm area is greater than or equal to the alarm threshold; or setting an alarm threshold for the whole monitored scene, and outputting an alarm when the number of objects in the monitored scene is greater than or equal to the alarm threshold. Of course, other alarm rules may also be used, and the way the alarm rules are set is not limited in this embodiment.
Alternatively, the alert signal may be an audio signal, and accordingly, the alert device 141 is an audio player; and/or the alarm signal is an optical signal, and accordingly, the alarm device 141 is an indicator light; and/or the alarm signal is a vibration signal, and accordingly, the alarm device 141 is a vibrator; the present embodiment does not limit the implementation manner of the alarm signal and the alarm device.
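A minimal sketch of the alarm-rule check on the service platform follows; the rule format (optional per-region bounding boxes over the density heat value map plus a count threshold) is an assumption for illustration.

```python
# Sketch of an alarm-rule check on the service platform. The rule format is an
# illustrative assumption; the patent only requires a count-vs-threshold test.
def should_alarm(density_map, rules):
    """rules: list of {"region": (x0, y0, x1, y1) or None, "threshold": number}."""
    for rule in rules:
        region = rule.get("region")
        if region is None:
            count = density_map.sum()                 # whole monitored scene
        else:
            x0, y0, x1, y1 = region
            count = density_map[y0:y1, x0:x1].sum()   # alarm area only
        if count >= rule["threshold"]:
            return True
    return False
```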
In summary, in the object density monitoring system provided in this embodiment, the monitoring picture is analyzed by the object density monitoring model in the video analysis server to obtain the object density information, and the object density information is sent to the service platform for display. This solves the problem that, when a conventional crowd density monitoring system uses a convolutional neural network to monitor object density, the system places high requirements on equipment and can hardly meet the demand of real-time multi-channel video analysis on low-end, low-power devices. Because the object density monitoring model is obtained after model clipping, the device resources consumed when the model runs can be reduced, the hardware requirements of the system on the equipment are lowered, and the monitoring requirements of multiple video channels are met.
In addition, by cutting the trained model and training the cut model, the number of parameters of the model is reduced, and the precision of the model can be ensured.
In addition, by arranging the alarm device in the service platform, the alarm can be realized when the object density is high.
In addition, the image-pyramid-based scale transformation operation is well suited to CSRnet combined with model clipping; the augmented data obtained by this scale transformation can make the accuracy of the clipped model almost consistent with the accuracy of the network model before clipping, so the modeling efficiency of the object density monitoring model can be improved.
In addition, the object density monitoring model established based on the basic network does not limit the size of the input image, supports the analysis of monitoring pictures with different sizes, and improves the universality of image analysis.
In addition, the object density is displayed through the density heat value map, so the display of the object density is more intuitive.
Fig. 4 is a flowchart of an object density monitoring method according to an embodiment of the present application, where the method is applied to the object density monitoring system shown in fig. 1, and an execution subject of each step is described as an example of the video parsing server 13 in the system. The method at least comprises the following steps:
step 401, receiving a monitoring picture transmitted by a gateway.
The monitoring picture is collected by the camera and sent by the gateway.
Step 402, inputting the monitoring picture into a pre-stored object density monitoring model to output object density information.
The object density monitoring model is obtained through model clipping, and the model clipping refers to a process of reducing a model structure and/or model parameters.
Optionally, the training process of the object density monitoring model includes: training a reference network by using training data to obtain an initial network model, wherein the training data comprises a plurality of object images with objects and object labeling information of each object image; cutting the initial network model; training the cut model by using the training data until the difference between the accuracy of the trained model and the accuracy of the model before cutting is smaller than an accuracy threshold, to obtain a trained cut model; if the parameter number of the cut model is larger than a preset threshold, cutting the trained cut model again, and training the re-cut model by using the training data until the difference between the accuracy of the trained model and the accuracy of the model before this cutting is smaller than the accuracy threshold, to obtain the trained cut model; and if the parameter number of the cut model is less than or equal to the preset threshold, determining the trained cut model as the object density monitoring model.
Optionally, the training data comprises raw acquisition data and augmented data of the raw acquisition data; the augmentation data refers to data obtained by performing data augmentation operation on the original acquired data.
In one example, the data augmentation operation includes an image pyramid-based scaling operation.
Optionally, the tailoring of the initial network model includes: determining a specific convolution kernel parameter in all convolution kernel parameters of the initial network model, wherein the specific convolution kernel parameter is the convolution kernel parameter which enables the loss change of the initial network model to be minimum; clipping specific convolution kernel parameters.
Optionally, determining a specific convolution kernel parameter among all convolution kernel parameters of the initial network model includes: for each convolution kernel parameter, calculating a first loss value before clipping the convolution kernel parameter based on Taylor expansion; calculating a second loss value after clipping the convolution kernel parameter based on a Taylor expansion; determining a loss value variation based on the first loss value and the second loss value; the specific convolution kernel parameter with the minimum loss value variation amount is determined from the various convolution kernel parameters.
Optionally, the preset threshold is calculated based on floating point operation times FLOPs regularization.
Optionally, the object density monitoring model is used for identifying an object in the monitoring picture to obtain a density heat value map of the object; the object density information includes a density thermal value map; the object labeling information comprises an object density heat value graph obtained by convolving an object label in each object image with a Gaussian kernel function; and the Gaussian kernel function is used for convolving the object label with the Gaussian kernel to obtain an object density heat value image.
Optionally, the reference network of the object density monitoring model supports processing of input images of different scales. In one example, the reference network is CSRnet, the front-end network of CSRnet is VGG-16, and the back-end network is a hole convolution.
And step 403, sending the object density information to a service platform for display.
For details of this embodiment, reference is made to the above system embodiment, and this embodiment is not described herein again.
In summary, the object density monitoring method provided in this embodiment receives the monitoring picture transmitted by the gateway, inputs the monitoring picture into the pre-stored object density monitoring model to output object density information, and sends the object density information to the service platform for display. This solves the problem that, when a conventional crowd density monitoring system uses a convolutional neural network to monitor object density, the system places high requirements on equipment and can hardly meet the demand of real-time multi-channel video analysis on low-end, low-power devices. Because the object density monitoring model is obtained after model clipping, the device resources consumed when the model runs can be reduced, the hardware requirements of the system on the equipment are lowered, and the monitoring requirements of multiple video channels are met.
In addition, model cutting is carried out on the trained model, and the cut model is trained, so that the number of parameters of the model can be reduced, and the precision of the model can be ensured.
In addition, the image-pyramid-based scale transformation operation is well suited to CSRnet combined with model clipping; the augmented data obtained by this scale transformation can make the accuracy of the clipped model almost consistent with the accuracy of the network model before clipping, so the modeling efficiency of the object density monitoring model can be improved.
In addition, the object density monitoring model established based on the basic network does not limit the size of the input image, supports the analysis of monitoring pictures with different sizes and improves the universality of image analysis.
In addition, the object density is displayed through the density heat value map, so the display of the object density is more intuitive.
Fig. 5 is a block diagram of an object density monitoring apparatus according to an embodiment of the present application, and this embodiment is described by taking an example in which the apparatus is applied to the video resolution server 13 in the object density monitoring system shown in fig. 1. The device at least comprises the following modules: an image receiving module 510, an image analyzing module 520, and an information transmitting module 530.
An image receiving module 510, configured to receive the monitoring picture transmitted by the gateway;
an image analysis module 520, configured to input the monitoring picture into a pre-stored object density monitoring model to output object density information; the object density monitoring model is obtained by model cutting;
an information sending module 530, configured to send the object density information to the service platform for display.
For relevant details reference is made to the above-described method embodiments.
It should be noted that: in the object density monitoring apparatus provided in the above embodiment, when performing object density monitoring, only the division of the functional modules is illustrated, and in practical applications, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the object density monitoring apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the object density monitoring device and the object density monitoring method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Fig. 6 is a block diagram of a video parsing server provided in an embodiment of the present application. The video resolution server comprises at least a processor 601 and a memory 602.
Processor 601 may include one or more processing cores such as: 4 core processors, 8 core processors, etc. The processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, processor 601 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. Memory 602 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 602 is used to store at least one instruction for execution by processor 601 to implement the object density monitoring method provided by the method embodiments herein.
In some embodiments, the video parsing server may further optionally include: a peripheral interface and at least one peripheral. The processor 601, memory 602 and peripheral interface may be connected by a bus or signal lines. Each peripheral may be connected to the peripheral interface via a bus, signal line, or circuit board. Illustratively, peripheral devices include, but are not limited to: radio frequency circuit, touch display screen, audio circuit, power supply, etc.
Of course, the video parsing server may also include fewer or more components, which is not limited by this embodiment.
Optionally, the present application further provides a computer-readable storage medium, where a program is stored, and the program is loaded and executed by a processor to implement the object density monitoring method of the foregoing method embodiment.
Optionally, the present application further provides a computer product, which includes a computer-readable storage medium, in which a program is stored, and the program is loaded and executed by a processor to implement the object density monitoring method of the above-mentioned method embodiment.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent application shall be subject to the appended claims.

Claims (11)

1. An object density monitoring system, the system comprising:
the cameras are used for collecting monitoring pictures in a monitoring scene;
the gateway is in communication connection with each camera and transmits the monitoring pictures acquired by the cameras to the video analysis server;
the video analysis server is in communication connection with the gateway and receives the monitoring picture transmitted by the gateway; inputting the monitoring picture into a prestored object density monitoring model to output object density information; the object density monitoring model is obtained by model cutting;
the service platform is in communication connection with the video analysis server and receives and displays the object density information of the video analysis server;
the training process of the object density monitoring model comprises the following steps:
training a reference network by using training data to obtain an initial network model, wherein the training data comprises a plurality of object images with objects and object labeling information of each object image;
cutting specific convolution kernel parameters in the initial network model one by one, wherein the specific convolution kernel parameters are convolution kernel parameters which enable the loss variation of the initial network model to be minimum;
training the cut model by using the training data until the difference between the accuracy of the model obtained by training and the accuracy of the model before cutting is smaller than the accuracy threshold value, and obtaining the cut model after training;
if the parameter number of the cut model is larger than a preset threshold, cutting the trained cut model again, and training the re-cut model by using the training data until the difference between the accuracy of the trained model and the accuracy of the model before this cutting is smaller than the accuracy threshold, to obtain the trained cut model;
and if the parameter number of the cut model is less than or equal to the preset threshold value, determining the trained cut model as the object density monitoring model.
2. The system of claim 1, wherein the training data comprises raw acquisition data and augmented data of the raw acquisition data;
the augmentation data refers to data obtained after data augmentation operation is carried out on the original collected data.
3. The system of claim 2, wherein the data augmentation operation comprises an image pyramid-based scaling operation.
4. The system of claim 1,
the object density monitoring model is used for identifying the object in the monitoring picture to obtain a density heat value graph of the object; the object density information comprises the density heat value map;
the object labeling information comprises an object density heat value graph obtained by convolving an object label in each object image with a Gaussian kernel function;
and the Gaussian kernel function is used for convolving the object label with a Gaussian kernel to obtain the object density heat value image.
5. The system of claim 1, wherein the tailoring the initial network model comprises:
determining a specific convolution kernel parameter from all convolution kernel parameters of the initial network model;
clipping the particular convolution kernel parameter.
6. The system of claim 5, wherein determining a particular convolution kernel parameter among all convolution kernel parameters of the initial network model comprises:
for each convolution kernel parameter, calculating a first loss value before clipping the convolution kernel parameter based on Taylor expansion;
calculating a second loss value after clipping the convolution kernel parameter based on a Taylor expansion;
determining a loss value change amount based on the first loss value and the second loss value;
the specific convolution kernel parameter with the minimum loss value variation amount is determined from the various convolution kernel parameters.
7. The system of claim 1, wherein the predetermined threshold is calculated based on floating point operations times FLOPs regularization.
8. The system of claim 1, wherein the reference network supports processing of input images at different scales.
9. An object density monitoring method used in a video resolution server in the object density monitoring system according to any one of claims 1 to 8, the method comprising:
receiving a monitoring picture transmitted by the gateway;
inputting the monitoring picture into a prestored object density monitoring model to output object density information; the object density monitoring model is obtained by model cutting;
and sending the object density information to the service platform for display.
10. A video analytics server, wherein the video analytics server comprises a processor and a memory; the memory has stored therein a program that is loaded and executed by the processor to implement the object density monitoring method of claim 9.
11. A computer-readable storage medium, characterized in that the storage medium has stored therein a program which, when being executed by a processor, is adapted to carry out the object density monitoring method according to claim 9.
CN202011033404.2A 2020-09-27 2020-09-27 Object density monitoring system, method, video analysis server and storage medium Active CN112052833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011033404.2A CN112052833B (en) 2020-09-27 2020-09-27 Object density monitoring system, method, video analysis server and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011033404.2A CN112052833B (en) 2020-09-27 2020-09-27 Object density monitoring system, method, video analysis server and storage medium

Publications (2)

Publication Number Publication Date
CN112052833A CN112052833A (en) 2020-12-08
CN112052833B true CN112052833B (en) 2023-04-07

Family

ID=73605257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011033404.2A Active CN112052833B (en) 2020-09-27 2020-09-27 Object density monitoring system, method, video analysis server and storage medium

Country Status (1)

Country Link
CN (1) CN112052833B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766491A (en) * 2021-01-18 2021-05-07 电子科技大学 Neural network compression method based on Taylor expansion and data driving

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104159089B (en) * 2014-09-04 2017-08-18 四川省绵阳西南自动化研究所 A kind of abnormal event alarming HD video intelligent processor
CN109389043B (en) * 2018-09-10 2021-11-23 中国人民解放军陆军工程大学 Crowd density estimation method for aerial picture of unmanned aerial vehicle
CN109614941B (en) * 2018-12-14 2023-02-03 中山大学 Embedded crowd density estimation method based on convolutional neural network model
CN110263643B (en) * 2019-05-20 2023-05-16 上海兑观信息科技技术有限公司 Quick video crowd counting method based on time sequence relation
CN110598558B (en) * 2019-08-14 2022-05-06 杭州未名信科科技有限公司 Crowd density estimation method, device, electronic equipment and medium

Also Published As

Publication number Publication date
CN112052833A (en) 2020-12-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant