CN114627269A - Virtual reality security monitoring platform based on deep learning target detection - Google Patents

Virtual reality security monitoring platform based on deep learning target detection

Info

Publication number
CN114627269A
CN114627269A CN202210240851.8A CN202210240851A CN114627269A CN 114627269 A CN114627269 A CN 114627269A CN 202210240851 A CN202210240851 A CN 202210240851A CN 114627269 A CN114627269 A CN 114627269A
Authority
CN
China
Prior art keywords
image
channel
network
weather
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210240851.8A
Other languages
Chinese (zh)
Inventor
李南希
韩芳
王青云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University filed Critical Donghua University
Priority to CN202210240851.8A priority Critical patent/CN114627269A/en
Publication of CN114627269A publication Critical patent/CN114627269A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 - Manipulating 3D models or images for computer graphics
    • G06T 19/006 - Mixed reality
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/28 - Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 - Television systems
    • H04N 7/18 - Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Abstract

A virtual reality security monitoring platform based on deep learning target detection belongs to the field of intelligent security detection. A weather classification network classifies each input picture by weather state and, according to the output classification result, calls a different image enhancement model; the image enhancement models comprise a dim-light enhancement model, a rain-removal model and a defogging model, and some of these models are improved so that they process images better and meet real-time application requirements. The output of the image enhancement model is used as the input of a YOLOv5 network for target detection, and the feature extraction network of the YOLOv5 detector is made more lightweight to meet the real-time requirement of the system. The system terminal is developed on the UE4 engine and provides a three-dimensional virtual interface; the algorithm models are deployed on a Web server, and the detection results are returned to the terminal over the HTTP protocol for display. The intuitive interface lets security personnel check and locate campus real-time monitoring information more quickly.

Description

Virtual reality security monitoring platform based on deep learning target detection
Technical Field
The invention belongs to the technical field of intelligent security detection, and particularly relates to a virtual reality security monitoring platform based on deep learning target detection.
Background
Campus security is increasingly becoming a focus of social concern. Traditional campus security relies mainly on personnel-based and physical protection, with security management carried out through patrols and physical safeguards. With continuing advances in science and technology, technical protection has become a supplement to and extension of these traditional means, and the efficient combination of information technology with campus security is an important part of strengthening campus security.
The traditional campus security monitoring management system only displays real-time monitoring pictures together on a two-dimensional interface. Such a system is better suited to reviewing recorded footage, is not convenient for real-time inspection by security personnel, and cannot effectively improve campus security management efficiency. The YOLO series of target detection algorithms achieves high detection accuracy and speed, but does not adapt well to target detection under severe weather and low-light conditions. In the invention, a virtual scene of the real campus is built on the UE4 engine, and an improved target detection algorithm then performs target detection on the real-time monitoring videos, so that the method suits target detection under various severe weather and low-light conditions; the real-time video data is transmitted back to the virtual campus platform, so monitoring can be carried out in real time inside the virtual scene. The system has high real-time performance, intuitive interface display and friendly human-machine interaction, which favors large-scale application.
Real-time video monitoring systems based on the B/S architecture use network IP cameras to collect real-time video data, transmit the video streams to a background video server over the RTMP/RTSP protocol, and use CGI and JavaScript technologies so that various intelligent terminals can log in through a browser and display the monitoring pictures. However, such systems only return raw video data, are not fused with deep learning algorithms, and have a single interface display effect. The R-CNN target detection algorithm uses a selective search algorithm to generate regions that may contain objects, extracts features with a convolutional neural network, classifies them with a support vector machine, and finally refines localization with bounding-box regression. The Faster R-CNN algorithm proposed by Ren et al. adds a region proposal network after the convolutional layers in place of selective search and is therefore much faster. Although this family of detectors achieves high detection accuracy, computational redundancy remains, which is unfavorable for systems with strict real-time requirements. The YOLO series is a single-stage detector based on bounding-box regression that performs classification and regression while generating bounding boxes, and can meet real-time requirements; however, it performs poorly on low-resolution images and cannot meet the detection requirements under severe weather. A study on high-speed rail perimeter intrusion detection under typical severe weather, an image processing method based on deep learning [Liu Qinghong, High-speed rail perimeter intrusion target detection under typical severe weather conditions [D]. Beijing Jiaotong University, 2021. DOI:10.26944/d.cnki.gbfju.2021.000202], proposes preprocessing images with defogging and rain removal and then feeding the processed images to the detection network, which effectively improves detection under severe weather. However, this approach sacrifices detection performance in normal weather and only accommodates a single kind of inclement weather.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a virtual reality security monitoring platform based on deep learning target detection. A three-dimensional virtual campus scene is built with the powerful three-dimensional rendering capability of the UE4 engine in combination with the actual campus scene; real-time monitoring data is collected with network IP cameras, the video data is transmitted to a background server over the RTMP protocol, an improved target detection algorithm is called in the background to perform target detection, and the detection result is returned to the virtual campus terminal over the HTTP protocol. The YOLO target detection algorithm is combined with image classification and image enhancement algorithms, so the method adapts effectively to target detection in various severe weather and low-light states while remaining strongly real-time; web services are combined with the virtual terminal, and the more intuitive interface display lets security personnel check and locate campus real-time monitoring information more quickly, improving the usability of the campus security monitoring platform and the working efficiency of security personnel.
In order to solve the technical problem, the technical scheme of the invention is to provide a virtual reality security monitoring platform based on deep learning target detection, which is characterized by comprising the following steps:
Step 1: building a background audio and video server based on Nginx, acquiring remote real-time video data with the RTMP protocol, and responding with the real-time video data to the virtual terminal; building a Web background server based on the Flask framework, and pulling the video stream data from the routing address of the audio/video server.
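The sketch below illustrates the kind of Flask back end described in step 1; it is not taken from the disclosure, and the route names, stream address and response format are illustrative assumptions (the real platform pulls an RTMP stream from the Nginx audio/video server and serves detection results over HTTP).

```python
# Minimal Flask back-end sketch for step 1 (assumptions: route names,
# stream URL and JSON layout are illustrative, not from the patent).
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical address exposed by the Nginx audio/video server that
# receives the camera's RTMP push stream.
STREAM_URL = "http://192.168.1.10:8080/live/campus_cam01.m3u8"

# Latest detection result written by the detection process (assumption).
latest_result = {"camera": "campus_cam01", "detections": []}

@app.route("/stream/<camera_id>")
def stream_address(camera_id):
    # Return the routing address of the video stream so the UE4 terminal can pull it.
    return jsonify({"camera": camera_id, "url": STREAM_URL})

@app.route("/detections/<camera_id>")
def detections(camera_id):
    # Respond with the most recent target-detection result for this camera.
    return jsonify(latest_result)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```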
Step 2: constructing image data sets of different weather states; collecting a number of images of the different weather states and labelling each with its weather type to drive deep neural network training; the weather states are divided into foggy days, rainy days and sunny days.
Step 3: training a weather state classification model based on the lightweight network model ShuffleNet; the real-time video data acquired in step 1 is split into frames, each frame image is taken as the input of the network, and the network outputs one of three weather state classifications (rainy, foggy or sunny); the lightweight neural network model ShuffleNet is built and trained with the weather state image data set constructed in step 2; the lightweight model extracts features with group convolution and rearranges the channels after convolution, which reduces the amount of computation and keeps the model efficient; a ShuffleNet unit is designed that extracts features with 1 × 1 grouped pointwise convolution followed by channel rearrangement, and a 3 × 3 depthwise convolution with grouped pointwise convolution is used to match the channel count of the skip connection.
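As an illustration of the ShuffleNet unit described above, the following PyTorch sketch shows grouped 1 × 1 pointwise convolution, channel rearrangement, a 3 × 3 depthwise convolution, and a second grouped pointwise convolution matched to the skip connection; the channel counts, group number and activation are assumptions rather than the trained model's actual configuration.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w)     # split channels into groups
    x = x.transpose(1, 2).contiguous()           # rearrange (shuffle) the channels
    return x.view(n, c, h, w)

class ShuffleUnit(nn.Module):
    def __init__(self, channels=128, groups=4):  # assumed sizes
        super().__init__()
        mid = channels // 4
        self.groups = groups
        self.pw1 = nn.Sequential(                # 1x1 grouped pointwise convolution
            nn.Conv2d(channels, mid, 1, groups=groups, bias=False),
            nn.BatchNorm2d(mid), nn.LeakyReLU(0.1, inplace=True))
        self.dw = nn.Sequential(                 # 3x3 depthwise convolution
            nn.Conv2d(mid, mid, 3, padding=1, groups=mid, bias=False),
            nn.BatchNorm2d(mid))
        self.pw2 = nn.Sequential(                # grouped pointwise conv that matches
            nn.Conv2d(mid, channels, 1, groups=groups, bias=False),
            nn.BatchNorm2d(channels))            # the skip-connection channel count
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        out = self.pw1(x)
        out = channel_shuffle(out, self.groups)  # channel rearrangement after convolution
        out = self.dw(out)
        out = self.pw2(out)
        return self.act(out + x)                 # skip (residual) connection

y = ShuffleUnit()(torch.randn(1, 128, 56, 56))   # one unit on a dummy feature map
```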
Step 4: designing a parallel three-channel connector which is divided into a channel A, a channel B and a channel C to perform the enhancement operation on the image; the three classification results obtained in step 3 are respectively taken as the input of the connector; channel A is the rain-removal channel, which integrates a rain-removal model and outputs a rain-removed image; channel B is the defogging channel, which integrates a defogging model and outputs a defogged image; channel C is an empty channel and outputs the original image; all three channels also integrate a dim-light enhancement algorithm to preprocess dim-light images.
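A minimal sketch of how the parallel three-channel connector could dispatch a frame, assuming the weather label comes from step 3 and the three enhancement models are supplied as callables; whether dim-light enhancement runs before or after the weather-specific model is an assumption made here for illustration.

```python
def connector(frame, weather_label, enhance_dim_light, remove_rain, remove_fog):
    frame = enhance_dim_light(frame)       # every channel integrates dim-light enhancement (assumed order)
    if weather_label == "rain":            # channel A: rain-removal model
        return remove_rain(frame)
    if weather_label == "fog":             # channel B: defogging model
        return remove_fog(frame)
    return frame                           # channel C: empty channel, original image

# Identity stand-ins in place of the real enhancement models:
out = connector("frame", "fog", lambda f: f, lambda f: f, lambda f: f)
```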
Step 5: taking the image enhancement output of step 4 as the input of a YOLOv5 detection network. The YOLOv5 detector is widely used in practice for image editing, crowd detection and automatic driving. Compared with earlier versions, YOLOv3 designed Darknet-53 based on the idea of ResNet, built from successive 3 × 3 and 1 × 1 convolution kernels; by predicting on multi-scale feature maps the algorithm achieves multi-scale training, which further improves detection accuracy, particularly for small targets. The backbone network of YOLOv5 is improved to consist of a Shuffle layer and a Shuffle block, a Focus layer is formed with 4 slice operations in the upper layers of feature extraction, ordinary convolutions in the original model are replaced with depthwise separable convolutions, and the Swin activation function is used as the activation function, reducing the network parameters and computation and improving the target detection speed.
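The following PyTorch sketch illustrates two of the backbone changes named above, a Focus layer built from four slice operations and a depthwise separable convolution replacing an ordinary convolution; the channel sizes are assumptions, and this snippet is not the full YOLOv5 backbone.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice the input into 4 sub-images (every other pixel), concatenate them
    on the channel axis, then fuse them with one convolution."""
    def __init__(self, in_ch=3, out_ch=32):     # assumed channel sizes
        super().__init__()
        self.conv = nn.Conv2d(in_ch * 4, out_ch, 1, bias=False)

    def forward(self, x):
        return self.conv(torch.cat(
            [x[..., ::2, ::2], x[..., 1::2, ::2],
             x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1))

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pw = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pw(self.dw(x))

# A 640x640 RGB frame becomes a 320x320 feature map with 32 channels.
feat = Focus()(torch.randn(1, 3, 640, 640))
feat = DepthwiseSeparableConv(32, 64)(feat)
```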
Step 6: developing the virtual terminal based on the UE4 engine; a virtual campus scene is built according to the real campus scene, and the virtual terminal obtains the real-time video data and the detection result data over the HTTP protocol; a virtual scene roaming function is developed so that, after logging in to the virtual terminal, security personnel can switch between a first-person view and a global roaming view to browse the virtual campus; virtual campus monitoring-site icons are placed according to the actual campus monitoring sites, different icon states are rendered according to the HTTP response data, and ray detection lets the user operate a virtual icon to pop up the corresponding real-time monitoring picture.
By adopting the technical scheme, compared with the prior art, the invention has the following advantages:
1. The input of the YOLO target detection algorithm is first preprocessed by combining it with an image classification algorithm and image enhancement algorithms, so that the platform adapts effectively to target detection in various severe weather and low-light states while remaining strongly real-time; a three-dimensional virtual terminal is developed based on the UE4 engine and combined with web services, and the more intuitive interface display lets security personnel check and locate campus real-time monitoring information more quickly, improving the usability of the campus security monitoring platform and the working efficiency of security personnel.
2. The image preprocessing method of the invention provides a new target detection architecture, and a parallel three-channel connector is designed to combine a weather classification network with a defogging model, a rain removal model and a dark light enhancement model, so that a Yolo network can simultaneously adapt to target detection tasks in foggy days, rainy days and low light states and give consideration to real-time performance, and meanwhile, the detection performance of an input image in a good state cannot be reduced.
3. According to the invention, the three-dimensional virtual terminal is developed based on the UE4 engine, a web service technology is combined with the virtual terminal, and security personnel can check and position the campus real-time monitoring information more quickly through a more intuitive interface display effect, so that the use efficiency of a campus security monitoring platform and the working efficiency of the security personnel are improved.
4. The parallel three-channel connector is divided into a channel A, a channel B and a channel C to perform enhancement operation on an image; respectively taking three classification results obtained by the image classification network as the input of a connector, wherein the channel A is a rain removing channel, and the channel integrates a rain removing model and outputs a rain removing image; the channel B is a demisting channel, and the channel integrates a demisting model and outputs a demisting image; the channel C is a null channel and outputs an original image; and three channels are integrated with a dim light enhancement algorithm to preprocess the dim light image. By the design, one set of system can simultaneously carry out optimization processing on various severe weather state images, and the parallel design ensures the real-time performance of the system.
5. The defogging model of the invention is based on the dark channel prior (DCP) image defogging algorithm; it improves the atmospheric light estimation and adds a low-complexity morphological reconstruction, proposing a new atmospheric light estimation method on the basis of the existing one, which improves the robustness of the algorithm, solves the over-exposure problem, and keeps the algorithm complexity lower than other algorithms of the same performance.
6. The rain-removal model improves the classic rain-removal algorithm based on sparse representation. In the classic algorithm, clustering errors occur when the learned dictionary is split into a geometric dictionary and a rain dictionary, so the restored image either contains large errors or becomes seriously blurred because background information is filtered too aggressively. Compared with the original algorithm, which learns only from the high-frequency components, separates out a geometric dictionary and then directly superimposes the low-frequency components, the improved algorithm solves this residual-error problem.
7. According to the dim light enhancement model, the hyperbolic tangent curve is utilized to map the image brightness to an ideal level, and then the weighting parameters are determined according to the maximum entropy of the image, so that the enhancement of a highlight area can be effectively inhibited, and the occurrence of noise is reduced.
8. The Yolo detection network improves a backbone network of YOLOV5, so that the backbone network is composed of Shufflechannel and Shuffleblock, meanwhile, a Focus layer is formed by adopting 4 times of slice operation in an upper layer structure of feature extraction, common convolution in an original model is replaced by deep separable convolution, and a Swin activation function is used as the activation function, so that network parameters and calculated amount are reduced, and the target detection speed is improved.
Drawings
FIG. 1 is a schematic diagram of an architecture of a virtual reality security monitoring platform according to the present invention.
FIG. 2 is a flow chart of the rain removal algorithm of the present invention.
FIG. 3 is a flow chart of the defogging algorithm of the present invention.
FIG. 4 is a flow chart of the dim light enhancement algorithm of the present invention.
Detailed Description
Examples
Fig. 1 is a flowchart of a virtual reality security monitoring platform based on deep learning target detection provided in this embodiment, which specifically includes the following steps:
Step 1: building a background audio and video server based on Nginx, acquiring remote real-time video data with the RTMP protocol, and responding with the real-time video data to the virtual terminal; building a Web background server based on the Flask framework, and pulling the video stream data from the routing address of the audio/video server.
Step 2: constructing image data sets of different weather states; images are obtained through large search engines such as Baidu, Bing and Google; the data set contains 9000 pictures covering 3 weather types (foggy, rainy and sunny), 3000 pictures per type; it covers multiple scenes, and the image backgrounds are varied, complex and changeable.
The data set is enriched with mirroring, flipping, cropping, Gaussian-noise addition and similar methods, which improves the generalization ability of the algorithm and lets the network model be optimized better. Each of the 3 weather classes is expanded to 10000 pictures; 20% of each class is selected, without data enhancement, as the test set so that the measured accuracy reflects the real situation, and the remaining 80% of the data undergoes data enhancement and serves as the training and validation set to improve the balance and diversity of the samples.
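A sketch of the augmentation operations listed above (mirroring, flipping, cropping, Gaussian noise) using OpenCV and NumPy; the crop ratio and noise level are assumptions.

```python
import cv2
import numpy as np

def augment(img, rng=np.random.default_rng(0)):
    samples = []
    samples.append(cv2.flip(img, 1))                     # horizontal mirror
    samples.append(cv2.flip(img, 0))                     # vertical flip
    h, w = img.shape[:2]
    ch, cw = int(0.8 * h), int(0.8 * w)                  # random 80% crop (assumed ratio),
    y0, x0 = rng.integers(0, h - ch), rng.integers(0, w - cw)
    crop = img[y0:y0 + ch, x0:x0 + cw]
    samples.append(cv2.resize(crop, (w, h)))             # resized back to the original size
    noise = rng.normal(0, 10, img.shape)                 # additive Gaussian noise (assumed sigma)
    noisy = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    samples.append(noisy)
    return samples
```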
Step 3: training a weather state classification model based on the lightweight network model ShuffleNet; the real-time video data obtained in step 1 is split into frames and each frame image is taken as the input of the network. ShuffleNet proposes pointwise group convolution, which effectively saves hardware computing power and improves computing speed, and adopts channel shuffling to help information flow between the groups, thereby building a lightweight ShuffleNet network architecture. The classification model is built with 16 ShuffleNet units and the LeakyReLU activation function.
The network is pre-trained on the ImageNet data set to initialize the model parameters and reduce the number of training iterations. The training set data is first taken as the input of the network; convolution operations extract the picture features to form feature maps, which propagate through ShuffleNet's pointwise group convolution and channel shuffling operations, and after the convolution and pooling layers a Softmax layer at the end of the network predicts the result. The predicted value and the actual value of the network are taken as the input of the loss function for loss calculation, the network parameters are updated with a gradient descent algorithm, and the optimal weather classification model is trained through continuous iteration.
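The training procedure described above could look roughly like the following PyTorch sketch; it substitutes torchvision's ShuffleNetV2 with ImageNet weights for the 16-unit ShuffleNet built in the disclosure, and the optimizer settings and epoch count are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

def train_weather_classifier(train_loader, epochs=30):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Stand-in backbone: torchvision ShuffleNetV2 pre-trained on ImageNet.
    model = models.shufflenet_v2_x1_0(weights="IMAGENET1K_V1")
    model.fc = nn.Linear(model.fc.in_features, 3)      # 3 weather classes: rain / fog / sunny
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()                  # loss on predicted vs. actual labels
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    for _ in range(epochs):
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            loss = criterion(model(images), labels)
            optimizer.zero_grad()
            loss.backward()                            # gradient descent update
            optimizer.step()
    return model
```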
Step 4: designing a parallel three-channel connector which is divided into a channel A, a channel B and a channel C to perform the enhancement operation on the image; the three classification results obtained in step 3 are respectively taken as the input of the connector.
Step 4.1: as shown in fig. 2, when the weather state classification model determines that the image is in a rainy weather state, the image is used as the input of channel A, the rain-removal channel, which integrates a rain-removal model and outputs a rain-removed image.
The original rainy image I is decomposed into a low-frequency part ILF with a bilateral filtering algorithm, and the high-frequency part IHF of the image is obtained as IHF = I − ILF; the bilateral filter principle is as follows:
d(i,j,k,l) = exp( −[(i−k)² + (j−l)²] / (2σ_d²) )
r(i,j,k,l) = exp( −‖f(i,j) − f(k,l)‖² / (2σ_r²) )
ω(i,j,k,l) = d(i,j,k,l) · r(i,j,k,l)
g(i,j) = Σ_((k,l)∈S) f(k,l)·ω(i,j,k,l) / Σ_((k,l)∈S) ω(i,j,k,l)
wherein (i,j) is the coordinate of the center point and f(i,j) is the pixel value at the center point; point (k,l) is any point in the neighborhood S of the center point (i,j); σ_d and σ_r are smoothing parameters; f(k,l) is the pixel value at point (k,l); d(i,j,k,l) is the spatial-distance kernel from point (k,l) to point (i,j); r(i,j,k,l) is the gray-level difference (range) kernel matrix; ω(i,j,k,l) is the weight matrix of the bilateral filter; and g(i,j) is the pixel value output for each pixel after bilateral filtering.
The high-frequency component IHF contains background information as well as noise, so HOG features of the image are extracted and a high-frequency image dictionary is obtained through sparse coding and dictionary learning. The HOG feature descriptor uses the distribution (histogram) of gradient orientations as the feature; gradient magnitudes are large around edges and corners (regions of abrupt intensity change), so object edges carry more object shape information than flat regions.
IHF is divided into blocks and dictionary learning is performed to obtain a dictionary D, which is split by the KMeans nearest-neighbor algorithm into two parts, a rain dictionary and a geometric dictionary. Dictionary learning is a branch of signal processing and machine learning whose goal is to find a frame (called a dictionary) in which the training data has a sparse representation; the sparser the representation, the better the dictionary.
The rain dictionary is classified again according to the prior information of rain and the misclassified part is removed, giving the final rain dictionary DR1. The rain component IHFR of the high-frequency image is recovered from the rain dictionary DR1, IHFG is obtained as IHF − IHFR, and IHFG is bilaterally filtered again and denoised with a mild BM3D algorithm to obtain a new geometric component IHFG. BM3D relies on the non-local and local characteristics of natural images, namely that they contain many mutually similar patches and that the image data is locally highly correlated; when these properties hold, a group of patches is correlated in all three dimensions and a sparse representation of the true signal can be obtained by applying a decorrelating three-dimensional transform to the group. Finally, IHFG and ILF are added to obtain the final rain-removal result.
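As an illustration of the first stage of step 4.1, the sketch below splits a rainy image into low- and high-frequency parts with OpenCV's bilateral filter; the filter parameters are assumptions, and the later dictionary-learning stages are not shown.

```python
import cv2
import numpy as np

def split_frequencies(rain_img, d=9, sigma_color=75, sigma_space=75):
    # Assumed filter parameters; rain_img is an 8-bit BGR or grayscale image.
    img = rain_img.astype(np.float32)
    ilf = cv2.bilateralFilter(img, d, sigma_color, sigma_space)  # low-frequency part ILF
    ihf = img - ilf                                              # high-frequency part IHF = I - ILF
    return ilf, ihf
```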
Step 4.2: the channel B is a demisting channel, and the channel integrates a demisting model and outputs a demisting image;
the defogging model adopted in this embodiment is based on an improved DCP algorithm, as shown in fig. 3, in a local area of any three-channel input image, a pixel value in at least one channel approaches to zero, and a dark channel expression is obtained:
I_dark(x) = min_(y∈Ω(x)) [ min_(C∈{R,G,B}) I^C(y) ]
Ω(x) is the local region block centered on pixel position x, I^C denotes one color channel of the input image I, and y denotes a pixel in the region Ω(x). The DCP is then expressed as:
I_dark(x) → 0
in bad weather conditions, the captured image will fog up due to scattering effects of airborne particles, which can be expressed as:
I(x) = J(x)·t(x) + A·(1 − t(x))
x is the pixel location, I(x) is the color intensity of the captured image, J(x) is the original scene brightness to be restored, A is the atmospheric light and t(x) is the transmittance; the problem of recovering the fog-free image J(x) translates into predicting the unknown parameters, the atmospheric light intensity A and the transmittance t(x).
If the estimated value of the atmospheric light intensity A is too low the result is blurred, and if it is too high excessive noise is generated. This embodiment proposes a new atmospheric light estimation method: A_brightest denotes the value of the original image at the brightest pixel of the dark channel, A_0.1% is the average of the brightest 0.1% of pixels in the dark channel, and a is the weight, set to 0.6 by default. Estimating with this method markedly improves robustness:
A = a·A_brightest + (1 − a)·A_0.1%
Color distortion in the sky region of the reconstructed image is prevented by introducing a lower bound on the transmittance; t_0 is defined as this lower bound, with a default value of 0.4.
Normally the transmittance of the target region is not lower than 0.6 while the transmittance of the sky region is close to 0, so the defogging of the target region is not affected by the lower bound. Because DCP defogging is applied to the entire image, the incorrect estimation of the sky-region pixel values caused by the change in transmittance gives the sky region a higher color intensity; however, since the colors of haze and sky are very similar, this incorrect estimation does not affect the overall quality of the result. When the haze is heavy the transmittance may fall below 0.4, leaving residual haze in the corresponding region. To solve this problem, before the transmittance lower bound is applied a group of morphological operations is introduced to reconstruct the transmission map: an opening operation φ is performed first, followed by a closing operation γ. The closing operation fills holes in the image and strengthens object structure, while the opening operation removes small objects from the image while preserving larger ones; ε_B denotes the dilation (spreading) operator and δ_B the erosion (contraction) operator:
t_C = φ(t) = ε_B[δ_B(t)]
and
t_CO = γ(t_C) = δ_B[ε_B(t_C)]
Finally the sky and the objects are separated, and the transmittance is re-estimated by the following method, t_reconstructed being the transmittance ultimately used to restore the haze-free image:
t_reconstructed(x) = max(t_CO(x), t_0)
the method avoids excessive enhancement through more robust estimation of the lower limit of the scene brightness; processing time is significantly reduced by morphological reconstruction and smooth edges are provided.
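A sketch of the dark-channel computation and the atmospheric light estimate A = a·A_brightest + (1 − a)·A_0.1% with the default weight a = 0.6; the patch size and the exact definition of A_brightest (taken here as the image value at the brightest dark-channel pixel) are assumptions.

```python
import cv2
import numpy as np

def dark_channel(img, patch=15):
    # Per-pixel minimum over the three color channels, then a local minimum
    # filter (erosion) over the patch Omega(x); patch size is an assumption.
    min_rgb = img.min(axis=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_rgb, kernel)

def atmospheric_light(img, dark, a=0.6):
    flat_dark = dark.ravel()
    flat_img = img.reshape(-1, 3)
    a_brightest = flat_img[flat_dark.argmax()]            # value at the brightest dark-channel pixel
    n = max(1, int(0.001 * flat_dark.size))               # brightest 0.1% of dark-channel pixels
    idx = np.argsort(flat_dark)[-n:]
    a_01 = flat_img[idx].mean(axis=0)                     # their mean color intensity
    return a * a_brightest + (1 - a) * a_01               # A = a*A_brightest + (1-a)*A_0.1%

hazy = np.random.rand(240, 320, 3)                        # stand-in for a foggy frame in [0, 1]
A = atmospheric_light(hazy, dark_channel(hazy))
```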
Step 4.3: the channel C is a null channel and outputs an original image;
step 4.4: as shown in fig. 4, the channel a, the channel B, and the channel C are all integrated with a dim light enhancement algorithm; the Hyperbolic Tangent Curve (HTC) is a monotone increasing function in a range of (-1, +1), and when the image intensity value is normalized to [0,1], the hyperbolic tangent curve is used for brightness mapping; for an RGB image, let the pixel value be I (x, y) ═ { R (x, y), G (x, y), B (x, y) }; the red, green and blue components are represented as R, G and B, respectively, and normalized to [0,1] with the pixel location coordinate (x, y) resulting in the hyperbolic tangent function:
I_t(x, y) = tanh(k·I(x, y))
wherein
tanh(u) = (e^u − e^(−u)) / (e^u + e^(−u))
and k is a scale factor whose value is determined according to the brightness of the image.
To suppress the enhancement of the highlight region, the luminance of the output image is weighted as:
I_w(x, y) = w·I(x, y) + (1 − w)·tanh(k·I(x, y))
where w is a weighting coefficient, taken as the average of the RGB channel values so as to maintain the image intensity distribution;
the luminance enhancement should cover all pixel grayscales from 0 to 1 to ensure that the image is globally enhanced, for which the luminance weighting of the output image is subjected to a stretching process:
I_s(x, y) = (I_w(x, y) − I_w(x, y)_min) / (I_w(x, y)_max − I_w(x, y)_min)
I_w(x, y)_max and I_w(x, y)_min are the maximum and minimum of the image intensity values, respectively; I_s(x, y) denotes the pixel value of each point after the image is globally enhanced.
The larger the image entropy is, the more information carried by the image is, and the clearer the image is; the image entropy is defined as:
E = − Σ_i p(i) · log2 p(i)
p(i) is the proportion of pixels at the i-th intensity level; the above formula is used to compute the entropy of I_s(x, y) for different values of k, and the k giving the maximum entropy is the optimal scale factor. When the brightness enhancement of the image is finished, the image is denoised with a basic bilateral filtering algorithm.
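A NumPy sketch of the dim-light enhancement just described: hyperbolic-tangent mapping weighted by w (the mean RGB value), min-max stretching, and selection of the scale factor k that maximizes the image entropy; the candidate range for k is an assumption.

```python
import numpy as np

def entropy(img, bins=256):
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()            # E = -sum p(i) log2 p(i)

def dim_light_enhance(img):                   # img: float RGB values in [0, 1]
    w = img.mean()                            # weighting coefficient (mean RGB value)
    best_k, best_out, best_e = None, None, -1.0
    for k in np.arange(1.0, 8.0, 0.5):        # assumed search range for the scale factor k
        iw = w * img + (1 - w) * np.tanh(k * img)             # weighted tanh mapping
        i_s = (iw - iw.min()) / (iw.max() - iw.min() + 1e-8)  # min-max stretching to [0, 1]
        e = entropy(i_s)
        if e > best_e:                        # keep the k with maximum entropy
            best_k, best_out, best_e = k, i_s, e
    return best_out, best_k

enhanced, k_opt = dim_light_enhance(np.random.rand(120, 160, 3) * 0.3)
```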
Step 5: the frame-by-frame image enhancement output of step 4 is taken as the input of the YOLOv5 detection network; a Darknet-53 backbone trained for classification is used as the feature extractor, the input image yields features at three scales, and multi-scale target detection is then performed. The backbone is a fully convolutional network, so it uses the GPU efficiently and is compatible with input images of arbitrary size. The network outputs feature maps at 3 scales; each grid cell has 3 anchors, and each anchor has 5 box parameters + 80 class probabilities, so 3 × 85 = 255 output channels. The 13 × 13 scale corresponds to a receptive field of 32 × 32 in the original image, that is, 1 grid cell covers a 32 × 32 region of the original image, and the receptive fields of the different scales are responsible for predicting objects of different sizes. Multi-scale fusion exploits the abstract semantic features of the deep layers and makes full use of the fine-grained, pixel-level features of the shallow layers, so the structure can fuse multiple scales and predict objects of different sizes. The bounding boxes are then screened with non-maximum suppression, low-confidence and duplicate boxes are filtered out, and the final target detection result is obtained. The backbone network of YOLOv5 is improved to consist of a Shuffle channel and a Shuffle block, a Focus layer is formed with 4 slice operations in the upper layers of feature extraction, ordinary convolutions in the original model are replaced with depthwise separable convolutions, and the Swin activation function is used as the activation function, reducing the network parameters and computation and improving the target detection speed.
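As an illustration of the post-processing described in step 5 (confidence filtering and non-maximum suppression over the 3 × 85 predictions per grid cell), the following sketch uses torchvision's NMS; the thresholds are assumptions.

```python
import torch
import torchvision

def postprocess(pred, conf_thres=0.25, iou_thres=0.45):
    # pred: (N, 85) tensor of [x, y, w, h, objectness, 80 class scores] per box
    scores = pred[:, 4] * pred[:, 5:].max(dim=1).values   # combined confidence
    keep = scores > conf_thres                            # drop low-confidence boxes
    boxes, scores = pred[keep, :4], scores[keep]
    xyxy = torch.cat([boxes[:, :2] - boxes[:, 2:] / 2,    # xywh -> corner coordinates
                      boxes[:, :2] + boxes[:, 2:] / 2], dim=1)
    kept = torchvision.ops.nms(xyxy, scores, iou_thres)   # remove duplicate boxes
    return xyxy[kept], scores[kept]

# Example on random predictions shaped like one flattened YOLO output.
boxes, scores = postprocess(torch.rand(1000, 85))
```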
Step 6.1: virtual terminal development is performed based on the UE4 engine. An object of the Pawn class is instantiated; the Pawn object can be controlled by the user and is what the user actually controls when roaming in the scene. A camera component and a spring-arm component are then added to the Pawn object to enable display and zooming of the scene. Finally, physical input devices such as keyboard keys are bound, so that the Pawn object can be controlled with the keyboard and the roaming function of the character in the scene is complete.
Step 6.2: the camera icon is declared as an Actor class; the Actor class is the basic type of object that can be placed in a virtual scene, and an instance of the Actor class can interact with the user. The Pawn object is bound to a physical input such as the right mouse button, and a ray is emitted along the current direction when the right button is clicked, for example toward a camera icon when the icon is clicked. Whether the ray hits an Actor-class object is then judged; if so, the corresponding logic is executed and the real-time monitoring picture is opened, so that the user can view the real-time monitoring picture in the virtual scene.
Step 6.3: an HTTP data request is written in the Actor class blueprint, the Call URL communication node provided by the VaRest plug-in is called to access the server address, and the detection data is obtained with the Get method. The obtained detection data is examined; if detection data exists, inter-blueprint communication is performed, the monitoring-site icon blueprint is called from the communicating blueprint, and the monitoring-site icon is highlighted with the Set Material node.
The foregoing is merely a preferred embodiment of the invention and is not intended to limit the invention in any way. It should be noted that a person skilled in the art can make several modifications and additions without departing from the method of the invention, and these should also be considered within the scope of protection of the invention. Any changes, modifications and equivalent variations of the above embodiments made with the disclosed technique, without departing from the spirit and scope of the invention, remain within the scope of the technical solution of the invention.

Claims (6)

1. A virtual reality security monitoring platform based on deep learning target detection, characterized by comprising the following steps:
step 1: building a background audio and video server based on Nginx, acquiring remote real-time video data with the RTMP protocol, and responding with the real-time video data to a virtual terminal; building a Web background server based on the Flask framework, and pulling video stream data from a routing address of the audio/video server;
step 2: constructing image data sets of different weather states; collecting a number of images of the different weather states and labelling each with its weather type to drive deep neural network training; the weather state types are divided into foggy, rainy and sunny days;
step 3: training a weather state classification model based on the lightweight network model ShuffleNet; splitting the real-time video data obtained in step 1 into frames, taking each frame image as the input of the network, and having the network output one of three weather state classifications; building the lightweight neural network model ShuffleNet and training it with the weather state image data set built in step 2; the lightweight model extracts features with group convolution and rearranges the channels after convolution, reducing the amount of computation and keeping the model efficient; designing a ShuffleNet unit that extracts features with 1 × 1 grouped pointwise convolution followed by channel rearrangement, and applying a 3 × 3 depthwise convolution with grouped pointwise convolution to match the channel count of the skip connection;
step 4: designing a parallel three-channel connector which is divided into a channel A, a channel B and a channel C to perform the enhancement operation on the image; taking the three classification results obtained in step 3 respectively as the input of the connector, wherein channel A is the rain-removal channel, which integrates a rain-removal model and outputs a rain-removed image; channel B is the defogging channel, which integrates a defogging model and outputs a defogged image; channel C is an empty channel and outputs the original image; all three channels integrate a dim-light enhancement algorithm to preprocess dim-light images;
step 5: taking the image enhancement output of step 4 as the input of a YOLOv5 detection network; the backbone network of YOLOv5 consists of a Shuffle channel and a Shuffle block, a Focus layer is formed with 4 slice operations in the upper layers of feature extraction, ordinary convolutions in the original model are replaced with depthwise separable convolutions, and the Swin activation function is used as the activation function;
step 6: developing the virtual terminal based on the UE4 engine; building a virtual campus scene according to the real campus scene, the virtual terminal obtaining the real-time video data and the detection result data over the HTTP protocol; developing a virtual scene roaming function so that, after logging in to the virtual terminal, security personnel can switch between a first-person view and a global roaming view to browse the virtual campus; placing virtual campus monitoring-site icons according to the actual campus monitoring sites, rendering different icon states according to the HTTP response data, and using ray detection to let the user operate a virtual icon and pop up the real-time monitoring picture.
2. The virtual reality security monitoring platform according to claim 1, wherein the specific process of step 2 is: constructing image data sets of different weather states; obtaining images through search engines, the data set containing 9000 pictures covering 3 weather types (foggy, rainy and sunny), 3000 pictures per type, and covering multiple scenes;
mirroring, flipping, cropping and Gaussian-noise addition are used to enrich the data set; each of the 3 weather classes is expanded to 10000 pictures, 20% of the data of each class is selected, without data enhancement, as the test set, and the remaining 80% undergoes data enhancement and serves as the training and validation set to improve the balance and diversity of the samples.
3. The virtual reality security monitoring platform according to claim 2, wherein the specific process of the step 3 is as follows: training a weather state classification model based on a lightweight network model ShuffleNet, performing framing processing on the real-time video data obtained in the step 1, and taking an image of each frame as the input of a network; meanwhile, channel mixing is adopted to help information circulation among all groups, so that a lightweight ShuffleNet network architecture is constructed; the classification model adopts 16 ShuffleNet units and a Leaky ReLU activation function to build a weather state classification model.
pre-training the network with the ImageNet data set; taking the training set data as the input of the network, extracting picture features with convolution operations to form feature maps, propagating them through ShuffleNet's pointwise group convolution and channel shuffling operations, and, after several convolution and pooling layers, predicting the result with a Softmax layer at the end of the network; taking the predicted value and the actual value of the network as the input of the loss function for loss calculation, updating the network parameters with a gradient descent algorithm, and training the optimal weather classification model through continuous iteration.
4. The virtual reality security monitoring platform according to claim 3, wherein the specific process of the step 4 is as follows:
step 4.1: when the weather state classification model judges that the image is in a rainy day state, the image is used as the input of a channel A, the channel A is a rain removing channel, and the channel A integrates a rain removing model and outputs a rain removing image;
decomposing the original rainy image I into a low-frequency part ILF with a bilateral filtering algorithm and obtaining the high-frequency part IHF of the image as IHF = I − ILF; the bilateral filter principle is as follows:
d(i,j,k,l) = exp( −[(i−k)² + (j−l)²] / (2σ_d²) )
r(i,j,k,l) = exp( −‖f(i,j) − f(k,l)‖² / (2σ_r²) )
ω(i,j,k,l) = d(i,j,k,l) · r(i,j,k,l)
g(i,j) = Σ_((k,l)∈S) f(k,l)·ω(i,j,k,l) / Σ_((k,l)∈S) ω(i,j,k,l)
wherein (i,j) is the coordinate of the center point and f(i,j) is the pixel value at the center point; point (k,l) is any point in the neighborhood S of the center point (i,j); σ_d and σ_r are smoothing parameters; f(k,l) is the pixel value at point (k,l); d(i,j,k,l) is the spatial-distance kernel from point (k,l) to point (i,j); r(i,j,k,l) is the gray-level difference (range) kernel matrix; ω(i,j,k,l) is the weight matrix of the bilateral filter; and g(i,j) is the pixel value output for each pixel after bilateral filtering;
the high-frequency component IHF contains background information and noise; HOG features of the image are extracted and a high-frequency image dictionary is obtained through sparse coding and dictionary learning; the HOG feature descriptor uses the distribution of gradient orientations as the feature; gradient magnitudes are large around regions of abrupt intensity change, indicating that object edges contain more object shape information than flat regions;
taking blocks from the IHF, learning a dictionary to obtain a dictionary D, and dividing the dictionary into two types, namely a rain dictionary and a geometric dictionary, by using a KMeans nearest neighbor algorithm;
classifying the rain dictionary again according to the prior information of rain, and removing the misclassification part to obtain a final rain dictionary DR 1; restoring a rain component IHFR of the high-frequency image according to a rain dictionary DR1, obtaining IHFG by utilizing IHF-IHFR, performing bilateral filtering on the IHFG again, performing denoising processing by using a BM3D algorithm to obtain a new geometric component IHFG, and adding the IHFG and the ILF to obtain a final rain removal result;
step 4.2: the channel B is a demisting channel, and the channel integrates a demisting model and outputs a demisting image;
the defogging model is based on an improved DCP algorithm; in any local region of a three-channel input image, the pixel values of at least one channel approach zero, giving the dark channel expression:
I_dark(x) = min_(y∈Ω(x)) [ min_(C∈{R,G,B}) I^C(y) ]
Ω(x) is the local region block centered at the pixel position x, I^C denotes one color channel of the input image I, and y denotes a pixel value within the region Ω(x); DCP is then expressed as:
I_dark(x) → 0
in severe weather conditions, the captured image may fog up, which may be represented as:
I(x) = J(x)·t(x) + A·(1 − t(x))
x is the pixel location, I(x) is the color intensity of the captured image, J(x) is the original scene brightness to be restored, A is the atmospheric light, and t(x) is the transmittance; the problem of recovering the fog-free image J(x) can be converted into predicting the unknown parameters, the atmospheric light intensity A and the transmittance t(x);
if the estimated value of the atmospheric light intensity A is low, the result is fuzzy, and if the estimated value is too high, excessive noise is generated;
A = a·A_brightest + (1 − a)·A_0.1%
wherein A_brightest denotes the value of the original image at the brightest pixel of the dark channel, A_0.1% is the average of the brightest 0.1% of pixels in the dark channel, and a is the weight, set to 0.6 by default; this improves the robustness of the estimate of the atmospheric light intensity A;
t_0 is defined as the lower bound of the transmittance, set to 0.4 by default, to prevent color distortion in the sky region of the reconstructed image;
under normal conditions the transmittance of the target region is not lower than 0.6 and the transmittance of the sky region is close to 0, so the defogging of the target region is not affected by the lower bound; because DCP defogging is applied to the whole image, the incorrect estimation of the sky-region pixel values caused by the change in transmittance gives the sky region a higher color intensity; when the haze is heavy the transmittance may fall below 0.4, leaving residual haze in the corresponding region;
before the transmittance lower bound is applied, a group of morphological operations is introduced to reconstruct the transmission map: an opening operation φ is performed first, followed by a closing operation γ; the closing operation fills holes in the image and strengthens object structure, while the opening operation removes small objects from the image while preserving larger ones; ε_B denotes the dilation (spreading) operator and δ_B the erosion (contraction) operator;
t_C = φ(t) = ε_B[δ_B(t)]
and
t_CO = γ(t_C) = δ_B[ε_B(t_C)]
finally the sky and the objects are separated, and the transmittance is re-estimated by the following method, t_reconstructed being the transmittance ultimately used to restore the haze-free image:
t_reconstructed(x) = max(t_CO(x), t_0)
step 4.3: the channel C is a null channel and outputs an original image;
step 4.4: channel A, channel B and channel C all integrate a dim-light enhancement algorithm; the hyperbolic tangent curve (HTC) is a monotonically increasing function with range (−1, +1), and once the image intensity values are normalized to [0, 1] the hyperbolic tangent curve is used for brightness mapping; for an RGB image, let the pixel value be I(x, y) = {R(x, y), G(x, y), B(x, y)}, where the red, green and blue components R, G and B are normalized to [0, 1] and (x, y) is the pixel coordinate, resulting in the hyperbolic tangent mapping:
I_t(x, y) = tanh(k·I(x, y))
wherein
tanh(u) = (e^u − e^(−u)) / (e^u + e^(−u))
and k is a scale factor whose value is determined according to the brightness of the image;
to suppress the enhancement of the highlight region, the luminance of the output image is weighted as:
I_w(x, y) = w·I(x, y) + (1 − w)·tanh(k·I(x, y))
where w is a weighting coefficient, taken as the average of the RGB channel values so as to maintain the image intensity distribution;
the luminance enhancement should cover all pixel grayscales from 0 to 1 to ensure that the image is globally enhanced, for which the luminance weighting of the output image is subjected to a stretching process:
I_s(x, y) = (I_w(x, y) − I_w(x, y)_min) / (I_w(x, y)_max − I_w(x, y)_min)
I_w(x, y)_max and I_w(x, y)_min are the maximum and minimum of the image intensity values, respectively; I_s(x, y) denotes the pixel value of each point after the image is globally enhanced;
The larger the image entropy is, the more information carried by the image is, and the clearer the image is; the image entropy is defined as:
E = − Σ_i p(i) · log2 p(i)
p(i) is the proportion of pixels at the i-th intensity level; the above formula is used to compute the entropy of I_s(x, y) for different values of k, and the k giving the maximum entropy is the optimal scale factor; when the brightness enhancement of the image is finished, the image is denoised with a basic bilateral filtering algorithm.
5. The virtual reality security monitoring platform according to claim 4, wherein the specific process of the step 5 is as follows:
the frame-by-frame image enhancement output of step 4 is taken as the input of the YOLOv5 detection network, a Darknet-53 backbone trained for classification is used as the feature extractor, features at three scales are obtained from the input image, and multi-scale target detection is then performed; the network outputs feature maps at 3 scales, each grid cell has 3 anchors and each anchor has 5 + 80 parameters, so 3 × 85 = 255, where 5 corresponds to the box coordinates and 80 to the class probabilities; the 13 × 13 scale corresponds to a receptive field of 32 × 32 in the original image, that is, 1 grid cell covers a 32 × 32 region of the original image, and the receptive fields of the different scales are responsible for predicting objects of different sizes; the bounding boxes are then screened with non-maximum suppression, low-confidence and duplicate boxes are filtered out, and the final target detection result is obtained.
6. The virtual reality security monitoring platform according to claim 4, wherein the specific process of the step 6 is as follows:
step 6.1: virtual terminal development is carried out based on the UE4 engine, and a Pawn-class object is what the user actually controls when roaming in the scene; a camera component and a spring-arm component need to be added to the Pawn object so as to display and zoom the scene; physical input devices are bound to realize control of the Pawn object and complete the roaming function of the character in the scene;
step 6.2: declaring the camera icon as an Actor class, binding the Pawn-class object to the physical input device, and executing the corresponding logic;
step 6.3: compiling an http data request code in an Actor class blueprint, calling a Call URL communication node provided by a Va Rest plug-in to acquire a server address and acquiring detection data information by a Get method; judging the acquired detection data information, carrying out inter-blueprint communication if the detection data is acquired, calling a monitoring site icon blueprint in a communication blueprint, and highlighting the monitoring site icon by using a set material node.
CN202210240851.8A 2022-03-10 2022-03-10 Virtual reality security protection monitoring platform based on degree of depth learning target detection Pending CN114627269A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210240851.8A CN114627269A (en) 2022-03-10 2022-03-10 Virtual reality security protection monitoring platform based on degree of depth learning target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210240851.8A CN114627269A (en) 2022-03-10 2022-03-10 Virtual reality security protection monitoring platform based on degree of depth learning target detection

Publications (1)

Publication Number Publication Date
CN114627269A true CN114627269A (en) 2022-06-14

Family

ID=81901548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210240851.8A Pending CN114627269A (en) 2022-03-10 2022-03-10 Virtual reality security protection monitoring platform based on degree of depth learning target detection

Country Status (1)

Country Link
CN (1) CN114627269A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272412A (en) * 2022-08-02 2022-11-01 电子科技大学重庆微电子产业技术研究院 Low, small and slow target detection method and tracking system based on edge calculation
CN115272412B (en) * 2022-08-02 2023-09-26 电子科技大学重庆微电子产业技术研究院 Edge calculation-based low-small slow target detection method and tracking system
CN116311023A (en) * 2022-12-27 2023-06-23 广东长盈科技股份有限公司 Equipment inspection method and system based on 5G communication and virtual reality
CN117237366A (en) * 2023-11-16 2023-12-15 福建凯达集团有限公司 Method for detecting anti-fog performance of film
CN117237366B (en) * 2023-11-16 2024-02-06 福建凯达集团有限公司 Method for detecting anti-fog performance of film

Similar Documents

Publication Publication Date Title
Ren et al. Low-light image enhancement via a deep hybrid network
CN109829443B (en) Video behavior identification method based on image enhancement and 3D convolution neural network
CN110956094B (en) RGB-D multi-mode fusion personnel detection method based on asymmetric double-flow network
Fan et al. Integrating semantic segmentation and retinex model for low-light image enhancement
Chen et al. Haze removal using radial basis function networks for visibility restoration applications
Yang et al. Single image haze removal via region detection network
Pang et al. Visual haze removal by a unified generative adversarial network
CN111915530B (en) End-to-end-based haze concentration self-adaptive neural network image defogging method
Kuanar et al. Night time haze and glow removal using deep dilated convolutional network
CN114627269A (en) Virtual reality security protection monitoring platform based on degree of depth learning target detection
CN111292264A (en) Image high dynamic range reconstruction method based on deep learning
CN109993804A (en) A kind of road scene defogging method generating confrontation network based on condition
CN109472193A (en) Method for detecting human face and device
CN111915525A (en) Low-illumination image enhancement method based on improved depth separable generation countermeasure network
Peng et al. LVE-S2D: Low-light video enhancement from static to dynamic
CN110807384A (en) Small target detection method and system under low visibility
CN111079864A (en) Short video classification method and system based on optimized video key frame extraction
CN111582074A (en) Monitoring video leaf occlusion detection method based on scene depth information perception
Sultana et al. Dynamic background subtraction using least square adversarial learning
CN115019340A (en) Night pedestrian detection algorithm based on deep learning
CN114387195A (en) Infrared image and visible light image fusion method based on non-global pre-enhancement
Soumya et al. Self-organized night video enhancement for surveillance systems
CN116342877A (en) Semantic segmentation method based on improved ASPP and fusion module in complex scene
Liang et al. Multi-scale and multi-patch transformer for sandstorm image enhancement
Zhang et al. A compensation textures dehazing method for water alike area

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination