CN110689021A - Real-time target detection method in low-visibility environment based on deep learning


Info

Publication number
CN110689021A
Authority
CN
China
Prior art keywords
network
target
model
low
target detection
Prior art date
Legal status
Pending
Application number
CN201910985552.5A
Other languages
Chinese (zh)
Inventor
李成严 (Li Chengyan)
马金涛 (Ma Jintao)
赵帅 (Zhao Shuai)
Current Assignee
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Harbin University of Science and Technology
Priority to CN201910985552.5A
Publication of CN110689021A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G06V10/34 Smoothing or thinning of the pattern; Morphological operations; Skeletonisation


Abstract

The invention provides a real-time target detection method for low-visibility environments based on deep learning, which uses guided filtering to solve the problem of target detection in environments heavily affected by smoke dust, water mist, and light and shadow. Frame pictures are processed with an SSD target detection model to find the target region coordinates, making full use of the accuracy advantage of the SSD model. Guided filtering is introduced and integrated with the SSD target detection model to counter the influencing factors of low-visibility environments: in scenes degraded by environmental factors, guided filtering performs image enhancement, defogging, and similar operations, so the processed images are clearer and of higher resolution. After the images are processed, target position coordinates are generated and the processed images are passed to the lower-layer GoogLeNet network for accuracy verification; exploiting the efficiency of the GoogLeNet network improves detection precision without reducing speed. The method can accurately identify targets in a low-visibility environment and has good reliability and high recognition precision.

Description

Real-time target detection method in low-visibility environment based on deep learning
Technical Field
The invention relates to the field of image processing and target detection, in particular to a real-time target detection method in a low-visibility environment based on deep learning. A low-visibility environment is defined as an environment heavily influenced by external factors such as smoke, water mist, and insufficient light.
Background
Image processing refers to techniques that analyze an image with a computer to achieve a desired result, and generally means digital image processing. A digital image is a large two-dimensional array captured by industrial cameras, video cameras, scanners, and similar devices; its elements are called pixels, and their values are called gray-scale values. Image processing technology generally comprises three parts: image compression; enhancement and restoration; and matching, description, and recognition.
The image processing method used by the invention is guided filtering. A guided filter requires a guidance image, which may be a separate image or the input image itself; when the guidance image is the input image, guided filtering becomes an edge-preserving filtering operation that can be used for image reconstruction. It is an adaptive-weight filter that supports image smoothing, enhancement, matting, feathering, defogging, joint upsampling, and similar operations. An intuitive approach is to apply guided filtering directly to the three color channels (RGB). Guided filtering assumes a local linear relationship within the window centered at pixel k, which can be verified by differentiating that relation; the two parameters a and b are obtained as mean values over the windows around pixel k, which smooths the image while maintaining boundaries.
Guided filtering keeps linear complexity because each pixel is covered by multiple windows: the output value at a point is obtained simply by averaging all the linear function values covering that point. Since guided filtering requires only a linear amount of computation, processing efficiency is significantly improved. Using these two methods makes target detection more effective: the target can be detected accurately even in a low-visibility environment (e.g., affected by smoke, water mist, and illumination), the detection accuracy is improved, and the false alarm rate and missed detection rate are reduced.
Target detection, also called target extraction, is image segmentation based on target geometry and statistical features; it combines target segmentation and recognition, and its accuracy and real-time performance are key capabilities of the whole system. Especially in complex scenes where multiple targets must be processed in real time, automatic target extraction and recognition are particularly important.
With the development of computer technology and the wide application of computer vision principles, real-time target tracking research using computer image processing technology has become increasingly popular. Dynamic real-time tracking and positioning of targets has wide application value in intelligent traffic systems, intelligent monitoring systems, military target detection, and surgical instrument positioning in medically navigated operations.
The invention relates to the field of object detection, and provides a method for detecting an object in a low-visibility environment by using an object detection technology.
Disclosure of Invention
To solve the problem of target detection under low visibility, the present invention uses a target detection method that can smooth, enhance, and defog an image.
Therefore, the invention provides the following technical scheme:
a real-time target detection method based on deep learning in a low-visibility environment is characterized in that an improved VGG16 network, a GoogleLeNet target detection algorithm and guiding filtering are combined, so that target detection can be accurately identified in environments with insufficient smoke, water mist and light. The specific process comprises the following steps:
step 1: real-time video stream acquisition
Step 2: generating a data set in target detection;
and step 3: setting parameters of the improved VGG16 network;
and 4, step 4: introducing an improved VGG16 network to process and classify target data;
and 5: testing the performance of the trained network model to find a current performance optimal model;
step 6: introducing guiding filtering by combining the characteristics of the target in a low-visibility environment, so that the model can accurately find the target and determine the target coordinate;
and 7: introducing a GoogLeNet network model, performing multi-scale training, extracting deep-level features of data, and then detecting a target in a coordinate region;
and 8: and constructing an integral target detection framework, namely a VGFG (VGG16 Guided Filter GoogleLeNet) model, and testing by using a model obtained by combining an improved VGG16 network, a GoogleLeNet target detection algorithm and guide filtering.
Further, to obtain the real-time video stream, the computer IP (the IP of the added network card) is modified according to the IP of the hard disk video recorder, so that the computer is on the same network segment as the hard disk video recorder.
Further, a VOC-format data set is produced: the picture data in the data set is labeled with the labelImg tool to generate XML files, a data set path document is generated, and path files for the training, test, and validation sets are produced.
Further, prior boxes of appropriate scale, the network learning rate, and the number of training layers are selected. The scale of the prior boxes obeys a linear increase rule: as the feature map size decreases, the prior box scale increases linearly. The calculation formula is

$$S_k = S_{min} + \frac{S_{max} - S_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$

where m is the number of feature maps, $S_k$ is the ratio of the prior box size to the picture size, and $S_{min}$ and $S_{max}$ are the minimum and maximum values of this ratio, respectively.
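As a minimal Python sketch of this linear scale rule (the defaults $S_{min}=0.2$ and $S_{max}=0.9$ are assumptions taken from the original SSD paper, not values stated in this patent):

```python
def prior_box_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> list:
    """Linearly increasing prior-box scales S_k for feature maps k = 1..m."""
    if m == 1:
        return [s_min]
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

# For the 6 detection feature maps of a standard SSD300:
print(prior_box_scales(6))  # approx. [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
```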
The choice of learning rate is a hyperparameter problem requiring repeated testing over the basic range 0.1, 0.01, 0.001, 0.0001, increasing by orders of magnitude until an optimum is found; it can also be adjusted through the gradient of the loss function, which is

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where N is the number of positive samples, and $x_{ij}^{p} \in \{0, 1\}$ indicates whether the i-th prior box is matched to the j-th ground truth of category p. c is the predicted class confidence, l is the predicted location of the bounding box corresponding to the prior box, and g is the location parameter of the ground truth.
Further, for training the improved VGG16 network, the convolutional layer weights are pre-trained by downloading the file darknet53.conv.74 together with the target detection data set generated in step 1.
Further, training is repeated and parameters are adjusted according to step 2; the improved VGG16 network outputs one model every 10 iterations, training runs 240000 iterations in total, and test results show that the model obtained at iteration 180000 performs best.
Further, targets in low-visibility environments are strongly affected by smoke, water mist, and light and shadow, making them difficult to identify. For these characteristics, guided filtering is introduced: it is adaptive-weight filtering that preserves boundaries while smoothing the image and supports smoothing, enhancement, matting, feathering, defogging, joint upsampling, and similar operations. In guided filtering, the output at a pixel is:

$$q_i = a_k I_i + b_k, \quad \forall i \in \omega_k$$

where q is the output image, I is the guidance image, and a and b are the coefficients of the linear function, constant when the window center is located at k. The assumption of the method is that q and I have a local linear relationship in the window centered on pixel k. Taking the derivative of this equation ($\nabla q = a \nabla I$) shows that an edge appears in the output only where there is an edge in the guidance image. To solve for the coefficients a and b, let p be the image before filtering, with q required to differ from p as little as possible; by the degradation model of unconstrained image restoration,

$$q_i = p_i - n_i$$

where n is noise and p is the degraded image, i.e. q contaminated by noise n. This is converted into an optimization problem: restricting i to a window $\omega_k$ and penalizing large values of a, the cost function is

$$E(a_k, b_k) = \sum_{i \in \omega_k} \left( (a_k I_i + b_k - p_i)^2 + \epsilon a_k^2 \right)$$

Solving by least squares gives

$$a_k = \frac{\frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i - \mu_k \bar{p}_k}{\sigma_k^2 + \epsilon}, \qquad b_k = \bar{p}_k - a_k \mu_k$$

where $\mu_k$ and $\sigma_k^2$ are the mean and variance of I in the local window $\omega_k$, and $|\omega|$ is the number of pixels in the window. The window operation is then applied over the whole image, and finally the average is taken, so the output at a pixel is:

$$q_i = \frac{1}{|\omega|} \sum_{k: i \in \omega_k} (a_k I_i + b_k) = \bar{a}_i I_i + \bar{b}_i$$

where

$$\bar{a}_i = \frac{1}{|\omega|} \sum_{k \in \omega_i} a_k, \qquad \bar{b}_i = \frac{1}{|\omega|} \sum_{k \in \omega_i} b_k$$
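To make the procedure concrete, here is a minimal NumPy/OpenCV sketch of a gray-scale guided filter following the equations above; the window radius r and regularization eps are free parameters, and the box-filter implementation is an assumption for illustration:

```python
import cv2
import numpy as np

def guided_filter(I: np.ndarray, p: np.ndarray, r: int = 8, eps: float = 1e-2) -> np.ndarray:
    """Gray-scale guided filter: q_i = mean(a)_i * I_i + mean(b)_i.

    I: guidance image, p: input image, both float32/float64 in [0, 1].
    r: box-window radius; eps: regularization preventing large a_k.
    """
    ksize = (2 * r + 1, 2 * r + 1)
    mean = lambda x: cv2.boxFilter(x, -1, ksize)   # normalized box filter = window mean

    mu_I, mu_p = mean(I), mean(p)
    var_I = mean(I * I) - mu_I * mu_I              # sigma_k^2
    cov_Ip = mean(I * p) - mu_I * mu_p             # (1/|w|) sum I_i p_i - mu_k p_bar_k

    a = cov_Ip / (var_I + eps)                     # a_k
    b = mu_p - a * mu_I                            # b_k = p_bar_k - a_k mu_k
    return mean(a) * I + mean(b)                   # a_bar_i * I_i + b_bar_i

# Self-guided use (I = p) gives edge-preserving smoothing of a frame:
# frame = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
# out = guided_filter(frame, frame, r=8, eps=1e-2)
```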
Furthermore, the GoogLeNet network model is a 22-layer neural network that finds an optimal local network structure within the existing deep network structure. To avoid the gradient-vanishing problem caused by increased depth, two auxiliary loss branches are added at different depths of the network to counteract vanishing gradients during back-propagation. To avoid the overfitting caused by the extra parameters that come with increased width, and the application difficulties caused by excessive computational complexity, GoogLeNet adopts the Inception structure, a Network-in-Network design in which each original node is itself a network. The Network-in-Network model replaces the traditional convolution with a fully connected multilayer perceptron to obtain a more comprehensive representation of features; since the feature representation has already been improved at this stage, the last fully connected layer of a traditional CNN is replaced by a global average pooling layer, at which point the feature maps are sufficiently reliable for classification and the loss can be computed directly through softmax.
Further, the overall target detection model is built: the improved VGG16 network serves as the first-layer network of the model and outputs the coordinates of the target region; the target region image is processed by guided filtering; finally, target recognition is performed by the GoogLeNet network model, and the overall framework is packaged to form a new target detection model.
Compared with the prior art, the technical scheme of the invention has the following effects:
When solving target detection in low-visibility environments, guided filtering makes the solution of the problem closer to reality. Even when target detection is affected by low-visibility factors such as dust, smoke, water mist, and illumination, the target can be detected accurately, the detection accuracy of the network model is improved, and both the false alarm rate and the missed detection rate of the network model are reduced. The target coordinates are found with the improved VGG16 network, making full use of its recognition accuracy; guided filtering is applied to the three color channels (RGB) to enhance the images and address low-visibility target detection; finally, the GoogLeNet network model is introduced to detect, recognize, and classify the target region found by the improved VGG16 network while preserving detection accuracy, reducing the false alarms and missed detections of the VGG16 network. Compared with other inventions, this one achieves higher detection precision, faster detection, and more accurate recognition results.
Drawings
FIG. 1 is a flow chart of the present invention
FIG. 2 is a diagram of an improved VGG16 network architecture
FIG. 3 is a diagram of the Inception structure adopted by the GoogLeNet network model
FIG. 4 is the dimension-reduced and improved Inception structure
FIG. 5 is a graph showing the training effect of the VGFG model
FIG. 6 is a comparison graph of the effect of each target detection model
Detailed Description
The technical scheme of the invention is explained in further detail below with reference to FIGS. 1-6:
FIG. 1 shows the flow chart of the present invention; each step is explained in detail on the basis of the flow chart.
Step 1, acquiring a real-time video stream;
in step 1, acquiring the real-time video stream requires adding one more network card to the host for accessing the video recorder, downloading Haokangwei video network video monitoring 4200 (selected according to the models of the camera and the silver disc recorder), modifying the computer IP (modifying the IP added with the network card) according to the IP of the hard disk video recorder to ensure that the network segment of the computer is consistent with the hard disk video recorder, accessing the IP of the hard disk video recorder by using a browser, inputting a user name and a password on a pop-up interface for login, and if the video stream is displayed, proving that the acquisition is successful. The hard disk recorder can be connected with four cameras, so that the lower channel numbers of 33,34,35 and 36 respectively need to be modified.
Step 2: generating a data set in deep learning;
In step 2, the deep learning data set is generated in VOC format: labelImg software is opened for image annotation and generates XML files; the VOC-format XML files are converted into txt files; and a folder VOC is created, consisting of 4 sub-folders: Annotations (storing all XML files), ImageSets (containing two sub-folders, Main and Layout; in Main, test.txt is the test set, train.txt the training set, val.txt the validation set, and trainval.txt the combined training-plus-validation set), JPEGImages (storing all image files), and labels (storing the txt files). With the VOC data set produced and each folder storing its files, training the network model is standardized and convenient.
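A small sketch of the split-file generation step (the 8:1:1 ratio and paths are assumptions for illustration, not values from the patent):

```python
import random
from pathlib import Path

xml_dir = Path("VOC/Annotations")
ids = [f.stem for f in xml_dir.glob("*.xml")]   # one id per annotated image
random.shuffle(ids)

n = len(ids)
splits = {
    "train": ids[: int(0.8 * n)],
    "val":   ids[int(0.8 * n): int(0.9 * n)],
    "test":  ids[int(0.9 * n):],
}
splits["trainval"] = splits["train"] + splits["val"]

out = Path("VOC/ImageSets/Main")
out.mkdir(parents=True, exist_ok=True)
for name, subset in splits.items():
    (out / f"{name}.txt").write_text("\n".join(subset))
```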
Step 3: setting parameters of the improved VGG16 network;
table 1 improved VGG16 network parameter settings
Step 4: processing and classifying the target data using the improved VGG16 network;
the VGG16 network is specifically improved by adding new convolutional layers in the VGG16 network to obtain more feature maps for detection, the VGG16 network feature map has high resolution, the information in the original image is more complete, and the receptive field is smaller, when the VGG16 network is improved, the full connection layers fc6 and fc7 of the VGG16 are respectively converted into 3 × 3 convolutional layers Conv6 and 1 × 1 convolutional layers Conv7, and simultaneously the pooling layer pool5 is changed from 2 × 2, which is the original stride 2, to 3 × 3, which is the stride 1, in order to match the change, an AtrousAlgorithm is adopted, and the convolutional layers Conv6 adopt extended convolution or perforated convolution (translation Conv), which exponentially expands the visual field of convolution under the condition of not increasing parameters and model complexity. The dropout layer and fc8 layers are then removed and a series of convolution layers are added and fine-tuned on the test data set.
The data set is processed with the improved VGG16 network, which is trained so that it can identify the target and output the coordinate information of the target region for the next step. The improved VGG16 network structure is shown in FIG. 2; its core is to use convolution kernels on feature maps to predict the class scores and offsets of a series of default bounding boxes. To improve detection accuracy, predictions are made on feature maps of different scales, and results with different aspect ratios are obtained as well. The improved VGG16 network is trained end to end and maintains detection accuracy even at low image resolution. It extracts feature maps of different scales for detection: large-scale feature maps (earlier in the network) can detect small objects, while small-scale feature maps (later in the network) detect large objects. Prior boxes (default boxes) of different scales and aspect ratios are adopted, each unit having prior boxes of different scales or aspect ratios, and the predicted bounding boxes are based on these prior boxes, which reduces training difficulty to a certain extent.
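A sketch of this fc-to-convolution conversion in PyTorch; the layer sizes follow the standard SSD adaptation of VGG16 and should be read as an illustration under that assumption, not as the patent's exact configuration:

```python
import torch
import torch.nn as nn

# pool5: stride-2 2x2 -> stride-1 3x3, so the spatial size is preserved
pool5 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

# fc6 -> Conv6: 3x3 dilated (atrous) convolution; dilation=6 widens the
# receptive field without adding parameters relative to a plain 3x3 conv
conv6 = nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6)

# fc7 -> Conv7: 1x1 convolution
conv7 = nn.Conv2d(1024, 1024, kernel_size=1)

x = torch.randn(1, 512, 19, 19)   # conv5_3 output for a 300x300 input
y = torch.relu(conv7(torch.relu(conv6(pool5(x)))))
print(y.shape)                    # torch.Size([1, 1024, 19, 19])
```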
Step 5: testing the performance of the trained network models to find the current best-performing model;
the trained VGG16 network is tested until an optimal model is found, the training time of the improved VGG16 network is 8 hours, 240000 iterations are performed totally, 1 model is output every 10 iterations, models with the iteration times of 5000, 6000, 10000, 120000, 18000 and 240000 are taken for testing, the test result shows that the model with the iteration time of 180000 has the best effect, the best learning rate is 0.01, the best training time is 8 pieces/time and is more than 8 pieces/time, the video memory explosion can be caused, the training is stopped, and the precision is reduced when the number of training is less than 8 pieces/time.
Step 6: introducing guided filtering based on the characteristics of the target in a low-visibility environment, so that the model can accurately find the target and determine the target coordinates;
In step 6, real-time target detection in low-visibility environments is characterized by heavy influence from smoke, water mist, and light and shadow, which ordinary target detection models cannot handle. Accordingly, guided filtering is introduced to process the image: it is placed after the improved VGG16 network, and the target region detected by that network is smoothed, enhanced, feathered, and defogged while boundaries are preserved. Guided filtering requires a guidance image during filtering, which can be a separate image or the input image itself; using the input image saves time and makes the filtering operation edge-preserving. The complexity of guided filtering is independent of the window size: to obtain the output value at a point, only the linear function values covering that point need to be averaged, so efficiency improves markedly when processing large pictures. Guided filtering also avoids the gradient-reversal artifacts that appear in bilateral filtering.
In a general linear translation-variant filtering process, the output at a pixel is:

$$q_i = \sum_j W_{ij}(I)\, p_j$$

where $W_{ij}$ is the filter kernel weight. In bilateral filtering, the weight function is expressed as:

$$W_{ij} = \frac{1}{K_i} \exp\!\left(-\frac{\|x_i - x_j\|^2}{\sigma_s^2}\right) \exp\!\left(-\frac{\|I_i - I_j\|^2}{\sigma_r^2}\right)$$

In guided filtering, the output at a pixel is:

$$q_i = a_k I_i + b_k, \quad \forall i \in \omega_k$$

There is a linear relationship between the guidance image I and the output q, so the information provided by the guidance image mainly indicates where the edges are: if the guidance map indicates an edge, the final result tries to preserve that edge information. The precondition for guided filtering is therefore that I and q satisfy a linear relationship.
Guided filtering is introduced to process the target region, and the processed target region coordinates are passed to the next-level network for target recognition.
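Connecting this to the earlier guided-filter sketch, a hypothetical helper for processing one detected region (the box format and the guided_filter function are assumptions carried over from the sketch above):

```python
import numpy as np

def enhance_region(frame_gray: np.ndarray, box) -> np.ndarray:
    """Apply self-guided filtering to one detected region before verification."""
    x, y, w, h = box                                     # coordinates from the VGG16 stage
    roi = frame_gray[y:y + h, x:x + w].astype(np.float32) / 255.0
    smoothed = guided_filter(roi, roi, r=8, eps=1e-2)    # edge-preserving cleanup
    return (smoothed * 255.0).clip(0, 255).astype(np.uint8)
```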
Step 7: introducing a GoogLeNet network model, performing multi-scale training to extract deep-level features of the data, and then detecting the target in the coordinate region;
GoogLeNet observes that the most straightforward way to improve a deep neural network is to increase its size, including width and depth: depth is the number of layers in the network, and width is the number of neurons used in each layer. This straightforward solution, however, has two significant drawbacks: the increase in network size increases the number of parameters, which makes the network more susceptible to overfitting, and the demand for computational resources grows with it.
Therefore, full connectivity is changed to sparse connectivity to solve both problems. When the probability distribution of a data set is represented by a large, sparse deep neural network, the network topology can be optimized by analyzing, layer by layer, the activations of the previous layer that are highly correlated with the output and the statistics of clustered neurons. But this approach has many limitations, so the Hebbian principle is applied, which makes the above idea practically feasible under a few restrictions.
Generally, full connections better exploit parallel computation, while sparse connections break symmetry to improve learning. Traditionally, convolution exploits sparsity in the spatial domain, yet the connections between convolutions and patches in the early layers of the network are dense; sparsity is therefore applied at the filter level rather than over individual neurons. However, on non-uniform sparse data structures, numerical computation is inefficient, the cost of lookups and cache misses is high, and the demands on computing infrastructure are substantial, so clustering the sparse matrix into relatively dense subspaces tends to yield a practical computational optimization of sparse matrices. The Inception structure was therefore proposed (see FIG. 3).
The main idea of the Inception structure is to approximate and cover the optimal local sparse structure of a convolutional vision network with a series of readily available dense substructures. The network topology is formed by analyzing the correlation statistics of the previous layer and aggregating them into highly correlated unit groups; these clusters (unit groups) form the units of the next layer and connect to the previous units. Correlated units close to the input image are concentrated in local regions, so many clusters end up concentrated in a single region and can be covered by a 1x1 convolutional layer in the next layer; clusters spread over larger spatial extents can be covered by convolutions over larger patches, and the number of patches over larger regions decreases.
To avoid the patch-alignment problem, the filter sizes in the Inception structure are limited to 1x1, 3x3, and 5x5. Because Inception modules are stacked, their output correlation statistics differ: as higher layers extract more abstract features, their spatial concentration decreases, so the proportion of 3x3 and 5x5 convolutions is increased in higher-layer Inception modules to capture features over larger areas.
In the above Inception structure, the computational overhead of 5x5 filters becomes very large as the number of filters increases; in addition, after the pooling operation, merging the pooling layer output with the convolutional layer outputs increases the number of output values, which may cover the optimized sparse structure but is very inefficient and causes a computational explosion. This leads to the dimension-reduced Inception structure (see FIG. 4).
The dimension-reduced Inception structure rests on embeddings: a low-dimensional embedding can contain a large amount of information about an image patch, but an embedding expresses information in a dense, compressed form, whereas the representation should remain sparse in most places, with signals compressed only where large amounts of aggregation occur. Therefore, 1x1 convolutions are applied before the 3x3 and 5x5 convolution operations for dimensionality reduction; the 1x1 convolutions not only reduce dimensionality but also introduce ReLU nonlinear activation. It has been found more advantageous to use the Inception structure only in the upper layers of the overall network.
The advantage of the dimension-reduced Inception structure is that the number of units at each stage, i.e. the width and depth of the network, can be increased without an uncontrolled explosion of computational complexity; meanwhile, the structure resembles multi-scale processing of images, with the processing results gathered together so that the next stage can extract features at different scales simultaneously.
Due to the heavy computational load of the sparse structure, 1x1 convolutions are adopted to reduce the parameter computation. The 1x1 convolution is interpreted as follows:
Before the 3x3 and 5x5 layers, a 1x1 convolution operation is added to each path. The 1x1 convolution (or network-in-network layer) provides a method of dimensionality reduction. Suppose an input layer has a volume of 100x100x60 (the input of some layer in the network): adding 20 1x1 convolution filters reduces the volume to 100x100x20, so the 3x3 and 5x5 layers no longer need to process as large a volume as the input layer. This can be regarded as "pooling of features", because the depth of the volume is being reduced, analogous to reducing width and height with the commonly used max pooling layers. These 1x1 convolutional layers are followed by ReLU units. The dimension-reduced Inception module thus consists of a network-in-network (1x1) layer, a medium-size filter convolution (3x3), a large-size filter convolution (5x5), and a pooling operation. The network-in-network convolutional layer extracts information from every detail of the input volume, the 5x5 filter covers most of its receiving layer's input and extracts information at that scale, and the pooling operation reduces spatial size and combats overfitting. Each convolutional layer is followed by a ReLU, which improves the nonlinearity of the network.
Table 2 GoogLeNet network structure details
Table 2 gives the network structure details of GoogLeNet, where "#3x3 reduce" and "#5x5 reduce" denote the number of 1x1 convolutions applied before the 3x3 and 5x5 convolution operations. The input image is 224x224x3; all dimension-reduction layers undergo a zero-mean preprocessing operation and use the ReLU nonlinear activation function. The GoogLeNet network model uses the dimension-reduced Inception structure, which is equivalent to model fusion, adds back-propagated gradient signals to the network, and provides extra regularization, thereby accelerating training and improving precision. Deep-level features of the data are extracted, and the target is then detected in the coordinate region.
Step 8: constructing the overall target detection model, the VGFG (VGG16 Guided Filter GoogLeNet) model, and testing it with the model obtained by combining the improved VGG16 network, the GoogLeNet target detection algorithm, and guided filtering.
The pseudo code for constructing the overall target detection model is as follows:
TABLE 3 Algorithm one
This step mainly loads the configuration: the Caffe framework, the trained improved VGG16 network and its prototxt file, and the GoogLeNet target detection model with its configuration file are loaded.
TABLE 4 Algorithm two
The video stream is read from the camera frame by frame; each frame picture is compressed, the color image channels are sliced, bounding box coordinates are computed, and the image is forward-propagated through the network; the frame picture is converted into binary data and passed to the target detection model for detection.
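A sketch of this frame-preprocessing step with OpenCV's dnn module; the model file names, input size, and mean values are assumptions typical for VGG-based SSD Caffe models, not files named by the patent:

```python
import cv2

# Assumed file names for the deployed first-stage network
net = cv2.dnn.readNetFromCaffe("vgfg_deploy.prototxt", "vgfg.caffemodel")

def detect(frame):
    # Resize/compress the frame, split and mean-subtract the color channels,
    # and pack it as a binary blob for forward propagation.
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 scalefactor=1.0, size=(300, 300),
                                 mean=(104.0, 117.0, 123.0))
    net.setInput(blob)
    return net.forward()   # detections shaped [1, 1, N, 7] for SSD-style Caffe models
```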
TABLE 5 Algorithm III
This step mainly detects the correctness of the target region processed by guided filtering: the position coordinates are passed into the GoogLeNet network for verification; if the target is detected, the target region is drawn with a wire frame, and if not, detection returns to the next frame.
Thus the overall target detection model VGFG is built; the VGFG target detection model can accurately detect targets under the influence of a low-visibility environment, and all its performance indicators are good.
According to the invention, the frame picture is processed with guided filtering in low-visibility target detection, so that the target can be detected accurately under the influence of a low-visibility environment; the detection accuracy of the network model is improved, and both the false alarm rate and the missed detection rate of the network model are reduced.
Detailed description of the invention
The model training diagram of this embodiment is shown in FIG. 5: the loss of the model converges essentially to zero and its accuracy reaches 92.7%, so the model has high accuracy. The embodiment was tested in a practical application scene, with results shown in Table 6. The test scene is real-time video in a belt corridor of a certain company, heavily affected by smoke dust, dust, and water mist, with low visibility. Cameras at different positions were tested scene by scene for 24 hours, covering 1728000 frame pictures (2 classes: person and background). The data statistics were obtained by counting the frame pictures in the database.
TABLE 6 VGFG model test Effect statistics
Analysis of Table 6 shows that the accuracy of the VGFG model is about 92.7% and the recall rate is 80.6%; the false alarm rate of the model is low, but the missed detection rate is relatively high, and the VGFG model already has a certain target detection capability in low-visibility environments.
In this embodiment, the detection model of the invention is compared for accuracy against other detection models: Fast R-CNN, SSD, YOLOv3, and GoogLeNet. The other four models are pre-trained, and the public data set VOC2012 is used to verify the correctness of the VGFG model and the other four models; the result is shown in FIG. 6, from which it can be seen that the problem solving of the proposed VGFG model is closer to reality and achieves high accuracy.
The foregoing detailed description, presented in conjunction with the drawings, illustrates embodiments of the invention and is provided to facilitate understanding of its methods. Those skilled in the art can modify and adapt the invention within the scope of the embodiments and applications according to its spirit, and the invention should therefore not be construed as limited to them.

Claims (9)

1. A real-time target detection method based on deep learning in a low-visibility environment, characterized in that an improved VGG16 network, a GoogLeNet target detection algorithm, and guided filtering are combined, so that targets can be accurately identified in low-visibility environments affected by smoke, water mist, and insufficient light. The specific process comprises the following steps:
step 1: real-time video stream acquisition
Step 2: generating a data set in target detection;
and step 3: setting parameters of the VGG16 network;
and 4, step 4: introducing a VGG16 network to process and classify target data;
and 5: testing the performance of the trained network model to find a current performance optimal model;
step 6: introducing guiding filtering by combining the characteristics of the target in a low-visibility environment, so that the model can accurately find the target and determine the target coordinate;
and 7: introducing a GoogLeNet network model, performing multi-scale training, extracting deep-level features of data, and then detecting a target in a coordinate region;
and 8: and constructing an integral target detection framework, namely a VGFG (VGG16 Guided Filter GoogleLeNet) model, and testing by using a model obtained by combining an improved VGG16 network, a GoogleLeNet target detection algorithm and guide filtering.
2. The method according to claim 1, characterized in that the hard disk video recorder IP is matched with the host IP, and the channel number in the RTSP protocol is determined.
3. The method for real-time target detection in a low-visibility environment based on deep learning according to claim 1, characterized in that a VOC-format data set is produced, and picture data in the data set is labeled with the labelImg tool to generate XML files.
4. The method for real-time target detection in a low-visibility environment based on deep learning according to claim 1, characterized in that prior boxes of appropriate scale, the network learning rate, and the number of training layers are selected. The scale of the prior boxes obeys a linear increase rule: as the feature map size decreases, the prior box scale increases linearly. The calculation formula is

$$S_k = S_{min} + \frac{S_{max} - S_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$

where m is the number of feature maps, $S_k$ is the ratio of the prior box size to the picture size, and $S_{min}$ and $S_{max}$ are the minimum and maximum values of this ratio, respectively.
The selection of the learning rate is a hyperparameter problem requiring continuous testing over the basic range 0.1, 0.01, 0.001, 0.0001, increasing by orders of magnitude until an optimum is found; adjustment can also be made through the gradient of the loss function, which is

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where N is the number of positive samples, $x_{ij}^{p} \in \{0, 1\}$ indicates whether the i-th prior box is matched to the j-th ground truth of category p, c is the predicted class confidence, l is the predicted location of the bounding box corresponding to the prior box, and g is the location parameter of the ground truth.
5. The method for real-time target detection in a low-visibility environment based on deep learning according to claim 1, characterized in that the convolutional layer weights are pre-trained, and the file darknet53.conv.74 and the deep learning data set generated in step 1 are downloaded for training the improved VGG16 network.
6. The method for real-time target detection in a low-visibility environment based on deep learning according to claim 1, characterized in that training is repeated and parameters are adjusted according to step 2; the improved VGG16 network model outputs one model every 100 iterations, training runs 240000 iterations, and test results show that the model at iteration 180000 is optimal.
7. The method for real-time target detection in a low-visibility environment based on deep learning according to claim 1, characterized in that the target in a low-visibility environment is strongly affected by smoke, water mist, and light and shadow and is difficult to identify; guided filtering is therefore introduced for such targets. Guided filtering is adaptive-weight filtering that preserves boundaries while smoothing the image and supports smoothing, enhancement, matting, feathering, defogging, joint upsampling, and similar operations on the image. In guided filtering, the output at a pixel is:

$$q_i = a_k I_i + b_k, \quad \forall i \in \omega_k$$

where q is the output image, I is the guidance image, and a and b are the coefficients of the linear function, constant when the window center is located at k. The assumption of the method is that q and I have a local linear relationship in the window centered on pixel k; taking the derivative of this equation ($\nabla q = a \nabla I$) shows that an edge appears in the output only where there is an edge in the guidance image. To solve for the coefficients a and b, let p be the image before filtering, with q required to differ from p as little as possible; by the degradation model of unconstrained image restoration,

$$q_i = p_i - n_i$$

where n is noise and p is the degraded image, i.e. q contaminated by noise n. This is converted into an optimization problem: restricting i to a window $\omega_k$ and penalizing large values of a, the cost function is

$$E(a_k, b_k) = \sum_{i \in \omega_k} \left( (a_k I_i + b_k - p_i)^2 + \epsilon a_k^2 \right)$$

Solving by least squares gives

$$a_k = \frac{\frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i - \mu_k \bar{p}_k}{\sigma_k^2 + \epsilon}, \qquad b_k = \bar{p}_k - a_k \mu_k$$

where $\mu_k$ and $\sigma_k^2$ are the mean and variance of I in the local window $\omega_k$, and $|\omega|$ is the number of pixels in the window. The window operation is then applied over the whole image, and finally the average is taken, so the output at a pixel is:

$$q_i = \frac{1}{|\omega|} \sum_{k: i \in \omega_k} (a_k I_i + b_k) = \bar{a}_i I_i + \bar{b}_i$$

where

$$\bar{a}_i = \frac{1}{|\omega|} \sum_{k \in \omega_i} a_k, \qquad \bar{b}_i = \frac{1}{|\omega|} \sum_{k \in \omega_i} b_k$$
8. The method as claimed in claim 1, characterized in that the GoogLeNet network model is a 22-layer neural network that finds an optimal local network structure within the existing deep network structure. To avoid the gradient-vanishing problem caused by increased depth, two auxiliary loss branches are added at different depths of the network to counteract vanishing gradients during back-propagation. To avoid the overfitting caused by the extra parameters that come with increased width, and the application difficulties caused by excessive computational complexity, GoogLeNet adopts the Inception structure, a Network-in-Network design in which each original node is itself a network. The Network-in-Network model replaces the traditional convolution with a fully connected multilayer perceptron to obtain a more comprehensive representation of features; since the feature representation has already been improved at this stage, the last fully connected layer of a traditional CNN is replaced by a global average pooling layer, at which point the feature maps are sufficiently reliable for classification and the loss can be computed directly through softmax.
9. The method for real-time target detection in a low-visibility environment based on deep learning according to claim 1, characterized in that an overall target detection model is built: the improved VGG16 network serves as the first-layer network of the model and outputs the coordinates of the target region; the target region image is processed by guided filtering; finally, target recognition is performed through a GoogLeNet network model, and the overall framework is packaged to form a new target detection model (the VGFG model).
CN201910985552.5A 2019-10-17 2019-10-17 Real-time target detection method in low-visibility environment based on deep learning Pending CN110689021A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910985552.5A CN110689021A (en) 2019-10-17 2019-10-17 Real-time target detection method in low-visibility environment based on deep learning


Publications (1)

Publication Number Publication Date
CN110689021A true CN110689021A (en) 2020-01-14

Family

ID=69112986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910985552.5A Pending CN110689021A (en) 2019-10-17 2019-10-17 Real-time target detection method in low-visibility environment based on deep learning

Country Status (1)

Country Link
CN (1) CN110689021A (en)



Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140282938A1 (en) * 2013-03-15 2014-09-18 Adam Moisa Method and system for integrated cloud storage management
CN107316001A (en) * 2017-05-31 2017-11-03 天津大学 Small and intensive method for traffic sign detection in a kind of automatic Pilot scene
CN109902697A (en) * 2017-12-07 2019-06-18 展讯通信(天津)有限公司 Multi-target detection method, device and mobile terminal
CN108319949A (en) * 2018-01-26 2018-07-24 中国电子科技集团公司第十五研究所 Mostly towards Ship Target Detection and recognition methods in a kind of high-resolution remote sensing image
CN108304808A (en) * 2018-02-06 2018-07-20 广东顺德西安交通大学研究院 A kind of monitor video method for checking object based on space time information Yu depth network
CN109117876A (en) * 2018-07-26 2019-01-01 成都快眼科技有限公司 A kind of dense small target deteection model building method, model and detection method
CN109344821A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Small target detecting method based on Fusion Features and deep learning
CN109685066A (en) * 2018-12-24 2019-04-26 中国矿业大学(北京) A kind of mine object detection and recognition method based on depth convolutional neural networks
CN109740588A (en) * 2018-12-24 2019-05-10 中国科学院大学 The X-ray picture contraband localization method reassigned based on the response of Weakly supervised and depth
CN109753903A (en) * 2019-02-27 2019-05-14 北航(四川)西部国际创新港科技有限公司 A kind of unmanned plane detection method based on deep learning
CN109977812A (en) * 2019-03-12 2019-07-05 南京邮电大学 A kind of Vehicular video object detection method based on deep learning
CN110222787A (en) * 2019-06-14 2019-09-10 合肥工业大学 Multiscale target detection method, device, computer equipment and storage medium
CN110298321A (en) * 2019-07-02 2019-10-01 中国科学院遥感与数字地球研究所 Route denial information extraction based on deep learning image classification
CN110298410A (en) * 2019-07-04 2019-10-01 北京维联众诚科技有限公司 Weak target detection method and device in soft image based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GEUMYOUNG SON et al.: "Video Based Smoke and Flame Detection Using Convolutional Neural Network", IEEE *
CHENG Xianyi et al.: "Research on a multi-scale target detection algorithm in surveillance scenes based on deep learning", Journal of Nanjing Normal University (Engineering and Technology Edition) *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291647A (en) * 2020-01-21 2020-06-16 陕西师范大学 Single-stage action positioning method based on multi-scale convolution kernel and superevent module
CN111340771A (en) * 2020-02-23 2020-06-26 北京工业大学 Fine particle real-time monitoring method integrating visual information richness and wide-depth combined learning
CN111340771B (en) * 2020-02-23 2024-04-09 北京工业大学 Fine particulate matter real-time monitoring method integrating visual information richness and wide-depth joint learning
CN111210474B (en) * 2020-02-26 2023-05-23 上海麦图信息科技有限公司 Method for acquiring real-time ground position of airport plane
CN111210474A (en) * 2020-02-26 2020-05-29 上海麦图信息科技有限公司 Method for acquiring real-time ground position of airplane in airport
CN111723656A (en) * 2020-05-12 2020-09-29 中国电子系统技术有限公司 Smoke detection method and device based on YOLO v3 and self-optimization
CN111723656B (en) * 2020-05-12 2023-08-22 中国电子系统技术有限公司 Smog detection method and device based on YOLO v3 and self-optimization
CN111931857A (en) * 2020-08-14 2020-11-13 桂林电子科技大学 MSCFF-based low-illumination target detection method
CN112016558A (en) * 2020-08-26 2020-12-01 大连信维科技有限公司 Medium visibility identification method based on image quality
CN112016558B (en) * 2020-08-26 2024-05-31 大连信维科技有限公司 Medium visibility recognition method based on image quality
CN112214369A (en) * 2020-10-23 2021-01-12 华中科技大学 Hard disk fault prediction model establishing method based on model fusion and application thereof
CN112862715A (en) * 2021-02-08 2021-05-28 天津大学 Real-time and controllable scale space filtering method
CN113553937A (en) * 2021-07-19 2021-10-26 北京百度网讯科技有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN114445408A (en) * 2022-04-11 2022-05-06 山东仕达思生物产业有限公司 Improved circulation-oriented filtering algorithm-based pathogen detection promoting method, equipment and storage medium
CN114445408B (en) * 2022-04-11 2022-06-24 山东仕达思生物产业有限公司 Improved circulation-oriented filtering algorithm-based pathogen detection promoting method, equipment and storage medium
CN115931359A (en) * 2023-03-03 2023-04-07 西安航天动力研究所 Turbine pump bearing fault diagnosis method and device
CN115931359B (en) * 2023-03-03 2023-07-14 西安航天动力研究所 Turbine pump bearing fault diagnosis method and device
CN116977336A (en) * 2023-09-22 2023-10-31 苏州思谋智能科技有限公司 Camera defect detection method, device, computer equipment and storage medium


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200114)