CN113436217A - Unmanned vehicle environment detection method based on deep learning - Google Patents
Unmanned vehicle environment detection method based on deep learning
- Publication number
- CN113436217A CN113436217A CN202110838473.9A CN202110838473A CN113436217A CN 113436217 A CN113436217 A CN 113436217A CN 202110838473 A CN202110838473 A CN 202110838473A CN 113436217 A CN113436217 A CN 113436217A
- Authority
- CN
- China
- Prior art keywords
- image
- network
- resolution image
- gaussian
- super
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/80—Geometric correction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/181—Segmentation; Edge detection involving edge growing; involving edge linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20172—Image enhancement details
- G06T2207/20192—Edge enhancement; Edge preservation
Abstract
The invention provides a deep-learning-based unmanned vehicle environment detection method comprising the following steps: step (1), obtaining a super-resolution image; step (2), generating a medium-resolution image with a generator, sending the generated medium-resolution image to an edge enhancement network to produce a super-resolution image, and passing the generated super-resolution image to a target detector to perform the task of classifying and localizing targets; step (3), adaptive Gaussian distribution modeling; and step (4), fusing the EEN with an adaptive-uncertainty target localization end-to-end detection model through a dynamic suppression strategy. The method adapts well to small, medium and large targets, and it is verified that when target density in the image varies widely, dynamically raising or lowering the threshold resolves the high overlap rate and high error rate in detection, demonstrating good generalization and scene adaptivity.
Description
Technical Field
The invention relates to the technical field of environment detection, in particular to an unmanned vehicle environment detection method based on deep learning.
Background
Unmanned ground vehicles are also called driverless vehicles (unmanned vehicles for short). Owing to the rapid development of novel sensors and of basic machine-learning research in recent years, civilian unmanned vehicles have become technically feasible. Research institutions and enterprises at home and abroad have invested in the research and development of intelligent or unmanned automobiles, and some claim they will achieve commercial rollout of unmanned vehicles within the next five years.
Disclosure of Invention
The invention aims to provide an unmanned vehicle environment detection method based on deep learning to solve the problems in the background technology.
The invention solves the stated technical problem by the following technical scheme. The deep-learning-based unmanned vehicle environment detection method comprises the following steps:
Step (1): input the low-resolution image into the generator G of a generative adversarial network (GAN), feed the image produced by the generator into an edge enhancement network, and then map the extracted edge features into high-resolution space with the sub-pixel-convolution upsampling operation to obtain a super-resolution image;
Step (2): generate a medium-resolution image with the generator; send the generated medium-resolution image to the edge enhancement network to produce a super-resolution image; then pass the generated super-resolution image to the target detector to perform the task of classifying and localizing targets;
Step (3): perform adaptive Gaussian distribution modeling; through a dynamic suppression strategy, raise the threshold when target density rises and mutual occlusion increases, and lower the threshold when target density is low and objects appear in isolation;
Step (4): fuse the GAN-based EEN with the adaptive-uncertainty target localization model into an end-to-end detection model.
In step (1), the constructed Mask branch uses an attention mechanism to focus the network on true edge information so as to remove noise and artifacts; the Mask branch learns from the image to detect and eliminate isolated noise, i.e., the spurious edge points produced during edge extraction.
In step (2), the edge enhancement network removes image noise and extracts high-frequency edge detail features to finally generate the super-resolution image. The discriminator of the generative adversarial network then judges the generated super-resolution image: it decides whether the image is fake, computes the difference between the ground-truth (GT) image and the intermediate super-resolution image, and back-propagates that difference to the generator G, which continues producing super-resolution images until the discriminator can no longer tell the generated image from the high-resolution image, at which point training of the whole network is complete.
In step (2), the generator branch used during network training adopts the SRResNet structure as the overall architecture: a 9 × 9 convolution layer with stride 1 extracts a primary feature map; RRDB dense residual blocks then extract image semantic information to obtain a sharper edge feature map; the edge feature map is fused with the primary features; and an upsampling operation finally yields the medium-resolution image. The medium-resolution image is sent to the EEN, which extracts edge features to obtain the super-resolution image. Finally, the SR image is input to the YOLOv3 detection network, which classifies and localizes targets to obtain the final result.
In step (3), a CNN is trained to fit a density function that serves as the supervision signal: given an input picture, it outputs the object density at every position. The density function is given by formula (1): the density of target i is defined as the maximum bounding-box IoU between target i and the other targets in the label set,

d_i := max iou(b_i, b_j), j ≠ i   (1)
thus, the dynamic suppression strategy is updated according to the density function definition using the following equations (4-19):
NM:=max(Nt,dM) (2)
where N_M denotes the adaptive threshold of target M and d_M the density of target M. The dynamic suppression strategy covers three cases: (1) when a neighbouring bounding box is far from M, i.e. iou(M, b_i) < N_M, the behaviour is consistent with the initial NMS threshold; (2) when M lies in a dense region, i.e. d_M > N_t, the density value d_M of M is used as the adaptive A-NMS threshold N_M, so neighbouring candidate regions, which may be located around M, are retained; (3) for objects in sparse regions, i.e. d_M < N_t, the NMS threshold equals N_t, which reduces false positives (FP).
In step (3), the adaptive Gaussian distribution modeling uses a Gaussian distribution function to model the predicted coordinates: the mean of the output coordinates serves as the mean of the Gaussian distribution, and the variance represents the uncertainty of the predicted localization.
The uncertainty of the box coordinates in the adaptive Gaussian modeling of step (3) can be modeled and evaluated with a Gaussian model for each of the centre coordinates, width and height. For a given test sample x, the output y is modeled by Gaussian parameters over t_x, t_y, t_w, t_h to express the localization uncertainty, as shown in the following formula:

P(y|x) = N(y; μ(x), s²(x))

where μ(x) and s²(x) are respectively the mean and variance of the box coordinate, and y is a value under that Gaussian distribution.
To predict the uncertainty of the box coordinates, the detection layer predicts, for each feature-map coordinate, a mean and a variance under the Gaussian model. Considering the nature of the Gaussian distribution and the structure of the YOLOv3 detection layer, the variance of the Gaussian distribution is constrained to lie between 0 and 1, fixing its range. The Gaussian parameters are therefore preprocessed with the following formulas, establishing four Gaussian distributions, one per coordinate.
In the detection layer, the mean of each coordinate, μ_tx, μ_ty, μ_tw, μ_th, represents the predicted coordinate of the Gaussian model, and the variance of each coordinate, Σ_tx, Σ_ty, Σ_tw, Σ_th, represents its uncertainty. Because μ_tx and μ_ty represent the centre coordinates of the box, they are processed to values between 0 and 1 with the sigmoid function; the variance of each coordinate is likewise processed to a value between 0 and 1 with the sigmoid function, representing the reliability of that coordinate. In YOLOv3, the height and width of the bounding box are handled through t_w and t_h relative to the prior (anchor) bounding boxes; i.e., the Gaussian parameters μ_tw and μ_th represent t_w and t_h in the YOLO network.
In step (4), the image edge-enhancement GAN network and AN-Gaussian YOLOv3 are jointly designed into a unified framework: the loss function of the model is redesigned, the AN-Gaussian YOLOv3 detection loss is added to the discriminant loss, and the method is tested on the COCO dataset and compared with other algorithms to verify its effectiveness.
Compared with the prior art, the invention has the following beneficial effects: the method adapts well to small, medium and large targets, and it is verified that when target density in the image varies widely, dynamically raising or lowering the threshold through the dynamic adjustment strategy resolves the high overlap rate and high error rate in detection, demonstrating good generalization and scene adaptivity.
Drawings
FIG. 1 is a schematic flow chart of step (1) of the present invention.
FIG. 2 is a schematic flow chart of step (2) of the present invention.
FIG. 3 is a schematic view of step (2) according to the present invention.
FIG. 4 is a diagram of the AN-Gaussian YOLOv3 predicted features of the present invention.
FIG. 5 is a diagram of the detection effect of the AN-Gaussian YOLOv3 algorithm in multiple scenes.
FIG. 6 is a graph of the IOU versus position uncertainty of the present invention.
Fig. 7(a) is a diagram of the effect of the uncertainty sparse scene detection of the present invention.
FIG. 7(b) is a diagram of the detection effect of the dense target scene under the Gaussian combination framework of the present invention.
FIG. 7(c) is a diagram of the effect of complex background detection in the Gaussian combination framework of the present invention.
FIG. 7(d) is a diagram of the occlusion detection effect under the Gaussian combination framework of the present invention.
FIG. 8 is a context aware skills development flow diagram of the present invention.
Fig. 9(a) is a diagram illustrating the effect of detecting an uncertain sparse scene according to the present invention.
Fig. 9(b) is a diagram of the effect of the uncertainty sparse scene detection of the present invention.
Fig. 9(c) is a diagram of the effect of the uncertainty sparse scene detection of the present invention.
Detailed Description
In the description of the present invention, it should be noted that unless otherwise specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly and may be, for example, fixedly connected, detachably connected, or integrally connected, mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements.
As shown in fig. 1 to 9, the method for detecting the environment of the unmanned vehicle based on deep learning includes the following steps:
Step (1): input the low-resolution image into the generator G of a generative adversarial network (GAN), feed the image produced by the generator into an edge enhancement network, and then map the extracted edge features into high-resolution space with the sub-pixel-convolution upsampling operation to obtain a super-resolution image;
Step (2): generate a medium-resolution image with the generator; send the generated medium-resolution image to the edge enhancement network to produce a super-resolution image; then pass the generated super-resolution image to the target detector to perform the task of classifying and localizing targets;
Step (3): perform adaptive Gaussian distribution modeling; through a dynamic suppression strategy, raise the threshold when target density rises and mutual occlusion increases, and lower the threshold when target density is low and objects appear in isolation;
Step (4): fuse the GAN-based EEN with the adaptive-uncertainty target localization model into an end-to-end detection model.
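The sub-pixel-convolution upsampling used in step (1) ends with a channel-to-space rearrangement (pixel shuffle): r² feature channels become an r-times larger spatial grid. A minimal NumPy sketch of that rearrangement, assuming a preceding convolution has already produced C·r² channels (the function name and test tensor are illustrative, not from the patent):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r).

    This is the channel-to-space step of sub-pixel convolution; the
    convolution producing the C*r*r channels is assumed to have run already.
    """
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)        # split channels into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)      # interleave: (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# a 4-channel 2x2 low-resolution map upsampled by r=2 into a 1-channel 4x4 map
lr_map = np.arange(16, dtype=np.float32).reshape(4, 2, 2)
sr_map = pixel_shuffle(lr_map, 2)
```

Each output pixel at (h·r + i, w·r + j) is taken from channel i·r + j at (h, w), matching the usual PixelShuffle convention.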
In step (1), the constructed Mask branch uses an attention mechanism to focus the network on true edge information so as to remove noise and artifacts; the Mask branch learns from the image to detect and eliminate isolated noise, i.e., the spurious edge points produced during edge extraction.
In step (2), the edge enhancement network removes image noise and extracts high-frequency edge detail features to finally generate the super-resolution image. The discriminator of the generative adversarial network then judges the generated super-resolution image: it decides whether the image is fake, computes the difference between the ground-truth (GT) image and the intermediate super-resolution image, and back-propagates that difference to the generator G, which continues producing super-resolution images until the discriminator can no longer tell the generated image from the high-resolution image, at which point training of the whole network is complete.
In step (2), the generator branch used during network training adopts the SRResNet structure as the overall architecture: a 9 × 9 convolution layer with stride 1 extracts a primary feature map; RRDB dense residual blocks then extract image semantic information to obtain a sharper edge feature map; the edge feature map is fused with the primary features; and an upsampling operation finally yields the medium-resolution image. The medium-resolution image is sent to the EEN, which extracts edge features to obtain the super-resolution image. Finally, the SR image is input to the YOLOv3 detection network, which classifies and localizes targets to obtain the final result.
The network structure of the discriminator D adopts a VGG network; the activation function between layers is LeakyReLU, and the number of base channels is 64. After feature extraction the output is 4 × 512, and a classifier finally produces the output probability value. Perceptual loss is used in network training: the mean of the difference between the VGG19 outputs for the generator's ISR image and for the HR image.
Network parameter setting and training: the network model is trained with the Adam optimizer, with β1 = 0.9 and β2 = 0.999; the generator and discriminator are alternately updated until convergence. β1 is the exponential decay rate controlling the relative weight of momentum versus the current gradient; β2 is the exponential decay rate controlling the influence of the squared gradient.
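The roles of β1 and β2 described above are easiest to see in a single scalar Adam update. This is a textbook sketch of the optimizer step, not code from the patent:

```python
def adam_step(param, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter.

    beta1 weights the running momentum against the current gradient;
    beta2 governs the influence of the squared gradient, matching the
    beta1=0.9, beta2=0.999 setting described above.
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction for step t
    v_hat = v / (1.0 - beta2 ** t)
    param -= lr * m_hat / (v_hat ** 0.5 + eps)
    return param, m, v
```

In practice the framework's built-in Adam optimizer is used; alternating G/D updates simply apply such a step to each network's parameters in turn.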
During training, the G and D networks are alternately updated with Adam; the maximum number of iterations is 500,000 and the initial learning rate 0.0001, halved after 50k, 100k and 250k iterations. The residual information in the residual structure described above needs to be scaled, so the scaling factor is set to 0.2.
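The halving schedule just described (initial rate 0.0001, halved after 50k, 100k and 250k iterations) can be sketched as a small lookup; the function name is illustrative:

```python
def lr_at(iteration, base_lr=1e-4, milestones=(50_000, 100_000, 250_000)):
    """Return the learning rate in force at a given iteration:
    the base rate halved once for every milestone already passed."""
    lr = base_lr
    for m in milestones:
        if iteration >= m:
            lr *= 0.5
    return lr
```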
In step (3), a CNN is trained to fit a density function that serves as the supervision signal: given an input picture, it outputs the object density at every position. The density function is given by formula (1): the density of target i is defined as the maximum bounding-box IoU between target i and the other targets in the label set,

d_i := max iou(b_i, b_j), j ≠ i   (1)
thus, the dynamic suppression strategy is updated according to the density function definition using the following equations (4-19):
NM:=max(Nt,dM) (2)
where N_M denotes the adaptive threshold of target M and d_M the density of target M. The dynamic suppression strategy covers three cases: (1) when a neighbouring bounding box is far from M, i.e. iou(M, b_i) < N_M, the behaviour is consistent with the initial NMS threshold; (2) when M lies in a dense region, i.e. d_M > N_t (4), the density value d_M of M is used as the adaptive A-NMS threshold N_M, so neighbouring candidate regions, which may be located around M, are retained; (3) for objects in sparse regions, i.e. d_M < N_t (5), the NMS threshold equals N_t, which reduces false positives (FP).
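The three cases above all reduce to the single rule N_M := max(N_t, d_M). A minimal sketch of the threshold choice plus a plain IoU helper (function names and the (x1, y1, x2, y2) box format are assumptions, not from the patent):

```python
def adaptive_nms_threshold(base_threshold, density):
    """A-NMS threshold for a detection M: N_M := max(N_t, d_M).

    Sparse region (d_M < N_t): keep the plain NMS threshold N_t,
    which suppresses more boxes and reduces false positives.
    Dense region (d_M > N_t): raise the threshold to d_M so that
    genuine neighbours around M survive suppression.
    """
    return max(base_threshold, density)

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0
```

A neighbouring box b_i is suppressed only when iou(M, b_i) exceeds the adaptive threshold, so raising the threshold in crowded regions retains overlapping true positives.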
In step (3), the adaptive Gaussian distribution modeling uses a Gaussian distribution function to model the predicted coordinates: the mean of the output coordinates serves as the mean of the Gaussian distribution, and the variance represents the uncertainty of the predicted localization.
The uncertainty of the box coordinates in the adaptive Gaussian modeling of step (3) can be modeled and evaluated with a Gaussian model for each of the centre coordinates, width and height. For a given test sample x, the output y is modeled by Gaussian parameters over t_x, t_y, t_w, t_h to express the localization uncertainty, as shown in formula (6):

P(y|x) = N(y; μ(x), s²(x))   (6)

where μ(x) and s²(x) are respectively the mean and variance of the box coordinate, and y is a value under that Gaussian distribution.
To predict the uncertainty of the box coordinates, the detection layer predicts, for each feature-map coordinate, a mean and a variance under the Gaussian model. Considering the nature of the Gaussian distribution and the structure of the YOLOv3 detection layer, the variance of the Gaussian distribution is constrained to lie between 0 and 1, fixing its range. The Gaussian parameters are therefore preprocessed with the following formulas, establishing four Gaussian distributions, one per coordinate.
In the detection layer, the mean of each coordinate, μ_tx, μ_ty, μ_tw, μ_th, represents the predicted coordinate of the Gaussian model, and the variance of each coordinate, Σ_tx, Σ_ty, Σ_tw, Σ_th, represents its uncertainty. Because μ_tx and μ_ty represent the centre coordinates of the box, they are processed to values between 0 and 1 with the sigmoid function; the variance of each coordinate is likewise processed to a value between 0 and 1 with the sigmoid function, representing the reliability of that coordinate. In YOLOv3, the height and width of the bounding box are handled through t_w and t_h relative to the prior (anchor) bounding boxes; i.e., the Gaussian parameters μ_tw and μ_th represent t_w and t_h in the YOLO network.
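The preprocessing just described, a sigmoid on the centre-coordinate means and on all four variances with the raw width/height means left for the anchor transform, can be sketched as follows; the dictionary keys are hypothetical labels for the eight detection-layer outputs:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def preprocess_gaussian_params(raw):
    """Squash mu_tx, mu_ty and the four variances to (0, 1) with a
    sigmoid; leave mu_tw and mu_th raw, since they play the role of
    YOLOv3's tw/th against the prior (anchor) boxes."""
    out = dict(raw)
    for key in ("mu_tx", "mu_ty", "var_tx", "var_ty", "var_tw", "var_th"):
        out[key] = sigmoid(raw[key])
    return out

# illustrative raw detection-layer outputs for one box
params = preprocess_gaussian_params({
    "mu_tx": 0.0, "mu_ty": 1.2, "mu_tw": 1.5, "mu_th": -0.3,
    "var_tx": 0.0, "var_ty": 1.0, "var_tw": -2.0, "var_th": 3.0,
})
```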
The AN-Gaussian YOLOv3 algorithm flow comprises the following steps:
the R-Darknet-53 optimized network obtained by YOLO in ImageNet classification task is used as AN initialization weight parameter of AN AN-Gaussian YOLOv3 backbone network, AN Adam optimizer is used for carrying out back propagation to update network parameters, the batch size of the network is set to 64, the initial learning rate is set to 0.001, 60000 iterations are carried out, and training is carried out by reducing the learning rate by half after 10k, 20k and 250k iterations respectively; and (5) adopting multi-scale training, and adjusting the resolution after each 10 iterations.
As can be seen from the detection-effect diagram in fig. 5, the algorithm localizes the distant small targets in the upper part of the figure precisely and can extract deep semantic information to further improve generalization capability; in the middle and lower parts of the figure, where the targets to be detected are occluded and the background information resembles the target information, the model still localizes the targets accurately. Therefore, the AN-Gaussian YOLOv3 algorithm designed by the invention shows good generalization and adaptive capability for both dense and sparse targets; its positioning accuracy is greatly improved, and it localizes targets accurately even under poor illumination conditions and target occlusion. However, accuracy is affected to some extent by noise when the image resolution is low.
FIG. 6 shows the relationship between IoU and localization uncertainty on the KITTI dataset: the IoU increases as the localization uncertainty decreases, and the larger the IoU, the smaller the localization uncertainty and the closer the prediction to the ground-truth label. The composite uncertainty evaluation index of the proposed adaptive Gaussian distribution detection algorithm can therefore effectively represent the confidence of the predicted coordinates.
In step (4), the image edge-enhancement GAN network and AN-Gaussian YOLOv3 are jointly designed into a unified framework; the loss function of the model is redesigned and the AN-Gaussian YOLOv3 detection loss is added to the discriminant loss, which improves image quality and localization precision, saves hardware resources, and meets the system's real-time and high-precision requirements. Tests are carried out on the COCO dataset and compared with other algorithms to verify the effectiveness of the algorithm.
When the detection result is accurate, the loss function substitutes the predicted coordinates into the Gaussian probability density function, as shown in formula (9), to judge the accuracy of the result: the greater the probability, the more accurate the prediction.
Therefore, the box-coordinate regression loss is reconstructed as a negative log-likelihood loss over the Gaussian probability density values; L_x, L_y, L_w, L_h denote the loss sums of the respective coordinate components, as shown in formulas (10) to (11):
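A sketch of the per-coordinate negative log-likelihood described above: the ground-truth coordinate is evaluated under the predicted Gaussian and the negative log taken (the small ε guards the logarithm; any weighting terms in the patent's formulas (10) to (11) are omitted here):

```python
import math

def gaussian_nll(gt, mu, var, eps=1e-9):
    """Negative log-likelihood of a ground-truth coordinate under the
    predicted Gaussian N(mu, var) -- the building block of the
    L_x, L_y, L_w, L_h regression losses."""
    pdf = math.exp(-((gt - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return -math.log(pdf + eps)
```

A confident, accurate prediction (mean on target, small variance) yields a low loss, while being confidently wrong is penalized more heavily than being uncertain; this is what drives the learned variance to track localization quality.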
Localization uncertainty evaluation index: after the localization uncertainty of the four coordinate components is obtained through Gaussian modeling, in order to push the scores of bounding boxes with high uncertainty below the threshold and thereby improve detection precision, the following composite uncertainty evaluation index is proposed:
C_r = σ_obj × U_s_ijk   (14)
wherein:
in the formula (15), U _ sijkIs the confidence of the location of the predicted value. Because the location uncertainty is determined collectively by the location coordinates, location confidence can be measured. Multiplying the original target confidence coefficient by the positioning confidence coefficient to obtain an uncertainty comprehensive score, wherein when the positioning is accurate, the positioning confidence coefficient is close to 1; when the positioning accuracy is poor, the position confidence degree is small, the position confidence degree approaches to 0, at the moment, the original target confidence degree is multiplied by a small coefficient, so that the uncertainty comprehensive score is obviously reduced, further, the prediction result lower than the threshold value is filtered, the FP is reduced to a certain extent, and the detection accuracy is improved.
The composite loss function: the AN-Gaussian YOLOv3 detection loss is introduced into the total discriminant loss, and the total discriminant-network loss function is redesigned to optimize and update the generator network and the detection network. The total discriminant-network loss is defined as the weighted sum of the total edge-enhancement-network loss, the AN-Gaussian YOLOv3 classification loss, and the uncertainty regression loss under the Gaussian distribution:
L_all = L_G_all + ξ · L_det_AN-Gaussian-YOLOv3   (16)

L_det_AN-Gaussian-YOLOv3 = L_cls_AN-Gaussian-YOLOv3 + L_reg_AN-Gaussian-YOLOv3   (17)
Experimental setup: the network is initialized with VGG19 weights pre-trained on ImageNet, and the weighting coefficient is initialized to 1; the algorithm model is trained end-to-end, with the learning rate set to 0.0001 and halved every 50k iterations; the batch size is set to 5, and Adam is used as the optimizer to update the weights until the entire architecture converges. 23 RRDB blocks and 6 RRDB dense residual blocks of the EEN network are used in the generator G.
The EEN network consists of densely connected sub-networks and a Mask branch network. Using an attention mechanism, the constructed Mask branch focuses the network on true edge information so as to remove noise and artifacts; finally, a mapping function F is learned that reconstructs the corresponding real image from a given low-resolution input.
For a given sample I_base, edge information is first detected and extracted with the Laplacian operator. The Laplacian L(x, y) of an image I(x, y) is defined as the second-order partial derivative of the image, and E(x, y) is the extracted edge feature:

E(x, y) = L(x, y) = ∂²I/∂x² + ∂²I/∂y²
the edge information is then extracted using a jump connection and mapped into a low resolution space, as opposed to operating in a high resolution space, which reduces the amount of computation. Meanwhile, the constructed Mask branch learns the image to detect and eliminate the separated noise, namely the wrong edge point when the edge is extracted; and then mapping the extracted edge features to a high-resolution space by utilizing an up-sampling operation of sub-pixel convolution to obtain a super-resolution image.
The loss function of the EEN network is defined as an image consistency loss and an edge consistency loss. When training the network, the Charbonnier loss between the medium-resolution and high-resolution images, called the image consistency loss, represents the distance between the high-resolution and super-resolution images. This helps to obtain an image with good edge information. However, object edges may be damaged and noise introduced, so good edge information is not guaranteed. Therefore, to account for the edge error, an edge consistency loss is introduced, which evaluates the Charbonnier loss between the edge information extracted from the medium-resolution image and that extracted from the high-resolution image.
Finally, the total consistency loss is the sum of the image and edge consistency losses, as shown below.
L_EEN = L_img_cl + L_edge_cl
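The consistency losses above can be sketched with the usual Charbonnier form sqrt(diff² + ε²), a differentiable L1 variant. The value ε = 1e-3 is a common default, not taken from the patent.

```python
import numpy as np

# Sketch of the EEN consistency losses: Charbonnier loss between the
# super-resolved and high-resolution images (image consistency) and between
# their extracted edge maps (edge consistency); L_EEN is their sum.
# eps = 1e-3 is an assumed, commonly used smoothing constant.

def charbonnier(a, b, eps=1e-3):
    """Mean Charbonnier distance between two arrays."""
    return np.mean(np.sqrt((a - b) ** 2 + eps ** 2))

def een_loss(sr, hr, sr_edges, hr_edges):
    l_img = charbonnier(sr, hr)                # L_img_cl
    l_edge = charbonnier(sr_edges, hr_edges)   # L_edge_cl
    return l_img + l_edge                      # L_EEN
```

Note that for identical inputs the loss floors at eps rather than zero, which is what makes the Charbonnier form smooth at the origin.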
As can be seen from fig. 7(a), the model with the added adaptive uncertainty algorithm detects non-occluded targets well. Fig. 7(b) shows that when the target density is high, the model exceeds the model without the uncertainty algorithm in positioning accuracy, and its parameters are adjusted in time according to the target density, giving better generalization. After adding the localization uncertainty algorithm, the predicted bounding boxes accurately detect the target objects in the image; the dashed boxes represent the localization uncertainty given by the adaptive NMS-Gaussian distribution algorithm.
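The density-dependent behaviour described above follows the adaptive NMS rule N_M = max(N_t, d_M) used by the model. A minimal sketch (the base threshold N_t = 0.5 below is illustrative, not the patent's value):

```python
# Sketch of the adaptive-NMS threshold rule: each detection M receives the
# suppression threshold N_M = max(N_t, d_M). In dense regions (d_M > N_t)
# the IoU threshold rises to the local density, so neighbouring boxes are
# retained; in sparse regions the base threshold N_t applies, reducing FPs.
# N_t = 0.5 is an assumed base threshold for illustration.

def adaptive_threshold(density, base_threshold=0.5):
    """N_M = max(N_t, d_M)."""
    return max(base_threshold, density)

# dense target: the threshold rises to the target's density
print(adaptive_threshold(0.8))  # 0.8
# sparse target: the base NMS threshold applies
print(adaptive_threshold(0.2))  # 0.5
```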
Based on the ModelArts cloud AI development platform, the PyTorch model trained with the adaptive Gaussian distribution uncertainty algorithm is converted into a pb model suitable for the Ascend chip of the HiLens Kit. In HiLens Studio, a multi-language integrated development environment, a skill template based on PyTorch 1.0 and Python 3.6 is created, the ModelArts pb model is imported from the OBS storage server, and the inference logic code is written. Finally, an .om skill model is obtained in HiLens Studio; after the skill code is compiled and debugged, the skill is published, deployed and run on the end-side device HiLens Kit.
In order to test the intelligent vehicle perception system in a real outdoor environment, a simple control strategy and a lane line recognition algorithm are first designed so that the intelligent vehicle can move along the drivable area of the lane according to the detected environment information. First, the inter-frame travel distance of the intelligent vehicle is calculated from its moving speed and the frame-difference time of HiLens image processing. Then, according to the detection result of the end-to-end combined model, the system judges whether the surroundings within the next frame interval form a safe drivable area and transmits the detection result to the main control decision system, which controls the vehicle's next movement according to the detection of specific targets. Figs. 9(a), 9(b) and 9(c) show the unmanned vehicle algorithm tested in a real outdoor environment under both good weather and wind-and-snow conditions; the algorithm retains good positioning accuracy and generalization ability even in severe environments.
The intelligent vehicle is field-tested in an outdoor environment with the simple control strategy and lane line recognition algorithm. HiLens captures images of the surrounding environment ahead and feeds them into the .om detection model, whose output controls the vehicle's next action through the controller. The main control system of the vehicle drives the stepping motor and servo motor according to the corresponding action commands. Tests were conducted outdoors on a campus. In most cases, the vehicle can drive safely along the lane line and accurately locate surrounding targets; however, when the surrounding environment is complex, it cannot accurately judge traffic rules, the two-dimensional image's ability to judge spatial position is limited, and the control precision of specific target tracking tasks remains to be improved.
The foregoing shows and describes the general principles and main features of the present invention and its advantages. Those skilled in the art will understand that the present invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principle of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.
Claims (9)
1. An unmanned vehicle environment detection method based on deep learning, characterized in that the method comprises the following steps:
step (1), inputting an input low-resolution image into the generator G of a generative adversarial network, inputting the image generated by the generator into the edge enhancement network, and then mapping the extracted edge features to a high-resolution space by a sub-pixel convolution up-sampling operation to obtain a super-resolution image;
step (2), generating a medium-resolution image with the generator; then sending the generated medium-resolution image to the edge enhancement network to generate a super-resolution image, and transmitting the generated super-resolution image to the target detector for the target classification and localization task;
step (3), performing adaptive Gaussian distribution modeling, with a dynamic suppression strategy that increases the threshold when the target density rises and mutual occlusion increases, and decreases the threshold when the target density is low and objects appear in isolation;
and step (4), fusing the EEN and the adaptive uncertainty target localization into an end-to-end detection model based on the GAN.
2. The deep learning-based unmanned vehicle environment detection method according to claim 1, characterized in that: in the step (1), the constructed Mask branch uses an attention mechanism to focus the network on real edge information so as to remove noise and artifacts, and the constructed Mask branch learns from the image to detect and eliminate the separated noise, i.e. the false edge points produced during edge extraction.
3. The deep learning-based unmanned vehicle environment detection method according to claim 1, characterized in that: in the step (2), after the edge enhancement network removes image noise and extracts high-frequency edge detail features to generate the super-resolution image, the discriminator of the generative adversarial network judges the generated super-resolution image, determines whether it is a fake image, finds the difference between the GT image and the intermediate super-resolution image, and back-propagates the difference to the generator G to generate the super-resolution image, until the discriminator cannot distinguish the generated image from the high-resolution image; the training of the whole network is then complete.
4. The deep learning-based unmanned vehicle environment detection method according to claim 3, characterized in that: in the step (2), the generator network branch during training adopts the SRResNet structure as the overall network structure: a 9 × 9 convolution layer with stride 1 extracts a primary feature map; RRDB dense residual blocks then extract image semantic information to obtain a clearer edge feature map; feature fusion is performed on the edge feature map and the primary features; and finally a medium-resolution image is obtained through an up-sampling operation; the medium-resolution image is sent to the EEN network, and edge features are extracted to obtain the super-resolution image; finally, the SR image is input into the YOLOv3 detection network for classification and localization to obtain the final result.
5. The deep learning-based unmanned vehicle environment detection method according to claim 1, characterized in that: in the step (3), a CNN is trained to fit the density function as a supervision signal, i.e. given an input picture, it outputs the object density at each position; the density function is shown in formula (1), where the density of target i is defined as the maximum bounding-box IoU with the other targets in the label set:
d_i := max_{j≠i} iou(b_i, b_j)    (1)
thus, the dynamic suppression strategy is updated according to the density function definition using the following formula:
N_M := max(N_t, d_M)    (2)
wherein N_M represents the adaptive threshold of target M, and d_M represents the density of target M; the dynamic suppression strategy covers three cases: (1) when the neighbouring bounding box is far from M, i.e. iou(M, b_i) < N_M, the behaviour is consistent with the initial NMS threshold; (2) when M is located in a dense region, i.e. d_M > N_t, the density value of M is used as the adaptive threshold of A-NMS, N_M = d_M; thus, neighbouring candidate regions that may be located around M are retained; (3) for objects in sparse regions, i.e. d_M < N_t, the NMS threshold equals N_t, so false positives (FP) can be reduced.
6. The deep learning-based unmanned vehicle environment detection method according to claim 5, characterized in that: in the step (3), the adaptive Gaussian distribution modeling models the predicted coordinates with a Gaussian distribution function; the mean of the output coordinates is used as the mean of the Gaussian distribution, and the variance represents the uncertainty of the predicted localization.
7. The deep learning-based unmanned vehicle environment detection method according to claim 6, characterized in that: the uncertainty of the Box coordinates in the adaptive Gaussian distribution modeling in the step (3) can be modeled and evaluated by a Gaussian model for each of the centre coordinates, width and height; for a given test sample x, the output y can be modeled by Gaussian parameters over t_x, t_y, t_w, t_h to represent the localization uncertainty, as shown in the following formula: P(y|x) = N(y; μ(x), s²(x)), where μ(x) and s²(x) are respectively the mean and variance of the Box coordinates, and y is a value under the Gaussian distribution.
8. The deep learning-based unmanned vehicle environment detection method according to claim 7, characterized in that: to predict the uncertainty of the Box coordinates, the predicted feature-map coordinates are the mean and variance in the Gaussian modeling; considering the nature of the Gaussian distribution and the structure of the YOLOv3 detection layer, the variance of the Gaussian distribution is constrained to lie between 0 and 1, fixing the range of the variance; therefore, the following formulas are adopted to preprocess the Gaussian parameters and establish four Gaussian distributions;
the mean of each coordinate in the detection layer represents the predicted coordinate of the Gaussian model, and the variance of each coordinate represents the uncertainty of that coordinate; because the means of the x and y coordinates represent the centre coordinates of the box, they are processed into values between 0 and 1 by the sigmoid function; the variance of each coordinate is likewise processed into a value between 0 and 1 by the sigmoid function to represent the reliability of the coordinate; in YOLOv3, the height and width information of the bounding box is processed through the prior (anchor) bounding boxes as t_w and t_h, i.e. the corresponding Gaussian mean parameters represent t_w and t_h in the YOLO network.
9. The deep learning-based unmanned vehicle environment detection method according to claim 1, characterized in that: in the step (4), the image edge enhancement GAN network and AN-Gaussian YOLOv3 are jointly designed into a unified framework, the loss function of the model is redesigned, and the AN-Gaussian YOLOv3 detection loss is added to the discriminant loss; the method is tested on the COCO data set and compared with other algorithms to verify its effectiveness.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110838473.9A CN113436217A (en) | 2021-07-23 | 2021-07-23 | Unmanned vehicle environment detection method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113436217A true CN113436217A (en) | 2021-09-24 |
Family
ID=77761668
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110838473.9A Pending CN113436217A (en) | 2021-07-23 | 2021-07-23 | Unmanned vehicle environment detection method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113436217A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113963027A (en) * | 2021-10-28 | 2022-01-21 | 广州文远知行科技有限公司 | Uncertainty detection model training method and device, and uncertainty detection method and device |
CN114581790A (en) * | 2022-03-01 | 2022-06-03 | 哈尔滨理工大学 | Small target detection method based on image enhancement and multi-feature fusion |
CN115359258A (en) * | 2022-08-26 | 2022-11-18 | 中国科学院国家空间科学中心 | Weak and small target detection method and system for component uncertainty measurement |
CN115471773A (en) * | 2022-09-16 | 2022-12-13 | 北京联合大学 | Student tracking method and system for intelligent classroom |
CN116469047A (en) * | 2023-03-20 | 2023-07-21 | 南通锡鼎智能科技有限公司 | Small target detection method and detection device for laboratory teaching |
CN117952901A (en) * | 2023-12-12 | 2024-04-30 | 中国人民解放军战略支援部队航天工程大学 | Multi-source heterogeneous image change detection method and device based on generation countermeasure network |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110458758A (en) * | 2019-07-29 | 2019-11-15 | 武汉工程大学 | A kind of image super-resolution rebuilding method, system and computer storage medium |
CN111524135A (en) * | 2020-05-11 | 2020-08-11 | 安徽继远软件有限公司 | Image enhancement-based method and system for detecting defects of small hardware fittings of power transmission line |
CN111899172A (en) * | 2020-07-16 | 2020-11-06 | 武汉大学 | Vehicle target detection method oriented to remote sensing application scene |
CN112906547A (en) * | 2021-02-09 | 2021-06-04 | 哈尔滨市科佳通用机电股份有限公司 | Railway train windshield breakage fault detection method based on E-YOLO |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110458758A (en) * | 2019-07-29 | 2019-11-15 | 武汉工程大学 | A kind of image super-resolution rebuilding method, system and computer storage medium |
CN111524135A (en) * | 2020-05-11 | 2020-08-11 | 安徽继远软件有限公司 | Image enhancement-based method and system for detecting defects of small hardware fittings of power transmission line |
CN111899172A (en) * | 2020-07-16 | 2020-11-06 | 武汉大学 | Vehicle target detection method oriented to remote sensing application scene |
CN112906547A (en) * | 2021-02-09 | 2021-06-04 | 哈尔滨市科佳通用机电股份有限公司 | Railway train windshield breakage fault detection method based on E-YOLO |
Non-Patent Citations (2)
Title |
---|
JAKARIA RABBI ET AL.: "Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network", 《REMOTE SENSING》 * |
SONGTAO LIU, DI HUANG, YUNHONG WANG: "Adaptive NMS: Refining Pedestrian Detection in a Crowd", 《ARXIV:1904.03629V1 [CS.CV] 7 APR 2019》 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113963027A (en) * | 2021-10-28 | 2022-01-21 | 广州文远知行科技有限公司 | Uncertainty detection model training method and device, and uncertainty detection method and device |
CN113963027B (en) * | 2021-10-28 | 2022-09-09 | 广州文远知行科技有限公司 | Uncertainty detection model training method and device, and uncertainty detection method and device |
CN114581790A (en) * | 2022-03-01 | 2022-06-03 | 哈尔滨理工大学 | Small target detection method based on image enhancement and multi-feature fusion |
CN115359258A (en) * | 2022-08-26 | 2022-11-18 | 中国科学院国家空间科学中心 | Weak and small target detection method and system for component uncertainty measurement |
CN115471773A (en) * | 2022-09-16 | 2022-12-13 | 北京联合大学 | Student tracking method and system for intelligent classroom |
CN115471773B (en) * | 2022-09-16 | 2023-09-15 | 北京联合大学 | Intelligent classroom-oriented student tracking method and system |
CN116469047A (en) * | 2023-03-20 | 2023-07-21 | 南通锡鼎智能科技有限公司 | Small target detection method and detection device for laboratory teaching |
CN117952901A (en) * | 2023-12-12 | 2024-04-30 | 中国人民解放军战略支援部队航天工程大学 | Multi-source heterogeneous image change detection method and device based on generation countermeasure network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113436217A (en) | Unmanned vehicle environment detection method based on deep learning | |
CN110781838B (en) | Multi-mode track prediction method for pedestrians in complex scene | |
CN109800689B (en) | Target tracking method based on space-time feature fusion learning | |
CN110929578B (en) | Anti-shielding pedestrian detection method based on attention mechanism | |
CN111626128B (en) | Pedestrian detection method based on improved YOLOv3 in orchard environment | |
CN110837778A (en) | Traffic police command gesture recognition method based on skeleton joint point sequence | |
CN113076871B (en) | Fish shoal automatic detection method based on target shielding compensation | |
CN111709300B (en) | Crowd counting method based on video image | |
JP2020038660A (en) | Learning method and learning device for detecting lane by using cnn, and test method and test device using the same | |
CN110097028B (en) | Crowd abnormal event detection method based on three-dimensional pyramid image generation network | |
CN113920107A (en) | Insulator damage detection method based on improved yolov5 algorithm | |
CN114758288A (en) | Power distribution network engineering safety control detection method and device | |
CN113705636A (en) | Method and device for predicting trajectory of automatic driving vehicle and electronic equipment | |
JP2020038661A (en) | Learning method and learning device for detecting lane by using lane model, and test method and test device using the same | |
CN113409252A (en) | Obstacle detection method for overhead transmission line inspection robot | |
CN113936210A (en) | Anti-collision method for tower crane | |
CN112347930A (en) | High-resolution image scene classification method based on self-learning semi-supervised deep neural network | |
Yi et al. | End-to-end neural network for autonomous steering using lidar point cloud data | |
CN114049541A (en) | Visual scene recognition method based on structural information characteristic decoupling and knowledge migration | |
CN113177439A (en) | Method for detecting pedestrian crossing road guardrail | |
CN115731517B (en) | Crowded Crowd detection method based on crown-RetinaNet network | |
CN116664851A (en) | Automatic driving data extraction method based on artificial intelligence | |
CN112614158B (en) | Sampling frame self-adaptive multi-feature fusion online target tracking method | |
Li et al. | Detection and discrimination of obstacles to vehicle environment under convolutional neural networks | |
Shi et al. | A novel model based on deep learning for Pedestrian detection and Trajectory prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210924 |