CN113436217A - Unmanned vehicle environment detection method based on deep learning - Google Patents

Unmanned vehicle environment detection method based on deep learning Download PDF

Info

Publication number
CN113436217A
Authority
CN
China
Prior art keywords
image
network
resolution image
gaussian
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110838473.9A
Other languages
Chinese (zh)
Inventor
宋勇
张双建
庞豹
袁宪锋
许庆阳
巩志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202110838473.9A priority Critical patent/CN113436217A/en
Publication of CN113436217A publication Critical patent/CN113436217A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T 5/70
    • G06T 5/80
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/181 Segmentation; Edge detection involving edge growing; involving edge linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20172 Image enhancement details
    • G06T 2207/20192 Edge enhancement; Edge preservation

Abstract

The invention provides an unmanned vehicle environment detection method based on deep learning, which comprises the following steps: step (1), obtaining a super-resolution image; step (2), generating a medium-resolution image with the generator, sending the generated medium-resolution image to an edge enhancement network to produce a super-resolution image, and passing the generated super-resolution image to a target detector for the task of classifying and localizing targets; step (3), adaptive Gaussian distribution modeling with a dynamic suppression strategy; and step (4), fusing the EEN and the adaptive-uncertainty target localization into an end-to-end detection model. The method adapts well to small, medium and large targets, and experiments verify that when the target density in an image varies strongly, dynamically raising or lowering the threshold resolves the problems of high overlap rate and high error rate in detection, demonstrating good generalization and scene adaptability.

Description

Unmanned vehicle environment detection method based on deep learning
Technical Field
The invention relates to the technical field of environment detection, in particular to an unmanned vehicle environment detection method based on deep learning.
Background
Unmanned ground vehicles are also called driverless vehicles (unmanned vehicles for short). Thanks to the rapid development of novel sensors and to fundamental research in machine learning in recent years, civil unmanned vehicles have become technically feasible. Research institutions and enterprises at home and abroad have invested in the research and development of intelligent or unmanned automobiles, and some claim that unmanned vehicles will reach commercial deployment within the next five years.
Disclosure of Invention
The invention aims to provide an unmanned vehicle environment detection method based on deep learning to solve the problems in the background technology.
The technical problem solved by the invention is realized by adopting the following technical scheme: the unmanned vehicle environment detection method based on deep learning comprises the following steps:
step (1), feeding the input low-resolution image into the generator G of the generative adversarial network, feeding the image produced by the generator into the edge enhancement network, and then mapping the extracted edge features into the high-resolution space through a sub-pixel-convolution upsampling operation to obtain a super-resolution image;
step (2), generating a medium-resolution image with the generator; then sending the generated medium-resolution image to the edge enhancement network to generate a super-resolution image, and passing the generated super-resolution image to the target detector for the task of classifying and localizing targets;
step (3), adaptive Gaussian distribution modeling; through a dynamic suppression strategy, the threshold is raised when the target density rises and mutual occlusion increases, and lowered when the target density is low and objects appear in isolation;
and step (4), fusing the EEN and the adaptive-uncertainty target localization into a GAN-based end-to-end detection model.
In step (1), an attention mechanism is used so that the constructed Mask branch focuses the network on the real edge information in order to remove noise and artifacts; the constructed Mask branch learns from the image to detect and eliminate the separated-out noise, i.e. the false edge points produced when the edges are extracted.
In step (2), the edge enhancement network removes image noise and extracts high-frequency edge detail features to finally generate the super-resolution image; the discriminator of the generative adversarial network judges the generated super-resolution image and decides whether it is a fake image, finds the difference between the GT image and the intermediate super-resolution image, and back-propagates that difference to the generator G, which continues generating super-resolution images until the discriminator can no longer distinguish the generated image from the high-resolution image, at which point the training of the whole network is complete.
In step (2), the generation branch used during network training adopts the SRResNet structure as the overall network structure: a 9 × 9 convolution layer with stride 1 extracts a primary feature map, RRDB dense residual blocks then extract image semantic information to obtain a sharper edge feature map, the edge feature map is fused with the primary features, and an upsampling operation finally yields the medium-resolution image; the medium-resolution image is fed into the EEN, which extracts edge features to obtain the super-resolution image; finally, the SR image is input into the YOLOv3 detection network, and classification and localization give the final result.
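For orientation, the low-resolution → generator → EEN → detector pipeline described above can be sketched as follows; this is a minimal PyTorch sketch in which simple placeholder modules stand in for the SRResNet/RRDB generator, the edge enhancement network and the YOLOv3 detector, so all layer choices and shapes are assumptions rather than the actual networks.

```python
import torch
import torch.nn as nn

# Placeholder stages: the real modules are the SRResNet-based generator with
# RRDB blocks, the edge enhancement network (EEN) and the YOLOv3 detector.
generator = nn.Sequential(
    nn.Conv2d(3, 64, 9, padding=4), nn.PReLU(),      # 9 x 9 primary feature extraction
    nn.Conv2d(64, 3, 3, padding=1),
    nn.Upsample(scale_factor=2, mode="nearest"),      # upsampling to the medium resolution
)
een = nn.Conv2d(3, 3, 3, padding=1)                   # stand-in for the edge enhancement network
detector = nn.Conv2d(3, 3 * (5 + 80), 1)              # 3 anchors x (box + objectness + 80 classes)

lr_image = torch.randn(1, 3, 104, 104)                # low-resolution input
mid_image = generator(lr_image)                       # medium-resolution image
sr_image = een(mid_image)                             # edge-enhanced super-resolution image
pred = detector(sr_image)                             # per-cell classification and localization tensor
print(mid_image.shape, sr_image.shape, pred.shape)
```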
In step (3), a CNN is trained to fit a density function that serves as a supervision signal: given an input picture, it outputs the object density at every position. The density function is given in formula (1); the density of target i is defined as the maximum bounding-box IoU it attains with any other target in the label set:

d_i := max_{b_j ∈ G, j ≠ i} iou(b_i, b_j)   (1)

The dynamic suppression strategy is then updated from the density definition as follows:

N_M := max(N_t, d_M)   (2)

s_i := { s_i, if iou(M, b_i) < N_M;  0, if iou(M, b_i) ≥ N_M }   (3)

where N_M denotes the adaptive threshold of target M and d_M the density of target M. The dynamic suppression strategy covers three cases: (1) when a neighboring bounding box is far from M, i.e. iou(M, b_i) < N_M, the behavior is consistent with the initial NMS threshold; (2) when M lies in a dense region, i.e. d_M > N_t, the density value of M is used as the adaptive threshold N_M of the A-NMS, so that neighboring candidate regions, which may lie around M, are retained; (3) for objects in sparse regions, i.e. d_M < N_t, the NMS threshold equals N_t, which reduces FP.
In step (3), the adaptive Gaussian distribution modeling uses a Gaussian distribution function to model the predicted coordinates: the mean of the output coordinates serves as the mean of the Gaussian distribution, and the variance represents the uncertainty of the predicted localization.
In step (3), the uncertainty of the box coordinates in the adaptive Gaussian distribution modeling can be modeled and evaluated with a Gaussian model for each of the center coordinates, the width and the height. For a given test sample x, the output y can be modeled with Gaussian parameters over t_x, t_y, t_w, t_h to express the uncertain localization, as shown below:

P(y | x) = N(y; μ(x), Σ(x))

where μ(x) and Σ(x) are respectively the mean and the variance of the box coordinate, and y is a value under that Gaussian distribution.
To predict the uncertainty of the box coordinates, the coordinates predicted on the feature map become the means and variances of the Gaussian model, so the detection-layer output is

(μ_tx, Σ_tx, μ_ty, Σ_ty, μ_tw, Σ_tw, μ_th, Σ_th)

Considering the nature of the Gaussian distribution and the structure of the YOLOv3 detection layer, the variance of each Gaussian is constrained to lie between 0 and 1, fixing the range of the variance. The Gaussian parameters are therefore pre-processed with the following formulas to establish the four Gaussian distributions:

μ_tx = σ(μ'_tx),  μ_ty = σ(μ'_ty),  μ_tw = μ'_tw,  μ_th = μ'_th

Σ_tx = σ(Σ'_tx),  Σ_ty = σ(Σ'_ty),  Σ_tw = σ(Σ'_tw),  Σ_th = σ(Σ'_th)

where the primed symbols denote the raw outputs of the detection layer and σ(·) is the sigmoid function. In the detection layer, the mean of each coordinate (μ_tx, μ_ty, μ_tw, μ_th) represents the predicted coordinate of the Gaussian model, and the variance of each coordinate (Σ_tx, Σ_ty, Σ_tw, Σ_th) represents its uncertainty. Because μ_tx and μ_ty represent the center coordinates of the box, they are processed into values between 0 and 1 with the sigmoid function; the variance of each coordinate is likewise processed into a value between 0 and 1 with the sigmoid function and represents the reliability of the coordinate. In YOLOv3 the height and width of the bounding box are handled through the prior (anchor) boxes as t_w and t_h, i.e. the Gaussian parameters μ_tw and μ_th represent t_w and t_h in the YOLO network.
In step (4), the image edge-enhancement GAN network and AN-Gaussian YOLOv3 are jointly designed into a unified framework: the loss function of the model is redesigned, the detection loss of the AN-Gaussian YOLOv3 algorithm is added into the discriminator loss, and the method is tested on the COCO data set and compared with other algorithms to verify its effectiveness.
Compared with the prior art, the invention has the following beneficial effects: the method adapts well to small, medium and large targets, and experiments verify that when the target density in an image varies strongly, dynamically raising or lowering the threshold resolves the problems of high overlap rate and high error rate in detection, demonstrating good generalization and scene adaptability.
Drawings
FIG. 1 is a schematic flow chart of step (1) of the present invention.
FIG. 2 is a schematic flow chart of step (2) of the present invention.
FIG. 3 is a schematic view of step (2) according to the present invention.
FIG. 4 is a diagram of the AN-Gaussian YOLOv3 predicted features of the present invention.
FIG. 5 is a diagram of the detection effect of the AN-Gaussian YOLOv3 algorithm in multiple scenes.
FIG. 6 is a graph of the IOU versus position uncertainty of the present invention.
Fig. 7(a) is a diagram of the effect of the uncertainty sparse scene detection of the present invention.
FIG. 7(b) is a diagram of the detection effect of the dense target scene under the Gaussian combination framework of the present invention.
FIG. 7(c) is a diagram of the effect of complex background detection in the Gaussian combination framework of the present invention.
FIG. 7(d) is a diagram of the occlusion detection effect under the Gaussian combination frame of the present invention.
FIG. 8 is a context aware skills development flow diagram of the present invention.
Fig. 9(a) is a diagram illustrating the effect of detecting an uncertain sparse scene according to the present invention.
Fig. 9(b) is a diagram of the effect of the uncertainty sparse scene detection of the present invention.
Fig. 9(c) is a diagram of the effect of the uncertainty sparse scene detection of the present invention.
Detailed Description
In the description of the present invention, it should be noted that, unless otherwise expressly specified or limited, the terms "mounted", "connected" and "coupled" are to be construed broadly: they may denote, for example, fixed, detachable or integral connections, mechanical or electrical connections, and direct connections or indirect connections through intervening media or between the interiors of two elements.
As shown in fig. 1 to 9, the method for detecting the environment of the unmanned vehicle based on deep learning includes the following steps:
step (1), feeding the input low-resolution image into the generator G of the generative adversarial network, feeding the image produced by the generator into the edge enhancement network, and then mapping the extracted edge features into the high-resolution space through a sub-pixel-convolution upsampling operation to obtain a super-resolution image;
step (2), generating a medium-resolution image with the generator; then sending the generated medium-resolution image to the edge enhancement network to generate a super-resolution image, and passing the generated super-resolution image to the target detector for the task of classifying and localizing targets;
step (3), adaptive Gaussian distribution modeling; through a dynamic suppression strategy, the threshold is raised when the target density rises and mutual occlusion increases, and lowered when the target density is low and objects appear in isolation;
and step (4), fusing the EEN and the adaptive-uncertainty target localization into a GAN-based end-to-end detection model.
In step (1), an attention mechanism is used so that the constructed Mask branch focuses the network on the real edge information in order to remove noise and artifacts; the constructed Mask branch learns from the image to detect and eliminate the separated-out noise, i.e. the false edge points produced when the edges are extracted.
In step (2), the edge enhancement network removes image noise and extracts high-frequency edge detail features to finally generate the super-resolution image; the discriminator of the generative adversarial network judges the generated super-resolution image and decides whether it is a fake image, finds the difference between the GT image and the intermediate super-resolution image, and back-propagates that difference to the generator G, which continues generating super-resolution images until the discriminator can no longer distinguish the generated image from the high-resolution image, at which point the training of the whole network is complete.
In step (2), the generation branch used during network training adopts the SRResNet structure as the overall network structure: a 9 × 9 convolution layer with stride 1 extracts a primary feature map, RRDB dense residual blocks then extract image semantic information to obtain a sharper edge feature map, the edge feature map is fused with the primary features, and an upsampling operation finally yields the medium-resolution image; the medium-resolution image is fed into the EEN, which extracts edge features to obtain the super-resolution image; finally, the SR image is input into the YOLOv3 detection network, and classification and localization give the final result.
The network structure of D adopts a VGG network; the activation function between layers is LeakyReLU and the number of base channels is 64. After feature extraction the output feature map is 4 × 512, and the classifier finally outputs the probability value. The perceptual loss used in network training is the average difference between the VGG19 feature outputs of the ISR image produced by the generator and of the HR image.
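A minimal sketch of such a VGG-style discriminator is given below, assuming PyTorch: LeakyReLU activations, 64 base channels and a small classifier that outputs a real/fake probability; the exact depth, strides and pooling are assumptions rather than the patent's exact layout.

```python
import torch
import torch.nn as nn

class VGGStyleDiscriminator(nn.Module):
    """VGG-style discriminator sketch: strided conv blocks with LeakyReLU
    (base channels 64), then a classifier producing a real/fake probability."""
    def __init__(self, in_ch=3, base_ch=64):
        super().__init__()
        layers, ch = [], in_ch
        for out_ch in (base_ch, base_ch * 2, base_ch * 4, base_ch * 8):
            layers += [
                nn.Conv2d(ch, out_ch, 3, stride=1, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1),  # halve the spatial size
                nn.LeakyReLU(0.2, inplace=True),
            ]
            ch = out_ch
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(4),                 # pooled 4 x 4 x 512 feature map
            nn.Flatten(),
            nn.Linear(ch * 4 * 4, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024, 1),
        )

    def forward(self, x):
        return torch.sigmoid(self.classifier(self.features(x)))  # probability of "real HR image"

d = VGGStyleDiscriminator()
print(d(torch.randn(1, 3, 128, 128)).shape)   # torch.Size([1, 1])
```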
Setting and training the network parameters: the network model is trained with the Adam optimizer, where β1 = 0.9 and β2 = 0.999, and the generator and the discriminator are updated alternately until convergence. β1 is the exponential decay rate that controls the weight of the momentum term relative to the current gradient; β2 is the exponential decay rate that controls the influence of the squared gradient.
During training, the G and D networks are updated alternately with Adam; the maximum number of iterations is 500,000, the initial learning rate is 0.0001, and the learning rate is halved after 50k, 100k and 250k iterations. The residual information in the residual structure described above needs to be scaled, so the scaling factor is set to 0.2.
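A minimal PyTorch sketch of this optimizer setup and schedule follows; the stand-in generator/discriminator modules and the loss terms are placeholders (the real networks and losses are those described in this document), so only the Adam coefficients, learning-rate milestones and alternating-update pattern come from the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

generator = nn.Conv2d(3, 3, 3, padding=1)             # stand-in for the SRResNet-based G
discriminator = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1),
                              nn.AdaptiveAvgPool2d(1), nn.Flatten())  # stand-in for D

# beta1 weights the momentum term, beta2 the running average of the squared gradient.
g_opt = Adam(generator.parameters(), lr=1e-4, betas=(0.9, 0.999))
d_opt = Adam(discriminator.parameters(), lr=1e-4, betas=(0.9, 0.999))
g_sched = MultiStepLR(g_opt, milestones=[50_000, 100_000, 250_000], gamma=0.5)
d_sched = MultiStepLR(d_opt, milestones=[50_000, 100_000, 250_000], gamma=0.5)

for step in range(500_000):                            # maximum number of iterations
    lr_batch = torch.randn(4, 3, 32, 32)               # placeholder low-resolution batch
    d_opt.zero_grad(); g_opt.zero_grad()
    fake = generator(lr_batch)
    d_loss = F.binary_cross_entropy_with_logits(       # D update (fake half only, as a sketch)
        discriminator(fake.detach()), torch.zeros(4, 1))
    d_loss.backward(); d_opt.step(); d_sched.step()
    g_loss = F.binary_cross_entropy_with_logits(       # alternating G update
        discriminator(fake), torch.ones(4, 1))
    g_loss.backward(); g_opt.step(); g_sched.step()
    break                                              # single illustrative iteration
```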
In step (3), a CNN is trained to fit a density function that serves as a supervision signal: given an input picture, it outputs the object density at every position. The density function is given in formula (1); the density of target i is defined as the maximum bounding-box IoU it attains with any other target in the label set:

d_i := max_{b_j ∈ G, j ≠ i} iou(b_i, b_j)   (1)

The dynamic suppression strategy is then updated from the density definition as follows:

N_M := max(N_t, d_M)   (2)

s_i := { s_i, if iou(M, b_i) < N_M;  0, if iou(M, b_i) ≥ N_M }   (3)

where N_M denotes the adaptive threshold of target M and d_M the density of target M. The dynamic suppression strategy covers three cases: (1) when a neighboring bounding box is far from M, i.e. iou(M, b_i) < N_M, the behavior is consistent with the initial NMS threshold; (2) when M lies in a dense region, i.e. d_M > N_t, the density value of M is used as the adaptive threshold N_M of the A-NMS, so that neighboring candidate regions, which may lie around M, are retained; (3) for objects in sparse regions, i.e. d_M < N_t, the NMS threshold equals N_t, which reduces FP.
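The dynamic suppression strategy can be sketched as follows; this is a minimal NumPy implementation of the rule N_M = max(N_t, d_M), where the per-box density scores are assumed to be supplied by the trained density network and the toy boxes and densities are purely illustrative.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all given as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def adaptive_nms(boxes, scores, densities, nt=0.5):
    """Keep the highest-scoring box M, suppress neighbours whose IoU with M
    exceeds the adaptive threshold N_M = max(N_t, d_M), and repeat."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        m = order[0]
        keep.append(int(m))
        n_m = max(nt, densities[m])            # adaptive threshold for target M
        rest = order[1:]
        order = rest[iou(boxes[m], boxes[rest]) < n_m]
    return keep

# Toy example: two heavily overlapping boxes in a "dense" region both survive,
# while the fixed threshold N_t alone would have suppressed the second one.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
densities = np.array([0.85, 0.85, 0.0])
print(adaptive_nms(boxes, scores, densities))  # [0, 1, 2]
```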
In step (3), the adaptive Gaussian distribution modeling uses a Gaussian distribution function to model the predicted coordinates: the mean of the output coordinates serves as the mean of the Gaussian distribution, and the variance represents the uncertainty of the predicted localization.
In step (3), the uncertainty of the box coordinates in the adaptive Gaussian distribution modeling can be modeled and evaluated with a Gaussian model for each of the center coordinates, the width and the height. For a given test sample x, the output y can be modeled with Gaussian parameters over t_x, t_y, t_w, t_h to express the uncertain localization, as shown below:

P(y | x) = N(y; μ(x), Σ(x))

where μ(x) and Σ(x) are respectively the mean and the variance of the box coordinate, and y is a value under that Gaussian distribution.
To predict the uncertainty of the box coordinates, the coordinates predicted on the feature map become the means and variances of the Gaussian model, so the detection-layer output is

(μ_tx, Σ_tx, μ_ty, Σ_ty, μ_tw, Σ_tw, μ_th, Σ_th)

Considering the nature of the Gaussian distribution and the structure of the YOLOv3 detection layer, the variance of each Gaussian is constrained to lie between 0 and 1, fixing the range of the variance. The Gaussian parameters are therefore pre-processed with the following formulas to establish the four Gaussian distributions:

μ_tx = σ(μ'_tx),  μ_ty = σ(μ'_ty),  μ_tw = μ'_tw,  μ_th = μ'_th

Σ_tx = σ(Σ'_tx),  Σ_ty = σ(Σ'_ty),  Σ_tw = σ(Σ'_tw),  Σ_th = σ(Σ'_th)

where the primed symbols denote the raw outputs of the detection layer and σ(·) is the sigmoid function. In the detection layer, the mean of each coordinate (μ_tx, μ_ty, μ_tw, μ_th) represents the predicted coordinate of the Gaussian model, and the variance of each coordinate (Σ_tx, Σ_ty, Σ_tw, Σ_th) represents its uncertainty. Because μ_tx and μ_ty represent the center coordinates of the box, they are processed into values between 0 and 1 with the sigmoid function; the variance of each coordinate is likewise processed into a value between 0 and 1 with the sigmoid function and represents the reliability of the coordinate. In YOLOv3 the height and width of the bounding box are handled through the prior (anchor) boxes as t_w and t_h, i.e. the Gaussian parameters μ_tw and μ_th represent t_w and t_h in the YOLO network.
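A small sketch of how such a Gaussian detection head could split its raw outputs into means and variances is given below, assuming PyTorch; the channel ordering and tensor shapes are assumptions for illustration, not the exact layout of the patent's detection layer.

```python
import torch

def gaussian_head(raw):
    """Split a raw prediction of shape (..., 8) into Gaussian box parameters:
    the x/y means are squashed with sigmoid (box center lies inside the cell),
    the w/h means stay raw (prior-box offsets t_w, t_h), and all four
    variances are squashed with sigmoid so each uncertainty lies in (0, 1)."""
    mu_x, mu_y, mu_w, mu_h, s_x, s_y, s_w, s_h = raw.unbind(dim=-1)
    mu = torch.stack([torch.sigmoid(mu_x), torch.sigmoid(mu_y), mu_w, mu_h], dim=-1)
    sigma = torch.sigmoid(torch.stack([s_x, s_y, s_w, s_h], dim=-1))
    return mu, sigma   # predicted coordinates and their uncertainties

raw = torch.randn(2, 13, 13, 3, 8)     # (batch, grid, grid, anchors, 8 Gaussian parameters)
mu, sigma = gaussian_head(raw)
print(mu.shape, sigma.shape)            # both torch.Size([2, 13, 13, 3, 4])
```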
AN-Gaussian YOLOv3 algorithm flow: the step-by-step algorithm table is given as an image in the original filing.
The R-Darknet-53 network obtained by optimizing YOLO on the ImageNet classification task is used as the initialization weights of the AN-Gaussian YOLOv3 backbone network; an Adam optimizer performs back-propagation to update the network parameters, the batch size is set to 64 and the initial learning rate to 0.001, 60,000 iterations are run, and the learning rate is halved after 10k, 20k and 250k iterations; multi-scale training is adopted, with the input resolution adjusted every 10 iterations.
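A small sketch of the multi-scale training schedule just mentioned (resolution changed every 10 iterations); the candidate sizes, chosen here as multiples of 32 to match the YOLOv3 stride, are an assumption, since the text only specifies the 10-iteration cadence.

```python
import random

def pick_resolution(iteration, current, sizes=tuple(range(320, 640, 32))):
    """Every 10 iterations draw a new input resolution from the candidate set."""
    return random.choice(sizes) if iteration % 10 == 0 else current

resolution = 416
for it in range(1, 31):
    resolution = pick_resolution(it, resolution)
print("input resolution after 30 iterations:", resolution)
```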
As can be seen from the detection results in FIG. 5, the algorithm localizes the distant small target in the upper part of the figure accurately and can extract deep semantic information to further improve generalization; in the middle and lower parts of the figure, the model still localizes the targets accurately when they are occluded and when the background information is close to the target information. The AN-Gaussian YOLOv3 algorithm designed by the invention therefore generalizes and adapts well to both high target density and sparse targets, its localization accuracy is greatly improved, and it still localizes targets accurately under poor illumination and target occlusion. However, when the image resolution is low, noise affects the accuracy to some extent.
FIG. 6 shows the relationship between IoU and localization uncertainty on the KITTI data set: as the localization uncertainty decreases, the IoU increases; the larger the IoU, the smaller the localization uncertainty and the closer the prediction is to the ground-truth label. The comprehensive uncertainty evaluation index of the proposed adaptive Gaussian distribution detection algorithm therefore effectively represents the confidence of the predicted coordinates.
In step (4), the image edge-enhancement GAN network and AN-Gaussian YOLOv3 are jointly designed into a unified framework: the loss function of the model is redesigned and the detection loss of the AN-Gaussian YOLOv3 algorithm is added into the discriminator loss, which improves image quality and localization accuracy, saves hardware resources and meets the system requirements for real-time performance and high accuracy; the method is tested on the COCO data set and compared with other algorithms to verify its effectiveness.
When the detection result is accurate, the loss function substitutes the predicted coordinate into the Gaussian probability density function of formula (9) to judge the accuracy of the result; the larger the probability, the more accurate the prediction:

N(y; μ, Σ) = (1 / √(2πΣ)) · exp(−(y − μ)² / (2Σ))   (9)

The box-coordinate regression loss is therefore reconstructed as a negative log-likelihood loss over the Gaussian probability density values, denoted L_x, L_y, L_w and L_h for the coordinate components t_x, t_y, t_w and t_h respectively, as in formulas (10)-(13). For the x component,

L_x = − Σ_i Σ_j Σ_k γ_ijk · log( N( x^G_ijk | μ_tx(x_ijk), Σ_tx(x_ijk) ) + ε )   (10)

where the sums run over the grid cells and anchor boxes of the detection layer, x^G_ijk is the ground-truth coordinate, γ_ijk weights the anchors assigned to objects, and ε is a small constant for numerical stability; L_y, L_w and L_h in formulas (11)-(13) are defined analogously.
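A small sketch of this negative log-likelihood regression term for a single predicted box is given below, assuming PyTorch; the ε constant and the sum reduction are assumptions.

```python
import math
import torch

def gaussian_nll(mu, sigma, target, eps=1e-9):
    """Negative log-likelihood of the ground-truth coordinates under the
    predicted Gaussians N(target; mu, sigma); confident (low-variance) but
    wrong predictions are penalized most heavily."""
    var = sigma + eps
    pdf = torch.exp(-(target - mu) ** 2 / (2 * var)) / torch.sqrt(2 * math.pi * var)
    return -torch.log(pdf + eps).sum()

mu = torch.tensor([0.50, 0.50, 0.20, 0.20])       # predicted t_x, t_y, t_w, t_h means
sigma = torch.tensor([0.10, 0.10, 0.10, 0.10])    # predicted uncertainties
target = torch.tensor([0.55, 0.48, 0.25, 0.18])   # ground-truth coordinates
print(gaussian_nll(mu, sigma, target))
```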
Localization uncertainty evaluation index: after the localization uncertainty of the four coordinate components is obtained through Gaussian modeling, the aim is to push the score of bounding boxes with high uncertainty below the threshold and thereby improve detection accuracy. The following comprehensive uncertainty evaluation index is proposed:

C_r = σ_obj × U_s_ijk   (14)

where U_s_ijk, given by formula (15), is the localization confidence of the predicted value, computed from the uncertainties of the four coordinate components. Because the localization uncertainty is determined jointly by the position coordinates, it can serve as a measure of localization confidence. Multiplying the original target confidence by the localization confidence gives the comprehensive uncertainty score: when localization is accurate, the localization confidence is close to 1; when localization is poor, it approaches 0, the original target confidence is multiplied by a small coefficient, and the comprehensive score drops markedly, so predictions below the threshold are filtered out, FP is reduced to some extent, and detection accuracy improves.
Composite loss function: the detection loss of the AN-Gaussian YOLOv3 algorithm is introduced into the total discriminator loss, and the total discriminator network loss function is redesigned to optimize and update the generator network and the detection network; the total loss is defined as the weighted sum of the total loss of the edge enhancement network, the classification loss of AN-Gaussian YOLOv3 and the uncertainty regression loss under the Gaussian distribution:

L_all = L_G_all + ξ · L_det_AN-Gaussian-YOLOv3   (16)

L_det_AN-Gaussian-YOLOv3 = L_cls_AN-Gaussian-YOLOv3 + L_reg_AN-Gaussian-YOLOv3   (17)

(Formula (18), which expands this weighted sum, is given as an image in the original filing.)
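In code form, the weighted combination of formulas (16)-(17) reduces to the following sketch, with ξ as the weighting coefficient.

```python
def total_loss(l_g_all, l_cls, l_reg, xi=1.0):
    """L_all = L_G_all + xi * (L_cls + L_reg), per formulas (16) and (17)."""
    l_det = l_cls + l_reg
    return l_g_all + xi * l_det

print(total_loss(0.8, 0.3, 0.2))   # 1.3
```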
Experimental setup: the network is initialized with VGG19 weights pre-trained on ImageNet and the weighting coefficient is initialized to 1; the algorithm model is trained end-to-end. During training the learning rate is set to 0.0001 and halved every 50k iterations; the batch size is set to 5, and Adam is used as the optimizer to update the weights until the whole architecture converges. The generator G uses 23 RRDB blocks, and the EEN uses 6 RRDB dense residual blocks.
The EEN consists of a densely connected sub-network and a Mask branch network. Using an attention mechanism, the constructed Mask branch focuses the network on the real edge information so as to remove noise and artifacts, and a mapping function F is finally learned to reconstruct the corresponding real image from the given low-resolution input.
For a given sample I_base, edge information is first detected and extracted with the Laplacian operator. The Laplacian L(x, y) of an image I(x, y) is defined as the second-order partial derivative of the image, and E(x, y) is defined as the extracted edge feature:

L(x, y) = ∂²I(x, y)/∂x² + ∂²I(x, y)/∂y²

E(x, y) = I(x+1, y) + I(x−1, y) + I(x, y+1) + I(x, y−1) − 4·I(x, y)

The edge information is then extracted through a skip connection and mapped into the low-resolution space; compared with operating in the high-resolution space, this reduces the amount of computation. Meanwhile, the constructed Mask branch learns from the image to detect and eliminate the separated-out noise, i.e. the false edge points produced during edge extraction; the extracted edge features are then mapped into the high-resolution space through a sub-pixel-convolution upsampling operation to obtain the super-resolution image.
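A minimal sketch of the Laplacian edge extraction and the sub-pixel-convolution upsampling described above, assuming PyTorch; the kernel, channel counts and scale factor are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 3 x 3 discrete Laplacian kernel used to pull out high-frequency edge responses.
LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def extract_edges(img):
    """Per-channel Laplacian filtering E(x, y) of an image batch (B, C, H, W)."""
    c = img.shape[1]
    return F.conv2d(img, LAPLACIAN.repeat(c, 1, 1, 1), padding=1, groups=c)

class SubPixelUpsample(nn.Module):
    """Sub-pixel convolution: a conv expands the channels by r^2, then
    PixelShuffle rearranges them into an r-times larger feature map."""
    def __init__(self, channels, r=2):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels * r * r, 3, padding=1)
        self.shuffle = nn.PixelShuffle(r)

    def forward(self, x):
        return self.shuffle(self.conv(x))

x = torch.randn(1, 3, 64, 64)
edges = extract_edges(x)                          # low-resolution edge features
print(SubPixelUpsample(3)(edges).shape)           # torch.Size([1, 3, 128, 128])
```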
The loss function of the EEN is defined as the sum of an image consistency loss and an edge consistency loss. When training the network, the Charbonnier loss between the medium-resolution and high-resolution images, called the image consistency loss, is used:

L_img_cl = (1/N) Σ ρ(I_HR − I_ISR),  with ρ(x) = √(x² + ε²)

It represents the distance between the high-resolution and super-resolution images, which favors images with good edge information. However, object edges can still be damaged and noise generated, so good edge information may not be obtained from this term alone. To account for the edge loss, an edge consistency loss is therefore introduced, which evaluates the Charbonnier loss between the edge information extracted from the medium-resolution image and that extracted from the high-resolution image:

L_edge_cl = (1/N) Σ ρ(E(I_HR) − E(I_ISR))

Finally, the total consistency loss is the sum of the image and edge losses:

L_EEN = L_img_cl + L_edge_cl
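A small sketch of the two Charbonnier consistency terms and their sum L_EEN, assuming PyTorch and an ε of 1e-3; the stand-in tensors take the place of the real SR/HR images and their extracted edge maps.

```python
import torch

def charbonnier(x, eps=1e-3):
    """Charbonnier penalty: a smooth, robust approximation of the L1 distance."""
    return torch.sqrt(x * x + eps * eps).mean()

def een_consistency_loss(sr, hr, sr_edges, hr_edges):
    """L_EEN = L_img_cl + L_edge_cl: Charbonnier distance between the SR and HR
    images plus Charbonnier distance between their edge maps."""
    return charbonnier(hr - sr) + charbonnier(hr_edges - sr_edges)

sr, hr = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
sr_e, hr_e = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)   # stand-in edge maps
print(een_consistency_loss(sr, hr, sr_e, hr_e))
```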
As can be seen from FIG. 7(a), the model with the added adaptive uncertainty algorithm detects unoccluded targets well. FIG. 7(b) shows that when the target density is high, the localization accuracy is higher than that of the model without the uncertainty algorithm, and the parameters are adjusted in time according to the target density, giving better generalization. With the localization uncertainty algorithm added, the predicted bounding boxes accurately detect the target objects in the images; the dashed boxes represent the localization produced by the adaptive NMS-Gaussian distribution algorithm.
Based on the ModelArts cloud AI development platform, the PyTorch model trained with the adaptive Gaussian distribution uncertainty algorithm is converted into a pb model suitable for the HiLens Kit; in the HiLens Studio IDE multi-language integrated development environment a skill template based on PyTorch 1.0 and Python 3.6 is written, the ModelArts pb model is imported from the OBS storage server, and the inference logic code is written; an .om skill model is finally obtained in HiLens Studio, and after the skill code is compiled and debugged, the skill is published, deployed and run on the edge-side device HiLens Kit.
In order to actually test the intelligent-vehicle perception system in an outdoor environment, a relatively simple control strategy and lane-line recognition algorithm are first designed so that the smart car can move along the drivable area of the lane line according to the detected environment information. First, the inter-frame travel distance of the smart car is computed from its moving speed and the frame interval of the HiLens image processing; then, according to the detection result of the end-to-end joint model, it is judged whether the surroundings at the next frame form a safe drivable area, and the detection result is transmitted to the master control decision system, which controls the next movement of the smart car according to the detection of specific targets. FIGS. 9(a), 9(b) and 9(c) show field tests of the unmanned-vehicle algorithm in an outdoor environment under both good weather and wind-and-snow conditions; the algorithm retains good localization accuracy and generalization even in harsh environments.
The smart car was field-tested in an outdoor environment with the simple control strategy and lane-line recognition algorithm. HiLens captures images of the environment ahead and feeds them into the .om detection model, and the model output drives the next action of the smart car through the controller; the master control system drives the stepper motor and servo motor according to the corresponding action command. The tests were carried out in an outdoor campus environment. In most situations the smart car drives safely along the lane line and localizes surrounding targets accurately, but when the surroundings are complex it cannot reliably judge traffic rules, the ability of a two-dimensional image to judge spatial position is limited, and the control accuracy of specific target-tracking tasks needs further improvement.
The foregoing shows and describes the general principles and principal features of the present invention and its advantages. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principles of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (9)

1. An unmanned vehicle environment detection method based on deep learning, characterized by comprising the following steps:
step (1), feeding the input low-resolution image into the generator G of the generative adversarial network, feeding the image produced by the generator into the edge enhancement network, and then mapping the extracted edge features into the high-resolution space through a sub-pixel-convolution upsampling operation to obtain a super-resolution image;
step (2), generating a medium-resolution image with the generator; then sending the generated medium-resolution image to the edge enhancement network to generate a super-resolution image, and passing the generated super-resolution image to the target detector for the task of classifying and localizing targets;
step (3), adaptive Gaussian distribution modeling; through a dynamic suppression strategy, the threshold is raised when the target density rises and mutual occlusion increases, and lowered when the target density is low and objects appear in isolation;
and step (4), fusing the EEN and the adaptive-uncertainty target localization into a GAN-based end-to-end detection model.
2. The deep learning-based unmanned vehicle environment detection method according to claim 1, characterized in that: in step (1), an attention mechanism is used so that the constructed Mask branch focuses the network on the real edge information in order to remove noise and artifacts; the constructed Mask branch learns from the image to detect and eliminate the separated-out noise, i.e. the false edge points produced when the edges are extracted.
3. The deep learning-based unmanned vehicle environment detection method according to claim 1, characterized in that: in step (2), the edge enhancement network removes image noise and extracts high-frequency edge detail features to finally generate the super-resolution image; the discriminator of the generative adversarial network judges the generated super-resolution image and decides whether it is a fake image, finds the difference between the GT image and the intermediate super-resolution image, and back-propagates that difference to the generator G, which continues generating super-resolution images until the discriminator can no longer distinguish the generated image from the high-resolution image, at which point the training of the whole network is complete.
4. The deep learning-based unmanned vehicle environment detection method according to claim 3, characterized in that: in step (2), the generation branch used during network training adopts the SRResNet structure as the overall network structure: a 9 × 9 convolution layer with stride 1 extracts a primary feature map, RRDB dense residual blocks then extract image semantic information to obtain a sharper edge feature map, the edge feature map is fused with the primary features, and an upsampling operation finally yields the medium-resolution image; the medium-resolution image is fed into the EEN, which extracts edge features to obtain the super-resolution image; finally, the SR image is input into the YOLOv3 detection network, and classification and localization give the final result.
5. The deep learning-based unmanned vehicle environment detection method according to claim 1, characterized in that: in step (3), a CNN is trained to fit a density function that serves as a supervision signal: given an input picture, it outputs the object density at every position; the density function is given in formula (1), where the density of target i is defined as the maximum bounding-box IoU it attains with any other target in the label set:

d_i := max_{b_j ∈ G, j ≠ i} iou(b_i, b_j)   (1)

the dynamic suppression strategy is then updated from the density definition as follows:

N_M := max(N_t, d_M)   (2)

s_i := { s_i, if iou(M, b_i) < N_M;  0, if iou(M, b_i) ≥ N_M }   (3)

where N_M denotes the adaptive threshold of target M and d_M the density of target M; the dynamic suppression strategy covers three cases: (1) when a neighboring bounding box is far from M, i.e. iou(M, b_i) < N_M, the behavior is consistent with the initial NMS threshold; (2) when M lies in a dense region, i.e. d_M > N_t, the density value of M is used as the adaptive threshold N_M of the A-NMS, so that neighboring candidate regions, which may lie around M, are retained; (3) for objects in sparse regions, i.e. d_M < N_t, the NMS threshold equals N_t, so that FP is reduced.
6. The deep learning-based unmanned vehicle environment detection method according to claim 5, characterized in that: in step (3), the adaptive Gaussian distribution modeling uses a Gaussian distribution function to model the predicted coordinates: the mean of the output coordinates serves as the mean of the Gaussian distribution, and the variance represents the uncertainty of the predicted localization.
7. The deep learning-based unmanned vehicle environment detection method according to claim 6, characterized in that: in step (3), the uncertainty of the box coordinates in the adaptive Gaussian distribution modeling can be modeled and evaluated with a Gaussian model for each of the center coordinates, the width and the height; for a given test sample x, the output y can be modeled with Gaussian parameters over t_x, t_y, t_w, t_h to express the uncertain localization, as shown below:

P(y | x) = N(y; μ(x), Σ(x))

where μ(x) and Σ(x) are respectively the mean and the variance of the box coordinate, and y is a value under that Gaussian distribution.
8. The deep learning-based unmanned vehicle environment detection method according to claim 7, characterized in that: to predict the uncertainty of the box coordinates, the coordinates predicted on the feature map become the means and variances of the Gaussian model, so the detection-layer output is

(μ_tx, Σ_tx, μ_ty, Σ_ty, μ_tw, Σ_tw, μ_th, Σ_th)

considering the nature of the Gaussian distribution and the structure of the YOLOv3 detection layer, the variance of each Gaussian is constrained to lie between 0 and 1, fixing the range of the variance; the Gaussian parameters are therefore pre-processed with the following formulas to establish the four Gaussian distributions:

μ_tx = σ(μ'_tx),  μ_ty = σ(μ'_ty),  μ_tw = μ'_tw,  μ_th = μ'_th

Σ_tx = σ(Σ'_tx),  Σ_ty = σ(Σ'_ty),  Σ_tw = σ(Σ'_tw),  Σ_th = σ(Σ'_th)

where the primed symbols denote the raw outputs of the detection layer and σ(·) is the sigmoid function; in the detection layer, the mean of each coordinate (μ_tx, μ_ty, μ_tw, μ_th) represents the predicted coordinate of the Gaussian model, and the variance of each coordinate (Σ_tx, Σ_ty, Σ_tw, Σ_th) represents its uncertainty; because μ_tx and μ_ty represent the center coordinates of the box, they are processed into values between 0 and 1 with the sigmoid function; the variance of each coordinate is likewise processed into a value between 0 and 1 with the sigmoid function and represents the reliability of the coordinate; in YOLOv3 the height and width of the bounding box are handled through the prior (anchor) boxes as t_w and t_h, i.e. the Gaussian parameters μ_tw and μ_th represent t_w and t_h in the YOLO network.
9. The deep learning-based unmanned vehicle environment detection method according to claim 1, characterized in that: in step (4), the image edge-enhancement GAN network and AN-Gaussian YOLOv3 are jointly designed into a unified framework: the loss function of the model is redesigned, the detection loss of the AN-Gaussian YOLOv3 algorithm is added into the discriminator loss, and the method is tested on the COCO data set and compared with other algorithms to verify its effectiveness.
CN202110838473.9A 2021-07-23 2021-07-23 Unmanned vehicle environment detection method based on deep learning Pending CN113436217A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110838473.9A CN113436217A (en) 2021-07-23 2021-07-23 Unmanned vehicle environment detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110838473.9A CN113436217A (en) 2021-07-23 2021-07-23 Unmanned vehicle environment detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN113436217A (en) 2021-09-24

Family

ID=77761668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110838473.9A Pending CN113436217A (en) 2021-07-23 2021-07-23 Unmanned vehicle environment detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN113436217A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963027A (en) * 2021-10-28 2022-01-21 广州文远知行科技有限公司 Uncertainty detection model training method and device, and uncertainty detection method and device
CN114581790A (en) * 2022-03-01 2022-06-03 哈尔滨理工大学 Small target detection method based on image enhancement and multi-feature fusion
CN115359258A (en) * 2022-08-26 2022-11-18 中国科学院国家空间科学中心 Weak and small target detection method and system for component uncertainty measurement
CN115471773A (en) * 2022-09-16 2022-12-13 北京联合大学 Student tracking method and system for intelligent classroom
CN116469047A (en) * 2023-03-20 2023-07-21 南通锡鼎智能科技有限公司 Small target detection method and detection device for laboratory teaching

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458758A (en) * 2019-07-29 2019-11-15 武汉工程大学 A kind of image super-resolution rebuilding method, system and computer storage medium
CN111524135A (en) * 2020-05-11 2020-08-11 安徽继远软件有限公司 Image enhancement-based method and system for detecting defects of small hardware fittings of power transmission line
CN111899172A (en) * 2020-07-16 2020-11-06 武汉大学 Vehicle target detection method oriented to remote sensing application scene
CN112906547A (en) * 2021-02-09 2021-06-04 哈尔滨市科佳通用机电股份有限公司 Railway train windshield breakage fault detection method based on E-YOLO

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458758A (en) * 2019-07-29 2019-11-15 武汉工程大学 A kind of image super-resolution rebuilding method, system and computer storage medium
CN111524135A (en) * 2020-05-11 2020-08-11 安徽继远软件有限公司 Image enhancement-based method and system for detecting defects of small hardware fittings of power transmission line
CN111899172A (en) * 2020-07-16 2020-11-06 武汉大学 Vehicle target detection method oriented to remote sensing application scene
CN112906547A (en) * 2021-02-09 2021-06-04 哈尔滨市科佳通用机电股份有限公司 Railway train windshield breakage fault detection method based on E-YOLO

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jakaria Rabbi et al.: "Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network", Remote Sensing *
Songtao Liu, Di Huang, Yunhong Wang: "Adaptive NMS: Refining Pedestrian Detection in a Crowd", arXiv:1904.03629v1 [cs.CV], 7 Apr 2019 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963027A (en) * 2021-10-28 2022-01-21 广州文远知行科技有限公司 Uncertainty detection model training method and device, and uncertainty detection method and device
CN113963027B (en) * 2021-10-28 2022-09-09 广州文远知行科技有限公司 Uncertainty detection model training method and device, and uncertainty detection method and device
CN114581790A (en) * 2022-03-01 2022-06-03 哈尔滨理工大学 Small target detection method based on image enhancement and multi-feature fusion
CN115359258A (en) * 2022-08-26 2022-11-18 中国科学院国家空间科学中心 Weak and small target detection method and system for component uncertainty measurement
CN115471773A (en) * 2022-09-16 2022-12-13 北京联合大学 Student tracking method and system for intelligent classroom
CN115471773B (en) * 2022-09-16 2023-09-15 北京联合大学 Intelligent classroom-oriented student tracking method and system
CN116469047A (en) * 2023-03-20 2023-07-21 南通锡鼎智能科技有限公司 Small target detection method and detection device for laboratory teaching

Similar Documents

Publication Publication Date Title
CN113436217A (en) Unmanned vehicle environment detection method based on deep learning
CN110059558B (en) Orchard obstacle real-time detection method based on improved SSD network
CN110781838B (en) Multi-mode track prediction method for pedestrians in complex scene
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN110929578B (en) Anti-shielding pedestrian detection method based on attention mechanism
CN111626128B (en) Pedestrian detection method based on improved YOLOv3 in orchard environment
CN109635685A (en) Target object 3D detection method, device, medium and equipment
CN113076871B (en) Fish shoal automatic detection method based on target shielding compensation
JP2020038660A (en) Learning method and learning device for detecting lane by using cnn, and test method and test device using the same
CN111709300B (en) Crowd counting method based on video image
CN113920107A (en) Insulator damage detection method based on improved yolov5 algorithm
CN113408584B (en) RGB-D multi-modal feature fusion 3D target detection method
JP2020038661A (en) Learning method and learning device for detecting lane by using lane model, and test method and test device using the same
CN113705636A (en) Method and device for predicting trajectory of automatic driving vehicle and electronic equipment
CN113936210A (en) Anti-collision method for tower crane
CN112347930A (en) High-resolution image scene classification method based on self-learning semi-supervised deep neural network
CN114049541A (en) Visual scene recognition method based on structural information characteristic decoupling and knowledge migration
CN113177439A (en) Method for detecting pedestrian crossing road guardrail
Yi et al. End-to-end neural network for autonomous steering using lidar point cloud data
CN116664851A (en) Automatic driving data extraction method based on artificial intelligence
CN112614158B (en) Sampling frame self-adaptive multi-feature fusion online target tracking method
Li et al. Detection and discrimination of obstacles to vehicle environment under convolutional neural networks
Shi et al. A novel model based on deep learning for Pedestrian detection and Trajectory prediction
Xu et al. Generative adversarial network for image raindrop removal of transmission line based on unmanned aerial vehicle inspection
Harba et al. Prediction of dust storm direction from satellite images by utilized deep learning neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210924