CN111126338B - Intelligent vehicle environment perception method integrating visual attention mechanism - Google Patents

Intelligent vehicle environment perception method integrating visual attention mechanism

Info

Publication number
CN111126338B
CN201911412860.5A (application) · CN111126338B (publication)
Authority
CN
China
Prior art keywords
weight
features
attention
vehicle
visual attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911412860.5A
Other languages
Chinese (zh)
Other versions
CN111126338A (en)
Inventor
连静
王政皓
李琳辉
周雅夫
尹昱航
李磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201911412860.5A
Publication of CN111126338A
Application granted
Publication of CN111126338B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00: Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features

Abstract

The invention discloses an intelligent vehicle environment perception method integrating a visual attention mechanism, comprising the following steps: inputting the preprocessed disparity map and gray map into a weight-sharing twin convolutional neural network and extracting grayscale features (G features) and depth features (D features); normalizing the D features and the vehicle steering-angle signal with a normalization algorithm to generate an attention distribution weight W related to depth and steering angle; fusing by Hadamard product to generate a visual attention feature A; and inputting the compressed visual attention feature A into a regression prediction network for regression prediction to obtain the position and category of each target. By introducing a visual attention mechanism, the invention reduces the computing resources consumed by irrelevant image regions and achieves higher detection accuracy in attention-focused regions. The invention can reduce the complexity of traffic scenes, reduce the computing resources occupied by irrelevant regions, and improve the real-time performance of target detection.

Description

Intelligent vehicle environment perception method integrating visual attention mechanism
Technical Field
The invention relates to the field of environment perception of intelligent vehicles, in particular to an intelligent vehicle environment perception method based on computer vision.
Background
With the development of automated driving and intelligent connected-vehicle technology, intelligent vehicles have become a current research hotspot. Environment perception is the most challenging technical problem in the intelligent vehicle field, and accurately identifying obstacle information around the vehicle in real time is a prerequisite for automated driving.
Currently, in the field of computer vision, deep-learning-based target detection is the mainstream environment perception approach. It is defined as simulating the human visual perception process through deep learning: images acquired by a vision sensor are processed, and the position and category of each target are identified and marked in the input image. Although deep learning algorithms can achieve high recognition accuracy, under low-cost constraints their recognition speed cannot meet the real-time requirements of automated driving, and their accuracy drops noticeably against complex backgrounds. One important factor limiting the speed of deep learning is that a computer traverses every region of an image indiscriminately, extracting features and classifying each region. A human perceiving an image acquired by the eyes, by contrast, focuses attention on key objects or regions and automatically ignores irrelevant regions, which markedly speeds up image processing and also improves recognition accuracy in the attended regions.
Based on this analysis, if a deep-learning-based target detection algorithm is optimized with the attention characteristics of human drivers, the algorithm's speed can be improved and the accuracy of target recognition in attention-focused regions can be raised. In general, a human driver focuses mainly on regions within a certain distance while driving, and the attention allocated decreases as distance increases; when the vehicle turns to the right, attention is focused mainly on the left region of the visual field, and higher attention is paid to vehicles and obstacles on the left side.
Disclosure of Invention
To solve the above technical problems, the invention provides an intelligent vehicle environment perception method that integrates a visual attention mechanism and balances accuracy with real-time performance.
To achieve this purpose, the technical solution of the invention is as follows: an intelligent vehicle environment perception method integrating a visual attention mechanism comprises the following steps:
A: Image preprocessing. The RGB image output by the binocular stereo vision system is converted to grayscale to generate a gray map; the disparity map is processed with the V-disparity algorithm to extract the ground region; regions exceeding a certain height relative to the vehicle are set as non-interest regions; and the ground region and the non-interest regions are filtered out of the disparity map.
B: The processed disparity map and gray map are input into a weight-sharing twin convolutional neural network, and grayscale features (G features) and depth features (D features) are extracted.
C: The D features and the vehicle steering-angle signal are normalized with a normalization algorithm to generate a weight distribution feature related to depth and steering angle, i.e. the attention distribution weight W. The distribution rule is as follows: the larger the disparity value, the closer the distance it represents, the larger the weight, and the more attention is allocated; the smaller the disparity value, the farther the distance, the smaller the weight, and the less attention is allocated; when the disparity value is smaller than the threshold T, the weight is set to 0. When the vehicle steering sensor reports a positive steering angle, i.e. the vehicle turns right, the left side of the D-feature map is assigned a higher weight, and the larger the steering angle, the higher that weight; the right side is assigned a lower weight, decreasing gradually from left to right. When the steering angle is negative, i.e. the vehicle turns left, the right side of the D-feature map is assigned a higher weight, again growing with the steering angle, while the left side is assigned a lower weight, decreasing gradually from right to left.
The process of generating the attention distribution weight is summarized by the following formula:
[Equation image: piecewise expression for the attention distribution weight W in terms of D_{i,j}, D_max, D_min, θ, and h]
where D_{i,j} is the disparity value of the pixel in row i, column j; D_max and D_min are the maximum and minimum disparity values in the disparity map; θ is the vehicle steering angle; and h is the width of the image.
D: The grayscale features generated in step B and the attention distribution weight W generated in step C are fused by Hadamard product; weighting the grayscale features produces features with a visual attention distribution, i.e. the visual attention feature A. Through Hadamard-product fusion, the value on the visual attention feature map corresponding to any pixel with weight 0 on the attention distribution weight map is also 0, and features with weight 0 are irrelevant features. The fusion formula for the visual attention feature A is as follows:
A=W⊙G
where ⊙ is the Hadamard product operator.
E: The visual attention feature A is input into a sparse compression module, which filters out rows or columns of the input feature map that are almost entirely zero, reducing the proportion of irrelevant features. The compressed visual attention feature A is then input into a regression prediction network for regression prediction, yielding the position and category of each target.
Compared with the prior art, the invention has the following advantages:
1. A visual attention mechanism is embedded into the deep learning model: weights are distributed over the features extracted by the deep network according to the attention model, and the position and category of each target are predicted by regression. Introducing the visual attention mechanism helps reduce the computing resources consumed by irrelevant image regions and yields higher detection accuracy in attention-focused regions.
2. The attention distribution weight is obtained by normalizing the depth features and the vehicle steering-angle signal, which reduces the complexity of the traffic scene, cuts the computing resources occupied by irrelevant regions, and improves the real-time performance of target detection.
3. Because the attention distribution weight is obtained by normalizing the depth features and the vehicle steering-angle signal, the image is processed with emphasis, and attention-focused regions obtain higher detection accuracy.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. As shown in FIG. 1, an intelligent vehicle environment perception method integrating a visual attention mechanism includes the following steps:
A: The RGB image acquired from the binocular stereo vision system is converted to grayscale to generate a gray map. The disparity map output by the binocular stereo vision system is processed with the V-disparity algorithm: the ground region in the disparity map is extracted, regions exceeding a certain height relative to the vehicle are set as non-interest regions, and both the ground region and the non-interest regions are filtered out of the disparity map. The disparity map obtained after V-disparity processing is denoted D_image. A sketch of this step follows.
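For illustration only, the following is a minimal Python sketch of the grayscale conversion and a simplified V-disparity ground filter. The function name preprocess, the parameters max_disp, sky_row, and ground_tol, and the per-row argmax ground model are assumptions not specified in the patent; a practical implementation would fit the ground correlation line in the V-disparity map (e.g. by Hough transform) rather than taking a per-row maximum.

```python
import cv2
import numpy as np

def preprocess(rgb, disparity, max_disp=64, sky_row=120, ground_tol=3):
    # Gray map for the G-feature branch.
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)

    h, w = disparity.shape
    d = np.clip(disparity.astype(np.int32), 0, max_disp - 1)

    # V-disparity map: one disparity histogram per image row; the ground
    # plane appears in it as a dominant slanted line.
    v_disp = np.stack([np.bincount(d[r], minlength=max_disp) for r in range(h)])

    # Crude ground model: treat the strongest disparity bin of each row as
    # the ground disparity and mask pixels close to it.
    ground_disp = v_disp.argmax(axis=1)
    ground_mask = np.abs(d - ground_disp[:, None]) <= ground_tol

    # Rows above `sky_row` stand in for "regions exceeding a certain
    # height of the vehicle" (non-interest regions).
    non_interest = np.zeros((h, w), dtype=bool)
    non_interest[:sky_row, :] = True

    d_image = disparity.copy()
    d_image[ground_mask | non_interest] = 0  # filter ground + non-interest
    return gray, d_image
```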
B: The gray map and the disparity map are input into the weight-sharing twin convolutional neural network to extract grayscale features (G features) and depth features (D features). In this embodiment, two retrained VGGNet convolutional neural networks are adopted as the twin network: the network extracting G features is VGGNet-1, and the network extracting D features is VGGNet-2. A sketch of such an extractor follows.
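A minimal PyTorch sketch of a weight-sharing twin extractor is given below. Applying a single VGG16 trunk to both inputs is what weight sharing amounts to in code; the channel repetition for single-channel inputs and the use of torchvision's pretrained weights are assumptions, since the patent only names the two branches VGGNet-1 and VGGNet-2.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class TwinFeatureExtractor(nn.Module):
    """Weight-sharing twin (Siamese) feature extractor: one VGG16
    convolutional trunk serves both branches, so the G-feature and
    D-feature networks share all parameters."""

    def __init__(self):
        super().__init__()
        self.trunk = vgg16(weights="IMAGENET1K_V1").features

    def forward(self, gray, disparity):
        # Single-channel maps are repeated to 3 channels to fit VGG16
        # (an implementation assumption).
        g = self.trunk(gray.repeat(1, 3, 1, 1))       # G features
        d = self.trunk(disparity.repeat(1, 3, 1, 1))  # D features
        return g, d

# Usage sketch with dummy (N, 1, H, W) inputs:
extractor = TwinFeatureExtractor().eval()
with torch.no_grad():
    g_feat, d_feat = extractor(torch.rand(1, 1, 224, 224),
                               torch.rand(1, 1, 224, 224))
```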
C: The D features and the vehicle steering-angle signal are normalized with a normalization algorithm to generate a weight distribution feature related to depth and steering angle, i.e. the attention distribution weight W. The distribution rule is as follows: the larger the disparity value, the closer the distance it represents, the larger the weight, and the more attention is allocated; the smaller the disparity value, the farther the distance, the smaller the weight, and the less attention is allocated; when the disparity value is smaller than the threshold T, the weight is set to 0 (in this embodiment, T is selected to be 3). When the vehicle steering sensor reports a positive steering angle, i.e. the vehicle turns right, the left side of the D-feature map is assigned a higher weight, and the larger the steering angle, the higher that weight; the right side is assigned a lower weight, decreasing gradually from left to right. When the steering angle is negative, i.e. the vehicle turns left, the right side of the D-feature map is assigned a higher weight, again growing with the steering angle, while the left side is assigned a lower weight, decreasing gradually from right to left.
The process of generating the attention distribution weight is summarized by the following formula (a code sketch follows the variable definitions):
[Equation image: piecewise expression for the attention distribution weight W in terms of D_{i,j}, D_max, D_min, θ, and h]
where D_{i,j} is the disparity value of the pixel in row i, column j; D_max and D_min are the maximum and minimum disparity values in the disparity map; θ is the vehicle steering angle; and h is the width of the image.
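Because the closed-form expression above survives only as an equation image, the sketch below encodes the stated distribution rule in Python. The normalization to [0, 1], the sine-based lateral modulation, and the small epsilon are assumptions, not the patent's exact formula; only the symbols D_{i,j}, T, θ, and h come from the text.

```python
import numpy as np

def attention_weight(d_map, theta, T=3):
    """Attention distribution weight W from a disparity-derived map and
    the steering angle theta (radians, positive = right turn). This
    follows the stated rules only; the exact patent formula is unknown."""
    d_min, d_max = d_map.min(), d_map.max()
    w = (d_map - d_min) / (d_max - d_min + 1e-8)  # nearer -> larger weight

    # Lateral modulation: a positive (rightward) steering angle raises
    # the weight on the left side of the map and lets it fall off toward
    # the right; a negative angle does the mirror image.
    h = d_map.shape[1]                 # image width, `h` in the patent
    cols = np.arange(h)
    lateral = 1.0 + np.sin(theta) * (h - 2.0 * cols) / h
    w = w * lateral[None, :]

    w[d_map < T] = 0.0                 # disparity below threshold T -> weight 0
    return w
```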
D: The grayscale features generated in step B and the attention distribution weight W generated in step C are fused by Hadamard product; weighting the grayscale features produces features with a visual attention distribution, i.e. the visual attention feature A. Through Hadamard-product fusion, the value on the visual attention feature map corresponding to any pixel with weight 0 on the attention distribution weight map is also 0, and features with weight 0 are irrelevant features.
The formula for feature fusion is as follows:
A=W⊙G
where ⊙ is the Hadamard product operator.
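In code the fusion is a single element-wise multiply. Broadcasting one weight channel across all feature channels is an implementation assumption; the patent does not state how W is matched to the feature-map shape.

```python
import torch

W = torch.rand(1, 1, 28, 28)    # attention weights (one channel)
G = torch.rand(1, 512, 28, 28)  # grayscale feature maps
# Element-wise (Hadamard) product; the single weight channel broadcasts
# across all 512 feature channels, zeroing features wherever W is 0.
A = W * G
```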
E: The visual attention feature A is input into a sparse compression module, which filters out rows or columns of the input feature map that are almost entirely zero, reducing the proportion of irrelevant features. The compressed visual attention feature A is then input into a regression prediction network (RPN) for regression prediction, yielding the position and category of each target. A sketch of the compression step follows.
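The sparse compression step can be illustrated by the following sketch, which drops rows and columns of the feature map that carry almost no nonzero entries. The density threshold and the exact criterion are assumptions; the patent only states that sparse rows or columns are filtered out.

```python
import numpy as np

def sparse_compress(a, density_thresh=0.1):
    """Drop rows/columns of the visual attention feature map whose share
    of nonzero entries falls below `density_thresh`; such rows/columns
    carry mostly zero-weighted (irrelevant) features."""
    nz = a != 0
    keep_rows = nz.mean(axis=1) >= density_thresh
    keep_cols = nz.mean(axis=0) >= density_thresh
    return a[keep_rows][:, keep_cols]
```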
The present invention is not limited to this embodiment; any equivalent concept or modification within the technical scope of the present invention falls within its protection scope.

Claims (1)

1. An intelligent vehicle environment perception method integrating a visual attention mechanism, characterized in that the method comprises the following steps:
a: image preprocessing, namely performing gray processing on the RGB image output by a binocular stereo vision system to generate a gray map, processing the disparity map with a V-disparity algorithm, extracting the ground area in the disparity map, setting areas exceeding a certain height relative to the vehicle as non-interest areas, and filtering the ground area and the non-interest areas out of the disparity map;
b: inputting the processed disparity map and the processed gray map into a weight-sharing twin convolutional neural network, and extracting gray features, namely G features, and depth features, namely D features;
c: normalizing the D features and the vehicle steering-angle signal with a normalization algorithm to generate a weight distribution feature related to depth and steering angle, namely the attention distribution weight W, wherein the distribution rule is as follows: the larger the disparity value, the closer the distance it represents, the larger the weight, and the more attention is allocated; the smaller the disparity value, the farther the distance, the smaller the weight, and the less attention is allocated; when the disparity value is smaller than the threshold T, the weight is set to 0; when the vehicle steering sensor reports a positive steering angle, namely the vehicle turns right, the left side of the D-feature map is assigned a higher weight, the larger the steering angle, the higher the assigned weight, the right side is assigned a lower weight, and the right-side weight decreases gradually from left to right; when the steering angle is negative, namely the vehicle turns left, the right side of the D-feature map is assigned a higher weight, the larger the steering angle, the higher the assigned weight, the left side is assigned a lower weight, and the left-side weight decreases gradually from right to left;
the process of generating the attention distribution weight is summarized by the following formula:
[Equation image: piecewise expression for the attention distribution weight W in terms of D_{i,j}, D_max, D_min, θ, and h]
where D_{i,j} is the disparity value of the pixel in row i, column j; D_max and D_min are the maximum and minimum disparity values in the disparity map; θ is the vehicle steering angle; and h is the width of the image;
d: fusing the grayscale features generated in step b and the attention distribution weight W generated in step c by Hadamard product, and weighting the grayscale features to generate features with a visual attention distribution, namely the visual attention feature A; through Hadamard-product fusion, the value on the visual attention feature map corresponding to any pixel with weight 0 on the attention distribution weight map is also 0, and features with weight 0 are irrelevant features; the fusion formula for the visual attention feature A is as follows:
A=W⊙G
where ⊙ is the Hadamard product operator;
e: inputting the visual attention feature A into a sparse compression module, the sparse compression module filtering out rows or columns of the input feature map that are almost entirely zero so as to reduce the proportion of irrelevant features; and inputting the compressed visual attention feature A into a regression prediction network for regression prediction to obtain the position and category of each target.
CN201911412860.5A 2019-12-31 2019-12-31 Intelligent vehicle environment perception method integrating visual attention mechanism Active CN111126338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911412860.5A CN111126338B (en) 2019-12-31 2019-12-31 Intelligent vehicle environment perception method integrating visual attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911412860.5A CN111126338B (en) 2019-12-31 2019-12-31 Intelligent vehicle environment perception method integrating visual attention mechanism

Publications (2)

Publication Number Publication Date
CN111126338A CN111126338A (en) 2020-05-08
CN111126338B true CN111126338B (en) 2022-09-16

Family

ID=70506515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911412860.5A Active CN111126338B (en) 2019-12-31 2019-12-31 Intelligent vehicle environment perception method integrating visual attention mechanism

Country Status (1)

Country Link
CN (1) CN111126338B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508058B (en) * 2020-11-17 2023-11-14 安徽继远软件有限公司 Transformer fault diagnosis method and device based on audio feature analysis

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886269A (en) * 2019-02-27 2019-06-14 南京中设航空科技发展有限公司 A kind of transit advertising board recognition methods based on attention mechanism
CN110378242A (en) * 2019-06-26 2019-10-25 南京信息工程大学 A kind of remote sensing target detection method of dual attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Pedestrian detection based on binocular-vision regions of interest; Ying Guanglin; Information & Communications; 2018-03-15 (No. 03); full text *
Nighttime environment perception for unmanned vehicles based on an improved YOLOv3 network; Pei Jiaxin et al.; Journal of Applied Optics; 2019-05-15 (No. 03); full text *

Also Published As

Publication number Publication date
CN111126338A (en) 2020-05-08

Similar Documents

Publication Publication Date Title
CN110097109B (en) Road environment obstacle detection system and method based on deep learning
CN107506711B (en) Convolutional neural network-based binocular vision barrier detection system and method
CN105550665B (en) A kind of pilotless automobile based on binocular vision can lead to method for detecting area
CN108875608B (en) Motor vehicle traffic signal identification method based on deep learning
CN111460919B (en) Monocular vision road target detection and distance estimation method based on improved YOLOv3
CN111860274B (en) Traffic police command gesture recognition method based on head orientation and upper half skeleton characteristics
WO2021016873A1 (en) Cascaded neural network-based attention detection method, computer device, and computer-readable storage medium
US20230075836A1 (en) Model training method and related device
CN103034843A (en) Method for detecting vehicle at night based on monocular vision
CN111491093A (en) Method and device for adjusting field angle of camera
Wang et al. The research on edge detection algorithm of lane
CN112825192A (en) Object identification system and method based on machine learning
CN104915642A (en) Method and apparatus for measurement of distance to vehicle ahead
CN111860316A (en) Driving behavior recognition method and device and storage medium
CN107220632B (en) Road surface image segmentation method based on normal characteristic
CN117095368A (en) Traffic small target detection method based on YOLOV5 fusion multi-target feature enhanced network and attention mechanism
CN114359876A (en) Vehicle target identification method and storage medium
CN111126338B (en) Intelligent vehicle environment perception method integrating visual attention mechanism
CN117111055A (en) Vehicle state sensing method based on thunder fusion
CN114973199A (en) Rail transit train obstacle detection method based on convolutional neural network
CN106650814B (en) Outdoor road self-adaptive classifier generation method based on vehicle-mounted monocular vision
CN112509321A (en) Unmanned aerial vehicle-based driving control method and system for urban complex traffic situation and readable storage medium
CN113989495B (en) Pedestrian calling behavior recognition method based on vision
CN111062311B (en) Pedestrian gesture recognition and interaction method based on depth-level separable convolution network
CN114429621A (en) UFSA algorithm-based improved lane line intelligent detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant