CN111126338B - Intelligent vehicle environment perception method integrating visual attention mechanism - Google Patents
Intelligent vehicle environment perception method integrating visual attention mechanism
- Publication number
- CN111126338B (application number CN201911412860.5A)
- Authority
- CN
- China
- Prior art keywords
- weight
- features
- attention
- vehicle
- visual attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W40/00—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Abstract
The invention discloses an intelligent vehicle environment perception method integrating a visual attention mechanism, comprising the following steps: inputting the processed disparity map and gray map into a weight-sharing twin convolutional neural network and extracting grayscale features (G features) and depth features (D features); normalizing the D features and the vehicle steering-angle signal to generate an attention distribution weight W related to depth and steering angle; fusing by Hadamard product to generate a visual attention feature A; and inputting the compressed visual attention feature A into a regression prediction network to obtain the position and category of each target. By introducing a visual attention mechanism, the invention reduces the computing resources consumed by irrelevant regions of the image and achieves higher detection accuracy in the regions where attention is concentrated. The invention can thus reduce the complexity of traffic scenes, cut the computing resources occupied by irrelevant regions, and improve the real-time performance of target detection.
Description
Technical Field
The invention relates to the field of environment perception of intelligent vehicles, in particular to an intelligent vehicle environment perception method based on computer vision.
Background
With the development of automated driving and intelligent networking technologies, intelligent vehicles have become a research hotspot. Environment perception is the most challenging technical problem in the intelligent-vehicle field: accurately identifying obstacle information around the vehicle in real time is a prerequisite for automated driving.
Currently, in the field of computer vision, target detection based on deep learning is the mainstream environment perception method. It simulates the human visual perception process: images acquired by a vision sensor are processed by a deep network, which identifies and marks the position and category of each target in the input image. Although deep learning can achieve high recognition accuracy, its recognition speed cannot meet the real-time requirements of automated driving on low-cost hardware, and its accuracy drops markedly against complex backgrounds. One important factor limiting speed is that a computer traverses every region of the image indiscriminately, extracting features from and classifying each one. A human perceiving an image, by contrast, focuses attention on key objects or regions and automatically ignores irrelevant ones, which markedly speeds up processing and also improves recognition accuracy in the attended regions.
Based on this analysis, optimizing a deep-learning target detection algorithm with the attention characteristics of human drivers can effectively improve its speed and raise recognition accuracy in the attended regions. In general, a human driver focuses mainly on areas within a certain distance while driving; as distance increases, the attention allocated decreases. When the vehicle turns to the right, attention is focused mainly on the left area of the visual field, and higher attention is paid to vehicles and obstacles on the left side.
Disclosure of Invention
To solve the above technical problems, the invention provides an intelligent vehicle environment perception method that integrates a visual attention mechanism and balances accuracy with real-time performance.
To achieve this purpose, the technical scheme of the invention is as follows. An intelligent vehicle environment perception method integrating a visual attention mechanism comprises the following steps:
a: Image preprocessing. Convert the RGB image output by a binocular stereo vision system to grayscale to generate a gray map; process the disparity map with a V-disparity algorithm to extract the ground region; set regions above a certain vehicle height as non-interest regions; and filter the ground region and the non-interest regions out of the disparity map.
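As a rough illustration of this preprocessing stage, the NumPy sketch below converts RGB to grayscale, builds a V-disparity histogram, and removes ground pixels by fitting the dominant ground line. The function names, the `max_d` bound, the least-squares line fit, and the `tol` tolerance are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def to_gray(rgb):
    # Standard luminance conversion of an H x W x 3 RGB frame.
    return (0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]).astype(np.uint8)

def v_disparity(disp, max_d=64):
    # V-disparity map: entry (r, d) counts pixels in image row r with disparity d.
    h = disp.shape[0]
    vd = np.zeros((h, max_d), dtype=np.int32)
    for r in range(h):
        d = disp[r][(disp[r] > 0) & (disp[r] < max_d)].astype(int)
        np.add.at(vd[r], d, 1)
    return vd

def filter_ground(disp, tol=1.5):
    # The ground plane appears as a dominant slanted line in the V-disparity map.
    # Fit that line through each row's strongest disparity bin, then zero out
    # pixels whose disparity lies within `tol` of the fitted line.
    vd = v_disparity(disp)
    rows = np.arange(vd.shape[0])
    peak = vd.argmax(axis=1)
    keep = vd.max(axis=1) > 0                       # ignore rows with no valid disparities
    a, b = np.polyfit(rows[keep], peak[keep], 1)    # ground line: d = a*r + b
    ground_d = a * rows + b
    out = disp.copy()
    out[np.abs(disp - ground_d[:, None]) < tol] = 0
    return out
```

On a synthetic scene whose ground disparity grows linearly with image row, this zeroes the ground while leaving an obstacle patch (constant disparity) untouched.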
B: Input the processed disparity map and gray map into a weight-sharing twin convolutional neural network, and extract grayscale features (G features) and depth features (D features).
C: Normalize the D features and the vehicle steering-angle signal with a normalization algorithm to generate a weight distribution feature related to depth and steering angle, namely the attention distribution weight W. The distribution rules are as follows. The larger the disparity value, the closer the distance it represents, and the larger the weight and the allocated attention; the smaller the disparity value, the farther the distance, and the smaller the weight and the allocated attention; when the disparity value is smaller than a threshold T, the weight is set to 0. When the vehicle steering sensor reports a positive steering angle, i.e. the vehicle turns right, the left side of the D feature map is assigned a higher weight (the larger the steering angle, the higher the weight), while the right side is assigned a lower weight that decreases gradually from left to right. When the steering angle is negative, i.e. the vehicle turns left, the right side of the D feature map is assigned a higher weight (the larger the steering angle, the higher the weight), while the left side is assigned a lower weight that decreases gradually from right to left.
The process of generating the attention assignment weight is summarized as the following formula:
where D_{i,j} is the disparity value of the pixel at row i, column j; D_max and D_min are the maximum and minimum disparity values on the disparity map, respectively; θ is the vehicle steering angle; and H is the width of the image.
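The patent gives the weight formula itself only as a figure, so the sketch below is one plausible instantiation of the stated rules: min-max normalization of disparity, a zero weight below the threshold T, and a column-wise ramp controlled by the steering angle θ. The scale factor `k` and the linear shape of the ramp are assumptions.

```python
import numpy as np

def attention_weights(D, theta, T=3.0, k=0.5):
    # Depth term: nearer objects (larger disparity) get larger weight;
    # disparities below the threshold T get weight 0.
    d_min, d_max = D.min(), D.max()
    W = (D - d_min) / (d_max - d_min + 1e-9)
    W[D < T] = 0.0
    # Steering term: a ramp over columns. theta > 0 (right turn) boosts the
    # left half of the field of view; theta < 0 boosts the right half.
    w_cols = D.shape[1]
    j = np.arange(w_cols) / max(w_cols - 1, 1)    # 0 at left edge, 1 at right edge
    if theta > 0:
        ramp = 1.0 + k * abs(theta) * (1.0 - j)   # decays from left to right
    elif theta < 0:
        ramp = 1.0 + k * abs(theta) * j           # decays from right to left
    else:
        ramp = np.ones(w_cols)
    W = W * ramp[None, :]
    return W / (W.max() + 1e-9)                   # renormalize to [0, 1]
```

For a positive (right-turn) angle the left columns end up weighted above the right ones, and any sub-threshold disparity stays at zero weight, matching the rules above.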
D: Fuse the grayscale features generated in step B with the attention distribution weight W generated in step C by Hadamard product, weighting the grayscale features to produce a feature map with a visual attention distribution, namely the visual attention feature A. After Hadamard-product fusion, every pixel of the visual attention feature map that corresponds to a zero-weight pixel on the attention distribution weight map is also 0; zero-weight features are irrelevant features. The fusion formula for the visual attention feature A is as follows:
A=W⊙G
where ⊙ is the Hadamard product operator.
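In NumPy, the fusion A = W ⊙ G is plain element-wise multiplication. The small example below (with made-up values) shows how zero-weight positions suppress the corresponding grayscale features:

```python
import numpy as np

W = np.array([[1.0, 0.5, 0.0],
              [0.8, 0.0, 0.2]])   # attention distribution weights
G = np.array([[3.0, 4.0, 5.0],
              [6.0, 7.0, 8.0]])   # grayscale features
A = W * G   # for same-shape NumPy arrays, * is exactly the Hadamard product
print(A)    # zero-weight positions come out as 0 in the fused feature
```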
E: Input the visual attention feature A into a sparse compression module, which filters out sparse (mostly zero) rows or columns of the input feature map to reduce the proportion of irrelevant features. Input the compressed visual attention feature A into a regression prediction network for regression prediction, obtaining the position and category of each target.
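The patent does not state the sparse-compression criterion in closed form. One simple reading, sketched below, drops rows and columns whose fraction of non-zero entries falls below a density threshold before the feature map reaches the prediction head; the `min_density` value is an assumption.

```python
import numpy as np

def sparse_compress(A, min_density=0.1):
    # Keep only rows/columns whose fraction of non-zero entries is at least
    # `min_density`; mostly-zero (irrelevant) rows and columns are dropped,
    # shrinking the feature map fed to the regression prediction network.
    rows = (A != 0).mean(axis=1) >= min_density
    cols = (A != 0).mean(axis=0) >= min_density
    return A[rows][:, cols]
```

On a feature map with an all-zero middle row and column, this returns the compact 2x2 map of remaining features.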
Compared with the prior art, the invention has the following advantages:
1. A visual attention mechanism is embedded in the deep learning model: the features extracted by the deep network are weighted according to the attention model, and the position and category of each target are predicted by regression. Introducing the visual attention mechanism reduces the computing resources consumed by irrelevant regions of the image and yields higher detection accuracy in regions where attention is concentrated.
2. The attention distribution weight is obtained by normalizing the depth features and the vehicle steering-angle signal, which reduces the complexity of the traffic scene, cuts the computing resources occupied by irrelevant regions, and improves the real-time performance of target detection.
3. Because the attention distribution weight is derived from the depth features and the steering-angle signal, the image is processed with differing emphasis, and regions where attention is concentrated obtain higher detection accuracy.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. As shown in FIG. 1, an intelligent vehicle environment perception method integrating a visual attention mechanism comprises the following steps:
A: Convert the RGB image acquired from the binocular stereo vision system to grayscale to generate a gray map. Process the disparity map output by the system with a V-disparity algorithm: extract the ground region, set regions above a certain vehicle height as non-interest regions, and filter both out of the disparity map, obtaining the processed disparity map (D map) after V-disparity processing.
B: Input the gray map and disparity map into a weight-sharing twin convolutional neural network and extract grayscale features (G features) and depth features (D features). In this embodiment, two retrained VGGNet convolutional neural networks are adopted as the twin network: VGGNet-1 extracts the G features and VGGNet-2 extracts the D features.
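The defining property of the twin network is that both branches apply the same weights to different inputs. A toy NumPy sketch with a single shared convolution kernel (standing in for the shared VGG parameters; the kernel values and input sizes are arbitrary) illustrates the principle:

```python
import numpy as np

def conv2d(img, kernel):
    # Minimal valid-mode 2-D cross-correlation, enough to illustrate the idea.
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out

# One shared kernel stands in for the shared weights of the two VGG branches.
shared_kernel = np.array([[1., 0., -1.],
                          [2., 0., -2.],
                          [1., 0., -1.]])

gray_map = np.random.rand(8, 8)   # input to the G-feature branch
disp_map = np.random.rand(8, 8)   # input to the D-feature branch

G = conv2d(gray_map, shared_kernel)   # G features
D = conv2d(disp_map, shared_kernel)   # D features, produced with the SAME weights
```

Because the weights are shared, the two feature maps are directly comparable, which is what later allows the depth-derived weight W to be fused with the grayscale features G position by position.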
C: Normalize the D features and the vehicle steering-angle signal with a normalization algorithm to generate a weight distribution feature related to depth and steering angle, namely the attention distribution weight W. The distribution rules are as follows. The larger the disparity value, the closer the distance it represents, and the larger the weight and the allocated attention; the smaller the disparity value, the farther the distance, and the smaller the weight and the allocated attention; when the disparity value is smaller than the threshold T, the weight is set to 0 (in this embodiment, T is selected as 3). When the vehicle steering sensor reports a positive steering angle, i.e. the vehicle turns right, the left side of the D feature map is assigned a higher weight (the larger the steering angle, the higher the weight), while the right side is assigned a lower weight that decreases gradually from left to right. When the steering angle is negative, i.e. the vehicle turns left, the right side of the D feature map is assigned a higher weight (the larger the steering angle, the higher the weight), while the left side is assigned a lower weight that decreases gradually from right to left.
The process of generating the attention assignment weight is summarized as the following formula:
where D_{i,j} is the disparity value of the pixel at row i, column j; D_max and D_min are the maximum and minimum disparity values on the disparity map, respectively; θ is the vehicle steering angle; and H is the width of the image.
D: Fuse the grayscale features generated in step B with the attention distribution weight W generated in step C by Hadamard product, weighting the grayscale features to produce a feature map with a visual attention distribution, namely the visual attention feature A. After Hadamard-product fusion, every pixel of the visual attention feature map that corresponds to a zero-weight pixel on the attention distribution weight map is also 0; zero-weight features are irrelevant features.
The formula for feature fusion is as follows:
A=W⊙G
where ⊙ is the Hadamard product operator.
E: Input the visual attention feature A into a sparse compression module, which filters out sparse (mostly zero) rows or columns of the input feature map to reduce the proportion of irrelevant features. Input the compressed visual attention feature A into a regression prediction network (RPN) for regression prediction, obtaining the position and category of each target.
The present invention is not limited to the above embodiment; any equivalent variation or modification within the technical scope of the present invention falls within its protection scope.
Claims (1)
1. An intelligent vehicle environment perception method integrating a visual attention mechanism, characterized in that the method comprises the following steps:
a: image preprocessing, namely converting an RGB image output by a binocular stereo vision system to grayscale to generate a gray map, processing the disparity map with a V-disparity algorithm, extracting the ground region in the disparity map, setting regions above a certain vehicle height as non-interest regions, and filtering the ground region and the non-interest regions out of the disparity map;
b: inputting the processed disparity map and gray map into a weight-sharing twin convolutional neural network, and extracting grayscale features (G features) and depth features (D features);
c: normalizing the D features and the vehicle steering-angle signal with a normalization algorithm to generate a weight distribution feature related to depth and steering angle, namely the attention distribution weight W, wherein the distribution rules are: the larger the disparity value, the closer the represented distance and the larger the weight and allocated attention; the smaller the disparity value, the farther the represented distance and the smaller the weight and allocated attention; when the disparity value is smaller than the threshold T, the weight is set to 0; when the vehicle steering sensor obtains a positive steering angle, i.e. the vehicle turns right, the left side of the D feature map is assigned a higher weight (the larger the steering angle, the higher the assigned weight), while the right side is assigned a lower weight that decreases gradually from left to right; when the steering angle is negative, i.e. the vehicle turns left, the right side of the D feature map is assigned a higher weight (the larger the steering angle, the higher the assigned weight), while the left side is assigned a lower weight that decreases gradually from right to left;
the process of generating the attention assignment weight is summarized as the following formula:
where D_{i,j} is the disparity value of the pixel at row i, column j; D_max and D_min are the maximum and minimum disparity values on the disparity map, respectively; θ is the vehicle steering angle; and H is the width of the image;
d: fusing the grayscale features generated in step b with the attention distribution weight W generated in step c by Hadamard product, weighting the grayscale features to generate a feature with a visual attention distribution, namely the visual attention feature A; after Hadamard-product fusion, every pixel value on the visual attention feature map corresponding to a zero-weight pixel on the attention distribution weight map is also 0, and zero-weight features are irrelevant features; the fusion formula for the visual attention feature A is as follows:
A=W⊙G
where ⊙ is the Hadamard product operator;
e: inputting the visual attention feature A into a sparse compression module, the sparse compression module filtering out sparse (mostly zero) rows or columns of the input feature map to reduce the proportion of irrelevant features; and inputting the compressed visual attention feature A into a regression prediction network for regression prediction to obtain the position and category of each target.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911412860.5A CN111126338B (en) | 2019-12-31 | 2019-12-31 | Intelligent vehicle environment perception method integrating visual attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911412860.5A CN111126338B (en) | 2019-12-31 | 2019-12-31 | Intelligent vehicle environment perception method integrating visual attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111126338A CN111126338A (en) | 2020-05-08 |
CN111126338B true CN111126338B (en) | 2022-09-16 |
Family
ID=70506515
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911412860.5A Active CN111126338B (en) | 2019-12-31 | 2019-12-31 | Intelligent vehicle environment perception method integrating visual attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111126338B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112508058B (en) * | 2020-11-17 | 2023-11-14 | 安徽继远软件有限公司 | Transformer fault diagnosis method and device based on audio feature analysis |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109886269A (en) * | 2019-02-27 | 2019-06-14 | 南京中设航空科技发展有限公司 | A kind of transit advertising board recognition methods based on attention mechanism |
CN110378242A (en) * | 2019-06-26 | 2019-10-25 | 南京信息工程大学 | A kind of remote sensing target detection method of dual attention mechanism |
- 2019-12-31: application CN201911412860.5A filed; patent CN111126338B active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109886269A (en) * | 2019-02-27 | 2019-06-14 | 南京中设航空科技发展有限公司 | A kind of transit advertising board recognition methods based on attention mechanism |
CN110378242A (en) * | 2019-06-26 | 2019-10-25 | 南京信息工程大学 | A kind of remote sensing target detection method of dual attention mechanism |
Non-Patent Citations (2)
Title |
---|
Pedestrian detection based on regions of interest from binocular vision; Ying Guanglin; Information & Communications; No. 03, 2018-03-15; full text *
Night-time environment perception for unmanned vehicles based on an improved YOLOv3 network; Pei Jiaxin et al.; Journal of Applied Optics; No. 03, 2019-05-15; full text *
Also Published As
Publication number | Publication date |
---|---|
CN111126338A (en) | 2020-05-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110097109B (en) | Road environment obstacle detection system and method based on deep learning | |
CN107506711B (en) | Convolutional neural network-based binocular vision barrier detection system and method | |
CN105550665B (en) | A kind of pilotless automobile based on binocular vision can lead to method for detecting area | |
CN108875608B (en) | Motor vehicle traffic signal identification method based on deep learning | |
CN111460919B (en) | Monocular vision road target detection and distance estimation method based on improved YOLOv3 | |
CN111860274B (en) | Traffic police command gesture recognition method based on head orientation and upper half skeleton characteristics | |
WO2021016873A1 (en) | Cascaded neural network-based attention detection method, computer device, and computer-readable storage medium | |
US20230075836A1 (en) | Model training method and related device | |
CN103034843A (en) | Method for detecting vehicle at night based on monocular vision | |
CN111491093A (en) | Method and device for adjusting field angle of camera | |
Wang et al. | The research on edge detection algorithm of lane | |
CN112825192A (en) | Object identification system and method based on machine learning | |
CN104915642A (en) | Method and apparatus for measurement of distance to vehicle ahead | |
CN111860316A (en) | Driving behavior recognition method and device and storage medium | |
CN107220632B (en) | Road surface image segmentation method based on normal characteristic | |
CN117095368A (en) | Traffic small target detection method based on YOLOV5 fusion multi-target feature enhanced network and attention mechanism | |
CN114359876A (en) | Vehicle target identification method and storage medium | |
CN111126338B (en) | Intelligent vehicle environment perception method integrating visual attention mechanism | |
CN117111055A (en) | Vehicle state sensing method based on thunder fusion | |
CN114973199A (en) | Rail transit train obstacle detection method based on convolutional neural network | |
CN106650814B (en) | Outdoor road self-adaptive classifier generation method based on vehicle-mounted monocular vision | |
CN112509321A (en) | Unmanned aerial vehicle-based driving control method and system for urban complex traffic situation and readable storage medium | |
CN113989495B (en) | Pedestrian calling behavior recognition method based on vision | |
CN111062311B (en) | Pedestrian gesture recognition and interaction method based on depth-level separable convolution network | |
CN114429621A (en) | UFSA algorithm-based improved lane line intelligent detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||