CN111126338B - Intelligent vehicle environment perception method integrating visual attention mechanism - Google Patents

Intelligent vehicle environment perception method integrating visual attention mechanism

Info

Publication number
CN111126338B
CN201911412860.5A (application) · CN111126338B (publication)
Authority
CN
China
Prior art keywords
weight
features
attention
vehicle
visual attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911412860.5A
Other languages
Chinese (zh)
Other versions
CN111126338A (en)
Inventor
连静
王政皓
李琳辉
周雅夫
尹昱航
李磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201911412860.5A
Publication of CN111126338A
Application granted
Publication of CN111126338B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00: Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features

Abstract

The invention discloses an intelligent vehicle environment perception method integrating a visual attention mechanism, comprising the following steps: inputting the preprocessed disparity map and gray map into a weight-sharing twin convolutional neural network and extracting grayscale features (G features) and depth features (D features); normalizing the D features and the vehicle steering-angle signal with a normalization algorithm to generate an attention distribution weight W related to depth and steering angle; fusing by Hadamard product to generate a visual attention feature A; and inputting the compressed visual attention feature A into a regression prediction network for regression prediction to obtain the position and category of each target. By introducing a visual attention mechanism, the invention reduces the computing resources consumed by irrelevant image regions and achieves higher detection accuracy in attention-focused regions. The invention can reduce the complexity of traffic scenes, reduce the computing resources occupied by irrelevant regions, and improve the real-time performance of target detection.

Description

Intelligent vehicle environment perception method integrating visual attention mechanism
Technical Field
The invention relates to the field of environment perception of intelligent vehicles, in particular to an intelligent vehicle environment perception method based on computer vision.
Background
With the development of automated driving and intelligent connected-vehicle technology, intelligent vehicles have become a current research hotspot. Environment perception is the most challenging technical problem in the intelligent vehicle field, and accurately identifying obstacle information around the vehicle in real time is a prerequisite for automated driving.
Currently, in the field of computer vision, deep-learning-based target detection is the mainstream environment perception approach. It is defined as simulating the human visual perception process through deep learning: images acquired by a vision sensor are processed, and the position and category of each target are identified and marked in the input image. Although deep learning algorithms can achieve high recognition accuracy, under low-cost constraints their recognition speed cannot meet the real-time requirements of automated driving, and their accuracy drops noticeably against complex backgrounds. One important factor limiting the speed of deep learning is that a computer traverses every region of an image indiscriminately, extracting features and classifying each region. A human perceiving an image acquired by the eyes, by contrast, focuses attention on key objects or regions and automatically ignores irrelevant regions, which markedly speeds up image processing and also improves recognition accuracy in the attended regions.
Based on this analysis, if a deep-learning-based target detection algorithm is optimized with the attention characteristics of human drivers, the algorithm's speed can be improved and the accuracy of target recognition in attention-focused regions can be raised. In general, a human driver focuses mainly on regions within a certain distance while driving, and the attention allocated decreases as distance increases; when the vehicle turns to the right, attention is focused mainly on the left region of the visual field, and higher attention is paid to vehicles and obstacles on the left side.
Disclosure of Invention
To solve the above technical problems, the invention provides an intelligent vehicle environment perception method that integrates a visual attention mechanism and balances accuracy with real-time performance.
To achieve this purpose, the technical solution of the invention is as follows: an intelligent vehicle environment perception method integrating a visual attention mechanism comprises the following steps:
A: Image preprocessing. The RGB image output by the binocular stereo vision system is converted to grayscale to generate a gray map; the disparity map is processed with the V-disparity algorithm to extract the ground region; regions exceeding a certain height relative to the vehicle are set as non-interest regions; and the ground region and the non-interest regions are filtered out of the disparity map.
B: The processed disparity map and gray map are input into a weight-sharing twin convolutional neural network, and grayscale features (G features) and depth features (D features) are extracted.
C: The D features and the vehicle steering-angle signal are normalized with a normalization algorithm to generate a weight distribution feature related to depth and steering angle, i.e. the attention distribution weight W. The distribution rule is as follows: the larger the disparity value, the closer the distance it represents, the larger the weight, and the more attention is allocated; the smaller the disparity value, the farther the distance, the smaller the weight, and the less attention is allocated; when the disparity value is smaller than the threshold T, the weight is set to 0. When the vehicle steering sensor reports a positive steering angle, i.e. the vehicle turns right, the left side of the D-feature map is assigned a higher weight, and the larger the steering angle, the higher that weight; the right side is assigned a lower weight, decreasing gradually from left to right. When the steering angle is negative, i.e. the vehicle turns left, the right side of the D-feature map is assigned a higher weight, again growing with the steering angle, while the left side is assigned a lower weight, decreasing gradually from right to left.
The process of generating the attention distribution weight is summarized by the following formula:
[Equation image: piecewise expression for the attention distribution weight W in terms of D_{i,j}, D_max, D_min, θ, and h]
where D_{i,j} is the disparity value of the pixel in row i, column j; D_max and D_min are the maximum and minimum disparity values in the disparity map; θ is the vehicle steering angle; and h is the width of the image.
D: The grayscale features generated in step B and the attention distribution weight W generated in step C are fused by Hadamard product; weighting the grayscale features produces features with a visual attention distribution, i.e. the visual attention feature A. Through Hadamard-product fusion, the value on the visual attention feature map corresponding to any pixel with weight 0 on the attention distribution weight map is also 0, and features with weight 0 are irrelevant features. The fusion formula for the visual attention feature A is as follows:
A=W⊙G
where ⊙ is the Hadamard product operator.
E: The visual attention feature A is input into a sparse compression module, which filters out rows or columns of the input feature map that are almost entirely zero, reducing the proportion of irrelevant features. The compressed visual attention feature A is then input into a regression prediction network for regression prediction, yielding the position and category of each target.
Compared with the prior art, the invention has the following advantages:
1. A visual attention mechanism is embedded into the deep learning model: weights are distributed over the features extracted by the deep network according to the attention model, and the position and category of each target are predicted by regression. Introducing the visual attention mechanism helps reduce the computing resources consumed by irrelevant image regions and yields higher detection accuracy in attention-focused regions.
2. The attention distribution weight is obtained by normalizing the depth features and the vehicle steering-angle signal, which reduces the complexity of the traffic scene, cuts the computing resources occupied by irrelevant regions, and improves the real-time performance of target detection.
3. Because the attention distribution weight is obtained by normalizing the depth features and the vehicle steering-angle signal, the image is processed with emphasis, and attention-focused regions obtain higher detection accuracy.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. As shown in FIG. 1, an intelligent vehicle environment perception method integrating a visual attention mechanism includes the following steps:
A: The RGB image acquired from the binocular stereo vision system is converted to grayscale to generate a gray map. The disparity map output by the binocular stereo vision system is processed with the V-disparity algorithm: the ground region in the disparity map is extracted, regions exceeding a certain height relative to the vehicle are set as non-interest regions, and both the ground region and the non-interest regions are filtered out of the disparity map. The disparity map obtained after V-disparity processing is denoted D_image. A sketch of this step follows.
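For illustration only, the following is a minimal Python sketch of the grayscale conversion and a simplified V-disparity ground filter. The function name preprocess, the parameters max_disp, sky_row, and ground_tol, and the per-row argmax ground model are assumptions not specified in the patent; a practical implementation would fit the ground correlation line in the V-disparity map (e.g. by Hough transform) rather than taking a per-row maximum.

```python
import cv2
import numpy as np

def preprocess(rgb, disparity, max_disp=64, sky_row=120, ground_tol=3):
    # Gray map for the G-feature branch.
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)

    h, w = disparity.shape
    d = np.clip(disparity.astype(np.int32), 0, max_disp - 1)

    # V-disparity map: one disparity histogram per image row; the ground
    # plane appears in it as a dominant slanted line.
    v_disp = np.stack([np.bincount(d[r], minlength=max_disp) for r in range(h)])

    # Crude ground model: treat the strongest disparity bin of each row as
    # the ground disparity and mask pixels close to it.
    ground_disp = v_disp.argmax(axis=1)
    ground_mask = np.abs(d - ground_disp[:, None]) <= ground_tol

    # Rows above `sky_row` stand in for "regions exceeding a certain
    # height of the vehicle" (non-interest regions).
    non_interest = np.zeros((h, w), dtype=bool)
    non_interest[:sky_row, :] = True

    d_image = disparity.copy()
    d_image[ground_mask | non_interest] = 0  # filter ground + non-interest
    return gray, d_image
```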
B: The gray map and the disparity map are input into the weight-sharing twin convolutional neural network to extract grayscale features (G features) and depth features (D features). In this embodiment, two retrained VGGNet convolutional neural networks are adopted as the twin network: the network extracting G features is VGGNet-1, and the network extracting D features is VGGNet-2. A sketch of such an extractor follows.
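A minimal PyTorch sketch of a weight-sharing twin extractor is given below. Applying a single VGG16 trunk to both inputs is what weight sharing amounts to in code; the channel repetition for single-channel inputs and the use of torchvision's pretrained weights are assumptions, since the patent only names the two branches VGGNet-1 and VGGNet-2.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class TwinFeatureExtractor(nn.Module):
    """Weight-sharing twin (Siamese) feature extractor: one VGG16
    convolutional trunk serves both branches, so the G-feature and
    D-feature networks share all parameters."""

    def __init__(self):
        super().__init__()
        self.trunk = vgg16(weights="IMAGENET1K_V1").features

    def forward(self, gray, disparity):
        # Single-channel maps are repeated to 3 channels to fit VGG16
        # (an implementation assumption).
        g = self.trunk(gray.repeat(1, 3, 1, 1))       # G features
        d = self.trunk(disparity.repeat(1, 3, 1, 1))  # D features
        return g, d

# Usage sketch with dummy (N, 1, H, W) inputs:
extractor = TwinFeatureExtractor().eval()
with torch.no_grad():
    g_feat, d_feat = extractor(torch.rand(1, 1, 224, 224),
                               torch.rand(1, 1, 224, 224))
```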
C: The D features and the vehicle steering-angle signal are normalized with a normalization algorithm to generate a weight distribution feature related to depth and steering angle, i.e. the attention distribution weight W. The distribution rule is as follows: the larger the disparity value, the closer the distance it represents, the larger the weight, and the more attention is allocated; the smaller the disparity value, the farther the distance, the smaller the weight, and the less attention is allocated; when the disparity value is smaller than the threshold T, the weight is set to 0 (in this embodiment, T is selected to be 3). When the vehicle steering sensor reports a positive steering angle, i.e. the vehicle turns right, the left side of the D-feature map is assigned a higher weight, and the larger the steering angle, the higher that weight; the right side is assigned a lower weight, decreasing gradually from left to right. When the steering angle is negative, i.e. the vehicle turns left, the right side of the D-feature map is assigned a higher weight, again growing with the steering angle, while the left side is assigned a lower weight, decreasing gradually from right to left.
The process of generating the attention distribution weight is summarized by the following formula (a code sketch follows the variable definitions):
[Equation image: piecewise expression for the attention distribution weight W in terms of D_{i,j}, D_max, D_min, θ, and h]
where D_{i,j} is the disparity value of the pixel in row i, column j; D_max and D_min are the maximum and minimum disparity values in the disparity map; θ is the vehicle steering angle; and h is the width of the image.
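Because the closed-form expression above survives only as an equation image, the sketch below encodes the stated distribution rule in Python. The normalization to [0, 1], the sine-based lateral modulation, and the small epsilon are assumptions, not the patent's exact formula; only the symbols D_{i,j}, T, θ, and h come from the text.

```python
import numpy as np

def attention_weight(d_map, theta, T=3):
    """Attention distribution weight W from a disparity-derived map and
    the steering angle theta (radians, positive = right turn). This
    follows the stated rules only; the exact patent formula is unknown."""
    d_min, d_max = d_map.min(), d_map.max()
    w = (d_map - d_min) / (d_max - d_min + 1e-8)  # nearer -> larger weight

    # Lateral modulation: a positive (rightward) steering angle raises
    # the weight on the left side of the map and lets it fall off toward
    # the right; a negative angle does the mirror image.
    h = d_map.shape[1]                 # image width, `h` in the patent
    cols = np.arange(h)
    lateral = 1.0 + np.sin(theta) * (h - 2.0 * cols) / h
    w = w * lateral[None, :]

    w[d_map < T] = 0.0                 # disparity below threshold T -> weight 0
    return w
```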
D: The grayscale features generated in step B and the attention distribution weight W generated in step C are fused by Hadamard product; weighting the grayscale features produces features with a visual attention distribution, i.e. the visual attention feature A. Through Hadamard-product fusion, the value on the visual attention feature map corresponding to any pixel with weight 0 on the attention distribution weight map is also 0, and features with weight 0 are irrelevant features.
The formula for feature fusion is as follows:
A=W⊙G
where ⊙ is the Hadamard product operator.
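In code the fusion is a single element-wise multiply. Broadcasting one weight channel across all feature channels is an implementation assumption; the patent does not state how W is matched to the feature-map shape.

```python
import torch

W = torch.rand(1, 1, 28, 28)    # attention weights (one channel)
G = torch.rand(1, 512, 28, 28)  # grayscale feature maps
# Element-wise (Hadamard) product; the single weight channel broadcasts
# across all 512 feature channels, zeroing features wherever W is 0.
A = W * G
```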
E: The visual attention feature A is input into a sparse compression module, which filters out rows or columns of the input feature map that are almost entirely zero, reducing the proportion of irrelevant features. The compressed visual attention feature A is then input into a regression prediction network (RPN) for regression prediction, yielding the position and category of each target. A sketch of the compression step follows.
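The sparse compression step can be illustrated by the following sketch, which drops rows and columns of the feature map that carry almost no nonzero entries. The density threshold and the exact criterion are assumptions; the patent only states that sparse rows or columns are filtered out.

```python
import numpy as np

def sparse_compress(a, density_thresh=0.1):
    """Drop rows/columns of the visual attention feature map whose share
    of nonzero entries falls below `density_thresh`; such rows/columns
    carry mostly zero-weighted (irrelevant) features."""
    nz = a != 0
    keep_rows = nz.mean(axis=1) >= density_thresh
    keep_cols = nz.mean(axis=0) >= density_thresh
    return a[keep_rows][:, keep_cols]
```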
The present invention is not limited to this embodiment; any equivalent concept or modification within the technical scope of the present invention falls within its protection scope.

Claims (1)

1. An intelligent vehicle environment perception method integrating a visual attention mechanism, characterized in that the method comprises the following steps:
a: image preprocessing, namely performing gray processing on the RGB image output by a binocular stereo vision system to generate a gray map, processing the disparity map with a V-disparity algorithm, extracting the ground area in the disparity map, setting areas exceeding a certain height relative to the vehicle as non-interest areas, and filtering the ground area and the non-interest areas out of the disparity map;
b: inputting the processed disparity map and the processed gray map into a weight-sharing twin convolutional neural network, and extracting gray features, namely G features, and depth features, namely D features;
c: normalizing the D features and the vehicle steering-angle signal with a normalization algorithm to generate a weight distribution feature related to depth and steering angle, namely the attention distribution weight W, wherein the distribution rule is as follows: the larger the disparity value, the closer the distance it represents, the larger the weight, and the more attention is allocated; the smaller the disparity value, the farther the distance, the smaller the weight, and the less attention is allocated; when the disparity value is smaller than the threshold T, the weight is set to 0; when the vehicle steering sensor reports a positive steering angle, namely the vehicle turns right, the left side of the D-feature map is assigned a higher weight, the larger the steering angle, the higher the assigned weight, the right side is assigned a lower weight, and the right-side weight decreases gradually from left to right; when the steering angle is negative, namely the vehicle turns left, the right side of the D-feature map is assigned a higher weight, the larger the steering angle, the higher the assigned weight, the left side is assigned a lower weight, and the left-side weight decreases gradually from right to left;
the process of generating the attention distribution weight is summarized by the following formula:
[Equation image: piecewise expression for the attention distribution weight W in terms of D_{i,j}, D_max, D_min, θ, and h]
where D_{i,j} is the disparity value of the pixel in row i, column j; D_max and D_min are the maximum and minimum disparity values in the disparity map; θ is the vehicle steering angle; and h is the width of the image;
d: fusing the grayscale features generated in step b and the attention distribution weight W generated in step c by Hadamard product, and weighting the grayscale features to generate features with a visual attention distribution, namely the visual attention feature A; through Hadamard-product fusion, the value on the visual attention feature map corresponding to any pixel with weight 0 on the attention distribution weight map is also 0, and features with weight 0 are irrelevant features; the fusion formula for the visual attention feature A is as follows:
A=W⊙G
where ⊙ is the Hadamard product operator;
e: inputting the visual attention feature A into a sparse compression module, the sparse compression module filtering out rows or columns of the input feature map that are almost entirely zero so as to reduce the proportion of irrelevant features; and inputting the compressed visual attention feature A into a regression prediction network for regression prediction to obtain the position and category of each target.
CN201911412860.5A 2019-12-31 2019-12-31 Intelligent vehicle environment perception method integrating visual attention mechanism Active CN111126338B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911412860.5A CN111126338B (en) 2019-12-31 2019-12-31 Intelligent vehicle environment perception method integrating visual attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911412860.5A CN111126338B (en) 2019-12-31 2019-12-31 Intelligent vehicle environment perception method integrating visual attention mechanism

Publications (2)

Publication Number Publication Date
CN111126338A CN111126338A (en) 2020-05-08
CN111126338B true CN111126338B (en) 2022-09-16

Family

ID=70506515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911412860.5A Active CN111126338B (en) 2019-12-31 2019-12-31 Intelligent vehicle environment perception method integrating visual attention mechanism

Country Status (1)

Country Link
CN (1) CN111126338B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508058B (en) * 2020-11-17 2023-11-14 安徽继远软件有限公司 Transformer fault diagnosis method and device based on audio feature analysis

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886269A (en) * 2019-02-27 2019-06-14 南京中设航空科技发展有限公司 A kind of transit advertising board recognition methods based on attention mechanism
CN110378242A (en) * 2019-06-26 2019-10-25 南京信息工程大学 A kind of remote sensing target detection method of dual attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Pedestrian detection based on binocular-vision regions of interest; Ying Guanglin; Information & Communications; 2018-03-15 (No. 03); full text *
Nighttime environment perception for unmanned vehicles based on an improved YOLOv3 network; Pei Jiaxin et al.; Journal of Applied Optics; 2019-05-15 (No. 03); full text *

Also Published As

Publication number Publication date
CN111126338A (en) 2020-05-08

Similar Documents

Publication Publication Date Title
CN110097109B (en) Road environment obstacle detection system and method based on deep learning
CN107506711B (en) Convolutional neural network-based binocular vision barrier detection system and method
CN105550665B (en) A kind of pilotless automobile based on binocular vision can lead to method for detecting area
CN108875608B (en) Motor vehicle traffic signal identification method based on deep learning
CN111460919B (en) Monocular vision road target detection and distance estimation method based on improved YOLOv3
CN111860274B (en) Traffic police command gesture recognition method based on head orientation and upper half skeleton characteristics
WO2021016873A1 (en) Cascaded neural network-based attention detection method, computer device, and computer-readable storage medium
US20230075836A1 (en) Model training method and related device
CN103034843A (en) Method for detecting vehicle at night based on monocular vision
CN111491093A (en) Method and device for adjusting field angle of camera
Wang et al. The research on edge detection algorithm of lane
CN112825192A (en) Object identification system and method based on machine learning
CN104915642A (en) Method and apparatus for measurement of distance to vehicle ahead
CN111860316A (en) Driving behavior recognition method and device and storage medium
CN107220632B (en) Road surface image segmentation method based on normal characteristic
CN117095368A (en) Traffic small target detection method based on YOLOV5 fusion multi-target feature enhanced network and attention mechanism
CN114359876A (en) Vehicle target identification method and storage medium
CN111126338B (en) Intelligent vehicle environment perception method integrating visual attention mechanism
CN117111055A (en) Vehicle state sensing method based on thunder fusion
CN114973199A (en) Rail transit train obstacle detection method based on convolutional neural network
CN106650814B (en) Outdoor road self-adaptive classifier generation method based on vehicle-mounted monocular vision
CN112509321A (en) Unmanned aerial vehicle-based driving control method and system for urban complex traffic situation and readable storage medium
CN113989495B (en) Pedestrian calling behavior recognition method based on vision
CN111062311B (en) Pedestrian gesture recognition and interaction method based on depth-level separable convolution network
CN114429621A (en) UFSA algorithm-based improved lane line intelligent detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant