CN115984351A - Unmanned vehicle multi-sensor fusion map building method based on self-attention mechanism learning

Info

Publication number
CN115984351A
Authority
CN
China
Prior art keywords
data, sensor, information, attention, self
Legal status
Pending
Application number
CN202211673929.1A
Other languages
Chinese (zh)
Inventor
蒋方呈
赵紫旭
袁尧
张大霖
梁浩
曹连建
Current Assignee
Huaian Zhongke Jingshang Intelligent Network Research Institute Co ltd
Original Assignee
Huaian Zhongke Jingshang Intelligent Network Research Institute Co ltd
Priority date: 2022-12-26
Filing date: 2022-12-26
Publication date: 2023-04-18
Application filed by Huaian Zhongke Jingshang Intelligent Network Research Institute Co ltd
Priority to CN202211673929.1A
Publication of CN115984351A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The invention relates to an unmanned vehicle multi-sensor fusion map building method based on self-attention mechanism learning, which comprises designing a neural network based on a self-attention mechanism and training that network. The neural network design comprises the following steps: (1) converting the sensor coordinate systems; (2) acquiring depth information; (3) constructing data sets; (4) constructing a network model to extract sensor data features. The neural network training comprises: (1) the network training process; and (2) fusion mapping. The invention provides an unmanned vehicle multi-sensor fusion map building method that accounts for the influence of the external environment and for the correlations among different sensor data.

Description

Unmanned vehicle multi-sensor fusion map building method based on self-attention mechanism learning
Technical Field
The invention relates to the technical field of unmanned systems, in particular to an unmanned vehicle multi-sensor fusion map building method based on self-attention mechanism learning.
Background
Automotive automated driving systems can significantly improve driving safety, reduce traffic accidents, relieve traffic pressure, raise productivity, and accelerate the intelligent and information-based transformation of China's industry. An automated driving system integrates technologies such as sensors, computing, artificial intelligence, and communication. Among these, multi-sensor fusion is a key technology: information and data from multiple sensors or sources are automatically analyzed and integrated under certain criteria to complete the required decisions and estimates. Multi-sensor fusion algorithms can be broadly classified into two categories, stochastic methods and artificial intelligence methods; the artificial intelligence methods mainly comprise fuzzy logic theory methods and artificial neural network methods.
In existing multi-sensor fusion mapping algorithms, the fusion mapping process is strongly affected by interference from the external environment, human factors, and the like. Because the external environment and vehicle motion are complex, different sensors are disturbed under different conditions, and the fused map suffers; in addition, the fusion mapping pipeline of existing multi-sensor fusion algorithms is complex and places high demands on data quality, which hinders the fusion mapping of multi-sensor data. The inventors found that the driving environment is complex and strongly disturbs the data collected by the sensors, and that the correlations among sensor data are insufficiently exploited; strengthening the correlations among the data and weakening the interference of the external environment are therefore important factors to consider when building a multi-sensor data fusion map. To improve the fusion of the relevant data, the following problems need to be solved: (1) how to handle the interference of the external environment on the sensors during multi-sensor fusion mapping, so as to improve the precision of the fused map; (2) how to discover the correlations among different sensors, so as to improve the quality of maps built by fusing data from different sensors.
Disclosure of Invention
The invention aims to overcome the above technical defects and provides an unmanned vehicle multi-sensor fusion map building method that accounts for the influence of the external environment and for the correlations among different sensor data.
In order to solve the technical problems, the technical scheme provided by the invention is as follows: an unmanned vehicle multi-sensor fusion map building method based on self-attention mechanism learning comprises a neural network design based on a self-attention mechanism and training of the neural network, wherein the neural network design comprises the following steps:
(1) Converting the sensor coordinate systems: an RGB-D depth camera, a lidar, an IMU, an illumination sensor, a humidity sensor, a temperature sensor, and the like are deployed on the unmanned vehicle in a reasonable pose arrangement, and the pose observations of the different sensors are unified from their respective coordinate systems into a world coordinate system through coordinate conversion, so that the pose information the different sensors perceive for the same object is expressed in that common world coordinate system;
(2) Acquiring depth information: the depth information measured by the vision system and by the lidar system for the same position in the unmanned vehicle's surroundings is acquired separately;
(3) Constructing data sets: bad data whose measurements are corrupted by factors such as angle and focal length are removed from the collected data, and three-dimensional data based on the RGB-D depth camera, the lidar, and the other sensor information are used to construct data sets for the subsequently constructed neural network;
(4) Constructing a network model to extract sensor data features: a convolutional neural network model based on a self-attention mechanism is constructed, comprising an input layer, a convolutional layer, a pooling layer, a self-attention module, and an output layer;
(5) Constructing an environment information attention module based on a self-attention mechanism: the self-attention mechanism is a variant of the attention mechanism in which the weight assigned to each input item depends on the interactions among the input items themselves, i.e., which items deserve attention is decided by their internal associations; this reduces the dependence on external information and better captures the internal correlations of the data or features.
The neural network training comprises the following steps:
(1) The network training process: comprising initialization, a forward propagation process, and a backward propagation process;
(2) Fusion mapping: the data fusion result is acquired first: the sensor data to be fused are combined into a three-dimensional vector and put into the trained model, and the multi-sensor data fusion result is obtained by computation; the fusion result is then processed with the PCL point cloud library to construct a dense, regular depth information map, and depth information maps from different viewing angles are acquired.
As an improvement, in the data set construction process of step (3), the information and the true measured values obtained by the vision system and the lidar system are first converted into three-dimensional data $Y_v$, $Y_l$, $Y_t$:

$$Y_i = [x_i, y_i, d_i]$$

where $Y_i$ is the three-dimensional data, $x_i, y_i$ are the coordinate information, and $d_i$ is the depth information.
Then the illumination intensity, humidity, and temperature of the current environment are measured and recorded with the illumination sensor, the humidity sensor, and the temperature sensor. Because the data collected by different sensors follow different standards, the readings of the illumination, humidity, and temperature sensors must be standardized and then concatenated, as extension data, onto the three-dimensional data $Y_v$, $Y_l$, $Y_t$:

$$Y_i[n] = \operatorname{concat}[x, y, d, lx, t, h]$$

where $Y_i$ is the extended three-dimensional data, $n$ is the number of data elements in it, $x$ and $y$ are the coordinate information, $d$ is the depth information, $lx$ is the illumination intensity, $t$ is the temperature, and $h$ is the humidity; the elements are spliced together in series by the concat function.
As an improvement, the convolutional layer in step (4) extracts feature information from the constructed three-dimensional data through three-dimensional convolution: local features are extracted by sliding the convolution kernel, and the extracted features are processed by an activation function. The activation function adopts the smoother nonlinear Softplus function, which improves the nonlinear expression capability:

$$y[n] = (x * g)[n] = \sum_{k} x[k]\, g[n-k]$$

$$\operatorname{Softplus}(y[n]) = \log\left(1 + e^{y[n]}\right)$$

where $y$ is the extracted feature information, $x$ is the input sequence, $g$ is the convolution kernel, and Softplus is the activation function that improves the nonlinear expression capability.

The pooling layer changes the dimension of the feature vector and enlarges the receptive field, converting the dimension of the extracted feature information into the dimension expected by the self-attention module. The pooling layer compresses its input: with the pooling region size set to $n \times n$ and average pooling as the criterion, an $m \times m$ input produces the output

$$\text{out}[i, j] = \frac{1}{n^2} \sum_{p=0}^{n-1} \sum_{q=0}^{n-1} \text{in}[i n + p,\; j n + q], \qquad 0 \le i, j < \frac{m}{n}$$
as an improvement, in the step (5), the extracted feature information is put into a self-attention module, the input feature information is connected through a full connection layer W, a data query matrix Q (query), a data keyword matrix K (key), and a data true value V (value) matrix in the self-attention module are created, and Q, K, V are defined as:
Figure BDA0004016690930000033
defining the distribution of attention:
Figure BDA0004016690930000034
Q.K T Putting the obtained object into a pooling layer, compressing the dimension to be consistent with the dimension of the input characteristic to obtain an attention distribution matrix Q k (ii) a The attention distribution matrix is put into a softmax function for normalization to obtain an attention score, and an attention distribution matrix V' is obtained; and performing weighted calculation on the characteristics of the multi-sensor measurement data according to the attention distribution matrix V', and finally obtaining the multi-sensor data fusion information of the point.
With the above method, the invention has the following advantages: (1) the sensor data feature extraction network based on the self-attention mechanism converts the data acquired by the different sensors into corresponding vectors, uses the neural network to extract the corresponding features of each sensor's data, can extract deep features, explores the correlations among the sensor data, and reduces the blindness and uncertainty introduced by manual involvement; (2) the self-attention-based environment information attention mechanism adjusts the data fusion weights according to how strongly each sensor is disturbed when the external environment affects the sensor data, improving the quality of the fused map. In summary, the invention addresses the limited adaptability and limited use of data correlations of existing multi-sensor fusion mapping methods during unmanned vehicle driving. By reasonably deploying sensors such as a camera, a lidar, and an inertial measurement unit, it completes multi-sensor data fusion mapping; during fusion mapping it can attend to the influence of different environments on the sensors and adjust the fusion strategy accordingly, deepen the correlations among different sensor data, adapt to different driving environments, and improve the accuracy of the fused map.
Drawings
FIG. 1 is a flowchart of the fusion mapping of the present invention.
Fig. 2 is a schematic flow diagram of the present invention.
FIG. 3 is a schematic diagram of the network architecture of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
Referring to figs. 1-3, the data of sensors such as the camera, lidar, IMU, and GPS are preprocessed and then input into the self-attention neural network for data fusion. The self-attention neural network can assign dynamic weights to the observations of different sensors in different environments: when illumination strongly affects the visual data, the weight of the visual data's influence on automated driving control is reduced; when the external humidity and temperature are high and the lidar's performance becomes unstable, the influence weight of the lidar measurements on the fused map is reduced; when the vehicle pose changes sharply, as in turning or U-turns, the influence weight of the inertial unit is reduced. This finally ensures the accuracy of the multi-sensor data fusion map in different environments.
1. Neural network design based on self-attention mechanism
1.1 sensor coordinate System conversion
An RGB-D depth camera, a lidar, an IMU, an illumination sensor, a humidity sensor, a temperature sensor, and the like are deployed on the unmanned vehicle in a reasonable pose arrangement, and the pose observations of the different sensors are unified from their respective coordinate systems into a world coordinate system through coordinate conversion, so that the pose information the different sensors perceive for the same object is expressed in that common world coordinate system.
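As an illustration of this coordinate unification (a minimal sketch, not taken from the patent; the extrinsic calibration values are hypothetical and would come from offline calibration), points observed in a sensor's frame can be mapped into the world frame with a homogeneous transform:

```python
import numpy as np

def make_transform(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Assemble a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def sensor_to_world(points: np.ndarray, T_world_sensor: np.ndarray) -> np.ndarray:
    """Map an (N, 3) array of points from a sensor's frame into the world frame."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])  # (N, 4)
    return (T_world_sensor @ homogeneous.T).T[:, :3]

# Hypothetical extrinsics: the lidar sits 1.5 m above the world origin, unrotated.
T_world_lidar = make_transform(np.eye(3), np.array([0.0, 0.0, 1.5]))
print(sensor_to_world(np.array([[10.0, -2.0, 0.3]]), T_world_lidar))
# -> [[10.  -2.   1.8]]
```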
1.2 depth information acquisition
The depth information measured by the vision system and by the lidar system for the same position in the unmanned vehicle's surroundings is acquired separately. Using a distance measuring tool, starting from the sensor position and moving along the horizontal direction, the true horizontal distance from the plane of the measured position to each sensor is measured, which gives the true depth information between that position and each sensor.
1.3 construction of data sets
Bad data with measurement errors caused by factors such as angle and focal length are removed from the collected data. The information and the true measured values obtained by the vision system and the lidar system are converted into three-dimensional data $Y_v$, $Y_l$, $Y_t$:

$$Y_i = [x_i, y_i, d_i]$$

where $Y_i$ is the three-dimensional data, $x_i, y_i$ are the coordinate information, and $d_i$ is the depth information.
The illumination intensity, humidity, and temperature of the current environment are measured and recorded with the illumination sensor, the humidity sensor, and the temperature sensor. Because the data collected by different sensors follow different standards, the readings of the illumination, humidity, and temperature sensors must be standardized and then concatenated, as extension data, onto the three-dimensional data $Y_v$, $Y_l$, $Y_t$:

$$Y_i[n] = \operatorname{concat}[x, y, d, lx, t, h]$$

where $Y_i$ is the extended three-dimensional data, $n$ is the number of data elements in it, $x$ and $y$ are the coordinate information, $d$ is the depth information, $lx$ is the illumination intensity, $t$ is the temperature, and $h$ is the humidity; the elements are spliced together in series by the concat function.
The three-dimensional data based on the sensor information of the RGB-D depth camera, the lidar, and the other sensors are then used to construct the data sets for the subsequently constructed neural network.
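The assembly of one such extended vector can be sketched as follows. This is an illustrative assumption: the patent does not specify the standardization scheme, so z-score standardization is used here, and the normalization statistics are hypothetical.

```python
import numpy as np

def standardize(value: float, mean: float, std: float) -> float:
    """Z-score standardization of one environment reading."""
    return (value - mean) / std

def build_extended_vector(x, y, d, lx, t, h, stats) -> np.ndarray:
    """Mirror Y_i[n] = concat[x, y, d, lx, t, h]: one measurement [x, y, d]
    concatenated with standardized illumination, temperature, and humidity."""
    return np.concatenate([
        [x, y, d],
        [standardize(lx, *stats["lx"]),
         standardize(t, *stats["t"]),
         standardize(h, *stats["h"])],
    ])

# Hypothetical normalization statistics (mean, std) for each environment channel.
stats = {"lx": (500.0, 200.0), "t": (20.0, 8.0), "h": (50.0, 20.0)}
print(build_extended_vector(0.42, 1.30, 7.85, lx=820.0, t=26.0, h=63.0, stats=stats))
```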
1.4 constructing network model to extract sensor data characteristics
A convolutional neural network model based on a self-attention mechanism is constructed, comprising an input layer, a convolutional layer, a pooling layer, a self-attention module, and an output layer.

The convolutional layer extracts feature information from the constructed three-dimensional data through three-dimensional convolution: local features are extracted by sliding the convolution kernel, and the extracted features are processed by an activation function. The activation function adopts the smoother nonlinear Softplus function, which improves the nonlinear expression capability:

$$y[n] = (x * g)[n] = \sum_{k} x[k]\, g[n-k]$$

$$\operatorname{Softplus}(y[n]) = \log\left(1 + e^{y[n]}\right)$$

where $y$ is the extracted feature information, $x$ is the input sequence, $g$ is the convolution kernel, and Softplus is the activation function that improves the nonlinear expression capability.

The pooling layer changes the dimension of the feature vector and enlarges the receptive field, converting the dimension of the extracted feature information into the dimension expected by the self-attention module. The pooling layer compresses its input: with the pooling region size set to $n \times n$ and average pooling as the criterion, an $m \times m$ input produces the output

$$\text{out}[i, j] = \frac{1}{n^2} \sum_{p=0}^{n-1} \sum_{q=0}^{n-1} \text{in}[i n + p,\; j n + q], \qquad 0 \le i, j < \frac{m}{n}$$
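A minimal PyTorch sketch of this feature extractor follows; the channel counts, kernel size, and pooling size are illustrative assumptions, not values given by the patent.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Convolution -> Softplus -> average pooling, mirroring section 1.4."""
    def __init__(self, in_channels: int = 1, out_channels: int = 16, pool_n: int = 2):
        super().__init__()
        self.conv = nn.Conv3d(in_channels, out_channels, kernel_size=3, padding=1)
        self.act = nn.Softplus()          # Softplus(y) = log(1 + e^y)
        self.pool = nn.AvgPool3d(pool_n)  # n x n x n average pooling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(self.act(self.conv(x)))

# One dummy sample: 1 channel over an 8x8x8 grid of three-dimensional data.
out = FeatureExtractor()(torch.randn(1, 1, 8, 8, 8))
print(out.shape)  # torch.Size([1, 16, 4, 4, 4])
```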
1.5 construction of an environmental information attention module based on a self-attention mechanism
The attention mechanism assigns a weight to each input item of the model; the weight represents how much attention the model pays to that part and mimics human attention when processing information, which effectively improves model performance and reduces the amount of computation to a certain extent. The self-attention mechanism is a variant of the attention mechanism in which the weight assigned to each input item depends on the interactions among the input items themselves: which items deserve attention is decided by their internal associations, which reduces the dependence on external information and better captures the internal correlations of the data or features. The extracted feature information is put into the self-attention module, the input feature information is connected through a fully connected layer W, and the data query matrix Q (query), the data keyword matrix K (key), and the data true-value matrix V (value) of the self-attention mechanism are created; Q, K, and V are defined as

$$Q = W_Q Y, \qquad K = W_K Y, \qquad V = W_V Y$$

and the distribution of attention is defined as

$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$$

$Q K^{T}$ is put into a pooling layer and compressed until its dimension matches that of the input features, giving the attention distribution matrix $Q_k$; $Q_k$ is put into the softmax function for normalization to obtain the attention scores, yielding the attention distribution matrix $V'$; the features of the multi-sensor measurement data are weighted according to $V'$, which finally gives the multi-sensor data fusion information for the point.
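The following PyTorch sketch shows the core of such a self-attention module over per-sensor feature vectors. It uses the standard scaled dot-product form; the patent's variant additionally pools $Q K^{T}$ before the softmax, which is omitted here for clarity, and the feature dimension is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SensorSelfAttention(nn.Module):
    """Scaled dot-product self-attention over per-sensor feature vectors."""
    def __init__(self, dim: int):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)  # fully connected layers W
        self.w_k = nn.Linear(dim, dim, bias=False)
        self.w_v = nn.Linear(dim, dim, bias=False)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_sensors, dim) -- one feature vector per sensor stream
        q, k, v = self.w_q(feats), self.w_k(feats), self.w_v(feats)
        scores = q @ k.transpose(-2, -1) / (feats.size(-1) ** 0.5)  # Q K^T / sqrt(d)
        return F.softmax(scores, dim=-1) @ v  # attention-weighted fusion

fused = SensorSelfAttention(dim=32)(torch.randn(1, 3, 32))  # e.g. camera/lidar/IMU
print(fused.shape)  # torch.Size([1, 3, 32])
```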
2. neural network training based on self-attention mechanism
2.1 network training procedure
The constructed neural network is trained; training comprises an initialization process, a forward propagation process, and a backward propagation process.

Forward propagation: each layer applies convolution, pooling, dot-product, and other operations to its input data together with the layer's parameters and biases, computes a linear combination, and applies an activation function to the combined value, finally producing the processing result of the whole network. Backward propagation: after the input data have been propagated forward, the difference between the prediction and the true value is computed through a loss function, the error is propagated backward by gradient descent, and the trainable parameters of each layer of the convolutional neural network are updated layer by layer; the network parameters are updated iteratively according to the loss function until a trained network model is obtained. The loss function uses the cross-entropy function:

$$L = -\sum_{i} y_i' \log(y_i)$$

where $y_i$ is the prediction of a piece of information and $y_i'$ is the probability that the predicted value is correct.

Training ends when the loss function converges or the number of training iterations reaches the maximum, yielding the corresponding neural network model.
2.2 fusion map
2.2.1 obtaining data fusion results
The sensor data to be fused are combined into a three-dimensional vector and put into the trained model, and the multi-sensor data fusion result, namely the coordinates and depth information [x, y, d] of the corresponding point, is obtained by computation; x and y are the two-dimensional coordinates of the corresponding point and d is its depth information. Corresponding data are collected by the sensors at different viewing angles on the unmanned vehicle; the sensor data obtained at each viewing angle are combined into a three-dimensional vector and put into the trained model, which yields the data fusion results for the different viewing angles.
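A minimal sketch of this inference step, assuming a trained model whose output is the fused [x, y, d] for one point (the function name and shapes are hypothetical):

```python
import torch

def fuse_point(model: torch.nn.Module, sensor_vectors) -> tuple:
    """Stack one point's per-sensor vectors, run the trained fusion model,
    and read out the fused coordinates and depth [x, y, d]."""
    batch = torch.tensor(sensor_vectors, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        x, y, d = model(batch).squeeze().tolist()
    return x, y, d
```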
2.2.2 mapping Process
The data fusion results are processed with the PCL (Point Cloud Library) point cloud library to construct a dense, regular depth information map, and depth information maps from different viewing angles are obtained. Because the depth information maps from different viewing angles have different poses, registration is needed: taking the common part of the scene as the reference, the multiple depth information maps from different viewing angles are overlapped and matched into a unified coordinate system. The corresponding rotation matrices and displacement vectors are computed from the camera extrinsics, and the depth information maps from the different viewing angles are converted into the unified world coordinate system. After the depth information maps have been unified in the same world coordinates, if the density of points in an overlapping area is too high, part of the point information is randomly culled, and redundant information such as outliers at the edges of each depth information map is removed.
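The patent performs this step with the PCL point cloud library; purely as an illustration, and with Open3D standing in for PCL so the sketch stays in Python, the merge-thin-clean pipeline might look like this:

```python
import open3d as o3d  # Open3D stands in for PCL in this Python sketch

def merge_depth_maps(clouds, transforms, voxel: float = 0.05):
    """Transform per-view point clouds into the unified world frame,
    merge them, thin over-dense overlap regions, and remove edge outliers."""
    merged = o3d.geometry.PointCloud()
    for cloud, T in zip(clouds, transforms):
        merged += cloud.transform(T)  # rotation matrix + displacement vector
    # Thin over-dense overlap regions (voxel averaging here, whereas the
    # patent culls points randomly).
    merged = merged.voxel_down_sample(voxel)
    # Remove redundant outliers such as those at the map edges.
    merged, _ = merged.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)
    return merged
```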
Because the sensor data from different viewing angles contain overlapping parts, the depth information map must be analyzed: if several depth values appear at the same point position, the multiple depth values of the overlapping point are averaged, and the mean is used as the new depth information of that point. The local depth information maps are then stitched together to obtain local three-dimensional information, and further the three-dimensional information of the whole area;
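A small sketch of this overlap-averaging rule (matching positions by rounded coordinates is a simplifying assumption, not part of the patent):

```python
import numpy as np

def average_overlapping_depths(points, decimals: int = 3):
    """If several depth readings land on the same (x, y) position, replace
    them with their mean, which becomes the point's new depth."""
    buckets = {}
    for x, y, d in points:
        buckets.setdefault((round(x, decimals), round(y, decimals)), []).append(d)
    return [(x, y, float(np.mean(ds))) for (x, y), ds in buckets.items()]

print(average_overlapping_depths([(1.0, 2.0, 5.0), (1.0, 2.0, 5.4), (3.0, 1.0, 2.2)]))
# -> [(1.0, 2.0, 5.2), (3.0, 1.0, 2.2)]
```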
after the three-dimensional information is determined, the two-dimensional image and the three-dimensional information acquired by the vision sensor are registered through an SFM (Structure from Motion) technology, coordinate systems of the two-dimensional image and the three-dimensional information are calibrated, corresponding rotation transformation and translation transformation between the coordinate systems are calculated, the two-dimensional image and the three-dimensional information are unified to the same coordinate system through transformation, the same-name points of the two-dimensional image and the three-dimensional map are in one-to-one correspondence, RGB information of the corresponding points is projected onto the three-dimensional map, and drawing construction is completed.
The present invention and its embodiments have been described above, but the description is not limiting, and the actual structure is not limited to what is shown. Those skilled in the art can make various changes, substitutions, and alterations without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (4)

1. An unmanned vehicle multi-sensor fusion map building method based on self-attention mechanism learning, characterized in that it comprises designing a neural network based on a self-attention mechanism and training the neural network, wherein the neural network design comprises the following steps:
(1) Converting the sensor coordinate systems: an RGB-D depth camera, a lidar, an IMU, an illumination sensor, a humidity sensor, a temperature sensor, and the like are deployed on the unmanned vehicle in a reasonable pose arrangement, and the pose observations of the different sensors are unified from their respective coordinate systems into a world coordinate system through coordinate conversion, so that the pose information the different sensors perceive for the same object is expressed in that common world coordinate system;
(2) Acquiring depth information: the depth information measured by the vision system and by the lidar system for the same position in the unmanned vehicle's surroundings is acquired separately;
(3) Constructing data sets: bad data whose measurements are corrupted by factors such as angle and focal length are removed from the collected data, and three-dimensional data based on the RGB-D depth camera, the lidar, and the other sensor information are used to construct data sets for the subsequently constructed neural network;
(4) Constructing a network model to extract sensor data features: a convolutional neural network model based on a self-attention mechanism is constructed, comprising an input layer, a convolutional layer, a pooling layer, a self-attention module, and an output layer;
(5) Constructing an environment information attention module based on a self-attention mechanism: the self-attention mechanism is a variant of the attention mechanism in which the weight assigned to each input item depends on the interactions among the input items themselves, i.e., which items deserve attention is decided by their internal associations; this reduces the dependence on external information and better captures the internal correlations of the data or features.
The neural network training comprises the following steps:
(1) The network training process: comprising initialization, a forward propagation process, and a backward propagation process;
(2) Fusion mapping: the data fusion result is acquired first: the sensor data to be fused are combined into a three-dimensional vector and put into the trained model, and the multi-sensor data fusion result is obtained by computation; the fusion result is then processed with the PCL point cloud library to construct a dense, regular depth information map, and depth information maps from different viewing angles are acquired.
2. The unmanned vehicle multi-sensor fusion map building method based on self-attention mechanism learning according to claim 1, characterized in that: in the data set construction process of step (3), the information and the true measured values obtained by the vision system and the lidar system are first converted into three-dimensional data $Y_v$, $Y_l$, $Y_t$:

$$Y_i = [x_i, y_i, d_i]$$

where $Y_i$ is the three-dimensional data, $x_i, y_i$ are the coordinate information, and $d_i$ is the depth information.
Then the illumination intensity, humidity, and temperature of the current environment are measured and recorded with the illumination sensor, the humidity sensor, and the temperature sensor. Because the data collected by different sensors follow different standards, the readings of the illumination, humidity, and temperature sensors must be standardized and then concatenated, as extension data, onto the three-dimensional data $Y_v$, $Y_l$, $Y_t$:

$$Y_i[n] = \operatorname{concat}[x, y, d, lx, t, h]$$

where $Y_i$ is the extended three-dimensional data, $n$ is the number of data elements in it, $x$ and $y$ are the coordinate information, $d$ is the depth information, $lx$ is the illumination intensity, $t$ is the temperature, and $h$ is the humidity; the elements are spliced together in series by the concat function.
3. The unmanned vehicle multi-sensor fusion map building method based on self-attention mechanism learning according to claim 1, characterized in that: the convolutional layer in step (4) extracts feature information from the constructed three-dimensional data through three-dimensional convolution: local features are extracted by sliding the convolution kernel, and the extracted features are processed by an activation function. The activation function adopts the smooth nonlinear Softplus function to improve the nonlinear expression capability:

$$y[n] = (x * g)[n] = \sum_{k} x[k]\, g[n-k]$$

$$\operatorname{Softplus}(y[n]) = \log\left(1 + e^{y[n]}\right)$$

where $y$ is the extracted feature information, $x$ is the input sequence, $g$ is the convolution kernel, and Softplus is the activation function that improves the nonlinear expression capability.

The pooling layer changes the dimension of the feature vector and enlarges the receptive field, converting the dimension of the extracted feature information into the dimension expected by the self-attention module. The pooling layer compresses its input: with the pooling region size set to $n \times n$ and average pooling as the criterion, an $m \times m$ input produces the output

$$\text{out}[i, j] = \frac{1}{n^2} \sum_{p=0}^{n-1} \sum_{q=0}^{n-1} \text{in}[i n + p,\; j n + q], \qquad 0 \le i, j < \frac{m}{n}$$
4. The unmanned vehicle multi-sensor fusion map building method based on self-attention mechanism learning according to claim 1, characterized in that: the extracted feature information is put into the self-attention module, the input feature information is connected through a fully connected layer W, and the data query matrix Q (query), the data keyword matrix K (key), and the data true-value matrix V (value) of the self-attention mechanism are created; Q, K, and V are defined as

$$Q = W_Q Y, \qquad K = W_K Y, \qquad V = W_V Y$$

and the distribution of attention is defined as

$$\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$$

$Q K^{T}$ is put into a pooling layer and compressed until its dimension matches that of the input features, giving the attention distribution matrix $Q_k$; $Q_k$ is put into the softmax function for normalization to obtain the attention scores, yielding the attention distribution matrix $V'$; the features of the multi-sensor measurement data are weighted according to $V'$, which finally gives the multi-sensor data fusion information for the point.
CN202211673929.1A (priority date 2022-12-26, filing date 2022-12-26) · Unmanned vehicle multi-sensor fusion map building method based on self-attention mechanism learning · Pending · CN115984351A (en)

Priority Applications (1)

Application Number: CN202211673929.1A · Priority Date: 2022-12-26 · Filing Date: 2022-12-26 · Title: Unmanned vehicle multi-sensor fusion map building method based on self-attention mechanism learning

Applications Claiming Priority (1)

Application Number: CN202211673929.1A · Priority Date: 2022-12-26 · Filing Date: 2022-12-26 · Title: Unmanned vehicle multi-sensor fusion map building method based on self-attention mechanism learning

Publications (1)

Publication Number: CN115984351A · Publication Date: 2023-04-18

Family

Family ID: 85959061

Family Applications (1)

Application Number: CN202211673929.1A · Priority Date: 2022-12-26 · Filing Date: 2022-12-26 · Title: Unmanned vehicle multi-sensor fusion map building method based on self-attention mechanism learning

Country Status (1)

Country: CN · Link: CN115984351A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination