CN112381829A - Autonomous learning navigation method based on visual attention mechanism - Google Patents
- Publication number
- CN112381829A CN112381829A CN202011266136.9A CN202011266136A CN112381829A CN 112381829 A CN112381829 A CN 112381829A CN 202011266136 A CN202011266136 A CN 202011266136A CN 112381829 A CN112381829 A CN 112381829A
- Authority
- CN
- China
- Prior art keywords
- response
- area
- input
- navigation
- region
- Prior art date
- Legal status (the legal status is an assumption and is not a legal conclusion)
- Pending
Classifications
- G06T7/10 — Image analysis: segmentation; edge detection
- G06T7/194 — Segmentation; edge detection involving foreground-background segmentation
- G06T5/70 —
- G01C21/3446 — Route searching; route guidance: details of route searching algorithms, e.g. Dijkstra, A*, arc-flags, using precalculated routes
- G06T2207/10016 — Indexing scheme for image analysis or image enhancement; image acquisition modality: video; image sequence
Abstract
The invention provides an autonomous learning navigation method based on a visual attention mechanism. The method uses a developmental neural network as its core algorithm and adds a visual attention mechanism inspired by the human visual system. It receives input from an image sensor and requires navigation guidance information only at the first two time steps, after which it can continuously output correct navigation information.
Description
Technical Field
The invention relates to the technical fields of computer vision and information technology, and in particular to an autonomous learning navigation method based on a visual attention mechanism.
Background
Vision-assisted navigation is one of the research hotspots in the field of intelligent vehicle navigation. Such a system generates reference information for driving behavior by acquiring and analyzing image information of the scene around the vehicle, thereby eliminating dangerous factors during driving. Vision-based navigation assistance has shown strong performance in intelligent-driving applications such as Lane Departure Warning (LDW), Forward Collision Warning (FCW), Lane Keeping Assistance (LKA), and panoramic parking (SVP). Compared with traditional multi-modal navigation technologies such as ultrasonic radar, lidar, and millimeter-wave radar, the data collected by an image sensor can be greatly compressed through sparsification, placing a low demand on the computing resources of the on-board computer; the approach is therefore more economical and efficient.
In the traditional working mode, visual navigation technology generally uses a deep convolutional network to perform semantic segmentation on the acquired image, separating lane pixels from non-lane pixels, and then corrects the driving process of the vehicle through a control algorithm. However, a large amount of labeled data is needed during training, and the collected data can hardly cover all driving environments, so the generalization ability of the trained model is poor; a model without self-learning capability degrades the system's performance in unfamiliar environments. Meanwhile, because background information in the image is redundant, the noise and interference it introduces also greatly reduce the training speed and robustness of the model. The invention therefore provides an autonomous learning navigation method based on a visual attention mechanism to improve the visual navigation system's resistance to background noise and its generalization to unfamiliar environments.
To overcome the traditional method's insufficient generalization to unknown environments and insufficient resistance to background noise, the invention provides an autonomous learning navigation method based on a visual attention mechanism; the system can learn continuously and autonomously while requiring guidance information only at the first two time steps. In addition, by adding a visual attention mechanism, the model gains the ability to focus on key regions of the image, effectively overcoming the traditional method's sensitivity to noise in complex-background pictures, low learning efficiency, and poor learning effect, and greatly improving the performance of visual navigation.
Disclosure of Invention
In view of the above, the present invention provides an autonomous learning navigation method based on a visual attention mechanism, which includes the following steps:
S1. Acquire the front-end input and back-end input of the visual navigation model, wherein the front-end input is supplied continuously by the image sensor, the back-end input is supplied externally at the first two time steps, and at every subsequent time step it is supplied by the model's output at the previous time step;
S2. Process the front-end input with the attention mechanism, retaining the image of the key region and suppressing the images of the remaining regions;
S3. Compute the inner product of the attention-processed front-end input and the bottom-up weights to obtain the bottom-up partial pre-response; compute the inner product of the back-end input and the top-down weights to obtain the top-down partial pre-response; superpose the two partial pre-responses to obtain the total pre-response; and let the pre-responses compete to obtain the Y-region response;
S4. Compute the inner product of the Y-region response and the bottom-up weights of the Z region to obtain the Z-region response, and map the Z-region response to the effect space to obtain the final navigation output;
S5. The visual navigation model learns and updates autonomously, and the next cycle of the above steps begins; the cycle terminates when front-end input is no longer received.
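The step cycle S1 to S5 can be sketched as a minimal control loop. This is an illustrative skeleton, not the patented implementation: the `model` object and its method names (`preprocess`, `attend`, `compete`, `effect`, `learn`) are hypothetical placeholders for the operations described above.

```python
def run_navigation(frames, guidance_first_two, model):
    """Skeleton of steps S1-S5: external guidance is supplied only at the
    first two time steps; afterwards the model's previous output is fed
    back as the back-end input."""
    outputs = []
    z_prev = None
    for t, frame in enumerate(frames):
        x_r = model.attend(model.preprocess(frame))       # S1-S2: front-end input + attention
        z_r = guidance_first_two[t] if t < 2 else z_prev  # back-end input
        r_y = model.compete(x_r, z_r)                     # S3: Y-region response
        z_prev = model.effect(r_y)                        # S4: navigation output
        outputs.append(z_prev)
        model.learn(x_r, z_r, r_y)                        # S5: autonomous update
    return outputs
```

The loop terminates naturally when the frame sequence ends, matching step S5.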
The technical scheme provided by the invention has the following beneficial effects: the method uses only the guidance information of the first two time steps as the supervision information for model training and uses a developmental neural network as the core processing algorithm, so the model has autonomous learning capability; a visual attention mechanism is added to the model, and top-down attention information is provided as supervision information, which improves the model's robustness to complex background interference.
Drawings
FIG. 1 is a flow chart of the autonomous learning navigation method based on a visual attention mechanism of the present invention;
FIG. 2 is a timing diagram of the autonomous learning navigation method based on a visual attention mechanism of the present invention;
FIG. 3 is a schematic diagram of the core navigation algorithm model of the present invention, a developmental neural network;
FIG. 4 is a schematic view of the visual attention mechanism of the present invention;
FIG. 5 is a schematic diagram of the visual attention generation mechanism of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be further described with reference to the accompanying drawings.
Referring to FIG. 1, the present invention provides an autonomous learning navigation method based on a visual attention mechanism; referring to FIG. 2, in time order the method comprises the following steps:
S1. At the first time step t1, when the navigation method starts to run, execute the following steps:
S11. Acquire an image from the image sensor, preprocess it into a single-channel grayscale map of 38×38 pixels, flatten the grayscale map into 1×1444 one-dimensional data, normalize it, and input it to the front end of the core algorithm model (the X region); this input is denoted x_r;
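Step S11 can be sketched as follows. The patent does not specify which normalization function is used, so unit-length (L2) normalization is assumed here purely for illustration:

```python
import numpy as np

def preprocess_frame(img_gray):
    """Flatten a 38x38 single-channel grayscale image into the 1x1444
    front-end input x_r and normalize it (L2 normalization assumed)."""
    assert img_gray.shape == (38, 38), "expects an already-resized grayscale frame"
    x_r = img_gray.astype(np.float64).reshape(-1)  # 38*38 = 1444 values
    norm = np.linalg.norm(x_r)
    return x_r / norm if norm > 0 else x_r
```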
S12. Input the guidance information to the back end of the core algorithm model (the Z region). The format of the guidance information is as follows:

| Tag | Data format | Description | Example |
| --- | --- | --- | --- |
| z1 | 1×6 | Navigation action guidance | [0,1,0,0,0,0] |
| z2 | 1×4 | GPS guidance | [0,1,0,0] |
| z3 | 1×144 | Attention position | [1,0,0,...,0,0] |
| z4 | 1×8 | Obstacle object | [1,0,0,0,0,0,0,0] |
| z5 | 1×2 | Scale information (global, local) | [1,0] |
Navigation action guidance is required at the first two time steps, the attention position is required during the attention generation phase, and the remaining information is optional.
S13, inputting X to the front end based on different receptive fields of Y regional neuronsrPerforming attention area masking operation, wherein the size of an attention area is only 15 pixels by 15 pixels, the attention area slides in a picture, picture information in the attention area is reserved, and information outside the attention area is suppressed;
S14. Initialize the core algorithm model.
S2. At the second time step t2, execute the following steps:
S21. Execute steps S11-S13 to obtain the front-end and back-end input information at the second time step;
S22. Compute the response at the second time step using the two-end inputs from S21;
S23. Update the model through self-learning using the response obtained in S22.
S3. At the third time step t3, execute the following steps:
S31. Execute step S11 to obtain the front-end input at the third time step t3;
S32. Map the response obtained at S22 to the effect space to obtain the back-end input at the third time step, and at the same time output the navigation output of the second time step t2;
S33. Compute the response at the third time step using the inputs obtained in S31 and S32;
S34. Update the model through self-learning using the response obtained in S33.
S4. At every subsequent time step while the method runs, repeat step S3: obtain the navigation output and simultaneously update the model through self-learning.
For the model's response computation and self-learning update, refer to FIG. 3:
The core algorithm model of this method is a developmental neural network, a bionic, shallow, self-organizing network model. It is inspired by Hebbian theory in neuroscience, that is, the principle of synaptic plasticity. The developmental neural network has three regions: X, Y, and Z. The X region is the receiving region, used to acquire input stimuli from the external environment; the Y region is the hidden layer, used to learn knowledge and rules; the Z region is the effect region, which outputs effects to the outside. In addition, supervisory information can be fed from the Z region to the Y region, in which case the Z region also serves as an input region. The X region is unidirectionally fully connected to the Y region, and the Y region is bidirectionally fully connected to the Z region. The learning process is as follows:
S1. Initialize the model, including the initial responses, weights, attention masks, and neuron activation information;
S2. Compute the response values of the Y region:
S21. Compute the bottom-up response r_b of the Y region by taking the inner product of the preprocessed front-end input and the bottom-up weights:

r_b = w_b · x_r

where r_b is the bottom-up response of the Y region, w_b are the bottom-up weights, · denotes the inner product operation, and x_r is the input image;
S22. Compute the top-down response r_t of the Y region:

r_t = w_t · z_r

where r_t is the top-down response of the Y region, z_r is the back-end input (i.e., the supervisory information), and w_t are the top-down weights;
S23. Compute the pre-response r_p of the Y region:

r_p = k·r_t + (1-k)·r_b

where k is the influence factor of the top-down response and (1-k) is the influence factor of the bottom-up response; the two sum to 1.
S24. Top-k competition mechanism.
To simulate the lateral inhibition effect between neurons and reduce the neuron update rate, a Top-k competition mechanism is adopted: the K neurons with the largest r_p are the activated neurons; their responses are set to 1 and the responses of the remaining neurons are set to 0:

r_y(argmax(r_p)) ← 1

where r_y is the Y-region response after competition;
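Steps S21 through S24 can be sketched as follows, with hypothetical small weight matrices; `top_k` defaults to 1, as in the Top-1 example of FIG. 4:

```python
import numpy as np

def y_region_response(x_r, z_r, w_b, w_t, k=0.5, top_k=1):
    """Pre-response r_p = k*r_t + (1-k)*r_b, then Top-k competition:
    the top_k neurons with the largest r_p fire at 1, the rest at 0."""
    r_b = w_b @ x_r                     # bottom-up response (S21)
    r_t = w_t @ z_r                     # top-down response (S22)
    r_p = k * r_t + (1.0 - k) * r_b     # total pre-response (S23)
    r_y = np.zeros_like(r_p)
    winners = np.argsort(r_p)[-top_k:]  # indices of the largest pre-responses (S24)
    r_y[winners] = 1.0
    return r_y
```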
S25. Weight update of the activated neurons:

V_j ← ω1(g_j)·V_j + ω2(g_j)·p

where V_j is the weight vector of activated Y-region neuron j, comprising (w_t, w_b), with w_b the bottom-up weights and w_t the top-down weights; g_j is the activation age of the j-th neuron (the more activations, the greater the age); p is the input vector, comprising the front-end and back-end inputs; and ω1 and ω2 are learning factors controlling the rate of synaptic weight updates, such that the larger ω2/ω1 is, the more V_j reflects newly learned knowledge. ω1 and ω2 are derived from the forgetting average algorithm:

ω2(g_i) = (1 + u(g_i)) / g_i
ω1(g_i) = 1 - ω2(g_i)

where u(g_i) is the value of the forgetting equation at the activation age g_i of the i-th neuron. The forgetting equation u(g) is defined as follows:

u(g) = 0, for g ≤ g1
u(g) = c·(g - g1)/(g2 - g1), for g1 < g ≤ g2
u(g) = c + (g - g2)/λ, for g > g2

where g1 and g2 are forgetting age thresholds, typically set to g1 = 20 and g2 = 200; c and λ are hyper-parameters controlling the learning speed, typically set to c = 2 and λ = 2000. After each activation, the activated neuron's age is updated: g_i ← g_i + 1.
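The forgetting-average update above can be sketched directly. Note that with ω2(1) = (1 + u(1))/1 = 1, a newly recruited neuron (activation age 1) simply copies its input:

```python
def forgetting_u(g, g1=20, g2=200, c=2.0, lam=2000.0):
    """Piecewise forgetting equation u(g) with the typical thresholds."""
    if g <= g1:
        return 0.0
    if g <= g2:
        return c * (g - g1) / (g2 - g1)
    return c + (g - g2) / lam

def amnesic_weights(g):
    """Learning factors (w1, w2) at activation age g; w1 + w2 = 1."""
    w2 = (1.0 + forgetting_u(g)) / g
    return 1.0 - w2, w2

def update_neuron(v_j, p, g_j):
    """V_j <- w1(g_j)*V_j + w2(g_j)*p, then advance the activation age."""
    w1, w2 = amnesic_weights(g_j)
    v_new = [w1 * v + w2 * x for v, x in zip(v_j, p)]
    return v_new, g_j + 1
```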
S3. Compute the inner product of the Y-region response and the bottom-up weights of the Z region to obtain the Z-region response, and map the Z-region response to the effect space to obtain the final navigation output, as follows:
S31. Transmit the Y-region response to the Z region and compute the Z-region response:

z_i = w_zb · r_y

where z_i is the response of the i-th effect space of the Z region, r_y is the total Y-region response after Top-k competition, and w_zb are the bottom-up weights of the Z region;
S32. Select an effector in the Z region. The Z-region effectors correspond to navigation actions and comprise six action states: forward, left turn, right turn, slight left turn, slight right turn, and stop:

e = argmax(z)

where argmax() returns the position of the maximum response; e = 1 indicates that the navigation output is forward; e = 2, a left turn; e = 3, a right turn; e = 4, a slight left turn; e = 5, a slight right turn; and e = 6, stop.
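Steps S31 and S32 can be sketched as follows; the Z-region weight matrix `w_zb` used here is a hypothetical illustration with one row per effector:

```python
import numpy as np

ACTIONS = ["forward", "left turn", "right turn",
           "slight left turn", "slight right turn", "stop"]

def z_region_output(r_y, w_zb):
    """z_i = w_zb(i) . r_y for each effector; e = argmax picks the action."""
    z = w_zb @ r_y
    e = int(np.argmax(z)) + 1    # 1-based effector index, as in the text
    return e, ACTIONS[e - 1]
```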
For the attention mechanism of this method, refer to FIG. 4. Y-region neurons connect only to the X-region neurons within their receptive fields, and each Y-region active neuron (i.e., a neuron activated through the Top-K competition) possesses a different attention receptive field. The attention generation mechanism is shown in FIG. 5: when the key region of the input image coincides with the receptive field of the i-th Y-region neuron, the bottom-up response r_b of that neuron is maximal, so it has greater potential to win the Top-K competition (Top-1 in this example) and become activated, thereby strengthening the connection between the neuron and its attention receptive field.
Referring to FIG. 5, during the development and growth of the model, if after an image is input the maximum response of the Y region is smaller than the set threshold, the model is insensitive to this type of input information (including effect information, attention information, guidance information, etc.) and has not yet learned to attend to it. In this case a new Y-region neuron is added, and its attention receptive field is set to the key region of this input type. After the same type of input is received multiple times, the connections between the new Y-region neuron and the X-region neurons within its receptive field are strengthened. If the maximum response of the Y region to the input image is larger than the threshold, the model has already learned the corresponding semantic expression. After training, the model learns to attend to the key regions of all types of pictures; that is, the model acquires an attention mechanism.
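The neuron-recruitment rule of the preceding paragraph can be sketched as follows. The threshold value and the initialization of the new neuron's weights to the current input are assumptions consistent with the description, not values given by the patent:

```python
import numpy as np

def maybe_grow(w_b, w_t, x_r, z_r, r_p, threshold=0.9):
    """If no Y-region neuron responds above threshold, recruit a new one
    whose receptive field is the key region of the current input type."""
    if r_p.size and r_p.max() >= threshold:
        return w_b, w_t, False          # an existing neuron covers this input type
    w_b = np.vstack([w_b, x_r])         # new bottom-up weights <- current image
    w_t = np.vstack([w_t, z_r])         # new top-down weights <- current guidance
    return w_b, w_t, True
```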
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (6)
1. An autonomous learning navigation method based on a visual attention mechanism is characterized by comprising the following steps:
S1. Acquire the front-end input and back-end input of the visual navigation model, wherein the front-end input is supplied continuously by the image sensor, the back-end input is supplied externally at the first two time steps, and at every subsequent time step it is supplied by the model's output at the previous time step;
S2. Process the front-end input with the attention mechanism, retaining the image of the key region and suppressing the images of the remaining regions;
S3. Compute the inner product of the attention-processed front-end input and the bottom-up weights to obtain the bottom-up partial pre-response; compute the inner product of the back-end input and the top-down weights to obtain the top-down partial pre-response; superpose the two partial pre-responses to obtain the total pre-response; and let the pre-responses compete to obtain the Y-region response;
S4. Compute the inner product of the Y-region response and the bottom-up weights of the Z region to obtain the Z-region response, and map the Z-region response to the effect space to obtain the final navigation output;
S5. The visual navigation model learns and updates autonomously, and the next cycle of the above steps begins; the cycle terminates when front-end input is no longer received.
2. The method according to claim 1, wherein acquiring the model inputs in S1 specifically comprises:
acquiring the front-end input, i.e., obtaining a navigation environment image from the image sensor, preprocessing the image into a single-channel grayscale map of 38×38 pixels, flattening the grayscale map into 1×1444 one-dimensional data, normalizing it, and inputting it to the front end of the developmental neural network (the X region), denoted x_r; the input image x_r is normalized as

x_r ← normalization(x_r)

where x_r is the input image and normalization() is a normalization function;
acquiring the back-end input, i.e., inputting guidance information to the back end of the developmental neural network (the Z region), wherein the guidance information comprises: navigation action guidance, GPS guidance, attention position, obstacle objects, and scale information.
3. The method according to claim 1, wherein the attention mechanism in S2, i.e., masking of the input image against the receptive fields, is as follows:

x_r ← x_r ⊙ Mask_b

where x_r is the input image, ⊙ is the element-wise product operation, and Mask_b is the bottom-up attention mask; the front-end input is processed through the attention mechanism, the image of the key region is retained, and the images of the remaining regions are suppressed.
4. The method according to claim 1, wherein the Y-region response in S3 is obtained as follows:
S31. Compute the inner product of the preprocessed front-end input and the bottom-up weights:

r_b = w_b · x_r

where r_b is the bottom-up response of the Y region, w_b are the bottom-up weights, · denotes the inner product operation, and x_r is the input image;
S32. Compute the top-down response r_t of the Y region:

r_t = w_t · z_r

where r_t is the top-down response of the Y region, z_r is the back-end input (i.e., the supervisory information), and w_t are the top-down weights;
S33. Response pre-screening: to eliminate the interference of random noise, a pre-screening operation is performed on the responses r_b and r_t, setting response values smaller than the threshold cutValue to zero:

r_t(r_t < cutValue) ← 0
r_b(r_b < cutValue) ← 0

S34. Compute the total pre-response r_p of the Y region:

r_p = k·r_t + (1-k)·r_b

where k is the influence factor of the top-down response and (1-k) is the influence factor of the bottom-up response; the two sum to 1;
S35. Top-k competition mechanism: using a Top-k competition mechanism, the neurons with the largest r_p are the activated neurons; the responses of the K neurons with the largest responses are set to 1 and the responses of the remaining neurons are set to 0:

r_y(argmax(r_p)) ← 1

where argmax() returns the position of the maximum response and r_y is the Y-region response after competition.
5. The method according to claim 1, wherein the navigation output in S4 is obtained as follows:
S41. Transmit the Y-region response to the Z region and compute the Z-region response:

z_i = w_zb · r_y

where z_i is the response of the i-th effect space of the Z region, r_y is the total Y-region response after Top-k competition, and w_zb are the bottom-up weights of the Z region;
S42. Select an effector in the Z region. The Z-region effectors correspond to navigation actions and comprise six action states: forward, left turn, right turn, slight left turn, slight right turn, and stop:

e = argmax(z_i), i = 1, ..., m

where argmax() returns the position of the maximum response and m is the number of effectors; e = 1 indicates that the navigation output is forward; e = 2, a left turn; e = 3, a right turn; e = 4, a slight left turn; e = 5, a slight right turn; and e = 6, stop.
6. The method according to claim 1, wherein the model in S5 is updated through autonomous learning as follows:

V_j ← ω1(g_j)·V_j + ω2(g_j)·p

where V_j is the weight vector of activated Y-region neuron j, comprising (w_t, w_b), with w_b the bottom-up weights and w_t the top-down weights; g_j is the activation age of the j-th neuron (the more activations, the greater the age); p is the input vector, comprising the front-end and back-end inputs; and ω1 and ω2 are learning factors controlling the rate of synaptic weight updates, such that the larger ω2/ω1 is, the more V_j reflects newly learned knowledge. ω1 and ω2 are derived from the forgetting average algorithm:

ω2(g_i) = (1 + u(g_i)) / g_i
ω1(g_i) = 1 - ω2(g_i)

where u(g_i) is the value of the forgetting equation at the activation age g_i of the i-th neuron, and the forgetting equation u(g) is defined as follows:

u(g) = 0, for g ≤ g1
u(g) = c·(g - g1)/(g2 - g1), for g1 < g ≤ g2
u(g) = c + (g - g2)/λ, for g > g2

where g1 and g2 are forgetting age thresholds, and c and λ are hyper-parameters controlling the learning speed; after each activation, the activated neuron's age is updated: g_i ← g_i + 1.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202011266136.9A | 2020-11-13 | 2020-11-13 | Autonomous learning navigation method based on visual attention mechanism |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN112381829A | 2021-02-19 |
Family: ID=74583678

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202011266136.9A | Autonomous learning navigation method based on visual attention mechanism | 2020-11-13 | 2020-11-13 |
Citations (2)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| US20140032461A1 | 2012-07-25 | 2014-01-30 | Board of Trustees of Michigan State University | Synapse maintenance in the developmental networks |
| US20170008168A1 | 2015-07-10 | 2017-01-12 | Board of Trustees of Michigan State University | Navigational Control of Robotic Systems and Other Computer-Implemented Processes Using Developmental Network with Turing Machine Learning |
Non-Patent Citations (1)

| Title |
| --- |
| Qian Kui et al., "Indoor scene recognition for robots based on an autonomous developmental neural network", Robot (《机器人》) |
Legal Events

| Date | Code | Title |
| --- | --- | --- |
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |
| 20210219 | RJ01 | Rejection of invention patent application after publication |