CN114463420A - Visual mileage calculation method based on attention convolution neural network - Google Patents
- Publication number
- CN114463420A
- Authority
- CN
- China
- Prior art keywords
- gru
- layer
- attention
- input
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Abstract
The invention discloses a visual odometry method based on an attention convolutional neural network. Traditional visual odometry requires images that contain a large amount of texture information and involves a complex solving process, while visual odometry based on convolutional neural networks has low accuracy. To address these problems, a visual odometry method based on an attention convolutional neural network and a gated recurrent unit is provided. An attention mechanism improves the feature extraction accuracy of the convolution module and thereby the accuracy of visual positioning. Compared with conventional visual odometry methods, the method maintains accuracy while dispensing with the complex solving process, which makes it better suited to practical engineering applications.
Description
Technical Field
The invention relates to the fields of deep learning and visual positioning. Traditional visual odometry (VO) requires images that contain a large amount of texture information and involves a complex solving process, while visual odometry based on convolutional neural networks (CNN) has low accuracy. To address these problems, the invention provides a visual odometry method based on an attention convolutional neural network and a gated recurrent unit (GRU). An attention mechanism improves the feature extraction accuracy of the convolution module and thereby the accuracy of visual positioning.
Background
For autonomous navigation of intelligent vehicles, the ability of a vehicle to localize itself while moving is essential. Early vehicle positioning systems typically used wheel speed encoders to compute the distance traveled, but this approach accumulates significant error. With the development of computer vision, vision sensors are increasingly used for vehicle positioning and motion estimation. A vision sensor provides rich perceptual information and is also low in cost and small in size. The problem of estimating the camera pose from visual input is generally called visual odometry. Research on visual odometry began in the 1980s. In recent years, as researchers in China and abroad have studied it in greater depth, VO has gradually been applied in fields such as robotics, autonomous driving, and pedestrian navigation.
Traditional visual odometry mainly consists of image feature extraction, feature matching, and pose estimation, and finally yields the six-degree-of-freedom camera pose, namely translation (x, y, z) and rotation (roll, pitch, yaw). Traditional methods fall into two categories: direct methods and feature point methods. Direct methods compute camera motion mainly from the gray-level information of image pixels; typical algorithms include SVO and LSD-SLAM. The feature point method is regarded as the mainstream approach: features are first extracted from the images and matched, and the camera motion is then estimated from the matched feature points using, depending on the camera type, methods such as ICP (Iterative Closest Point), epipolar geometry, or PnP (Perspective-n-Point). Well-known algorithms such as SIFT and ORB supply the feature points from which the camera pose is estimated. However, when many feature points are extracted the method is time-consuming and computationally expensive, and when few are extracted features are lost and the camera motion cannot be recovered. The feature point method therefore still faces great challenges in scenes with little texture or with moving objects. Direct methods avoid feature loss and the cost of feature computation, but are prone to failure in scenes with alternating light and shade.
With its continuous development, deep learning has gradually been applied in many fields, such as fault diagnosis, machine translation, and image classification. In recent years it has also been used for visual odometry. In 2015, Kishore et al. first applied a convolutional neural network to visual odometry, designing two different convolutional networks to learn the velocity and the rotation of the motion, respectively. In the same year, Kendall et al. proposed the PoseNet model, which estimates the camera position and orientation from a single input image and was the first to achieve end-to-end camera pose estimation with a convolutional neural network. In 2017, Wang et al. proposed the DeepVO model, which adds a recurrent neural network (RNN) on top of the convolutional neural network to preserve the temporal relationship between images. The model takes sequences of adjacent images from the KITTI dataset as input, first extracts image features with the convolutional module, then feeds the features into the recurrent network to learn the geometric relationship between images, and finally outputs the camera pose.
Existing deep-learning-based VO methods still cannot match the accuracy of traditional methods. Although recent work has greatly improved pose accuracy by introducing optical flow, such methods are difficult to apply widely. Most existing deep-learning-based visual odometry relies on a single convolutional neural network or on a combination of a convolutional and a recurrent neural network, and the accuracy of the estimated trajectory remains low, which leaves considerable room for research on model construction and related aspects.
Disclosure of Invention
In view of the above problems, the ACGR (Attention Convolution and Gated Recurrent unit) model is proposed, as shown in FIG. 1. It incorporates an attention mechanism into a CNN and RNN architecture and uses the attention mechanism to improve the accuracy of feature extraction, thereby improving the accuracy of the estimated trajectory.
The method comprises the following specific steps:
1. Feature extraction based on the Attention Convolution Model.
The model introduces both spatial and channel attention mechanisms into the CNN. The structure is shown in FIG. 2.
ACGR designs a convolutional network module following the principle of the optical flow method used in traditional VO. The module extracts features from the images, from which the motion features between images are computed and expressed as a vector.
2. GRU-based temporal modeling.
The GRU in the model has two layers, each containing 1024 hidden units. To preserve the original data distribution, the activation function of the original GRU is changed to the ReLU function.
3. Pose output with dimensionality reduction by fully connected layers.
Two fully connected layers are added to the model, containing 128 and 6 hidden units, respectively.
Compared with the prior art, the invention has the following beneficial effects:
First, visual odometry is realized with deep learning: the method dispenses with the complicated steps of traditional visual odometry and achieves end-to-end positioning. Second, a gated recurrent unit is added to the conventional CNN-based visual odometry method to learn the temporal correlation in the image data. Third, an attention mechanism is integrated into the feature extraction module, so that the network focuses on the more important geometric features in the images and the feature extraction capability of the convolutional neural network is enhanced.
Drawings
FIG. 1 is a system framework diagram of the method of the present invention;
FIG. 2 is a structural diagram of the attention mechanism;
FIG. 3(a) shows the trajectories of the different algorithms on sequence 03;
FIG. 3(b) shows the trajectories of the different algorithms on sequence 04;
FIG. 3(c) shows the trajectories of the different algorithms on sequence 09;
FIG. 3(d) shows the trajectories of the different algorithms on sequence 10.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The invention uses an attention mechanism, a convolutional neural network, and a gated recurrent unit to implement visual odometry.
The method specifically comprises the following steps:
1. Feature extraction based on the Attention Convolution Model
(1) Convolutional neural network
ACGR designs a convolutional network module following the principle of the optical flow method used in traditional VO. The module extracts features from the images, from which the motion features between images are computed and expressed as a vector.
The CNN parameters are listed in Table 1. The network contains 9 convolutional layers in total, and batch normalization is added after each convolutional layer to preserve the data distribution before and after the convolution, speed up training, and avoid exploding gradients during backpropagation. The convolution kernel of the first layer is 7 × 7, those of the second and third layers are 5 × 5, and those of the last six layers are 3 × 3; ReLU is used as the activation function. The input to the module is a pair of adjacent frames from the KITTI dataset. To preserve the geometric characteristics of the original data, all images are resized to 1280 × 384. After feature extraction by the 9 convolutional layers, the resulting tensor has size 10 × 3 × 512.
Table 1. Convolutional layer parameter list
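The stated sizes can be checked with a little arithmetic: going from 1280 × 384 to 10 × 3 is a reduction by a factor of 128 on each axis, which corresponds to seven stride-2 layers. The sketch below traces the spatial size through nine layers using the kernel sizes given in the text. The strides and the padding of (k − 1)/2 are assumptions, since neither is stated in the text; any arrangement with seven stride-2 layers and two stride-1 layers gives the same final size.

```python
def conv_out(size: int, kernel: int, stride: int) -> int:
    """Output length along one axis for a convolution padded by (kernel - 1) // 2."""
    pad = (kernel - 1) // 2
    return (size + 2 * pad - kernel) // stride + 1

# Kernel sizes come from the text: 7, then 5 and 5, then six layers of 3.
KERNELS = [7, 5, 5, 3, 3, 3, 3, 3, 3]
# ASSUMPTION: the strides are hypothetical; the text does not list them.
STRIDES = [2, 2, 2, 2, 1, 2, 1, 2, 2]

def trace_shape(width: int, height: int):
    """Return the (width, height) after the input and after each of the 9 layers."""
    shapes = [(width, height)]
    for kernel, stride in zip(KERNELS, STRIDES):
        width = conv_out(width, kernel, stride)
        height = conv_out(height, kernel, stride)
        shapes.append((width, height))
    return shapes

shapes = trace_shape(1280, 384)
for layer, (w, h) in enumerate(shapes):
    print(f"after layer {layer}: {w} x {h}")
```

With these assumed strides the last entry is 10 × 3, matching the spatial part of the 10 × 3 × 512 tensor described above.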
(2) Attention mechanism
The attention mechanism is embedded into the convolutional network, so that the network can learn more important geometric features in pictures more intelligently, and the feature extraction capability of the convolutional neural network is enhanced. The attention module calculation formula is as follows:
$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AP}(F)) + \mathrm{MLP}(\mathrm{MP}(F))\big) \tag{1}$$
$$M_s(F) = \sigma\big(f^{7\times 7}([\mathrm{AP}(F); \mathrm{MP}(F)])\big) \tag{2}$$
where MLP denotes a fully connected layer, $F$ is the feature map, $M_c$ is the one-dimensional channel attention feature, AP and MP denote average pooling and max pooling, respectively, and $M_s$ is the two-dimensional spatial attention feature. $\sigma$ is the sigmoid function, and $f^{7\times 7}$ denotes a convolution with a filter size of 7 × 7.
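The channel attention of Equation (1) can be sketched in plain Python on a tiny feature map. This is an illustration only: the shared MLP is assumed to have one hidden layer with a ReLU, the weight values are made up, and the spatial attention branch with its 7 × 7 convolution is left out for brevity.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def mlp(vec, w1, w2):
    """Shared MLP: ReLU(vec @ w1) @ w2, with weights as nested lists (rows = inputs)."""
    hidden = [max(0.0, sum(v * w for v, w in zip(vec, col))) for col in zip(*w1)]
    return [sum(h * w for h, w in zip(hidden, col)) for col in zip(*w2)]

def channel_attention(feature, w1, w2):
    """Equation (1): sigmoid(MLP(AP(F)) + MLP(MP(F))).

    feature is a list of C channels, each a list of H rows of W floats.
    Returns one weight in (0, 1) per channel.
    """
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature]  # AP(F)
    mx = [max(map(max, ch)) for ch in feature]                            # MP(F)
    return [sigmoid(a + m) for a, m in zip(mlp(avg, w1, w2), mlp(mx, w1, w2))]

def apply_channel_attention(feature, weights):
    """Scale every value of each channel by that channel's attention weight."""
    return [[[v * wt for v in row] for row in ch] for ch, wt in zip(feature, weights)]

# Hypothetical 2-channel, 2 x 2 feature map and made-up MLP weights (C=2, hidden=1).
feature = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 2.0]]]
w1 = [[1.0], [1.0]]
w2 = [[0.5, -0.5]]

weights = channel_attention(feature, w1, w2)
scaled = apply_channel_attention(feature, weights)
print(weights)
```

Channels with a weight close to 1 pass through almost unchanged, while channels with a weight close to 0 are suppressed, which is how the module emphasizes some feature channels over others.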
2. GRU-based time sequence modeling
The image data used for visual odometry are strongly correlated in time, so a recurrent neural network can be used to learn this correlation. The GRU has a comparatively simple structure and few parameters, so using it effectively reduces training time.
Let the current time step be $t$, the input be $x_t$, and the hidden state at the previous time step be $h_{t-1}$. The output and state update equations of the GRU are then:
$$\begin{aligned} z_t &= \sigma(W_z \cdot [h_{t-1}, x_t]) \\ r_t &= \sigma(W_r \cdot [h_{t-1}, x_t]) \\ \tilde{h}_t &= \tanh(W \cdot [r_t * h_{t-1}, x_t]) \\ h_t &= (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t \end{aligned} \tag{3}$$
where $z_t$ is the update gate of the GRU; $\sigma$ is the sigmoid activation function, which keeps the output between 0 and 1; $r_t$ is the reset gate of the GRU; tanh is the activation function that keeps the output between −1 and 1; $[\,\cdot\,,\cdot\,]$ denotes the concatenation of two vectors; and $*$ denotes the product of matrices. $W$, $W_z$, and $W_r$ are weights, which are randomly initialized and continuously updated during training.
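A single GRU update that is consistent with the symbol definitions above can be sketched as follows. It assumes the common formulation in which the new state is (1 − z_t) times the old state plus z_t times the candidate state, omits biases, and treats the gating products as element-wise; the tiny dimensions and zero weights are purely illustrative.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def gru_step(x_t, h_prev, W_z, W_r, W):
    """One GRU update; each weight matrix has shape hidden x (hidden + input)."""
    concat = h_prev + x_t                                # [h_{t-1}, x_t]
    z = [sigmoid(a) for a in matvec(W_z, concat)]        # update gate z_t
    r = [sigmoid(a) for a in matvec(W_r, concat)]        # reset gate r_t
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t # [r_t * h_{t-1}, x_t]
    h_cand = [math.tanh(a) for a in matvec(W, gated)]    # candidate state
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, h_cand)]

# Hypothetical sizes: 3 input features, 2 hidden units, all-zero weights.
hidden, inputs = 2, 3
zeros = [[0.0] * (hidden + inputs) for _ in range(hidden)]
h_next = gru_step([0.3, -0.1, 0.8], [1.0, -2.0], zeros, zeros, zeros)
print(h_next)
```

With all-zero weights both gates equal 0.5 and the candidate state is 0, so the new state is exactly half of the previous one, which makes the role of the update gate easy to see.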
The GRU in the model has two layers, each containing 1024 hidden units. The input image sequence passes through the Attention Convolution Model to yield a tensor of size 10 × 3 × 512, which is fed into the first GRU layer. The output of the first layer is fed into the second GRU layer, and the output of the second layer is the output of the entire GRU module.
3. Fully connected layers
The model has two fully connected layers, containing 128 and 6 hidden units, respectively. They reduce the dimensionality of the high-dimensional features output by the GRU module, and the final six-dimensional output is the relative pose between the current frame and the previous frame.
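The sizes quoted above determine how the data flow from the convolutional features to the six-dimensional pose. The following arithmetic sketch traces the dimensions and gives rough weight counts. It assumes that the 10 × 3 × 512 tensor is flattened into one vector per frame pair, that each GRU layer has three weight matrices acting on the concatenated state and input without biases, and that the fully connected layers have biases; none of these details is stated in the text.

```python
FEATURE_SHAPE = (10, 3, 512)   # CNN output size from the text
GRU_HIDDEN = 1024              # hidden units per GRU layer
FC_SIZES = (128, 6)            # hidden units of the two fully connected layers

def gru_params(input_size: int, hidden_size: int) -> int:
    """Weights of W_z, W_r and W, each hidden x (hidden + input); biases omitted."""
    return 3 * hidden_size * (hidden_size + input_size)

def fc_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

flat_features = FEATURE_SHAPE[0] * FEATURE_SHAPE[1] * FEATURE_SHAPE[2]
gru1 = gru_params(flat_features, GRU_HIDDEN)
gru2 = gru_params(GRU_HIDDEN, GRU_HIDDEN)
fc1 = fc_params(GRU_HIDDEN, FC_SIZES[0])
fc2 = fc_params(FC_SIZES[0], FC_SIZES[1])

print("dimensions:", flat_features, "->", GRU_HIDDEN, "->", GRU_HIDDEN,
      "->", FC_SIZES[0], "->", FC_SIZES[1])
print("weights:", gru1, gru2, fc1, fc2)
```

Under these assumptions the first GRU layer dominates the parameter count because it consumes the flattened 15360-dimensional feature vector.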
4. Loss function
Pose estimation in visual odometry is formulated as a conditional probability problem. As shown in Equation (4), given a sequence of $n+1$ images
$$X = (X_1, X_2, \ldots, X_{n+1}) \tag{4}$$
the poses between adjacent images can be computed as
$$Y = (Y_1, Y_2, \ldots, Y_n) \tag{5}$$
Treating this as a conditional probability problem gives
$$p(Y \mid X) = p(Y_1, Y_2, \ldots, Y_n \mid X_1, X_2, \ldots, X_{n+1}) \tag{6}$$
The optimal network parameters $w^*$ are obtained by maximizing the probability in Equation (6):
$$w^* = \arg\max_{w}\, p(Y \mid X; w) \tag{7}$$
The mean squared error (MSE) is used as the loss function, as shown in Equation (8):
$$L = \frac{1}{M}\sum_{i=1}^{M}\Big[\beta_1\big(\|\hat{P}_{1i} - P_{1i}\|_2^2 + \|\hat{\Phi}_{1i} - \Phi_{1i}\|_2^2\big) + \beta_2\big(\|\hat{P}_{2i} - P_{2i}\|_2^2 + \|\hat{\Phi}_{2i} - \Phi_{2i}\|_2^2\big)\Big] \tag{8}$$
where $P_{1i}$ is the ground-truth displacement of the $i$-th pair of forward-order inputs and $\Phi_{1i}$ is the ground-truth rotation angle of the $i$-th pair of forward-order inputs; $\hat{P}_{1i}$ is the estimated displacement of the $i$-th pair of forward-order inputs and $\hat{\Phi}_{1i}$ is the corresponding estimated rotation angle. $P_{2i}$ is the ground-truth displacement of the $i$-th pair of reverse-order inputs and $\Phi_{2i}$ is the ground-truth rotation angle of the $i$-th pair of reverse-order inputs; $\hat{P}_{2i}$ is the estimated displacement of the $i$-th pair of reverse-order inputs and $\hat{\Phi}_{2i}$ is the corresponding estimated rotation angle. $M$ is the number of samples, and $\beta_1$ and $\beta_2$ are the scale factors of the forward-order and reverse-order input errors, respectively.
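The loss described by these symbols can be sketched in plain Python. The sketch assumes one plausible form: the squared displacement and rotation errors of each pair are summed, the forward-order terms are scaled by β1 and the reverse-order terms by β2, and the total is averaged over the M samples. The exact weighting in the original formula may differ, and the function and argument names are illustrative.

```python
def squared_error(pred, truth) -> float:
    """Sum of squared differences between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth))

def bidirectional_mse(forward, reverse, beta1: float = 1.0, beta2: float = 1.0) -> float:
    """Mean over M samples of beta1 * forward error + beta2 * reverse error.

    forward and reverse are lists of M tuples
    (pred_displacement, true_displacement, pred_rotation, true_rotation),
    each entry being a 3-element sequence. ASSUMPTION: both lists have length M.
    """
    m = len(forward)
    total = 0.0
    for p_hat, p, phi_hat, phi in forward:
        total += beta1 * (squared_error(p_hat, p) + squared_error(phi_hat, phi))
    for p_hat, p, phi_hat, phi in reverse:
        total += beta2 * (squared_error(p_hat, p) + squared_error(phi_hat, phi))
    return total / m

# One hypothetical sample: forward displacement off by 1, reverse displacement off by 2.
zero = (0.0, 0.0, 0.0)
forward = [((1.0, 0.0, 0.0), zero, zero, zero)]
reverse = [((0.0, 2.0, 0.0), zero, zero, zero)]
loss = bidirectional_mse(forward, reverse, beta1=1.0, beta2=0.5)
print(loss)
```

Here the forward term contributes 1.0 and the reverse term 0.5 × 4.0, so lowering β2 reduces the influence of the reverse-order pairs on training.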
5. Experimental procedures and results
(1) Introduction to data set
The ACGR model is evaluated on the public KITTI dataset. KITTI Visual Odometry is a large-scale open-source dataset that is widely used to evaluate visual odometry models. Because only the first 11 sequences of the dataset provide ground truth, these 11 sequences are used in the experiments. To meet the network's input requirements while preserving the geometric features of the images, all images are resized to 1280 × 384. Sequences 03, 04, 09, and 10 contain the fewest images, and since a larger training set gives a better-trained model, they are chosen as the test set. Sequences 00, 01, 02, 05, 06, 07, and 08 contain many images and are therefore used to train the model. Data are then randomly selected from these 7 training sequences for validation, with the validation set amounting to about one third of the training set. The specific split is shown in Table 2.
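The split described above can be sketched as follows. The sequence identifiers are taken from the text; the helper function, the seed, and the sample count are hypothetical. The sketch reads "about one third of the training set" as a validation-to-training ratio of 1:3 after the hold-out, which is one possible interpretation.

```python
import random

TRAIN_SEQUENCES = ["00", "01", "02", "05", "06", "07", "08"]
TEST_SEQUENCES = ["03", "04", "09", "10"]

def split_train_validation(samples, val_to_train_ratio: float = 1 / 3, seed: int = 0):
    """Randomly hold out validation samples so that len(val) / len(train) is about the ratio."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * val_to_train_ratio / (1 + val_to_train_ratio))
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical pool of 1200 sample indices drawn from the training sequences.
samples = list(range(1200))
train, val = split_train_validation(samples)
print(len(train), len(val))
```

Fixing the seed keeps the hold-out reproducible between runs, which matters when different models are compared on the same validation data.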
Table 2. Training set, validation set, and test set
(2) Model training
The model is built with the PyTorch deep learning framework, and the GPU is an Nvidia GeForce RTX 2080 Ti. Adam (Adaptive Moment Estimation) is chosen as the optimizer, and batch gradient descent (BGD) as the optimization algorithm.
During training, all weights in the network are initialized with the Xavier method and all biases are initialized to zero. The number of iterations is set to 100 and the initial learning rate to 0.01. The model parameters from the 100th iteration are saved for the subsequent testing procedure.
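The initialization scheme can be illustrated without any framework. The sketch below uses the uniform variant of Xavier initialization, which draws weights from the interval ±sqrt(6 / (fan_in + fan_out)); whether the uniform or the normal variant was used is not stated in the text, so the choice here is an assumption, as are the helper names and the seed.

```python
import math
import random

def xavier_uniform(fan_in: int, fan_out: int, rng: random.Random):
    """Return a fan_out x fan_in weight matrix drawn uniformly from [-bound, bound]."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]

def zero_bias(fan_out: int):
    """Biases start at zero, as described in the text."""
    return [0.0] * fan_out

# Example: the last fully connected layer, 128 inputs to 6 outputs.
rng = random.Random(0)
weights = xavier_uniform(128, 6, rng)
bias = zero_bias(6)
bound = math.sqrt(6.0 / (128 + 6))
print(len(weights), len(weights[0]), round(bound, 4))
```

Scaling the range by the layer's fan-in and fan-out keeps the variance of activations roughly constant across layers at the start of training.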
(3) Experimental results and error analysis
In the experiment, ACGR is trained on the 7 sequences of the training set and then tested on the 4 test sequences, and its performance is evaluated from the test results. The estimated trajectories for test sequences 03, 04, 09, and 10 are shown in FIG. 3.
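Because the network outputs a relative pose for each pair of adjacent frames, a trajectory is obtained by chaining these relative motions. The sketch below shows the idea in a simplified planar setting with two translation components and one heading angle. The full six-degree-of-freedom case depends on a rotation convention that the text does not specify, so this is an illustration under that simplifying assumption and not the evaluation code used in the experiments.

```python
import math

def accumulate_planar(relative_motions, start=(0.0, 0.0, 0.0)):
    """Chain relative motions (dx, dz, dyaw), each expressed in the previous frame.

    Returns the list of absolute poses (x, z, yaw), starting with `start`.
    """
    x, z, yaw = start
    path = [start]
    for dx, dz, dyaw in relative_motions:
        # Rotate the local translation into the global frame, then advance.
        x += math.cos(yaw) * dx - math.sin(yaw) * dz
        z += math.sin(yaw) * dx + math.cos(yaw) * dz
        yaw += dyaw
        path.append((x, z, yaw))
    return path

# Hypothetical motion: four unit steps, each followed by a 90 degree turn.
square = [(1.0, 0.0, math.pi / 2)] * 4
path = accumulate_planar(square)
for pose in path:
    print(tuple(round(v, 3) for v in pose))
```

Since every pose is built on the previous one, an error in any single relative estimate propagates to all later poses, which is why small per-frame errors show up as drift in trajectory plots.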
To verify the quality of the estimated trajectories, four methods are compared in the experiment: ORBSLAM [22], the CNN-based SFMLearner algorithm, the CAGR-VO (Convolution and Gated Recurrent Unit Visual Odometry) algorithm based on convolutional and recurrent neural networks, and the ACGR-VO algorithm. ORBSLAM is a classical visual SLAM system that includes a loop closure detection module in addition to visual odometry; for a fair comparison, loop closure detection is disabled in the ORBSLAM used here. As FIG. 3 shows, all four algorithms approximately recover the shape of the trajectory. The feature-point-based ORBSLAM is the more mature method. Because loop closure detection is disabled, its estimated trajectory fits the ground-truth trajectory poorly on sequence 09, but it fits well on sequences 03, 04, and 10, where its trajectory accuracy is high. Research on deep-learning-based visual odometry is still at a developing stage and leaves considerable room for further work. Although the three learning-based visual odometry models in the experiment perform slightly worse than ORBSLAM, the ACGR-VO model fits the trajectory better than both the CNN-based SFMLearner model and the CAGR-VO model based on convolutional and recurrent neural networks. The ACGR-VO model is therefore more accurate than the other neural-network-based models, which shows that adding an attention mechanism effectively improves the accuracy of the predicted trajectory.
Claims (1)
1. A visual odometry method based on an attention convolutional neural network, characterized by comprising the following steps:
1) Feature extraction based on the Attention Convolution Model
(1) Convolutional neural network
The CNN parameters are listed in Table 1; the network contains 9 convolutional layers in total, with batch normalization added after each convolutional layer; the convolution kernel of the first layer is 7 × 7, those of the second and third layers are 5 × 5, and those of the last six layers are 3 × 3, and ReLU is used as the activation function; the input to the module is a pair of adjacent frames from the KITTI dataset, and to preserve the geometric characteristics of the original data, all images are resized to 1280 × 384; after feature extraction by the 9 convolutional layers, the resulting tensor has size 10 × 3 × 512;
Table 1. Convolutional layer parameter list
(2) Attention mechanism
The attention mechanism is embedded into the convolutional network, and the attention module is computed as follows:
$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AP}(F)) + \mathrm{MLP}(\mathrm{MP}(F))\big) \tag{1}$$
$$M_s(F) = \sigma\big(f^{7\times 7}([\mathrm{AP}(F); \mathrm{MP}(F)])\big) \tag{2}$$
where MLP denotes a fully connected layer, $F$ is the feature map, $M_c$ is the one-dimensional channel attention feature, AP and MP denote average pooling and max pooling, respectively, and $M_s$ is the two-dimensional spatial attention feature; $\sigma$ is the sigmoid function, and $f^{7\times 7}$ denotes a convolution with a filter size of 7 × 7;
2) GRU-based temporal modeling
Let the current time step be $t$, the input be $x_t$, and the hidden state at the previous time step be $h_{t-1}$; the output and state update equations of the GRU are then:
$$\begin{aligned} z_t &= \sigma(W_z \cdot [h_{t-1}, x_t]) \\ r_t &= \sigma(W_r \cdot [h_{t-1}, x_t]) \\ \tilde{h}_t &= \tanh(W \cdot [r_t * h_{t-1}, x_t]) \\ h_t &= (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t \end{aligned} \tag{3}$$
where $z_t$ is the update gate of the GRU; $\sigma$ is the sigmoid activation function, which keeps the output between 0 and 1; $r_t$ is the reset gate of the GRU; tanh is the activation function that keeps the output between −1 and 1; $[\,\cdot\,,\cdot\,]$ denotes the concatenation of two vectors, and $*$ denotes the product of matrices; $W$, $W_z$, and $W_r$ are weights, which are randomly initialized and continuously updated during training;
the GRU in the model has two layers, each containing 1024 hidden units; the input image sequence passes through the Attention Convolution Model to yield a tensor of size 10 × 3 × 512, which is fed into the first GRU layer; the output of the first layer is fed into the second GRU layer, and the output of the second layer is the output of the entire GRU module;
3) Fully connected layers
There are two fully connected layers, containing 128 and 6 hidden units, respectively; they reduce the dimensionality of the high-dimensional features output by the GRU module, and the final six-dimensional output is the relative pose between the current frame and the previous frame;
4) Loss function
Pose estimation in visual odometry is formulated as a conditional probability problem; as shown in Equation (4), given a sequence of $n+1$ images
$$X = (X_1, X_2, \ldots, X_{n+1}) \tag{4}$$
the poses between adjacent images are computed as
$$Y = (Y_1, Y_2, \ldots, Y_n) \tag{5}$$
treating this as a conditional probability problem gives
$$p(Y \mid X) = p(Y_1, Y_2, \ldots, Y_n \mid X_1, X_2, \ldots, X_{n+1}) \tag{6}$$
the optimal network parameters $w^*$ are obtained by maximizing the probability in Equation (6):
$$w^* = \arg\max_{w}\, p(Y \mid X; w) \tag{7}$$
the mean squared error (MSE) is used as the loss function, as shown in Equation (8):
$$L = \frac{1}{M}\sum_{i=1}^{M}\Big[\beta_1\big(\|\hat{P}_{1i} - P_{1i}\|_2^2 + \|\hat{\Phi}_{1i} - \Phi_{1i}\|_2^2\big) + \beta_2\big(\|\hat{P}_{2i} - P_{2i}\|_2^2 + \|\hat{\Phi}_{2i} - \Phi_{2i}\|_2^2\big)\Big] \tag{8}$$
where $P_{1i}$ is the ground-truth displacement of the $i$-th pair of forward-order inputs and $\Phi_{1i}$ is the ground-truth rotation angle of the $i$-th pair of forward-order inputs; $\hat{P}_{1i}$ is the estimated displacement of the $i$-th pair of forward-order inputs and $\hat{\Phi}_{1i}$ is the corresponding estimated rotation angle; $P_{2i}$ is the ground-truth displacement of the $i$-th pair of reverse-order inputs and $\Phi_{2i}$ is the ground-truth rotation angle of the $i$-th pair of reverse-order inputs; $\hat{P}_{2i}$ is the estimated displacement of the $i$-th pair of reverse-order inputs and $\hat{\Phi}_{2i}$ is the corresponding estimated rotation angle; $M$ is the number of samples, and $\beta_1$ and $\beta_2$ are the scale factors of the forward-order and reverse-order input errors, respectively;
during training, all weights in the network are initialized with the Xavier method and all biases are initialized to zero; the number of iterations is set to 100 and the initial learning rate to 0.01; and the model parameters from the 100th iteration are saved for the subsequent testing procedure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210113074.0A CN114463420A (en) | 2022-01-29 | 2022-01-29 | Visual mileage calculation method based on attention convolution neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114463420A | 2022-05-10 |
Family
ID=81410808
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210113074.0A (Pending) | Visual mileage calculation method based on attention convolution neural network | 2022-01-29 | 2022-01-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114463420A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||