CN114463420A - Visual odometry method based on attention convolutional neural network - Google Patents

Visual odometry method based on attention convolutional neural network

Info

Publication number
CN114463420A
CN114463420A
Authority
CN
China
Prior art keywords
gru
layer
attention
input
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210113074.0A
Other languages
Chinese (zh)
Inventor
高学金
牟雨曼
任明荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202210113074.0A priority Critical patent/CN114463420A/en
Publication of CN114463420A publication Critical patent/CN114463420A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a visual odometry method based on an attention convolutional neural network. To address the problems that traditional visual odometry requires images to contain abundant texture information and involves a complex solving process, and that visual odometry based on a convolutional neural network has low accuracy, a visual odometry method based on an attention convolutional neural network and a gated recurrent unit is proposed. The attention mechanism is used to improve the accuracy of feature extraction in the convolution module, thereby improving the accuracy of visual localization. Compared with conventional visual odometry methods, the proposed method maintains accuracy while abandoning the complex solving process, making it better suited to practical engineering applications.

Description

Visual odometry method based on attention convolutional neural network
Technical Field
The invention relates to the fields of deep learning and visual localization technology. Aiming at the problems that a traditional Visual Odometry (VO) method requires images to contain abundant texture information and involves a complex solving process, and that visual odometry based on a Convolutional Neural Network (CNN) has low accuracy, the invention provides a visual odometry method based on an attention convolutional neural network and a Gated Recurrent Unit (GRU). The attention mechanism is used to improve the accuracy of feature extraction in the convolution module, thereby improving the accuracy of visual localization.
Background
For the autonomous navigation of intelligent vehicles, the ability of a vehicle to localize itself during motion is very important. Early vehicle positioning systems typically employed wheel speed encoders to calculate vehicle mileage; however, this method suffers from significant cumulative errors. With the development of computer vision technology, vision sensors are increasingly used for vehicle positioning and motion estimation. A vision sensor not only provides rich perceptual information but also has the advantages of low cost and small size. The problem of obtaining the camera pose from vision is generally referred to as visual odometry. Research on visual odometry began in the 1980s. In recent years, with continued in-depth research on visual odometry by researchers in China and abroad, VO has gradually been applied in fields such as robotics, autonomous driving and pedestrian navigation.
The traditional visual odometry pipeline mainly comprises steps such as image feature extraction, feature matching and pose estimation, and finally obtains the six-degree-of-freedom camera pose, namely displacement (x, y, z) and rotation (roll, pitch, yaw). Traditional visual odometry methods fall mainly into two categories: direct methods and feature point methods. Direct methods calculate camera motion mainly from the grayscale information of pixels in the image; typical algorithms include SVO and LSD-SLAM. The feature point method is considered the mainstream approach to visual odometry: feature extraction and matching are first performed on the images, and the camera motion is then estimated from the matched feature points, using methods such as ICP (Iterative Closest Point), epipolar geometry and PnP (Perspective-n-Point) depending on the camera type. Well-known algorithms such as SIFT and ORB estimate the camera pose by extracting feature points. However, when many feature points are extracted the method is time-consuming and computationally expensive, and when too few feature points are extracted features are lost and the camera motion cannot be recovered; the feature point method therefore still faces great challenges in scenes with little texture or with moving dynamic targets. The direct method avoids feature loss and feature computation time, but it is easily affected by scenes with alternating light and dark.
With the continuous development of deep learning, it has gradually been applied to various fields such as fault diagnosis, machine translation and image classification. In recent years, the technique has also been used in visual odometry. In 2015, Kishore et al. first used a convolutional neural network to study visual odometry, designing two different convolutional networks to learn the velocity and rotation of the motion respectively. In the same year, Kendall et al. proposed the PoseNet model, which estimates the position and attitude of the camera from a single input picture, realizing end-to-end camera pose estimation with a convolutional neural network for the first time. In 2017, Wang et al. proposed the DeepVO model, adding a Recurrent Neural Network (RNN) on top of the convolutional neural network to maintain the temporal connectivity between images. The model takes sequences of adjacent pictures from the KITTI dataset as input, first extracts image features through a convolutional neural network module, then feeds them into the recurrent neural network to learn the geometric associations between images, and finally outputs the camera poses.
Existing deep-learning-based VO estimation methods still cannot match the accuracy of traditional methods. Although the latest research greatly improves pose accuracy by introducing optical flow, such methods are difficult to apply widely. At present, most existing deep-learning-based visual odometry methods rely on a single convolutional neural network or on a combination of a convolutional neural network and a recurrent neural network, and the accuracy of the estimated trajectory remains low, so there is still considerable room for research in model construction and related aspects.
Disclosure of Invention
In view of the above problems, we propose the ACGR (Attention Convolution and Gated Recurrent unit) model, as shown in Fig. 1, which incorporates an attention mechanism on top of the CNN and RNN and uses the attention mechanism to improve the accuracy of feature extraction, thereby improving the accuracy of the estimated trajectory.
The method comprises the following specific steps:
1. Feature extraction based on the Attention Convolution Model.
The model introduces a spatial and channel attention mechanism in the CNN at the same time. The structure is shown in fig. 2.
The ACGR designs a convolutional network module according to the implementation principle of the optical flow method used in traditional VO, and uses this module to extract image features, thereby computing the motion features of the image and expressing them as a vector.
2. GRU-based temporal modeling.
The GRU in the model has two layers, and each layer contains 1024 hidden units. In order to maintain the original data distribution, the activation function of the original GRU is changed into a ReLU function.
3. Fully connected layers for dimensionality reduction and pose output
Two fully connected layers are added to the model, containing 128 and 6 hidden units respectively; an overview of the tensor shapes through the whole pipeline is sketched below.
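As a purely illustrative shape walkthrough of these three steps (the sizes come from the detailed description below; the flattening of the convolutional output into one vector per frame pair is an assumption, since the patent does not state how the feature tensor is passed to the GRU):

```python
# Shape flow through the ACGR pipeline for a single pair of adjacent frames (batch size 1).
#
# stacked frame pair      : (1, 6, 384, 1280)   two resized 1280 x 384 RGB images
# attention-CNN features  : (1, 512, 3, 10)     the 10 x 3 x 512 tensor named in the text
# flatten (assumed)       : (1, 1, 15360)       one feature vector per frame pair
# 2-layer GRU, 1024 units : (1, 1, 1024)
# FC 1024 -> 128 -> 6     : (1, 1, 6)           relative pose (x, y, z, roll, pitch, yaw)
```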
Compared with the prior art, the invention has the following beneficial effects:
Firstly, a deep-learning-based visual odometry is realized; the method abandons the cumbersome steps of traditional visual odometry methods and achieves an end-to-end localization mode. Secondly, a gated recurrent unit is added to the conventional convolutional-neural-network-based visual odometry method to learn the temporal correlations in the image data. Thirdly, an attention mechanism is integrated into the feature extraction module, so that the network can learn the more important geometric features in the images more intelligently, enhancing the feature extraction capability of the convolutional neural network.
Drawings
FIG. 1 is a system framework diagram of a method in accordance with the present invention;
FIG. 2 is a view of the attention mechanism;
FIG. 3(a) is a trace plot of different algorithms in sequence 03;
FIG. 3(b) is a trace plot of the different algorithms in sequence 04;
FIG. 3(c) is a trace plot of the different algorithms in sequence 09;
FIG. 3(d) is a trace plot of different algorithms in sequence 10;
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The invention uses an attention mechanism, a convolutional neural network and a gated recurrent unit to realize the visual odometry technique.
The method specifically comprises the following steps:
1. Feature extraction based on the Attention Convolution Model
(1) Convolutional neural network
The ACGR designs a convolutional network module according to the implementation principle of the optical flow method used in traditional VO, and uses this module to extract image features, thereby computing the motion features of the image and expressing them as a vector.
The parameters of the CNN are shown in Table 1. It contains 9 convolutional layers in total, with batch normalization added after each convolution to maintain the data distribution before and after the convolutional transformation, speed up model training, and avoid gradient explosion during backpropagation. In the CNN, the convolution kernel size is 7 × 7 in the first layer, 5 × 5 in the second and third layers, and 3 × 3 in the last six layers, and ReLU is selected as the activation function. The input of the module is a pair of adjacent frames from the KITTI dataset; to preserve the geometric features of the original data, the pictures are resized to a uniform 1280 × 384. After feature extraction by the 9 convolutional layers, the tensor size of the picture is 10 × 3 × 512 (a sketch of this module is given after Table 1).
TABLE 1 Convolutional layer parameter list
Tab. 1 Convolutional layer parameter list
(The table itself is provided only as an image in the original publication.)
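Because Table 1 is not reproduced above, the following PyTorch sketch of the convolution module is only a minimal illustration rather than the patented configuration: the kernel sizes (7 × 7, then 5 × 5, then 3 × 3), the batch normalization and ReLU after every convolution, and the 1280 × 384 input and 10 × 3 × 512 output are taken from the text, while the channel widths and the stride pattern are assumptions chosen so that the stated output shape is reproduced.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, kernel, stride):
    """One Conv + BatchNorm + ReLU block, as described for every layer of the encoder."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel, stride=stride, padding=kernel // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class ConvEncoder(nn.Module):
    """9-layer convolutional feature extractor; channel widths and strides are assumed."""
    def __init__(self):
        super().__init__()
        # (out_channels, kernel, stride): kernels follow the text (7, 5, 5, 3...3);
        # channels and strides are guesses that yield a 512 x 3 x 10 output for a 384 x 1280 input.
        cfg = [(64, 7, 2), (128, 5, 2), (256, 5, 2), (256, 3, 1),
               (512, 3, 2), (512, 3, 2), (512, 3, 1), (512, 3, 2), (512, 3, 2)]
        layers, in_ch = [], 6  # two stacked RGB frames -> 6 input channels
        for out_ch, k, s in cfg:
            layers.append(conv_bn_relu(in_ch, out_ch, k, s))
            in_ch = out_ch
        self.encoder = nn.Sequential(*layers)

    def forward(self, frame_pair):
        # frame_pair: (batch, 6, 384, 1280), i.e. two adjacent resized KITTI frames stacked.
        return self.encoder(frame_pair)

if __name__ == "__main__":
    feats = ConvEncoder()(torch.randn(1, 6, 384, 1280))
    print(feats.shape)  # torch.Size([1, 512, 3, 10]) -> the 10 x 3 x 512 tensor in the text
```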
(2) Attention mechanism
The attention mechanism is embedded into the convolutional network, so that the network can learn the more important geometric features in the images more intelligently, enhancing the feature extraction capability of the convolutional neural network. The attention module is computed as follows:
Mc(F)=σ(MLP(AP(F))+MLP(MP(F))) (1)
Ms(F)=σ(f7×7([AP(F);MP(F)])) (2)
where MLP denotes a fully connected layer, F is the input feature map, Mc(F) is the one-dimensional channel attention feature, AP and MP denote average pooling and maximum pooling respectively, and Ms(F) is the two-dimensional spatial attention feature. σ is the sigmoid function, and f7×7 denotes a convolution operation with a filter size of 7 × 7.
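A minimal PyTorch sketch of the channel and spatial attention of equations (1) and (2) is given below. It follows the CBAM-style formulation implied by the text; the reduction ratio of the shared MLP and the order in which the two attentions are applied are assumptions not specified in the patent.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Mc(F) = sigmoid(MLP(AP(F)) + MLP(MP(F))) -- equation (1)."""
    def __init__(self, channels, reduction=16):  # reduction ratio is an assumption
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, f):
        b, c, _, _ = f.shape
        avg = self.mlp(f.mean(dim=(2, 3)))               # MLP(AP(F))
        mx = self.mlp(f.amax(dim=(2, 3)))                # MLP(MP(F))
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)  # C x 1 x 1 channel weights

class SpatialAttention(nn.Module):
    """Ms(F) = sigmoid(f7x7([AP(F); MP(F)])) -- equation (2)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, f):
        avg = f.mean(dim=1, keepdim=True)   # average pooling over the channel dimension
        mx = f.amax(dim=1, keepdim=True)    # max pooling over the channel dimension
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # 1 x H x W map

class AttentionBlock(nn.Module):
    """Applies channel attention, then spatial attention, to a feature map."""
    def __init__(self, channels):
        super().__init__()
        self.ca, self.sa = ChannelAttention(channels), SpatialAttention()

    def forward(self, f):
        f = f * self.ca(f)
        return f * self.sa(f)
```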
2. GRU-based temporal modeling
The image data used for visual odometry have strong temporal correlation, so this correlation can be learned with a recurrent neural network. The GRU has a simpler structure and fewer parameters, so using the GRU effectively saves training time.
Denote the current time by t, the input by x_t, and the hidden state at the previous time by h_{t-1}. The output and state update equations of the GRU are as follows:
z_t=σ(W_z·[h_{t-1},x_t])
r_t=σ(W_r·[h_{t-1},x_t])
h̃_t=tanh(W·[r_t*h_{t-1},x_t])
h_t=(1-z_t)*h_{t-1}+z_t*h̃_t (3)
where z_t denotes the update gate in the GRU and σ is the sigmoid activation function, which keeps values between 0 and 1; r_t denotes the reset gate in the GRU, and the tanh activation function keeps values between -1 and 1; [·,·] denotes the concatenation of two vectors and * denotes the element-wise product. W, W_z and W_r denote weights, which are randomly initialized and updated continuously during training.
The GRU in the model has two layers, each containing 1024 hidden units. The input picture sequence is passed through the Attention Convolution Model for feature extraction to obtain a 10 × 3 × 512 tensor, which is input into the first GRU layer; the output of this layer is input into the second GRU layer, whose output is the output of the entire GRU module.
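As a sketch (assuming the 10 × 3 × 512 feature tensor is flattened into a single vector per frame pair before entering the recurrent part, which the patent does not state explicitly), the two-layer GRU module with 1024 hidden units could look as follows; note that the replacement of the GRU's tanh activation by ReLU mentioned earlier is not shown, since PyTorch's built-in nn.GRU does not expose that option.

```python
import torch
import torch.nn as nn

class RecurrentModule(nn.Module):
    """Two stacked GRU layers with 1024 hidden units each, run over a sequence of frame-pair features."""
    def __init__(self, feature_dim=10 * 3 * 512, hidden_size=1024):
        super().__init__()
        self.gru = nn.GRU(input_size=feature_dim, hidden_size=hidden_size,
                          num_layers=2, batch_first=True)

    def forward(self, features):
        # features: (batch, seq_len, 10*3*512), flattened CNN outputs for each adjacent frame pair
        outputs, _ = self.gru(features)
        return outputs  # (batch, seq_len, 1024), fed to the fully connected layers

# Example: a batch of 2 sequences, each with 5 frame pairs.
seq = torch.randn(2, 5, 10 * 3 * 512)
print(RecurrentModule()(seq).shape)  # torch.Size([2, 5, 1024])
```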
3. Fully connected layers
The model has two fully connected layers, containing 128 and 6 hidden units respectively. The fully connected layers reduce the dimensionality of the high-dimensional features output by the GRU module, and the final six-dimensional output is the relative pose between the picture at the current time and the picture at the previous time.
4. Loss function
The pose estimation problem in visual odometry is expressed as a conditional probability problem. As shown in equation (4), for given (n+1) pictures:
X=(X1,X2,...,Xn+1) (4)
the poses between adjacent pictures are obtained through calculation:
Y=(Y1,Y2,...,Yn) (5)
Treating the above as a conditional probability problem, the calculation formula is:
p(Y|X)=p(Y1,Y2,...,Yn∣X1,X2,...,Xn+1) (6)
The optimal network parameters w* are solved by maximizing the probability in equation (6):
w*=argmax_w p(Y|X;w) (7)
The Mean Squared Error (MSE) is used as the loss function, as shown in equation (8):
L=(1/M)Σ_{i=1}^{M}[β_1(||P̂_{1i}-P_{1i}||²+||Φ̂_{1i}-Φ_{1i}||²)+β_2(||P̂_{2i}-P_{2i}||²+||Φ̂_{2i}-Φ_{2i}||²)] (8)
where P_{1i} denotes the ground-truth displacement of the i-th forward-order input pair and Φ_{1i} its ground-truth rotation angle; P̂_{1i} denotes the predicted displacement of the i-th forward-order input pair and Φ̂_{1i} its predicted rotation angle; P_{2i} denotes the ground-truth displacement of the i-th reverse-order input pair and Φ_{2i} its ground-truth rotation angle; P̂_{2i} denotes the predicted displacement of the i-th reverse-order input pair and Φ̂_{2i} its predicted rotation angle; M denotes the number of samples, and β_1 and β_2 are scale factors for the forward-order and reverse-order input errors respectively.
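Below is a sketch of the bidirectional MSE loss of equation (8), under the assumptions that the displacement and rotation predictions are each three-dimensional and that the forward-order and reverse-order predictions are supplied as separate tensors:

```python
import torch

def acgr_loss(pred_fwd, gt_fwd, pred_rev, gt_rev, beta1=1.0, beta2=1.0):
    """Bidirectional MSE pose loss (equation 8).

    Each tensor has shape (M, 6): columns 0-2 are displacement, columns 3-5 are rotation angles.
    beta1 / beta2 weight the forward-order and reverse-order errors respectively
    (their actual values are not given in the text).
    """
    def pose_mse(pred, gt):
        disp_err = ((pred[:, :3] - gt[:, :3]) ** 2).sum(dim=1)  # ||P_hat - P||^2
        rot_err = ((pred[:, 3:] - gt[:, 3:]) ** 2).sum(dim=1)   # ||Phi_hat - Phi||^2
        return disp_err + rot_err

    return (beta1 * pose_mse(pred_fwd, gt_fwd) + beta2 * pose_mse(pred_rev, gt_rev)).mean()

# Example with random tensors for M = 4 sample pairs.
loss = acgr_loss(torch.randn(4, 6), torch.randn(4, 6), torch.randn(4, 6), torch.randn(4, 6))
print(loss.item())
```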
5. Experimental procedures and results
(1) Introduction to data set
The ACGR model is evaluated on a public dataset, the KITTI dataset. The KITTI Visual Odometry benchmark is a large-scale open-source dataset that is widely used to evaluate visual odometry models. Since only the first 11 sequences of the dataset provide ground truth, the first 11 sequences are chosen for the experiments. To meet the network's requirements on the input data while preserving the geometric features of the images, all pictures are uniformly resized to 1280 × 384. Sequences 03, 04, 09 and 10 contain the fewest pictures, and since a larger number of training pictures generally yields better training results, these four sequences are selected as the test set. Sequences 00, 01, 02, 05, 06, 07 and 08 contain a large number of pictures and are therefore used to train the model. Validation data are then randomly selected from these 7 training sequences, with the validation set containing roughly one third as many pictures as the training set. The specific division is shown in Table 2 and summarized in the sketch that follows it.
TABLE 2 Training set, validation set and test set
Tab. 2 Training set, validation set and test set
(The table itself is provided only as an image in the original publication.)
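The split can be summarized as a simple configuration (the sequence numbers come from the text; the validation subset is drawn at random from the training sequences, so only its approximate size is fixed):

```python
KITTI_SPLIT = {
    "train": ["00", "01", "02", "05", "06", "07", "08"],
    "test":  ["03", "04", "09", "10"],
    # validation frames are sampled at random from the training sequences,
    # roughly one third of the number of training frames
    "val_fraction_of_train": 1 / 3,
}
```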
(2) Model training
The model is built on the deep learning framework PyTorch, and the graphics card used is an Nvidia GeForce RTX 2080 Ti. Adam (Adaptive Moment Estimation) is selected as the optimizer, and batch gradient descent (BGD) is selected as the optimization algorithm.
During training, the weights of all parameters in the network are initialized with the Xavier method and the biases are initialized to zero. The number of iterations is set to 100 and the initial learning rate is set to 0.01. The model parameters from the 100th iteration are saved for the subsequent testing procedure.
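A sketch of this training setup is shown below; the model object, its output convention and the data loader layout are placeholders (the patent does not describe them), the batch size is unspecified, and acgr_loss refers to the loss sketch given in section 4.

```python
import torch
import torch.nn as nn

def init_weights(module):
    """Xavier initialization for weights, zeros for biases, as described above."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

def train(model, data_loader, epochs=100, lr=0.01, device="cuda"):
    model.apply(init_weights)
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # Adam, initial learning rate 0.01
    for epoch in range(epochs):                              # 100 iterations, as stated above
        for frames, poses_fwd, poses_rev in data_loader:     # hypothetical loader layout
            optimizer.zero_grad()
            pred_fwd, pred_rev = model(frames.to(device))    # hypothetical model interface
            loss = acgr_loss(pred_fwd, poses_fwd.to(device),
                             pred_rev, poses_rev.to(device))  # loss sketch from section 4
            loss.backward()
            optimizer.step()
    # keep the parameters after the 100th iteration (filename is illustrative)
    torch.save(model.state_dict(), "acgr_epoch_100.pth")
```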
(3) Experimental results and error analysis
In the experiment, the ACGR is trained on the 7 training-set sequences, the model is then tested on the 4 test-set sequences, and the performance of the ACGR is evaluated according to the test results. The trajectories estimated for test sequences 03, 04, 09 and 10 are shown in Fig. 3.
To verify the effect of the trajectories estimated by the model, four methods are compared in the experiment: ORB-SLAM [22], the SFMLearner algorithm based on a convolutional neural network, the CAGR-VO (Convolution And Gated Recurrent unit Visual Odometry) algorithm based on convolutional and recurrent neural networks, and the ACGR-VO algorithm. ORB-SLAM is a classical visual SLAM system that, in addition to the visual odometry, also contains a loop closure detection module; for the fairness of the comparison, no loop closure detection is used in the ORB-SLAM adopted in this experiment. As can be seen from Fig. 3, all four algorithms can approximate the shape of the trajectory. The feature-point-based ORB-SLAM is the most mature of the methods; because the version used here has no loop closure detection, its estimated trajectory fits the ground-truth trajectory poorly on sequence 09, while the fit on sequences 03, 04 and 10 is good and the trajectory prediction accuracy is high, whereas research on deep-learning-based visual odometry is still at the development stage and leaves considerable room for improvement. Although the other three model-based visual odometry estimates in the experiment perform slightly worse than the ORB algorithm, the fitting effect of the ACGR-VO model is better than that of the SFMLearner model based on a convolutional neural network and the CAGR-VO model based on convolutional and recurrent neural networks. Therefore, the accuracy of the ACGR-VO model is improved compared with other models based on convolutional neural networks, and adding an attention mechanism can effectively improve the accuracy of the trajectory predicted by the model.

Claims (1)

1. A visual odometry method based on an attention convolutional neural network, characterized by comprising the following steps:
1) Feature extraction based on the Attention Convolution Model
(1) Convolutional neural network
The parameters of the CNN are shown in Table 1, comprising 9 convolutional layers in total, with batch normalization added after each convolution; in the CNN, the convolution kernel size is 7 × 7 in the first layer, 5 × 5 in the second and third layers, and 3 × 3 in the last six layers, and ReLU is selected as the activation function; the input of the module is a pair of adjacent frames from the KITTI dataset, and to preserve the geometric features of the original data the pictures are resized to a uniform 1280 × 384; after feature extraction by the 9 convolutional layers, the tensor size of the picture is 10 × 3 × 512;
TABLE 1 Convolutional layer parameter list
(The table itself is provided only as an image in the original publication.)
(2) Attention mechanism
The attention mechanism is embedded into the convolutional network, and the attention module is computed as follows:
Mc(F)=σ(MLP(AP(F))+MLP(MP(F))) (1)
Ms(F)=σ(f7×7([AP(F);MP(F)])) (2)
wherein MLP denotes a fully connected layer, F is the input feature map, Mc(F) is the one-dimensional channel attention feature, AP and MP denote average pooling and maximum pooling respectively, and Ms(F) is the two-dimensional spatial attention feature; σ is the sigmoid function, and f7×7 denotes a convolution operation with a filter size of 7 × 7;
2) timing modeling based on GRU
The current time is denoted by t, the input by x_t, and the hidden state at the previous time by h_{t-1}; the output and state update equations of the GRU are as follows:
z_t=σ(W_z·[h_{t-1},x_t])
r_t=σ(W_r·[h_{t-1},x_t])
h̃_t=tanh(W·[r_t*h_{t-1},x_t])
h_t=(1-z_t)*h_{t-1}+z_t*h̃_t (3)
wherein z_t denotes the update gate in the GRU, σ is the sigmoid activation function keeping values between 0 and 1, r_t denotes the reset gate in the GRU, the tanh activation function keeps values between -1 and 1, [·,·] denotes the concatenation of two vectors, and * denotes the element-wise product; W, W_z and W_r denote weights, which are randomly initialized and updated continuously during training;
the GRU in the model has two layers, and each layer contains 1024 hidden units; the input picture sequence is subjected to feature extraction through an Attention conversion Model to obtain a tensor with the size of 10 multiplied by 3 multiplied by 512, then the tensor is input into a first layer of GRU, the output of the layer is input into a second layer of GRU, and the output of the second layer of GRU is the output of the whole GRU module;
3) full connection layer
There are two fully connected layers, containing 128 and 6 hidden units respectively; the fully connected layers reduce the dimensionality of the high-dimensional features output by the GRU module, and the finally output six-dimensional tensor is the relative pose between the picture at the current time and the picture at the previous time;
4) loss function
The pose estimation problem in visual odometry is expressed as a conditional probability problem; as shown in equation (4), for given (n+1) pictures:
X=(X1,X2,…,Xn+1) (4)
and calculating to obtain the pose between the adjacent pictures:
Y=(Y1,Y2,...,Yn) (5)
considering the above problem as a conditional probability problem, the calculation formula is:
p(Y|X)=p(Y1,Y2,...,Yn∣X1,X2,...,Xn+1) (6)
the optimal network parameters w* are solved by maximizing the probability in equation (6):
w*=argmax_w p(Y|X;w) (7)
the Mean Squared Error (MSE) is used as the loss function, as shown in equation (8):
L=(1/M)Σ_{i=1}^{M}[β_1(||P̂_{1i}-P_{1i}||²+||Φ̂_{1i}-Φ_{1i}||²)+β_2(||P̂_{2i}-P_{2i}||²+||Φ̂_{2i}-Φ_{2i}||²)] (8)
wherein P_{1i} denotes the ground-truth displacement of the i-th forward-order input pair and Φ_{1i} its ground-truth rotation angle; P̂_{1i} denotes the predicted displacement of the i-th forward-order input pair and Φ̂_{1i} its predicted rotation angle; P_{2i} denotes the ground-truth displacement of the i-th reverse-order input pair and Φ_{2i} its ground-truth rotation angle; P̂_{2i} denotes the predicted displacement of the i-th reverse-order input pair and Φ̂_{2i} its predicted rotation angle; M denotes the number of samples, and β_1 and β_2 are scale factors for the forward-order and reverse-order input errors respectively;
the weights of all parameters in the network during training adopt a xavier initialization method, and the deviation adopts a zero initialization method; the iteration times are set to 100, and the initial learning rate is set to 0.01; and save the model parameters of item 100 for subsequent testing procedures.
CN202210113074.0A 2022-01-29 2022-01-29 Visual odometry method based on attention convolutional neural network Pending CN114463420A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210113074.0A CN114463420A (en) 2022-01-29 2022-01-29 Visual mileage calculation method based on attention convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210113074.0A CN114463420A (en) 2022-01-29 2022-01-29 Visual mileage calculation method based on attention convolution neural network

Publications (1)

Publication Number Publication Date
CN114463420A true CN114463420A (en) 2022-05-10

Family

ID=81410808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210113074.0A Pending CN114463420A (en) 2022-01-29 2022-01-29 Visual mileage calculation method based on attention convolution neural network

Country Status (1)

Country Link
CN (1) CN114463420A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180231985A1 (en) * 2016-12-22 2018-08-16 TCL Research America Inc. System and method for vision-based flight self-stabilization by deep gated recurrent q-networks
CN111369608A (en) * 2020-05-29 2020-07-03 南京晓庄学院 Visual odometer method based on image depth estimation
CN111739082A (en) * 2020-06-15 2020-10-02 大连理工大学 Stereo vision unsupervised depth estimation method based on convolutional neural network
CN112419411A (en) * 2020-11-27 2021-02-26 广东电网有限责任公司肇庆供电局 Method for realizing visual odometer based on convolutional neural network and optical flow characteristics
CN112556719A (en) * 2020-11-27 2021-03-26 广东电网有限责任公司肇庆供电局 Visual inertial odometer implementation method based on CNN-EKF
US20210390723A1 (en) * 2020-06-15 2021-12-16 Dalian University Of Technology Monocular unsupervised depth estimation method based on contextual attention mechanism
CN113888638A (en) * 2021-10-08 2022-01-04 南京航空航天大学 Pedestrian trajectory prediction method based on attention mechanism and through graph neural network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180231985A1 (en) * 2016-12-22 2018-08-16 TCL Research America Inc. System and method for vision-based flight self-stabilization by deep gated recurrent q-networks
CN111369608A (en) * 2020-05-29 2020-07-03 南京晓庄学院 Visual odometer method based on image depth estimation
CN111739082A (en) * 2020-06-15 2020-10-02 大连理工大学 Stereo vision unsupervised depth estimation method based on convolutional neural network
US20210390723A1 (en) * 2020-06-15 2021-12-16 Dalian University Of Technology Monocular unsupervised depth estimation method based on contextual attention mechanism
CN112419411A (en) * 2020-11-27 2021-02-26 广东电网有限责任公司肇庆供电局 Method for realizing visual odometer based on convolutional neural network and optical flow characteristics
CN112556719A (en) * 2020-11-27 2021-03-26 广东电网有限责任公司肇庆供电局 Visual inertial odometer implementation method based on CNN-EKF
CN113888638A (en) * 2021-10-08 2022-01-04 南京航空航天大学 Pedestrian trajectory prediction method based on attention mechanism and through graph neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Jianan et al.: "Optical remote sensing scene classification based on vision transformer and graph convolutional network", Acta Photonica Sinica, vol. 50, no. 11, 30 November 2021 (2021-11-30), pages 314-321 *

Similar Documents

Publication Publication Date Title
CN110335337B (en) Method for generating visual odometer of antagonistic network based on end-to-end semi-supervision
CN110210551B (en) Visual target tracking method based on adaptive subject sensitivity
CN108416840B (en) Three-dimensional scene dense reconstruction method based on monocular camera
CN108491880B (en) Object classification and pose estimation method based on neural network
CN107369166B (en) Target tracking method and system based on multi-resolution neural network
CN109800689B (en) Target tracking method based on space-time feature fusion learning
Wang et al. Sne-roadseg+: Rethinking depth-normal translation and deep supervision for freespace detection
CN104200494B (en) Real-time visual target tracking method based on light streams
CN109740742A (en) A kind of method for tracking target based on LSTM neural network
Li et al. Dual-view 3d object recognition and detection via lidar point cloud and camera image
CN111462191B (en) Non-local filter unsupervised optical flow estimation method based on deep learning
CN111376273B (en) Brain-like inspired robot cognitive map construction method
CN111368759B (en) Monocular vision-based mobile robot semantic map construction system
CN110473284A (en) A kind of moving object method for reconstructing three-dimensional model based on deep learning
CN111160294B (en) Gait recognition method based on graph convolution network
CN113436227A (en) Twin network target tracking method based on inverted residual error
CN111833400B (en) Camera pose positioning method
CN115375737B (en) Target tracking method and system based on adaptive time and serialized space-time characteristics
CN112686952A (en) Image optical flow computing system, method and application
CN116563682A (en) Attention scheme and strip convolution semantic line detection method based on depth Hough network
AU2020102476A4 (en) A method of Clothing Attribute Prediction with Auto-Encoding Transformations
CN113673313B (en) Gesture recognition method based on hierarchical convolutional neural network
CN114463420A (en) Visual mileage calculation method based on attention convolution neural network
CN115830707A (en) Multi-view human behavior identification method based on hypergraph learning
CN115690170A (en) Method and system for self-adaptive optical flow estimation aiming at different-scale targets

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination