CN112561947A - Image self-adaptive motion estimation method and application - Google Patents

Image self-adaptive motion estimation method and application

Info

Publication number
CN112561947A
Authority
CN
China
Prior art keywords
image
neural network
convolutional neural
deep convolutional
adaptive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011434819.0A
Other languages
Chinese (zh)
Inventor
杨德龙
尚鹏
侯增涛
王博
付威廉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202011434819.0A
Publication of CN112561947A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods

Abstract

Existing algorithms do not consider the time interval between images in a sequence and assume that all images are acquired at the same moment, which introduces a certain error into the calculation. The application provides an image adaptive motion estimation method comprising: constructing a first deep convolutional neural network and a second deep convolutional neural network; constructing an objective function from the two networks and training them simultaneously through the objective function to obtain a first deep convolutional neural network with fixed parameters and a second deep convolutional neural network with fixed parameters; and inputting a monocular image into the first deep convolutional neural network to output the corresponding parallax image, and inputting an image sequence into the second deep convolutional neural network to output a camera spatial pose transformation matrix. The adverse effect of non-overlapping regions between images on image reconstruction is thereby avoided.

Description

Image self-adaptive motion estimation method and application
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to an image adaptive motion estimation method and application.
Background
Vision and hearing are the main ways in which humans perceive the external environment, and more than 80% of external information is obtained visually. Scene perception based on vision is a major challenge in the field of artificial intelligence and an important component of the visual navigation system of an unmanned vehicle. In such a navigation system, three-dimensional scene information (parameters such as the relative distance between the scene and the camera, and the spatial position and attitude of the camera) plays an important role. At the same time, the monocular camera has the advantages of small size, simple equipment, low cost and ease of deployment, giving it application advantages over other sensors. Therefore, research on monocular image motion estimation algorithms for unmanned driving scenes is of great significance for the development of visual navigation systems for unmanned vehicles.
Currently, monocular image motion estimation algorithms based on deep learning are divided into supervised and unsupervised learning algorithms. The training data set of a supervised learning algorithm consists of an input image sequence and a set of labels corresponding to each image. However, such label sets are mostly produced by manual annotation, which greatly limits the application range of these algorithms, and they are gradually being phased out. Unsupervised learning algorithms use the spatial geometric relationship between images to design a supervision signal that replaces the label set of the supervised algorithm; model training and testing can be completed using only images, and this has increasingly become the mainstream research direction.
In the design of the objective function, however, all images are treated by default as static images acquired at the same moment, and the acquisition time interval between images in the sequence is ignored, which inevitably introduces errors and reduces algorithm accuracy. Analysis of this problem shows that existing monocular image motion estimation methods based on unsupervised deep learning consider only the geometric relationship between images and ignore the time dimension. Although a short time interval does not cause the algorithm to fail, treating a dynamic scene image as a static scene image by default reduces the accuracy and robustness of the algorithm.
Disclosure of Invention
1. Technical problem to be solved
In monocular image depth estimation and camera spatial pose calculation methods based on unsupervised deep learning, a monocular image sequence is required as the training sample during model training. Existing algorithms do not consider the time interval between images in the sequence and assume that all images are acquired at the same moment, which introduces a certain error into the calculation.
2. Technical scheme
In order to achieve the above object, the present application provides an image adaptive motion estimation method comprising the following steps. Step 1: constructing a first deep convolutional neural network and a second deep convolutional neural network. Step 2: constructing an objective function according to the first and second deep convolutional neural networks, and simultaneously training the two networks through the objective function to obtain a first deep convolutional neural network with fixed parameters and a second deep convolutional neural network with fixed parameters. Step 3: inputting a monocular image into the first deep convolutional neural network to output the corresponding parallax image, and inputting an image sequence into the second deep convolutional neural network to output a camera spatial pose transformation matrix.
Another embodiment provided by the present application is: the first deep convolutional neural network is a monocular image depth of field estimation network used to estimate the relative distance between the monocular camera and the scene; the second deep convolutional neural network is a monocular camera spatial pose estimation network used to estimate the spatial position and attitude of the monocular camera.
Another embodiment provided by the present application is: the monocular image depth of field estimation network is based on a deep residual network and has an encoding-decoding structure.
Another embodiment provided by the present application is: in the encoding process, the network continuously extracts the desired high-dimensional features and performs down-sampling through convolutional layers, activation layers and pooling layers; in the decoding process, the network up-samples through deconvolution and outputs multi-scale parallax images.
Another embodiment provided by the present application is: the monocular camera space pose estimation network is of an encoding structure.
Another embodiment provided by the present application is: training the first and second deep convolutional neural networks consists of iteratively optimizing the objective function with a gradient descent method until a specified number of iterations is reached, yielding a first deep convolutional neural network with fixed parameters and a second deep convolutional neural network with fixed parameters.
Another embodiment provided by the present application is: the objective function comprises an adaptive function constructed from the global and local brightness differences of the images; an adaptive error loss function for the reconstructed images, constructed by reconstructing images within the monocular image sequence and combining the result with the adaptive function; and an adaptive loss function for image depth edges, also constructed in combination with the adaptive function.
Another embodiment provided by the present application is: the adaptive error loss function is constructed from the input image, the parallax image and the camera pose transformation matrix, and the adaptive loss function is likewise constructed from the input image, the parallax image and the camera pose transformation matrix.
Another embodiment provided by the present application is: the images include a target image and reference images, and the reference images include a first reference image and a second reference image.
The application also provides an application of the image self-adaptive motion estimation method, and the image self-adaptive motion estimation method is applied to an outdoor unmanned automobile or an unmanned autonomous navigation robot.
3. Advantageous effects
Compared with the prior art, the image self-adaptive motion estimation method and the application have the beneficial effects that:
the image self-adaptive motion estimation method is a monocular image self-adaptive motion estimation method based on unsupervised deep learning.
In the image adaptive motion estimation method provided by the application, an adaptive function is designed from the global and local brightness differences of the images and is used to treat overlapping and non-overlapping regions between images in a sequence differently. Applied to monocular image depth of field estimation and camera pose estimation, it effectively resolves the adverse effects caused by the time interval between images in the sequence.
In the image adaptive motion estimation method provided by the application, during construction of the objective function an adaptive function is designed based on the global and local brightness of the images and is used to distinguish overlapping from non-overlapping regions in an image sequence, thereby avoiding the adverse effect of non-overlapping regions between images on image reconstruction.
The application of the image adaptive motion estimation method addresses the problem of vehicle motion estimation (estimating the relative distance between a monocular camera and the scene, and the spatial position and attitude of the monocular camera) in an outdoor unmanned vehicle or an unmanned autonomous navigation robot, and provides an adaptive motion estimation method for it.
Drawings
FIG. 1 is a schematic diagram illustrating the principles of the image adaptive motion estimation method of the present application;
FIG. 2 is a graphical representation of the comparative experimental results of the present application on the KITTI dataset;
FIG. 3 is a graphical representation of the comparative experimental results of the present application on the Cityscapes dataset.
Detailed Description
Hereinafter, specific embodiments of the present application will be described in detail with reference to the accompanying drawings, so that those skilled in the art can practice the application from this detailed description. Features from different embodiments may be combined to yield new embodiments, or certain features may be substituted in certain embodiments to yield yet further preferred embodiments, without departing from the principles of the present application.
Referring to fig. 1 to 3, the present application provides an image adaptive motion estimation method comprising the following steps. Step 1: constructing a first deep convolutional neural network and a second deep convolutional neural network. Step 2: constructing an objective function according to the first and second deep convolutional neural networks, and simultaneously training the two networks through the objective function to obtain a first deep convolutional neural network with fixed parameters and a second deep convolutional neural network with fixed parameters. Step 3: inputting a monocular image into the first deep convolutional neural network to output the corresponding parallax image, and inputting an image sequence into the second deep convolutional neural network to output a camera spatial pose transformation matrix.
Step 1 and step 2 constitute the training process. In step 2, the parameters of the first and second deep convolutional neural networks are adjusted through the objective function, finally yielding a first deep convolutional neural network with fixed parameters and a second deep convolutional neural network with fixed parameters.
As shown in fig. 1, a training sample consists of a monocular image sequence of length 3, in which the second image is designated as the target image and the first and third images are designated as reference image 1 and reference image 2, respectively. Training samples are input simultaneously into the monocular image depth of field estimation network AdaDepthNet and the monocular camera spatial pose estimation network AdaMotionNet. Through the computation of these two deep convolutional neural networks, AdaDepthNet outputs the parallax images corresponding to the image sequence (when the camera parameters are known, the parallax image and the depth image can be freely converted into one another using the formula

Disparity(i, j) = f · b / Depth(i, j)

where Disparity denotes the parallax image; Depth denotes the depth image; (i, j) denotes pixel coordinates; f denotes the camera focal length; and b denotes the camera baseline), and AdaMotionNet outputs the camera spatial pose transformation matrices T_s1→t and T_s2→t from reference image 1 and reference image 2 to the target image. The loss functions L_ad_ph and L_ad_smooth in the objective function are all constructed from the input images, the parallax images and the camera pose transformation matrices.
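As an illustration of the disparity/depth relationship above, the following is a minimal sketch (not part of the patent text); the array shapes and the focal length and baseline values are hypothetical examples.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length, baseline, eps=1e-6):
    """Convert a disparity map to a depth map: Depth = f * b / Disparity."""
    return focal_length * baseline / np.maximum(disparity, eps)

def depth_to_disparity(depth, focal_length, baseline, eps=1e-6):
    """Inverse conversion: Disparity = f * b / Depth."""
    return focal_length * baseline / np.maximum(depth, eps)

# Hypothetical example with KITTI-like intrinsics (values are illustrative only).
disparity = np.random.uniform(1.0, 50.0, size=(128, 416)).astype(np.float32)
depth = disparity_to_depth(disparity, focal_length=721.5, baseline=0.54)
```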
The method proposed by the application comprises two parts, training and testing. After the AdaDepthNet and AdaMotionNet network structures have been designed and the objective function constructed, the network parameters are still unknown; the objective function must be iteratively optimized with a gradient descent method until the specified number of iterations is reached. This is the training process of the method. After training is completed, the network parameters are fixed; a monocular image sequence can then be used as input data, and the networks directly output the corresponding parallax image or camera spatial pose transformation matrix. The accuracy of the results is determined directly by the objective function, so the design of the objective function is the core of the method.
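A minimal training-loop sketch of the procedure described above (gradient descent on the objective for a fixed number of iterations, then freezing the parameters). The tiny stub networks, the dummy objective and the dummy data below are placeholders assumed for illustration only; they are not the patent's implementation.

```python
import torch
import torch.nn as nn

# Stand-in stubs for the two networks; the real AdaDepthNet / AdaMotionNet
# structures are described in the text and Table 1 (these stubs only make
# the training skeleton below runnable).
depth_net = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1), nn.Sigmoid())
pose_net = nn.Sequential(nn.Conv2d(9, 6, 3, padding=1), nn.AdaptiveAvgPool2d(1))

params = list(depth_net.parameters()) + list(pose_net.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

def objective_function(target, refs, disparity, pose):
    # Placeholder objective; the patent's actual objective is equation (10).
    return disparity.mean() + pose.abs().mean()

num_iterations = 100            # "specified number of iterations" (illustrative)
for step in range(num_iterations):
    target = torch.rand(4, 3, 128, 416)           # dummy target images
    ref1, ref2 = torch.rand_like(target), torch.rand_like(target)
    disparity = depth_net(target)
    pose = pose_net(torch.cat([ref1, target, ref2], dim=1))
    loss = objective_function(target, (ref1, ref2), disparity, pose)
    optimizer.zero_grad(); loss.backward(); optimizer.step()

depth_net.eval(); pose_net.eval()                 # parameters fixed for testing
```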
Further, the first deep convolutional neural network is a monocular image depth of field estimation network used to estimate the relative distance between the monocular camera and the scene; the second deep convolutional neural network is a monocular camera spatial pose estimation network used to estimate the spatial position and attitude of the monocular camera.
Further, the monocular image depth of field estimation network is based on a deep residual network and has an encoding-decoding structure.
Further, in the encoding process, the network continuously extracts the desired high-dimensional features and performs down-sampling through convolutional layers, activation layers and pooling layers; in the decoding process, the network up-samples through deconvolution and outputs multi-scale parallax images.
Further, the monocular camera spatial pose estimation network is of an encoding structure.
The AdaDepthNet included in the technical scheme of the application is designed on the basis of the deep residual network ResNet and has an encoding-decoding structure. In the encoding process, the network continuously extracts the desired high-dimensional features and performs down-sampling through convolutional layers, activation layers and pooling layers; in the decoding process, the network up-samples through deconvolution and outputs multi-scale parallax images with sizes (H, W), (H/2, W/2), (H/4, W/4) and (H/8, W/8), where H and W denote the height and width of the image, respectively. AdaMotionNet directly uses an encoding-only structure (the network structure is shown in Table 1, where conv1, conv2, ..., conv6 denote the output of each convolutional layer and Pose denotes the monocular camera spatial pose transformation matrix; a code sketch follows the table). A monocular image sequence is used as input, and successive convolutional layers compute the camera spatial pose transformation matrix that is finally output. The objective function is constructed from the input images, the parallax images output by AdaDepthNet and the camera spatial pose transformation matrices output by AdaMotionNet; a functional block diagram of the method is shown in FIG. 1.
Input data                 Output channels   Kernel size   Stride   Output
Monocular image sequence   16                7             2        conv1
conv1                      32                5             2        conv2
conv2                      64                5             2        conv3
conv3                      128               3             2        conv4
conv4                      256               3             2        conv5
conv5                      256               3             2        conv6
conv6                      48                1             1        Pose
Table 1  AdaMotionNet network structure
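The sketch below builds an encoder with the layer dimensions of Table 1. It assumes each row is a single stride-2 convolution followed by ReLU, that the input is a concatenated 3-frame RGB sequence (9 channels), and that the final 48-channel map is averaged over its spatial dimensions to produce the pose output; these assumptions go beyond what the table itself states.

```python
import torch
import torch.nn as nn

class PoseEncoderSketch(nn.Module):
    """Convolution stack following the dimensions listed in Table 1 (assumptions noted above)."""
    def __init__(self, in_channels=9):
        super().__init__()
        cfg = [  # (out_channels, kernel, stride) per Table 1
            (16, 7, 2), (32, 5, 2), (64, 5, 2),
            (128, 3, 2), (256, 3, 2), (256, 3, 2),
        ]
        layers, c = [], in_channels
        for out_c, k, s in cfg:
            layers += [nn.Conv2d(c, out_c, k, stride=s, padding=k // 2),
                       nn.ReLU(inplace=True)]
            c = out_c
        self.encoder = nn.Sequential(*layers)
        self.pose_conv = nn.Conv2d(256, 48, kernel_size=1, stride=1)  # conv6 -> Pose

    def forward(self, image_sequence):
        features = self.encoder(image_sequence)
        pose = self.pose_conv(features)
        return pose.mean(dim=[2, 3])  # (N, 48) pose representation

# Usage with a dummy 3-frame sequence stacked along the channel dimension.
seq = torch.rand(1, 9, 128, 416)
print(PoseEncoderSketch()(seq).shape)  # torch.Size([1, 48])
```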
Further, training the first and second deep convolutional neural networks consists of iteratively optimizing the objective function with a gradient descent method until the specified number of iterations is reached, yielding fixed network parameters.
Further, the objective function comprises an adaptive function constructed from the global and local brightness differences of the images; an adaptive error loss function for the reconstructed images, constructed by reconstructing images within the monocular image sequence and combining the result with the adaptive function; and an adaptive loss function for image depth edges, also constructed in combination with the adaptive function.
Further, the adaptive error loss function is constructed from the input image, the parallax image and the camera pose transformation matrix, and the adaptive loss function is likewise constructed from the input image, the parallax image and the camera pose transformation matrix.
1. Construction of adaptive functions
Let (I_1, I_2, I_3) denote a training sample and designate the second image I_2 as the target image I_t, with the first and third images as reference images. The construction principle of the objective function is to reconstruct the target image Î_t from the reference images and to use the degree of similarity between Î_t and I_t in place of a supervisory signal. However, because the image sequence is acquired continuously in time by a moving camera, non-overlapping regions inevitably exist between consecutive images, and direct calculation inevitably produces abnormal points or regions that reduce the accuracy of the algorithm.
In order to allow the model to judge whether a pixel in the image belongs to the overlapping region, the method provides an adaptive loss function based on image brightness consistency. Let I denote the input image and Î the reconstructed image, designate the entire image as the global region and a 5 × 5 pixel area as the local region. The global image brightness difference glo_pc is calculated as:

glo_pc = (1 / |Ω|) Σ_{(i,j)∈Ω} | I(i, j) - Î(i, j) |        (1)

where (i, j) represents image pixel coordinates, Ω represents the image region, and |Ω| represents the number of pixels in the image region.
If the parallax image output by AdaDepthNet and the camera spatial pose transformation matrix output by AdaMotionNet are accurate, the reconstruction result Î_t of the target image should be infinitely close to the input image I_t, that is, the brightness values of pixels with the same coordinates in the two images are equal or infinitely close to equal, and glo_pc → 0+. Conversely, if the depth of field estimation and the camera pose estimation are poor, the reconstruction of the target image differs greatly from the input image, and glo_pc is large and irregular. In theory, as network training progresses, glo_pc decreases and finally converges to zero from the positive side. The main reason is that glo_pc is the mean brightness difference over all pixels of the global image; the acquisition interval of the image sequence is short, so the overlap rate between the reference image and the target image is high, and after averaging the adverse effect of the non-overlapping region is clearly reduced. However, averaging only spreads the calculation error over each pixel and does not reduce the total error.
The local image brightness difference loc_pc is calculated as:

loc_pc(i, j) = (1 / |N(i, j)|) Σ_{(u,v)∈N(i,j)} | I(u, v) - Î(u, v) |        (2)

where N(i, j) denotes the 5 × 5 pixel neighborhood centered at (i, j). For the local image brightness difference loc_pc, a conclusion similar to that for glo_pc cannot be drawn. loc_pc computes the brightness difference of only a 5 × 5 pixel region; if the central pixel of the region happens to lie in a non-overlapping region, the image reconstruction algorithm cannot reconstruct that region, and even if the reconstruction accuracy of the overlapping region is high, the loc_pc corresponding to the non-overlapping region remains a random value greater than zero. Conversely, if the central pixel lies in the overlapping region and the image reconstruction accuracy is high, loc_pc → 0+.
Therefore, the adaptive weight function constructed by the method of the present application from the global and local brightness differences is:

ω(i, j) = exp( -( ε · loc_pc(i, j) + (1 - ε) · glo_pc(i, j) ) )        (3)

where loc_pc is the local brightness difference function, glo_pc is the global brightness difference function, ω(i, j) ∈ (0, 1) is the adaptive weight value, ε is a weight parameter, (i, j) represents pixel coordinates, and (i, j) ∈ Ω.
In the image reconstruction process, the adaptive weight function can be understood as a mask: a corresponding calculation rule is generated for each pixel to decide whether the pixel belongs to the overlapping region. ω(i, j) is a decreasing function of loc_pc(i, j). If a pixel p_temp lies in a non-overlapping region, the value of loc_pc(i, j) between the reconstructed image and the original input image at that pixel is large, and ω(p_temp) is small or even close to zero. The global image brightness difference term glo_pc(p_temp) does not change according to whether the pixel lies in a non-overlapping region, so the local image brightness difference term loc_pc plays the major role in the adaptive weight function.
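A minimal numerical sketch of equations (1)-(3) as written above, assuming the brightness differences are mean absolute differences; the value 0.85 used for ε and the uniform 5 × 5 window are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_weight(I, I_rec, eps_weight=0.85, window=5):
    """Per-pixel adaptive weight of equation (3), built from the global (1)
    and local (2) brightness differences (reconstructed forms, see text)."""
    abs_diff = np.abs(I - I_rec)
    glo_pc = abs_diff.mean()                          # eq. (1): global mean difference
    loc_pc = uniform_filter(abs_diff, size=window)    # eq. (2): 5x5 local mean difference
    return np.exp(-(eps_weight * loc_pc + (1.0 - eps_weight) * glo_pc))  # eq. (3)

# Toy example: a perfect reconstruction gives weights close to 1,
# a badly reconstructed (e.g. non-overlapping) strip gives weights near 0.
I = np.random.rand(64, 64).astype(np.float32)
I_rec = I.copy()
I_rec[:, 48:] = np.random.rand(64, 16)     # simulate a non-overlapping strip
w = adaptive_weight(I, I_rec)
print(w[:, :48].mean(), w[:, 48:].mean())  # overlapping vs non-overlapping weights
```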
2. Adaptive error loss function for reconstructed images
For a monocular image sequence (I_1, I_2, I_3), the reconstruction formula from a reference image to the target image is:

Î_{n→t}(p_t) = I_n( K · T_{s_n→t} · D_t(p_t) · K^{-1} · p_t )        (4)

where K represents the camera parameter matrix, which is a known quantity; T_{s_n→t} represents the camera spatial pose transformation matrix from the reference image to the target image; D_t represents the depth image corresponding to the target image (which can be converted from the parallax image); I_n represents a reference image; and n = 1, 3 corresponds to the first and third images.
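A minimal sketch of the inverse-warping view synthesis that equation (4) describes, assuming a pinhole camera model, a 4 × 4 pose matrix that maps target-frame points into the reference frame, and bilinear sampling; this is the standard formulation of such warping, not necessarily the patent's exact implementation.

```python
import torch
import torch.nn.functional as F

def reconstruct_target(ref_img, depth_t, K, T_t2ref):
    """Warp a reference image into the target view (sketch of eq. (4)).

    ref_img: (N,3,H,W), depth_t: (N,1,H,W), K: (N,3,3),
    T_t2ref: (N,4,4) pose assumed to map target-frame points to the reference frame.
    """
    N, _, H, W = ref_img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    ones = torch.ones_like(xs)
    pix = torch.stack([xs, ys, ones], dim=0).float().view(1, 3, -1).expand(N, 3, -1)

    cam_pts = torch.linalg.inv(K) @ pix * depth_t.view(N, 1, -1)    # back-project with D_t
    cam_pts = torch.cat([cam_pts, torch.ones(N, 1, H * W)], dim=1)  # homogeneous coords
    ref_pts = K @ (T_t2ref @ cam_pts)[:, :3]                        # transform and project
    ref_pix = ref_pts[:, :2] / ref_pts[:, 2:3].clamp(min=1e-6)

    # Normalize to [-1, 1] for grid_sample (bilinear sampling of the reference image).
    grid_x = 2.0 * ref_pix[:, 0] / (W - 1) - 1.0
    grid_y = 2.0 * ref_pix[:, 1] / (H - 1) - 1.0
    grid = torch.stack([grid_x, grid_y], dim=-1).view(N, H, W, 2)
    return F.grid_sample(ref_img, grid, align_corners=True)

# Toy usage with identity pose and constant depth (identity warp).
img = torch.rand(1, 3, 64, 96)
depth = torch.ones(1, 1, 64, 96) * 5.0
K = torch.tensor([[[80.0, 0, 48], [0, 80.0, 32], [0, 0, 1]]])
print(reconstruct_target(img, depth, K, torch.eye(4).unsqueeze(0)).shape)
```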
The reconstruction results from the reference images to the target image obtained from equation (4) are Î_{1→t} and Î_{3→t}, respectively. Similar to the construction of the objective function above, the method selects the structural similarity function SSIM(·) and the L1 norm to construct the image brightness loss function:

L_ph = η · ( 1 - SSIM(I_t, Î_t) ) + (1 - η) · || I_t - Î_t ||_1        (5)
where η represents a weight value used to adjust the relative impact of the structural similarity function and the L1 norm on the result. The adaptive image brightness loss function is obtained from equations (3) and (5):

[Equation (6): the adaptive image brightness loss L_ad_ph, formed by combining the per-pixel brightness loss of equation (5) with the adaptive weight ω(i, j) of equation (3) and averaging over the image region Ω]
where (i, j) represents image pixel coordinates, Ω represents an image region, and | Ω | represents the number of pixels in the image region.
When the image reconstruction effect is poor, the global brightness difference glo_pc and the local brightness difference loc_pc between the reconstructed image and the input image are large, the values of 1/ω(i, j) become large, and the adaptive brightness loss function L_ad_ph cannot converge. For non-overlapping regions, the local brightness difference loc_pc increases, 1/ω(i, j) increases with loc_pc, and the loss function L_ad_ph likewise cannot converge. Only when the image reconstruction result is close to the original input image and the reconstructed pixels belong to the overlapping region does the loss function L_ad_ph converge.
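A minimal sketch of the SSIM + L1 brightness loss of equation (5) as written above. The 3 × 3 SSIM window and the η value of 0.85 are illustrative assumptions, and the combination with the adaptive weight ω of equation (3) to form equation (6) is omitted because its exact form is not recoverable from the text.

```python
import torch
import torch.nn.functional as F

def ssim_map(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel SSIM computed with 3x3 average-pooled local statistics."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return num / den

def brightness_loss(target, reconstruction, eta=0.85):
    """Per-pixel brightness loss in the spirit of equation (5):
    eta weights the SSIM term against the L1 term."""
    ssim_term = (1.0 - ssim_map(target, reconstruction)).clamp(0, 2)
    l1_term = (target - reconstruction).abs()
    return eta * ssim_term + (1.0 - eta) * l1_term

t = torch.rand(1, 3, 64, 96)
r = torch.rand_like(t)
print(brightness_loss(t, r).mean())
```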
3. Adaptive edge loss function for depth images
Considering that the pixel brightness of the depth image changes abruptly in edge regions, the method provides an adaptive edge loss function analogous to the adaptive brightness loss function. Since image pixel brightness also changes abruptly at isolated noise points, the input image is first smoothed with the Gaussian kernel shown in equation (7), and the Laplace transform of the smoothed image is then used in the edge loss function.
G(i, j) = (1 / (2πσ²)) · exp( -(i² + j²) / (2σ²) )        (7)
"edge" may occur in any region of the image, so the method of the present application combines the adaptive function shown in equation (3) with the laplacian transform of the image to provide an adaptive function ω for the edge information of the depth imageedge
Figure BDA0002828079650000072
Wherein (i, j) represents image pixel coordinates; Ω represents an image area, | Ω | represents the number of pixels in the image area;
Figure BDA0002828079650000073
representing the laplacian operator. The adaptive edge loss function term L is obtained from equation (8)ad_smoothComprises the following steps:
Figure BDA0002828079650000074
where d represents a depth image. The adaptive edge loss function shown in equation (9) not only considers that the pixel brightness changes abruptly in the edge region, but also performs a differential calculation between the overlapping and non-overlapping regions in the two images.
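A minimal sketch of the Gaussian-smoothing plus Laplacian processing described above, using a standard edge-aware smoothness weighting in place of equations (8)-(9), whose exact forms are not recoverable from the text; the σ value and the exp(-|∇²I|) weighting are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def adaptive_edge_loss(depth, image, sigma=1.0):
    """Edge-aware depth smoothness sketch: smooth the input image with a
    Gaussian kernel (eq. (7)), take its Laplacian, and use it to down-weight
    the Laplacian of the depth map at image edges. The exp(-|lap|) weighting
    stands in for the patent's omega_edge of equation (8)."""
    smoothed = gaussian_filter(image, sigma=sigma)   # Gaussian smoothing, eq. (7)
    image_lap = np.abs(laplace(smoothed))            # noise-suppressed image Laplacian
    depth_lap = np.abs(laplace(depth))               # abrupt depth changes
    weight = np.exp(-image_lap)                      # assumed edge weighting
    return float((weight * depth_lap).mean())

image = np.random.rand(64, 96).astype(np.float32)
depth = np.random.rand(64, 96).astype(np.float32)
print(adaptive_edge_loss(depth, image))
```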
4. Objective function
Combining equations (6) and (9), the objective function of the adaptive depth of field estimation model is:

L_total = Σ_s ( τ1 · L_ad_ph^s + τ2 · L_ad_smooth^s )        (10)

where τ1 and τ2 represent weight values used to adjust the relative importance of the image reconstruction loss and the depth image edge loss, and s is the scale of the image.
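A minimal sketch of a multi-scale total objective of the form written in equation (10); the weight values and the way per-scale losses are supplied are illustrative assumptions.

```python
def total_objective(per_scale_losses, tau1=1.0, tau2=0.1):
    """Sum tau1 * L_ad_ph + tau2 * L_ad_smooth over image scales (eq. (10) sketch).

    per_scale_losses: list of (brightness_loss, edge_loss) tuples, one per
    output scale (H, W), (H/2, W/2), (H/4, W/4), (H/8, W/8).
    """
    return sum(tau1 * l_ph + tau2 * l_smooth for l_ph, l_smooth in per_scale_losses)

# Illustrative values for four scales.
print(total_objective([(0.31, 0.04), (0.28, 0.05), (0.25, 0.06), (0.22, 0.07)]))
```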
Further, the images include a target image and reference images, and the reference images include a first reference image and a second reference image.
The application also provides an application of the image self-adaptive motion estimation method, and the image self-adaptive motion estimation method is applied to an outdoor unmanned automobile or an unmanned autonomous navigation robot.
After the design of the objective function is completed, the objective function is iteratively optimized, i.e. the training process, using the open-source KITTI and Cityscapes datasets for outdoor unmanned driving scenes. During testing, the network parameters are fixed, and the scene depth information and camera spatial pose transformation matrix corresponding to a monocular image can be calculated directly.
FIG. 2(a) is an input image randomly selected from the KITTI dataset; FIG. 2(b) is the ground-truth depth image; FIG. 2(c) is the depth of field estimation result output by the SfMLearner model; FIG. 2(d) is the depth of field estimation result output by the GeoNet model; and FIG. 2(e) is the depth of field estimation result output by the adaptive depth of field estimation model proposed in the present application.
FIG. 3(a) is an input image randomly selected from the Cityscapes dataset; FIG. 3(b) is the ground-truth depth image; FIG. 3(c) is the depth of field estimation result output by the SfMLearner model; FIG. 3(d) is the depth of field estimation result output by the GeoNet model; and FIG. 3(e) is the depth of field estimation result output by the adaptive depth of field estimation model proposed in the present application.
The results of the accuracy comparison experiments are shown in Table 2. The test set consists of the 697 KITTI images of the (open-source) Eigen split, and the evaluation indexes are the Absolute Relative Error (AbsRel), Square Relative Error (SqRel), linear Root Mean Square Error (RMSE linear), logarithmic Root Mean Square Error (RMSE log) and accuracy (Correct). The first four error indexes evaluate the prediction error of the model: the lower the error value, the higher the prediction accuracy of the model. The last index evaluates the prediction accuracy of the model, which is proportional to the index value.
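A minimal sketch of the standard monocular depth evaluation metrics named above (AbsRel, SqRel, RMSE, RMSE log, and a threshold accuracy); the threshold 1.25 used for the accuracy term is the common convention and an assumption here.

```python
import numpy as np

def depth_metrics(gt, pred):
    """Standard monocular depth metrics (sketch): AbsRel, SqRel, RMSE, RMSE log, accuracy."""
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    ratio = np.maximum(gt / pred, pred / gt)
    correct = np.mean(ratio < 1.25)          # assumed delta < 1.25 accuracy threshold
    return abs_rel, sq_rel, rmse, rmse_log, correct

gt = np.random.uniform(1.0, 80.0, size=10000)
pred = gt * np.random.uniform(0.9, 1.1, size=10000)
print(depth_metrics(gt, pred))
```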
The baseline comparison models are: the supervised learning models Eigen and Liu; and the unsupervised learning models ACA, R. Garg, SfMLearner, GeoNet, GASDA and D-SLAM. AdaModel(K) and AdaModel(C) denote the adaptive depth of field estimation model trained on the KITTI dataset and the Cityscapes dataset, respectively, and AdaModel(C+K) denotes the adaptive depth of field estimation model trained on both the Cityscapes and KITTI datasets.
Table 2  Depth of field estimation comparison results [table image in the original]
The method selects the ORB-SLAM, SfMLearner, GeoNet and D-SLAM models as comparison algorithms and uses the absolute trajectory error (ATE) as the quantitative evaluation standard of model accuracy; the comparison results are shown in Table 3.
Model             Sequence 09      Sequence 10
ORB-SLAM (full)   0.014 ± 0.008    0.012 ± 0.011
ORB-SLAM (short)  0.064 ± 0.141    0.064 ± 0.130
SfMLearner        0.021 ± 0.017    0.020 ± 0.015
GeoNet            0.012 ± 0.007    0.012 ± 0.009
D-SLAM            0.017 ± 0.008    0.015 ± 0.017
AdaModel(K)       0.012 ± 0.005    0.012 ± 0.006
Table 3  Visual odometry comparison results (ATE)
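A minimal sketch of an absolute trajectory error computation in the spirit of the ATE metric reported in Table 3 (mean ± standard deviation of positional error after alignment); the centering and single-scale alignment used here are assumptions common for monocular methods, not necessarily the evaluation protocol used in the experiments.

```python
import numpy as np

def ate(gt_xyz, pred_xyz):
    """Absolute trajectory error sketch: align the predicted trajectory to the
    ground truth by translation and a single scale factor, then report the
    mean and standard deviation of the positional error."""
    gt = gt_xyz - gt_xyz.mean(axis=0)
    pred = pred_xyz - pred_xyz.mean(axis=0)
    scale = np.sum(gt * pred) / max(np.sum(pred * pred), 1e-12)  # monocular scale alignment
    err = np.linalg.norm(gt - scale * pred, axis=1)
    return err.mean(), err.std()

gt = np.cumsum(np.random.randn(100, 3) * 0.1, axis=0)
pred = gt * 1.3 + np.random.randn(100, 3) * 0.02   # scaled, noisy estimate
print(ate(gt, pred))
```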
The method of the application comprises a deep convolutional neural network AdaDepthNet for estimating the relative distance between the monocular camera and the scene, a deep convolutional neural network AdaMotionNet for estimating the spatial position and attitude of the monocular camera, an adaptive function for distinguishing overlapping from non-overlapping image regions, and the objective function of the adaptive motion estimation method.
Although the present application has been described above with reference to specific embodiments, those skilled in the art will recognize that many changes may be made in the configuration and details of the present application within the principles and scope of the present application. The scope of protection of the application is determined by the appended claims, and all changes that come within the meaning and range of equivalency of the technical features are intended to be embraced therein.

Claims (10)

1. An image adaptive motion estimation method, characterized by: the method comprises the following steps:
step 1: constructing a first deep convolutional neural network and a second deep convolutional neural network;
step 2: constructing an objective function according to the first deep convolutional neural network and the second deep convolutional neural network, and simultaneously training the first deep convolutional neural network and the second deep convolutional neural network through the objective function to obtain a first deep convolutional neural network with fixed parameters and a second deep convolutional neural network with fixed parameters;
and step 3: and inputting the monocular image into the first depth convolution neural network to output a parallax image corresponding to the monocular image, and inputting the image sequence into the second depth convolution neural network to output a camera space pose transformation matrix.
2. The image adaptive motion estimation method of claim 1, characterized in that: the first depth convolution neural network is a monocular image depth of field estimation network and is used for estimating the relative distance between a monocular camera and a scene; the second depth convolution neural network is a monocular camera space pose estimation network and is used for estimating the monocular camera space position and posture.
3. The image adaptive motion estimation method of claim 2, wherein: the monocular image depth of field estimation network is based on a depth residual error network, and the monocular image depth of field estimation network is of an encoding-decoding structure.
4. The image adaptive motion estimation method according to claim 3, characterized in that: in the encoding process, the network continuously extracts the desired high-dimensional features and performs down-sampling through convolutional layers, activation layers and pooling layers; in the decoding process, the network up-samples through deconvolution and outputs multi-scale parallax images.
5. The image adaptive motion estimation method of claim 2, wherein: the monocular camera space pose estimation network is of an encoding structure.
6. The image adaptive motion estimation method of claim 1, characterized in that: training the first deep convolutional neural network and the second deep convolutional neural network comprises iteratively calculating the objective function by a gradient descent method until a specified number of iterations is reached, to obtain the first deep convolutional neural network with fixed parameters and the second deep convolutional neural network with fixed parameters.
7. The image adaptive motion estimation method according to claim 6, characterized in that: the objective function comprises an adaptive function constructed from the global and local brightness differences of the images; an adaptive error loss function for the reconstructed images, constructed by reconstructing images within the monocular image sequence and combining the result with the adaptive function; and an adaptive loss function for image depth edges, constructed in combination with the adaptive function.
8. The image adaptive motion estimation method according to claim 7, characterized in that: the adaptive error loss function is constructed from the input image, the parallax image and the camera pose transformation matrix, and the adaptive loss function is constructed from the input image, the parallax image and the camera pose transformation matrix.
9. An image adaptive motion estimation method according to any one of claims 1 to 8, characterized by: the images include a target image and a reference image, and the reference image includes a first reference image and a second reference image.
10. An application of an image adaptive motion estimation method is characterized in that: the image adaptive motion estimation method of any one of claims 1-9 is applied to an outdoor unmanned automobile or an unmanned autonomous navigation robot.
CN202011434819.0A 2020-12-10 2020-12-10 Image self-adaptive motion estimation method and application Pending CN112561947A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011434819.0A CN112561947A (en) 2020-12-10 2020-12-10 Image self-adaptive motion estimation method and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011434819.0A CN112561947A (en) 2020-12-10 2020-12-10 Image self-adaptive motion estimation method and application

Publications (1)

Publication Number Publication Date
CN112561947A true CN112561947A (en) 2021-03-26

Family

ID=75060328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011434819.0A Pending CN112561947A (en) 2020-12-10 2020-12-10 Image self-adaptive motion estimation method and application

Country Status (1)

Country Link
CN (1) CN112561947A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361544A (en) * 2021-06-11 2021-09-07 广东省大湾区集成电路与系统应用研究院 Image acquisition equipment, method and device for correcting external parameters of image acquisition equipment and storage medium
CN114782911A (en) * 2022-06-20 2022-07-22 小米汽车科技有限公司 Image processing method, device, equipment, medium, chip and vehicle

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009674A (en) * 2019-04-01 2019-07-12 厦门大学 Monocular image depth of field real-time computing technique based on unsupervised deep learning
CN110503680A (en) * 2019-08-29 2019-11-26 大连海事大学 It is a kind of based on non-supervisory convolutional neural networks monocular scene depth estimation method
WO2019223382A1 (en) * 2018-05-22 2019-11-28 深圳市商汤科技有限公司 Method for estimating monocular depth, apparatus and device therefor, and storage medium
CN110770758A (en) * 2017-01-23 2020-02-07 牛津大学创新有限公司 Determining the position of a mobile device
CN110910447A (en) * 2019-10-31 2020-03-24 北京工业大学 Visual odometer method based on dynamic and static scene separation
CN111168722A (en) * 2019-12-12 2020-05-19 中国科学院深圳先进技术研究院 Robot following system and method based on monocular camera ranging
CN111386550A (en) * 2017-11-15 2020-07-07 谷歌有限责任公司 Unsupervised learning of image depth and ego-motion predictive neural networks
CN111771135A (en) * 2019-01-30 2020-10-13 百度时代网络技术(北京)有限公司 LIDAR positioning using RNN and LSTM for time smoothing in autonomous vehicles

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110770758A (en) * 2017-01-23 2020-02-07 牛津大学创新有限公司 Determining the position of a mobile device
CN111386550A (en) * 2017-11-15 2020-07-07 谷歌有限责任公司 Unsupervised learning of image depth and ego-motion predictive neural networks
WO2019223382A1 (en) * 2018-05-22 2019-11-28 深圳市商汤科技有限公司 Method for estimating monocular depth, apparatus and device therefor, and storage medium
CN111771135A (en) * 2019-01-30 2020-10-13 百度时代网络技术(北京)有限公司 LIDAR positioning using RNN and LSTM for time smoothing in autonomous vehicles
CN110009674A (en) * 2019-04-01 2019-07-12 厦门大学 Monocular image depth of field real-time computing technique based on unsupervised deep learning
CN110503680A (en) * 2019-08-29 2019-11-26 大连海事大学 It is a kind of based on non-supervisory convolutional neural networks monocular scene depth estimation method
CN110910447A (en) * 2019-10-31 2020-03-24 北京工业大学 Visual odometer method based on dynamic and static scene separation
CN111168722A (en) * 2019-12-12 2020-05-19 中国科学院深圳先进技术研究院 Robot following system and method based on monocular camera ranging

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
D. YANG ET AL: "An Adaptive Unsupervised Learning Framework for Monocular Depth Estimation", IEEE ACCESS, vol. 7, pages 148142 - 148151, XP011751392, DOI: 10.1109/ACCESS.2019.2946323 *
DELONG YANG, ET AL.: "Unsupervised framework for depth estimation and camera motion prediction from video", NEUROCOMPUTING, vol. 385, pages 169 - 185, XP086067819, DOI: 10.1016/j.neucom.2019.12.049 *
YANG, DL ,ET AL.: "Unsupervised learning of depth estimation, camera motion prediction and dynamic object localization from video", INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, pages 1 - 14 *
ZHOU ZUDE, MENG WEI, CHEN BING: "Intelligent Control of Digital Manufacturing Equipment and Processes" (数字制造装备与过程的智能控制), Wuhan University of Technology Press, pages: 175 - 176 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113361544A (en) * 2021-06-11 2021-09-07 广东省大湾区集成电路与系统应用研究院 Image acquisition equipment, method and device for correcting external parameters of image acquisition equipment and storage medium
CN113361544B (en) * 2021-06-11 2024-04-19 广东省大湾区集成电路与系统应用研究院 Image acquisition equipment, and external parameter correction method, device and storage medium thereof
CN114782911A (en) * 2022-06-20 2022-07-22 小米汽车科技有限公司 Image processing method, device, equipment, medium, chip and vehicle
CN114782911B (en) * 2022-06-20 2022-09-16 小米汽车科技有限公司 Image processing method, device, equipment, medium, chip and vehicle

Similar Documents

Publication Publication Date Title
CN110009674B (en) Monocular image depth of field real-time calculation method based on unsupervised depth learning
CN108921926B (en) End-to-end three-dimensional face reconstruction method based on single image
CN113330490B (en) Three-dimensional (3D) assisted personalized home object detection
US10885659B2 (en) Object pose estimating method and apparatus
CN110503680B (en) Unsupervised convolutional neural network-based monocular scene depth estimation method
CN108242079B (en) VSLAM method based on multi-feature visual odometer and graph optimization model
Maggio et al. Loc-nerf: Monte carlo localization using neural radiance fields
CN110675423A (en) Unmanned aerial vehicle tracking method based on twin neural network and attention model
CN111325797A (en) Pose estimation method based on self-supervision learning
CN111902826A (en) Positioning, mapping and network training
CN111476806B (en) Image processing method, image processing device, computer equipment and storage medium
CN108171249B (en) RGBD data-based local descriptor learning method
CN110443849B (en) Target positioning method for double-current convolution neural network regression learning based on depth image
CN112561947A (en) Image self-adaptive motion estimation method and application
CN112489119B (en) Monocular vision positioning method for enhancing reliability
CN113724379B (en) Three-dimensional reconstruction method and device for fusing image and laser point cloud
CN113762358A (en) Semi-supervised learning three-dimensional reconstruction method based on relative deep training
CN115187638B (en) Unsupervised monocular depth estimation method based on optical flow mask
CN112084934A (en) Behavior identification method based on two-channel depth separable convolution of skeletal data
CN113962858A (en) Multi-view depth acquisition method
CN111376273A (en) Brain-like inspired robot cognitive map construction method
CN111833400B (en) Camera pose positioning method
CN114663502A (en) Object posture estimation and image processing method and related equipment
CN113570658A (en) Monocular video depth estimation method based on depth convolutional network
CN114170290A (en) Image processing method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination