CN109323709B - Visual odometry method, device and computer-readable storage medium - Google Patents

Visual odometry method, device and computer-readable storage medium

Info

Publication number
CN109323709B
CN109323709B · application CN201710639962.5A
Authority
CN
China
Prior art keywords
frame
frames
target
attitude
posture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710639962.5A
Other languages
Chinese (zh)
Other versions
CN109323709A (en)
Inventor
李昊鑫
李静雯
王刚
刘殿超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd filed Critical Ricoh Co Ltd
Priority to CN201710639962.5A
Publication of CN109323709A
Application granted
Publication of CN109323709B
Legal status: Active

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C22/00 Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/254 Analysis of motion involving subtraction of images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods

Abstract

Visual odometry methods, apparatus, and computer-readable storage media for estimating a target pose are provided. The method may comprise: obtaining a first number of current frames whose target poses are to be estimated, a second number of history frames immediately preceding the first of the current frames and for which pose estimation has already been performed, and a subsequent frame immediately following the last of the current frames; inferring the target pose in the current frames from the pose estimation results of the history frames; calculating the pose change between the target pose of one of the history frames and the target pose of the subsequent frame as a constraint condition; and optimizing the target pose in the current frames based on the constraint condition.

Description

Visual odometry method, device and computer-readable storage medium
Technical Field
The present disclosure relates to visual odometry, and more particularly, to visual odometry methods, apparatus, and computer-readable storage media for estimating a target pose.
Background
In the field of mobile robotics, simultaneous localization and mapping (SLAM) technology has been studied and developed for many years. Visual odometry is part of the SLAM problem: it incrementally estimates the position and pose of a target based on vision.
The most important problem in visual odometry is how to estimate the motion of the target from several neighboring images. Feature-based methods are the mainstream of current visual odometry and have a long history of research. In a feature-based method, for two images, some representative points, called feature points, are first selected. Thereafter, the motion of the target is estimated only from these feature points, while the spatial positions of the feature points are estimated at the same time; the information of the other, non-feature points in the image is discarded. The feature point method thus converts motion estimation over images into motion estimation between two sets of points.
At present, visual odometry technology is gradually being applied in the field of automatic driving. Compared with indoor robots, automatic driving has a huge application market; at the same time, however, the moving speed of a vehicle in an automatic driving environment is often high, and factors such as illumination and weather in an outdoor environment change frequently, so the quality of the obtained images varies. Sometimes it is difficult to extract effective feature points from some images and to match them, making the poses of the corresponding frames hard to estimate, which poses certain challenges to visual odometry technology.
Among existing visual odometry methods, in order to estimate the pose of the vehicle more accurately, some improve accuracy by introducing sensor data such as IMU or GPS data, but doing so increases cost.
Methods using the SLAM framework localize mainly through place recognition and landmark recognition. In the field of automatic driving, a vehicle rarely travels the same road repeatedly, so when pose estimation fails because some image frames in a complex scene have low image quality, the pose may not be recoverable by relocalization. Moreover, the SLAM method imposes pose constraints by continuously propagating landmark points forward, but when the number of features in an image is small, the landmarks are difficult to propagate, so the problem that the poses of such frames are hard to estimate cannot be solved well.
Disclosure of Invention
In view of the foregoing, the present disclosure proposes a visual odometry method, apparatus, and computer-readable storage medium for estimating a target pose.
According to one aspect of the present disclosure, a visual odometry method for estimating a target pose is provided, which may include: obtaining a first number of current frames whose target poses are to be estimated, a second number of history frames immediately preceding the first of the current frames and for which pose estimation has already been performed, and a subsequent frame immediately following the last of the current frames; inferring the target pose in the current frames from the pose estimation results of the history frames; calculating the pose change between the target pose of one of the history frames and the target pose of the subsequent frame as a constraint condition; and optimizing the target pose in the current frames based on the constraint condition.
In an alternative embodiment, the step of inferring the target pose in the current frame from the pose estimation results of the historical frames may comprise: obtaining a local motion model according to the attitude estimation result of the historical frame; and calculating a target pose in the current frame based on the local motion model.
In an alternative embodiment, the step of obtaining a local motion model according to the pose estimation result in the historical frame may include: calculating a motion vector according to feature point matching between adjacent frames in the historical frames; obtaining a local motion direction category according to the motion vector by utilizing a pre-trained classifier; selecting a corresponding local motion model based on the local motion direction category; and solving the parameters of the local motion model by using the attitude estimation result of the historical frame.
In an alternative embodiment, the step of calculating a motion vector according to feature point matching between adjacent frames in the historical frames may comprise: obtaining mutually matched feature points between adjacent frames in the historical frames; transforming the matched feature points into a world coordinate system according to the camera parameters and the target postures of the frames where the matched feature points are located; and calculating motion vectors between the feature points matched with each other in the world coordinate system.
In an alternative embodiment, the step of calculating a pose change between the target pose of one frame of the historical frames and the target pose of the subsequent frame may comprise: performing feature point matching on one frame in the historical frames and the subsequent frames; and calculating the attitude change between the target attitude of one frame in the historical frames and the target attitude of the subsequent frame based on the feature point matching result.
In an alternative embodiment, the step of calculating a pose change between the target pose of one frame in the historical frames and the target pose of the subsequent frame based on the feature point matching result may include: transforming the matched characteristic points in one frame of the historical frame and the subsequent frame into a world coordinate system according to the camera parameters and the target postures of the frames where the matched characteristic points are located; and calculating the rotation and translation quantity between one frame in the historical frames and the subsequent frame according to the matched characteristic points in the two frames in the world coordinate system as the posture change.
In an optional embodiment, the method may further comprise: calculating a target pose in the subsequent frame using the local motion model. Wherein the step of optimizing the target pose in the current frame based on the constraint condition may include: establishing a posture graph by taking the calculated target posture in the current frame and the target posture in the subsequent frame as nodes and the constraint condition as edges; the energy function of the pose graph is minimized to obtain an optimized target pose.
In an optional embodiment, the method may further comprise: calculating the average value of the changes of the target postures of all the frames in the current frame; and smoothing the posture change between the adjacent frames in the current frame based on the average value to obtain a smoothed target posture.
According to another aspect of the present disclosure, there is provided a visual odometry apparatus for estimating a target pose, which may include: an obtaining component for obtaining a first number of current frames whose target poses are to be estimated, a second number of history frames immediately preceding the first of the current frames and for which pose estimation has already been performed, and a subsequent frame immediately following the last of the current frames; an inference component for inferring the target pose in the current frames from the pose estimation results of the history frames; a calculation component for calculating the pose change between the target pose of one of the history frames and the target pose of the subsequent frame as a constraint condition; and an optimization component for optimizing the target pose in the current frames based on the constraint condition.
According to another aspect of the present disclosure, there is provided an apparatus for estimating a target pose, which may include: a memory storing computer program instructions; and a processor coupled to the memory and configured to execute the computer program instructions to perform the following: obtaining a first number of current frames whose target poses are to be estimated, a second number of history frames immediately preceding the first of the current frames and for which pose estimation has already been performed, and a subsequent frame immediately following the last of the current frames; inferring the target pose in the current frames from the pose estimation results of the history frames; calculating the pose change between the target pose of one of the history frames and the target pose of the subsequent frame as a constraint condition; and optimizing the target pose in the current frames based on the constraint condition.
According to another aspect of the present disclosure, there is provided a computer-readable storage medium storing computer program instructions which, when executed, may perform the following: obtaining a first number of current frames whose target poses are to be estimated, a second number of history frames immediately preceding the first of the current frames and for which pose estimation has already been performed, and a subsequent frame immediately following the last of the current frames; inferring the target pose in the current frames from the pose estimation results of the history frames; calculating the pose change between the target pose of one of the history frames and the target pose of the subsequent frame as a constraint condition; and optimizing the target pose in the current frames based on the constraint condition.
According to the visual odometry method, apparatus and computer-readable storage medium for estimating a target pose of the embodiments of the present disclosure, the target pose in the current frames is inferred from the pose estimation results of history frames on which pose estimation has previously been performed, and the pose change between the target pose of a history frame and the target pose of a subsequent frame after the current frames is calculated as a constraint condition, with which the target pose in the current frames is optimized. Target pose estimation can therefore be performed even for those image frames that have low image quality and on which feature point matching is difficult, so that the visual odometer can operate normally in complex scenes, improving the robustness and accuracy of the visual odometer.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and are intended to provide further explanation of the claimed technology.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 is a flow chart illustrating the main steps of a visual odometry method for estimating a target pose according to an embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating the main steps of a gesture inference method according to another embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating the main steps of a local motion model calculation method according to another embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating the main steps of a pose optimization method according to another embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating an example pose graph according to another embodiment of the present disclosure;
FIG. 6 is a block diagram illustrating a primary configuration of a visual odometry device for estimating a target pose, according to an embodiment of the present disclosure; and
fig. 7 is a block diagram illustrating a main configuration of an apparatus for estimating a target posture according to another embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein. All other embodiments made by those skilled in the art without inventive efforts based on the embodiments of the present disclosure described in the present disclosure should fall within the scope of the present disclosure.
First, an application scenario of the present disclosure is briefly introduced. As described above, visual odometry provides a method of estimating the motion of a target from a continuous sequence of images obtained from a camera. Applicable fields include, but are not limited to, automatic driving, mobile robots, unmanned aerial vehicles, and the like. For example, when applied to an autonomous driving environment, a continuous sequence of images of the target scene may be captured by an onboard camera, and the pose of the target (i.e., the vehicle in this case) may be estimated from that sequence using a visual odometry method.
Different camera types may be used in different applications, such as monocular cameras, stereo cameras, RGBD cameras, etc. Thus, the image captured by the camera may include a color image, a grayscale image, a depth image, an RGBD image, and so on. These image types are all applicable to the visual odometry method of the present disclosure for estimating the pose of a target. That is, the present disclosure does not limit the type of image captured by the camera.
As understood by those skilled in the art, the "pose" of a target refers to its position and orientation, which may be represented by a six-dimensional vector (x, y, z, θ, φ, ψ). In general, the current pose of the target may be represented by the amount of rotation and translation relative to the target's initial pose.
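For illustration only, the following Python sketch (not part of the disclosure; the class and method names are our own) shows the rotation-plus-translation representation of a pose and how relative motions compose, which is how the rotation and translation amounts are combined throughout this description:

```python
import numpy as np

class Pose:
    """Minimal rigid-body pose: rotation matrix R (3x3) plus translation t (3,)."""
    def __init__(self, R=None, t=None):
        self.R = np.eye(3) if R is None else np.asarray(R, dtype=float)
        self.t = np.zeros(3) if t is None else np.asarray(t, dtype=float)

    def compose(self, other):
        """Return self * other: apply `other` first, then `self`."""
        return Pose(self.R @ other.R, self.R @ other.t + self.t)

    def inverse(self):
        """Inverse pose, so that p.compose(p.inverse()) is the identity."""
        Rt = self.R.T
        return Pose(Rt, -Rt @ self.t)

    def apply(self, points):
        """Transform (N, 3) points from this pose's frame into the reference frame."""
        return np.asarray(points) @ self.R.T + self.t
```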
Next, a visual odometry method for estimating a target pose according to one embodiment of the present disclosure is described with reference to fig. 1.
As shown in fig. 1, the visual odometry method 100 according to this embodiment may include the following steps.
In step S110, a first number of current frames of the target pose to be estimated, a second number of history frames that have been subjected to pose estimation immediately before the first frame in the current frames, and a subsequent frame immediately after the last frame in the current frames are obtained.
As described above, in an automatic driving environment the vehicle often moves fast, and due to changes in illumination, weather and other factors in the outdoor environment, the quality of some image frames in the sequence captured by the camera is not ideal, making it difficult to extract effective feature points from them and perform feature point matching. Such consecutive images with sparse feature points are the frames whose target poses the visual odometry method of this embodiment is designed to estimate.
Specifically, when the target pose is estimated by a conventional visual odometry method, feature points are extracted from each successive image frame, and the motion of the target is estimated by feature point matching to obtain the target pose of each frame. If, during feature extraction and pose estimation, an image frame has too few feature points for its pose to be estimated by matching, that frame is added to the current image sequence. Feature points are then extracted from the next frame; if the next frame still does not have enough feature points, it is also added to the current image sequence. This continues until a frame with enough feature points is reached.
Thus, a current image sequence composed of the image frames with too few feature points, preceding the frame with sufficient feature points, is obtained as the current frames. It is clear to those skilled in the art that the current frames described herein may be one frame or several consecutive frames.
Meanwhile, the above-described frame with sufficient feature points, immediately after the last of the current frames, is obtained as the subsequent frame. In addition, in order to infer the pose of the current frames from the historical pose of the target, history frames immediately before the first of the current frames are also obtained. As described above, a history frame is a frame whose target pose could be estimated by the conventional visual odometry method and for which a pose estimation result has already been obtained. Those skilled in the art will appreciate that at least two history frames are needed for pose estimation, e.g., estimating the target pose by feature point matching. In one example, five history frames before the current frames may be chosen. Of course, it will be apparent to those skilled in the art that the number of history frames is not limited to five and may vary depending on the particular application.
In the above feature extraction, the methods that can be adopted include, but are not limited to, extracting corners, color blocks, etc. in the image. Feature point extraction algorithms developed in recent years can extract the same points even after the image has changed to a certain degree, and can judge the correlation between them. For example, commonly used features include Harris corners, SIFT features, SURF features, ORB features, and the like.
For each feature point, in order to distinguish it from other points, a "Descriptor" may also be computed. A descriptor is usually a vector containing information about the feature point and its surrounding area. If the descriptors of two feature points are similar, the two points can be considered the same point. From the feature points and their descriptors, the matching points between two images can be calculated.
Of course, it is clear to those skilled in the art that how feature points are extracted is not the focus here; the methods listed above are only examples, and the applicable feature point extraction methods are not limited herein. Any feature point extraction method now known or developed in the future can be applied to the embodiments of the present disclosure, and those skilled in the art can select an appropriate method according to actual needs.
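As a concrete illustration of the feature extraction and matching described above, the sketch below uses OpenCV's ORB detector and brute-force Hamming matching; the detector choice and parameter values are assumptions made for the example, since the disclosure deliberately leaves the feature type open:

```python
import cv2

def match_features(img1, img2, max_matches=200):
    """Detect ORB keypoints in two grayscale images and return matched point pairs."""
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    if des1 is None or des2 is None:
        return [], []  # too few features: the low-quality case discussed above
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    pts1 = [kp1[m.queryIdx].pt for m in matches[:max_matches]]
    pts2 = [kp2[m.trainIdx].pt for m in matches[:max_matches]]
    return pts1, pts2
```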
Next, in step S120 of the visual odometry method 100, the target pose in the current frames may be inferred from the pose estimation results of the history frames. Since the frame rate of current cameras is usually tens of frames per second, e.g. 24, 30, 60, or even more than 120 frames per second, the motion of the target between two adjacent frames usually does not change drastically, while the history frames selected in step S110 are several consecutive frames that have already undergone pose estimation. The present disclosure therefore contemplates that the pose of the target in the current frames can be roughly inferred from the pose estimation results of the history frames.
FIG. 2 illustrates one example of a gesture inference method that may be applied to embodiments of the present disclosure. As shown in FIG. 2, the pose inference method 200 can include the steps of: step S210, obtaining a local motion model according to the attitude estimation result of the historical frame; and step S220, calculating the target posture in the current frame based on the local motion model.
Regarding the method of obtaining the local motion model in step S210, fig. 3 shows one example of a local motion model calculation method applicable to an embodiment of the present disclosure. As shown in fig. 3, the local motion model calculation method 300 may include the following steps.
In step S310, a motion vector is calculated based on feature point matching between adjacent frames in the history frame.
As described above, the history frames are image frames for which pose estimation has already been performed; therefore, the matched feature points in adjacent frames extracted during pose estimation of the history frames, as well as the pose of each frame, are available. For any image frame i among the history frames, the feature points extracted from it are denoted P_i^j, where j is the feature point index within frame i, and the pose of that frame is denoted T_i.
Next, the matched feature points may be transformed into the world coordinate system according to the camera parameters and the target pose of the frame in which they are located. For example, every feature point P_i^j in image frame i can be transformed, in combination with the known camera parameters, into a point P_a^j in the camera coordinate system. Further, based on the pose T_i of image frame i, the point P_a^j can be converted into a point P_w^j in the world coordinate system, as shown in Equation 1:

P_w^j = R_i · P_a^j + t_i    (Equation 1)

where R_i and t_i respectively denote the rotational and translational components of the pose T_i of image frame i relative to the target's initial pose.
Thus, for each image frame i among the history frames, its feature points P_w^j in the world coordinate system can be obtained. A motion vector V_i^j can then be calculated from the known feature point matching relationships between adjacent history frames, as shown in Equation 2:

V_i^j = Q_w^j − P_w^j    (Equation 2)

where Q_w^j denotes the world-coordinate position of the feature point in frame i+1 that matches the point P_w^j of frame i. Thus, in step S310, the motion vectors over all the history frames can be obtained, denoted collectively as V = {V_i^j}.
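A minimal sketch of step S310, assuming the matched points have already been back-projected into each frame's camera coordinate system (e.g., from stereo or RGB-D depth) and reusing the Pose class sketched earlier:

```python
import numpy as np

def motion_vectors(P_cam_i, P_cam_next, pose_i, pose_next):
    """Transform matched points of two adjacent history frames into the world
    coordinate system (Equation 1) and take their difference (Equation 2).
    P_cam_i and P_cam_next are row-aligned (N, 3) arrays of matched points;
    pose_i and pose_next carry each frame's rotation R and translation t."""
    Pw_i = P_cam_i @ pose_i.R.T + pose_i.t                # Equation 1, frame i
    Pw_next = P_cam_next @ pose_next.R.T + pose_next.t    # Equation 1, frame i+1
    return Pw_next - Pw_i                                 # Equation 2: V_i^j
```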
Next, in step S320, a local motion direction category may be obtained from the motion vector obtained in step S310 using a pre-trained classifier.
To determine the motion direction class of the target in the history frames, the motion vector V obtained in step S310 may be input to a pre-trained classifier to obtain the corresponding motion direction class as output. For example, the motion direction may be divided into different categories such as accelerating motion, decelerating motion, straight motion, and turning motion. The motion of the target in the history frames is local motion relative to the target's motion from the initial pose, so the motion direction class obtained here is likewise a local motion direction class.
Regarding the classifier, in its training stage a large number of local motion vectors V with corresponding motion direction labels y may be collected as training samples; the classifier is then trained on the local motion vectors V labeled with their local motion directions y. Various types of classifiers may be employed, such as BOW, K-means, and the like.
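The disclosure fixes neither the classifier's input features nor its type, so the following sketch is one hypothetical realization: a handcrafted summary feature over the motion vectors plus a nearest-centroid classifier standing in for the pre-trained classifier:

```python
import numpy as np

def motion_feature(V):
    """Collapse a set of motion vectors V (N, 3) into a fixed-length feature.
    This particular feature (mean speed, speed trend, heading spread) is an
    assumption; the text only states that V is fed to a trained classifier."""
    V = np.asarray(V, dtype=float)
    speeds = np.linalg.norm(V, axis=1)
    headings = np.arctan2(V[:, 1], V[:, 0])
    return np.array([speeds.mean(), speeds[-1] - speeds[0], np.ptp(headings)])

class NearestCentroidDirectionClassifier:
    """Tiny stand-in for the pre-trained motion-direction classifier.
    Labels y: 1 = acceleration (per the text); assigning 2 = deceleration,
    3 = straight, 4 = turning is an assumption about the remaining labels."""
    def fit(self, features, labels):
        features, labels = np.asarray(features), np.asarray(labels)
        self.labels_ = np.unique(labels)
        self.centroids_ = np.array(
            [features[labels == y].mean(axis=0) for y in self.labels_])
        return self

    def predict(self, feature):
        d = np.linalg.norm(self.centroids_ - feature, axis=1)
        return self.labels_[np.argmin(d)]
```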
After the local motion direction category is obtained, in step S330 a corresponding local motion model may be selected based on that category. The local motion model may be set according to the discriminated local motion direction category y of the history frames. For example, as described above, the local motion directions can be classified into the following four motion classes: acceleration, deceleration, straight-ahead motion, and steering motion, i.e. y ∈ {1, 2, 3, 4}. For each local motion category y, a corresponding local motion model L_y may be set.
For example, for accelerated motion, i.e. when y = 1, the corresponding local motion model may be set to L_1 = a·t² + b, where a and b are parameters of the model. This accelerated-motion model may be used to fit the motion poses T_p of the history frames, which include both translational and rotational components. Since in practice the motion of a vehicle can often be approximated by a two-dimensional translation plus a one-dimensional rotation, such local motions are easily fitted by a simple mathematical model.
Similarly, motion models for the other motion directions may be set; these may be implemented using any local trajectory-fitting model, for example a mathematical polynomial, a function, or a probabilistic model, which is easy for those skilled in the art and is not described again here.
For the selected local motion model L_y, in step S340 its parameters may be solved using the pose estimation results of the history frames.
As an example, the parameters of the local motion model L_y may be calculated from the known poses of the history frames by the least-squares method, as shown in Equation 3:

w_y = argmin_w Σ_i ‖ L_y(t_i; w) − T_p^i ‖²    (Equation 3)

where w_y denotes the parameters of the local motion model L_y, and T_p^i denotes the rotational and translational components of the pose of the i-th history frame.
For example, for the acceleration motion described above, with y = 1, the parameters a and b of the local motion model L_1 = a·t² + b can be calculated. The parameters of the local motion models corresponding to the other motion types can of course be calculated in the same way.
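A sketch of steps S340 and S220 for the acceleration model, fitting L_1 = a·t² + b to one pose component of the history frames by least squares (Equation 3) and then evaluating the model at the current-frame time indices (Equation 4); each translational and rotational component would be fitted independently, as the text suggests:

```python
import numpy as np

def fit_acceleration_model(component_hist, t_hist):
    """Least-squares fit of L_1 = a*t^2 + b to one pose component (one
    translation coordinate or the rotation angle) of the history frames."""
    t = np.asarray(t_hist, dtype=float)
    A = np.column_stack([t**2, np.ones_like(t)])
    (a, b), *_ = np.linalg.lstsq(A, np.asarray(component_hist, dtype=float),
                                 rcond=None)
    return a, b

def infer_current_poses(a, b, t_current):
    """Evaluate the fitted model at the current-frame indices, which
    continue the history frames' time axis."""
    return a * np.asarray(t_current, dtype=float) ** 2 + b

# e.g. five history frames at t = 0..4 and four current frames at t = 5..8:
#   a, b = fit_acceleration_model(x_hist, np.arange(5))
#   x_current = infer_current_poses(a, b, np.arange(5, 9))
```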
Thus, with the local motion model calculation method 300, in step S210, a local motion model may be obtained from the pose estimation results of the historical frames.
Then, returning to the method 200, in step S220, a target pose in the current frame may be calculated based on the local motion model obtained in step S210.
For a target such as a vehicle, whose local movement tends to be linear, the pose T_c in the current frames can be inferred from the local motion model L_y obtained above. For example, the pose T_c^t of the t-th current frame can be calculated as shown in Equation 4:

T_c^t = L_y(t; w_y),  t = 1, …, n    (Equation 4)

where n denotes the number of image frames included in the current-frame image sequence and t is the frame index.
Thus, with this pose estimation method 200, the target pose in the current frame can be estimated from the pose estimation results of the history frames in step S120.
Returning to the method 100, next, in step S130, a pose change between the target pose of one frame in the history frame and the target pose of the subsequent frame is calculated as a constraint.
Since the history frames and the subsequent frame are image frames with enough feature points, feature point matching can be performed between one of the history frames and the subsequent frame, and motion estimation can be performed based on the matching result, so as to calculate the target pose change from that history frame to the subsequent frame.
Any one of the historical frames may be selected to compute a pose change with a subsequent frame. In a preferred embodiment, the last frame in the historical frames may be selected.
When the pose change is calculated by motion estimation through feature point matching, the computation can be the same as in a general feature-based visual odometry method, and any known feature point extraction method can be used. In a preferred embodiment, for the history frame, the same feature points already extracted in step S310 above may be reused to improve efficiency; in that case only the feature points of the subsequent frame need to be extracted and matched.
To calculate the pose change, the mutually matched feature points in the last history frame and the subsequent frame are transformed into the world coordinate system according to the camera parameters and the target poses of the frames in which they are located, and the amount of rotation and translation between the last history frame and the subsequent frame is calculated from the matched feature points of the two frames in the world coordinate system as the pose change.
As noted above, when the already-extracted feature points are used for the last history frame, their coordinates were transformed into the world coordinate system in step S310, so only the matched feature points in the subsequent frame need to be transformed; the transformation is the same and is not repeated here.
Thus, for a matched feature point pair P^j between one of the history frames and the subsequent frame, its projection P_w^j in the world coordinate system can be obtained by combining the camera parameters with the poses of the frames in which the matched feature points are located. The relative motion between the history frame and the subsequent frame can then be computed in the world coordinate system, with the amount of rotation and translation of the target from the history frame to the subsequent frame representing the pose change E_c, as shown in Equation 5:

{R, T} = argmin_{R,T} Σ_j ‖ p_f^j − proj(R · P_w^j + T) ‖²    (Equation 5)

where R and T respectively represent the amounts of rotation and translation in the target pose change from the history frame to the subsequent frame, p_f^j denotes the image coordinates of the matched feature point in the subsequent frame, and proj(·) denotes the projection of a point, combined with the camera parameters, from the world coordinate system into the image coordinate system. Equation 5 can be solved by the well-known RANSAC or Gauss-Newton methods to obtain the pose change E_c as the constraint condition.
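Equation 5 is posed as a reprojection-error minimization solved by RANSAC or Gauss-Newton. As a simplified stand-in, the sketch below estimates R and T in closed form directly from the matched world-coordinate point sets (the Kabsch algorithm); it produces the same kind of pose-change constraint E_c, though it is not the reprojection formulation the text describes:

```python
import numpy as np

def rigid_align(P_hist, P_next):
    """Closed-form rotation R and translation T mapping the matched
    world-coordinate points of a history frame onto those of the subsequent
    frame. P_hist and P_next are row-aligned (N, 3) arrays."""
    mu_h, mu_n = P_hist.mean(axis=0), P_next.mean(axis=0)
    H = (P_hist - mu_h).T @ (P_next - mu_n)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = mu_n - R @ mu_h
    return R, T
```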
In step S140, the target pose in the current frame is optimized based on the constraint conditions obtained in step S130.
FIG. 4 illustrates one example of a pose optimization method that can be used with embodiments of the present disclosure. As shown in FIG. 4, the pose optimization method 400 can include the following steps.
In step S410, a pose graph is built with the target pose in the current frame and the target pose in the subsequent frame as nodes and the constraint condition as edges.
In step S120 described earlier, the target pose in the current frames has already been inferred from the pose estimation results of the history frames. For example, as described in step S220, the pose T_c in the current frames may be calculated from the local motion model L_y as shown in Equation 4. Likewise, the target pose T_f in the subsequent frame can be calculated from the local motion model L_y. The constraint condition E_c has already been calculated in step S130 above.
FIG. 5 illustrates one example of a pose graph according to an embodiment of the present disclosure. In this example there are four current frames, with poses T_c1, T_c2, T_c3 and T_c4; the pose of the subsequent frame is T_f, and the constraint condition is E_c. The pose graph established with T_c1, T_c2, T_c3, T_c4 and T_f as nodes and the constraint condition E_c as edges is shown in FIG. 5.
After the pose graph is built, it is optimized in step S420 using a graph optimization algorithm to obtain the optimized target pose.
In one specific example, the optimized current-frame target pose T_c may be obtained by minimizing the energy function of the graph using an algorithm such as Gauss-Newton or Levenberg-Marquardt. Of course, this optimization method is merely an example, and those skilled in the art can adopt any other suitable optimization method as needed.
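The toy sketch below conveys the structure of this optimization for the translation components only: the nodes are the inferred current-frame positions plus the subsequent frame's position, the motion-model steps act as smoothness edges, and E_c ties the chain's end to the measured pose change. The reduction to translations and the edge weights are assumptions made for brevity; a production system would optimize full 6-DoF poses with a graph-optimization library and Gauss-Newton or Levenberg-Marquardt, as noted above:

```python
import numpy as np
from scipy.optimize import least_squares

def optimize_chain(nodes_init, rel0, T_hist, E_c, w_rel=1.0, w_ec=10.0):
    """nodes_init: (m, 3) inferred positions of the current frames followed by
    the subsequent frame; rel0: (m-1, 3) model-predicted steps between
    consecutive nodes; T_hist: known position of the history frame; E_c:
    measured translation from that history frame to the subsequent frame."""
    nodes_init = np.asarray(nodes_init, dtype=float)
    rel0 = np.asarray(rel0, dtype=float)

    def residuals(x):
        T = x.reshape(-1, 3)
        r_rel = w_rel * (np.diff(T, axis=0) - rel0).ravel()  # motion-model edges
        r_ec = w_ec * (T[-1] - (T_hist + E_c))               # constraint edge E_c
        return np.concatenate([r_rel, r_ec])

    sol = least_squares(residuals, nodes_init.ravel())
    return sol.x.reshape(-1, 3)
```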
It should be noted that, although in the above embodiment the last history frame was selected for calculating the pose change relative to the subsequent frame as the constraint condition, those skilled in the art will understand that in other embodiments another history frame may be selected for calculating the pose change relative to the subsequent frame; the pose change between the last history frame and the subsequent frame can then be derived by chaining it with the pose change between that other frame and the last history frame, and used as the constraint condition. Since all of the history frames have already undergone pose estimation, the pose change between any other history frame and the last history frame is readily available.
Thus, in step S140, the optimized current frame target pose is obtained, and the method 100 ends.
In an optional embodiment, after obtaining the optimized target pose in step S140, optionally, step S150 (not shown in the figure) may be further performed to smooth the optimized target pose.
In step S150, an average of changes in the target poses of all frames in the optimized current frame obtained in step S140 may be calculated, and pose changes between adjacent frames in the current frame are smoothed based on the average to obtain a smoothed target pose.
In one example, the optimized poses of the current frames, together with the poses of the history frames and of the subsequent frame, may be maintained in the world coordinate system, arranged in chronological order. The poses of adjacent frames are then smoothed in a linear fashion.
Specifically, for two adjacent frames k and k−1, with target poses T_k and T_{k−1} respectively, the relative pose change between the two frames can be calculated as

T_r^k = T_{k−1}^{−1} · T_k

In this way, the pose change T_r between adjacent frames can be calculated for all frames, along with the average pose change, denoted T_avg. As described above, a pose change is typically represented by its rotational and translational components, and the average T_avg may likewise be taken as the mean of the rotational and translational components.
If, for some frame k, ‖T_r^k − T_avg‖ > θ·‖T_avg‖, then let T_r^k = T_avg and recalculate the pose T_k of the k-th frame as shown in Equation 6:

T_k = T_{k−1} · T_r^k    (Equation 6)

where θ is a threshold coefficient that may be preset by the user.
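A sketch of this smoothing on the translation components: each relative step is compared against the mean step and outliers are clamped to the mean, after which the absolute poses are re-accumulated per Equation 6. The default θ is an assumption; the disclosure leaves it user-preset:

```python
import numpy as np

def smooth_relative_steps(T_rel, theta=0.5):
    """Clamp relative pose steps (translations) that deviate from the mean by
    more than theta times the mean step's magnitude."""
    steps = np.array(T_rel, dtype=float)   # (n, 3) per-frame steps, copied
    mean = steps.mean(axis=0)
    scale = np.linalg.norm(mean)
    for k in range(len(steps)):
        if np.linalg.norm(steps[k] - mean) > theta * scale:
            steps[k] = mean                # replace outlier step with the mean
    return steps

# absolute poses are then rebuilt: T_k = T_{k-1} + steps[k]
# (Equation 6, translation-only form)
```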
By smoothing the estimated poses in the current frames for a target such as a vehicle as described above, the local motion of the target can be made smoother and kept closer to linear.
According to the visual odometry method 100 for estimating the target pose of this embodiment, in order to estimate the target pose in the current frames, the target pose is inferred from the pose estimation results of the history frames, a pose constraint is established in combination with the subsequent frame, and the inferred pose is optimized. Target pose estimation can therefore be performed even for those image frames that have low image quality and on which feature point matching is difficult, so that the visual odometer can operate normally in complex scenes, improving the robustness and accuracy of the visual odometer.
Furthermore, according to the visual odometry method of this embodiment, when the target pose in the current frames is inferred from the pose estimation results of the history frames, the local motion direction is discriminated using the poses and motion vectors of the history frames, and a corresponding local motion model is selected to infer the poses of the current image sequence, which is more accurate than using a single model.
In addition, according to the visual odometry method of this embodiment, during pose optimization the local motion model is applied to the subsequent frame to establish the pose constraint, and the inferred rough pose is further refined using graph optimization.
Next, a visual odometry apparatus for estimating a target pose according to an embodiment of the present disclosure will be described with reference to fig. 6.
Fig. 6 is a block diagram illustrating the main configuration of a visual odometry apparatus for estimating a target pose according to an embodiment of the present disclosure. As shown in fig. 6, the visual odometry apparatus 600 of this embodiment mainly includes: an obtaining component 610 for obtaining a first number of current frames whose target poses are to be estimated, a second number of history frames immediately preceding the first of the current frames and for which pose estimation has already been performed, and a subsequent frame immediately following the last of the current frames; an inference component 620 for inferring the target pose in the current frames from the pose estimation results of the history frames; a calculation component 630 for calculating the pose change between the target pose of one of the history frames and the target pose of the subsequent frame as a constraint condition; and an optimization component 640 for optimizing the target pose in the current frames based on the constraint condition.
In one embodiment, the inference component 620 can include: a model obtaining unit 621 (not shown in the figure) configured to obtain a local motion model according to the pose estimation result of the historical frame; and a pose calculation unit 622 (not shown in the figure) for calculating a pose of the target in the current frame based on the local motion model.
In one embodiment, the model obtaining unit 621 may calculate a motion vector according to feature point matching between adjacent frames in the historical frames; obtaining a local motion direction category according to the motion vector by utilizing a pre-trained classifier; selecting a corresponding local motion model based on the local motion direction category; and solving parameters of the local motion model by using the attitude estimation result of the historical frame to obtain a local motion model.
In one embodiment, the model obtaining component 621 may calculate the motion vector as follows: and obtaining mutually matched feature points between adjacent frames in the historical frames, transforming the mutually matched feature points into a world coordinate system according to the camera parameters and the target postures of the frames in which the mutually matched feature points are located, and calculating motion vectors between the mutually matched feature points in the world coordinate system.
In one embodiment, the calculation component 630 may calculate the pose change as follows: performing feature point matching on one frame in the historical frames and the subsequent frames; and calculating the attitude change between the target attitude of one frame in the historical frames and the target attitude of the subsequent frame based on the feature point matching result.
In one embodiment, the calculation component 630 may calculate the pose change as follows: according to camera parameters and the target postures of the frames where the matched feature points are located, the matched feature points in one frame and the subsequent frame in the historical frame are transformed into a world coordinate system; and calculating the rotation and translation quantity between one frame in the historical frames and the subsequent frame according to the matched characteristic points in the two frames in the world coordinate system as the posture change.
In one embodiment, the pose computation component 622 can utilize the local motion model to compute the target pose in the subsequent frame. The optimization component 640 can then optimize the target pose as follows: establishing a pose graph with the computed target pose in the current frames and the target pose in the subsequent frame as nodes and the constraint condition as edges; and optimizing the pose graph with a graph optimization algorithm to obtain the optimized target pose.
In an embodiment, the visual odometry apparatus 600 may further include a smoothing unit 650 (not shown in the figure) for calculating an average value of changes in the target poses of all frames in the current frame, and smoothing pose changes between adjacent frames in the current frame based on the average value to obtain a smoothed target pose.
In one embodiment, the first number of current frames may be one or more frames and the second number of historical frames may be at least two frames.
It is readily understood that the obtaining component 610, the inferring component 620, the calculating component 630, the optimizing component 640, and the optional smoothing component 650 in the visual odometry apparatus 600 of this embodiment may be configured by a Central Processing Unit (CPU) of the apparatus 600. Alternatively, the obtaining means 610, the inferring means 620, the calculating means 630, the optimizing means 640, and the optional smoothing means 650 may also be configured by a dedicated processing unit in the apparatus 600, such as an Application Specific Integrated Circuit (ASIC) or the like. That is, the obtaining component 610, the inferring component 620, the calculating component 630, the optimizing component 640, and the optional smoothing component 650 may be configured, for example, by hardware, software, firmware, and any feasible combination thereof.
For simplicity, only those components of the visual odometry apparatus 600 that are germane to the present disclosure are shown in fig. 6. The apparatus 600 may also include other modules, such as input-output components, display components, communication components, and the like. It may further comprise a storage device for storing, in a volatile or non-volatile manner, the images, data, obtained results, commands, intermediate data, etc. involved in the above-described processing; the storage device may include various volatile or non-volatile memories such as random access memory (RAM), read-only memory (ROM), a hard disk, or semiconductor memory. In addition, components such as buses and input/output interfaces are omitted from the drawing. The visual odometry apparatus 600 may include any other suitable components, depending on the particular application.
Next, an apparatus for estimating a target posture of another embodiment of the present disclosure is described with reference to fig. 7.
Fig. 7 is a block diagram illustrating a main configuration of an apparatus for estimating a target posture according to another embodiment of the present disclosure.
As shown in fig. 7, the apparatus 700 for estimating a target pose of the present embodiment mainly includes a memory 710, a processor 720, an input/output device (e.g., keyboard, mouse, speaker, etc.) 730, a display device 740, and the like, which are interconnected by a bus system 750 and/or other form of connection mechanism (not shown). It should be noted that the components and configuration of the device 700 shown in fig. 7 are exemplary only, and not limiting, and that the device 700 may have other components and configurations as desired. For example, device 700 may also have an image acquisition component, such as a camera, for acquiring images of the target scene.
Memory 710 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, EPROM memory, EEPROM memory, and the like. The computer-readable storage media may also include registers, a hard disk, a floppy disk, a solid-state disk, a removable disk, a CD-ROM, a DVD-ROM, a Blu-ray disc, and the like. One or more computer program instructions may be stored on these media and executed by processor 720 to implement the desired functionality.
Processor 720 may be a central processing unit (CPU) or another form of processing unit having data processing capabilities and/or instruction execution capabilities, including but not limited to one or more processors or microprocessors, and may be coupled to memory 710 to execute the computer program instructions stored therein to perform the following: obtaining a first number of current frames whose target poses are to be estimated, a second number of history frames immediately preceding the first of the current frames and for which pose estimation has already been performed, and a subsequent frame immediately following the last of the current frames; inferring the target pose in the current frames from the pose estimation results of the history frames; calculating the pose change between the target pose of one of the history frames and the target pose of the subsequent frame as a constraint condition; and optimizing the target pose in the current frames based on the constraint condition.
Furthermore, embodiments of the present invention also provide a computer-readable storage medium having stored thereon computer program instructions that, when executed by a computer, perform any of the embodiments of the visual odometry method for estimating a target pose described above with reference to fig. 1 to 5.
As described above, the computer-readable storage medium may include, for example, volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, EPROM memory, EEPROM memory, and the like. The computer-readable storage medium may also include registers, hard disk, floppy disk, solid state disk, removable disk, CD-ROM, DVD-ROM, Blu-ray disk, and the like.
The visual odometry method, apparatus and computer-readable storage medium for estimating a target pose according to embodiments of the present disclosure have been described above with reference to figs. 1 to 7.
According to the present disclosure, the target pose in the current frames is inferred from the pose estimation results of history frames on which pose estimation has previously been performed, and the pose change between the target pose of a history frame and the target pose of a subsequent frame following the current frames is calculated as a constraint condition, with which the target pose in the current frames is optimized. Target pose estimation can therefore be performed even for those image frames that have low image quality and on which feature point matching is difficult, so that the visual odometer can operate normally in complex scenes, improving the robustness and accuracy of the visual odometer.
It should be noted that, in the present specification, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Also, as used herein, including in the claims, "or" as used in a list of items beginning with "at least one" indicates a separate list, such that a list of "A, B or at least one of C" means a or B or C, or AB or AC or BC, or ABC (i.e., a and B and C). Furthermore, the word "exemplary" does not mean that the described example is preferred or better than other examples.
Further, it should be noted that each component or each step described in the present specification may be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of executing the series of processes described above may naturally be executed chronologically in the order described, but need not necessarily be executed chronologically. Some steps may be performed in parallel or independently of each other.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus a necessary hardware platform, and may also be implemented by hardware entirely. With this understanding in mind, all or part of the technical solutions of the present invention that contribute to the background can be embodied in the form of a software product, which can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes instructions for causing a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments or some parts of the embodiments of the present invention.
In embodiments of the present invention, the units/modules may be implemented in software for execution by various types of processors. An identified module of executable code may, for instance, comprise one or more physical or logical blocks of computer instructions which may, for instance, be constructed as an object, procedure, or function. Nevertheless, the executables of an identified module need not be physically located together, but may comprise disparate instructions stored in different bits which, when joined logically together, comprise the unit/module and achieve the stated purpose for the unit/module.
Where a unit/module can be implemented in software, given the level of existing hardware technology, those skilled in the art could also, cost aside, build corresponding hardware circuits to implement the corresponding functions; such hardware circuits include conventional very-large-scale integration (VLSI) circuits or gate arrays and existing semiconductors such as logic chips and transistors, or other discrete components. A module may also be implemented in programmable hardware devices such as field-programmable gate arrays, programmable array logic, programmable logic devices, or the like.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A visual odometry method for estimating a target pose, comprising:
obtaining a first number of current frames whose target poses are to be estimated, a second number of history frames immediately preceding the first of the current frames and for which pose estimation has already been performed, and a subsequent frame immediately following the last of the current frames;
obtaining a local motion model according to the attitude estimation result of the historical frame, and calculating the target attitude in the current frame based on the local motion model;
calculating the attitude change between the target attitude of one frame in the historical frames and the target attitude of the subsequent frame as a constraint condition;
calculating a target pose in a subsequent frame using the local motion model;
establishing a posture graph by taking the calculated target posture in the current frame and the target posture in the subsequent frame as nodes and the constraint condition as edges; and
and optimizing the attitude map by using a map optimization algorithm to obtain an optimized target attitude.
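By way of illustration only (this is not the claimed implementation), the pose-graph step of claim 1 can be sketched in Python under strongly simplifying assumptions: poses are reduced to 3-D translations, so each graph node is a frame position, each edge is a relative-motion constraint (including the historical-frame-to-subsequent-frame constraint), and the graph is optimized as a linear least-squares problem. All function and variable names here are hypothetical.

    import numpy as np

    def optimize_pose_graph(n_nodes, edges, dim=3):
        """Nodes: frame positions t_0..t_{n-1}, with t_0 fixed at the origin.
        Edges: (i, j, delta) constraints meaning t_j - t_i should equal delta."""
        A = np.zeros((len(edges) * dim, (n_nodes - 1) * dim))
        b = np.zeros(len(edges) * dim)
        for k, (i, j, delta) in enumerate(edges):
            rows = slice(k * dim, (k + 1) * dim)
            if i > 0:
                A[rows, (i - 1) * dim:i * dim] = -np.eye(dim)
            if j > 0:
                A[rows, (j - 1) * dim:j * dim] = np.eye(dim)
            b[rows] = delta
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        return np.vstack([np.zeros(dim), x.reshape(n_nodes - 1, dim)])

    # Two chained motion-model predictions plus one direct constraint from
    # a historical frame (node 0) to the subsequent frame (node 2).
    edges = [(0, 1, np.array([1.0, 0.0, 0.0])),
             (1, 2, np.array([1.1, 0.0, 0.0])),
             (0, 2, np.array([2.0, 0.0, 0.0]))]
    print(optimize_pose_graph(3, edges))

A full implementation would optimize 6-DoF poses on SE(3), typically with a graph-optimization framework such as g2o or GTSAM, but the node/edge/least-squares structure is the same.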
2. The method of claim 1, wherein obtaining a local motion model from the pose estimation results of the historical frames comprises:
calculating motion vectors from feature point matches between adjacent frames among the historical frames;
obtaining a local motion direction category from the motion vectors using a pre-trained classifier;
selecting a corresponding local motion model based on the local motion direction category; and
solving for the parameters of the local motion model using the pose estimation results of the historical frames.
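As an illustrative sketch only: the claim does not fix the classifier or the family of motion models, so the code below assumes two hypothetical direction categories ("straight" and "turning"), a stand-in pre-trained classifier exposing a scikit-learn-style predict() method, a linear constant-velocity model for the first category, and a planar circular-arc model for the second.

    import numpy as np

    def infer_poses_from_motion_model(history_positions, n_current, classifier):
        """history_positions: (m, 3) estimated positions of the historical
        frames (m >= 3 for the turning branch); classifier: pre-trained
        stand-in with a scikit-learn-style predict() method."""
        deltas = np.diff(history_positions, axis=0)          # motion vectors
        category = classifier.predict(deltas.mean(axis=0).reshape(1, -1))[0]
        heading = np.arctan2(deltas[-1, 1], deltas[-1, 0])
        if category == "straight":
            velocity = deltas.mean(axis=0)                   # linear-model parameter
            step = lambda p, h: (p + velocity, h)
        else:
            headings = np.arctan2(deltas[:, 1], deltas[:, 0])
            dtheta = np.diff(headings).mean()                # arc-model parameters
            speed = np.linalg.norm(deltas, axis=1).mean()
            step = lambda p, h: (p + speed * np.array([np.cos(h + dtheta),
                                                       np.sin(h + dtheta), 0.0]),
                                 h + dtheta)
        poses, p = [], history_positions[-1]
        for _ in range(n_current):
            p, heading = step(p, heading)
            poses.append(p)
        return np.array(poses)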
3. The method of claim 2, wherein calculating motion vectors from feature point matches between adjacent frames among the historical frames comprises:
obtaining mutually matched feature points between adjacent frames among the historical frames;
transforming the matched feature points into a world coordinate system according to the camera parameters and the target poses of the frames in which the matched feature points are located; and
calculating the motion vectors between the mutually matched feature points in the world coordinate system.
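A minimal sketch of this transformation, assuming the matched feature points have already been back-projected to 3-D coordinates in each camera's frame (i.e., the camera intrinsics were applied upstream) and each target pose is given as a camera-to-world rotation matrix R and translation vector t:

    import numpy as np

    def motion_vectors_in_world(pts_cam_a, pts_cam_b, pose_a, pose_b):
        """pts_cam_*: (n, 3) matched 3-D feature points in camera coordinates;
        pose = (R, t): camera-to-world rotation and translation."""
        Ra, ta = pose_a
        Rb, tb = pose_b
        world_a = pts_cam_a @ Ra.T + ta   # features of frame A in world coords
        world_b = pts_cam_b @ Rb.T + tb   # the same features seen in frame B
        return world_b - world_a          # one motion vector per matched feature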
4. The method of any of claims 1-3, wherein calculating the pose change between the target pose of one of the historical frames and the target pose of the subsequent frame comprises:
performing feature point matching between one of the historical frames and the subsequent frame; and
calculating the pose change between the target pose of that historical frame and the target pose of the subsequent frame based on the feature point matching result.
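The claims do not name a particular detector or matcher; as one common concrete choice, the following sketch uses OpenCV's ORB features with brute-force Hamming matching (function name and parameters are illustrative):

    import cv2

    def match_features(img_hist, img_subsequent, max_matches=200):
        # ORB keypoints with binary descriptors; brute-force Hamming matching
        # with cross-check to suppress one-sided (asymmetric) matches.
        orb = cv2.ORB_create(nfeatures=1000)
        kp1, des1 = orb.detectAndCompute(img_hist, None)
        kp2, des2 = orb.detectAndCompute(img_subsequent, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
        pts1 = [kp1[m.queryIdx].pt for m in matches[:max_matches]]
        pts2 = [kp2[m.trainIdx].pt for m in matches[:max_matches]]
        return pts1, pts2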
5. The method of claim 4, wherein calculating the pose change between the target pose of one of the historical frames and the target pose of the subsequent frame based on the feature point matching result comprises:
transforming the matched feature points in the historical frame and the subsequent frame into a world coordinate system according to the camera parameters and the target poses of the frames in which the matched feature points are located; and
calculating, from the matched feature points in the two frames, the rotation and translation between the historical frame and the subsequent frame in the world coordinate system as the pose change.
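One standard way to realize this step is the Kabsch/SVD method for the least-squares rigid transform between two matched 3-D point sets; the sketch below assumes the two point arrays are already expressed in the world coordinate system and row-aligned by match.

    import numpy as np

    def relative_pose(world_a, world_b):
        """Least-squares rigid transform (R, t) such that
        world_b ~ world_a @ R.T + t, via the Kabsch/SVD method."""
        ca, cb = world_a.mean(axis=0), world_b.mean(axis=0)
        H = (world_a - ca).T @ (world_b - cb)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:   # repair an improper (reflection) solution
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = cb - R @ ca
        return R, t

In practice this estimate is usually wrapped in a RANSAC loop so that outlier matches do not corrupt the recovered rotation and translation.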
6. The method of claim 1, further comprising:
calculating the average of the target pose changes over all of the current frames; and
smoothing the pose changes between adjacent frames among the current frames based on the average to obtain smoothed target poses.
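A minimal sketch of such smoothing, under the assumption that only the translational part of the pose is smoothed and that a single blending weight trades the raw inter-frame changes against their mean:

    import numpy as np

    def smooth_positions(positions, weight=0.5):
        """Blend each inter-frame change toward the mean change over the
        current frames; weight = 1 reproduces the raw trajectory."""
        deltas = np.diff(positions, axis=0)
        blended = weight * deltas + (1 - weight) * deltas.mean(axis=0)
        return np.vstack([positions[:1],
                          positions[0] + np.cumsum(blended, axis=0)])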
7. The method of claim 1, wherein the first number of current frames is one or more frames and the second number of historical frames is at least two frames.
8. A visual odometry apparatus for estimating a target pose, comprising:
an obtaining component configured to obtain a first number of current frames whose target poses are to be estimated, a second number of historical frames immediately preceding the first of the current frames for which pose estimation has already been performed, and a subsequent frame immediately following the last of the current frames;
an inference component configured to obtain a local motion model from the pose estimation results of the historical frames and to calculate the target poses in the current frames based on the local motion model;
a calculation component configured to calculate the pose change between the target pose of one of the historical frames and the target pose of the subsequent frame as a constraint condition, and to calculate the target pose in the subsequent frame using the local motion model; and
an optimization component configured to establish a pose graph with the calculated target poses in the current frames and the target pose in the subsequent frame as nodes and the constraint condition as an edge, and to optimize the pose graph using a graph optimization algorithm to obtain optimized target poses.
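Purely to illustrate the decomposition of claim 8 (none of these interfaces come from the patent), the four components can be wired together as plain Python callables:

    class VisualOdometryPipeline:
        """Hypothetical wiring of the four components of claim 8; the
        injected callables could be the illustrative sketches given above."""
        def __init__(self, obtain, infer, calculate, optimize):
            self.obtain, self.infer = obtain, infer
            self.calculate, self.optimize = calculate, optimize

        def step(self):
            current, history, subsequent = self.obtain()
            model, current_poses = self.infer(history)
            constraint, subsequent_pose = self.calculate(history, subsequent, model)
            return self.optimize(current_poses, subsequent_pose, constraint)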
9. An apparatus for estimating a target pose, comprising:
a memory storing computer program instructions; and
a processor coupled to the memory and configured to execute the computer program instructions to perform operations comprising:
obtaining a first number of current frames whose target poses are to be estimated, a second number of historical frames immediately preceding the first of the current frames for which pose estimation has already been performed, and a subsequent frame immediately following the last of the current frames;
obtaining a local motion model from the pose estimation results of the historical frames, and calculating the target poses in the current frames based on the local motion model;
calculating the pose change between the target pose of one of the historical frames and the target pose of the subsequent frame as a constraint condition;
calculating the target pose in the subsequent frame using the local motion model;
establishing a pose graph with the calculated target poses in the current frames and the target pose in the subsequent frame as nodes and the constraint condition as an edge; and
optimizing the pose graph using a graph optimization algorithm to obtain optimized target poses.
10. A computer-readable storage medium storing computer program instructions which, when executed, perform a process comprising:
obtaining a first number of current frames whose target poses are to be estimated, a second number of historical frames immediately preceding the first of the current frames for which pose estimation has already been performed, and a subsequent frame immediately following the last of the current frames;
obtaining a local motion model from the pose estimation results of the historical frames, and calculating the target poses in the current frames based on the local motion model;
calculating the pose change between the target pose of one of the historical frames and the target pose of the subsequent frame as a constraint condition;
calculating the target pose in the subsequent frame using the local motion model;
establishing a pose graph with the calculated target poses in the current frames and the target pose in the subsequent frame as nodes and the constraint condition as an edge; and
optimizing the pose graph using a graph optimization algorithm to obtain optimized target poses.
CN201710639962.5A 2017-07-31 2017-07-31 Visual odometry method, device and computer-readable storage medium Active CN109323709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710639962.5A 2017-07-31 2017-07-31 Visual odometry method, device and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN109323709A (en) 2019-02-12
CN109323709B (en) 2022-04-08

Family

ID=65244917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710639962.5A Visual odometry method, device and computer-readable storage medium 2017-07-31 2017-07-31 Active

Country Status (1)

Country Link
CN (1) CN109323709B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110170167B * 2019-05-28 2023-02-28 Shanghai miHoYo Network Technology Co., Ltd. Picture display method, device, equipment and medium
CN112766023B * 2019-11-04 2024-01-19 Beijing Horizon Robotics Technology R&D Co., Ltd. Method, device, medium and equipment for determining gesture of target object
CN111157757A * 2019-12-27 2020-05-15 Suzhou Botian Automation Technology Co., Ltd. Vision-based crawler speed detection device and method
CN112509047A * 2020-12-10 2021-03-16 Beijing Horizon Information Technology Co., Ltd. Image-based pose determination method and device, storage medium and electronic equipment
CN113689497A * 2021-08-11 2021-11-23 Arashi Vision Inc. (Insta360) Pose optimization method, device, equipment and storage medium
CN116503958B * 2023-06-27 2023-10-03 Jiangxi Normal University Human body posture recognition method, system, storage medium and computer equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111351495A * 2015-02-10 2020-06-30 Mobileye Vision Technologies Ltd. Server system, method and machine-readable medium
EP3118814A1 (en) * 2015-07-15 2017-01-18 Thomson Licensing Method and apparatus for object tracking in image sequences

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105938619A * 2016-04-11 2016-09-14 China University of Mining and Technology Visual odometer realization method based on fusion of RGB and depth information
CN106556412A * 2016-11-01 2017-04-05 Harbin Engineering University RGB-D visual odometry method considering ground-surface constraints in indoor environments
CN106780484A * 2017-01-11 2017-05-31 Shandong University Robot inter-frame pose estimation method based on convolutional neural network feature descriptors

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on an indoor positioning algorithm based on sequential image block matching; Cai Shengli et al.; Computer Measurement & Control; 2010-07-25; Vol. 18, No. 7; pp. 1641-1644 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant