WO2019084804A1 - Visual odometry and implementation method therefor - Google Patents


Info

Publication number
WO2019084804A1
WO2019084804A1 · PCT/CN2017/108684 · CN2017108684W
Authority
WO
WIPO (PCT)
Prior art keywords
image
moment
coordinate system
transformation
motion compensation
Prior art date
Application number
PCT/CN2017/108684
Other languages
French (fr)
Chinese (zh)
Inventor
周游 (Zhou You)
叶长春 (Ye Changchun)
严嘉祺 (Yan Jiaqi)
Original Assignee
深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority to CN201780087613.5A, published as CN110520694A
Priority to PCT/CN2017/108684, published as WO2019084804A1
Publication of WO2019084804A1

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 11/00: Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
    • G01C 11/04: Interpretation of pictures
    • G01C 11/06: Interpretation of pictures by comparison of two or more pictures of the same area
    • G01C 11/12: Interpretation of pictures by comparison of two or more pictures of the same area, the pictures being supported in the same relative position as when they were taken
    • G01C 11/14: Interpretation of pictures by comparison of two or more pictures of the same area, the pictures being supported in the same relative position as when they were taken, with optical projection
    • G01C 11/20: Interpretation of pictures by comparison of two or more pictures of the same area, the pictures being supported in the same relative position as when they were taken, with optical projection in separate planes
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 22/00: Measuring distance traversed on the ground by vehicles, persons, animals or other moving solid bodies, e.g. using odometers, using pedometers

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a visual odometry implementation method. The method comprises: acquiring a first image captured by a photographic device at a first moment and a second image captured by the same device at a second moment, wherein the first moment is before the second moment; estimating pose estimation information of the photographic device at the second moment based on its pose information at the first moment; performing projection transformation on at least one of the first image and the second image, based on the pose information at the first moment and the pose estimation information at the second moment, to obtain two images in the same coordinate system; performing feature point matching between the two images in the same coordinate system; and calculating pose information of the photographic device at the second moment based on the matched feature points in the two images. Also disclosed are a visual odometry apparatus, an unmanned device, an augmented reality device, a virtual reality device, and a readable storage medium.

Description

Visual odometry and implementation method therefor
[Technical Field]
The present application relates to the field of motion estimation, and in particular to visual odometry and an implementation method therefor, an unmanned device, an augmented reality device, a virtual reality device, and a readable storage medium.
[Background Art]
Visual odometry (VO) estimates the motion of a camera or of the camera's carrier by analyzing a sequence of related images, so as to determine the current position and attitude of the camera/carrier and thereby obtain its motion trajectory.
Visual odometry may use the feature method: feature points are extracted from an image and then tracked to compute pose information. The optical-flow method can be used for feature-point tracking; it obtains feature-point matches by minimizing the grayscale difference between the patches surrounding corresponding feature points in different frames. Optical flow matches well under purely translational motion. However, if the photographing device rotates while capturing, the resulting images rotate as well, so the grayscale difference between the patches around the same feature point in different frames may become large. This lowers the feature-matching success rate, which reduces the accuracy of the pose information or, when matching fails outright, makes the pose impossible to compute, and thus limits the visual odometry's ability to handle rotation.
[Summary of the Invention]
To at least partially solve the above problems, the present invention provides a visual odometry implementation method, comprising: acquiring a first image captured by a photographing device at a first moment and a second image captured by the same device at a second moment, the first moment being before the second moment; estimating pose estimation information of the photographing device at the second moment based on its pose information at the first moment; performing projection transformation on at least one of the first image and the second image, based on the pose information at the first moment and the pose estimation information at the second moment, to obtain two images in the same coordinate system; performing feature-point matching between the two images in the same coordinate system; and calculating pose information of the photographing device at the second moment based on the matched feature points in the two images.
To at least partially solve the above problems, the present invention provides a visual odometry apparatus comprising a photographing device and at least one processor working alone or in cooperation, the photographing device being coupled to the processor, and the processor being configured to execute instructions to implement the foregoing method.
To at least partially solve the above problems, the present invention provides an unmanned device comprising the foregoing visual odometry apparatus.
To at least partially solve the above problems, the present invention provides an augmented reality device comprising the foregoing visual odometry apparatus.
To at least partially solve the above problems, the present invention provides a virtual reality device comprising the foregoing visual odometry apparatus.
To at least partially solve the above problems, the present invention provides a readable storage medium storing instructions that, when executed, implement the foregoing method.
The beneficial effects of the present application are as follows: pose estimation information at the second moment is estimated from the pose information of the photographing device at the first moment; then, based on the pose information at the first moment and the pose estimation information at the second moment, at least one of the first image and the second image is projection-transformed to obtain two images in the same coordinate system for feature-point matching. The projection transformation compensates for the motion between the first moment and the second moment, effectively reducing the grayscale difference between the two images, in the same coordinate system, of the patches around feature points that is caused by the rotational motion of the photographing device. This improves the feature-matching success rate and the visual odometry's ability to handle rotation.
[Brief Description of the Drawings]
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a first embodiment of the visual odometry implementation method of the present invention;
Fig. 2 is a schematic diagram of the attitudes of the photographing device at the first moment and the second moment in an example of the first embodiment of the visual odometry implementation method of the present invention;
Fig. 3 is a schematic diagram of the corrected first image and second image in an example of the first embodiment of the visual odometry implementation method of the present invention;
Fig. 4 is a schematic flowchart, in the first embodiment of the visual odometry implementation method of the present invention, of calculating the pose information of the photographing device at the second moment based on the matched feature points in the two images;
Fig. 5 is a schematic diagram of the workflow of the visual odometry in an embodiment of the present invention;
Fig. 6 is a schematic flowchart of a second embodiment of the visual odometry implementation method;
Fig. 7 is a schematic flowchart of a third embodiment of the visual odometry implementation method of the present invention;
Fig. 8 is a schematic structural diagram of a first embodiment of the visual odometry apparatus of the present invention;
Fig. 9 is a schematic structural diagram of a second embodiment of the visual odometry apparatus of the present invention;
Fig. 10 is a schematic structural diagram of an embodiment of the unmanned device of the present invention;
Fig. 11 is a schematic structural diagram of an embodiment of the augmented reality device of the present invention;
Fig. 12 is a schematic structural diagram of an embodiment of the virtual reality device of the present invention;
Fig. 13 is a schematic structural diagram of an embodiment of the readable storage medium of the present invention.
[Detailed Description]
The present invention is described in detail below with reference to the drawings and embodiments. The following embodiments may be combined with one another where they do not conflict. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The terms "first", "second", "third", "fourth", etc. (if present) in the specification, claims, and drawings of the present invention are used to distinguish similar objects and need not describe a particular order or sequence. It should be understood that data so denoted may be interchanged where appropriate, so that the embodiments described herein can be implemented, for example, in an order other than that illustrated or described herein. Moreover, the terms "comprise" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product, or device.
As shown in Fig. 1, the first embodiment of the visual odometry implementation method of the present invention includes:
S1: Acquire a first image captured by the photographing device at a first moment and a second image captured at a second moment.
The photographing device may be a camera that records images using optical principles. It may be fixed to a carrier, which may move on its own or be driven by another object; the carrier may be an unmanned device (e.g. a drone or an unmanned ship), an augmented reality device, a virtual reality device, a robot, an automobile, and so on. In one embodiment, the photographing device is a binocular camera fixed to a drone.
The first moment precedes the second moment. The photographing device generally captures continuously, producing an image sequence of multiple frames. The first image and the second image may or may not be adjacent in the sequence. The pose information produced by the visual odometry can be used to control the carrier of the photographing device; to ensure real-time control, the second image may be the latest frame captured by the device, i.e. the current frame.
S2: Estimate pose estimation information of the photographing device at the second moment based on its pose information at the first moment.
The pose information at the first moment is computed by the visual odometry algorithm and is relatively accurate. The pose estimation information at the second moment is obtained by motion prediction from the pose information at the first moment and inertial navigation data, and is relatively less accurate.
The pose information at the first moment may include the position, velocity, and rotation matrix / attitude quaternion of the photographing device at the first moment; the rotation matrix expresses the rotation between the photographing device and the world coordinate system and is interconvertible with the attitude quaternion. The pose estimation information at the second moment may include the estimated position, estimated velocity, and estimated rotation matrix / estimated attitude quaternion at the second moment. Since the estimated velocity need not be used in the subsequent computation, it may be omitted.
The inertial navigation data may come from an inertial measurement unit mounted on the photographing device or its carrier; such a unit generally includes an accelerometer and a gyroscope, and the inertial navigation data may include their readings. The estimated inertial navigation data used for motion prediction may be the inertial navigation data at the first moment or at the second moment, or may be computed from both, e.g. their average or a weighted average. From the accelerometer readings, the acceleration is computed; integrating it yields the velocity change, which, added to the velocity at the first moment, gives the estimated velocity at the second moment; integrating the velocity yields the displacement, which, added to the position at the first moment, gives the estimated position at the second moment. From the gyroscope readings, the angular velocity is computed; integrating it yields the attitude difference, which, combined with the rotation matrix / attitude quaternion at the first moment, gives the estimated rotation matrix / estimated attitude quaternion at the second moment.
As a concrete example of computing the pose estimation information at the second moment, suppose the inertial navigation data at the second moment are used for motion prediction. Since the interval between the first and second moments is generally very short, the motion during this interval can be treated as motion with constant acceleration, with the following formulas:
p_2 = p_1 + v_1·Δt + (1/2)·(R_wi·(a_m − b_a) + g)·Δt²
v_2 = v_1 + (R_wi·(a_m − b_a) + g)·Δt
q_2 = q_1 ⊗ Δq
Δq = q{(ω − b_ω)·Δt}, the quaternion corresponding to the rotation vector (ω − b_ω)·Δt
Here p_1 is the position, v_1 the velocity, and q_1 the attitude quaternion at the first moment; p_2 is the estimated position, v_2 the estimated velocity (which may be omitted), and q_2 the estimated attitude quaternion at the second moment.
Δt is the duration of the interval between the first and second moments, and can be computed from the order of the first and second images in the image sequence and the frame rate of the photographing device. For example, if the first and second images are adjacent frames and the frame rate is 20 Hz, a rough value of Δt is simply 50 ms; for a precise value, the difference in exposure time between the two images should be added. R_wi is the rotation between the coordinate system of the photographing device and the world coordinate system, converted from the attitude quaternion q_1. a_m is the accelerometer reading at the second moment, g is the gravitational acceleration, ω is the gyroscope reading at the second moment, and Δq is the attitude difference between the first and second moments. b_a and b_ω are the zero biases of the accelerometer and gyroscope, respectively; these two parameters generally do not change over time, i.e. (b_a)_1 = (b_a)_2 and (b_ω)_1 = (b_ω)_2.
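The constant-acceleration propagation above can be sketched in plain Python (an illustrative sketch, not the patent's implementation; `propagate` and the quaternion helpers are hypothetical names, quaternions are in [w, x, y, z] order, and the body-to-world rotation R_wi is applied by quaternion rotation):

```python
import math

def quat_mul(a, b):
    """Hamilton product of quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return [w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2]

def quat_from_rotvec(rv):
    """Quaternion for a rotation vector, e.g. (omega - b_w)*dt."""
    angle = math.sqrt(sum(c*c for c in rv))
    if angle < 1e-12:
        return [1.0, 0.0, 0.0, 0.0]
    s = math.sin(angle / 2.0) / angle
    return [math.cos(angle / 2.0), rv[0]*s, rv[1]*s, rv[2]*s]

def rotate(q, v):
    """Rotate vector v by quaternion q (i.e. apply R_wi to a body-frame vector)."""
    qc = [q[0], -q[1], -q[2], -q[3]]
    r = quat_mul(quat_mul(q, [0.0] + list(v)), qc)
    return r[1:]

def propagate(p1, v1, q1, a_m, omega, b_a, b_w, g, dt):
    """Constant-acceleration prediction of (p2, v2, q2) from the first moment."""
    acc = rotate(q1, [a_m[i] - b_a[i] for i in range(3)])    # R_wi (a_m - b_a)
    acc = [acc[i] + g[i] for i in range(3)]                   # ... + g
    p2 = [p1[i] + v1[i]*dt + 0.5*acc[i]*dt*dt for i in range(3)]
    v2 = [v1[i] + acc[i]*dt for i in range(3)]
    q2 = quat_mul(q1, quat_from_rotvec([(omega[i] - b_w[i])*dt for i in range(3)]))
    return p2, v2, q2
```

With zero gyro input the attitude is unchanged, and when the rotated, bias-corrected accelerometer reading cancels gravity the device coasts at constant velocity, as the formulas imply.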
S3: Perform projection transformation on at least one of the first image and the second image, based on the pose information at the first moment and the pose estimation information at the second moment, to obtain two images in the same coordinate system.
The coordinate system in which the first image lies is the first coordinate system, and the coordinate system in which the second image is estimated to lie is the second coordinate system. The first coordinate system corresponds to the pose information at the first moment, and the second coordinate system corresponds to the pose estimation information at the second moment. If the photographing device moves between the first and second moments, the two coordinate systems differ.
A pinhole model is generally used to describe the projection relationship between the world coordinate system and the image plane captured by the photographing device:
s·[u v 1]^T = K·(R·[x_w y_w z_w]^T + T)  (s being the projective scale)
Here [u v 1]^T is a point in the image plane, [x_w y_w z_w]^T is a spatial point in the world coordinate system, and K is the intrinsic matrix of the photographing device. R is the rotation matrix of the photographing device relative to the world coordinate system, which can be converted from the device's attitude quaternion, and T is the translation vector of the photographing device relative to the world coordinate system, determined by its position; together, R and T express the motion of the photographing device relative to the world coordinate system and thus define the coordinate system in which the image lies. For the first image, R is converted from the attitude quaternion at the first moment and T is determined by the position at the first moment; for the second image, R is converted from the estimated attitude quaternion at the second moment and T is determined by the estimated position at the second moment. Since R and T for the second image are computed from estimates, the second coordinate system is itself an estimate and is not necessarily the coordinate system in which the second image actually lies.
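The pinhole relation can be sketched as follows (an illustrative sketch; `project` is a hypothetical name, K, R, T are passed as nested lists, and the projective scale is divided out at the end):

```python
def project(K, R, T, Xw):
    """Pinhole model: s*[u, v, 1]^T = K*(R*Xw + T); returns the pixel (u, v).
    K: 3x3 intrinsic matrix, R: 3x3 rotation matrix, T: translation vector,
    Xw: spatial point [x_w, y_w, z_w] in the world coordinate system."""
    # camera-frame coordinates: Xc = R*Xw + T
    Xc = [sum(R[i][j] * Xw[j] for j in range(3)) + T[i] for i in range(3)]
    # homogeneous image coordinates: [s*u, s*v, s]^T = K*Xc
    h = [sum(K[i][j] * Xc[j] for j in range(3)) for i in range(3)]
    return h[0] / h[2], h[1] / h[2]   # divide out the projective scale s
```

For an identity pose, a point on the optical axis projects to the principal point (cx, cy) encoded in K, which is a quick sanity check on the intrinsic matrix.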
The projection transformation may include: performing a first projection transformation on part or all of the first image in the first coordinate system to convert it into a third image in the second coordinate system, the two images in the same coordinate system then being the third image and part or all of the second image; or performing a second projection transformation on part or all of the second image in the second coordinate system to convert it into a fourth image in the first coordinate system, the two images then being the fourth image and part or all of the first image; or performing a third projection transformation on part or all of the first image in the first coordinate system to convert it into a fifth image in a third coordinate system, and a fourth projection transformation on part or all of the second image in the second coordinate system to convert it into a sixth image in that same third coordinate system, where the third coordinate system differs from both the first and second coordinate systems, the two images then being the fifth and sixth images.
The third coordinate system may be the corrected first coordinate system, in which case the third projection transformation is a correction transformation and the fourth projection transformation comprises a motion compensation transformation and a correction transformation. The third coordinate system may instead be the corrected second coordinate system, in which case the third projection transformation comprises an inverse motion compensation transformation and a correction transformation, and the fourth projection transformation is a correction transformation. The order of the correction transformation and the (inverse) motion compensation transformation is not restricted.
The third coordinate system may also be a coordinate system other than the first/second coordinate systems before or after correction, for example the coordinate system in which feature-point matching was performed after the first image's previous projection transformation. For example, the visual odometry runs continuously: with frame k as the first image and frame k+1 as the second image, the method of this embodiment computes the pose information corresponding to frame k+1, and during processing the third coordinate system into which frames k and k+1 are projected is the coordinate system of the corrected frame k. Then, with frame k+1 as the first image and frame k+2 as the second image, the method of this embodiment can continue to use the coordinate system of the corrected frame k as the third coordinate system; the previous projection result for frame k+1 can then be reused directly instead of projecting frame k+1 again, and only frame k+2 needs to be projected into the third coordinate system, reducing the amount of computation.
Among the various transformations above, the parameters of the correction transformation can be obtained from the distortion information of the photographing device. For the other transformations, which involve changes in the pose of the photographing device, the parameters can be derived from the principle that a point of the pre-transformation image, mapped through a pinhole model parameterized by the pre-transformation coordinate system, corresponds to the same spatial point in the world coordinate system as the post-transformation image mapped through a pinhole model parameterized by the post-transformation coordinate system. The transformation formula can then be:
s·[u_eis v_eis 1]^T = K·R_wc·K^(-1)·[u_0 v_0 1]^T  (s being the projective scale)
Here K is the intrinsic matrix of the photographing device, [u_0 v_0 1]^T is a point in the image plane before the transformation, [u_eis v_eis 1]^T is the corresponding point in the image plane after the transformation, and R_wc is the rotation of the post-transformation coordinate system relative to the pre-transformation coordinate system, which can be computed from the pose information at the first moment and the pose estimation information at the second moment.
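The per-pixel warp can be sketched as follows (a sketch assuming an intrinsic matrix of the form [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] so that K^(-1) can be applied in closed form; `warp_point` is an illustrative name):

```python
def warp_point(K, R_wc, u0, v0):
    """Map a pixel through s*[u_eis, v_eis, 1]^T = K * R_wc * K^(-1) * [u0, v0, 1]^T."""
    fx, cx = K[0][0], K[0][2]
    fy, cy = K[1][1], K[1][2]
    # K^(-1) * [u0, v0, 1]^T: back-project the pixel to a viewing ray
    ray = [(u0 - cx) / fx, (v0 - cy) / fy, 1.0]
    # rotate the ray into the post-transformation coordinate system
    r = [sum(R_wc[i][j] * ray[j] for j in range(3)) for i in range(3)]
    # K * r, then divide out the projective scale
    return cx + fx * r[0] / r[2], cy + fy * r[1] / r[2]
```

With an identity rotation the pixel maps to itself, and a pure rotation about the optical axis moves pixels around the principal point, which matches the role of this warp as rotation compensation.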
For the transformations involving changes in the pose of the photographing device, a two-dimensional motion model of the image can also be used directly to compute the transformation parameters, in which case the formula can be:
[u_eis v_eis]^T = s·R(θ)·[u_0 v_0]^T + [Δu Δv]^T,  with R(θ) = [cos θ  −sin θ; sin θ  cos θ]
Here s is the zoom parameter between the coordinate systems before and after the transformation, θ is the rotation angle between them, and [Δu Δv]^T is the translation vector between them, which can be computed from the pose information at the first moment and the pose estimation information at the second moment.
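The two-dimensional motion model can be sketched directly (an illustrative sketch; `warp_2d` is a hypothetical name, with zoom s, rotation θ, and translation [Δu, Δv] applied in that order):

```python
import math

def warp_2d(u0, v0, s, theta, du, dv):
    """2D motion model: [u_eis, v_eis]^T = s*R(theta)*[u0, v0]^T + [du, dv]^T."""
    c, sn = math.cos(theta), math.sin(theta)
    u = s * (c * u0 - sn * v0) + du   # zoom + rotate, then translate in u
    v = s * (sn * u0 + c * v0) + dv   # zoom + rotate, then translate in v
    return u, v
```

Unlike the full K·R_wc·K^(-1) warp, this model works purely in image coordinates, which makes it cheap but only approximate for out-of-plane rotation.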
The projection transformation process is illustrated below with reference to the drawings. In one example, the photographing device is a wide-angle fisheye camera whose attitudes at the first and second moments are shown in Fig. 2, where the solid boxes 10 and 20 are the shooting ranges of the photographing device at the first and second moments, respectively, and the dashed boxes 11 and 21 are the ranges corresponding to the fifth and sixth images, cropped around the position of the center of gravity of the photographing device.
As shown in Fig. 3, point 14 in the corrected first image 13 is the position of the center of gravity of the photographing device, and a partial image cropped around point 14 serves as the fifth image 15. Point 24 in the corrected second image 23 is the position of the center of gravity of the photographing device, and the partial image 25 around point 24 corresponds to the sixth image; that is, applying the motion compensation transformation to the partial image 25 yields the sixth image.
The fifth and sixth images obtained after the projection transformation have identical coordinates. One could apply the motion compensation transformation to the complete image 23 and then crop it, according to the coordinates of the fifth image 15, to obtain a sixth image with identical coordinates, but this requires a relatively large amount of computation. To simplify the process, the coordinates of each pixel of the fifth image (identical to the sixth image's coordinates) can be taken as the result of the motion compensation transformation, and the coordinates of the corresponding backtracked point before the motion compensation transformation computed inversely from the formula given above; all the backtracked points make up the partial image 25, and the grayscale value of each backtracked point is then computed and used as the grayscale value of the corresponding pixel of the sixth image.
如果某个回溯点的坐标超出校正之后的第二图像23的边界,则将该回溯点的灰阶值(即第六图像的对应像素点的灰阶值)设为0或者随机点。随机点可以服从于但不限于高斯分布/平均分布。如果某个回溯点的坐标刚好与校正之后的第二图像23中某个像素点的坐标相同,则可以直接将该像素点的灰阶值作为该回溯点的灰阶值。如果某个回溯点的坐标在校正之后的第二图像23内且不是整数,则将校正之后的第二图像23中该回溯点不是整数的坐标维度方向上的相邻像素的灰阶值的线性插值结果作为该回溯点的灰阶值。If the coordinates of a backtracking point exceed the boundary of the corrected second image 23, the grayscale value of the backtracking point (i.e., the grayscale value of the corresponding pixel of the sixth image) is set to 0 or to a random value. The random values may follow, but are not limited to, a Gaussian or uniform distribution. If the coordinates of a backtracking point coincide exactly with the coordinates of a pixel in the corrected second image 23, the grayscale value of that pixel may be used directly as the grayscale value of the backtracking point. If the coordinates of a backtracking point lie within the corrected second image 23 but are not integers, the linear interpolation of the grayscale values of the neighboring pixels along each coordinate dimension in which the backtracking point's coordinate is not an integer is taken as the grayscale value of the backtracking point.
对于最后一种情况,回溯点的坐标为(x,y),g(·)为像素点的灰阶值,即g(x,y)为坐标为(x,y)的点的灰阶值。For the last case, let the coordinates of the backtracking point be (x, y) and let g(·) denote the grayscale value of a pixel, i.e., g(x, y) is the grayscale value of the point with coordinates (x, y).
假设x不是整数且相邻的两个整数为x1和x2,由于x2-x1=1,y为整数,线性插值可得:g(x,y)=(x2-x)g(x1,y)+(x-x1)g(x2,y)。类似的,假设y不是整数且相邻的两个整数为y1和y2,由于y2-y1=1,x为整数,线性插值可得g(x,y)=(y2-y)g(x,y1)+(y-y1)g(x,y2)。假设x和y都不是整数且x相邻的两个整数为x1和x2,y相邻的两个整数为y1和y2,双线性插值可得g(x,y)=(x2-x)(y2-y)g(x1,y1)+(x2-x)(y-y1)g(x1,y2)+(x-x1)(y2-y)g(x2,y1)+(x-x1)(y-y1)g(x2,y2)。Suppose x is not an integer and its two neighboring integers are x1 and x2; since x2-x1=1 and y is an integer, linear interpolation gives g(x,y)=(x2-x)g(x1,y)+(x-x1)g(x2,y). Similarly, suppose y is not an integer and its two neighboring integers are y1 and y2; since y2-y1=1 and x is an integer, linear interpolation gives g(x,y)=(y2-y)g(x,y1)+(y-y1)g(x,y2). Suppose neither x nor y is an integer, the two integers adjacent to x are x1 and x2, and the two integers adjacent to y are y1 and y2; bilinear interpolation gives g(x,y)=(x2-x)(y2-y)g(x1,y1)+(x2-x)(y-y1)g(x1,y2)+(x-x1)(y2-y)g(x2,y1)+(x-x1)(y-y1)g(x2,y2).
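A minimal sketch of these interpolation formulas, assuming the caller has already verified that (x, y) lies inside the image and that the image is indexed as g[row, column] = g[y, x]:

```python
import numpy as np

def interp_grayscale(g, x, y):
    """Grayscale value of a backtracking point (x, y) inside image `g`,
    covering all three cases from the text: both coordinates integer
    (direct lookup), one non-integer (linear interpolation), and both
    non-integer (bilinear interpolation). With x1 = floor(x) and
    x2 = x1 + 1, the weights (x2 - x) and (x - x1) make the bilinear
    formula reduce to the linear and direct cases automatically."""
    x1, y1 = int(np.floor(x)), int(np.floor(y))
    wx, wy = x - x1, y - y1            # fractional parts; 0 for integers
    x2 = x1 if wx == 0 else x1 + 1     # avoid indexing past the right edge
    y2 = y1 if wy == 0 else y1 + 1     # avoid indexing past the bottom edge
    return ((1 - wx) * (1 - wy) * g[y1, x1] + (1 - wx) * wy * g[y2, x1]
            + wx * (1 - wy) * g[y1, x2] + wx * wy * g[y2, x2])
```

On a 2x2 image with values 0, 10 (top row) and 20, 30 (bottom row), the point (0.5, 0.5) interpolates to the average 15 of the four pixels.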
在本例中,第三坐标系是校正后的第一坐标系,实际也可以将校正后的第二坐标系作为第三坐标系,此时需要对部分或完整的校正后的第一图像进行反向运动补偿变换得到第五图像。类似的,为减少计算量,可以将第六图像中每个像素的坐标作为反向运动补偿变换的结果,根据反向运动补偿变换的参数反向计算出反向运动补偿变换之前的坐标,计算反向运动补偿变换之前的坐标在校正变换后的第一图像中的灰阶值作为第五图像的对应像素的灰阶值,反向运动补偿变换之前的坐标超出边界/不是整数的处理方式可参考前述内容。反向运动补偿变换与运动补偿变换可以互为逆变换。以此类推,这种反向的计算方式可以应用到其他类型的投影变换中去。In this example, the third coordinate system is the corrected first coordinate system; alternatively, the corrected second coordinate system may serve as the third coordinate system, in which case the partial or complete corrected first image must undergo an inverse motion compensation transformation to obtain the fifth image. Similarly, to reduce computation, the coordinates of each pixel in the sixth image may be taken as the result of the inverse motion compensation transformation, the coordinates before that transformation computed inversely from its parameters, and the grayscale values at those coordinates in the corrected first image taken as the grayscale values of the corresponding pixels of the fifth image; coordinates before the inverse motion compensation transformation that fall outside the boundary or are not integers are handled as described above. The inverse motion compensation transformation and the motion compensation transformation may be inverses of each other. By analogy, this inverse calculation can be applied to other types of projection transformations.
S4:在同一坐标系下的两个图像中进行特征点匹配。S4: Feature point matching is performed in two images in the same coordinate system.
由于第二坐标系是估计值,投影变换之后得到的两个图像实际并不一定在同一坐标系下,但即使两个图像实际不在同一坐标系下,两者的坐标系(特别是旋转矩阵)的变化量相对于投影变换之前第一图像和第二图像坐标系的变化量也有明显减小。投影变换之后,两个图像中特征点周围框的灰度差异明显减小甚至消失,使得特征点的匹配成功率提高。Since the second coordinate system is an estimate, the two images obtained after the projection transformation are not necessarily in exactly the same coordinate system; but even if the two images are actually not in the same coordinate system, the variation between their coordinate systems (in particular the rotation matrices) is significantly smaller than the variation between the coordinate systems of the first image and the second image before the projection transformation. After the projection transformation, the grayscale difference of the patches around the feature points in the two images is significantly reduced or even eliminated, so the success rate of feature point matching improves.
S5:基于两个图像中匹配的特征点计算拍摄设备在第二时刻的位姿信息。S5: Calculate the pose information of the photographing device at the second moment based on the matched feature points in the two images.
第二时刻的位姿估计信息的精确度较低,与拍摄设备在第二时刻的实际位姿信息并不一定相同,因此需要使用视觉里程计的算法计算更为精确的第二时刻的位姿信息。计算过程中使用的是原始的/经过校正的第一图像和第二图像。如图4所示,本步骤可以包括:The pose estimation information at the second moment has relatively low accuracy and is not necessarily identical to the actual pose information of the photographing device at the second moment, so an algorithm of the visual odometer is needed to compute more accurate pose information for the second moment. The original/corrected first image and second image are used in the calculation. As shown in FIG. 4, this step may include:
S51:对两个图像中匹配的特征点中的至少一个进行反向投影变换,得到原始的/经过校正的第一图像和第二图像中匹配的对应点。S51: performing a back projection transformation on at least one of the matched feature points in the two images to obtain matched corresponding points in the original/corrected first image and second image.
S52:根据对应点在原始的/经过校正的第一图像和第二图像中的坐标计算帧间运动信息。S52: Calculate inter-frame motion information according to coordinates of the corresponding point in the original/corrected first image and the second image.
S53:利用第一时刻的位姿信息及帧间运动信息获取第二时刻的位姿信息。S53: Acquire the pose information of the second moment by using the pose information and the interframe motion information at the first moment.
本步骤计算得到的是拍摄设备在第二时刻的位姿信息。由于载体的控制可能需要使用的是载体的位姿而非拍摄设备的位姿,本步骤之后可以根据拍摄设备与载体之间固定的相对位姿关系计算出载体在第二时刻的位姿信息。The calculation obtained in this step is the pose information of the photographing device at the second moment. Since the control of the carrier may need to use the pose of the carrier instead of the pose of the photographing device, after this step, the pose information of the carrier at the second moment may be calculated according to the fixed relative pose relationship between the photographing device and the carrier.
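Step S53 and the conversion to the carrier's pose amount to composing rigid-body transforms. A sketch with hypothetical numbers follows; the rotations, translations and the camera-to-carrier extrinsic below are made-up placeholders, not values from the patent:

```python
import numpy as np

def se3(R, t):
    """Pack a rotation matrix R and translation t into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Pose of the camera at the first moment in the world frame (placeholder).
T_world_cam1 = se3(np.eye(3), np.array([1.0, 2.0, 0.5]))
# Inter-frame motion from S52: camera frame at t1 -> camera frame at t2 (placeholder).
T_cam1_cam2 = se3(np.eye(3), np.array([0.1, 0.0, 0.0]))
# S53: chain the first-moment pose with the inter-frame motion.
T_world_cam2 = T_world_cam1 @ T_cam1_cam2
# Fixed camera-to-carrier extrinsic from calibration (placeholder).
T_cam_body = se3(np.eye(3), np.array([0.0, 0.0, -0.05]))
# Carrier pose at the second moment, via the fixed relative pose.
T_world_body2 = T_world_cam2 @ T_cam_body
```

The same chaining applies with non-identity rotations; only the placeholder values change.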
随着拍摄设备的连续采集图像,可以重复执行本实施例提供的方法以依次计算拍摄设备在不同帧对应的采集时刻的位姿信息,从而得到拍摄设备/拍摄设备的载体的运动轨迹。With the continuous acquisition of the image of the photographing device, the method provided in this embodiment may be repeatedly performed to sequentially calculate the pose information of the photographing device at the acquisition time corresponding to different frames, thereby obtaining the motion track of the carrier of the photographing device/photographing device.
本实施例提供的方法可以应用于单目视觉里程计,也可以应用于双目视觉里程计。如果应用于双目视觉里程计,可以分别对左/右图像序列应用本实施例提供的方法。The method provided by this embodiment can be applied to a monocular visual odometer, and can also be applied to a binocular visual odometer. If applied to a binocular visual odometer, the method provided by this embodiment can be applied to the left/right image sequence separately.
通过本实施例的实施,基于摄像设备在第一时刻的位姿信息估计第二时刻的位姿估计信息,然后基于第一时刻的位姿信息和第二时刻的位姿估计信息对第一图像和第二图像中的至少一个进行投影变换,得到同一坐标系下的两个图像以进行特征点匹配,投影变换完成了对第一时刻和第二时刻之间的运动信息的补偿,有效减少了同一坐标系下的两个图像之间摄像设备的旋转运动造成的特征点周围框的灰度差异,从而提高特征点匹配成功率和视觉里程计对旋转的处理能力。Through the implementation of this embodiment, the pose estimation information at the second moment is estimated based on the pose information of the imaging device at the first moment; then, based on the pose information at the first moment and the pose estimation information at the second moment, at least one of the first image and the second image undergoes a projection transformation, yielding two images in the same coordinate system for feature point matching. The projection transformation compensates for the motion between the first moment and the second moment, effectively reducing the grayscale difference of the patches around the feature points caused by the rotational motion of the imaging device between the two images in the same coordinate system, thereby improving the feature point matching success rate and the visual odometer's ability to handle rotation.
举例说明,如图5所示,在本发明一实施例中,视觉里程计的工作流程可以包括:在tm时刻,拍摄设备采集到的新图像(即第m帧,也可以被称为当前帧)作为第二图像,新图像的前一帧(即tm-1时刻采集到的第m-1帧)作为第一图像,对新图像进行校正变换,然后根据帧间运动预估结果进行运动补偿变换,其中帧间运动预估是根据tm-1时刻计算出的精确位姿和新的惯性导航数据(即tm时刻传感器采集到的惯性导航数据)进行的。图中虚线框出的校正变换和运动补偿变换可以统称为投影变换,校正变换和运动补偿变换的顺序可以调换,也可以一次完成。For example, as shown in FIG. 5, in one embodiment of the present invention, the workflow of the visual odometer may include: at time tm, the new image acquired by the photographing device (i.e., the m-th frame, which may also be called the current frame) serves as the second image, and the frame preceding the new image (i.e., the (m-1)-th frame acquired at time tm-1) serves as the first image; the new image undergoes a correction transformation and then a motion compensation transformation according to the inter-frame motion estimation result, where the inter-frame motion estimation is performed based on the accurate pose computed at time tm-1 and the new inertial navigation data (i.e., the inertial navigation data collected by the sensor at time tm). The correction transformation and the motion compensation transformation enclosed by the dashed box in the figure may be collectively called the projection transformation; their order may be swapped, or they may be completed in a single pass.
投影变换后的新图像和校正后的第一图像处于同一坐标系下,可以对这两者进行特征点匹配,然后基于匹配的特征点计算出tm时刻的精确位姿以用于载体的控制。接着在tm+1时刻,拍摄设备采集到了第m+1帧,此时新图像变为第m+1帧作为第二图像,第m帧变为第一图像,重复执行前述步骤,可以实时计算拍摄设备的位姿从而得到拍摄设备/拍摄设备的载体的运动轨迹。The new image after the projection transformation and the corrected first image are in the same coordinate system, so feature point matching can be performed between them, and the accurate pose at time tm is then computed from the matched feature points for controlling the carrier. Next, at time tm+1, the photographing device acquires the (m+1)-th frame; the new image now becomes the (m+1)-th frame serving as the second image, while the m-th frame becomes the first image. By repeating the foregoing steps, the pose of the photographing device can be computed in real time, yielding the motion trajectory of the photographing device or its carrier.
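The per-frame loop of FIG. 5 can be summarized as the following sketch. The five processing stages are passed in as callables supplied by the surrounding system; all of them are placeholders, and only the order of operations reflects the workflow described above:

```python
def run_visual_odometry(frames, inertial, pose0, correct, estimate_motion,
                        compensate, match, solve_pose):
    """Process an image stream: correct each new frame, motion-compensate it
    using the inter-frame motion estimated from the previous accurate pose
    and the new inertial data, match it against the corrected previous
    frame, and solve for the accurate pose at the current moment."""
    poses = [pose0]
    prev_corrected = correct(frames[0])
    for m in range(1, len(frames)):
        cur_corrected = correct(frames[m])
        motion_guess = estimate_motion(poses[-1], inertial[m])
        cur_compensated = compensate(cur_corrected, motion_guess)
        matches = match(prev_corrected, cur_compensated)
        poses.append(solve_pose(poses[-1], matches))
        prev_corrected = cur_corrected
    return poses
```

The returned pose list is the trajectory of the photographing device (one pose per frame), from which the carrier trajectory follows via the fixed extrinsic.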
如图6所示,本发明视觉里程计实现方法第二实施例,是在本发明视觉里程计实现方法第一实施例的基础上,S3之前进一步包括:As shown in FIG. 6, the second embodiment of the visual odometer implementation method of the present invention is based on the first embodiment of the visual odometer implementation method of the present invention and further includes, before S3:
S20:利用第二时刻的惯性导航数据估算第一图像与第二图像之间的旋转角。S20: Estimating a rotation angle between the first image and the second image by using the inertial navigation data at the second moment.
一般来说,可以根据第二时刻的陀螺仪的读数估算旋转角。在其他实施例中,也可以利用第一时刻的惯性导航数据,或者结合第一时刻和第二时刻的惯性导航数据来估算旋转角。In general, the angle of rotation can be estimated from the readings of the gyroscope at the second moment. In other embodiments, the inertial navigation data at the first moment may be utilized, or the rotational angle may be estimated in conjunction with the inertial navigation data at the first and second moments.
旋转角可以用一个包括n个元素的数组来表示,其中n可以为大于0的整数,每个元素表示一个可旋转方向上的角度变化量。n的大小与拍摄设备/拍摄设备的载体的运动自由度有关。例如当拍摄设备安装在飞行器上时,n可以等于3,旋转角中的元素分别表示pitch角、yaw角和roll角。The rotation angle can be represented by an array of n elements, where n can be an integer greater than 0, and each element represents an amount of angular change in a rotatable direction. The size of n is related to the degree of freedom of movement of the carrier of the photographing device/photographing device. For example, when the photographing device is mounted on the aircraft, n may be equal to 3, and the elements in the angle of rotation represent the pitch angle, the yaw angle, and the roll angle, respectively.
旋转角可以是带有指向性的,此时旋转角中的每个元素可能为正数或负数或0;旋转角可以是不带有指向性的,此时旋转角中的每个元素可能为正数或0。The rotation angle may be directional, in which case each element in the rotation angle may be a positive number, a negative number or 0; the rotation angle may also be non-directional, in which case each element in the rotation angle may be a positive number or 0.
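As a toy illustration of this representation (the frame interval, the gyroscope readings and the [pitch, yaw, roll] ordering below are all assumptions), the n = 3 rotation angle can be approximated by integrating the angular rates over one frame interval:

```python
import numpy as np

dt = 0.02                               # assumed frame interval in seconds
gyro = np.array([0.05, 1.2, -0.02])     # assumed angular rates (rad/s) about pitch/yaw/roll axes
rotation_angle = gyro * dt              # signed ("directional") rotation angle, n = 3
magnitude = np.abs(rotation_angle)      # directivity removed: every element >= 0
```

The signed array corresponds to the directional case; taking the element-wise absolute value yields the non-directional form used for comparison against the preset range.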
S21:判断旋转角是否在预设范围内。S21: Determine whether the rotation angle is within a preset range.
如果旋转角是带有指向性的,为简化判断,可以将旋转角的绝对值与预设范围进行比较以去除指向性的影响。如果旋转角不带有指向性,则可以直接将旋转角与预设范围进行比较。If the rotation angle is directional, to simplify the judgment, the absolute value of the rotation angle can be compared with the preset range to remove the influence of the directivity. If the angle of rotation is not directional, the angle of rotation can be directly compared to the preset range.
不考虑指向性(旋转角本身不带有指向性或者取带有指向性的旋转角的绝对值)的情况下,如果旋转角/旋转角的绝对值小于预设范围的下限(即最小值),说明拍摄设备的姿态变化比较平稳,旋转量较小,不需要进行投影变换也能得到较好的特征点匹配结果。如果旋转角/旋转角的绝对值大于预设范围的上限(即最大值),预设范围的上限可以由拍摄设备的视场角决定,说明拍摄设备的姿态变化过于剧烈,旋转量较大,即使进行投影变换,得到的两张图像也完全不重叠或者重叠部分不足以支持得到需要的匹配特征点,投影变换没有意义。Regardless of the directivity (the rotation angle itself does not have directivity or the absolute value of the rotation angle with directivity), if the absolute value of the rotation angle/rotation angle is smaller than the lower limit of the preset range (ie, the minimum value) It indicates that the posture change of the photographing device is relatively stable, the amount of rotation is small, and a good feature point matching result can be obtained without performing projection transformation. If the absolute value of the rotation angle/rotation angle is greater than the upper limit of the preset range (ie, the maximum value), the upper limit of the preset range may be determined by the angle of view of the photographing device, indicating that the posture of the photographing device changes too sharply, and the amount of rotation is large. Even if the projection transformation is performed, the obtained two images do not overlap at all or the overlapping portion is insufficient to support the desired matching feature points, and the projection transformation has no meaning.
不考虑指向性的情况下,预设范围可以是有限的,也可以是无限的。如果预设范围是无限的,其上限可以为正无穷,下限可以为正数,意味着旋转角/旋转角的绝对值不可能大于上限。如果预设范围是有限的,其上限不是正无穷,意味着旋转角/旋转角的绝对值可能大于上限,如果下限为0则意味着旋转角/旋转角的绝对值不可能小于下限,如果下限为正数则意味着旋转角/旋转角的绝对值可能小于下限。以此类推,可以得到考虑指向性的情况下将旋转角直接与预设范围进行比较的情况。Without considering directivity, the preset range may be finite or infinite. If the preset range is infinite, its upper limit may be positive infinity and its lower limit a positive number, meaning the rotation angle / the absolute value of the rotation angle cannot be greater than the upper limit. If the preset range is finite, its upper limit is not positive infinity, meaning the rotation angle / the absolute value of the rotation angle may be greater than the upper limit; if the lower limit is 0, the rotation angle / the absolute value of the rotation angle cannot be smaller than the lower limit, and if the lower limit is a positive number, the rotation angle / the absolute value of the rotation angle may be smaller than the lower limit. By analogy, the case where the rotation angle is compared directly with the preset range while taking directivity into account can be derived.
如果旋转角在预设范围内,则执行包括S3在内的后续步骤;如果旋转角不在预设范围内,则可以不执行包括S3在内的后续步骤而直接使用传统的视觉里程计算法进行特征点匹配(例如小于预设范围的下限的情况下),或者放弃使用里程计算法(例如大于预设范围的上限的情况下)而使用其他方式来定位拍摄设备,例如将第二时刻的位姿估计信息作为第二时刻的位姿信息。在其他实施例中,旋转角不在预设范围内时可以继续执行包括S3在内的后续步骤。If the rotation angle is within the preset range, the subsequent steps including S3 are performed. If the rotation angle is not within the preset range, the subsequent steps including S3 may be skipped: feature point matching may be performed directly with a conventional visual odometry algorithm (for example, when the angle is below the lower limit of the preset range), or the odometry algorithm may be abandoned (for example, when the angle is above the upper limit of the preset range) and the photographing device located by other means, for example by taking the pose estimation information at the second moment as the pose information of the second moment. In other embodiments, the subsequent steps including S3 may still be performed when the rotation angle is not within the preset range.
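A sketch of the S21 decision follows. The threshold values are placeholders; in practice the upper limit would be derived from the camera's field of view as described above:

```python
import numpy as np

def decide_step(rotation_angle, lower=0.01, upper=0.5):
    """Compare the largest absolute angle component against [lower, upper]:
    below the range, matching succeeds without projection; above it, the
    projection is pointless and the pose estimate is used directly; inside
    it, the projection transformation (S3) is performed."""
    mag = float(np.max(np.abs(rotation_angle)))
    if mag < lower:
        return "match_without_projection"
    if mag > upper:
        return "use_pose_estimate"
    return "projection_transform"
```

Taking the absolute value first removes directivity, as the text suggests for signed rotation angles.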
S20/S21可以在S2中的估算第二时刻的估计位置/速度之前执行,也可以在其之后执行。如果S20和S21均在S2中的估算第二时刻的估计位置/速度之前执行,且旋转角不在预设范围内,则S2中的估算第二时刻的估计位置/速度可以被跳过。S20/S21 may be performed before the estimation of the estimated position/velocity at the second moment in S2, or after it. If both S20 and S21 are performed before that estimation and the rotation angle is not within the preset range, the estimation of the estimated position/velocity at the second moment in S2 may be skipped.
如图7所示,本发明视觉里程计实现方法第三实施例,是在本发明视觉里程计实现方法第二实施例的基础上,拍摄设备的视场角小于预设阈值,S20之后进一步包括:As shown in FIG. 7, the third embodiment of the visual odometer implementation method of the present invention is based on the second embodiment of the visual odometer implementation method of the present invention, the field of view of the photographing device is smaller than a preset threshold, and after S20 the method further includes:
S22:判断旋转角是否属于yaw角。S22: Determine whether the rotation angle belongs to the yaw angle.
本实施例中的拍摄设备安装于飞行器上,拍摄设备的视场角小于预设阈值可以表示拍摄设备不是广角的。The photographing device in this embodiment is mounted on the aircraft, and the field of view of the photographing device is less than a preset threshold, which may indicate that the photographing device is not wide-angle.
如果旋转角属于yaw角,则意味着旋转角中的pitch角和roll角小于或者等于阈值(该阈值可以接近于或者等于0),执行后续步骤(S21或S3);如果旋转角不属于yaw角,则意味着旋转角中的pitch角和/或roll角大于阈值,由于拍摄设备在pitch角和roll角方向上的范围非常小,即使进行投影变换,得到的两张图像也完全不重叠或者重叠部分不足以支持得到需要的匹配特征点,可以不执行后续步骤,而使用其他方式来定位拍摄设备,例如将第二时刻的位姿估计信息作为第二时刻的位姿信息。在其他实施例中,旋转角不属于yaw角时可以继续执行后续步骤。If the rotation angle belongs to the yaw angle, the pitch angle and the roll angle within the rotation angle are less than or equal to a threshold (which may be close to or equal to 0), and the subsequent step (S21 or S3) is performed. If the rotation angle does not belong to the yaw angle, the pitch angle and/or the roll angle exceeds the threshold; since the range of the photographing device in the pitch and roll directions is very small, even with a projection transformation the two resulting images would not overlap at all, or the overlap would be insufficient to yield the required matched feature points, so the subsequent steps may be skipped and the photographing device located by other means, for example by taking the pose estimation information at the second moment as the pose information of the second moment. In other embodiments, the subsequent steps may still be performed when the rotation angle does not belong to the yaw angle.
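The S22 check can be sketched as follows; the [pitch, yaw, roll] ordering and the near-zero threshold are assumptions of the sketch:

```python
def belongs_to_yaw(rotation_angle, eps=1e-3):
    """The rotation angle 'belongs to the yaw angle' when its pitch and
    roll components are less than or equal to a threshold close to zero;
    the yaw component itself is unconstrained by this check."""
    pitch, yaw, roll = rotation_angle
    return abs(pitch) <= eps and abs(roll) <= eps
```

A pure yaw rotation passes the check, while any significant pitch or roll component fails it.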
S22与S21之间的执行顺序并无限制。在本实施例中,只有在旋转角属于yaw角且在预设范围内时才会执行后续的包括S3在内的步骤,在其他实施例中,也可以只判断旋转角是否属于yaw角而不判断其是否在预设范围内。The order of execution between S22 and S21 is not limited. In this embodiment, the subsequent steps including S3 are performed only when the rotation angle belongs to the yaw angle and is within the preset range. In other embodiments, it is also possible to judge only whether the rotation angle belongs to the yaw angle. Determine if it is within the preset range.
S22可以在S2中的估算第二时刻的估计位置/速度之前执行,也可以在其之后执行。如果S22在S2中的估算第二时刻的估计位置/速度之前执行,且旋转角不属于yaw角,则S2中的估算第二时刻的估计位置/速度可以被跳过。S22 may be performed before the estimation of the estimated position/velocity at the second moment in S2, or after it. If S22 is performed before that estimation and the rotation angle does not belong to the yaw angle, the estimation of the estimated position/velocity at the second moment in S2 may be skipped.
如图8所示,本发明视觉里程计第一实施例包括处理器110和拍摄设备120。拍摄设备120连接处理器110。图中只画出了一个处理器110和拍摄设备120,实际数量可以更多。处理器110可以单独或者协同工作。如果处理器110的数量大于一,其中可以有部分处理器110不连接拍摄设备120。As shown in FIG. 8, the first embodiment of the visual odometer of the present invention includes a processor 110 and a photographing device 120. The photographing device 120 is connected to the processor 110. Only one processor 110 and one photographing device 120 are shown in the figure; the actual numbers may be larger. The processors 110 may work alone or in cooperation. If the number of processors 110 is greater than one, some of the processors 110 may not be connected to the photographing device 120.
处理器110控制视觉里程计的操作,处理器110还可以称为CPU(Central Processing Unit,中央处理单元)。处理器110可能是一种集成电路芯片,具有信号序列的处理能力。处理器110还可以是通用处理器、数字信号序列处理器(DSP)、专用集成电路(ASIC)、现成可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。The processor 110 controls the operation of the visual odometer and may also be referred to as a CPU (Central Processing Unit). The processor 110 may be an integrated circuit chip with signal sequence processing capability. The processor 110 may also be a general purpose processor, a digital signal sequence processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
拍摄设备120可以为相机或者摄像头,用于采集图像以供处理器110进行处理。The photographing device 120 can be a camera or a camera for acquiring images for processing by the processor 110.
处理器110用于执行指令对拍摄设备采集到的图像进行处理以实现本发明视觉里程计实现方法任一实施例以及不冲突的组合所提供的方法。The processor 110 is configured to execute an instruction to process an image acquired by the photographing device to implement a method provided by any one of the embodiments of the visual odometer implementation method of the present invention and a non-conflicting combination.
如图9所示,本发明视觉里程计第二实施例包括处理器211、212和拍摄设备220。拍摄设备220分别连接处理器211和212。As shown in FIG. 9, the second embodiment of the visual odometer of the present invention includes processors 211, 212 and photographing device 220. The photographing device 220 is connected to the processors 211 and 212, respectively.
处理器211、212和拍摄设备220的具体描述可参考本发明视觉里程计第一实施例的内容。处理器211和212协同工作以实现本发明视觉里程计实现方法任一实施例以及不冲突的组合所提供的方法,其中处理器211可以为专用加速电路,例如DSP、ASIC、FPGA等,负责对第一图像和第二图像中的至少一个进行投影变换,处理器212负责其他部分。For a detailed description of the processors 211 and 212 and the photographing device 220, refer to the first embodiment of the visual odometer of the present invention. The processors 211 and 212 work in cooperation to implement the method provided by any embodiment of the visual odometer implementation method of the present invention or any non-conflicting combination thereof, wherein the processor 211 may be a dedicated acceleration circuit, such as a DSP, ASIC or FPGA, responsible for the projection transformation of at least one of the first image and the second image, while the processor 212 is responsible for the other parts.
图中的连接关系仅为示意,实际可以只有处理器211/212连接拍摄设备220。The connection relationship in the figure is only an illustration, and actually only the processor 211/212 can be connected to the photographing device 220.
本实施例中处理器211和212是两个分立的器件,在其他实施例中两者可以集成在一起。 Processors 211 and 212 in this embodiment are two separate devices, and in other embodiments both may be integrated.
如图10所示,本发明无人设备300一实施例包括视觉里程计310,视觉里程计310可以为本发明视觉里程计任一实施例所提供的视觉里程计。无人设备300可以为无人机、无人船、无人车等。以无人机为例,视觉里程计中的拍摄设备可以安装在机身上,处理器可以是一个独立的硬件,也可以与拍摄设备集成在一起,也可以集成在无人机的飞行控制器中。As shown in FIG. 10, an embodiment of the unmanned device 300 of the present invention includes a visual odometer 310, which may be the visual odometer provided by any embodiment of the visual odometer of the present invention. The unmanned device 300 may be a drone, an unmanned boat, an unmanned vehicle, or the like. Taking a drone as an example, the photographing device of the visual odometer may be mounted on the fuselage, and the processor may be independent hardware, may be integrated with the photographing device, or may be integrated into the flight controller of the drone.
如图11所示,本发明增强现实设备400一实施例包括视觉里程计410,视觉里程计410可以为本发明视觉里程计任一实施例所提供的视觉里程计。 As shown in FIG. 11, an embodiment of the augmented reality device 400 of the present invention includes a visual odometer 410, which may be a visual odometer provided by any of the embodiments of the visual odometer of the present invention.
如图12所示,本发明虚拟现实设备500一实施例包括视觉里程计510,视觉里程计510可以为本发明视觉里程计任一实施例所提供的视觉里程计。As shown in FIG. 12, an embodiment of the virtual reality device 500 of the present invention includes a visual odometer 510, which may be a visual odometer provided by any of the embodiments of the visual odometer of the present invention.
如图13所示,本发明可读存储介质第一实施例包括存储器610,存储器610存储有指令,该指令被执行时实现本发明视觉里程计实现方法任一实施例以及不冲突的组合所提供的方法。As shown in FIG. 13, a first embodiment of a readable storage medium of the present invention includes a memory 610 that stores instructions that, when executed, implement any of the embodiments of the visual odometer implementation method of the present invention and a non-conflicting combination Methods.
存储器610可以包括只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、闪存(Flash Memory)、硬盘、光盘等。The memory 610 may include a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory, a hard disk, an optical disk, and the like.
在本发明所提供的几个实施例中,应该理解到,所揭露的方法和装置,可以通过其它的方式实现。例如,以上所描述的装置实施方式仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided by the present invention, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the device implementations described above are merely illustrative. For example, the division of the modules or units is only a logical function division. In actual implementation, there may be another division manner, for example, multiple units or components may be used. Combinations can be integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施方式方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理包括,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may be physically included separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器(processor)执行本发明各个实施方式所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the various embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
以上所述仅为本发明的实施方式,并非因此限制本发明的专利范围,凡是利用本发明说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本发明的专利保护范围内。The above is only an embodiment of the present invention and is not intended to limit the patent scope of the invention; any equivalent structural or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application thereof in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (23)

  1. 一种视觉里程计实现方法,其特征在于,包括:A visual odometer implementation method, comprising:
    获取拍摄设备在第一时刻采集的第一图像和在第二时刻采集的第二图像,所述第一时刻在所述第二时刻之前;Obtaining a first image acquired by the photographing device at a first moment and a second image acquired at a second moment, the first moment being before the second moment;
    基于所述拍摄设备在第一时刻的位姿信息估计所述拍摄设备在第二时刻的位姿估计信息;Estimating the pose estimation information of the photographing device at the second moment based on the pose information of the photographing device at the first moment;
    基于所述第一时刻的位姿信息和所述第二时刻的位姿估计信息对所述第一图像和所述第二图像中的至少一个进行投影变换,得到同一坐标系下的两个图像;Performing projection transformation on at least one of the first image and the second image based on the pose information of the first moment and the pose estimation information of the second moment to obtain two images in the same coordinate system ;
    在所述同一坐标系下的两个图像中进行特征点匹配;Feature point matching in two images in the same coordinate system;
    基于所述两个图像中匹配的特征点计算所述拍摄设备在所述第二时刻的位姿信息。The pose information of the photographing device at the second moment is calculated based on the matched feature points in the two images.
  2. 根据权利要求1所述的方法,其特征在于,The method of claim 1 wherein
    所述第一图像所在的坐标系为第一坐标系,所述第二图像估计所在的坐标系为第二坐标系,所述第一坐标系对应所述第一时刻的位姿信息,所述第二坐标系对应所述第二时刻的位姿估计信息;The coordinate system in which the first image is located is a first coordinate system, the coordinate system in which the second image is estimated is a second coordinate system, and the first coordinate system corresponds to the pose information of the first time, The second coordinate system corresponds to the pose estimation information of the second moment;
    所述基于所述第一时刻的位姿信息和所述第二时刻的位姿估计信息对所述第一图像和所述第二图像中的至少一个进行投影变换包括:The projecting transforming the at least one of the first image and the second image based on the pose information of the first moment and the pose estimation information of the second moment comprises:
    对部分或完整的所述第一图像进行第一投影变换得到第三图像,所述第三图像所在的坐标系为所述第二坐标系,所述同一坐标系下的两个图像为所述第三图像以及部分或完整的所述第二图像;或 Performing a first projection transformation on the partial or complete first image to obtain a third image, where the coordinate system of the third image is the second coordinate system, and two images in the same coordinate system are a third image and a partial or complete second image; or
    对部分或完整的所述第二图像进行第二投影变换得到第四图像,所述第四图像所在的坐标系为所述第一坐标系,所述同一坐标系下的两个图像为所述第四图像以及部分或完整的所述第一图像;或Performing a second projection transformation on the partial or complete second image to obtain a fourth image, where the coordinate system of the fourth image is the first coordinate system, and two images in the same coordinate system are a fourth image and the partial or complete first image; or
    对部分或完整的所述第一图像进行第三投影变换得到第五图像,并对部分或完整的所述第二图像进行第四投影变换得到第六图像,所述第五图像与所述第六图像所在的坐标系均为第三坐标系,所述第三坐标系与所述第一坐标系及所述第二坐标系不同,所述同一坐标系下的两个图像为所述第五图像以及所述第六图像。Performing a third projection transformation on the partial or complete first image to obtain a fifth image, and performing fourth projection transformation on the partial or complete second image to obtain a sixth image, the fifth image and the first image The coordinate system in which the six images are located is a third coordinate system, the third coordinate system is different from the first coordinate system and the second coordinate system, and the two images in the same coordinate system are the fifth coordinate system. An image and the sixth image.
  3. The method according to claim 2, wherein:
    the third coordinate system is the corrected first coordinate system, the third projection transformation is a correction transformation, and the fourth projection transformation comprises a motion compensation transformation and the correction transformation; or
    the third coordinate system is the corrected second coordinate system, the third projection transformation comprises an inverse motion compensation transformation and the correction transformation, and the fourth projection transformation is the correction transformation; or
    the third coordinate system is the coordinate system in which the feature point matching was performed on the first image after the previous projection transformation.
  4. The method according to claim 3, wherein:
    the parameters of the correction transformation are obtained from the distortion information of the photographing device.
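As one hedged illustration of deriving a correction transformation from distortion information — the patent does not fix a distortion model — a two-coefficient radial (Brown) model can be inverted by fixed-point iteration on a normalized image coordinate; `k1`, `k2`, and the iteration count are assumptions:

```python
def undistort_point(xd, yd, k1, k2, iters=20):
    # Invert x_d = x * (1 + k1*r^2 + k2*r^4) by fixed-point iteration,
    # starting from the distorted coordinate itself; converges quickly
    # for mild distortion coefficients.
    x, y = xd, yd
    for _ in range(iters):
        r2 = x * x + y * y
        s = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / s, yd / s
    return x, y
```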
  5. The method according to claim 3, wherein:
    the fifth image and the sixth image have the same pixel coordinates;
    the computation of the motion compensation transformation comprises: taking the coordinates of each of the pixels in the fifth image as the result of the motion compensation transformation, back-computing the coordinates before the motion compensation transformation according to the parameters of the motion compensation transformation, and computing the grayscale value of the coordinates before the motion compensation transformation in the correction-transformed second image as the grayscale value of the corresponding pixel of the sixth image.
  6. The method according to claim 5, wherein:
    if the coordinates before the motion compensation transformation are not integers, the grayscale value of the corresponding pixel of the sixth image is the result of linearly interpolating the grayscale values of the adjacent pixels along each coordinate dimension in which the coordinate is not an integer;
    if the coordinates before the motion compensation transformation fall outside the image boundary, the grayscale value of the corresponding pixel of the sixth image is 0 or a random value drawn from a Gaussian or uniform distribution.
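Claims 5 and 6 together describe an inverse-mapping warp: each destination pixel is mapped backward through the transform, sampled with linear interpolation, and out-of-range samples receive a fill value. A minimal NumPy sketch under the assumption that the transform is an arbitrary 3×3 projective matrix (the claims fix only the inverse-mapping structure, not the transform's form); bilinear sampling reduces to the claim's per-dimension linear interpolation:

```python
import numpy as np

def inverse_warp(src, H_inv, out_shape, fill=0.0):
    # For every output pixel, invert the transform to find the source
    # coordinate, then bilinearly interpolate the source grayscale value.
    # Out-of-bounds pixels keep `fill` (claim 6 also allows Gaussian- or
    # uniform-distributed random values instead of 0).
    h, w = out_shape
    out = np.full((h, w), fill, dtype=np.float64)
    H_src, W_src = src.shape
    for v in range(h):
        for u in range(w):
            p = H_inv @ np.array([u, v, 1.0])
            x, y = p[0] / p[2], p[1] / p[2]
            if not (0 <= x <= W_src - 1 and 0 <= y <= H_src - 1):
                continue  # outside the source image: keep the fill value
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            x1, y1 = min(x0 + 1, W_src - 1), min(y0 + 1, H_src - 1)
            dx, dy = x - x0, y - y0
            top = (1 - dx) * src[y0, x0] + dx * src[y0, x1]
            bot = (1 - dx) * src[y1, x0] + dx * src[y1, x1]
            out[v, u] = (1 - dy) * top + dy * bot
    return out
```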
  7. The method according to claim 3, wherein:
    the fifth image and the sixth image have the same pixel coordinates;
    the computation of the inverse motion compensation transformation comprises: taking the coordinates of each of the pixels in the sixth image as the result of the inverse motion compensation transformation, back-computing the coordinates before the inverse motion compensation transformation according to the parameters of the inverse motion compensation transformation, and computing the grayscale value of the coordinates before the inverse motion compensation transformation in the correction-transformed first image as the grayscale value of the corresponding pixel of the fifth image.
  8. The method according to claim 7, wherein:
    if the coordinates before the inverse motion compensation transformation are not integers, the grayscale value of the corresponding pixel of the fifth image is the result of linearly interpolating the grayscale values of the adjacent pixels along each coordinate dimension in which the coordinate is not an integer;
    if the coordinates before the inverse motion compensation transformation fall outside the image boundary, the grayscale value of the corresponding pixel of the fifth image is 0 or a random value drawn from a Gaussian or uniform distribution.
  9. The method according to claim 3, wherein:
    the motion compensation transformation and the inverse motion compensation transformation are inverses of each other, and the parameters of both are computed from the pose information at the first moment and the pose estimation information at the second moment.
  10. The method according to any one of claims 1-9, wherein:
    the estimating the pose estimation information of the photographing device at the second moment based on the pose information of the photographing device at the first moment comprises:
    performing motion prediction using the pose information at the first moment and the inertial navigation data at the second moment to estimate the pose estimation information at the second moment.
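The motion-prediction step of claim 10 can be illustrated with a constant-acceleration dead-reckoning update. The state layout (3-D position and velocity plus a single yaw angle) and the small-angle 1-D attitude update are simplifying assumptions for illustration, not the patent's formulation:

```python
import numpy as np

def predict_pose(p1, v1, yaw1, accel, gyro_z, dt):
    # Propagate the pose at the first moment forward by dt using the
    # inertial measurements sampled at the second moment.
    p2 = p1 + v1 * dt + 0.5 * accel * dt ** 2   # position update
    v2 = v1 + accel * dt                        # velocity update
    yaw2 = yaw1 + gyro_z * dt                   # 1-D attitude update
    return p2, v2, yaw2
```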
  11. The method according to claim 10, wherein before the performing projection transformation on at least one of the first image and the second image based on the pose information at the first moment and the pose estimation information at the second moment, the method further comprises:
    estimating a rotation angle between the first image and the second image using the inertial navigation data at the second moment; and
    performing the projection transformation when the rotation angle is within a preset range.
  12. The method according to claim 11, wherein after the estimating the rotation angle between the first image and the second image using the inertial navigation data at the second moment, the method further comprises:
    not performing the projection transformation when the rotation angle is not within the preset range.
  13. The method according to claim 11, wherein:
    the upper limit of the preset range is determined by the field of view of the photographing device.
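Claims 11-13 gate the warp on the inter-frame rotation, with the range's upper bound tied to the field of view. A hedged sketch: the simple gyro integration and the `margin` fraction of the FOV are assumptions introduced here, not values fixed by the claims:

```python
import numpy as np

def should_warp(gyro_rates, dt, fov_deg, margin=0.5):
    # Integrate the angular rate over the frame interval (small-angle
    # approximation) and compare against a FOV-derived upper bound.
    angle = np.linalg.norm(np.asarray(gyro_rates, dtype=float)) * dt
    return bool(angle <= np.deg2rad(fov_deg) * margin)
```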
  14. The method according to claim 11, wherein the field of view of the photographing device is smaller than a preset threshold, and after the estimating the rotation angle between the first image and the second image using the inertial navigation data at the second moment, the method further comprises:
    performing the projection transformation when the rotation angle belongs to the yaw angle.
  15. The method according to claim 11, wherein the field of view of the photographing device is smaller than a preset threshold, and after the estimating the rotation angle between the first image and the second image using the inertial navigation data at the second moment, the method further comprises:
    not performing the projection transformation when the rotation angle does not belong to the yaw angle.
  16. The method according to any one of claims 1-15, wherein the calculating the pose information of the photographing device at the second moment based on the matched feature points in the two images comprises:
    performing back projection transformation on at least one of the coordinates of the matched feature points in the two images to obtain matched corresponding points in the first image and the second image;
    calculating inter-frame motion information based on the matched corresponding points in the first image and the second image; and
    obtaining the pose information at the second moment using the pose information at the first moment and the inter-frame motion information.
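Claim 16 recovers inter-frame motion from the matched corresponding points and then chains it onto the first moment's pose. The claim does not fix a solver; as one common stand-in, a least-squares 2-D rotation-plus-translation fit (Kabsch/Procrustes) over the matches:

```python
import numpy as np

def rigid_2d_from_matches(pts1, pts2):
    # Least-squares R, t with pts2 ≈ R @ pts1 + t, via Kabsch on the
    # centred point sets; a stand-in for the inter-frame motion step.
    c1, c2 = pts1.mean(axis=0), pts2.mean(axis=0)
    H = (pts1 - c1).T @ (pts2 - c2)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c2 - R @ c1
    return R, t
```

The recovered motion would then be composed with the first moment's pose (the claim's last step), e.g. as homogeneous transforms: `T_second = T_first @ T_inter`.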
  17. A visual odometer, comprising:
    a photographing device and at least one processor working alone or in cooperation, the photographing device being connected to the processor;
    wherein the processor is configured to execute instructions to implement the method according to any one of claims 1-16.
  18. The visual odometer according to claim 17, wherein:
    the processor comprises a dedicated acceleration circuit configured to perform the projection transformation on at least one of the first image and the second image.
  19. An unmanned device, comprising the visual odometer according to claim 17 or 18.
  20. The unmanned device according to claim 19, wherein the unmanned device is an unmanned aerial vehicle.
  21. An augmented reality device, comprising the visual odometer according to claim 17 or 18.
  22. A virtual reality device, comprising the visual odometer according to claim 17 or 18.
  23. A readable storage medium storing instructions, wherein the instructions, when executed, implement the method according to any one of claims 1-16.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780087613.5A CN110520694A (en) 2017-10-31 2017-10-31 A kind of visual odometry and its implementation
PCT/CN2017/108684 WO2019084804A1 (en) 2017-10-31 2017-10-31 Visual odometry and implementation method therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/108684 WO2019084804A1 (en) 2017-10-31 2017-10-31 Visual odometry and implementation method therefor

Publications (1)

Publication Number Publication Date
WO2019084804A1 true WO2019084804A1 (en) 2019-05-09

Family

ID=66332810

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/108684 WO2019084804A1 (en) 2017-10-31 2017-10-31 Visual odometry and implementation method therefor

Country Status (2)

Country Link
CN (1) CN110520694A (en)
WO (1) WO2019084804A1 (en)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291654B (en) * 2020-01-21 2023-10-17 上海肇观电子科技有限公司 Discharge detection method and device for conveyor belt, circuit and medium
CN111220148A (en) * 2020-01-21 2020-06-02 珊口(深圳)智能科技有限公司 Mobile robot positioning method, system and device and mobile robot
CN111721318B (en) * 2020-05-26 2022-03-25 南京航空航天大学 Template matching visual odometer based on self-adaptive search area
CN111798489B (en) * 2020-06-29 2024-03-08 北京三快在线科技有限公司 Feature point tracking method, device, medium and unmanned equipment
CN111860224A (en) * 2020-06-30 2020-10-30 北京百度网讯科技有限公司 Image processing method and device, electronic equipment and computer readable storage medium
CN112066988B (en) * 2020-08-17 2022-07-26 联想(北京)有限公司 Positioning method and positioning equipment
CN113096151B (en) * 2021-04-07 2022-08-09 地平线征程(杭州)人工智能科技有限公司 Method and apparatus for detecting motion information of object, device and medium
CN114323010B (en) * 2021-12-30 2024-03-01 北京达佳互联信息技术有限公司 Initial feature determination method, device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080167814A1 (en) * 2006-12-01 2008-07-10 Supun Samarasekera Unified framework for precise vision-aided navigation
US20090263009A1 (en) * 2008-04-22 2009-10-22 Honeywell International Inc. Method and system for real-time visual odometry
CN102519481A (en) * 2011-12-29 2012-06-27 中国科学院自动化研究所 Implementation method of binocular vision speedometer
CN102538781A (en) * 2011-12-14 2012-07-04 浙江大学 Machine vision and inertial navigation fusion-based mobile robot motion attitude estimation method
CN105513083A (en) * 2015-12-31 2016-04-20 新浪网技术(中国)有限公司 PTAM camera tracking method and device
CN105976402A (en) * 2016-05-26 2016-09-28 同济大学 Real scale obtaining method of monocular vision odometer

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000134537A (en) * 1998-10-28 2000-05-12 Ricoh Co Ltd Image input device and its method
JP4142460B2 (en) * 2003-01-31 2008-09-03 オリンパス株式会社 Motion detection device
JP2014089113A (en) * 2012-10-30 2014-05-15 Yamaha Corp Posture estimation device and program
CN103900583B (en) * 2012-12-25 2018-02-27 联想(北京)有限公司 For positioning the apparatus and method with map structuring immediately
CN103150728A (en) * 2013-03-04 2013-06-12 北京邮电大学 Vision positioning method in dynamic environment
US9177362B2 (en) * 2013-08-02 2015-11-03 Facebook, Inc. Systems and methods for transforming an image
CN103528571B (en) * 2013-10-12 2016-04-06 上海新跃仪表厂 Single eye stereo vision relative pose measuring method
CN104374395A (en) * 2014-03-31 2015-02-25 南京邮电大学 Graph-based vision SLAM (simultaneous localization and mapping) method
CN103900473A (en) * 2014-03-31 2014-07-02 浙江大学 Intelligent mobile device six-degree-of-freedom fused pose estimation method based on camera and gravity inductor
JP6464934B2 (en) * 2015-06-11 2019-02-06 富士通株式会社 Camera posture estimation apparatus, camera posture estimation method, and camera posture estimation program
CN105045263B (en) * 2015-07-06 2016-05-18 杭州南江机器人股份有限公司 A kind of robot method for self-locating based on Kinect depth camera
JP6338021B2 (en) * 2015-07-31 2018-06-06 富士通株式会社 Image processing apparatus, image processing method, and image processing program
CN105469405B (en) * 2015-11-26 2018-08-03 清华大学 Positioning and map constructing method while view-based access control model ranging
CN105953796A (en) * 2016-05-23 2016-09-21 北京暴风魔镜科技有限公司 Stable motion tracking method and stable motion tracking device based on integration of simple camera and IMU (inertial measurement unit) of smart cellphone
CN105931275A (en) * 2016-05-23 2016-09-07 北京暴风魔镜科技有限公司 Monocular and IMU fused stable motion tracking method and device based on mobile terminal
CN105976403B (en) * 2016-07-25 2018-09-21 中国电子科技集团公司第二十八研究所 A kind of IR imaging target tracking method based on the drift of kernel function barycenter
CN106251305B (en) * 2016-07-29 2019-04-30 长春理工大学 A kind of realtime electronic image stabilizing method based on Inertial Measurement Unit IMU
CN107025668B (en) * 2017-03-30 2020-08-18 华南理工大学 Design method of visual odometer based on depth camera
CN106969763B (en) * 2017-04-07 2021-01-01 百度在线网络技术(北京)有限公司 Method and apparatus for determining yaw angle of unmanned vehicle
CN107230225B (en) * 2017-04-25 2020-06-09 华为技术有限公司 Method and apparatus for three-dimensional reconstruction
CN107167139A (en) * 2017-05-24 2017-09-15 广东工业大学 A kind of Intelligent Mobile Robot vision positioning air navigation aid and system


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111951201A (en) * 2019-05-16 2020-11-17 杭州海康机器人技术有限公司 Unmanned aerial vehicle aerial image splicing method and device and storage medium
CN111951201B (en) * 2019-05-16 2024-01-23 杭州海康威视数字技术股份有限公司 Unmanned aerial vehicle aerial image splicing method, device and storage medium
CN112146578A (en) * 2019-06-28 2020-12-29 顺丰科技有限公司 Scale ratio calculation method, device, equipment and storage medium
CN111950599A (en) * 2020-07-20 2020-11-17 重庆邮电大学 Dense visual odometer method for fusing edge information in dynamic environment
CN111950599B (en) * 2020-07-20 2022-07-01 重庆邮电大学 Dense visual odometer method for fusing edge information in dynamic environment
CN112102403A (en) * 2020-08-11 2020-12-18 国网安徽省电力有限公司淮南供电公司 High-precision positioning method and system for autonomous inspection unmanned aerial vehicle in power transmission tower scene
CN112102403B (en) * 2020-08-11 2022-11-25 国网安徽省电力有限公司淮南供电公司 High-precision positioning method and system for autonomous inspection unmanned aerial vehicle in power transmission tower scene
CN113326856A (en) * 2021-08-03 2021-08-31 电子科技大学 Self-adaptive two-stage feature point matching method based on matching difficulty
CN113326856B (en) * 2021-08-03 2021-12-03 电子科技大学 Self-adaptive two-stage feature point matching method based on matching difficulty
WO2023015566A1 (en) * 2021-08-13 2023-02-16 深圳市大疆创新科技有限公司 Control method, control device, movable platform, and storage medium

Also Published As

Publication number Publication date
CN110520694A (en) 2019-11-29

Similar Documents

Publication Publication Date Title
WO2019084804A1 (en) Visual odometry and implementation method therefor
US10659768B2 (en) System and method for virtually-augmented visual simultaneous localization and mapping
CN107747941B (en) Binocular vision positioning method, device and system
JP6534664B2 (en) Method for camera motion estimation and correction
CN111210463B (en) Virtual wide-view visual odometer method and system based on feature point auxiliary matching
WO2020014909A1 (en) Photographing method and device and unmanned aerial vehicle
JP2018522348A (en) Method and system for estimating the three-dimensional posture of a sensor
WO2021043213A1 (en) Calibration method, device, aerial photography device, and storage medium
CN106873619B (en) Processing method of flight path of unmanned aerial vehicle
WO2020113423A1 (en) Target scene three-dimensional reconstruction method and system, and unmanned aerial vehicle
WO2021027323A1 (en) Hybrid image stabilization method and device based on bionic eye platform
CN107577451B (en) Multi-Kinect human body skeleton coordinate transformation method, processing equipment and readable storage medium
EP3296952B1 (en) Method and device for blurring a virtual object in a video
US20090141043A1 (en) Image mosaicing apparatus for mitigating curling effect
WO2019191288A1 (en) Direct sparse visual-inertial odometry using dynamic marginalization
GB2580691A (en) Depth estimation
WO2022083038A1 (en) Visual positioning method and related apparatus, device and computer-readable storage medium
CN111801198A (en) Hand-eye calibration method, system and computer storage medium
JP7345664B2 (en) Image processing system and method for landmark position estimation with uncertainty
WO2020063058A1 (en) Calibration method for multi-degree-of-freedom movable vision system
JP2019032218A (en) Location information recording method and device
WO2019183789A1 (en) Method and apparatus for controlling unmanned aerial vehicle, and unmanned aerial vehicle
WO2020135447A1 (en) Target distance estimation method and device, and unmanned aerial vehicle
CN116052046A (en) Dynamic scene binocular vision SLAM method based on target tracking
CN110060295B (en) Target positioning method and device, control device, following equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17930584

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17930584

Country of ref document: EP

Kind code of ref document: A1