WO2020038386A1 - Determination of scale factor in monocular vision-based reconstruction - Google Patents

Determination of scale factor in monocular vision-based reconstruction

Info

Publication number
WO2020038386A1
WO2020038386A1 (PCT/CN2019/101704; CN2019101704W)
Authority
WO
WIPO (PCT)
Prior art keywords
specified
monocular camera
moment
designated
pose
Prior art date
Application number
PCT/CN2019/101704
Other languages
French (fr)
Chinese (zh)
Inventor
沈冰伟
朱建华
蒋腻聪
郭斌
Original Assignee
杭州萤石软件有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州萤石软件有限公司
Publication of WO2020038386A1

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/50: Depth or shape recovery
    • G06T7/55: Depth or shape recovery from multiple images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30248: Vehicle exterior or interior
    • G06T2207/30252: Vehicle exterior; Vicinity of vehicle

Definitions

  • the present application relates to the field of mobile robot technology, and in particular, to a method for determining a scale factor in monocular vision reconstruction and a mobile robot.
  • the simultaneous positioning and map construction algorithms based on monocular vision have become the focus of current mobile robot research.
  • the traditional simultaneous positioning and map construction methods based on monocular vision can mostly only achieve 3D reconstruction up to a projective or affine scale; that is, there is a scale factor between the reconstructed scene and the real-world scene.
  • the scale factor is the ratio of the real world map scale to the constructed map scale. Therefore, if the scale factor can be determined when the mobile robot is initialized, the actual rotation and translation of the monocular camera in the real world can be calculated based on the projection model, and a map with the same scale as the real world can be constructed.
  • the present application provides a method for determining a scale factor in monocular vision reconstruction and a mobile robot.
  • a first aspect of the present application provides a method for determining a scale factor in monocular vision reconstruction.
  • the method is applied to a mobile robot, and the method includes:
  • the ratio of the modulus of the actual translation vector to the modulus of the normalized translation vector is determined as a scale factor in the monocular vision reconstruction of the device.
  • a second aspect of the present application provides a mobile robot, which includes a monocular camera and a processor; wherein,
  • the monocular camera is configured to acquire a first image of a designated object at a first moment and a second image of the designated object at a second moment;
  • the processor is configured to:
  • the ratio of the modulus of the actual translation vector to the modulus of the normalized translation vector is determined as a scale factor in the monocular vision reconstruction of the device.
  • a third aspect of the present application provides a computer-readable storage medium on which a computer program is stored, and the program is executed by a processor to implement the steps of any of the methods provided in the first aspect of the present application.
  • in the method for determining the scale factor in monocular vision reconstruction and the mobile robot provided in this application, the position of the designated object is fixed. Therefore, even when the mobile robot slips or gets stuck, the actual translation vector of the monocular camera in the real world from the first moment to the second moment can be calculated relatively accurately from the first pose of the designated object relative to the monocular camera at the first moment and the second pose of the designated object relative to the monocular camera at the second moment. The method provided in this application therefore does not suffer from an inaccurate scale factor caused by the mobile robot slipping, getting stuck, and the like.
  • FIG. 1 is a flowchart of Embodiment 1 of a method for determining a scale factor in monocular vision reconstruction provided by the present application.
  • Fig. 2 is a schematic diagram of a monocular camera acquiring an image of a specified object according to an exemplary embodiment of the present application.
  • Fig. 3 is a flowchart of calculating a pose of a specified object relative to a monocular camera according to an exemplary embodiment of the present application.
  • FIG. 4 is a hardware structural diagram of a first embodiment of a mobile robot provided in this application.
  • although the terms first, second, third, etc. may be used in this application to describe various information, such information should not be limited to these terms. These terms are only used to distinguish the same type of information from each other.
  • for example, without departing from the scope of the present application, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information.
  • depending on the context, the word "if" as used herein can be interpreted as "when", "while", or "in response to determining".
  • a related method for determining the scale factor in monocular vision reconstruction uses two adjacent frames of images collected by a monocular camera and epipolar geometry to calculate the normalized translation vector of the monocular camera between the two frames; it then uses code disk (wheel encoder) data and IMU (inertial measurement unit) data to calculate the actual translation vector of the monocular camera in the real world between the two frames, and finally uses the normalized translation vector and the actual translation vector to obtain the scale factor in monocular vision reconstruction.
  • however, when the mobile robot slips or gets stuck, the code disk count is inconsistent with the actual motion. In this case, the actual translation vector calculated from the code disk data combined with the IMU data is inaccurate, and the scale factor calculated based on that actual translation vector is also inaccurate.
  • the present application provides a method for determining a scale factor in a monocular vision reconstruction and a mobile robot, so as to solve the problem that the determined scale factor is inaccurate due to the slipping, jamming, etc. of the mobile robot in the existing method.
  • the method provided by this embodiment can be applied to a mobile robot.
  • it can be applied to a cleaning robot.
  • FIG. 1 is a flowchart of Embodiment 1 of a method for determining a scale factor in monocular vision reconstruction provided by the present application.
  • the method provided in this embodiment may include:
  • the mobile robot is provided with a monocular camera, and images can be collected by the monocular camera.
  • the designated object may be a charging device for charging the mobile robot.
  • the mobile robot may obtain a first image of the designated object at the first moment through a monocular camera.
  • for example, suppose there are two adjacent sampling times: a first time t1 and a second time t2. The mobile robot can obtain, through the monocular camera, the first image F1 of the specified object at the first time t1 and the second image F2 of the specified object at the second time t2.
  • the mobile robot is at different positions at the first time t1 and the second time t2, that is, the monocular camera is at different shooting positions at the first time t1 and the second time t2.
  • Fig. 2 is a schematic diagram of a monocular camera acquiring an image of a specified object according to an exemplary embodiment of the present application. Please refer to FIG. 2.
  • the designated object is a charging device for charging the mobile robot.
  • the monocular camera 110 is at different shooting positions at a first time t1 and a second time t2.
  • the mobile robot may turn to the charging device after detecting that it is disconnected from the charging device 200, and then photograph the charging device 200 with the monocular camera at a position different from the previous one.
  • in this way, a first image of the charging device 200 at the first time t1 can be obtained, corresponding to the first shooting position, and a second image of the charging device 200 at the second time t2, corresponding to the second shooting position.
  • S102 Perform feature point extraction and matching on the first image and the second image, and calculate a normalized translation vector of the monocular camera from the first time to the second time according to the paired feature points.
  • specifically, the pixel coordinates of the matched feature points in the first image and the second image may be used to calculate, based on the epipolar constraint, the normalized translation vector of the monocular camera from the first time to the second time, that is, between the first shooting position and the second shooting position. For example, the calculation may be carried out using eight pairs of matched feature points.
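The patent does not spell out how feature points are extracted and matched; in practice this is commonly done with binary or float descriptors (e.g. ORB or SIFT) and a nearest-neighbour search. As a self-contained illustration only (not the patent's implementation), a brute-force matcher over generic descriptor vectors with Lowe's ratio test might look like:

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """Brute-force nearest-neighbour matching with Lowe's ratio test.
    d1: (N1, D) descriptors from the first image; d2: (N2, D) from the second.
    Returns a list of (i, j) index pairs of putative matches."""
    # Pairwise Euclidean distances between all descriptor pairs
    dists = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        j1, j2 = np.argsort(row)[:2]          # nearest and second-nearest
        if row[j1] < ratio * row[j2]:         # ratio test rejects ambiguous matches
            matches.append((i, j1))
    return matches
```

The resulting pixel-coordinate pairs of the matched feature points are what feed the epipolar-constraint computation below.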
  • the epipolar constraint can be expressed by the following formula: p₂ᵀ K⁻ᵀ [t_ep]× R_ep K⁻¹ p₁ = 0, where [t_ep]× denotes the skew-symmetric matrix of t_ep, and where:
  • K is the internal parameter matrix of the monocular camera
  • p 1 and p 2 are the pixel homogeneous coordinates of the paired feature points on the first image and the second image, respectively
  • R_ep is the rotation change amount of the monocular camera from the first time t1 to the second time t2
  • t ep is the normalized translation vector of the monocular camera from the first time t1 to the second time t2.
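The patent's epipolar-constraint formula appears as an image in the original, so the exact derivation is not reproducible here. As a hedged numpy sketch of the standard eight-point computation the text describes (symbols K, p1, p2, t_ep follow the definitions above; this is an illustration, not the patent's exact algorithm):

```python
import numpy as np

def normalized_translation(p1, p2, K):
    """Estimate the essential matrix from >= 8 matched pixel points via the
    linear eight-point algorithm and recover the unit-norm translation t_ep.
    p1, p2: (N, 2) pixel coordinates of matched points; K: 3x3 intrinsics."""
    Kinv = np.linalg.inv(K)
    # Back-project pixels to normalized image coordinates x = K^-1 p
    x1 = (Kinv @ np.column_stack([p1, np.ones(len(p1))]).T).T
    x2 = (Kinv @ np.column_stack([p2, np.ones(len(p2))]).T).T
    # Each match gives one row of the linear system A e = 0 from x2^T E x1 = 0
    A = np.stack([np.kron(b, a) for a, b in zip(x1, x2)])
    E = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # Project onto the essential-matrix manifold (singular values 1, 1, 0)
    U, _, Vt = np.linalg.svd(E)
    E = U @ np.diag([1.0, 1.0, 0.0]) @ Vt
    # Since E = [t]_x R, t spans the left null space of E: the last column of U
    t_ep = U[:, 2]
    return t_ep / np.linalg.norm(t_ep)
```

Note that t_ep is recovered only up to sign and scale; resolving the scale is exactly the purpose of the method described in this patent.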
  • FIG. 3 is a flowchart of calculating a pose of a specified object relative to a monocular camera according to an exemplary embodiment of the present application.
  • calculating the pose of the specified object relative to the monocular camera may include:
  • the specified object may be identified from the image based on the attribute information of the specified object, and then based on the identified specified object, the pixel coordinates of the specified point on the specified object may be obtained from the image.
  • attribute information of the designated object may include material attributes, color attributes, shape attributes, and the like. In this embodiment, this is not limited.
  • the designated object may be a charging device for charging the mobile robot.
  • the charging device is provided with a marker.
  • the marker may be a marker composed of several marker blocks of a specific material, a specific color, a specific shape, a specified number, and / or a specified content.
  • the marker may be a designated shape marker made of a specific material.
  • for example, when the monocular camera is an infrared camera, the marker may be composed of a specified number of blocks made of highly reflective material; for another example, when the monocular camera is an RGB camera, the marker may be composed of a specified number of black-and-white printed checkerboard blocks.
  • the specific setting form of the marker is not limited.
  • the marker on the charging device can reflect the attribute information of the charging device, and the charging device in the image can be identified based on the marker of the charging device.
  • the specific implementation principle and implementation process of identifying the specified object in the image based on the attribute information of the specified object refer to the description in the related technology, and details are not described herein again.
  • the designated point on the designated object may be set according to actual needs, for example, the designated point may be a corner point, a center point, etc. of the marker.
  • the specific position of the designated point is not limited. It should be noted that the number of the designated points is greater than or equal to four.
  • in the example shown in FIG. 2, the marker 210 on the charging device 200 is composed of four marker blocks 1, 2, 3, and 4, and the designated points on the charging device 200 are the center points of the marker blocks. In this case, the four marker blocks 1, 2, 3, and 4 can be identified from the image based on attribute information such as the material, color, and shape of the marker blocks and the distances between them, and the pixel coordinates of the center point of each marker block can then be obtained. In this way, the pixel coordinates of the specified points on the specified object are obtained.
  • the center point of each marked block is sequentially recorded as Bi, where i is equal to 1 to 4.
  • the pixel coordinates of the center point Bi of the i-th labeled block are labeled (u i , v i ).
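The patent defers marker detection to related technology. For the infrared-camera case mentioned above, where highly reflective blocks appear as bright blobs, one simple approach (an illustrative sketch under that assumption, not the patent's method) is to threshold the image and take the centroid of each connected component as a block's center point (u_i, v_i):

```python
import numpy as np
from collections import deque

def blob_centers(img, thresh=128):
    """Centres (u, v) of bright blobs in a grayscale image, found by
    thresholding and 4-connected component labelling via BFS."""
    mask = img > thresh
    seen = np.zeros_like(mask)
    centers = []
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        q, pix = deque([(y, x)]), []
        seen[y, x] = True
        while q:
            cy, cx = q.popleft()
            pix.append((cy, cx))
            for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not seen[ny, nx]):
                    seen[ny, nx] = True
                    q.append((ny, nx))
        ys, xs = zip(*pix)
        centers.append((float(np.mean(xs)), float(np.mean(ys))))  # (u, v)
    return centers
```

A real system would additionally filter the components by size, shape, and mutual distance to match the known marker layout before accepting them as the four designated points.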
  • a distortion correction algorithm is used to calculate the first coordinates of each of the specified points after distortion correction.
  • the distortion correction algorithm is expressed by the following formula (the standard radial-tangential model): with [x_d, y_d, 1]ᵀ = K⁻¹ [u_i, v_i, 1]ᵀ and r² = x_i² + y_i², x_d = x_i (1 + k₁r² + k₂r⁴ + k₃r⁶) + 2p₁x_iy_i + p₂(r² + 2x_i²) and y_d = y_i (1 + k₁r² + k₂r⁴ + k₃r⁶) + p₁(r² + 2y_i²) + 2p₂x_iy_i, where:
  • K is the internal parameter matrix of the monocular camera
  • k 1 , k 2 , k 3 , p 1 , p 2 are distortion parameters of the monocular camera
  • (x i , y i ) is the first coordinate after distortion correction of the i-th designated point.
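The correction formula itself is an image in the original; the symbols above are consistent with the standard Brown radial-tangential model. Under that assumption, a minimal fixed-point undistortion sketch (an illustration, not necessarily the patent's exact algorithm) is:

```python
import numpy as np

def undistort_point(u, v, K, k1, k2, k3, p1, p2, iters=10):
    """Iteratively removes radial/tangential (Brown-model) distortion from
    pixel (u, v), returning the corrected normalized coordinates (x, y)."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    xd, yd = (u - cx) / fx, (v - cy) / fy   # distorted normalized coords
    x, y = xd, yd                           # initial guess
    for _ in range(iters):
        r2 = x * x + y * y
        radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
        dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
        dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
        # Invert xd = x*radial + dx, yd = y*radial + dy by fixed-point iteration
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y
```

For typical small distortion coefficients the iteration converges in a handful of steps; library routines such as OpenCV's undistortPoints use the same iterative scheme.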
  • the designated coordinate system is an absolute coordinate system.
  • the designated coordinate system is a coordinate system marked on the charging device. That is, in the example shown in FIG. 2, the origin of the designated coordinate system is the center point of the charging device, the X-axis is horizontal to the right, and the Y-axis is perpendicular to the X-axis downward.
  • the specific implementation process of this step may include:
  • the first formula is:
  • the designated point includes a reference designated point and a target designated point
  • A is a matrix formed by the differences between the second coordinate of each target designated point in the designated coordinate system and the second coordinate of the reference designated point in the designated coordinate system; the X is a vector formed by the differences between the X coordinate in the distortion-corrected first coordinate of each target designated point and the X coordinate in the distortion-corrected first coordinate of the reference designated point; and the Y is a vector formed by the differences between the Y coordinate in the distortion-corrected first coordinate of each target designated point and the Y coordinate in the distortion-corrected first coordinate of the reference designated point.
  • reference designated point may be any designated point.
  • in the following, the first designated point is taken as the reference designated point for description.
  • the second coordinate of the i-th designated point in the designated coordinate system is (a i , b i , 0).
  • each of the first vector i, the second vector j ′, the third vector k ′, and the first coefficient z includes three elements.
  • the rotation matrix of the specified object relative to the monocular camera is denoted as R, and at this time, there are:
  • the second formula is:
  • the first rotation matrix R t1 and the first translation vector t t1 of the specified object with respect to the monocular camera at the first time t1 can be calculated.
  • a second rotation matrix R t2 and a second translation vector t t2 of the designated object relative to the monocular camera at the second time t2 can be calculated.
  • the first pose of the designated object relative to the monocular camera at the first time t1 is recorded as T t1 .
  • T t2 the second pose of the designated object relative to the monocular camera at the second time t2 is recorded as T t2 .
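The first and second formulas referenced above are images in the original, so the patent's exact derivation via i, j′, k′, and z cannot be reproduced here. A standard alternative that yields the same pose (R, t) of a planar target — all designated points have Z = 0 in the designated coordinate system, and at least four points are required — is homography decomposition. This is offered as an illustrative sketch under those assumptions, not as the patent's formulas:

```python
import numpy as np

def planar_pose(obj_pts, img_pts):
    """Pose (R, t) of a planar target from >= 4 correspondences.
    obj_pts: (N, 2) target-plane coords (a_i, b_i), Z = 0 implied;
    img_pts: (N, 2) distortion-corrected normalized image coords (x_i, y_i)."""
    # DLT for the homography H mapping [a, b, 1]^T ~ [x, y, 1]^T
    A = []
    for (a, b), (x, y) in zip(obj_pts, img_pts):
        A.append([a, b, 1, 0, 0, 0, -x * a, -x * b, -x])
        A.append([0, 0, 0, a, b, 1, -y * a, -y * b, -y])
    H = np.linalg.svd(np.asarray(A, dtype=float))[2][-1].reshape(3, 3)
    if H[2, 2] < 0:          # choose the sign that puts the target in front
        H = -H
    # For a Z = 0 plane, H ~ [r1 r2 t]; fix scale via the rotation columns
    s = np.sqrt(np.linalg.norm(H[:, 0]) * np.linalg.norm(H[:, 1]))
    H = H / s
    r1, r2, t = H[:, 0], H[:, 1], H[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    # Re-orthonormalize R against numerical noise
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, t
```

Running this on the points of the first image gives (R_t1, t_t1) and on the second image gives (R_t2, t_t2), i.e. the first and second poses T_t1 and T_t2.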
  • S104 Calculate an actual translation vector of the monocular camera from the first moment to the second moment according to the first pose and the second pose.
  • the specific implementation process of this step may include: calculating the pose change of the monocular camera from the first moment to the second moment according to the first pose and the second pose, and obtaining the actual translation vector from the pose change.
  • the pose change of the monocular camera from the first time t1 to the second time t2 can be calculated according to the following formula
  • the pose change of the monocular camera from the first time t1 to the second time t2 includes an actual rotation matrix and an actual translation vector, and has:
  • the actual rotation matrix and actual translation vector of the monocular camera from the first time t1 to the second time t2 can be obtained.
  • the actual translation vector is a vector composed of the first three elements of the last column vector in the pose change.
  • S105: Determine the ratio of the modulus of the actual translation vector to the modulus of the normalized translation vector as the scale factor in the monocular vision reconstruction of the device.
  • the normalized translation vector of the monocular camera from the first time t1 to the second time t2 is calculated through step S102, and the monocular camera is calculated from the first time t1 to the second time t2 in the real world through step S104.
  • the ratio between the modulus of the actual translation vector and the modulus of the normalized translation vector is determined as the scale factor s in the monocular vision reconstruction of the device, that is: s = ‖t_actual‖ / ‖t_ep‖.
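Steps S104 and S105 can be summarized numerically. One common convention (an assumption on my part, since the patent's own pose-change formula is an image in the original) is that the camera-frame change from t1 to t2 is T_t2 · T_t1⁻¹, whose translation part is the actual translation vector; the scale factor is then the ratio of norms:

```python
import numpy as np

def scale_factor(T_t1, T_t2, t_ep):
    """T_t1, T_t2: 4x4 homogeneous poses of the designated object relative to
    the monocular camera at times t1 and t2; t_ep: normalized translation from
    epipolar geometry. Returns s = |t_actual| / |t_ep|."""
    # An object point X_o satisfies X_c1 = T_t1 X_o and X_c2 = T_t2 X_o,
    # hence X_c2 = (T_t2 @ inv(T_t1)) X_c1: the camera's pose change (assumed
    # convention).
    T_change = T_t2 @ np.linalg.inv(T_t1)
    t_actual = T_change[:3, 3]   # first three elements of the last column
    return np.linalg.norm(t_actual) / np.linalg.norm(t_ep)
```

Because t_ep typically has unit norm, s is in practice just the real-world length of the camera's translation between the two shots.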
  • after the scale factor is determined, the pose change of the monocular camera in the real world between the two moments and the map points corresponding to the feature points can be calculated. Subsequently, existing vision-based simultaneous localization and mapping algorithms can be used to calculate, by minimizing reprojection errors, the subsequent pose changes of the monocular camera and the positions of map points in the real world, so that localization and map construction can be performed at the real-world scale.
  • in summary, in the method provided by this embodiment, the first pose of the designated object relative to the monocular camera at the first moment and the second pose of the designated object relative to the monocular camera at the second moment are calculated, and the actual translation vector of the monocular camera in the real world from the first moment to the second moment is then calculated according to the first pose and the second pose. Therefore, the method provided by the present application can effectively avoid the problem that the determined scale factor is inaccurate due to slipping, jamming, and the like of the mobile robot.
  • FIG. 4 is a hardware structural diagram of a first embodiment of a mobile robot provided in this application.
  • the mobile robot 100 provided in this embodiment may include a monocular camera 410 and a processor 420. Among them,
  • the monocular camera 410 is configured to acquire a first image of a designated object at a first moment and a second image of the designated object at a second moment;
  • the processor 420 is configured to:
  • the ratio of the modulus of the actual translation vector to the modulus of the normalized translation vector is determined as a scale factor in the monocular vision reconstruction of the device.
  • the mobile robot of this embodiment may be used to execute the technical solution of the method embodiment shown in FIG. 1, and the implementation principles and technical effects thereof are similar, and details are not described herein again.
  • processor 420 is specifically configured to:
  • a posture of the designated object with respect to the monocular camera is obtained.
  • processor 420 is specifically configured to:
  • the actual translation vector is obtained from the pose change.
  • the processor 420 is configured to identify the specified object from the image based on the attribute information of the specified object, and obtain the pixel coordinates of the designated points on the specified object based on the identified specified object.
  • processor 420 is specifically configured to:
  • the first formula is:
  • the second formula is:
  • the designated points include a reference designated point and target designated points; A is a matrix formed by the differences between the second coordinate of each target designated point in the designated coordinate system and the second coordinate of the reference designated point in the designated coordinate system; the X is a vector formed by the differences between the X coordinate in the distortion-corrected first coordinate of each target designated point and the X coordinate in the distortion-corrected first coordinate of the reference designated point; and the Y is a vector formed by the differences between the Y coordinate in the distortion-corrected first coordinate of each target designated point and the Y coordinate in the distortion-corrected first coordinate of the reference designated point;
  • (a 1 , b 1 ) are the second coordinates of the reference designated point under the designated coordinates;
  • (x 1 , y 1 ) are the first coordinates of the reference designated point after distortion correction; and
  • i 1 and i 2 are the first and second elements in i, respectively;
  • the j′ 1 and j′ 2 are the first element and the second element in j′, respectively;
  • the k ′ 1 and the k ′ 2 are the first element and the second element in the k ′, respectively;
  • t is a translation vector of the specified object with respect to the monocular camera.
  • the designated object is a charging device for charging the device; and the processor 420 is configured to obtain the designated object through a monocular camera after detecting that the device is disconnected from the designated object. A first image at a first moment and a second image of the designated object at a second moment.
  • the present application also provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the steps of the method according to any one of the first aspect of the application are implemented.
  • a computer-readable storage medium suitable for storing computer program instructions includes all forms of non-volatile memory, media, and memory devices, such as semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.

Abstract

The present application provides a method for determining a scale factor in monocular vision-based reconstruction, the method comprising: acquiring, by means of a monocular camera, a first image of a specified object at a first time point and a second image thereof at a second time point; performing feature point extraction and matching on the first image and the second image, and calculating, according to paired feature points, a normalized translation vector of the monocular camera from the first time point to the second time point; calculating a first pose of the specified object relative to the monocular camera at the first time point and a second pose thereof relative to the monocular camera at the second time point; calculating, according to the first pose and the second pose, an actual translation vector of the monocular camera in the physical world from the first time point to the second time point; and determining a ratio of a norm of the actual translation vector to a norm of the normalized translation vector to be a scale factor in monocular vision-based reconstruction.

Description

确定单目视觉重建中的尺度因子Determining scale factors in monocular vision reconstruction
相关申请的交叉引用Cross-reference to related applications
本专利申请要求于2018年8月22日提交的、申请号为2018109614346、发明名称为“一种单目视觉重建中尺度因子的确定方法和移动机器人”的中国专利申请的优先权,该申请的全文以引用的方式并入本文中。This patent application claims priority from a Chinese patent application filed on August 22, 2018, with an application number of 2018109614346, and the invention name is "A Method for Determining Mesoscale Factors in Monocular Visual Reconstruction and a Mobile Robot" The entire text is incorporated herein by reference.
技术领域Technical field
本申请涉及移动机器人技术领域,尤其涉及一种单目视觉重建中尺度因子的确定方法和移动机器人。The present application relates to the field of mobile robot technology, and in particular, to a method for determining a mesoscale factor for monocular vision reconstruction and a mobile robot.
背景技术Background technique
近年来,随着计算机视觉技术的发展,基于单目视觉的同时定位和地图构建算法成为当前移动机器人研究的热点。然而,传统的基于单目视觉的同时定位与地图构建方法大多只能实现射影尺度或仿射尺度(affine-scaling)下的三维重建,即重建后的场景与现实世界场景存在一个尺度因子,该尺度因子即为现实世界地图尺度与构建出的地图尺度的比例。因此,若能够在移动机器人初始化时,确定该尺度因子,即可基于投影模型计算出单目相机在现实世界中的实际旋转量和实际平移量,构建与现实世界尺度相同的地图。In recent years, with the development of computer vision technology, the simultaneous positioning and map construction algorithms based on monocular vision have become the focus of current mobile robot research. However, the traditional simultaneous positioning and map construction methods based on monocular vision can only achieve 3D reconstruction at projective scale or affine-scaling, that is, there is a scale factor between the reconstructed scene and the real-world scene. The scale factor is the ratio of the real world map scale to the constructed map scale. Therefore, if the scale factor can be determined when the mobile robot is initialized, the actual rotation and translation of the monocular camera in the real world can be calculated based on the projection model, and a map with the same scale as the real world can be constructed.
发明内容Summary of the Invention
有鉴于此,本申请提供一种单目视觉重建中尺度因子的确定方法和移动机器人。In view of this, the present application provides a method for determining a mesoscale factor for monocular vision reconstruction and a mobile robot.
本申请第一方面提供一种单目视觉重建中尺度因子的确定方法,所述方法应用于移动机器人,所述方法包括:A first aspect of the present application provides a method for determining a mesoscale factor in monocular vision reconstruction. The method is applied to a mobile robot, and the method includes:
通过单目相机获取指定物体在第一时刻的第一图像以及所述指定物体在第二时刻的第二图像;Acquiring a first image of a designated object at a first moment and a second image of the designated object at a second moment through a monocular camera;
对所述第一图像和所述第二图像进行特征点提取和匹配,并依据配对好的特征点计算所述单目相机从所述第一时刻到所述第二时刻的归一化平移向量;Performing feature point extraction and matching on the first image and the second image, and calculating a normalized translation vector of the monocular camera from the first moment to the second moment according to the paired feature points ;
计算所述指定物体在所述第一时刻相对于所述单目相机的第一位姿,以及所述指定 物体在所述第二时刻相对于所述单目相机的第二位姿;Calculating a first pose of the designated object relative to the monocular camera at the first moment, and a second pose of the designated object relative to the monocular camera at the second moment;
依据所述第一位姿和所述第二位姿,计算所述单目相机从所述第一时刻到所述第二时刻的实际平移向量;Calculating an actual translation vector of the monocular camera from the first moment to the second moment according to the first pose and the second pose;
将所述实际平移向量的模与所述归一化平移向量的模的比值确定为本设备的单目视觉重建中的尺度因子。The ratio of the modulus of the actual translation vector to the modulus of the normalized translation vector is determined as a scale factor in the monocular vision reconstruction of the device.
本申请第二方面提供一种移动机器人,所述移动机器人包括单目相机和处理器;其中,A second aspect of the present application provides a mobile robot, which includes a monocular camera and a processor; wherein,
所述单目相机,用于获取指定物体在第一时刻的第一图像以及所述指定物体在第二时刻的第二图像;The monocular camera is configured to acquire a first image of a designated object at a first moment and a second image of the designated object at a second moment;
所述处理器,用于:The processor is configured to:
对所述第一图像和所述第二图像进行特征点提取和匹配,并依据匹配的特征点计算所述单目相机从所述第一时刻到所述第二时刻的归一化平移向量;Performing feature point extraction and matching on the first image and the second image, and calculating a normalized translation vector of the monocular camera from the first moment to the second moment according to the matched feature points;
计算所述指定物体在所述第一时刻相对于所述单目相机的第一位姿,以及所述指定物体在所述第二时刻相对于所述单目相机的第二位姿;Calculating a first pose of the designated object relative to the monocular camera at the first moment, and a second pose of the designated object relative to the monocular camera at the second moment;
依据所述第一位姿和所述第二位姿,计算所述单目相机从所述第一时刻到所述第二时刻的实际平移向量;Calculating an actual translation vector of the monocular camera from the first moment to the second moment according to the first pose and the second pose;
将所述实际平移向量的模与所述归一化平移向量的模的比值确定为本设备的单目视觉重建中的尺度因子。The ratio of the modulus of the actual translation vector to the modulus of the normalized translation vector is determined as a scale factor in the monocular vision reconstruction of the device.
本申请第三方面提供一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行实现本申请第一方面提供的任一项所述方法的步骤。A third aspect of the present application provides a computer-readable storage medium on which a computer program is stored, and the program is executed by a processor to implement the steps of any of the methods provided in the first aspect of the present application.
本申请提供的单目视觉重建中尺度因子的确定方法和移动机器人,由于指定物体的位置是固定不变的,因此,在移动机器人打滑、卡住等情况下,通过计算指定物体在第一时刻相对于单目相机的第一位姿和指定物体在第二时刻相对于单目相机的第二位姿,从而能够相对准确的计算得到单目相机从第一时刻到第二时刻在现实世界的实际平移向量。因此,本申请提供的方法,不存在因移动机器人打滑、卡住等情况导致确定出的尺度因子不准确的问题。The method for determining the meso-scale factor of the monocular vision reconstruction and the mobile robot provided in this application, because the position of the designated object is fixed, therefore, in the case of the mobile robot slipping, jamming, etc., by calculating the designated object at the first moment The first pose relative to the monocular camera and the second pose of the designated object relative to the monocular camera at the second moment, so that the real-time The actual translation vector. Therefore, the method provided in this application does not have the problem that the determined scale factor is inaccurate due to slipping, jamming, and the like of the mobile robot.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1为本申请提供的单目视觉重建中尺度因子的确定方法实施例一的流程图。FIG. 1 is a flowchart of Embodiment 1 of a method for determining a scale factor in monocular vision reconstruction provided by the present application.
图2为本申请一示例性实施例示出的单目相机采集指定物体的图像的示意图。Fig. 2 is a schematic diagram of a monocular camera acquiring an image of a specified object according to an exemplary embodiment of the present application.
图3为本申请一示例性实施例示出的计算指定物体相对于单目相机的位姿的流程图。Fig. 3 is a flowchart of calculating a pose of a specified object relative to a monocular camera according to an exemplary embodiment of the present application.
图4为本申请提供的移动机器人实施例一的硬件结构图。FIG. 4 is a hardware structural diagram of a first embodiment of a mobile robot provided in this application.
具体实施方式DETAILED DESCRIPTION
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本申请相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本申请的一些方面相一致的装置和方法的例子。Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application. Rather, they are merely examples of devices and methods consistent with certain aspects of the application as detailed in the appended claims.
在本申请使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本申请。在本申请和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。The terminology used in this application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a", "the" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and / or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
应当理解,尽管在本申请可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本申请范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。It should be understood that although the terms first, second, third, etc. may be used in this application to describe various information, such information should not be limited to these terms. These terms are only used to distinguish the same type of information from each other. For example, without departing from the scope of the present application, the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information. Depending on the context, the word "if" as used herein can be interpreted as "at" or "when" or "in response to determination".
相关技术中提出了一种单目视觉重建中尺度因子的确定方法。该方法利用单目相机采集到的相邻的两帧图像，运用对极几何（Epipolar Geometry）计算出这两帧图像之间单目相机的归一化平移向量；并利用码盘数据和IMU（Inertial measurement unit，惯性测量单元）数据计算这两帧图像之间单目相机在现实世界的实际平移向量，进而利用归一化平移向量和实际平移向量得到单目视觉重建中的尺度因子。The related art proposes a method for determining the scale factor in monocular vision reconstruction. The method uses two adjacent frames of images collected by a monocular camera and applies epipolar geometry to calculate the normalized translation vector of the monocular camera between the two frames; it then uses wheel encoder (code disc) data and IMU (Inertial Measurement Unit) data to calculate the actual translation vector of the monocular camera in the real world between the two frames, and obtains the scale factor in monocular vision reconstruction from the normalized translation vector and the actual translation vector.
但是，当采用上述方法确定单目视觉重建中的尺度因子时，由于移动机器人存在打滑、卡住等情况使得码盘计数与实际不符，导致该情况下采用码盘数据结合IMU数据计算出的实际平移向量也不准确，进而依据该实际平移向量计算出的尺度因子也不准确。However, when the above method is used to determine the scale factor in monocular vision reconstruction, slipping, jamming, and similar situations cause the wheel encoder (code disc) count to disagree with the robot's actual motion; in that case the actual translation vector calculated from the encoder data combined with the IMU data is inaccurate, and the scale factor calculated from this actual translation vector is therefore also inaccurate.
本申请提供一种单目视觉重建中尺度因子的确定方法和移动机器人,以解决现有的方法存在的因移动机器人打滑、卡住等情况导致确定出的尺度因子不准确的问题。The present application provides a method for determining a scale factor in a monocular vision reconstruction and a mobile robot, so as to solve the problem that the determined scale factor is inaccurate due to the slipping, jamming, etc. of the mobile robot in the existing method.
本实施例提供的方法,可应用于移动机器人。例如,可应用于扫地机器人。The method provided by this embodiment can be applied to a mobile robot. For example, it can be applied to a cleaning robot.
下面给出几个具体的实施例,用以详细介绍本申请的技术方案。下面这几个具体的实施例可以相互结合,对于相同或相似的概念或过程可能在某些实施例中不再赘述。Several specific embodiments are given below to introduce the technical solution of the present application in detail. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments.
图1为本申请提供的单目视觉重建中尺度因子的确定方法实施例一的流程图。请参照图1,本实施例提供的方法,可以包括:FIG. 1 is a flowchart of Embodiment 1 of a method for determining a scale factor in monocular vision reconstruction provided by the present application. Referring to FIG. 1, the method provided in this embodiment may include:
S101、通过单目相机获取指定物体在第一时刻的第一图像以及上述指定物体在第二时刻的第二图像。S101. Obtain a first image of a specified object at a first moment and a second image of the specified object at a second moment through a monocular camera.
具体的，移动机器人上设置有单目相机，可通过该单目相机采集图像。可选的，指定物体可以是为该移动机器人充电的充电设备，移动机器人可在检测到本设备与上述指定物体断开连接后，通过单目相机获取上述指定物体在第一时刻的第一图像以及上述指定物体在第二时刻的第二图像。例如，存在邻近的采样时刻：第一时刻t1和第二时刻t2，移动机器人可以通过单目相机获取指定物体在第一时刻t1的第一图像F1、以及指定物体在第二时刻t2的第二图像F2。Specifically, the mobile robot is provided with a monocular camera, through which images can be collected. Optionally, the designated object may be a charging device that charges the mobile robot; after detecting that it has disconnected from the designated object, the mobile robot may obtain, through the monocular camera, a first image of the designated object at a first moment and a second image of the designated object at a second moment. For example, given two adjacent sampling moments, a first moment t1 and a second moment t2, the mobile robot may obtain, through the monocular camera, a first image F1 of the designated object at the first moment t1 and a second image F2 of the designated object at the second moment t2.
需要说明的是,移动机器人在第一时刻t1和第二时刻t2处于不同的位置,即单目相机在第一时刻t1和第二时刻t2处于不同的拍摄位置。It should be noted that the mobile robot is at different positions at the first time t1 and the second time t2, that is, the monocular camera is at different shooting positions at the first time t1 and the second time t2.
图2为本申请一示例性实施例示出的单目相机采集指定物体的图像的示意图。请参照图2,在图2所示实例中,指定物体是为该移动机器人充电的充电设备。Fig. 2 is a schematic diagram of a monocular camera acquiring an image of a specified object according to an exemplary embodiment of the present application. Please refer to FIG. 2. In the example shown in FIG. 2, the designated object is a charging device for charging the mobile robot.
参照图2，单目相机110在第一时刻t1和第二时刻t2处于不同的拍摄位置。结合前面的介绍，例如，一实施例中，移动机器人可在检测到本设备与充电设备200断开连接后，转向充电设备，进而在不同于之前的位置通过单目相机对充电设备200进行拍摄。这样，可得到充电设备200在第一时刻t1的第一图像，该第一图像对应于第一拍摄位置，以及充电设备200在第二时刻t2的第二图像，该第二图像对应于第二拍摄位置。Referring to FIG. 2, the monocular camera 110 is at different shooting positions at the first moment t1 and the second moment t2. In conjunction with the foregoing description, in one embodiment, after detecting that it has disconnected from the charging device 200, the mobile robot may turn toward the charging device and photograph the charging device 200 with the monocular camera from a position different from the previous one. In this way, a first image of the charging device 200 at the first moment t1, corresponding to the first shooting position, and a second image of the charging device 200 at the second moment t2, corresponding to the second shooting position, can be obtained.
S102、对上述第一图像和上述第二图像进行特征点提取和匹配,并依据配对好的特征点计算上述单目相机从上述第一时刻到上述第二时刻的归一化平移向量。S102. Perform feature point extraction and matching on the first image and the second image, and calculate a normalized translation vector of the monocular camera from the first time to the second time according to the paired feature points.
具体的,有关对第一图像和第二图像进行特征点提取和匹配的具体实现原理和实现过程可以参见相关技术中的描述,此处不再赘述。Specifically, for specific implementation principles and implementation processes of performing feature point extraction and matching on the first image and the second image, refer to descriptions in related technologies, and details are not described herein again.
进一步地，在匹配结束后，可利用匹配好的特征点分别在第一图像和第二图像中的像素坐标，基于对极约束，计算出从上述第一时刻到上述第二时刻，上述单目相机在第一拍摄位置与第二拍摄位置间的归一化平移向量。例如，可利用8对配对好的特征点，计算单目相机从第一时刻到第二时刻的归一化平移向量。Further, after matching is completed, the pixel coordinates of the matched feature points in the first image and the second image may be used, based on the epipolar constraint, to calculate the normalized translation vector of the monocular camera between the first shooting position and the second shooting position, i.e., from the first moment to the second moment. For example, eight pairs of matched feature points may be used to calculate the normalized translation vector of the monocular camera from the first moment to the second moment.
具体的,对极约束可以用下述公式表示:Specifically, the epipolar constraint can be expressed by the following formula:
$p_2^{\top} K^{-\top} \, [t_{ep}]_{\times} \, R_{ep} \, K^{-1} \, p_1 = 0$
其中，K为单目相机的内参矩阵，p_1和p_2为配对好的特征点分别在第一图像和第二图像上的像素齐次坐标，R_ep为单目相机从第一时刻t1到第二时刻t2的旋转变化量，t_ep为单目相机从第一时刻t1到第二时刻t2的归一化平移向量。where K is the intrinsic parameter matrix of the monocular camera; p_1 and p_2 are the homogeneous pixel coordinates of a pair of matched feature points in the first image and the second image, respectively; R_ep is the rotation change of the monocular camera from the first moment t1 to the second moment t2; and t_ep is the normalized translation vector of the monocular camera from the first moment t1 to the second moment t2.
需要说明的是，有关基于对极约束，依据配对好的特征点计算单目相机从第一时刻t1到第二时刻t2的归一化平移向量的具体实现过程可以参见相关技术中的介绍，此处不再赘述。It should be noted that the specific implementation of calculating the normalized translation vector of the monocular camera from the first moment t1 to the second moment t2 from the matched feature points based on the epipolar constraint can be found in the related art and is not repeated here.
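As an illustrative sketch only (not part of the claimed method), S102 can be prototyped with the classical eight-point algorithm: estimate the essential matrix from the matched, K-normalized feature points, then decompose it and keep the (rotation, unit translation) candidate that places the points in front of both camera positions. All function names here are hypothetical, and a production system would normally rely on a hardened computer-vision library:

```python
import numpy as np

def essential_from_matches(x1, x2):
    """Eight-point estimate of E from N >= 8 matched normalized points (each N x 2)."""
    ones = np.ones(len(x1))
    # Each row encodes the epipolar constraint x2^T E x1 = 0, with E flattened row-major.
    A = np.column_stack([x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
                         x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
                         x1[:, 0], x1[:, 1], ones])
    E = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt  # enforce the essential-matrix structure

def recover_pose(E, x1, x2):
    """Pick the (R, unit t) pair that triangulates all points with positive depth."""
    U, _, Vt = np.linalg.svd(E)
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    best, best_count = None, -1
    for R in (U @ W @ Vt, U @ W.T @ Vt):
        if np.linalg.det(R) < 0:
            R = -R  # E is only defined up to sign; keep R a proper rotation
        for t in (U[:, 2], -U[:, 2]):
            count = 0
            for a, b in zip(x1, x2):
                p1 = np.array([a[0], a[1], 1.0])
                p2 = np.array([b[0], b[1], 1.0])
                # Solve z2 * p2 = z1 * R p1 + t for the two depths (cheirality check).
                M = np.column_stack([R @ p1, -p2])
                z, *_ = np.linalg.lstsq(M, -t, rcond=None)
                count += (z[0] > 0) and (z[1] > 0)
            if count > best_count:
                best, best_count = (R, t), count
    return best
```

With noise-free correspondences the recovered t equals the true translation direction but has unit norm, which is exactly why the metric translation of S103–S104 is needed to fix the scale.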
S103、计算上述指定物体在上述第一时刻相对于上述单目相机的第一位姿,以及上述指定物体在上述第二时刻相对于上述单目相机的第二位姿。S103. Calculate a first pose of the designated object relative to the monocular camera at the first moment, and a second pose of the designated object relative to the monocular camera at the second moment.
具体的,图3为本申请一示例性实施例示出的计算指定物体相对于单目相机的位姿的流程图。请参照图3,计算指定物体相对于单目相机的位姿,可以包括:Specifically, FIG. 3 is a flowchart of calculating a pose of a specified object relative to a monocular camera according to an exemplary embodiment of the present application. Referring to FIG. 3, calculating the pose of the specified object relative to the monocular camera may include:
S301、针对第一图像和第二图像中的每帧图像,从该帧图像中获取上述指定物体上的指定点的像素坐标;上述指定点的数量大于或者等于4。S301. For each frame of the first image and the second image, obtain pixel coordinates of a specified point on the specified object from the frame image; the number of the specified points is greater than or equal to 4.
具体的,本步骤中,可基于指定物体的属性信息,从图像中识别出该指定物体,进而基于识别出的指定物体,从图像中获取该指定物体上的指定点的像素坐标。Specifically, in this step, the specified object may be identified from the image based on the attribute information of the specified object, and then based on the identified specified object, the pixel coordinates of the specified point on the specified object may be obtained from the image.
需要说明的是,指定物体的属性信息可以包括材料属性、颜色属性、形状属性等。本实施例中,不对此作出限定。It should be noted that the attribute information of the designated object may include material attributes, color attributes, shape attributes, and the like. In this embodiment, this is not limited.
例如，一实施例中，指定物体可以是为该移动机器人充电的充电设备。该充电设备上设置有标记物。例如，该标记物可以是由特定材料、特定颜色、特定形状、指定数量和/或指定内容的若干标记块组成的标记物。再例如，该标记物可以是由特定材料制成的一指定形状的标记物。例如，当单目相机为红外相机时，标记物可以由指定数量的高反射材料制成的标记块组成；再例如，当单目相机为RGB相机时，标记物可以由指定数量的印有黑白相间棋盘格的标记块组成。本实施例中，不对标记物的具体设置形式进行限定。For example, in one embodiment, the designated object may be a charging device that charges the mobile robot, and a marker is provided on the charging device. For example, the marker may be composed of several marker blocks of a specific material, a specific color, a specific shape, a specified number, and/or specified content. As another example, the marker may be a marker of a specified shape made of a specific material. For example, when the monocular camera is an infrared camera, the marker may be composed of a specified number of marker blocks made of a highly reflective material; as another example, when the monocular camera is an RGB camera, the marker may be composed of a specified number of marker blocks printed with black-and-white checkerboard patterns. In this embodiment, the specific form of the marker is not limited.
需要说明的是,充电设备上的标记物可反映充电设备的属性信息,可基于充电设备的标记物来识别出图像中的充电设备。有关基于指定物体的属性信息,识别出图像中的指定物体的具体实现原理和实现过程可以参见相关技术中描述,此处不再赘述。It should be noted that the marker on the charging device can reflect the attribute information of the charging device, and the charging device in the image can be identified based on the marker of the charging device. For the specific implementation principle and implementation process of identifying the specified object in the image based on the attribute information of the specified object, refer to the description in the related technology, and details are not described herein again.
进一步地,指定物体上的指定点可以是根据实际需要设定的,例如,指定点可以是标记物的角点、中心点等。本实施例中,不对指定点的具体位置进行限定。需要说明的是,该指定点的数量大于或者等于4。Further, the designated point on the designated object may be set according to actual needs, for example, the designated point may be a corner point, a center point, etc. of the marker. In this embodiment, the specific position of the designated point is not limited. It should be noted that the number of the designated points is greater than or equal to four.
下面以图2所示示例,详细说明本步骤的具体实现过程:The following uses the example shown in Figure 2 to explain the detailed implementation of this step in detail:
具体的，参照图2，在图2所示示例中，充电设备200上的标记物210由4个标记块1、2、3、4组成，指定该充电设备200上的指定点为各标记块的中心点。此时，可基于这4个标记块1、2、3、4的材料、颜色、形状以及各标记块的间隔距离等属性信息，从图像中识别出这4个标记块1、2、3、4，进而得到各个标记块的中心点的像素坐标，这样，即可得到指定物体上的指定点的像素坐标。Specifically, referring to FIG. 2, in the example shown in FIG. 2, the marker 210 on the charging device 200 is composed of four marker blocks 1, 2, 3, and 4, and the designated points on the charging device 200 are the center points of the marker blocks. In this case, the four marker blocks 1, 2, 3, and 4 can be identified from the image based on attribute information such as the material, color, and shape of the marker blocks and the spacing between them, and the pixel coordinates of the center point of each marker block can then be obtained. In this way, the pixel coordinates of the designated points on the designated object are obtained.
为方便说明，将各个标记块的中心点依序记为Bi，其中i等于1到4。将第i个标记块的中心点Bi的像素坐标记为(u_i, v_i)。For convenience of description, the center point of each marker block is denoted as Bi in order, where i ranges from 1 to 4. The pixel coordinates of the center point Bi of the i-th marker block are denoted as (u_i, v_i).
S302、依据各个上述指定点的像素坐标,采用畸变校正算法计算各个上述指定点畸变校正后的第一坐标。S302. According to the pixel coordinates of each of the specified points, a distortion correction algorithm is used to calculate the first coordinates of each of the specified points after distortion correction.
具体,畸变校正算法采用如下公式表示:Specifically, the distortion correction algorithm is expressed by the following formula:
$\begin{bmatrix} x_i' \\ y_i' \\ 1 \end{bmatrix} = K^{-1} \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix}, \qquad r_i^2 = x_i'^2 + y_i'^2$

$x_i = x_i' (1 + k_1 r_i^2 + k_2 r_i^4 + k_3 r_i^6) + 2 p_1 x_i' y_i' + p_2 (r_i^2 + 2 x_i'^2)$

$y_i = y_i' (1 + k_1 r_i^2 + k_2 r_i^4 + k_3 r_i^6) + p_1 (r_i^2 + 2 y_i'^2) + 2 p_2 x_i' y_i'$
其中,K为单目相机的内参矩阵;Among them, K is the internal parameter matrix of the monocular camera;
k_1、k_2、k_3、p_1、p_2为单目相机的畸变参数；k_1, k_2, k_3, p_1, p_2 are the distortion parameters of the monocular camera;
(u_i, v_i)为第i个指定点的像素坐标；(u_i, v_i) are the pixel coordinates of the i-th designated point;
(x_i, y_i)为第i个指定点畸变校正后的第一坐标。(x_i, y_i) are the distortion-corrected first coordinates of the i-th designated point.
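As an illustrative sketch, a distortion correction with the parameters k_1, k_2, k_3, p_1, p_2 is commonly implemented by normalizing the pixel with K^-1 and then inverting the Brown–Conrady distortion model by fixed-point iteration. The exact formula in the patent is published as an image, so this particular model and the helper names are assumptions:

```python
import numpy as np

def apply_distortion(x, y, k1, k2, k3, p1, p2):
    """Brown-Conrady model: map an undistorted normalized point to its distorted position."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return np.array([xd, yd])

def undistort_pixel(u, v, K, dist, iters=20):
    """Normalize (u, v) with K^-1, then remove the distortion by fixed-point iteration."""
    xd = np.linalg.solve(K, np.array([u, v, 1.0]))[:2]  # distorted normalized point
    x = xd.copy()
    for _ in range(iters):
        offset = apply_distortion(x[0], x[1], *dist) - x  # distortion displacement at x
        x = xd - offset
    return x  # the distortion-corrected "first coordinates" (x_i, y_i)
```

For the mild distortion typical of robot cameras the iteration converges in a handful of steps; OpenCV's `undistortPoints` performs the same inversion internally.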
S303、依据各个上述指定点畸变校正后的第一坐标和预存的各个上述指定点在指定坐标系下的第二坐标,计算上述指定物体相对于上述单目相机的旋转矩阵和平移向量。S303. Calculate a rotation matrix and a translation vector of the specified object relative to the monocular camera according to the first coordinates of each of the specified point distortion corrections and the pre-stored second coordinates of each of the specified points in the specified coordinate system.
具体的,指定坐标系为一绝对坐标系。具体的,在图2所示示例中,指定坐标系为充电设备上标示的坐标系。即在图2所示实例中,指定坐标系的原点为充电设备的中心点,X轴为水平向右,Y轴为垂直于X轴向下。Specifically, the designated coordinate system is an absolute coordinate system. Specifically, in the example shown in FIG. 2, the designated coordinate system is a coordinate system marked on the charging device. That is, in the example shown in FIG. 2, the origin of the designated coordinate system is the center point of the charging device, the X-axis is horizontal to the right, and the Y-axis is perpendicular to the X-axis downward.
需要说明的是，本步骤中，可基于视觉伺服中的比例正交投影迭代变换算法（Pose from Orthography and Scaling with Iterations，POSIT），根据指定物体上的多个指定点在指定坐标系下的第二坐标以及这多个指定点畸变校正后的第一坐标，进行正交投影迭代，计算出指定物体相对于单目相机的旋转矩阵和平移向量。It should be noted that, in this step, based on the Pose from Orthography and Scaling with Iterations (POSIT) algorithm used in visual servoing, orthographic-projection iterations may be performed using the second coordinates of the plurality of designated points on the designated object in the designated coordinate system and the distortion-corrected first coordinates of these designated points, to calculate the rotation matrix and translation vector of the designated object relative to the monocular camera.
具体的,该步骤的具体实现过程,可以包括:Specifically, the specific implementation process of this step may include:
依据各个上述指定点畸变校正后的第一坐标和预存的各个上述指定点在指定坐标系下的第二坐标,按照第一公式计算第一向量i、第二向量j′、第三向量k'和第一系数z。Calculate the first vector i, the second vector j ′, and the third vector k ′ according to the first formula according to the first coordinates after the distortion correction of each of the specified points and the second coordinates of the predetermined points in the specified coordinate system that are stored in advance. And the first coefficient z.
具体的,第一公式为:Specifically, the first formula is:
$I = (A^{\top} A)^{-1} A^{\top} X, \qquad J = (A^{\top} A)^{-1} A^{\top} Y$

$i = \frac{I}{\|I\|}, \qquad j' = \frac{J}{\|J\|}, \qquad k' = i \times j', \qquad z = \frac{\|I\| + \|J\|}{2}$
其中，所述指定点包括参考指定点和目标指定点，所述A为由各个目标指定点在所述指定坐标系下的第二坐标与所述参考指定点在所述指定坐标系下的第二坐标之间的差值构成的矩阵；所述X为由各个所述目标指定点畸变校正后的第一坐标中的X坐标与所述参考指定点畸变校正后的第一坐标中的X坐标之间的差值构成的向量；所述Y为由各个所述目标指定点畸变校正后的第一坐标中的Y坐标与所述参考指定点畸变校正后的第一坐标中的Y坐标之间的差值构成的向量。The designated points include a reference designated point and target designated points. A is a matrix formed by the differences between the second coordinates of each target designated point in the designated coordinate system and the second coordinate of the reference designated point in the designated coordinate system; X is a vector formed by the differences between the X coordinate in the distortion-corrected first coordinates of each target designated point and the X coordinate in the distortion-corrected first coordinates of the reference designated point; Y is a vector formed by the differences between the Y coordinate in the distortion-corrected first coordinates of each target designated point and the Y coordinate in the distortion-corrected first coordinates of the reference designated point.
需要说明的是，参考指定点可以为任意一个指定点。本实施例中，以参考指定点为第1个指定点为例进行说明。It should be noted that the reference designated point may be any one of the designated points. In this embodiment, the first designated point is taken as the reference designated point by way of example.
进一步地，为方便说明，将第i个指定点在指定坐标系下的第二坐标记为(a_i, b_i, 0)。Further, for convenience of description, the second coordinate of the i-th designated point in the designated coordinate system is denoted as (a_i, b_i, 0).
结合上面的例子,参照图2,此时有:With reference to the above example, referring to FIG. 2, at this time:
$A = \begin{bmatrix} a_2 - a_1 & b_2 - b_1 & 0 \\ a_3 - a_1 & b_3 - b_1 & 0 \\ a_4 - a_1 & b_4 - b_1 & 0 \end{bmatrix}, \qquad X = \begin{bmatrix} x_2 - x_1 \\ x_3 - x_1 \\ x_4 - x_1 \end{bmatrix}, \qquad Y = \begin{bmatrix} y_2 - y_1 \\ y_3 - y_1 \\ y_4 - y_1 \end{bmatrix}$
需要说明的是,第一向量i、第二向量j′、第三向量k'和第一系数z均包括三个元素。It should be noted that each of the first vector i, the second vector j ′, the third vector k ′, and the first coefficient z includes three elements.
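To make the definitions of A, X and Y concrete, the following is a small sketch for the four-marker example, taking the first designated point as the reference; all coordinate values are hypothetical:

```python
import numpy as np

# Hypothetical second coordinates (a_i, b_i) of the four designated points (object plane, z = 0)
obj = np.array([[-0.05, -0.03], [0.05, -0.03], [0.05, 0.03], [-0.05, 0.03]])
# Hypothetical distortion-corrected first coordinates (x_i, y_i) of the same points
img = np.array([[-0.021, -0.012], [0.020, -0.013], [0.019, 0.012], [-0.020, 0.013]])

# A: differences of the second coordinates to the reference point, with a zero planar z column
A = np.column_stack([obj[1:] - obj[0], np.zeros(3)])
# X, Y: per-axis differences of the first coordinates to the reference point
X = img[1:, 0] - img[0, 0]
Y = img[1:, 1] - img[0, 1]
```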
将所述i、所述j′和所述k'按矩阵行方向依序排列,得到所述指定物体相对于所述单目相机的旋转矩阵。Arranging the i, the j ′, and the k ′ in a row direction of the matrix in order to obtain a rotation matrix of the specified object relative to the monocular camera.
具体的,将指定物体相对于单目相机的旋转矩阵记为R,此时有:Specifically, the rotation matrix of the specified object relative to the monocular camera is denoted as R, and at this time, there are:
$R = \begin{bmatrix} i^{\top} \\ j'^{\top} \\ k'^{\top} \end{bmatrix}$
依据所述i、所述j′、所述k'和所述z按照第二公式计算指定物体相对于所述单目相机的平移向量。Calculate a translation vector of a specified object relative to the monocular camera according to the i, j ′, k ′, and z according to a second formula.
具体的,第二公式为:Specifically, the second formula is:
$t = \begin{bmatrix} x_1 / z - (i_1 a_1 + i_2 b_1) \\ y_1 / z - (j'_1 a_1 + j'_2 b_1) \\ 1 / z - (k'_1 a_1 + k'_2 b_1) \end{bmatrix}$
其中，(a_1, b_1)为所述参考指定点在所述指定坐标系下的第二坐标；(x_1, y_1)为所述参考指定点畸变校正后的第一坐标；i_1、i_2分别为所述第一向量i中的第一个元素和第二个元素；j'_1、j'_2分别为所述第二向量j'中的第一个元素和第二个元素；k'_1、k'_2分别为所述第三向量k'中的第一个元素和第二个元素；所述t为所述指定物体相对于所述单目相机的平移向量。where (a_1, b_1) is the second coordinate of the reference designated point in the designated coordinate system; (x_1, y_1) is the distortion-corrected first coordinate of the reference designated point; i_1 and i_2 are the first and second elements of the first vector i; j'_1 and j'_2 are the first and second elements of the second vector j'; k'_1 and k'_2 are the first and second elements of the third vector k'; and t is the translation vector of the designated object relative to the monocular camera.
这样，通过以上步骤，基于第一图像，便可以计算得到指定物体在第一时刻t1相对于单目相机的第一旋转矩阵R_t1和第一平移向量t_t1。基于第二图像，便可以计算得到指定物体在第二时刻t2相对于单目相机的第二旋转矩阵R_t2和第二平移向量t_t2。In this way, through the above steps, the first rotation matrix R_t1 and the first translation vector t_t1 of the designated object relative to the monocular camera at the first moment t1 can be calculated based on the first image, and the second rotation matrix R_t2 and the second translation vector t_t2 of the designated object relative to the monocular camera at the second moment t2 can be calculated based on the second image.
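The translation part of the pose computation can be sketched as follows, assuming a weak-perspective (POS-style) convention in which z is the inverse depth of the reference designated point and R is the recovered rotation matrix; since the patent's formulas are published as images, this form is an assumption rather than the literal claimed formula:

```python
import numpy as np

def translation_from_pose(R, z, ref_obj, ref_img):
    """Recover t from rotation R, coefficient z (assumed inverse reference depth),
    the reference point's second coordinate (a1, b1) and first coordinate (x1, y1)."""
    x1, y1 = ref_img
    a1, b1 = ref_obj
    ref_cam = np.array([x1, y1, 1.0]) / z          # camera-frame position of the reference point
    return ref_cam - R @ np.array([a1, b1, 0.0])   # subtract R * (a1, b1, 0) to get the object origin
```

For the reference point itself, perspective and weak-perspective projection coincide, so the recovery is exact regardless of the rotation.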
S304、依据上述指定物体相对于上述单目相机的旋转矩阵和平移向量,得到上述指定物体相对于上述单目相机的位姿。S304. Obtain the pose of the specified object relative to the monocular camera according to the rotation matrix and translation vector of the specified object relative to the monocular camera.
在一实施例中，将指定物体在第一时刻t1相对于单目相机的第一位姿记为T_t1，结合前面的介绍，可知：In an embodiment, the first pose of the designated object relative to the monocular camera at the first moment t1 is denoted as T_t1; in combination with the foregoing description:

$T_{t1} = \begin{bmatrix} R_{t1} & t_{t1} \\ 0 & 1 \end{bmatrix}$
进一步地，将指定物体在第二时刻t2相对于单目相机的第二位姿记为T_t2，结合前面的介绍，可知：Further, the second pose of the designated object relative to the monocular camera at the second moment t2 is denoted as T_t2; in combination with the foregoing description:

$T_{t2} = \begin{bmatrix} R_{t2} & t_{t2} \\ 0 & 1 \end{bmatrix}$
S104、依据上述第一位姿和上述第二位姿,计算上述单目相机从上述第一时刻到上述第二时刻的实际平移向量。S104. Calculate an actual translation vector of the monocular camera from the first moment to the second moment according to the first pose and the second pose.
具体的，本步骤的具体实现过程，可以包括：依据所述第一位姿和所述第二位姿，计算所述单目相机从所述第一时刻到所述第二时刻的位姿变化；从所述位姿变化中获取所述实际平移向量。Specifically, the implementation of this step may include: calculating the pose change of the monocular camera from the first moment to the second moment according to the first pose and the second pose; and obtaining the actual translation vector from the pose change.
具体的，可按照如下公式计算所述单目相机从所述第一时刻t1到所述第二时刻t2的位姿变化：Specifically, the pose change of the monocular camera from the first moment t1 to the second moment t2 can be calculated according to the following formula:

$\Delta T = T_{t2} \, T_{t1}^{-1}$

$\Delta T = \begin{bmatrix} R_{t2} R_{t1}^{\top} & t_{t2} - R_{t2} R_{t1}^{\top} t_{t1} \\ 0 & 1 \end{bmatrix}$
进一步地,单目相机从第一时刻t1到第二时刻t2的位姿变化包括实际旋转矩阵和实际平移向量,且有:Further, the pose change of the monocular camera from the first time t1 to the second time t2 includes an actual rotation matrix and an actual translation vector, and has:
$\Delta T = \begin{bmatrix} R_{real} & t_{real} \\ 0 & 1 \end{bmatrix}$
因此，基于前面计算得到的位姿变化，便可以获取单目相机从第一时刻t1到第二时刻t2的实际旋转矩阵和实际平移向量。参见上式，可知，该实际平移向量即为位姿变化中最后一个列向量的前三个元素构成的向量。Therefore, based on the pose change calculated above, the actual rotation matrix and actual translation vector of the monocular camera from the first moment t1 to the second moment t2 can be obtained. From the above expression, it can be seen that the actual translation vector is the vector formed by the first three elements of the last column of the pose change.
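Step S104 can be sketched with 4x4 homogeneous pose matrices. The composition convention below (relative motion as T_t2 composed with the inverse of T_t1) is an assumption:

```python
import numpy as np

def pose_matrix(R, t):
    """Assemble a 4x4 homogeneous pose from a rotation matrix and a translation vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def camera_motion(T_t1, T_t2):
    """Pose change of the camera between the two observations of the fixed object."""
    dT = T_t2 @ np.linalg.inv(T_t1)   # assumed convention: P_c2 = dT @ P_c1
    return dT[:3, :3], dT[:3, 3]      # actual rotation; actual translation = first 3 of last column
```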
S105、将上述实际平移向量的模与上述归一化平移向量的模之间的比值确定为本设备单目视觉重建中的尺度因子。S105. Determine the ratio between the norm of the actual translation vector and the norm of the normalized translation vector as the scale factor in the monocular vision reconstruction of the device.
具体的,经步骤S102计算得到单目相机从第一时刻t1到第二时刻t2的归一化平移向量,并经步骤S104计算得到单目相机从第一时刻t1到第二时刻t2在现实世界的实际平移向量后,将实际平移向量的模与归一化平移向量的模之间的比值确定为本设备单目视觉重建中的尺度因子。即:Specifically, the normalized translation vector of the monocular camera from the first time t1 to the second time t2 is calculated through step S102, and the monocular camera is calculated from the first time t1 to the second time t2 in the real world through step S104. After the actual translation vector, the ratio between the modulus of the actual translation vector and the modulus of the normalized translation vector is determined as the scale factor in the monocular vision reconstruction of the device. which is:
$s = \frac{\left\| t_{real} \right\|}{\left\| t_{ep} \right\|}$
需要说明的是，当计算出本设备单目视觉重建中的尺度因子后，即可计算出两时刻单目相机在现实世界的位姿变化量和特征点对应的地图，进而在后续的同时定位与地图重建中，可基于现有的基于视觉的同时定位与地图重建算法，利用最小化重投影误差计算后续单目相机在现实世界中的位姿变化量和地图点位置。这样，结合回环检测纠正单目相机位姿和地图点位置漂移，即可定位和构建真实尺度下的地图。It should be noted that, after the scale factor in the monocular vision reconstruction of the device is calculated, the real-world pose change of the monocular camera between the two moments and the map corresponding to the feature points can be calculated. In subsequent simultaneous localization and mapping, the real-world pose changes of the monocular camera and the positions of map points can then be calculated by minimizing the reprojection error, based on existing vision-based simultaneous localization and mapping algorithms. In this way, combined with loop-closure detection to correct drift in the monocular camera poses and map point positions, a map at the real scale can be built and used for localization.
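S105 then reduces to a ratio of norms; a minimal sketch:

```python
import numpy as np

def scale_factor(t_actual, t_normalized):
    """Scale factor = norm of the metric translation / norm of the normalized translation."""
    return np.linalg.norm(t_actual) / np.linalg.norm(t_normalized)
```

Multiplying the normalized reconstruction (camera translations and map points) by this factor restores metric scale.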
本实施例提供的单目视觉重建中尺度因子的确定方法，由于指定物体的位置是固定不变的，因此，在移动机器人打滑、卡住等情况下，通过计算指定物体在第一时刻相对于单目相机的第一位姿和指定物体在第二时刻相对于单目相机的第二位姿，进而依据第一位姿和第二位姿计算得到单目相机从第一时刻到第二时刻在现实世界的实际平移量。因此，本申请提供的方法，可以有效避免因移动机器人打滑、卡住等情况导致确定出的尺度因子不准确的问题。In the method for determining the scale factor in monocular vision reconstruction provided by this embodiment, since the position of the designated object is fixed, even when the mobile robot slips or gets stuck, the first pose of the designated object relative to the monocular camera at the first moment and the second pose of the designated object relative to the monocular camera at the second moment can be calculated, and the actual real-world translation of the monocular camera from the first moment to the second moment can then be calculated from the first pose and the second pose. Therefore, the method provided by the present application can effectively avoid the problem that the determined scale factor is inaccurate due to the mobile robot slipping, getting stuck, or the like.
以上对本申请提供的单目视觉重建中尺度因子的确定方法进行了介绍,下面对本申请提供移动机器人进行介绍:The method for determining the meso-scale factor of the monocular vision reconstruction provided in the present application is described above, and the mobile robot provided in the present application is introduced below:
图4为本申请提供的移动机器人实施例一的硬件结构图。请参照图4,本实施例提供的移动机器人100,可以包括单目相机410和处理器420;其中,FIG. 4 is a hardware structural diagram of a first embodiment of a mobile robot provided in this application. Referring to FIG. 4, the mobile robot 100 provided in this embodiment may include a monocular camera 410 and a processor 420. Among them,
所述单目相机410,用于获取指定物体在第一时刻的第一图像以及所述指定物体在第二时刻的第二图像;The monocular camera 410 is configured to acquire a first image of a designated object at a first moment and a second image of the designated object at a second moment;
所述处理器420,用于:The processor 420 is configured to:
对所述第一图像和所述第二图像进行特征点提取和匹配,并依据配对好的特征点计算所述单目相机从所述第一时刻到所述第二时刻的归一化平移向量;Performing feature point extraction and matching on the first image and the second image, and calculating a normalized translation vector of the monocular camera from the first moment to the second moment according to the paired feature points ;
计算所述指定物体在所述第一时刻相对于所述单目相机的第一位姿,以及所述指定物体在所述第二时刻相对于所述单目相机的第二位姿;Calculating a first pose of the designated object relative to the monocular camera at the first moment, and a second pose of the designated object relative to the monocular camera at the second moment;
依据所述第一位姿和所述第二位姿,计算所述单目相机从所述第一时刻到所述第二时刻的实际平移向量;Calculating an actual translation vector of the monocular camera from the first moment to the second moment according to the first pose and the second pose;
将所述实际平移向量的模与所述归一化平移向量的模的比值确定为本设备单目视觉重建中的尺度因子。The ratio of the modulus of the actual translation vector to the modulus of the normalized translation vector is determined as a scale factor in the monocular vision reconstruction of the device.
本实施例的移动机器人,可用于执行图1所示方法实施例的技术方案,其实现原理和技术效果类似,此处不再赘述。The mobile robot of this embodiment may be used to execute the technical solution of the method embodiment shown in FIG. 1, and the implementation principles and technical effects thereof are similar, and details are not described herein again.
进一步地,所述处理器420,具体用于:Further, the processor 420 is specifically configured to:
针对第一图像和第二图像中的每帧图像,从该帧图像中获取所述指定物体上的指定点的像素坐标;所述指定点的数量大于或者等于4;For each frame of the first image and the second image, obtain pixel coordinates of a specified point on the specified object from the frame image; the number of the specified points is greater than or equal to 4;
依据各个所述指定点的像素坐标,采用畸变校正算法,得到各个所述指定点畸变校正后的第一坐标;Using a distortion correction algorithm according to the pixel coordinates of each of the specified points to obtain the first coordinate after the distortion correction of each of the specified points;
依据各个所述指定点畸变校正后的第一坐标和预存的各个所述指定点在指定坐标系下的第二坐标,计算所述指定物体相对于所述单目相机的旋转矩阵和平移向量;Calculating a rotation matrix and a translation vector of the specified object relative to the monocular camera according to the first coordinates of each of the specified point distortion corrections and the pre-stored second coordinates of each of the specified points in the specified coordinate system;
依据所述指定物体相对于所述单目相机的旋转矩阵和平移向量,得到所述指定物体相对于所述单目相机的位姿。According to a rotation matrix and a translation vector of the designated object with respect to the monocular camera, a posture of the designated object with respect to the monocular camera is obtained.
进一步地,所述处理器420,具体用于:Further, the processor 420 is specifically configured to:
依据所述第一位姿和所述第二位姿,计算从所述第一时刻到所述第二时刻所述单目相机的位姿变化;Calculating a change in pose of the monocular camera from the first moment to the second moment according to the first pose and the second pose;
从所述位姿变化中获取所述实际平移向量。The actual translation vector is obtained from the pose change.
进一步地，所述处理器420，用于基于所述指定物体的属性信息，从该帧图像中识别出所述指定物体，并基于识别出的所述指定物体，获取所述指定物体上的指定点的像素坐标。Further, the processor 420 is configured to identify the designated object from the frame image based on the attribute information of the designated object, and to obtain the pixel coordinates of the designated points on the designated object based on the identified designated object.
Further, the processor 420 is specifically configured to:
calculate a first vector i, a second vector j′, a third vector k′ and a first coefficient z according to a first formula, based on the distortion-corrected first coordinates of each designated point and the pre-stored second coordinates of each designated point in the designated coordinate system;
arrange i, j′ and k′ in sequence along the row direction of a matrix to obtain the rotation matrix of the designated object relative to the monocular camera; and
calculate the translation vector of the designated object relative to the monocular camera according to i, j′, k′ and z, using a second formula;
where the first formula is:
Figure PCTCN2019101704-appb-000013
and the second formula is:
Figure PCTCN2019101704-appb-000014
The designated points include a reference designated point and target designated points; A is a matrix formed by the differences between the second coordinates of each target designated point in the designated coordinate system and the second coordinates of the reference designated point in the designated coordinate system; X is a vector formed by the differences between the x-coordinates in the distortion-corrected first coordinates of the target designated points and the x-coordinate in the distortion-corrected first coordinates of the reference designated point; and Y is a vector formed by the corresponding differences of the y-coordinates;
Here, (a₁, b₁) are the second coordinates of the reference designated point in the designated coordinate system; (x₁, y₁) are the distortion-corrected first coordinates of the reference designated point; i₁ and i₂ are the first and second elements of i; j′₁ and j′₂ are the first and second elements of j′; k′₁ and k′₂ are the first and second elements of k′; and t is the translation vector of the designated object relative to the monocular camera.
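The two formulas themselves appear only as figure images (PCTCN2019101704-appb-000013/-000014) and are not reproduced here, but the difference quantities they consume are fully defined in the text above. The following sketch builds A, X and Y exactly as described, taking the first point in each list as the reference designated point; that ordering choice and the Python/NumPy framing are assumptions:

```python
import numpy as np

def build_inputs(second_coords, first_coords):
    """second_coords: (N, 2) second coordinates of the designated points
    in the designated coordinate system; first_coords: (N, 2)
    distortion-corrected first coordinates.  Row 0 is treated as the
    reference designated point, rows 1..N-1 as target designated points.
    Returns the difference matrix A and difference vectors X, Y that the
    patent's first formula takes as input."""
    sc = np.asarray(second_coords, dtype=float)
    fc = np.asarray(first_coords, dtype=float)
    A = sc[1:] - sc[0]            # second-coordinate differences
    X = fc[1:, 0] - fc[0, 0]      # x differences after distortion correction
    Y = fc[1:, 1] - fc[0, 1]      # y differences after distortion correction
    return A, X, Y

# Four designated points (the minimum the method requires):
A, X, Y = build_inputs([[0, 0], [1, 0], [0, 1], [1, 1]],
                       [[10, 10], [12, 10], [10, 13], [12, 13]])
```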
Further, the designated object is a charging device for charging the present device; the processor 420 is configured to acquire, through the monocular camera, a first image of the designated object at a first moment and a second image of the designated object at a second moment after detecting that the present device has been disconnected from the designated object.
The present application further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of any one of the methods provided in the first aspect of the present application are implemented.
Specifically, computer-readable storage media suitable for storing computer program instructions include all forms of non-volatile memory, media and memory devices, including, for example, semiconductor memory devices (e.g., EPROM, EEPROM and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

  1. A method for determining a scale factor in monocular vision reconstruction, applied to a mobile robot, the method comprising:
    acquiring, through a monocular camera, a first image of a designated object at a first moment and a second image of the designated object at a second moment;
    performing feature point extraction and matching on the first image and the second image, and calculating a normalized translation vector of the monocular camera from the first moment to the second moment according to the matched feature points;
    calculating a first pose of the designated object relative to the monocular camera at the first moment, and a second pose of the designated object relative to the monocular camera at the second moment;
    calculating an actual translation vector of the monocular camera from the first moment to the second moment according to the first pose and the second pose; and
    determining the ratio between the norm of the actual translation vector and the norm of the normalized translation vector as the scale factor in monocular vision reconstruction.
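Numerically, the final step of the method above reduces to a ratio of vector norms. A minimal Python/NumPy sketch; the function and variable names are illustrative, not from the patent:

```python
import numpy as np

def scale_factor(t_actual, t_normalized):
    """Scale factor for monocular reconstruction: the ratio of the norm
    of the metric translation recovered from the designated object to
    the norm of the up-to-scale translation recovered from
    feature-point matching."""
    return np.linalg.norm(t_actual) / np.linalg.norm(t_normalized)

# E.g. a unit-norm normalized translation and a 0.5 m actual motion:
s = scale_factor([0.3, 0.4, 0.0], [0.6, 0.8, 0.0])
```

Multiplying the up-to-scale reconstruction by this factor restores metric units.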
  2. The method according to claim 1, wherein calculating the pose of the designated object relative to the monocular camera comprises:
    for each frame image of the first image and the second image, obtaining, from the frame image, pixel coordinates of designated points on the designated object, wherein the number of designated points is greater than or equal to 4;
    calculating, according to the pixel coordinates of each designated point, distortion-corrected first coordinates of each designated point by using a distortion correction algorithm;
    calculating a rotation matrix and a translation vector of the designated object relative to the monocular camera according to the distortion-corrected first coordinates of each designated point and pre-stored second coordinates of each designated point in a designated coordinate system; and
    obtaining the pose of the designated object relative to the monocular camera according to the rotation matrix and the translation vector.
  3. The method according to claim 1, wherein calculating the actual translation vector of the monocular camera from the first moment to the second moment according to the first pose and the second pose comprises:
    calculating a pose change of the monocular camera from the first moment to the second moment according to the first pose and the second pose; and
    obtaining the actual translation vector from the pose change.
  4. The method according to claim 2, wherein obtaining the pixel coordinates of the designated points on the designated object from the frame image comprises:
    identifying the designated object from the frame image based on attribute information of the designated object; and
    obtaining the pixel coordinates of the designated points on the designated object based on the identified designated object.
  5. The method according to claim 2, wherein calculating the rotation matrix and the translation vector of the designated object relative to the monocular camera according to the distortion-corrected first coordinates of each designated point and the pre-stored second coordinates of each designated point in the designated coordinate system comprises:
    calculating a first vector i, a second vector j′, a third vector k′ and a first coefficient z according to a first formula, based on the distortion-corrected first coordinates of each designated point and the pre-stored second coordinates of each designated point in the designated coordinate system;
    arranging i, j′ and k′ in sequence along the row direction of a matrix to obtain the rotation matrix of the designated object relative to the monocular camera; and
    calculating the translation vector of the designated object relative to the monocular camera according to i, j′, k′ and z, using a second formula;
    wherein the first formula is:
    Figure PCTCN2019101704-appb-100001
    and the second formula is:
    Figure PCTCN2019101704-appb-100002
    wherein the designated points include a reference designated point and target designated points; A is a matrix formed by the differences between the second coordinates of each target designated point in the designated coordinate system and the second coordinates of the reference designated point in the designated coordinate system; X is a vector formed by the differences between the x-coordinates in the distortion-corrected first coordinates of the target designated points and the x-coordinate in the distortion-corrected first coordinates of the reference designated point; and Y is a vector formed by the corresponding differences of the y-coordinates;
    wherein (a₁, b₁) are the second coordinates of the reference designated point in the designated coordinate system; (x₁, y₁) are the distortion-corrected first coordinates of the reference designated point; i₁ and i₂ are the first and second elements of i; j′₁ and j′₂ are the first and second elements of j′; k′₁ and k′₂ are the first and second elements of k′; and t is the translation vector of the designated object relative to the monocular camera.
  6. The method according to claim 1, wherein the designated object is a charging device for charging the present device, and acquiring the first image of the designated object at the first moment and the second image of the designated object at the second moment through the monocular camera comprises:
    after detecting that the present device has been disconnected from the designated object, acquiring, through the monocular camera, the first image of the designated object at the first moment and the second image of the designated object at the second moment.
  7. A mobile robot, comprising:
    a monocular camera configured to acquire a first image of a designated object at a first moment and a second image of the designated object at a second moment; and
    a processor configured to:
    perform feature point extraction and matching on the first image and the second image, and calculate a normalized translation vector of the monocular camera from the first moment to the second moment according to the matched feature points;
    calculate a first pose of the designated object relative to the monocular camera at the first moment, and a second pose of the designated object relative to the monocular camera at the second moment;
    calculate an actual translation vector of the monocular camera from the first moment to the second moment according to the first pose and the second pose; and
    determine the ratio of the norm of the actual translation vector to the norm of the normalized translation vector as the scale factor in monocular vision reconstruction.
  8. The mobile robot according to claim 7, wherein the processor is configured to:
    for each frame image of the first image and the second image, obtain, from the frame image, pixel coordinates of designated points on the designated object, wherein the number of designated points is greater than or equal to 4;
    calculate, according to the pixel coordinates of each designated point, distortion-corrected first coordinates of each designated point by using a distortion correction algorithm;
    calculate a rotation matrix and a translation vector of the designated object relative to the monocular camera according to the distortion-corrected first coordinates of each designated point and pre-stored second coordinates of each designated point in a designated coordinate system; and
    obtain the pose of the designated object relative to the monocular camera according to the rotation matrix and the translation vector.
  9. The mobile robot according to claim 7, wherein the processor is specifically configured to:
    calculate a pose change of the monocular camera from the first moment to the second moment according to the first pose and the second pose; and
    obtain the actual translation vector from the pose change.
  10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1-6.
PCT/CN2019/101704 2018-08-22 2019-08-21 Determination of scale factor in monocular vision-based reconstruction WO2020038386A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810961434.6 2018-08-22
CN201810961434.6A CN110858403B (en) 2018-08-22 2018-08-22 Method for determining scale factor in monocular vision reconstruction and mobile robot

Publications (1)

Publication Number Publication Date
WO2020038386A1 true WO2020038386A1 (en) 2020-02-27

Family

ID=69593088

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/101704 WO2020038386A1 (en) 2018-08-22 2019-08-21 Determination of scale factor in monocular vision-based reconstruction

Country Status (2)

Country Link
CN (1) CN110858403B (en)
WO (1) WO2020038386A1 (en)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554703B (en) * 2020-04-23 2024-03-01 北京京东乾石科技有限公司 Robot positioning method, apparatus, system and computer readable storage medium
CN111671360B (en) * 2020-05-26 2021-11-16 深圳拓邦股份有限公司 Sweeping robot position calculating method and device and sweeping robot
CN112798812B (en) * 2020-12-30 2023-09-26 中山联合汽车技术有限公司 Target speed measuring method based on monocular vision
CN113126117B (en) * 2021-04-15 2021-08-27 湖北亿咖通科技有限公司 Method for determining absolute scale of SFM map and electronic equipment
CN116704047B (en) * 2023-08-01 2023-10-27 安徽云森物联网科技有限公司 Pedestrian ReID-based calibration method for monitoring camera equipment position

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101706957A (en) * 2009-10-30 2010-05-12 无锡景象数字技术有限公司 Self-calibration method for binocular stereo vision device
WO2017114507A1 (en) * 2015-12-31 2017-07-06 清华大学 Method and device for image positioning based on ray model three-dimensional reconstruction
CN108010125A (en) * 2017-12-28 2018-05-08 中国科学院西安光学精密机械研究所 True scale three-dimensional reconstruction system and method based on line-structured light and image information
CN108090435A (en) * 2017-12-13 2018-05-29 深圳市航盛电子股份有限公司 One kind can parking area recognition methods, system and medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103234454B (en) * 2013-04-23 2016-03-30 合肥米克光电技术有限公司 A kind of self-calibrating method of image measurer
CN103278138B (en) * 2013-05-03 2015-05-06 中国科学院自动化研究所 Method for measuring three-dimensional position and posture of thin component with complex structure
CN104346829A (en) * 2013-07-29 2015-02-11 中国农业机械化科学研究院 Three-dimensional color reconstruction system and method based on PMD (photonic mixer device) cameras and photographing head
CN104732518B (en) * 2015-01-19 2017-09-01 北京工业大学 A kind of PTAM improved methods based on intelligent robot terrain surface specifications
CN105118055B (en) * 2015-08-11 2017-12-15 北京电影学院 Camera position amendment scaling method and system
CN105931222B (en) * 2016-04-13 2018-11-02 成都信息工程大学 The method for realizing high-precision camera calibration with low precision two dimensional surface target
CN106529538A (en) * 2016-11-24 2017-03-22 腾讯科技(深圳)有限公司 Method and device for positioning aircraft
CN106920259B (en) * 2017-02-28 2019-12-06 武汉工程大学 positioning method and system


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260538A (en) * 2018-12-03 2020-06-09 北京初速度科技有限公司 Positioning and vehicle-mounted terminal based on long-baseline binocular fisheye camera
CN111260538B (en) * 2018-12-03 2023-10-03 北京魔门塔科技有限公司 Positioning and vehicle-mounted terminal based on long-baseline binocular fisheye camera
CN112102406A (en) * 2020-09-09 2020-12-18 东软睿驰汽车技术(沈阳)有限公司 Monocular vision scale correction method and device and delivery vehicle
CN112686950A (en) * 2020-12-04 2021-04-20 深圳市优必选科技股份有限公司 Pose estimation method and device, terminal equipment and computer readable storage medium
CN112686950B (en) * 2020-12-04 2023-12-15 深圳市优必选科技股份有限公司 Pose estimation method, pose estimation device, terminal equipment and computer readable storage medium
CN114406985A (en) * 2021-10-18 2022-04-29 苏州迪凯尔医疗科技有限公司 Target tracking mechanical arm method, system, equipment and storage medium
CN114406985B (en) * 2021-10-18 2024-04-12 苏州迪凯尔医疗科技有限公司 Mechanical arm method, system, equipment and storage medium for target tracking

Also Published As

Publication number Publication date
CN110858403B (en) 2022-09-27
CN110858403A (en) 2020-03-03

Similar Documents

Publication Publication Date Title
WO2020038386A1 (en) Determination of scale factor in monocular vision-based reconstruction
US9420265B2 (en) Tracking poses of 3D camera using points and planes
JP5832341B2 (en) Movie processing apparatus, movie processing method, and movie processing program
US10636151B2 (en) Method for estimating the speed of movement of a camera
WO2018076154A1 (en) Spatial positioning calibration of fisheye camera-based panoramic video generating method
WO2020063708A1 (en) Method, device and system for calibrating intrinsic parameters of fisheye camera, calibration device controller and calibration tool
CN112767542A (en) Three-dimensional reconstruction method of multi-view camera, VR camera and panoramic camera
WO2021004416A1 (en) Method and apparatus for establishing beacon map on basis of visual beacons
US11082633B2 (en) Method of estimating the speed of displacement of a camera
CN108776976B (en) Method, system and storage medium for simultaneously positioning and establishing image
CN106530358A (en) Method for calibrating PTZ camera by using only two scene images
CN106228538A (en) Binocular vision indoor orientation method based on logo
CN110490943B (en) Rapid and accurate calibration method and system of 4D holographic capture system and storage medium
CN111062966B (en) Method for optimizing camera tracking based on L-M algorithm and polynomial interpolation
WO2020024684A1 (en) Method and device for modeling three-dimensional scene, electronic device, readable storage medium, and computer apparatus
JP4109075B2 (en) Method for measuring the rotation characteristics and flight characteristics of a sphere, and a device for measuring the rotation characteristics and flight characteristics of a sphere
CN110567441A (en) Particle filter-based positioning method, positioning device, mapping and positioning method
JP6922348B2 (en) Information processing equipment, methods, and programs
KR20190130407A (en) Apparatus and method for omni-directional camera calibration
JP4109076B2 (en) Measuring method of rotation amount and rotation axis direction of curved surface body, and measurement device of rotation amount and rotation axis direction of curved surface body
CN113034347B (en) Oblique photography image processing method, device, processing equipment and storage medium
JP2002109518A (en) Three-dimensional shape restoring method and system therefor
JP4886661B2 (en) Camera parameter estimation apparatus and camera parameter estimation program
CN116152121A (en) Curved surface screen generating method and correcting method based on distortion parameters
WO2018072087A1 (en) Method for realizing effect of photo being taken by others through selfie, and photographing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19852178

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19852178

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15.09.2021)
