CN117994332A - Pose determination method and device, computer readable storage medium and electronic equipment - Google Patents

Pose determination method and device, computer readable storage medium and electronic equipment

Info

Publication number
CN117994332A
Authority
CN
China
Prior art keywords
camera
color image
pose
coordinate system
feature points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211336046.1A
Other languages
Chinese (zh)
Inventor
尹赫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202211336046.1A priority Critical patent/CN117994332A/en
Priority to PCT/CN2023/118752 priority patent/WO2024087927A1/en
Publication of CN117994332A publication Critical patent/CN117994332A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a pose determination method, a pose determination apparatus, a computer-readable storage medium, and an electronic device, and relates to the technical field of computer vision. The pose determination method includes: determining the re-projection errors of matching feature points between every two images in a first color image set, where the first color image set consists of the current frame color image and the previous n frames of color images acquired by a first camera; determining the re-projection errors of matching feature points between every two images in a second color image set in combination with a transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of a second camera; and determining the target pose of the first camera when acquiring the current frame color image based on the determined re-projection errors. The present disclosure can improve the accuracy and robustness of positioning.

Description

Pose determination method and device, computer readable storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer vision, and in particular, to a pose determining method, a pose determining device, a computer readable storage medium, and an electronic apparatus.
Background
In the field of computer vision, visual positioning is a technique that uses images captured by a camera to determine the camera's position in the real world. It has important application value in fields such as augmented reality, virtual reality, robotics, and intelligent transportation.
In scenarios where multiple cameras are used for visual localization, localization accuracy may be poor.
Disclosure of Invention
The present disclosure provides a pose determination method, a pose determination device, a computer-readable storage medium, and an electronic apparatus, thereby overcoming the problem of poor visual positioning accuracy at least to some extent.
According to a first aspect of the present disclosure, there is provided a pose determination method applied to a terminal device configured with a first camera and at least one second camera, the pose determination method comprising: acquiring matching feature points between every two images in a first color image set, and determining the re-projection error of the matching feature points between every two images in the first color image set, wherein the first color image set consists of the current frame color image acquired by the first camera and the previous n frames of color images acquired by the first camera; acquiring matching feature points between every two images in a second color image set, and determining the re-projection error of the matching feature points between every two images in the second color image set in combination with a transformation matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera, wherein the second color image set consists of the current frame color image acquired by the second camera and the previous n frames of color images acquired by the second camera; and optimizing the pose to be optimized of the first camera when acquiring the current frame color image, based on the re-projection errors of the matching feature points between every two images in the first color image set and the re-projection errors of the matching feature points between every two images in the second color image set, so as to determine the target pose of the first camera when acquiring the current frame color image; wherein n is a positive integer.
According to a second aspect of the present disclosure, there is provided a pose determination apparatus configured in a terminal device, the terminal device further being configured with a first camera and at least one second camera, the pose determination apparatus comprising: a first error determination module, configured to acquire matching feature points between every two images in a first color image set and determine the re-projection error of the matching feature points between every two images in the first color image set, wherein the first color image set consists of the current frame color image acquired by the first camera and the previous n frames of color images acquired by the first camera; a second error determination module, configured to acquire matching feature points between every two images in a second color image set and determine the re-projection error of the matching feature points between every two images in the second color image set in combination with a transformation matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera, wherein the second color image set consists of the current frame color image acquired by the second camera and the previous n frames of color images acquired by the second camera; and a target pose determination module, configured to optimize the pose to be optimized of the first camera when acquiring the current frame color image, based on the re-projection errors of the matching feature points between every two images in the first color image set and the re-projection errors of the matching feature points between every two images in the second color image set, so as to determine the target pose of the first camera when acquiring the current frame color image; wherein n is a positive integer.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described pose determination method.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising a processor and a memory for storing one or more programs which, when executed by the processor, cause the processor to implement the above-described pose determination method.
In some embodiments of the present disclosure, on the one hand, the re-projection errors of the matching feature points between every two images in a first color image set, formed by the current frame color image and the previous n frames of color images acquired by a first camera, are determined; on the other hand, the re-projection errors of the matching feature points between every two images in a second color image set, formed by the current frame color image and the previous n frames of color images acquired by a second camera, are determined. Based on the determined re-projection errors, the pose to be optimized of the first camera when acquiring the current frame color image is then optimized to obtain the optimized pose. In this scheme, error constraints are established for the different cameras separately, and the pose is optimized by combining these error constraints, which improves the accuracy and robustness of positioning. In addition, the error constraints of the present disclosure depend on the matching feature points between the current frame and the previous n frames; that is, the scheme takes the correlation between adjacent frames into account, which further improves the accuracy and robustness of positioning.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort. In the drawings:
FIG. 1 shows a schematic diagram of a system architecture of a pose determination system of an embodiment of the present disclosure;
fig. 2 is a schematic diagram illustrating a placement manner of a dual camera on a terminal device according to an embodiment of the disclosure;
FIG. 3 illustrates a schematic view of the placement angle of a dual camera according to an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of various processing stages involved in a pose determination scheme of an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a pose determination method according to an exemplary embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a method of determining a pose to be optimized according to an embodiment of the disclosure;
FIG. 7 illustrates a schematic diagram of dual camera point pair matching in an embodiment of the present disclosure;
FIG. 8 illustrates a flow chart of a process of location initialization of an embodiment of the present disclosure;
FIG. 9 shows a schematic diagram of determining two planes for an embodiment of the present disclosure;
FIG. 10 shows a schematic diagram of determining a ground plane in accordance with an embodiment of the present disclosure;
FIG. 11 illustrates a flowchart of a process of determining a transformation matrix between a first camera coordinate system and a world coordinate system in an embodiment of the present disclosure;
FIG. 12 shows a schematic view of a sliding window of an embodiment of the present disclosure;
FIG. 13 illustrates a schematic diagram of determining the re-projection error between two frames in accordance with an embodiment of the present disclosure;
FIG. 14 shows a schematic diagram of the walking path of the robot dog in the test scheme of the present disclosure;
Fig. 15 schematically illustrates a block diagram of a pose determination apparatus according to an exemplary embodiment of the present disclosure;
Fig. 16 schematically shows a block diagram of a pose determination apparatus according to another exemplary embodiment of the present disclosure;
fig. 17 schematically shows a block diagram of a pose determination apparatus of still another exemplary embodiment of the present disclosure;
Fig. 18 schematically illustrates a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only and not necessarily all steps are included. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations. In addition, all of the following terms "first," "second," "third," "fourth," etc. are for the purpose of distinction only and should not be taken as limitations of the present disclosure.
Through visual positioning technology, a computer device can autonomously perceive its pose in the environment in order to perform user tasks such as tracking, monitoring, interaction, displaying images, and playing audio. Positioning accuracy therefore greatly affects how well the functions of the computer device can be realized.
In order to improve the accuracy of visual positioning of the device, the disclosed embodiments provide a new positioning scheme.
Fig. 1 shows a schematic diagram of a system architecture of a pose determination system of an embodiment of the present disclosure. Referring to fig. 1, a terminal device 1 may include a processor 100, a first camera 110, and at least one second camera 120.
The terminal device 1 may, for example, be a robot, an intelligent monitoring device, an intelligent tracking device, or the like. It may be a single apparatus or a system composed of multiple physical units.
For example, the terminal device 1 may be a robot dog. A robot dog is a form of robot with advantages such as flexibility and strong mobility, and can perform tasks such as security patrol, goods transportation, and companionship.
The first camera 110 and the at least one second camera 120, as input sensors for the pose determination scheme of the presently disclosed embodiment, may transmit the sensed color image and depth image to the processor 100.
For example, the first camera 110 and the second camera 120 may be Intel RealSense D455 cameras. The RealSense D455 consists of one RGB camera, two IR (infrared) cameras, and one IR emitter. The RGB camera outputs a color image, and the two IR cameras can output a dense depth map aligned with the color image. The FOV (field of view) of the RealSense D455 is 90° in the horizontal direction and 65° in the vertical direction.
In case the terminal device 1 comprises a first camera 110 and one second camera 120, the first camera 110 may be a left eye (left) camera and the second camera 120 may be a right eye (right) camera, in the following embodiments, the left eye camera referred to may be understood as the first camera 110 and the right eye camera referred to may be understood as the second camera 120. However, it should be understood that "left", "right", "first", "second" are merely illustrative descriptions for distinction, and in other embodiments of the present disclosure, the first camera 110 may be a right-eye camera and the second camera 120 may be a left-eye camera, which is not limited in this disclosure.
Taking the first camera 110 and the second camera 120 as two cameras in total as an example, fig. 2 shows a schematic diagram of a placement manner of the dual camera on a terminal device according to an embodiment of the disclosure. It should be understood that the placement shown in fig. 2 is merely an exemplary illustration, and that there may be multiple placement depending on the type of terminal device and camera configuration space, which is not limited by the present disclosure.
Fig. 3 shows a schematic view of the placement angle of the dual cameras according to an embodiment of the present disclosure. The first camera 110 and the second camera 120 are both mounted vertically, and their viewing angles are each 65°, corresponding to angle A and angle B in Fig. 3, respectively. The cameras can be placed so that the leftmost line of sight of the first camera 110 is parallel to the rightmost line of sight of the second camera 120; the two cameras then obtain the maximum combined field of view of 130°, corresponding to angle C in Fig. 3, with only a small common-view area between them. From this angular design, it can be determined that the first camera 110 and the second camera 120 are placed at an included angle of 115°, corresponding to angle D in Fig. 3.
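As a consistency check on these values (a hedged reading only, since Fig. 3 is not reproduced here and the notation below is ours):

$$A = B = 65^{\circ}, \qquad C = A + B = 130^{\circ}, \qquad D = 180^{\circ} - \tfrac{A + B}{2} = 115^{\circ},$$

i.e., with the inner lines of sight parallel, the optical axes diverge by $(A+B)/2 = 65^{\circ}$, and the stated 115° mounting angle D is the supplement of that divergence.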
Therefore, the first camera 110 and the second camera 120 are both mounted vertically at an included angle of 115°, and the combined field of view of the two cameras is 130° in the horizontal direction and 90° in the vertical direction. This maximizes the combined coverage of the two cameras' fields of view, effectively enlarges the field of view of the terminal device 1, and provides a fuller basis for the accuracy of the subsequent positioning algorithm.
In addition, the first camera 110 and the second camera 120 support multi-camera hardware synchronization: the two cameras can be connected by a cable and triggered by the same pulse signal to expose simultaneously, thereby achieving hardware synchronization of multiple cameras. After hardware synchronization is configured, the images fed into the subsequent positioning algorithm are captured at the same moment, which avoids the additional error caused by inconsistent capture times across multiple cameras.
After the first camera and the second camera are placed in the mode, the two cameras can be calibrated for internal parameters and external parameters respectively for subsequent algorithms. The present disclosure does not limit the calibration process.
In the pose determination scheme of the embodiment of the present disclosure, the current frame color image acquired by the first camera 110 and the first n frames of color images acquired by the first camera 110 are recorded as the first color image set. The processor 100 may obtain the matching feature points between the images in the first set of color images and determine the re-projection error of the matching feature points between the images in the first set of color images. Wherein n is a positive integer.
The current frame color image acquired by the second camera 120 and the first n frames of color images acquired by the second camera 120 are noted as a second set of color images. The processor 100 may obtain the matching feature points between the two images in the second color image set, and determine the re-projection error of the matching feature points between the two images in the second color image set by combining the transformation matrix between the first camera coordinate system of the first camera 110 and the second camera coordinate system of the second camera 120.
Next, the processor 100 may optimize the pose to be optimized when the first camera collects the current frame color image based on the re-projection error of the matching feature points between two images in the first color image set and the re-projection error of the matching feature points between two images in the second color image set, so as to determine the target pose when the first camera collects the current frame color image.
The pose to be optimized when the first camera acquires the color image of the current frame can be understood as a pre-determined rough pose, and the target pose obtained by optimizing the re-projection error is a fine pose corresponding to the rough pose. The pose accuracy of the target pose is better than that of the pose to be optimized.
In the case where the terminal device 1 is configured with a plurality of second cameras 120, the re-projection error of the matching feature points between the images in the second color image set is determined for each second camera 120. In determining the target pose, pose optimization is performed based on the re-projection errors corresponding to the first camera 110 and all the second cameras 120.
It can be appreciated that the placement positions of the first camera 110 and the second camera 120 on the terminal device 1 are fixed, and the current pose of the second camera 120 and the current pose of the terminal device 1 can be obtained under the condition that the current pose of the first camera 110 is determined.
Further, in the case where the terminal device 1 is configured with two or more cameras, any one of them may be designated as the first camera 110 in the algorithm implementation, and the remaining cameras may be designated as the second cameras 120.
Based on the pose determination scheme of the embodiment of the disclosure, the determined pose can be further optimized, and a more accurate target pose is obtained. According to the scheme, error constraints are established for different cameras respectively, and the pose is optimized by combining the error constraints, so that the accuracy and the robustness of positioning can be improved. In addition, the establishment of the error constraint of the present disclosure depends on the matching feature points between the current frame and the previous n frames, that is, the scheme considers the correlation between the adjacent frames, and further improves the accuracy and the robustness of the positioning.
In implementing the pose determination process of embodiments of the present disclosure, multiple processing stages are involved. Referring to fig. 4, the processing stages involved include, but are not limited to, a coordinate system alignment stage, a localization initialization stage, a real-time localization stage, and a pose optimization stage. The coordinate system alignment stage, the positioning initialization stage and the real-time positioning stage are configured to determine a pose to be optimized when the first camera acquires the current frame color image, and the pose optimization stage is configured to optimize the pose to be optimized to determine a target pose, namely an optimized pose, when the first camera acquires the current frame color image.
For the coordinate system alignment stage, the terminal device determines a transformation matrix between the first camera coordinate system and the world coordinate system.
First, the terminal device may construct a point cloud using the depth image output by the first camera and the depth image output by the second camera. The three-dimensional space points corresponding to the two depth images can be combined to obtain a point cloud of the three-dimensional feature points.
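A minimal sketch of this point-cloud construction, assuming pinhole intrinsics (fx, fy, cx, cy) for each camera and a known second-to-first-camera transform T_lr; the function names and NumPy-based implementation are illustrative assumptions, not the original implementation.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (in meters) into 3D points in its camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x[valid], y[valid], depth[valid]], axis=1)  # (N, 3)

def merge_point_clouds(depth_l, intr_l, depth_r, intr_r, T_lr):
    """Merge both cameras' 3D points in the first (left) camera coordinate system."""
    pts_l = depth_to_points(depth_l, *intr_l)
    pts_r = depth_to_points(depth_r, *intr_r)
    R_lr, t_lr = T_lr[:3, :3], T_lr[:3, 3]
    pts_r_in_l = pts_r @ R_lr.T + t_lr   # second-camera points -> first camera frame
    return np.vstack([pts_l, pts_r_in_l])
```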
Then, the terminal device extracts plane information from the point cloud by using a plane detection algorithm, and screens out a designated plane (such as a ground plane) according to the extracted plane information.
The terminal device may then calculate a transformation matrix from the normal vector and the gravity vector of the specified plane to achieve alignment of the first camera coordinate system with the world coordinate system.
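One standard way to build such a rotation is via the axis-angle between the plane normal and the world vertical (Rodrigues' formula). The disclosure only states that the matrix is computed from the normal vector and the gravity vector, so the sketch below, including the choice of world "up" and the fact that yaw remains unconstrained, is an assumption.

```python
import numpy as np

def rotation_normal_to_up(normal, up=np.array([0.0, 0.0, 1.0])):
    """Rotation matrix that maps the detected ground-plane normal onto the world
    'up' axis, built with Rodrigues' formula. Yaw about 'up' is left arbitrary."""
    n = normal / np.linalg.norm(normal)
    g = up / np.linalg.norm(up)
    axis = np.cross(n, g)
    s, c = np.linalg.norm(axis), float(np.dot(n, g))
    if s < 1e-8:                                   # already (anti-)aligned
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    k = axis / s
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + s * K + (1.0 - c) * (K @ K)
```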
In addition, it can be appreciated that the transformation matrix between the first camera coordinate system and the second camera coordinate system can be known based on the obtained internal and external parameter calibration results. In this case, a conversion matrix between the second camera coordinate system and the world coordinate system can be obtained, so that the alignment among the first camera coordinate system, the second camera coordinate system and the world coordinate system can be realized.
For the positioning initialization phase, the terminal device may determine the pose of the first camera when initially capturing a color image. It should be understood that the pose of the camera as referred to in this disclosure when taking an image refers to a pose in the world coordinate system.
In one aspect, the terminal device may determine a three-dimensional feature point corresponding to the initial frame color image acquired by the first camera, where the three-dimensional feature point is a feature point under the first camera coordinate system.
On the other hand, an initial rotation matrix and an initial translation vector may be set. For example, the initial rotation matrix is the identity matrix and the initial translation vector is [0, 0, 0].
After the three-dimensional feature points corresponding to the initial frame color image, the initial rotation matrix and the initial translation vector are determined, positioning initialization under the first camera coordinate system is completed.
And then, combining a conversion matrix between the first camera coordinate system and the world coordinate system determined in the coordinate system alignment stage, converting a positioning initialization result under the first camera coordinate system into a positioning initialization result under the world coordinate system, namely determining the pose of the first camera when the first camera collects the initial frame color image.
For the real-time positioning stage, the terminal device can acquire the pose of the current frame in real time in combination with the initial pose determined in the positioning initialization stage. In this process, the features of the second camera can be transferred into the first camera coordinate system and combined with the features of the first camera for pose solving, completing the pose prediction of the current frame, i.e., obtaining the pose to be optimized of the first camera when acquiring the current frame color image.
In the pose optimization stage, the terminal device can optimize the pose to be optimized of the first camera when acquiring the current frame color image, so as to obtain the target pose.
The embodiment of the disclosure provides a pose determining method for a processing procedure of a pose optimizing stage.
Fig. 5 schematically shows a flowchart of a pose determination method of an exemplary embodiment of the present disclosure. Referring to fig. 5, the pose determination method may include the steps of:
s52, obtaining matching feature points between every two images in a first color image set, and determining re-projection errors of the matching feature points between every two images in the first color image set, wherein the first color image set consists of a current frame color image acquired by a first camera and a first n frame color image acquired by the first camera.
In an exemplary embodiment of the present disclosure, a set of a current frame color image acquired by a first camera and a previous n frames of color images acquired by the first camera is denoted as a first color image set. It is understood that the first n color images are consecutive frame images before the current color image, where n is a positive integer, and the specific value of n is not limited in this disclosure. The larger n is, the higher the algorithm precision is, and the more calculation resources are consumed; the smaller n, the relatively lower the algorithm accuracy and the faster the processing speed. The value of n can be comprehensively determined based on the required precision, time consumption, processing capacity of equipment and the like.
For example, n is set to 5, and if the current frame color image is the 10 th frame image, the previous n frame color image includes the 9 th frame image, the 8 th frame image, the 7 th frame image, the 6 th frame image, and the 5 th frame image.
Under the condition that the first color image set is determined, the terminal equipment can acquire matching feature points between every two images in the first color image set. The matching feature points between every two images in the first color image set refer to the matching feature points between all image pairs in the first color image set. It should be appreciated that the image pair is not limited to adjacent frames, and any two color images in the first set of color images constitute the image pair.
It should be noted that the matching feature points between the two images are 2D-2D (two-dimensional-two-dimensional) feature points, and the terminal device may determine the matching feature points between the two images in the first color image set using a feature point matching algorithm, which is not limited in this disclosure.
After determining the matching feature points between the two images in the first color image set, the terminal device may determine a re-projection error of the matching feature points between the two images in the first color image set.
The process of determining the re-projection error of the matching feature point between the first color image and the second color image included in the first color image set will be described below by taking the first color image and the second color image as examples.
First, the terminal device may acquire feature points on the first color image that match the second color image, and record the feature points as first matching feature points. The terminal device may obtain depth information of the first matching feature point, which may be output by the first camera, or may be sensed by other depth cameras equipped with the terminal device, which is not limited by the present disclosure.
Then, the terminal device may determine a three-dimensional feature point of the first matching feature point in the world coordinate system by using the first matching feature point, depth information of the first matching feature point, and a pose of the first camera when the first camera collects the first color image.
Specifically, the first matching feature point, the depth information of the first matching feature point, and the pose of the first camera when the first camera collects the first color image may be multiplied to obtain a three-dimensional feature point of the first matching feature point under the world coordinate system.
The disclosed embodiments record feature points on the second color image that match the first color image as second matching feature points. Then, under the condition that the second matching feature point is obtained, the terminal equipment can determine the re-projection error of the matching feature point between the first color image and the second color image by using the second matching feature point, the three-dimensional feature point of the first matching feature point under the world coordinate system and the pose of the first camera when the second color image is collected.
Specifically, the three-dimensional feature point of the first matching feature point in the world coordinate system may be multiplied by the inverse of the pose of the first camera when the second color image was acquired, the result of the multiplication may be normalized, and the second matching feature point may then be subtracted from the normalized result, thereby constructing the re-projection error of the matching feature points between the first color image and the second color image.
It is understood that the first matching feature point and the second matching feature point may be normalized feature points.
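Written out in notation of our own (not the original's), with $\hat{p}_1$, $\hat{p}_2$ the normalized first and second matching feature points, $d_1$ the depth of $\hat{p}_1$, $T_{w1}$, $T_{w2}$ the poses of the first camera when acquiring the first and second color images, and $\pi(\cdot)$ denoting normalization by the depth component, the construction described above amounts to:

$$P_w = T_{w1}\big(d_1\,\hat{p}_1\big), \qquad e_{12} = \pi\!\big(T_{w2}^{-1} P_w\big) - \hat{p}_2 .$$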
S54, obtaining matching feature points between every two images in a second color image set, and determining a re-projection error of the matching feature points between every two images in the second color image set by combining a conversion matrix between a first camera coordinate system of a first camera and a second camera coordinate system of a second camera, wherein the second color image set consists of a current frame color image acquired by the second camera and a previous n frame color image acquired by the second camera.
In an exemplary embodiment of the present disclosure, a set of the current frame color image acquired by the second camera and the first n frames of color images acquired by the second camera is denoted as a second set of color images. Here, n is the same as n in step S52.
And under the condition that the second color image set is determined, the terminal equipment can acquire the matching characteristic points between every two images in the second color image set. Similarly, the second set of color images of the present disclosure is not limited to adjacent frames, and any two of the second set of color images are the two images. The present disclosure is also not limited in the manner in which the matching feature points are determined.
After determining the matching feature points between every two images in the second color image set, the terminal device may determine a re-projection error of the matching feature points between every two images in the second color image set in combination with a transformation matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera.
The process of determining the re-projection error of the matching feature point between the third color image and the fourth color image included in the second color image set will be described below by taking the third color image and the fourth color image as an example.
First, the terminal device may acquire feature points on the third color image that match the fourth color image, and record the feature points as third matching feature points. The terminal device may obtain depth information of the third matching feature point, which may be output by the second camera, or may be sensed by another depth camera equipped with the terminal device, which is not limited by the present disclosure.
Next, the terminal device may determine a three-dimensional feature point of the third matching feature point in the world coordinate system using the third matching feature point, depth information of the third matching feature point, a transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, and a pose of the first camera when the second camera acquires the third color image.
Specifically, the third matching feature point, depth information of the third matching feature point, a transformation matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera, and a pose of the first camera when the second camera collects the third color image may be multiplied, so as to obtain a three-dimensional feature point of the third matching feature point under the world coordinate system.
The embodiment of the disclosure marks the feature points on the fourth color image that match the third color image as fourth matching feature points. Then, under the condition that the fourth matching feature point is obtained, the terminal equipment can determine the re-projection error of the matching feature point between the third color image and the fourth color image by using the fourth matching feature point, the three-dimensional feature point of the third matching feature point under the world coordinate system, the pose of the first camera when the second camera collects the fourth color image and the conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera.
Specifically, the three-dimensional feature point of the third matching feature point under the world coordinate system, the inverse of the pose of the first camera when the second camera collects the fourth color image, and the inverse of the conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera may be multiplied, the result of the multiplication is normalized, and then the normalized result is subtracted from the fourth matching feature point, so as to construct the reprojection error of the matching feature point between the third color image and the fourth color image.
It is understood that the third matching feature point and the fourth matching feature point may be normalized feature points.
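In the same notation, with $\hat{p}_3$, $\hat{p}_4$ the normalized third and fourth matching feature points, $d_3$ the depth of $\hat{p}_3$, $T_{w3}$, $T_{w4}$ the poses of the first camera when the second camera acquired the third and fourth color images, and $T_{lr}$ the transformation matrix between the first and second camera coordinate systems, the construction described above amounts to:

$$P_w = T_{w3}\,T_{lr}\big(d_3\,\hat{p}_3\big), \qquad e_{34} = \pi\!\big(T_{lr}^{-1}\,T_{w4}^{-1} P_w\big) - \hat{p}_4 .$$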
S56, optimizing the pose to be optimized when the first camera collects the color image of the current frame based on the re-projection errors of the matching feature points between every two images in the first color image set and the re-projection errors of the matching feature points between every two images in the second color image set, so as to determine the target pose when the first camera collects the color image of the current frame.
In an exemplary embodiment of the present disclosure, the terminal device may accumulate the re-projection errors determined in step S52 and step S54 to obtain a total error function. It will be appreciated that the total error function is a non-linear function that varies with the pose of the first camera when acquiring each of the first set of color images. Since the first set of color images includes the current frame of color images, the pose of the first camera when acquiring each color image in the first set of color images includes the pose to be optimized when the first camera acquires the current frame of color images.
The terminal device can minimize the total error function iteratively; when the total error function reaches its minimum, the target pose of the first camera when acquiring the current frame color image is obtained, i.e., the optimized value of the pose to be optimized.
Specifically, the Jacobian matrix of each error in the total error function about the optimization variable can be solved, and iterative optimization is performed in a nonlinear optimization mode to minimize the total error function, so that the accurate pose of the first camera when the first camera collects the color image of the current frame is finally determined.
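A minimal sketch of this iterative minimization using SciPy's nonlinear least-squares solver. The parameterization (one 6-DoF axis-angle-plus-translation vector per frame), the layout of `observations`, and all names are illustrative assumptions; the disclosure only requires that the Jacobians be formed and the total error function be minimized by nonlinear optimization (the solver here differentiates numerically).

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def pose_from_vec(x6):
    """6-vector (axis-angle rotation, translation) -> 4x4 camera-to-world pose."""
    T = np.eye(4)
    T[:3, :3] = Rotation.from_rotvec(x6[:3]).as_matrix()
    T[:3, 3] = x6[3:]
    return T

def total_residuals(x, observations):
    """Stacked re-projection residuals over all matched point pairs.

    observations: list of (j, P_w, p_j) where P_w is a 3D point in the world
    frame lifted from some frame i, and p_j is its matching normalized 2D
    observation in frame j. x holds one 6-DoF pose per frame in the window.
    """
    poses = [pose_from_vec(x[6 * k:6 * k + 6]) for k in range(len(x) // 6)]
    res = []
    for j, P_w, p_j in observations:
        P_c = np.linalg.inv(poses[j]) @ np.append(P_w, 1.0)  # world -> camera j
        res.extend(P_c[:2] / P_c[2] - p_j)                   # normalized residual
    return np.asarray(res)

# x0: initial poses from the real-time localization stage (current frame = last block)
# result = least_squares(total_residuals, x0, args=(observations,))
# result.x[-6:] then encodes the refined (target) pose of the current frame.
```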
In the above processing procedure, the pose to be optimized when the first camera collects the color image of the current frame is involved. Specifically, the first color image may be a current frame color image, and in this case, the pose of the first camera when the first camera collects the first color image is a pose to be optimized when the first camera collects the current frame color image.
It should be noted that the pose to be optimized may be a predetermined pose, and the present disclosure further provides a method for determining the pose to be optimized. This process is explained below with reference to fig. 6.
S602, acquiring a current frame color image acquired by a first camera, and determining a first two-dimensional feature point matched with a previous frame color image acquired by the first camera on the current frame color image acquired by the first camera.
After the current frame color image acquired by the first camera is acquired, the terminal device may extract feature points from the current frame color image acquired by the first camera.
Feature extraction algorithms employed by exemplary embodiments of the present disclosure may include, but are not limited to, the FAST feature point detection algorithm, the DoG feature point detection algorithm, the Harris feature point detection algorithm, the SIFT feature point detection algorithm, the SURF feature point detection algorithm, and the like. The feature descriptors may include, but are not limited to, the BRIEF feature point descriptor, the BRISK feature point descriptor, the FREAK feature point descriptor, and the like.
According to one embodiment of the present disclosure, the combination of feature extraction algorithm and feature descriptor may be the FAST feature point detection algorithm and the BRIEF feature point descriptor. According to further embodiments of the present disclosure, the combination may be the DoG feature point detection algorithm and the FREAK feature point descriptor.
It should be understood that different combinations may be used for different texture scenes. For example, for strong-texture scenes, the FAST feature point detection algorithm and the BRIEF feature point descriptor may be used for feature extraction; for weak-texture scenes, the DoG feature point detection algorithm and the FREAK feature point descriptor may be used.
In the processing of the previous frame color image corresponding to the current frame color image, there is also a process of extracting feature points. Therefore, the terminal equipment can determine the two-dimensional characteristic points matched between the two images by utilizing the characteristic points of the current frame of color image acquired by the first camera and the characteristic points of the previous frame of color image acquired by the first camera, namely the first two-dimensional characteristic points.
Specifically, an optical flow method may be used to determine a matching relationship of the feature points, that is, the feature points of the current frame of color image collected by the first camera and the feature points of the previous frame of color image collected by the first camera are used to perform optical flow tracking, so as to determine the first two-dimensional feature points. In addition, other image matching methods may be employed to determine 2D-2D feature point pairs, which are not limited by the present disclosure.
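A minimal sketch of this 2D-2D matching step using OpenCV's pyramidal Lucas-Kanade optical flow; the use of OpenCV, the FAST detector settings, and all names are illustrative assumptions rather than the original implementation.

```python
import cv2
import numpy as np

def detect_features(gray, max_corners=500):
    """Detect FAST corners and return them as an (N, 1, 2) float32 array."""
    fast = cv2.FastFeatureDetector_create(threshold=20)
    kps = sorted(fast.detect(gray, None), key=lambda k: -k.response)[:max_corners]
    return np.array([kp.pt for kp in kps], dtype=np.float32).reshape(-1, 1, 2)

def track_features(prev_gray, curr_gray, prev_pts):
    """Track previous-frame feature points into the current frame; the surviving
    pairs are the matched 2D-2D feature points (e.g., the first two-dimensional
    feature points for the first camera)."""
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None, winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    return prev_pts[ok], curr_pts[ok]
```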
S604, acquiring a current frame color image acquired by the second camera, and determining a second two-dimensional feature point matched with a previous frame color image acquired by the second camera on the current frame color image acquired by the second camera.
It should be understood that, in comparison with step S602, although descriptions of the current frame color image and the previous frame color image exist, the current frame color image and the previous frame color image in step S602 are acquired by the first camera, and the current frame color image and the previous frame color image in step S604 are acquired by the second camera.
After the current frame color image acquired by the second camera is acquired, the terminal device may extract feature points from the current frame color image acquired by the second camera. The feature point extraction method may be the same as that in step S602 and will not be repeated here.
The terminal device may perform optical flow tracking by using the feature point of the current frame of color image collected by the second camera and the feature point of the previous frame of color image collected by the second camera, so as to determine a second two-dimensional feature point.
S606, converting the second two-dimensional feature points into third two-dimensional feature points under the first camera coordinate system by utilizing a conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera.
In an exemplary embodiment of the present disclosure, for distinction, a camera coordinate system of a first camera is denoted as a first camera coordinate system, and a camera coordinate system of a second camera is denoted as a second camera coordinate system.
Under the condition that the placement positions of the first camera and the second camera on the terminal equipment are fixed, the first camera and the second camera are calibrated with internal parameters and external parameters in advance, and a conversion matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera can be determined from a calibration result.
The terminal device may acquire the conversion matrix between the first camera coordinate system and the second camera coordinate system together with the depth information of the second two-dimensional feature point, and determine the third two-dimensional feature point from the conversion matrix, the depth information of the second two-dimensional feature point, and the second two-dimensional feature point. The third two-dimensional feature point is the two-dimensional feature point obtained by converting the second two-dimensional feature point into the first camera coordinate system.
Specifically, the transformation matrix, the depth information of the second two-dimensional feature point, and the second two-dimensional feature point may be multiplied, and the result of the multiplication normalized, to determine the third two-dimensional feature point; here the second two-dimensional feature point in the multiplication refers to the position coordinates of the feature point. The third two-dimensional feature point can be determined using equation 1:

$$\hat{p}_j^{\,l} = \pi\!\left(T_{lr}\, d_j\, \hat{p}_j^{\,r}\right) \qquad (1)$$

where $T_{lr}$ is the transformation matrix between the first camera coordinate system and the second camera coordinate system, $d_j$ is the depth value of the second two-dimensional feature point, $\hat{p}_j^{\,r}$ is the second two-dimensional feature point, $\hat{p}_j^{\,l}$ is the resulting third two-dimensional feature point, and $\pi(\cdot)$ denotes normalization by the depth (z) component.
S608, determining the pose to be optimized when the first camera collects the current frame of color image according to the first two-dimensional feature point, the third two-dimensional feature point, the three-dimensional feature point of the last frame of color image collected by the first camera under the world coordinate system and the three-dimensional feature point of the last frame of color image collected by the second camera under the world coordinate system.
In an exemplary embodiment of the present disclosure, the first two-dimensional feature point and the third two-dimensional feature point constitute two-dimensional coordinate information, and the three-dimensional feature point of the previous color image collected by the first camera under the world coordinate system and the three-dimensional feature point of the previous color image collected by the second camera under the world coordinate system constitute three-dimensional coordinate information.
The terminal device can associate the two-dimensional coordinate information with the three-dimensional coordinate information to obtain point-pair information, solve a Perspective-n-Point (PnP) problem using the point-pair information, and determine the pose to be optimized of the first camera when acquiring the current frame color image from the solution.
PnP is a method in the field of machine vision that determines the pose of a camera, i.e., its rotation matrix and translation vector, from n 3D-2D feature point correspondences in the scene.
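A minimal sketch of this 3D-2D pose solving step with OpenCV's PnP solver. The disclosure does not prescribe a particular solver; the RANSAC variant, the pixel-coordinate convention with intrinsic matrix K, and all names below are illustrative assumptions.

```python
import cv2
import numpy as np

def solve_pose_pnp(points_3d_world, points_2d_px, K, dist=None):
    """Recover the camera pose in the world frame from 3D-2D correspondences."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d_world.astype(np.float64),
        points_2d_px.astype(np.float64),
        K, dist)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)              # world -> camera rotation
    T_cw = np.eye(4)
    T_cw[:3, :3], T_cw[:3, 3] = R, tvec.ravel()
    return np.linalg.inv(T_cw)              # camera pose T_wc (pose to be optimized)
```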
It should be noted that the process of determining the three-dimensional feature point of the previous frame color image in the world coordinate system may be performed during the processing of the current frame or may be performed during the processing of the previous frame, which is not limited by the present disclosure.
The process of determining three-dimensional feature points of the previous color image acquired by the first camera in the world coordinate system will be described.
First, the terminal device may acquire a previous frame of color image acquired by the first camera, and extract feature points of the previous frame of color image acquired by the first camera. The process of extracting the feature points is the same as that in step S602, and will not be described in detail here.
Then, the terminal device may perform spatial projection on the feature point of the previous frame of color image collected by the first camera by using the previous frame of depth image aligned with the previous frame of color image collected by the first camera, so as to obtain a three-dimensional feature point of the previous frame of color image collected by the first camera under the first camera coordinate system. Wherein the last frame depth image may be output by the first camera or may be obtained by other depth cameras equipped with the terminal device, which is not limited by the present disclosure.
In addition, to further improve the accuracy of the positioning of the present disclosure, the spatial projection process may also be constrained. Specifically, the terminal device may perform spatial projection on a feature point within a predetermined depth range in the feature points of the previous frame of color image acquired by the first camera by using the previous frame of depth image aligned with the previous frame of color image acquired by the first camera, so as to obtain a three-dimensional feature point of the previous frame of color image acquired by the first camera under the first camera coordinate system.
The predetermined depth range is determined based on the measurement range of the depth camera; its values may differ for different types and models of depth camera, and the present disclosure does not limit the specific values of the predetermined depth range. For example, feature points with depth values greater than 0.5 m and less than 6 m are spatially projected.
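A minimal sketch of this constrained spatial projection, assuming pinhole intrinsics and a depth image aligned with the color image; the 0.5-6 m range is quoted from the example above, and all names are illustrative.

```python
import numpy as np

def lift_features(pts_2d, depth_img, fx, fy, cx, cy, d_min=0.5, d_max=6.0):
    """Back-project 2D feature points into the camera frame using the aligned depth
    image, keeping only points whose depth lies within (d_min, d_max) meters."""
    pts_3d, kept = [], []
    for u, v in pts_2d:
        d = float(depth_img[int(round(v)), int(round(u))])
        if d_min < d < d_max:
            pts_3d.append([(u - cx) * d / fx, (v - cy) * d / fy, d])
            kept.append([u, v])
    return np.asarray(pts_3d), np.asarray(kept)
```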
Then, the terminal device can convert the three-dimensional feature points in the first camera coordinate system according to the pose of the first camera when the previous frame color image was acquired, so as to obtain the three-dimensional feature points of the previous frame color image acquired by the first camera in the world coordinate system. Referring to equation 2:

$$P_w^{l} = T_{w\_last}\, P_c^{l} \qquad (2)$$

where $P_w^{l}$ is the three-dimensional feature point of the previous frame color image acquired by the first camera in the world coordinate system, $P_c^{l}$ is the corresponding three-dimensional feature point in the first camera coordinate system, and $T_{w\_last}$ is the pose of the first camera when the previous frame color image was acquired.
It should be noted that, the pose of the first camera when acquiring the previous frame of color image may be determined during the processing of the previous frame of image, that is, the pose corresponding to the previous frame is known during the processing of the current frame. For initial pose, the process of positioning initialization of the present disclosure is described.
The process of determining three-dimensional feature points of the last color image acquired by the second camera in the world coordinate system will be described.
First, the terminal device may acquire a previous frame of color image acquired by the second camera, and extract feature points of the previous frame of color image acquired by the second camera. The process of extracting the feature points is the same as that in step S602, and will not be described in detail here.
Then, the terminal device may perform spatial projection on the feature point of the previous frame of color image collected by the second camera by using the previous frame of depth image aligned with the previous frame of color image collected by the second camera, so as to obtain a three-dimensional feature point of the previous frame of color image collected by the second camera under the second camera coordinate system. Wherein the last frame depth image may be output by the second camera or may be obtained by other depth cameras equipped with the terminal device, which is not limited by the present disclosure.
Similarly, to further improve the accuracy of the positioning of the present disclosure, the spatial projection process may also be constrained. Specifically, the terminal device may perform spatial projection on a feature point within a predetermined depth range in the feature points of the previous frame of color image acquired by the second camera by using the previous frame of depth image aligned with the previous frame of color image acquired by the second camera, so as to obtain a three-dimensional feature point of the previous frame of color image acquired by the second camera under the second camera coordinate system.
The predetermined depth range is determined based on the measuring range of the depth camera; the values of the predetermined depth range may differ depending on the type and model of the depth camera, and the specific values of the predetermined depth range are not limited in the present disclosure. For example, feature points with depth values greater than 0.5m and less than 6m are spatially projected.
The terminal device may then convert the three-dimensional feature points of the last frame of color image acquired by the second camera under the second camera coordinate system into three-dimensional feature points under the first camera coordinate system using a conversion matrix between the first camera coordinate system and the second camera coordinate system.
And then, the terminal equipment can convert the three-dimensional characteristic points under the converted first camera coordinate system again according to the pose of the first camera when the last frame of color image is acquired, so as to obtain the three-dimensional characteristic points of the last frame of color image acquired by the second camera under the world coordinate system.
The above procedure is described below with reference to equation 3:
P_w_last_r = T_w_last × T_lr × P_r_last (equation 3)
Wherein, P_w_last_r denotes the three-dimensional feature points of the previous frame of color image acquired by the second camera under the world coordinate system, P_r_last denotes the three-dimensional feature points of the previous frame of color image acquired by the second camera under the second camera coordinate system, T_w_last is the pose of the first camera when the previous frame of color image is acquired, and T_lr is the conversion matrix between the first camera coordinate system and the second camera coordinate system.
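As an illustration, the following is a minimal numpy sketch of equations 2 and 3; the pose T_w_last and the extrinsic T_lr are assumed to be 4×4 homogeneous (camera-to-world and second-camera-to-first-camera) transforms, and all variable names are hypothetical placeholders rather than names used in this disclosure.

```python
import numpy as np

def transform_points(T, pts):
    """Apply a 4x4 homogeneous transform T to an (N, 3) array of 3D points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])      # homogeneous (N, 4)
    return (T @ pts_h.T).T[:, :3]

# Placeholder poses/extrinsics for illustration only.
T_w_last = np.eye(4)               # pose of the first camera at the previous frame (camera -> world)
T_lr = np.eye(4)                   # extrinsic: second camera -> first camera
P_c_last = np.random.rand(50, 3)   # 3D feature points, first camera coordinate system
P_r_last = np.random.rand(50, 3)   # 3D feature points, second camera coordinate system

# Equation 2: first camera's points, camera frame -> world frame.
P_w_first = transform_points(T_w_last, P_c_last)

# Equation 3: second camera's points pass through T_lr first, then the same pose.
P_w_second = transform_points(T_w_last @ T_lr, P_r_last)
```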
In combination with the above point-to-point matching relationship, fig. 7 shows a schematic diagram of the feature point matching between the first camera and the second camera used to implement the PnP pose solving, involving both the 2D-2D feature point matching relationship and the 3D-2D feature point matching relationship of the current frame.
In the process of determining the three-dimensional characteristic points of the previous frame of color image under the world coordinate system, the pose of the first camera when the previous frame of color image is acquired is utilized. The following describes a determination process of the initial pose of the first camera.
According to some embodiments of the present disclosure, first, a terminal device may acquire an initial frame color image acquired by a first camera and extract feature points of the initial frame color image acquired by the first camera. The process of extracting the feature points is the same as that in step S602, and will not be described in detail here.
Then, the terminal device may spatially project the feature points of the initial frame color image acquired by the first camera by using the initial frame depth image aligned with the initial frame color image acquired by the first camera, so as to obtain three-dimensional feature points of the initial frame color image acquired by the first camera in the first camera coordinate system.
Similarly, to further improve the accuracy of the positioning of the present disclosure, the spatial projection process may also be constrained. Specifically, the terminal device may perform spatial projection by using a feature point within a predetermined depth range from the feature points of the initial frame color image acquired by the first camera, so as to obtain a three-dimensional feature point of the initial frame color image acquired by the first camera under the first camera coordinate system.
The predetermined depth range is determined based on the measuring range of the depth camera; the values of the predetermined depth range may differ depending on the type and model of the depth camera, and the specific values of the predetermined depth range are not limited in the present disclosure. For example, feature points with depth values greater than 0.5m and less than 6m are spatially projected.
Then, the terminal equipment can determine an initial positioning result of the first camera under the first camera coordinate system according to the three-dimensional feature points, the initial rotation matrix and the initial translation vector of the initial frame color image collected by the first camera under the first camera coordinate system.
In one embodiment of the present disclosure, the initial rotation matrix may be set as an identity matrix and the initial translation vector may be set to [0, 0, 0].
It should be noted that, when the three-dimensional feature points of the initial frame color image acquired by the first camera under the first camera coordinate system, the initial rotation matrix, and the initial translation vector are known, the pose determined at this point is only the pose of the first camera under the first camera coordinate system. In order to obtain a pose applicable to the subsequent processing of the current frame, this pose needs to be converted into the pose of the first camera under the world coordinate system.
Specifically, the terminal device may convert the initial positioning result of the first camera under the first camera coordinate system by using a conversion matrix between the first camera coordinate system and the world coordinate system, so as to determine the pose of the first camera when the first camera collects the initial frame color image.
In accordance with further embodiments of the present disclosure, the determination of the initial pose for the first camera may also be combined with the feature data of the second camera, as will be described below.
In one aspect, the terminal device may determine three-dimensional feature points of the initial frame color image acquired by the first camera in the first camera coordinate system.
On the other hand, the terminal device may acquire an initial frame color image acquired by the second camera, and extract feature points of the initial frame color image acquired by the second camera. The process of extracting the feature points is the same as that in step S602, and will not be described in detail here.
The terminal device may perform spatial projection on the feature points of the initial frame color image acquired by the second camera by using the initial frame depth image aligned with the initial frame color image acquired by the second camera, so as to obtain three-dimensional feature points of the initial frame color image acquired by the second camera under the second camera coordinate system.
Similarly, the spatial projection process may also be constrained. Specifically, the terminal device may perform spatial projection by using a feature point within a predetermined depth range from the feature points of the initial frame color image acquired by the second camera, so as to obtain a three-dimensional feature point of the initial frame color image acquired by the second camera under the second camera coordinate system.
The predetermined depth range is determined based on the measuring range of the depth camera; the values of the predetermined depth range may differ depending on the type and model of the depth camera, and the specific values of the predetermined depth range are not limited in the present disclosure. For example, feature points with depth values greater than 0.5m and less than 6m are spatially projected.
Next, the terminal device may convert the three-dimensional feature point of the initial frame color image acquired by the second camera under the second camera coordinate system to the three-dimensional feature point under the first camera coordinate system using a conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera.
The converted three-dimensional feature points and the three-dimensional feature points of the initial frame color image acquired by the first camera under the first camera coordinate system can be combined to obtain combined three-dimensional feature points. It is understood that the three-dimensional feature points after the merging are three-dimensional feature points in the first camera coordinate system.
Then, the terminal equipment can determine an initial positioning result of the first camera under the first camera coordinate system according to the combined three-dimensional feature points, the initial rotation matrix and the initial translation vector. For example, the initial rotation matrix may be set as an identity matrix and the initial translation vector may be set to [0, 0, 0].
Then, the terminal device can utilize a conversion matrix between the first camera coordinate system and the world coordinate system to convert an initial positioning result of the first camera under the first camera coordinate system so as to determine the pose of the first camera when the first camera collects the initial frame color image.
The process of location initialization of the embodiment of the present disclosure will be described below with reference to fig. 8.
In step S802, the terminal device may acquire an initial frame color image acquired by the first camera, and extract feature points of the initial frame color image acquired by the first camera.
In step S804, the terminal device may perform spatial projection in combination with the depth image aligned with the initial frame color image acquired by the first camera, so as to obtain three-dimensional feature points of the initial frame color image acquired by the first camera in the first camera coordinate system. As described in the above embodiment, the three-dimensional feature points determined in step S804 may further include three-dimensional feature points corresponding to the color image of the initial frame acquired by the second camera.
In step S806, the terminal device may determine an initial positioning result of the first camera under the first camera coordinate system according to the three-dimensional feature point, the initial rotation matrix, and the initial translation vector determined in step S804.
In step S808, the terminal device may convert the initial positioning result by using the conversion matrix between the first camera coordinate system and the world coordinate system, so as to determine the pose of the first camera when the first camera collects the initial frame color image, and complete the positioning initialization.
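A minimal sketch of this initialization flow (steps S802 to S808) is given below, assuming the feature points, depths and intrinsics are available as numpy arrays; the helper names and data layout are hypothetical and only illustrate the order of operations, not the exact implementation of this disclosure.

```python
import numpy as np

def initialize_pose(kp_first, depth_first, K_first, R_wc,
                    kp_second=None, depth_second=None, K_second=None, T_lr=None):
    """Sketch of positioning initialization (steps S802-S808).

    kp_*:    (N, 2) pixel coordinates of extracted feature points
    depth_*: (N,)   depth values aligned with those feature points
    K_*:     (3, 3) camera intrinsic matrices
    R_wc:    (3, 3) rotation aligning the first camera frame with the world frame
    T_lr:    (4, 4) transform from the second camera frame to the first camera frame
    """
    def back_project(kp, depth, K):
        kp_h = np.hstack([kp, np.ones((len(kp), 1))])          # homogeneous pixels
        return depth[:, None] * (np.linalg.inv(K) @ kp_h.T).T  # formula 6: P = z * K^-1 * p

    pts = back_project(kp_first, depth_first, K_first)          # S804, first camera

    if kp_second is not None:                                   # optionally merge second camera
        pts_r = back_project(kp_second, depth_second, K_second)
        pts_r_h = np.hstack([pts_r, np.ones((len(pts_r), 1))])
        pts = np.vstack([pts, (T_lr @ pts_r_h.T).T[:, :3]])

    # S806: initial positioning result in the first camera frame
    # (identity rotation, zero translation).
    T_init = np.eye(4)

    # S808: convert to the world frame using the camera/world alignment.
    T_cw = np.eye(4)
    T_cw[:3, :3] = R_wc
    return T_cw @ T_init, pts

# Hypothetical usage with already-extracted data:
# T_init_world, pts3d = initialize_pose(kp, depth, K, np.eye(3))
```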
In the above process, the conversion matrix between the first camera coordinate system and the world coordinate system is utilized. For determining this conversion matrix in advance, the embodiment of the present disclosure provides a coordinate system alignment scheme. Specifically, the coordinate system alignment is achieved in combination with depth information; for the sake of distinction, the term reference depth image is used in the following embodiments when describing the coordinate system alignment process.
First, the terminal device may acquire a reference depth image output by the first camera.
Next, in the case where it is determined that a specified plane exists in the scene in combination with the reference depth image output by the first camera, the terminal device may determine a conversion matrix between the first camera coordinate system and the world coordinate system from a normal vector and a gravity vector of the specified plane.
Where the gravity vector may be N_g = (0, 0, 1); in this case, the designated plane is typically a ground plane, to match the scene of the terminal device, e.g., a robot dog. However, it is understood that the designated plane may also be a plane manually designated in a particular scenario, such as a wall surface, a desktop, etc., which is not limited by the present disclosure.
If the normal vector of the designated plane is denoted as N_c, and N_c coincides with N_g after being rotated by R_wc, the alignment of the first camera coordinate system with the world coordinate system is achieved, where R_wc is the conversion matrix between the first camera coordinate system and the world coordinate system. The rotation axis ω of R_wc can be obtained by the cross product of N_g and N_c, as shown in equation 4:
ω = N_g × N_c (equation 4)
The rotation angle θ of R_wc can be obtained from the dot product of N_g and N_c, as shown in equation 5:
θ = arccos(N_g · N_c) (equation 5)
The rotation axis ω and the rotation angle θ form a rotation vector between the first camera coordinate system and the world coordinate system, and the terminal device can calculate the conversion matrix R_wc between the first camera coordinate system and the world coordinate system according to the Rodrigues formula. At this point, the coordinate system alignment process ends.
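A minimal numpy sketch of equations 4 and 5 combined with the Rodrigues formula follows. It assumes unit normal and gravity vectors and follows the axis convention ω = N_g × N_c of equation 4; depending on whether R_wc is taken as camera-to-world or world-to-camera, the transpose of the result may be needed. The example normal vector is a hypothetical value.

```python
import numpy as np

def rotation_from_normal(n_c, n_g=np.array([0.0, 0.0, 1.0])):
    """Rotation built with the Rodrigues formula from the plane normal n_c
    (camera frame) and the gravity vector n_g (equations 4 and 5)."""
    n_c = n_c / np.linalg.norm(n_c)
    n_g = n_g / np.linalg.norm(n_g)

    omega = np.cross(n_g, n_c)                                  # equation 4: rotation axis
    theta = np.arccos(np.clip(np.dot(n_g, n_c), -1.0, 1.0))    # equation 5: rotation angle

    if np.linalg.norm(omega) < 1e-8:                            # vectors already (anti)parallel
        return np.eye(3)
    k = omega / np.linalg.norm(omega)
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])                            # skew-symmetric matrix of the axis
    # Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

# Example: a ground-plane normal estimated in the camera frame (hypothetical values).
R_wc = rotation_from_normal(np.array([0.05, -0.98, 0.17]))
```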
In the above processing procedure, if the specified plane does not exist in the scene, the terminal device may return to the step of acquiring the reference depth image, acquire the reference depth image again, and perform the judging procedure of whether the specified plane exists.
The determination process of the specified plane is explained below.
Firstly, the terminal equipment can combine the reference depth image output by the first camera to determine the point cloud corresponding to the first camera, and record the point cloud as the reference point cloud.
According to some embodiments of the present disclosure, the terminal device determines, for each pixel point on the reference depth image output by the first camera, a three-dimensional spatial point of each pixel point on the reference depth image according to the pixel point, a depth value of the pixel point, and a camera internal parameter of the first camera. Equation 6 gives the way in which three-dimensional points are determined here:
P = z × K⁻¹ × p (formula 6)
Wherein, P represents the three-dimensional space point projected into space, z represents the depth value of the pixel point, K⁻¹ represents the inverse of the camera internal parameter matrix, and p represents the coordinate position of the pixel point.
In these embodiments, a reference point cloud corresponding to the first camera may be constructed from the three-dimensional spatial points obtained through this process.
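Formula 6 applied to every pixel of a depth image can be sketched as follows; the intrinsic matrix values and the synthetic depth image are hypothetical example data, not parameters from this disclosure.

```python
import numpy as np

def depth_to_point_cloud(depth, K):
    """Back-project every pixel of a depth image into 3D camera coordinates
    using formula 6: P = z * K^-1 * p."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    z = depth.reshape(-1)
    valid = z > 0                                     # skip pixels with no depth measurement
    rays = (np.linalg.inv(K) @ pixels[valid].T).T     # K^-1 * p
    return rays * z[valid, None]                      # z * K^-1 * p

# Example with a synthetic depth image and a hypothetical intrinsic matrix.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
cloud = depth_to_point_cloud(np.full((480, 640), 2.0), K)
```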
According to other embodiments of the present disclosure, in one aspect, the terminal device determines, for each pixel on the reference depth image output by the first camera, a three-dimensional spatial point of each pixel on the reference depth image according to the pixel, a depth value of the pixel, and a camera reference of the first camera.
On the other hand, the terminal device may acquire the reference depth image output by the second camera, and determine a three-dimensional spatial point of each pixel point on the reference depth image output by the second camera in combination with the above formula 6.
The terminal device may convert the three-dimensional space point of each pixel point on the reference depth image output by the second camera according to the conversion matrix between the first camera coordinate system and the second camera coordinate system, so as to obtain a converted three-dimensional space point.
And combining the three-dimensional space point of each pixel point on the reference depth image output by the first camera with the converted three-dimensional space point to construct a reference point cloud corresponding to the first camera. Referring to equation 7:
Pc_mix = Pc_left + T_lr × Pc_right (formula 7)
Wherein, Pc_mix is the determined reference point cloud, Pc_right is the three-dimensional space points of each pixel on the reference depth image output by the second camera, Pc_left is the three-dimensional space points of each pixel on the reference depth image output by the first camera, and T_lr is the conversion matrix between the first camera coordinate system and the second camera coordinate system.
In these embodiments, the construction of the reference point cloud fuses the information of the depth image output by the second camera, so that the spatial feature points are more comprehensive, and the accuracy of the algorithm is improved.
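Formula 7 can be sketched as a simple concatenation after transforming the second camera's points with T_lr; the function name is a hypothetical placeholder.

```python
import numpy as np

def merge_point_clouds(pc_left, pc_right, T_lr):
    """Formula 7: bring the second camera's points into the first camera frame
    with T_lr and concatenate them with the first camera's points."""
    pc_right_h = np.hstack([pc_right, np.ones((len(pc_right), 1))])
    pc_right_in_left = (T_lr @ pc_right_h.T).T[:, :3]
    return np.vstack([pc_left, pc_right_in_left])

# Example usage with two small placeholder clouds.
pc_mix = merge_point_clouds(np.random.rand(100, 3), np.random.rand(100, 3), np.eye(4))
```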
After determining the reference point cloud corresponding to the first camera, the terminal device may extract plane information from the reference point cloud. The plane extraction method is not limited in the present disclosure; a RANSAC fitting method, a normal vector region growing method, a hierarchical clustering method, and the like can be adopted, so long as plane information in the scene can be extracted. Some embodiments of the present disclosure employ PEAC, a plane extraction algorithm based on hierarchical clustering. Referring to fig. 9, two planes are extracted in the illustrated example; fig. 9 is merely an example, and all planes in a scene may be extracted with this algorithm.
It is understood that the extracted plane information includes, but is not limited to, plane id, normal vector of the plane, distance of the plane from the camera, and the like.
After extracting the plane based on the reference point cloud, the terminal device may screen the specified plane according to the plane information of the reference point cloud. Specifically, the terminal device may screen the specified plane according to distance information of the plane included in the plane information of the reference point cloud from the first camera.
In the case where the distance information includes a distance within a predetermined distance range, the terminal device may determine candidate planes corresponding to the distance, and the number of the determined candidate planes is one or more.
In the case where the number of candidate planes is one, the terminal device may determine the candidate plane as the specified plane.
In the case where the number of candidate planes is plural, the terminal device may determine a candidate plane whose distance from the first camera is closest to the distance threshold as the specified plane. Wherein the distance threshold is within the predetermined distance range.
Fig. 10 shows a schematic diagram of screening out the ground plane: compared with the results of the plane detection, planes such as the ceiling have been eliminated by the distance-based screening process described above.
Taking the example that the terminal device is a robot dog: the terminal device is provided with a first camera and a second camera whose mounting positions are fixed, and in the implementation, the robot dog is controlled to move for a short period of time and only on the ground plane. Based on this prior condition, the position of the ground plane in the first camera coordinate system is substantially fixed. The height of the ground plane from the camera is comparable to the height of the robot dog, about 0.3m. Thereby, the above-mentioned predetermined distance range can be set to 0.25m to 0.35m for the ground plane. If a plurality of candidate planes are screened out, the plane whose distance is closest to 0.3m is taken as the ground plane.
It will be appreciated that if no ground plane is detected in this process, the terminal device is controlled to repeat the above-described process of depth image plane determination and plane screening until the terminal device detects a ground plane.
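The distance-based screening described above can be sketched as follows. The dictionary layout of the plane information and the 0.25m–0.35m / 0.3m values simply reuse the robot-dog example; all names are hypothetical placeholders.

```python
def select_ground_plane(planes, d_min=0.25, d_max=0.35, d_ref=0.30):
    """Screen candidate planes by their distance to the first camera and pick the
    one whose distance is closest to the reference height."""
    candidates = [p for p in planes if d_min <= p["distance"] <= d_max]
    if not candidates:
        return None          # no ground plane: acquire a new reference depth image and retry
    return min(candidates, key=lambda p: abs(p["distance"] - d_ref))

# Example plane list, e.g. produced by a plane-extraction step such as PEAC.
planes = [
    {"id": 0, "normal": (0.0, -1.0, 0.1), "distance": 0.29},   # likely the ground plane
    {"id": 1, "normal": (0.0, 1.0, 0.0), "distance": 2.10},    # ceiling, filtered out
]
ground = select_ground_plane(planes)
```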
The process of coordinate system alignment of the embodiment of the present disclosure is described below with reference to fig. 11.
In step S1102, the terminal device acquires a reference depth image output by the first camera, and back-projects the reference depth image to obtain a three-dimensional spatial point in space.
In step S1104, the terminal device acquires a reference depth image output by the second camera, and back-projects the reference depth image to obtain a three-dimensional spatial point in space.
In step S1106, the terminal device converts the three-dimensional space point obtained in step S1104 to a three-dimensional space point under the first camera coordinate system.
In step S1108, the terminal device merges the three-dimensional space point obtained in step S1102 with the three-dimensional space point obtained in step S1106, so as to obtain a reference point cloud corresponding to the first camera.
In step S1110, the terminal device may extract plane information based on the reference point cloud.
In step S1112, the terminal device may screen the extracted planes to determine the ground plane.
In step S1114, the terminal device may determine a transformation matrix between the first camera coordinate system and the world coordinate system using the normal vector and the gravity vector of the ground plane to complete alignment of the first camera coordinate system and the world coordinate system.
In addition, in view of the fact that the relation between the first camera coordinate system and the second camera coordinate system is determined through calibration, a conversion matrix between the second camera coordinate system and the world coordinate system can be obtained, and therefore alignment of the first camera coordinate system, the second camera coordinate system and the world coordinate system can be achieved. Thus, the coordinate system alignment result can be applied to the pose determination process of the present disclosure.
In the above-described process of determining the pose to be optimized according to the embodiment of the present disclosure, the pose calculation is performed by converting the feature points collected by the second camera into the first camera coordinate system and processing them together with the feature points collected by the first camera. Because the feature points come from at least two cameras and are unified into one coordinate system, more feature points are collected, i.e. the feature points participating in the unified processing are more comprehensive, so the determined pose is more accurate and the positioning accuracy is improved. In addition, in the pose determining process, the correlation between frames is considered: the feature information of the previous frame of image is combined and the data of the previous frame is used as a constraint, which further improves the positioning accuracy.
It should be noted that the above process of determining the pose to be optimized is only described as an example, and in a case where the terminal device is equipped with an IMU (Inertial Measurement Unit ), the pose to be optimized may also be determined in combination with inertial data sensed by the IMU, which is not limited in this disclosure.
In addition, in the embodiment of the present disclosure, a sliding window may be maintained to implement the above-mentioned pose determining method.
Specifically, when determining the pose to be optimized when the first camera collects the color image of the current frame, the terminal device may add the color image group of the current frame into the sliding window, so as to determine the target pose when the first camera collects the color image of the current frame by combining the first color image set and the second color image set contained in the sliding window. The current frame color image group comprises a current frame color image acquired by the first camera and a current frame color image acquired by the second camera. In implementation, the sliding window may be implemented using an array.
That is, after the current frame color image group is added to the sliding window, the processing of steps S52 to S56 described above may be performed.
The sliding window referred to in this disclosure may be a fixed size sliding window, the sliding window size being characterized by the maximum number of color image sets that can be included. If the sliding window is configured to be able to contain a maximum of 10 color image groups, the sliding window has a size of 10.
When adding the color image group of the current frame to the sliding window, if the number of color image groups included in the sliding window is equal to the maximum value of the color image groups that can be included in the sliding window, the color image group that is added to the sliding window earliest is removed from the sliding window. For example, the size of the sliding window is 10, and when the current frame color image group is added to the sliding window, the first color image group in the sliding window is shifted out, so that after the current frame color image group is added to the sliding window, 10 color image groups are still contained in the sliding window.
When adding the current frame color image group into the sliding window, if the number of color image groups contained in the sliding window is smaller than the maximum value of the color image groups contained in the sliding window, the pose determining process based on the first color image set and the second color image set is directly performed by using all the color image groups contained in the sliding window.
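As an illustration of the sliding-window maintenance described above, the following sketch keeps at most 10 color image groups and automatically discards the earliest group when a new one arrives; the class and method names are hypothetical, not part of this disclosure.

```python
from collections import deque

class SlidingWindow:
    """Fixed-size window of color image groups; the oldest group is dropped
    automatically once the maximum size (10 in this example) is reached."""
    def __init__(self, max_size=10):
        self.groups = deque(maxlen=max_size)

    def add(self, first_cam_image, second_cam_image):
        # A "color image group" holds the current frames from both cameras.
        self.groups.append((first_cam_image, second_cam_image))

    def all_groups(self):
        # All groups currently in the window take part in the pose optimization.
        return list(self.groups)
```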
The pose determining method according to the embodiment of the present disclosure will be described below taking the case where the size of the sliding window is configured to 10 as an example.
Referring to fig. 12, when the current frame color image group is added to the sliding window and the sliding window is already full, the earliest image group in the sliding window is shifted out. That is, as color image groups are added to the sliding window over time, the sliding window shifts out its first color image group whenever the algorithm inputs the current frame color image group.
After the current frame color image set is added to the sliding window, the pose can be optimally solved using the constraints provided by the 10 color image sets within the sliding window. It should be appreciated that each color image group within the sliding window includes a color image captured by the first camera and a color image captured by the second camera.
The process of constructing the error constraint will be described below by taking the 9 th color image group and the 10 th color image group in the sliding window as examples. For each group, the following processing may be performed.
Referring to fig. 13, the feature points in the color image captured by the first camera in the 9th color image group that match the color image captured by the first camera in the 10th color image group are noted as p_i^left9. Correspondingly, the feature points in the color image captured by the first camera in the 10th color image group that match the color image captured by the first camera in the 9th color image group are noted as p_i^left10.
The feature points in the color image captured by the second camera in the 9th color image group that match the color image captured by the second camera in the 10th color image group are noted as p_j^right9. Correspondingly, the feature points in the color image captured by the second camera in the 10th color image group that match the color image captured by the second camera in the 9th color image group are noted as p_j^right10.
For the first camera, the three-dimensional feature point P_i^w of p_i^left9 under the world coordinate system can be expressed as formula 8:
P_i^w = T_w9 × (d_i × K⁻¹ × p_i^left9) (formula 8)
Wherein, d_i is the depth value of the feature point p_i^left9, K⁻¹ is the inverse of the camera internal parameter matrix of the first camera, and T_w9 is the pose of the first camera when acquiring the 9th color image group; it can be understood that the pose is the pose of the first camera with respect to the world coordinate system.
Thus, the re-projection error e_left of the feature point can be constructed, as shown in equation 9:
e_left = p_i^left10 − K × norm(T_w10⁻¹ × P_i^w) (equation 9)
where norm() represents the normalization of the three-dimensional feature point (division by its depth component), K is the camera internal parameter matrix of the first camera, and T_w10 is the pose of the first camera when the 10th color image group is acquired.
For the second camera, the three-dimensional feature point P_j^w of p_j^right9 under the world coordinate system can be expressed as formula 10:
P_j^w = T_w9 × T_lr × (d_j × K⁻¹ × p_j^right9) (formula 10)
Wherein, d_j is the depth value of the feature point p_j^right9, K⁻¹ is the inverse of the camera internal parameter matrix of the second camera, and T_lr is the conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera.
Thus, the re-projection error e_right of the feature point can be constructed, as shown in equation 11:
e_right = p_j^right10 − K × norm(T_lr⁻¹ × T_w10⁻¹ × P_j^w) (equation 11)
The above-described processing is performed for each set of color images contained within the sliding window, resulting in multiple re-projection errors for the first camera and multiple re-projection errors for the second camera.
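The following sketch mirrors the reconstructed formulas 8 to 11 for a single matched feature point pair, under the simplifying assumptions that both cameras share one intrinsic matrix K and that all poses are 4×4 camera-to-world transforms; the function names are hypothetical and this is an illustration, not the optimizer's actual residual code.

```python
import numpy as np

def project(K, T_wc, P_w):
    """Project a world point into a camera whose pose (camera -> world) is T_wc."""
    P_c = (np.linalg.inv(T_wc) @ np.append(P_w, 1.0))[:3]
    return (K @ (P_c / P_c[2]))[:2]                  # norm(): divide by depth, then apply K

def reprojection_errors(K, T_w9, T_w10, T_lr,
                        p_left9, d_i, p_left10, p_right9, d_j, p_right10):
    """e_left / e_right for one matched pair of the 9th and 10th color image groups."""
    # Formula 8: lift the first camera's 9th-frame feature point into the world frame.
    P_i_w = (T_w9 @ np.append(d_i * (np.linalg.inv(K) @ np.append(p_left9, 1.0)), 1.0))[:3]
    e_left = p_left10 - project(K, T_w10, P_i_w)              # equation 9

    # Formula 10: the second camera's point additionally passes through T_lr.
    P_j_w = (T_w9 @ T_lr @ np.append(d_j * (np.linalg.inv(K) @ np.append(p_right9, 1.0)), 1.0))[:3]
    e_right = p_right10 - project(K, T_w10 @ T_lr, P_j_w)     # equation 11
    return e_left, e_right
```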
Next, these error constraints are accumulated to construct a total error function e_total for pose optimization, as shown in equation 12:
e_total = Σ_{i∈left_factor} ‖e_i‖² + Σ_{j∈right_factor} ‖e_j‖² (equation 12)
The variables to be optimized are the 10 poses T_w1 … T_w10 of the first camera corresponding to the color images in the sliding window. The depth values of the feature points are known quantities read from the depth map and their error is small, so they are not treated as variables to be optimized in this process.
The objective of the optimization is to minimize the total error function e_total. When the total error function e_total reaches its minimum, the optimization of the poses T_w1 … T_w10 is finished. The optimized T_w10 is the target pose when the first camera described in the present disclosure acquires the current frame color image, corresponding to the input current frame color image group.
As can be seen from the above equation, the total error function e_total is a nonlinear function with respect to the variables T_w1 … T_w10. Thus, e = f(x) can be defined, with x representing the variables to be optimized.
First, the nonlinear function may be linearized as shown in equation 13:
f(x + Δx) ≈ f(x) + JΔx (formula 13)
Where J is the derivative of f(x) with respect to x, i.e. the Jacobian matrix.
Next, the problem can be converted into a least squares problem, as shown in formula 14:
F(x + Δx) = ½‖f(x) + JΔx‖² (formula 14)
Taking the derivative of the above expression with respect to Δx and setting the derivative equal to 0 gives formula 15 and formula 16:
J^T (f(x) + JΔx) = 0 (formula 15)
J^T J Δx = −J^T f (formula 16)
Δx can be calculated according to the above formula and used to update the variables to be optimized; the updated variables can be expressed as formula 17:
x = x + Δx (formula 17)
If substituting the updated x into the least squares function F(x) reduces its value, the update is valid. The above calculation process is executed in a loop until the value of F(x) is smaller than a certain threshold, at which point updating stops and the obtained x is the optimized pose result.
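The iterative update described above can be sketched as a generic Gauss-Newton loop. This is a minimal illustration with a toy residual, not the optimizer used in this disclosure; the stopping threshold and iteration cap are assumed parameters.

```python
import numpy as np

def gauss_newton(f, jacobian, x0, tol=1e-6, max_iters=50):
    """Minimize F(x) = 0.5 * ||f(x)||^2 with Gauss-Newton updates
    J^T J dx = -J^T f (formulas 13-17)."""
    x = x0.astype(np.float64)
    for _ in range(max_iters):
        r = f(x)                                    # residual vector f(x)
        J = jacobian(x)                             # derivative of f(x) with respect to x
        dx = np.linalg.solve(J.T @ J, -J.T @ r)     # formula 16
        x = x + dx                                  # formula 17
        if 0.5 * np.dot(f(x), f(x)) < tol:          # stop once F(x) falls below the threshold
            break
    return x

# Toy usage: drive f(x) = [x0 - 1, x1 + 2] to zero.
f = lambda x: np.array([x[0] - 1.0, x[1] + 2.0])
J = lambda x: np.eye(2)
x_opt = gauss_newton(f, J, np.zeros(2))
```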
In order to verify the effect of the pose determination scheme of the embodiment of the disclosure, the disclosure also provides a test mode and provides a test result.
Taking the robot dog as an example, referring to fig. 14, the robot dog is controlled to walk one full loop along a rectangle of length 6m and width 5m and return to the origin. Three road signs, namely road sign 1, road sign 2 and road sign 3, are arranged on the rectangular walking route, and the distances and angles between the three road signs and the origin are measured and used as true values for evaluating the positioning scheme.
It can be understood that the accuracy of the positioning scheme of the present disclosure can be evaluated by recording the pose output by the positioning algorithm when the robot dog passes each of the 3 road signs and comparing it with the true values.
Four sets of test data were recorded and the test results are shown in table 1.
TABLE 1
Group | Road sign 1 | Road sign 2 | Road sign 3 | Origin | Root mean square error
1 | 2cm/0.95° | 1cm/4.7° | 2cm/4.4° | 5.6cm/2.6° | 0.15%
2 | 5cm/0.7° | 1cm/1.48° | 3cm/2.44° | 7.3cm/1.09° | 0.22%
3 | 0cm/2.06° | 4cm/1.65° | 8cm/5.5° | 19cm/4.6° | 0.5%
4 | 6cm/1.14° | 3cm/1.6° | 1cm/1° | 12cm/1.0° | 0.32%
As can be seen from table 1, by applying the pose determination scheme of the embodiment of the present disclosure, the positioning distance accuracy is within 0.5%, and the rotation accuracy is within 5 °. It can be seen that the scheme of the embodiment of the disclosure can achieve the effect of higher positioning accuracy.
It should be noted that although the steps of the methods in the present disclosure are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
Further, in this example embodiment, a pose determining apparatus is also provided. The pose determining device is configured on a terminal device, and the terminal device is further configured with a first camera and at least one second camera.
Fig. 15 schematically shows a block diagram of a pose determination apparatus of an exemplary embodiment of the present disclosure. Referring to fig. 15, the pose determining apparatus 15 according to an exemplary embodiment of the present disclosure may include a first error determining module 151, a second error determining module 153, and a target pose determining module 155.
Specifically, the first error determining module 151 may be configured to obtain matching feature points between two images in a first color image set, and determine a reprojection error of the matching feature points between two images in the first color image set, where the first color image set is formed by a current frame color image collected by the first camera and a previous n frame color image collected by the first camera; the second error determining module 153 may be configured to obtain matching feature points between two images in a second color image set, and determine a re-projection error of the matching feature points between two images in the second color image set by combining a conversion matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera, where the second color image set is formed by a current frame color image acquired by the second camera and a previous n frame color image acquired by the second camera; the target pose determining module 155 may be configured to optimize a pose to be optimized when the first camera collects a current frame color image based on a reprojection error of a matching feature point between two images in the first color image set and a reprojection error of a matching feature point between two images in the second color image set, so as to determine a target pose when the first camera collects the current frame color image; wherein n is a positive integer.
According to an exemplary embodiment of the present disclosure, the first set of color images includes a first color image and a second color image. In this case, the process of the first error determination module 151 determining the re-projection error of the matching feature point between the first color image and the second color image may be configured to perform: acquiring first matching feature points matched with a second color image on the first color image; determining three-dimensional feature points of the first matching feature points under a world coordinate system by using the first matching feature points, depth information of the first matching feature points and the pose of the first camera when the first camera collects the first color image; acquiring a second matching characteristic point matched with the first color image on the second color image; and determining the re-projection error of the matching feature points between the first color image and the second color image by using the second matching feature points, the three-dimensional feature points of the first matching feature points under the world coordinate system and the pose of the first camera when the second color image is acquired.
According to an exemplary embodiment of the present disclosure, the first color image is a current frame color image, and the pose when the first camera acquires the first color image is a pose to be optimized when the first camera acquires the current frame color image. In this case, referring to fig. 16, the pose determination device 16 may further include a pose estimation module 161 to be optimized, as compared to the pose determination device 15.
Specifically, the pose estimation module to be optimized 161 may be configured to perform: acquiring a current frame color image acquired by a first camera, and determining a first two-dimensional feature point matched with a previous frame color image acquired by the first camera on the current frame color image acquired by the first camera; acquiring a current frame color image acquired by a second camera, and determining a second two-dimensional feature point matched with a previous frame color image acquired by the second camera on the current frame color image acquired by the second camera; converting the second two-dimensional feature point into a third two-dimensional feature point under the first camera coordinate system by utilizing a conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera; and determining the pose to be optimized when the first camera collects the current frame of color image according to the first two-dimensional feature point, the third two-dimensional feature point, the three-dimensional feature point of the previous frame of color image collected by the first camera under the world coordinate system and the three-dimensional feature point of the previous frame of color image collected by the second camera under the world coordinate system.
According to an exemplary embodiment of the present disclosure, the pose estimation module to be optimized 161 may be further configured to perform: acquiring a previous frame of color image acquired by a first camera, and extracting feature points of the previous frame of color image acquired by the first camera; performing space projection on the characteristic points of the previous frame of color image acquired by the first camera by using the previous frame of depth image aligned with the previous frame of color image acquired by the first camera so as to obtain three-dimensional characteristic points of the previous frame of color image acquired by the first camera under a first camera coordinate system; and converting the three-dimensional characteristic points under the first camera coordinate system according to the pose of the first camera when the first camera collects the previous frame of color image, so as to obtain the three-dimensional characteristic points of the previous frame of color image collected by the first camera under the world coordinate system.
According to an exemplary embodiment of the present disclosure, the second set of color images includes a third color image and a fourth color image. In this case, the process of the second error determination module 153 determining the re-projection error of the matching feature point between the third color image and the fourth color image in conjunction with the conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera may be configured to perform: acquiring a third matching characteristic point matched with a fourth color image on the third color image; determining three-dimensional feature points of the third matching feature points under the world coordinate system by using the third matching feature points, depth information of the third matching feature points, a conversion matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera and the pose of the first camera when the second camera collects the third color image; acquiring a fourth matching characteristic point matched with the third color image on the fourth color image; and determining the re-projection error of the matching feature points between the third color image and the fourth color image by using the fourth matching feature points, the three-dimensional feature points of the third matching feature points under the world coordinate system, the pose of the first camera when the second camera acquires the fourth color image and a conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera.
According to an example embodiment of the present disclosure, the target pose determination module 155 may be configured to perform: accumulating the re-projection errors of the matched characteristic points between every two images in the first color image set and the re-projection errors of the matched characteristic points between every two images in the second color image set to determine a total error function, wherein the total error function is a nonlinear function taking the pose of each color image in the first color image set collected by the first camera as a variable, and the pose of each color image in the first color image set collected by the first camera comprises the pose to be optimized when the color image of the current frame is collected by the first camera; and minimizing the total error function by using an iterative processing mode to determine the target pose when the first camera acquires the color image of the current frame.
According to an exemplary embodiment of the present disclosure, referring to fig. 17, the pose determining apparatus 17 may further include a sliding window operation module 171 as compared to the pose determining apparatus 15.
Specifically, the sliding window operating module 171 may be configured to perform: when determining the pose to be optimized when the first camera collects the color image of the current frame, adding the color image group of the current frame into the sliding window so as to combine the first color image set and the second color image set contained in the sliding window to determine the target pose when the first camera collects the color image of the current frame; the current frame color image group comprises a current frame color image acquired by the first camera and a current frame color image acquired by the second camera.
According to an exemplary embodiment of the present disclosure, the sliding window operating module 171 may be further configured to perform: when adding the color image group of the current frame to the sliding window, if the number of color image groups included in the sliding window is equal to the maximum value of the color image groups that can be included in the sliding window, the color image group that is added to the sliding window earliest is removed from the sliding window.
Since each functional module of the pose determination apparatus according to the embodiment of the present disclosure is the same as that in the above-described method embodiment, a detailed description thereof will be omitted.
Fig. 18 shows a schematic diagram of an electronic device suitable for use in implementing exemplary embodiments of the present disclosure. The terminal device of the exemplary embodiments of the present disclosure may be configured in the form shown in fig. 18. It should be noted that the electronic device shown in fig. 18 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
The electronic device of the present disclosure includes at least a processor and a memory for storing one or more programs, which when executed by the processor, enable the processor to implement the pose determination method of the exemplary embodiments of the present disclosure.
Specifically, as shown in fig. 18, the electronic device 180 includes at least: processor 1810, internal memory 1821, external memory interface 1822, universal serial bus (Universal Serial Bus, USB) interface 1830, charge management module 1840, power management module 1841, battery 1842, antenna, wireless communication module 1850, audio module 1860, display 1870, sensor module 1880, camera module 1890, and the like. The sensor module 1880 may include a depth sensor, a pressure sensor, a gyroscope sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
It is to be understood that the illustrated structure of the presently disclosed embodiments does not constitute a particular limitation of the electronic device 180. In other embodiments of the present disclosure, the electronic device 180 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 1810 may include one or more processing units. For example, the processor 1810 may include an application processor (Application Processor, AP), a modem processor, a graphics processor (Graphics Processing Unit, GPU), an image signal processor (Image Signal Processor, ISP), a controller, a video codec, a digital signal processor (Digital Signal Processor, DSP), a baseband processor and/or a neural network processor (Neural-network Processing Unit, NPU), and the like. The different processing units may be separate devices or may be integrated into one or more processors. In addition, a memory may be provided in the processor 1810 for storing instructions and data.
The electronic device 180 may implement a photographing function through the ISP, the camera module 1890, the video codec, the GPU, the display screen 1870, the application processor, and the like. In some embodiments, the electronic device 180 may include at least two camera modules 1890; when implementing the disclosed solution, one camera module is determined as the reference camera, and the feature data collected by the other camera modules is transferred to the coordinate system of the reference camera for processing. For example, the electronic device 180 is configured with two RealSense D455 cameras.
The internal memory 1821 may be used to store computer-executable program code, including instructions. The internal memory 1821 may include a stored program area and a stored data area. The external memory interface 1822 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 180.
The present disclosure also provides a computer-readable storage medium that may be included in the electronic device described in the above embodiments; or may exist alone without being incorporated into the electronic device.
The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The computer-readable storage medium carries one or more programs which, when executed by one such electronic device, cause the electronic device to implement the methods as described in the embodiments of the present disclosure.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are only schematic illustrations of processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any adaptations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. A pose determination method, characterized by being applied to a terminal device, the terminal device being configured with a first camera and at least one second camera, the pose determination method comprising:
Acquiring matching feature points between every two images in a first color image set, and determining a re-projection error of the matching feature points between every two images in the first color image set, wherein the first color image set consists of a current frame color image acquired by a first camera and a previous n frames color image acquired by the first camera;
Acquiring matching feature points between every two images in a second color image set, and determining a re-projection error of the matching feature points between every two images in the second color image set by combining a conversion matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera, wherein the second color image set consists of a current frame color image acquired by the second camera and a previous n frames color image acquired by the second camera;
Optimizing the pose to be optimized when the first camera collects the current frame color image based on the re-projection errors of the matching feature points between every two images in the first color image set and the re-projection errors of the matching feature points between every two images in the second color image set so as to determine the target pose when the first camera collects the current frame color image;
Wherein n is a positive integer.
2. The pose determination method according to claim 1, wherein the first set of color images comprises a first color image and a second color image; wherein determining the re-projection error of the matching feature point between the first color image and the second color image comprises:
Acquiring first matching feature points matched with the second color image on the first color image;
determining three-dimensional feature points of the first matching feature points under a world coordinate system by using the first matching feature points, depth information of the first matching feature points and pose of the first camera when the first camera collects the first color image;
Acquiring a second matching characteristic point matched with the first color image on the second color image;
And determining a re-projection error of the matching feature points between the first color image and the second color image by using the second matching feature points, the three-dimensional feature points of the first matching feature points under a world coordinate system and the pose of the first camera when the second color image is acquired.
3. The pose determination method according to claim 2, wherein the first color image is a current frame color image, and the pose when the first camera acquires the first color image is a pose to be optimized when the first camera acquires the current frame color image; the pose determining method further comprises the following steps:
Acquiring a current frame color image acquired by the first camera, and determining a first two-dimensional feature point matched with a previous frame color image acquired by the first camera on the current frame color image acquired by the first camera;
Acquiring a current frame color image acquired by the second camera, and determining a second two-dimensional feature point matched with a previous frame color image acquired by the second camera on the current frame color image acquired by the second camera;
Converting the second two-dimensional feature point into a third two-dimensional feature point under the first camera coordinate system by utilizing a conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera;
And determining the pose to be optimized when the first camera collects the current frame of color image according to the first two-dimensional feature point, the third two-dimensional feature point, the three-dimensional feature point of the last frame of color image collected by the first camera under the world coordinate system and the three-dimensional feature point of the last frame of color image collected by the second camera under the world coordinate system.
4. The pose determination method according to claim 3, characterized in that the pose determination method further comprises:
Acquiring the previous frame color image acquired by the first camera, and extracting feature points of the previous frame color image acquired by the first camera;
Performing spatial projection on the feature points of the previous frame color image acquired by the first camera by using the previous frame depth image aligned with the previous frame color image acquired by the first camera, so as to obtain three-dimensional feature points of the previous frame color image acquired by the first camera under the first camera coordinate system;
Converting the three-dimensional feature points under the first camera coordinate system according to the pose of the first camera when the first camera acquires the previous frame color image, so as to obtain three-dimensional feature points of the previous frame color image acquired by the first camera under the world coordinate system.
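For illustration only, not as part of the claims: a minimal Python sketch of claim 4. Feature pixels of the previous frame color image are back-projected with the aligned depth image into the first camera coordinate system and then transformed into the world coordinate system with that frame's pose. Depth in metres, a camera-to-world pose convention and all names are assumptions.

    import numpy as np

    def previous_frame_points_to_world(feat_pixels, depth_image, K, R_wc, t_wc):
        """Three-dimensional feature points of the previous frame in the world frame.

        feat_pixels : (N, 2) feature points extracted from the previous frame color image
        depth_image : previous frame depth image aligned with that color image
        R_wc, t_wc  : pose of the first camera when it acquired the previous frame
                      color image (camera -> world, an assumed convention)
        """
        K_inv = np.linalg.inv(K)
        pts_world = []
        for u, v in feat_pixels:
            d = depth_image[int(round(v)), int(round(u))]   # depth lookup at the feature pixel
            if d <= 0:                                      # skip invalid depth readings
                continue
            p_cam = d * (K_inv @ np.array([u, v, 1.0]))     # first camera coordinate system
            pts_world.append(R_wc @ p_cam + t_wc)           # world coordinate system
        return np.asarray(pts_world)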
5. The pose determination method according to claim 1, wherein the second color image set comprises a third color image and a fourth color image; wherein determining a re-projection error of the matching feature points between the third color image and the fourth color image in combination with a conversion matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera comprises:
Acquiring, on the third color image, third matching feature points matched with the fourth color image;
Determining three-dimensional feature points of the third matching feature points under the world coordinate system by using the third matching feature points, depth information of the third matching feature points, the conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, and the pose of the first camera when the second camera acquires the third color image;
Acquiring, on the fourth color image, fourth matching feature points matched with the third color image;
Determining a re-projection error of the matching feature points between the third color image and the fourth color image by using the fourth matching feature points, the three-dimensional feature points of the third matching feature points under the world coordinate system, the pose of the first camera when the second camera acquires the fourth color image, and the conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera.
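For illustration only, not as part of the claims: a minimal Python sketch of claim 5 under assumed conventions. The third matching feature point is lifted with its depth into the second camera coordinate system, carried into the world coordinate system through the conversion matrix T_12 (second camera -> first camera) and the first camera's pose, then re-projected into the fourth color image and compared with the fourth matching feature point. The 4x4 homogeneous-matrix conventions and all names are assumptions.

    import numpy as np

    def lift_cam2_to_world(pix, depth, K2, T_12, T_w1):
        """World-frame point from a second-camera matching feature point.

        T_12 : 4x4 conversion matrix, second camera coordinates -> first camera coordinates
        T_w1 : 4x4 pose of the first camera (first camera coordinates -> world) at the
               moment the second camera acquired that color image
        """
        p_c2 = depth * (np.linalg.inv(K2) @ np.array([pix[0], pix[1], 1.0]))
        return (T_w1 @ T_12 @ np.append(p_c2, 1.0))[:3]

    def claim5_residual(pix3, depth3, pix4, K2, T_12, T_w1_at3, T_w1_at4):
        """Re-projection error of one matched pair between the third and fourth color images."""
        p_w = lift_cam2_to_world(pix3, depth3, K2, T_12, T_w1_at3)
        T_2w = np.linalg.inv(T_w1_at4 @ T_12)          # world -> second camera at the fourth image
        p_c2 = (T_2w @ np.append(p_w, 1.0))[:3]
        uvw = K2 @ p_c2
        return pix4 - uvw[:2] / uvw[2]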
6. The pose determination method according to claim 1, wherein optimizing the pose to be optimized when the first camera acquires the current frame color image, based on the re-projection error of the matching feature points between every two images in the first color image set and the re-projection error of the matching feature points between every two images in the second color image set, so as to determine the target pose when the first camera acquires the current frame color image, comprises:
Accumulating the re-projection errors of the matching feature points between every two images in the first color image set and the re-projection errors of the matching feature points between every two images in the second color image set to determine a total error function, wherein the total error function is a nonlinear function taking as variables the poses of the first camera when acquiring each color image in the first color image set, and these poses include the pose to be optimized when the first camera acquires the current frame color image;
Minimizing the total error function by iterative processing to determine the target pose when the first camera acquires the current frame color image.
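For illustration only, not as part of the claims: a minimal Python sketch of claim 6 using scipy. Every per-pair re-projection residual from both color image sets is stacked into one total error vector, and the stacked nonlinear function is minimized iteratively over the first camera's poses in the window. The simple 6-parameter-per-frame pose parameterization, the solver choice and all names are assumptions.

    import numpy as np
    from scipy.optimize import least_squares

    def total_error(pose_params, residual_terms):
        """Stack every per-pair re-projection residual into one error vector.

        pose_params    : flattened first-camera poses for the frames in the window
                         (assumed here as 3 rotation + 3 translation values per frame),
                         including the pose to be optimized for the current frame
        residual_terms : callables, one per matched feature pair from either color
                         image set, each returning a two-dimensional residual
        """
        return np.concatenate([term(pose_params) for term in residual_terms])

    def optimize_window(initial_params, residual_terms):
        """Minimize the accumulated nonlinear error iteratively; the optimized
        parameters contain the target pose for the current frame color image."""
        result = least_squares(total_error, initial_params, args=(residual_terms,))
        return result.x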
7. The pose determination method according to any one of claims 1 to 6, characterized in that the pose determination method further comprises:
When determining the pose to be optimized when the first camera acquires the current frame color image, adding the current frame color image group into a sliding window, so as to determine the target pose when the first camera acquires the current frame color image by combining the first color image set and the second color image set contained in the sliding window;
Wherein the current frame color image group comprises the current frame color image acquired by the first camera and the current frame color image acquired by the second camera.
8. The pose determination method according to claim 7, characterized in that the pose determination method further comprises:
When adding the current frame color image group to the sliding window, if the number of color image groups contained in the sliding window is equal to the maximum number of color image groups that the sliding window can contain, removing from the sliding window the color image group that was added to the sliding window earliest.
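For illustration only, not as part of the claims: a minimal Python sketch of the sliding-window bookkeeping described in claims 7 and 8. The newest color image group is appended, and when the window is already at its maximum size the group added earliest is evicted first. The group contents and all names are assumptions.

    from collections import deque

    class SlidingWindow:
        """Holds the most recent color image groups used for pose optimization."""

        def __init__(self, max_groups):
            self.groups = deque()          # group added earliest sits at the left end
            self.max_groups = max_groups   # maximum number of groups the window can contain

        def add(self, group):
            # If the window is full, remove the color image group added earliest
            if len(self.groups) == self.max_groups:
                self.groups.popleft()
            self.groups.append(group)

        def image_sets(self):
            # Split the window into the first and second color image sets
            first_set = [g["cam1_color"] for g in self.groups]
            second_set = [g["cam2_color"] for g in self.groups]
            return first_set, second_set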
9. A pose determination apparatus, characterized in that it is applied to a terminal device configured with a first camera and at least one second camera, the apparatus comprising:
A first error determination module, configured to acquire matching feature points between every two images in a first color image set and determine the re-projection error of the matching feature points between every two images in the first color image set, wherein the first color image set consists of a current frame color image acquired by the first camera and the previous n frames of color images acquired by the first camera;
A second error determination module, configured to acquire matching feature points between every two images in a second color image set and determine the re-projection error of the matching feature points between every two images in the second color image set in combination with a conversion matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera, wherein the second color image set consists of a current frame color image acquired by the second camera and the previous n frames of color images acquired by the second camera;
A target pose determination module, configured to optimize the pose to be optimized when the first camera acquires the current frame color image, based on the re-projection error of the matching feature points between every two images in the first color image set and the re-projection error of the matching feature points between every two images in the second color image set, so as to determine the target pose when the first camera acquires the current frame color image;
Wherein n is a positive integer.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the pose determination method according to any of claims 1 to 8.
11. An electronic device, comprising:
A processor;
A memory for storing one or more programs that, when executed by the processor, cause the processor to implement the pose determination method according to any of claims 1 to 8.
CN202211336046.1A 2022-10-28 2022-10-28 Pose determination method and device, computer readable storage medium and electronic equipment Pending CN117994332A (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
CN202211336046.1A (published as CN117994332A) | 2022-10-28 | 2022-10-28 | Pose determination method and device, computer readable storage medium and electronic equipment
PCT/CN2023/118752 (published as WO2024087927A1) | 2022-10-28 | 2023-09-14 | Pose determination method and apparatus, and computer-readable storage medium and electronic device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202211336046.1A (published as CN117994332A) | 2022-10-28 | 2022-10-28 | Pose determination method and device, computer readable storage medium and electronic equipment

Publications (1)

Publication Number | Publication Date
CN117994332A (en) | 2024-05-07

Family

ID=90829971

Family Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211336046.1A (published as CN117994332A, pending) | 2022-10-28 | 2022-10-28 | Pose determination method and device, computer readable storage medium and electronic equipment

Country Status (2)

Country Link
CN (1) CN117994332A (en)
WO (1) WO2024087927A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019050417A1 (en) * 2017-09-06 2019-03-14 Auckland Uniservices Limited Stereoscopic system calibration and method
CN111415387B (en) * 2019-01-04 2023-12-29 南京人工智能高等研究院有限公司 Camera pose determining method and device, electronic equipment and storage medium
CN114677439A (en) * 2022-03-29 2022-06-28 Oppo广东移动通信有限公司 Camera pose determination method and device, electronic equipment and storage medium
CN114998433A (en) * 2022-05-31 2022-09-02 Oppo广东移动通信有限公司 Pose calculation method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
WO2024087927A1 (en) 2024-05-02

Similar Documents

Publication Publication Date Title
US7554575B2 (en) Fast imaging system calibration
CN109887003B (en) Method and equipment for carrying out three-dimensional tracking initialization
EP3028252B1 (en) Rolling sequential bundle adjustment
Clipp et al. Parallel, real-time visual SLAM
AU2011265430B2 (en) 3D reconstruction of partially unobserved trajectory
CN110111388B (en) Three-dimensional object pose parameter estimation method and visual equipment
CN106210538A (en) Show method and apparatus and the program of image based on light field on a user device
Pintore et al. Omnidirectional image capture on mobile devices for fast automatic generation of 2.5 D indoor maps
KR102398478B1 (en) Feature data management for environment mapping on electronic devices
CN111127524A (en) Method, system and device for tracking trajectory and reconstructing three-dimensional image
WO2011047888A1 (en) Method of providing a descriptor for at least one feature of an image and method of matching features
CN110361005B (en) Positioning method, positioning device, readable storage medium and electronic equipment
CN111709973A (en) Target tracking method, device, equipment and storage medium
CN109613974B (en) AR home experience method in large scene
CN115035235A (en) Three-dimensional reconstruction method and device
CN110310325B (en) Virtual measurement method, electronic device and computer readable storage medium
CN111829522B (en) Instant positioning and map construction method, computer equipment and device
Bao et al. Robust tightly-coupled visual-inertial odometry with pre-built maps in high latency situations
JP2014102805A (en) Information processing device, information processing method and program
CN110849380A (en) Map alignment method and system based on collaborative VSLAM
Laskar et al. Robust loop closures for scene reconstruction by combining odometry and visual correspondences
CN117994332A (en) Pose determination method and device, computer readable storage medium and electronic equipment
CN111260544B (en) Data processing method and device, electronic equipment and computer storage medium
CN116136408A (en) Indoor navigation method, server, device and terminal
CN117994333A (en) Pose determination method and device, computer readable storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination