CN111105462B - Pose determining method and device, augmented reality equipment and readable storage medium

Info

Publication number
CN111105462B
CN111105462B (application CN201911403477.3A)
Authority
CN
China
Prior art keywords
image
binocular camera
depth value
pose
determining
Prior art date
Legal status
Active
Application number
CN201911403477.3A
Other languages
Chinese (zh)
Other versions
CN111105462A (en)
Inventor
范锡睿
杨东清
孙峰
陆柳慧
盛兴东
Current Assignee
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN201911403477.3A
Publication of CN111105462A
Application granted
Publication of CN111105462B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T7/00 Image analysis › G06T7/70 Determining position or orientation of objects or cameras › G06T7/73 using feature-based methods › G06T7/74 involving reference images or patches
    • GPHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T7/00 Image analysis › G06T7/50 Depth or shape recovery › G06T7/55 from multiple images › G06T7/593 from stereo images
    • GPHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T2207/00 Indexing scheme for image analysis or image enhancement › G06T2207/10 Image acquisition modality › G06T2207/10004 Still image; Photographic image › G06T2207/10012 Stereo images
    • GPHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T2207/00 Indexing scheme for image analysis or image enhancement › G06T2207/30 Subject of image; Context of image processing › G06T2207/30244 Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a pose determination method, comprising: obtaining a first image captured by a left camera and a second image captured by a right camera of a binocular camera; obtaining at least one group of matching feature points from the first image and the second image; determining, according to the at least one group of matching feature points, a depth value of the physical space in which the binocular camera is located and an error value of that depth value; and determining the pose of the binocular camera according to the depth value and the error value of the depth value. The disclosure also provides a pose determination apparatus, an augmented reality device, and a computer-readable storage medium.

Description

Pose determining method and device, augmented reality equipment and readable storage medium
Technical Field
The present disclosure relates to a pose determination method and apparatus, an augmented reality device, and a readable storage medium.
Background
With the rapid development of technology, augmented reality, a technology that seamlessly integrates real-world information with virtual-world information, is expected to be applied in ever more scenarios to enrich the real world and help build a richer and better one.
In the related art, a key step in the positioning methods of mainstream augmented reality devices is acquiring depth information of the surrounding environment. This depth information can be computed using the left and right external parameters (the stereo extrinsics) calibrated in advance for a binocular camera. However, the depth calculated from a binocular camera often carries high uncertainty due to various factors, which inevitably degrades the final positioning accuracy of the augmented reality device.
Disclosure of Invention
One aspect of the present disclosure provides a pose determination method, the method comprising: obtaining a first image captured by a left camera and a second image captured by a right camera of a binocular camera; obtaining at least one group of matching feature points from the first image and the second image; determining, according to the at least one group of matching feature points, a depth value of the physical space in which the binocular camera is located and an error value of that depth value; and determining the pose of the binocular camera according to the depth value and the error value of the depth value.
Optionally, obtaining the at least one group of matching feature points includes: identifying the first image and the second image to obtain a first feature point group for the first image and a second feature point group for the second image; and determining matched first and second feature points according to the first feature point group and the second feature point group, so as to obtain the at least one group of matching feature points. Each group of matching feature points includes a first feature point and a second feature point that match each other.
Optionally, before determining the depth value of the physical space in which the binocular camera is located and the error value of the depth value, the pose determination method further includes: calibrating the binocular camera to obtain the left and right external parameters of the binocular camera; and determining the baseline length of the binocular camera according to the left and right external parameters.
Optionally, determining the depth value of the physical space in which the binocular camera is located and the error value of the depth value includes: determining the depth values of the target positions in the physical space respectively corresponding to the at least one group of matching feature points, to obtain at least one depth value for the at least one group of matching feature points; and determining respective error values of the at least one depth value according to the at least one depth value, the baseline length, and the target error model.
Optionally, determining the pose of the binocular camera includes: determining the estimated pose of the binocular camera according to the at least one group of matching feature points and the left and right external parameters; adjusting the target optimization model according to the depth value and the error value of the depth value to obtain an adjusted optimization model; and optimizing the estimated pose according to the adjusted optimization model to obtain the pose of the binocular camera.
Optionally, the depth values include at least one depth value for the at least one group of matching feature points, and the error values of the depth values include respective error values of the at least one depth value. Adjusting the target optimization model to obtain the adjusted optimization model includes: determining a depth residual for the binocular camera according to the at least one depth value and the respective error values of the at least one depth value; and adjusting the target optimization model according to the depth residual to obtain the adjusted optimization model.
Another aspect of the present disclosure provides a pose determination apparatus, the apparatus comprising: an image obtaining module, configured to obtain a first image captured by a left camera and a second image captured by a right camera of a binocular camera; a matching feature point obtaining module, configured to obtain at least one group of matching feature points from the first image and the second image; a numerical value determining module, configured to determine, according to the at least one group of matching feature points, a depth value of the physical space in which the binocular camera is located and an error value of the depth value; and a pose determining module, configured to determine the pose of the binocular camera according to the depth value and the error value of the depth value.
Optionally, the matching feature point obtaining module includes: a feature point obtaining sub-module, configured to identify the first image and the second image and obtain a first feature point group for the first image and a second feature point group for the second image; and a feature point matching sub-module, configured to determine matched first and second feature points according to the first feature point group and the second feature point group, so as to obtain the at least one group of matching feature points. Each group of matching feature points includes a first feature point and a second feature point that match each other.
Another aspect of the present disclosure provides an augmented reality device, the device comprising: a binocular camera including a left camera and a right camera, wherein the left camera is configured to capture a first image of the physical space in which the binocular camera is located, the right camera is configured to capture a second image of that physical space, and the first image and the second image are images of the physical space from two different angles; one or more processors coupled to the binocular camera and configured to acquire the first image and the second image; and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the pose determination method described above.
The present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described pose determination method.
Drawings
For a more complete understanding of the present disclosure and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of the pose determination method and apparatus, augmented reality device, and readable storage medium according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flowchart of a pose determination method according to a first exemplary embodiment of the present disclosure;
FIG. 3 schematically illustrates a flowchart of obtaining at least one group of matching feature points according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flowchart of a pose determination method according to a second exemplary embodiment of the present disclosure;
FIG. 5 schematically illustrates a flowchart of determining the depth value of the physical space in which the binocular camera is located and the error value of the depth value, according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flowchart of determining the pose of the binocular camera according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a flowchart of obtaining an adjusted optimization model according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a block diagram of a pose determination apparatus according to an embodiment of the present disclosure; and
FIG. 9 schematically illustrates a block diagram of the configuration of an augmented reality device according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where a convention analogous to "at least one of A, B and C, etc." is used, such a convention should in general be interpreted in accordance with the meaning generally understood by one of skill in the art for the convention (e.g., "a system having at least one of A, B and C" would include, but not be limited to, systems having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where a formulation similar to "at least one of A, B or C, etc." is used, such a formulation should in general be interpreted in the same ordinary manner (e.g., "a system with at least one of A, B or C" would include, but not be limited to, systems with A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
Some of the block diagrams and/or flowchart illustrations are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, when executed by the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart. The techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). Additionally, the techniques of this disclosure may take the form of a computer program product on a computer-readable storage medium having instructions stored thereon, the computer program product being for use by or in connection with an instruction execution system.
The embodiments of the present disclosure provide a pose determination method, comprising the following steps: obtaining a first image captured by a left camera and a second image captured by a right camera of a binocular camera; obtaining at least one group of matching feature points from the first image and the second image; determining, according to the at least one group of matching feature points, a depth value of the physical space in which the binocular camera is located and an error value of the depth value; and determining the pose of the binocular camera according to the depth value and the error value of the depth value.
According to the above pose determination method, introducing the error value of the depth value when determining the pose of the binocular camera can at least partially improve the accuracy of the determined camera pose. A highly accurate camera pose can be used to construct more accurate three-dimensional structure information, thereby improving the degree of fusion between the virtual image rendered by the augmented reality device and the real environment, and improving the user experience.
Fig. 1 schematically illustrates an application scenario diagram of a pose determination method and apparatus, an augmented reality device, and a readable storage medium according to an embodiment of the present disclosure.
As shown in fig. 1, the application scene 100 includes an augmented reality device 110, and the augmented reality device 110 is provided with a binocular camera 111. The binocular camera 111 may include, for example, a left camera and a right camera, where the left camera and the right camera can collect images of an environment within a visual range, and the images collected by the left camera and the right camera are images of different angles in the same physical space.
The augmented reality device 110 may further include a display system, for example, a combination of a display screen and optical elements such as a prism and an optical waveguide, and the display system may display an environmental image of a physical space where a user is located.
According to embodiments of the present disclosure, to facilitate rendering a virtual image in the actual environment image, the three-dimensional structure information of the space in which the augmented reality device is located must be constructed accurately; for example, it may be constructed using visual SLAM (Simultaneous Localization and Mapping). One of the key problems in SLAM is solving for the pose of the binocular camera.
A binocular camera is generally first calibrated to obtain its left and right external parameters, from which the absolute depth value of a target position in the physical space is then calculated. However, this depth value carries an error related to the distance of the target position from the binocular camera, the performance of the binocular camera itself, and so on, which can make the calculated absolute depth value highly uncertain. To keep the depth error from affecting the SLAM system's solution of the binocular camera pose, information from other sensors (e.g., an inertial measurement unit) may be used to reduce the influence of the depth error. However, that approach brings additional software and hardware cost, and the binocular camera pose obtained by positioning still ends up exhibiting scale drift under the influence of the depth error.
Considering that, when SLAM is used for pose positioning, the camera pose is generally determined by minimizing an error function, the depth error can be incorporated as one of the error terms of that error function during minimization, thereby improving the accuracy of the determined pose.
Note that the pose determination method of the embodiment of the present disclosure may be performed by the augmented reality device 110, for example. Accordingly, the pose determination apparatus of the embodiments of the present disclosure may be provided in the augmented reality device 110, for example. It will be appreciated that the configuration of the augmented reality device in fig. 1 is merely an example to facilitate understanding of the present disclosure, which is not limited thereto.
The pose determining method provided by the present disclosure will be described in detail below with reference to fig. 2 to 7.
FIG. 2 schematically illustrates a flowchart of a pose determination method according to a first exemplary embodiment of the present disclosure.
As shown in fig. 2, the pose determination method of this embodiment may include, for example, operations S210 to S240.
In operation S210, a first image obtained by the left camera and a second image obtained by the right camera of the binocular camera are obtained. The first image may be, for example, an image within the visual range captured by the left camera, and the second image an image within the visual range captured by the right camera. The first image and the second image are images of the same physical space captured at the same moment from different angles.
In operation S220, at least one group of matching feature points is obtained from the first image and the second image. Operation S220 may include, for example: first comparing the first image with the second image to determine the same object in both images; then extracting features of that object from the two images and taking them as a group of matching feature points. Alternatively, operation S220 may be implemented, for example, by the flow described in FIG. 3, which is not detailed here.
In operation S230, a depth value and an error value of the depth value of the physical space where the binocular camera is located are determined according to at least one set of matching feature points.
According to an embodiment of the present disclosure, operation S230 may employ, for example, triangulation to determine the depth value, relative to the binocular camera, of the object characterized by each group of matching feature points. The depth value determined from the matching feature points is considered to carry a certain error, which is related to the depth value itself, the baseline length of the binocular camera, the angle between the object and the binocular camera, and the like. Accordingly, operation S230 may determine the error value of the depth value from the depth value and these parameters. Specifically, operation S230 may determine the depth value and the error value of the depth value through, for example, the flow described in FIG. 5, which is not detailed here.
In operation S240, the pose of the binocular camera is determined according to the depth value and the error value of the depth value.
According to an embodiment of the present disclosure, operation S240 may include, for example: first determining the estimated pose of the binocular camera according to the coordinate values, determined in operation S220, of each group of matching feature points in the images; then optimizing the estimated pose using the depth value and the error value of the depth value as constraint conditions of a preset optimization model, to obtain the optimized pose; and finally taking the optimized pose as the determined pose of the binocular camera. The pose of the binocular camera may be determined, for example, by the process described in FIG. 6, which is not detailed here.
In summary, according to the pose determination method of this embodiment, introducing the error value of the depth value when determining the pose of the binocular camera can at least partially improve the accuracy of the determined camera pose. A highly accurate camera pose can be used to construct more accurate three-dimensional structure information, thereby improving the degree of fusion between the virtual image rendered by the augmented reality device and the real environment, and improving the user experience.
FIG. 3 schematically illustrates a flowchart of obtaining at least one group of matching feature points according to an embodiment of the present disclosure.
As shown in fig. 3, operation S220 of obtaining at least one set of matching feature points may include, for example, operations S321 to S322.
In operation S321, the first image and the second image are identified, resulting in a first set of feature points for the first image and a second set of feature points for the second image.
According to an embodiment of the present disclosure, the first image and the second image may be identified using, for example, the SIFT (Scale-Invariant Feature Transform) feature extraction algorithm, the HOG feature extraction method, or a pre-trained neural network, so as to extract the first feature point group and the second feature point group.
The first feature point group includes a plurality of first feature points extracted from the first image, and the second feature point group includes a plurality of second feature points extracted from the second image. The feature points may include, for example, edge points or corner points of an object.
In operation S322, the matched first and second feature points are determined according to the first and second feature point groups to obtain at least one set of matched feature points.
According to an embodiment of the present disclosure, operation S322 may include, for example: comparing each first feature point in the first feature point group with the plurality of second feature points in the second feature point group, and determining the second feature points whose similarity to each first feature point is higher than a predetermined similarity (for example, 50 percent). A first feature point and a second feature point whose similarity exceeds the predetermined similarity are determined as a matched first feature point and second feature point, and the two are combined into one group of matching feature points. The similarity may refer to, for example, color similarity, size similarity, and/or edge similarity; the predetermined similarity above is merely an example to facilitate understanding of the present disclosure, which is not limited thereto.
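As an illustration of operations S321 to S322, the following is a minimal sketch using OpenCV's SIFT detector and a brute-force matcher, with Lowe's ratio test standing in for the "predetermined similarity" check; the function name and the ratio threshold are illustrative assumptions, not taken from the patent.

```python
import cv2

def match_stereo_features(img_left, img_right, ratio=0.75):
    """Extract feature point groups from both images and return matched pairs."""
    sift = cv2.SIFT_create()
    kp_l, desc_l = sift.detectAndCompute(img_left, None)   # first feature point group
    kp_r, desc_r = sift.detectAndCompute(img_right, None)  # second feature point group

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    candidates = matcher.knnMatch(desc_l, desc_r, k=2)

    # Keep a match only when its best candidate is clearly better than the
    # second best, a common stand-in for "similarity above a predetermined value".
    pairs = []
    for m in candidates:
        if len(m) == 2 and m[0].distance < ratio * m[1].distance:
            pairs.append((kp_l[m[0].queryIdx].pt, kp_r[m[0].trainIdx].pt))
    return pairs  # each entry: ((u_l, v_l), (u_r, v_r)) pixel coordinates
```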
According to an embodiment of the present disclosure, to facilitate determining the depth value and the error value of the depth value, the binocular camera should be calibrated in advance to obtain its left and right external parameters and related quantities.
FIG. 4 schematically illustrates a flowchart of a pose determination method according to a second exemplary embodiment of the present disclosure.
As shown in fig. 4, the pose determination method of the embodiment may further include operations S450 to S460 in addition to operations S210 to S240, and the operations S450 to S460 may be performed before operation S230, for example.
In operation S450, the binocular camera is calibrated to obtain the left and right external parameters of the binocular camera.
According to an embodiment of the present disclosure, the left and right external parameters represent the pose relationship between the left camera and the right camera; specifically, they are the transformation between the three-dimensional coordinate system established on the left camera and the three-dimensional coordinate system established on the right camera. For a point in the coordinate system of the left camera, its coordinate values in the coordinate system of the right camera can be obtained through the rotation matrix R and the translation vector T. The rotation matrix R and the translation vector T are the left and right external parameters obtained through calibration.
In operation S460, a baseline length of the binocular camera is determined according to the left and right external parameters.
According to an embodiment of the present disclosure, the baseline of the binocular camera is the line connecting the optical center of the left camera and the optical center of the right camera, and the baseline length is the length of that line. The baseline length may be, for example, the value of the first element of the translation vector T.
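For illustration, a minimal sketch of operations S450 and S460 built on OpenCV's stereo calibration is given below. It assumes the intrinsics and the chessboard object/image point lists come from an earlier detection step that is not shown; the helper name is illustrative.

```python
import cv2
import numpy as np

def calibrate_and_baseline(obj_pts, img_pts_l, img_pts_r,
                           K_l, dist_l, K_r, dist_r, image_size):
    """Stereo calibration yields the rotation matrix R and translation vector T
    (the left and right external parameters); the baseline length follows from T."""
    ret, K_l, dist_l, K_r, dist_r, R, T, E, F = cv2.stereoCalibrate(
        obj_pts, img_pts_l, img_pts_r,
        K_l, dist_l, K_r, dist_r, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    # For a rectified, horizontally aligned rig the baseline is essentially the
    # first element of T, as in the text above; the Euclidean norm is the
    # general measure and coincides with it in that case.
    baseline = float(np.linalg.norm(T))
    return R, T, baseline
```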
Fig. 5 schematically illustrates a flowchart for determining depth values and error values of the depth values of a physical space in which a binocular camera is located, according to an embodiment of the present disclosure.
As shown in fig. 5, the operation S230 of determining the depth value and the error value of the depth value of the physical space where the binocular camera is located may include operations S531 to S532, for example.
In operation S531, depth values of the target positions in the physical space corresponding to the at least one set of matching feature points are determined, and at least one depth value for the at least one set of matching feature points is obtained.
According to embodiments of the present disclosure, assuming that the left camera and the right camera lie in the same plane (i.e., their optical axes are parallel) and that the parameters of the two cameras (e.g., the focal length f) are identical, the depth z = f × b / a of the target position corresponding to each group of matching feature points can be determined according to the triangle similarity principle. Here f is the focal length of the two cameras, b is the baseline length between the left and right cameras, and a is the disparity between a pixel of the first image captured by the left camera and the corresponding pixel of the second image captured by the right camera, which can be determined from the coordinate values of the first feature point and the second feature point in a group of matching feature points. The coordinate values of the first feature point are its coordinates in a two-dimensional coordinate system established on the first image, and the coordinate values of the second feature point are its coordinates in a two-dimensional coordinate system established on the second image.
Operation S531 may include, for example: for each group of matching feature points, using the above method to obtain the perpendicular distance between the point in the physical space corresponding to that group and the baseline of the binocular camera, taking it as the depth value of that group, so as to obtain at least one depth value corresponding to the at least one group of matching feature points.
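A minimal sketch of this similar-triangles relation follows, assuming a rectified rig so that the disparity a reduces to a horizontal pixel offset; the function name is illustrative.

```python
def depth_from_match(pt_left, pt_right, focal_px, baseline):
    """Depth of one matched feature pair via z = f * b / a (operation S531).
    Assumes parallel optical axes and an identical focal length `focal_px`
    (in pixels), so the disparity is the horizontal coordinate difference."""
    a = pt_left[0] - pt_right[0]       # disparity between the matched pixels
    if a <= 0:
        return None                    # degenerate match: no valid depth
    return focal_px * baseline / a     # perpendicular distance to the baseline

# Usage with the matched pairs from the earlier sketch:
# depths = [depth_from_match(l, r, focal_px, baseline) for l, r in pairs]
```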
In operation S532, respective error values of the at least one depth value are determined based on the at least one depth value, the baseline length, and the target error model.
According to embodiments of the present disclosure, the target error model may include, for example, a reduced model derived from triangulation and the pinhole imaging model, expressed as a formula over the following quantities: Error, the error value of the depth value; k, a constant determined from the differences between repeatedly determined true depth values and the corresponding measured values; d, the depth value; b, the baseline length of the binocular camera; and θ, the angle between the line connecting the target position corresponding to the depth value with the midpoint of the baseline and the perpendicular to the baseline. For the at least one depth value for the at least one group of matching feature points calculated in operation S531, the respective error values can be calculated by this formula. It is to be understood that this reduced model is merely an example to facilitate understanding of the present disclosure, which is not limited thereto; the reduced model may be determined specifically from the relationship between the depth value and each parameter that affects it.
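A reduced model consistent with these definitions, assuming the standard behavior of binocular triangulation error (growing with the square of the depth, inversely with the baseline length, and inversely with the cosine of the viewing angle), would take the form:

```latex
\mathrm{Error} = k \cdot \frac{d^{2}}{b \cdot \cos\theta}
```

Under this reading, k absorbs quantities such as the focal length and the disparity-matching uncertainty; this form is an assumption consistent with the definitions above, not a formula quoted from the patent.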
Fig. 6 schematically illustrates a flowchart of determining pose of a binocular camera according to an embodiment of the present disclosure.
As shown in fig. 6, operation S240 of determining the pose of the binocular camera may include operations S641 to S643, for example.
In operation S641, the estimated pose of the binocular camera is determined according to the at least one group of matching feature points and the left and right external parameters.
According to an embodiment of the present disclosure, operation S641 may include, for example: determining the coordinate values, in the camera coordinate system, of the target position in the physical space corresponding to each group of matching feature points, from the first coordinate values of the first feature point in the two-dimensional coordinate system established on the first image, the second coordinate values of the second feature point in the two-dimensional coordinate system established on the second image, and the left and right external parameters; then determining the coordinate values of the target position in the world coordinate system according to the transformation between the camera coordinate system and the world coordinate system; and finally calculating the estimated pose of the camera from the coordinate values of the target position in the world coordinate system and in the camera coordinate system. Operation S641 may calculate the initial pose of the camera using, for example, the PnP (Perspective-n-Point) algorithm.
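As an illustration, a minimal PnP sketch with OpenCV follows; the 3D points are assumed to come from triangulating the matched feature points, K is the left camera's intrinsic matrix, and the names are illustrative.

```python
import cv2
import numpy as np

def estimate_pose(points_3d, points_2d, K):
    """Initial (estimated) pose from 3D-2D correspondences via PnP (operation S641)."""
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(points_3d, dtype=np.float64),
        np.asarray(points_2d, dtype=np.float64),
        K, None)                     # None: assume undistorted image points
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)       # rotation vector -> rotation matrix
    return R, tvec                   # estimated pose of the camera
```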
In operation S642, the target optimization model is adjusted according to the depth value and the error value of the depth value, and the adjusted optimization model is obtained.
According to an embodiment of the present disclosure, to calculate the pose of the binocular camera, an error function taking the camera pose as its variable (such as a photometric error, a re-projection residual, or a 3D geometric error) may be used; the binocular camera pose is the accurate pose when that error function attains its minimum. The target optimization model may therefore be a model that minimizes such an error.
According to embodiments of the present disclosure, the target optimization model may include, for example, a BA (Bundle Adjustment) model. The essence of the BA model is to optimize the pose while minimizing the re-projection residual. The re-projection residual is the difference between the projection of a target position in the physical space onto the image plane (the pixel observed in the image captured by the left or right camera) and the re-projection (a virtual pixel obtained by calculation). Re-projection proceeds as follows: a first projection occurs when the camera captures an image, that is, points in the physical space are projected onto the captured image; then some feature points are triangulated using the captured images, the positions of the corresponding points in the physical space being determined by constructing triangles from the geometric information; finally, a second projection is performed using those determined positions and the initial camera pose, yielding the virtual pixels. The initial camera pose may be, for example, the estimated pose described above.
The adjustment of the target optimization model in operation S642 may, for example, add a residual term for the error value of the depth value on top of the re-projection residual. Operation S642 may be implemented, for example, by the flow described in FIG. 7, which is not detailed here.
In operation S643, the estimated pose is optimized according to the adjusted optimization model, and the pose of the binocular camera is obtained.
Operation S643 may include, for example: taking the estimated pose as the initial value and iterating, via methods such as gradient descent, Newton's method, the Gauss-Newton method, or the Levenberg-Marquardt method, to find the camera pose at which the adjusted optimization model attains its minimum, and taking that camera pose as the finally determined pose of the binocular camera.
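As an illustration, a minimal sketch of this iterative step using SciPy's Levenberg-Marquardt implementation follows; residuals is assumed to return the stacked residual vector of the adjusted optimization model evaluated at a pose, and the names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def optimize_pose(residuals, pose0):
    """Iterate from the estimated pose to the pose at which the adjusted
    optimization model takes its minimum (operation S643)."""
    result = least_squares(residuals, pose0, method="lm")  # Levenberg-Marquardt
    return result.x

# Usage: pose0 is typically a 6-vector stacking a rotation vector and a
# translation vector, e.g. np.hstack([rvec.ravel(), tvec.ravel()]).
```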
FIG. 7 schematically illustrates a flowchart of obtaining an adjusted optimization model according to an embodiment of the present disclosure.
As shown in fig. 7, operation S642 of obtaining the adjusted optimization model may include, for example, operations S7421 to S7422.
In operation S7421, a depth residual for the binocular camera is determined according to the at least one depth value and the respective error values of the at least one depth value. According to an embodiment of the present disclosure, operation S7421 may include, for example: first calculating the square of the error value of each depth value to obtain at least one squared value, and then summing the at least one squared value to obtain the depth residual for the binocular camera.
In operation S7422, the target optimization model is adjusted according to the depth residual to obtain the adjusted optimization model. According to an embodiment of the present disclosure, operation S7422 may include, for example: taking the depth residual as one of the summation terms in the target optimization model, thereby obtaining the adjusted optimization model.
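A minimal sketch of the adjusted model's cost follows, assuming (as a stand-in for the full BA model) a plain sum of squared re-projection residuals to which the depth residual of operation S7421 is appended as one more summation term; the names are illustrative.

```python
import numpy as np

def adjusted_cost(reproj_errors, depth_errors):
    """Cost of the adjusted optimization model (operations S7421-S7422)."""
    reprojection_term = np.sum(np.square(reproj_errors))  # original BA term
    depth_residual = np.sum(np.square(depth_errors))      # operation S7421
    return reprojection_term + depth_residual             # operation S7422
```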
Fig. 8 schematically shows a block diagram of the configuration of the pose determination apparatus according to the embodiment of the present disclosure.
As shown in fig. 8, the pose determining apparatus 800 may include, for example, an image obtaining module 810, a matching feature point obtaining module 820, a numerical value determining module 830, and a pose determining module 840.
The image obtaining module 810 is for obtaining a first image obtained by a left camera and a second image obtained by a right camera of the binocular camera (operation S210).
The matching feature point obtaining module 820 is configured to obtain at least one set of matching feature points according to the first image and the second image (operation S220).
The numerical value determining module 830 is configured to determine a depth value and an error value of the depth value of the physical space where the binocular camera is located according to at least one set of matching feature points (operation S230).
The pose determining module 840 is configured to determine the pose of the binocular camera according to the depth value and the error value of the depth value (operation S240).
According to an embodiment of the present disclosure, as shown in FIG. 8, the matching feature point obtaining module 820 may include, for example, a feature point obtaining sub-module 821 and a feature point matching sub-module 822. The feature point obtaining sub-module 821 is used to identify the first image and the second image to obtain a first feature point group for the first image and a second feature point group for the second image (operation S321). The feature point matching sub-module 822 is configured to determine the matched first feature point and second feature point according to the first feature point group and the second feature point group, so as to obtain at least one group of matching feature points (operation S322). Each group of matching feature points includes a first feature point and a second feature point that match each other.
According to an embodiment of the present disclosure, as shown in FIG. 8, the pose determination apparatus 800 may further include, for example, a camera calibration module 850 and a baseline length determination module 860. The camera calibration module 850 is configured to calibrate the binocular camera to obtain the left and right external parameters of the binocular camera before the numerical value determining module 830 determines the depth value of the physical space in which the binocular camera is located and the error value of the depth value (operation S450). The baseline length determination module 860 is configured to determine the baseline length of the binocular camera according to the left and right external parameters (operation S460).
According to an embodiment of the present disclosure, as shown in fig. 8, the numerical determination module 830 may include, for example, a depth value determination submodule 831 and an error value determination submodule 832. The depth value determining sub-module 831 is configured to determine a depth value of a target position in the physical space corresponding to each of at least one set of matching feature points, to obtain at least one depth value for at least one set of matching feature points (operation S531). The error value determination submodule 832 is configured to determine respective error values of the at least one depth value according to the at least one depth value, the baseline length, and the target error model (operation S532).
According to an embodiment of the present disclosure, as shown in FIG. 8, the pose determination module 840 may include, for example, an estimating sub-module 841, a model adjustment sub-module 842, and a pose determination sub-module 843. The estimating sub-module 841 is configured to determine the estimated pose of the binocular camera according to the at least one group of matching feature points and the left and right external parameters (operation S641). The model adjustment sub-module 842 is configured to adjust the target optimization model according to the depth value and the error value of the depth value, obtaining the adjusted optimization model (operation S642). The pose determination sub-module 843 is configured to optimize the estimated pose according to the adjusted optimization model to obtain the pose of the binocular camera (operation S643).
According to an embodiment of the present disclosure, the depth values comprise at least one depth value for at least one set of matching feature points; the error values of the depth values comprise respective error values for at least one depth value. The model adjustment sub-module 842 may be specifically configured to perform the following operations: determining a depth residual for the binocular camera based on the at least one depth value and the respective error values of the at least one depth value (operation S7421); and adjusting the target optimization model according to the depth residual error, resulting in an adjusted optimization model (operation S7422).
Any number of the modules, sub-modules, units, or sub-units according to embodiments of the present disclosure, or at least part of the functionality of any number of them, may be implemented in one module, and any one of them may be split into multiple modules for implementation. Any one or more of the modules, sub-modules, units, or sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on chip, a system on substrate, a system in package, or an Application Specific Integrated Circuit (ASIC), or in any other reasonable hardware or firmware that integrates or packages a circuit, or in any one of, or a suitable combination of, software, hardware, and firmware. Alternatively, one or more of the modules, sub-modules, units, or sub-units according to embodiments of the present disclosure may be at least partially implemented as computer program modules which, when executed, perform the corresponding functions.
Fig. 9 schematically illustrates a block diagram of a configuration of an augmented reality device according to an embodiment of the present disclosure.
As shown in FIG. 9, the augmented reality device 900 includes a processor 910, a computer-readable storage medium 920, and a binocular camera 930. The augmented reality device 900 may be, for example, the augmented reality device 110 described in FIG. 1, and may perform the pose determination method according to an embodiment of the present disclosure.
In particular, processor 910 can include, for example, a general purpose microprocessor, an instruction set processor, and/or an associated chipset and/or special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 910 may also include on-board memory for caching purposes. Processor 910 may be a single processing unit or multiple processing units for performing different actions in accordance with the method flows of embodiments of the disclosure.
The computer-readable storage medium 920 may be, for example, a non-volatile computer-readable storage medium; specific examples include, but are not limited to: magnetic storage devices, such as magnetic tape or hard disk (HDD); optical storage devices, such as compact discs (CD-ROM); memory, such as Random Access Memory (RAM) or flash memory; and the like.
The computer-readable storage medium 920 may include a computer program 921, which computer program 921 may include code/computer-executable instructions that, when executed by the processor 910, cause the processor 910 to perform a method according to an embodiment of the disclosure, or any variation thereof.
The computer program 921 may be configured with computer program code comprising, for example, computer program modules. For example, in an example embodiment, the code in the computer program 921 may include one or more program modules, including, for example, module 921A, module 921B, and so on. It should be noted that the division and number of the modules are not fixed; a person skilled in the art may use suitable program modules or combinations of program modules according to the actual situation, and when these program modules are executed by the processor 910, they enable the processor 910 to perform the method according to an embodiment of the present disclosure or any variation thereof.
According to an embodiment of the present disclosure, the binocular camera 930 may include, for example, a left camera and a right camera, and may be, for example, the binocular camera 111 depicted in FIG. 1. The augmented reality device 900 may perform the pose determination method based on, for example, the two images captured by the two cameras of the binocular camera 930.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined in various combinations and/or combinations, even if such combinations or combinations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be variously combined and/or combined without departing from the spirit and teachings of the present disclosure. All such combinations and/or combinations fall within the scope of the present disclosure.
While the present disclosure has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the appended claims and their equivalents. The scope of the disclosure should, therefore, not be limited to the above-described embodiments, but should be determined not only by the following claims, but also by the equivalents of the following claims.

Claims (8)

1. A pose determination method, comprising:
obtaining a first image obtained by a left camera and a second image obtained by a right camera of a binocular camera;
obtaining at least one group of matching feature points according to the first image and the second image;
calibrating the binocular camera to obtain left and right external parameters of the binocular camera, and determining a baseline length of the binocular camera according to the left and right external parameters;
determining, according to the at least one group of matching feature points, a depth value of a physical space where the binocular camera is located and an error value of the depth value, which comprises: determining depth values of the target positions in the physical space respectively corresponding to the at least one group of matching feature points, to obtain at least one depth value for the at least one group of matching feature points; and determining respective error values of the at least one depth value according to the at least one depth value, the baseline length, and a target error model;
determining the estimated pose of the binocular camera according to the coordinate values of each group of matching feature points in the first image and the second image;
optimizing the estimated pose according to the depth value and the error value of the depth value;
and determining the pose of the binocular camera according to the optimized estimated pose.
2. The method of claim 1, wherein deriving the at least one set of matching feature points comprises:
identifying the first image and the second image to obtain a first feature point group for the first image and a second feature point group for the second image; and
determining a matched first feature point and a matched second feature point according to the first feature point group and the second feature point group, to obtain the at least one group of matching feature points,
wherein each group of matching feature points includes a first feature point and a second feature point that match each other.
3. The method of claim 1, wherein the determining the pose of the binocular camera comprises:
determining the estimated pose of the binocular camera according to the at least one group of matching feature points and the left and right external parameters;
according to the depth value and the error value of the depth value, adjusting a target optimization model to obtain an adjusted optimization model; and
And optimizing the estimated pose according to the adjusted optimization model to obtain the pose of the binocular camera.
4. The method according to claim 3, wherein the depth values comprise at least one depth value for the at least one set of matching feature points; the error values of the depth values comprise respective error values of the at least one depth value; and the adjusting the target optimization model to obtain the adjusted optimization model comprises:
determining a depth residual for the binocular camera according to the at least one depth value and the respective error values of the at least one depth value; and
adjusting the target optimization model according to the depth residual to obtain the adjusted optimization model.
5. A pose determination apparatus comprising:
the image obtaining module is used for obtaining a first image obtained by a left camera and a second image obtained by a right camera of a binocular camera;
the matching feature point obtaining module is used for obtaining at least one group of matching feature points according to the first image and the second image, calibrating the binocular camera to obtain left and right external parameters of the binocular camera, and determining a baseline length of the binocular camera according to the left and right external parameters;
the numerical value determining module is used for determining, according to the at least one group of matching feature points, a depth value of the physical space where the binocular camera is located and an error value of the depth value, including: determining depth values of the target positions in the physical space respectively corresponding to the at least one group of matching feature points, to obtain at least one depth value for the at least one group of matching feature points; and determining respective error values of the at least one depth value according to the at least one depth value, the baseline length, and a target error model; and
the pose determining module is used for determining the estimated pose of the binocular camera according to the coordinate values of each group of matching feature points in the first image and the second image, optimizing the estimated pose according to the depth value and the error value of the depth value, and determining the pose of the binocular camera according to the optimized estimated pose.
6. The apparatus of claim 5, wherein the matching feature point derivation module comprises:
the feature point obtaining sub-module is used for identifying the first image and the second image and obtaining a first feature point group for the first image and a second feature point group for the second image; and
the feature point matching sub-module is used for determining a matched first feature point and a matched second feature point according to the first feature point group and the second feature point group, to obtain the at least one group of matching feature points,
wherein each group of matching feature points includes a first feature point and a second feature point that match each other.
7. An augmented reality device, comprising:
the binocular camera comprises a left camera and a right camera, wherein the left camera is used for shooting a first image of a physical space where the binocular camera is located; the right camera is used for shooting a second image of the physical space where the binocular camera is located, and the first image and the second image are images of two different angles of the physical space;
one or more processors coupled to the binocular camera and configured to acquire the first image and the second image; and
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the pose determination method of any one of claims 1 to 4.
8. A computer-readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the pose determination method according to any one of claims 1 to 4.
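Finally, the pose determining module's first step, an estimated pose computed from the coordinate values of the matched points alone, is commonly implemented through the essential matrix. The OpenCV sketch below is one conventional reading of that step, not the patent's prescribed algorithm; note that the recovered translation has no absolute scale, which is precisely the gap the depth values and their error values close in the subsequent optimization:

```python
import cv2
import numpy as np

def estimate_pose(pts1, pts2, camera_matrix):
    """Estimated pose (rotation R, unit-length translation t) from matched
    feature point coordinates in the first and second images."""
    pts1 = np.asarray(pts1, dtype=np.float64)
    pts2 = np.asarray(pts2, dtype=np.float64)
    E, inlier_mask = cv2.findEssentialMat(
        pts1, pts2, camera_matrix, method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # Decompose E and keep the (R, t) that places points in front of both views.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, camera_matrix, mask=inlier_mask)
    return R, t
```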
CN201911403477.3A 2019-12-30 2019-12-30 Pose determining method and device, augmented reality equipment and readable storage medium Active CN111105462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911403477.3A CN111105462B (en) 2019-12-30 2019-12-30 Pose determining method and device, augmented reality equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911403477.3A CN111105462B (en) 2019-12-30 2019-12-30 Pose determining method and device, augmented reality equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111105462A CN111105462A (en) 2020-05-05
CN111105462B (en) 2024-05-28

Family

ID=70424372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911403477.3A Active CN111105462B (en) 2019-12-30 2019-12-30 Pose determining method and device, augmented reality equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111105462B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784778B (en) * 2020-06-04 2022-04-12 华中科技大学 Binocular camera external parameter calibration method and system based on linear solving and nonlinear optimization
CN111950642B (en) * 2020-08-17 2024-06-21 联想(北京)有限公司 Repositioning method and electronic equipment
CN112270718B (en) * 2020-11-13 2022-11-15 苏州智加科技有限公司 Camera calibration method, device, system and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018095278A1 (en) * 2016-11-24 2018-05-31 Tencent Technology (Shenzhen) Co., Ltd. Aircraft information acquisition method, apparatus and device
CN107747941B * 2017-09-29 2020-05-15 Goertek Inc. Binocular vision positioning method, device and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107705333A (en) * 2017-09-21 2018-02-16 Goertek Inc. Space-location method and device based on binocular camera
CN107808407A (en) * 2017-10-16 2018-03-16 EHang Intelligent Equipment (Guangzhou) Co., Ltd. Unmanned aerial vehicle visual SLAM method based on binocular camera, unmanned aerial vehicle and storage medium
CN108615244A (en) * 2018-03-27 2018-10-02 China University of Geosciences (Wuhan) Image depth estimation method and system based on CNN and depth filter
CN110335316A (en) * 2019-06-28 2019-10-15 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Pose determination method, apparatus, medium and electronic device based on depth information
CN110349213A (en) * 2019-06-28 2019-10-18 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Pose determination method, apparatus, medium and electronic device based on depth information
CN110599532A (en) * 2019-09-18 2019-12-20 Xiamen Meitu Home Technology Co., Ltd. Depth estimation model optimization and depth estimation processing method and device for images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Chao; Han Cheng; Yang Huamin; Yang Fan. Calibration algorithm for large-field-of-view binocular vision based on pose constraints. Acta Optica Sinica, 2016, (01), full text. *

Also Published As

Publication number Publication date
CN111105462A (en) 2020-05-05

Similar Documents

Publication Publication Date Title
CN108955718B (en) Visual odometer and positioning method thereof, robot and storage medium
EP3028252B1 (en) Rolling sequential bundle adjustment
EP3001384B1 (en) Three-dimensional coordinate computing apparatus, three-dimensional coordinate computing method, and program for three-dimensional coordinate computing
US9916689B2 (en) Apparatus and method for estimating camera pose
CN111105462B (en) Pose determining method and device, augmented reality equipment and readable storage medium
CN107533763B (en) Image processing apparatus, image processing method, and program
CN106530358A (en) Method for calibrating PTZ camera by using only two scene images
KR102367361B1 (en) Location measurement and simultaneous mapping method and device
US9940716B2 (en) Method for processing local information
GB2506411A (en) Determination of position from images and associated camera positions
US9679382B2 (en) Georeferencing method and system
US9613256B2 (en) Method and apparatus for identifying local features
JP6662382B2 (en) Information processing apparatus and method, and program
AliAkbarpour et al. Parallax-tolerant aerial image georegistration and efficient camera pose refinement—without piecewise homographies
Cao et al. Self-calibration from turn-table sequences in presence of zoom and focus
Bastanlar A simplified two-view geometry based external calibration method for omnidirectional and PTZ camera pairs
Tsao et al. Stitching aerial images for vehicle positioning and tracking
CN111161357B (en) Information processing method and device, augmented reality device and readable storage medium
WO2016071896A1 (en) Methods and systems for accurate localization and virtual object overlay in geospatial augmented reality applications
US11222430B2 (en) Methods, devices and computer program products using feature points for generating 3D images
US11282280B2 (en) Method and system for node vectorisation
KR102422292B1 (en) Method and apparatus for obtaining 3D-coordinates from 2D-image
KR100871149B1 (en) Apparatus and method for estimating camera focal length
Szwoch et al. Spatial Calibration of a Dual PTZ‐Fixed Camera System for Tracking Moving Objects in Video
Demirtas et al. Investigation of the effects of false matches and distribution of the matched keypoints on the pnp algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant