WO2024087917A1

WO2024087917A1 - Pose determination method and apparatus, computer readable storage medium, and electronic device

Info

Publication number: WO2024087917A1
Application number: PCT/CN2023/118181
Authority: WO
Inventors: 尹赫
Original assignee: Oppo广东移动通信有限公司
Priority date: 2022-10-28
Filing date: 2023-09-12
Publication date: 2024-05-02
Also published as: CN117994333A

Abstract

A pose determination method, comprising: determining the pose of a first camera when the first camera collects a current frame of color image, according to: a first two-dimensional feature point, on the current frame of color image collected by the first camera, that matches a previous frame of color image collected by the first camera; a third two-dimensional feature point in a first camera coordinate system, converted from a second two-dimensional feature point, on a current frame of color image collected by a second camera, that matches a previous frame of color image collected by the second camera; and three-dimensional feature points, in a world coordinate system, of the previous frames of color images respectively collected by the first camera and the second camera.

Description

Method and device for determining posture, computer-readable storage medium, and electronic device

This application claims priority to a Chinese patent application filed with the Patent Office on October 28, 2022, with application number 202211337302.9 and application name “Posture determination method and device, computer-readable storage medium and electronic device”, the entire contents of which are incorporated by reference in this application.

Technical Field

The present disclosure relates to the field of computer vision technology, and in particular to a posture determination method, a posture determination device, a computer-readable storage medium, and an electronic device.

Background Art

In the field of computer vision technology, visual positioning is a technology that uses images taken by a camera to determine the camera's position in the real world. It has important application value in augmented reality, virtual reality, robotics, intelligent transportation and other fields.

Summary of the invention

The present disclosure provides a posture determination method, a posture determination device, a computer-readable storage medium, and an electronic device.

According to a first aspect of the present disclosure, a posture determination method is provided, which is applied to a terminal device, wherein the terminal device is configured with a first camera and at least one second camera, and the posture determination method includes: obtaining a current frame color image captured by the first camera, and determining first two-dimensional feature points on the current frame color image captured by the first camera that match a previous frame color image captured by the first camera; obtaining a current frame color image captured by the second camera, and determining second two-dimensional feature points on the current frame color image captured by the second camera that match a previous frame color image captured by the second camera; converting the second two-dimensional feature points into third two-dimensional feature points in the first camera coordinate system by using a conversion matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera; and determining the posture of the first camera when capturing the current frame color image according to the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points of the previous frame color image captured by the first camera in the world coordinate system, and the three-dimensional feature points of the previous frame color image captured by the second camera in the world coordinate system.

According to a second aspect of the present disclosure, a posture determination device is provided, which is configured in a terminal device, and the terminal device is further configured with a first camera and at least one second camera. The posture determination device includes: a first feature point determination module, which is used to obtain a current frame color image captured by the first camera, and determine a first two-dimensional feature point on the current frame color image captured by the first camera that matches a previous frame color image captured by the first camera; a second feature point determination module, which is used to obtain a current frame color image captured by the second camera, and determine a second two-dimensional feature point on the current frame color image captured by the second camera that matches a previous frame color image captured by the second camera; a feature point conversion module, which is used to convert the second two-dimensional feature point into a third two-dimensional feature point in the first camera coordinate system by using a conversion matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera; and a posture determination module, which is used to determine the posture of the first camera when capturing the current frame color image according to the first two-dimensional feature point, the third two-dimensional feature point, the three-dimensional feature point of the previous frame color image captured by the first camera in the world coordinate system, and the three-dimensional feature point of the previous frame color image captured by the second camera in the world coordinate system.

According to a third aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, and when the program is executed by a processor, the above-mentioned posture determination method is implemented.

According to a fourth aspect of the present disclosure, there is provided an electronic device, comprising a processor and a memory for storing one or more programs, wherein when the one or more programs are executed by the processor, the processor implements the above-mentioned posture determination method.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings herein are incorporated into and constitute a part of the specification, show embodiments consistent with the present disclosure, and together with the specification serve to explain the principles of the present disclosure. Obviously, the drawings described below show only some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort. In the drawings:

FIG1 is a schematic diagram showing a system architecture of a posture determination system according to an embodiment of the present disclosure;

FIG2 is a schematic diagram showing a placement of dual cameras on a terminal device according to an embodiment of the present disclosure;

FIG3 is a schematic diagram showing the placement angles of the dual cameras according to an embodiment of the present disclosure;

FIG4 is a schematic diagram showing various processing stages involved in the posture determination solution of an embodiment of the present disclosure;

FIG5 schematically shows a flow chart of a method for determining a posture according to an exemplary embodiment of the present disclosure;

FIG6 is a schematic diagram showing dual-camera point pair matching according to an embodiment of the present disclosure;

FIG7 shows a flowchart of a positioning initialization process according to an embodiment of the present disclosure;

FIG8 is a schematic diagram showing a method of determining two planes according to an embodiment of the present disclosure;

FIG9 is a schematic diagram showing a method of determining a ground plane according to an embodiment of the present disclosure;

FIG10 is a flowchart showing a process of determining a transformation matrix between a first camera coordinate system and a world coordinate system according to an embodiment of the present disclosure;

FIG11 schematically shows a block diagram of a posture determination apparatus according to a first exemplary embodiment of the present disclosure;

FIG12 schematically shows a block diagram of a posture determination apparatus according to a second exemplary embodiment of the present disclosure;

FIG13 schematically shows a block diagram of a posture determination apparatus according to a third exemplary embodiment of the present disclosure;

FIG14 schematically shows a block diagram of a posture determination apparatus according to a fourth exemplary embodiment of the present disclosure;

FIG15 schematically shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in a variety of forms and should not be construed as being limited to the examples set forth herein; on the contrary, these embodiments are provided so that the present disclosure will be more comprehensive and complete, and the concepts of the example embodiments are fully conveyed to those skilled in the art. The described features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, many specific details are provided to provide a full understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced while omitting one or more of the specific details, or other methods, components, devices, steps, etc. may be adopted. In other cases, known technical solutions are not shown or described in detail to avoid obscuring various aspects of the present disclosure.

In addition, the drawings are only schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings represent the same or similar parts, and their repeated description will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities can be implemented in software form, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flowcharts shown in the accompanying drawings are only exemplary and do not necessarily include all the steps. For example, some steps may be decomposed, while some steps may be combined or partially combined, so the actual execution order may change according to the actual situation. In addition, all the terms "first", "second", "third", etc. below are only for the purpose of distinction and should not be used as limitations of the present disclosure.

Through visual positioning technology, computer devices can autonomously perceive their own position in the environment, so as to perform any tasks proposed by the user, such as tracking, monitoring, interaction, displaying images, playing audio, etc. The accuracy of positioning greatly affects the realization of computer device functions.

In order to improve the accuracy of device visual positioning, the embodiments of the present disclosure provide a new positioning solution.

The present application provides a posture determination method, which is applied to a terminal device, wherein the terminal device is configured with a first camera and at least one second camera, and the posture determination method includes:

Acquire a current frame color image acquired by the first camera, and determine a first two-dimensional feature point on the current frame color image acquired by the first camera that matches a previous frame color image acquired by the first camera;

Acquire a current frame color image acquired by the second camera, and determine a second two-dimensional feature point on the current frame color image acquired by the second camera that matches a previous frame color image acquired by the second camera;

Convert the second two-dimensional feature points to third two-dimensional feature points in the first camera coordinate system using a conversion matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera;

The posture of the first camera when capturing the current frame of color image is determined based on the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points of the previous frame of color image captured by the first camera in the world coordinate system, and the three-dimensional feature points of the previous frame of color image captured by the second camera in the world coordinate system.

In one embodiment, determining a first two-dimensional feature point on a current frame color image acquired by the first camera that matches a previous frame color image acquired by the first camera includes:

Extracting feature points of the current frame color image acquired by the first camera;

Optical flow tracking is performed using feature points of a current frame color image captured by the first camera and feature points of a previous frame color image captured by the first camera to determine the first two-dimensional feature points.

In one embodiment, using a transformation matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera to transform the second two-dimensional feature points into third two-dimensional feature points in the first camera coordinate system includes:

Acquire a transformation matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera and depth information of the second two-dimensional feature point;

The third two-dimensional feature point is determined according to the conversion matrix, the depth information of the second two-dimensional feature point, and the second two-dimensional feature point.

In one embodiment, determining the third two-dimensional feature point according to the conversion matrix, the depth information of the second two-dimensional feature point, and the second two-dimensional feature point includes:

The conversion matrix, the depth information of the second two-dimensional feature point, and the second two-dimensional feature point are multiplied, and a result of the multiplication is normalized to determine the third two-dimensional feature point.
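As a minimal sketch of this multiplication and normalization, the following Python assumes that the second two-dimensional feature point is given in normalized image coordinates (camera intrinsics already removed) and that the conversion matrix is split into a rotation `R_12` and a translation `t_12` mapping second-camera coordinates to first-camera coordinates. These conventions and all names are illustrative assumptions, not taken from the disclosure.

```python
def convert_to_first_camera(pt2, depth, R_12, t_12):
    """Map a normalized 2D point of the second camera into the first camera.

    pt2   : (x, y) normalized image coordinates in the second camera (assumed)
    depth : depth of the point along the second camera's optical axis
    R_12  : 3x3 rotation, second-camera frame -> first-camera frame (assumed)
    t_12  : translation, second-camera frame -> first-camera frame (assumed)
    """
    # Multiply the point by its depth to lift it to 3D in the second camera frame.
    p2 = (pt2[0] * depth, pt2[1] * depth, depth)
    # Apply the conversion matrix: p1 = R_12 * p2 + t_12.
    p1 = [sum(R_12[i][j] * p2[j] for j in range(3)) + t_12[i] for i in range(3)]
    if p1[2] <= 0:
        return None  # behind the first camera; no valid projection
    # Normalize by the third component to get the 2D point in the first camera.
    return (p1[0] / p1[2], p1[1] / p1[2])


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# A point on the second camera's optical axis at depth 2, with the second
# camera offset by 1 unit along x relative to the first camera.
third_point = convert_to_first_camera((0.0, 0.0), 2.0, IDENTITY, [1.0, 0.0, 0.0])
```

With these toy values the converted point lies at (0.5, 0.0) in the first camera's normalized image plane.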

In one embodiment, the first two-dimensional feature points and the third two-dimensional feature points constitute two-dimensional coordinate information, and the three-dimensional feature points of the previous frame color image acquired by the first camera in the world coordinate system and the three-dimensional feature points of the previous frame color image acquired by the second camera in the world coordinate system constitute three-dimensional coordinate information; wherein, according to the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points of the previous frame color image acquired by the first camera in the world coordinate system, and the three-dimensional feature points of the previous frame color image acquired by the second camera in the world coordinate system, determining the position and posture of the first camera when acquiring the current frame color image includes:

Associating the two-dimensional coordinate information with the three-dimensional coordinate information to obtain point pair information;

The point pair information is used to solve the perspective n-point problem, and the pose of the first camera when capturing the current frame color image is determined in combination with the solution result.
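The association step and the quantity a perspective-n-point solver works to reduce can be sketched as follows. The sketch assumes each feature carries a hypothetical integer id shared between its 2D observation and its 3D world point, that 2D points are in normalized image coordinates, and that all 3D points lie in front of the camera. It does not solve for the pose itself.

```python
import math


def build_point_pairs(points_2d, points_3d):
    """Associate 2D observations with 3D world points that share a feature id."""
    return [(points_3d[k], points_2d[k]) for k in sorted(points_2d) if k in points_3d]


def mean_reprojection_error(pairs, R_cw, t_cw):
    """Mean distance between observed and projected normalized 2D points.

    R_cw, t_cw map world coordinates into the first camera frame (assumed
    convention). A PnP solver searches for the pose that makes this small.
    """
    total = 0.0
    for world_pt, observed in pairs:
        pc = [sum(R_cw[i][j] * world_pt[j] for j in range(3)) + t_cw[i] for i in range(3)]
        total += math.hypot(pc[0] / pc[2] - observed[0], pc[1] / pc[2] - observed[1])
    return total / len(pairs)


# Ids 1 and 2 stand for first-camera points, id 3 for a converted second-camera
# point; id 4 has no 3D counterpart and is dropped by the association.
points_2d = {1: (0.25, 0.5), 2: (0.0, 0.0), 3: (-0.5, 0.0), 4: (0.1, 0.1)}
points_3d = {1: (1.0, 2.0, 4.0), 2: (0.0, 0.0, 3.0), 3: (-1.0, 0.0, 2.0)}
pairs = build_point_pairs(points_2d, points_3d)

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
error = mean_reprojection_error(pairs, IDENTITY, [0.0, 0.0, 0.0])
```

In practice the pose would come from a PnP solver, for example OpenCV's `solvePnP` or `solvePnPRansac`; the disclosure does not name a particular solver.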

In one embodiment, the posture determination method further includes:

Acquire a previous frame of color image captured by the first camera, and extract feature points of the previous frame of color image captured by the first camera;

Using a previous frame of depth image aligned with the previous frame of color image acquired by the first camera, spatially projecting feature points of the previous frame of color image acquired by the first camera to obtain three-dimensional feature points of the previous frame of color image acquired by the first camera in the first camera coordinate system;

According to the posture of the first camera when capturing the last frame of color image, the three-dimensional feature points in the first camera coordinate system are transformed to obtain the three-dimensional feature points of the last frame of color image captured by the first camera in the world coordinate system.
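The two steps above, spatial projection with the aligned depth image followed by transformation into the world coordinate system, can be sketched under a pinhole camera model. The intrinsic parameters (`FX`, `FY`, `CX`, `CY`) and the pose values below are hypothetical, and the pose is assumed to be stored as a camera-to-world rotation and translation.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with its depth to a 3D point in the camera frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def camera_to_world(p_cam, R_wc, t_wc):
    """Apply the camera pose: p_world = R_wc * p_cam + t_wc."""
    return tuple(sum(R_wc[i][j] * p_cam[j] for j in range(3)) + t_wc[i] for i in range(3))


# Hypothetical intrinsics and previous-frame pose, for illustration only.
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0
R_WC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T_WC = [1.0, 2.0, 3.0]

# A feature at pixel (570, 240) whose aligned depth value is 2.0.
p_cam = back_project(570.0, 240.0, 2.0, FX, FY, CX, CY)
p_world = camera_to_world(p_cam, R_WC, T_WC)
```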

In one embodiment, using a previous frame of depth image aligned with a previous frame of color image acquired by the first camera, spatially projecting feature points of the previous frame of color image acquired by the first camera to obtain three-dimensional feature points of the previous frame of color image acquired by the first camera in the first camera coordinate system includes:

Using a previous frame of depth image aligned with the previous frame of color image acquired by the first camera, spatially projecting feature points within a predetermined depth range among feature points of the previous frame of color image acquired by the first camera, so as to obtain three-dimensional feature points of the previous frame of color image acquired by the first camera in the first camera coordinate system;

The predetermined depth range is determined based on a range of depth measurement.
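A minimal sketch of the depth-range screening is shown below. It assumes feature points are integer pixel coordinates, the depth image is a row-major list of lists in meters, and the range limits are hypothetical values chosen for the example rather than figures from the disclosure.

```python
def filter_by_depth(features, depth_image, d_min, d_max):
    """Keep feature pixels whose aligned depth lies within [d_min, d_max].

    features    : list of integer pixel coordinates (u, v)
    depth_image : row-major 2D list of depths, indexed as depth_image[v][u]
    """
    kept = []
    for u, v in features:
        d = depth_image[v][u]
        if d_min <= d <= d_max:
            kept.append((u, v, d))
    return kept


# Toy 2x3 depth image in meters; 0.0 marks a missing measurement (assumption).
depth_image = [[0.0, 1.5, 9.0],
               [0.4, 3.0, 5.9]]
features = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
valid = filter_by_depth(features, depth_image, d_min=0.6, d_max=6.0)
```

Only the retained features, returned together with their depth values, would then be spatially projected.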

In one embodiment, the posture determination method further includes:

Acquire a previous frame of color image captured by the second camera, and extract feature points of the previous frame of color image captured by the second camera;

Using a previous frame of depth image aligned with a previous frame of color image acquired by the second camera, spatially projecting feature points of the previous frame of color image acquired by the second camera to obtain three-dimensional feature points of the previous frame of color image acquired by the second camera in the second camera coordinate system;

Using a transformation matrix between the first camera coordinate system and the second camera coordinate system, the three-dimensional feature points of the previous frame of color image acquired by the second camera in the second camera coordinate system are transformed into three-dimensional feature points in the first camera coordinate system;

According to the posture of the first camera when capturing the last frame of color image, the three-dimensional feature points in the first camera coordinate system are transformed to obtain the three-dimensional feature points of the last frame of color image captured by the second camera in the world coordinate system.

In one embodiment, using a previous frame of depth image aligned with a previous frame of color image acquired by the second camera, spatially projecting feature points of the previous frame of color image acquired by the second camera to obtain three-dimensional feature points of the previous frame of color image acquired by the second camera in the second camera coordinate system includes:

Using a previous frame of depth image aligned with the previous frame of color image acquired by the second camera, spatially projecting feature points within a predetermined depth range among feature points of the previous frame of color image acquired by the second camera, so as to obtain three-dimensional feature points of the previous frame of color image acquired by the second camera in the second camera coordinate system.

In one embodiment, the posture determination method further includes:

Acquire an initial frame color image captured by the first camera, and extract feature points of the initial frame color image captured by the first camera;

Using the initial frame depth image aligned with the initial frame color image acquired by the first camera, spatially projecting the feature points of the initial frame color image acquired by the first camera to obtain three-dimensional feature points of the initial frame color image acquired by the first camera in the first camera coordinate system;

Determine an initial positioning result of the first camera in the first camera coordinate system according to the three-dimensional feature points, the initial rotation matrix, and the initial translation vector of the initial frame color image acquired by the first camera in the first camera coordinate system;

The initial positioning result of the first camera in the first camera coordinate system is transformed by using the transformation matrix between the first camera coordinate system and the world coordinate system, so as to determine the position and posture of the first camera when capturing the initial frame color image.

In one embodiment, the posture determination method further includes:

Acquire a reference depth image output by the first camera;

When a designated plane is determined in combination with the reference depth image output by the first camera, a transformation matrix between the first camera coordinate system and the world coordinate system is determined according to a normal vector and a gravity vector of the designated plane.

In one embodiment, the posture determination method further includes:

Determine a reference point cloud corresponding to the first camera in combination with a reference depth image output by the first camera;

Extracting plane information of the reference point cloud;

The designated plane is selected according to the plane information of the reference point cloud.

In one embodiment, determining a reference point cloud corresponding to the first camera in combination with a reference depth image output by the first camera includes:

For each pixel point on the reference depth image output by the first camera, determine the three-dimensional space point of the pixel point according to the pixel point, the depth value of the pixel point and the camera intrinsic parameter of the first camera;

A reference point cloud corresponding to the first camera is constructed by combining the three-dimensional space point of each pixel point on the reference depth image output by the first camera.

In one embodiment, combining the three-dimensional space point of each pixel point on the reference depth image output by the first camera to construct a reference point cloud corresponding to the first camera includes:

Acquire a reference depth image output by the second camera;

Determine a three-dimensional spatial point of each pixel on the reference depth image output by the second camera;

According to the conversion matrix between the first camera coordinate system and the second camera coordinate system, the three-dimensional space point of each pixel on the reference depth image output by the second camera is converted to obtain a converted three-dimensional space point;

The three-dimensional space point of each pixel point on the reference depth image output by the first camera is merged with the converted three-dimensional space point to construct a reference point cloud corresponding to the first camera.

In one embodiment, selecting the designated plane according to the plane information of the reference point cloud includes:

The designated plane is selected according to distance information between the plane and the first camera included in the plane information of the reference point cloud.

In one embodiment, selecting the designated plane according to the distance information between the plane and the first camera included in the plane information of the reference point cloud includes:

In a case where the distance information includes a distance within a predetermined distance range, determining a candidate plane corresponding to the distance in the distance information within the predetermined distance range;

When the number of the candidate plane is one, determining the candidate plane as the designated plane;

When there are multiple candidate planes, determine a candidate plane whose distance from the first camera is closest to a distance threshold as the designated plane;

Wherein, the distance threshold is within the predetermined distance range.
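The selection rule above can be sketched as a small function. It assumes the plane information has already been reduced to a list of plane-to-camera distances; the function name and the numeric values are hypothetical.

```python
def select_designated_plane(distances, d_min, d_max, d_threshold):
    """Return the index of the designated plane, or None if no plane qualifies."""
    # Candidate planes are those whose distance lies in the predetermined range.
    candidates = [(i, d) for i, d in enumerate(distances) if d_min <= d <= d_max]
    if not candidates:
        return None
    # One candidate: it is selected. Several: pick the one closest to the threshold.
    return min(candidates, key=lambda c: abs(c[1] - d_threshold))[0]


# Four detected planes; only the planes at 0.45 and 0.6 fall inside [0.3, 0.8].
chosen = select_designated_plane([0.1, 0.45, 0.6, 2.0], d_min=0.3, d_max=0.8, d_threshold=0.5)
```

Here the plane at index 1 is selected because its distance is closer to the 0.5 threshold than that of the plane at index 2.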

In one embodiment, the designated plane is a ground plane.

FIG1 is a schematic diagram showing a system architecture of a position and posture determination system according to an embodiment of the present disclosure. Referring to FIG1, a terminal device 1 may include a processor 100, a first camera 110, and at least one second camera 120.

The terminal device 1 may include, for example, a robot, an intelligent monitoring device, an intelligent tracking device, etc. It may be a whole device, or a device system composed of multiple entity units.

For example, the terminal device 1 may be a robot dog. A robot dog is a robot form with advantages such as flexibility and strong mobility, and can perform tasks such as security patrol, transporting items, and emotional companionship.

The first camera 110 and the at least one second camera 120 serve as input sensors of the posture determination solution of the embodiment of the present disclosure, and can transmit the sensed color image and depth image to the processor 100.

For example, the first camera 110 and the second camera 120 may be Realsense D455 cameras. The Realsense D455 camera consists of an RGB camera, two IR (infrared) cameras, and an IR transmitter. The RGB camera outputs a color image, and the two IR cameras may output a dense depth map aligned with the color image. The FOV (field of view) of the Realsense D455 camera is 90° horizontally and 65° vertically.

In the case where the terminal device 1 includes a first camera 110 and a second camera 120, the first camera 110 may be a left camera, and the second camera 120 may be a right camera. In the following embodiments, the left camera involved may be understood as the first camera 110, and the right camera involved may be understood as the second camera 120. However, it should be understood that "left", "right", "first", and "second" are merely exemplary descriptions for distinction. In other embodiments of the present disclosure, the first camera 110 may be a right camera, and the second camera 120 may be a left camera, and the present disclosure does not limit this.

Taking one first camera 110 and one second camera 120 as an example, FIG2 shows a schematic diagram of the placement of the dual cameras on a terminal device according to an embodiment of the present disclosure. It should be understood that the placement shown in FIG2 is only an exemplary description, and other placements are possible depending on the type of terminal device and the space available for the cameras. The present disclosure does not limit this.

FIG3 shows a schematic diagram of the placement angles of the dual cameras of the embodiment of the present disclosure. The first camera 110 and the second camera 120 are both placed vertically, so each has a viewing angle of 65° in the horizontal direction, corresponding to angles A and B in FIG3, respectively. When placed, the leftmost line of sight of the first camera 110 can be parallel to the rightmost line of sight of the second camera 120. At this time, the two cameras can obtain the maximum field of view, that is, 130°, corresponding to angle C in FIG3. There is a narrow common viewing area between the first camera 110 and the second camera 120. According to the design of the above angles, it can be determined that the angle between the first camera 110 and the second camera 120 is 115°, corresponding to angle D in FIG3.

Thus, the first camera 110 and the second camera 120 are placed vertically side by side at an angle of 115°, and their combined field of view is 130° in the horizontal direction and 90° in the vertical direction. This maximizes the combined field of view of the two cameras, effectively enlarges the field of view of the terminal device 1, and helps improve the accuracy of the subsequent positioning algorithm.
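The angle figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that mounting a camera vertically swaps its horizontal and vertical fields of view, and it reads the 115° angle as the supplement of the per-camera 65° view; that reading is an interpretation consistent with the stated numbers, not a derivation given in the disclosure.

```python
def dual_camera_layout(sensor_h_fov, sensor_v_fov):
    """Return (combined horizontal FOV, vertical FOV, angle between cameras) in degrees."""
    # Mounting a camera vertically swaps its horizontal and vertical fields of view.
    h_fov, v_fov = sensor_v_fov, sensor_h_fov
    # Two adjacent fields of view whose outer edges are parallel add up (angle C).
    combined_h_fov = 2 * h_fov
    # One reading of angle D: the supplement of the per-camera horizontal view (assumption).
    mounting_angle = 180 - h_fov
    return combined_h_fov, v_fov, mounting_angle


# Sensor field of view of 90 degrees by 65 degrees, as stated for the example camera.
layout = dual_camera_layout(90, 65)
```

For the 90° by 65° sensor this reproduces the 130°, 90°, and 115° values of the embodiment.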

In addition, the first camera 110 and the second camera 120 support multi-camera hardware synchronization. The first camera 110 and the second camera 120 can be connected by a wire, and the same pulse signal is used to trigger the two cameras to expose simultaneously, thereby realizing hardware synchronization of multiple cameras. After the hardware synchronization setting, the image input into the subsequent positioning algorithm is the image taken at the same time. In this way, additional errors caused by inconsistent shooting time of multiple cameras are avoided.

After placing the first camera and the second camera in the above manner, the internal and external parameters of the two cameras can be calibrated respectively for use by subsequent algorithms. The present disclosure does not limit the calibration process.

In the posture determination scheme of the embodiment of the present disclosure, the processor 100 can obtain the current frame color image captured by the first camera 110, and determine the first two-dimensional feature points on the current frame color image captured by the first camera 110 that match the previous frame color image captured by the first camera 110.

The processor 100 may acquire a current frame color image captured by the second camera 120, and determine a second two-dimensional feature point on the current frame color image captured by the second camera 120 that matches a previous frame color image captured by the second camera 120. The processor 100 converts the second two-dimensional feature point into a third two-dimensional feature point in the first camera coordinate system using a conversion matrix between the first camera coordinate system of the first camera 110 and the second camera coordinate system of the second camera 120.

Next, the processor 100 can determine the posture of the first camera 110 when capturing the current frame color image based on the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points of the previous frame color image captured by the first camera 110 in the world coordinate system, and the three-dimensional feature points of the previous frame color image captured by the second camera 120 in the world coordinate system.

In the case where the terminal device 1 is configured with a plurality of second cameras 120, feature point data of each second camera 120 may be mapped to the first camera coordinate system of the first camera 110 for processing.

It is understandable that the placement positions of the first camera 110 and the second camera 120 on the terminal device 1 are fixed, and when the current posture of the first camera 110 is determined, the current posture of the second camera 120 and the current posture of the terminal device 1 can be obtained.

In addition, when the terminal device 1 is configured with more than two cameras, any one of the cameras may be determined as the first camera 110 in algorithm implementation, and the remaining cameras may be determined as the second camera 120.

Based on the posture determination scheme of the embodiment of the present disclosure, the feature points collected by the second camera 120 are converted to the first camera coordinate system and used for posture calculation together with the feature points collected by the first camera 110. Because the feature points come from at least two cameras and are expressed in a unified coordinate system, more feature points take part in the calculation and they cover the scene more comprehensively, so the determined position and posture are more accurate. In addition, the position and posture determination process of the present disclosure takes into account the correlation between frames: it combines the feature information of the previous frame image and uses the data of the previous frame as constraints, which further improves the accuracy of positioning.

In the process of realizing the posture determination of the embodiment of the present disclosure, multiple processing stages are involved. Referring to FIG. 4, the processing stages involved include but are not limited to a coordinate system alignment stage, a positioning initialization stage, and a real-time positioning stage.

For the coordinate system alignment stage, the terminal device determines the transformation matrix between the first camera coordinate system and the world coordinate system.

First, the terminal device can construct a point cloud using the depth image output by the first camera and the depth image output by the second camera, wherein the three-dimensional space points corresponding to the two depth images can be merged to obtain a point cloud of three-dimensional feature points.

Next, the terminal device uses a plane detection algorithm to extract plane information from the point cloud, and selects a specified plane (such as the ground plane) based on the extracted plane information.

Then, the terminal device may calculate a transformation matrix according to the normal vector and the gravity vector of the specified plane to align the first camera coordinate system with the world coordinate system.

In addition, it can be understood that the transformation matrix between the first camera coordinate system and the second camera coordinate system can be obtained from the pre-calibrated intrinsic and extrinsic parameters. In this case, the transformation matrix between the second camera coordinate system and the world coordinate system can also be obtained, so that the first camera coordinate system, the second camera coordinate system, and the world coordinate system are all aligned.

For the positioning initialization stage, the terminal device can determine the position and posture of the first camera when initially capturing a color image. It should be understood that the position and posture of the camera when capturing an image in the present disclosure refers to the position and posture in the world coordinate system.

On the one hand, the terminal device can determine the three-dimensional feature points corresponding to the initial frame color image captured by the first camera, and the three-dimensional feature points are feature points in the first camera coordinate system.

On the other hand, the initial rotation matrix and the initial translation vector can be set. For example, the initial rotation matrix is the identity matrix, and the initial translation vector is [0,0,0].

After the three-dimensional feature points corresponding to the initial frame color image, the initial rotation matrix and the initial translation vector are determined, the positioning initialization in the first camera coordinate system is completed.

Next, combined with the transformation matrix between the first camera coordinate system and the world coordinate system determined in the coordinate system alignment stage, the positioning initialization result in the first camera coordinate system can be converted into the positioning initialization result in the world coordinate system, that is, the position and posture of the first camera when capturing the initial frame color image is determined.

In the real-time positioning stage, the terminal device can combine the initial pose determined in the positioning initialization stage to obtain the pose of the current frame in real time. In this process, the features of the second camera can be transferred to the coordinate system of the first camera, and the pose can be solved in conjunction with the features of the first camera to complete the pose prediction of the current frame.

The following is an exemplary description of the posture determination method of the embodiment of the present disclosure.

FIG. 5 schematically shows a flow chart of a method for determining a posture according to an exemplary embodiment of the present disclosure. Referring to FIG. 5, the method for determining a posture may include the following steps:

S52. Obtain a current frame color image acquired by the first camera, and determine a first two-dimensional feature point on the current frame color image acquired by the first camera that matches a previous frame color image acquired by the first camera.

In an exemplary embodiment of the present disclosure, the current frame color image is the color image captured by the camera at the current moment, and the previous frame color image is the color image captured by the camera in the previous frame. The present disclosure does not impose any limitation on this.

After acquiring the current frame color image captured by the first camera, the terminal device may extract feature points of the current color image captured by the first camera.

The feature extraction algorithm used in the exemplary embodiments of the present disclosure may include but is not limited to the FAST feature point detection algorithm, the DOG feature point detection algorithm, the Harris feature point detection algorithm, the SIFT feature point detection algorithm, the SURF feature point detection algorithm, etc. The feature descriptor may include but is not limited to the BRIEF feature point descriptor, the BRISK feature point descriptor, the FREAK feature point descriptor, etc.

According to one embodiment of the present disclosure, the combination of the feature extraction algorithm and the feature descriptor may be a FAST feature point detection algorithm and a BRIEF feature point descriptor. According to other embodiments of the present disclosure, the combination of the feature extraction algorithm and the feature descriptor may be a DOG feature point detection algorithm and a FREAK feature point descriptor.

It should be understood that different combinations can be used for different texture scenes. For example, for strong texture scenes, the FAST feature point detection algorithm and the BRIEF feature point descriptor can be used for feature extraction; for weak texture scenes, the DOG feature point detection algorithm and the FREAK feature point descriptor can be used for feature extraction.

In the process of processing the previous color image frame corresponding to the current color image frame, there is also a process of extracting feature points. Therefore, the terminal device can use the feature points of the current color image frame captured by the first camera and the feature points of the previous color image frame captured by the first camera to determine the two-dimensional feature points that match between the two images, that is, the first two-dimensional feature points mentioned in the present disclosure.

Specifically, the optical flow method can be used to determine the matching relationship of the feature points, that is, the feature points of the current frame color image captured by the first camera and the feature points of the previous frame color image captured by the first camera are used for optical flow tracking to determine the first two-dimensional feature points. In addition, other image matching methods can also be used to determine 2D-2D feature point pairs, which is not limited in the present disclosure.
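As an illustration of the "other image matching methods" mentioned above, the following is a minimal sketch of mutual nearest-neighbour matching on binary descriptors (such as BRIEF) using the Hamming distance. It is not the optical flow method of the embodiment. The function name `match_binary_descriptors`, the packed `uint8` descriptor layout and the `max_distance` threshold are assumptions made for this example only.

```python
import numpy as np

def match_binary_descriptors(desc_prev, desc_curr, max_distance=40):
    """Mutual nearest-neighbour matching of packed binary descriptors.

    desc_prev, desc_curr: uint8 arrays of shape (N, B) and (M, B); each row is
    one descriptor (e.g. B = 32 bytes for a 256-bit descriptor).
    Returns a list of (index_prev, index_curr) pairs, i.e. 2D-2D matches.
    max_distance is a hypothetical threshold in bits.
    """
    desc_prev = np.asarray(desc_prev, dtype=np.uint8)
    desc_curr = np.asarray(desc_curr, dtype=np.uint8)
    # XOR every pair of descriptors, then count the differing bits.
    xor = desc_prev[:, None, :] ^ desc_curr[None, :, :]
    dist = np.unpackbits(xor, axis=2).sum(axis=2)
    best_curr = dist.argmin(axis=1)  # best current-frame match per previous point
    best_prev = dist.argmin(axis=0)  # best previous-frame match per current point
    matches = []
    for i, j in enumerate(best_curr):
        # Keep a pair only if each point is the other's nearest neighbour.
        if best_prev[j] == i and dist[i, j] <= max_distance:
            matches.append((i, int(j)))
    return matches
```

The mutual check discards ambiguous pairs, which plays a role similar to the forward-backward consistency check often used with optical flow tracking.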

S54. Obtain a current frame color image acquired by the second camera, and determine a second two-dimensional feature point on the current frame color image acquired by the second camera that matches a previous frame color image acquired by the second camera.

It should be understood that although both step S52 and step S54 refer to a current frame color image and a previous frame color image, the images in step S52 are captured by the first camera, whereas the images in step S54 are captured by the second camera.

After acquiring the current frame color image captured by the second camera, the terminal device can extract feature points of the current color image captured by the second camera. The method of extracting feature points can be the same as the method of extracting feature points in step S52, which will not be repeated.

The terminal device may perform optical flow tracking using the feature points of the current frame color image captured by the second camera and the feature points of the previous frame color image captured by the second camera to determine the second two-dimensional feature points.

S56. Use the conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera to convert the second two-dimensional feature points into third two-dimensional feature points in the first camera coordinate system.

In an exemplary embodiment of the present disclosure, for the purpose of distinction, a camera coordinate system of the first camera is recorded as a first camera coordinate system, and a camera coordinate system of the second camera is recorded as a second camera coordinate system.

In the case where the first camera and the second camera are placed at fixed positions on the terminal device, the intrinsic and extrinsic parameters of the two cameras are calibrated in advance, and the conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera can be determined from the calibration results.

The terminal device can obtain the conversion matrix between the first camera coordinate system and the second camera coordinate system and the depth information of the second two-dimensional feature point, and determine the third two-dimensional feature point according to the conversion matrix, the depth information of the second two-dimensional feature point and the second two-dimensional feature point. The third two-dimensional feature point is the two-dimensional feature point obtained by converting the second two-dimensional feature point to the first camera coordinate system.

Specifically, the transformation matrix, the depth information of the second two-dimensional feature points, and the second two-dimensional feature points can be multiplied, and the multiplication result can be normalized to determine the third two-dimensional feature points. The second two-dimensional feature points in the multiplication operation refer to the position coordinate information of these feature points. The third two-dimensional feature points can be determined using Formula 1:
$p_j^{l} = \operatorname{norm}\left(T_{lr} \, d_j \, p_j^{r}\right)$ (Formula 1)

Wherein, $p_j^{l}$ is the third two-dimensional feature point, $T_{lr}$ is the transformation matrix between the first camera coordinate system and the second camera coordinate system, $d_j$ is the depth value of the second two-dimensional feature point, $p_j^{r}$ is the second two-dimensional feature point, and $\operatorname{norm}(\cdot)$ denotes normalization of the multiplication result.
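The multiply-then-normalize conversion described for Formula 1 can be sketched as follows. This is a minimal sketch under stated assumptions: the feature points are given in normalized image coordinates (pixel coordinates would first need the intrinsic matrix), `T_lr` is a 4x4 homogeneous matrix mapping the second camera frame to the first camera frame, and normalization means division by the resulting depth. The function name `to_first_camera` is hypothetical.

```python
import numpy as np

def to_first_camera(points_second, depths, T_lr):
    """Map 2D points of the second camera into the first camera frame.

    points_second: (N, 2) normalized image coordinates (x, y) -- an assumption.
    depths:        (N,) depth d_j of each point in the second camera frame.
    T_lr:          4x4 transform, assumed to map second-camera to first-camera.
    Returns (N, 2) normalized coordinates in the first camera frame.
    """
    pts = np.asarray(points_second, dtype=float)
    d = np.asarray(depths, dtype=float)
    ones = np.ones((len(pts), 1))
    # d_j * (x, y, 1): the 3D point in the second camera frame.
    pts3d = np.hstack([pts, ones]) * d[:, None]
    # Apply T_lr in homogeneous coordinates.
    in_first = (T_lr @ np.hstack([pts3d, ones]).T).T[:, :3]
    # Normalize by the depth in the first camera frame.
    return in_first[:, :2] / in_first[:, 2:3]
```

For example, with a pure translation of 0.1 along x, a point on the optical axis at depth 2 maps to (0.05, 0) in the first camera frame.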

S58. Determine the posture of the first camera when acquiring the current frame of color image based on the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points of the previous frame of color image acquired by the first camera in the world coordinate system, and the three-dimensional feature points of the previous frame of color image acquired by the second camera in the world coordinate system.

In an exemplary embodiment of the present disclosure, the first two-dimensional feature points and the third two-dimensional feature points constitute two-dimensional coordinate information, and the three-dimensional feature points of the previous frame color image captured by the first camera in the world coordinate system and the three-dimensional feature points of the previous frame color image captured by the second camera in the world coordinate system constitute three-dimensional coordinate information.

The terminal device can associate the two-dimensional coordinate information with the three-dimensional coordinate information to obtain point pair information, use the point pair information to solve the Perspective-n-Point (PnP) problem, and determine the posture of the first camera when capturing the current frame color image based on the solution result.

Among them, PnP is a method in the field of machine vision that determines the pose of the camera relative to the scene from n feature points in the scene. Specifically, the rotation matrix and translation vector of the camera can be determined based on the n feature points in the scene.
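The disclosure does not name a particular PnP solver. As one possible illustration, the sketch below uses the Direct Linear Transform (DLT), which needs at least six non-coplanar 3D-2D point pairs and no initial guess. It assumes the 2D points are in normalized image coordinates and returns the world-to-camera rotation `R` and translation `t` with `x_cam = R @ X_world + t`; the camera pose in the world frame is the inverse of this transform. The function name `solve_pnp_dlt` is hypothetical, and a practical system would typically add outlier rejection and nonlinear refinement.

```python
import numpy as np

def solve_pnp_dlt(points_world, points_norm):
    """Estimate (R, t) with x_cam = R @ X_world + t from 3D-2D point pairs.

    points_world: (N, 3) three-dimensional points in the world frame, N >= 6.
    points_norm:  (N, 2) matching normalized image coordinates (u, v).
    """
    points_world = np.asarray(points_world, dtype=float)
    points_norm = np.asarray(points_norm, dtype=float)
    n = points_world.shape[0]
    homog = np.hstack([points_world, np.ones((n, 1))])
    rows = []
    for X, (u, v) in zip(homog, points_norm):
        # Each pair gives two linear equations in the 12 entries of P = [R | t].
        rows.append(np.concatenate([X, np.zeros(4), -u * X]))
        rows.append(np.concatenate([np.zeros(4), X, -v * X]))
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    P = vt[-1].reshape(3, 4)
    # P is defined up to sign and scale; pick the sign that gives det > 0.
    if np.linalg.det(P[:, :3]) < 0:
        P = -P
    # Project the 3x3 block onto the nearest rotation and recover the scale.
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()
    return R, t
```

In the scheme described here, the 3D points would be the world-frame feature points of the previous frames of both cameras, and the 2D points would be the first and third two-dimensional feature points.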

It should be noted that the process of determining the three-dimensional feature points of the previous frame color image in the world coordinate system in the present disclosure can be performed during the processing of the current frame or during the processing of the previous frame, and the present disclosure does not impose any limitation on this.

The following is an explanation of the process of determining the three-dimensional feature points of the previous frame of color image captured by the first camera in the world coordinate system.

First, the terminal device can obtain the previous frame of color image captured by the first camera, and extract the feature points of the previous frame of color image captured by the first camera. The process of extracting feature points is the same as the process in step S52, which will not be repeated here.

Next, the terminal device can use the previous frame depth image aligned with the previous frame color image captured by the first camera to perform spatial projection on the feature points of the previous frame color image captured by the first camera to obtain the three-dimensional feature points of the previous frame color image captured by the first camera in the first camera coordinate system. The previous frame depth image can be output by the first camera, or can be obtained by other depth cameras equipped by the terminal device, and the present disclosure does not limit this.

In addition, in order to further improve the positioning accuracy of the present disclosure, the spatial projection process can also be constrained. Specifically, the terminal device can use the previous frame of depth image aligned with the previous frame of color image captured by the first camera to perform spatial projection only on those feature points of the previous frame of color image captured by the first camera whose depth lies within a predetermined depth range, so as to obtain the three-dimensional feature points of the previous frame of color image captured by the first camera in the first camera coordinate system.

The predetermined depth range is determined based on the range of the depth measurement. The value of the predetermined depth range may vary depending on the type and model of the depth camera. The present disclosure does not limit the specific value of the predetermined depth range. For example, feature points with a depth value greater than 0.5m and less than 6m are spatially projected.

Then, the terminal device can transform the three-dimensional feature points in the first camera coordinate system according to the posture of the first camera when it captured the previous frame of color image, so as to obtain the three-dimensional feature points in the world coordinate system of the previous frame of color image captured by the first camera. Refer to Formula 2:
$P_w^{l} = T_{w\_last} \, P_c^{l}$ (Formula 2)

Wherein, $P_w^{l}$ is the three-dimensional feature point of the previous frame of color image captured by the first camera in the world coordinate system, $P_c^{l}$ is the three-dimensional feature point of the previous frame of color image captured by the first camera in the first camera coordinate system, and $T_{w\_last}$ is the position and posture of the first camera when capturing the previous frame of color image.
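The depth-constrained spatial projection and the conversion to the world coordinate system can be sketched together as follows. This is a minimal sketch, assuming integer pixel coordinates, a depth image in metres aligned with the color image, a 3x3 intrinsic matrix `K`, and a 4x4 camera-to-world pose `T_w_last`. The function name `features_to_world` is hypothetical, and the default range of 0.5 m to 6 m is only the example value given in the text.

```python
import numpy as np

def features_to_world(pixels, depth_image, K, T_w_last, d_min=0.5, d_max=6.0):
    """Back-project feature pixels and express them in the world frame.

    pixels:      (N, 2) integer pixel coordinates (u, v) of feature points.
    depth_image: (H, W) depth map in metres, aligned with the color image.
    K:           3x3 intrinsic matrix of the camera.
    T_w_last:    4x4 pose of the camera for the previous frame (camera to world).
    Returns (M, 3) world points and the boolean mask of the features kept.
    """
    pixels = np.asarray(pixels, dtype=int)
    z = np.asarray(depth_image)[pixels[:, 1], pixels[:, 0]].astype(float)
    keep = (z > d_min) & (z < d_max)  # predetermined depth range
    uv1 = np.hstack([pixels[keep], np.ones((int(keep.sum()), 1))])
    # Camera-frame points: z * K^-1 * p for each kept pixel p = (u, v, 1).
    pts_cam = (np.linalg.inv(K) @ uv1.T).T * z[keep, None]
    pts_cam_h = np.hstack([pts_cam, np.ones((len(pts_cam), 1))])
    # World-frame points: T_w_last applied to the camera-frame points.
    pts_world = (T_w_last @ pts_cam_h.T).T[:, :3]
    return pts_world, keep
```

Returning the mask lets the caller keep the 2D feature points and the 3D points index-aligned when building point pairs.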

It should be noted that the position and posture of the first camera when capturing the previous color image can be determined during the processing of the previous image, that is, during the processing of the current frame, the position and posture corresponding to the previous frame is known. The initial position and posture are explained in the process of positioning initialization of the present disclosure.

The following is a description of the process of determining the three-dimensional feature points of the previous frame of color image captured by the second camera in the world coordinate system.

First, the terminal device can obtain the last frame of color image captured by the second camera, and extract feature points of the last frame of color image captured by the second camera. The process of extracting feature points is the same as the process in step S52, which will not be repeated here.

Next, the terminal device can use the previous frame of depth image aligned with the previous frame of color image captured by the second camera to perform spatial projection on the feature points of the previous frame of color image captured by the second camera to obtain the three-dimensional feature points of the previous frame of color image captured by the second camera in the second camera coordinate system. The previous frame of depth image can be output by the second camera, or can be obtained by other depth cameras equipped by the terminal device, and the present disclosure does not limit this.

Similarly, in order to further improve the positioning accuracy of the present disclosure, the spatial projection process can also be constrained. Specifically, the terminal device can use the previous frame of depth image aligned with the previous frame of color image captured by the second camera to perform spatial projection only on those feature points of the previous frame of color image captured by the second camera whose depth lies within a predetermined depth range, so as to obtain the three-dimensional feature points of the previous frame of color image captured by the second camera in the second camera coordinate system.

Subsequently, the terminal device may use the transformation matrix between the first camera coordinate system and the second camera coordinate system to transform the three-dimensional feature points of the previous frame color image captured by the second camera in the second camera coordinate system into the three-dimensional feature points in the first camera coordinate system.

Then, the terminal device can convert the three-dimensional feature points in the converted first camera coordinate system again according to the posture of the first camera when capturing the previous frame of color image, so as to obtain the three-dimensional feature points of the previous frame of color image captured by the second camera in the world coordinate system.

The above process is explained below with reference to Formula 3:
$P_w^{r} = T_{w\_last} \, T_{lr} \, P_c^{r}$ (Formula 3)

Wherein, $P_w^{r}$ is the three-dimensional feature point of the previous frame of color image captured by the second camera in the world coordinate system, $P_c^{r}$ is the three-dimensional feature point of the previous frame of color image captured by the second camera in the second camera coordinate system, $T_{w\_last}$ is the position and posture of the first camera when it captured the previous frame of color image, and $T_{lr}$ is the transformation matrix between the first camera coordinate system and the second camera coordinate system.
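The two-step conversion of the second camera's points (second camera frame to first camera frame, then to the world frame) amounts to chaining two homogeneous transforms. The sketch below assumes both matrices are 4x4 and that `T_lr` maps second-camera coordinates to first-camera coordinates; the function name is hypothetical.

```python
import numpy as np

def second_camera_points_to_world(points_second, T_lr, T_w_last):
    """Convert (N, 3) second-camera points to the world frame."""
    pts = np.asarray(points_second, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    # Chain the transforms: second camera -> first camera -> world.
    T_w_second = T_w_last @ T_lr
    return (T_w_second @ pts_h.T).T[:, :3]
```

Because the cameras are rigidly mounted, `T_lr` is constant, so only `T_w_last` changes from frame to frame.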

Combined with the above point pair matching relationship, FIG. 6 shows a schematic diagram of point pair matching between the first camera and the second camera to achieve PnP pose solution, which involves the matching relationship of 2D-2D feature points of the current frame and the matching relationship of 3D-2D feature points.

In the above process of determining the three-dimensional feature points of the previous color image in the world coordinate system, the position and posture of the first camera when capturing the previous color image is used. The process of determining the initial position and posture of the first camera is described below.

According to some embodiments of the present disclosure, first, the terminal device may obtain an initial frame color image captured by the first camera, and extract feature points of the initial frame color image captured by the first camera. The process of extracting feature points is the same as the process in step S52, and will not be repeated here.

Next, the terminal device can use the initial frame depth image aligned with the initial frame color image captured by the first camera to spatially project the feature points of the initial frame color image captured by the first camera to obtain the three-dimensional feature points of the initial frame color image captured by the first camera in the first camera coordinate system.

Similarly, in order to further improve the positioning accuracy of the present disclosure, the spatial projection process can also be constrained. Specifically, the terminal device can perform spatial projection only on those feature points of the initial frame color image captured by the first camera whose depth lies within a predetermined depth range, so as to obtain the three-dimensional feature points of the initial frame color image captured by the first camera in the first camera coordinate system.

Subsequently, the terminal device can determine the initial positioning result of the first camera in the first camera coordinate system based on the three-dimensional feature points of the initial frame color image captured by the first camera in the first camera coordinate system, the initial rotation matrix and the initial translation vector.

In one embodiment of the present disclosure, the initial rotation matrix may be set to the identity matrix, and the translation vector may be set to [0, 0, 0].

It should be noted that, when the three-dimensional feature points, initial rotation matrix and initial translation vector of the initial frame color image captured by the first camera in the first camera coordinate system are known, only the position and posture of the first camera in the first camera coordinate system is determined at this time. In order to obtain the position and posture applied to the subsequent current frame processing process, the position and posture needs to be transformed to obtain the position and posture of the first camera in the world coordinate system.

Specifically, the terminal device may transform the initial positioning result of the first camera in the first camera coordinate system using the transformation matrix between the first camera coordinate system and the world coordinate system, so as to determine the posture of the first camera when capturing the initial frame color image.

According to some other embodiments of the present disclosure, the process of determining the initial position and posture of the first camera may also be combined with feature data of the second camera, and this process is described below.

On the one hand, the terminal device can determine the three-dimensional feature points of the initial frame color image captured by the first camera in the first camera coordinate system.

On the other hand, the terminal device can obtain the initial frame color image captured by the second camera, and extract feature points of the initial frame color image captured by the second camera. The process of extracting feature points is the same as the process in step S52, which will not be repeated here.

The terminal device can use the initial frame depth image aligned with the initial frame color image captured by the second camera to spatially project the feature points of the initial frame color image captured by the second camera to obtain the three-dimensional feature points of the initial frame color image captured by the second camera in the second camera coordinate system.

Similarly, the spatial projection process can also be constrained. Specifically, the terminal device can use the feature points in the initial frame color image captured by the second camera that are within a predetermined depth range to perform spatial projection to obtain the three-dimensional feature points of the initial frame color image captured by the second camera in the second camera coordinate system.

Next, the terminal device can use the transformation matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera to transform the three-dimensional feature points of the initial frame color image captured by the second camera in the second camera coordinate system into the three-dimensional feature points in the first camera coordinate system.

The converted 3D feature points and the 3D feature points of the initial frame color image captured by the first camera in the first camera coordinate system can be combined to obtain combined 3D feature points. It can be understood that the combined 3D feature points are 3D feature points in the first camera coordinate system.

Subsequently, the terminal device can determine the initial positioning result of the first camera in the first camera coordinate system according to the combined three-dimensional feature points, the initial rotation matrix and the initial translation vector. For example, the initial rotation matrix can be set to the identity matrix and the initial translation vector can be set to [0,0,0].

Then, the terminal device can use the conversion matrix between the first camera coordinate system and the world coordinate system to transform the initial positioning result of the first camera in the first camera coordinate system, so as to determine the position and posture of the first camera when it captures the initial frame color image.

The process of positioning initialization according to the embodiment of the present disclosure will be described below with reference to FIG. 7 .

In step S702, the terminal device may acquire an initial frame color image captured by the first camera, and extract feature points of the initial frame color image captured by the first camera.

In step S704, the terminal device may perform spatial projection in combination with the depth image aligned with the initial frame color image captured by the first camera to obtain the three-dimensional feature points of the initial frame color image captured by the first camera in the first camera coordinate system. As described in the above embodiment, the three-dimensional feature points determined in step S704 may also include the three-dimensional feature points corresponding to the initial frame color image captured by the second camera.

In step S706, the terminal device may determine an initial positioning result of the first camera in the first camera coordinate system according to the three-dimensional feature points, the initial rotation matrix, and the initial translation vector determined in step S704.

In step S708, the terminal device may transform the initial positioning result using the transformation matrix between the first camera coordinate system and the world coordinate system to determine the position and posture of the first camera when capturing the initial frame color image, thereby completing the positioning initialization.
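The last two steps of the initialization (S706 and S708) can be sketched as composing the initial pose with the alignment transform. This is a minimal sketch, assuming poses are stored as 4x4 homogeneous matrices and that `T_wc` is the camera-to-world alignment matrix obtained beforehand; the function name `initialize_pose` is hypothetical.

```python
import numpy as np

def initialize_pose(T_wc, R_init=None, t_init=None):
    """Return the 4x4 world-frame pose of the first camera for the initial frame.

    T_wc:   4x4 transform from the first camera frame to the world frame.
    R_init: initial 3x3 rotation (identity by default, as in the text).
    t_init: initial translation (zero vector by default, as in the text).
    """
    R_init = np.eye(3) if R_init is None else np.asarray(R_init, dtype=float)
    t_init = np.zeros(3) if t_init is None else np.asarray(t_init, dtype=float)
    # Initial positioning result in the first camera coordinate system.
    pose_cam = np.eye(4)
    pose_cam[:3, :3] = R_init
    pose_cam[:3, 3] = t_init
    # Convert it to the world coordinate system.
    return T_wc @ pose_cam
```

With the default identity rotation and zero translation, the initial world-frame pose is simply `T_wc` itself.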

In the above processing, the transformation matrix between the first camera coordinate system and the world coordinate system is used. To determine this transformation matrix in advance, the embodiment of the present disclosure provides a coordinate system alignment solution. Specifically, the coordinate system alignment is achieved in combination with depth information. For the sake of distinction, in the following embodiments, the depth image used in the coordinate system alignment process is referred to as the reference depth image.

First, the terminal device can obtain a reference depth image output by the first camera.

Next, when it is determined that there is a designated plane in the scene in combination with the reference depth image output by the first camera, the terminal device can determine the transformation matrix between the first camera coordinate system and the world coordinate system according to the normal vector and gravity vector of the designated plane.

Among them, the gravity vector can be $N_g = (0, 0, 1)$, in which case the designated plane is usually the ground plane to match the scenario where the terminal device is, for example, a robot dog. However, it is understandable that the designated plane can also be a plane manually designated in a specific scenario, such as a wall, a desktop, etc., and the present disclosure does not limit this.

If the normal vector of the specified plane is recorded as $n_c$, then after $n_c$ is rotated by $R_{wc}$ it coincides with $N_g$, and the alignment of the first camera coordinate system and the world coordinate system is achieved. Here $R_{wc}$ is the transformation matrix between the first camera coordinate system and the world coordinate system, and the rotation axis $\omega$ of $R_{wc}$ can be obtained by the cross product of $N_g$ and $n_c$, as shown in Formula 4:
$\omega = N_g \times n_c$ (Formula 4)

The rotation angle $\theta$ of $R_{wc}$ can be obtained from the dot product of $N_g$ and $n_c$, as shown in Formula 5:
$\theta = \arccos\left(\frac{N_g \cdot n_c}{\lVert N_g \rVert \, \lVert n_c \rVert}\right)$ (Formula 5)

The rotation axis $\omega$ and the rotation angle $\theta$ constitute the rotation vector between the first camera coordinate system and the world coordinate system. According to the Rodrigues formula, the terminal device can calculate the transformation matrix $R_{wc}$ between the first camera coordinate system and the world coordinate system. Thus, the thread of coordinate system alignment ends.
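The axis-angle construction and the Rodrigues formula can be sketched as follows. One assumption must be stated clearly: the sketch takes the axis as the cross product of the plane normal with the gravity vector (normal first), so that a right-handed rotation by θ carries the plane normal onto the gravity vector. This is the opposite sign to Formula 4 as printed; using the printed axis together with the angle −θ yields the same matrix. The function name `rotation_aligning` and the handling of the parallel and antiparallel cases are additions for this example.

```python
import numpy as np

def rotation_aligning(n_c, N_g=(0.0, 0.0, 1.0)):
    """Rotation matrix R such that R @ n_c points along N_g (Rodrigues formula)."""
    n_c = np.asarray(n_c, dtype=float) / np.linalg.norm(n_c)
    N_g = np.asarray(N_g, dtype=float) / np.linalg.norm(N_g)
    axis = np.cross(n_c, N_g)  # assumed sign: rotates n_c towards N_g
    sin_theta = np.linalg.norm(axis)
    cos_theta = float(np.dot(n_c, N_g))
    if sin_theta < 1e-12:
        if cos_theta > 0:
            return np.eye(3)  # already aligned
        # Opposite vectors: rotate by pi about any axis perpendicular to n_c.
        helper = np.array([1.0, 0.0, 0.0]) if abs(n_c[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
        axis = np.cross(n_c, helper)
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    k = axis / sin_theta  # unit rotation axis
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))  # angle from the dot product
    # Skew-symmetric cross-product matrix of k.
    K = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    # Rodrigues formula: R = I + sin(theta) K + (1 - cos(theta)) K^2.
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
```

A quick sanity check is that the returned matrix is orthonormal with determinant 1 and maps the measured ground normal onto (0, 0, 1).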

In the above processing, if the designated plane does not exist in the scene, the terminal device may return to the step of acquiring the reference depth image, reacquire the reference depth image, and perform a process of determining whether the designated plane exists.

The process of determining the specified plane is described below.

First, the terminal device can determine the point cloud corresponding to the first camera in combination with the reference depth image output by the first camera, which is recorded as the reference point cloud.

According to some embodiments of the present disclosure, the terminal device determines the three-dimensional space point of each pixel on the reference depth image output by the first camera according to the pixel, the depth value of the pixel and the camera internal parameter of the first camera. Formula 6 gives the method of determining the three-dimensional space point here:
$P = z \cdot K^{-1} \cdot p$ (Formula 6)

Among them, $P$ represents the three-dimensional space point projected into the space, $z$ represents the depth value of the pixel point, $K^{-1}$ represents the inverse of the camera intrinsic parameter matrix, and $p$ represents the coordinate position of the pixel point in homogeneous form.

In these embodiments, a reference point cloud corresponding to the first camera may be constructed from the three-dimensional space points obtained through this process.
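Applied to every pixel of a depth image, Formula 6 can be sketched as below. The sketch assumes a pinhole model with a 3x3 intrinsic matrix `K` and treats a depth value of 0 as a missing measurement, which is a common sensor convention but an assumption here. The function name `depth_to_point_cloud` is hypothetical.

```python
import numpy as np

def depth_to_point_cloud(depth, K):
    """Back-project every valid pixel of a depth image: P = z * K^-1 * p."""
    depth = np.asarray(depth, dtype=float)
    h, w = depth.shape
    # Pixel grid in row-major order: u is the column index, v is the row index.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])  # 3 x (h*w)
    z = depth.ravel()
    valid = z > 0  # assumption: 0 marks a missing depth value
    rays = np.linalg.inv(K) @ pixels[:, valid]
    return (rays * z[valid]).T  # (M, 3) points in the camera frame
```

The result is an (M, 3) array that can serve directly as the reference point cloud of one camera.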

According to some other embodiments of the present disclosure, on the one hand, the terminal device determines, for each pixel point on the reference depth image output by the first camera, the three-dimensional spatial point of each pixel point on the reference depth image according to the pixel point, the depth value of the pixel point and the camera intrinsic parameters of the first camera.

On the other hand, the terminal device can obtain the reference depth image output by the second camera, and determine the three-dimensional space point of each pixel on the reference depth image output by the second camera in combination with the above formula 6.

The terminal device can transform the three-dimensional space point of each pixel on the reference depth image output by the second camera according to the transformation matrix between the first camera coordinate system and the second camera coordinate system to obtain the transformed three-dimensional space point.

Therefore, the three-dimensional space point of each pixel on the reference depth image output by the first camera is combined with the three-dimensional space point after the above conversion to construct a reference point cloud corresponding to the first camera. Refer to formula 7:
$PC_{mixture} = PC_{left} + T_{lr} \cdot PC_{right}$ (Formula 7)

where PC_{mixture} is the determined reference point cloud, PC_{right} is the set of three-dimensional space points of the pixels on the reference depth image output by the second camera, PC_{left} is the set of three-dimensional space points of the pixels on the reference depth image output by the first camera, T_{lr} is the transformation matrix between the first camera coordinate system and the second camera coordinate system, and the plus sign denotes merging the two point sets.

In these embodiments, the construction of the reference point cloud incorporates information of the depth image output by the second camera, thereby making the spatial feature points more comprehensive and improving the accuracy of the algorithm.
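
A minimal sketch of the merge in Formula 7 is given below. It assumes that T_lr is a 4x4 homogeneous matrix that maps points from the second camera coordinate system into the first camera coordinate system; the helper names `transform_point` and `merge_point_clouds` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Matrix4 = Sequence[Sequence[float]]


def transform_point(T: Matrix4, p: Point3D) -> Point3D:
    """Apply a 4x4 homogeneous transform to a 3D point."""
    x, y, z = p
    return tuple(T[i][0] * x + T[i][1] * y + T[i][2] * z + T[i][3]
                 for i in range(3))


def merge_point_clouds(pc_left: Sequence[Point3D],
                       pc_right: Sequence[Point3D],
                       T_lr: Matrix4) -> List[Point3D]:
    """PC_mixture = PC_left + T_lr * PC_right, where '+' concatenates point sets."""
    return list(pc_left) + [transform_point(T_lr, p) for p in pc_right]


# Example extrinsics (made up): second camera offset 0.1 m along x.
T_lr = [[1.0, 0.0, 0.0, 0.1],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
pc_mixture = merge_point_clouds([(0.0, 0.0, 1.0)], [(0.0, 0.0, 2.0)], T_lr)
```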

After determining the reference point cloud corresponding to the first camera, the terminal device can extract the plane information of the reference point cloud. The present disclosure does not limit the plane extraction method, and can adopt the RANSAC fitting method, the normal vector region growing method, the hierarchical clustering method, etc., as long as the plane information in the scene can be extracted. Some embodiments of the present disclosure adopt the plane extraction algorithm PEAC based on hierarchical clustering. Referring to Figure 8, two planes can be extracted using this algorithm. Figure 8 is only an example. All planes in the scene can be extracted using the above algorithm.

It is understood that the extracted plane information includes, but is not limited to, the plane ID, the plane normal vector, and the distance of the plane from the camera.
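
For illustration, the sketch below extracts a single plane with RANSAC fitting, one of the alternatives named above; it is not the PEAC algorithm adopted in some embodiments. The function name, the iteration count, the inlier tolerance, and the sign convention (the normal is oriented so that the returned distance from the camera origin is non-negative) are all assumptions made for this example.

```python
import math
import random


def fit_plane_ransac(points, iterations=200, tolerance=0.01, seed=0):
    """Return (normal, distance, inlier_count) for the plane n . x = d
    supported by the most points, or None if no plane can be fitted."""
    if len(points) < 3:
        return None
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        a, b, c = rng.sample(points, 3)
        ab = [b[i] - a[i] for i in range(3)]
        ac = [c[i] - a[i] for i in range(3)]
        # Plane normal from the cross product of two in-plane vectors.
        n = [ab[1] * ac[2] - ab[2] * ac[1],
             ab[2] * ac[0] - ab[0] * ac[2],
             ab[0] * ac[1] - ab[1] * ac[0]]
        norm = math.sqrt(sum(v * v for v in n))
        if norm < 1e-9:                      # collinear sample, skip it
            continue
        n = [v / norm for v in n]
        d = sum(n[i] * a[i] for i in range(3))
        if d < 0:                            # keep the distance non-negative
            n, d = [-v for v in n], -d
        inliers = sum(1 for p in points
                      if abs(sum(n[i] * p[i] for i in range(3)) - d) <= tolerance)
        if best is None or inliers > best[2]:
            best = (n, d, inliers)
    return best


# Synthetic cloud: 25 points on the plane y = 0.3 plus two outliers.
pts = [(float(x), 0.3, float(z)) for x in range(5) for z in range(1, 6)]
pts += [(0.0, -1.0, 2.0), (1.0, -2.0, 3.0)]
normal, distance, count = fit_plane_ransac(pts)
```

Extracting every plane in a scene would require repeating the fit after removing the inliers of each detected plane, which this sketch does not do.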

After extracting the planes based on the reference point cloud, the terminal device may select the designated plane according to the plane information of the reference point cloud. Specifically, the terminal device may select the designated plane according to the distance information of each plane from the first camera included in the plane information of the reference point cloud.

In a case where the distance information includes a distance within a predetermined distance range, the terminal device may determine a candidate plane corresponding to the distance, and in this case, the number of the determined candidate planes is one or more.

When the number of candidate planes is one, the terminal device may determine the candidate plane as the designated plane.

In the case where there are multiple candidate planes, the terminal device may determine the candidate plane whose distance from the first camera is closest to a distance threshold as the designated plane, wherein the distance threshold is within the above-mentioned predetermined distance range.

FIG. 9 is a schematic diagram showing the screening of the ground plane. Compared with the result of plane detection, planes such as the ceiling are eliminated through the above distance-based screening process.

Take the case where the terminal device is a robot dog as an example. The terminal device is equipped with a first camera and a second camera, and the mounting positions of the two cameras are fixed. When implementing the solution, the robot dog is controlled to move for a short period of time, and it moves only on the ground plane. Under this prior condition, the position of the ground plane in the first camera coordinate system is basically fixed, and the distance from the ground plane to the camera is approximately the height of the robot dog, about 0.3 m. Therefore, the above-mentioned predetermined distance range can be set to 0.25 m to 0.35 m, and a plane whose distance falls within this range is taken as the ground plane. If multiple candidate planes are screened out, the plane whose distance is closest to 0.3 m is used as the ground plane.

It should be understood that if the ground plane is not detected during this process, the terminal device is controlled to continuously repeat the above-mentioned process of determining the plane using the depth image and plane screening until the terminal device detects the ground plane.
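
The distance-based screening described above can be sketched as follows, using the robot dog values (range 0.25 m to 0.35 m, distance threshold 0.3 m) as defaults. The function name and the representation of plane information as a mapping from plane ID to distance are assumptions for this example.

```python
from typing import Dict, Optional


def select_designated_plane(distances: Dict[int, float],
                            lower: float = 0.25,
                            upper: float = 0.35,
                            threshold: float = 0.30) -> Optional[int]:
    """Return the ID of the designated plane, or None if no plane qualifies.

    distances maps a plane ID to the distance (in metres) of that plane
    from the first camera.
    """
    candidates = {pid: d for pid, d in distances.items() if lower <= d <= upper}
    if not candidates:
        return None  # caller repeats plane detection on a new depth image
    # One candidate: it is returned directly. Several candidates: the one
    # whose distance is closest to the threshold is returned.
    return min(candidates, key=lambda pid: abs(candidates[pid] - threshold))


ground_id = select_designated_plane({0: 2.40, 1: 0.27, 2: 0.31})
```

A return value of None corresponds to the case in which the ground plane is not detected and the detection and screening process is repeated.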

The coordinate system alignment process of the embodiment of the present disclosure is described below with reference to FIG. 10 .

In step S1002, the terminal device obtains a reference depth image output by the first camera, and back-projects the reference depth image to obtain a three-dimensional space point in space.

In step S1004, the terminal device obtains a reference depth image output by the second camera, and back-projects the reference depth image to obtain a three-dimensional space point in space.

In step S1006, the terminal device converts the three-dimensional space point obtained in step S1004 to a three-dimensional space point in the first camera coordinate system.

In step S1008, the terminal device merges the three-dimensional space point obtained in step S1002 with the three-dimensional space point obtained in step S1006 to obtain a reference point cloud corresponding to the first camera.

In step S1010, the terminal device may extract plane information based on the reference point cloud.

In step S1012, the terminal device may screen the extracted planes to determine the ground plane.

In step S1014, the terminal device may determine a transformation matrix between the first camera coordinate system and the world coordinate system using the normal vector of the ground plane and the gravity vector to complete the alignment of the first camera coordinate system and the world coordinate system.

In addition, since the relationship between the first camera coordinate system and the second camera coordinate system has been determined through calibration, the transformation matrix between the second camera coordinate system and the world coordinate system can also be obtained to achieve alignment of the first camera coordinate system, the second camera coordinate system, and the world coordinate system. Therefore, the coordinate system alignment result can be applied to the above-mentioned posture determination process of the present disclosure.
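
One way to obtain a rotation from the ground plane normal and the gravity vector is the Rodrigues construction sketched below. The present disclosure does not state the sign conventions or how the translation part is chosen, so the sketch assumes that the normal is oriented from the camera toward the ground, that gravity points along negative z in the world coordinate system, and that the translation is left at zero. All function names are hypothetical.

```python
import math


def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def rotation_aligning(a, b):
    """Return a 3x3 rotation R such that R * unit(a) equals unit(b)."""
    a, b = unit(a), unit(b)
    v = [a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0]]          # rotation axis scaled by sin(angle)
    c = sum(x * y for x, y in zip(a, b))     # cos(angle)
    if c < -1.0 + 1e-9:
        raise ValueError("vectors are opposite; the rotation axis is ambiguous")
    k = 1.0 / (1.0 + c)
    vx = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    # R = I + [v]x + [v]x^2 / (1 + c)
    return [[(1.0 if i == j else 0.0) + vx[i][j]
             + k * sum(vx[i][m] * vx[m][j] for m in range(3))
             for j in range(3)] for i in range(3)]


def to_homogeneous(R):
    """Embed a rotation in a 4x4 transform with zero translation (assumption)."""
    return [R[0] + [0.0], R[1] + [0.0], R[2] + [0.0], [0.0, 0.0, 0.0, 1.0]]


def compose(A, B):
    """4x4 matrix product A * B."""
    return [[sum(A[i][m] * B[m][j] for m in range(4)) for j in range(4)]
            for i in range(4)]


ground_normal_cam1 = [0.0, 1.0, 0.0]   # example value in the first camera frame
gravity_world = [0.0, 0.0, -1.0]       # assumed world convention
T_world_cam1 = to_homogeneous(rotation_aligning(ground_normal_cam1, gravity_world))

# Calibrated extrinsics (made up): second camera frame -> first camera frame.
T_lr = [[1.0, 0.0, 0.0, 0.1],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
T_world_cam2 = compose(T_world_cam1, T_lr)
```

Aligning a single pair of vectors fixes the rotation only up to a rotation about the gravity axis; the remaining heading is arbitrary in this sketch.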

It should be noted that although the steps of the method in the present disclosure are described in a specific order in the drawings, this does not require or imply that the steps must be performed in this particular order, or that all the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps, etc.

Furthermore, this exemplary embodiment also provides a posture determination device, which is configured in a terminal device, and the terminal device is also configured with a first camera and at least one second camera.

FIG. 11 schematically shows a block diagram of a posture determination device according to an exemplary embodiment of the present disclosure. Referring to FIG. 11, the posture determination device 11 according to an exemplary embodiment of the present disclosure may include a first feature point determination module 111, a second feature point determination module 113, a feature point conversion module 115, and a posture determination module 117.

Specifically, the first feature point determination module 111 can be used to obtain the current frame color image captured by the first camera, and determine the first two-dimensional feature points on the current frame color image captured by the first camera that match the previous frame color image captured by the first camera; the second feature point determination module 113 can be used to obtain the current frame color image captured by the second camera, and determine the second two-dimensional feature points on the current frame color image captured by the second camera that match the previous frame color image captured by the second camera; the feature point conversion module 115 can be used to use the conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera to convert the second two-dimensional feature points into third two-dimensional feature points in the first camera coordinate system; the posture determination module 117 can be used to determine the posture of the first camera when capturing the current frame color image based on the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points of the previous frame color image captured by the first camera in the world coordinate system, and the three-dimensional feature points of the previous frame color image captured by the second camera in the world coordinate system.

According to an exemplary embodiment of the present disclosure, the first feature point determination module 111 can be configured to perform: extracting feature points of the current frame color image captured by the first camera; performing optical flow tracking using the feature points of the current frame color image captured by the first camera and the feature points of the previous frame color image captured by the first camera to determine the first two-dimensional feature points.

According to an exemplary embodiment of the present disclosure, the feature point conversion module 115 can be configured to perform: obtaining the conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera and the depth information of the second two-dimensional feature point; determining the third two-dimensional feature point based on the conversion matrix, the depth information of the second two-dimensional feature point and the second two-dimensional feature point.

According to an exemplary embodiment of the present disclosure, the feature point conversion module 115 may be configured to perform: multiplying the conversion matrix, the depth information of the second two-dimensional feature point, and the second two-dimensional feature point, and normalizing the multiplication result to determine a third two-dimensional feature point.
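
A possible reading of this multiply-and-normalize step is sketched below. It assumes that the second two-dimensional feature point is given in normalized image coordinates (x, y), that the conversion matrix T_lr is a 4x4 homogeneous matrix from the second camera coordinate system to the first, and that normalization means dividing by the resulting z component. These assumptions and the function name are not taken from the present disclosure.

```python
def convert_feature_point(T_lr, depth, pt):
    """Map a 2D feature point of the second camera into the first camera frame."""
    x, y = pt
    # Scale the homogeneous 2D point by its depth to get a 3D point in the
    # second camera coordinate system (homogeneous 4-vector).
    p2 = (depth * x, depth * y, depth, 1.0)
    # Multiply by the conversion matrix to express it in the first camera frame.
    p1 = [sum(T_lr[i][j] * p2[j] for j in range(4)) for i in range(3)]
    # Normalize by depth to obtain the third two-dimensional feature point.
    return (p1[0] / p1[2], p1[1] / p1[2])


T_lr = [[1.0, 0.0, 0.0, 0.1],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
third_pt = convert_feature_point(T_lr, depth=2.0, pt=(0.5, 0.25))
```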

According to an exemplary embodiment of the present disclosure, the first two-dimensional feature point and the third two-dimensional feature point constitute two-dimensional coordinate information, and the three-dimensional feature points of the previous frame color image captured by the first camera in the world coordinate system and the three-dimensional feature points of the previous frame color image captured by the second camera in the world coordinate system constitute three-dimensional coordinate information. In this case, the posture determination module 117 can be configured to perform: associating the two-dimensional coordinate information with the three-dimensional coordinate information to obtain point pair information; solving the perspective n-point problem using the point pair information, and determining the posture of the first camera when capturing the current frame color image in combination with the solution result.
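
The association step can be sketched as below, together with the mean reprojection error that a perspective-n-point solver drives toward zero. The solver itself is not reproduced; in practice a library routine such as OpenCV's `solvePnP` could consume the resulting point pairs. Keying features by an integer ID, using normalized image coordinates, and the function names are assumptions for this example.

```python
import math


def associate(points_2d, points_3d):
    """Pair 2D observations with the 3D world points that share a feature ID."""
    return [(points_2d[fid], points_3d[fid])
            for fid in sorted(points_2d) if fid in points_3d]


def reprojection_error(R, t, pairs):
    """Mean reprojection error for a world-to-camera transform (R, t).

    The camera pose in the world coordinate system is the inverse of (R, t).
    """
    total = 0.0
    for (x, y), X in pairs:
        pc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        total += math.hypot(pc[0] / pc[2] - x, pc[1] / pc[2] - y)
    return total / len(pairs)


# First and third 2D feature points, keyed by a hypothetical feature ID.
points_2d = {1: (0.25, 0.5), 2: (0.0, 0.0), 7: (0.1, 0.1)}
# 3D feature points of the previous frames in the world coordinate system.
points_3d = {1: (1.0, 2.0, 4.0), 2: (0.0, 0.0, 3.0)}

pairs = associate(points_2d, points_3d)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
err_true = reprojection_error(identity, [0.0, 0.0, 0.0], pairs)
err_shifted = reprojection_error(identity, [0.4, 0.0, 0.0], pairs)
```

Feature 7 has no three-dimensional counterpart and is therefore dropped from the point pair information.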

According to an exemplary embodiment of the present disclosure, referring to FIG. 12, compared with the posture determination device 11, the posture determination device 12 may further include a third feature point determination module 121.

Specifically, the third feature point determination module 121 can be configured to execute: obtaining the previous frame color image captured by the first camera, and extracting the feature points of the previous frame color image captured by the first camera; using the previous frame depth image aligned with the previous frame color image captured by the first camera, spatially projecting the feature points of the previous frame color image captured by the first camera to obtain the three-dimensional feature points of the previous frame color image captured by the first camera in the first camera coordinate system; according to the posture of the first camera when capturing the previous frame color image, transforming the three-dimensional feature points in the first camera coordinate system to obtain the three-dimensional feature points of the previous frame color image captured by the first camera in the world coordinate system.

According to an exemplary embodiment of the present disclosure, the third feature point determination module 121 can be configured to execute: utilizing a previous frame depth image aligned with a previous frame color image captured by the first camera, and spatially projecting feature points within a predetermined depth range among the feature points of the previous frame color image captured by the first camera to obtain three-dimensional feature points of the previous frame color image captured by the first camera in the first camera coordinate system; wherein the predetermined depth range is determined based on the range of the depth measurement.

According to an exemplary embodiment of the present disclosure, the third feature point determination module 121 can also be configured to execute: obtaining the previous frame of color image captured by the second camera, and extracting the feature points of the previous frame of color image captured by the second camera; using the previous frame of depth image aligned with the previous frame of color image captured by the second camera, spatially projecting the feature points of the previous frame of color image captured by the second camera to obtain the three-dimensional feature points of the previous frame of color image captured by the second camera in the second camera coordinate system; using the transformation matrix between the first camera coordinate system and the second camera coordinate system to convert the three-dimensional feature points of the previous frame of color image captured by the second camera in the second camera coordinate system into the three-dimensional feature points in the first camera coordinate system; according to the posture of the first camera when capturing the previous frame of color image, converting the three-dimensional feature points in the first camera coordinate system to obtain the three-dimensional feature points of the previous frame of color image captured by the second camera in the world coordinate system.

According to an exemplary embodiment of the present disclosure, the third feature point determination module 121 can also be configured to execute: utilizing a previous frame depth image aligned with a previous frame color image captured by the second camera, spatially projecting feature points within a predetermined depth range among the feature points of the previous frame color image captured by the second camera, so as to obtain three-dimensional feature points of the previous frame color image captured by the second camera in the second camera coordinate system; wherein the predetermined depth range is determined based on the range of the depth measurement.
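
The chain performed by the third feature point determination module can be sketched as follows: discard features outside the depth range, spatially project the rest, optionally move them into the first camera coordinate system, and then transform them into the world coordinate system. The sketch assumes normalized image coordinates, 4x4 homogeneous matrices, and a placeholder depth range of 0.3 m to 6.0 m; none of these values or names come from the present disclosure.

```python
def apply(T, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    return tuple(T[i][0] * p[0] + T[i][1] * p[1] + T[i][2] * p[2] + T[i][3]
                 for i in range(3))


def lift_to_world(features, depths, T_world_cam1, T_lr=None,
                  depth_range=(0.3, 6.0)):
    """Return world-frame 3D feature points for features with usable depth.

    Pass T_lr (second camera frame -> first camera frame) for features that
    were observed by the second camera; leave it as None for the first camera.
    """
    lo, hi = depth_range
    world_points = []
    for (x, y), z in zip(features, depths):
        if not (lo <= z <= hi):              # outside the predetermined range
            continue
        p = (z * x, z * y, z)                # spatial projection in the camera frame
        if T_lr is not None:
            p = apply(T_lr, p)               # into the first camera frame
        world_points.append(apply(T_world_cam1, p))
    return world_points


T_world_cam1 = [[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 1.0],
                [0.0, 0.0, 0.0, 1.0]]
T_lr = [[1.0, 0.0, 0.0, 0.1],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
world_pts = lift_to_world([(0.5, 0.0), (0.0, 0.0)], [2.0, 10.0],
                          T_world_cam1, T_lr)
```

The second feature is dropped because its depth of 10.0 m lies outside the placeholder range.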

According to an exemplary embodiment of the present disclosure, referring to FIG. 13, compared with the posture determination device 11, the posture determination device 13 may further include a positioning initialization module 131.

Specifically, the positioning initialization module 131 can be configured to execute: obtaining an initial frame color image captured by the first camera, and extracting feature points of the initial frame color image captured by the first camera; using an initial frame depth image aligned with the initial frame color image captured by the first camera, spatially projecting the feature points of the initial frame color image captured by the first camera to obtain three-dimensional feature points of the initial frame color image captured by the first camera in the first camera coordinate system; determining an initial positioning result of the first camera in the first camera coordinate system based on the three-dimensional feature points, initial rotation matrix and initial translation vector of the initial frame color image captured by the first camera in the first camera coordinate system; using a transformation matrix between the first camera coordinate system and the world coordinate system, transforming the initial positioning result of the first camera in the first camera coordinate system to determine the posture of the first camera when the initial frame color image is captured.

According to an exemplary embodiment of the present disclosure, referring to FIG. 14 , compared with the posture determination device 13 , the posture determination device 14 may further include a transformation matrix determination module 141 .

Specifically, the transformation matrix determination module 141 may be configured to execute: obtaining a reference depth image output by the first camera; determining a designated plane in combination with the reference depth image output by the first camera; and determining the transformation matrix between the first camera coordinate system and the world coordinate system according to the normal vector of the designated plane and the gravity vector.

According to an exemplary embodiment of the present disclosure, the transformation matrix determination module 141 can be configured to perform: determining a reference point cloud corresponding to the first camera in combination with a reference depth image output by the first camera; extracting plane information of the reference point cloud; and selecting a designated plane according to the plane information of the reference point cloud.

According to an exemplary embodiment of the present disclosure, the process of determining the reference point cloud by the transformation matrix determination module 141 can be configured to perform: for each pixel point on the reference depth image output by the first camera, determine the three-dimensional space point of the pixel point according to the pixel point, the depth value of the pixel point and the camera intrinsic parameters of the first camera; and construct a reference point cloud corresponding to the first camera in combination with the three-dimensional space point of each pixel point on the reference depth image output by the first camera.

According to an exemplary embodiment of the present disclosure, the process of determining the reference point cloud by the transformation matrix determination module 141 can also be configured to execute: obtaining a reference depth image output by the second camera; determining the three-dimensional space point of each pixel point on the reference depth image output by the second camera; transforming the three-dimensional space point of each pixel point on the reference depth image output by the second camera according to the transformation matrix between the first camera coordinate system and the second camera coordinate system to obtain the transformed three-dimensional space point; merging the three-dimensional space point of each pixel point on the reference depth image output by the first camera with the transformed three-dimensional space point to construct a reference point cloud corresponding to the first camera.

According to an exemplary embodiment of the present disclosure, the process of selecting the designated plane by the transformation matrix determination module 141 may be configured to perform: selecting the designated plane according to the distance information of the plane from the first camera included in the plane information of the reference point cloud.

According to an exemplary embodiment of the present disclosure, the process of the transformation matrix determination module 141 screening the designated plane can be configured to perform: when the distance information contains a distance within a predetermined distance range, determining a candidate plane corresponding to the distance in the distance information within the predetermined distance range; when the number of candidate planes is one, determining the candidate plane as the designated plane; when the number of candidate planes is multiple, determining the candidate plane whose distance from the first camera is closest to a distance threshold as the designated plane; wherein the distance threshold is within the predetermined distance range.

According to an exemplary embodiment of the present disclosure, the designated plane is a ground plane.

Since the various functional modules of the posture determination device of the embodiment of the present disclosure are the same as those in the above-mentioned method implementation, they will not be described again here.

FIG. 15 shows a schematic diagram of an electronic device suitable for implementing an exemplary embodiment of the present disclosure. The terminal device of the exemplary embodiment of the present disclosure may be configured as shown in FIG. 15. It should be noted that the electronic device shown in FIG. 15 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

The electronic device of the present disclosure includes at least a processor and a memory, and the memory is used to store one or more programs. When the one or more programs are executed by the processor, the processor can implement the posture determination method of the exemplary embodiment of the present disclosure.

Specifically, as shown in FIG. 15, the electronic device 150 at least includes: a processor 1510, an internal memory 1521, an external memory interface 1522, a Universal Serial Bus (USB) interface 1530, a charging management module 1540, a power management module 1541, a battery 1542, an antenna, a wireless communication module 1550, an audio module 1560, a display screen 1570, a sensor module 1580, a camera module 1590, etc. The sensor module 1580 may include a depth sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc.

It is to be understood that the structure illustrated in the embodiment of the present disclosure does not constitute a specific limitation on the electronic device 150. In other embodiments of the present disclosure, the electronic device 150 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 1510 may include one or more processing units, for example, the processor 1510 may include an application processor (AP), a modem processor, a graphics processor (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor and/or a neural network processor (NPU). Different processing units may be independent devices or integrated in one or more processors. In addition, a memory may be provided in the processor 1510 for storing instructions and data.

The electronic device 150 can implement the shooting function through the ISP, the camera module 1590, the video codec, the GPU, the display screen 1570 and the application processor. In some embodiments, the electronic device 150 may include at least two camera modules 1590. When implementing the disclosed solution, one camera module is determined as the reference camera, and the feature data collected by the other camera modules is transferred to the coordinate system of the reference camera for processing. For example, the electronic device 150 is configured with two Realsense D455 cameras.

The internal memory 1521 can be used to store computer executable program codes, which include instructions. The internal memory 1521 can include a program storage area and a data storage area. The external memory interface 1522 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 150.

The present disclosure also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments; or may exist independently without being assembled into the electronic device.

Computer-readable storage media may be, for example, but not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses or devices, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memories (RAM), read-only memories (ROM), erasable programmable read-only memories (EPROM or flash memory), optical fibers, portable compact disk read-only memories (CD-ROMs), optical storage devices, magnetic storage devices, or any suitable combination thereof. In the present disclosure, computer-readable storage media may be any tangible medium containing or storing a program that may be used by or in conjunction with an instruction execution system, apparatus, or device.

Computer-readable storage media can send, propagate or transmit programs for use by or in conjunction with an instruction execution system, apparatus or device. The program code contained on the computer-readable storage medium can be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, RF, etc., or any suitable combination of the above.

The computer-readable storage medium carries one or more programs. When the one or more programs are executed by an electronic device, the electronic device implements the method described in the embodiments of the present disclosure.

The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions and operations of the systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each box in the flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations, the functions noted in the boxes may occur in a different order than that noted in the figures. For example, two boxes shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the opposite order, depending on the functions involved. It should also be noted that each box in a block diagram or flowchart, and combinations of boxes in a block diagram or flowchart, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.

The units involved in the embodiments described in the present disclosure may be implemented by software or hardware, and the units described may also be arranged in a processor. The names of these units do not constitute limitations on the units themselves in some cases.

Through the description of the above implementation, it is easy for those skilled in the art to understand that the example implementation described here can be implemented by software, or by software combined with necessary hardware. Therefore, the technical solution according to the implementation of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) or on a network, including several instructions to enable a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the implementation of the present disclosure.

In addition, the above-mentioned figures are only schematic illustrations of the processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It is easy to understand that the processes shown in the above-mentioned figures do not indicate or limit the time sequence of these processes. In addition, it is also easy to understand that these processes can be performed synchronously or asynchronously, for example, in multiple modules.

It should be noted that, although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to the embodiments of the present disclosure, the features and functions of two or more modules or units described above can be embodied in one module or unit. Conversely, the features and functions of one module or unit described above can be further divided into multiple modules or units to be embodied.

Those skilled in the art will readily appreciate other embodiments of the present disclosure after considering the specification and practicing what is disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common knowledge or customary technical means in the art that are not disclosed in the present disclosure. The specification and embodiments are to be considered as exemplary only, and the true scope and spirit of the present disclosure are indicated by the claims.

It should be understood that the present disclosure is not limited to the exact structures that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

A method for determining a posture, wherein the method is applied to a terminal device configured with a first camera and at least one second camera, and the method comprises:

acquiring a current frame color image captured by the first camera, and determining first two-dimensional feature points on the current frame color image captured by the first camera that match a previous frame color image captured by the first camera;

acquiring a current frame color image captured by the second camera, and determining second two-dimensional feature points on the current frame color image captured by the second camera that match a previous frame color image captured by the second camera;

using a conversion matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera, converting the second two-dimensional feature points into third two-dimensional feature points in the first camera coordinate system; and

determining the posture of the first camera when capturing the current frame color image based on the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points of the previous frame color image captured by the first camera in the world coordinate system, and the three-dimensional feature points of the previous frame color image captured by the second camera in the world coordinate system.
The method for determining a posture according to claim 1, wherein determining the first two-dimensional feature points on the current frame color image captured by the first camera that match the previous frame color image captured by the first camera comprises:

extracting feature points of the current frame color image captured by the first camera; and

performing optical flow tracking using the feature points of the current frame color image captured by the first camera and feature points of the previous frame color image captured by the first camera to determine the first two-dimensional feature points.
The method for determining a posture according to claim 1, wherein converting the second two-dimensional feature points to third two-dimensional feature points in the first camera coordinate system using the conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera comprises:

Acquire the conversion matrix between the first camera coordinate system of the first camera and the second camera coordinate system of the second camera, and depth information of the second two-dimensional feature point;

The third two-dimensional feature point is determined according to the conversion matrix, the depth information of the second two-dimensional feature point, and the second two-dimensional feature point.
The method for determining a posture according to claim 3, wherein determining the third two-dimensional feature point according to the conversion matrix, the depth information of the second two-dimensional feature point, and the second two-dimensional feature point comprises:

The conversion matrix, the depth information of the second two-dimensional feature point, and the second two-dimensional feature point are multiplied, and a result of the multiplication is normalized to determine the third two-dimensional feature point.
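One possible reading of the multiply-then-normalize step is sketched below. It assumes the second two-dimensional feature point is given in normalized homogeneous form (x, y, 1), so that scaling by the depth yields a three-dimensional point in the second camera coordinate system, and that the conversion matrix is a 4x4 rigid transform from the second camera coordinate system to the first. The function name `convert_feature_point`, the matrix layout, and the use of normalized rather than pixel coordinates are illustrative assumptions and are not specified by the claim.

```python
def convert_feature_point(T_c1_c2, point_2d, depth):
    """Map a 2D feature point from the second camera to the first camera.

    T_c1_c2  : 4x4 nested list, assumed to map second-camera coordinates
               to first-camera coordinates (hypothetical layout).
    point_2d : (x, y) in normalized image coordinates of the second camera.
    depth    : depth of the point along the second camera's z axis.
    """
    x, y = point_2d
    # Scale the homogeneous point (x, y, 1) by the depth to get a 3D point.
    p_c2 = [x * depth, y * depth, depth, 1.0]
    # Multiply by the conversion matrix (the first three rows suffice).
    p_c1 = [sum(T_c1_c2[r][c] * p_c2[c] for c in range(4)) for r in range(3)]
    if abs(p_c1[2]) < 1e-12:
        raise ValueError("z component is zero; the point cannot be normalized")
    # Normalize by z to obtain the 2D point in the first camera coordinate system.
    return (p_c1[0] / p_c1[2], p_c1[1] / p_c1[2])


# Example: the two cameras differ by a pure translation of 0.1 along x.
T = [[1.0, 0.0, 0.0, 0.1],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
u, v = convert_feature_point(T, (0.2, -0.1), 2.0)
```

With an identity conversion matrix the function returns the input point unchanged, which is a convenient sanity check for the chosen conventions.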
The method for determining a posture according to claim 1, wherein the first two-dimensional feature points and the third two-dimensional feature points constitute two-dimensional coordinate information, and the three-dimensional feature points of the previous frame of color image acquired by the first camera in the world coordinate system and the three-dimensional feature points of the previous frame of color image acquired by the second camera in the world coordinate system constitute three-dimensional coordinate information; and wherein determining the posture of the first camera when capturing the current frame of color image according to the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points of the previous frame of color image acquired by the first camera in the world coordinate system, and the three-dimensional feature points of the previous frame of color image acquired by the second camera in the world coordinate system comprises:

Associating the two-dimensional coordinate information with the three-dimensional coordinate information to obtain point pair information;

Solving a perspective-n-point problem using the point pair information, and determining, based on the solution result, the posture of the first camera when capturing the current frame of color image.
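For reference, the perspective-n-point problem is commonly stated as a reprojection-error minimization over the point pairs. The notation below is an assumption for illustration and does not appear in the claims: $\mathbf{P}_i$ is the $i$-th three-dimensional feature point in the world coordinate system, $\mathbf{p}_i$ is the two-dimensional feature point associated with it, expressed in normalized coordinates of the first camera, $(\mathbf{R}, \mathbf{t})$ maps world coordinates into the first camera coordinate system, and $\pi$ divides by the third component.

```latex
(\mathbf{R}^{*}, \mathbf{t}^{*})
  = \arg\min_{\mathbf{R} \in SO(3),\; \mathbf{t} \in \mathbb{R}^{3}}
    \sum_{i=1}^{n} \left\| \mathbf{p}_i - \pi\!\left( \mathbf{R}\,\mathbf{P}_i + \mathbf{t} \right) \right\|^{2},
\qquad
\pi\!\left( [X,\, Y,\, Z]^{\top} \right) = \left[ X/Z,\; Y/Z \right]^{\top}
```

The claim does not name a particular solver; any method that recovers $(\mathbf{R}, \mathbf{t})$ from the point pairs would fit this formulation.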
The method for determining a posture according to claim 1, wherein the method for determining a posture further comprises:

Acquire a previous frame of color image captured by the first camera, and extract feature points of the previous frame of color image captured by the first camera;

Using a previous frame of depth image aligned with the previous frame of color image acquired by the first camera, spatially projecting feature points of the previous frame of color image acquired by the first camera to obtain three-dimensional feature points of the previous frame of color image acquired by the first camera in the first camera coordinate system;

According to the posture of the first camera when capturing the previous frame of color image, the three-dimensional feature points in the first camera coordinate system are transformed to obtain the three-dimensional feature points of the previous frame of color image captured by the first camera in the world coordinate system.
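The projection and transformation steps above can be sketched as follows. The sketch assumes a pinhole camera model with intrinsics `(fx, fy, cx, cy)` and a posture stored as a camera-to-world rotation `R_wc` and translation `t_wc`; the claim fixes neither the camera model nor the posture convention, and the function name `backproject_to_world` is hypothetical.

```python
def backproject_to_world(pixel, depth, intrinsics, R_wc, t_wc):
    """Lift one feature point to 3D and express it in world coordinates.

    pixel      : (u, v) pixel coordinates in the color image.
    depth      : depth at that pixel, read from the aligned depth image.
    intrinsics : (fx, fy, cx, cy) pinhole parameters (assumed model).
    R_wc, t_wc : rotation (3x3 nested list) and translation (length 3) of
                 the assumed camera-to-world posture of the previous frame.
    """
    u, v = pixel
    fx, fy, cx, cy = intrinsics
    # Pinhole back-projection into the first camera coordinate system.
    p_cam = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    # Rigid transform into the world coordinate system: R * p + t.
    return [sum(R_wc[r][c] * p_cam[c] for c in range(3)) + t_wc[r]
            for r in range(3)]


# Example: camera rotated 90 degrees about z and shifted by (1, 0, 0).
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]
p_world = backproject_to_world((420.0, 240.0), 2.0,
                               (500.0, 500.0, 320.0, 240.0), R, t)
```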
The method for determining a posture according to claim 6, wherein spatially projecting the feature points of the previous frame color image captured by the first camera, using the previous frame depth image aligned with the previous frame color image captured by the first camera, to obtain the three-dimensional feature points of the previous frame color image captured by the first camera in the first camera coordinate system comprises:

Using a previous frame of depth image aligned with the previous frame of color image acquired by the first camera, spatially projecting feature points within a predetermined depth range among feature points of the previous frame of color image acquired by the first camera, so as to obtain three-dimensional feature points of the previous frame of color image acquired by the first camera in the first camera coordinate system;

The predetermined depth range is determined based on the depth measurement range.
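A minimal sketch of the depth-range restriction is given below: only feature points whose aligned depth value falls inside the predetermined range are kept for spatial projection. The function name `filter_by_depth`, the `depth_image[v][u]` indexing, and the range values in the example are assumptions for illustration.

```python
def filter_by_depth(features, depth_image, depth_range):
    """Keep feature points whose depth lies inside [d_min, d_max].

    features    : list of (u, v) integer pixel coordinates.
    depth_image : 2D nested list indexed as depth_image[v][u].
    depth_range : (d_min, d_max); the example values are hypothetical.
    """
    d_min, d_max = depth_range
    kept = []
    for u, v in features:
        d = depth_image[v][u]
        if d_min <= d <= d_max:
            # Keep the depth alongside the pixel for later projection.
            kept.append((u, v, d))
    return kept


depth = [[0.0, 0.8, 1.5],
         [2.5, 6.0, 0.3]]
pixels = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
kept = filter_by_depth(pixels, depth, (0.5, 4.0))
```

Points with a depth of zero (often used to mark missing measurements) or beyond the far limit are dropped before any three-dimensional feature point is computed.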
The method for determining a posture according to claim 1, wherein the method for determining a posture further comprises:

Acquire a previous frame of color image captured by the second camera, and extract feature points of the previous frame of color image captured by the second camera;

Using a previous frame of depth image aligned with a previous frame of color image acquired by the second camera, spatially projecting feature points of the previous frame of color image acquired by the second camera to obtain three-dimensional feature points of the previous frame of color image acquired by the second camera in the second camera coordinate system;

Using the conversion matrix between the first camera coordinate system and the second camera coordinate system, the three-dimensional feature points of the previous frame of color image acquired by the second camera in the second camera coordinate system are transformed into three-dimensional feature points in the first camera coordinate system;

According to the posture of the first camera when capturing the previous frame of color image, the three-dimensional feature points in the first camera coordinate system are transformed to obtain the three-dimensional feature points of the previous frame of color image captured by the second camera in the world coordinate system.
The method for determining a posture according to claim 8, wherein spatially projecting the feature points of the previous frame color image captured by the second camera, using the previous frame depth image aligned with the previous frame color image captured by the second camera, to obtain the three-dimensional feature points of the previous frame color image captured by the second camera in the second camera coordinate system comprises:

Using a previous frame of depth image aligned with the previous frame of color image acquired by the second camera, spatially projecting feature points within a predetermined depth range among feature points of the previous frame of color image acquired by the second camera, so as to obtain three-dimensional feature points of the previous frame of color image acquired by the second camera in the second camera coordinate system;

The predetermined depth range is determined based on the depth measurement range.
The method for determining a posture according to any one of claims 1 to 9, wherein the method for determining a posture further comprises:

Acquire an initial frame color image captured by the first camera, and extract feature points of the initial frame color image captured by the first camera;

Using the initial frame depth image aligned with the initial frame color image acquired by the first camera, spatially projecting the feature points of the initial frame color image acquired by the first camera to obtain three-dimensional feature points of the initial frame color image acquired by the first camera in the first camera coordinate system;

Determine an initial positioning result of the first camera in the first camera coordinate system according to an initial rotation matrix, an initial translation vector, and the three-dimensional feature points of the initial frame color image acquired by the first camera in the first camera coordinate system;

The initial positioning result of the first camera in the first camera coordinate system is transformed by using the transformation matrix between the first camera coordinate system and the world coordinate system, so as to determine the posture of the first camera when capturing the initial frame color image.
The method for determining a posture according to claim 10, wherein the method for determining a posture further comprises:

Acquire a reference depth image output by the first camera;

When a designated plane is determined in combination with the reference depth image output by the first camera, the transformation matrix between the first camera coordinate system and the world coordinate system is determined according to a normal vector of the designated plane and a gravity vector.
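The claim does not state how the normal vector and the gravity vector are combined. One plausible sketch, assuming the world coordinate system has its z axis pointing opposite to gravity, is to compute the rotation that carries the plane normal (expressed in the first camera coordinate system) onto that world up axis, using the Rodrigues form for the rotation between two unit vectors. The function name `rotation_aligning` and the axis conventions are assumptions; the rotation about the gravity direction (yaw) is not constrained by these two vectors alone.

```python
import math


def rotation_aligning(a, b):
    """Return a 3x3 rotation R (nested lists) such that R * a is parallel to b."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    a = [x / na for x in a]
    b = [x / nb for x in b]
    # Cross product a x b (rotation axis scaled by sin) and cosine a . b.
    v = [a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0]]
    c = sum(x * y for x, y in zip(a, b))
    if c < -1.0 + 1e-9:
        raise ValueError("vectors are opposite; the rotation axis is ambiguous")
    # Skew-symmetric matrix of v.
    K = [[0.0, -v[2], v[1]],
         [v[2], 0.0, -v[0]],
         [-v[1], v[0], 0.0]]
    k = 1.0 / (1.0 + c)
    # Rodrigues form for unit vectors: R = I + K + K^2 / (1 + c).
    R = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            K2 = sum(K[i][m] * K[m][j] for m in range(3))
            R[i][j] = (1.0 if i == j else 0.0) + K[i][j] + k * K2
    return R


# Hypothetical example: the ground normal seen by the camera is -y,
# and the world up axis (opposite to gravity) is +z.
normal_cam = [0.0, -1.0, 0.0]
up_world = [0.0, 0.0, 1.0]
R_wc = rotation_aligning(normal_cam, up_world)
mapped = [sum(R_wc[i][j] * normal_cam[j] for j in range(3)) for i in range(3)]
```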
The method for determining a posture according to claim 11, wherein the method for determining a posture further comprises:

Determine a reference point cloud corresponding to the first camera in combination with a reference depth image output by the first camera;

Extracting plane information of the reference point cloud;

The designated plane is selected according to the plane information of the reference point cloud.
The method for determining a posture according to claim 12, wherein determining a reference point cloud corresponding to the first camera in combination with a reference depth image output by the first camera comprises:

For each pixel point on the reference depth image output by the first camera, determine the three-dimensional space point of the pixel point according to the pixel point, the depth value of the pixel point and the camera intrinsic parameter of the first camera;

A reference point cloud corresponding to the first camera is constructed by combining the three-dimensional space point of each pixel point on the reference depth image output by the first camera.
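The per-pixel construction above can be sketched as follows, again assuming a pinhole model with intrinsics `(fx, fy, cx, cy)`. Skipping pixels with non-positive depth is an added assumption about how invalid measurements are encoded, and the function name `depth_to_point_cloud` is hypothetical.

```python
def depth_to_point_cloud(depth_image, intrinsics):
    """Convert a depth image into a list of 3D points in camera coordinates.

    depth_image : 2D nested list indexed as depth_image[v][u].
    intrinsics  : (fx, fy, cx, cy) pinhole parameters (assumed model).
    """
    fx, fy, cx, cy = intrinsics
    cloud = []
    for v, row in enumerate(depth_image):
        for u, d in enumerate(row):
            if d <= 0.0:
                # Assumption: non-positive depth marks an invalid pixel.
                continue
            cloud.append(((u - cx) / fx * d, (v - cy) / fy * d, d))
    return cloud


depth = [[1.0, 0.0],
         [2.0, 4.0]]
cloud = depth_to_point_cloud(depth, (2.0, 2.0, 0.5, 0.5))
```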
The method for determining a posture according to claim 13, wherein combining the three-dimensional space point of each pixel point on the reference depth image output by the first camera to construct a reference point cloud corresponding to the first camera comprises:

Acquire a reference depth image output by the second camera;

Determine a three-dimensional spatial point of each pixel on the reference depth image output by the second camera;

Transforming the three-dimensional space point of each pixel on the reference depth image output by the second camera according to the conversion matrix between the first camera coordinate system and the second camera coordinate system to obtain a transformed three-dimensional space point;

The three-dimensional space point of each pixel point on the reference depth image output by the first camera is merged with the transformed three-dimensional space point to construct a reference point cloud corresponding to the first camera.
The method for determining a posture according to claim 12, wherein selecting the designated plane according to the plane information of the reference point cloud comprises:

Selecting the designated plane according to distance information, included in the plane information of the reference point cloud, between each plane and the first camera.
The method for determining a posture according to claim 15, wherein selecting the designated plane according to the distance information, included in the plane information of the reference point cloud, between each plane and the first camera comprises:

When the distance information includes a distance within a predetermined distance range, determining the plane corresponding to that distance as a candidate plane;

When there is one candidate plane, determining the candidate plane as the designated plane;

When there are multiple candidate planes, determining the candidate plane whose distance from the first camera is closest to a distance threshold as the designated plane;

wherein the distance threshold is within the predetermined distance range.
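The selection rule above can be sketched in a few lines. The function name `select_designated_plane`, the dictionary of plane identifiers, and the numeric values are hypothetical; returning `None` when no plane lies in the range is an added assumption, since the claim does not say what happens in that case.

```python
def select_designated_plane(plane_distances, distance_range, distance_threshold):
    """Pick the designated plane from per-plane distances to the first camera.

    plane_distances    : dict mapping a plane identifier to its distance.
    distance_range     : (d_min, d_max) predetermined distance range.
    distance_threshold : reference distance, required to lie in the range.
    """
    d_min, d_max = distance_range
    if not d_min <= distance_threshold <= d_max:
        raise ValueError("distance threshold must lie within the distance range")
    # Candidate planes are those whose distance falls inside the range.
    candidates = {pid: d for pid, d in plane_distances.items()
                  if d_min <= d <= d_max}
    if not candidates:
        return None  # assumption: no designated plane is selected
    # With a single candidate this returns it; otherwise the closest to the threshold.
    return min(candidates, key=lambda pid: abs(candidates[pid] - distance_threshold))


single = select_designated_plane({"table": 0.7, "floor": 1.5, "ceiling": 3.2},
                                 (1.0, 2.0), 1.5)
several = select_designated_plane({"a": 1.1, "b": 1.6, "c": 2.5},
                                  (1.0, 2.0), 1.5)
```

Treating the single-candidate case as a special case of the closest-to-threshold rule keeps the function short while matching both branches of the claim.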
The method for determining a posture according to claim 11, wherein the designated plane is a ground plane.
A posture determination device, wherein the device is configured in a terminal device, the terminal device is further configured with a first camera and at least one second camera, and the posture determination device comprises:

A first feature point determination module is used to obtain a current frame color image captured by the first camera, and determine a first two-dimensional feature point on the current frame color image captured by the first camera that matches a previous frame color image captured by the first camera;

A second feature point determination module is used to obtain a current frame color image captured by the second camera, and determine a second two-dimensional feature point on the current frame color image captured by the second camera that matches a previous frame color image captured by the second camera;

A feature point conversion module, configured to convert the second two-dimensional feature points into third two-dimensional feature points in the first camera coordinate system by using a conversion matrix between a first camera coordinate system of the first camera and a second camera coordinate system of the second camera;

A posture determination module is used to determine the posture of the first camera when capturing the current frame of color image based on the first two-dimensional feature points, the third two-dimensional feature points, the three-dimensional feature points of the previous frame of color image captured by the first camera in the world coordinate system, and the three-dimensional feature points of the previous frame of color image captured by the second camera in the world coordinate system.
A computer-readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the posture determination method according to any one of claims 1 to 17 is implemented.
An electronic device, comprising:

A processor; and

A memory for storing one or more programs which, when executed by the processor, enable the processor to implement the posture determination method according to any one of claims 1 to 17.