WO2018205803A1 - Pose estimation method and apparatus - Google Patents

Pose estimation method and apparatus

Info

Publication number
WO2018205803A1
Authority
WO
WIPO (PCT)
Prior art keywords
depth
depth image
determining
pixel point
pixel
Application number
PCT/CN2018/083376
Other languages
French (fr)
Chinese (zh)
Inventor
孙志明
张潮
李雨倩
吴迪
樊晨
李政
贾士伟
李祎翔
张连川
刘新月
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Application filed by 北京京东尚科信息技术有限公司 and 北京京东世纪贸易有限公司
Publication of WO2018205803A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Definitions

  • the present application relates to the field of computer vision technology, in particular to the field of pose estimation, and more particularly to a pose estimation method and apparatus.
  • Pose estimation, especially visual pose estimation, involves knowledge of many disciplines such as image processing, computer vision, inertial navigation, mathematical statistics, and optimization. It is a basic technology of many emerging industries and will play an important role in current and future production and daily life.
  • existing pose estimation methods usually need to extract feature points of the image and establish descriptors, and therefore consume a large amount of computing resources, so the real-time performance of the pose estimation is poor.
  • the purpose of the present application is to propose a pose estimation method and apparatus to solve the technical problems mentioned in the background art section above.
  • an embodiment of the present application provides a pose estimation method, comprising: acquiring a depth image video from a depth sensor; selecting a first depth image and a second depth image from the frame depth images of the depth image video, wherein the first depth image and the second depth image share at least one pixel point indicating the same object; determining a first pixel point set in the first depth image and a second pixel point set in the second depth image, wherein the pixel points in the first pixel point set are in one-to-one correspondence with the pixel points in the second pixel point set, and the corresponding two pixel points indicate the same object; and, for any pixel point in the first pixel point set, determining a pose transformation parameter of the depth sensor based on the first two-dimensional coordinates of the pixel point in the first depth image and the first depth value of the corresponding pixel point in the second depth image.
  • in some embodiments, before selecting the first depth image and the second depth image from the frame depth images, the method further includes: for each frame depth image, deleting the pixel points in the frame depth image that meet a preset condition; and smoothing the depth image after the deletion.
  • in some embodiments, deleting the pixel points in the frame depth image that meet the preset condition includes: detecting the depth value of each pixel point; and deleting the pixel points whose depth value is greater than a first preset value and whose depth value is less than a second preset value.
  • in some embodiments, deleting the pixel points in the frame depth image that meet the preset condition includes: determining a first partial derivative of the frame depth image in the horizontal direction and a second partial derivative in the vertical direction; determining geometric edge pixel points in the frame depth image according to the first partial derivative and the second partial derivative; and deleting the geometric edge pixel points.
  • in some embodiments, deleting the pixel points in the frame depth image that meet the preset condition includes: determining failed pixel points in the frame depth image for which no depth value exists; and deleting the failed pixel points and the pixel points adjacent to the failed pixel points.
  • in some embodiments, determining the pose transformation parameter of the depth sensor based on the first two-dimensional coordinates of the pixel point in the first depth image and the first depth value of the corresponding pixel point in the second depth image includes: mapping the first two-dimensional coordinates of the pixel point in the first depth image to first three-dimensional space coordinates in the coordinate system to which the first depth image belongs; transforming the first three-dimensional space coordinates to the coordinate system to which the second depth image belongs to obtain second three-dimensional space coordinates; mapping the second three-dimensional space coordinates into the second depth image to obtain second two-dimensional coordinates; determining a second depth value of the second two-dimensional coordinates in the second depth image; and determining the pose transformation parameter of the depth sensor based on the first depth value and the second depth value.
  • in some embodiments, determining the pose transformation parameter of the depth sensor based on the first depth value and the second depth value includes: determining a depth difference between the first depth image and the second depth image according to the first depth value and the second depth value; determining the depth difference as a depth residual and, based on the depth residual, performing the following iterative steps: determining a pose estimation increment based on the depth residual; determining whether the depth residual is less than a preset threshold; in response to the depth residual being less than the preset threshold, accumulating the pose estimation increment and the first depth value to determine a pose estimation value, and determining the pose transformation parameter of the depth sensor according to the pose estimation value; and, in response to the depth residual being greater than or equal to the preset threshold, determining the accumulated pose estimation increment as the depth residual and continuing to perform the iterative steps.
  • in some embodiments, the method further includes: acquiring angular velocity and acceleration from an inertial measurement device physically bound to the depth sensor; determining a pose transformation parameter of the inertial measurement device according to the angular velocity and the acceleration; and fusing the pose transformation parameter of the depth sensor and the pose transformation parameter of the inertial measurement device to determine an integrated pose transformation parameter.
  • in some embodiments, determining the pose transformation parameter of the inertial measurement device according to the angular velocity and the acceleration includes: determining a first pose transformation parameter of the inertial measurement device according to the angular velocity; determining a second pose transformation parameter of the inertial measurement device according to the acceleration; and fusing the first pose transformation parameter and the second pose transformation parameter to determine the pose transformation parameter of the inertial measurement device.
  • an embodiment of the present application provides a pose estimation apparatus, the apparatus comprising: a first acquiring unit configured to acquire a depth image video from a depth sensor; an image selecting unit configured to select a first depth image and a second depth image from the frame depth images of the depth image video, wherein the first depth image and the second depth image share at least one pixel point indicating the same object; a pixel point set determining unit configured to determine a first pixel point set in the first depth image and a second pixel point set in the second depth image, wherein the pixel points in the first pixel point set are in one-to-one correspondence with the pixel points in the second pixel point set, and the corresponding two pixel points indicate the same object; and a first parameter determining unit configured to, for any pixel point in the first pixel point set, determine a pose transformation parameter of the depth sensor based on the first two-dimensional coordinates of the pixel point in the first depth image and the first depth value of the corresponding pixel point in the second depth image.
  • in some embodiments, the apparatus further includes a pre-processing unit, the pre-processing unit including a pixel point deletion module and a smoothing module; the pixel point deletion module is configured to, before the image selecting unit selects the first depth image and the second depth image from the frame depth images, delete, for each frame depth image, the pixel points in the frame depth image that meet the preset condition; the smoothing module is configured to smooth the depth image after the deletion.
  • in some embodiments, the pixel point deletion module is further configured to: detect the depth value of each pixel point; and delete the pixel points whose depth value is greater than the first preset value and whose depth value is smaller than the second preset value.
  • in some embodiments, the pixel point deletion module is further configured to: determine a first partial derivative of the frame depth image in the horizontal direction and a second partial derivative in the vertical direction; determine geometric edge pixel points in the frame depth image according to the first partial derivative and the second partial derivative; and delete the geometric edge pixel points.
  • in some embodiments, the pixel point deletion module is further configured to: determine failed pixel points in the frame depth image for which no depth value exists; and delete the failed pixel points and the pixel points adjacent to the failed pixel points.
  • in some embodiments, the first parameter determining unit includes: a first mapping module configured to map the first two-dimensional coordinates of the pixel point in the first depth image to first three-dimensional space coordinates in the coordinate system to which the first depth image belongs; a transformation module configured to transform the first three-dimensional space coordinates to the coordinate system to which the second depth image belongs to obtain second three-dimensional space coordinates; a second mapping module configured to map the second three-dimensional space coordinates into the second depth image to obtain second two-dimensional coordinates; a depth value determining module configured to determine a second depth value of the second two-dimensional coordinates in the second depth image; and a first parameter determining module configured to determine the pose transformation parameter of the depth sensor based on the first depth value and the second depth value.
  • in some embodiments, the first parameter determining module is further configured to: determine a depth difference between the first depth image and the second depth image according to the first depth value and the second depth value; determine the depth difference as a depth residual and, based on the depth residual, perform the iterative steps of: determining a pose estimation increment based on the depth residual; determining whether the depth residual is less than a preset threshold; in response to the depth residual being less than the preset threshold, accumulating the pose estimation increment and the first depth value to determine a pose estimation value, and determining the pose transformation parameter of the depth sensor according to the pose estimation value; and, in response to the depth residual being greater than or equal to the preset threshold, determining the accumulated pose estimation increment as the depth residual and continuing to perform the iterative steps.
  • in some embodiments, the apparatus further includes: a second acquiring unit configured to acquire angular velocity and acceleration from an inertial measurement device physically bound to the depth sensor; a second parameter determining unit configured to determine a pose transformation parameter of the inertial measurement device according to the angular velocity and the acceleration; and a parameter fusion unit configured to fuse the pose transformation parameter of the depth sensor and the pose transformation parameter of the inertial measurement device to determine an integrated pose transformation parameter.
  • in some embodiments, the second parameter determining unit includes: a first sub-parameter determining module configured to determine a first pose transformation parameter of the inertial measurement device according to the angular velocity; a second sub-parameter determining module configured to determine a second pose transformation parameter of the inertial measurement device according to the acceleration; and a fusion module configured to fuse the first pose transformation parameter and the second pose transformation parameter to determine the pose transformation parameter of the inertial measurement device.
  • an embodiment of the present application provides a server, including: one or more processors; and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the above embodiments.
  • an embodiment of the present application provides a computer readable storage medium on which a computer program is stored; when the program is executed by a processor, the method described in any one of the foregoing embodiments is implemented.
  • the pose estimation method and apparatus acquire the depth image video collected by the depth sensor, select from it two depth images sharing at least one pixel point indicating the same object, determine the mutually corresponding first pixel point set and second pixel point set in the two frame depth images, and then, for each pixel point in the first pixel point set, determine the pose transformation parameter of the depth sensor according to the first two-dimensional coordinates of the pixel point in the first depth image and the first depth value of the corresponding pixel point in the second depth image.
  • the pose estimation method of the present application uses the depth image to perform pose estimation, which reduces the consumption of computing resources, improves the calculation efficiency, and ensures the real-time performance of the pose estimation.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flow chart of one embodiment of a pose estimation method in accordance with the present application.
  • FIG. 3 is a schematic diagram of an application scenario of a pose estimation method according to the present application.
  • FIG. 4 is a flowchart of determining a pose transformation parameter of a depth sensor in a pose estimation method according to the present application.
  • FIG. 5 is a schematic diagram of a principle of pose transformation in a pose estimation method according to the present application.
  • FIG. 6 is a flow chart of another embodiment of a pose estimation method according to the present application.
  • FIG. 7 is a schematic structural view of an embodiment of a pose estimating apparatus according to the present application.
  • FIG. 8 is a block diagram showing the structure of a computer system suitable for implementing the server of the embodiment of the present application.
  • FIG. 1 illustrates an exemplary system architecture 100 in which embodiments of the pose estimation method or pose estimation apparatus of the present application may be applied.
  • system architecture 100 can include depth sensor 101, network 102, and server 103.
  • Network 102 is used to provide a medium for the communication link between depth sensor 101 and server 103.
  • Network 102 can include a variety of connection types, such as wired, wireless communication links, fiber optic cables, and the like.
  • the depth sensor 101 interacts with the server 103 via the network 102 to transmit depth image video or the like.
  • the depth sensor 101 can be mounted on various moving objects, such as an unmanned vehicle, a robot, an unmanned delivery vehicle, a smart wearable device, a virtual reality device, and the like.
  • the depth sensor 101 may be various depth sensors capable of continuously acquiring multi-frame depth images.
  • the server 103 may be a server that provides various services, such as a background server that processes depth image video acquired by the depth sensor 101.
  • the background server can analyze and process data such as the received depth image video.
  • the pose estimation method provided by the embodiment of the present application is generally performed by the server 103. Accordingly, the pose estimation apparatus is generally disposed in the server 103.
  • depth sensors, networks, and servers in Figure 1 are merely illustrative. Depending on the implementation needs, there can be any number of depth sensors, networks, and servers.
  • the pose estimation method of this embodiment includes the following steps:
  • Step 201 Acquire a depth image video from the depth sensor.
  • the electronic device on which the pose estimation method operates can acquire the depth image video from the depth sensor by a wired connection or a wireless connection.
  • Each frame image in the depth image video is a depth image.
  • the depth image, also called a range image, is an image whose pixel values are the distances (depths) from the image collector to the points in the scene; it directly reflects the geometry of the visible surfaces of the scene.
  • Each pixel in the depth image represents the distance between the object at a particular coordinate and the camera plane of the depth sensor in the field of view of the depth sensor.
  • wireless connection manners may include, but are not limited to, 3G/4G connection, WiFi connection, Bluetooth connection, WiMAX connection, Zigbee connection, UWB (ultra wideband) connection, and other wireless connection methods now known or developed in the future.
  • Step 202 Select a first depth image and a second depth image from each frame depth image of the depth image video.
  • the first depth image and the second depth image may be two adjacent frame depth images in the depth image video, or may be two frame depth images whose sequence numbers in the depth image video differ by less than a preset value.
  • Step 203 Determine a first pixel point set in the first depth image and a second pixel point set in the second depth image.
  • Each pixel point in the first pixel point set corresponds one-to-one with a pixel point in the second pixel point set, and the corresponding two pixel points indicate the same object; more specifically, the corresponding two pixel points indicate the same location of the same object. It can be understood that the number of pixel points in the first pixel point set is equal to the number of pixel points in the second pixel point set, and both are equal to the number of pixel points indicating the same object shared by the first depth image and the second depth image.
  • Step 204 For any pixel point in the first pixel point set, determine the pose transformation parameter of the depth sensor based on the first two-dimensional coordinates of the pixel point in the first depth image and the first depth value of the corresponding pixel point in the second depth image.
  • the server may determine the pose transformation parameter of the depth sensor based on the first two-dimensional coordinates, in the first depth image, of each pixel point in the first pixel point set and the first depth value, in the second depth image, of the corresponding pixel point in the second pixel point set. It can be understood that the first two-dimensional coordinates are the coordinates of the pixel point in the image coordinate system of the first depth image, and do not include the depth value of the pixel point.
  • for each pixel point in the first pixel point set, a corresponding pixel point exists in the second pixel point set. Since each pixel point has a depth value, the first depth value of the corresponding pixel point may be determined from the second depth image.
  • the pose transformation parameter may be a pose transformation parameter between the first depth image and the second depth image.
  • FIG. 3 is a schematic diagram of an application scenario of the pose estimation method according to the present embodiment.
  • the depth sensor 301 is installed on the unmanned vehicle 302.
  • the depth sensor 301 collects the depth image video and sends the collected depth image video to the server 303.
  • the server 303 determines the pose transformation parameter of the depth sensor 301 and then sends the pose transformation parameter to the unmanned vehicle 302, and the unmanned vehicle 302 can navigate and avoid obstacles according to the pose transformation parameter.
  • the pose estimation method acquires the depth image video collected by the depth sensor from the depth sensor, and selects two depth images of at least one pixel point indicating the same object from the two frames, and then determines the two-frame depth image. And corresponding to the first pixel point set and the second pixel point set, and then for each pixel point in the first pixel point set, according to the first two-dimensional coordinate in the first depth image and corresponding to the pixel point.
  • the first depth value of the corresponding pixel in the second depth image determines the pose transformation parameter of the depth sensor, reduces the consumption of computing resources, improves the calculation efficiency, and ensures the real-time performance of the pose estimation.
  • the foregoing method further includes the following steps not shown in FIG. 2:
  • for each frame depth image, the pixel points in the frame depth image that meet the preset condition are deleted, and the depth image after the deletion is smoothed.
  • the depth sensor typically emits probe light (e.g., infrared, laser, radar) and receives the light reflected from the surface of an object to determine the distance between the object and the depth sensor. Due to occlusion by objects, absorption of the probe light by object surfaces, and diffuse reflection, the depth sensor cannot always receive the reflected light completely, so many pixel points in the depth image have no depth value or have inaccurate depth values. In this implementation, in order to ensure the accuracy of the pose estimation, the pixel points that meet the preset condition are deleted from each frame depth image.
  • the depth image from which the pixel is deleted can be smoothed.
  • the above smoothing processing may include linear smoothing, interpolation smoothing, convolution smoothing, Gaussian filtering, bilateral filtering, and the like.
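  • purely as an illustrative sketch (not part of the patent text), a NaN-aware box filter in Python/NumPy could serve as such a smoothing step, assuming deleted pixels are marked as NaN; the window size k is a hypothetical parameter:

```python
import numpy as np

def smooth_depth(depth: np.ndarray, k: int = 3) -> np.ndarray:
    """NaN-aware box smoothing: replace each valid pixel by the mean of the
    valid depth values in its k x k neighbourhood."""
    h, w = depth.shape
    pad = k // 2
    padded = np.pad(depth, pad, mode="edge")
    valid = ~np.isnan(padded)
    vals = np.where(valid, padded, 0.0)
    acc = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            acc += vals[dy:dy + h, dx:dx + w]
            cnt += valid[dy:dy + h, dx:dx + w]
    out = np.where(cnt > 0, acc / np.maximum(cnt, 1), np.nan)
    out[np.isnan(depth)] = np.nan  # pixels deleted by the preprocessing stay deleted
    return out
```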
  • the depth value of each pixel point may be detected first, and the pixel point whose depth value is greater than the first preset value and smaller than the second preset value may be deleted.
  • the values of the first preset value and the second preset value are related to the model of the depth sensor, which is not limited in this implementation manner.
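  • as a minimal sketch of this range test, under one plausible reading of the two preset values as the far and near bounds of the sensor's trusted range (the claim itself leaves their relation open), the deletion could look as follows; d_far and d_near are hypothetical parameters:

```python
import numpy as np

def delete_out_of_range(depth: np.ndarray, d_far: float, d_near: float) -> np.ndarray:
    """Delete (mark as NaN) pixels whose depth falls outside [d_near, d_far]."""
    out = depth.astype(float)
    with np.errstate(invalid="ignore"):  # ignore comparisons against NaN
        bad = (out > d_far) | (out < d_near)
    out[bad] = np.nan
    return out
```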
  • the first partial derivative Zu of the depth image in the horizontal (u) direction and the second partial derivative Zv in the vertical (v) direction may be determined first; the geometric edge pixel points in the frame depth image are then determined according to Zu and Zv, and the determined geometric edge pixel points are deleted.
  • the depth value of a pixel at the edge of an object has a high degree of uncertainty, and the depth values of the pixels on both sides of the edge jump; therefore, in order to ensure the accuracy of the pose estimation, the above geometric edge pixel points can be deleted.
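  • a minimal sketch of this edge deletion, assuming central differences for Zu and Zv and a hypothetical gradient threshold (the patent fixes neither choice):

```python
import numpy as np

def delete_geometric_edges(depth: np.ndarray, grad_thresh: float) -> np.ndarray:
    """Delete pixels where the depth gradient magnitude indicates a geometric edge."""
    Zu = np.full_like(depth, np.nan)
    Zv = np.full_like(depth, np.nan)
    Zu[:, 1:-1] = (depth[:, 2:] - depth[:, :-2]) / 2.0  # first partial derivative (u)
    Zv[1:-1, :] = (depth[2:, :] - depth[:-2, :]) / 2.0  # second partial derivative (v)
    with np.errstate(invalid="ignore"):
        edge = np.hypot(Zu, Zv) > grad_thresh
    out = depth.copy()
    out[edge] = np.nan
    return out
```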
  • the failed pixel points where the depth value does not exist in each depth image may be determined, and then the failed pixel points and the pixel points adjacent to the failed pixel points are deleted.
  • in some cases, the receiver cannot receive the probe light reflected back from the object, and thus the depth value of the pixel cannot be determined; these pixels are called failed pixel points.
  • since the depth values of pixel points adjacent to a failed pixel point are also unreliable, the adjacent pixel points are deleted as well.
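  • a sketch of this step under the same NaN convention; the patent does not specify the neighbourhood, so deleting the 4-neighbours of each failed pixel is an assumption:

```python
import numpy as np

def delete_failed_pixels(depth: np.ndarray) -> np.ndarray:
    """Delete failed pixels (no depth value) and the pixels adjacent to them."""
    invalid = np.isnan(depth)
    grown = invalid.copy()
    grown[1:, :] |= invalid[:-1, :]   # vertical neighbours
    grown[:-1, :] |= invalid[1:, :]
    grown[:, 1:] |= invalid[:, :-1]   # horizontal neighbours
    grown[:, :-1] |= invalid[:, 1:]
    out = depth.copy()
    out[grown] = np.nan
    return out
```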
  • the pose transformation parameter of the depth sensor can be determined by the following steps:
  • Step 401 Map the first two-dimensional coordinates of the pixel in the first depth image to the first three-dimensional coordinate of the coordinate system to which the first depth image belongs.
  • the first two-dimensional coordinates (x1, y1) of the pixel point in the first depth image may be determined first, and then the first two-dimensional coordinates (x1, y1) are mapped to the first three-dimensional space coordinates.
  • the first three-dimensional space coordinates (x1′, y1′, z1′) in the coordinate system to which the first depth image belongs may be obtained by the π⁻¹ mapping in the pinhole camera model.
  • the pinhole camera model includes a mapping π that projects a three-dimensional space point to two-dimensional coordinates on the pixel plane, and an inverse mapping π⁻¹ that maps a two-dimensional coordinate on the image, together with its depth, back to a three-dimensional space point.
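  • for an ideal pinhole camera, the two mappings might be sketched as follows; the intrinsic parameters fx, fy, cx, cy are assumptions, as the patent does not spell out the camera model's internals:

```python
import numpy as np

def project(P: np.ndarray, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """π: map a 3-D point (x, y, z) in the camera frame to 2-D pixel coordinates."""
    x, y, z = P
    return np.array([fx * x / z + cx, fy * y / z + cy])

def back_project(p: np.ndarray, z: float, fx: float, fy: float,
                 cx: float, cy: float) -> np.ndarray:
    """π⁻¹: map a pixel (u, v) with depth z back to a 3-D point in the camera frame."""
    u, v = p
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
```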
  • Step 402 Transform the first three-dimensional space coordinate into a coordinate system to which the second depth image belongs, to obtain a second three-dimensional space coordinate.
  • the pose transformation parameter between the first depth image and the second depth image is recorded as T1→2, as shown in FIG. 5.
  • a pedestrian in the world coordinate system xw-yw-zw is denoted as point P.
  • the depth sensor is located on the left side at time t1 and on the right side at time t2.
  • Point P is the point P1 in the depth image obtained at time t1; its coordinates are (x1, y1), its depth value is Z1, and its associated coordinate system is xc1-yc1-zc1.
  • Point P is the point P2 in the depth image obtained at time t2; its coordinates are (x2, y2), its depth value is Z2, and its associated coordinate system is xc2-yc2-zc2.
  • the pose transformation parameter between the coordinate system xc1-yc1-zc1 and the coordinate system xc2-yc2-zc2 is T1→2.
  • the above pose transformation parameter T1→2 is represented in the Lie algebra se(3) as ξ, and in matrix form as the Lie group element T(ξ).
  • a Lie group element T(ξ) may be preset, and the preset T(ξ) is used to complete the transformation to obtain the second three-dimensional space coordinates (x2′, y2′, z2′).
  • Step 403 Map the second three-dimensional space coordinates into the second depth image to obtain the second two-dimensional coordinates.
  • the second two-dimensional coordinates (x2, y2) can be obtained by using the π mapping in the pinhole camera model.
  • Step 404 Determine a second depth value of the second two-dimensional coordinates in the second depth image.
  • the second depth value of the second two-dimensional coordinates (x2, y2) is determined in the second depth image.
  • in theory, the second depth value should be the same as the first depth value of the corresponding pixel point in the second pixel point set; in practice, however, the two are often different.
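  • steps 401 to 404 can be read as one warping operation; a sketch reusing the project/back_project helpers from the pinhole sketch above, where T is a 4x4 matrix such as one produced by the se3_exp sketch:

```python
import numpy as np

def warp_pixel(p1, Z1, T, fx, fy, cx, cy):
    """Warp pixel p1 with depth Z1 from the first image into the second image.
    Returns the second 2-D coordinates and the depth predicted by the transform."""
    P1 = back_project(np.asarray(p1, float), Z1, fx, fy, cx, cy)  # step 401: π⁻¹
    P2 = T[:3, :3] @ P1 + T[:3, 3]                                # step 402: T(ξ)
    p2 = project(P2, fx, fy, cx, cy)                              # step 403: π
    return p2, P2[2]  # step 404 then reads the observed depth Z2 at p2
```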
  • Step 405 Determine a pose transformation parameter of the depth sensor based on the first depth value and the second depth value.
  • after the second depth value is obtained, the pose transformation parameter of the depth sensor may be determined in combination with the first depth value.
  • step 405 can be implemented by the following steps not shown in FIG. 4: determining the depth difference between the first depth image and the second depth image according to the first depth value and the second depth value; determining the depth difference as the depth residual and, based on the depth residual, performing the iterative steps of: determining a pose estimation increment based on the depth residual; determining whether the depth residual is less than a preset threshold; in response to the depth residual being less than the preset threshold, accumulating the pose estimation increment and the first depth value to determine a pose estimation value, and determining the pose transformation parameter of the depth sensor according to the pose estimation value; and, in response to the depth residual being greater than or equal to the preset threshold, determining the accumulated pose estimation increment as the depth residual and continuing to perform the iterative steps.
  • first, the difference between the first depth value and the second depth value is determined as the depth difference between the first depth image and the second depth image.
  • the pose estimation increment is then determined based on the depth residual, after which it is determined whether the depth residual is less than the preset threshold. If it is, the pose estimation increment and the first depth value are accumulated to obtain the pose estimation value; the difference between the pose estimation value and the second depth value is then within an acceptable range, so the pose transformation parameter of the depth sensor can be determined directly according to the pose estimation value. If the depth residual is greater than or equal to the preset threshold, the pose estimation increments obtained in each iteration are accumulated, the accumulated value is taken as the new depth residual, and the iterative steps continue.
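  • read literally, the control flow just described might be transcribed as follows; compute_increment is a hypothetical placeholder for however the pose estimation increment is derived from the residual (e.g., one Gauss-Newton step), which the patent leaves unspecified:

```python
def iterate_pose(depth_residual: float, first_depth_value: float,
                 compute_increment, threshold: float, max_iters: int = 100) -> float:
    """Literal transcription of the iterative step: accumulate increments until
    the residual drops below the preset threshold."""
    residual = depth_residual
    accumulated_increment = 0.0
    for _ in range(max_iters):
        increment = compute_increment(residual)   # pose estimation increment
        if residual < threshold:
            # accumulate the increment with the first depth value -> pose estimate
            return first_depth_value + accumulated_increment + increment
        accumulated_increment += increment
        residual = accumulated_increment          # accumulated increment becomes
                                                  # the new depth residual
    return first_depth_value + accumulated_increment
```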
  • the pose transformation parameter of the depth sensor may be determined according to the following formula:
  • ⁇ * arg min ⁇
  • ⁇ * is an estimated value of the pose transformation parameter
  • Z 2 is a second depth image
  • P 1 is a pixel point in the first depth image
  • P 2 ' is a corresponding pixel point of P 1 in the second depth image
  • Z 2 (P 2 ') is the depth value of the P 2 ' point in the second depth image (ie, the first depth value)
  • T is the Lie group
  • is the pose transformation parameter
  • T( ⁇ ) is the preset bit.
  • ⁇ -1 is the ⁇ -1 map in the pinhole camera model
  • [T( ⁇ ) ⁇ -1 (P 1 )] Z is the point where P 1 is transformed into a three-dimensional space point by ⁇ mapping and then The space is transformed to the depth value of the pixel point obtained by the ⁇ -1 mapping of the coordinate system to which the second depth image belongs
  • the arg min function is such that ⁇
  • the ⁇ value at the minimum value is denoted as ⁇ *.
  • the pose estimation method provided by the above embodiment of the present application uses the depth residual between the depth images to solve for the pose transformation parameters, which avoids the complicated process of extracting feature points and establishing descriptors in the prior art, saves computing resources, and ensures the real-time performance of the calculation.
  • the pose estimation method of the embodiment may further include the following steps after obtaining the pose transformation parameter of the depth sensor:
  • Step 601 Obtain angular velocity and acceleration from an inertial measurement device physically bound to the depth sensor.
  • an inertial measurement unit may be physically bound to the depth sensor.
  • the above physical binding can be understood as the inertial measurement device being aligned with the center of the depth sensor and fixed together with it.
  • the inertial measurement device measures the angular velocity and acceleration of the object's movement.
  • the server performing the pose estimation method can acquire the angular velocity and the acceleration from the inertial measurement device by wire or wirelessly.
  • Step 602 Determine a pose transformation parameter of the inertial measurement device according to the angular velocity and the acceleration.
  • the server may determine the pose transformation parameters of the inertial measurement device after acquiring the angular velocity and acceleration described above.
  • step 602 may be implemented by the following steps not shown in FIG. 6:
  • determining the first pose transformation parameter from the angular velocity and determining the second pose transformation parameter from the acceleration are well known to those skilled in the art, and details are not described herein again.
  • after the first pose transformation parameter and the second pose transformation parameter are obtained, the two can be fused to determine the pose transformation parameter of the inertial measurement device.
  • Step 603 Fuse the pose transformation parameter of the depth sensor and the pose transformation parameter of the inertial measurement device to determine the integrated pose transformation parameter.
  • various coupling methods can be used to fuse the two and determine the integrated pose transformation parameters.
  • the acceleration and angular velocity may be first filtered prior to step 602.
  • a complementary filter can be used to remove noise from the acceleration and angular velocity, improving the accuracy of the pose transformation parameters.
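  • as a minimal single-axis sketch of such a complementary filter (the blend factor alpha is a hypothetical tuning parameter; the patent gives no formula):

```python
def complementary_filter(prev_angle: float, gyro_rate: float,
                         accel_angle: float, dt: float, alpha: float = 0.98) -> float:
    """Blend high-frequency gyro integration with the low-frequency
    accelerometer tilt estimate to suppress noise and drift."""
    return alpha * (prev_angle + gyro_rate * dt) + (1.0 - alpha) * accel_angle
```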
  • the pose estimation method provided by the above embodiment of the present application can improve the accuracy of the pose estimation parameter.
  • the present application provides an embodiment of a pose estimation apparatus, the apparatus embodiment corresponding to the method embodiment shown in FIG. 2; the apparatus can be specifically applied to a variety of electronic devices.
  • the pose estimating apparatus 700 of the present embodiment includes a first acquiring unit 701, an image selecting unit 702, a pixel point set determining unit 703, and a first parameter determining unit 704.
  • the first obtaining unit 701 is configured to acquire a depth image video from the depth sensor.
  • the image selecting unit 702 is configured to select the first depth image and the second depth image from each frame depth image of the depth image video.
  • At least one pixel point indicating the same object is shared in the first depth image and the second depth image.
  • the pixel point set determining unit 703 is configured to determine a first pixel point set in the first depth image and a second pixel point set in the second depth image.
  • the pixel points in the first pixel point set are in one-to-one correspondence with the pixel points in the second pixel point set, and the corresponding two pixel points indicate the same object.
  • the first parameter determining unit 704 is configured to, for any pixel point in the first pixel point set, determine the pose transformation parameter of the depth sensor based on the first two-dimensional coordinates of the pixel point in the first depth image and the first depth value of the corresponding pixel point in the second depth image.
  • the foregoing apparatus 700 may further include a pre-processing unit not shown in FIG. 7.
  • the pre-processing unit includes a pixel point deletion module and a smoothing module.
  • the pixel point deletion module is configured to, before the image selecting unit 702 selects the first depth image and the second depth image from the frame depth images, delete, for each frame depth image, the pixel points in the frame depth image that meet the preset condition.
  • the smoothing module is configured to smooth the depth image after the deletion.
  • the pixel point deletion module may be further configured to: detect the depth value of each pixel point; and delete the pixel points whose depth value is greater than the first preset value and whose depth value is smaller than the second preset value.
  • the pixel point deletion module may be further configured to: determine a first partial derivative of the frame depth image in the horizontal direction and a second partial derivative in the vertical direction; determine the geometric edge pixel points in the frame depth image according to the first partial derivative and the second partial derivative; and delete the geometric edge pixel points.
  • the pixel point deletion module may be further configured to: determine the failed pixel points in the frame depth image for which no depth value exists; and delete the failed pixel points and the pixel points adjacent to the failed pixel points.
  • the first parameter determining unit 704 may further include a first mapping module, a transformation module, a second mapping module, a depth value determining module, and a first parameter determining module.
  • the first mapping module is configured to map the first two-dimensional coordinates of the pixel in the first depth image to the first three-dimensional coordinate of the coordinate system to which the first depth image belongs.
  • a transform module configured to transform the first three-dimensional space coordinate to a coordinate system to which the second depth image belongs, to obtain a second three-dimensional space coordinate.
  • the second mapping module is configured to map the second three-dimensional space coordinates into the second depth image to obtain the second two-dimensional coordinates.
  • a depth value determining module configured to determine a second depth value of the second two-dimensional coordinate in the second depth image.
  • the first parameter determining module is configured to determine a pose transformation parameter of the depth sensor based on the first depth value and the second depth value.
  • the first parameter determining module may be further configured to: determine the depth difference between the first depth image and the second depth image according to the first depth value and the second depth value; determine the depth difference as the depth residual and, based on the depth residual, perform the following iterative steps: determining a pose estimation increment based on the depth residual; determining whether the depth residual is less than a preset threshold; in response to the depth residual being less than the preset threshold, accumulating the pose estimation increment and the first depth value to determine a pose estimation value, and determining the pose transformation parameter of the depth sensor according to the pose estimation value; and, in response to the depth residual being greater than or equal to the preset threshold, determining the accumulated pose estimation increment as the depth residual and continuing the above iterative steps.
  • the pose estimating apparatus 700 may further include a second acquiring unit, a second parameter determining unit, and a parameter fusion unit not shown in FIG. 7.
  • a second acquisition unit is configured to acquire angular velocity and acceleration from an inertial measurement device physically bound to the depth sensor.
  • the second parameter determining unit is configured to determine a pose transformation parameter of the inertial measurement device according to the angular velocity and the acceleration.
  • the parameter fusion unit is configured to combine the pose transformation parameter of the depth sensor and the pose transformation parameter of the inertial measurement device to determine the integrated pose transformation parameter.
  • the second parameter determining unit may further include a first sub-parameter determining module, a second sub-parameter determining module, and a converging module, which are not illustrated in FIG. 7 .
  • the first sub-parameter determining module is configured to determine a first pose change parameter of the inertial measurement device according to the angular velocity.
  • the second sub-parameter determining module is configured to determine a second pose change parameter of the inertial measurement device according to the acceleration.
  • the fusion module is configured to combine the first pose transformation parameter and the second pose transformation parameter to determine a pose transformation parameter of the inertial measurement device.
  • the pose estimating apparatus provided by the above embodiment of the present application acquires the depth image video collected by the depth sensor, selects two depth images sharing at least one pixel point indicating the same object, determines the mutually corresponding first pixel point set and second pixel point set in the two frame depth images, and then, for each pixel point in the first pixel point set, determines the pose transformation parameter of the depth sensor according to the first two-dimensional coordinates of the pixel point in the first depth image and the first depth value of the corresponding pixel point in the second depth image, which reduces the consumption of computing resources, improves the calculation efficiency, and ensures the real-time performance of the pose estimation.
  • the units 701 to 704 described in the pose estimating apparatus 700 correspond to respective steps in the method described with reference to FIG. 2, respectively.
  • the operations and features described above for the pose estimation method are equally applicable to the apparatus 700 and the units contained therein, and are not described herein again.
  • Corresponding units of device 700 may cooperate with units in the server to implement the solution of the embodiments of the present application.
  • Referring to FIG. 8, a block diagram of a computer system 800 suitable for implementing the server of an embodiment of the present application is shown.
  • the server shown in FIG. 8 is merely an example, and should not impose any limitation on the function and scope of use of the embodiments of the present application.
  • the computer system 800 includes a central processing unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage portion 808 into a random access memory (RAM) 803.
  • in the RAM 803, various programs and data required for the operation of the system 800 are also stored.
  • the CPU 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804.
  • An input/output (I/O) interface 805 is also coupled to bus 804.
  • the following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like; a storage portion 808 including a hard disk and the like; and a communication portion 809 including a network interface card such as a LAN card or a modem. The communication portion 809 performs communication processing via a network such as the Internet.
  • a drive 810 is also coupled to the I/O interface 805 as needed.
  • a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like, is mounted on the drive 810 as needed so that a computer program read therefrom is installed into the storage portion 808 as needed.
  • an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a machine readable medium, the computer program comprising program code for executing the method illustrated in the flowchart.
  • the computer program can be downloaded and installed from the network via communication portion 809, and/or installed from removable media 811.
  • when the computer program is executed by the central processing unit (CPU) 801, the above-described functions defined in the method of the present application are performed.
  • the computer readable medium described herein may be a computer readable signal medium or a computer readable storage medium or any combination of the two.
  • the computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device.
  • a computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, carrying computer readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
  • each block of the flowcharts or block diagrams can represent a module, a program segment, or a portion of code that includes one or more executable instructions for implementing the specified logic functions.
  • it should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two successively represented blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts can be implemented in a dedicated hardware-based system that performs the specified function or operation. Or it can be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present application may be implemented by software or by hardware.
  • the described units may also be disposed in a processor, for example described as: a processor including a first acquiring unit, an image selecting unit, a pixel point set determining unit, and a first parameter determining unit.
  • the names of these units do not constitute a limitation on the unit itself under certain circumstances.
  • the first acquisition unit may also be described as “a unit that acquires depth image video from the depth sensor”.
  • the present application also provides a computer readable medium, which may be included in the apparatus described in the above embodiments, or may be separately present and not incorporated into the apparatus.
  • the computer readable medium carries one or more programs that, when executed by the device, cause the device to: acquire a depth image video from a depth sensor; select a first depth image and a second depth image from the frame depth images of the depth image video, wherein the first depth image and the second depth image share at least one pixel point indicating the same object; determine a first pixel point set in the first depth image and a second pixel point set in the second depth image, wherein the pixel points in the first pixel point set are in one-to-one correspondence with the pixel points in the second pixel point set, and the corresponding two pixel points indicate the same object; and, for any pixel point in the first pixel point set, determine the pose transformation parameter of the depth sensor based on the first two-dimensional coordinates of the pixel point in the first depth image and the first depth value of the corresponding pixel point in the second depth image.

Abstract

Disclosed in the present invention are a pose estimation method and an apparatus. A specific embodiment of the method comprises: acquiring a range image video from a range sensor; selecting from among the individual range image frames a first range image and a second range image, the first range image and the second range image sharing at least one pixel point indicating the same object; determining a first pixel point set of the first range image and a second pixel point set of the second range image, each pixel point of the first pixel point set corresponding one-to-one with a pixel point of the second pixel point set, two corresponding pixel points indicating the same object; and, for any pixel point of the first pixel point set, determining a pose transformation parameter on the basis of the first two-dimensional coordinates of the pixel point in the first range image and a first range value of the corresponding pixel point in the second range image. The present embodiment decreases consumption of computational resources, increasing computational efficiency and ensuring real-time pose estimation.

Description

位姿估计方法和装置Position estimation method and device
相关申请的交叉引用Cross-reference to related applications
本专利申请要求于2017年5月9日提交的、申请号为201710321322.X、申请人为北京京东尚科信息技术有限公司和北京京东世纪贸易有限公司、发明名称为“位姿估计方法和装置”的中国专利申请的优先权,该申请的全文以引用的方式并入本申请中。This patent application is filed on May 9, 2017, the application number is 201710321322.X, the applicant is Beijing Jingdong Shangke Information Technology Co., Ltd. and Beijing Jingdong Century Trading Co., Ltd., and the invention name is “Position Estimation Method and Device”. Priority of the Chinese Patent Application, the entire contents of which is hereby incorporated by reference.
技术领域Technical field
本申请涉及计算机视觉技术领域,具体涉及位姿估计领域,尤其涉及一种位姿估计方法和装置。The present application relates to the field of computer vision technology, and in particular to the field of pose estimation, and in particular to a pose estimation method and apparatus.
背景技术Background technique
近年来随着传感器、处理器等硬件的快速更新换代和定位、重建、学习等先进算法涌现,推动了无人机和机器人等相关行业的快速发展,这些行业相关的核心技术主要包括:位姿估计、三维重建、路径规划、机器学习等。位姿估计尤其是视觉位姿估计涉及到图像处理、计算机视觉、惯性导航、数理统计、最优化等多个学科的知识,是许多新兴产业和行业的基础技术,在人们当前及今后的生产和生活中将发挥重要作用。In recent years, with the rapid updating of sensors, processors and other hardware and the emergence of advanced algorithms such as positioning, reconstruction, and learning, the rapid development of related industries such as drones and robots has been promoted. The core technologies related to these industries mainly include: poses. Estimation, 3D reconstruction, path planning, machine learning, etc. Pose estimation, especially visual pose estimation, involves knowledge of many disciplines such as image processing, computer vision, inertial navigation, mathematical statistics, optimization, etc. It is the basic technology of many emerging industries and industries, in current and future production and Life will play an important role.
现有的位姿估计方法,通常需要提取图像的特征点,并建立描述子,因此具有消耗大量计算资源的特点,位姿估计的实时性较差。The existing pose estimation method usually needs to extract the feature points of the image and establish a descriptor, so it has the characteristics of consuming a large amount of computing resources, and the real-time performance of the pose estimation is poor.
发明内容Summary of the invention
本申请的目的在于提出一种位姿估计方法和装置,来解决以上背景技术部分提到的技术问题。The purpose of the present application is to propose a pose estimation method and apparatus to solve the technical problems mentioned in the background art section above.
第一方面,本申请实施例提供了一种位姿估计方法,从深度传感器处获取深度图像视频;从所述深度图像视频的各帧深度图像中选取 第一深度图像和第二深度图像,其中,所述第一深度图像和所述第二深度图像中共有至少一个指示同一物体的像素点;确定所述第一深度图像中的第一像素点集合以及所述第二深度图像中的第二像素点集合,其中,所述第一像素点集合中的各像素点与所述第二像素点集合中的各像素点一一对应,且对应的两个像素点指示同一物体;对于所述第一像素点集合中的任一像素点,基于该像素点在所述第一深度图像中的第一二维坐标及与该像素点对应的对应像素点在所述第二深度图像中的第一深度值,确定所述深度传感器的位姿变换参数。In a first aspect, an embodiment of the present application provides a pose estimation method for acquiring a depth image video from a depth sensor, and selecting a first depth image and a second depth image from each frame depth image of the depth image video, where Determining at least one pixel point indicating the same object in the first depth image and the second depth image; determining a first pixel point set in the first depth image and a second one in the second depth image a set of pixel points, wherein each pixel point in the first set of pixel points is in one-to-one correspondence with each pixel point in the second set of pixel points, and corresponding two pixel points indicate the same object; Any one of the pixels in the set of pixels, based on the first two-dimensional coordinates of the pixel in the first depth image and the first pixel corresponding to the pixel in the second depth image a depth value that determines a pose transformation parameter of the depth sensor.
在一些实施例中,在所述从所述各帧深度图像中选取第一深度图像和第二深度图像之前,所述方法还包括:对于每帧深度图像,删除该帧深度图像中符合预设条件的像素点;对删除后的深度图像进行平滑处理。In some embodiments, before the selecting the first depth image and the second depth image from the frame depth images, the method further includes deleting, for each frame depth image, the preset in the frame depth image The pixel of the condition; smoothing the deleted depth image.
在一些实施例中,所述删除该帧深度图像中符合预设条件的像素点,包括:检测每个像素点的深度值;将深度值大于第一预设值,且深度值小于第二预设值的像素点删除。In some embodiments, the deleting the pixel points in the frame depth image that meet the preset condition comprises: detecting a depth value of each pixel point; and the depth value is greater than the first preset value, and the depth value is less than the second pre- The pixel of the set value is deleted.
在一些实施例中,所述删除该帧深度图像中符合预设条件的像素点,包括:确定该帧深度图像在水平方向的第一偏导数和在竖直方向上的第二偏导数;根据所述第一偏导数和所述第二偏导数,确定该帧深度图像中的几何边缘像素点;将所述几何边缘像素点删除。In some embodiments, the deleting a pixel point in the frame depth image that meets a preset condition comprises: determining a first partial derivative of the frame depth image in a horizontal direction and a second partial derivative in a vertical direction; Determining, by the first partial derivative and the second partial derivative, a geometric edge pixel in the frame depth image; deleting the geometric edge pixel.
在一些实施例中,所述删除该帧深度图像中符合预设条件的像素点,包括:确定该帧深度图像中深度值不存在的失效像素点;将所述失效像素点以及与所述失效像素点相邻的像素点删除。In some embodiments, the deleting a pixel point in the frame depth image that meets a preset condition includes: determining a failed pixel point that the depth value does not exist in the frame depth image; and the invalidating pixel point and the invalidation The pixels adjacent to the pixel are deleted.
In some embodiments, determining the pose transformation parameter of the depth sensor based on the first two-dimensional coordinates of the pixel point in the first depth image and the first depth value of the corresponding pixel point in the second depth image includes: mapping the first two-dimensional coordinates of the pixel point in the first depth image to first three-dimensional space coordinates in the coordinate system to which the first depth image belongs; transforming the first three-dimensional space coordinates into the coordinate system to which the second depth image belongs to obtain second three-dimensional space coordinates; mapping the second three-dimensional space coordinates into the second depth image to obtain second two-dimensional coordinates; determining a second depth value of the second two-dimensional coordinates in the second depth image; and determining the pose transformation parameter of the depth sensor based on the first depth value and the second depth value.
In some embodiments, determining the pose transformation parameter of the depth sensor based on the first depth value and the second depth value includes: determining a depth difference between the first depth image and the second depth image according to the first depth value and the second depth value; taking the depth difference as a depth residual and, based on the depth residual, performing the following iterative steps: determining a pose estimation increment based on the depth residual; determining whether the depth residual is less than a preset threshold; in response to the depth residual being less than the preset threshold, accumulating the pose estimation increment with the first depth value to determine a pose estimate, and determining the pose transformation parameter of the depth sensor according to the pose estimate; and, in response to the depth residual being greater than or equal to the preset threshold, taking the accumulated pose estimation increments as the depth residual and continuing the iterative steps.
In some embodiments, the method further includes: acquiring an angular velocity and an acceleration from an inertial measurement device physically bound to the depth sensor; determining a pose transformation parameter of the inertial measurement device according to the angular velocity and the acceleration; and fusing the pose transformation parameter of the depth sensor with the pose transformation parameter of the inertial measurement device to determine an integrated pose transformation parameter.
In some embodiments, determining the pose transformation parameter of the inertial measurement device according to the angular velocity and the acceleration includes: determining a first pose transformation parameter of the inertial measurement device according to the angular velocity; determining a second pose transformation parameter of the inertial measurement device according to the acceleration; and fusing the first pose transformation parameter with the second pose transformation parameter to determine the pose transformation parameter of the inertial measurement device.
In a second aspect, an embodiment of the present application provides a pose estimation apparatus, including: a first acquisition unit configured to acquire a depth image video from a depth sensor; an image selection unit configured to select a first depth image and a second depth image from the frames of the depth image video, where the first depth image and the second depth image share at least one pixel point indicating the same object; a pixel point set determination unit configured to determine a first pixel point set in the first depth image and a second pixel point set in the second depth image, where the pixel points in the first set correspond one-to-one with the pixel points in the second set and each corresponding pair of pixel points indicates the same object; and a first parameter determination unit configured to determine, for any pixel point in the first pixel point set, a pose transformation parameter of the depth sensor based on the first two-dimensional coordinates of that pixel point in the first depth image and the first depth value of the corresponding pixel point in the second depth image.
In some embodiments, the apparatus further includes a preprocessing unit comprising a pixel point deletion module and a smoothing module. The pixel point deletion module is configured to delete, for each frame of depth image and before the image selection unit selects the first depth image and the second depth image, the pixel points in that frame that meet a preset condition; the smoothing module is configured to smooth the depth image after the deletion.
In some embodiments, the pixel point deletion module is further configured to: detect the depth value of each pixel point; and delete the pixel points whose depth value is greater than the first preset value and less than the second preset value.
In some embodiments, the pixel point deletion module is further configured to: determine the first partial derivative of the frame of depth image in the horizontal direction and the second partial derivative in the vertical direction; determine the geometric edge pixel points in the frame according to the first partial derivative and the second partial derivative; and delete the geometric edge pixel points.
In some embodiments, the pixel point deletion module is further configured to: determine the invalid pixel points in the frame of depth image for which no depth value exists; and delete the invalid pixel points and the pixel points adjacent to the invalid pixel points.
In some embodiments, the first parameter determination unit includes: a first mapping module configured to map the first two-dimensional coordinates of the pixel point in the first depth image to first three-dimensional space coordinates in the coordinate system to which the first depth image belongs; a transformation module configured to transform the first three-dimensional space coordinates into the coordinate system to which the second depth image belongs to obtain second three-dimensional space coordinates; a second mapping module configured to map the second three-dimensional space coordinates into the second depth image to obtain second two-dimensional coordinates; a depth value determination module configured to determine a second depth value of the second two-dimensional coordinates in the second depth image; and a first parameter determination module configured to determine the pose transformation parameter of the depth sensor based on the first depth value and the second depth value.
In some embodiments, the first parameter determination module is further configured to: determine the depth difference between the first depth image and the second depth image according to the first depth value and the second depth value; take the depth difference as the depth residual and, based on the depth residual, perform the following iterative steps: determine a pose estimation increment based on the depth residual; determine whether the depth residual is less than a preset threshold; in response to the depth residual being less than the preset threshold, accumulate the pose estimation increment with the first depth value to determine a pose estimate, and determine the pose transformation parameter of the depth sensor according to the pose estimate; and, in response to the depth residual being greater than or equal to the preset threshold, take the accumulated pose estimation increments as the depth residual and continue the iterative steps.
In some embodiments, the apparatus further includes: a second acquisition unit configured to acquire an angular velocity and an acceleration from an inertial measurement device physically bound to the depth sensor; a second parameter determination unit configured to determine a pose transformation parameter of the inertial measurement device according to the angular velocity and the acceleration; and a parameter fusion unit configured to fuse the pose transformation parameter of the depth sensor with the pose transformation parameter of the inertial measurement device to determine an integrated pose transformation parameter.
In some embodiments, the second parameter determination unit includes: a first sub-parameter determination module configured to determine a first pose transformation parameter of the inertial measurement device according to the angular velocity; a second sub-parameter determination module configured to determine a second pose transformation parameter of the inertial measurement device according to the acceleration; and a fusion module configured to fuse the first pose transformation parameter with the second pose transformation parameter to determine the pose transformation parameter of the inertial measurement device.
In a third aspect, an embodiment of the present application provides a server, including: one or more processors; and a storage device storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the above embodiments.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, the program implementing the method described in any of the above embodiments when executed by a processor.
The pose estimation method and apparatus provided by the present application acquire, from a depth sensor, the depth image video it has collected; select from it two frames that share at least one pixel point indicating the same object; determine the mutually corresponding first pixel point set and second pixel point set in those two frames; and then, for each pixel point in the first set, determine the pose transformation parameter of the depth sensor according to its first two-dimensional coordinates in the first depth image and the first depth value of the corresponding pixel point in the second depth image. By performing pose estimation directly on depth images, the method reduces the consumption of computing resources, improves computational efficiency, and ensures real-time pose estimation.
BRIEF DESCRIPTION OF THE DRAWINGS
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
FIG. 1 is an exemplary system architecture diagram to which the present application may be applied;
FIG. 2 is a flowchart of an embodiment of a pose estimation method according to the present application;
FIG. 3 is a schematic diagram of an application scenario of the pose estimation method according to the present application;
FIG. 4 is a flowchart of determining the pose transformation parameter of the depth sensor in the pose estimation method according to the present application;
FIG. 5 is a schematic diagram of the pose transformation principle in the pose estimation method according to the present application;
FIG. 6 is a flowchart of another embodiment of the pose estimation method according to the present application;
FIG. 7 is a schematic structural diagram of an embodiment of a pose estimation apparatus according to the present application;
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing the server of an embodiment of the present application.
DETAILED DESCRIPTION
The present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the relevant invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 in which embodiments of the pose estimation method or pose estimation apparatus of the present application may be applied.
As shown in FIG. 1, the system architecture 100 may include a depth sensor 101, a network 102, and a server 103. The network 102 serves as the medium providing a communication link between the depth sensor 101 and the server 103 and may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
The depth sensor 101 interacts with the server 103 through the network 102, for example to send depth image video. The depth sensor 101 may be mounted on various moving objects, for example unmanned vehicles, robots, unmanned delivery vehicles, smart wearable devices, or virtual reality devices.
The depth sensor 101 may be any depth sensor capable of continuously capturing multiple frames of depth images.
The server 103 may be a server providing various services, for example a back-end server that processes the depth image video captured by the depth sensor 101. The back-end server may analyze and otherwise process the received depth image video and other data.
It should be noted that the pose estimation method provided by the embodiments of the present application is generally performed by the server 103; accordingly, the pose estimation apparatus is generally disposed in the server 103.
It should be understood that the numbers of depth sensors, networks, and servers in FIG. 1 are merely illustrative. Any number of depth sensors, networks, and servers may be provided according to implementation needs.
With continued reference to FIG. 2, a flow 200 of an embodiment of a pose estimation method according to the present application is shown. The pose estimation method of this embodiment includes the following steps:
Step 201: acquire a depth image video from the depth sensor.
In this embodiment, the electronic device on which the pose estimation method runs (for example, the server shown in FIG. 1) may acquire the depth image video from the depth sensor through a wired or wireless connection. Each frame of the depth image video is a depth image. A depth image, also called a range image, is an image whose pixel values are the distances (depths) from the image collector to points in the scene; it directly reflects the geometry of the visible surfaces of the scene. Each pixel point in a depth image represents the distance, within the depth sensor's field of view, between the object at particular coordinates and the camera plane of the depth sensor.
It should be noted that the above wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, and UWB (ultra wideband) connections, as well as other wireless connection methods now known or developed in the future.
Step 202: select a first depth image and a second depth image from the frames of the depth image video.
The first depth image and the second depth image share at least one pixel point indicating the same object; in other words, at least one object appears in both images. For example, the first depth image and the second depth image may be two adjacent frames of the depth image video, or two frames whose sequence numbers in the video differ by less than a preset value.
Step 203: determine a first pixel point set in the first depth image and a second pixel point set in the second depth image.
The pixel points in the first set correspond one-to-one with the pixel points in the second set, and each corresponding pair of pixel points indicates the same object; more specifically, the same location on the same object. It can be understood that the first and second pixel point sets contain equal numbers of pixel points, and that this number equals the number of pixel points indicating the same object that the first and second depth images share.
Step 204: for any pixel point in the first pixel point set, determine the pose transformation parameter of the depth sensor based on the first two-dimensional coordinates of that pixel point in the first depth image and the first depth value, in the second depth image, of the corresponding pixel point.
The server may determine the pose transformation parameter of the depth sensor based on the first two-dimensional coordinates, in the first depth image, of each pixel point in the first set and on the first depth value, in the second depth image, of the corresponding pixel point in the second set. It can be understood that the first two-dimensional coordinates are the coordinates of the pixel point in the image coordinate system of the first depth image and do not include its depth value. The corresponding pixel point belongs to the second pixel point set; since every pixel point has a depth value, the first depth value of the corresponding pixel point can be determined from the second depth image. The pose transformation parameter may be the pose transformation parameter between the first depth image and the second depth image.
With continued reference to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the pose estimation method of this embodiment. In the scenario of FIG. 3, a depth sensor 301 is mounted on an unmanned vehicle 302. As the vehicle travels, the depth sensor 301 captures depth image video and sends it to a server 303. After receiving the video and determining the pose transformation parameter of the depth sensor 301, the server 303 sends this parameter to the unmanned vehicle 302, which can then use it for navigation and obstacle avoidance.
The pose estimation method provided by the above embodiment of the present application acquires the depth image video collected by a depth sensor, selects from it two frames sharing at least one pixel point indicating the same object, determines the mutually corresponding first and second pixel point sets in those frames, and then, for each pixel point in the first set, determines the pose transformation parameter of the depth sensor according to its first two-dimensional coordinates in the first depth image and the first depth value of the corresponding pixel point in the second depth image. This reduces the consumption of computing resources, improves computational efficiency, and ensures real-time pose estimation.
In some optional implementations of this embodiment, the method further includes the following steps, not shown in FIG. 2:
For each frame of depth image, delete the pixel points in that frame that meet a preset condition, and smooth the depth image after the deletion.
A depth sensor typically emits probing light (for example infrared, laser, or radar) and receives the probing light reflected from object surfaces to determine the distance between the objects and the sensor. Because of occlusion and because object surfaces absorb or diffusely reflect the probing light, the sensor cannot fully receive the reflected light, so many pixel points in a depth image lack a depth value or carry an inaccurate one. In this implementation, to ensure the accuracy of the pose estimation, the pixel points in each frame that meet the preset condition are deleted. At the same time, to improve the robustness of the depth values and suppress their noise, the depth image from which pixel points have been deleted may be smoothed. The smoothing may include linear smoothing, interpolation smoothing, convolution smoothing, Gaussian filtering, bilateral filtering, and the like.
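As one illustration of the last option (a sketch, not part of the claimed embodiments), a bilateral filter suppresses noise while preserving depth discontinuities. The sketch below assumes OpenCV, a single-channel float32 depth image with deleted pixels marked NaN, and placeholder filter parameters:

```python
import cv2
import numpy as np

def smooth_depth(depth: np.ndarray) -> np.ndarray:
    """Edge-preserving smoothing of a float32 depth image (deleted pixels = NaN)."""
    invalid = ~np.isfinite(depth)
    # Fill NaNs with 0 so the filter has valid input; restore them afterwards.
    filled = np.where(invalid, 0.0, depth).astype(np.float32)
    # d, sigmaColor, sigmaSpace are illustrative values, not taken from the patent.
    smoothed = cv2.bilateralFilter(filled, d=5, sigmaColor=0.1, sigmaSpace=5.0)
    smoothed[invalid] = np.nan  # keep deleted pixels deleted
    return smoothed
```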
In some optional implementations of this embodiment, the depth value of each pixel point may first be detected, and the pixel points whose depth value is greater than a first preset value and less than a second preset value may be deleted.
Due to the limitations of the depth sensor itself, pixel points whose depth value lies between the first preset value and the second preset value have very high uncertainty, so these pixel points need to be deleted. It can be understood that the first preset value and the second preset value depend on the model of the depth sensor, which this implementation does not limit.
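A minimal sketch of this range-based deletion, assuming a float32 depth image in meters; the thresholds are placeholders, and "deleted" pixels are marked NaN rather than removed from the array:

```python
import numpy as np

def delete_unreliable_range(depth: np.ndarray,
                            first_preset: float,
                            second_preset: float) -> np.ndarray:
    """Invalidate pixels whose depth lies in the unreliable (first, second) band."""
    out = depth.copy()
    unreliable = (out > first_preset) & (out < second_preset)
    out[unreliable] = np.nan  # "delete" by invalidating the depth value
    return out
```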
In some optional implementations of this embodiment, a first partial derivative Zu of each frame of depth image in the horizontal direction u and a second partial derivative Zv in the vertical direction v may be determined; the geometric edge pixel points in the frame are then determined from Zu and Zv, and the determined geometric edge pixel points are deleted.
Because the probing-light emitter and the probing-light receiver of the depth sensor are not at the same position, the depth values of the pixel points at object edges are highly uncertain, and the depth values of the pixel points on either side of such edges jump. To ensure the accuracy of the pose estimation, these geometric edge pixel points can be deleted.
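The sketch below detects such edges from the gradient magnitude of the depth image; np.gradient supplies the partial derivatives Zv and Zu along the image rows and columns, and the threshold value is an assumed placeholder:

```python
import numpy as np

def delete_geometric_edges(depth: np.ndarray, grad_thresh: float = 0.1) -> np.ndarray:
    """Invalidate pixels whose depth gradient magnitude marks a geometric edge."""
    Zv, Zu = np.gradient(depth)           # partial derivatives along v (rows) and u (cols)
    edge = np.hypot(Zu, Zv) > grad_thresh  # large depth jumps indicate object edges
    out = depth.copy()
    out[edge] = np.nan
    return out
```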
In some optional implementations of this embodiment, the invalid pixel points for which no depth value exists in each frame of depth image may be determined, and these invalid pixel points, together with the pixel points adjacent to them, are deleted.
If the probing light emitted by the depth sensor's emitter is blocked or absorbed by an object, the receiver cannot receive the light reflected back from the object, so the depth value of the corresponding pixel point cannot be determined; such pixel points are called invalid pixel points. To further improve the accuracy of the pose estimation, the pixel points adjacent to the invalid pixel points are deleted as well.
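One way to remove invalid pixels and their neighbors is to grow the invalid mask by one pixel, as sketched below with SciPy; the one-step dilation with its default 4-neighborhood is an assumption, since the patent does not fix the neighborhood used:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def delete_invalid_and_neighbors(depth: np.ndarray) -> np.ndarray:
    """Invalidate pixels with no depth value plus their adjacent pixels."""
    invalid = ~np.isfinite(depth) | (depth == 0)    # many sensors report 0 for "no return"
    grown = binary_dilation(invalid, iterations=1)  # add the adjacent pixels
    out = depth.astype(np.float32).copy()
    out[grown] = np.nan
    return out
```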
With continued reference to FIG. 4, a flow 400 of determining the pose transformation parameter of the depth sensor in the pose estimation method according to the present application is shown. As shown in FIG. 4, in this embodiment the pose transformation parameter of the depth sensor can be determined through the following steps:
Step 401: map the first two-dimensional coordinates of the pixel point in the first depth image to first three-dimensional space coordinates in the coordinate system to which the first depth image belongs.
For any pixel point in the first pixel point set, its first two-dimensional coordinates (x₁, y₁) in the first depth image may first be determined and then mapped to first three-dimensional space coordinates. During the mapping, the π⁻¹ mapping of the pinhole camera model yields the first three-dimensional space coordinates (x₁′, y₁′, z₁′) in the coordinate system to which the first depth image belongs.
The pinhole camera model comprises the mapping π, which projects a three-dimensional space point to two-dimensional coordinates on the pixel plane, and the mapping π⁻¹, which maps the two-dimensional coordinates of an image point carrying a depth value to a three-dimensional space point.
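A minimal sketch of π and π⁻¹ under the standard pinhole model; the intrinsics fx, fy, cx, cy are assumed values, since the patent does not specify them:

```python
import numpy as np

fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5  # assumed camera intrinsics

def pi_inv(u: float, v: float, z: float) -> np.ndarray:
    """pi^-1: pixel (u, v) with depth z -> 3D point in the camera frame."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

def pi(p: np.ndarray) -> tuple:
    """pi: 3D camera-frame point -> pixel coordinates on the image plane."""
    x, y, z = p
    return fx * x / z + cx, fy * y / z + cy
```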
Step 402: transform the first three-dimensional space coordinates into the coordinate system to which the second depth image belongs, obtaining second three-dimensional space coordinates.
In this embodiment, the pose transformation parameter from the first depth image to the second depth image is denoted T₁→₂; see FIG. 5. In FIG. 5, a pedestrian in the world coordinate system x_w-y_w-z_w is denoted as point P; the depth sensor is on the left at time t₁ and on the right at time t₂. In the depth image obtained at t₁, P appears as point P₁ with coordinates (x₁, y₁) and depth value Z₁, in the coordinate system x_c1-y_c1-z_c1. In the depth image obtained at t₂, P appears as point P₂ with coordinates (x₂, y₂) and depth value Z₂, in the coordinate system x_c2-y_c2-z_c2. The pose transformation parameter between the coordinate systems x_c1-y_c1-z_c1 and x_c2-y_c2-z_c2 is T₁→₂.
The pose transformation parameter T₁→₂ is represented in the Lie algebra se(3) as ξ, and ξ expressed as a matrix is the Lie group element T(ξ). When transforming the first three-dimensional space coordinates into the coordinate system to which the second depth image belongs, a Lie group element T(ξ) can be preset and used to perform the transformation, yielding the second three-dimensional coordinates (x₂′, y₂′, z₂′).
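A sketch of building T(ξ) from ξ and applying it to a point; the se(3) exponential map via Rodrigues' formula is textbook Lie-group machinery under the parameterization ξ = (ρ, φ), not text quoted from the patent:

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_se3(xi):
    """xi = (rho, phi) in R^6 -> 4x4 rigid transform T(xi)."""
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    W = hat(phi)
    if theta < 1e-9:
        R, V = np.eye(3) + W, np.eye(3)
    else:
        R = (np.eye(3) + np.sin(theta) / theta * W
             + (1 - np.cos(theta)) / theta**2 * W @ W)
        V = (np.eye(3) + (1 - np.cos(theta)) / theta**2 * W
             + (theta - np.sin(theta)) / theta**3 * W @ W)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, V @ rho
    return T

def transform(T, p):
    """Apply a 4x4 rigid transform to a 3D point."""
    return (T @ np.append(p, 1.0))[:3]
```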
Step 403: map the second three-dimensional space coordinates into the second depth image, obtaining second two-dimensional coordinates.
In this embodiment, the second two-dimensional coordinates (x₂, y₂) can be obtained using the π mapping of the pinhole camera model.
Step 404: determine the second depth value of the second two-dimensional coordinates in the second depth image.
The second depth value of the second two-dimensional coordinates (x₂, y₂) is determined in the second depth image. Ideally, this second depth value equals the first depth value of the corresponding pixel point in the second pixel point set; because of noise, however, the two usually differ.
Step 405: determine the pose transformation parameter of the depth sensor based on the first depth value and the second depth value.
After the second depth value is obtained, it can be combined with the first depth value to determine the pose transformation parameter of the depth sensor.
In some optional implementations of this embodiment, step 405 may be implemented through the following steps, not shown in FIG. 4:
Determine the depth difference between the first depth image and the second depth image according to the first depth value and the second depth value; take the depth difference as the depth residual and, based on the depth residual, perform the following iterative steps: determine a pose estimation increment based on the depth residual; determine whether the depth residual is less than a preset threshold; in response to the depth residual being less than the preset threshold, accumulate the pose estimation increment with the first depth value to determine a pose estimate, and determine the pose transformation parameter of the depth sensor according to the pose estimate; in response to the depth residual being greater than or equal to the preset threshold, take the accumulated pose estimation increments as the depth residual and continue the iterative steps.
In this implementation, if the first depth value and the second depth value are not equal, their difference is taken as the depth difference between the first depth image and the second depth image. This depth difference is used as the depth residual, and the iterative steps are performed: a pose estimation increment is determined from the depth residual, and the residual is compared with a preset threshold. If the residual is below the threshold, the pose estimation increment is accumulated with the first depth value to obtain a pose estimate; the difference between this pose estimate and the second depth value is then within an acceptable range, so the pose transformation parameter of the depth sensor can be determined directly from the pose estimate. If the depth residual is greater than or equal to the preset threshold, the pose estimation increments obtained in each iteration are accumulated, the accumulated value is taken as the new depth residual, and the iterative steps continue.
In this implementation, the pose transformation parameter of the depth sensor can be determined according to the following formula:
ξ* = arg min_ξ Σ |Z₂(P₂′) − [T(ξ)·π⁻¹(P₁)]_Z|
where ξ* is the estimate of the pose transformation parameter; Z₂ is the second depth image; P₁ is a pixel point in the first depth image; P₂′ is the pixel point corresponding to P₁ in the second depth image; Z₂(P₂′) is the depth value of the point P₂′ in the second depth image (i.e., the first depth value); T is a Lie group element, ξ is the pose transformation parameter, and T(ξ) is the preset Lie group element of the pose transformation; π⁻¹ is the π⁻¹ mapping of the pinhole camera model; [T(ξ)·π⁻¹(P₁)]_Z is the depth value (i.e., the second depth value) obtained by mapping P₁ to a three-dimensional space point via π⁻¹, transforming it into the coordinate system to which the second depth image belongs, and taking its Z component; and the arg min function denotes the value of ξ that minimizes Σ |Z₂(P₂′) − [T(ξ)·π⁻¹(P₁)]_Z|, denoted ξ*.
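To make the objective concrete, the sketch below evaluates the summed depth residual for a candidate ξ and hands it to a generic optimizer; it reuses pi, pi_inv, exp_se3, and transform from the earlier sketches. The optimizer call stands in for the increment-accumulation iteration described above, whose exact update rule the patent does not give in closed form, and points_1, depths_1, and depth_image_2 are hypothetical inputs:

```python
import numpy as np
from scipy.optimize import minimize

def depth_residual(xi, points_1, depths_1, depth_image_2):
    """Sum of |Z2(P2') - [T(xi) * pi^-1(P1)]_Z| over the pixel correspondences."""
    T = exp_se3(np.asarray(xi))
    total = 0.0
    for (u, v), z1 in zip(points_1, depths_1):
        p_cam2 = transform(T, pi_inv(u, v, z1))  # 3D point in the second camera frame
        u2, v2 = pi(p_cam2)                      # project into the second depth image
        r, c = int(round(v2)), int(round(u2))
        if 0 <= r < depth_image_2.shape[0] and 0 <= c < depth_image_2.shape[1]:
            z2 = depth_image_2[r, c]
            if np.isfinite(z2):
                total += abs(z2 - p_cam2[2])     # [.]_Z is the Z component
    return total

# xi_star approximates the pose transformation parameter estimate:
# res = minimize(depth_residual, np.zeros(6),
#                args=(points_1, depths_1, depth_image_2), method="Nelder-Mead")
```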
The pose estimation method provided by the above embodiment of the present application uses the depth residual between depth images to solve for the pose transformation parameter, avoiding the laborious prior-art process of extracting feature points and building descriptors, saving computing resources, and ensuring real-time computation.
With further reference to FIG. 6, a flow 600 of another embodiment of the pose estimation method according to the present application is shown. As shown in FIG. 6, after the pose transformation parameter of the depth sensor is obtained, the pose estimation method of this embodiment may further include the following steps:
Step 601: acquire an angular velocity and an acceleration from an inertial measurement device physically bound to the depth sensor.
In this embodiment, to further ensure the accuracy of the pose estimation, an inertial measurement unit (IMU) may be physically bound to the depth sensor. Physical binding can be understood as aligning the center of the inertial measurement device with that of the depth sensor and fixing the two together. The inertial measurement device measures the angular velocity and acceleration of a moving object. In this embodiment, the server performing the pose estimation method may acquire the angular velocity and acceleration from the inertial measurement device by wire or wirelessly.
Step 602: determine the pose transformation parameter of the inertial measurement device according to the angular velocity and the acceleration.
After acquiring the angular velocity and acceleration, the server may determine the pose transformation parameter of the inertial measurement device.
In some optional implementations of this embodiment, step 602 may be implemented through the following steps, not shown in FIG. 6:
Determine a first pose transformation parameter of the inertial measurement device according to the angular velocity; determine a second pose transformation parameter of the inertial measurement device according to the acceleration; and fuse the first pose transformation parameter with the second pose transformation parameter to determine the pose transformation parameter of the inertial measurement device.
In this implementation, determining the first pose transformation parameter from the angular velocity and the second pose transformation parameter from the acceleration is well known to those skilled in the art and is not described further here. After the first and second pose transformation parameters are obtained, the two can be fused to determine the pose transformation parameter of the inertial measurement device.
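As one common realization of this fusion (an illustration, not mandated by the patent), a complementary blend combines the gyro-integrated attitude, which drifts slowly, with the accelerometer-derived tilt, which is noisy but drift-free. The single-axis (roll) form below is a simplification, and the blend weight alpha is an assumed tuning parameter:

```python
import numpy as np

def fuse_attitude(angle_prev, gyro_rate, accel, dt, alpha=0.98):
    """Blend the gyro-integrated angle with the accelerometer tilt angle (roll axis)."""
    angle_gyro = angle_prev + gyro_rate * dt      # first estimate: gyro integration
    angle_accel = np.arctan2(accel[1], accel[2])  # second estimate: gravity direction
    return alpha * angle_gyro + (1.0 - alpha) * angle_accel
```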
Step 603: fuse the pose transformation parameter of the depth sensor with the pose transformation parameter of the inertial measurement device to determine an integrated pose transformation parameter.
After the pose transformation parameters of the depth sensor and of the inertial measurement device are obtained, they can be fused using a coupling method (for example loose coupling or tight coupling) to determine the integrated pose transformation parameter.
In some optional implementations of this embodiment, to reduce the influence of noise on the acceleration and angular velocity values, the acceleration and angular velocity may first be filtered before step 602. In this implementation, a complementary filter may be used to remove the noise of the acceleration and angular velocity and improve the accuracy of the pose transformation parameters.
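A sketch of such pre-filtering under common assumptions: a first-order low-pass on the accelerometer, whose noise is mostly high-frequency vibration, paired with a first-order high-pass on the gyroscope, whose error is mostly low-frequency bias drift; the coefficient beta is a placeholder:

```python
def prefilter(accel_prev_f, accel_raw, gyro_prev_f, gyro_prev_raw, gyro_raw, beta=0.9):
    """One step of complementary pre-filtering on raw IMU samples."""
    accel_f = beta * accel_prev_f + (1.0 - beta) * accel_raw  # low-pass on acceleration
    gyro_f = beta * (gyro_prev_f + gyro_raw - gyro_prev_raw)  # high-pass on angular rate
    return accel_f, gyro_f
```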
The pose estimation method provided by the above embodiment of the present application can improve the accuracy of the pose estimation parameters.
With further reference to FIG. 7, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a pose estimation apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 7, the pose estimation apparatus 700 of this embodiment includes: a first acquisition unit 701, an image selection unit 702, a pixel point set determination unit 703, and a first parameter determination unit 704.
The first acquisition unit 701 is configured to acquire a depth image video from the depth sensor.
The image selection unit 702 is configured to select a first depth image and a second depth image from the frames of the depth image video.
The first depth image and the second depth image share at least one pixel point indicating the same object.
The pixel point set determination unit 703 is configured to determine a first pixel point set in the first depth image and a second pixel point set in the second depth image.
The pixel points in the first set correspond one-to-one with the pixel points in the second set, and each corresponding pair of pixel points indicates the same object.
The first parameter determination unit 704 is configured to determine, for any pixel point in the first pixel point set, the pose transformation parameter of the depth sensor based on the first two-dimensional coordinates of that pixel point in the first depth image and the first depth value of the corresponding pixel point in the second depth image.
In some optional implementations of this embodiment, the apparatus 700 may further include a preprocessing unit, not shown in FIG. 7, which includes a pixel point deletion module and a smoothing module.
The pixel point deletion module is configured to delete, for each frame of depth image and before the image selection unit 702 selects the first and second depth images from the frames, the pixel points in that frame that meet a preset condition.
The smoothing module is configured to smooth the depth image after the deletion.
In some optional implementations of this embodiment, the pixel point deletion module may be further configured to: detect the depth value of each pixel point; and delete the pixel points whose depth value is greater than the first preset value and less than the second preset value.
In some optional implementations of this embodiment, the pixel point deletion module may be further configured to: determine the first partial derivative of the frame of depth image in the horizontal direction and the second partial derivative in the vertical direction; determine the geometric edge pixel points in the frame according to the first partial derivative and the second partial derivative; and delete the geometric edge pixel points.
In some optional implementations of this embodiment, the pixel point deletion module may be further configured to: determine the invalid pixel points for which no depth value exists in the frame of depth image; and delete the invalid pixel points and the pixel points adjacent to them.
In some optional implementations of this embodiment, the first parameter determination unit 704 may further include a first mapping module, a transformation module, a second mapping module, a depth value determination module, and a first parameter determination module, none of which is shown in FIG. 7.
The first mapping module is configured to map the first two-dimensional coordinates of the pixel point in the first depth image to first three-dimensional space coordinates in the coordinate system to which the first depth image belongs.
The transformation module is configured to transform the first three-dimensional space coordinates into the coordinate system to which the second depth image belongs, obtaining second three-dimensional space coordinates.
The second mapping module is configured to map the second three-dimensional space coordinates into the second depth image, obtaining second two-dimensional coordinates.
The depth value determination module is configured to determine the second depth value of the second two-dimensional coordinates in the second depth image.
The first parameter determination module is configured to determine the pose transformation parameter of the depth sensor based on the first depth value and the second depth value.
In some optional implementations of this embodiment, the first parameter determination module may be further configured to: determine the depth difference between the first depth image and the second depth image according to the first depth value and the second depth value; take the depth difference as the depth residual and, based on the depth residual, perform the following iterative steps: determine a pose estimation increment based on the depth residual; determine whether the depth residual is less than a preset threshold; in response to the depth residual being less than the preset threshold, accumulate the pose estimation increment with the first depth value to determine a pose estimate, and determine the pose transformation parameter of the depth sensor according to the pose estimate; in response to the depth residual being greater than or equal to the preset threshold, take the accumulated pose estimation increments as the depth residual and continue the iterative steps.
In some optional implementations of this embodiment, the pose estimation apparatus 700 may further include a second acquisition unit, a second parameter determination unit, and a parameter fusion unit, none of which is shown in FIG. 7.
The second acquisition unit is configured to acquire an angular velocity and an acceleration from the inertial measurement device physically bound to the depth sensor.
The second parameter determination unit is configured to determine the pose transformation parameter of the inertial measurement device according to the angular velocity and the acceleration.
The parameter fusion unit is configured to fuse the pose transformation parameter of the depth sensor with the pose transformation parameter of the inertial measurement device to determine an integrated pose transformation parameter.
In some optional implementations of this embodiment, the second parameter determination unit may further include a first sub-parameter determination module, a second sub-parameter determination module, and a fusion module, none of which is shown in FIG. 7.
The first sub-parameter determination module is configured to determine the first pose transformation parameter of the inertial measurement device according to the angular velocity.
The second sub-parameter determination module is configured to determine the second pose transformation parameter of the inertial measurement device according to the acceleration.
The fusion module is configured to fuse the first pose transformation parameter with the second pose transformation parameter to determine the pose transformation parameter of the inertial measurement device.
The pose estimation apparatus provided by the above embodiment of the present application acquires the depth image video collected by a depth sensor, selects from it two frames sharing at least one pixel point indicating the same object, determines the mutually corresponding first and second pixel point sets in those frames, and then, for each pixel point in the first set, determines the pose transformation parameter of the depth sensor according to its first two-dimensional coordinates in the first depth image and the first depth value of the corresponding pixel point in the second depth image, reducing the consumption of computing resources, improving computational efficiency, and ensuring real-time pose estimation.
It should be understood that the units 701 to 704 described in the pose estimation apparatus 700 correspond respectively to the steps of the method described with reference to FIG. 2. Accordingly, the operations and features described above for the pose estimation method apply equally to the apparatus 700 and the units it contains and are not repeated here. The corresponding units of the apparatus 700 may cooperate with units in the server to implement the solutions of the embodiments of the present application.
Referring now to FIG. 8, a schematic structural diagram of a computer system 800 suitable for implementing the server of the embodiments of the present application is shown. The server shown in FIG. 8 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage portion 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the system 800. The CPU 801, the ROM 802, and the RAM 803 are connected to one another through a bus 804, to which an input/output (I/O) interface 805 is also connected.
The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output portion 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 808 including a hard disk and the like; and a communication portion 809 including a network interface card such as a LAN card or a modem. The communication portion 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage portion 808 as needed.
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在机器可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信部分809从网络上被下载和安装,和/或从可拆卸介质811被安装。在该计算机程序被中央处理单元(CPU)801执行时,执行本申请的方法中限定的上述功能。In particular, the processes described above with reference to the flowcharts may be implemented as a computer software program in accordance with an embodiment of the present disclosure. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a machine readable medium, the computer program comprising program code for executing the method illustrated in the flowchart. In such an embodiment, the computer program can be downloaded and installed from the network via communication portion 809, and/or installed from removable media 811. When the computer program is executed by the central processing unit (CPU) 801, the above-described functions defined in the method of the present application are performed.
需要说明的是,本申请所述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、 只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本申请中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本申请中,计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:无线、电线、光缆、RF等等,或者上述的任意合适的组合。It should be noted that the computer readable medium described herein may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device. In the present application, a computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, carrying computer readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing. The computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device. . Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor; for example, they may be described as: a processor comprising a first acquisition unit, an image selection unit, a pixel point set determination unit, and a first parameter determination unit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves; for example, the first acquisition unit may also be described as "a unit that acquires a depth image video from a depth sensor".
In another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs that, when executed by the apparatus, cause the apparatus to: acquire a depth image video from a depth sensor; select a first depth image and a second depth image from the frame depth images of the depth image video, where the first depth image and the second depth image share at least one pixel point indicating the same object; determine a first pixel point set in the first depth image and a second pixel point set in the second depth image, where the pixel points in the first pixel point set are in one-to-one correspondence with the pixel points in the second pixel point set, and each pair of corresponding pixel points indicates the same object; and, for any pixel point in the first pixel point set, determine a pose transformation parameter of the depth sensor based on first two-dimensional coordinates of the pixel point in the first depth image and a first depth value of its corresponding pixel point in the second depth image.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, a technical solution formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (20)

  1. A pose estimation method, characterized in that the method comprises:
    acquiring a depth image video from a depth sensor;
    selecting a first depth image and a second depth image from the frame depth images of the depth image video, wherein the first depth image and the second depth image share at least one pixel point indicating the same object;
    determining a first pixel point set in the first depth image and a second pixel point set in the second depth image, wherein the pixel points in the first pixel point set are in one-to-one correspondence with the pixel points in the second pixel point set, and each pair of corresponding pixel points indicates the same object; and
    for any pixel point in the first pixel point set, determining a pose transformation parameter of the depth sensor based on first two-dimensional coordinates of the pixel point in the first depth image and a first depth value of its corresponding pixel point in the second depth image.
  2. The method according to claim 1, characterized in that, before the selecting a first depth image and a second depth image from the frame depth images, the method further comprises:
    for each frame depth image, deleting pixel points in the frame depth image that meet a preset condition; and
    smoothing the depth image after the deletion.
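A minimal sketch of the smoothing step, assuming a median filter and NaN as the marker for deleted pixels; the application fixes neither the smoothing kernel nor the window size, so `smooth_depth` and `size=5` are illustrative choices only:

```python
import numpy as np
from scipy.ndimage import median_filter

def smooth_depth(depth, size=5):
    """Median-smooth a depth image in which deleted pixels are NaN.

    NaNs are temporarily filled with 0 so the filter can run, then restored,
    so deleted pixels stay deleted and are not smeared into their neighbours.
    (Near large holes the filled zeros bias the median; a production version
    would mask the window contents properly.)
    """
    invalid = ~np.isfinite(depth)
    filled = np.where(invalid, 0.0, depth)
    smoothed = median_filter(filled, size=size)
    smoothed[invalid] = np.nan
    return smoothed
```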
  3. The method according to claim 2, characterized in that the deleting pixel points in the frame depth image that meet a preset condition comprises:
    detecting the depth value of each pixel point; and
    deleting pixel points whose depth value is greater than a first preset value and whose depth value is less than a second preset value.
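A NumPy sketch of this range filter under one plausible reading of the claim, namely that readings outside the sensor's trusted working range are discarded (depth greater than a far limit or less than a near limit); the claim does not fix how the two preset values are ordered, and the function name, limits, and NaN bookkeeping below are assumptions:

```python
import numpy as np

def remove_untrusted_depths(depth, first_preset=8.0, second_preset=0.3):
    """Invalidate depth readings outside a trusted working range.

    Pixels deeper than `first_preset` (taken here as a far limit) or
    shallower than `second_preset` (a near limit) are marked NaN,
    i.e. "deleted".
    """
    filtered = depth.astype(np.float64).copy()
    mask = (filtered > first_preset) | (filtered < second_preset)
    filtered[mask] = np.nan
    return filtered

# Example: keep only readings between 0.3 m and 8.0 m.
depth = np.random.uniform(0.0, 12.0, size=(480, 640))
clean = remove_untrusted_depths(depth)
```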
  4. The method according to claim 2, characterized in that the deleting pixel points in the frame depth image that meet a preset condition comprises:
    determining a first partial derivative of the frame depth image in the horizontal direction and a second partial derivative of the frame depth image in the vertical direction;
    determining geometric edge pixel points in the frame depth image according to the first partial derivative and the second partial derivative; and
    deleting the geometric edge pixel points.
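A sketch of one way to realize this step, assuming a "geometric edge" is a pixel where the magnitude of the depth gradient exceeds a threshold; neither the detector nor the threshold value is fixed by the application:

```python
import numpy as np

def delete_geometric_edges(depth, grad_threshold=0.05):
    """Remove pixels on geometric edges, where depth jumps between
    neighbouring pixels and sensor readings tend to be least reliable."""
    # Central-difference partial derivatives: axis 1 is the horizontal
    # direction (first partial derivative), axis 0 the vertical (second).
    dz_dv, dz_du = np.gradient(depth)
    grad_mag = np.hypot(dz_du, dz_dv)
    edges = grad_mag > grad_threshold      # geometric edge pixel points
    filtered = depth.astype(np.float64).copy()
    filtered[edges] = np.nan               # delete them
    return filtered
```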
  5. The method according to claim 2, characterized in that the deleting pixel points in the frame depth image that meet a preset condition comprises:
    determining invalid pixel points in the frame depth image at which no depth value exists; and
    deleting the invalid pixel points and the pixel points adjacent to the invalid pixel points.
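Deleting the invalid pixels together with their neighbours amounts to dilating the invalid-pixel mask by one step. The sketch assumes a 4-connected neighbourhood and NaN/zero as the "no depth value" marker, neither of which the application specifies:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def delete_invalid_and_neighbours(depth):
    """Remove pixels with no depth reading plus the pixels adjacent to them,
    since readings bordering a hole are typically unreliable too."""
    invalid = ~np.isfinite(depth) | (depth <= 0.0)
    # One binary dilation with the default cross-shaped structuring element
    # also selects the 4-connected neighbours of every invalid pixel.
    to_delete = binary_dilation(invalid)
    filtered = depth.astype(np.float64).copy()
    filtered[to_delete] = np.nan
    return filtered
```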
  6. The method according to claim 1, characterized in that the determining a pose transformation parameter of the depth sensor based on first two-dimensional coordinates of the pixel point in the first depth image and a first depth value of its corresponding pixel point in the second depth image comprises:
    mapping the first two-dimensional coordinates of the pixel point in the first depth image to first three-dimensional space coordinates in the coordinate system to which the first depth image belongs;
    transforming the first three-dimensional space coordinates into the coordinate system to which the second depth image belongs to obtain second three-dimensional space coordinates;
    mapping the second three-dimensional space coordinates into the second depth image to obtain second two-dimensional coordinates;
    determining a second depth value of the second two-dimensional coordinates in the second depth image; and
    determining the pose transformation parameter of the depth sensor based on the first depth value and the second depth value.
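The three mappings in this claim are the standard pinhole back-projection, rigid-body transform, and projection. A sketch assuming known intrinsics (fx, fy, cx, cy) and a candidate pose given as a rotation R and translation t; these symbols are chosen here for illustration and are not taken from the application:

```python
import numpy as np

def reproject_pixel(u, v, z, K, R, t):
    """Map pixel (u, v) with depth z from the first depth image into the
    second depth image, returning the second two-dimensional coordinates
    and the depth the point should have there."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # First 2D coordinates + first depth -> first 3D space coordinates.
    p1 = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
    # Transform into the coordinate system of the second depth image.
    p2 = R @ p1 + t
    # Project to the second 2D coordinates.
    u2 = fx * p2[0] / p2[2] + cx
    v2 = fy * p2[1] / p2[2] + cy
    return u2, v2, p2[2]

# With the identity pose a pixel maps back onto itself.
K = np.array([[525.0, 0.0, 319.5],
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])
u2, v2, z2 = reproject_pixel(100.0, 80.0, 1.5, K, np.eye(3), np.zeros(3))
```

The second depth value is then read from the second depth image at (u2, v2), typically with bilinear interpolation, and compared against z2.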
  7. The method according to claim 6, characterized in that the determining the pose transformation parameter of the depth sensor based on the first depth value and the second depth value comprises:
    determining a depth difference between the first depth image and the second depth image according to the first depth value and the second depth value;
    determining the depth difference as a depth residual and, based on the depth residual, performing the following iterative steps: determining a pose estimation increment based on the depth residual; determining whether the depth residual is less than a preset threshold; in response to the depth residual being less than the preset threshold, accumulating the pose estimation increment with the first depth value to determine a pose estimation value; and determining the pose transformation parameter of the depth sensor according to the pose estimation value; and
    in response to the depth residual being greater than or equal to the preset threshold, determining the accumulated pose estimation increment as the depth residual and continuing to perform the iterative steps.
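The loop in this claim has the shape of an iterative residual-minimisation (Gauss–Newton-style) solver over the depth residual. A schematic sketch in which the residual and increment computations are supplied as callables, since the application fixes neither the solver, the pose parameterisation, nor the threshold value:

```python
import numpy as np

def estimate_pose(compute_residual, compute_increment, initial_pose,
                  threshold=1e-4, max_iters=50):
    """Refine a pose until the depth residual drops below a preset threshold.

    compute_residual(pose): scalar depth residual between the first depth
        values and the reprojected second depth values for that pose.
    compute_increment(pose, residual): small pose update reducing the residual.
    """
    pose = np.asarray(initial_pose, dtype=np.float64).copy()
    for _ in range(max_iters):
        residual = compute_residual(pose)
        if residual < threshold:
            break                        # converged
        increment = compute_increment(pose, residual)
        pose = pose + increment          # accumulate the pose estimation increment
    return pose
```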
  8. The method according to claim 1, characterized in that the method further comprises:
    acquiring an angular velocity and an acceleration from an inertial measurement device physically bound to the depth sensor;
    determining a pose transformation parameter of the inertial measurement device according to the angular velocity and the acceleration; and
    fusing the pose transformation parameter of the depth sensor and the pose transformation parameter of the inertial measurement device to determine an integrated pose transformation parameter.
  9. The method according to claim 8, characterized in that the determining a pose transformation parameter of the inertial measurement device according to the angular velocity and the acceleration comprises:
    determining a first pose transformation parameter of the inertial measurement device according to the angular velocity;
    determining a second pose transformation parameter of the inertial measurement device according to the acceleration; and
    fusing the first pose transformation parameter and the second pose transformation parameter to determine the pose transformation parameter of the inertial measurement device.
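One common realisation of this two-source scheme is a complementary filter, sketched here with the problem collapsed to pitch and roll for brevity; the blending weight, the angle parameterisation, and the function names are illustrative assumptions, as the application does not prescribe the fusion rule:

```python
import numpy as np

def imu_orientation_step(prev_pitch_roll, gyro_rates, accel, dt, alpha=0.98):
    """Fuse a gyro-propagated orientation (first pose transformation
    parameter) with an accelerometer-derived one (second pose
    transformation parameter). Angles in radians, as (pitch, roll)."""
    # First parameter: integrate the angular velocity over the time step.
    from_gyro = np.asarray(prev_pitch_roll) + np.asarray(gyro_rates) * dt
    # Second parameter: orientation implied by the measured gravity vector.
    ax, ay, az = accel
    from_accel = np.array([np.arctan2(-ax, np.hypot(ay, az)),  # pitch
                           np.arctan2(ay, az)])                # roll
    # Complementary fusion: the gyro dominates at high frequency,
    # the accelerometer corrects the slow gyro drift.
    return alpha * from_gyro + (1.0 - alpha) * from_accel
```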
  10. A pose estimation apparatus, characterized in that the apparatus comprises:
    a first acquisition unit, configured to acquire a depth image video from a depth sensor;
    an image selection unit, configured to select a first depth image and a second depth image from the frame depth images of the depth image video, wherein the first depth image and the second depth image share at least one pixel point indicating the same object;
    a pixel point set determination unit, configured to determine a first pixel point set in the first depth image and a second pixel point set in the second depth image, wherein the pixel points in the first pixel point set are in one-to-one correspondence with the pixel points in the second pixel point set, and each pair of corresponding pixel points indicates the same object; and
    a first parameter determination unit, configured to, for any pixel point in the first pixel point set, determine a pose transformation parameter of the depth sensor based on first two-dimensional coordinates of the pixel point in the first depth image and a first depth value of its corresponding pixel point in the second depth image.
  11. The apparatus according to claim 10, characterized in that the apparatus further comprises a preprocessing unit, the preprocessing unit comprising a pixel point deletion module and a smoothing module,
    the pixel point deletion module being configured to, for each frame depth image, delete pixel points in the frame depth image that meet a preset condition before the image selection unit selects the first depth image and the second depth image from the frame depth images; and
    the smoothing module being configured to smooth the depth image after the deletion.
  12. The apparatus according to claim 11, characterized in that the pixel point deletion module is further configured to:
    detect the depth value of each pixel point; and
    delete pixel points whose depth value is greater than a first preset value and whose depth value is less than a second preset value.
  13. The apparatus according to claim 11, characterized in that the pixel point deletion module is further configured to:
    determine a first partial derivative of the frame depth image in the horizontal direction and a second partial derivative of the frame depth image in the vertical direction;
    determine geometric edge pixel points in the frame depth image according to the first partial derivative and the second partial derivative; and
    delete the geometric edge pixel points.
  14. The apparatus according to claim 11, characterized in that the pixel point deletion module is further configured to:
    determine invalid pixel points in the frame depth image at which no depth value exists; and
    delete the invalid pixel points and the pixel points adjacent to the invalid pixel points.
  15. The apparatus according to claim 10, characterized in that the first parameter determination unit comprises:
    a first mapping module, configured to map the first two-dimensional coordinates of the pixel point in the first depth image to first three-dimensional space coordinates in the coordinate system to which the first depth image belongs;
    a transformation module, configured to transform the first three-dimensional space coordinates into the coordinate system to which the second depth image belongs to obtain second three-dimensional space coordinates;
    a second mapping module, configured to map the second three-dimensional space coordinates into the second depth image to obtain second two-dimensional coordinates;
    a depth value determination module, configured to determine a second depth value of the second two-dimensional coordinates in the second depth image; and
    a first parameter determination module, configured to determine the pose transformation parameter of the depth sensor based on the first depth value and the second depth value.
  16. The apparatus according to claim 15, characterized in that the first parameter determination module is further configured to:
    determine a depth difference between the first depth image and the second depth image according to the first depth value and the second depth value;
    determine the depth difference as a depth residual and, based on the depth residual, perform the following iterative steps: determining a pose estimation increment based on the depth residual; determining whether the depth residual is less than a preset threshold; in response to the depth residual being less than the preset threshold, accumulating the pose estimation increment with the first depth value to determine a pose estimation value; and determining the pose transformation parameter of the depth sensor according to the pose estimation value; and
    in response to the depth residual being greater than or equal to the preset threshold, determine the accumulated pose estimation increment as the depth residual and continue to perform the iterative steps.
  17. The apparatus according to claim 10, characterized in that the apparatus further comprises:
    a second acquisition unit, configured to acquire an angular velocity and an acceleration from an inertial measurement device physically bound to the depth sensor;
    a second parameter determination unit, configured to determine a pose transformation parameter of the inertial measurement device according to the angular velocity and the acceleration; and
    a parameter fusion unit, configured to fuse the pose transformation parameter of the depth sensor and the pose transformation parameter of the inertial measurement device to determine an integrated pose transformation parameter.
  18. The apparatus according to claim 17, characterized in that the second parameter determination unit comprises:
    a first sub-parameter determination module, configured to determine a first pose transformation parameter of the inertial measurement device according to the angular velocity;
    a second sub-parameter determination module, configured to determine a second pose transformation parameter of the inertial measurement device according to the acceleration; and
    a fusion module, configured to fuse the first pose transformation parameter and the second pose transformation parameter to determine the pose transformation parameter of the inertial measurement device.
  19. A server, characterized by comprising:
    one or more processors; and
    a storage device, configured to store one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-9.
  20. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-9.
PCT/CN2018/083376 2017-05-09 2018-04-17 Pose estimation method and apparatus WO2018205803A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710321322.XA CN107123142B (en) 2017-05-09 2017-05-09 Pose estimation method and device
CN201710321322.X 2017-05-09

Publications (1)

Publication Number Publication Date
WO2018205803A1 2018-11-15

Family

ID=59726877

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/083376 WO2018205803A1 (en) 2017-05-09 2018-04-17 Pose estimation method and apparatus

Country Status (2)

Country Link
CN (1) CN107123142B (en)
WO (1) WO2018205803A1 (en)

Cited By (1)

Publication number Priority date Publication date Assignee Title
CN112146578A (en) * 2019-06-28 2020-12-29 顺丰科技有限公司 Scale ratio calculation method, device, equipment and storage medium

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
CN107123142B (en) * 2017-05-09 2020-05-01 北京京东尚科信息技术有限公司 Pose estimation method and device
CN108399643A (en) * 2018-03-15 2018-08-14 南京大学 A kind of outer ginseng calibration system between laser radar and camera and method
CN110914867A (en) * 2018-07-17 2020-03-24 深圳市大疆创新科技有限公司 Pose determination method, pose determination device and computer readable storage medium
CN110800023A (en) * 2018-07-24 2020-02-14 深圳市大疆创新科技有限公司 Image processing method and equipment, camera device and unmanned aerial vehicle
CN109186596B (en) * 2018-08-14 2020-11-10 深圳清华大学研究院 IMU measurement data generation method, system, computer device and readable storage medium
CN109544629B (en) * 2018-11-29 2021-03-23 南京人工智能高等研究院有限公司 Camera position and posture determining method and device and electronic equipment
CN109470149B (en) * 2018-12-12 2020-09-29 北京理工大学 Method and device for measuring position and posture of pipeline
CN111435086B (en) * 2019-01-13 2022-03-25 北京魔门塔科技有限公司 Navigation method and device based on splicing map
CN109650292B (en) * 2019-02-02 2019-11-05 北京极智嘉科技有限公司 The location regulation method and medium of a kind of intelligent forklift and intelligent forklift
CN110054121B (en) * 2019-04-25 2021-04-20 北京极智嘉科技有限公司 Intelligent forklift and container pose deviation detection method
CN112907164A (en) * 2019-12-03 2021-06-04 北京京东乾石科技有限公司 Object positioning method and device
CN112070052A (en) * 2020-09-16 2020-12-11 青岛维感科技有限公司 Interval monitoring method, device and system and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN102609942A (en) * 2011-01-31 2012-07-25 微软公司 Mobile camera localization using depth maps
CN104933755A (en) * 2014-03-18 2015-09-23 华为技术有限公司 Static object reconstruction method and system
US20160171703A1 (en) * 2013-07-09 2016-06-16 Samsung Electronics Co., Ltd. Camera pose estimation apparatus and method
CN106403924A (en) * 2016-08-24 2017-02-15 智能侠(北京)科技有限公司 Method for robot fast positioning and attitude estimation based on depth camera
CN106529538A (en) * 2016-11-24 2017-03-22 腾讯科技(深圳)有限公司 Method and device for positioning aircraft
CN107123142A (en) * 2017-05-09 2017-09-01 北京京东尚科信息技术有限公司 Position and orientation estimation method and device

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN102289809A (en) * 2011-07-25 2011-12-21 清华大学 Method and device for estimating pose of camera
US10339389B2 (en) * 2014-09-03 2019-07-02 Sharp Laboratories Of America, Inc. Methods and systems for vision-based motion estimation
CN104361575B (en) * 2014-10-20 2015-08-19 湖南戍融智能科技有限公司 Automatic floor in depth image detects and video camera relative pose estimation method
CN106157367B (en) * 2015-03-23 2019-03-08 联想(北京)有限公司 Method for reconstructing three-dimensional scene and equipment
CN105045263B (en) * 2015-07-06 2016-05-18 杭州南江机器人股份有限公司 A kind of robot method for self-locating based on Kinect depth camera
CN105698765B (en) * 2016-02-22 2018-09-18 天津大学 Object pose method under double IMU monocular visions measurement in a closed series noninertial systems
CN105976353B (en) * 2016-04-14 2020-01-24 南京理工大学 Spatial non-cooperative target pose estimation method based on model and point cloud global matching


Also Published As

Publication number Publication date
CN107123142A (en) 2017-09-01
CN107123142B (en) 2020-05-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18798239

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.03.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18798239

Country of ref document: EP

Kind code of ref document: A1