CN118314200A - Pose acquisition method, pose acquisition device, terminal and storage medium

Pose acquisition method, pose acquisition device, terminal and storage medium

Info

Publication number
CN118314200A
CN118314200A
Authority
CN
China
Prior art keywords
target object, pose, coordinate system, preliminary, camera
Prior art date
Legal status
Pending
Application number
CN202310029633.4A
Other languages
Chinese (zh)
Inventor
郑远力 (Zheng Yuanli)
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Publication of CN118314200A


Abstract

The application discloses a pose acquisition method, a pose acquisition device, a terminal and a storage medium, and relates to the field of computer technology. The method comprises the following steps: acquiring a photographed image obtained by photographing a target object with a camera; obtaining preliminary coordinates of the target object in the camera coordinate system according to the physical size of the target object and the pixel size of the detection frame corresponding to the target object; determining a preliminary pose of the target object in the camera coordinate system based on the preliminary coordinates and a preliminary attitude of the target object in the camera coordinate system; and adjusting the preliminary pose according to a first contour mask corresponding to the target object to obtain a final pose of the target object in the camera coordinate system, wherein the first contour mask identifies the pixels corresponding to the target object in the photographed image. According to the application, a preliminary pose of the target object in the camera coordinate system is obtained from the detection frame of the target object in the photographed image, and the preliminary pose is then adjusted according to the contour mask of the target object, so that the resulting pose is more accurate.

Description

Pose acquisition method, pose acquisition device, terminal and storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a pose acquisition method, a pose acquisition device, a terminal and a storage medium.
Background
Currently, people can interact between the virtual and the real through virtual reality devices, i.e., devices that employ virtual reality technology.
In the related art, a virtual reality device includes a VR (Virtual Reality) head display and a VR handle, with a plurality of infrared light sources arranged on the VR handle. While the user uses the device, an infrared camera on the VR head display captures images of the VR handle; the position and attitude of the VR handle in the VR head display coordinate system are then calculated from the infrared light source information in the images, and the user's operations are synchronized to an application program.
However, during use of the VR handle, the infrared light sources are easily occluded by the hand and subject to interference, which reduces the accuracy of pose determination for the VR handle or makes the pose unobtainable.
Disclosure of Invention
The embodiment of the application provides a pose acquisition method, a pose acquisition device, a terminal and a storage medium, which can improve the pose determination accuracy of a target object. The technical scheme is as follows:
according to an aspect of an embodiment of the present application, there is provided a pose acquisition method, the method including:
Acquiring a photographed image obtained by photographing a target object with a camera;
obtaining preliminary coordinates of the target object in a camera coordinate system according to the physical size of the target object and the pixel size of a detection frame corresponding to the target object, wherein the detection frame is used for identifying the area where the target object is located in the photographed image;
determining a preliminary pose of the target object in the camera coordinate system based on the preliminary coordinates and a preliminary attitude of the target object in the camera coordinate system, wherein the preliminary attitude is obtained based on a first inertial measurement unit arranged on the target object;
and adjusting the preliminary pose according to a first contour mask corresponding to the target object to obtain a final pose of the target object in the camera coordinate system, wherein the first contour mask is used for identifying the pixels corresponding to the target object in the photographed image.
According to an aspect of an embodiment of the present application, there is provided a pose acquisition apparatus, the apparatus including:
a photographed image acquisition module, used for acquiring a photographed image obtained by photographing a target object with a camera;
a preliminary coordinate acquisition module, used for obtaining preliminary coordinates of the target object in a camera coordinate system according to the physical size of the target object and the pixel size of a detection frame corresponding to the target object, wherein the detection frame is used for identifying the area where the target object is located in the photographed image;
a preliminary pose acquisition module, used for determining a preliminary pose of the target object in the camera coordinate system based on the preliminary coordinates and a preliminary attitude of the target object in the camera coordinate system, wherein the preliminary attitude is obtained based on a first inertial measurement unit arranged on the target object;
a preliminary pose adjustment module, used for adjusting the preliminary pose according to a first contour mask corresponding to the target object to obtain a final pose of the target object in the camera coordinate system, wherein the first contour mask is used for identifying the pixels corresponding to the target object in the photographed image.
According to an aspect of an embodiment of the present application, there is provided a terminal device including a processor and a memory, in which a computer program is stored, the computer program being loaded and executed by the processor to implement the above-described pose acquisition method.
According to an aspect of an embodiment of the present application, there is provided a computer-readable storage medium having stored therein a computer program loaded and executed by a processor to implement the above-described pose acquisition method.
According to an aspect of an embodiment of the present application, there is provided a computer program product comprising a computer program stored in a computer readable storage medium. The processor of the terminal device reads the computer program from the computer-readable storage medium, and the processor executes the computer program so that the terminal device executes the above-described pose acquisition method.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
The approximate coordinates of the target object in the camera coordinate system are obtained from the real size of the target object and the size of its corresponding detection frame in the photographed image; combined with the approximate attitude determined by the first inertial measurement unit on the target object, the approximate pose of the target object in the camera coordinate system is obtained. The approximate pose is then optimized according to the contour information corresponding to the target object in the photographed image, so that the pixel size of the target object in the image matches the physical size of the target object in the actual physical environment, thereby improving the accuracy and realism of the pose of the target object in the camera coordinate system and, in turn, the accuracy of pose acquisition.
In addition, the coordinates of the target object in the camera coordinate system are obtained based on the detection frame and contour information corresponding to the target object in the photographed image, without relying on texture information of the target object. The technical scheme provided by the embodiment of the application therefore supports pose acquisition for objects with insufficient texture information, which improves its applicability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an implementation environment for an embodiment of the present application;
FIG. 2 is a schematic diagram of an implementation environment for an embodiment of the present application;
FIG. 3 is a schematic diagram of a VR head display provided in one embodiment of the present application;
FIG. 4 is a schematic illustration of a VR ring provided in accordance with one embodiment of the present application;
FIG. 5 is a flow chart of a pose acquisition method provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a detection frame provided by an embodiment of the present application;
FIG. 7 is a schematic illustration of a first contour mask provided in accordance with one embodiment of the present application;
FIG. 8 is a schematic illustration of a second contour mask provided in accordance with one embodiment of the present application;
FIG. 9 is a block diagram of a pose acquisition device provided by an embodiment of the present application;
FIG. 10 is a block diagram of a pose acquisition device according to another embodiment of the present application;
FIG. 11 is a block diagram of a terminal device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Before describing embodiments of the present application, related terms referred to in the present application will be first described.
1. XR: extended Reality, XR technology is used to overlay or blend into the real world and virtual environment through computer text and graphics to augment or replace the field of view of people looking into the world. XR technology describes a range of methods of changing Reality, including VR (Virtual Reality), augmented Reality (Augmented Reality, AR), and Mixed Reality (MR).
2. IMU: inertial Measurement Unit an inertial measurement unit, which may be composed of sensors such as accelerometers, gyroscopes, magnetometers, etc.
3. And (3) head display: a head mounted display, such as a VR head display.
4. Finger ring: a virtual reality interaction device worn on a hand, such as a VR finger ring, a VR handle, a VR bracelet, and the like.
5. Posture: for the pose of the object in three-dimensional space, it may be represented in forms such as a rotation vector, a rotation matrix, four elements, and the like.
6. Pose determination: the pose may be used to represent a position and a posture, the pose may also be used to represent a position, and the pose may also be used to represent a posture, which is not limited by embodiments of the present application. For example, pose determination may be to indicate the position and pose of an object in three-dimensional space.
7. Texture: a pattern of varying strokes of pixel colors of the target surface.
The technical scheme provided by the embodiment of the application will be specifically described.
Referring to fig. 1, a schematic diagram of an implementation environment of an embodiment of the present application is shown. The implementation environment may include: a terminal device 10 and a target object 20.
The terminal device 10 may be an electronic device with an image capturing function; illustratively, a camera may be mounted on it. For example, the terminal device 10 may be a mobile phone, a tablet computer, a game console, a multimedia playing device, a PC (Personal Computer), a vehicle-mounted terminal, a smart robot, a virtual reality device, or the like. Taking the terminal device 10 as a VR display device as an example, it may be a VR head display, a VR mask, VR glasses, and the like.
A client of a target application, such as a game-type application, a simulation learning-type application, a virtual reality-type application, an augmented reality-type application, a social-type application, an interactive entertainment-type application, or the like, may be installed in the terminal device 10. In some examples, the terminal device 10 is further configured with a background server for providing background services for the above-described target application (e.g., virtual reality class application). The background server can be a server, a server cluster formed by a plurality of servers, or a cloud computing service center.
The target object 20 may refer to an object provided with an IMU. Optionally, the target object 20 is further provided with communication means (such as means for communication using a wireless network, a wired network, or the like) for supporting communication with the terminal device 10. Through the communication means, the data measured by the IMU can be transmitted to the terminal device 10. The target object 20 is not limited in this embodiment of the present application.
The terminal device 10 may determine the pose of the target object 20 in the camera coordinate system by using the data measured by the IMU on the target object 20 (i.e., IMU data 1) and the photographed image obtained by photographing the target object 20 with the camera, so as to capture the pose of the target object 20. Optionally, if the terminal device 10 moves while determining the pose, an IMU may be set on the terminal device 10 to obtain IMU data 2 corresponding to the terminal device 10, from which the pose of the terminal device 10 in the camera coordinate system is obtained to assist in capturing the pose of the target object 20.
In one example, the target object 20 may be a device such as a VR interaction device, an AR interaction device, an MR interaction device, or the like. Referring to fig. 2, taking a virtual reality scenario as an example, the terminal device 10 and the target object 20 may be combined into a set of virtual reality devices, which may include a VR head display 201 (i.e., the terminal device 10) and a VR finger ring 202 (i.e., the target object 20). Optionally, the virtual reality device may be further configured with a display 203, or the virtual reality device may be further externally connected with the display 203, which is not limited by the embodiment of the application.
Referring to fig. 3, an image capturing device (such as a camera 2011) and an IMU 2012 are disposed in the VR head display 201. The camera 2011 is configured to photograph the VR finger ring 202, thereby obtaining a photographed image of the VR finger ring 202. The IMU 2012 is configured to obtain the pose of the VR head display 201 in the camera coordinate system; in an embodiment of the present application, the pose of the IMU 2012 in the camera coordinate system may be taken as the pose of the VR head display 201 in the camera coordinate system. The camera coordinate system may refer to a three-dimensional coordinate system constructed based on the camera 2011 (such as its optical center); optionally, its z-axis points to the front of the camera 2011, its y-axis points to the ground, and its x-axis points to the right of the camera 2011. The user may wear the VR head display 201 on the head to view the virtual environment.
Referring to fig. 4, an IMU 2021 is disposed in the VR finger ring 202. The IMU 2021 is configured to obtain the pose of the VR finger ring 202 in the earth coordinate system; in an embodiment of the present application, the pose of the IMU 2021 in the earth coordinate system may be taken as the pose of the VR finger ring 202 in the earth coordinate system. Optionally, the earth coordinate system has the IMU as its origin, the east direction as the x-axis, the north direction as the y-axis, and a z-axis perpendicular to the x-axis and y-axis and pointing to the sky. The VR finger ring 202 and the VR head display 201 may communicate with each other. The user can wear the VR finger ring 202 on a finger and move the finger to control the VR finger ring 202, thereby controlling the virtual object 204 to move in the virtual environment and realizing interaction between reality and the virtual world. The user observes, through the VR head display 201, the interactive animation of the virtual object 204 shown on the display 203, achieving an immersive experience. Optionally, the VR finger ring 202 further includes a wearing part 2022 that supports wearing on the user's finger; the wearing part 2022 conforms to the shape of the finger (such as a complete or incomplete cylinder), and the dotted-line portion 2023 of the wearing part 2022 may be elastically adjusted to facilitate wearing.
Optionally, the VR head display 201 has computing capabilities. It can determine the preliminary coordinates of the VR finger ring 202 in the camera coordinate system from the detection frame corresponding to the VR finger ring 202 in the photographed image, determine the preliminary attitude of the VR finger ring 202 in the camera coordinate system from the data measured by the IMU 2012 and the IMU 2021, and optimize the preliminary coordinates and preliminary attitude of the VR finger ring 202 in the camera coordinate system according to the first contour mask corresponding to the VR finger ring 202 in the photographed image, obtaining the final coordinates and final attitude of the VR finger ring 202 in the camera coordinate system.
According to the embodiment of the application, pose acquisition for the target object can be realized with 2 IMUs and one camera. Compared with the prior art, which requires multiple cameras, multiple infrared light sources, multiple ultrasonic sensors, or the like, the terminal device can be made smaller, improving its applicability. Meanwhile, since only 2 IMUs and one camera are needed, power is saved, reducing the power consumption of the terminal device.
The technical scheme of the application will be described by the method embodiment.
Referring to fig. 5, a flowchart of a pose obtaining method according to an embodiment of the present application is shown, where the main execution body of each step of the method may be the terminal device 10 (e.g. a client in the terminal 10) in the implementation environment of the solution shown in fig. 1, and the method may include the following steps (501 to 504).
In step 501, a photographed image obtained by photographing a target object with a camera is acquired.
The terminal device is provided with a camera, and it can control the camera to periodically capture images of the real scene. When the pose of the target object needs to be captured, the terminal device photographs the target object specifically, obtaining a photographed image containing the target object. For example, referring to fig. 2, a user may position the VR head display 201 and the VR finger ring 202 so that the VR finger ring 202 is within the shooting range of the VR head display 201, thereby obtaining photographed images containing the VR finger ring 202 in real time. Optionally, the camera on the terminal device may be replaced by another image capturing device, which is not limited in the embodiment of the present application.
The target object may refer to any object selected by a user to be within a field angle of a camera before shooting, such as a vehicle, a toy, an animal, a wearable device, a virtual reality interaction device (e.g., VR finger ring, VR handle, VR bracelet, VR necklace, VR leg ring, VR foot ring, etc.), and so on.
After the camera generates the photographed image, the terminal device may acquire and process it immediately. In the embodiment of the present application, the terminal device may perform target detection on the photographed image to obtain a detection frame of the target object, where the detection frame identifies the area where the target object is located in the photographed image. For example, referring to fig. 6, a target object 601 worn on a user's finger is included in the photographed image. The terminal device processes the photographed image with a target detection algorithm to obtain the detection frame 602 corresponding to the target object 601. The target detection algorithm may be a deep learning neural network method, such as SSD (Single Shot Detector, an object detection network), YOLO (You Only Look Once, an object detection network), RetinaFace (a face detection network), CNN (Convolutional Neural Network), R-CNN (Region-based Convolutional Neural Network), Faster R-CNN, and the like; it may also be an image color matching algorithm, which is not limited in the embodiment of the present application. The detection frame in the embodiment of the application can be a rectangular frame, with data such as width, height, and center coordinates.
In the embodiment of the application, the terminal device can perform image segmentation on the photographed image to obtain the first contour mask of the target object, where the first contour mask identifies the pixels corresponding to the target object in the photographed image. For example, referring to fig. 7, the terminal device processes the photographed image with an image segmentation algorithm to obtain the pixels corresponding to the target object 601, and then obtains the first contour mask 603 corresponding to the target object 601 based on those pixels. The image segmentation algorithm may be a deep learning neural network method, such as Mask-RCNN (an image segmentation network), or an image segmentation algorithm based on color distribution, for example after spraying a specific color on the target object, or spraying an infrared reflective material on it (with the image captured by an infrared camera), which is not limited in the embodiment of the present application.
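As an illustrative sketch of the color-distribution option above (an assumption for demonstration, not the patent's mandated implementation), the following Python/OpenCV snippet segments a distinctly colored target and derives both the first contour mask and the rectangular detection frame; the HSV color range is a placeholder:

```python
import cv2
import numpy as np

def detect_box_and_mask(image_bgr, lower_hsv=(35, 80, 80), upper_hsv=(85, 255, 255)):
    """Segment a distinctly colored target (the HSV range is an assumed placeholder)
    and return its rectangular detection frame (x, y, w, h) and first contour mask."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower_hsv), np.array(upper_hsv))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None, None
    target = max(contours, key=cv2.contourArea)       # keep the largest region
    contour_mask = np.zeros(mask.shape, dtype=np.uint8)
    cv2.drawContours(contour_mask, [target], -1, 255, thickness=cv2.FILLED)
    box = cv2.boundingRect(target)                    # detection frame (x, y, w, h)
    return box, contour_mask
```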
In step 502, preliminary coordinates of the target object in the camera coordinate system are obtained according to the physical size of the target object and the pixel size of the detection frame corresponding to the target object, where the detection frame is used to identify the area where the target object is located in the photographed image.
The physical dimensions of the target object refer to the actual dimensions of the target object, such as length, height, width, and the like. The pixel size of the detection frame may refer to the height and width of the detection frame in pixel units in the photographed image. The preliminary coordinates are used for representing initial values of the position of the target object under the camera coordinate system, and the accuracy of the preliminary coordinates is lower than that of the final coordinates.
In one example, the preliminary coordinate acquisition process may be as follows:
1. The maximum value of the length, width, and height included in the physical dimension is determined as a first dimension parameter.
In the present example, the physical dimensions of the target object are known, and a three-dimensional model corresponding to the target object may be constructed according to them; the physical dimensions of the three-dimensional model are consistent with those of the target object.
For example, if the maximum value of the length, width, and height included in the physical dimension is the length, the length may be determined as a first dimension parameter, which may be denoted as d.
2. The maximum value of the width and the height included in the pixel size is determined as the second size parameter.
For example, in the case where the detection frame is a rectangular frame, the width and the height of the detection frame are acquired, and if the maximum value of the width and the height is the height, the height may be determined as a second size parameter, which may be denoted as L.
3. And obtaining the depth value of the target object under the camera coordinate system according to the first dimension parameter, the second dimension parameter and the internal parameters of the camera, wherein the internal parameters are used for representing the optical properties of the camera.
The internal parameters of the camera can be used to represent the conversion relationship between the imaging plane corresponding to the camera and the pixel plane corresponding to the photographed image. The internal parameters of the camera are known and can be expressed as the matrix:
K = [ f_x, 0, u_0; 0, f_y, v_0; 0, 0, 1 ]
where f_x = α·f, with α the scaling factor of pixel coordinates on the u-axis and f the image distance of the camera; f_y = β·f, with β the scaling factor of pixel coordinates on the v-axis. The u-axis and v-axis are the coordinate axes of the pixel plane coordinate system corresponding to the photographed image, whose origin is generally located at the upper-left corner of the image, with the u-axis parallel to the x-axis (pointing right) and the v-axis parallel to the y-axis. u_0 and v_0 are the coordinates of the camera's optical center on the u-axis and v-axis of the photographed image.
The above depth value may be used to represent the distance between the target object and the terminal device in the camera coordinate system. The process of obtaining the depth value follows the pinhole similar-triangle relation and may be represented as:
z = f · d / L
where z is the depth value, d is the first size parameter, L is the second size parameter, and f is the focal length in pixels taken from the internal parameters (e.g., f_x or f_y).
4. The center point coordinates of the detection frame in the photographed image are converted according to the depth value and the internal parameters of the camera to obtain the plane coordinates of the target object in the camera coordinate system.
The plane coordinates are the x-axis and y-axis coordinates of the target object in the camera coordinate system. In the embodiment of the application, the center point coordinates of the detection frame, denoted (u, v), can be used as the pixel coordinates of the target object in the photographed image. The conversion maps coordinates in the pixel plane coordinate system to coordinates in the camera coordinate system, and the plane coordinates can be expressed as:
x = (u - u_0) · z / f_x, y = (v - v_0) · z / f_y
5. The depth value and the plane coordinates are combined to obtain the preliminary coordinates of the target object in the camera coordinate system.
The preliminary coordinates in the embodiment of the application are three-dimensional: the combination takes the plane coordinates as the x-axis and y-axis coordinates and the depth value as the z-axis coordinate, so the preliminary coordinates of the target object in the camera coordinate system can be expressed as:
t_h,r = (x, y, z)^T
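The following numpy sketch (illustrative; the function and variable names are assumptions) strings the depth and back-projection formulas above together to compute the preliminary coordinates from the detection frame and the camera's internal parameters:

```python
import numpy as np

def preliminary_coordinates(box, physical_dims, K):
    """box: detection frame (x, y, w, h) in pixels; physical_dims: (length, width,
    height) of the target object in meters; K: 3x3 camera intrinsic matrix."""
    x, y, w, h = box
    d = max(physical_dims)              # first size parameter: largest physical dimension
    L = max(w, h)                       # second size parameter: largest pixel dimension
    f = K[0, 0] if w >= h else K[1, 1]  # assumed: match the focal axis to the larger side
    z = f * d / L                       # depth from the pinhole similar-triangle relation
    u, v = x + w / 2.0, y + h / 2.0     # detection-frame center as the pixel coordinate
    cx = (u - K[0, 2]) * z / K[0, 0]    # back-project to camera-frame x
    cy = (v - K[1, 2]) * z / K[1, 1]    # back-project to camera-frame y
    return np.array([cx, cy, z])        # preliminary coordinates t_h,r
```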
The embodiment of the application can acquire the preliminary coordinates of the target object under the camera coordinate system based on the shot image of the target object, namely, only one camera is needed to be configured for the terminal equipment, thus being beneficial to reducing the volume of the terminal equipment, and further enabling the terminal equipment to be miniaturized.
In step 503, a preliminary pose of the target object in the camera coordinate system is determined based on the preliminary coordinates and a preliminary attitude of the target object in the camera coordinate system, the preliminary attitude being obtained based on a first inertial measurement unit provided on the target object.
A pose may represent a position and an attitude, only a position, or only an attitude; this is not limited by the embodiments of the application. In the embodiment of the application, the preliminary coordinates represent the position of the target object, and combining them with the preliminary attitude of the target object yields the preliminary pose of the target object. The preliminary attitude may be constructed in the form of a rotation matrix, in which case it is a 3x3 matrix. The preliminary attitude may also be constructed in the form of a rotation vector (a 3x1 vector) and then converted into a rotation matrix. Note that if the rotation matrix is R_h,r and the rotation vector is V_h,r, the conversion relationship between the two can be expressed as follows:
R_h,r = exp((V_h,r)^);
where (·)^ denotes the operation that maps a vector to its skew-symmetric matrix.
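A small numpy sketch of this conversion (an illustrative closed-form evaluation of exp((V_h,r)^) via the Rodrigues formula; not code from the patent):

```python
import numpy as np

def skew(v):
    """(v)^ : map a 3-vector to its skew-symmetric matrix."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rotvec_to_matrix(v):
    """R = exp((v)^), evaluated in closed form via the Rodrigues formula."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    k = skew(np.asarray(v) / theta)
    return np.eye(3) + np.sin(theta) * k + (1.0 - np.cos(theta)) * (k @ k)
```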
In the embodiment of the application, the preliminary attitude can be obtained using IMUs: for example, one IMU is arranged on the terminal device and one on the target object, and the data measured by the two IMUs are combined to obtain the preliminary attitude.
In one example, the preliminary attitude acquisition process may be as follows:
1. A first attitude of the target object in the earth coordinate system is acquired, obtained based on the first inertial measurement unit provided on the target object.
The terminal equipment can acquire data such as acceleration, angular velocity, magnetic field intensity and the like of the target object through the first inertial measurement unit. For example, the data measured by the first inertial measurement unit may include acceleration, angular velocity and magnetic field strength of the target object in three axes of the earth coordinate system, respectively.
The terminal device can adopt an IMU nine-axis fusion algorithm to calculate the first attitude of the first inertial measurement unit in the earth coordinate system from the data it measures, and then take this as the first attitude of the target object in the earth coordinate system, denoted R_w,r. The IMU nine-axis fusion algorithm may be, for example, the Madgwick algorithm or a complementary filtering algorithm, which is not limited in this embodiment of the present application.
2. A second attitude, in the earth coordinate system, of the terminal device on which the camera is arranged is acquired, together with a third attitude of the terminal device in the camera coordinate system, wherein the second attitude is obtained based on a second inertial measurement unit arranged on the terminal device.
The dimensions of the terminal device are known, and the position of the IMU on the terminal device is also known, so the position and attitude of the second inertial measurement unit relative to the camera are known. The third attitude of the second inertial measurement unit in the camera coordinate system may therefore be taken as the third attitude of the terminal device in the camera coordinate system, denoted R_h,I.
From the data measured by the second inertial measurement unit, the terminal device can obtain the second attitude of the second inertial measurement unit in the earth coordinate system using an IMU nine-axis fusion algorithm, and this is taken as the second attitude of the terminal device in the earth coordinate system, denoted R_w,I. Optionally, only one IMU needs to be arranged on each of the terminal device and the target object to obtain these attitudes.
3. The first attitude, the inverse of the second attitude, and the third attitude are multiplied together to determine the preliminary attitude of the target object in the camera coordinate system.
The preliminary attitude of the target object in the camera coordinate system may be expressed as:
R_h,r = R_h,I * (R_w,I)^(-1) * R_w,r
4. The preliminary coordinates and the preliminary attitude are combined to obtain the preliminary pose of the target object in the camera coordinate system.
The preliminary pose of the target object in the camera coordinate system is denoted (t_h,r, R_h,r).
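An illustrative numpy sketch of the attitude composition and combination above (the frame conventions of the inputs are assumptions inferred from the notation; a rotation matrix's inverse is its transpose):

```python
import numpy as np

def preliminary_pose(t_h_r, R_w_r, R_w_I, R_h_I):
    """t_h_r: preliminary coordinates; R_w_r: first attitude (target-object IMU in
    the earth frame); R_w_I: second attitude (terminal-device IMU in the earth
    frame); R_h_I: third attitude (terminal-device IMU in the camera frame)."""
    R_h_r = R_h_I @ R_w_I.T @ R_w_r   # R_h,r = R_h,I * (R_w,I)^(-1) * R_w,r
    return t_h_r, R_h_r               # preliminary pose (t_h,r, R_h,r)
```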
According to the embodiment of the application, the preliminary attitude of the target object in the camera coordinate system can be determined using IMUs, which helps further reduce the volume of the terminal device and thus miniaturize it. In particular, when the target object is a virtual reality interaction device, its volume can be further reduced, achieving miniaturization.
In step 504, the preliminary pose is adjusted according to a first contour mask corresponding to the target object to obtain the final pose of the target object in the camera coordinate system, where the first contour mask is used to identify the pixels corresponding to the target object in the photographed image.
The first contour mask refers to the contour mask corresponding to the target object in the photographed image. The first contour mask can be used to optimize the preliminary pose, so that the pose of the target object acquired by the terminal device in the camera coordinate system is more accurate. The embodiment of the application optimizes the preliminary pose by reducing the difference between the contour mask in the photographed image and the real contour information of the target object, making the pose more accurate and realistic.
In one example, the final pose acquisition process may be as follows:
1. And acquiring size data of the three-dimensional model corresponding to the target object.
The size data of the three-dimensional model are consistent with the size data of the target object and can be used to represent the real contour information of the target object. Optionally, the three-dimensional model may be a CAD (Computer-Aided Design) model corresponding to the target object, which is not limited by the embodiment of the present application.
2. And projecting the three-dimensional model corresponding to the target object onto a shooting image according to the size data and the preliminary pose, and obtaining a second contour mask corresponding to the target object, wherein the second contour mask is used for identifying pixels corresponding to the three-dimensional model in the shooting image.
The second contour mask refers to a contour mask after the target object is projected to the photographed image.
In one example, the internal parameters of the camera may be obtained first, then the first adjustment parameters may be obtained according to the internal parameters and the preliminary pose, and finally the size data are converted according to the first adjustment parameters to obtain the second contour mask corresponding to the target object. The first adjustment parameters can be obtained by multiplying the internal parameters with the preliminary pose, and the second contour mask corresponding to the target object can be obtained by multiplying the first adjustment parameters with the size data. The projection process can be represented by the following formula:
mask = K * [exp((V_h,r)^), t_h,r] * CAD;
where CAD denotes the size data of the three-dimensional model.
Alternatively, the projection process may also be expressed by the following formula:
mask = K * [R_h,r, t_h,r] * CAD.
For example, referring to fig. 8, the three-dimensional model corresponding to the target object 601 is projected onto the photographed image to obtain the second contour mask 604 corresponding to the target object 601; the second contour mask 604 and the first contour mask 603 substantially overlap. Because the three-dimensional model is projected through the preliminary pose, the resulting second contour mask is the projection of the three-dimensional model under the preliminary pose, which improves the accuracy of pose adjustment.
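An illustrative OpenCV sketch of this projection (assuming the model is available as a set of 3D surface points; the convex-hull rasterization is a simplification for demonstration):

```python
import cv2
import numpy as np

def project_model_mask(model_points, R, t, K, image_shape):
    """model_points: (N, 3) surface points of the three-dimensional model; (R, t):
    preliminary pose; K: internal parameters; image_shape: (height, width)."""
    rvec, _ = cv2.Rodrigues(np.asarray(R, dtype=np.float64))  # matrix -> rotation vector
    pts, _ = cv2.projectPoints(np.asarray(model_points, dtype=np.float64),
                               rvec, np.asarray(t, dtype=np.float64),
                               np.asarray(K, dtype=np.float64), None)
    pts = pts.reshape(-1, 2).astype(np.int32)
    mask = np.zeros(image_shape, dtype=np.uint8)
    hull = cv2.convexHull(pts)           # convex-hull outline of the projected points;
    cv2.fillConvexPoly(mask, hull, 255)  # a renderer would be needed for concave shapes
    return mask
```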
3. And adjusting the preliminary pose according to the first contour mask and the second contour mask to obtain the final pose of the target object under the camera coordinate system.
Optionally, in the embodiment of the present application, the IOU (Intersection over Union) is used to represent the difference between the first contour mask and the second contour mask, and the final pose obtaining process may be as follows:
1. The intersection ratio between the first contour mask and the second contour mask is obtained.
For example, let the first contour mask be mask1 and the second contour mask be mask2. From the position coordinates of the pixels corresponding to mask1 and mask2, the pixels common to both (i.e., the intersection) can be determined to obtain the area composed of those pixels; likewise, from all pixels corresponding to mask1 and mask2, the area composed of all the pixels (i.e., the union) can be obtained. Dividing the area of the intersection by the area of the union yields the intersection ratio, which may be expressed as:
IOU = area(mask1 ∩ mask2) / area(mask1 ∪ mask2)
The larger the IOU value, the smaller the difference between the first contour mask and the second contour mask; the smaller the IOU value, the larger the difference.
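A minimal numpy sketch of this computation over two binary masks (illustrative):

```python
import numpy as np

def iou(mask1, mask2):
    """Intersection over union of two binary (0/255 or boolean) contour masks."""
    m1, m2 = mask1.astype(bool), mask2.astype(bool)
    union = np.logical_or(m1, m2).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(m1, m2).sum()) / float(union)
```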
2. The preliminary pose is iteratively adjusted with maximization of the intersection ratio as the objective, to obtain the final pose of the target object in the camera coordinate system.
New coordinates and a new attitude are searched for in the vicinity of the preliminary coordinates and the preliminary attitude, with maximization of the intersection ratio as the objective, to obtain the final pose of the target object in the camera coordinate system. The embodiment of the application does not limit the method used to adjust the preliminary pose.
For example, an exhaustive approach may be employed: within a certain interval around the preliminary coordinates and the preliminary attitude, all values are enumerated with a certain step size, the IOU corresponding to each group of values is recorded, and finally the group of values with the largest IOU is selected as the final pose (final coordinates + final attitude) of the target object in the camera coordinate system.
For another example, a perturbation method may be employed: small data perturbations are added to the preliminary coordinates and the preliminary attitude respectively, yielding perturbed coordinates and a perturbed attitude from which a new IOU is obtained. Subtracting the original IOU from the new IOU yields a Jacobian matrix J of the IOU with respect to each variable; the optimal coordinates and attitude can then be found using a gradient descent method, the Gauss-Newton method, the LM (Levenberg-Marquardt) method, or the like, and taken as the final pose of the target object in the camera coordinate system. According to the embodiment of the application, the intersection ratio intuitively represents the degree of overlap between the first contour mask and the second contour mask, and adjusting the pose by maximizing the intersection ratio directly improves this overlap, thereby improving the accuracy of pose acquisition.
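An illustrative finite-difference sketch of the perturbation approach (reusing the iou helper above; render_mask stands in for the mask-projection step, and the step sizes and simple gradient-ascent update are assumptions where the text names gradient descent, Gauss-Newton, or LM):

```python
import numpy as np

def refine_pose(t0, v0, render_mask, mask1, steps=50, eps=1e-2, lr=1e-2):
    """t0: preliminary coordinates; v0: preliminary attitude as a rotation vector;
    render_mask(t, v) -> second contour mask under pose (t, v); mask1: first
    contour mask. eps must be large enough to actually change the rasterized mask."""
    params = np.concatenate([np.asarray(t0, float), np.asarray(v0, float)])
    for _ in range(steps):
        base = iou(mask1, render_mask(params[:3], params[3:]))
        grad = np.zeros_like(params)
        for i in range(len(params)):          # finite-difference Jacobian of the IOU
            p = params.copy()
            p[i] += eps
            grad[i] = (iou(mask1, render_mask(p[:3], p[3:])) - base) / eps
        params += lr * grad                   # gradient ascent on the intersection ratio
    return params[:3], params[3:]             # final coordinates and final attitude
```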
In an exemplary embodiment, the technical solution provided in the embodiment of the present application may be applied to a virtual reality scene: the camera may be set in a virtual reality (VR) display device, the target object is a VR interaction device, the VR display device is used to display a virtual environment, and the VR interaction device is used to capture the operation actions of a user in the real environment. Illustratively, the VR display device may be a VR head display, a VR mask, VR glasses, or the like. The VR interaction device may be a VR finger ring, a VR handle, a VR bracelet, a VR necklace, a VR leg ring, a VR foot ring, or the like, which is not limited in the embodiments of the present application.
Because only one IMU needs to be installed on the VR interaction device to achieve pose capture, the VR interaction device can be miniaturized. For example, the VR interaction device may be configured as a ring wearable on a finger, i.e., a VR finger ring, which reduces how much the device blocks the user's line of sight and makes it convenient to wear, improving the user experience.
In summary, according to the technical scheme provided by the embodiment of the application, the approximate coordinates of the target object in the camera coordinate system are obtained from the real size of the target object and the size of its detection frame in the photographed image; combined with the approximate attitude determined by the first inertial measurement unit on the target object, the approximate pose of the target object in the camera coordinate system is obtained. The approximate pose is then optimized according to the contour information corresponding to the target object in the photographed image, so that the pixel size of the target object in the image matches its physical size in the actual physical environment, thereby improving the accuracy and realism of the pose of the target object in the camera coordinate system and, in turn, the accuracy of pose acquisition.
In addition, according to the embodiment of the application, pose acquisition for the target object can be realized with the IMUs and one camera (i.e., one photographed image). Compared with the prior art, which requires multiple cameras, multiple infrared light sources, multiple ultrasonic sensors, or the like, the terminal device can be made smaller, improving its applicability. Meanwhile, since only the IMUs and one camera are needed, power is saved, reducing the power consumption of the terminal device.
In addition, the coordinates of the target object in the camera coordinate system are obtained based on the detection frame and contour information corresponding to the target object in the photographed image, without relying on texture information of the target object. The technical scheme provided by the embodiment of the application therefore supports pose acquisition for objects with insufficient texture information, which improves its applicability.
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Referring to fig. 9, a block diagram of a pose acquisition device according to an embodiment of the present application is shown. The device has the function of implementing the above method examples; the function may be implemented by hardware, or by hardware executing corresponding software. The device may be the terminal device described above, or may be provided in the terminal device. As shown in fig. 9, the apparatus 900 includes: a photographed image acquisition module 901, a preliminary coordinate acquisition module 902, a preliminary pose acquisition module 903, and a preliminary pose adjustment module 904.
A photographed image acquisition module 901 for acquiring a photographed image obtained by photographing a target object by a camera.
The preliminary coordinate acquisition module 902 is configured to obtain, according to the physical size of the target object and the pixel size of the detection frame corresponding to the target object, the preliminary coordinates of the target object in the camera coordinate system, where the detection frame is used to identify the area where the target object is located in the photographed image.
The preliminary pose acquisition module 903 is configured to determine the preliminary pose of the target object in the camera coordinate system based on the preliminary coordinates and a preliminary attitude of the target object in the camera coordinate system, where the preliminary attitude is obtained based on a first inertial measurement unit set on the target object.
The preliminary pose adjustment module 904 is configured to adjust the preliminary pose according to a first contour mask corresponding to the target object to obtain the final pose of the target object in the camera coordinate system, where the first contour mask is used to identify the pixels corresponding to the target object in the photographed image.
In some embodiments, as shown in fig. 10, the preliminary pose adjustment module 904 includes: a dimension data acquisition sub-module 904a, a projection mask acquisition sub-module 904b, and a final pose acquisition sub-module 904c.
And the dimension data acquisition submodule 904a is used for acquiring dimension data of the three-dimensional model corresponding to the target object.
The projection mask obtaining submodule 904b is configured to project a three-dimensional model corresponding to the target object onto the captured image according to the size data and the preliminary pose, so as to obtain a second contour mask corresponding to the target object, where the second contour mask is used to identify pixels corresponding to the three-dimensional model in the captured image.
And a final pose acquisition submodule 904c, configured to adjust the preliminary pose according to the first contour mask and the second contour mask, so as to obtain a final pose of the target object in the camera coordinate system.
In some embodiments, the final pose acquisition submodule 904c is configured to:
Acquiring the intersection ratio between the first contour mask and the second contour mask;
and iteratively adjusting the preliminary pose with maximization of the intersection ratio as the objective, to obtain the final pose of the target object in the camera coordinate system.
In some embodiments, the projection mask acquisition submodule 904b is configured to:
Acquiring the internal parameters of the camera, the internal parameters being used to represent the optical properties of the camera;
obtaining a first adjustment parameter according to the internal parameters and the preliminary pose;
and converting the size data according to the first adjustment parameters to obtain a second contour mask corresponding to the target object.
In some embodiments, the preliminary coordinate acquisition module 902 is configured to:
Determining the maximum value of the length, the width and the height included in the physical dimension as a first dimension parameter;
determining the maximum value of the width and the height included in the pixel size as a second size parameter;
Obtaining a depth value of the target object under the camera coordinate system according to the first size parameter, the second size parameter and an internal parameter of the camera, wherein the internal parameter is used for representing the optical property of the camera;
Converting the center point coordinates of the detection frame in the photographed image according to the depth value and the internal parameters of the camera to obtain the plane coordinates of the target object in the camera coordinate system;
And combining to obtain the preliminary coordinates of the target object under the camera coordinate system according to the depth value and the plane coordinates.
In some embodiments, the preliminary pose acquisition module 903 is configured to:
acquiring a first attitude of the target object in the earth coordinate system, obtained based on the first inertial measurement unit;
acquiring a second attitude, in the earth coordinate system, of the terminal device on which the camera is arranged, and a third attitude of the terminal device in the camera coordinate system, wherein the second attitude is obtained based on a second inertial measurement unit arranged on the terminal device;
computing the product of the first attitude, the inverse of the second attitude, and the third attitude to determine the preliminary attitude of the target object in the camera coordinate system;
and combining the preliminary coordinates and the preliminary attitude to obtain the preliminary pose of the target object in the camera coordinate system.
In some embodiments, the camera is disposed in a virtual reality (VR) display device, the target object is a VR interaction device, the VR display device is configured to display a virtual environment, and the VR interaction device is configured to capture the operation actions of a user in the real environment.
In some embodiments, the target object is a VR finger ring.
In summary, according to the technical scheme provided by the embodiment of the application, the approximate coordinates of the target object in the camera coordinate system are obtained from the real size of the target object and the size of its detection frame in the photographed image; combined with the approximate attitude determined by the first inertial measurement unit on the target object, the approximate pose of the target object in the camera coordinate system is obtained. The approximate pose is then optimized according to the contour information corresponding to the target object in the photographed image, so that the pixel size of the target object in the image matches its physical size in the actual physical environment, thereby improving the accuracy and realism of the pose of the target object in the camera coordinate system and, in turn, the accuracy of pose acquisition.
It should be noted that, in the apparatus provided in the foregoing embodiment, when implementing the functions thereof, only the division of the foregoing functional modules is used as an example, in practical application, the foregoing functional allocation may be implemented by different functional modules, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
In an exemplary embodiment, the embodiment of the present application further includes the following:
1. A pose acquisition method, the method comprising:
Acquiring a photographed image obtained by photographing a target object with a camera;
obtaining preliminary coordinates of the target object in a camera coordinate system according to the physical size of the target object and the pixel size of a detection frame corresponding to the target object, wherein the detection frame is used for identifying the area where the target object is located in the photographed image;
determining a preliminary pose of the target object in the camera coordinate system based on the preliminary coordinates and a preliminary attitude of the target object in the camera coordinate system, wherein the preliminary attitude is obtained based on a first inertial measurement unit arranged on the target object;
and adjusting the preliminary pose according to a first contour mask corresponding to the target object to obtain a final pose of the target object in the camera coordinate system, wherein the first contour mask is used for identifying the pixels corresponding to the target object in the photographed image.
2. The method according to claim 1, wherein the adjusting the preliminary pose according to the first contour mask corresponding to the target object to obtain the final pose of the target object in the camera coordinate system includes:
acquiring size data of a three-dimensional model corresponding to the target object;
According to the size data and the preliminary pose, projecting the three-dimensional model corresponding to the target object onto the photographed image to obtain a second contour mask corresponding to the target object, wherein the second contour mask is used for marking pixels corresponding to the three-dimensional model in the photographed image;
and adjusting the preliminary pose according to the first contour mask and the second contour mask to obtain the final pose of the target object under the camera coordinate system.
3. The method according to claim 1 or 2, wherein said adjusting the preliminary pose according to the first contour mask and the second contour mask to obtain a final pose of the target object in the camera coordinate system comprises:
Acquiring the intersection ratio between the first contour mask and the second contour mask;
and iteratively adjusting the preliminary pose with maximization of the intersection ratio as the objective, to obtain the final pose of the target object in the camera coordinate system.
4. A method according to any one of claims 1 to 3, wherein projecting the three-dimensional model onto the captured image according to the size data and the preliminary pose to obtain a second contour mask corresponding to the target object comprises:
Acquiring the internal parameters of the camera, the internal parameters being used to represent the optical properties of the camera;
obtaining a first adjustment parameter according to the internal parameters and the preliminary pose;
and converting the size data according to the first adjustment parameters to obtain a second contour mask corresponding to the target object.
5. The method according to any one of claims 1 to 4, wherein obtaining the preliminary coordinates of the target object under the camera coordinate system according to the physical size of the target object and the pixel size of the detection frame corresponding to the target object includes:
determining the maximum of the length, width, and height included in the physical size as a first size parameter;
determining the maximum of the width and height included in the pixel size as a second size parameter;
obtaining a depth value of the target object under the camera coordinate system according to the first size parameter, the second size parameter, and the intrinsic parameters of the camera, the intrinsic parameters representing the optical properties of the camera;
converting the center-point coordinates of the detection frame in the captured image according to the depth value and the intrinsic parameters of the camera to obtain the plane coordinates of the target object under the camera coordinate system;
and combining the depth value and the plane coordinates to obtain the preliminary coordinates of the target object under the camera coordinate system (a worked sketch of this computation follows this clause list).
6. The method according to any one of claims 1 to 5, wherein determining the preliminary pose of the target object under the camera coordinate system based on the preliminary coordinates and the preliminary posture of the target object under the camera coordinate system comprises:
acquiring a first posture of the target object under the earth coordinate system, the first posture being obtained based on the first inertial measurement unit;
acquiring a second posture, under the earth coordinate system, of the terminal device in which the camera is arranged, and a third posture of the terminal device under the camera coordinate system, wherein the second posture is acquired based on a second inertial measurement unit arranged on the terminal device;
multiplying the first posture, the inverse of the second posture, and the third posture to determine the preliminary posture of the target object under the camera coordinate system;
and combining the preliminary coordinates and the preliminary posture to obtain the preliminary pose of the target object under the camera coordinate system (see the attitude-composition sketch after this clause list).
7. The method according to any one of claims 1 to 6, wherein the camera is disposed in a virtual reality (VR) display device, the target object is a VR interactive device, the VR display device is configured to present a virtual environment, and the VR interactive device is configured to capture operation actions of a user in the real environment.
8. The method of any one of claims 1-7, wherein the target object is a VR finger ring.
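The clauses above are stated in claim language; the sketches below restate the key computations in Python purely for readability. They are one possible reading of the clauses, not the patent's reference implementation, and every function and parameter name in them is an illustrative assumption. First, the IoU-driven refinement of clause 3: the sketch assumes a caller-supplied render_mask(pose) callable that rasterises the object's three-dimensional model under a candidate pose (the projection of clauses 2 and 4), and uses plain coordinate ascent as one possible form of the iterative adjustment, which the clause leaves unspecified.

    import numpy as np

    def iou(mask_a, mask_b):
        # Intersection-over-union of two boolean pixel masks.
        inter = np.logical_and(mask_a, mask_b).sum()
        union = np.logical_or(mask_a, mask_b).sum()
        return inter / union if union else 0.0

    def refine_pose(pose, observed_mask, render_mask, step=0.01, iters=50):
        # Greedily perturb each pose component and keep any change that raises
        # the IoU between the rendered mask and the observed first contour mask.
        pose = np.asarray(pose, dtype=float)
        best = iou(observed_mask, render_mask(pose))
        for _ in range(iters):
            improved = False
            for axis in range(pose.size):
                for delta in (step, -step):
                    trial = pose.copy()
                    trial[axis] += delta
                    score = iou(observed_mask, render_mask(trial))
                    if score > best:
                        pose, best, improved = trial, score, True
            if not improved:
                break  # local optimum of the IoU objective
        return pose, best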
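Next, the projection of clause 4. Reading the "first adjustment parameter" as the composed projection matrix K·[R|t] is an assumption; the clause says only that it is derived from the intrinsic parameters and the preliminary pose. The sketch maps object-frame model points to pixel coordinates, from which the second contour mask would be rasterised:

    import numpy as np

    def project_model_points(points_obj, K, R, t):
        # Project Nx3 object-frame model points into the image using the camera
        # intrinsics K (3x3) and the preliminary pose (rotation R, translation t).
        # The composed 3x4 matrix K @ [R | t] stands in for the assumed
        # "first adjustment parameter".
        P = K @ np.hstack([R, t.reshape(3, 1)])
        homog = np.hstack([points_obj, np.ones((len(points_obj), 1))])
        uvw = (P @ homog.T).T            # Nx3 homogeneous pixel coordinates
        return uvw[:, :2] / uvw[:, 2:3]  # divide out depth to get (u, v)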
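The depth recovery of clause 5 is the pinhole similar-triangles relation, depth ≈ focal length × physical size / pixel size, followed by back-projection of the detection-frame centre. A minimal sketch, assuming the intrinsic parameters are given as focal lengths and principal point (fx, fy, cx, cy) in pixels; all argument names are illustrative:

    import numpy as np

    def preliminary_coordinates(physical_lwh, bbox, fx, fy, cx, cy):
        # physical_lwh: (length, width, height) of the object in metres.
        # bbox: detection frame (u_min, v_min, u_max, v_max) in pixels.
        u_min, v_min, u_max, v_max = bbox
        s1 = max(physical_lwh)                  # first size parameter (metres)
        s2 = max(u_max - u_min, v_max - v_min)  # second size parameter (pixels)

        # Similar triangles: depth grows with focal length and physical size,
        # and shrinks as the object covers more pixels.
        z = fx * s1 / s2

        # Back-project the detection-frame centre to the camera frame at depth z.
        u_c, v_c = (u_min + u_max) / 2.0, (v_min + v_max) / 2.0
        x = (u_c - cx) * z / fx
        y = (v_c - cy) * z / fy
        return np.array([x, y, z])              # preliminary coordinates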
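Finally, the attitude composition of clause 6. Assuming each posture is a 3x3 rotation matrix, with R_a_b mapping b-frame vectors into frame a (the clause does not fix a representation; quaternions would serve equally well), the column-vector convention forces the composition order third × inverse(second) × first:

    import numpy as np

    def preliminary_orientation(R_earth_obj, R_earth_term, R_cam_term):
        # R_earth_obj:  first posture, object attitude in the earth frame (first IMU).
        # R_earth_term: second posture, attitude in the earth frame of the terminal
        #               device carrying the camera (second IMU).
        # R_cam_term:   third posture, terminal attitude in the camera frame.
        # For rotation matrices the inverse is simply the transpose.
        return R_cam_term @ R_earth_term.T @ R_earth_obj

Chaining these sketches in pipeline order (preliminary coordinates from the detection frame, preliminary posture from the IMUs, projection, then IoU refinement) mirrors clause 1.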
Referring to fig. 11, a block diagram of a terminal device 1100 according to an embodiment of the present application is shown. The terminal device is used to implement the pose acquisition method provided in the foregoing embodiments, and may be the terminal device 10 in the implementation environment shown in fig. 1.
In general, the terminal device 1100 includes a processor 1101 and a memory 1102.
Optionally, the processor 1101 may include one or more processing cores, for example a 4-core or an 8-core processor. The processor 1101 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 1101 may also include a main processor and a coprocessor; the main processor, also called a CPU (Central Processing Unit), is a processor for processing data in the awake state, while the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1101 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1101 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Optionally, the memory 1102 may include one or more computer-readable storage media, which may be non-transitory. The memory 1102 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 1102 stores a computer program that is configured to be executed by one or more processors to implement the pose acquisition method described above.
In some embodiments, the terminal device 1100 may optionally further include a peripheral interface 1103 and at least one peripheral device. The processor 1101, the memory 1102, and the peripheral interface 1103 may be connected by buses or signal lines. Each peripheral device may be connected to the peripheral interface 1103 by a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 1104, a display screen 1105, an audio circuit 1106, and a power supply 1107.
It will be appreciated by those skilled in the art that the structure shown in fig. 11 is not limiting and that terminal device 1100 may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
In some embodiments, a computer readable storage medium is also provided, in which a computer program is stored which, when executed by a processor, implements the above-described pose acquisition method.
Optionally, the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random Access Memory), SSD (Solid State Drive), an optical disc, or the like. The random access memory may include ReRAM (Resistive Random Access Memory) and DRAM (Dynamic Random Access Memory).
In some embodiments, a computer program product is also provided, the computer program product comprising a computer program stored in a computer-readable storage medium. The processor of the terminal device reads the computer program from the computer-readable storage medium and executes it, so that the terminal device performs the pose acquisition method described above.
It should be noted that the information (including but not limited to device information and personal information of the subject), data (including but not limited to data used for analysis, stored data, and displayed data), and signals involved in the present application are all authorized by the subject or fully authorized by all parties, and the collection, use, and processing of the relevant data comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the operations, the VR finger ring, and the VR head-mounted display involved in the present application are all used with sufficient authorization.
It should be understood that "a plurality" herein refers to two or more. The term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that A and B exist together, or that B exists alone. The character "/" generally indicates an "or" relationship between the associated objects. In addition, the step numbers described herein merely illustrate one possible execution order of the steps; in some other embodiments, the steps may be executed out of the numbered order, for example two differently numbered steps may be executed simultaneously, or two differently numbered steps may be executed in an order opposite to that shown, which is not limited in the present application.
The foregoing description of the exemplary embodiments of the application is not intended to limit the application to the particular embodiments disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the application.

Claims (12)

1. A pose acquisition method, the method comprising:
acquiring a captured image obtained by photographing a target object with a camera;
obtaining preliminary coordinates of the target object under a camera coordinate system according to the physical size of the target object and the pixel size of a detection frame corresponding to the target object, wherein the detection frame identifies the region of the target object in the captured image;
determining a preliminary pose of the target object under the camera coordinate system based on the preliminary coordinates and a preliminary posture of the target object under the camera coordinate system, wherein the preliminary posture is obtained based on a first inertial measurement unit arranged on the target object;
and adjusting the preliminary pose according to a first contour mask corresponding to the target object to obtain a final pose of the target object under the camera coordinate system, wherein the first contour mask identifies the pixels corresponding to the target object in the captured image.
2. The method according to claim 1, wherein adjusting the preliminary pose according to the first contour mask corresponding to the target object to obtain the final pose of the target object under the camera coordinate system includes:
acquiring size data of a three-dimensional model corresponding to the target object;
projecting, according to the size data and the preliminary pose, the three-dimensional model corresponding to the target object onto the captured image to obtain a second contour mask corresponding to the target object, wherein the second contour mask identifies the pixels corresponding to the three-dimensional model in the captured image;
and adjusting the preliminary pose according to the first contour mask and the second contour mask to obtain the final pose of the target object under the camera coordinate system.
3. The method according to claim 2, wherein adjusting the preliminary pose according to the first contour mask and the second contour mask to obtain the final pose of the target object under the camera coordinate system comprises:
acquiring the intersection-over-union (IoU) ratio between the first contour mask and the second contour mask;
and iteratively adjusting the preliminary pose with maximizing the IoU ratio as the objective, to obtain the final pose of the target object under the camera coordinate system.
4. The method according to claim 2, wherein projecting the three-dimensional model onto the captured image according to the size data and the preliminary pose to obtain the second contour mask corresponding to the target object comprises:
acquiring the intrinsic parameters of the camera, the intrinsic parameters representing the optical properties of the camera;
obtaining a first adjustment parameter according to the intrinsic parameters and the preliminary pose;
and converting the size data according to the first adjustment parameter to obtain the second contour mask corresponding to the target object.
5. The method according to claim 1, wherein obtaining the preliminary coordinates of the target object under the camera coordinate system according to the physical size of the target object and the pixel size of the detection frame corresponding to the target object includes:
determining the maximum of the length, width, and height included in the physical size as a first size parameter;
determining the maximum of the width and height included in the pixel size as a second size parameter;
obtaining a depth value of the target object under the camera coordinate system according to the first size parameter, the second size parameter, and the intrinsic parameters of the camera, the intrinsic parameters representing the optical properties of the camera;
converting the center-point coordinates of the detection frame in the captured image according to the depth value and the intrinsic parameters of the camera to obtain the plane coordinates of the target object under the camera coordinate system;
and combining the depth value and the plane coordinates to obtain the preliminary coordinates of the target object under the camera coordinate system.
6. The method according to claim 1, wherein determining the preliminary pose of the target object under the camera coordinate system based on the preliminary coordinates and the preliminary posture of the target object under the camera coordinate system comprises:
acquiring a first posture of the target object under the earth coordinate system, the first posture being obtained based on the first inertial measurement unit;
acquiring a second posture, under the earth coordinate system, of the terminal device in which the camera is arranged, and a third posture of the terminal device under the camera coordinate system, wherein the second posture is acquired based on a second inertial measurement unit arranged on the terminal device;
multiplying the first posture, the inverse of the second posture, and the third posture to determine the preliminary posture of the target object under the camera coordinate system;
and combining the preliminary coordinates and the preliminary posture to obtain the preliminary pose of the target object under the camera coordinate system.
7. The method according to claim 1, wherein the camera is disposed in a virtual reality (VR) display device, the target object is a VR interactive device, the VR display device is configured to present a virtual environment, and the VR interactive device is configured to capture operation actions of a user in the real environment.
8. The method of claim 7, wherein the target object is a VR finger ring.
9. A pose acquisition apparatus, the apparatus comprising:
a captured image acquisition module, configured to acquire a captured image obtained by photographing a target object with a camera;
a preliminary coordinate acquisition module, configured to obtain preliminary coordinates of the target object under a camera coordinate system according to the physical size of the target object and the pixel size of a detection frame corresponding to the target object, wherein the detection frame identifies the region of the target object in the captured image;
a preliminary pose acquisition module, configured to determine a preliminary pose of the target object under the camera coordinate system based on the preliminary coordinates and a preliminary posture of the target object under the camera coordinate system, wherein the preliminary posture is obtained based on a first inertial measurement unit arranged on the target object;
and a preliminary pose adjustment module, configured to adjust the preliminary pose according to a first contour mask corresponding to the target object to obtain a final pose of the target object under the camera coordinate system, wherein the first contour mask identifies the pixels corresponding to the target object in the captured image.
10. A terminal device, characterized in that it comprises a processor and a memory in which a computer program is stored, which computer program is loaded and executed by the processor to implement the pose acquisition method according to any of claims 1 to 8.
11. A computer readable storage medium having stored therein a computer program that is loaded and executed by a processor to implement the pose acquisition method according to any of claims 1 to 8.
12. A computer program product, characterized in that it comprises a computer program stored in a computer readable storage medium, from which a processor reads and executes the computer program to implement the pose acquisition method according to any of claims 1 to 8.
CN202310029633.4A 2023-01-09 Pose acquisition method, pose acquisition device, terminal and storage medium Pending CN118314200A (en)

Publications (1)

Publication Number | Publication Date
CN118314200A | 2024-07-09

Similar Documents

Publication Publication Date Title
US11481923B2 (en) Relocalization method and apparatus in camera pose tracking process, device, and storage medium
CN107820593B (en) Virtual reality interaction method, device and system
US10460512B2 (en) 3D skeletonization using truncated epipolar lines
CN107564089B (en) Three-dimensional image processing method, device, storage medium and computer equipment
CN111243093B (en) Three-dimensional face grid generation method, device, equipment and storage medium
CN104380338B (en) Information processor and information processing method
CN108525298B (en) Image processing method, image processing device, storage medium and electronic equipment
CN111710036B (en) Method, device, equipment and storage medium for constructing three-dimensional face model
CN111353930B (en) Data processing method and device, electronic equipment and storage medium
CN110866977B (en) Augmented reality processing method, device, system, storage medium and electronic equipment
CN112933599B (en) Three-dimensional model rendering method, device, equipment and storage medium
JP7387202B2 (en) 3D face model generation method, apparatus, computer device and computer program
CN111880657A (en) Virtual object control method and device, electronic equipment and storage medium
US20090129631A1 (en) Method of Tracking the Position of the Head in Real Time in a Video Image Stream
JP2020008972A (en) Information processor, information processing method, and program
EP3128413A1 (en) Sharing mediated reality content
CN112138386A (en) Volume rendering method and device, storage medium and computer equipment
US20190266798A1 (en) Apparatus and method for performing real object detection and control using a virtual reality head mounted display system
CN108028904B (en) Method and system for light field augmented reality/virtual reality on mobile devices
CN112657176A (en) Binocular projection man-machine interaction method combined with portrait behavior information
CN114419226A (en) Panorama rendering method and device, computer equipment and storage medium
KR102063408B1 (en) Method and apparatus for interaction with virtual objects
US10861174B2 (en) Selective 3D registration
CN118314200A (en) Pose acquisition method, pose acquisition device, terminal and storage medium
CN111652023B (en) Mouth-type adjustment and live broadcast method and device, electronic equipment and storage medium

Legal Events

Code | Title
PB01 | Publication