CN114283197A - Hash map-based VR handle rapid detection and pose estimation method - Google Patents


Info

Publication number: CN114283197A
Application number: CN202111588572.2A
Authority: CN (China)
Prior art keywords: points, camera, handle, hash, point
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 陈策 (Chen Ce)
Current Assignee: Shanghai Yuweia Technology Co., Ltd.
Original Assignee: Shanghai Yuweia Technology Co., Ltd.
Priority date / Filing date: 2021-12-23
Publication date: 2022-04-05
Application filed by Shanghai Yuweia Technology Co., Ltd.

Landscapes

  • Image Processing (AREA)

Abstract

The invention discloses a Hash-map-based method for fast VR handle detection and pose estimation, comprising the following steps. A. Offline table building: taking the center of the handle halo as the center of a sphere whose radius is the arm length, distribute a certain number of observation points uniformly over the sphere; traverse all observation points, place a camera at each one, and rotate the camera so that its optical axis is aligned with the center of the handle halo; project all visible points, screen out the points falling within the camera's imaging plane, compute a hash from the visible points, and store it in a map. B. During real-time operation, compute hashes for the light spots captured by the camera and look them up in the table. C. Determine the final handle pose using multi-frame images and data such as IMU, geomagnetic or GPS measurements between image frames, ensuring the accuracy of the algorithm's output. The method significantly improves running speed (saving computation compared with brute-force matching) and is more robust to outliers: outliers do not cause a large increase in computational cost.

Description

Hash map-based VR handle rapid detection and pose estimation method
Technical Field
The invention relates to the technical field of integrated virtual reality equipment, and in particular to a Hash-map-based method for fast VR handle detection and pose estimation.
Background
An integrated virtual reality device consists mainly of two modules: a head-mounted display, which tracks the motion of the user's head and displays the corresponding picture, and a handle module held by the user. The virtual reality system must likewise detect and locate the handle in three dimensions to achieve real-time interaction between the user and the virtual world. Mainstream handle-positioning schemes fall into three categories. The first is ultrasonic positioning, which uses reflective ranging and determines the object's position by multilateration: the system consists of a main range finder and several receivers; during positioning, a signal of a common frequency is sent to the receivers, which reflect it back to the main range finder, and the distance is computed from the time difference between the echo and the transmitted wave, thereby determining position and attitude. This method has a low update rate and low accuracy (only centimeter level), and is too easily disturbed by the environment. The second is the electromagnetic scheme: a transmitter emits magnetic fields along three mutually perpendicular axes into the environment, and the target object obtains its three-dimensional pose in space through signal processing and pose solving of the magnetic field strength detected by its own three-axis coil. Its outstanding advantage is that it is not limited by line-of-sight occlusion, but it is easily affected by the surrounding electromagnetic environment and is sensitive to metal objects. The last is the optical positioning scheme: LED light rings in a known arrangement are mounted on the handle, and the rings are captured by the head-mounted display or an external camera to achieve positioning; millimeter-level accuracy is attainable, and combined with the inertial navigation device on the handle, fast and accurate tracking is possible. However, a visual method can only detect and track the handle within the camera's field of view, the volume of visual data is huge, and the light spots of the left and right handles are identical after camera imaging; detecting and distinguishing the left and right handles from image data while meeting the real-time requirements of a mobile device therefore poses a great challenge to the algorithm.
Existing handle schemes have the following disadvantages: excessive computation (when the left and right handles overlap, the number of combinations to traverse is too large, and pose and projection matching must be computed for each combination, making it difficult for the algorithm to meet real-time requirements); poor environmental robustness (when other light points are present in the environment, the light points on the handle cannot be found quickly, the computational load increases, and real-time requirements cannot be met); and susceptibility to mismatches (when few handle points are visible, it is hard to judge from image information alone whether a group of light points belongs to a handle, because the group may contain environmental outliers, causing the algorithm to misjudge that a handle is present and initialize an incorrect pose). A Hash-map-based method for fast VR handle detection and pose estimation is therefore proposed.
Disclosure of Invention
The invention aims to provide a Hash-map-based method for fast VR handle detection and pose estimation, so as to solve the problems identified in the background art.
In order to achieve the above purpose, the invention provides the following technical scheme: a Hash-map-based VR handle fast detection and pose estimation method, comprising the following steps:
A. Offline table building: taking the center of the handle halo as the center of a sphere whose radius is the arm length, distribute a certain number of observation points uniformly over the sphere; traverse all observation points, place a camera at each one, and rotate the camera so that its optical axis is aligned with the center of the handle halo; project all visible points, screen out the points falling within the camera's imaging plane, compute a hash from the visible points, and store it in a map;
B. during real-time operation, compute hashes for the light spots captured by the camera and look them up in the table;
C. determine the final handle pose using multi-frame images and data such as IMU, geomagnetic or GPS measurements between image frames, ensuring the accuracy of the algorithm's output.
Preferably, the camera is placed at the observation point with its optical axis aligned with the center of the handle halo and the long edge of the imaging plane parallel to the XOY plane, and all camera-visible points are screened out by the condition n · m < 0, where n is the emission normal of a handle LED light point and m is the vector from the camera observation point to the halo point. When the number of valid points is at least 3, two light points are selected as a base; the camera is rotated so that the optical axis passes through the midpoint of the line connecting the two base points; the roll angle of the camera is then adjusted so that the two base points rotate onto the x-axis of the imaging plane and are distributed symmetrically; finally, all points undergo a scale transformation about the origin that normalizes the distance of the two base points from the origin.
Preferably, the projected coordinates of an ordinary point under the corresponding transformation are computed as p = s · Proj(R1R0P), where P is the coordinate of the corresponding light point in the camera coordinate system at the observation point, R0 is the rotation matrix of the first rotation of the optical axis, R1 is the rotation matrix of the camera roll, Proj is the camera projection, and s is the scale transformation of the projected point on the imaging plane. After the transformed valid points are obtained, all ordinary points other than the base are traversed; the x and y coordinates of each ordinary point serve as the Hash value, and the LED ids of the ordinary point and of the two base points — the LED ids of the three-point group — are stored in the Hash table as the value.
Preferably, after traversing all camera observation points and all results with a two-point group as the base, the Hash table building step is complete. The traversal has three layers: the first layer is the camera observation points; the second selects any two of the visible points as the base; the third traverses every ordinary point other than the base points, each of which yields one group of Hash values and corresponding LED ids as one result.
Preferably, during the online traversal, every pair of the light points captured by the camera is selected in turn as a base, the remaining points undergo the same rotation and scale transformation used during table building, and Hash values are computed from the transformed point coordinates and looked up in the table.
Preferably, each table lookup returns all possible LED id combinations for the three-point group. The search results of all three-point groups are counted; results with consistency are merged and voted on, each merge accumulating the number of light-point-to-LED-id matches of the two groups as the vote of the combination; finally, the n groups of LED id combinations with the most votes are output as the result of the Hash search.
Preferably, for the light spots of each frame, the matching relations between N groups of LED ids and the light spots are obtained by Hash table lookup, and the handle pose corresponding to each matching result is then obtained by a PNP algorithm. Assume two consecutive image frames I and I+1, with frame I corresponding to N possible poses and frame I+1 to M possible poses, and let the IMU angular-velocity sequence between the two image frames be ω_{i..j}. The rotation of the handle between the two frames is computed from the IMU data as

R_imu = ∏_{k=i}^{j} Exp(ω_k · Δt),

where Δt is the time interval between two IMU samples. Then, for the poses of frames I and I+1 obtained by table lookup and PNP, the rotation difference between every pair of poses is computed and compared with the rotation computed from the IMU data:

diff(n, m) = ‖ Log((R_n⁻¹ · R_m) · R_imu⁻¹) ‖.

Traversing all possible pose pairs of frames I and I+1 with this formula, the result whose rotation difference is closest to the rotation computed by the IMU is selected as the output of the pose estimation algorithm (the IMU can be extended to data such as geomagnetism or GPS to determine the final handle pose).
Preferably, a heat dissipation mechanism is arranged inside the handle and comprises two heat dissipation modules: one consists of a first temperature sensor and a cooling fan, the other of a second temperature sensor and a semiconductor cooling plate. The detection range of the first temperature sensor is 30-35 °C and that of the second temperature sensor is 36-45 °C.
Compared with the prior art, the invention has the following beneficial effects:
the method can obviously improve the operation speed (save more calculation power compared with a violent matching method), has more robust influence on the outlier, and can not cause great improvement of the calculation cost of the outlier.
Drawings
FIG. 1 is a distribution diagram of the observation points of the present invention, with the center of the handle light ring as the center of the sphere;
FIG. 2 is a distribution diagram of the points falling within the camera imaging plane according to the present invention;
FIG. 3 is a schematic diagram of the two base points of the present invention rotated onto the imaging plane;
FIG. 4 is a schematic diagram of the normalized distance of the two base points of the present invention from the origin.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1-4, a Hash-map-based VR handle fast detection and pose estimation method comprises the following steps:
A. Offline table building: taking the center of the handle halo as the center of a sphere whose radius is the arm length, distribute a certain number of observation points uniformly over the sphere; traverse all observation points, place a camera at each one, and rotate the camera so that its optical axis is aligned with the center of the handle halo; project all visible points, screen out the points falling within the camera's imaging plane, compute a hash from the visible points, and store it in a map;
B. during real-time operation, compute hashes for the light spots captured by the camera and look them up in the table;
C. determine the final handle pose using multi-frame images and data such as IMU, geomagnetic or GPS measurements between image frames, ensuring the accuracy of the algorithm's output.
The method significantly improves running speed (saving computation compared with brute-force matching) and is more robust to outliers: outliers do not cause a large increase in computational cost.
The camera is placed at the observation point with its optical axis aligned with the center of the handle halo and the long edge of the imaging plane parallel to the XOY plane. All camera-visible points are screened out by the condition

n · m < 0,

where n is the emission normal of a handle LED light point and m is the vector from the camera observation point to the halo point. When the number of valid points is at least 3, two light points are selected as a base (in this example, point 1 and point 5); the camera is rotated so that the optical axis passes through the midpoint of the line connecting the two base points; the roll angle of the camera is then adjusted so that the two base points rotate onto the x-axis of the imaging plane and are distributed symmetrically; finally, all points undergo a scale transformation about the origin that normalizes the distance of the two base points from the origin.
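By way of illustration (not part of the disclosed embodiment), the visibility screening can be sketched in a few lines of Python; the array layout and the strict dot-product threshold n · m < 0 (rather than, say, a limited emission-cone angle) are assumptions:

```python
import numpy as np

def visible_leds(led_pos, led_normals, cam_pos):
    """Back-face test: an LED is visible when its emission normal n
    points back toward the camera, i.e. n . m < 0, where m is the
    vector from the camera observation point to the LED."""
    m = np.asarray(led_pos, float) - np.asarray(cam_pos, float)
    dots = np.einsum('ij,ij->i', np.asarray(led_normals, float), m)
    return np.flatnonzero(dots < 0)
```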
The projected coordinates of an ordinary point under the corresponding transformation are computed as p = s · Proj(R1R0P), where P is the coordinate of the corresponding light point in the camera coordinate system at the observation point, R0 is the rotation matrix of the first rotation of the optical axis, R1 is the rotation matrix of the camera roll, Proj is the camera projection, and s is the scale transformation of the projected point on the imaging plane. After the transformed valid points are obtained, all ordinary points other than the base are traversed; the x and y coordinates of each ordinary point serve as the Hash value, and the LED ids of the ordinary point and of the two base points — the LED ids of the three-point group — are stored in the Hash table as the value (the Hash table may have collisions, i.e. the same Hash value may correspond to different three-point groups).
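The transformation p = s · Proj(R1R0P) and the Hash key can be sketched as follows; this is a hedged illustration in which the Rodrigues helper, the unit-focal-length pinhole model, and the quantization grid used for the key are assumptions filled in around the description above:

```python
import numpy as np

def rot_between(a, b):
    """Rotation matrix taking unit vector a onto unit vector b
    (Rodrigues form; the antiparallel case a = -b is not handled)."""
    v, c = np.cross(a, b), float(np.dot(a, b))
    K = np.array([[0, -v[2], v[1]],
                  [v[2], 0, -v[0]],
                  [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K / (1.0 + c)

def normalized_projection(P, ia, ib):
    """p = s * Proj(R1 R0 P): R0 turns the optical axis through the
    midpoint of base points ia/ib, the roll R1 lays the base on the
    image x-axis, and s normalizes the base distance from the origin."""
    P = np.asarray(P, float)
    mid = 0.5 * (P[ia] + P[ib])
    R0 = rot_between(mid / np.linalg.norm(mid), np.array([0.0, 0.0, 1.0]))
    Q = (R0 @ P.T).T
    q = Q[:, :2] / Q[:, 2:3]                  # pinhole projection, f = 1
    d = q[ib] - q[ia]
    ang = np.arctan2(d[1], d[0])
    c, s_ = np.cos(ang), np.sin(ang)
    q = q @ np.array([[c, -s_], [s_, c]])     # roll by -ang: base onto x-axis
    return q * (2.0 / np.linalg.norm(q[ib] - q[ia]))  # base separation -> 2

def hash_key(xy, grid=0.05):
    """Quantize the normalized (x, y) so nearby observations collide."""
    return (round(float(xy[0]) / grid), round(float(xy[1]) / grid))
```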
Traversing all camera observation points and all results with a two-point group as the base completes the Hash table building step. The traversal has three layers, as sketched below: the first layer is the camera observation points; the second selects any two of the visible points as the base; the third traverses every ordinary point other than the base points, each of which yields one group of Hash values and corresponding LED ids as one result.
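Under the same assumptions, the three-layer traversal can be sketched as below, reusing visible_leds, normalized_projection and hash_key from the previous blocks; the sphere sampling and the (N, 3) ndarray layouts are placeholders, and collisions are kept as a list per key:

```python
from collections import defaultdict
from itertools import combinations

def build_hash_table(obs_points, led_pos, led_normals):
    """Layer 1: camera observation points on the arm-length sphere.
    Layer 2: each pair of visible points as a base.
    Layer 3: each remaining ordinary point -> one (key, LED triple) entry."""
    table = defaultdict(list)
    for cam in obs_points:
        vis = visible_leds(led_pos, led_normals, cam)
        if len(vis) < 3:
            continue
        P = led_pos[vis] - cam   # LED coordinates relative to the camera
        for ia, ib in combinations(range(len(vis)), 2):
            q = normalized_projection(P, ia, ib)
            for ic in range(len(vis)):
                if ic in (ia, ib):
                    continue
                table[hash_key(q[ic])].append(
                    (int(vis[ia]), int(vis[ib]), int(vis[ic])))
    return table
```

Because the base midpoint is re-aligned to the optical axis and the roll is fixed by the base direction, the normalized coordinates do not depend on the initial camera orientation, which is why the relative coordinates P suffice here.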
During the online traversal, every pair of the light points captured by the camera is selected in turn as a base, the remaining points undergo the same rotation and scale transformation used during table building, and Hash values are computed from the transformed point coordinates and looked up in the table.
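A sketch of the online side under the same assumptions: detected 2-D spots are back-projected to camera rays with an intrinsic matrix K (an assumed input), and the table-building normalization is reused for every two-spot base:

```python
import numpy as np
from itertools import combinations

def lookup_spots(spots_px, K, table, grid=0.05):
    """Return candidate hypotheses {spot_index: led_id} from the table.
    spots_px: (N, 2) pixel coordinates of the captured light spots."""
    rays = (np.linalg.inv(K) @ np.c_[spots_px, np.ones(len(spots_px))].T).T
    hypotheses = []
    for ia, ib in combinations(range(len(rays)), 2):
        q = normalized_projection(rays, ia, ib)
        for ic in range(len(rays)):
            if ic in (ia, ib):
                continue
            for la, lb, lc in table.get(hash_key(q[ic], grid), []):
                hypotheses.append({ia: la, ib: lb, ic: lc})
    return hypotheses
```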
Each table lookup returns all possible LED id combinations for the three-point group. The search results of all three-point groups are counted, and results with consistency are merged and voted on. Two groups of results are consistent when at least two of their light-point-to-LED-id matches agree and the matches of the remaining points do not conflict; for example, if one group holds that light point 3 is LED No. 5 while another holds that light point 3 is LED No. 7, the two groups conflict and cannot be merged. Each merge accumulates the number of light-point-to-LED-id matches of the two groups as the vote of the combination; for example, if one group contains 5 matches and the other 3, and the two are judged consistent, their matching results are merged and recorded with 8 votes. Finally, the n groups of LED id combinations with the most votes are output as the result of the Hash search.
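The merging-and-voting rule can be sketched as follows; the "at least two shared matches" condition and the additive vote follow the text above, while the greedy first-fit merge order and the data layout are assumptions:

```python
def consistent(a, b):
    """Two hypotheses agree on at least two spot->LED matches and
    conflict on none of their shared spots."""
    shared = set(a) & set(b)
    agree = sum(a[s] == b[s] for s in shared)
    return agree >= 2 and agree == len(shared)

def merge_and_vote(hypotheses, top_n=2):
    """hypotheses: dicts {spot_index: led_id} from the table lookups.
    Each merge adds the incoming hypothesis's match count to the vote,
    e.g. a 5-match group merged with a 3-match group scores 8 votes."""
    merged = []                              # [assignment, votes] pairs
    for h in hypotheses:
        for entry in merged:
            if consistent(entry[0], h):
                entry[0].update(h)           # union of the two match sets
                entry[1] += len(h)
                break
        else:
            merged.append([dict(h), len(h)])
    merged.sort(key=lambda e: -e[1])
    return merged[:top_n]                    # n best LED id combinations
```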
For the light spots of each frame, the matching relations between N groups of LED ids and the light spots are obtained by Hash table lookup, and the handle pose corresponding to each matching result is then obtained by a PNP algorithm (i.e., each image frame corresponds to N possible handle poses). Assume two consecutive image frames I and I+1, with frame I corresponding to N possible poses and frame I+1 to M possible poses, and let the IMU angular-velocity sequence between the two image frames be ω_{i..j}. The rotation of the handle between the two frames is computed from the IMU data as

R_imu = ∏_{k=i}^{j} Exp(ω_k · Δt),

where Δt is the time interval between two IMU samples. The rotation difference between every pair of candidate poses is then computed and compared with the rotation computed from the IMU data:

diff(n, m) = ‖ Log((R_n⁻¹ · R_m) · R_imu⁻¹) ‖.

Traversing all possible pose pairs of frames I and I+1 with this formula, the result whose rotation difference is closest to the rotation computed by the IMU is selected as the output of the pose estimation algorithm. This part can be extended to various sensors — for example the geomagnetic sensor on the handle, a GPS, or a camera mounted on the handle — by obtaining the rotation of the handle from each, computing the difference between that rotation and the rotation obtained by table lookup, and selecting the result with the highest consistency as the final output.
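A sketch of this selection step under the stated assumptions, using scipy's Rotation for the Exp/Log operations; the candidate poses are assumed to be given as 3 × 3 rotation matrices from the PNP step:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def imu_delta_rotation(omegas, dt):
    """R_imu = prod_k Exp(omega_k * dt): integrate the gyro samples
    recorded between image frames I and I+1 (dt: IMU sample interval)."""
    r = R.identity()
    for w in omegas:
        r = r * R.from_rotvec(np.asarray(w, float) * dt)
    return r

def select_pose_pair(rots_i, rots_j, r_imu):
    """Pick, among the N x M candidate rotations of frames I and I+1,
    the pair whose relative rotation is geodesically closest to the
    IMU-integrated rotation."""
    best, best_err = None, np.inf
    for Ri in rots_i:
        for Rj in rots_j:
            rel = R.from_matrix(Ri).inv() * R.from_matrix(Rj)
            err = (rel * r_imu.inv()).magnitude()  # rotation-difference angle
            if err < best_err:
                best, best_err = (Ri, Rj), err
    return best, best_err
```

The same comparison can be repeated against a geomagnetic or GPS-derived rotation in place of r_imu, matching the sensor extension described above.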
A heat dissipation mechanism is arranged inside the handle and comprises two heat dissipation modules: one consists of a first temperature sensor and a cooling fan, the other of a second temperature sensor and a semiconductor cooling plate. The detection range of the first temperature sensor is 30-35 °C and that of the second is 36-45 °C. When the handle temperature is 30-35 °C, the first temperature sensor automatically switches on the cooling fan for first-level heat dissipation; when it is 36-45 °C, the second temperature sensor automatically switches on the semiconductor cooling plate for second-level heat dissipation, ensuring the normal operation of all electrical components in the handle and improving the usage effect.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A Hash-map-based VR handle fast detection and pose estimation method, characterized by comprising the following steps:
A. Offline table building: taking the center of the handle halo as the center of a sphere whose radius is the arm length, distribute a certain number of observation points uniformly over the sphere; traverse all observation points, place a camera at each one, and rotate the camera so that its optical axis is aligned with the center of the handle halo; project all visible points, screen out the points falling within the camera's imaging plane, compute a hash from the visible points, and store it in a map;
B. during real-time operation, compute hashes for the light spots captured by the camera and look them up in the table;
C. determine the final handle pose using multi-frame images and data such as IMU, geomagnetic or GPS measurements between image frames, ensuring the accuracy of the algorithm's output.
2. The Hash-map-based VR handle fast detection and pose estimation method of claim 1, characterized in that: the camera is placed at the observation point with its optical axis aligned with the center of the handle halo and the long edge of the imaging plane parallel to the XOY plane, and all camera-visible points are screened out by the condition n · m < 0, where n is the emission normal of a handle LED light point and m is the vector from the camera observation point to the halo point; when the number of valid points is at least 3, two light points are selected as a base, the camera is rotated so that the optical axis passes through the midpoint of the line connecting the two base points, the roll angle of the camera is then adjusted so that the two base points rotate onto the x-axis of the imaging plane and are distributed symmetrically, and all points then undergo a scale transformation about the origin that normalizes the distance of the two base points from the origin.
3. The Hash-map-based VR handle fast detection and pose estimation method of claim 2, characterized in that: the projected coordinates of an ordinary point under the corresponding transformation are computed as p = s · Proj(R1R0P), where P is the coordinate of the corresponding light point in the camera coordinate system at the observation point, R0 is the rotation matrix of the first rotation of the optical axis, R1 is the rotation matrix of the camera roll, Proj is the camera projection, and s is the scale transformation of the projected point on the imaging plane; after the transformed valid points are obtained, all ordinary points other than the base are traversed, the x and y coordinates of each ordinary point serve as the Hash value, and the LED ids of the ordinary point and of the two base points — the LED ids of the three-point group — are stored in the Hash table as the value.
4. The Hash-map-based VR handle fast detection and pose estimation method of claim 1, characterized in that: traversing all camera observation points and all results with a two-point group as the base completes the Hash table building step (the traversal has three layers: the first layer is the camera observation points, the second selects any two of the visible points as the base, the third traverses every ordinary point other than the base points, and each ordinary point yields one group of Hash values and corresponding LED ids as one result).
5. The Hash-map-based VR handle fast detection and pose estimation method of claim 1, characterized in that: during the online traversal, every pair of the light points captured by the camera is selected in turn as a base, the remaining points undergo the same rotation and scale transformation used during table building, and Hash values are computed from the transformed point coordinates and looked up in the table.
6. The Hash-map-based VR handle fast detection and pose estimation method of claim 5, characterized in that: each table lookup returns all possible LED id combinations for the three-point group; the search results of all three-point groups are counted, results with consistency are merged and voted on, each merge accumulating the number of light-point-to-LED-id matches of the two groups as the vote of the combination, and finally the n groups of LED id combinations with the most votes are output as the result of the Hash search.
7. The Hash-map-based VR handle fast detection and pose estimation method of claim 1, characterized in that: for the light spots of each frame, the matching relations between N groups of LED ids and the light spots are obtained by Hash table lookup, and the handle pose corresponding to each matching result is then obtained by a PNP algorithm; assuming two consecutive image frames I and I+1, with frame I corresponding to N possible poses, frame I+1 to M possible poses, and the IMU angular-velocity sequence between the two image frames being ω_{i..j}, the rotation of the handle between the two frames is computed from the IMU data as

R_imu = ∏_{k=i}^{j} Exp(ω_k · Δt),

where Δt is the time interval between two IMU samples; the rotation difference between every pair of poses is then computed and compared with the rotation computed from the IMU data,

diff(n, m) = ‖ Log((R_n⁻¹ · R_m) · R_imu⁻¹) ‖;

traversing all possible pose pairs of frames I and I+1 with this formula, the result whose rotation difference is closest to the rotation computed by the IMU is selected as the output of the pose estimation algorithm.
8. The Hash-map-based VR handle fast detection and pose estimation method of claim 1, characterized in that: a heat dissipation mechanism is arranged inside the handle and comprises two heat dissipation modules, one consisting of a first temperature sensor and a cooling fan, the other of a second temperature sensor and a semiconductor cooling plate; the detection range of the first temperature sensor is 30-35 °C and that of the second temperature sensor is 36-45 °C.
Application CN202111588572.2A (priority date 2021-12-23, filing date 2021-12-23): Hash map-based VR handle rapid detection and pose estimation method — Pending, CN114283197A

Priority Applications (1)

Application CN202111588572.2A (priority date 2021-12-23, filing date 2021-12-23): Hash map-based VR handle rapid detection and pose estimation method

Publications (1)

Publication CN114283197A, published 2022-04-05

Family

ID: 80874491

Family Applications (1)

Application CN202111588572.2A (priority date 2021-12-23, filing date 2021-12-23): Hash map-based VR handle rapid detection and pose estimation method — Pending (CN114283197A)

Country Status (1)

CN: CN114283197A


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
CB02: Change of applicant information

Address after: 201612 Room 501, building 3, No. 1, caosong Road, Xinqiao Town, Songjiang District, Shanghai
Applicant after: Play Out Dreams (Shanghai) Technology Co., Ltd.
Address before: 201612 Room 501, building 3, No. 1, caosong Road, Xinqiao Town, Songjiang District, Shanghai
Applicant before: Shanghai Yuweia Technology Co., Ltd.