WO2024061238A1 - A method for estimating handle pose and a virtual display device - Google Patents

A method for estimating handle pose and a virtual display device

Info

Publication number
WO2024061238A1
Authority
WO
WIPO (PCT)
Prior art keywords
handle
light spot
light
target
light emitter
Prior art date
Application number
PCT/CN2023/119844
Other languages
English (en)
French (fr)
Inventor
黄志明
史灿灿
曾杰
周祺晟
郑贵桢
Original Assignee
海信电子科技(深圳)有限公司
Priority date
Filing date
Publication date
Priority claimed from CN202211149262.5A external-priority patent/CN116433569A/zh
Priority claimed from CN202211183832.2A external-priority patent/CN116430986A/zh
Priority claimed from CN202211390797.1A external-priority patent/CN116433752A/zh
Application filed by 海信电子科技(深圳)有限公司
Publication of WO2024061238A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer

Definitions

  • the present disclosure relates to the field of virtual reality interaction technology and provides a method for estimating the posture of a handle and a virtual display device.
  • handles are usually used to achieve regular interactions, just like the control relationship between a personal computer (PC) and a mouse.
  • the premise of realizing interaction with the virtual world through a handle is to obtain the 6DOF pose between the handle and the virtual display device, so that the handle can control the display screen of the virtual display device based on the 6DOF pose. The position and posture of the handle relative to the virtual display device therefore determine the accuracy of the handle's control over the virtual display device and affect the user's immersive experience, which gives this problem important research value.
  • the present disclosure provides a method for estimating the pose of a handle and a virtual display device, which are used to improve the accuracy of relative pose estimation between the handle and the virtual display device.
  • the present disclosure provides a method for estimating the pose of a handle, which is applied to a virtual display device.
  • the virtual display device interacts with the handle.
  • the handle is used to control the screen displayed by the virtual display device.
  • the handle is equipped with an IMU and multiple light emitters, the virtual display device is equipped with a multi-camera whose type matches the light-emitting type of the light emitters, and the method includes:
  • for the first frame of the target handle image collected by the camera, a target light spot set of each light emitter is obtained according to the target handle image, and the relative pose between the handle and the virtual display device is initialized based on the target light spot set, the observation data synchronously collected by the IMU, and the optimized 3D spatial structure of each light emitter on the handle; the 3D spatial structure is the result of optimizing the annotation of each light emitter in multiple frames of initial handle images collected at different positions and angles;
  • for non-first-frame target handle images, the current relative pose between the handle and the virtual display device is predicted and, combined with the observation data continuously collected by the IMU, the current target relative pose between the handle and the virtual display device is determined.
  • the present disclosure provides a virtual display device, which includes a processor, a memory, a display screen, a communication interface, and a multi-camera.
  • the display screen is used to display images.
  • the virtual display device communicates with a handle through the communication interface, the handle is used to control the picture displayed on the display screen, and the type of the multi-camera matches the light-emitting type of the multiple light emitters on the handle;
  • the communication interface, the multi-camera, the display screen, the memory and the processor are connected through a bus, the memory stores a computer program, and the processor performs the following operations according to the computer program:
  • for the first frame of the target handle image collected by the multi-camera, a target light spot set of each light emitter is obtained according to the target handle image, and the relative pose between the handle and the virtual display device is initialized based on the target light spot set, the observation data synchronously collected by the IMU on the handle, and the optimized 3D spatial structure of each light emitter; the 3D spatial structure is the result of optimizing the annotation of each light emitter in multiple frames of initial handle images collected at different positions and angles;
  • for non-first-frame target handle images, the current relative pose between the handle and the virtual display device is predicted and, combined with the observation data continuously collected by the IMU, the current target relative pose between the handle and the virtual display device is determined.
  • the processor optimizes the 3D spatial structure of each light emitter on the handle in the following manner:
  • the 3D coordinates and first identification of each light emitter are obtained;
  • for the light emitters pre-marked on the multiple frames of initial handle images collected at different positions and angles, the 2D coordinates and second identification of the light spot formed by each light emitter on the corresponding initial handle image are obtained;
  • for each frame of the initial handle image, the relative pose between the handle and the acquisition camera is determined based on the 3D coordinates of the light emitter and the 2D coordinates of the light spot that share the same first identification and second identification, as well as the observation data of the IMU corresponding to that frame.
  • the processor optimizes the 3D spatial structure of each light emitter on the handle and also performs:
  • the conversion pose between the first 3D point cloud, composed of each light emitter on the handle according to the optimized 3D spatial structure, and the second 3D point cloud, composed of each light emitter on the handle according to the pre-optimization 3D spatial structure, is determined;
  • the 3D coordinates of each light emitter on the handle are re-determined to obtain the second optimized 3D spatial structure.
  • the reprojection error equation is:
  • K n represents the projection parameter of the nth camera
  • the rotation matrix and translation vector between the handle and camera No. 0, and the rotation matrix and translation vector between the nth camera and camera No. 0, are denoted respectively;
  • p m,n represents the 2D coordinates of the light spot with second identification m in the initial handle image captured by the nth camera.
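  • The formula itself is not reproduced in this text. A hedged reconstruction that is consistent with the symbols listed above (the patent's exact notation may differ; π denotes perspective projection) is:

$$
\min_{R_h,\,t_h,\,\{P_m\}}\;\sum_{n}\sum_{m}\Big\|\,p_{m,n}-\pi\!\big(K_n\big[R_n\,(R_h P_m + t_h)+t_n\big]\big)\Big\|^{2}
$$

  • where (R_h, t_h) is the pose of the handle relative to camera No. 0, (R_n, t_n) is the pose of camera No. 0 relative to the nth camera, P_m is the 3D coordinate of the light emitter with first identification m, and p_{m,n} is the 2D light spot with second identification m observed by the nth camera.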
  • the processor obtains the target spot set of each light emitter based on the target handle image.
  • the specific operations are:
  • Obtain the current environment brightness, determine the respective binarization thresholds of at least two binarization methods according to the current environment brightness, and perform binarization processing on the target handle image according to each binarization threshold to obtain the binarized handle image;
  • Contour detection is performed within the global scope of the binary handle image to obtain a set of candidate contours for each light emitter, where each contour represents a light spot;
  • abnormal contours in the candidate contour set are eliminated to obtain the target light spot set of each light emitter.
  • the processor determines the respective binarization thresholds of at least two binarization methods based on the current ambient brightness, and performs binarization processing on the visible light handle image according to each binarization threshold to obtain the binarized handle image.
  • the specific operations are:
  • the target binarization threshold is obtained by weighting the at least two binarization thresholds with their corresponding weights;
  • the grayscale handle image is binarized according to the target binarization threshold to obtain a binarized handle image.
  • the processor determines respective weights corresponding to the at least two binarized thresholds based on the comparison results.
  • the specific operations are:
  • if the comparison result indicates that the current ambient brightness is greater than the preset brightness threshold, the first weight corresponding to the first binarization threshold calculated by the first binarization method is set to be greater than the second weight corresponding to the second binarization threshold calculated by the second binarization method;
  • otherwise, the first weight corresponding to the first binarization threshold is set to be smaller than the second weight corresponding to the second binarization threshold calculated by the second binarization method;
  • the first binarization method is used to solve the histogram distribution containing a single peak
  • the second binarization method is used to solve the histogram distribution containing a double peak
  • the processor performs light spot detection within the global scope of the binary handle image to obtain the target light spot set of each light emitter.
  • the specific operations are:
  • abnormal contours in the candidate contour set are eliminated to obtain the target light spot set of each light emitter.
  • the processor eliminates abnormal contours in the candidate contour set based on the contour contrast information, including one or more of the following:
  • the Euclidean distance between the center points of the circumscribed rectangles of the two candidate contours and the minimum Manhattan distance between the edges of the two candidate contours are determined respectively, and abnormal contours are eliminated based on the Euclidean distance and the minimum Manhattan distance;
  • For each candidate contour in the candidate contour set calculate the distance between the candidate contour and the nearest neighbor candidate contour, and eliminate outliers and abnormal contours based on the distance;
  • the processor removes abnormal contours based on the Euclidean distance and the minimum Manhattan distance.
  • the specific operations are:
  • the brightness averages of the two candidate contours are calculated respectively, and the candidate contour with the smaller brightness average is eliminated.
  • the processor eliminates abnormal contours based on the quantitative relationship between pixels in the candidate contour with the largest area and the candidate contour with the second largest area.
  • the specific operations are:
  • the candidate contour with the largest area is eliminated.
  • the processor removes outlier abnormal contours according to the distance, and the specific operation is:
  • the candidate contours are eliminated.
  • the method by which the processor eliminates abnormal contours from the candidate contour set based on the a priori contour shape information includes one or more of the following:
  • the processor initializes the relative pose between the handle and the virtual display device according to the target light spot set, the observation data synchronously collected by the IMU, and the optimized 3D spatial structure of each light emitter on the handle. The specific operations are:
  • the relative posture between the handle and the virtual display device is initialized.
  • the processor matches each light emitter in the optimized 3D spatial structure with the target light spots in the target light spot set, and establishes a correspondence relationship between 3D light emitters and 2D light spots; the specific operations are:
  • for any target light spot in the target light spot set, a first specified number of candidate light spots adjacent to the target light spot are selected from the target light spot set, and the target light spot is connected with the first specified number of candidate light spots to obtain a planar figure;
  • each light spot in the planar figure is matched with each light emitter in the set of actually adjacent light emitters to obtain adjacent light spot matching pairs, where each adjacent light spot matching pair includes the image spot index of a light spot and the first identification of the light emitter matched to that light spot;
  • for any predicted pose, each light emitter is projected into the specified image according to the predicted pose to obtain projected light spots, and based on the projected light spots, the other light spots in the specified image, excluding the light spots contained in the planar figure, are matched with the light emitters on the handle to obtain other light spot matching pairs, where each other light spot matching pair includes the image spot index of the other light spot and the first identification of the light emitter corresponding to the matched projected light spot;
  • the light spot matching pairs are screened according to the number of the other light spot matching pairs, the target light spot matching pairs are obtained according to the number of the screened light spot matching pairs, and the first identification of the light emitter in each target light spot matching pair is determined as the second identification of the target light spot corresponding to the image spot index; the light spot matching pairs include the adjacent light spot matching pairs and the other light spot matching pairs, and each matching pair represents the correspondence between a 3D light emitter and a 2D light spot.
  • the processor selects a first specified number of candidate light spots adjacent to the target light spot from the target light spot set.
  • the specific operations are:
  • the distances between the target light spot and the other light spots are obtained, and the first specified number of light spots with the smallest distances are selected as the candidate light spots adjacent to the target light spot.
  • the processor matches each light spot in the planar figure with each light emitter in the set of actually adjacent light emitters, which is determined according to the optimized 3D spatial structure, to obtain the adjacent light spot matching pairs. The specific operations are:
  • traverse each light emitter in the set of actually adjacent light emitters in a specified order; for the currently traversed light emitter, take that light emitter as the initial position and sort the other light emitters actually adjacent to it in the specified order to obtain a sorted list;
  • for any light emitter in the sorted list, the first identification of the light emitter and the image spot index of the light spot whose position in the light spot list is the same as the position of the light emitter in the sorted list are added to the same adjacent light spot matching pair;
  • before projecting each light emitter into the specified image according to the predicted pose, the processor further executes:
  • the adjacent light spot matching pairs that need to be deleted are determined based on the predicted gravity direction vector corresponding to each adjacent light spot matching pair and the actual direction vector, and the adjacent light spot matching pairs that need to be deleted are deleted.
  • the processor determines the adjacent light spot matching pairs that need to be deleted through the predicted gravity direction vector corresponding to each adjacent light spot matching pair and the actual direction vector.
  • the specific operations are:
  • the adjacent light spot matching pair is determined to be the adjacent light spot matching pair that needs to be deleted.
  • the processor matches the other light spots in the specified image, excluding the light spots included in the planar figure, with each light emitter on the handle according to each of the projected light spots, to obtain the other light spot matching pairs.
  • the specific operations are:
  • if the shortest distance among the distances is less than the specified distance, the image spot index of the other light spot and the first identification of the light emitter corresponding to the projected light spot with the shortest distance are added to the same light spot matching pair, and that light spot matching pair is determined as an other light spot matching pair.
  • the processor filters each light spot matching pair according to the number of each other light spot matching pair, and obtains each target light spot matching pair according to the number of each filtered light spot matching pair.
  • the specific operations are:
  • the light spot matching pair with the largest number among the light spot matching pairs is determined as the target light spot matching pair corresponding to the image light spot index.
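  • As a concrete illustration of the projection-and-nearest-distance matching described above, the following Python sketch (using OpenCV) projects the handle's light emitters with a candidate predicted pose and pairs each remaining 2D spot with its nearest projection. All variable names and the pixel-distance threshold are illustrative assumptions, not identifiers or values from the patent.

```python
import cv2
import numpy as np

def match_other_spots(emitters_3d, emitter_ids, spots_2d,
                      rvec_pred, tvec_pred, K, dist_coeffs,
                      max_pixel_dist=8.0):
    """Return (image_spot_index, first_identification) pairs for spots whose
    nearest projected emitter lies within max_pixel_dist pixels."""
    projected, _ = cv2.projectPoints(
        np.asarray(emitters_3d, dtype=np.float64),
        rvec_pred, tvec_pred, K, dist_coeffs)
    projected = projected.reshape(-1, 2)

    pairs = []
    for spot_idx, spot in enumerate(spots_2d):
        dists = np.linalg.norm(projected - np.asarray(spot, np.float64), axis=1)
        best = int(np.argmin(dists))
        if dists[best] < max_pixel_dist:        # keep only confident matches
            pairs.append((spot_idx, emitter_ids[best]))
    return pairs
```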
  • the processor determines the current target relative pose between the handle and the virtual display device based on the predicted current relative pose and the observation data continuously collected by the IMU.
  • the specific operations are as follows:
  • the local scope of the position of each light emitter in the current target handle image is determined according to the predicted current relative pose;
  • according to the relative pose between the IMU and the handle, the pose of the IMU, and the pose of the camera, the current target relative pose between the handle and the virtual display device is obtained.
  • the pre-integration constraint equation is:
  • the reprojection constraint equation is:
  • the result of combining the pre-integration constraint equation and the reprojection constraint equation is:
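  • The constraint equations themselves are not reproduced in this text. A generic visual-inertial joint cost of the kind described, given here as a hedged reconstruction rather than the patent's exact formula, combines an IMU pre-integration residual and a light-spot reprojection residual over the optimized states X (handle poses, velocities, and IMU biases):

$$
\min_{\mathcal{X}}\;\sum_{j}\big\|\,r_{\mathrm{IMU}}(z_{j,j+1},\mathcal{X})\,\big\|^{2}_{\Sigma_{\mathrm{IMU}}}
\;+\;\sum_{m,n}\big\|\,p_{m,n}-\pi\!\big(K_n\,T_{c_n w}\,T_{w h}\,P_m\big)\,\big\|^{2}_{\Sigma_{\mathrm{cam}}}
$$

  • where z_{j,j+1} are the pre-integrated IMU measurements between frames j and j+1, T_{wh} is the handle pose in the world frame, T_{c_n w} maps world coordinates into the nth camera frame, and P_m, p_{m,n} are a 3D light emitter and its observed 2D light spot.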
  • the present disclosure provides a computer-readable storage medium storing computer-executable instructions for causing a computer device to perform the method for estimating a handle pose according to some embodiments.
  • an IMU and multiple light emitters are installed on the handle, and a multi-camera is installed on the virtual display device, and the type of the camera matches the type of the light emitter.
  • the relative posture between the handle and the virtual display device enables the handle to control the picture displayed by the virtual display device and completes the interaction with the virtual world.
  • multiple frames of initial handle images are collected from different positions and angles to ensure that all light emitters on the handle are captured, and the 3D spatial structure of the light emitters is optimized based on the light emitters in these initial handle images, which improves the accuracy of subsequent relative pose calculations;
  • in the pose estimation process, the relative pose between the handle and the virtual display device is initialized based on the optimized 3D spatial structure, the target light spot set extracted from the first frame of the target handle image collected by each camera, and the observation data of the IMU; since interference from environmental factors is eliminated when extracting the target light spot set, the accuracy of the relative pose calculation is improved;
  • for non-first-frame target handle images collected by the camera, the relative pose between the handle and the virtual display device corresponding to the current target handle image is predicted based on the relative pose corresponding to the historical target handle images, and is then combined with the observation data of the IMU to jointly optimize the relative pose through visual-inertial navigation, obtaining a smooth and accurate target relative pose between the current handle and the virtual display device.
  • Figure 1 is a schematic diagram of application scenarios of VR equipment and handles according to some embodiments
  • Figure 2A is a schematic diagram of a virtual display device including a multi-camera according to some embodiments
  • Figure 2B is a schematic diagram of a 6DOF handle including multiple LED white light lamps according to some embodiments
  • Figure 2C is a schematic diagram of a 6DOF handle including multiple LED infrared lights according to some embodiments
  • Figure 3A is a schematic diagram of light emitter abnormality detection according to some embodiments.
  • Figure 3B is a schematic diagram of abnormal detection of a light emitter according to some embodiments.
  • Figure 4 is an overall architecture diagram of a method for estimating handle pose according to some embodiments.
  • Figure 5 is a flow chart of a method for optimizing the 3D spatial structure of each light emitter on the handle according to some embodiments
  • Figure 6A is a handle image collected by a binocular infrared camera before labeling according to some embodiments
  • Figure 6B is a handle image collected by a binocular infrared camera after labeling according to some embodiments
  • Figure 7 is a schematic diagram of the PnP principle according to some embodiments.
  • Figure 8 is an architecture diagram of visual-inertial joint optimization for estimating handle pose according to some embodiments
  • Figure 9 is a flow chart of a method for jointly estimating handle pose with visual inertial navigation according to some embodiments.
  • Figure 10 is a flow chart of a light spot detection method according to some embodiments.
  • Figure 11 is a flow chart of a method for image binarization processing according to some embodiments.
  • Figure 12 is a flowchart of a method for eliminating abnormal contours using the Euclidean distance and the minimum Manhattan distance between each two candidate contours according to some embodiments;
  • Figure 13 is a flowchart of a method for eliminating abnormal contours by utilizing the quantitative relationship between pixels in the two selected candidate contours according to some embodiments;
  • Figure 14 is a flowchart of a method for eliminating outlier abnormal contours using the distance between candidate contours and nearest neighbor candidate contours according to some embodiments
  • Figure 15 is a flow chart of a method for matching 2D light spots with 3D light emitters according to some embodiments
  • Figure 16 is a schematic plan view of adjacent light spots according to some embodiments.
  • Figure 17 is a flowchart of a method for quickly matching each light spot in a planar graphic with a set of actual adjacent light emitters according to some embodiments
  • Figure 18 is a flowchart of a method for screening adjacent light spot matching pairs according to some embodiments.
  • Figure 19 is a flow chart of a method for determining other light spot matching pairs according to some embodiments.
  • Figure 20 is a flowchart of a method for real-time estimating the relative pose between a handle and a virtual display device according to some embodiments
  • Figure 21 is a structural diagram of a virtual display device according to some embodiments.
  • Virtual display devices such as AR and VR generally refer to head-mounted display devices (referred to as head displays or helmets, such as VR glasses, AR glasses, etc.) with independent processors, which have independent computing, input and output functions.
  • Virtual display devices can be connected to external handles, and users can control the virtual images displayed by the virtual display devices by operating the handles to achieve conventional interactions.
  • Figure 1 is a schematic diagram of an application scene of a virtual display device and a handle according to some embodiments
  • the player uses the handle to interact with the virtual world.
  • through the relative pose between the handle and the virtual display device, the player controls the game screen of the virtual display device and responds to changes in the game scene with physical movements, so as to gain an immersive experience and enhance the fun of the game.
  • the virtual game screen of the virtual display device is projected on the TV, which is more entertaining.
  • handles include 3DOF handles and 6DOF handles.
  • 3DOF handles output a 3-dimensional rotation posture
  • a 6DOF handle outputs a 3-dimensional translation position and a 3-dimensional rotation posture.
  • the game actions that the 6DOF controller can make are more complex and more interesting.
  • the light emitters can emit different types of light (such as infrared light, white light, etc.), and the type of the multi-camera on the virtual display device (circled in Figure 2A) should be adapted to the type of light emitted.
  • FIG. 2B is a schematic diagram of a 6DOF handle according to some embodiments.
  • the LED lights provided on the 6DOF handle emit white light
  • the white dot holes are the positions of each LED light.
  • the multi-camera on the virtual display device should be an RGB camera.
  • FIG. 2C is a schematic diagram of another 6DOF handle according to some embodiments.
  • the LED light provided on the 6DOF handle emits infrared light (invisible to the human eye).
  • the multi-camera on the virtual display device should be an infrared camera.
  • the premise of using a controller to interact with the virtual world is to obtain the posture of the controller in the virtual world, so that the controller can control the display screen of the virtual display device based on the 6DOF posture.
  • the main method for locating the pose of the handle is to use the infrared camera on the virtual display device to capture infrared images of the emitters on the handle, use image recognition and image tracking to track these infrared emitters, and, combined with the 3D spatial structure of the light emitters on the handle, perform operations such as matching the light emitters and calculating 3D coordinates, finally obtaining the relative pose between the handle and the virtual display device.
  • however, the pre-defined 3D spatial structure of the light emitters has limited accuracy, resulting in a large pose estimation error; at the same time, although the pose of the handle in the current frame can be calculated from the 3D spatial structure of the light emitters on the handle and the 2D light spots in the image,
  • the number of light emitters visible in a single frame image collected by the camera is limited, resulting in low accuracy in pose estimation;
  • moreover, the observations of the light emitters in consecutive multi-frame images collected by the camera are not correlated with each other, resulting in poor smoothness during the interaction process and affecting the visual experience.
  • an inertial measurement unit (IMU) is also installed inside the handle to measure the motion of the handle, including acceleration and angular velocity; the motion of the handle also affects the relative pose between the handle and the virtual display device.
  • embodiments of the present disclosure provide a method for estimating the pose of a handle and a virtual display device: based on the annotation results of the light emitters in the handle images collected by the multi-camera of the virtual display device at different positions and angles, the 3D spatial structure of the light emitters on the handle is optimized, improving the accuracy of handle pose estimation; and, based on the observation data collected by the IMU on the handle and the handle images collected by the camera on the virtual display device, a pose estimation method jointly optimized by visual-inertial navigation is used to obtain a smoother and more accurate handle pose.
  • the embodiments of the present disclosure perform a series of processing operations on the image collected by the camera and remove abnormal 2D light spots of the detected light emitters in the image, so as to improve the accuracy and robustness of light emitter detection.
  • FIG 4 is an overall architecture diagram of a method for estimating handle pose according to some embodiments, which mainly includes two parts: preprocessing and relative pose estimation.
  • the preprocessing part mainly uses the annotation results of each light emitter in the multi-frame initial handle images collected by the multi-camera on the virtual display device at different positions and angles to optimize the 3D spatial structure of the light emitters on the handle, obtaining more accurate 3D coordinates of the light emitters and thereby improving the accuracy of handle pose estimation.
  • the relative pose estimation part mainly uses the target handle image collected by the camera and the observation data collected by the IMU, and uses the visual inertial navigation joint optimization method to estimate the relative pose between the handle and the virtual display device in real time.
  • in the relative pose estimation part, light spot detection is performed on the target handle image collected by the camera to obtain the target light spot set of each light emitter on the handle in the image, which is then combined with the optimized 3D spatial structure of each light emitter and the observation data collected by the IMU to perform relative pose estimation.
  • the pose estimation process it is necessary to match the 3D points of each light emitter on the handle with the 2D points of the light spots formed by each light emitter in the image.
  • the first identification of each light emitter on the handle is set in the design drawing; therefore, the matching process can be regarded as the process of determining the second identification of the light spot matched to each light emitter.
  • the 3D spatial structure of each light emitter can be obtained based on the design drawing of the handle, including the position of each light emitter (represented by 3D coordinates) and the first identification (represented by a digitally encoded ID).
  • embodiments of the present disclosure optimize the 3D spatial structure of each light emitter based on multiple frames of different initial handle images collected.
  • the optimization process can use handle images collected by at least two pre-calibrated cameras on the virtual display device, or handle images collected by pre-calibrated independent multiple cameras; no matter which cameras are used, the camera type must match the type of light emitted by the light emitters on the handle.
  • S501 According to the 3D spatial structure of each light emitter before optimization, obtain the 3D coordinates and first identification of each light emitter.
  • the 3D spatial structure of each light emitter before optimization is determined by the design drawings of the handle; by measuring the design drawings of the handle, the 3D coordinates of each light emitter on the handle in the pre-optimization 3D spatial structure, as well as the first identification of each light emitter, can be obtained.
  • S502 According to the pre-marked light emitters on the multi-frame initial handle images collected at different position angles, obtain the 2D coordinates and second identification of the light spot formed by each light emitter on the corresponding initial handle image.
  • a multi-camera matching the light-emitting type of the light emitters is used to collect multiple frames of initial handle images from different positions and angles to ensure that all light emitters on the handle are captured. After obtaining the multiple frames of initial handle images, the position of the center point of each light emitter in each frame of the initial handle image (represented by 2D coordinates) and the second identification of each light emitter (represented by a digitally encoded ID) are manually marked. The second identification of each light emitter is consistent with its first identification in the 3D spatial structure.
  • the light emitter on the handle is an LED infrared light and the acquisition camera is a binocular infrared camera on the virtual display device.
  • the initial handle image is an infrared handle image.
  • Figure 6A shows the infrared handle images collected by the binocular infrared camera before labeling; after manual labeling, the binocular infrared handle images are shown in Figure 6B.
  • the positions and numbers of the handle's light emitters are different in the single-frame infrared handle images collected simultaneously.
  • the infrared handle image collected by one infrared camera contains five LED infrared spots first identified as 2, 3, 4, 5, and 7.
  • the infrared handle image collected by the other infrared camera contains 8 LED infrared spots with the first identification numbers 2, 3, 4, 5, 6, 7, 8, and 9.
  • the 2D coordinates and second identification of the light spot formed by each light emitter on the corresponding initial handle image can be obtained based on the annotation results of each frame of the initial handle image.
  • the 3D coordinates of each light emitter are optimized using the Structure from Motion (SFM) idea to obtain the optimized 3D spatial structure of the light emitters; see S503-S506 for details.
  • S503 For each frame of the initial handle image, determine the relative pose between the handle and the acquisition camera based on the 2D coordinates and 3D coordinates of the light emitter with the same first and second identifiers, and the observation data of the IMU corresponding to the corresponding frame.
  • for each frame of the initial handle image, the following operations are performed: the PnP (Perspective-n-Points) algorithm is used, based on the 2D coordinates of the light spots and the 3D coordinates of the light emitters whose second identification in the 2D image equals the first identification in 3D space, to determine the first relative pose between the handle and the acquisition camera for that frame; the observation data of the IMU corresponding to that frame is integrated to obtain the second relative pose between the handle and the acquisition camera; by fusing the first relative pose and the second relative pose, the relative pose between the handle and the acquisition camera corresponding to that frame is obtained.
  • the PnP algorithm refers to solving the object motion positioning problem based on 3D and 2D point pairs. Its principle is shown in Figure 7.
  • O represents the optical center of the camera.
  • the 3D points of the object in 3D space (such as A, B, C, D) are projected by the camera onto the image plane, and the corresponding 2D points (such as a, b, c, d) are obtained; from these 3D-2D point pairs, the relative pose between the camera and the object can be estimated. In the embodiments of the present disclosure, the projection relationship between the 3D points and the 2D points is reflected by the first identification and the second identification of the light emitters.
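  • The first (vision-only) relative pose in S503 can be obtained with a standard PnP solver. A minimal Python sketch using OpenCV follows; it assumes the 3D emitter coordinates and 2D spot coordinates are already paired through their matching first and second identifications, and the function name is an illustrative assumption.

```python
import cv2
import numpy as np

def solve_handle_camera_pose(points_3d, points_2d, K, dist_coeffs=None):
    """PnP: estimate the handle pose in the acquisition camera frame from
    matched 3D emitter / 2D spot pairs (at least 4 well-spread pairs)."""
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(points_3d, np.float64).reshape(-1, 1, 3),
        np.asarray(points_2d, np.float64).reshape(-1, 1, 2),
        K, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        raise RuntimeError("PnP solution failed")
    R, _ = cv2.Rodrigues(rvec)       # rotation: handle frame -> camera frame
    return R, tvec                   # first relative pose (vision only)
```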
  • S504 Construct a reprojection error equation, and simultaneously optimize each relative pose and 3D coordinate according to the reprojection error equation to obtain the first optimized 3D spatial structure.
  • since each camera is calibrated before use, the projection parameters of each camera (also called intrinsic parameters) and the relative poses between the cameras are known. Therefore, in S504, a reprojection error equation is constructed based on the projection parameters of each camera, the relative poses between the cameras, the 3D coordinates of each light emitter on the handle, and the 2D coordinates of the light spots formed by each light emitter in the initial handle images collected by each camera; by minimizing the reprojection error, the relative pose between the handle and the acquisition camera corresponding to each frame of the initial handle image and the 3D coordinates of each light emitter on the handle are simultaneously optimized, obtaining the first optimized 3D spatial structure.
  • K n represents the projection parameters of the nth camera; the rotation matrix and translation vector between the handle and camera No. 0, and the rotation matrix and translation vector between the nth camera and camera No. 0, are denoted respectively; the 3D coordinates of the light emitter with first identification m on the handle are also denoted, and p m,n represents the 2D coordinates of the light spot with second identification m on the initial handle image captured by the nth camera.
  • camera No. 0 may be the camera that collects the largest number of light spots, also called the main camera.
  • For example, taking Figure 6B as an example, the number of light spots collected by the right infrared camera is greater than that collected by the left infrared camera; in this case, the right infrared camera is camera No. 0 (the main camera).
  • the similarity transformation (SIM3) method based on 3 pairs of points is used to align the handle coordinate systems before and after optimization, so as to obtain the aligned 3D spatial structure of each light emitter.
  • S505 Determine the conversion pose between the first 3D point cloud, composed of each light emitter on the handle according to the optimized 3D spatial structure, and the second 3D point cloud, composed of each light emitter on the handle according to the pre-optimization 3D spatial structure.
  • after optimization, the 3D points of each light emitter constitute the first 3D point cloud;
  • before optimization, the 3D points of each light emitter constitute the second 3D point cloud.
  • the 3D point coordinates of each light emitter before and after optimization are known.
  • the conversion pose between the first 3D point cloud and the second 3D point cloud is obtained by minimizing the drift error between the 3D coordinates of each light emitter before and after optimization; the calculation formula of the conversion pose is as follows (a reconstructed form is sketched below):
  • It represents the 3D coordinates of the emitter marked as m in the handle coordinate system after the first optimization.
  • s represents the scale transformation coefficient of the first 3D point cloud and the second 3D point cloud
  • (R, t) represents the conversion pose between the first 3D point cloud and the second 3D point cloud
  • R represents the rotation matrix between the handle coordinate systems before and after optimization
  • t represents the translation vector between the handle coordinate systems before and after optimization.
  • the final 3D coordinates of each light emitter on the handle are then calculated from the conversion pose, as sketched below;
  • the second optimized 3D spatial structure can be obtained.
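  • The two formulas referenced above are not reproduced in this text. A hedged reconstruction that is consistent with the symbol definitions (the direction of the transform is an assumption) is: the conversion pose minimizes the drift between the two point clouds, and the final coordinates apply it to the first-optimization coordinates,

$$
(s^{*},R^{*},t^{*})=\arg\min_{s,R,t}\sum_{m}\big\|\,P_m-\big(s\,R\,P_m^{\prime}+t\big)\big\|^{2},
\qquad
P_m^{\prime\prime}=s^{*}R^{*}P_m^{\prime}+t^{*}
$$

  • where P_m is the pre-optimization 3D coordinate of the light emitter with identification m (second 3D point cloud), P_m' is its coordinate after the first optimization (first 3D point cloud), and P_m'' is its final coordinate in the second optimized 3D spatial structure.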
  • by optimizing the 3D spatial structure of each light emitter on the handle, more accurate 3D coordinates of each light emitter can be obtained;
  • based on the optimized 3D coordinates, the relative pose between the handle and the virtual display device can be estimated in real time, which improves the accuracy of pose estimation.
  • handles of the same batch are produced based on the same design drawings. Therefore, only one optimization is required for the handles of the same batch.
  • the above method of optimizing the 3D spatial structure of each light emitter on the handle can be executed by a virtual display device or other devices, such as a laptop computer, a desktop computer, etc.
  • the multi-camera on the virtual display device can be used to image the handle, and combined with the observation data collected by the IMU in the handle, joint optimization of vision and inertial navigation can be achieved.
  • Figure 8 is an architecture diagram of visual-inertial joint optimization for estimating the handle pose according to some embodiments.
  • for the jth frame (j = 1, 2, ..., n), the relative pose between the IMU coordinate system on the handle and the world coordinate system, the relative pose between the handle coordinate system and the world coordinate system, and the relative pose between the camera (i.e. virtual display device) coordinate system and the world coordinate system are denoted respectively; the relative pose between the handle coordinate system and the IMU coordinate system is also denoted.
  • FIG. 9 is a flowchart of a method for jointly estimating handle pose with visual inertial navigation according to some embodiments.
  • the process mainly includes the following steps:
  • S901 Determine whether the relative posture between the handle and the virtual display device has been initialized. If not, execute S902; if so, execute S903.
  • the relative pose between the handle and the virtual display device can be predicted.
  • the prediction process requires an initial value of the relative pose between the handle and the virtual display device. Therefore, during the pose estimation process, it is first determined whether the relative pose between the handle and the virtual display device has been initialized; if not, the relative pose is initialized; if it has been initialized, the relative pose between the handle and the virtual display device is predicted and optimized.
  • S902 For the first frame of the target handle image collected by the camera, obtain the target spot set of each light emitter based on the target handle image, and based on the target spot set, the observation data synchronously collected by the IMU, and the optimized 3D spatial structure of each light emitter on the handle, Initialize the relative pose between the handle and the virtual display device.
  • embodiments of the present disclosure provide a method that can accurately detect the 2D light spots of each light emitter in the image in both bright and dark environments.
  • a flow chart of a light spot detection method provided by an embodiment of the present disclosure mainly includes the following steps:
  • S9021 Obtain the current environment brightness, determine the binarization thresholds of at least two binarization methods according to the current environment brightness, and perform binarization processing on the target handle image according to each binarization threshold to obtain the binarized handle image.
  • illumination features can be extracted from images collected by a camera, and through the illumination features, the current environment brightness can be obtained.
  • the image collected by the camera can be grayscaled to obtain a grayscale image, using methods including but not limited to the floating-point method, integer method, shift method, and average method; further, the current ambient brightness is determined according to the histogram of the grayscale image.
  • when the peak of the histogram is located on the dark side, with a gray value less than 100, it indicates that there is no bright light in the current environment, and the current ambient brightness is determined to be dim; when the peak of the histogram is located on the bright side, with a gray value greater than or equal to 100, it indicates that there is bright light in the current environment, and the current ambient brightness is determined to be bright.
  • the target handle image can be binarized using a target binarization threshold that matches the current environment brightness to improve the accuracy and robustness of light emitter detection in different environments.
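  • A minimal Python sketch of this brightness check follows (OpenCV histogram, peak compared against the 100-grayscale split mentioned above); the 100 split comes from the text, while the function name and return layout are illustrative assumptions.

```python
import cv2
import numpy as np

def estimate_ambient_brightness(image, split=100):
    """Return (is_bright, grayscale image) based on the histogram peak."""
    gray = image if image.ndim == 2 else cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    peak = int(np.argmax(hist))          # gray level where the histogram peaks
    return peak >= split, gray
```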
  • the methods suitable for binarizing target handle images containing multiple light emitters mainly include the following two methods:
  • Maximum inter-class variance method: also known as the Otsu method, a binarization threshold solution method proposed in 1979. Its core idea is to maximize the inter-class variance between the foreground and the background, and it is suitable for solving the binarization threshold of histogram distributions that approach a double peak;
  • Triangle method: a binarization threshold solution algorithm more suitable for solving the binarization threshold where the histogram distribution approaches a single peak. This method constructs a straight line from the highest peak of the histogram to the far end of the histogram, finds the vertical distance from each histogram bin to the straight line, and takes the bin position corresponding to the maximum vertical distance as the binarization threshold.
  • the embodiments of the present disclosure build on these two main adaptive binarization threshold algorithms, combining the Otsu method and the triangle method to obtain an algorithm that can adapt to both bright and dim environments.
  • FIG. 11 is a flow chart of a method for image binarization processing in an embodiment of the present disclosure, which mainly includes the following steps:
  • S9021_1 Eliminate pixels whose grayscale value is lower than the preset grayscale threshold in the grayscale handle image obtained by grayscale processing of the target handle image, and determine the respective binarization thresholds of at least two binarization methods based on the new histogram of the grayscale handle image after pixel removal.
  • the brightness of each light emitter on the handle is basically stable in different environments.
  • dim backgrounds with too low brightness should be excluded. Therefore, the pixels whose grayscale value is lower than the preset grayscale threshold in the grayscale handle image after grayscale processing of the target handle image are removed, and a new histogram of the current image is calculated based on the remaining pixels in the grayscale handle image. And based on the new histogram, the respective binarization thresholds of at least two binarization methods are determined.
  • a minimum guarantee threshold can be set in advance for each binarization method.
  • if the binarization threshold calculated based on the new histogram is lower than the preset minimum guarantee threshold, the calculated binarization threshold is forced to be set to the preset minimum guarantee threshold, thereby enhancing the stability of the algorithm under special circumstances.
  • specifically, when the binarization threshold calculated by the Otsu method is lower than the preset minimum guarantee threshold, the binarization threshold corresponding to the Otsu method is set to the preset minimum guarantee threshold; when the binarization threshold calculated by the triangle method is lower than the preset minimum guarantee threshold, the binarization threshold corresponding to the triangle method is set to the preset minimum guarantee threshold.
  • the binarization thresholds of other binarization methods can also be determined.
  • S9021_2 Compare the current ambient brightness with the preset brightness threshold, and determine the corresponding weights of at least two binarized thresholds based on the comparison results.
  • the degree of adaptation of the current environment brightness to the binarization threshold solved by each binarization method can be determined, and the degree of adaptation can be reflected by the weight.
  • the first binarization method is used to solve the histogram distribution containing a single peak
  • the second binarization method is used to solve the histogram distribution containing a double peak.
  • the first binarization method is the triangle method
  • the second binarization method is the Otsu method.
  • if the comparison result indicates that the current ambient brightness is greater than the preset brightness threshold, the first binarization threshold calculated by the first binarization method is more suitable for the current ambient brightness, that is, more accurate; therefore, the first weight corresponding to the first binarization threshold is set to be greater than the second weight corresponding to the second binarization threshold calculated by the second binarization method;
  • if not, it indicates that the handle is in a dark environment, and the second binarization threshold calculated by the second binarization method is more suitable for the current ambient brightness, that is, more accurate; therefore, the first weight corresponding to the first binarization threshold is set to be smaller than the second weight corresponding to the second binarization threshold.
  • the target binarization threshold is obtained by weighting the first binarization threshold and the second binarization threshold with their corresponding weights.
  • the first binarization threshold is denoted as S1
  • the corresponding first weight is ⁇
  • the second binarization threshold is denoted as S2
  • the corresponding second weight is ⁇ .
  • S9021_4 According to the target binarization threshold, perform binarization processing on the grayscale handle image to obtain a binarized handle image.
  • the grayscale handle image is binarized according to the target binarization threshold to obtain the binarized handle image. Since the target binarization threshold is obtained by weighting the binarization thresholds of different binarization methods according to the current environment brightness, the setting of the target binarization threshold is more reasonable and can adapt to the current environment brightness, thereby reducing the interference of ambient light. Improve the accuracy of illuminator detection.
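  • A compact Python sketch of S9021_1 to S9021_4 follows: it removes very dark pixels, computes the Triangle and Otsu thresholds, applies minimum guarantee floors, weights the two thresholds according to the ambient brightness, and binarizes with the combined value. The specific weights, grayscale floor, and guarantee values are illustrative assumptions, not numbers from the patent; the boolean flag is meant to come from the brightness sketch given earlier.

```python
import cv2
import numpy as np

def binarize_handle_image(gray, ambient_bright,
                          gray_floor=30, min_triangle=60, min_otsu=60):
    # S9021_1: drop low-brightness background pixels before threshold estimation
    pixels = gray[gray >= gray_floor]
    if pixels.size == 0:
        pixels = gray.ravel()
    pixels = pixels.reshape(1, -1)

    s_tri, _ = cv2.threshold(pixels, 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_TRIANGLE)
    s_otsu, _ = cv2.threshold(pixels, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    s_tri, s_otsu = max(s_tri, min_triangle), max(s_otsu, min_otsu)  # floors

    # S9021_2/S9021_3: favour the Triangle (single-peak) threshold when bright,
    # the Otsu (double-peak) threshold when dim; the weights are illustrative
    alpha, beta = (0.7, 0.3) if ambient_bright else (0.3, 0.7)
    target = alpha * s_tri + beta * s_otsu

    # S9021_4: binarize the grayscale handle image with the target threshold
    _, binary = cv2.threshold(gray, target, 255, cv2.THRESH_BINARY)
    return binary, target
```

  • In use, the output of estimate_ambient_brightness could be passed directly as the ambient_bright flag, so the same code path adapts to bright and dim scenes.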
  • S9022 Perform contour detection in the global scope of the binary handle image to obtain a candidate contour set for each light emitter.
  • the relative posture between the handle and the virtual display device is unknown, and the position of the light spot in the target handle image collected by the camera on the virtual display device projected by each light emitter on the handle in the 3D space is also unknown. Therefore, it is necessary to detect each light emitter in the global scope of the binary handle image, and use each detected light spot as the 2D point of each light emitter in the image in the 3D space.
  • a contour extraction algorithm in image processing may be used for illuminator detection.
  • a contour is composed of the outermost pixels of a connected binary region after the image is binarized.
  • each connected binary region has one and only one outermost contour.
  • the contour area can be obtained by summing the areas of all pixels in the region enclosed by the contour points.
  • each contour represents a light spot.
  • the embodiments of the present disclosure do not impose any restrictive requirements on the detection method of the light emitter.
  • for example, deep learning models (such as CNN, YOLO, etc.) may also be used for light emitter detection.
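  • A minimal Python sketch of the contour-based spot detection in S9022 (OpenCV) is given below; each outermost contour of the binarized handle image becomes one candidate light spot, carrying the attributes used by the later elimination operations. The dictionary layout is an illustrative assumption.

```python
import cv2

def detect_candidate_spots(binary):
    """Global contour detection: one candidate light spot per outermost contour."""
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    candidates = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        m = cv2.moments(contour)
        cx = m["m10"] / m["m00"] if m["m00"] else x + w / 2.0   # contour centroid
        cy = m["m01"] / m["m00"] if m["m00"] else y + h / 2.0
        candidates.append({"contour": contour,
                           "area": cv2.contourArea(contour),
                           "bbox": (x, y, w, h),
                           "center": (cx, cy)})
    return candidates
```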
  • S9023 Eliminate abnormal contours in the candidate contour set based on the prior contour shape information and contour comparison information respectively, and obtain the target light spot set of each light emitter.
  • the candidate contours obtained by contour detection may include the contours of the light emitters as well as other light sources that interfere with the light emitters.
  • At least one of the following culling operations is performed based on a priori contour shape information:
  • Elimination operation 1 Based on the length-width ratio of the circumscribed rectangle of the candidate contour and the area of the candidate contour, eliminate candidate contours whose aspect ratio exceeds a first preset proportion threshold, where the first preset proportion threshold depends on the area of the candidate contour.
  • the embodiments of the present disclosure use a stepped proportion threshold to eliminate abnormal contours, that is, the first preset proportion threshold varies in steps with the area of the candidate contour: the larger the area of the candidate contour, the smaller the first preset proportion threshold.
  • if the aspect ratio of the circumscribed rectangle of the candidate contour exceeds the first preset proportion threshold, it is considered a false detection and the candidate contour is eliminated.
  • Elimination operation 2 Eliminate candidate outlines whose area ratio to the circumscribed rectangle of the candidate outline is less than a preset percentage threshold.
  • Elimination operation 3 Calculate the distance between the gray centroid of the candidate contour and the center point of its circumscribed rectangle along the horizontal axis and the vertical axis respectively, and calculate the proportion of each distance to the side length of the candidate contour; if at least one of the two proportions exceeds the second preset proportion threshold, the candidate contour is eliminated.
  • Elimination operation 4 Determine the roundness of the candidate outline based on the total number of pixels contained in the candidate outline and the side length of the candidate outline. If the roundness is lower than the preset roundness threshold, the candidate outline is eliminated.
  • Elimination operation 5 Calculate the average brightness of the candidate contours. If the average brightness is less than the preset brightness threshold, the candidate contours are eliminated.
  • Elimination operation 6 Determine the brightness mean value of a preset peripheral area of the circumscribed rectangle of the candidate contour, and the brightness mean value of the candidate contour. If the difference between the two brightness mean values is less than the preset brightness difference value, the candidate contour is eliminated.
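  • The sketch below illustrates how elimination operations 1, 2, 4 and 5 could be applied to one candidate spot produced by the detection sketch above. All thresholds are illustrative assumptions, and the roundness measure used here (4πA/P², a standard circularity) is a stand-in for the patent's pixel-count/side-length definition.

```python
import cv2
import numpy as np

def passes_shape_priors(gray, cand, max_aspect=3.0, min_fill=0.35,
                        min_roundness=0.6, min_mean_gray=120):
    x, y, w, h = cand["bbox"]
    if max(w, h) / max(min(w, h), 1) > max_aspect:          # op 1: too elongated
        return False
    if cand["area"] < min_fill * w * h:                     # op 2: sparse bbox fill
        return False
    perimeter = cv2.arcLength(cand["contour"], True)
    if 4.0 * np.pi * cand["area"] / max(perimeter ** 2, 1e-6) < min_roundness:
        return False                                        # op 4: not round enough
    mask = np.zeros(gray.shape, np.uint8)
    cv2.drawContours(mask, [cand["contour"]], -1, 255, -1)
    if cv2.mean(gray, mask=mask)[0] < min_mean_gray:        # op 5: too dark
        return False
    return True
```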
  • when abnormal contours in the candidate contour set are eliminated based on the a priori contour shape information, the elimination targets a single candidate contour and does not consider the relationship between candidate contours. Therefore, abnormal contours in the candidate contour set can be further eliminated based on the contour contrast information.
  • the method of eliminating abnormal contours in the candidate contour set based on contour contrast information includes one or more of the following:
  • Elimination operation 7 For each two candidate contours in the candidate contour set, determine the Euclidean distance between the center points of the circumscribed rectangles of the two candidate contours and the minimum Manhattan distance between the edges of the two candidate contours, and eliminate abnormal contours based on the Euclidean distance and the minimum Manhattan distance.
  • the specific process of eliminating abnormal contours based on the Euclidean distance and the minimum Manhattan distance between each two candidate contours is shown in Figure 12, and mainly includes the following steps:
  • S9023_11 Determine whether at least one of the Euclidean distance and the minimum Manhattan distance between the two candidate contours is less than the preset distance threshold. If so, execute S9023_12; otherwise, execute S9023_16.
  • the degree of approximation of the two candidate contours can be determined.
  • at least one of the Euclidean distance and the minimum Manhattan distance between two candidate contours is less than the preset distance threshold, it indicates that the two candidate contours have a high degree of approximation, and further abnormality judgment needs to be performed.
  • in that case, S9023_12 should be executed; when the Euclidean distance and the minimum Manhattan distance between the two candidate contours are both greater than the preset distance threshold, the two candidate contours have a low degree of approximation, and S9023_16 should be executed.
  • S9023_12 Calculate the areas of two candidate contours respectively.
  • S9023_13 Determine whether the areas of the two candidate contours are both smaller than the preset area threshold. If so, execute S9023_14; otherwise, execute S9023_15.
• S9023_14 Eliminate both candidate contours at the same time.
• When the areas of both candidate contours are smaller than the preset area threshold, both candidate contours may be noise points and should be eliminated at the same time.
• S9023_15 Calculate the brightness means of the two candidate contours respectively, and eliminate the candidate contour with the smaller brightness mean.
• When at least one of the two areas is not smaller than the preset area threshold, the abnormal contour can be identified by the brightness mean: the brightness means of the two candidate contours are calculated and compared, and the candidate contour with the smaller brightness mean is eliminated from the candidate contour set.
• When the Euclidean distance and the minimum Manhattan distance between the two candidate contours are both greater than the preset distance threshold (S9023_16), the two candidate contours have a low degree of similarity, and both candidate contours can be retained in the candidate contour set.
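A minimal Python sketch of elimination operation 7 (the Figure 12 flow) is shown below; the `Spot` record, the rectangle-based approximation of the contour edge-to-edge Manhattan distance, and the thresholds are assumptions rather than values taken from the disclosure.

```python
import math
from dataclasses import dataclass

@dataclass
class Spot:
    """Hypothetical pre-computed statistics for one candidate contour."""
    cx: float                # x of the circumscribed rectangle center
    cy: float                # y of the circumscribed rectangle center
    x0: float                # left edge of the circumscribed rectangle
    y0: float                # top edge of the circumscribed rectangle
    x1: float                # right edge of the circumscribed rectangle
    y1: float                # bottom edge of the circumscribed rectangle
    area: float
    mean_brightness: float

def edge_manhattan(a: Spot, b: Spot) -> float:
    # Minimum Manhattan gap between the two circumscribed rectangles
    # (used here as a stand-in for the contour edge distance).
    dx = max(0.0, max(a.x0, b.x0) - min(a.x1, b.x1))
    dy = max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))
    return dx + dy

def pairwise_prune(spots, dist_thresh=8.0, area_thresh=6.0):
    removed = set()
    for i in range(len(spots)):
        for j in range(i + 1, len(spots)):
            if i in removed or j in removed:
                continue
            a, b = spots[i], spots[j]
            euclid = math.hypot(a.cx - b.cx, a.cy - b.cy)            # S9023_11
            if euclid >= dist_thresh and edge_manhattan(a, b) >= dist_thresh:
                continue                                             # S9023_16: keep both
            if a.area < area_thresh and b.area < area_thresh:        # S9023_12 / S9023_13
                removed.update((i, j))                               # S9023_14: both noise
            else:
                removed.add(i if a.mean_brightness < b.mean_brightness else j)  # S9023_15
    return [s for k, s in enumerate(spots) if k not in removed]
```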
  • Elimination operation 8 Sort all candidate contours in the candidate contour set according to their area, and eliminate abnormal contours based on the quantitative relationship between the pixels in the candidate contour with the largest area and the candidate contour with the second largest area.
• First, the candidate contour with the largest area and the candidate contour with the second largest area in the candidate contour set are selected.
• The specific process of eliminating abnormal contours based on the quantitative relationship between the pixels in the two selected candidate contours is shown in Figure 13 and mainly includes the following steps:
• S9023_21 Determine whether the numbers of pixels in the candidate contour with the largest area and the candidate contour with the second largest area both exceed the preset pixel number threshold. If so, execute S9023_22; otherwise, execute S9023_25.
• The numbers of pixels in the two candidate contours can reflect their degree of similarity. Therefore, whether the two candidate contours have similar shapes can be determined by comparing the numbers of pixels in the candidate contour with the largest area and the candidate contour with the second largest area against the preset pixel number threshold.
  • S9023_22 Calculate the multiple between the number of pixels in the candidate contour with the largest area and the candidate contour with the second largest area.
  • S9023_23 Determine whether the multiple is greater than the preset multiple threshold, if so, execute S9023_24, otherwise, execute S9023_25.
  • Abnormality judgment is further performed based on the multiple between the number of pixels in the candidate contour with the largest area and the candidate contour with the second largest area.
  • S9023_24 Eliminate candidate contours with the largest area.
• In this case, the candidate contour with the largest area may be interference whose shape resembles a light emitter on the handle, and it should be eliminated from the candidate contour set.
  • S9023_25 Keep the candidate contour with the largest area and the candidate contour with the second largest area.
  • the candidate contour with the largest area and the candidate contour with the second largest area are retained.
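A short sketch of elimination operation 8 (the Figure 13 flow) is given below; the `Blob` record and the threshold values are hypothetical, and the requirement that both contours exceed the pixel threshold reflects one reading of S9023_21.

```python
from collections import namedtuple

# Hypothetical minimal record for a candidate contour.
Blob = namedtuple("Blob", ["area", "pixel_count"])

def prune_oversized_largest(blobs, pixel_thresh=200, ratio_thresh=3.0):
    if len(blobs) < 2:
        return list(blobs)
    ordered = sorted(blobs, key=lambda b: b.area, reverse=True)
    largest, second = ordered[0], ordered[1]
    # S9023_21: both contours contain enough pixels to be worth comparing.
    if largest.pixel_count > pixel_thresh and second.pixel_count > pixel_thresh:
        # S9023_22 / S9023_23: an outsized largest contour is treated as interference.
        if largest.pixel_count / max(second.pixel_count, 1) > ratio_thresh:
            return [b for b in blobs if b is not largest]   # S9023_24
    return list(blobs)                                       # S9023_25
```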
  • Elimination operation 9 For each candidate contour in the candidate contour set, calculate the distance between the candidate contour and the nearest neighbor candidate contour, and eliminate outliers and abnormal contours based on the distance.
• As shown in Figure 14, the process of eliminating outlier abnormal contours based on the distance between the candidate contour and its nearest neighbor candidate contour mainly includes the following steps:
  • S9023_31 Determine the adaptive outlier distance based on the side length of the candidate contour and the median side length of all candidate contours.
• Specifically, all candidate contours in the candidate contour set are sorted according to their side lengths to obtain the median side length, and the adaptive outlier distance is determined from the median side length and the side length of the current candidate contour.
  • S9023_32 Determine whether the distance between the candidate contour and the nearest neighbor candidate contour is greater than the adaptive outlier distance. If so, execute S9023_33; otherwise, execute S9023_36.
  • S9023_33 Determine whether the number of all candidate contours is greater than the preset quantity threshold. If so, execute S9023_34; otherwise, execute S9023_35.
• When the distance between the candidate contour and the nearest neighbor candidate contour is greater than the adaptive outlier distance, and the number of all candidate contours is greater than the preset quantity threshold, it indicates that the candidate contour is an abnormal outlier contour and should be eliminated.
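The outlier removal of elimination operation 9 (the Figure 14 flow) could be sketched as follows. How the adaptive outlier distance combines the contour side length with the median side length is not spelled out above, so the `scale * max(side_len, median_side)` form used here is an assumption, as are the record fields and thresholds.

```python
import math
from collections import namedtuple
from statistics import median

# Hypothetical minimal record for a candidate contour.
SpotStat = namedtuple("SpotStat", ["cx", "cy", "side_len"])

def prune_isolated(spots, scale=6.0, min_count=4):
    if len(spots) <= min_count:
        return list(spots)                     # too few contours to judge outliers
    median_side = median(s.side_len for s in spots)
    kept = []
    for s in spots:
        nearest = min(math.hypot(s.cx - o.cx, s.cy - o.cy)
                      for o in spots if o is not s)
        adaptive = scale * max(s.side_len, median_side)   # S9023_31 (assumed form)
        if nearest > adaptive:
            continue                           # isolated contour, eliminate it
        kept.append(s)
    return kept
```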
  • Elimination operation 10 Calculate the mean brightness of each candidate contour in the candidate contour set, and remove abnormal contours based on the mean brightness.
  • the average brightness value of each candidate contour in the candidate contour set is sorted from large to small, the first N (N is an integer greater than or equal to 1) candidate contours are retained, and the remaining candidate contours are eliminated.
• Abnormal contours can be eliminated based on the prior contour shape information first and then based on the contour contrast information, or abnormal contours can be eliminated based on the contour contrast information first and then based on the prior contour shape information; the two types of abnormal contour elimination methods, based on contour contrast information and prior contour shape information respectively, can also be interspersed.
• In the embodiments of the present disclosure, the binarization thresholds of different binarization methods are determined according to the current environmental brightness, and the thresholds are weighted to obtain the target binarization threshold for binarizing the target handle image, which ensures accurate detection of the light emitters on the handle under different brightness conditions and greatly reduces the development difficulty and cost; at the same time, image processing technology is used to eliminate abnormal contours among the detected contours, which improves the running speed and reduces the occupation of memory resources, making it convenient for deployment on portable wearable devices.
• The embodiments of the present disclosure do not require a high-configuration processor for network training, nor do they need a large amount of labeled data, which reduces the development hardware resource requirements, the development cost and the workload. Compared with light emitter detection methods based on general image processing, the embodiments of the present disclosure can adaptively adjust the binarization threshold according to the current environment brightness and weight the binarization thresholds of at least two binarization methods, which improves the robustness of the algorithm in complex scenarios and expands its scope of application. On the other hand, the embodiments of the present disclosure eliminate, based on the contour characteristics of the light emitters, the light spots that interfere with the positioning of the handle, further improving the performance of the algorithm and the accuracy of detection.
• It is unknown which light emitter each target light spot in the target light spot set is the projection of; that is, the correspondence between the 2D light spots and the 3D light emitters is unknown. Therefore, it is necessary to match each target light spot in the target light spot set with each light emitter of the optimized 3D spatial structure, and establish a one-to-one correspondence between the 2D light spots and the 3D light emitters.
• After matching, the PnP algorithm is used to align the coordinate systems of the handle and the virtual display device, and the observation data collected by the IMU on the handle after alignment (including but not limited to the acceleration and angular velocity of the handle) is pre-integrated to obtain the relative 6DOF pose between the handle and the virtual display device, completing the initialization process of the relative pose between the handle and the virtual display device.
  • the acquisition frequency of the IMU and the camera may be different.
  • the pose estimation process needs to ensure that the observation data collected by the IMU is synchronized with the target handle image collected by the camera.
• The synchronization relationship between the observation data and the target handle image can be determined based on the timestamps.
• The one-to-one correspondence between a 2D light spot and a 3D light emitter can be characterized by the first identifier of the 3D light emitter and the image spot index of the 2D light spot. Therefore, the process of matching 2D light spots with 3D light emitters can be regarded as a process of determining the second identifier of the light spot corresponding to a certain image spot index in the target handle image.
  • the brute force matching method is: select any 3 target spots from the target spot set, guess the IDs of these 3 target spots based on the 3D spatial structure of each light emitter, and then use the P3P algorithm to calculate the relative pose.
• Each run of the P3P algorithm yields up to 4 solutions; all emitters are then reprojected into the image according to each solved relative pose, the number of matching point pairs and the error are calculated, and all combined results are sorted, giving priority to the result with the largest number of matches; if the numbers of matches are the same, the result with the smallest error is chosen.
• Since brute-force matching is expensive, embodiments of the present disclosure provide an efficient matching method which connects adjacent light spots into a planar figure for matching. It has been experimentally measured that, taking a planar triangle as an example, the number of combinations of adjacent light spots is usually less than 500, which is far smaller than the number of combinations required by brute-force matching, so matching efficiency and accuracy can be effectively improved.
  • FIG. 15 is a flow chart of a method for matching 2D light spots and 3D light emitters in an embodiment of the present disclosure, which mainly includes the following steps:
  • S9024 For any target light spot in the target light spot set, select a first specified number of candidate light spots adjacent to the target light spot from the target light spot set, and connect the target light spot with the first specified number of candidate light spots to obtain a plane figure.
• The determination process of the candidate light spots includes: according to the 2D coordinates of the target light spot and the 2D coordinates of the other light spots in the target light spot set, the distances between the target light spot and the other light spots are obtained; the distances are sorted from small to large, and the other light spots corresponding to the first specified number of smallest distances are determined as candidate light spots, where the distance between the target light spot and any other light spot can be obtained by Formula 6:
• d = √((x1 − x2)² + (y1 − y2)²)  (Formula 6)
• where d is the distance between the target light spot and the other light spot, x1 is the abscissa of the target light spot in the image, y1 is the ordinate of the target light spot in the image, x2 is the abscissa of the other light spot in the image, and y2 is the ordinate of the other light spot in the image.
• In some embodiments, the first specified number is 2, but the first specified number in the embodiments of the present disclosure is not limited and can be set according to actual conditions.
• The first specified number corresponds to the planar figure: if the planar figure is a triangle, the first specified number is 2, and if the planar figure is a quadrilateral, the first specified number is 3.
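A small Python sketch of S9024 is shown below: for each target spot, the two nearest neighbours (Formula 6 distances) are selected and connected into a triangle. The function name, dictionary layout, and example coordinates are illustrative only.

```python
import math

def nearest_neighbor_figures(spot_xy, k=2):
    """For each target spot, connect it with its k nearest neighbours
    (k = 2 yields a planar triangle). spot_xy maps image spot index -> (x, y)."""
    figures = []
    for idx, (x1, y1) in spot_xy.items():
        others = [(math.hypot(x1 - x2, y1 - y2), j)          # Formula 6
                  for j, (x2, y2) in spot_xy.items() if j != idx]
        others.sort()                                        # ascending distance
        neighbours = [j for _, j in others[:k]]
        if len(neighbours) == k:
            figures.append((idx, *neighbours))
    return figures

# Example with four detected spots, indexed 0..3.
spots = {0: (100.0, 80.0), 1: (110.0, 85.0), 2: (95.0, 95.0), 3: (300.0, 40.0)}
print(nearest_neighbor_figures(spots))
# -> [(0, 1, 2), (1, 0, 2), (2, 0, 1), (3, 1, 0)]
```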
  • each light spot in the plane graphic is matched with each light emitter in the set of actual adjacent light emitters to obtain a matching pair of adjacent light spots.
  • each light spot in the planar figure can be quickly matched with a set of actual adjacent light emitters on a planar figure basis to obtain a matching pair of adjacent light spots.
  • each adjacent light spot matching pair includes an image spot index of the light spot and a first identification of the light emitter matching the light spot.
  • S9025_1 Arrange each light spot in the plane graphic in ascending order according to the image spot index to obtain a light spot list.
• S9025_2 Traverse each light emitter in the set of actually adjacent light emitters in the specified order. For the currently traversed light emitter, use that light emitter as the initial position, and sort the other light emitters actually adjacent to it in the specified order to obtain a sorted list.
  • the designated order in this embodiment includes a clockwise order and a counterclockwise order, but the designated order in this embodiment is not limited.
  • the designated order in this embodiment can be set according to the actual situation.
• For example, suppose a group of actually adjacent light emitters includes light emitter 1, light emitter 2, and light emitter 3.
• Suppose the order of traversing each light emitter in the set of actually adjacent light emitters is light emitter 3, light emitter 2 and light emitter 1.
• When traversing to light emitter 3, the corresponding sorted list is: light emitter 3, light emitter 2, light emitter 1; when traversing to light emitter 2, the corresponding sorted list is: light emitter 2, light emitter 1, light emitter 3; when traversing to light emitter 1, the corresponding sorted list is: light emitter 1, light emitter 3, light emitter 2.
  • S9025_3 For any light emitter in the sorted list, add the first identifier of the light emitter and the image spot index of the light spot whose position in the spot list is the same as the position of the light emitter in the sorted list to the same adjacent light spot matching pair.
• Suppose the order in the light spot list is: light spot A, light spot B, light spot C.
• Taking the sorted list light emitter 3, light emitter 2, light emitter 1 as an example, the obtained adjacent light spot matching pairs are: light spot A - light emitter 3, light spot B - light emitter 2, light spot C - light emitter 1.
  • S9025_4 Determine whether there is an untraversed light emitter in the actual adjacent light emitter set. If so, return to S9025_2. If not, end.
• After the traversal is completed, each light emitter has a corresponding image spot index, and a matching result is obtained for each traversal position, that is, the adjacent light spot matching pairs are obtained.
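The cyclic matching of S9025_1 through S9025_4 can be expressed compactly as enumerating rotations of the adjacent emitter ring, as in the sketch below. The function and argument names are hypothetical, and the emitter identifiers are assumed to already be stored in the specified (e.g. clockwise) order.

```python
def adjacent_match_candidates(figure_spot_indices, adjacent_emitter_ids):
    """Pair the spots of one planar figure with one group of actually adjacent
    emitters by enumerating cyclic rotations of the emitter ring."""
    spot_list = sorted(figure_spot_indices)              # S9025_1
    n = len(adjacent_emitter_ids)
    candidates = []
    for start in range(n):                               # S9025_2: traverse each emitter
        rotated = [adjacent_emitter_ids[(start + k) % n] for k in range(n)]
        candidates.append(list(zip(spot_list, rotated))) # S9025_3: pair by position
    return candidates                                    # S9025_4: traversal finished

# Example mirroring the description: spots A, B, C as indices 0, 1, 2 and the
# adjacent emitter group (3, 2, 1) traversed in the specified order.
print(adjacent_match_candidates([2, 0, 1], [3, 2, 1]))
# -> [[(0, 3), (1, 2), (2, 1)], [(0, 2), (1, 1), (2, 3)], [(0, 1), (1, 3), (2, 2)]]
```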
  • the matching pairs of adjacent light spots can be filtered, as shown in Figure 18, which is a schematic flow chart of screening matching pairs of adjacent light spots, including the following steps:
• S9025_5 For the multiple predicted poses of the handle corresponding to any group of adjacent light spot matching pairs, the predicted gravity direction vector of the handle corresponding to the adjacent light spot matching pair can be solved through the preset IMU integration algorithm.
  • S9025_6 Obtain the actual gravity direction vector of the handle based on the current position of the virtual display device when shooting the specified image.
  • the actual gravity direction vector of the handle can be obtained based on the 6Dof pose of the virtual display device when shooting the specified image.
• S9025_7 Determine the adjacent light spot matching pairs that need to be deleted through the predicted gravity direction vector and the actual gravity direction vector corresponding to each adjacent light spot matching pair, and delete the adjacent light spot matching pairs that need to be deleted.
• Specifically, for any group of adjacent light spot matching pairs, the angle between the gravity direction vectors is obtained from the predicted gravity direction vector corresponding to the adjacent light spot matching pair and the actual gravity direction vector; if the angle between the gravity direction vectors is greater than the specified angle, the adjacent light spot matching pair is determined to be an adjacent light spot matching pair that needs to be deleted.
• The angle between the gravity direction vectors can be obtained through Formula 7: cos θ = (v_pred · v_actual) / (|v_pred| · |v_actual|), where θ is the angle between the gravity direction vectors, v_pred is the predicted gravity direction vector, and v_actual is the actual gravity direction vector.
• For example, assume the specified angle is 10°. If the angle between the gravity direction vectors corresponding to a first adjacent light spot matching pair is 4°, it is determined that the first adjacent light spot matching pair does not need to be deleted; if the angle between the gravity direction vectors corresponding to a second adjacent light spot matching pair is 12°, it is determined that the second adjacent light spot matching pair needs to be deleted.
  • the specified included angle in this embodiment can be set according to the actual situation, and this embodiment does not limit the specific value of the specified included angle.
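The gravity-consistency screening of S9025_5 through S9025_7 amounts to an angle test between two direction vectors; the following sketch assumes the predicted gravity vectors have already been computed elsewhere, and the 10° default merely echoes the example above.

```python
import math

def gravity_angle_deg(pred, actual):
    """Angle (degrees) between the predicted and actual gravity direction vectors (Formula 7)."""
    dot = sum(p * a for p, a in zip(pred, actual))
    norm = (math.sqrt(sum(p * p for p in pred)) *
            math.sqrt(sum(a * a for a in actual)))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / max(norm, 1e-12)))))

def filter_by_gravity(pairs_with_gravity, actual_gravity, max_angle_deg=10.0):
    """pairs_with_gravity: list of (adjacent_matching_pair, predicted_gravity_vector).
    Keeps only the matching pairs whose predicted gravity direction agrees with the
    actual gravity direction of the handle within max_angle_deg."""
    return [pair for pair, pred in pairs_with_gravity
            if gravity_angle_deg(pred, actual_gravity) <= max_angle_deg]
```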
  • S9026 For any set of adjacent light spot matching pairs, determine multiple predicted poses of the handles corresponding to the adjacent light spot matching pairs based on the 2D coordinates of each light spot in the adjacent light spot matching pair and the 3D coordinates of each light emitter.
  • each group of adjacent light spot matching pairs contains the matching results of three light spots.
  • the 2D coordinates of each light spot and the 3D coordinates of each light emitter in this group of adjacent light spot matching pairs are input into the p3p algorithm.
  • multiple predicted poses of the handle corresponding to this set of adjacent light spot matching pairs can be obtained, including rotation matrices and translation vectors.
  • the p3p algorithm can output four results, so a set of adjacent light spot matching pairs corresponds to four predicted poses.
• S9027 For any predicted pose, project each light emitter into the specified image according to the predicted pose to obtain each projected light spot, and based on each projected light spot, match the other light spots in the specified image, apart from the light spots included in the planar figure, with each light emitter on the handle to obtain other light spot matching pairs.
• The multi-camera on the virtual display device can simultaneously collect multiple handle images, and the specified image is at least one image among the target handle images acquired at the current moment.
  • the designated image can be one or multiple. The number of designated images and which image to use can be set according to the actual situation.
• According to the predicted pose, each light emitter in 3D space can be projected into the 2D specified image to obtain each projected light spot. Since the light emitters matching the light spots included in the planar figure have already been determined, it is only necessary to determine the light emitters matching the other light spots in the specified image, apart from the light spots included in the planar figure.
  • the process diagram for determining other light spot matching pairs includes the following steps:
  • S9027_1 For any other light spot in the specified image, obtain the distances between the other light spot and each projection light spot according to the 2D coordinates of the other light spot and the 2D coordinates of each projection light spot.
  • the distance between other light spots and the projection light spot can be determined by the distance formula in Formula 6, which will not be described again in this embodiment.
  • S9027_2 Determine whether the shortest distance among the distances is smaller than the specified distance. If so, execute S9027_3. If not, end.
  • S9027_3 Add the image spot index of other spots and the first identifier of the light emitter corresponding to the projection spot corresponding to the shortest distance to the same spot matching pair, and determine the spot matching pair as another spot matching pair.
  • each other light spot matching pair includes an image spot index of the other light spot and a first identification of the light emitter corresponding to the projection light spot matching the other light spot.
• For example, assume the specified image includes other light spot C and other light spot D, the distance between other light spot C and the first projected light spot is m, and the distance between other light spot C and the second projected light spot is n, where the first projected light spot is the projection of light emitter 1 and the second projected light spot is the projection of light emitter 2. If m > n, n is determined to be the shortest distance; if n is also less than the specified distance, the other light spot matching pair is determined to be (C, 2).
• If the shortest distance for other light spot D is not less than the specified distance, light spot D has no corresponding light emitter.
  • the specified distance in this embodiment can be set according to the actual situation, and this embodiment does not limit the specified distance here.
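The Figure 19 flow reduces to a nearest-projection test for every remaining 2D spot; a minimal sketch under assumed data layouts is given below.

```python
import math

def match_remaining_spots(other_spots, projected, max_dist=5.0):
    """other_spots: {image_spot_index: (x, y)} for spots outside the planar figure.
    projected:   {emitter_first_id: (x, y)} reprojections of the 3D emitters under
    one predicted pose. Returns (image_spot_index, emitter_first_id) matching pairs."""
    pairs = []
    for spot_idx, (sx, sy) in other_spots.items():
        if not projected:
            break
        # S9027_1: distances from this spot to every projected spot (Formula 6).
        dists = [(math.hypot(sx - px, sy - py), emitter_id)
                 for emitter_id, (px, py) in projected.items()]
        best_dist, best_id = min(dists, key=lambda t: t[0])
        # S9027_2 / S9027_3: accept only if the nearest projection is close enough.
        if best_dist < max_dist:
            pairs.append((spot_idx, best_id))
    return pairs
```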
• S9028 Screen each light spot matching pair according to the number of other light spot matching pairs, obtain each target light spot matching pair according to the number of screened light spot matching pairs, and determine the first identifier of the light emitter in the target light spot matching pair as the second identifier of the target light spot corresponding to the image spot index.
  • the light spot matching pairs include adjacent light spot matching pairs and other light spot matching pairs, and each matching pair represents the corresponding relationship between the 3D light emitter and the 2D light spot.
• For any predicted pose of the handle, if the number of other light spot matching pairs corresponding to the predicted pose is less than the second specified number, the predicted pose and the other light spot matching pairs corresponding to the predicted pose are deleted.
• For any adjacent light spot matching pair, if all of the multiple predicted poses corresponding to the adjacent light spot matching pair have been deleted, the adjacent light spot matching pair is deleted.
• For example, each adjacent light spot matching pair has 4 corresponding predicted poses; if all 4 predicted poses corresponding to an adjacent light spot matching pair have been deleted, the adjacent light spot matching pair is deleted.
  • the second specified quantity in this embodiment can be set according to the actual situation, and this embodiment does not limit the specific value of the second specified quantity.
  • the number of each spot matching pair after elimination is counted, and for each spot matching pair with the same image spot index, the spot matching pair with the largest number among the spot matching pairs is determined as the target spot matching pair corresponding to the image spot index, and the first identifier of the light emitter in the target spot matching pair is determined as the second identifier of the target spot corresponding to the image spot index.
• For example, assume the light spot matching pairs remaining after elimination are: (A, 1), (A, 2), (A, 2), (A, 2), (A, 1), (B, 3), (B, 1), (B, 3), (B, 3), (B, 1). From these pairs it can be obtained that the number of light spot matching pairs (A, 1) is 2, the number of (A, 2) is 3, the number of (B, 1) is 2, and the number of (B, 3) is 3. Therefore, the target light spot matching pair corresponding to image spot index A is determined to be (A, 2), and the second identifier of the target light spot with image spot index A is 2; the target light spot matching pair corresponding to image spot index B is determined to be (B, 3), and the second identifier of the target light spot with image spot index B is 3.
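The final voting step of S9028 simply keeps, for every image spot index, the matching pair that occurs most often among the surviving pairs; the sketch below reproduces the worked example above (function name and data layout are illustrative).

```python
from collections import Counter

def select_target_pairs(surviving_pairs):
    """surviving_pairs: iterable of (image_spot_index, emitter_first_id) pairs
    remaining after the predicted-pose filtering. Returns the winning emitter id
    (second identifier of the target spot) for every image spot index."""
    counts = Counter(surviving_pairs)
    best = {}
    for (spot_idx, emitter_id), n in counts.items():
        if spot_idx not in best or n > best[spot_idx][1]:
            best[spot_idx] = (emitter_id, n)
    return {spot_idx: emitter_id for spot_idx, (emitter_id, _) in best.items()}

pairs = [("A", 1), ("A", 2), ("A", 2), ("A", 2), ("A", 1),
         ("B", 3), ("B", 1), ("B", 3), ("B", 3), ("B", 1)]
print(select_target_pairs(pairs))   # -> {'A': 2, 'B': 3}
```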
• In the embodiments of the present disclosure, adjacent light spots are connected into a planar figure, and each light spot is then quickly matched with the set of actually adjacent light emitters in units of the planar figure; predicting the poses and obtaining the light spot matching pairs in this way effectively reduces the number of combinations in the matching process, and screening the light spot matching pairs improves the matching accuracy, thereby improving positioning efficiency and accuracy.
• S9029 Initialize the relative pose between the handle and the virtual display device according to the 3D coordinates of the light emitters matched with each target light spot, the 2D coordinates of the target light spots, and the observation data collected by the IMU.
• Through the above matching, the correspondence between the 3D light emitters and the 2D light spots is obtained, so that the 3D coordinates of the light emitters and the 2D coordinates of the matched target light spots can be used for initialization.
• Specifically, the PnP algorithm is used to align the coordinate systems of the handle and the virtual display device, the 6DOF pose between the handle and the virtual display device is obtained through visual calculation, and the observation data collected by the IMU on the handle after alignment is pre-integrated so that the inertial navigation positioning results can be used to optimize the relative 6DOF pose between the handle and the virtual display device, completing the initialization process of the relative pose between the handle and the virtual display device.
• S903 For the non-first-frame target handle images collected by the camera, predict the current relative pose between the handle and the virtual display device based on the relative pose between the handle and the virtual display device corresponding to the historical target handle images, and determine the current target relative pose between the handle and the virtual display device in combination with the observation data continuously collected by the IMU.
• In the process of estimating the relative pose between the handle and the virtual display device in real time, once the relative pose between the handle and the virtual display device has been initialized, the current relative pose between the handle and the virtual display device is predicted based on the initialization result for the non-first-frame target handle images collected by the camera.
• For example, the relative pose between the handle and the virtual display device corresponding to the second frame of the target handle image is predicted based on the relative pose corresponding to the first frame of the target handle image, then the relative pose corresponding to the next frame is predicted based on the relative poses corresponding to the first and second frames of the target handle image, and so on.
• In other words, the current relative pose is predicted from the relative poses between the handle and the virtual display device corresponding to the historical target handle images, thereby ensuring the smoothness of the relative poses across consecutive frames of target handle images, so that the smoothness of the virtual display picture is ensured and the user's immersive experience is improved.
  • the observation data continuously collected by the IMU can be used to optimize the predicted current relative pose, so as to obtain the accurate target relative pose between the current handle and the virtual display device in real time.
  • S9031 Based on the 3D coordinates of each light emitter on the handle in the optimized 3D space structure and the predicted current relative pose between the handle and the virtual display device, determine the local range of each light emitter in the current target handle image.
• Since the current relative pose between the handle and the virtual display device is obtained through prediction, the approximate position of the light spot projected into the current target handle image by each 3D light emitter on the handle can be determined based on the current relative pose, thereby reducing the image range searched for the light emitters and improving detection efficiency.
  • S9032 Extract the current light spots of each current light emitter within the local range of the current target handle image, and determine the light emitter corresponding to each current light spot based on nearest neighbor matching.
  • the nearest neighbor matching method can be used to take the light spot closest to the projected light spot among the current light spots extracted in the current target handle image as the current light spot matched by the light emitter.
  • S9033 Establish a reprojection constraint equation based on the corresponding 2D coordinates of the current light spot and the 3D coordinates of the 3D light emitter, as well as the posture of the IMU and the camera when the observation data is synchronized with the current target handle image.
• In the reprojection constraint equation, the quantities involved are: the rotation matrix and translation vector of the IMU in the world coordinate system corresponding to the j-th frame of observation data collected by the IMU; the rotation matrix and translation vector, in the world coordinate system, of the camera on the virtual display device corresponding to the j-th frame of observation data; the rotation matrix and translation vector of the IMU in the handle coordinate system; the 3D coordinates, on the handle, of the light emitter with first identifier m; p_m, the 2D coordinates of the current light spot with second identifier m on the current target handle image; and proj(·), the projection equation of the camera.
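The equation body itself is not reproduced in this text. Purely as an illustration of what such a constraint typically looks like under the frame conventions implied by the definitions above, and with notation introduced here rather than taken from the disclosure ($R^{W}_{I_j}, t^{W}_{I_j}$ for the IMU pose in the world frame, $R^{W}_{C_j}, t^{W}_{C_j}$ for the camera pose, $R^{H}_{I}, t^{H}_{I}$ for the IMU pose in the handle frame, and ${}^{H}P_m$ for the emitter's 3D coordinates in the handle frame), a reprojection residual of this kind could be written as

$$
g_j \;=\; p_m \;-\; \mathrm{proj}\!\left(\big(R^{W}_{C_j}\big)^{-1}\!\left[\,R^{W}_{I_j}\big(R^{H}_{I}\big)^{-1}\!\big({}^{H}P_m - t^{H}_{I}\big) + t^{W}_{I_j} - t^{W}_{C_j}\right]\right),
$$

which maps the emitter's coordinate from the handle frame into the IMU frame, then into the world frame, then into the camera frame, and compares its projection with the observed 2D light spot; the actual equation in the disclosure may differ in sign and parameterization conventions.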
  • S9034 Establish a pre-integration constraint equation based on the pose of the IMU and the movement speed of the handle corresponding to two consecutive frames of observation data.
• In the pre-integration constraint equation, the quantities involved are: the translation vector of the IMU in the world coordinate system corresponding to the (j+1)-th frame of observation data collected by the IMU; the motion velocities of the IMU in the world coordinate system corresponding to the j-th and (j+1)-th frames of observation data, which can be obtained by integrating the accelerations in the j-th and (j+1)-th frames of observation data respectively; g_W, the gravitational acceleration; Δt, the time interval between the j-th and (j+1)-th frames of observation data collected by the IMU; LOG(·), the logarithmic map on the Lie group SO(3) (the special orthogonal group) corresponding to the quaternion; and the pre-integrated variables of the IMU's translation vector, motion velocity and rotation matrix.
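The pre-integration constraint is likewise not reproduced here. A standard form of such a constraint between frames $j$ and $j+1$, written only as an illustrative sketch (the pre-integrated translation, velocity and rotation variables are denoted $\alpha_{j,j+1}$, $\beta_{j,j+1}$, $\gamma_{j,j+1}$; $q^{W}_{I_j}$ is the quaternion form of the IMU rotation; and the sign of the gravity terms depends on how $g_W$ is defined), is

$$
\begin{aligned}
t^{W}_{I_{j+1}} &= t^{W}_{I_j} + v^{W}_{j}\,\Delta t - \tfrac{1}{2}\,g_W\,\Delta t^{2} + R^{W}_{I_j}\,\alpha_{j,j+1},\\
v^{W}_{j+1} &= v^{W}_{j} - g_W\,\Delta t + R^{W}_{I_j}\,\beta_{j,j+1},\\
0 &= \mathrm{LOG}\!\left(\gamma_{j,j+1}^{-1}\otimes\big(q^{W}_{I_j}\big)^{-1}\otimes q^{W}_{I_{j+1}}\right).
\end{aligned}
$$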
  • S9035 Combine the pre-integration constraint equation and the re-projection constraint equation to solve for the pose of the IMU corresponding to the current target handle image, the pose of the camera, and the relative pose of the IMU and the handle.
  • j represents the number of frames of observation data collected by the IMU
  • f j represents the pre-integration constraint equation
  • g j represents the reprojection constraint equation.
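The combined result referenced here is typically solved as a nonlinear least-squares problem over all synchronized frames; one plausible (purely illustrative) form of the joint objective is

$$
\min_{\mathcal{X}}\;\sum_{j}\Big(\|f_j\|^{2}_{\Sigma_f} + \|g_j\|^{2}_{\Sigma_g}\Big),
$$

where $\mathcal{X}$ collects the IMU poses, velocities, camera poses and the IMU-handle extrinsic parameters, and $\Sigma_f$, $\Sigma_g$ are the corresponding covariance weights.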
• By jointly solving the equations, the pose of the IMU in the world coordinate system corresponding to the current target handle image, the pose of the camera (that is, the virtual display device) in the world coordinate system, and the relative pose of the IMU and the handle can be obtained.
  • S9036 According to the relative pose of the IMU and the handle, as well as the current pose of the IMU and the pose of the camera, obtain the target relative pose between the current handle and the virtual display device.
• In this step, the current pose of the handle in the world coordinate system is obtained from the pose of the IMU in the world coordinate system and the relative pose of the IMU and the handle.
  • the target relative position between the current handle and the virtual display device can be obtained, so that the image displayed by the virtual display device can be controlled by operating the handle.
• The camera is located on the virtual display device, so the pose of the camera can represent the pose of the virtual display device.
  • the virtual display device generally has multiple cameras, and each camera collects data synchronously.
  • the target handle image collected by one camera can be used for pose estimation.
• In summary, in the embodiments of the present disclosure, the IMU and multiple light emitters on the handle, together with the multi-camera on the virtual display device, are used to jointly optimize the relative pose between the handle and the virtual display device through visual-inertial navigation.
• The light emitters are annotated on multiple frames of initial handle images collected at different positions and angles, so that the 3D spatial structure of the light emitters is optimized based on the annotation results of each light emitter, which improves the accuracy of subsequent relative pose calculations.
• On this basis, the relative pose between the handle and the virtual display device is initialized.
  • embodiments of the present disclosure provide a virtual display device that can perform the above method of detecting the light emitter on the handle and can achieve the same technical effect.
  • the virtual display device includes a processor 2101, a memory 2102, a display screen 2103, a communication interface 2104, and a multi-camera 2105.
• The display screen 2103 is used to display images, the virtual display device communicates with the handle through the communication interface 2104, the handle is used to control the picture displayed on the display screen 2103, and the type of the multi-camera 2105 matches the lighting type of the multiple light emitters on the handle;
  • the communication interface 2104, the multi-camera 2105, the display screen 2103, the memory 2102 and the processor 2101 are connected through a bus 2106.
  • the memory 2102 stores a computer program
• The processor 2101 executes the computer program to perform the following operations:
• For the first frame of the target handle image collected by the multi-camera, the target light spot set of each light emitter is obtained according to the target handle image, and the relative pose between the handle and the virtual display device is initialized based on the target light spot set, the observation data synchronously collected by the IMU, and the optimized 3D spatial structure of each light emitter on the handle; wherein the 3D spatial structure is optimized according to the annotation results of each light emitter in multiple frames of initial handle images collected at different positions and angles;
• For the non-first-frame target handle images collected by the multi-camera, the current relative pose between the handle and the virtual display device is predicted according to the relative poses corresponding to the historical target handle images, and the current target relative pose between the handle and the virtual display device is determined in combination with the observation data continuously collected by the IMU.
• In some embodiments of the present disclosure, the processor 2101 optimizes the 3D spatial structure of each light emitter on the handle in the following manner:
• According to the 3D spatial structure of each light emitter before optimization, the 3D coordinates and first identifier of each light emitter are obtained;
• According to the first identifier of each light emitter, for the light emitters pre-annotated on the multiple frames of initial handle images collected at different positions and angles, the 2D coordinates and second identifier of the light spot formed by each light emitter on the corresponding initial handle image are obtained;
• For each frame of the initial handle image, the relative pose between the handle and the collecting camera is determined according to the 3D coordinates of the light emitters and the 2D coordinates of the light spots having the same first identifier and second identifier, together with the observation data of the IMU corresponding to the frame;
• A reprojection error equation is constructed, and each relative pose and the 3D coordinates of each light emitter are optimized simultaneously according to the reprojection error equation to obtain the first optimized 3D spatial structure.
• After obtaining the first optimized 3D spatial structure, the processor 2101 further determines the transformation pose between a first 3D point cloud composed of the light emitters on the handle corresponding to the optimized 3D spatial structure and a second 3D point cloud composed of the light emitters on the handle corresponding to the 3D spatial structure before optimization, and re-determines the 3D coordinates of each light emitter on the handle according to the transformation pose to obtain the second optimized 3D spatial structure.
• In the reprojection error equation, K_n represents the projection parameters of the n-th camera; the equation also involves the rotation matrix and translation vector between the handle and camera 0, the rotation matrix and translation vector between camera n and camera 0, the 3D coordinates on the handle of the light emitter with first identifier m, and p_{m,n}, the 2D coordinates of the light spot with second identifier m.
  • the processor 2101 obtains the target spot set of each light emitter based on the target handle image.
  • the specific operations are:
• Obtain the current environment brightness, determine the respective binarization thresholds of at least two binarization methods according to the current environment brightness, and perform binarization processing on the target handle image according to each binarization threshold to obtain the binarized handle image;
  • Contour detection is performed within the global scope of the binary handle image to obtain a set of candidate contours for each light emitter, where each contour represents a light spot;
  • abnormal contours in the candidate contour set are eliminated to obtain the target light spot set of each light emitter.
• In some embodiments, the processor 2101 determines the respective binarization thresholds of at least two binarization methods based on the current environment brightness, and performs binarization processing on the visible light handle image according to each binarization threshold to obtain the binarized handle image.
  • the specific operations are:
• The pixels whose gray value is lower than a preset gray threshold are removed from the grayscale handle image obtained by graying the visible light handle image, and the respective binarization thresholds of the at least two binarization methods are determined according to the new histogram of the grayscale handle image after pixel removal; the current environment brightness is compared with a preset brightness threshold, and the weights corresponding to the at least two binarization thresholds are determined according to the comparison result; the target binarization threshold is obtained by weighting the binarization thresholds with their corresponding weights;
• According to the target binarization threshold, the grayscale handle image is binarized to obtain the binarized handle image.
  • the processor 2101 determines respective weights corresponding to the at least two binarization thresholds based on the comparison results.
  • the specific operations are:
• When the current environment brightness is greater than the preset brightness threshold, the first weight corresponding to the first binarization threshold calculated by the first binarization method is set to be greater than the second weight corresponding to the second binarization threshold calculated by the second binarization method; when the current environment brightness is less than or equal to the preset brightness threshold, the first weight is set to be less than the second weight.
  • the first binarization method is used to solve the histogram distribution containing a single peak
  • the second binarization method is used to solve the histogram distribution containing a double peak
  • the processor 2101 performs spot detection in the global scope of the binary handle image to obtain the target spot set of each light emitter.
  • the specific operations are:
• Contour detection is performed on the binarized handle image to obtain the candidate contour set of each light emitter, where each contour represents a light spot; abnormal contours in the candidate contour set are eliminated according to the prior contour shape information and the contour contrast information respectively, to obtain the target light spot set of each light emitter.
  • the method by which the processor 2101 eliminates abnormal contours in the candidate contour set according to the contour contrast information includes one or more of the following:
• For every two candidate contours in the candidate contour set, the Euclidean distance between the center points of the circumscribed rectangles of the two candidate contours and the minimum Manhattan distance between the edges of the two candidate contours are determined respectively, and abnormal contours are eliminated based on the Euclidean distance and the minimum Manhattan distance;
  • All candidate contours in the candidate contour set are sorted according to their areas, and abnormal contours are eliminated according to the quantitative relationship between the pixel points in the candidate contour with the largest area and the candidate contour with the second largest area;
• For each candidate contour in the candidate contour set, the distance between the candidate contour and its nearest neighbor candidate contour is calculated, and outlier abnormal contours are eliminated based on the distance;
• The brightness mean of each candidate contour in the candidate contour set is calculated, and abnormal contours are eliminated based on the brightness means;
  • the processor 2101 removes abnormal contours based on the Euclidean distance and the minimum Manhattan distance.
  • the specific operations are:
• When at least one of the Euclidean distance and the minimum Manhattan distance is less than the preset distance threshold, the areas of the two candidate contours are calculated respectively; if both areas are smaller than the preset area threshold, both candidate contours are eliminated at the same time;
• If at least one of the two areas is not smaller than the preset area threshold, the brightness means of the two candidate contours are calculated respectively, and the candidate contour with the smaller brightness mean is eliminated.
  • the processor 2101 eliminates abnormal contours based on the quantitative relationship between pixels in the candidate contour with the largest area and the candidate contour with the second largest area.
  • the specific operations are:
• If the numbers of pixels in the candidate contour with the largest area and the candidate contour with the second largest area both exceed the preset pixel number threshold, the multiple between the two pixel counts is calculated; if the multiple is greater than the preset multiple threshold, the candidate contour with the largest area is eliminated.
  • the processor 2101 eliminates outlier abnormal contours based on the distance.
  • the specific operations are:
• The adaptive outlier distance is determined based on the side length of the candidate contour and the median side length of all candidate contours; if the number of all candidate contours is greater than the preset quantity threshold and the distance is greater than the adaptive outlier distance, the candidate contour is eliminated.
• The methods by which the processor 2101 eliminates abnormal contours in the candidate contour set based on the prior contour shape information include one or more of the elimination operations described above, namely those based on the aspect ratio of the circumscribed rectangle, the area ratio between the contour and its circumscribed rectangle, the offset of the grayscale centroid, the roundness, the brightness mean, and the brightness contrast with the preset peripheral area.
• In some embodiments, the processor 2101 initializes the relative pose between the handle and the virtual display device based on the target light spot set, the observation data synchronously collected by the IMU, and the optimized 3D spatial structure of each light emitter on the handle; the specific operations are:
• Each light emitter in the optimized 3D spatial structure is matched with the target light spots in the target light spot set to establish the correspondence between the 3D light emitters and the 2D light spots; then, according to the 3D coordinates of the light emitters and the 2D coordinates of the light spots for which a correspondence exists, together with the observation data synchronously collected by the IMU, the relative pose between the handle and the virtual display device is initialized.
  • the processor 2101 matches each light emitter on the optimized 3D spatial structure with a target light spot in the target light spot set to establish a corresponding relationship between the 3D light emitter and the 2D light spot, and the specific operation is:
• For any target light spot in the target light spot set, a first specified number of candidate light spots adjacent to the target light spot are selected from the target light spot set, and the target light spot is connected with the first specified number of candidate light spots to obtain a planar figure;
  • each light spot in the plane figure is matched with each light emitter in the set of light emitters actually adjacent to each other, so as to obtain each adjacent light spot matching pair, wherein each adjacent light spot matching pair includes an image light spot index of the light spot and a first identifier of the light emitter matching the light spot;
• For any group of adjacent light spot matching pairs, multiple predicted poses of the handle corresponding to the adjacent light spot matching pair are determined according to the 2D coordinates of each light spot in the matching pair and the 3D coordinates of each light emitter;
• For any predicted pose, each light emitter is projected into the designated image according to the predicted pose to obtain each projected light spot, and based on each projected light spot, the other light spots in the designated image, apart from the light spots included in the planar figure, are matched with each light emitter on the handle to obtain other light spot matching pairs, wherein each other light spot matching pair includes the image spot index of the other light spot and the first identifier of the light emitter corresponding to the projected light spot matching the other light spot;
• Each light spot matching pair is screened according to the number of other light spot matching pairs, each target light spot matching pair is obtained according to the number of screened light spot matching pairs, and the first identifier of the light emitter in the target light spot matching pair is determined as the second identifier of the target light spot corresponding to the image spot index, wherein the light spot matching pairs include the adjacent light spot matching pairs and the other light spot matching pairs, and each matching pair represents a correspondence between a 3D light emitter and a 2D light spot.
  • the processor 2101 selects a first specified number of candidate light spots adjacent to the target light spot from the target light spot set, and the specific operation is:
• According to the 2D coordinates of the target light spot and the 2D coordinates of the other light spots in the target light spot set, the distances between the target light spot and the other light spots are obtained, and the other light spots corresponding to the first specified number of smallest distances are selected as the candidate light spots.
• In some embodiments, the processor 2101 matches each light spot in the planar figure with each light emitter in the set of actually adjacent light emitters, based on the set of actually adjacent light emitters in the optimized 3D spatial structure, to obtain the adjacent light spot matching pairs; the specific operations are:
  • Each light spot in the plane graphic is arranged in ascending order according to the image spot index to obtain a light spot sequence.
• Each light emitter in the set of actually adjacent light emitters is traversed in the specified order; for the currently traversed light emitter, the light emitter is used as the initial position, and the other light emitters actually adjacent to it are sorted in the specified order to obtain a sorted list; for any light emitter in the sorted list, the first identifier of the light emitter and the image spot index of the light spot whose position in the light spot list is the same as the position of the light emitter in the sorted list are added to the same adjacent light spot matching pair; it is then determined whether any untraversed light emitter remains in the set of actually adjacent light emitters, and if so, the traversal step is repeated until no untraversed light emitter remains.
• In some embodiments, before projecting each light emitter into the specified image according to the predicted pose, the processor 2101 further executes:
• For the multiple predicted poses of the handle corresponding to any group of adjacent light spot matching pairs, the predicted gravity direction vector of the handle corresponding to the adjacent light spot matching pair is obtained; the actual gravity direction vector of the handle is obtained according to the current position of the virtual display device when shooting the specified image; the adjacent light spot matching pairs that need to be deleted are determined based on the predicted gravity direction vector and the actual gravity direction vector corresponding to each adjacent light spot matching pair, and the adjacent light spot matching pairs that need to be deleted are deleted.
  • the processor 2101 determines the adjacent light spot matching pairs that need to be deleted through the predicted gravity direction vector corresponding to each adjacent light spot matching pair and the actual direction vector.
  • the specific operation is as follows :
• For any group of adjacent light spot matching pairs, the angle between the gravity direction vectors is obtained from the predicted gravity direction vector corresponding to the adjacent light spot matching pair and the actual gravity direction vector; if the angle is greater than the specified angle, the adjacent light spot matching pair is determined to be an adjacent light spot matching pair that needs to be deleted.
• In some embodiments, the processor 2101 matches the other light spots in the specified image, apart from the light spots included in the planar figure, with each light emitter on the handle according to each projected light spot to obtain the other light spot matching pairs; the specific operations are:
• For any other light spot in the specified image, the distances between the other light spot and each projected light spot are obtained according to their 2D coordinates; if the shortest distance among the distances is less than the specified distance, the image spot index of the other light spot and the first identifier of the light emitter corresponding to the projected light spot with the shortest distance are added to the same light spot matching pair, and the light spot matching pair is determined as an other light spot matching pair.
• In some embodiments, the processor 2101 screens each light spot matching pair according to the number of other light spot matching pairs and obtains each target light spot matching pair according to the number of screened light spot matching pairs; the specific operations are:
• For any predicted pose, if the number of other light spot matching pairs corresponding to the predicted pose is less than the second specified number, the predicted pose and its corresponding other light spot matching pairs are deleted; for any adjacent light spot matching pair, if all of its corresponding predicted poses have been deleted, the adjacent light spot matching pair is deleted; the numbers of the remaining light spot matching pairs are counted, and for the light spot matching pairs sharing the same image spot index, the light spot matching pair with the largest number is determined as the target light spot matching pair corresponding to the image spot index.
• In some embodiments, the processor 2101 determines the current target relative pose between the handle and the virtual display device based on the predicted current relative pose between the handle and the virtual display device and the observation data continuously collected by the IMU; the specific operations are:
• According to the 3D coordinates of each light emitter on the handle in the optimized 3D spatial structure and the predicted current relative pose between the handle and the virtual display device, the local range of each light emitter in the current target handle image is determined; the current light spots of the current light emitters are extracted within the local range, and the light emitter corresponding to each current light spot is determined by nearest neighbor matching; the reprojection constraint equation is established according to the 2D coordinates of the current light spots and the 3D coordinates of the corresponding 3D light emitters, together with the poses of the IMU and the camera when the observation data is synchronized with the current target handle image; the pre-integration constraint equation is established according to the poses of the IMU and the motion speed of the handle corresponding to two consecutive frames of observation data; the pre-integration constraint equation and the reprojection constraint equation are solved jointly to obtain the pose of the IMU, the pose of the camera, and the relative pose of the IMU and the handle corresponding to the current target handle image; and according to the relative pose of the IMU and the handle, the pose of the IMU and the pose of the camera, the current target relative pose between the handle and the virtual display device is obtained.
• The pre-integration constraint equation, the reprojection constraint equation, and the result of combining the two are as described in the foregoing method embodiments.
  • FIG. 21 is only an example, showing the hardware necessary for the virtual display device to implement the method steps of estimating the handle pose provided by the present disclosure.
  • the virtual display device also includes conventional hardware such as speakers, earpieces, lenses, and power interfaces.
  • the processor involved in Figure 21 of the embodiment of the present disclosure may be a central processing unit (Central Processing Unit, CPU), a general-purpose processor, a graphics processor (Graphics Processing Unit, GPU), a digital signal processor (Digital Signal Processor, DSP), Application-specific integrated circuit (Application-specific Integrated Circuit, ASIC), field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices, transistor logic devices, hardware components or any combination thereof.
  • Embodiments of the present disclosure also provide a computer-readable storage medium for storing some instructions. When these instructions are executed, the method for estimating the handle pose in the foregoing embodiment can be completed.
  • the embodiments of the present disclosure also provide a computer program product for storing a computer program for executing the method for estimating the handle posture in the aforementioned embodiments.
• The embodiments of the present disclosure may be provided as methods, apparatuses, or computer program products. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
• These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
• These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Abstract

本公开涉及虚拟现实交互技术领域,提供一种估计手柄位姿的方法及虚拟显示设备,利用手柄上的IMU和多个发光器,实现视觉和惯导对位姿的联合优化。位姿估计前,根据不同位置角度采集的多帧初始手柄图像中手柄上各发光器的标注结果,优化各发光器的3D空间结构以提高相对位姿计算的准确性;位姿估计过程中,基于优化后的3D空间结构、相机采集的目标手柄图像中提取的目标光斑集合以及IMU的观测数据,初始化手柄与虚拟显示设备间的相对位姿,由于目标光斑集合剔除了环境因素的干扰,有助于提高相对位姿计算的准确性,后续在对手柄与虚拟显示设备间相对位姿进行预测和优化时,能够得到平稳、准确的目标相对位姿。

Description

一种估计手柄位姿的方法及虚拟显示设备
相关申请的交叉引用
本公开要求在2022年09月27日提交中华人民共和国知识产权局、申请号为202211183832.2、发明名称为“一种估计手柄位姿的方法及虚拟显示设备”,2022年09月21日提交中华人民共和国知识产权局、申请号为202211149262.5、发明名称为“一种检测手柄上发光器的方法及虚拟显示设备”,2022年11月07日提交中华人民共和国知识产权局、申请号为202211390797.1、发明名称为“检测手柄图像中光斑标识的方法及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本公开中。
技术领域
本公开涉及虚拟现实交互技术领域,提供一种估计手柄位姿的方法及虚拟显示设备。
背景技术
针对虚拟现实(Virtual Reality,VR)、增强现实(Augmented Reality,AR)等虚拟显示设备,通常使用手柄实现常规交互,就如同个人电脑(Personal Computer,PC)和鼠标间的控制关系。
然而,通过手柄实现与虚拟世界的交互,其前提是得到手柄与虚拟显示设备间的6DOF位姿,从而根据6DOF位姿实现手柄对虚拟显示设备显示画面的控制。因此,手柄相对于虚拟显示设备的位姿,决定了手柄对虚拟显示设备的控制精度,会影响了用户的沉浸式体验,因此,具有重要的研究价值。
发明内容
本公开提供一种估计手柄位姿的方法及虚拟显示设备,用于提高手柄与虚拟显示设备间相对位姿估计的准确性。
一方面,本公开提供一种估计手柄位姿的方法,应用于虚拟显示设备,所述虚拟显示设备与手柄进行交互,所述手柄用于控制虚拟显示设备显示的画面,所述手柄上安装有IMU和多个发光器,所述虚拟显示设备安装有与所述发光器类型相匹配的多目相机,所述方法包括:
针对所述多目相机各自采集的首帧目标手柄图像,根据所述目标手柄图像获得各发光器的目标光斑集合,并根据所述目标光斑集合、所述IMU同步采集的观测数据和所述手柄上各发光器优化后的3D空间结构,初始化所述手柄与所述虚拟显示设备间的相对位姿;其中,所述3D空间结构是根据不同位置角度采集的多帧初始手柄图像中各发光器的标注 结果优化的;
针对所述多目相机各自采集的非首帧目标手柄图像,根据历史目标手柄图像对应的相对位姿,预测所述手柄与所述虚拟显示设备间的当前相对位姿,结合所述IMU连续采集的观测数据,确定当前所述手柄与所述虚拟显示设备间的目标相对位姿。
另一方面,本公开提供一种虚拟显示设备,包括处理器、存储器、显示屏、通信接口和多目相机,所述显示屏用于显示画面,所述虚拟显示设备通过所述通信接口与手柄通信,所述手柄用于控制所述显示屏显示的画面,所述多目相机的类型与所述手柄上多个发光器的发光类型相匹配;
所述通信接口、所述多目相机、所述显示屏、所述存储器和所述处理器通过总线连接,所述存储器存储有计算机程序,所述处理器根据所述计算机程序,执行以下操作:
针对所述多目相机采集的首帧目标手柄图像,根据所述目标手柄图像获得各发光器的目标光斑集合,并根据所述目标光斑集合、所述IMU同步采集的观测数据和所述手柄上各发光器优化后的3D空间结构,初始化所述手柄与所述虚拟显示设备间的相对位姿;其中,所述3D空间结构是根据不同位置角度采集的多帧初始手柄图像中各发光器的标注结果优化的;
针对所述多目相机各自采集的非首帧目标手柄图像,根据历史目标手柄图像对应的相对位姿,预测所述手柄与所述虚拟显示设备间的当前相对位姿,结合所述IMU连续采集的观测数据,确定当前所述手柄与所述虚拟显示设备间的目标相对位姿。
在本公开的某一些实施例中,所述处理器通过以下方式优化所述手柄上各发光器的3D空间结构:
根据优化前所述各发光器的3D空间结构,获得每个发光器的3D坐标和第一标识;
根据各发光器的第一标识,对不同位置角度采集的多帧初始手柄图像上预先标注的发光器,获得每个发光器在相应的初始手柄图像上形成的光斑的2D坐标和第二标识;
针对各帧所述初始手柄图像,根据所述第一标识和所述第二标识相同的发光器的3D坐标和光斑的2D坐标,以及相应帧对应的所述IMU的观测数据,确定所述手柄与采集相机间的相对位姿;
构建重投影误差方程,根据所述重投影误差方程同时优化各个相对位姿和各发光器的3D坐标,得到第一次优化后的3D空间结构。
在本公开的某一些实施例中,所述处理器优化所述手柄上各发光器的3D空间结构还执行:
得到第一次优化后的3D空间结构之后,根据优化后3D空间结构对应的所述手柄上各发光器组成的第一3D点云,以及优化前3D空间结构对应的所述手柄上各发光器组成的第二3D点云,确定优化前后所述第一3D点云和所述第二3D点云间的转换位姿;
根据所述转换位姿,重新确定所述手柄上各发光器的3D坐标,得到第二次优化后的3D空间结构。
在本公开的某一些实施例中,所述重投影误差方程为:
其中,Kn表示第n号相机的投影参数,分别表示所述手柄与第0号相机间的旋转矩阵和平移向量,分别表示所述第n号相机与第0号相机间的旋转矩阵和平移向量,表示第一标识为m的发光器在所述手柄上的3D坐标,pm,n表示第二标识为m的光斑的2D坐标。
在本公开的某一些实施例中,所述处理器根据所述目标手柄图像获得所述各发光器的目标光斑集合,具体操作为:
获取当前环境亮度,根据所述当前环境亮度,确定至少两个二值化方法各自的二值化阈值,并根据各二值化阈值对所述目标手柄图像进行二值化处理,获得二值化手柄图像;
在所述二值化手柄图像的全局范围内进行轮廓检测,得到所述各发光器的候选轮廓集,其中,每个轮廓表征一个光斑;
分别根据先验轮廓形状信息以及轮廓对比信息,剔除所述候选轮廓集中的异常轮廓,得到所述各发光器的目标光斑集合。
在本公开的某一些实施例中,所述处理器根据所述当前环境亮度,确定至少两个二值化方法各自的二值化阈值,并根据各二值化阈值对所述可见光手柄图像进行二值化处理,获得二值化手柄图像,具体操作为:
剔除对所述可见光手柄图像灰度化处理后的灰度手柄图像中灰度值低于预设灰度阈值的像素点,并根据像素点剔除后的灰度手柄图像的新直方图,分别确定所述至少两个二值化方法各自的二值化阈值;
将所述当前环境亮度与预设亮度阈值进行比较,根据比较结果,分别确定所述至少两个二值化阈值各自对应的权重;
根据各二值化阈值以及相应的权重,加权得到目标二值化阈值;
根据所述目标二值化阈值,对所述灰度手柄图像进行二值化处理,获得二值化手柄图像。
在本公开的某一些实施例中,所述处理器根据比较结果,分别确定所述至少两个二值化阈值各自对应的权重,具体操作为:
当所述当前环境亮度大于所述预设亮度阈值时,设置第一二值化方法计算的第一二值化阈值对应的第一权重,大于第二二值化方法计算的第二二值化阈值对应的第二权重;
当所述当前环境亮度小于等于所述预设亮度阈值时,设置第一二值化方法计算的第一 二值化阈值对应的第一权重,小于第二二值化方法计算的第二二值化阈值对应的第二权重;
其中,所述第一二值化方法用于求解包含单峰的直方图分布,所述第二二值化方法用于求解包含双峰的直方图分布。
在本公开的某一些实施例中,所述处理器在所述二值化手柄图像的全局范围内进行光斑检测,获得各发光器的目标光斑集合,具体操作为:
对所述二值化手柄图像进行轮廓检测,得到所述各发光器的候选轮廓集,其中,每个轮廓表征一个光斑;
分别根据先验轮廓形状信息以及轮廓对比信息,剔除所述候选轮廓集中的异常轮廓,得到所述各发光器的目标光斑集合。
在本公开的某一些实施例中,所述处理器根据所述轮廓对比信息剔除所述候选轮廓集中异常轮廓的方式包含以下一种或多种:
针对所述候选轮廓集中的每两个候选轮廓,分别确定两个候选轮廓的外接矩形中心点之间的欧式距离,以及两个候选轮廓的边缘的最小曼哈顿距离,并根据所述欧式距离和所述最小曼哈顿距离,剔除异常轮廓;
根据候选轮廓的面积对所述候选轮廓集中的全部候选轮廓进行排顺序,并根据面积最大候选轮廓和面积次大候选轮廓内像素点间的数量关系,剔除异常轮廓;
针对所述候选轮廓集中的每个候选轮廓,计算所述候选轮廓与最近邻候选轮廓间的距离,并根据所述距离,剔除离群的异常轮廓;
计算所述候选轮廓集中每个候选轮廓的亮度均值,并根据各亮度均值,剔除异常轮廓。
在本公开的某一些实施例中,所述处理器根据所述欧式距离和所述最小曼哈顿距离,剔除异常轮廓,具体操作为:
当所述欧式距离和所述最小曼哈顿距离中的至少一个小于预设距离阈值时,则分别计算两个候选轮廓的面积;
若两个候选轮廓的面积均小于预设面积阈值,则同时剔除两个候选轮廓;
若两个候选轮廓的面积中至少一个不小于所述预设面积阈值,则分别计算两个候选轮廓的亮度均值,剔除小亮度均值对应的一个候选轮廓。
在本公开的某一些实施例中,所述处理器根据面积最大候选轮廓和面积次大候选轮廓内像素点间的数量关系,剔除异常轮廓,具体操作为:
若所述面积最大候选轮廓和面积次大候选轮廓内像素点数量均超过预设像素点数量阈值,则计算所述面积最大候选轮廓与所述面积次大候选轮廓内像素点数量间的倍数;
若所述倍数大于预设倍数阈值,则剔除所述面积最大候选轮廓。
在本公开的某一些实施例中,所述处理器根据所述距离,剔除离群的异常轮廓,具体操作为:
根据所述候选轮廓的边长以及全部候选轮廓的边长中位数,确定自适应离群距离;
若所述全部候选轮廓的数量大于预设数量阈值,且所述距离大于所述自适应离群距离,则剔除所述候选轮廓。
在本公开的某一些实施例中,所述处理器根据所述先验轮廓形状信息剔除所述候选轮廓集中异常轮廓的方式包含以下一种或多种:
根据所述候选轮廓的面积与所述候选轮廓的外接矩形的长宽比例关系,剔除所述长宽比例超出第一预设比例阈值的候选轮廓;
剔除所述候选轮廓与所述候选轮廓的外接矩形的面积占比小于预设占比阈值的候选轮廓;
计算所述候选轮廓的灰度质心点与所述候选轮廓的外接矩形的中心点,分别在横轴与纵轴上的距离,并分别计算每个距离占所述候选轮廓的边长的比例,若两个比例中的至少一个超过第二预设比例阈值,则剔除所述候选轮廓;
根据所述候选轮廓包含的像素点总数以及所述候选轮廓的边长,确定所述候选轮廓的圆度,若所述圆度低于预设圆度阈值,则剔除所述候选轮廓;
计算所述候选轮廓的亮度均值,若所述亮度均值小于预设亮度阈值,则剔除所述候选轮廓;
确定所述候选区域的外接矩形的预设外围区域的亮度均值,以及所述候选轮廓的亮度均值,若两个亮度均值之间的亮度差异小于预设差值,则剔除所述候选轮廓。
在本公开的某一些实施例中,所述处理器根据所述目标光斑集合、所述IMU同步采集的观测数据和所述手柄上各发光器优化后的3D空间结构,初始化所述手柄与所述虚拟显示设备间的相对位姿,具体操作为:
将所述优化后的3D空间结构上各发光器与所述目标光斑集合中的目标光斑进行匹配,建立3D发光器与2D光斑间的对应关系;
根据存在对应关系的发光器的3D坐标和光斑的2D坐标,以及所述IMU同步采集的观测数据,初始化所述手柄与所述虚拟显示设备间的相对位姿。
在本公开的某一些实施例中,所述处理器将所述优化后的3D空间结构上各发光器与所述目标光斑集合中的目标光斑进行匹配,建立3D发光器与2D光斑间的对应关系,具体操作为:
针对所述目标光斑集合中的任意一个目标光斑,从所述目标光斑集合中筛选出与所述目标光斑相邻的第一指定数量的候选光斑,并将所述目标光斑与所述第一指定数量的候选光斑进行连接,得到平面图形;
根据所述优化后的3D空间结构上实际相邻的发光器集合,将所述平面图形中的各光斑和所述实际相邻的发光器集合中的各发光器分别进行匹配,得到各相邻光斑匹配对,其 中,每个相邻光斑匹配对包含所述光斑的图像光斑索引和与所述光斑相匹配的发光器的第一标识;
针对任意一组相邻光斑匹配对,根据所述相邻光斑匹配对中各光斑的2D坐标和所述各发光器的3D坐标,确定所述相邻光斑匹配对对应的所述手柄的多个预测位姿;
针对任意一个预测位姿,根据所述预测位姿将所述各发光器投影到指定图像中,获得各投影光斑,并根据所述各投影光斑,对所述指定图像中除所述平面图形包含的各光斑之外的其他光斑与所述手柄上的各发光器进行匹配,得到各其他光斑匹配对,其中,每个其它光斑匹配对包含所述其他光斑的图像光斑索引和与所述其它光斑匹配的投影光斑对应的发光器的第一标识;
根据所述各其他光斑匹配对的数量对各光斑匹配对进行筛选,并根据筛选后的各光斑匹配对的数量,得到各目标光斑匹配对,并将所述目标光斑匹配对中发光器的第一标识确定为所述图像光斑索引对应的目标光斑的第二标识,其中,所述光斑匹配对包括所述相邻光斑匹配对和所述其他光斑匹配对,每个匹配对表征3D发光器与2D光斑间的对应关系。
在本公开的某一些实施例中,所述处理器从所述目标光斑集合中筛选出与所述目标光斑相邻的第一指定数量的候选光斑,具体操作为:
根据所述目标光斑的2D坐标以及所述目标光斑集合中其他光斑的2D坐标,得到所述目标光斑与所述其他光斑之间的距离;
按照所述目标光斑与所述其他光斑之间的距离从小到大的顺序,选择前第一指定数量的距离对应的其他光斑作为所述候选光斑。
在本公开的某一些实施例中,所述处理器根据所述优化后的3D空间结构上实际相邻的发光器集合,将所述平面图形中的各光斑和所述实际相邻的发光器集合中的各发光器分别进行匹配,得到各相邻光斑匹配对,具体操作为:
将所述平面图形中的各光斑按照图像光斑索引从小到大的顺序进行排列,得到光斑列表;
按照指定顺序对所述实际相邻的发光器集合中的各发光器进行遍历,针对当前遍历的发光器,以所述发光器作为初始位置,并将与所述发光器实际相邻的其他发光器按照指定顺序进行排序,得到排序列表;
针对所述排序列表中的任意一个发光器,将所述发光器的第一标识与所述光斑列表中位置与所述发光器在所述排序列表中的位置相同的光斑的图像光斑索引添加到同一相邻光斑匹配对中;
判断所述实际相邻的发光器集合中是否存在未进行遍历的发光器;
若是,则返回按照指定顺序对所述实际相邻的发光器集合中的各发光器进行遍历的步骤,直至所述实际相邻的发光器集合中不存在未遍历的发光器。
在本公开的某一些实施例中,根据所述预测位姿将所述各发光器投影到指定图像中之前,所述处理器还执行:
针对任意一组所述相邻光斑匹配对对应的所述手柄的多个预测位姿,分别得到与所述相邻光斑匹配对相对应的手柄的预测重力方向向量;
根据拍摄所述指定图像时所述虚拟显示设备的当前位置,得到所述手柄的实际重力方向向量;
通过与各相邻光斑匹配对相对应的预测重力方向向量和所述实际重力方向向量,确定需要删除的相邻光斑匹配对,并将所述需要删除的相邻光斑匹配对进行删除。
在本公开的某一些实施例中,所述处理器通过与各相邻光斑匹配对相对应的预测重力方向向量和所述实际重力方向向量,确定需要删除的相邻光斑匹配对,具体操作为:
针对任意一组相邻光斑匹配对,根据与所述相邻光斑匹配对对应的预测重力方向向量与所述实际重力方向向量,得到重力方向向量夹角;
若所述重力方向向量夹角大于指定夹角,则确定所述相邻光斑匹配对为所述需要删除的相邻光斑匹配对。
在本公开的某一些实施例中,所述处理器根据所述各投影光斑,对所述指定图像中除所述平面图形包含的各光斑之外的其他光斑与所述手柄上的各发光器进行匹配,得到各其他光斑匹配对,具体操作为:
针对所述指定图像中任意一个其他光斑,根据所述其他光斑的2D坐标和所述各投影光斑的2D坐标,得到所述其他光斑分别与所述各投影光斑之间的距离;
若所述各距离中的最短距离小于指定距离,则将所述其他光斑的图像光斑索引以及与所述最短距离对应的投影光斑对应的发光器的第一标识添加到同一光斑匹配对,并将所述光斑匹配对确定为所述其他光斑匹配对。
在本公开的某一些实施例中,所述处理器根据所述各其他光斑匹配对的数量对各光斑匹配对进行筛选,并根据筛选后的各光斑匹配对的数量,得到各目标光斑匹配对,具体操作为:
针对任意一个预测位姿,若所述预测位姿对应的其他光斑匹配对的数量小于第二指定数量,则删除所述预测位姿以及与所述预测位姿相对应的其他光斑匹配对;
针对任意一个相邻光斑匹配对,若与所述相邻光斑匹配对相应的多个预测位姿均已被删除,则删除所述相邻光斑匹配对;
统计剔除后剩余的各光斑匹配对的数量;
针对存在同一图像光斑索引的各光斑匹配对,将所述各光斑匹配对中数量最多的光斑匹配对确定为与所述图像光斑索引相对应的目标光斑匹配对。
在本公开的某一些实施例中,所述处理器根据预测的所述手柄与所述虚拟显示设备间 的当前相对位姿,以及所述IMU连续采集的观测数据,确定当前所述手柄与所述虚拟显示设备间的目标相对位姿,具体操作为;
根据所述手柄上各发光器在优化后3D空间结构中的3D坐标,以及预测得到的所述手柄与所述虚拟显示设备间的当前相对位姿,确定当前各发光器在当前目标手柄图像的局部范围;
在所述当前目标手柄图像的局部范围内提取所述当前各发光器的当前光斑,并根据最近邻匹配,确定各当前光斑对应的发光器;
根据存在对应关系的当前光斑的2D坐标与3D发光器的3D坐标,以及所述观测数据和所述当前目标手柄图像同步时所述IMU与所述相机的位姿,建立重投影约束方程;
根据连续两帧观测数据对应的所述IMU的位姿和所述手柄的运动速度,建立预积分约束方程;
联合所述预积分约束方程和所述重投影约束方程,求解出所述当前目标手柄图像对应的所述IMU的位姿、所述相机的位姿、以及所述IMU与所述手柄的相对位姿;
根据所述IMU与所述手柄的相对位姿、所述IMU的位姿和所述相机的位姿,得到当前所述手柄与所述虚拟显示设备间的目标相对位姿。
在本公开的某一些实施例中,所述预积分约束方程为:
所述重投影约束方程为:
其中,分别表示所述IMU采集的第j帧观测数据对应的所述IMU在世界坐标系下的旋转矩阵和平移向量,表示所述IMU采集的第j+1帧观测数据对应的所述IMU在所述世界坐标系下的平移向量,分别表示第j帧和第j+1帧观测数据对应的所述IMU在所述世界坐标系下的运动速度,gW表示重力加速度,Δt表示所述IMU采集的第j帧和第j+1帧观测数据之间的时间间隔,LOG(·)表示四元数组对应的李群SO3上的对数函数,分别表示所述IMU的所述平移向量、所述运动速度和所述旋转矩阵的预积分变量,分别表示所述IMU采集的第j帧观测数据对应的所述虚拟显示设备上的相机在世界坐标系下的旋转矩阵和平移向量,分别表示所述IMU在手柄坐标系下的旋转矩阵和平移向量,表示所述手柄上第一标识为m的发光器的3D坐标,pm表示所述手柄上第二标识为m的当前光斑的2D坐标,pro j(·)表示相机的 投影方程。
在本公开的某一些实施例中,联合所述预积分约束方程和所述重投影约束方程的结果为:
其中,分别表示所述IMU采集的第j帧观测数据对应的所述IMU在世界坐标系下的旋转矩阵和平移向量,j表示所述IMU采集的观测数据的帧数,fj表示所述预积分约束方程,gj表示所述重投影约束方程。
另一方面,本公开提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机可执行指令,所述计算机可执行指令用于使计算机设备执行根据一些实施例的估计手柄位姿的方法。
本公开提供的估计手柄位姿的方法及虚拟显示设备中,手柄上安装有IMU和多个发光器,虚拟显示设备上安装有多目相机,且相机的类型与发光器类型相匹配,通过估计手柄与虚拟显示设备间的相对位姿,实现手柄对控制虚拟显示设备显示的画面的控制,完成与虚拟世界的交互。在估计手柄与虚拟显示设备间相对位姿前,从不同位置、角度采集多帧初始手柄图像,保证获取到手柄上完整数量的发光器,从而基于多帧初始手柄图像中的发光器来优化发光器的3D空间结构,提高后续相对位姿计算的准确性;在位姿估计过程中,基于优化后的3D空间结构以及各相机采集的首帧目标手柄图像中提取的目标光斑集合以及IMU的观测数据,初始化手柄与虚拟显示设备间的相对位姿,由于目标光斑集合提取时剔除了环境因素的干扰,有助于提高相对位姿计算的准确性,当初始化完成后,针对相机采集的非首帧目标手柄图像,根据历史目标手柄图像对应的手柄与虚拟显示设备间的相对位姿,预测当前目标手柄图像对应的手柄与虚拟显示设备间的相对位姿,再结合IMU的观测数据,实现视觉惯导对相对位姿的联合优化,从而得到平稳、准确的当前手柄与虚拟显示设备间的目标相对位姿。
附图说明
图1为根据一些实施例的VR设备与手柄的应用场景示意图;
图2A为根据一些实施例的包含多目相机的虚拟显示设备示意图;
图2B为根据一些实施例的包含多个LED白光灯的6DOF手柄示意图;
图2C为根据一些实施例的包含多个LED红外灯的6DOF手柄示意图;
图3A为根据一些实施例的发光器异常检测示意图;
图3B为根据一些实施例的发光器异常检测示意图;
图4为根据一些实施例的估计手柄位姿方法的整体架构图;
图5为根据一些实施例的优化手柄上各发光器的3D空间结构的方法流程图;
图6A为根据一些实施例的标注前双目红外相机采集的手柄图像;
图6B为根据一些实施例的标注后双目红外相机采集的手柄图像;
图7为根据一些实施例的PnP原理示意图;
图8为根据一些实施例的视觉惯导联合优化估计手柄位姿的架构图;
图9为根据一些实施例的视觉惯导联合估计手柄位姿的方法流程图;
图10为根据一些实施例的光斑检测方法流程图;
图11为根据一些实施例的图像二值化处理的方法流程图;
图12为根据一些实施例的利用每两个候选轮廓间的欧式距离和最小曼哈顿距离剔除异常轮廓的方法流程图;
图13为根据一些实施例的利用选择出的这两个候选轮廓内像素点间的数量关系剔除异常轮廓的方法流程图;
图14为根据一些实施例的利用候选轮廓与最近邻候选轮廓间的距离剔除离群的异常轮廓的方法流程图;
图15为根据一些实施例的2D光斑与3D发光器匹配的方法流程图;
图16为根据一些实施例的相邻光斑组成的平面图形示意图;
图17为根据一些实施例的平面图形中的各光斑与实际相邻的发光器集合快速匹配的方法流程图;
图18为根据一些实施例的为对相邻光斑匹配对进行筛选的方法流程图;
图19为根据一些实施例的确定其他光斑匹配对的方法流程图;
图20为根据一些实施例的实时估计手柄与虚拟显示设备间相对位姿的方法流程图;
图21为根据一些实施例的虚拟显示设备的结构图。
具体实施方式
为使本公开实施例的目的和优点更加清楚,下面将结合本公开实施例中的附图,对本公开进行清楚、完整地描述,显然,所描述的实施例是本公开的一部分实施例,而不是全部的实施例。基于本公开文件中记载的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本公开保护的范围。
AR、VR等虚拟显示设备一般是指具备独立处理器的头戴式显示设备(简称为头显或者头盔,如VR眼镜、AR眼镜等),具有独立运算、输入和输出的功能。虚拟显示设备可外接手柄,用户通过操作手柄来控制虚拟显示设备显示的虚拟画面,实现常规交互。
以游戏场景为例,参见图1,为根据一些实施例的虚拟显示设备与手柄的应用场景示意图,如图1所示的游戏场景中,玩家通过手柄实现与虚拟世界的交互,利用手柄与虚拟显示设备的相对位姿,控制虚拟显示设备的游戏画面,并根据游戏场景的变化做出肢体上的反应,从而获得身临其境般的沉浸式体验,提升游戏的趣味性。特别地,利用电视的大屏优势,将虚拟显示设备的虚拟游戏画面投放到电视上,娱乐性更高。
一般的,根据输出位姿的不同,常用的手柄包括3DOF手柄和6DOF手柄,其中,3DOF手柄输出3维的旋转姿态,6DOF手柄输出3维的平移位置和3维的旋转姿态,相对于3DOF手柄,6DOF手柄可以做出的游戏动作更加复杂,趣味性更强。
目前,常用的6DOF手柄上设置有多个发光器(如LED灯),其中,发光器可以发不同类型的光(如红外光、白光等),且虚拟显示设备上的多目相机(在图2A中用圆圈圈出)的类型,应与发光类型相适配。
例如,参见图2B,为根据一些实施例的一种6DOF手柄的示意图,如图2B所示的,该6DOF手柄上设置的LED灯发射白光,白点孔洞就是每个LED灯的位置。此时,为通过手柄上LED灯的位置来估计手柄与虚拟显示设备间的位姿,虚拟显示设备上的多目相机应为RGB相机。
再例如,参见图2C,为根据一些实施例的另一种6DOF手柄的示意图,如图2C所示的,该6DOF手柄上设置的LED灯发射红外光(人眼不可见)。此时,为通过手柄上LED灯的位置来估计手柄与虚拟显示设备间的位姿,虚拟显示设备上的多目相机应为红外相机。
在实际应用中,使用手柄与虚拟世界进行交互,其前提是得到手柄在虚拟世界中的位姿,从而根据6DOF位姿实现手柄对虚拟显示设备显示画面的控制。
目前,市面上的大多数产品中,定位手柄位姿的方法主要为:利用虚拟显示设备上的红外相机捕捉手柄上发光器的红外图像,通过图像识别、图像跟踪这些红外发光器,并结合手柄上发光器的3D空间结构进行发光器的匹配、3D坐标计算等操作,最终可以得到手柄与虚拟显示设备间的相对位姿。
然而,上述方法中,由于手柄上发光器的3D空间结构是基于手柄的设计图纸测量得到,精度较低,导致位姿估计误差较大;同时,通过手柄上发光器的3D空间结构以及在图像中的2D光斑,可以计算当前帧手柄的位姿,但一方面相机采集的单帧图像内发光器的数目有限,导致位姿估计准确度不高,另一方面相机采集的连续多帧图像之间的发光器的观测没有相互关联,导致交互过程中位姿光滑度较差,影响视觉体验。
一般的,如图2B和图2C所示的手柄内部,还安装有惯性测量单元(Inertial measurement unit,IMU),用于测量手柄的运动速度,包括加速度和角速度,而手柄的运动速度,也会影响手柄与虚拟显示设备间的相对位姿。
鉴于此,本公开实施例提供了一种估计手柄位姿的方法及虚拟显示设备,基于虚拟显示设备的多目相机在不同位置、角度采集的手柄图像中发光器的标注结果,优化手柄上发光器的3D空间结构,从而提高手柄位姿估计的准确性;并且,利用手柄上IMU采集的观 测数据和虚拟显示设备上相机采集的手柄图像,采用视觉惯导联合优化的位姿估计方法,获得更加平滑、准确的手柄位姿。
同时,考虑到手柄上的发光器是通过视觉图像进行检测的,在一定程度上会受到环境因素的影响。例如,图3A中(a)所示廊道和房间中的一些灯光,这些灯光可能被错误地检测到,如图3A中(b)所示;再例如,图3B中(a)所示LED显示屏上的文字,可能被错误地检测到,如图3B中(b)所示。而如果图像中手柄上的发光器检测不准确,将会导致手柄与虚拟显示设备间的相对位姿存在较大误差,降低了控制精度,严重影响用户体验。因此,本公开实施例在估计位姿时,对相机采集的图像进行了一系列处理操作,并对检测出的发光器在图像中的2D光斑进行了异常剔除,以提高发光器检测的准确性和鲁棒性。
在计算手柄与虚拟显示设备间的相对位姿时,需要将手柄上发光器的3D点与发光器在相机采集的图像中光斑的2D点一一对应,而传统的暴力匹配方式比较耗时,会降低定位效率。因此,本公开实施例通过将相邻的光斑拼接为平面图形来提高匹配的效率和精度。
参见图4,为根据一些实施例的估计手柄位姿方法的整体架构图,主要包括预处理以及相对位姿估计两部分。其中,预处理部分主要是利用虚拟显示设备上多目相机在不同位置、角度采集的多帧初始手柄图像中各发光器的标注结果,优化手柄上发光器的3D空间结构,获得更加准确的发光器的3D坐标,从而提高手柄位姿估计的准确性。相对位姿估计部分主要是利用相机采集的目标手柄图像以及IMU采集的观测数据,采用视觉惯导联合优化方法,实时估计手柄与虚拟显示设备间的相对位姿。
其中,在相对位姿估计部分,针对相机采集的目标手柄图像进行了光斑检测,获得手柄上的各发光器在图像中的目标光斑集合,结合各发光器优化后的3D空间结构和IMU采集的观测数据,进行相对位姿估计。在位姿估计过程中,需要将手柄上各发光器的3D点与各发光器在图像中形成的光斑的2D点一一匹配,而通常的,手柄上各发光器的第一标识在设计图纸上是设置好的,因此,匹配过程可以看作是确定各发光器匹配的光斑的第二标识过程。
手柄在出厂前,各发光器的3D空间结构可以根据手柄的设计图纸获得,包括每个发光器的位置(用3D坐标表示)以及第一标识(用数字编码的ID表示)。但由于生产工艺的不同,实际上各发光器的3D空间结构可能和设计图纸存在误差,若直接使用设计图纸中手柄上各发光器的3D空间结构进行位姿估计,可能造成估计误差,影响用户的沉浸式体验。
因此,本公开实施例在估计手柄与虚拟显示设备间相对位姿之前,根据采集的多帧不同初始手柄图像,优化各发光器的3D空间结构。其中,优化过程可使用虚拟显示设备上预先标定好的至少两台相机采集的手柄图像,还可以使用预先标定好的独立的多台相机采 集的手柄图像,但无论使用哪种相机,该相机的类型是与手柄上发光器的发光类型相配的。
在本公开的某一些实施例中,手柄上各发光器的3D空间结构的具体优化过程参见图5,主要包括以下几步:
S501:根据优化前各发光器的3D空间结构,获得每个发光器的3D坐标和第一标识。
优化前各发光器的3D空间结构是由手柄的设计图纸确定的,通过测量手柄的设计图纸,可以得到优化前3D空间结构中手柄上各发光器的3D坐标,以及每个发光器的第一标识。
S502:根据不同位置角度采集的多帧初始手柄图像上预先标注的发光器,获得每个发光器在相应的初始手柄图像上形成的光斑的2D坐标和第二标识。
本公开的实施例中,在手柄上各发光器亮起的状态下,使用与发光器的发光类型相匹配的多目相机,从不同位置、角度采集多帧初始手柄图像,保证手柄上的发光器全部被采集到。得到多帧初始手柄图像后,人工预先标注出各发光器的中心点在每帧初始手柄图像中的位置(用2D坐标表示),以及每个发光器的第二标识(用数字编码的ID表示)。其中,各发光器的第二标识与各发光器的3D空间结构保持一致。
以手柄上的发光器为LED红外灯、采集相机为虚拟显示设备上的双目红外相机为例,此时,初始手柄图像为红外手柄图像。如图6A所示,为双目红外相机采集的标注前的红外手柄图像,人工标注后,双目红外手柄图像如图6B所示。
由于双目红外相机相对于同一个手柄的位置和角度不同,因此,同步采集的单帧红外手柄图像中,手柄的发光器的位置和数量不同。例如,如图6A和图6B所示的,一个红外相机采集的红外手柄图像中,包含第一标识为2、3、4、5、7的5个LED红外光斑,另一个红外相机采集的红外手柄图像中,包含第一标识为2、3、4、5、6、7、8、9的8个LED红外光斑。
对多目相机在不同位置、角度采集的每帧初始手柄图像全部进行标注后,根据各帧初始手柄图像的标注结果,可以获得每个发光器在相应的初始手柄图像上形成的光斑的2D坐标和第二标识。
进一步地,基于每帧初始手柄图像中各光斑的2D坐标和第二标识,采用从运动恢复结构(Structure from Motion,SFM)思想,对每个发光器的3D坐标进行优化,得到优化后的各发光器的3D空间结构,具体参见S503-S506。
S503:针对各帧初始手柄图像,根据第一标识和第二标识相同的发光器的2D坐标和3D坐标,以及相应帧对应的IMU的观测数据,确定手柄与采集相机间的相对位姿。
针对每一帧初始手柄图像,执行以下操作:根据2D图像中第二标识和3D空间中第一标识相同的光斑的2D坐标和发光器的3D坐标,采用PnP(Perspective-n-Points)算法,确定该帧对应的手柄与采集相机间第一相对位姿,以及通过对该帧对应的IMU的观测数据 进行积分,得到手柄与采集相机间的第二相对位姿,通过对第一相对位姿和第二相对位姿进行融合,得到该帧对应的手柄与采集相机间的相对位姿。
PnP算法是指基于3D与2D点对解决物体运动定位问题,其原理如图7所示,O表示相机光心,3D空间中物体的若干个(如A、B、C、D)3D点通过相机投影在图像平面上,得到对应的2D点(如a、b、c、d),在已知3D点的坐标和3D点与2D点的投影关系的情况下,可以估算相机与物体间的位姿。在本公开实施例中,3D点与2D点的投影关系可以通过发光器的第一标识和第二标识反映出来。
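作为便于理解的示意,下面给出一段利用OpenCV的solvePnP接口、由3D-2D点对求解相对位姿的Python代码片段。其中的相机内参、畸变参数及各点坐标均为假设的示例数据,仅用于说明PnP的求解思路,并非本公开方案的限定实现:

```python
import cv2
import numpy as np

# 假设的相机内参矩阵与畸变参数(实际应使用标定结果)
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)

# 第一标识与第二标识相同的发光器在手柄坐标系下的3D坐标,及其光斑在图像中的2D坐标(示例数据)
object_points = np.array([[0.01, 0.02, 0.00],
                          [0.03, -0.01, 0.01],
                          [-0.02, 0.02, 0.02],
                          [0.00, -0.03, 0.01]], dtype=np.float64)
image_points = np.array([[352.1, 230.4],
                         [410.7, 255.9],
                         [298.3, 221.0],
                         [330.6, 290.2]], dtype=np.float64)

# PnP求解:得到手柄(物体)坐标系到相机坐标系的旋转向量与平移向量
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)  # 旋转向量转换为旋转矩阵
print(ok, R, tvec)
```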
S504:构建重投影误差方程,根据重投影误差方程同时优化各个相对位姿和3D坐标,得到第一次优化后的3D空间结构。
由于各相机在使用前进行了标定,每个相机的投影参数(也称为内参数),以及相机间的相对位姿是已知的。因此,在S504中,根据每个相机的投影参数、相机间的相对位姿、以及手柄上各发光器的3D坐标、各发光器在每个相机采集的初始手柄图像中形成的光斑的2D坐标,构建重投影误差方程,通过使重投影误差最小,从而同时优化各帧初始手柄图像对应的手柄与采集相机间的相对位姿,以及手柄上各发光器的3D坐标,得到第一次优化后的3D空间结构。
其中,重投影误差方程公式表示如下:
在公式1中,Kn表示第n号相机的投影参数,分别表示手柄与第0号相机间的旋转矩阵和平移向量,分别表示第n号相机与第0号相机间的旋转矩阵和平移向量,表示第一标识为m的发光器在手柄上的3D坐标,pm,n表示第二标识为m的发光器在第n号相机采集的初始手柄图像上形成的光斑的2D坐标。
其中,表示手柄与第0号相机间的相对位姿,表示第n号相机与第0号相机间的相对位姿。
在本公开的某一些实施例中,第0号相机可以为采集的光斑数量最多的相机,也称为主相机。例如,以图6B为例,右红外相机采集的光斑数量多于左红外相机采集的光斑数量,此时,右红外相机为第0号相机(主相机)。
第一次3D空间结构优化后,可以得到较为准确的各发光器的3D坐标,但优化后3D空间结构的原点与优化前3D空间结构的原点之间会有一定的漂移。在一些实施例中,为进一步提高各发光器3D坐标的准确性,采用3对点的相似变换(Similarity Transformation,SIM3)方法将优化前后手柄坐标系统一对齐,实现对各发光器的3D空间结构的二次优化。其中,第二次优化过程包括:
S505:根据优化后3D空间结构对应的手柄上各发光器组成的第一3D点云,以及优化前3D空间结构对应的手柄上各发光器组成的第二3D点云,确定优化前后第一3D点云和第二3D点云间的转换位姿。
手柄上各发光器的3D空间结构第一次优化后,各发光器的3D点组成第一3D点云,手柄上各发光器的3D空间结构第一次优化前,各发光器的3D点组成第二3D点云。在第一3D点云和第二3D点云中,优化前后各发光器的3D点坐标是已知的,通过使优化前后各发光器的3D坐标间的漂移误差最小,求得第一3D点云和第二3D点云间的转换位姿,转换位姿的计算公式如下:
其中,表示第一次优化后标识为m的发光器在手柄坐标系下的3D坐标,表示第一次优化前标识为m的发光器在手柄坐标系下的3D坐标,s表示第一3D点云和第二3D点云的尺度变换系数,(R,t)表示第一3D点云和第二3D点云间的转换位姿,其中,R表示优化前后手柄坐标系间的旋转矩阵,t表示优化前后手柄坐标系间的平移向量。
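求解第一3D点云与第二3D点云间相似变换(s,R,t)的一种常见闭式解法是基于SVD的Umeyama方法。下面给出一段示意性的Python实现,仅作为一种可行求解思路的示例,函数与变量命名均为假设(以优化后的点云为源点云src、优化前的点云为目标点云dst):

```python
import numpy as np

def solve_sim3(src, dst):
    """估计使 s*R*src + t ≈ dst 的相似变换,src/dst为N x 3的对应点(按相同标识一一对应)。"""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)                     # 互协方差矩阵
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                               # 处理反射的退化情况
    R = U @ S @ Vt                                 # 旋转矩阵
    var_s = (xs ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_s           # 尺度变换系数
    t = mu_d - s * R @ mu_s                        # 平移向量
    return s, R, t
```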
S506:根据转换位姿,重新确定手柄上各发光器的3D坐标,得到第二次优化后的3D空间结构。
根据3D空间结构第一次优化前后各发光器的第一3D点云和第二3D点云间的转换位姿,计算手柄上各发光器最终的3D坐标,计算公式如下:
基于各发光器最终的3D坐标,可以得到第二次优化后的3D空间结构。通过优化手柄上各发光器的3D空间结构,可以得到各发光器更加准确的3D坐标,进而基于优化后各发光器的3D坐标,实时估计手柄与虚拟显示设备间的相对位姿时,能够提高位姿估计的准确性。
需要说明的是,同一批次的手柄是基于同一设计图纸生产的,因此,对于同一批次的手柄,只需要进行一次优化即可。
需要说明的是,上述优化手柄上各发光器的3D空间结构的方法,可以由虚拟显示设备执行,还可以由其他设备执行,如笔记本电脑、台式计算机等。
优化完手柄上各发光器的3D空间结构后,可以利用虚拟显示设备上的多目相机对手柄进行成像,再结合手柄内IMU采集的观测数据,实现视觉和惯导对位姿的联合优化。
参见图8,为根据一些实施例的视觉惯导联合优化估计手柄位姿的架构图,在图8中,分别表示第j(j=1,2,…n)帧对应的手柄上IMU坐标系与世界坐标系间的相对位姿、手柄坐标系与世界坐标系间的相对位姿、相机(即虚拟显示设备)坐标系与世 界坐标系间的相对位姿,表示手柄坐标系与IMU坐标系间的相对位姿。
如图8示出的,通过IMU连续采集的多帧观测数据间的预积分约束,以及IMU和相机采集的同一帧数据(即观测数据和目标手柄图像的时间戳相同)间的重投影约束,实现视觉惯导对手柄与虚拟显示设备间相对位姿的联合优化。
参见图9,为根据一些实施例的视觉惯导联合估计手柄位姿的方法流程图,该流程主要包括以下几步:
S901:确定是否对手柄和虚拟显示设备间的相对位姿进行了初始化操作,若否,则执行S902,若是,则执行S903。
实时估计手柄与虚拟显示设备间相对位姿的过程中,可对手柄与虚拟显示设备间的相对位姿进行预测,预测过程需要给定手柄与虚拟显示设备间相对位姿的初值,因此,位姿估计过程中,首先确定是否对手柄和虚拟显示设备间的相对位姿进行了初始化操作,若没有初始化,则初始化手柄和虚拟显示设备间的相对位姿,若已经初始化,则对手柄和虚拟显示设备间的相对位姿进行预测及优化。
S902:针对相机采集的首帧目标手柄图像,根据目标手柄图像获得各发光器的目标光斑集合,并根据目标光斑集合、IMU同步采集的观测数据和手柄上各发光器优化后的3D空间结构,初始化手柄与虚拟显示设备间的相对位姿。
在实际应用中,VR体验的环境亮度亮暗差异较大,且环境中的光源会对手柄上发光器的检测存在影响。为了解决该问题,本公开实施例提供一种在明亮环境和昏暗环境下均能准确检测出各发光器在图像中的2D光斑的方法。
参见图10,为本公开实施例提供的光斑检测方法流程图,主要包括以下几步:
S9021:获取当前环境亮度,根据当前环境亮度,确定至少两个二值化方法各自的二值化阈值,并根据各二值化阈值对目标手柄图像进行二值化处理,获得二值化手柄图像。
通常的,相机采集的图像中可以提取光照特征,通过光照特征,可以获得当前环境亮度。
在本公开的某一些实施例中,可以对相机采集的图像进行灰度化处理,获得灰度图像,灰度化方法包括但不限于浮点法、整数法、移位法、平均法等,进一步地,根据灰度图像的直方图,确定当前环境亮度。
例如,当直方图的高峰位于灰度值小于100的暗侧时,表明当前环境中没有明亮光照,此时,确定当前环境亮度为昏暗;当直方图的高峰位于灰度值大于等于100的亮侧时,表明当前环境中存在明亮光照,此时,确定当前环境亮度为明亮。
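作为示意,下面给出一段根据灰度直方图最高峰位置粗略判断当前环境亮度的Python代码片段,其中灰度值100的分界以及判断逻辑仅为与上述示例一致的假设:

```python
import cv2
import numpy as np

def estimate_env_brightness(image_bgr, peak_split=100):
    """根据灰度直方图最高峰所在的灰度值,粗略判断当前环境为明亮还是昏暗。"""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)          # 灰度化处理
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    peak_gray = int(np.argmax(hist))                            # 直方图最高峰对应的灰度值
    return "bright" if peak_gray >= peak_split else "dark"      # 峰值位于亮侧则判为明亮
```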
根据当前环境亮度,可以采用与当前环境亮度相匹配的目标二值化阈值对目标手柄图像进行二值化处理以提高不同环境下发光器检测的准确性和鲁棒性。其中,适用于对包含多个发光器的目标手柄图像进行二值化的方法主要包括以下两种:
最大类间方差法:也称为大津法,是1979年提出的一种二值化阈值求解方法,该方法以最大化前景图像与背景图像的类间方差为核心思想,适用于求解直方图分布趋近于双峰的二值化阈值;
三角法:是一种二值化阈值求解算法,更适用于求解直方图分布趋近于单峰的二值化阈值,该方法从直方图的最高峰向较远一侧的直方图末端构造一条直线,然后求解直方图上每个灰度级到该直线的垂直距离,取最大垂直距离对应的灰度级位置作为二值化阈值。
由于虚拟游戏体验场景复杂,环境亮度亮暗差异较大,无论单独采用上述两种方法中的任意一种,均无法获得较为理想的二值化效果。为了适应更加广泛的使用场景,本公开实施例基于这两种主要的二值化自适应阈值求解算法,将大津法与三角法结合起来,得到一种能够同时适应明亮与昏暗环境下,求得更合理的图像二值化所用的目标二值化阈值。
参见图11,为本公开实施例中图像二值化处理的方法流程图,主要包括以下几步:
S9021_1:剔除对目标手柄图像灰度化处理后的灰度手柄图像中灰度值低于预设灰度阈值的像素点,并根据像素点剔除后的灰度手柄图像的新直方图,分别确定至少两个二值化方法各自的二值化阈值。
手柄上各发光器的亮度在不同环境下基本稳定,在通过二值化方法计算二值化阈值时,应排除亮度过低的昏暗背景。因此,将目标手柄图像灰度化处理后的灰度手柄图像中灰度值低于预设灰度阈值的像素点剔除,根据灰度手柄图像中剩余像素点来计算当前图像的新直方图,并根据新直方图,分别确定至少两个二值化方法各自的二值化阈值。
在本公开的某一些实施例中,由于手柄所处的环境复杂多样,为防止意外情况的发生,可预先为每个二值化方法设置一个最低保障阈值。当根据新直方图计算的二值化阈值低于预设最低保障阈值,强制将计算的二值化阈值设置为预设最低保障阈值,从而增强算法在特殊情况下的稳定性。
例如,根据新直方图,当大津法计算的二值化阈值低于预设最低保障阈值时,将预设最低保障阈值设置为大津法对应的二值化阈值;当三角法计算的二值化阈值低于预设最低保障阈值时,将预设最低保障阈值设置为三角法对应的二值化阈值。
需要说明的是,根据新直方图确定的上述大津法和三角法的二值化阈值外,还可确定其他二值化方法的二值化阈值。
S9021_2:将当前环境亮度与预设亮度阈值进行比较,根据比较结果,分别确定至少两个二值化阈值各自对应的权重。
通过将当前环境亮度与预设亮度阈值进行比较,可以确定当前环境亮度与每个二值化方法求解的二值化阈值的适应程度,该适应程度可通过权重反映。
以两个二值化方法求解的二值化阈值加权得到目标二值化阈值的过程为例,其中,第一二值化方法用于求解包含单峰的直方图分布,第二二值化方法用于求解包含双峰的直方 图分布,例如,第一二值化方法为三角法,第二二值化方法为大津法。首先,确定当前环境亮度是否大于预设亮度阈值,若是,表明手柄处于明亮环境,此时,采用第一二值化方法计算的第一二值化阈值与当前环境亮度更适配,即第一二值化方法计算的第一二值化阈值更准确,因此,设置第一二值化方法计算的第一二值化阈值对应的第一权重,大于第二二值化方法计算的第二二值化阈值对应的第二权重;若否,表明手柄处于昏暗环境,此时,采用第二二值化方法计算的第二二值化阈值与当前环境亮度更适配,即第二二值化方法计算的第二二值化阈值更准确,因此,设置第一二值化方法计算的第一二值化阈值对应的第一权重,小于第二二值化方法计算的第二二值化阈值对应的第二权重。
S9021_3:根据各二值化阈值以及相应的权重,加权得到目标二值化阈值。
获得各二值化方法对应的权重后,通过加权得到目标二值化阈值。
以第一二值化方法为三角法、第二二值化方法为大津法为例,假设第一二值化阈值记为S1,对应的第一权重为α,第二二值化阈值记为S2,对应的第二权重为β,此时,目标二值化阈值S的计算公式为:
S=α*S1+β*S2             公式4
在本公开的某一些实施例中,当当前环境亮度大于预设亮度阈值时,α=0.7,β=0.3;当当前环境亮度小于等于预设亮度阈值时,α=0.3,β=0.7。
S9021_4:根据目标二值化阈值,对灰度手柄图像进行二值化处理,获得二值化手柄图像。
获取与当前环境亮度相匹配的目标二值化阈值后,根据目标二值化阈值对灰度手柄图像进行二值化处理,得到二值化手柄图像。由于目标二值化阈值是根据当前环境亮度对不同二值化方法的二值化阈值加权得到的,因此目标二值化阈值的设置更加合理,能够适应当前环境亮度,从而减少环境光的干扰,提高发光器检测的准确性。
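作为示意,下面给出一段结合三角法与大津法、按当前环境亮度加权得到目标二值化阈值并完成二值化的Python代码片段,其中剔除低灰度像素的阈值、最低保障阈值及权重取值均为举例假设:

```python
import cv2
import numpy as np

def target_threshold(gray, env_bright, low_cut=20, floor=40):
    """结合三角法与大津法阈值,按环境亮度加权(公式4)得到目标二值化阈值并二值化。"""
    valid = gray[gray >= low_cut]                       # 剔除灰度过低的昏暗背景像素
    if valid.size == 0:
        valid = gray.ravel()
    valid = valid.reshape(1, -1)
    s_tri, _ = cv2.threshold(valid, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_TRIANGLE)
    s_otsu, _ = cv2.threshold(valid, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    s_tri, s_otsu = max(s_tri, floor), max(s_otsu, floor)   # 最低保障阈值
    a, b = (0.7, 0.3) if env_bright else (0.3, 0.7)         # 明亮环境下偏向三角法阈值
    s = a * s_tri + b * s_otsu                              # 目标二值化阈值
    _, binary = cv2.threshold(gray, s, 255, cv2.THRESH_BINARY)
    return s, binary
```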
S9022:在二值化手柄图像的全局范围内进行轮廓检测,得到各发光器的候选轮廓集。
初始化前,手柄与虚拟显示设备间的相对位姿是未知的,3D空间中手柄上各发光器投影到虚拟显示设备上相机采集的目标手柄图像中的光斑的位置也是未知的。因此,需要在二值化手柄图像的全局范围内进行各发光器的检测,将检测出的各光斑作为3D空间中的各发光器在图像中的2D点。
在本公开的某一些实施例中,可采用图像处理中的轮廓提取算法(如HOG、Canny等)进行发光器检测。其中,轮廓是对图像进行二值化处理后由不连通的二值化区域中最外围像素组成的,每个不连通的二值化区域均有且只有一个最外围轮廓,通过计算最外围像素点包围的区域内所有像素点的面积之和,可以获得轮廓面积。其中,每个轮廓表征一个光斑。
需要说明的是,本公开实施例对发光器的检测方法不做限制性要求,例如,还可以采 用深度学习模型(如CNN、YOLO等)进行发光器检测。
S9023:分别根据先验轮廓形状信息以及轮廓对比信息,剔除候选轮廓集中的异常轮廓,得到各发光器的目标光斑集合。
由于除手柄上的多个发光器可以发光外,周围环境的其他发光设备也会发光,因此,轮廓检测的候选轮廓集中,可能包含发光器的轮廓,也可能包含对发光器形成干扰的其他发光设备的轮廓,因此,需要对候选轮廓集进行筛选。
在本公开的某一些实施例中,根据先验轮廓形状信息执行以下至少一种剔除操作:
剔除操作一、根据候选轮廓的面积与候选轮廓的外接矩形的长宽比例关系,剔除长宽比例超出第一预设比例阈值的候选轮廓,其中,所述第一预设比例阈值与候选轮廓的面积相关。
当候选轮廓的面积扩大时,要求候选轮廓的外接矩形的长宽要更加接近。因此,在剔除操作一中,为提高轮廓检测的准确性,本公开实施例采用阶梯式的比例阈值进行异常轮廓剔除,即第一预设比例阈值与候选轮廓的面积呈阶梯式状态,候选轮廓的面积越大,第一预设比例阈值越小。当候选轮廓的外接矩形的长宽比例超出第一预设比例阈值,则认为是误检,剔除该候选轮廓。
剔除操作二、剔除候选轮廓与候选轮廓的外接矩形的面积占比小于预设占比阈值的候选轮廓。
剔除操作三、计算候选轮廓的灰度质心点与候选轮廓的外接矩形的中心点,分别在横轴与纵轴上的距离,并分别计算每个距离占候选轮廓的边长的比例,若两个比例中的至少一个超过第二预设比例阈值,则剔除候选轮廓。
剔除操作四、根据候选轮廓包含的像素点总数以及候选轮廓的边长,确定候选轮廓的圆度,若圆度低于预设圆度阈值,则剔除候选轮廓。
假设候选轮廓包含的像素点总数(包括该候选轮廓内部的像素点以及轮廓边界上的像素点)为P,候选轮廓的周长为C,则圆度R的计算公式为:
R=(4*π*P)/C²          公式5
剔除操作五、计算候选轮廓的亮度均值,若亮度均值小于预设亮度阈值,则剔除候选轮廓。
剔除操作六、确定候选轮廓的外接矩形的预设外围区域的亮度均值,以及候选轮廓的亮度均值,若两个亮度均值之间的亮度差异小于预设亮度差值,则剔除候选轮廓。
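作为示意,下面给出一段按部分先验轮廓形状信息(长宽比例、圆度、亮度均值)对候选轮廓进行剔除的Python代码片段,其中的各阈值均为举例假设,且以轮廓面积近似像素点总数:

```python
import cv2
import numpy as np

def filter_by_shape_prior(contours, gray, ar_max=2.0, round_min=0.6, bright_min=120):
    """按长宽比例、圆度、亮度均值三类先验形状信息剔除异常候选轮廓。"""
    kept = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)              # 候选轮廓的外接矩形
        if max(w, h) / max(min(w, h), 1) > ar_max:
            continue                                  # 长宽比例超出阈值,剔除
        area = cv2.contourArea(c)                     # 以轮廓面积近似像素点总数P
        perim = cv2.arcLength(c, True)                # 轮廓周长C
        if perim > 0 and 4 * np.pi * area / perim ** 2 < round_min:
            continue                                  # 圆度(公式5)低于阈值,剔除
        mask = np.zeros(gray.shape, np.uint8)
        cv2.drawContours(mask, [c], -1, 255, -1)      # 轮廓内部区域掩膜
        if cv2.mean(gray, mask=mask)[0] < bright_min:
            continue                                  # 亮度均值过低,剔除
        kept.append(c)
    return kept
```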
上述根据先验轮廓形状信息剔除候选轮廓集中的异常轮廓时,是针对单一候选轮廓剔除的,没有考虑候选轮廓间的关系。因此,还可以根据轮廓对比信息进一步剔除候选轮廓集中的异常轮廓。
在本公开的某一些实施例中,根据轮廓对比信息剔除候选轮廓集中的异常轮廓的方式包含以下一种或多种:
剔除操作七、针对候选轮廓集中的每两个候选轮廓,分别确定两个候选轮廓的外接矩形中心点之间的欧式距离,以及两个候选轮廓的边缘的最小曼哈顿距离,并根据欧式距离和最小曼哈顿距离,剔除异常轮廓。
其中,根据每两个候选轮廓间的欧式距离和最小曼哈顿距离剔除异常轮廓的具体过程参见图12,主要包括以下几步:
S9023_11:确定两个候选轮廓间的欧式距离和最小曼哈顿距离中的至少一个是否小于预设距离阈值,若是,则执行S9023_12,否则,执行S9023_16。
根据两个候选轮廓间的欧式距离和最小曼哈顿距离,可以判定两个候选轮廓的近似程度。当两个候选轮廓间的欧式距离和最小曼哈顿距离中的至少一个小于预设距离阈值时,表明两个候选轮廓近似程度较高,需进一步进行异常判断,应执行S9023_12;当两个候选轮廓间的欧式距离和最小曼哈顿距离均大于预设距离阈值,表明两个候选轮廓近似程度较低,应执行S9023_16。
S9023_12:分别计算两个候选轮廓的面积。
S9023_13:确定两个候选轮廓的面积是否均小于预设面积阈值,若是,则执行S9023_14,否则,执行S9023_15。
通过计算出的两个候选轮廓各自的面积与预设面积阈值的比较结果,进一步进行异常判断。
S9023_14:同时剔除两个候选轮廓。
当两个候选轮廓的面积均小于预设面积阈值时,表明这两个候选轮廓均可能是噪点,应同时剔除这两个候选轮廓。
S9023_15:分别计算两个候选轮廓的亮度均值,剔除小亮度均值对应的一个候选轮廓。
当两个候选轮廓的面积中至少一个不小于预设面积阈值时,可通过亮度均值进行异常剔除。在实际应用中,分别计算这两个候选轮廓的亮度均值,并比较两个亮度均值的大小,将小亮度均值对应的一个候选轮廓从候选轮廓集中剔除。
S9023_16:同时保留两个候选轮廓。
当两个候选轮廓间的欧式距离和最小曼哈顿距离均大于预设距离阈值,表明两个候选轮廓近似程度较低,可同时保留候选轮廓集中的这两个候选轮廓。
剔除操作八、根据候选轮廓的面积对候选轮廓集中的全部候选轮廓进行排顺序,并根据面积最大候选轮廓和面积次大候选轮廓内像素点间的数量关系,剔除异常轮廓。
通过候选轮廓的面积排序后,可以选择出候选轮廓集中面积最大候选轮廓和面积次大候选轮廓,根据选择出的这两个候选轮廓内像素点间的数量关系剔除异常轮廓的具体过程参见图13,主要包括以下几步:
S9023_21:确定面积最大候选轮廓和面积次大候选轮廓内像素点数量是否均超过预设 像素点数量阈值,若是,则执行S9023_22,否则,执行S9023_25。
两个候选轮廓内像素点数量可以反映两个候选轮廓的近似程度,因此,可以根据面积最大候选轮廓和面积次大候选轮廓内像素点数量分别与预设像素点数量阈值的比较,确定这两个候选轮廓是否形状相似。
S9023_22:计算面积最大候选轮廓与面积次大候选轮廓内像素点数量间的倍数。
S9023_23:确定倍数是否大于预设倍数阈值,若是,则执行S9023_24,否则,执行S9023_25。
通过面积最大候选轮廓与面积次大候选轮廓内像素点数量间的倍数,进一步进行异常判断。
S9023_24:剔除面积最大候选轮廓。
当面积最大候选轮廓与面积次大候选轮廓内像素点数量间的倍数大于预设倍数阈值,此时,面积最大候选轮廓可能为一种与手柄上发光器的形状类似的干扰物,应从候选轮廓集中剔除。
S9023_25:保留面积最大候选轮廓和面积次大候选轮廓。
当面积最大候选轮廓和面积次大候选轮廓内像素点数量中有一个未超过预设像素点数量阈值,或者,面积最大候选轮廓与面积次大候选轮廓内像素点数量间的倍数不大于预设倍数阈值时,保留面积最大候选轮廓和面积次大候选轮廓。
剔除操作九、针对候选轮廓集中的每个候选轮廓,计算候选轮廓与最近邻候选轮廓间的距离,并根据距离,剔除离群的异常轮廓。
其中,根据候选轮廓与最近邻候选轮廓间的距离剔除离群的异常轮廓的过程参见图14,主要包括以下几步:
S9023_31:根据候选轮廓的边长以及全部候选轮廓的边长中位数,确定自适应离群距离。
根据候选轮廓的边长对候选轮廓集中全部候选轮廓进行排序,得到边长中位数,并根据当前候选轮廓的边长与该边长中位数,确定自适应离群距离。
S9023_32:确定候选轮廓与最近邻候选轮廓间的距离是否大于自适应离群距离,若是,则执行S9023_33,否则,执行S9023_36。
S9023_33:确定全部候选轮廓的数量是否大于预设数量阈值,若是,则执行S9023_34,否则,执行S9023_35。
S9023_34:剔除候选轮廓。
当候选轮廓与最近邻候选轮廓间的距离大于自适应离群距离,且全部候选轮廓的数量大于预设数量阈值时,表明该候选轮廓为一个异常的离群轮廓,应该剔除。
S9023_35:保留候选轮廓。
S9023_36:离群剔除结束。
当全部候选轮廓的数量较少时,可能无法代表一个群体,此时,通过离群剔除异常轮廓可能无法实现,需通过其他方式进行异常剔除。
剔除操作十、计算候选轮廓集中每个候选轮廓的亮度均值,并根据各亮度均值,剔除异常轮廓。
在剔除操作十中,对候选轮廓集中各候选轮廓的亮度均值从大到小进行排序,保留前N(N为大于等于1的整数)个候选轮廓,剔除其余候选轮廓。
需要说明的是,上述剔除操作一至剔除操作十这多种方式中,没有严格的执行顺序,可以先根据先验轮廓形状信息进行异常轮廓剔除,再根据轮廓对比信息进行异常轮廓剔除;也可以先根据轮廓对比信息进行异常轮廓剔除,再根据先验轮廓形状信息进行异常轮廓剔除;还可以将轮廓对比信息和先验轮廓形状信息两类异常剔除方式穿插进行。
在本公开获取目标光斑集合的实施例中,为进一步适应不同环境光照的使用场景,使算法能在复杂环境下稳定鲁棒的运行,根据当前环境亮度对不同的二值化方法的二值化阈值进行加权,得到对目标手柄图像进行二值化处理的目标二值化阈值,保证了不同亮度下手柄上发光器检测的准确性,大幅度降低了开发难度及成本;同时,为了提高手柄上发光器的检测速度,采用图像处理技术对检测出的轮廓进行了异常剔除,提高运行速度的同时降低了内存资源的占用,利于部署在便携的可穿戴设备上。一方面,相比于基于AI神经网络的发光器检测方法,本公开实施例不需要高配置处理器进行网络训练,也不需要进行大量数据的标注,降低了开发硬件资源需求以及开发的成本与工作量;相比于一般图像处理的发光器检测方法,本公开实施例能够根据当前环境亮度,自适应调节二值化阈值,且通过对至少两个二值化方法的二值化阈值进行加权,提高了算法在复杂场景下使用的鲁棒性,扩大了适用范围。另一方面,本公开实施例根据发光器的轮廓特征,剔除了干扰手柄定位的发光器的光斑,进一步提升了算法的性能和检测的准确性。
获得准确检测的目标光斑集合后,目标光斑集合中的各目标光斑是优化后的3D空间结构中哪个发光器的投影是未知的,即2D光斑与3D发光器间的对应关系未知。因此,需要将目标光斑集合中的各目标光斑与3D空间结构优化后的各发光器进行匹配,建立2D光斑与3D发光器间一一对应的关系。从而根据存在对应关系的3D发光器与2D光斑,采用PNP算法,对齐手柄与虚拟显示设备间的坐标系,并对对齐后手柄上IMU采集的观测数据(包括但不限于手柄的加速度和角速度)进行预积分,进而得到手柄与虚拟显示设备间的相对6DOF位姿,完成手柄与虚拟显示设备间相对位姿的初始化过程。
一般的,IMU与相机的采集频率可能不同,位姿估计过程需要保证使用的IMU采集的观测数据与相机采集的目标手柄图像保持同步,观测数据与目标手柄图像的同步关系,可根据时间戳确定。
在本公开的某一些实施例中,2D光斑与3D发光器间一一对应的关系可通过3D发光器的第一标识与2D光斑的图像光斑索引表征,因此,2D光斑与3D发光器匹配的过程,可看作是确定目标手柄图像中某个图像光斑索引对应的光斑的第二标识的过程。
目前,2D光斑与3D发光器的匹配方式大多采用暴力匹配。其中,暴力匹配方法为:从目标光斑集合中任选3个目标光斑,根据各发光器的3D空间结构猜测这3个目标光斑的ID,然后使用P3P算法计算相对位姿,每个P3P算法有4个解,再根据解出的相对位姿将所有发光器重新投影到图像中,计算匹配点对的个数和误差,然后对所有组合结果进行排序,优先选择匹配数量最多的结果,如果匹配数量一样,选择误差小的结果。
通常的,暴力匹配的组合数量是巨大的,整体耗时较大,会降低定位效率。假设目标光斑集合中有m个光斑,手柄有n个发光器,则组合数计算公式为C(m,3)×A(n,3),即从m个光斑中任选3个的组合数,乘以这3个光斑在n个发光器中可能对应的有序猜测数。
为了解决该问题,本公开实施例提供一种高效的匹配方式,将相邻光斑拼接为平面图形进行匹配,经实验测得,以平面三角形为例,相邻光斑的组合数量通常小于500,小于暴力匹配的组合数量,能够有效提高匹配的效率和精度。
参见图15,为本公开实施例中2D光斑与3D发光器匹配方法流程图,主要包括以下几步:
S9024:针对目标光斑集合中的任意一个目标光斑,从目标光斑集合中筛选出与目标光斑相邻的第一指定数量的候选光斑,并将目标光斑与第一指定数量的候选光斑进行连接,得到平面图形。
以目标光斑集合中的任意一个目标光斑为例,候选光斑的确定过程包括:根据目标光斑的2D坐标以及目标光斑集合中其他光斑的2D坐标,得到目标光斑与其他光斑之间的距离,将目标光斑与其他光斑之间的距离按照从小到大的顺序进行排序,将与前第一指定数量的距离对应的其他光斑确定为候选光斑,其中,可通过公式6得到所述目标光斑与任意一个其他光斑之间的距离:
d=√((x1-x2)²+(y1-y2)²)          公式6
其中,d为目标光斑与任意一个其他光斑之间的距离,x1为目标光斑在图像中的横坐标,y1为目标光斑在图像中的纵坐标,x2为其他光斑在图像中的横坐标,y2为其他光斑在图像中的纵坐标。
在本公开的某一些实施例中,第一指定数量为2,但是并不对本公开的实施例中的第一指定数量进行限定,其可根据实际情况来进行设置。
其中,第一指定数量与平面图形是相对应的,若平面图形是三角形,则第一指定数量为2,若平面图形是四边形,则第一指定数量为3。
以平面图形为三角形为例,如图16所示,为将目标光斑集合中的各相邻光斑进行连 接得到多个三角形。
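作为示意,下面给出一段为目标光斑集合中的每个目标光斑选取最邻近的第一指定数量(此处以2为例)候选光斑、从而得到三角形顶点索引组合的Python代码片段,函数命名为假设:

```python
import numpy as np

def build_triangles(spots_2d, k=2):
    """spots_2d为N x 2的光斑2D坐标;为每个光斑选取k个最近邻,返回组成平面图形的顶点索引组。"""
    pts = np.asarray(spots_2d, dtype=np.float64)
    figures = []
    for i, p in enumerate(pts):
        d = np.linalg.norm(pts - p, axis=1)   # 公式6:与其余光斑的欧式距离
        d[i] = np.inf                         # 排除自身
        nearest = np.argsort(d)[:k]           # 距离从小到大取前k个候选光斑
        figures.append((i, *nearest.tolist()))
    return figures
```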
S9025:根据优化后的3D空间结构上实际相邻的发光器集合,将平面图形中的各光斑和实际相邻的发光器集合中的各发光器分别进行匹配,得到各相邻光斑匹配对。
通过手柄的设计图纸,可以得到手柄上各发光器的第一标识以及各发光器间的实际相邻关系,获得实际相邻的发光器集合。因此,可以以平面图形为单位,对平面图形中的各光斑与实际相邻的发光器集合进行快速匹配,得到各相邻光斑匹配对。其中,每个相邻光斑匹配对包含光斑的图像光斑索引和与光斑相匹配的发光器的第一标识。
如图17所示,为平面图形中的各光斑与实际相邻的发光器集合的快速匹配过程,主要包括以下几步:
S9025_1:将平面图形中的各光斑按照图像光斑索引从小到大的顺序进行排列,得到光斑列表。
S9025_2:按照指定顺序对实际相邻的发光器集合中的各发光器进行遍历,针对当前遍历的发光器,以发光器作为初始位置,并将与发光器实际相邻的其他发光器按照指定顺序进行排序,得到排序列表。
其中,本实施例中的指定顺序包括顺时针方向顺序和逆时针方向顺序,但是并不对本实施例中的指定顺序进行限定,本实施例中的指定顺序可根据实际情况来进行设置。
例如,以平面图形为三角形为例,实际相邻的发光器集合中,一组发光器包括发光器1、发光器2和发光器3。若本实施中的指定顺序为逆时针方向,则对实际相邻的发光器集合中各发光器的遍历顺序依次为发光器3、发光器2和发光器1,当遍历到发光器3时,对应的排序列表为:发光器3、发光器2、发光器1;当遍历到发光器2时,对应的排序列表为发光器2、发光器1、发光器3;当遍历到发光器1时,对应的排序列表为发光器1、发光器3、发光器2。
S9025_3:针对排序列表中的任意一个发光器,将发光器的第一标识与光斑列表中位置与发光器在排序列表中的位置相同的光斑的图像光斑索引添加到同一相邻光斑匹配对中。
例如,光斑列表中的顺序依次为:光斑A、光斑B、光斑C,以排序列表为:发光器3、发光器2、发光器1为例进行说明,得到的相邻光斑匹配对分别为:光斑A-发光器3,光斑B-发光器2,光斑C-发光器1。
S9025_4:判断实际相邻的发光器集合中是否存在未进行遍历的发光器,若是,则返回S9025_2,若否,则结束。
通过遍历实际相邻的发光器集合中的发光器,可以保证每个发光器都存在对应的图像光斑索引,获得基于相邻的各光斑的匹配结果。
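作为示意,下面给出一段将三角形中按图像光斑索引排列的光斑列表,与实际相邻的一组发光器按指定顺序循环遍历并逐位置配对、生成相邻光斑匹配对组合的Python代码片段,函数与参数命名均为举例假设:

```python
def match_triangle(spot_indices, adjacent_led_ids):
    """spot_indices: 平面图形中各光斑的图像光斑索引;
    adjacent_led_ids: 实际相邻的一组发光器的第一标识(已按指定顺序排列)。
    返回该组发光器遍历后得到的全部相邻光斑匹配对组合。"""
    spot_list = sorted(spot_indices)            # 按图像光斑索引从小到大排列的光斑列表
    hypotheses = []
    n = len(adjacent_led_ids)
    for start in range(n):                      # 以每个发光器作为初始位置进行遍历
        ordered = [adjacent_led_ids[(start + k) % n] for k in range(n)]
        pairs = list(zip(spot_list, ordered))   # 位置相同的光斑索引与发光器第一标识配对
        hypotheses.append(pairs)
    return hypotheses
```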
为了进一步提高匹配的效率,在本公开的某一些实施例中,获得各相邻光斑匹配对之 后,可以对相邻光斑匹配对进行筛选,如图18所示,为对相邻光斑匹配对进行筛选的流程示意图,包括以下步骤:
S9025_5:针对任意一组相邻光斑匹配对对应的手柄的多个预测位姿,分别得到该相邻光斑匹配对相对应的手柄的预测重力方向向量。
在实际应用中,根据任意一组相邻光斑匹配对对应的手柄的多个预测位姿,通过预设的IMU积分算法,可以求解出该相邻光斑匹配对相对应的手柄的预测重力方向向量。
S9025_6:根据拍摄指定图像时虚拟显示设备的当前位置,得到手柄的实际重力方向向量。
在实际应用中,根据拍摄指定图像时虚拟显示设备的6Dof位姿,可以得到手柄的实际重力方向向量。
S9025_7:通过与各相邻光斑匹配对相对应的预测重力方向向量和实际重力方向向量,确定需要删除的相邻光斑匹配对,并将需要删除的相邻光斑匹配对进行删除。
在实际应用中,针对任意一组相邻光斑匹配对,根据该相邻光斑匹配对对应的预测重力方向向量与实际重力方向向量,得到重力方向向量夹角;若重力方向向量夹角大于指定夹角,则确定该相邻光斑匹配对为需要删除的相邻光斑匹配对。其中,可通过公式7得到重力方向向量夹角:
θ=arccos((v_pred·v_real)/(|v_pred|*|v_real|))          公式7
其中,θ为重力方向向量夹角,v_pred为预测重力方向向量,v_real为实际重力方向向量。
例如,若指定夹角为10°,若第一相邻光斑匹配对对应的重力方向向量夹角为4°,则确定第一相邻光斑匹配对不需要进行删除,若第二相邻光斑匹配对对应的重力方向向量夹角为12°,则确定第二相邻光斑匹配对需要进行删除。
需要说明的是:本实施例中的指定夹角可根据实际情况来进行设置,本实施例在此并不对指定夹角的具体值进行限定。
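作为示意,下面给出一段按公式7计算重力方向向量夹角、并据此判断相邻光斑匹配对是否需要删除的Python代码片段,指定夹角取10°仅为沿用上述示例的假设:

```python
import numpy as np

def need_delete_by_gravity(v_pred, v_real, max_angle_deg=10.0):
    """v_pred/v_real分别为预测与实际重力方向向量;夹角大于指定夹角时返回True(该匹配对需删除)。"""
    cos_t = np.dot(v_pred, v_real) / (np.linalg.norm(v_pred) * np.linalg.norm(v_real))
    theta = np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))   # 公式7
    return theta > max_angle_deg
```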
S9026:针对任意一组相邻光斑匹配对,根据相邻光斑匹配对中各光斑的2D坐标和各发光器的3D坐标,确定相邻光斑匹配对对应的手柄的多个预测位姿。
以平面图形为三角形为例,每一组相邻光斑匹配对包含三个光斑的匹配结果,将这组相邻光斑匹配对中各光斑的2D坐标和各发光器的3D坐标输入至p3p算法中,可以得到这组相邻光斑匹配对对应的手柄的多个预测位姿,包括旋转矩阵和平移向量。
其中,p3p算法可以输出四个结果,因此,一组相邻光斑匹配对对应四个预测位姿。
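作为示意,下面给出一段利用OpenCV的solveP3P接口由一组相邻光斑匹配对求取多个预测位姿的Python代码片段,相机内参与各点坐标均为假设的示例数据;solveP3P最多返回4组解,与上文一组相邻光斑匹配对对应四个预测位姿相对应:

```python
import cv2
import numpy as np

K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])        # 假设的相机内参
dist = np.zeros(5)

led_3d = np.array([[0.01, 0.02, 0.00],
                   [0.03, -0.01, 0.01],
                   [-0.02, 0.02, 0.02]], dtype=np.float64)   # 匹配对中3个发光器的3D坐标
spot_2d = np.array([[352.1, 230.4],
                    [410.7, 255.9],
                    [298.3, 221.0]], dtype=np.float64)       # 对应3个光斑的2D坐标

# P3P求解,最多返回4组(旋转向量,平移向量)解
num, rvecs, tvecs = cv2.solveP3P(led_3d, spot_2d, K, dist, flags=cv2.SOLVEPNP_P3P)
poses = [(cv2.Rodrigues(r)[0], t) for r, t in zip(rvecs, tvecs)]   # 多个预测位姿
```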
S9027:针对任意一个预测位姿,根据预测位姿将各发光器投影到指定图像中,获得各投影光斑,并根据各投影光斑,对指定图像中除平面图形包含的各光斑之外的其他光斑与手柄上的各发光器进行匹配,得到各其他光斑匹配对。
在同一时刻上,虚拟显示设备上的多目相机可以同步采集多张手柄图像,其中,指定 图像为当前时刻获取的各目标手柄图像中的至少一个图像,该指定图像可为一个,也可为多个,指定图像的数量以及具体使用哪一张图像可根据实际情况来进行设置。
虚拟显示设备在出厂时,多目相机的内参数已经标定好了,或者,在定位前,可以采用棋盘格标定法进行预先标定,再结合预测位姿,可以将3D空间中的各发光器投影到2D指定图像中,获得各投影光斑。由于已经确定了平面图形包含的各光斑匹配的发光器,因此,只需要确定指定图像中除平面图形包含的各光斑之外的其他光斑匹配的发光器即可。
如图19所示,为确定其他光斑匹配对的流程示意图,包括以下步骤:
S9027_1:针对指定图像中任意一个其他光斑,根据其他光斑的2D坐标和各投影光斑的2D坐标,得到其他光斑分别与各投影光斑之间的距离。
其中,其他光斑与投影光斑之间的距离可通过公式6中的距离公式来确定,本实施例在此不再进行赘述。
S9027_2:确定各距离中的最短距离是否小于指定距离,若是,则执行S9027_3,若否,则结束。
S9027_3:将其他光斑的图像光斑索引以及与最短距离对应的投影光斑对应的发光器的第一标识添加到同一光斑匹配对,并将光斑匹配对确定为其他光斑匹配对。
其中,每个其它光斑匹配对包含其他光斑的图像光斑索引和与其它光斑匹配的投影光斑对应的发光器的第一标识。
例如,指定图像中包括其他光斑C和其他光斑D,若其他光斑C与第一投影光斑之间的距离为m,与第二投影光斑之间的距离为n,第一投影光斑为发光器1的投影光斑,第二投影光斑为发光器2的投影光斑。若m>n,则确定n是最短距离,若n小于指定距离,则确定一个其他光斑匹配对为(C,2)。若其他光斑D与第一投影光斑之间的距离为p,与第二投影光斑之间的距离为q,若p<q,则确定p是最短距离,若p大于指定距离,则确定其他光斑D不存在对应的发光器。
需要说明的是:本实施例中的指定距离可根据实际情况来进行设置,本实施例在此并不对指定距离进行限定。
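作为示意,下面给出一段根据某一预测位姿将各发光器投影到指定图像、并按最短距离为其他光斑确定匹配发光器的Python代码片段,其中指定距离等参数均为举例假设:

```python
import cv2
import numpy as np

def match_other_spots(led_3d, led_ids, rvec, tvec, K, dist, other_spots, max_dist=8.0):
    """other_spots: [(图像光斑索引, 2D坐标), ...];返回其他光斑匹配对列表。"""
    proj, _ = cv2.projectPoints(led_3d, rvec, tvec, K, dist)   # 按预测位姿得到各投影光斑
    proj = proj.reshape(-1, 2)
    pairs = []
    for spot_idx, pt in other_spots:
        d = np.linalg.norm(proj - np.asarray(pt, dtype=np.float64), axis=1)
        j = int(np.argmin(d))
        if d[j] < max_dist:                    # 最短距离小于指定距离时才建立匹配
            pairs.append((spot_idx, led_ids[j]))
    return pairs
```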
S9028:根据各其他光斑匹配对的数量对各光斑匹配对进行筛选,并根据筛选后的各光斑匹配对的数量,得到各目标光斑匹配对,并将目标光斑匹配对中发光器的第一标识确定为图像光斑索引对应的目标光斑的第二标识。
其中,光斑匹配对包括相邻光斑匹配对和其他光斑匹配对,每个匹配对表征3D发光器与2D光斑间的对应关系。
在本公开的某一些实施例中,针对任意一个手柄的预测位姿,若该预测位姿对应的其他光斑匹配对的数量小于第二指定数量,则删除预测位姿以及与预测位姿相对应的其他光斑匹配对。
在本公开的某一些实施例中,针对任意一个相邻光斑匹配对,若该相邻光斑匹配对应的多个预测位姿均已被删除,则删除该相邻光斑匹配对。
例如,每个相邻光斑匹配对存在对应的4个预测位姿,若任意一个相邻光斑匹配对对应的4个预测位姿均已经被删除,则将该相邻光斑匹配对进行删除。
需要说明的是:本实施例中的第二指定数量可根据实际情况来进行设置,本实施例在此并不对第二指定数量的具体值进行限定。
对各光斑匹配对进行筛选后,统计剔除后的各光斑匹配对的数量,针对存在同一图像光斑索引的各光斑匹配对,将各光斑匹配对中数量最多的光斑匹配对确定为与图像光斑索引相对应的目标光斑匹配对,并将目标光斑匹配对中发光器的第一标识确定为图像光斑索引对应的目标光斑的第二标识。
例如:剔除后的各光斑匹配对分别为:(A,1)、(A,2)、(A,2)、(A,2)、(A,1)、(B,3)、(B,1)、(B,3)、(B,3)、(B,1),从剔除后的各光斑匹配对中可以得到光斑匹配对(A,1)的数量为2,光斑匹配对(A,2)的数量为3,光斑匹配对(B,1)的数量为2,光斑匹配对(B,3)的数量为3,则确定图像光斑索引为A的目标光斑匹配对为(A,2),此时,图像光斑索引为A的目标光斑的第二标识为2,确定图像光斑索引为B的目标光斑匹配对为(B,3),此时,图像光斑索引为B的光斑的第二标识为3。
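作为示意,下面给出一段统计剔除后剩余光斑匹配对数量、并为同一图像光斑索引选取数量最多的匹配对作为目标光斑匹配对的Python代码片段,函数命名为假设:

```python
from collections import Counter

def select_target_pairs(pairs):
    """pairs为剔除后剩余的全部光斑匹配对[(图像光斑索引, 发光器第一标识), ...]。"""
    counts = Counter(pairs)                    # 统计每种光斑匹配对出现的数量
    best = {}
    for (spot_idx, led_id), num in counts.items():
        if spot_idx not in best or num > best[spot_idx][1]:
            best[spot_idx] = (led_id, num)     # 为同一图像光斑索引保留数量最多的匹配
    # 返回 {图像光斑索引: 目标光斑的第二标识}
    return {idx: led_id for idx, (led_id, _) in best.items()}
```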
在本公开实施例的3D发光器与2D光斑的匹配过程中,通过将相邻的光斑连接为平面图形,然后以平面图形为单位,将各光斑与实际相邻的发光器集合进行快速匹配和位姿预测,获得各光斑匹配对,有效减少了匹配过程中的组合数量,且通过对各光斑匹配对进行筛选,提高了匹配精度,从而提高了定位效率和准确性。
S9029:根据各目标光斑匹配对中发光器的3D坐标与目标光斑的2D坐标,结合IMU采集的观测数据,初始化手柄与虚拟显示设备间相对位姿。
获得各目标光斑匹配对后,便得到了3D发光器与2D光斑的对应关系,从而可以利用各目标光斑匹配对中发光器的3D坐标和目标光斑的2D坐标,采用PNP算法,对齐手柄与虚拟显示设备间的坐标系,获得基于视觉计算的手柄与虚拟显示设备间6Dof位姿,并对对齐后手柄上IMU采集的观测数据进行预积分,以利用惯导定位结果优化手柄与虚拟显示设备间的相对6DOF位姿,完成手柄与虚拟显示设备间相对位姿的初始化过程。
S903:针对相机采集的非首帧目标手柄图像,根据历史目标手柄图像对应的手柄与虚拟显示设备间的相对位姿,预测手柄与虚拟显示设备间的当前相对位姿,结合IMU连续采集的观测数据,确定当前手柄与虚拟显示设备间的目标相对位姿。
实时估计手柄与虚拟显示设备间相对位姿的过程中,当已经初始化手柄与虚拟显示设备间的相对位姿时,针对相机采集的非首帧目标手柄图像,根据初始化结果,预测当前手柄与虚拟显示设备间的相对位姿。
在本公开的某一些实施例中,根据首帧目标手柄图像对应的手柄与虚拟显示设备间的相对位姿,预测第二帧目标手柄图像对应的手柄与虚拟显示设备间的相对位姿,再根据首帧目标手柄图像和第二帧目标手柄图像对应的手柄与虚拟显示设备间的相对位姿,预测第三帧目标手柄图像对应的手柄与虚拟显示设备间的相对位姿,依此类推。
本公开实施例中,位姿估计过程中,通过根据历史目标手柄图像对应的手柄与虚拟显示设备间的相对位姿进行预测,保证了连续多帧目标手柄图像间相对位姿的平滑性,这样,在实际交互过程中,使用手柄控制虚拟显示设备显示的画面时,保证了虚拟显示画面的流畅性,提升了用户的沉浸式体验。
为进一步提高相对位姿的准确性,可以利用IMU连续采集的观测数据对预测的当前相对位姿进行优化,从而实时获得当前手柄与虚拟显示设备间准确的目标相对位姿。
目标相对位姿的确定过程参见图20,主要包括以下几步:
S9031:根据手柄上各发光器在优化后3D空间结构中的3D坐标,以及预测得到的手柄与虚拟显示设备间的当前相对位姿,确定当前各发光器在当前目标手柄图像的局部范围。
位姿估计过程中,通过预测得到了手柄与虚拟显示设备间的当前相对位姿,根据当前相对位姿,可以确定3D空间中手柄上各发光器投影到当前目标手柄图像中的光斑的大概位置,从而减小发光器检测的图像范围,提高检测效率。
S9032:在当前目标手柄图像的局部范围内提取当前各发光器的当前光斑,并根据最近邻匹配,确定各当前光斑对应的发光器。
由于手柄与虚拟显示设备间的当前相对位姿是已知的,可以预测出3D空间结构优化后手柄上各发光器,投影到当前目标手柄图像中的当前光斑的大概位置。因此,位姿估计过程中,针对每个发光器,可采用最近邻匹配方法,将当前目标手柄图像内提取的各当前光斑中与投影光斑最近的一个光斑,作为该发光器匹配的当前光斑。
S9033:根据存在对应关系的当前光斑的2D坐标与3D发光器的3D坐标,以及观测数据和当前目标手柄图像同步时IMU与相机的位姿,建立重投影约束方程。
其中,重投影约束方程如下:
g_j=Σ_m||proj((R_Cj^W)^T(R_Ij^W·(R_I^H)^T·(P_m^H−t_I^H)+t_Ij^W−t_Cj^W))−p_m||²          公式8
在公式8中,R_Ij^W、t_Ij^W分别表示IMU采集的第j帧观测数据对应的IMU在世界坐标系下的旋转矩阵和平移向量,R_Cj^W、t_Cj^W分别表示IMU采集的第j帧观测数据对应的虚拟显示设备上的相机在世界坐标系下的旋转矩阵和平移向量,R_I^H、t_I^H分别表示IMU在手柄坐标系下的旋转矩阵和平移向量,P_m^H表示手柄上第一标识为m的发光器的3D坐标,p_m表示第二标识为m的当前光斑在当前目标手柄图像上的2D坐标,proj(·)表示相机的投影方程。其中,(R_Ij^W,t_Ij^W)为IMU与相机同步时IMU在世界坐标系下的位姿,(R_Cj^W,t_Cj^W)为IMU与相机同步时相机在世界坐标系下的位姿,(R_I^H,t_I^H)为IMU与相机同步时IMU与手柄间的相对位姿。
S9034:根据连续两帧观测数据对应的IMU的位姿和手柄的运动速度,建立预积分约束方程。
其中,预积分约束方程如下:
f_j=||(R_Ij^W)^T(t_I(j+1)^W−t_Ij^W−v_j^W·Δt−(1/2)·g^W·Δt²)−Δt_(j,j+1)||²+||(R_Ij^W)^T(v_(j+1)^W−v_j^W−g^W·Δt)−Δv_(j,j+1)||²+||LOG(ΔR_(j,j+1)^T·(R_Ij^W)^T·R_I(j+1)^W)||²          公式9
在公式9中,t_I(j+1)^W表示IMU采集的第j+1帧观测数据对应的IMU在世界坐标系下的平移向量,v_j^W、v_(j+1)^W分别表示第j帧和第j+1帧观测数据对应的IMU在世界坐标系下的运动速度,可通过分别对第j帧和第j+1帧观测数据中加速度进行积分得到,g^W表示重力加速度,Δt表示IMU采集的第j帧和第j+1帧观测数据之间的时间间隔,LOG(·)表示四元数组对应的李群(Special Orthogonal Group,SO3)上的对数函数,Δt_(j,j+1)、Δv_(j,j+1)、ΔR_(j,j+1)分别表示IMU的平移向量、运动速度和旋转矩阵的预积分变量。
S9035:联合预积分约束方程和重投影约束方程,求解出当前目标手柄图像对应的IMU的位姿、相机的位姿、以及IMU与手柄的相对位姿。
其中,预积分约束方程和重投影约束方程联合后的公式表示如下:
(R_Ij^W,t_Ij^W)=argmin Σ_j(f_j+g_j)          公式10
在公式10中,j表示IMU采集的观测数据的帧数,fj表示预积分约束方程,gj表示重投影约束方程。
通过求解公式10,可以得到当前目标手柄图像对应的IMU在世界坐标系下的位姿、相机(即虚拟显示设备)在世界坐标系下的位姿、以及IMU与手柄的相对位姿。
S9036:根据IMU与手柄的相对位姿,以及当前IMU的位姿和相机的位姿,得到当前手柄与虚拟显示设备间的目标相对位姿。
其中,视觉惯导联合优化后手柄在世界坐标系下的位姿的公式表示如下:
T_Hj^W=T_Ij^W·(T_I^H)^(-1)          公式11
在公式11中,T_Hj^W表示当前手柄在世界坐标系下的位姿,T_Ij^W表示当前目标手柄图像对应的IMU在世界坐标系下的位姿,T_I^H表示IMU和手柄的相对位姿。
由于手柄的位姿与相机的位姿均在同一世界坐标系下,可以得到当前手柄与虚拟显示设备间的目标相对位姿,从而通过操作手柄控制虚拟显示设备显示的画面。
需要说明的是,由于相机位于虚拟显示设备上,因此,相机的位姿可以表示虚拟显示设备的位姿。而虚拟显示设备上一般有多个相机,各相机是同步采集的,本公开实施例中,可使用一个相机采集的目标手柄图像进行位姿估计。
根据一些实施例的估计手柄位姿的方法中,利用手柄上的IMU和多个发光器,以及虚拟显示设备上的多目相机,实现视觉惯导联合优化手柄与虚拟显示设备间的相对位姿。在位姿估计前,通过对不同位置、角度采集的多帧初始手柄图像进行发光器的标注,从而根据各发光器的标注结果优化发光器的3D空间结构,提高后续相对位姿计算的准确性。位姿估计过程中,基于优化后的3D空间结构以及相机采集的首帧目标手柄图像,初始化手柄与虚拟显示设备间的相对位姿,初始化完成后,针对相机采集的非首帧目标手柄图像,根据历史目标手柄图像对应的手柄与虚拟显示设备间的相对位姿,预测当前手柄与虚拟显示设备间的相对位姿,再结合IMU的观测数据,实现视觉惯导对相对位姿的联合优化,从而得到平稳、准确的当前手柄与虚拟显示设备间的目标相对位姿。
基于相同的技术构思,本公开实施例提供一种虚拟显示设备,该虚拟显示设备可执行上述估计手柄位姿的方法,且能达到相同的技术效果。
参见图21,该虚拟显示设备包括处理器2101、存储器2102、显示屏2103、通信接口2104和多目相机2105,所述显示屏2103用于显示画面,所述虚拟显示设备通过所述通信接口2104与手柄通信,所述手柄用于控制所述显示屏2103显示的画面,所述多目相机2105的类型与所述手柄上多个发光器的发光类型相匹配;
所述通信接口2104、所述多目相机2105、所述显示屏2103、所述存储器2102和所述处理器2101通过总线2106连接,所述存储器2102存储有计算机程序,所述处理器2101根据所述计算机程序,执行以下操作:
针对所述多目相机2105采集的首帧目标手柄图像,根据所述目标手柄图像获得所述各发光器的目标光斑集合,并根据所述目标光斑集合、所述IMU同步采集的观测数据和所述手柄上各发光器优化后的3D空间结构,初始化所述手柄与所述虚拟显示设备间的相对位姿;其中,所述3D空间结构是根据不同位置角度采集的多帧初始手柄图像中各发光器的标注结果优化的;
针对所述多目相机2105采集的非首帧目标手柄图像,根据历史目标手柄图像对应的相对位姿,预测所述手柄与所述虚拟显示设备间的当前相对位姿,结合所述IMU连续采集的观测数据,确定当前所述手柄与所述虚拟显示设备间的目标相对位姿。
在本公开的某一些实施例中,所述处理器2101通过以下方式优化所述手柄上各发光器的3D空间结构:
根据优化前所述各发光器的3D空间结构,获得每个发光器的3D坐标和第一标识;
根据各发光器的第一标识,对不同位置角度采集的多帧初始手柄图像上预先标注的发光器,获得每个发光器在相应的初始手柄图像上形成的光斑的2D坐标和第二标识;
针对各帧所述初始手柄图像,根据所述第一标识和所述第二标识相同的发光器的3D坐标和光斑的2D坐标,以及相应帧对应的所述IMU的观测数据,确定所述手柄与采集相机间的相对位姿;
构建重投影误差方程,根据所述重投影误差方程同时优化各个相对位姿和各发光器的3D坐标,得到第一次优化后的3D空间结构。
在本公开的某一些实施例中,得到第一优化后的3D空间结构之后,所述处理器2101还执行:
根据优化后3D空间结构对应的所述手柄上各发光器组成的第一3D点云,以及优化前3D空间结构对应的所述手柄上各发光器组成的第二3D点云,确定优化前后所述第一3D点云和所述第二3D点云间的转换位姿;
根据所述转换位姿,重新确定所述手柄上各发光器的3D坐标,得到第二次优化后的3D空间结构。
在本公开的某一些实施例中,所述重投影误差方程为:
其中,Kn表示第n号相机的投影参数,分别表示所述手柄与第0号相机间的旋转矩阵和平移向量,分别表示所述第n号相机与第0号相机间的旋转矩阵和平移向量,表示第一标识为m的发光器在所述手柄上的3D坐标,pm,n表示第二标识为m的光斑的2D坐标。
在本公开的某一些实施例中,所述处理器2101根据所述目标手柄图像获得所述各发光器的目标光斑集合,具体操作为:
获取当前环境亮度,根据所述当前环境亮度,确定至少两个二值化方法各自的二值化阈值,并根据各二值化阈值对所述目标手柄图像进行二值化处理,获得二值化手柄图像;
在所述二值化手柄图像的全局范围内进行轮廓检测,得到所述各发光器的候选轮廓集,其中,每个轮廓表征一个光斑;
分别根据先验轮廓形状信息以及轮廓对比信息,剔除所述候选轮廓集中的异常轮廓,得到所述各发光器的目标光斑集合。
在本公开的某一些实施例中,所述处理器2101根据所述当前环境亮度,确定至少两个二值化方法各自的二值化阈值,并根据各二值化阈值对所述可见光手柄图像进行二值化处理,获得二值化手柄图像,具体操作为:
剔除对所述可见光手柄图像灰度化处理后的灰度手柄图像中灰度值低于预设灰度阈值的像素点,并根据像素点剔除后的灰度手柄图像的新直方图,分别确定所述至少两个二值化方法各自的二值化阈值;
将所述当前环境亮度与预设亮度阈值进行比较,根据比较结果,分别确定所述至少两个二值化阈值各自对应的权重;
根据各二值化阈值以及相应的权重,加权得到目标二值化阈值;
根据所述目标二值化阈值,对所述灰度手柄图像进行二值化处理,获得二值化手柄图像。
在本公开的某一些实施例中,所述处理器2101根据比较结果,分别确定所述至少两个二值化阈值各自对应的权重,具体操作为:
当所述当前环境亮度大于所述预设亮度阈值时,设置第一二值化方法计算的第一二值化阈值对应的第一权重,大于第二二值化方法计算的第二二值化阈值对应的第二权重;
当所述当前环境亮度小于等于所述预设亮度阈值时,设置第一二值化方法计算的第一二值化阈值对应的第一权重,小于第二二值化方法计算的第二二值化阈值对应的第二权重;
其中,所述第一二值化方法用于求解包含单峰的直方图分布,所述第二二值化方法用于求解包含双峰的直方图分布。
在本公开的某一些实施例中,所述处理器2101在所述二值化手柄图像的全局范围内进行光斑检测,获得各发光器的目标光斑集合,具体操作为:
对所述二值化手柄图像进行轮廓检测,得到所述各发光器的候选轮廓集,其中,每个轮廓表征一个光斑;
分别根据先验轮廓形状信息以及轮廓对比信息,剔除所述候选轮廓集中的异常轮廓,得到所述各发光器的目标光斑集合。
在本公开的某一些实施例中,所述处理器2101根据所述轮廓对比信息剔除所述候选轮廓集中异常轮廓的方式包含以下一种或多种:
针对所述候选轮廓集中的每两个候选轮廓,分别确定两个候选轮廓的外接矩形中心点之间的欧式距离,以及两个候选轮廓的边缘的最小曼哈顿距离,并根据所述欧式距离和所述最小曼哈顿距离,剔除异常轮廓;
根据候选轮廓的面积对所述候选轮廓集中的全部候选轮廓进行排顺序,并根据面积最大候选轮廓和面积次大候选轮廓内像素点间的数量关系,剔除异常轮廓;
针对所述候选轮廓集中的每个候选轮廓,计算所述候选轮廓与最近邻候选轮廓间的距离,并根据所述距离,剔除离群的异常轮廓;
计算所述候选轮廓集中每个候选轮廓的亮度均值,并根据各亮度均值,剔除异常轮廓。
在本公开的某一些实施例中,所述处理器2101根据所述欧式距离和所述最小曼哈顿距离,剔除异常轮廓,具体操作为:
当所述欧式距离和所述最小曼哈顿距离中的至少一个小于预设距离阈值时,则分别计算两个候选轮廓的面积;
若两个候选轮廓的面积均小于预设面积阈值,则同时剔除两个候选轮廓;
若两个候选轮廓的面积中至少一个不小于所述预设面积阈值,则分别计算两个候选轮廓的亮度均值,剔除小亮度均值对应的一个候选轮廓。
在本公开的某一些实施例中,所述处理器2101根据面积最大候选轮廓和面积次大候选轮廓内像素点间的数量关系,剔除异常轮廓,具体操作为:
若所述面积最大候选轮廓和面积次大候选轮廓内像素点数量均超过预设像素点数量阈值,则计算所述面积最大候选轮廓与所述面积次大候选轮廓内像素点数量间的倍数;
若所述倍数大于预设倍数阈值,则剔除所述面积最大候选轮廓。
在本公开的某一些实施例中,所述处理器2101根据所述距离,剔除离群的异常轮廓,具体操作为:
根据所述候选轮廓的边长以及全部候选轮廓的边长中位数,确定自适应离群距离;
若所述全部候选轮廓的数量大于预设数量阈值,且所述距离大于所述自适应离群距离,则剔除所述候选轮廓。
在本公开的某一些实施例中,所述处理器2101根据所述先验轮廓形状信息剔除所述候选轮廓集中异常轮廓的方式包含以下一种或多种:
根据所述候选轮廓的面积与所述候选轮廓的外接矩形的长宽比例关系,剔除所述长宽比例超出第一预设比例阈值的候选轮廓;
剔除所述候选轮廓与所述候选轮廓的外接矩形的面积占比小于预设占比阈值的候选轮廓;
计算所述候选轮廓的灰度质心点与所述候选轮廓的外接矩形的中心点,分别在横轴与纵轴上的距离,并分别计算每个距离占所述候选轮廓的边长的比例,若两个比例中的至少一个超过第二预设比例阈值,则剔除所述候选轮廓;
根据所述候选轮廓包含的像素点总数以及所述候选轮廓的边长,确定所述候选轮廓的圆度,若所述圆度低于预设圆度阈值,则剔除所述候选轮廓;
计算所述候选轮廓的亮度均值,若所述亮度均值小于预设亮度阈值,则剔除所述候选轮廓;
确定所述候选区域的外接矩形的预设外围区域的亮度均值,以及所述候选轮廓的亮度均值,若两个亮度均值之间的亮度差异小于预设差值,则剔除所述候选轮廓。
在本公开的某一些实施例中,所述处理器2101根据所述目标光斑集合、所述IMU同步采集的观测数据和所述手柄上各发光器优化后的3D空间结构,初始化所述手柄与所述虚拟显示设备间的相对位姿,具体操作为:
将所述优化后的3D空间结构上各发光器与所述目标光斑集合中的目标光斑进行匹配,建立3D发光器与2D光斑间的对应关系;
根据存在对应关系的发光器的3D坐标和光斑的2D坐标,以及所述IMU同步采集的观测数据,初始化所述手柄与所述虚拟显示设备间的相对位姿。
在本公开的某一些实施例中,所述处理器2101将所述优化后的3D空间结构上各发光器与所述目标光斑集合中的目标光斑进行匹配,建立3D发光器与2D光斑间的对应关系,具体操作为:
针对所述目标光斑集合中的任意一个目标光斑,从所述目标光斑集合中筛选出与所述目标光斑相邻的第一指定数量的候选光斑,并将所述目标光斑与所述第一指定数量的候选光斑进行连接,得到平面图形;
根据所述优化后的3D空间结构上实际相邻的发光器集合,将所述平面图形中的各光斑和所述实际相邻的发光器集合中的各发光器分别进行匹配,得到各相邻光斑匹配对,其中,每个相邻光斑匹配对包含所述光斑的图像光斑索引和与所述光斑相匹配的发光器的第一标识;
针对任意一组相邻光斑匹配对,根据所述相邻光斑匹配对中各光斑的2D坐标和所述各发光器的3D坐标,确定所述相邻光斑匹配对对应的所述手柄的多个预测位姿;
针对任意一个预测位姿,根据所述预测位姿将所述各发光器投影到指定图像中,获得各投影光斑,并根据所述各投影光斑,对所述指定图像中除所述平面图形包含的各光斑之外的其他光斑与所述手柄上的各发光器进行匹配,得到各其他光斑匹配对,其中,每个其它光斑匹配对包含所述其他光斑的图像光斑索引和与所述其它光斑匹配的投影光斑对应的发光器的第一标识;
根据所述各其他光斑匹配对的数量对各光斑匹配对进行筛选,并根据筛选后的各光斑匹配对的数量,得到各目标光斑匹配对,并将所述目标光斑匹配对中发光器的第一标识确定为所述图像光斑索引对应的目标光斑的第二标识,其中,所述光斑匹配对包括所述相邻光斑匹配对和所述其他光斑匹配对,每个匹配对表征3D发光器与2D光斑间的对应关系。
在本公开的某一些实施例中,所述处理器2101从所述目标光斑集合中筛选出与所述目标光斑相邻的第一指定数量的候选光斑,具体操作为:
根据所述目标光斑的2D坐标以及所述目标光斑集合中其他光斑的2D坐标,得到所述目标光斑与所述其他光斑之间的距离;
按照所述目标光斑与所述其他光斑之间的距离从小到大的顺序,选择前第一指定数量的距离对应的其他光斑作为所述候选光斑。
在本公开的某一些实施例中,所述处理器2101根据所述优化后的3D空间结构上实际相邻的发光器集合,将所述平面图形中的各光斑和所述实际相邻的发光器集合中的各发光器分别进行匹配,得到各相邻光斑匹配对,具体操作为:
将所述平面图形中的各光斑按照图像光斑索引从小到大的顺序进行排列,得到光斑列 表;
按照指定顺序对所述实际相邻的发光器集合中的各发光器进行遍历,针对当前遍历的发光器,以所述发光器作为初始位置,并将与所述发光器实际相邻的其他发光器按照指定顺序进行排序,得到排序列表;
针对所述排序列表中的任意一个发光器,将所述发光器的第一标识与所述光斑列表中位置与所述发光器在所述排序列表中的位置相同的光斑的图像光斑索引添加到同一相邻光斑匹配对中;
判断所述实际相邻的发光器集合中是否存在未进行遍历的发光器;
若是,则返回按照指定顺序对所述实际相邻的发光器集合中的各发光器进行遍历的步骤,直至所述实际相邻的发光器集合中不存在未遍历的发光器。
在本公开的某一些实施例中,根据所述预测位姿将所述各发光器投影到指定图像中之前,所述处理器2101还执行:
针对任意一组所述相邻光斑匹配对对应的所述手柄的多个预测位姿,分别得到与所述相邻光斑匹配对相对应的手柄的预测重力方向向量;
根据拍摄所述指定图像时所述虚拟显示设备的当前位置,得到所述手柄的实际重力方向向量;
通过与各相邻光斑匹配对相对应的预测重力方向向量和所述实际方向向量,确定需要删除的相邻光斑匹配对,并将所述需要删除的相邻光斑匹配对进行删除。
在本公开的某一些实施例中,所述处理器2101通过与各相邻光斑匹配对相对应的预测重力方向向量和所述实际方向向量,确定需要删除的相邻光斑匹配对,具体操作为:
针对任意一组相邻光斑匹配对,根据与所述相邻光斑匹配对对应的预测重力方向向量与所述实际方向向量,得到重力方向向量夹角;
若所述重力方向向量夹角大于指定夹角,则确定所述相邻光斑匹配对为所述需要删除的相邻光斑匹配对。
在本公开的某一些实施例中,所述处理器2101根据所述各投影光斑,对所述指定图像中除所述平面图形包含的各光斑之外的其他光斑与所述手柄上的各发光器进行匹配,得到各其他光斑匹配对,具体操作为:
针对所述指定图像中任意一个其他光斑,根据所述其他光斑的2D坐标和所述各投影光斑的2D坐标,得到所述其他光斑分别与所述各投影光斑之间的距离;
若所述各距离中的最短距离小于指定距离,则将所述其他光斑的图像光斑索引以及与所述最短距离对应的投影光斑对应的发光器的第一标识添加到同一光斑匹配对,并将所述光斑匹配对确定为所述其他光斑匹配对。
在本公开的某一些实施例中,所述处理器2101根据所述各其他光斑匹配对的数量对 各光斑匹配对进行筛选,并根据筛选后的各光斑匹配对的数量,得到各目标光斑匹配对,具体操作为:
针对任意一个预测位姿,若所述预测位姿对应的其他光斑匹配对的数量小于第二指定数量,则删除所述预测位姿以及与所述预测位姿相对应的其他光斑匹配对;
针对任意一个相邻光斑匹配对,若与所述相邻光斑匹配对相应的多个预测位姿均已被删除,则删除所述相邻光斑匹配对;
统计剔除后剩余的各光斑匹配对的数量;
针对存在同一图像光斑索引的各光斑匹配对,将所述各光斑匹配对中数量最多的光斑匹配对确定为与所述图像光斑索引相对应的目标光斑匹配对。
在本公开的某一些实施例中,所述处理器2101根据预测的所述手柄与所述虚拟显示设备间的当前相对位姿,以及所述IMU连续采集的观测数据,确定当前所述手柄与所述虚拟显示设备间的目标相对位姿,具体操作为;
根据所述手柄上各发光器在优化后3D空间结构中的3D坐标,以及预测得到的所述手柄与所述虚拟显示设备间的当前相对位姿,确定当前各发光器在当前目标手柄图像的局部范围;
在所述当前目标手柄图像的局部范围内提取所述当前各发光器的当前光斑,并根据最近邻匹配,确定各当前光斑对应的发光器;
根据存在对应关系的当前光斑的2D坐标与3D发光器的3D坐标,以及所述观测数据和所述当前目标手柄图像同步时所述IMU与所述相机的位姿,建立重投影约束方程;
根据连续两帧观测数据对应的所述IMU的位姿和所述手柄的运动速度,建立预积分约束方程;
联合所述预积分约束方程和所述重投影约束方程,求解出所述当前目标手柄图像对应的所述IMU的位姿、所述相机的位姿、以及所述IMU与所述手柄的相对位姿;
根据所述IMU与所述手柄的相对位姿、所述IMU的位姿和所述相机的位姿,得到当前所述手柄与所述虚拟显示设备间的目标相对位姿。
在本公开的某一些实施例中,所述预积分约束方程为:
所述重投影约束方程为:
其中,分别表示所述IMU采集的第j帧观测数据对应的所述IMU在世界坐标系下的旋转矩阵和平移向量,表示所述IMU采集的第j+1帧观测数据对应的所述IMU在所述世界坐标系下的平移向量,分别表示第j帧和第j+1帧观测数据对应的所述IMU在所述世界坐标系下的运动速度,gW表示重力加速度,Δt表示所述IMU采集的第j帧和第j+1帧观测数据之间的时间间隔,LOG(·)表示四元数组对应的李群SO3上的对数函数,分别表示所述IMU的所述平移向量、所述运动速度和所述旋转矩阵的预积分变量,分别表示所述IMU采集的第j帧观测数据对应的所述虚拟显示设备上的相机在世界坐标系下的旋转矩阵和平移向量,分别表示所述IMU在手柄坐标系下的旋转矩阵和平移向量,表示所述手柄上第一标识为m的发光器的3D坐标,pm表示所述手柄上第二标识为m的当前光斑的2D坐标,pro j(·)表示相机的投影方程。
在本公开的某一些实施例中,联合所述预积分约束方程和所述重投影约束方程的结果为:
其中,分别表示所述IMU采集的第j帧观测数据对应的所述IMU在世界坐标系下的旋转矩阵和平移向量,j表示所述IMU采集的观测数据的帧数,fj表示所述预积分约束方程,gj表示所述重投影约束方程。
需要说明的是,图21仅是一种示例,给出虚拟显示设备实现本公开提供的估计手柄位姿的方法步骤所必要的硬件。未示出的,该虚拟显示设备还包括扬声器、听筒、镜片、电源接口等常规硬件。
本公开实施例图21中涉及的处理器可以是中央处理器(Central Processing Unit,CPU),通用处理器,图形处理器(Graphics Processing Unit,GPU),数字信号处理器(Digital Signal Processor,DSP),专用集成电路(Application-specific Integrated Circuit,ASIC),现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。
本公开实施例还提供一种计算机可读存储介质,用于存储一些指令,这些指令被执行时,可以完成前述实施例中估计手柄位姿的方法。
本公开实施例还提供一种计算机程序产品,用于存储计算机程序,该计算机程序用于执行前述实施例中估计手柄位姿的方法。
本领域内的技术人员应明白,本公开的实施例可提供为方法、装置、或计算机程序产品。因此,本公开可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本公开可采用在一个或多个其中包含有计算机可用程序代码的计算机 可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本公开是参照根据本公开的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
显然,本领域的技术人员可以对本公开进行各种改动和变型而不脱离本公开的精神和范围。这样,倘若本公开的这些修改和变型属于本公开权利要求及其等同技术的范围之内,则本公开也意图包含这些改动和变型在内。

Claims (24)

  1. 一种估计手柄位姿的方法,应用于虚拟显示设备,所述虚拟显示设备与手柄进行交互,所述手柄用于控制虚拟显示设备显示的画面,所述手柄上安装有IMU和多个发光器,所述虚拟显示设备安装有与所述发光器类型相匹配的多目相机,所述方法包括:
    针对所述多目相机各自采集的首帧目标手柄图像,根据所述目标手柄图像获得各发光器的目标光斑集合,并根据所述目标光斑集合、所述IMU同步采集的观测数据和所述手柄上各发光器优化后的3D空间结构,初始化所述手柄与所述虚拟显示设备间的相对位姿;其中,所述3D空间结构是根据不同位置角度采集的多帧初始手柄图像中各发光器的标注结果优化的;
    针对所述多目相机各自采集的非首帧目标手柄图像,根据历史目标手柄图像对应的相对位姿,预测所述手柄与所述虚拟显示设备间的当前相对位姿,结合所述IMU连续采集的观测数据,确定当前所述手柄与所述虚拟显示设备间的目标相对位姿。
  2. 如权利要求1所述的方法,通过以下方式优化所述手柄上各发光器的3D空间结构:
    根据优化前所述各发光器的3D空间结构,获得每个发光器的3D坐标和第一标识;
    根据各发光器的第一标识,对不同位置角度采集的多帧初始手柄图像上预先标注的发光器,获得每个发光器在相应的初始手柄图像上形成的光斑的2D坐标和第二标识;
    针对各帧所述初始手柄图像,根据所述第一标识和所述第二标识相同的发光器的3D坐标和光斑的2D坐标,以及相应帧对应的所述IMU的观测数据,确定所述手柄与采集相机间的相对位姿;
    构建重投影误差方程,根据所述重投影误差方程同时优化各个相对位姿和各发光器的3D坐标,得到第一次优化后的3D空间结构。
  3. 如权利要求2所述的方法,优化所述手柄上各发光器的3D空间结构的方式还包括:
    得到第一次优化后的3D空间结构之后,根据优化后3D空间结构对应的所述手柄上各发光器组成的第一3D点云,以及优化前3D空间结构对应的所述手柄上各发光器组成的第二3D点云,确定优化前后所述第一3D点云和所述第二3D点云间的转换位姿;
    根据所述转换位姿,重新确定所述手柄上各发光器的3D坐标,得到第二次优化后的3D空间结构。
  4. 如权利要求2或3所述的方法,所述重投影误差方程为:
    其中,Kn表示第n号相机的投影参数,分别表示所述手柄与第0号相机间的旋转矩阵和平移向量,分别表示所述第n号相机与第0号相机间的旋转矩阵和平 移向量,表示第一标识为m的发光器在所述手柄上的3D坐标,pm,n表示第二标识为m的光斑的2D坐标。
  5. 如权利要求1所述的方法,所述根据所述目标手柄图像获得所述各发光器的目标光斑集合,包括:
    获取当前环境亮度,根据所述当前环境亮度,确定至少两个二值化方法各自的二值化阈值,并根据各二值化阈值对所述目标手柄图像进行二值化处理,获得二值化手柄图像;
    在所述二值化手柄图像的全局范围内进行轮廓检测,得到所述各发光器的候选轮廓集,其中,每个轮廓表征一个光斑;
    分别根据先验轮廓形状信息以及轮廓对比信息,剔除所述候选轮廓集中的异常轮廓,得到所述各发光器的目标光斑集合。
  6. 如权利要求5所述的方法,所述根据所述当前环境亮度,确定至少两个二值化方法各自的二值化阈值,并根据各二值化阈值对所述目标手柄图像进行二值化处理,获得二值化手柄图像,包括:
    剔除对所述目标手柄图像灰度化处理后的灰度手柄图像中灰度值低于预设灰度阈值的像素点,并根据像素点剔除后的灰度手柄图像的新直方图,分别确定所述至少两个二值化方法各自的二值化阈值;
    将所述当前环境亮度与预设亮度阈值进行比较,根据比较结果,分别确定所述至少两个二值化阈值各自对应的权重;
    根据各二值化阈值以及相应的权重,加权得到目标二值化阈值;
    根据所述目标二值化阈值,对所述灰度手柄图像进行二值化处理,获得二值化手柄图像。
  7. 如权利要求6所述的方法,所述根据比较结果,分别确定所述至少两个二值化阈值各自对应的权重,包括:
    当所述当前环境亮度大于所述预设亮度阈值时,设置第一二值化方法计算的第一二值化阈值对应的第一权重,大于第二二值化方法计算的第二二值化阈值对应的第二权重;
    当所述当前环境亮度小于等于所述预设亮度阈值时,设置第一二值化方法计算的第一二值化阈值对应的第一权重,小于第二二值化方法计算的第二二值化阈值对应的第二权重;
    其中,所述第一二值化方法用于求解包含单峰的直方图分布,所述第二二值化方法用于求解包含双峰的直方图分布。
  8. 如权利要求5所述的方法,根据所述轮廓对比信息剔除所述候选轮廓集中异常轮廓的方式包含以下一种或多种:
    针对所述候选轮廓集中的每两个候选轮廓,分别确定两个候选轮廓的外接矩形中心点之间的欧式距离,以及两个候选轮廓的边缘的最小曼哈顿距离,并根据所述欧式距离和所 述最小曼哈顿距离,剔除异常轮廓;
    根据候选轮廓的面积对所述候选轮廓集中的全部候选轮廓进行排顺序,并根据面积最大候选轮廓和面积次大候选轮廓内像素点间的数量关系,剔除异常轮廓;
    针对所述候选轮廓集中的每个候选轮廓,计算所述候选轮廓与最近邻候选轮廓间的距离,并根据所述距离,剔除离群的异常轮廓;
    计算所述候选轮廓集中每个候选轮廓的亮度均值,并根据各亮度均值,剔除异常轮廓。
  9. 如权利要求8所述的方法,所述根据所述欧式距离和所述最小曼哈顿距离,剔除异常轮廓,包括:
    当所述欧式距离和所述最小曼哈顿距离中的至少一个小于预设距离阈值时,则分别计算两个候选轮廓的面积;
    若两个候选轮廓的面积均小于预设面积阈值,则同时剔除两个候选轮廓;
    若两个候选轮廓的面积中至少一个不小于所述预设面积阈值,则分别计算两个候选轮廓的亮度均值,剔除小亮度均值对应的一个候选轮廓。
  10. 如权利要求8所述的方法,所述根据面积最大候选轮廓和面积次大候选轮廓内像素点间的数量关系,剔除异常轮廓,包括:
    若所述面积最大候选轮廓和面积次大候选轮廓内像素点数量均超过预设像素点数量阈值,则计算所述面积最大候选轮廓与所述面积次大候选轮廓内像素点数量间的倍数;
    若所述倍数大于预设倍数阈值,则剔除所述面积最大候选轮廓。
  11. 如权利要求8所述的方法,所述根据所述距离,剔除离群的异常轮廓,包括:
    根据所述候选轮廓的边长以及全部候选轮廓的边长中位数,确定自适应离群距离;
    若所述全部候选轮廓的数量大于预设数量阈值,且所述距离大于所述自适应离群距离,则剔除所述候选轮廓。
  12. 如权利要求5所述的方法,根据所述先验轮廓形状信息剔除所述候选轮廓集中异常轮廓的方式包含以下一种或多种:
    根据所述候选轮廓的面积与所述候选轮廓的外接矩形的长宽比例关系,剔除所述长宽比例超出第一预设比例阈值的候选轮廓;
    剔除所述候选轮廓与所述候选轮廓的外接矩形的面积占比小于预设占比阈值的候选轮廓;
    计算所述候选轮廓的灰度质心点与所述候选轮廓的外接矩形的中心点,分别在横轴与纵轴上的距离,并分别计算每个距离占所述候选轮廓的边长的比例,若两个比例中的至少一个超过第二预设比例阈值,则剔除所述候选轮廓;
    根据所述候选轮廓包含的像素点总数以及所述候选轮廓的边长,确定所述候选轮廓的圆度,若所述圆度低于预设圆度阈值,则剔除所述候选轮廓;
    计算所述候选轮廓的亮度均值,若所述亮度均值小于预设亮度阈值,则剔除所述候选轮廓;
    确定所述候选区域的外接矩形的预设外围区域的亮度均值,以及所述候选轮廓的亮度均值,若两个亮度均值之间的亮度差异小于预设差值,则剔除所述候选轮廓。
  13. 如权利要求1所述的方法,所述根据所述目标光斑集合、所述IMU同步采集的观测数据和所述手柄上各发光器优化后的3D空间结构,初始化所述手柄与所述虚拟显示设备间的相对位姿,包括:
    将所述优化后的3D空间结构上各发光器与所述目标光斑集合中的目标光斑进行匹配,建立3D发光器与2D光斑间的对应关系;
    根据存在对应关系的发光器的3D坐标和光斑的2D坐标,以及所述IMU同步采集的观测数据,初始化所述手柄与所述虚拟显示设备间的相对位姿。
  14. 如权利要求13所述的方法,所述将所述优化后的3D空间结构上各发光器与所述目标光斑集合中的目标光斑进行匹配,建立3D发光器与2D光斑间的对应关系,包括:
    针对所述目标光斑集合中的任意一个目标光斑,从所述目标光斑集合中筛选出与所述目标光斑相邻的第一指定数量的候选光斑,并将所述目标光斑与所述第一指定数量的候选光斑进行连接,得到平面图形;
    根据所述优化后的3D空间结构上实际相邻的发光器集合,将所述平面图形中的各光斑和所述实际相邻的发光器集合中的各发光器分别进行匹配,得到各相邻光斑匹配对,其中,每个相邻光斑匹配对包含所述光斑的图像光斑索引和与所述光斑相匹配的发光器的第一标识;
    针对任意一组相邻光斑匹配对,根据所述相邻光斑匹配对中各光斑的2D坐标和所述各发光器的3D坐标,确定所述相邻光斑匹配对对应的所述手柄的多个预测位姿;
    针对任意一个预测位姿,根据所述预测位姿将所述各发光器投影到指定图像中,获得各投影光斑,并根据所述各投影光斑,对所述指定图像中除所述平面图形包含的各光斑之外的其他光斑与所述手柄上的各发光器进行匹配,得到各其他光斑匹配对,其中,每个其它光斑匹配对包含所述其他光斑的图像光斑索引和与所述其它光斑匹配的投影光斑对应的发光器的第一标识;
    根据所述各其他光斑匹配对的数量对各光斑匹配对进行筛选,并根据筛选后的各光斑匹配对的数量,得到各目标光斑匹配对,并将所述目标光斑匹配对中发光器的第一标识确定为所述图像光斑索引对应的目标光斑的第二标识,其中,所述光斑匹配对包括所述相邻光斑匹配对和所述其他光斑匹配对,每个匹配对表征3D发光器与2D光斑间的对应关系。
  15. 如权利要求14所述的方法,所述从所述目标光斑集合中筛选出与所述目标光斑相邻的第一指定数量的候选光斑,包括:
    根据所述目标光斑的2D坐标以及所述目标光斑集合中其他光斑的2D坐标,得到所述目标光斑与所述其他光斑之间的距离;
    按照所述目标光斑与所述其他光斑之间的距离从小到大的顺序,选择前第一指定数量的距离对应的其他光斑作为所述候选光斑。
  16. 如权利要求14所述的方法,所述根据所述优化后的3D空间结构上实际相邻的发光器集合,将所述平面图形中的各光斑和所述实际相邻的发光器集合中的各发光器分别进行匹配,得到各相邻光斑匹配对,包括:
    将所述平面图形中的各光斑按照图像光斑索引从小到大的顺序进行排列,得到光斑列表;
    按照指定顺序对所述实际相邻的发光器集合中的各发光器进行遍历,针对当前遍历的发光器,以所述发光器作为初始位置,并将与所述发光器实际相邻的其他发光器按照指定顺序进行排序,得到排序列表;
    针对所述排序列表中的任意一个发光器,将所述发光器的第一标识与所述光斑列表中位置与所述发光器在所述排序列表中的位置相同的光斑的图像光斑索引添加到同一相邻光斑匹配对中;
    判断所述实际相邻的发光器集合中是否存在未进行遍历的发光器;
    若是,则返回按照指定顺序对所述实际相邻的发光器集合中的各发光器进行遍历的步骤,直至所述实际相邻的发光器集合中不存在未遍历的发光器。
  17. 如权利要求14所述的方法,根据所述预测位姿将所述各发光器投影到指定图像中之前,所述方法还包括:
    针对任意一组所述相邻光斑匹配对对应的所述手柄的多个预测位姿,分别得到与所述相邻光斑匹配对相对应的手柄的预测重力方向向量;
    根据拍摄所述指定图像时所述虚拟显示设备的当前位置,得到所述手柄的实际重力方向向量;
    通过与各相邻光斑匹配对相对应的预测重力方向向量和所述实际重力方向向量,确定需要删除的相邻光斑匹配对,并将所述需要删除的相邻光斑匹配对进行删除。
  18. 根据权利要求17所述的方法,所述通过与各相邻光斑匹配对相对应的预测重力方向向量和所述实际重力方向向量,确定需要删除的相邻光斑匹配对,包括:
    针对任意一组相邻光斑匹配对,根据与所述相邻光斑匹配对对应的预测重力方向向量与所述实际重力方向向量,得到重力方向向量夹角;
    若所述重力方向向量夹角大于指定夹角,则确定所述相邻光斑匹配对为所述需要删除的相邻光斑匹配对。
  19. 如权利要求14所述的方法,所述根据所述各投影光斑,对所述指定图像中除所 述平面图形包含的各光斑之外的其他光斑与所述手柄上的各发光器进行匹配,得到各其他光斑匹配对,包括:
    针对所述指定图像中任意一个其他光斑,根据所述其他光斑的2D坐标和所述各投影光斑的2D坐标,得到所述其他光斑分别与所述各投影光斑之间的距离;
    若所述各距离中的最短距离小于指定距离,则将所述其他光斑的图像光斑索引以及与所述最短距离对应的投影光斑对应的发光器的第一标识添加到同一光斑匹配对,并将所述光斑匹配对确定为所述其他光斑匹配对。
  20. 如权利要求14所述的方法,所述根据所述各其他光斑匹配对的数量对各光斑匹配对进行筛选,并根据筛选后的各光斑匹配对的数量,得到各目标光斑匹配对,包括:
    针对任意一个预测位姿,若所述预测位姿对应的其他光斑匹配对的数量小于第二指定数量,则删除所述预测位姿以及与所述预测位姿相对应的其他光斑匹配对;
    针对任意一个相邻光斑匹配对,若与所述相邻光斑匹配对相应的多个预测位姿均已被删除,则删除所述相邻光斑匹配对;
    统计剔除后剩余的各光斑匹配对的数量;
    针对存在同一图像光斑索引的各光斑匹配对,将所述各光斑匹配对中数量最多的光斑匹配对确定为与所述图像光斑索引相对应的目标光斑匹配对。
  21. 如权利要求1所述的方法,根据预测的所述手柄与所述虚拟显示设备间的当前相对位姿,以及所述IMU连续采集的观测数据,确定当前所述手柄与所述虚拟显示设备间的目标相对位姿,包括;
    根据所述手柄上各发光器在优化后3D空间结构中的3D坐标,以及预测得到的所述手柄与所述虚拟显示设备间的当前相对位姿,确定当前各发光器在当前目标手柄图像的局部范围;
    在所述当前目标手柄图像的局部范围内提取所述当前各发光器的当前光斑,并根据最近邻匹配,确定各当前光斑对应的发光器;
    根据存在对应关系的当前光斑的2D坐标与3D发光器的3D坐标,以及所述观测数据和所述当前目标手柄图像同步时所述IMU与所述相机的位姿,建立重投影约束方程;
    根据连续两帧观测数据对应的所述IMU的位姿和所述手柄的运动速度,建立预积分约束方程;
    联合所述预积分约束方程和所述重投影约束方程,求解出所述当前目标手柄图像对应的所述IMU的位姿、所述相机的位姿、以及所述IMU与所述手柄的相对位姿;
    根据所述IMU与所述手柄的相对位姿、所述IMU的位姿和所述相机的位姿,得到当前所述手柄与所述虚拟显示设备间的目标相对位姿。
  22. 如权利要求21所述的方法,所述预积分约束方程为:
    所述重投影约束方程为:
    其中,分别表示所述IMU采集的第j帧观测数据对应的所述IMU在世界坐标系下的旋转矩阵和平移向量,表示所述IMU采集的第j+1帧观测数据对应的所述IMU在所述世界坐标系下的平移向量,分别表示第j帧和第j+1帧观测数据对应的所述IMU在所述世界坐标系下的运动速度,gW表示重力加速度,Δt表示所述IMU采集的第j帧和第j+1帧观测数据之间的时间间隔,LOG(·)表示四元数组对应的李群SO3上的对数函数,分别表示所述IMU的所述平移向量、所述运动速度和所述旋转矩阵的预积分变量,分别表示所述IMU采集的第j帧观测数据对应的所述虚拟显示设备上的相机在世界坐标系下的旋转矩阵和平移向量,分别表示所述IMU在手柄坐标系下的旋转矩阵和平移向量,表示所述手柄上第一标识为m的发光器的3D坐标,pm表示所述手柄上第二标识为m当前光斑的2D坐标,pro j(·)表示相机的投影方程。
  23. 如权利要求22所述的方法,联合所述预积分约束方程和所述重投影约束方程的结果为:
    其中,分别表示所述IMU采集的第j帧观测数据对应的所述IMU在世界坐标系下的旋转矩阵和平移向量,j表示所述IMU采集的观测数据的帧数,fj表示所述预积分约束方程,gj表示所述重投影约束方程。
  24. 一种虚拟显示设备,包括处理器、存储器、显示屏、通信接口和多目相机,所述显示屏用于显示画面,虚拟显示设备通过所述通信接口与手柄通信,所述手柄用于控制所述显示屏显示的画面,所述多目相机的类型与所述手柄上多个发光器的发光类型相匹配;
    所述通信接口、所述多目相机、所述显示屏、所述存储器和所述处理器通过总线连接,所述存储器存储有计算机程序,所述处理器根据所述计算机程序,执行以下操作:
    针对所述多目相机各自采集的首帧目标手柄图像,根据所述目标手柄图像获得所述各发光器的目标光斑集合,并根据所述目标光斑集合、所述IMU同步采集的观测数据和所述手柄上各发光器优化后的3D空间结构,初始化所述手柄与所述虚拟显示设备间的相对位 姿;其中,所述3D空间结构是根据不同位置角度采集的多帧初始手柄图像中各发光器的标注结果优化的;
    针对所述多目相机各自采集的非首帧目标手柄图像,根据历史目标手柄图像对应的所述手柄与所述虚拟显示设备间的相对位姿,预测当前所述手柄与所述虚拟显示设备间的相对位姿,结合所述IMU连续采集的观测数据,确定当前所述手柄与所述虚拟显示设备间的目标相对位姿。
PCT/CN2023/119844 2022-09-21 2023-09-19 一种估计手柄位姿的方法及虚拟显示设备 WO2024061238A1 (zh)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
CN202211149262.5 2022-09-21
CN202211149262.5A CN116433569A (zh) 2022-09-21 2022-09-21 一种检测手柄上发光器的方法及虚拟显示设备
CN202211183832.2A CN116430986A (zh) 2022-09-27 2022-09-27 一种估计手柄位姿的方法及虚拟显示设备
CN202211183832.2 2022-09-27
CN202211390797.1 2022-11-07
CN202211390797.1A CN116433752A (zh) 2022-11-07 2022-11-07 检测手柄图像中光斑标识的方法及电子设备

Publications (1)

Publication Number Publication Date
WO2024061238A1 true WO2024061238A1 (zh) 2024-03-28

Family

ID=90453850

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/119844 WO2024061238A1 (zh) 2022-09-21 2023-09-19 一种估计手柄位姿的方法及虚拟显示设备

Country Status (1)

Country Link
WO (1) WO2024061238A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528082A (zh) * 2016-01-08 2016-04-27 北京暴风魔镜科技有限公司 三维空间及手势识别追踪交互方法、装置和系统
CN108734736A (zh) * 2018-05-22 2018-11-02 腾讯科技(深圳)有限公司 相机姿态追踪方法、装置、设备及存储介质
CN111882607A (zh) * 2020-07-14 2020-11-03 中国人民解放军军事科学院国防科技创新研究院 一种适用于增强现实应用的视觉惯导融合位姿估计方法
WO2022148224A1 (zh) * 2021-01-07 2022-07-14 华为技术有限公司 手柄校正方法、电子设备、芯片及可读存储介质
CN116430986A (zh) * 2022-09-27 2023-07-14 海信电子科技(深圳)有限公司 一种估计手柄位姿的方法及虚拟显示设备

