CN111476907A - Positioning and three-dimensional scene reconstruction device and method based on virtual reality technology - Google Patents

Positioning and three-dimensional scene reconstruction device and method based on virtual reality technology

Info

Publication number
CN111476907A
CN111476907A (application CN202010292190.4A)
Authority
CN
China
Prior art keywords
data
pose
image
6dof
point cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010292190.4A
Other languages
Chinese (zh)
Inventor
舒玉龙
郑光璞
宋田
吴涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Xiaoniao Kankan Technology Co Ltd
Original Assignee
Qingdao Xiaoniao Kankan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Xiaoniao Kankan Technology Co Ltd filed Critical Qingdao Xiaoniao Kankan Technology Co Ltd
Priority to CN202010292190.4A priority Critical patent/CN111476907A/en
Publication of CN111476907A publication Critical patent/CN111476907A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/003Navigation within 3D models or images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The specification discloses a positioning and three-dimensional scene reconstruction device and method based on VR technology, an electronic device and a computer readable storage medium. The device is arranged on a helmet and comprises: a first image sensor for acquiring first image data of a frame image corresponding to a real scene, the first image data comprising depth data and gray data; a field programmable gate array in communication with the first image sensor, for processing the depth data and the gray data and transmitting them to a microcircuit module; and the microcircuit module, for generating a point cloud from the depth data, generating 6DOF data from the depth data and the gray data, and performing positioning and three-dimensional scene reconstruction of the real scene based on virtual reality technology according to the point cloud and the 6DOF data.

Description

Positioning and three-dimensional scene reconstruction device and method based on virtual reality technology
Technical Field
The invention relates to the technical field of computer vision, in particular to a positioning and three-dimensional scene reconstruction device and method based on a virtual reality technology, an electronic device and a computer readable storage medium.
Background
When a user wears a VR (Virtual Reality) helmet and walks in the real scene, obtaining a displacement consistent with that experience in the VR virtual scene requires navigation and positioning of the helmet; a popular positioning approach is SLAM (Simultaneous Localization and Mapping). If the user further wants to interact with the external environment, for example to sense obstacles for safety prompts, or to render a three-dimensional model of the person in front into the virtual space in real time, a visual sensor is needed for three-dimensional information perception, and the three-dimensional model is built by an algorithm.
An existing SLAM positioning scheme captures and tracks feature points with an RGB sensor, calculates the three-dimensional coordinates of the feature points from parallax, obtains the 6DOF (six degrees of freedom) pose of the current VR helmet through a BA (Bundle Adjustment) algorithm, and suppresses errors through local and global optimization; meanwhile, a point cloud is generated from the MapPoints (map points) of SLAM, and three-dimensional modeling of the external environment is performed using the 6DOF pose and the point cloud generated by SLAM.
However, in the existing SLAM positioning scheme, generating the point cloud from the SLAM MapPoints is performed on the central processing unit (CPU) of a computing device, for example a Qualcomm 8-series processor. Because the computational complexity of generating the point cloud from the SLAM MapPoints is high, the CPU load increases, the processing cost and power consumption of the CPU are high, and the point cloud built from MapPoints is sparse, so the modeling and development precision is poor and the user experience suffers.
Disclosure of Invention
The invention aims to provide a positioning and three-dimensional scene reconstruction device and method based on virtual reality technology, an electronic device and a computer readable storage medium, so as to reduce the CPU cost and power consumption of the processor running the SLAM algorithm and improve the user experience.
According to a first aspect of the present invention, there is provided a positioning and three-dimensional scene reconstruction apparatus based on virtual reality technology, disposed on a helmet, the apparatus including:
a first image sensor, configured to acquire first image data of a frame image corresponding to a real scene, wherein the first image data comprises depth data and gray data;
a field programmable gate array, in communication with the first image sensor, configured to process the depth data and the gray data and transmit the depth data and the gray data to a microcircuit module; and
the microcircuit module, configured to generate a point cloud according to the depth data, generate 6DOF data according to the depth data and the gray data, and perform positioning and three-dimensional scene reconstruction corresponding to the real scene and based on virtual reality technology according to the point cloud and the 6DOF data.
Optionally, the apparatus further comprises:
a pose measurement sensor, configured to acquire pose data of a user wearing the helmet while moving in the real scene, wherein the pose data comprises an angular rate and an acceleration;
wherein the field programmable gate array is further in communication with the pose measurement sensor and comprises a soft-core central processing unit configured to resolve the angular rate and the acceleration into quaternion pose data and transmit the quaternion pose data to the microcircuit module.
Optionally, the microcircuit module is further configured to: and before the positioning and the three-dimensional scene reconstruction are executed according to the point cloud and the 6DOF data, carrying out fusion processing on the 6DOF data by utilizing the pose data of the quaternion.
Optionally, the field programmable gate array communicates with the first image sensor through a mobile industry processor interface, and transmits the first image data to the microcircuit module through the mobile industry processor interface.
Optionally, the field programmable gate array communicates with the pose measurement sensor through a serial peripheral interface, and transmits the pose data of the quaternion to the microcircuit module through the serial peripheral interface.
Optionally, the apparatus further comprises a second image sensor for acquiring second image data of a frame image corresponding to the real scene, the second image data including RGB color data,
wherein the field programmable gate array is further in communication with the second image sensor and transmits the RGB color data to the microcircuit module.
Optionally, the field programmable gate array is further configured to: the received first image data and pose data are synchronously processed and then transmitted to the microcircuit module,
wherein the synchronization process comprises:
when first image data and pose data are received currently, subtracting a preset exposure period from the current time to be used as a timestamp of the first image data, and using the current time as a timestamp of the pose data.
Optionally, the field programmable gate array further includes a logic unit, and the logic unit is configured to perform parallel filtering processing on the depth data of the first image data;
and the field programmable gate array transmits the first image data after filtering processing to the microcircuit module.
Optionally, the virtual reality technology-based positioning step is split into four steps, and the microcircuit module comprises a hard-core central processing unit configured with four threads to respectively process the four steps split from the virtual reality technology-based positioning step,
wherein the four steps comprise:
a first processing step, configured to detect and track feature points of a frame image corresponding to the scene according to the grayscale data, and generate the 6DOF data according to the depth data and the grayscale data;
a second processing step, which is used for optimizing a local map formed by the feature points and key points in the map points;
a third processing step, which is used for optimizing a global map formed by map points corresponding to all the feature points;
and a fourth processing step, configured to adjust the position of the key point when a path loop occurs, and optimize the local map after the key point position is adjusted.
Optionally, the microcircuit module is further configured to: the priorities of the four threads are assigned in order from high to low according to the real-time requirements of the four steps,
wherein, the real-time requirement sequence of the four steps is as follows: the first processing step > the second processing step > the third processing step > the fourth processing step.
Optionally, the microcircuit module further includes a parallel processing unit, configured to perform parallel acceleration processing on pose vector operations corresponding to the map points in a single instruction-multiple data manner.
Optionally, the performing, by the microcircuit module, the three-dimensional scene reconstruction according to the point cloud and the 6DOF data includes:
generating a point cloud according to the depth data;
acquiring the 6DOF data generated by the thread processing the first processing step;
generating a dense point cloud in the world coordinate system from the point cloud and the 6DOF data;
and performing the three-dimensional scene reconstruction of the real scene using the dense point cloud.
Optionally, the microcircuit module further includes a graphics processor, and the graphics processor is configured to establish a homography matrix corresponding to the frame image through the extrinsic and intrinsic parameters corresponding to the first and second image sensors, so as to draw colors onto the point cloud through the RGB color data.
According to a second aspect of the present invention, there is provided a positioning and three-dimensional scene reconstruction method based on a virtual reality technology, including:
acquiring first image data of a frame image corresponding to a real scene acquired by a first image sensor, wherein the first image data comprises depth data and gray data;
generating a point cloud according to the depth data;
generating 6DOF data from the depth data and the grayscale data;
and performing virtual reality technology-based positioning and three-dimensional scene reconstruction corresponding to the real scene according to the point cloud and the 6DOF data.
Optionally, the method further includes:
acquiring pose data, acquired by a pose measurement sensor, of a user wearing the helmet while moving in the real scene, wherein the pose data comprises an angular rate and an acceleration;
wherein, prior to acquiring the pose data, the method further comprises:
and transmitting the pose data from the pose measurement sensor to a soft-core central processing unit of a field programmable gate array, so as to resolve the angular rate and the acceleration into quaternion pose data.
Optionally, the method further includes:
and before the positioning and the three-dimensional scene reconstruction are executed according to the point cloud and the 6DOF data, carrying out fusion processing on the 6DOF data by utilizing the pose data of the quaternion.
Optionally, before acquiring the first image data and the pose data, the method further includes:
correspondingly transmitting the first image data and the pose data to the field programmable gate array through the first image sensor and the pose measurement sensor respectively so as to synchronously process the first image data and the pose data through the field programmable gate array,
wherein the synchronization process comprises:
when the field programmable gate array receives first image data and pose data currently, subtracting a preset exposure period from the current time to be used as a timestamp of the first image data, and using the current time as a timestamp of the pose data.
Optionally, before acquiring the first image data, the method further includes:
transmitting the first image data to the field programmable gate array through the first image sensor to perform parallel filtering processing on depth data of the first image data through the field programmable gate array.
Optionally, the method further includes:
acquiring second image data of a frame image corresponding to a real scene acquired by a second image sensor, wherein the second image data comprises RGB color data;
rendering color to the point cloud using the RGB color data.
Optionally, the method further includes:
correspondingly splitting a positioning step based on a virtual reality technology into four steps;
four threads are configured to handle the four steps split from the virtual reality technology based positioning step,
wherein the four steps comprise:
a first processing step, configured to detect and track feature points of a frame image corresponding to the scene according to the grayscale data, and generate the 6DOF data according to the depth data and the grayscale data;
a second processing step, which is used for optimizing a local map formed by the feature points and key points in the map points;
a third processing step, which is used for optimizing a global map formed by map points corresponding to all the feature points;
and a fourth processing step, configured to adjust the position of the key point when a path loop occurs, and optimize the local map after the key point position is adjusted.
Optionally, the method further includes:
the priorities of the four threads are assigned in order from high to low according to the real-time requirements of the four steps,
wherein, the real-time requirement sequence of the four steps is as follows: the first processing step > the second processing step > the third processing step > the fourth processing step.
Optionally, the method further includes:
and performing parallel acceleration processing on the pose vector operation corresponding to the map points in a single instruction-multiple data mode.
Optionally, the performing the three-dimensional scene reconstruction according to the point cloud and the 6DOF data includes:
generating a point cloud according to the depth data;
acquiring the 6DOF data generated by the thread processing the first processing step;
generating a dense point cloud in the world coordinate system from the point cloud and the 6DOF data;
and performing the three-dimensional scene reconstruction of the real scene using the dense point cloud.
According to a third aspect of the present invention, there is provided an electronic apparatus comprising:
a memory for storing executable commands;
a processor for performing the method according to any of the second aspects of the invention under control of the executable command.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any one of the second aspects of the present invention.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a block diagram of a positioning and three-dimensional scene reconstruction apparatus based on VR technology according to an embodiment of the present invention.
Fig. 2 is a hardware block diagram of an exemplary positioning and three-dimensional scene reconstruction apparatus based on VR technology according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating steps of a positioning and three-dimensional scene reconstruction method based on a VR technology according to an embodiment of the present invention.
Fig. 4 is an exemplary diagram of a positioning and three-dimensional scene reconstruction method based on a VR technology according to an embodiment of the present invention.
Fig. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
< apparatus embodiment >
According to an embodiment of the present invention, a positioning and three-dimensional scene reconstructing apparatus based on VR technology is provided, where fig. 1 shows a block diagram of a positioning and three-dimensional scene reconstructing apparatus based on VR technology according to an embodiment of the present invention.
The device is arranged on a VR helmet. While the user wears the VR helmet, the device provides navigation and positioning of the helmet. When the user further interacts with the external environment, the device can also be used for three-dimensional scene reconstruction, rendering the three-dimensional objects of the real scene in front of the user into the virtual space in real time.
As shown in fig. 1, the apparatus of this embodiment includes a first image sensor 1, an FPGA (field programmable gate array) 2, and a microcircuit module 3, the first image sensor 1 may be disposed on a front display panel of the VR headset, and the FPGA2 and the microcircuit module 3 may be disposed inside the VR headset.
The first image sensor 1 is configured to acquire first image data of a frame image corresponding to a real scene, where the first image data includes depth data and gray data. The FPGA2 communicates with the first image sensor 1 and transmits the depth data and gray data to the microcircuit module 3. The microcircuit module 3 generates a point cloud from the depth data, generates 6DOF data from the depth data and the gray data, and performs virtual-reality-technology-based positioning and three-dimensional scene reconstruction of the real scene according to the point cloud and the 6DOF data.
In one embodiment, the first image sensor 1 may be a TOF (time of flight) sensor, which captures a picture of the real scene, converts the distance of objects in the captured scene by calculating the time difference or phase difference between light emission and reflection to generate the depth data, and obtains the corresponding gray data by capturing the infrared (IR) data of the picture.
In one embodiment, the apparatus further includes a pose measurement sensor (not shown in the figure) for acquiring pose data of the helmet-wearing user while moving in the real scene, the pose data including angular rate and acceleration. The pose measurement sensor is, for example, an Inertial Measurement Unit (IMU), which measures the corresponding three-axis attitude angles (or angular rates) and accelerations of the VR helmet as it moves.
The FPGA2 communicates with the pose measurement sensor, and in one embodiment, the FPGA2 includes a soft core central processing unit (soft core CPU) for resolving the angular rate and acceleration transmitted by the pose measurement sensor to obtain pose data of a quaternion (occupying 4 floating point numbers) and transmitting the pose data of the quaternion to the microcircuit module 3.
In the embodiment of the invention, the algorithm that solves the quaternion from the acceleration (acc) data and the gyroscope (gyro) data of the three-axis attitude angles or angular rates, using complementary filtering and Runge-Kutta integration, is ported to the soft-core CPU in the FPGA2; the quaternion pose data is solved inside the FPGA2 and then transmitted to the microcircuit module 3 for the corresponding processing. Because the acc data and the gyro data each usually occupy 3 floating point numbers, only the quaternion (4 floats) needs to be transmitted to the microcircuit module 3 after this processing; compared with transmitting the acc and gyro data, 2 floats are saved, i.e. a bandwidth of 8 bytes per sample. Therefore, when the frequency of the pose measurement sensor is above 200 Hz, the data transmission power consumption can be reduced to a certain extent.
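As a concrete illustration of the computation moved onto the soft-core CPU, the following is a minimal C++ sketch of a complementary-filter quaternion update of the kind described above; the gain, the function names and the simple first-order integrator (the embodiment mentions Runge-Kutta integration) are illustrative assumptions rather than the patent's exact algorithm.

// Illustrative sketch: integrate gyro angular rate into the orientation and
// correct drift with the accelerometer's gravity direction. Only the resulting
// 4-float quaternion needs to be sent downstream, instead of the 6 floats of
// raw acc + gyro data.
#include <cmath>

struct Quat { float w, x, y, z; };

static Quat normalize(Quat q) {
    float n = std::sqrt(q.w*q.w + q.x*q.x + q.y*q.y + q.z*q.z);
    return {q.w/n, q.x/n, q.y/n, q.z/n};
}

// One IMU sample: gyro rates (rad/s), accelerations (m/s^2), time step dt.
Quat updateOrientation(Quat q, const float gyro[3], const float acc[3],
                       float dt, float alpha /* e.g. 0.98f, assumed */) {
    // 1) Gyro propagation: q_dot = 0.5 * q * (0, gx, gy, gz)
    Quat dq = {
        0.5f * (-q.x*gyro[0] - q.y*gyro[1] - q.z*gyro[2]),
        0.5f * ( q.w*gyro[0] + q.y*gyro[2] - q.z*gyro[1]),
        0.5f * ( q.w*gyro[1] - q.x*gyro[2] + q.z*gyro[0]),
        0.5f * ( q.w*gyro[2] + q.x*gyro[1] - q.y*gyro[0])
    };
    Quat qg = normalize({q.w + dq.w*dt, q.x + dq.x*dt, q.y + dq.y*dt, q.z + dq.z*dt});

    // 2) Accelerometer correction: roll/pitch implied by the gravity vector (yaw unobservable).
    float roll  = std::atan2(acc[1], acc[2]);
    float pitch = std::atan2(-acc[0], std::sqrt(acc[1]*acc[1] + acc[2]*acc[2]));
    float cr = std::cos(roll*0.5f),  sr = std::sin(roll*0.5f);
    float cp = std::cos(pitch*0.5f), sp = std::sin(pitch*0.5f);
    Quat qa = {cr*cp, sr*cp, cr*sp, -sr*sp};

    // 3) Complementary blend: trust the gyro short-term, the accelerometer long-term.
    Quat out = {alpha*qg.w + (1-alpha)*qa.w, alpha*qg.x + (1-alpha)*qa.x,
                alpha*qg.y + (1-alpha)*qa.y, alpha*qg.z + (1-alpha)*qa.z};
    return normalize(out);
}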
In one embodiment, the apparatus may further include a second image sensor (not shown) for acquiring second image data of a corresponding frame image of the real scene, the second image data including RGB color data, the second image sensor being, for example, an RGB sensor. The FPGA2 communicates with the second image sensor and transmits the RGB colour data to the microcircuit module 3 for subsequent use in rendering colours to the point cloud (as will be explained below) from the RGB colour data.
In one embodiment, the microcircuit module 3 is also adapted to: before positioning and three-dimensional scene reconstruction are performed according to the point cloud and the 6DOF data, fusion processing is performed on the 6DOF data by using pose data of quaternions (data fusion will be explained later).
Because the data fusion in the back-end microcircuit module 3 places high requirements on sensor data synchronization, the invention uses the FPGA to collect the data of the respective sensors.
In one embodiment, the FPGA2 is further configured to perform synchronization processing on the received first image data and pose data, and transmit the processed first image data and pose data to the microcircuit module, where the synchronization processing includes: when first image data and pose data are received currently, subtracting a preset exposure period from the current time to be used as a timestamp of the first image data, and using the current time as a timestamp of the pose data.
In one embodiment, the FPGA2 is further configured to perform synchronization processing on the received first image data, second image data, and pose data, and transmit the processed data to the microcircuit module, where the synchronization processing includes: when first image data, second image data and pose data are received currently, subtracting a preset exposure period from the current time to be used as timestamps of the first image data and the second image data, and using the current time as a timestamp of the pose data.
Because the internal logic and other delays of the image sensor chip are large, the image sensor delay is modeled as 3/4 of an exposure period in the embodiment of the invention; the frame rate of the pose measurement sensor is high, so its delay is ignored. When the FPGA2 receives the first and second image data, 3/4 of an exposure period is subtracted from the current time as the timestamp of the image data. When pose data is received, the current time is taken as its timestamp. In this way the timestamps of all sensors are set by the FPGA2, and since the delay inside the FPGA2 is small, a very good data synchronization effect is achieved. The microcircuit module 3 can therefore work in real time on the synchronized pose data and first and second image data transmitted by the FPGA2, achieving better 6DOF data fusion.
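The timestamping rule can be illustrated with a small sketch; in the device this logic runs inside the FPGA, and the microsecond clock, the structures and the ~30 fps exposure period used below are assumptions.

// Minimal sketch of the timestamping rule described above. The 3/4-exposure-period
// offset for image data and the pass-through timestamp for pose data follow the text.
#include <cstdint>

struct Stamped { uint64_t timestamp_us; /* payload omitted */ };

constexpr uint64_t kExposurePeriodUs = 33333;  // assumed ~30 fps exposure period

Stamped stampImageSample(uint64_t now_us) {
    // Image data: the sensor's internal latency is modeled as 3/4 of an exposure
    // period, so the capture time is the arrival time minus that delay.
    return { now_us - (3 * kExposurePeriodUs) / 4 };
}

Stamped stampPoseSample(uint64_t now_us) {
    // IMU data: the frame rate is high and the latency negligible, so the
    // arrival time is used directly.
    return { now_us };
}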
In one embodiment, the FPGA2 further comprises a logic unit (not shown in the figure) for performing parallel filtering processing on the depth data acquired by the first image sensor 1. The FPGA2 transmits the first image data after filtering to the microcircuit module 3.
Because the computational load of image convolution is large, in the embodiment of the invention a convolution IP core is designed according to the simulated convolution weights, and the logic resources of the FPGA2, such as logic elements (LE), are used to implement 640 IP instances so that 640 pixels can be processed at once. Performing the image data preprocessing of parallel depth filtering with the logic elements of the FPGA2 greatly increases the computing power and reduces latency, and, compared with filtering on the CPU of the microcircuit module 3, reduces CPU power consumption.
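The following sketch models, in software, the per-pixel convolution kernel that the FPGA replicates 640 times (one instance per pixel of a 640-wide row); the 3x3 weights and the invalid-depth handling are illustrative assumptions rather than the actual simulated convolution weights.

// One kernel instance: filter a single depth pixel from its 3x3 neighbourhood.
#include <vector>
#include <cstdint>

constexpr int kWidth = 640, kHeight = 480;  // assumed resolution

uint16_t filterPixel(const std::vector<uint16_t>& depth, int x, int y) {
    static const int w[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};  // assumed weights
    int acc = 0, wsum = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            int xx = x + dx, yy = y + dy;
            if (xx < 0 || xx >= kWidth || yy < 0 || yy >= kHeight) continue;
            uint16_t d = depth[yy * kWidth + xx];
            if (d == 0) continue;               // skip invalid depth samples
            acc  += w[dy + 1][dx + 1] * d;
            wsum += w[dy + 1][dx + 1];
        }
    return wsum ? static_cast<uint16_t>(acc / wsum) : 0;
}

// On the FPGA all 640 pixels of a row are filtered in the same clock cycle;
// here they are simply looped over.
void filterRow(const std::vector<uint16_t>& in, std::vector<uint16_t>& out, int y) {
    for (int x = 0; x < kWidth; ++x) out[y * kWidth + x] = filterPixel(in, x, y);
}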
After receiving the depth data and gray data transmitted by the FPGA2, the microcircuit module 3 can perform VR-technology-based positioning and three-dimensional scene reconstruction corresponding to the real scene. In one embodiment, the VR-technology-based positioning uses a SLAM algorithm. The microcircuit module 3 may be, for example, a central processing unit (CPU), a microprocessor (MCU), or the like for executing a computer program, which can be written using instruction sets of architectures such as x86, Arm, RISC, MIPS or SSE. The microcircuit module 3 is, for example, an RK3288 chip.
In order to fully exploit the computing capability of the microcircuit module 3 and ensure the performance and real-time behaviour of the SLAM algorithm, in one embodiment of the invention the SLAM algorithm is split into four steps. The microcircuit module 3 includes a hard-core CPU configured with four threads to respectively process the four steps of the SLAM positioning algorithm; assigning one thread to each of the four computationally heavier steps of the SLAM algorithm effectively exploits the processing performance of the hard-core CPU in the microcircuit module 3.
Wherein the four steps comprise:
and a first processing step, which is used for detecting and tracking the feature points of the frame image corresponding to the scene according to the gray data and generating the 6DOF data according to the depth data and the gray data.
The feature points are two-dimensional points obtained from the gray data. Using the depth data, three-dimensional coordinate points are calculated for these two-dimensional feature points; these three-dimensional coordinate points form the map points, from which the 6DOF data is obtained. As described above, the 6DOF data can be fused with the quaternion pose data.
Because the 6DOF frame rate generated by the SLAM algorithm in the first processing step is generally low, the embodiment of the invention may use a UKF (Unscented Kalman Filter) to fuse the pose data acquired by the pose measurement unit with the 6DOF data, obtaining higher-frequency (for example, raised from 30 Hz to 200 Hz) and smoother 6DOF data. The 6DOF data is used as the observation, the pose data as the control input, and the previous frame's 6DOF data as the prior estimate; the UKF filtering is executed on the microcircuit module 3 to obtain the fused 6DOF.
In this processing step, feature point detection is performed on the IR image (corresponding to the gray data) generated by the first image sensor, and the coordinates of the corresponding feature points in the camera coordinate system are solved directly from the corresponding depth values using the camera intrinsics. Because the depth value is obtained from the first image sensor, compared with the traditional binocular positioning method this saves the binocular feature matching and disparity-based depth calculation, reducing the computational complexity on the CPU of the microcircuit module 3.
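A minimal sketch of this back-projection: a feature detected at pixel (u, v) in the IR image is lifted to a 3D point in the camera coordinate system directly from its TOF depth value and the camera intrinsics, with no binocular matching; the pinhole model without distortion and the names used are assumptions.

// Standard pinhole back-projection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth.
struct Point3f { float x, y, z; };

struct Intrinsics { float fx, fy, cx, cy; };  // camera intrinsics, distortion ignored

Point3f backProject(float u, float v, float depth_m, const Intrinsics& K) {
    return { (u - K.cx) * depth_m / K.fx,
             (v - K.cy) * depth_m / K.fy,
             depth_m };
}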
A second processing step is used for optimizing the local map formed by the key points (KeyPoints) among the map points (MapPoints) corresponding to the feature points. The key points are higher-quality feature points selected from the feature points, and their three-dimensional coordinates are calculated using the depth data.
In this step, a local map formed of the key points and the key frames may be maintained, and local map optimization may be performed by the BA (bundle adjustment) algorithm.
And a third processing step, which is used for optimizing the global map formed by map points corresponding to all the feature points. In this step, a global map is maintained, and global map optimization is performed by a BA algorithm.
And a fourth processing step, configured to adjust the positions of the key points when a path loop (loop closing) occurs, and optimize the local map after the key point positions are adjusted.
In the step, the current position and the local key point characteristics are compared with a global map, if the path loops back, the key point position is adjusted, and BA algorithm optimization is carried out through the adjusted local map, so that the error influence is reduced.
In one embodiment, the microcircuit module 3 is also adapted to: distributing the priorities of the four threads according to the real-time requirements of the four steps from high to low, wherein the real-time requirements of the four steps are as follows: first processing step > second processing step > third processing step > fourth processing step. That is, the corresponding processing steps are executed according to the priority assignment thread, wherein the feature point tracking of the first step has the highest requirement on real-time performance, the microcircuit module 3 preferentially assigns the thread to process the first step, and other steps can wait for subsequent processing according to the priority when the thread is blocked.
In one embodiment, the microcircuit module 3 may further include a parallel processing unit for performing parallel acceleration processing on the pose vector operation corresponding to the map point in a single instruction-multiple data manner.
The SLAM algorithm contains a large number of pose operations, which occur in the second, third and fourth processing steps; for example, Jacobian and Hessian matrices must be solved for the map point positions. Because there are many map points and the computation scale is large, the embodiment of the invention uses, for example, the Neon resources in the RK3288 (Neon is a 128-bit SIMD (Single Instruction, Multiple Data) extension for ARM Cortex-A series processors) to accelerate the pose vector operations in parallel, so that one instruction can perform arithmetic on (x, y, z) at once; the performance improvement on large-scale map point operations is considerable.
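The kind of SIMD acceleration meant here can be illustrated with ARM Neon intrinsics; the sketch below applies a rigid pose to map points stored as (x, y, z, pad), and the data layout and function names are assumptions.

// Apply p' = R * p + t to an array of map points; one 128-bit multiply-accumulate
// instruction handles all four float lanes of a point at a time.
#include <arm_neon.h>

void transformPoints(const float* R /* 3x3 row-major */, const float* t /* 3 */,
                     float* points /* [x, y, z, pad] per point */, int count) {
    float col0[4] = {R[0], R[3], R[6], 0.0f};   // column 0 of R
    float col1[4] = {R[1], R[4], R[7], 0.0f};   // column 1
    float col2[4] = {R[2], R[5], R[8], 0.0f};   // column 2
    float trans[4] = {t[0], t[1], t[2], 0.0f};
    float32x4_t r0 = vld1q_f32(col0);
    float32x4_t r1 = vld1q_f32(col1);
    float32x4_t r2 = vld1q_f32(col2);
    float32x4_t tv = vld1q_f32(trans);
    for (int i = 0; i < count; ++i) {
        float* p = points + 4 * i;
        float32x4_t acc = tv;
        acc = vmlaq_n_f32(acc, r0, p[0]);        // acc += col0 * x
        acc = vmlaq_n_f32(acc, r1, p[1]);        // acc += col1 * y
        acc = vmlaq_n_f32(acc, r2, p[2]);        // acc += col2 * z
        vst1q_f32(p, acc);                       // store transformed point
    }
}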
In the embodiment of the present invention, the microcircuit module 3 may perform three-dimensional scene reconstruction of the real scene while the VR helmet moves, which specifically includes: generating a point cloud from the depth data of the first image sensor; acquiring the 6DOF data generated by the thread of the first processing step; generating a dense point cloud in the world coordinate system from the point cloud and the 6DOF data; and performing three-dimensional scene reconstruction of the real scene using the dense point cloud.
Thus, in the embodiment of the invention, the microcircuit module 3 uses the local point cloud generated by the first image sensor for each frame (that is, the point cloud in the current camera coordinate system when each frame is captured), labels the voxels in the space of the VR experience with TSDF (Truncated Signed Distance Function) values using the 6DOF information generated in the first processing step of the SLAM algorithm, and then traverses the voxels to find the coordinate positions where the TSDF value transitions between positive and negative, which are taken as the positions of the reconstructed point cloud.
The three-dimensional scene reconstruction algorithm flow is as follows:
and dividing the space where the VR experience exists into i x j x k individual pixels, and taking the central point as the voxel coordinate.
Each time a frame of depth data is generated, the voxels within the field of view of the first image sensor are projected to the image plane using the sensor parameters and the 6DOF pose, and each voxel obtains its corresponding depth value. The TSDF value of a voxel is obtained by subtracting this depth value from the Euclidean distance between the voxel coordinate and the 6DOF camera position of the first image sensor. If the absolute value of the TSDF exceeds 1, it is truncated to 1.
For each pixel point, the voxels with TSDF values along the ray from the optical center of the first image sensor through that pixel are examined; if a transition between positive and negative values exists, it indicates the surface of an object, that is, the position of the reconstructed point cloud.
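The per-voxel TSDF update and the sign-change test just described can be sketched as follows; the grid layout, the omitted projection step and the neighbour-based surface test are assumptions, while the distance-minus-depth value and the truncation to [-1, 1] follow the flow above.

#include <vector>
#include <cmath>

struct Vec3 { float x, y, z; };

struct TsdfGrid {
    int ni, nj, nk;                // i x j x k voxels
    float voxelSize;
    std::vector<float> tsdf;       // one truncated value per voxel
    float& at(int i, int j, int k) { return tsdf[(k * nj + j) * ni + i]; }
};

// Update one voxel from the current depth frame. 'camPos' is the camera position from
// the 6DOF pose; 'depthAlongRay' is the depth value the voxel projects to on the image
// plane (the projection through the sensor parameters is omitted here).
void updateVoxel(TsdfGrid& g, int i, int j, int k,
                 const Vec3& voxelCenter, const Vec3& camPos, float depthAlongRay) {
    float dx = voxelCenter.x - camPos.x;
    float dy = voxelCenter.y - camPos.y;
    float dz = voxelCenter.z - camPos.z;
    float dist = std::sqrt(dx * dx + dy * dy + dz * dz);  // Euclidean voxel-camera distance
    float sdf = dist - depthAlongRay;                      // distance minus depth, as in the flow above
    if (sdf >  1.0f) sdf =  1.0f;                          // truncate to [-1, 1]
    if (sdf < -1.0f) sdf = -1.0f;
    g.at(i, j, k) = sdf;
}

// A voxel lies on the reconstructed surface where the TSDF changes sign; here the sign
// change is tested against the next voxel along one grid axis as an approximation of the
// positive/negative junction along the viewing ray.
bool isSurfaceVoxel(TsdfGrid& g, int i, int j, int k) {
    if (i + 1 >= g.ni) return false;
    return g.at(i, j, k) * g.at(i + 1, j, k) < 0.0f;
}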
In one embodiment, the microcircuit module 3 further includes a graphics processor (not shown) for establishing a homography matrix of the corresponding frame image from the extrinsic and intrinsic parameters of the first image sensor 1 and the second image sensor, so as to draw colors onto the reconstructed point cloud from the RGB color data.
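A minimal sketch of this colorization step: a reconstructed point in the first sensor's frame is mapped into the RGB camera using the extrinsics between the two sensors and the RGB intrinsics, and the nearest pixel color is sampled; the full homography formulation and the GPU implementation are omitted, and all names are illustrative.

#include <cstdint>

struct Vec3 { float x, y, z; };
struct RGB  { uint8_t r, g, b; };

struct Extrinsics { float R[9]; float t[3]; };          // TOF frame -> RGB frame
struct Pinhole    { float fx, fy, cx, cy; int w, h; };  // RGB camera intrinsics

bool colorizePoint(const Vec3& p, const Extrinsics& T, const Pinhole& K,
                   const RGB* rgbImage, RGB* outColor) {
    // Transform into the RGB camera frame.
    float x = T.R[0]*p.x + T.R[1]*p.y + T.R[2]*p.z + T.t[0];
    float y = T.R[3]*p.x + T.R[4]*p.y + T.R[5]*p.z + T.t[1];
    float z = T.R[6]*p.x + T.R[7]*p.y + T.R[8]*p.z + T.t[2];
    if (z <= 0.0f) return false;                 // behind the RGB camera
    // Project with the pinhole model and sample the nearest pixel.
    int u = static_cast<int>(K.fx * x / z + K.cx);
    int v = static_cast<int>(K.fy * y / z + K.cy);
    if (u < 0 || u >= K.w || v < 0 || v >= K.h) return false;
    *outColor = rgbImage[v * K.w + u];
    return true;
}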
Because the number of voxels is very large when computing the voxel TSDF values, in order not to affect the real-time performance of the SLAM computation, the embodiment of the invention may use the graphics processor of the microcircuit module 3, for example the Mali-T764 GPU in the RK3288, to implement operations such as voxel projection and traversal; these operations are uniform, which fully ensures the feasibility of building kernels with OpenCL (Open Computing Language).
In the embodiment of the invention, the dense point cloud is generated quickly from the depth data of the first image sensor, which reduces the CPU computation burden; compared with the sparse point cloud built from map points in the existing SLAM algorithm, this provides higher-precision modeling and development and improves the user experience.
In addition, the 6DOF and point cloud generation of the existing SLAM positioning scheme is executed on the central processing unit (CPU) of a computing device, such as a Qualcomm 8-series processor.
In contrast, the positioning and three-dimensional scene reconstruction device of the embodiment of the invention, according to the complexity of the SLAM algorithm and the point cloud processing algorithm, designs dedicated computing units as a coprocessor (the soft-core CPU and/or logic elements in the FPGA) plus a main processor (the hard-core CPU and/or GPU in the microcircuit module), achieving low latency and low power consumption while efficiently fusing multi-sensor data.
In this embodiment, the key points needed by SLAM positioning are generated directly from the depth and IR gray data produced by the first image sensor (TOF sensor); a dense point cloud in the world coordinate system is generated from the 6DOF data and the local point cloud produced by the TOF sensor, achieving dense modeling with fine model texture; color information is captured by the RGB sensor and color texture is mapped onto the point cloud; and the FPGA stamps the data collected by the multiple sensors with synchronized timestamps and performs the computation-intensive data preprocessing.
In summary, this embodiment guarantees the "low latency" required for a good user experience under the "low power consumption" and "low cost" constraints of VR helmet application scenarios, while realizing navigation and three-dimensional reconstruction functions to raise the added value of the VR product. In view of these requirements, the embodiment proposes a hardware solution based on heterogeneous computation and multiple sensors, and designs an algorithm and software architecture on top of it to fully exploit the hardware performance. The result is a low-power, high-performance hardware and software system in which the user, while experiencing 6DOF navigation and positioning, can also experience additional functions such as modeling real objects in the virtual scene or safety-zone protection.
< example >
The hardware and signal connection relationship of the positioning and three-dimensional scene reconstruction apparatus based on VR technology according to the embodiment of the present invention will be described with reference to the example of fig. 2.
Fig. 2 is a hardware block diagram of an exemplary positioning and three-dimensional scene reconstruction apparatus based on VR technology according to an embodiment of the present invention, and the apparatus is disposed on a VR headset.
As shown in FIG. 2, in this example the positioning and reconstruction device comprises three sensors, a TOF sensor 10, an RGB sensor 12 and an IMU 14. The device further comprises an FPGA 20 and a processor RK3288 30, where the FPGA 20 comprises a soft-core CPU 22 and logic elements (LE) 24, and the RK3288 30 comprises a hard-core CPU 32 and a GPU 34.
The FPGA 20 communicates with the TOF sensor 10 and the RGB sensor 12 over a MIPI interface and with the IMU 14 over an SPI interface. The TOF sensor 10 collects depth and gray data, the RGB sensor 12 obtains RGB color data, and the IMU 14 collects pose data of the VR helmet as it moves. Data acquisition from the sensors is performed by the FPGA 20, and the FPGA 20 synchronizes the acquired image data of the TOF sensor 10 and the RGB sensor 12 with the pose data of the IMU 14. Since the timestamps of the three sensors are all set by the FPGA 20 and the delay inside the FPGA 20 is small, a good data synchronization effect can be achieved.
The FPGA 20 performs parallel joint bilateral filtering on the acquired depth data of the TOF sensor 10 using the LE 24, which increases computing power and processing speed and, compared with computing on the hard-core CPU 32 of the RK3288 30, reduces CPU power consumption.
In addition, before the pose data acquired by the IMU 14 is transmitted to the RK3288 30 for processing, the FPGA 20 first performs quaternion solving on the acc and gyro data of the pose data with the soft-core CPU 22. The quaternion solving algorithm is ported to the soft-core CPU 22 in the FPGA 20, so only the quaternion (4 × float) needs to be transmitted to the hard-core CPU 32, saving bandwidth compared with transmitting the 6 floating point numbers of acc and gyro data. When the IMU frequency is above 200 Hz, power consumption can be reduced to a certain extent.
The communication between the FPGA 20 and the RK3288 30 is as follows: the FPGA 20 transmits the quaternion pose data to the RK3288 30 over an SPI interface, and transmits the image data of the TOF sensor 10 and the RGB sensor 12 to the RK3288 30 over a MIPI interface. The quaternion and image data transferred to the RK3288 30 have been clock-synchronized in the FPGA 20 in advance.
After receiving the quaternion and image data transmitted by the FPGA 20, the back-end RK3288 30 performs the corresponding navigation positioning and/or three-dimensional scene reconstruction.
The RK3288 has a 4 × Cortex-A17 @ 1.8 GHz CPU, so each of the four computationally heavier modules of the SLAM positioning algorithm is assigned its own thread, which effectively exploits the processing performance of the processor.
After the feature point tracking thread obtains the 6DOF data, it passes the 6DOF data to the other threads of the RK3288 30, where it is fused with the received quaternion attitude data to obtain higher-frequency, smoother 6DOF data for the subsequent navigation positioning and three-dimensional scene reconstruction.
In addition, the RK3288 30 can use its internal Neon resources to accelerate in parallel the large number of pose operations involved in the SLAM algorithm, improving the performance of map point computation.
The GPU 34 of the RK3288 30, for example a Mali-T764, draws colors onto the point cloud from the RGB color data and performs operations such as voxel projection and traversal.
< method examples >
In an embodiment of the present invention, a positioning and three-dimensional scene reconstructing method based on a VR technology is further provided, please refer to fig. 3, which is a flowchart illustrating steps of the positioning and three-dimensional scene reconstructing method based on the VR technology according to the embodiment of the present invention.
As shown in fig. 3, the positioning and three-dimensional scene reconstruction method according to the embodiment of the present invention includes the following steps:
step 202, acquiring first image data of a frame image corresponding to a real scene acquired by a first image sensor, wherein the first image data comprises depth data and gray data;
step 204, generating a point cloud according to the depth data;
step 206, generating 6DOF data according to the depth data and the grayscale data;
and 208, performing virtual reality technology-based positioning and three-dimensional scene reconstruction corresponding to the real scene according to the point cloud and the 6DOF data.
In one embodiment, the method further comprises: acquiring pose data, acquired by a pose measurement sensor, of a user wearing the helmet while moving in the real scene, wherein the pose data comprises an angular rate and an acceleration; wherein, prior to acquiring the pose data, the method further comprises: transmitting the pose data from the pose measurement sensor to a soft-core central processing unit of a field programmable gate array, so as to resolve the angular rate and the acceleration into quaternion pose data.
In one embodiment, the method further comprises: and before the positioning and the three-dimensional scene reconstruction are executed according to the point cloud and the 6DOF data, carrying out fusion processing on the 6DOF data by utilizing the pose data of the quaternion.
In one embodiment, prior to acquiring the first image data and the pose data, the method further comprises: correspondingly transmitting the first image data and the pose data to the field programmable gate array through the first image sensor and the pose measurement sensor respectively so as to synchronously process the first image data and the pose data through the field programmable gate array, wherein the synchronous processing comprises the following steps: when the field programmable gate array receives first image data and pose data currently, subtracting a preset exposure period from the current time to be used as a timestamp of the first image data, and using the current time as a timestamp of the pose data.
In one embodiment, prior to acquiring the first image data, the method further comprises: transmitting the first image data to the field programmable gate array through the first image sensor to perform parallel filtering processing on depth data of the first image data through the field programmable gate array.
In one embodiment, the method further comprises: acquiring second image data of a frame image corresponding to a real scene acquired by a second image sensor, wherein the second image data comprises RGB color data; rendering color to the point cloud using the RGB color data.
In one embodiment, the method further comprises: correspondingly splitting a positioning step based on a virtual reality technology into four steps; configuring four threads to respectively process four steps split in the positioning step based on the virtual reality technology, wherein the four steps comprise:
a first processing step, configured to detect and track feature points of a frame image corresponding to the scene according to the grayscale data, and generate the 6DOF data according to the depth data and the grayscale data;
a second processing step, which is used for optimizing a local map formed by the feature points and key points in the map points;
a third processing step, which is used for optimizing a global map formed by map points corresponding to all the feature points;
and a fourth processing step, configured to adjust the position of the key point when a path loop occurs, and optimize the local map after the key point position is adjusted.
In one embodiment, the method further comprises: allocating the priorities of the four threads according to the real-time requirements of the four steps from high to low, wherein the real-time requirements of the four steps are in the order: the first processing step > the second processing step > the third processing step > the fourth processing step.
In one embodiment, the method further comprises: and performing parallel acceleration processing on the pose vector operation corresponding to the map points in a single instruction-multiple data mode.
In one embodiment, performing the three-dimensional scene reconstruction from the point cloud and the 6DOF data comprises: generating a point cloud from the depth data; acquiring the 6DOF data generated by the thread processing the first processing step; generating a dense point cloud in the world coordinate system from the point cloud and the 6DOF data; and performing the three-dimensional scene reconstruction of the real scene using the dense point cloud.
< example >
The positioning and three-dimensional scene reconstruction method according to the embodiment of the present invention will be described with reference to fig. 4, where fig. 4 is an exemplary diagram of the positioning and three-dimensional scene reconstruction method based on VR technology according to the embodiment of the present invention.
Step 402, image data preprocessing: including the parallel filtering of the depth data of the first image sensor as described above; for example, the logic elements (LE) of the FPGA are used to implement 640 IP instances so that 640 pixels can be processed at once.
Step 404, IMU attitude calculation: including the quaternion solving of the pose data angular rate and acceleration of the pose measurement sensor as described above. For example, the quaternion is solved on the soft-core CPU of the FPGA using complementary filtering and Runge-Kutta integration.
Step 406, synchronization processing: including the clock synchronization processing of the image data of the first and second image sensors and the pose data of the pose measurement sensor as described above.
Step 408, feature point tracking: as described above, two-dimensional feature points are obtained from the gray data, and 6DOF data is generated from the preprocessed depth data and the two-dimensional feature points;
step 410, data fusion: as described above, the 6DOF data generated in step 408 is subjected to the fusion processing using the quaternion attitude data obtained in step 404, and the fused 6DOF data is obtained.
Step 412, local map optimization: as described above, the local map is maintained according to the key points obtained from the depth data and the gray data, and the local map is optimized using the BA algorithm.
Step 414, global map optimization: as described above, according to the global map obtained by corresponding all the feature points and the depth data, the global map is maintained and optimized by the BA algorithm.
Step 416, adjusting loop key points: as described above, according to the comparison between the current position and the local key point characteristics and the global map, if the path loops back, the key point position is adjusted, and BA optimization is performed through the adjusted local map, so that the error influence is reduced.
Step 418, reconstructing the three-dimensional scene: as described above, a local point cloud is generated from the first image data of each frame, the voxels in the space of the VR experience are labeled with TSDF values using the 6DOF data generated in step 410, and the voxels are then traversed to find the coordinate positions where the TSDF value transitions between positive and negative, which are taken as the positions of the reconstructed point cloud. Further, a homography matrix of the image is established from the extrinsic and intrinsic parameters of the first and second image sensors, so that the graphics processor can draw colors onto the point cloud from the RGB color data.
< apparatus embodiment >
In the present embodiment, there is also provided an electronic device including the microcircuit module 3, 30 described in the apparatus embodiments of the present specification.
In further embodiments, as shown in fig. 5, the electronic device 2000 may include a memory 2200 and a processor 2400. The memory 2200 is used for storing executable commands. The processor 2400 is configured to perform the method described in any of the method embodiments herein under control of executable commands stored in the memory 2200.
The implementation subject of the electronic device 2000 according to the executed method embodiment may be a server or a terminal device, and is not limited herein.
< computer-readable storage Medium embodiment >
Finally, according to yet another embodiment of the present invention, there is further provided a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the virtual reality technology-based positioning and three-dimensional scene reconstruction method according to any of the embodiments of the present invention.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (14)

1. A positioning and three-dimensional scene reconstruction device based on virtual reality technology, which is arranged on a helmet, is characterized in that the device comprises:
a first image sensor, which is used for acquiring first image data of a frame image corresponding to a real scene, wherein the first image data comprises depth data and grayscale data;
a field programmable gate array, which is in communication with the first image sensor and is used for processing the depth data and the grayscale data and transmitting them to a microcircuit module;
and the microcircuit module, which is used for generating a point cloud according to the depth data, generating 6DOF data according to the depth data and the grayscale data, and performing, according to the point cloud and the 6DOF data, positioning and three-dimensional scene reconstruction which correspond to the real scene and are based on virtual reality technology.
2. The apparatus of claim 1, wherein the apparatus further comprises:
a pose measurement sensor, which is used for acquiring pose data of a user wearing the helmet while moving in the real scene, wherein the pose data comprises an angular rate and an acceleration;
wherein the field programmable gate array is also in communication with the pose measurement sensor and comprises a soft-core central processing unit, which is used for resolving the angular rate and the acceleration to obtain quaternion pose data and transmitting the quaternion pose data to the microcircuit module;
and wherein the microcircuit module is further used for: fusing the 6DOF data with the quaternion pose data before the positioning and the three-dimensional scene reconstruction are performed according to the point cloud and the 6DOF data.
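Claim 2 does not specify how the soft-core central processing unit resolves the angular rate and the acceleration into quaternion pose data. As a hedged illustration only, one common ingredient of such a resolver is integrating the gyroscope angular rate into an orientation quaternion, sketched below in C++; the function name and the omission of any accelerometer correction or visual-pose fusion are assumptions made purely for illustration.

#include <cmath>

struct Quaternion { float w, x, y, z; };

// First-order integration of the quaternion kinematics q_dot = 0.5 * q * (0, w),
// turning a body-frame angular rate (rad/s) over dt seconds into an updated
// orientation. This is only one possible part of "resolving ... to obtain
// quaternion pose data"; the actual method used by the soft-core CPU is not
// specified by the claim.
Quaternion integrateAngularRate(Quaternion q, float wx, float wy, float wz, float dt) {
    Quaternion dq{
        0.5f * (-q.x * wx - q.y * wy - q.z * wz),
        0.5f * ( q.w * wx + q.y * wz - q.z * wy),
        0.5f * ( q.w * wy - q.x * wz + q.z * wx),
        0.5f * ( q.w * wz + q.x * wy - q.y * wx)};
    q.w += dq.w * dt; q.x += dq.x * dt; q.y += dq.y * dt; q.z += dq.z * dt;
    float n = std::sqrt(q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z);
    return {q.w / n, q.x / n, q.y / n, q.z / n};   // renormalise to unit length
}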
3. The apparatus of claim 1, wherein the apparatus further comprises:
a second image sensor for acquiring second image data of a frame image corresponding to a real scene, the second image data including RGB color data,
wherein the field programmable gate array is further in communication with the second image sensor and transmits the RGB color data to the microcircuit module;
wherein the microcircuit module further comprises a graphics processor, which is used for establishing a homography matrix of the corresponding frame image through the extrinsic parameters and intrinsic parameters of the first image sensor and the second image sensor, so as to render colors onto the point cloud using the RGB color data.
4. The apparatus of claim 2, wherein the field programmable gate array is further used for: synchronizing the received first image data and pose data and then transmitting them to the microcircuit module,
wherein the synchronization process comprises:
when first image data and pose data are currently received, subtracting a preset exposure period from the current time and using the result as the timestamp of the first image data, while using the current time as the timestamp of the pose data.
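As a minimal, non-authoritative C++ sketch of the timestamping rule stated in this claim (the clock source, data types, and function name are assumptions introduced here for illustration):

#include <chrono>

using Clock     = std::chrono::steady_clock;
using Timestamp = Clock::time_point;

struct StampedFrame { Timestamp stamp; /* depth + grayscale payload omitted */ };
struct StampedPose  { Timestamp stamp; /* angular rate + acceleration omitted */ };

// Timestamping rule of this claim: the image is treated as having been captured one
// exposure period before it arrives, while the pose sample is stamped with "now".
void stampOnArrival(StampedFrame& frame, StampedPose& pose,
                    std::chrono::microseconds exposurePeriod) {
    const Timestamp now = Clock::now();
    frame.stamp = now - std::chrono::duration_cast<Clock::duration>(exposurePeriod);
    pose.stamp  = now;
}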
5. The apparatus of claim 1, wherein the virtual reality technology-based positioning step is split into four steps, and the microcircuit module comprises a hard-core central processing unit configured with four threads to respectively process the four steps split from the virtual reality technology-based positioning step,
wherein the four steps comprise:
a first processing step, configured to detect and track feature points of a frame image corresponding to the scene according to the grayscale data, and generate the 6DOF data according to the depth data and the grayscale data;
a second processing step, which is used for optimizing a local map formed by key points among the map points corresponding to the feature points;
a third processing step, which is used for optimizing a global map formed by map points corresponding to all the feature points;
and a fourth processing step, configured to adjust the position of the key point when a path loop occurs, and optimize the local map after the key point position is adjusted.
6. The apparatus of claim 5, wherein the microcircuit module further comprises a parallel processing unit, which is used for accelerating, in parallel, the pose vector operations corresponding to the map points using a single instruction, multiple data (SIMD) approach.
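Claim 6 only states that pose vector operations on map points are accelerated with SIMD; the instruction set and data layout are not specified. The following C++ sketch, assuming an x86 SSE target and a structure-of-arrays layout (both assumptions; a headset-class processor might instead use NEON), applies a rigid-body pose to four map points per iteration:

#include <immintrin.h>
#include <cstddef>

// Apply a rigid-body pose (3x3 rotation R, row-major, plus translation t) to n map
// points stored as separate x/y/z arrays, four points per SSE iteration.
// For brevity, n is assumed to be a multiple of 4.
void transformPointsSSE(const float* R, const float* t,
                        const float* xs, const float* ys, const float* zs,
                        float* ox, float* oy, float* oz, std::size_t n) {
    for (std::size_t i = 0; i < n; i += 4) {
        __m128 x = _mm_loadu_ps(xs + i);
        __m128 y = _mm_loadu_ps(ys + i);
        __m128 z = _mm_loadu_ps(zs + i);
        // Row k of the result: R[3k]*x + R[3k+1]*y + R[3k+2]*z + t[k]
        for (int k = 0; k < 3; ++k) {
            __m128 r = _mm_add_ps(
                _mm_add_ps(_mm_mul_ps(_mm_set1_ps(R[3 * k + 0]), x),
                           _mm_mul_ps(_mm_set1_ps(R[3 * k + 1]), y)),
                _mm_add_ps(_mm_mul_ps(_mm_set1_ps(R[3 * k + 2]), z),
                           _mm_set1_ps(t[k])));
            float* out = (k == 0) ? ox : (k == 1) ? oy : oz;
            _mm_storeu_ps(out + i, r);
        }
    }
}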
7. A positioning and three-dimensional scene reconstruction method based on a virtual reality technology is characterized by comprising the following steps:
acquiring first image data of a frame image corresponding to a real scene acquired by a first image sensor, wherein the first image data comprises depth data and grayscale data;
generating a point cloud according to the depth data;
generating 6DOF data according to the depth data and the grayscale data;
and performing virtual reality technology-based positioning and three-dimensional scene reconstruction corresponding to the real scene according to the point cloud and the 6DOF data.
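For illustration, generating a point cloud from the depth data can be done by back-projecting each valid depth pixel through a pinhole model of the first image sensor; the intrinsic parameter names, the millimetre depth encoding, and the absence of undistortion in this C++ sketch are assumptions, not requirements of the claim:

#include <cstdint>
#include <vector>

struct Intrinsics { float fx, fy, cx, cy; };  // pinhole model of the first image sensor
struct Point3f    { float x, y, z; };

// Back-project a depth image (millimetres, 0 = invalid) into a camera-frame point cloud.
std::vector<Point3f> depthToPointCloud(const std::vector<uint16_t>& depth,
                                       int width, int height, Intrinsics K) {
    std::vector<Point3f> cloud;
    cloud.reserve(depth.size());
    for (int v = 0; v < height; ++v)
        for (int u = 0; u < width; ++u) {
            uint16_t d = depth[v * width + u];
            if (d == 0) continue;                 // skip invalid measurements
            float z = d * 0.001f;                 // millimetres -> metres
            cloud.push_back({(u - K.cx) * z / K.fx,
                             (v - K.cy) * z / K.fy,
                             z});
        }
    return cloud;
}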
8. The method of claim 7, wherein the method further comprises:
acquiring pose data, acquired by a pose measurement sensor, of a user wearing a helmet while moving in the real scene, wherein the pose data comprise an angular rate and an acceleration;
wherein, prior to acquiring the pose data, the method further comprises:
transmitting the pose data to a soft-core central processing unit of a field programmable gate array through the pose measurement sensor, so as to resolve the angular rate and the acceleration to obtain quaternion pose data; and
fusing the 6DOF data with the quaternion pose data before the positioning and the three-dimensional scene reconstruction are performed according to the point cloud and the 6DOF data.
9. The method of claim 8, wherein prior to acquiring the first image data and the pose data, the method further comprises:
transmitting the first image data and the pose data to the field programmable gate array through the first image sensor and the pose measurement sensor, respectively, so that the first image data and the pose data are synchronized by the field programmable gate array,
wherein the synchronization process comprises:
when the field programmable gate array currently receives first image data and pose data, subtracting a preset exposure period from the current time and using the result as the timestamp of the first image data, while using the current time as the timestamp of the pose data.
10. The method of claim 7, wherein the method further comprises:
acquiring second image data of a frame image corresponding to a real scene acquired by a second image sensor, wherein the second image data comprises RGB color data;
rendering color to the point cloud using the RGB color data.
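Claim 10 leaves open how the RGB color data is mapped onto the point cloud. One possible C++ sketch, assuming known extrinsics between the first and second image sensors and a pinhole model for the second sensor (claim 3 instead phrases this via a homography matrix), projects each point into the RGB image and samples the nearest pixel; all identifiers are illustrative:

#include <cstddef>
#include <cstdint>
#include <vector>

struct Intrinsics { float fx, fy, cx, cy; };
struct Point3f    { float x, y, z; };
struct Rgb        { uint8_t r, g, b; };

// Assign a colour to each depth-camera point by projecting it into the RGB camera.
// R (3x3, row-major) and t map depth-camera coordinates to RGB-camera coordinates
// (the extrinsics between the first and second image sensors).
std::vector<Rgb> colorizePointCloud(const std::vector<Point3f>& cloud,
                                    const std::vector<Rgb>& rgbImage,
                                    int width, int height,
                                    const float R[9], const float t[3],
                                    Intrinsics K) {
    std::vector<Rgb> colors(cloud.size(), Rgb{0, 0, 0});
    for (std::size_t i = 0; i < cloud.size(); ++i) {
        const Point3f& p = cloud[i];
        float x = R[0] * p.x + R[1] * p.y + R[2] * p.z + t[0];
        float y = R[3] * p.x + R[4] * p.y + R[5] * p.z + t[1];
        float z = R[6] * p.x + R[7] * p.y + R[8] * p.z + t[2];
        if (z <= 0.0f) continue;                       // behind the RGB camera
        int u = static_cast<int>(K.fx * x / z + K.cx + 0.5f);
        int v = static_cast<int>(K.fy * y / z + K.cy + 0.5f);
        if (u < 0 || u >= width || v < 0 || v >= height) continue;
        colors[i] = rgbImage[v * width + u];
    }
    return colors;
}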
11. The method of claim 7, wherein the method further comprises:
splitting the virtual reality technology-based positioning step into four corresponding steps;
configuring four threads to respectively process the four steps split from the virtual reality technology-based positioning step,
wherein the four steps comprise:
a first processing step, configured to detect and track feature points of a frame image corresponding to the scene according to the grayscale data, and generate the 6DOF data according to the depth data and the grayscale data;
a second processing step, which is used for optimizing a local map formed by key points among the map points corresponding to the feature points;
a third processing step, which is used for optimizing a global map formed by map points corresponding to all the feature points;
and a fourth processing step, configured to adjust the position of the key point when a path loop occurs, and optimize the local map after the key point position is adjusted.
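As a rough, non-authoritative sketch of the four-way split in claims 5 and 11, the four processing steps could each be hosted on its own thread; the step bodies in this C++ sketch are placeholders, and the queues, scheduling, and synchronization that a real tracking and mapping pipeline would need are omitted:

#include <thread>

// Placeholder step bodies; the real work of each step is described in claims 5 and 11.
void trackingStep()      { /* detect/track feature points, produce 6DOF data       */ }
void localMappingStep()  { /* optimize the local map of key points                 */ }
void globalMappingStep() { /* optimize the global map formed by all map points     */ }
void loopClosingStep()   { /* adjust key points when a path loop occurs            */ }

int main() {
    // One thread per step, mirroring the four-way split of the positioning step.
    std::thread t1(trackingStep);
    std::thread t2(localMappingStep);
    std::thread t3(globalMappingStep);
    std::thread t4(loopClosingStep);
    t1.join(); t2.join(); t3.join(); t4.join();
    return 0;
}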
12. The method of claim 11, wherein the performing the three-dimensional scene reconstruction from the point cloud and the 6DOF data comprises:
generating a point cloud according to the depth data;
acquiring the 6DOF data generated by the thread processing the first processing step;
generating a dense point cloud in the world coordinate system according to the point cloud and the 6DOF data;
and performing the three-dimensional scene reconstruction of the real scene by using the dense point cloud.
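A minimal C++ sketch, assuming the 6DOF data of each frame is available as a rotation matrix and translation vector, of accumulating per-frame local point clouds into the dense world-coordinate point cloud used for reconstruction (all identifiers are illustrative and not specified by the claim):

#include <vector>

struct Point3f { float x, y, z; };

// 6DOF pose of one frame: rotation R (3x3, row-major) and translation t,
// mapping camera-frame points into the world coordinate system.
struct Pose6DOF { float R[9]; float t[3]; };

// Accumulate a per-frame local point cloud into the dense world-frame point cloud.
void accumulateWorldCloud(const std::vector<Point3f>& localCloud,
                          const Pose6DOF& pose,
                          std::vector<Point3f>& worldCloud) {
    for (const Point3f& p : localCloud) {
        worldCloud.push_back({
            pose.R[0] * p.x + pose.R[1] * p.y + pose.R[2] * p.z + pose.t[0],
            pose.R[3] * p.x + pose.R[4] * p.y + pose.R[5] * p.z + pose.t[1],
            pose.R[6] * p.x + pose.R[7] * p.y + pose.R[8] * p.z + pose.t[2]});
    }
}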
13. An electronic device, comprising:
a memory for storing executable commands;
a processor for performing the method of any of claims 7 to 12 under control of the executable command.
14. A computer-readable storage medium, characterized in that a computer program is stored thereon, which computer program, when being executed by a processor, carries out the method according to any one of claims 7 to 12.
CN202010292190.4A 2020-04-14 2020-04-14 Positioning and three-dimensional scene reconstruction device and method based on virtual reality technology Pending CN111476907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010292190.4A CN111476907A (en) 2020-04-14 2020-04-14 Positioning and three-dimensional scene reconstruction device and method based on virtual reality technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010292190.4A CN111476907A (en) 2020-04-14 2020-04-14 Positioning and three-dimensional scene reconstruction device and method based on virtual reality technology

Publications (1)

Publication Number Publication Date
CN111476907A true CN111476907A (en) 2020-07-31

Family

ID=71752112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010292190.4A Pending CN111476907A (en) 2020-04-14 2020-04-14 Positioning and three-dimensional scene reconstruction device and method based on virtual reality technology

Country Status (1)

Country Link
CN (1) CN111476907A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170200317A1 (en) * 2016-01-12 2017-07-13 Siemens Healthcare Gmbh Perspective representation of a virtual scene component
US20180205941A1 (en) * 2017-01-17 2018-07-19 Facebook, Inc. Three-dimensional scene reconstruction from set of two dimensional images for consumption in virtual reality
CN108492316A (en) * 2018-02-13 2018-09-04 视辰信息科技(上海)有限公司 A kind of localization method and device of terminal
CN108765548A (en) * 2018-04-25 2018-11-06 安徽大学 Three-dimensional scenic real-time reconstruction method based on depth camera
CN109961506A * 2019-03-13 2019-07-02 东南大学 A local scene three-dimensional reconstruction method fusing an improved Census map
CN110221690A (en) * 2019-05-13 2019-09-10 Oppo广东移动通信有限公司 Gesture interaction method and device, storage medium, communication terminal based on AR scene
CN110310362A (en) * 2019-06-24 2019-10-08 中国科学院自动化研究所 High dynamic scene three-dimensional reconstruction method, system based on depth map and IMU

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112203076A (en) * 2020-09-16 2021-01-08 青岛小鸟看看科技有限公司 Alignment method and system for exposure center points of multiple cameras in VR system
WO2022057834A1 (en) * 2020-09-16 2022-03-24 青岛小鸟看看科技有限公司 Method and system for aligning exposure central points of multiple cameras in vr system
CN112203076B (en) * 2020-09-16 2022-07-29 青岛小鸟看看科技有限公司 Alignment method and system for exposure center points of multiple cameras in VR system
US11962749B2 (en) 2020-09-16 2024-04-16 Qingdao Pico Technology Co., Ltd. Virtual reality interaction method, device and system
CN112837372A (en) * 2021-03-02 2021-05-25 浙江商汤科技开发有限公司 Data generation method and device, electronic equipment and storage medium
CN114399597A (en) * 2022-01-12 2022-04-26 贝壳找房(北京)科技有限公司 Method and device for constructing scene space model and storage medium
CN114399597B (en) * 2022-01-12 2022-10-28 贝壳找房(北京)科技有限公司 Method and device for constructing scene space model and storage medium

Similar Documents

Publication Publication Date Title
JP6860488B2 (en) Mixed reality system
CN110073313B (en) Interacting with an environment using a parent device and at least one companion device
CN111156998B (en) Mobile robot positioning method based on RGB-D camera and IMU information fusion
US10706890B2 (en) Cinematic space-time view synthesis for enhanced viewing experiences in computing environments
KR101687017B1 (en) Hand localization system and the method using head worn RGB-D camera, user interaction system
JP7304934B2 (en) Mixed reality system with virtual content warping and method of using it to generate virtual content
CN111476907A (en) Positioning and three-dimensional scene reconstruction device and method based on virtual reality technology
CN106980368A A virtual reality interaction device based on visual computing and an inertial measurement unit
CN114897754B (en) Generating new frames using rendered content and non-rendered content from previous perspectives
US11494987B2 (en) Providing augmented reality in a web browser
KR20180121994A (en) Wide baseline stereo for low-latency rendering
US11158108B2 (en) Systems and methods for providing a mixed-reality pass-through experience
US11436790B2 (en) Passthrough visualization
CN110062916A (en) For simulating the visual simulation system of the operation of moveable platform
CN111721281B (en) Position identification method and device and electronic equipment
JP2015114905A (en) Information processor, information processing method, and program
CN110555869A (en) method and system for extracting primary and secondary motion in augmented reality systems
CN109040525B (en) Image processing method, image processing device, computer readable medium and electronic equipment
US11504608B2 (en) 6DoF inside-out tracking game controller
CN110969706B (en) Augmented reality device, image processing method, system and storage medium thereof
CN111161398A (en) Image generation method, device, equipment and storage medium
CN113238556A (en) Water surface unmanned ship control system and method based on virtual reality
US11275434B2 (en) Information processing apparatus, information processing method, and storage medium
US11694409B1 (en) Augmented reality using a split architecture
CN114972514A (en) SLAM positioning method, device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination