CN112085777A - Six-degree-of-freedom VR glasses - Google Patents
Six-degree-of-freedom VR glasses
Info
- Publication number
- CN112085777A (application CN202011001694.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- module
- feature
- dimensional
- scene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/14—Digital output to display device ; Cooperation and interconnection of the display device with other functional units
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/80—Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20228—Disparity calculation for image-based rendering
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Image Processing (AREA)
Abstract
The invention provides six-degree-of-freedom VR glasses comprising an image acquisition module, a parallax calculation module, a scene depth acquisition module and an image real-time mapping module. The image acquisition module collects image data of a target scene in the left and right visual fields, the parallax calculation module produces a parallax calculation map, the scene depth acquisition module processes that map to obtain scene obstacle data, and the image real-time mapping module matches the obstacle data with the images and maps the images into the VR environment in real time to form a video stream. The pictures shot by the binocular camera can thus be mapped in real time into the VR scene without an additionally configured external positioning system.
Description
Technical Field
The invention relates to VR glasses, in particular to VR glasses with six degrees of freedom.
Background
Virtual Reality (VR) technology is an important branch of simulation technology. It combines simulation with computer graphics, human-machine interface technology, multimedia, sensing, networking and other technologies, and is a challenging, cross-disciplinary frontier research field. Virtual reality technology mainly covers the simulated environment, perception, natural skill and sensing devices. The simulated environment is a real-time, dynamic, three-dimensional realistic image generated by a computer. Perception means that an ideal VR system should provide all the senses a person has: besides the visual perception generated by computer graphics, it also includes auditory, tactile, force and motion sensation, and even smell and taste, which is also referred to as multi-perception. Natural skill refers to a person's head rotation, eye movements, gestures or other body actions; the computer processes data matching the participant's actions, responds to the user's input in real time, and feeds the results back to the user's senses. Sensing devices are three-dimensional interaction devices.
Optical tracking installs a light-source emitting device on the tracked target and fixes sensors/marker points that receive the light signal in the use environment. Based on a triangulation algorithm, the light reflected or actively emitted by the target is measured and converted by dedicated computer vision algorithms into the target's spatial position, thereby tracking the target. In the field of VR devices, inside-out tracking mainly mounts cameras on the VR headset itself, lets the headset detect changes in the external environment, and computes the headset's spatial position with a computer or an on-board algorithm chip. According to the number of light-source emitting devices (cameras), visual positioning can be divided into multi-camera visual positioning and monocular visual positioning.
In the prior art, VR devices need to locate targets through an external positioning system, and the uneven performance of such external systems easily degrades the user experience.
Disclosure of Invention
The invention aims to provide six-degree-of-freedom VR glasses to solve the positioning problem in the VR field.
In order to achieve the purpose, the invention adopts the following technical scheme:
The application provides six-degree-of-freedom VR glasses comprising an image acquisition module, a parallax calculation module, a scene depth acquisition module and an image real-time mapping module;
the image acquisition module acquires image data of a target scene in a left visual field and a right visual field through a binocular camera;
the parallax calculation module is used for performing parallax calculation on the image data acquired in the left and right visual fields to obtain a parallax calculation map;
the scene depth obtaining module is used for processing the parallax calculation map obtained by the parallax calculation module to obtain scene obstacle data;
and the image real-time mapping module is used for matching the scene obstacle data with the image and mapping the image in the VR environment in real time to form a video stream.
Preferably, the step of acquiring image data of the target scene in the left and right visual fields by the image acquisition module comprises:
arranging the two cameras of the binocular camera side by side in parallel on the front surface of the supporting shell, facing the direction of the image acquisition target, for acquiring the target image;
vertically installing a distance sensor on the front surface of the supporting shell with its sensing direction directly facing the image acquisition target, for sensing the distance of the detected target;
when the acquisition target is within the set acquisition sensing area, as detected by the distance sensor, starting the binocular camera to acquire an image of the target;
and preprocessing the acquired image and transmitting the preprocessed image to a parallax calculation module.
More preferably, the image acquisition module adopts an integrated parallel-optical-axis binocular camera module with a fixed baseline distance.
Preferably, the step of calculating the binocular disparity by the disparity calculation module includes:
calibrating a binocular camera: calibrating the binocular camera to respectively obtain an internal parameter matrix and an external parameter matrix of the left and right cameras;
binocular image correction: respectively carrying out distortion removal processing on images shot by the left and right eye cameras through the internal parameter matrix; then, the internal parameter matrix and the external parameter matrix are combined to carry out binocular image correction processing, and the same point in the three-dimensional space is projected onto the same horizontal scanning line of the two-dimensional left-eye image and the two-dimensional right-eye image;
two-dimensional feature extraction: selecting a two-dimensional convolution neural network, and carrying out neural network training to serve as a two-dimensional feature extractor; sending the corrected binocular image into a two-dimensional feature extractor for forward propagation to obtain a feature map subjected to feature transformation;
three-dimensional feature extraction: selecting a three-dimensional convolutional neural network, carrying out neural network training, and using it as a three-dimensional feature extractor, wherein the three-dimensional feature extractor performs multi-level feature extraction and transformation on the spatial and temporal dimensions so as to fuse spatial and temporal information and obtain a multi-frame-fused feature map; superposing the feature map of the current frame and the feature maps of the previous multi-frame images on the feature dimension of the feature maps obtained by two-dimensional feature extraction, and sending the result into the three-dimensional feature extractor for forward propagation to extract features and obtain the feature map of the multi-frame images;
a parallax calculation step: performing transposed convolution on the feature map of the multi-frame images obtained in the three-dimensional feature extraction step and converting it back to pixels to obtain a parallax calculation map.
More preferably, the network structure of the two-dimensional feature extractor is a VGG network or a residual network.
More preferably, the parallax calculation module corrects the distorted image pair acquired by the binocular camera to eliminate the influence of distortion, bringing the pair close to the ideal state needed for subsequent processing.
Preferably, the step of acquiring the scene obstacle data by the scene depth acquisition module comprises:
performing matching algorithm processing on the parallax calculation map acquired by the parallax calculation module to obtain a depth information map, performing threshold segmentation on the depth information map, and extracting a useful region in an original image;
processing the depth information map by using a three-dimensional reconstruction algorithm to obtain a result containing three-dimensional coordinate information of the characteristic points;
acquiring and filtering feature points, extracting coordinate information of the feature points, and further processing the feature points from two aspects of depth of field information filtering and obstacle height information filtering;
and constructing an obstacle detection area, and processing the filtered characteristic point coordinate information to obtain data for realizing an obstacle avoidance function.
More preferably, the input of the matching algorithm is an image pair of the parallax calculation map, and matched pixel points of the images in the left and right visual fields are searched by using a SAD matching cost function.
More preferably, the scene depth obtaining module extracts an effective region in the background by filtering the background.
More preferably, the scene depth obtaining module obtains coordinate data of the feature points in the target scene through a result of the three-dimensional reconstruction processing.
More preferably, the scene depth obtaining module reads the coordinate information of the feature points of the target scene, and then further filters the feature points from the aspects of depth information filtering, obstacle height information filtering, and the like, so as to obtain useful feature points.
More preferably, the scene depth acquisition module first constructs three detection regions and obtains the obstacle data of the scene by detecting the number and distribution of useful feature points in each region.
Preferably, the step of playing the video by the image real-time mapping module comprises:
and respectively establishing a left eye plane mapping layer and a right eye plane mapping layer, respectively rendering images of left and right eye views processed by the scene depth acquisition module to the left eye plane mapping layer and the right eye plane mapping layer, and displaying the mapping layers on VR glasses.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
the utility model provides a six degree of freedom VR glasses, through image acquisition module, gather the image data of left and right visual field target scene, acquire the parallax computation graph through parallax computation module, handle through scene degree of depth acquisition module and acquire scene obstacle data, form the video stream through image real-time mapping module with scene obstacle data and image matching and image real-time mapping in the VR environment, need not with the help of extra configuration external positioning system, can realize with the picture real-time mapping in the VR scene of shooing through the binocular camera.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flow chart of the operation of the present invention.
FIG. 2 is an image acquisition module workflow diagram of the present invention.
Fig. 3 is a flow chart of the parallax calculation module of the present invention.
FIG. 4 is a scene depth acquisition module workflow diagram of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
The application provides six-degree-of-freedom VR glasses which comprise an image acquisition module, a parallax calculation module, a scene depth acquisition module and an image real-time mapping module. The image acquisition module acquires image data of a target scene in the left and right visual fields through a binocular camera; the parallax calculation module is used for performing parallax calculation on the image data acquired in the left and right visual fields to obtain a parallax calculation map; the scene depth obtaining module is used for processing the parallax calculation map obtained by the parallax calculation module to obtain scene obstacle data; and the image real-time mapping module is used for matching the scene obstacle data with the images and mapping the images into the VR environment in real time to form a video stream.
In a preferred embodiment, the step of the image acquisition module acquiring image data of the target scene in the left and right visual fields comprises:
and S1, arranging the two cameras of the binocular camera left and right in parallel, and arranging the two cameras on the front surface of the supporting shell facing the direction of the image acquisition target for acquiring the target image.
And S2, vertically installing the distance sensor on the front surface of the supporting shell, and enabling the sensing direction of the distance sensor to be over against the image acquisition target for sensing the distance of the detection target.
And S3, when the collected target is in the set collection induction area, starting the binocular camera to collect the image of the target, wherein the collection distance is subjected to induction detection by the distance sensor.
And S4, preprocessing the acquired image and transmitting the preprocessed image to a parallax calculation module.
The binocular camera has two cameras arranged in parallel left and right, the right one being a color camera and the left one a monochrome camera; the binocular camera mainly collects images of the target.
The invention performs image acquisition based on the binocular camera and, combined with the distance sensor, triggers capture only when a close-range target is accurately positioned. This not only improves image acquisition efficiency but also improves image quality, greatly reduces image distortion, and helps improve the accuracy of image mapping in the VR environment.
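As an illustrative, non-limiting sketch of this sensor-gated capture step (the patent does not specify an implementation), the following Python/OpenCV code waits for the distance sensor to report a target inside the sensing area before grabbing one left/right frame pair; `read_distance`, the trigger range, the camera indices and the Gaussian-blur preprocessing are all assumptions introduced here for illustration only.

```python
import cv2

def acquire_stereo_pair(read_distance, trigger_range_m=2.0,
                        left_cam_id=0, right_cam_id=1):
    """Wait until the distance sensor reports a target inside the sensing
    area, then grab one left/right frame pair from the binocular camera.
    read_distance is a caller-supplied function returning the sensed
    distance in metres (the real sensor interface is hypothetical)."""
    left = cv2.VideoCapture(left_cam_id)
    right = cv2.VideoCapture(right_cam_id)
    try:
        # capture is triggered only inside the set sensing area
        while read_distance() > trigger_range_m:
            pass
        ok_l, frame_l = left.read()
        ok_r, frame_r = right.read()
        if not (ok_l and ok_r):
            raise RuntimeError("binocular camera read failed")
        # light preprocessing before handing the pair to the parallax module
        frame_l = cv2.GaussianBlur(frame_l, (3, 3), 0)
        frame_r = cv2.GaussianBlur(frame_r, (3, 3), 0)
        return frame_l, frame_r
    finally:
        left.release()
        right.release()
```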
In a preferred embodiment, the step of calculating the binocular disparity by the disparity calculation module comprises:
and W1, calibrating the binocular camera to obtain an internal parameter matrix and an external parameter matrix of the left and right cameras respectively.
W2, binocular image correction, wherein the images shot by the left and right eye cameras are respectively subjected to distortion removal processing through an internal parameter matrix; and then, the binocular image correction processing is carried out by combining the internal parameter matrix and the external parameter matrix, and the same point in the three-dimensional space is projected to the same horizontal scanning line of the two-dimensional left-eye image and the two-dimensional right-eye image.
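As an illustrative sketch of steps W1–W2 (the patent does not name a library; OpenCV is used here only as an example, and the calibration results K1/D1/K2/D2/R/T are assumed to come from a prior stereo calibration such as cv2.stereoCalibrate):

```python
import cv2

def rectify_pair(img_l, img_r, K1, D1, K2, D2, R, T):
    """Undistort and row-align a left/right image pair so that the same 3-D
    point falls on the same horizontal scanline in both images.
    K1/K2: intrinsic matrices, D1/D2: distortion coefficients, R/T: rotation
    and translation between the cameras (from stereo calibration)."""
    h, w = img_l.shape[:2]
    # R1/R2, P1/P2 are the rectification rotations and projections;
    # Q is the reprojection matrix reused later for 3-D reconstruction.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
    m1x, m1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w, h), cv2.CV_32FC1)
    m2x, m2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w, h), cv2.CV_32FC1)
    rect_l = cv2.remap(img_l, m1x, m1y, cv2.INTER_LINEAR)
    rect_r = cv2.remap(img_r, m2x, m2y, cv2.INTER_LINEAR)
    return rect_l, rect_r, Q
```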
W3, two-dimensional feature extraction: selecting a two-dimensional convolutional neural network and training it as the two-dimensional feature extractor. The corrected binocular images are fed into the two-dimensional feature extractor for forward propagation to obtain feature maps after feature transformation.
W4, three-dimensional feature extraction: selecting a three-dimensional convolutional neural network, training it, and using it as the three-dimensional feature extractor, which performs multi-stage feature extraction and transformation on the spatial and temporal dimensions so as to fuse spatial and temporal information and obtain a multi-frame-fused feature map. The feature map of the current frame and the feature maps of the previous multi-frame images are superposed on the feature dimension of the feature maps obtained by two-dimensional feature extraction, then fed into the three-dimensional feature extractor for forward propagation to extract features and obtain the feature map of the multi-frame images.
W5, parallax calculation: performing transposed convolution on the feature map of the multi-frame images obtained in the three-dimensional feature extraction step and converting it back to pixels to obtain the parallax calculation map.
The invention first calibrates the binocular camera to obtain the internal parameter matrices of the left-eye and right-eye cameras and the external parameter matrix between them. The images shot by the left-eye and right-eye cameras can be undistorted using the internal parameter matrices. Using the external parameter matrix, binocular rectification can be jointly performed on the left-eye and right-eye images so that the same point in the real three-dimensional space is projected onto the same horizontal scanline of the two-dimensional left-eye and right-eye images, which facilitates subsequent processing. After the corrected binocular images are obtained, conventional two-dimensional convolution is used to extract features from each frame, yielding a feature map of single-frame information. Network structures such as the VGG (Visual Geometry Group) network or a residual network can be used as the feature extractor. After the single-frame feature map is obtained, the feature maps of the current frame and the previous N frames are spliced along the feature dimension to obtain a stack of feature maps. The spliced feature maps are taken as input and fed into a three-dimensional convolutional network; through three-dimensional convolution, three-dimensional pooling and other operations, multi-stage feature extraction and transformation are performed on the spatial and temporal dimensions to fuse spatial and temporal information, finally obtaining the multi-frame-fused feature map. After the multi-frame feature map is obtained, transposed convolution restores it to the pixel domain, yielding the final parallax calculation map.
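The patent does not disclose concrete layer configurations, so the following is only a minimal PyTorch sketch of the pipeline just described (PyTorch itself, the layer sizes, the channel-stacked stereo input and the temporal averaging are all assumptions introduced for illustration): a 2-D CNN extracts per-frame features, the current and previous frames are stacked along a temporal axis, a 3-D CNN fuses spatial and temporal information, and a transposed convolution maps the fused features back to a pixel-resolution parallax map.

```python
import torch
import torch.nn as nn

class DisparityNet(nn.Module):
    """Toy 2-D + 3-D CNN disparity estimator, not the patented network."""
    def __init__(self, feat_ch=32):
        super().__init__()
        # 2-D feature extractor (stands in for a VGG/ResNet backbone)
        self.feat2d = nn.Sequential(
            nn.Conv2d(6, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # 3-D feature extractor fusing space and time
        self.feat3d = nn.Sequential(
            nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # transposed convolution: back to pixel resolution, one disparity channel
        self.up = nn.ConvTranspose2d(feat_ch, 1, 4, stride=2, padding=1)

    def forward(self, frames):
        # frames: (B, T, 6, H, W) -- T stereo pairs, left+right stacked on channels
        b, t, c, h, w = frames.shape
        f = self.feat2d(frames.reshape(b * t, c, h, w))      # per-frame 2-D features
        _, fc, fh, fw = f.shape
        f = f.reshape(b, t, fc, fh, fw).permute(0, 2, 1, 3, 4)  # (B, C, T, H/2, W/2)
        f = self.feat3d(f)                                    # fuse space and time
        fused = f.mean(dim=2)                                 # collapse the time axis
        return self.up(fused)                                 # (B, 1, H, W) parallax map
```

For example, `DisparityNet()(torch.randn(1, 4, 6, 64, 128))` returns a `(1, 1, 64, 128)` parallax tensor for a sequence of four stereo frames.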
In a preferred embodiment, the step of acquiring the scene obstacle data by the scene depth acquisition module comprises:
and performing matching algorithm processing on the parallax calculation map acquired by the parallax calculation module to obtain a depth information map, performing threshold segmentation on the depth information map, and extracting a useful area in the original image.
And processing the depth information map by using a three-dimensional reconstruction algorithm to obtain a result containing the three-dimensional coordinate information of the characteristic points.
And acquiring and filtering the feature points, extracting coordinate information of the feature points, and further processing the feature points from two aspects of depth of field information filtering and obstacle height information filtering.
And constructing an obstacle detection area, and processing the filtered characteristic point coordinate information to obtain data for realizing an obstacle avoidance function.
Specifically, the input of the matching algorithm is the image pair used for the parallax calculation map, and matched pixel points of the images in the left and right visual fields are searched for using the SAD (sum of absolute differences) matching cost function.
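As a hedged illustration of SAD-based matching and depth recovery (OpenCV's StereoBM is one SAD-cost block matcher, used here only as a stand-in; the focal length, baseline and working-range thresholds are assumptions):

```python
import cv2
import numpy as np

def depth_from_sad_matching(rect_l_gray, rect_r_gray, focal_px, baseline_m):
    """SAD block matching on a rectified grayscale pair, then depth recovery
    via Z = f * B / d, followed by a simple threshold segmentation."""
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)  # SAD cost
    disp = matcher.compute(rect_l_gray, rect_r_gray).astype(np.float32) / 16.0
    disp[disp <= 0] = np.nan                      # no valid match found
    depth = focal_px * baseline_m / disp          # Z = f * B / d
    # threshold segmentation: keep only the useful near-field region
    useful = (depth > 0.3) & (depth < 5.0)        # assumed working range in metres
    return depth, useful
```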
Specifically, the scene depth acquiring module extracts an effective region in the background by filtering the background.
Specifically, the scene depth acquisition module obtains coordinate data of feature points in the target scene through a result of three-dimensional reconstruction processing.
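The following sketch shows one way the feature point coordinates could be recovered (the ORB detector and the reuse of the reprojection matrix Q from the rectification sketch above are assumptions; the patent specifies neither):

```python
import cv2
import numpy as np

def feature_points_3d(rect_l_gray, disparity, Q, max_points=500):
    """Detect feature points in the rectified left image and use the
    reprojection matrix Q (from cv2.stereoRectify) to recover their
    three-dimensional coordinates from the disparity map."""
    orb = cv2.ORB_create(nfeatures=max_points)      # detector choice is an assumption
    keypoints = orb.detect(rect_l_gray, None)
    cloud = cv2.reprojectImageTo3D(disparity, Q)    # (H, W, 3) array of X, Y, Z
    pts = []
    for kp in keypoints:
        u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
        X, Y, Z = cloud[v, u]
        if np.isfinite(Z) and Z > 0:                # keep only valid reconstructions
            pts.append((X, Y, Z))
    return np.array(pts)
```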
Specifically, the scene depth acquisition module reads the coordinate information of the feature points of the target scene, and then further filters the feature points from the aspects of depth information filtering, obstacle height information filtering and the like, so as to obtain useful feature points.
Specifically, the scene depth acquisition module first constructs three detection regions and obtains the obstacle data of the scene by detecting the number and distribution of useful feature points in each region.
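A minimal sketch of the filtering and region test described above follows; the depth and height thresholds, the left/centre/right region boundaries and the point-count criterion are all illustrative assumptions, not values from the patent.

```python
import numpy as np

def obstacle_regions(points_3d, max_depth=3.0, min_height=-1.2, max_height=0.5):
    """Filter feature points by depth-of-field and obstacle height, then count
    the survivors in three detection regions (left / centre / right)."""
    X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    keep = (Z < max_depth) & (Y > min_height) & (Y < max_height)
    pts = points_3d[keep]
    regions = {"left": 0, "centre": 0, "right": 0}
    for x, _, _ in pts:
        if x < -0.3:
            regions["left"] += 1
        elif x > 0.3:
            regions["right"] += 1
        else:
            regions["centre"] += 1
    # a region is flagged as containing an obstacle if enough points fall in it
    return {name: count > 20 for name, count in regions.items()}
```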
In a preferred embodiment, the step of playing the video by the image real-time mapping module comprises:
the method comprises the steps of establishing a left eye plane mapping layer and a right eye plane mapping layer aiming at left and right visual fields respectively, rendering images of the left and right eye visual fields processed by a scene depth acquisition module onto the left eye plane mapping layer and the right eye plane mapping layer respectively, displaying views of the left eye plane mapping layer and the right eye plane mapping layer at the same position of VR glasses simultaneously, adjusting a left eyepiece of the VR glasses to acquire the view of the left eye plane mapping layer and adjusting a right eyepiece of the VR glasses to acquire the view of the right eye plane mapping layer, and accordingly achieving real-time mapping of images of the VR glasses in a VR scene and presenting the images to a user for immersive virtual scene experience.
The embodiments of the present invention have been described in detail, but they are merely examples, and the present invention is not limited to the embodiments described above. Any equivalent modifications and substitutions that would occur to those skilled in the art are also within the scope of the present invention. Accordingly, equivalent changes and modifications made without departing from the spirit and scope of the present invention should be covered by it.
Claims (10)
1. A six degree-of-freedom VR glasses comprising: the system comprises an image acquisition module, a parallax calculation module, a scene depth acquisition module and an image real-time mapping module;
the image acquisition module acquires image data of a target scene in a left visual field and a right visual field through a binocular camera;
the parallax calculation module is used for performing parallax calculation on the image data acquired in the left and right visual fields to obtain a parallax calculation map;
the scene depth obtaining module is used for processing the parallax calculation map obtained by the parallax calculation module to obtain scene obstacle data;
and the image real-time mapping module is used for matching the scene obstacle data with the image and mapping the image in the VR environment in real time to form a video stream.
2. The six-degree-of-freedom VR glasses of claim 1, wherein the step of the image acquisition module acquiring image data of the target scene in the left and right visual fields comprises:
arranging the two cameras of the binocular camera side by side in parallel on the front surface of the supporting shell, facing the direction of the image acquisition target, for acquiring the target image;
vertically installing a distance sensor on the front surface of the supporting shell with its sensing direction directly facing the image acquisition target, for sensing the distance of the detected target;
when the acquisition target is within the set acquisition sensing area, as detected by the distance sensor, starting the binocular camera to acquire an image of the target;
and preprocessing the acquired image and transmitting the preprocessed image to a parallax calculation module.
3. The six-degree-of-freedom VR glasses of claim 2, wherein the image acquisition module is an integrated parallel-optical-axis binocular camera module with a fixed baseline distance.
4. The six-degree-of-freedom VR glasses of claim 1 wherein the step of calculating binocular disparity by the disparity calculation module includes:
calibrating a binocular camera: calibrating the binocular camera to respectively obtain an internal parameter matrix and an external parameter matrix of the left and right cameras;
binocular image correction: respectively carrying out distortion removal processing on images shot by the left and right eye cameras through the internal parameter matrix; then, the internal parameter matrix and the external parameter matrix are combined to carry out binocular image correction processing, and the same point in the three-dimensional space is projected onto the same horizontal scanning line of the two-dimensional left-eye image and the two-dimensional right-eye image;
two-dimensional feature extraction: selecting a two-dimensional convolution neural network, and carrying out neural network training to serve as a two-dimensional feature extractor; sending the corrected binocular image into a two-dimensional feature extractor for forward propagation to obtain a feature map subjected to feature transformation;
three-dimensional feature extraction: selecting a three-dimensional convolutional neural network, carrying out neural network training, and using it as a three-dimensional feature extractor, wherein the three-dimensional feature extractor performs multi-level feature extraction and transformation on the spatial and temporal dimensions so as to fuse spatial and temporal information and obtain a multi-frame-fused feature map; superposing the feature map of the current frame and the feature maps of the previous multi-frame images on the feature dimension of the feature maps obtained by two-dimensional feature extraction, and sending the result into the three-dimensional feature extractor for forward propagation to extract features and obtain the feature map of the multi-frame images;
a parallax calculation step: performing transposed convolution on the feature map of the multi-frame images obtained in the three-dimensional feature extraction step and converting it back to pixels to obtain a parallax calculation map.
5. The six-degree-of-freedom VR glasses of claim 1 wherein the scene depth capture module capturing scene obstacle data includes:
performing matching algorithm processing on the parallax calculation map acquired by the parallax calculation module to obtain a depth information map, performing threshold segmentation on the depth information map, and extracting a useful region in an original image;
processing the depth information map by using a three-dimensional reconstruction algorithm to obtain a result containing three-dimensional coordinate information of the characteristic points;
acquiring and filtering feature points, extracting coordinate information of the feature points, and further processing the feature points from two aspects of depth of field information filtering and obstacle height information filtering;
and constructing an obstacle detection area, and processing the filtered characteristic point coordinate information to obtain data for realizing an obstacle avoidance function.
6. The VR glasses of claim 5, wherein the input of the matching algorithm is the image pair used for the parallax calculation map, and matched pixel points of the images in the left and right visual fields are searched for using the SAD matching cost function.
7. The VR glasses of claim 5 wherein the scene depth capture module obtains coordinate data of feature points in the target scene from the result of three-dimensional reconstruction processing.
8. The VR glasses of claim 5, wherein the scene depth obtaining module reads coordinate information of the feature points of the target scene, and then further filters the feature points from depth information filtering, obstacle height information filtering, and the like, so as to obtain useful feature points.
9. The VR glasses of claim 5, wherein the scene depth acquisition module first constructs three detection regions and obtains the obstacle data of the scene by detecting the number and distribution of useful feature points in each region.
10. The six-degree-of-freedom VR glasses of claim 1 wherein the step of the image real-time mapping module playing the video includes:
and respectively establishing a left eye plane mapping layer and a right eye plane mapping layer, respectively rendering images of left and right eye views processed by the scene depth acquisition module to the left eye plane mapping layer and the right eye plane mapping layer, and displaying the mapping layers on VR glasses.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011001694.2A CN112085777A (en) | 2020-09-22 | 2020-09-22 | Six-degree-of-freedom VR glasses |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011001694.2A CN112085777A (en) | 2020-09-22 | 2020-09-22 | Six-degree-of-freedom VR glasses |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112085777A true CN112085777A (en) | 2020-12-15 |
Family
ID=73739532
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011001694.2A Pending CN112085777A (en) | 2020-09-22 | 2020-09-22 | Six-degree-of-freedom VR glasses |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112085777A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113113128A (en) * | 2021-04-15 | 2021-07-13 | 王小娟 | Medical operation auxiliary system and method based on VR, algorithm and 5G technology |
CN116634638A (en) * | 2023-05-16 | 2023-08-22 | 珠海光通智装科技有限公司 | Light control strategy generation method, light control method and related device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20180117867A (en) * | 2017-04-20 | 2018-10-30 | 스크린커플스(주) | 360 degrees Fisheye Rendering Method for Virtual Reality Contents Service |
CN109308719A (en) * | 2018-08-31 | 2019-02-05 | 电子科技大学 | A kind of binocular parallax estimation method based on Three dimensional convolution |
CN110264510A (en) * | 2019-05-28 | 2019-09-20 | 北京邮电大学 | A method of image zooming-out depth of view information is acquired based on binocular |
CN111047709A (en) * | 2019-11-29 | 2020-04-21 | 暨南大学 | Binocular vision naked eye 3D image generation method |
- 2020-09-22: CN application CN202011001694.2A — patent CN112085777A (en), status: active, Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20180117867A (en) * | 2017-04-20 | 2018-10-30 | 스크린커플스(주) | 360 degrees Fisheye Rendering Method for Virtual Reality Contents Service |
CN109308719A (en) * | 2018-08-31 | 2019-02-05 | 电子科技大学 | A kind of binocular parallax estimation method based on Three dimensional convolution |
CN110264510A (en) * | 2019-05-28 | 2019-09-20 | 北京邮电大学 | A method of image zooming-out depth of view information is acquired based on binocular |
CN111047709A (en) * | 2019-11-29 | 2020-04-21 | 暨南大学 | Binocular vision naked eye 3D image generation method |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113113128A (en) * | 2021-04-15 | 2021-07-13 | 王小娟 | Medical operation auxiliary system and method based on VR, algorithm and 5G technology |
CN116634638A (en) * | 2023-05-16 | 2023-08-22 | 珠海光通智装科技有限公司 | Light control strategy generation method, light control method and related device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102417177B1 (en) | Head-mounted display for virtual and mixed reality with inside-out positional, user body and environment tracking | |
JP5260705B2 (en) | 3D augmented reality provider | |
JP4198054B2 (en) | 3D video conferencing system | |
CN104883556A (en) | Three dimensional display method based on augmented reality and augmented reality glasses | |
US10560683B2 (en) | System, method and software for producing three-dimensional images that appear to project forward of or vertically above a display medium using a virtual 3D model made from the simultaneous localization and depth-mapping of the physical features of real objects | |
KR100560464B1 (en) | Multi-view display system with viewpoint adaptation | |
CN108885342A (en) | Wide Baseline Stereo for low latency rendering | |
CN107015655A (en) | Museum virtual scene AR experiences eyeglass device and its implementation | |
CN104599317A (en) | Mobile terminal and method for achieving 3D (three-dimensional) scanning modeling function | |
CN101729920A (en) | Method for displaying stereoscopic video with free visual angles | |
CN114119739A (en) | Binocular vision-based hand key point space coordinate acquisition method | |
CN111047678B (en) | Three-dimensional face acquisition device and method | |
CN112085777A (en) | Six-degree-of-freedom VR glasses | |
CN109035307A (en) | Setting regions target tracking method and system based on natural light binocular vision | |
CN113253845A (en) | View display method, device, medium and electronic equipment based on eye tracking | |
TWI589150B (en) | Three-dimensional auto-focusing method and the system thereof | |
EP3130273B1 (en) | Stereoscopic visualization system and method for endoscope using shape-from-shading algorithm | |
CN111915739A (en) | Real-time three-dimensional panoramic information interactive information system | |
CN110087059B (en) | Interactive auto-stereoscopic display method for real three-dimensional scene | |
CN104463958A (en) | Three-dimensional super-resolution method based on disparity map fusing | |
KR20150040194A (en) | Apparatus and method for displaying hologram using pupil track based on hybrid camera | |
CN115190286B (en) | 2D image conversion method and device | |
WO2018187743A1 (en) | Producing three-dimensional images using a virtual 3d model | |
WO2023277020A1 (en) | Image display system and image display method | |
CN112052827B (en) | Screen hiding method based on artificial intelligence technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||