CN113866987A - Method for interactively adjusting interpupillary distance and image surface of augmented reality helmet display by utilizing gestures - Google Patents

Method for interactively adjusting interpupillary distance and image surface of augmented reality helmet display by utilizing gestures

Info

Publication number
CN113866987A
CN113866987A
Authority
CN
China
Prior art keywords
gesture
adjustment
image plane
gestures
interpupillary distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111154355.2A
Other languages
Chinese (zh)
Inventor
陈靖
倪科
王剑
雷霆
杨露梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202111154355.2A priority Critical patent/CN113866987A/en
Publication of CN113866987A publication Critical patent/CN113866987A/en

Classifications

    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • G02B27/017Head mounted
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/0075Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00 with means for altering, e.g. increasing, the depth of field or depth of focus
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/0081Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00 with means for altering, e.g. enlarging, the entrance or exit pupil
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/61Scene description

Abstract

In order to solve the problem that adjusting the interpupillary distance and the image plane of an augmented reality helmet is not natural enough, the method for interactively adjusting the interpupillary distance and the image plane of an augmented reality helmet display by gestures comprises the following steps: shooting gesture actions in a real scene, numbering them according to preset gesture types, and constructing a gesture recognition model from the gesture actions; creating two virtual cameras in a virtual scene rendered by a three-dimensional rendering engine and setting an initial distance between them to simulate the left and right human eyes; creating two empty objects as carriers of the pictures presented to the simulated left and right eyes, associating the two virtual cameras with the two empty objects respectively, and setting the depth of the imaging picture; using the three-dimensional rendering engine to render prompt information for interpupillary distance and image plane adjustment independently on one of the empty objects simulating the pictures presented to the left and right eyes; and defining virtual scene logic in the three-dimensional rendering engine and mapping the gesture numbers to the virtual scene logic.

Description

Method for interactively adjusting interpupillary distance and image surface of augmented reality helmet display by utilizing gestures
Technical Field
The invention belongs to the technical field of augmented reality, and particularly relates to a method for interactively adjusting the interpupillary distance and the image plane of an augmented reality helmet display by utilizing gestures.
Background
Augmented Reality (AR) technology overlays computer-generated virtual information onto the user's real world, and is characterized by virtual-real combination, real-time interaction, and three-dimensional registration. Unlike virtual reality, AR technology uses three-dimensional registration to compute the positions of virtual objects in the real environment, and augments the real world by bringing virtual objects or information from the computer into it.
With recent technological development, AR has been widely applied in fields such as industry, the military, medicine, and education. Near-eye display systems, which evolved from head-mounted displays, have become one of the main forms of AR display and can be divided into video see-through and optical see-through displays according to how they present the real environment. For an AR near-eye device, whether its display creates an immersive mixed environment for the user is critical to the success or failure of the whole system.
At present, the virtual images displayed by most AR near-eye display devices are rendered according to the principle of binocular stereoscopic vision: the left- and right-eye views shown by the AR helmet display should restore, as closely as possible, the binocular parallax of human eyes viewing real objects, so that the rendered image appears vivid and stereoscopic. In this rendering process, the main factor affecting the stereoscopic display of the virtual model is the distance between the optical centers of the virtual cameras corresponding to the left and right screens of the AR helmet display, i.e., the virtual "interpupillary distance". If the user's actual interpupillary distance does not match the distance between the optical centers of the two virtual cameras of the AR helmet, a prism effect arises: the stereoscopic effect is reduced, the left- and right-eye pictures may fail to fuse, and users whose interpupillary distance differs greatly from the virtual one may experience visual fatigue, dizziness, nausea, and similar symptoms. At the same time, the distance at which the AR helmet's display picture is imaged in front of the eyes is also an important factor in whether the user obtains an immersive experience and comfortable interaction when using the device.
Most AR near-eye display devices currently on the market adjust the virtual interpupillary distance and the imaging distance of the display picture either physically or in software, mainly by keys on the device or by changing the corresponding parameters in companion software. Both approaches have problems and do not fully consider the errors a user may introduce during the adjustment and calibration stage or how the interaction feels. With physical adjustment, pressing keys on the device after the AR helmet is worn can change the original relative position between the helmet and the eyes, which shifts the whole display picture and degrades wearing comfort during adjustment. With software adjustment, the user must simultaneously attend to the sharpness of the display picture in the helmet and the numeric value being entered, which increases the burden of use and visual fatigue; such an adjustment mode is not natural enough.
To make interpupillary distance and image plane adjustment natural and efficient while the user is wearing an AR helmet, both can be adjusted in real time through natural gesture interaction. Real-time gesture interaction offers fast recognition, high feasibility, and convenient operation, so the virtual scene of the AR helmet can be prepared offline while the interpupillary distance and image plane of the AR helmet display are adjusted online through real-time gestures. The virtual cameras simulating human eyes and the imaging pictures in the virtual scene can be set up with a three-dimensional rendering engine; mature engines such as Unity3D and Unreal can both generate virtual objects for augmented reality and build a simulation environment. During real-time gesture interaction, the user first makes the corresponding gesture according to the current AR helmet display picture and the adjustment requirement; a camera captures the user's current gesture in real time; a gesture recognition algorithm then classifies and recognizes the user's static and dynamic gestures; the recognition result is transmitted to the virtual scene; finally, the virtual cameras and imaging pictures are moved according to the predefined scene logic, completing the interpupillary distance and image plane adjustment of the AR helmet display.
Disclosure of Invention
In order to solve the problem that adjusting the interpupillary distance and the image plane of an augmented reality helmet is not natural enough, the invention provides a method for interactively adjusting the interpupillary distance and the image plane of an augmented reality helmet display by using gestures, so that both can be adjusted more naturally and efficiently while the helmet is worn.
In order to achieve the purpose, the technical scheme of the invention is as follows:
the method for interactively adjusting the interpupillary distance and the image plane of the augmented reality helmet display by using gestures is characterized by comprising an off-line stage and an on-line stage, wherein the off-line stage specifically comprises the following steps:
Step one: shooting gesture actions in a real scene, numbering them according to preset gesture types, and constructing a gesture recognition model from the gesture actions;
Step two: using a three-dimensional rendering engine to create two virtual cameras in the rendered virtual scene, and setting an initial distance between them to simulate the left and right human eyes; creating two empty objects as carriers of the pictures presented to the simulated left and right eyes, associating the two virtual cameras with the two empty objects respectively, and setting the depth of the imaging picture;
Step three: using the three-dimensional rendering engine to render prompt information for interpupillary distance and image plane adjustment independently on one of the empty objects simulating the pictures presented to the left and right eyes;
Step four: defining virtual scene logic in the three-dimensional rendering engine, and mapping the gesture numbers to the virtual scene logic;
wherein the on-line stage specifically comprises:
Step 1: a camera acquires a gesture image sequence of the user and transmits it to the gesture recognition model;
Step 2: after receiving the current gesture image sequence, the gesture recognition model outputs a gesture recognition result and sends the number of the recognition result to the three-dimensional rendering engine;
Step 3: after receiving the number, the three-dimensional rendering engine changes the positional relationships between the corresponding components in the virtual scene according to the virtual scene logic.
The invention has the beneficial effects that:
compared with the prior art, the method can solve the problem that the adjusting mode of matching the pupil distance of the user and changing the distance of the position of the imaging picture under the augmented reality helmet is unnatural, and has the effect of enabling the pupil distance and the image plane to be adjusted more flexibly and efficiently.
Drawings
Fig. 1 is a flowchart of a method for interactively adjusting the interpupillary distance and the image plane of an augmented reality head-mounted display by using gestures according to an embodiment of the present invention.
Detailed Description
The invention is described in detail below by way of example with reference to the accompanying drawings.
The method for interactively adjusting the interpupillary distance and the image plane of the augmented reality helmet display by using gestures comprises an offline stage and an online stage, wherein the offline stage specifically comprises the following steps:
Step one: shooting gesture actions in a real scene, numbering them according to preset gesture types, and constructing a gesture recognition model from the gesture actions;
in this embodiment, the gesture types are divided into static gestures and dynamic gestures.
Step two, using a three-dimensional rendering engine to create two virtual cameras in a rendered virtual scene, and setting an initial distance between the two virtual cameras to simulate the left and right eyes of a human; creating two empty objects as carriers of pictures presented by the left eye and the right eye of the simulated human, respectively corresponding the two virtual cameras to the two empty objects, and setting the depth of an imaging picture;
independently rendering prompt information of interpupillary distance and image plane adjustment on a hollow object simulating a picture presented by the left eye and the right eye of a human by using a three-dimensional rendering engine;
step four: and defining virtual scene logic in the three-dimensional rendering engine, and corresponding the gesture number with the virtual scene logic.
In this embodiment, the virtual scene logic includes adjustment of the interpupillary distance and the image plane, switching between adjustment interfaces, reset adjustment, and exit adjustment.
Wherein the online phase specifically comprises:
step 1: the camera acquires a gesture image sequence of a user in real time and transmits the gesture image sequence to the gesture recognition model;
in specific implementation, after a user wears the AR helmet and before the camera acquires a gesture image sequence of the user in real time, a correct pupil distance adjustment or image plane adjustment dynamic gesture and a static gesture for confirming task completion are made according to a current display picture and an adjustment requirement.
Step 2: after receiving the current gesture image sequence, the gesture recognition model outputs a gesture recognition result and sends the number of the recognition result to the three-dimensional rendering engine;
Step 3: after receiving the number, the three-dimensional rendering engine changes the positional relationships between the corresponding components in the virtual scene according to the virtual scene logic.
In this embodiment, changing the positional relationships between the corresponding components in the virtual scene specifically includes: interpupillary distance adjustment, image plane adjustment, interface switching and reset adjustment, and exit adjustment. Specifically:
the pupil distance adjustment is that the distance between the left camera and the right camera in the virtual scene is changed through a dynamic gesture based on hand translation, and when a display picture in the virtual scene is clear and is combined, the user makes a defined confirmation static gesture to complete the pupil distance adjustment and confirmation tasks.
For image plane adjustment, the user changes the distance between the cameras and the empty objects carrying the picture in the virtual scene through a dynamic gesture based on hand translation; when the virtual picture is at a suitable distance in front of the eyes, the user makes the defined confirmation static gesture to complete the image plane adjustment and confirmation task.
For interface switching and reset adjustment, the user makes the corresponding gestures for interface switching, reset adjustment, and exit adjustment as required; switching between the interpupillary distance and image plane adjustment interfaces, reset adjustment, and exit adjustment are each completed through different static gestures.
For exit adjustment, after confirming the interpupillary distance and image plane adjustment results, the user makes the static exit gesture and leaves the adjustment stage, completing the whole adjustment process.
Example 1:
The following embodiment is described in detail in terms of an off-line stage and an on-line stage, as shown in Fig. 1, and specifically includes:
First, off-line stage
Step 1: First, gesture types and action specifications for interaction are defined according to the adjustment requirements for the interpupillary distance and the image plane, and each gesture type is numbered. The gesture types comprise 4 static gestures and 2 dynamic gestures. The static gestures are mainly used for functions such as interface switching, reset adjustment, and exit adjustment; the dynamic gestures are divided into an initial action and a continuous action and are mainly used for functions such as changing parameters and adjusting distances. The specific action specifications are shown in the following table:
(Table of gesture action specifications, available only as an image in the original publication.)
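The gesture table itself is reproduced only as an image above. As a purely illustrative sketch (the numbering and the Python names are assumptions, not the patent's table), the six gesture classes and the roles described in this embodiment could be enumerated as follows:

```python
from enum import IntEnum

class Gesture(IntEnum):
    """Hypothetical numbering of the 6 gesture classes used in this embodiment.

    The 4 static gestures drive interface switching, reset and exit; the 2
    dynamic gestures (an initial action followed by a continuous action)
    drive parameter changes such as the inter-camera distance and the
    image-plane depth. The concrete hand shapes are given only in the
    table image, so these names are illustrative assumptions.
    """
    LEFT_FINGER = 0    # static: switch to the previous adjustment interface
    RIGHT_FINGER = 1   # static: switch to the next adjustment interface
    RESET_C = 2        # static "C": reset the current adjustment
    CONFIRM_OK = 3     # static "OK": confirm the result / exit adjustment
    GRAB_LEFT = 4      # dynamic: grab and translate the hand to the left
    GRAB_RIGHT = 5     # dynamic: grab and translate the hand to the right

STATIC_GESTURES = {Gesture.LEFT_FINGER, Gesture.RIGHT_FINGER,
                   Gesture.RESET_C, Gesture.CONFIRM_OK}
DYNAMIC_GESTURES = {Gesture.GRAB_LEFT, Gesture.GRAB_RIGHT}
```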
step 2: and (3) shooting the gesture action defined in the step (1) in a real scene through a monocular camera, and then storing the shot video stream, wherein the whole gesture collection process comprises 20 tested subjects. In the static gesture collection process, a camera is used for recording 4 static actions for 3 times respectively, and each time is recorded for 6 seconds; in the dynamic gesture collection process, the camera records 2 dynamic actions for 3 times respectively, and the recording time for each time is the time consumed by the testee for completing one grabbing-sliding.
Step 3: The video streams saved in Step 2 are preprocessed, and the acquired gesture image sequences are labeled with an annotation tool. The specific implementation steps comprise:
(1) image sequence pre-processing
In the off-line stage, the saved video streams are converted into image sequences and stored in grayscale format using the video stream processing functions of OpenCV.
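A minimal sketch of this preprocessing step with OpenCV is given below; the file paths and frame-naming convention are assumptions, and only the video-to-grayscale-sequence conversion itself comes from the description:

```python
import os
import cv2

def video_to_grayscale_frames(video_path: str, out_dir: str) -> int:
    """Convert a recorded gesture video into a grayscale image sequence."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()              # next BGR frame, if any
        if not ok:
            break                           # end of the video stream
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        cv2.imwrite(os.path.join(out_dir, f"frame_{idx:05d}.png"), gray)
        idx += 1
    cap.release()
    return idx                              # number of frames written
```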
(2) Gesture image sequence annotation
The acquired gesture image sequences are labeled with the LabelImg annotation tool: blurred images are removed during labeling, the regions of the thumb, index finger, and palm are labeled in each clear frame, and the corresponding labels are written.
Step 4: According to the gesture actions defined in Step 1, a gesture recognition model is constructed to recognize static and dynamic gestures. The specific implementation steps comprise:
(1) static gesture recognition
A neural network is built with the Python-based PyTorch framework and run in PyCharm. The network adopts the YOLO-v5 model; grayscale images of the 4 static gestures and of the initial actions of the 2 dynamic gestures are used as network input, training is performed by extracting features from the gesture image sequences, and a gesture recognition result is output. If the recognition result is a static gesture, it is output directly by YOLO-v5; otherwise, a sequence of several frames adjacent to the current frame is fed into the dynamic gesture recognition network for processing.
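No code is given in the description; the following sketch shows how a YOLO-v5 detector trained on the labeled gesture frames might be loaded and queried for one frame. The torch.hub entry point, the weight file name, and the confidence handling are assumptions:

```python
import cv2
import torch

# Load custom YOLO-v5 weights trained on the labeled gesture frames.
# The 'ultralytics/yolov5' hub repo and the 'best.pt' path are assumptions.
model = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

def recognize_gesture_frame(frame_gray):
    """Return (class_id, confidence) of the highest-confidence detection, or None."""
    img = cv2.cvtColor(frame_gray, cv2.COLOR_GRAY2RGB)  # replicate the single channel
    results = model(img)
    det = results.xyxy[0]                # rows: [x1, y1, x2, y2, conf, cls]
    if det.shape[0] == 0:
        return None                      # no gesture detected in this frame
    best = det[det[:, 4].argmax()]
    return int(best[5]), float(best[4])

# If the predicted class is a static gesture it is used directly; otherwise
# the neighbouring frames are forwarded to the dynamic gesture network.
```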
(2) Dynamic gesture recognition
A neural network is built and trained with the PyTorch framework, using ResNet50 as the backbone network and the A2J algorithm for hand keypoint detection; hand joint positions are predicted by aggregating the estimates of multiple anchor points. A grayscale image sequence of a dynamic gesture is taken as input, the network tracks and predicts the joint points of the thumb and index finger across the sequence, and a dynamic gesture recognition result is output.
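The core of A2J is that every joint is estimated as a response-weighted aggregation of per-anchor predictions; the fragment below sketches only that aggregation step, with tensor shapes and names that are assumptions rather than the authors' implementation:

```python
import torch

def aggregate_joints(anchor_xy, offsets, responses):
    """A2J-style aggregation: each joint is a softmax-weighted sum, over all
    anchor points, of (anchor position + predicted offset to that joint).

    anchor_xy : (A, 2)    in-plane coordinates of the A anchor points
    offsets   : (A, J, 2) per-anchor offset towards each of the J hand joints
    responses : (A, J)    per-anchor informativeness for each joint
    returns   : (J, 2)    estimated joint positions
    """
    weights = torch.softmax(responses, dim=0)            # normalise over anchors
    candidates = anchor_xy.unsqueeze(1) + offsets         # (A, J, 2)
    return (weights.unsqueeze(-1) * candidates).sum(dim=0)

# Example with 256 anchors and 21 hand joints (dimensions are illustrative):
joints = aggregate_joints(torch.rand(256, 2) * 176,
                          torch.randn(256, 21, 2),
                          torch.randn(256, 21))           # -> shape (21, 2)
```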
Step 5: The three-dimensional rendering engine Unity is used to render the virtual information picture of the AR helmet. Two virtual cameras must be set up to simulate the left and right human eyes; their initial positions in the Unity world coordinate system are set to (-0.032, 0, 0) for the left camera and (0.032, 0, 0) for the right camera, in meters, i.e., the distance between the centers of the two virtual cameras is 0.032 - (-0.032) = 0.064 m. The interpupillary distance of typical human eyes ranges from 58 mm to 68 mm, and the adjustable range between the two virtual cameras is set to 50 mm to 80 mm.
Step 6: Two 3D Quad objects are placed in the three-dimensional rendering engine Unity as carriers of the left- and right-eye display pictures, and the two virtual cameras from Step 5 are assigned to the left and right Quads respectively for imaging. The Z-axis value of the two Quads in the Unity world coordinate system is set to 5, in meters, i.e., the virtual imaging picture is rendered 5 m in front of the eyes. The adjustable depth range of the imaging picture is 1 m to 10 m.
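The camera placement in Step 5 and the Quad depth in Step 6 amount to simple arithmetic with range clamping; the Python sketch below (the function names are illustrative, and it is not part of the Unity project itself) summarizes that parameterization:

```python
def camera_positions(inter_camera_m: float):
    """Symmetric left/right virtual-camera x-coordinates for a given
    inter-camera distance, clamped to the 50-80 mm adjustable range."""
    d = min(max(inter_camera_m, 0.050), 0.080)
    half = d / 2.0
    return (-half, 0.0, 0.0), (half, 0.0, 0.0)   # left, right (metres)

def image_plane_z(depth_m: float) -> float:
    """Z value of the left/right Quad carriers, clamped to the 1-10 m range."""
    return min(max(depth_m, 1.0), 10.0)

# Initial configuration from the embodiment: 64 mm between camera centres,
# imaging picture rendered 5 m in front of the eyes.
left_cam, right_cam = camera_positions(0.064)   # (-0.032, 0, 0), (0.032, 0, 0)
quad_depth = image_plane_z(5.0)                 # 5 m
```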
Step 7: The three-dimensional rendering engine is used to render, in the virtual scene, a gesture icon and a scale bar indicating whether the user is currently in an adjustment state. This prompt information is rendered only on the Quad component of the virtual camera simulating the right eye, so that the user does not misjudge the prompt because of image-fusion failure caused by a mismatched interpupillary distance during adjustment.
Step 8: Scene logic is defined in the three-dimensional rendering engine, including switching between interpupillary distance and image plane adjustment, reset adjustment, and exit adjustment. The correspondence between each gesture number and the scene logic in Unity is shown in the following table:
(Table mapping gesture numbers to scene logic, available only as an image in the original publication.)
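Since the mapping table is only available as an image, the sketch below shows one way such a gesture-number-to-scene-logic dispatch could be expressed; the particular pairing of numbers to actions and the method names on the scene object are assumptions:

```python
def apply_scene_logic(gesture_id: int, scene) -> None:
    """Dispatch a recognised gesture number to the corresponding scene logic.

    The real number-to-logic mapping is defined in the table above, which is
    reproduced only as an image, so this pairing is hypothetical.
    """
    actions = {
        0: scene.switch_to_previous_interface,   # static "left finger"
        1: scene.switch_to_next_interface,       # static "right finger"
        2: scene.reset_current_adjustment,       # static "C"
        3: scene.confirm_and_exit,               # static "OK"
        4: scene.decrease_current_parameter,     # dynamic grab to the left
        5: scene.increase_current_parameter,     # dynamic grab to the right
    }
    action = actions.get(gesture_id)
    if action is not None:
        action()
```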
Second, on-line stage
step 1: and (3) a user wears a set of augmented reality helmet and performs dynamic gesture actions of pupil distance adjustment or image plane adjustment and static gestures for confirming task completion in the step 1 in the off-line stage according to the current display picture and the adjustment requirement.
Step 2: and acquiring a gesture image sequence of the current frame of the user in real time by using a camera, and transmitting the gesture image sequence to the gesture recognition model.
Step 3: After receiving the current gesture image sequence, the gesture recognition model outputs a gesture recognition result and sends it to the three-dimensional rendering engine over TCP, in the form of the numbers defined in Step 1 of the off-line stage.
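A minimal sketch of this hand-off from the recognition process to the rendering engine is shown below; the host, port, and one-number-per-line framing are assumptions, since the patent only states that TCP communication is used:

```python
import socket

def send_gesture_number(gesture_id: int,
                        host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send one recognised gesture number to the rendering engine over TCP."""
    with socket.create_connection((host, port), timeout=1.0) as sock:
        sock.sendall(f"{gesture_id}\n".encode("ascii"))
```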
Step 4: After receiving the gesture number, the three-dimensional rendering engine changes the positional relationships between the corresponding components in the virtual scene according to the scene logic defined in Step 8 of the off-line stage. The specific adjustment process is as follows:
(1) interpupillary distance adjustment
On the interpupillary distance adjustment interface, the user changes the distance between the left and right cameras in the virtual scene through leftward/rightward grab dynamic gestures; when the display picture in the virtual scene is clear and fused, the user releases the current dynamic gesture to confirm the interpupillary distance.
(2) Image plane adjustment
On the image plane adjustment interface, the user changes the distance between the cameras and the empty objects carrying the picture in the virtual scene through leftward/rightward grab dynamic gestures; when the virtual picture is at a suitable distance in front of the eyes, the user releases the current dynamic gesture to confirm the image plane.
(3) Interface switching and reset adjustment
After the interpupillary distance or image plane has been adjusted, releasing the left/right grab dynamic gesture stops the adjustment and stores the current result; the user can then make gestures for interface switching, reset adjustment, or exit adjustment as needed. Switching between the interpupillary distance and image plane adjustment interfaces is completed with the static gestures "left finger" and "right finger"; reset adjustment uses the static gesture "C".
(4) Exit adjustment
After confirming the interpupillary distance and image plane adjustment results, the user makes the static gesture "OK", exits the adjustment stage, and completes the whole adjustment process.
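Taken together, the four behaviours above form a small state machine; the following Python sketch models that control flow. The gesture numbers, step sizes, and limits mirror the embodiment where stated, but the class itself and its method names are illustrative assumptions rather than the Unity-side implementation:

```python
IPD_STEP = 0.001     # metres of camera separation per grab update (illustrative)
DEPTH_STEP = 0.1     # metres of image-plane depth per grab update (illustrative)

class AdjustmentSession:
    """Illustrative state machine for the online adjustment stage."""

    def __init__(self):
        self.mode = "ipd"          # current interface: "ipd" or "image_plane"
        self.ipd = 0.064           # initial inter-camera distance (m)
        self.depth = 5.0           # initial image-plane depth (m)
        self.done = False

    def on_gesture(self, gesture_id: int) -> None:
        if gesture_id in (4, 5):                 # dynamic grab left / right
            sign = -1.0 if gesture_id == 4 else 1.0
            if self.mode == "ipd":
                self.ipd = min(max(self.ipd + sign * IPD_STEP, 0.050), 0.080)
            else:
                self.depth = min(max(self.depth + sign * DEPTH_STEP, 1.0), 10.0)
        elif gesture_id in (0, 1):               # static "left/right finger": switch
            self.mode = "image_plane" if self.mode == "ipd" else "ipd"
        elif gesture_id == 2:                    # static "C": reset current values
            self.ipd, self.depth = 0.064, 5.0
        elif gesture_id == 3:                    # static "OK": exit adjustment
            self.done = True
```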
In this way, gesture-based interpupillary distance and image plane adjustment under the augmented reality helmet is achieved.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (15)

1. A method for interactively adjusting the interpupillary distance and the image plane of an augmented reality helmet display by utilizing gestures is characterized by comprising an off-line stage and an on-line stage;
wherein the off-line stage comprises: shooting gesture actions in a real scene, numbering them according to preset gesture types, and constructing a gesture recognition model from the gesture actions; creating two virtual cameras in a virtual scene rendered by a three-dimensional rendering engine, and setting an initial distance between the two virtual cameras to simulate the left and right human eyes; creating two empty objects as carriers of the pictures presented to the simulated left and right eyes, associating the two virtual cameras with the two empty objects respectively, and setting the depth of the imaging picture; using the three-dimensional rendering engine to render prompt information for interpupillary distance and image plane adjustment independently on one of the empty objects simulating the pictures presented to the left and right eyes; and defining virtual scene logic in the three-dimensional rendering engine and mapping the gesture numbers to the virtual scene logic;
wherein the on-line stage comprises: a camera acquires a gesture image sequence of the user and transmits it to the gesture recognition model; after receiving the current gesture image sequence, the gesture recognition model outputs a gesture recognition result and sends the number of the recognition result to the three-dimensional rendering engine; and after receiving the number, the three-dimensional rendering engine changes the positional relationships between the corresponding components in the virtual scene according to the virtual scene logic.
2. The method for interactively adjusting the interpupillary distance and the image plane of an augmented reality head-mounted display by gestures as claimed in claim 1, wherein the types of gestures are classified into static gestures and dynamic gestures.
3. The method for interactively adjusting the interpupillary distance and the image plane of the augmented reality head-mounted display by gestures as claimed in claim 1 or 2, wherein the virtual scene logic comprises adjustment of the interpupillary distance and the image plane, interface switching, reset adjustment, and exit adjustment.
4. The method for interactively adjusting the interpupillary distance and the image plane of an augmented reality head-mounted display by using gestures as claimed in claim 1 or 2, wherein, in the on-line stage, after the user wears the AR helmet and before the camera acquires the user's gesture image sequence in real time, the user performs the appropriate dynamic gesture for interpupillary distance or image plane adjustment and a static gesture confirming task completion, according to the current display picture and the adjustment requirement.
5. The method for interactively adjusting the interpupillary distance and the image plane of the augmented reality head-mounted display by using gestures according to claim 1 or 2, wherein changing the positional relationships between the corresponding components in the virtual scene specifically comprises: interpupillary distance adjustment, image plane adjustment, interface switching and reset adjustment, and exit adjustment.
6. The method for interactively adjusting the interpupillary distance and the image plane of the augmented reality head-mounted display by gestures as claimed in claim 5, wherein the interpupillary distance adjustment is performed by changing the distance between the left camera and the right camera in the virtual scene through a dynamic gesture based on hand translation, and when the displayed image in the virtual scene is clear, the user makes the defined confirmation static gesture to complete the interpupillary distance adjustment and confirmation task.
7. The method as claimed in claim 5, wherein the image plane adjustment is performed by changing the distance between the camera and the empty object presenting the picture in the virtual scene through a dynamic gesture based on hand translation, and when the virtual picture is at a suitable distance in front of the eyes, the user makes the defined confirmation static gesture to complete the image plane adjustment and confirmation task.
8. The method for interactively adjusting the interpupillary distance and the image plane of the augmented reality helmet display by gestures as claimed in claim 5, wherein, for the interface switching and reset adjustment, the user makes the corresponding gestures for interface switching, reset adjustment, and exit adjustment as required, and switching between the interpupillary distance and image plane adjustment interfaces, reset adjustment, and exit adjustment are each completed by different static gestures.
9. The method for interactively adjusting the interpupillary distance and the image plane of an augmented reality helmet display by gestures as claimed in claim 5, wherein the exit adjustment is a static exit gesture made after the user confirms the interpupillary distance and image plane adjustment results, upon which the adjustment stage is exited and the whole adjustment process is completed.
10. The method for interactively adjusting the interpupillary distance and the image plane of the augmented reality helmet display by using gestures as claimed in claim 1 or 2, wherein, after the gesture actions are shot in the real scene, the captured video stream is converted into an image sequence using a video stream processing algorithm in OpenCV and saved in a grayscale image format.
11. The method for adjusting the interpupillary distance and the image plane of the augmented reality head-mounted display by using gesture interaction as claimed in claim 1 or 2, wherein the constructed gesture recognition model comprises a static gesture recognition model and a dynamic gesture recognition model.
12. The method for interactively adjusting the interpupillary distance and the image plane of the augmented reality head-mounted display through gestures as claimed in claim 11, wherein the static gesture recognition model is trained by building a neural network model using the Python-based PyTorch framework.
13. The method as claimed in claim 12, wherein the neural network model adopts the YOLO-v5 model, takes grayscale images of the static gestures and of the initial actions of the dynamic gestures as network input, is trained by extracting features from the gesture image sequences, and outputs a gesture recognition result; if the recognition result is a static gesture, the result is output directly by YOLO-v5, otherwise a sequence of several frames adjacent to the current gesture frame is input into the dynamic gesture recognition network for processing.
14. The method for interactively adjusting the interpupillary distance and the image plane of the augmented reality head-mounted display through gestures as claimed in claim 11, wherein the dynamic gesture recognition model is trained by building a neural network using the PyTorch framework.
15. The method as claimed in claim 14, wherein the neural network adopts ResNet50 as the backbone network and uses the A2J algorithm for hand keypoint detection, predicting hand joint positions by aggregating the estimates of a plurality of anchor points; a grayscale image sequence of a dynamic gesture is taken as input, the network tracks and predicts the joint points of the thumb and index finger in the image sequence, and the dynamic gesture recognition result is output.
CN202111154355.2A 2021-09-29 2021-09-29 Method for interactively adjusting interpupillary distance and image surface of augmented reality helmet display by utilizing gestures Pending CN113866987A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111154355.2A CN113866987A (en) 2021-09-29 2021-09-29 Method for interactively adjusting interpupillary distance and image surface of augmented reality helmet display by utilizing gestures

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111154355.2A CN113866987A (en) 2021-09-29 2021-09-29 Method for interactively adjusting interpupillary distance and image surface of augmented reality helmet display by utilizing gestures

Publications (1)

Publication Number Publication Date
CN113866987A true CN113866987A (en) 2021-12-31

Family

ID=79000589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111154355.2A Pending CN113866987A (en) 2021-09-29 2021-09-29 Method for interactively adjusting interpupillary distance and image surface of augmented reality helmet display by utilizing gestures

Country Status (1)

Country Link
CN (1) CN113866987A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114866757A (en) * 2022-04-22 2022-08-05 深圳市华星光电半导体显示技术有限公司 Stereoscopic display system and method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107015366A (en) * 2017-04-20 2017-08-04 苏州神罗信息科技有限公司 A kind of Intelligent Hybrid Reality glasses
CN107329256A (en) * 2016-04-28 2017-11-07 江苏慧光电子科技有限公司 Display device and its control method
CN109002164A (en) * 2018-07-10 2018-12-14 歌尔科技有限公司 It wears the display methods for showing equipment, device and wears display equipment
CN111061363A (en) * 2019-11-21 2020-04-24 青岛小鸟看看科技有限公司 Virtual reality system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107329256A (en) * 2016-04-28 2017-11-07 江苏慧光电子科技有限公司 Display device and its control method
CN107015366A (en) * 2017-04-20 2017-08-04 苏州神罗信息科技有限公司 A kind of Intelligent Hybrid Reality glasses
CN109002164A (en) * 2018-07-10 2018-12-14 歌尔科技有限公司 It wears the display methods for showing equipment, device and wears display equipment
CN111061363A (en) * 2019-11-21 2020-04-24 青岛小鸟看看科技有限公司 Virtual reality system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FU XIONG; BOSHEN ZHANG; YANG XIAO, ETC: "A2J: Anchor-to-Joint Regression Network for 3D Articulated Pose Estimation from a Single Depth Image", 《IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS, COMMUNICATIONS AND COMPUTER SCIENCES》 *
MEI-YING NG; CHIN-BOON CHNG; WAI-KIN KOH, ETC: "An enhanced self-attention and A2J approach for 3D hand pose estimation", 《MULTIMEDIA TOOLS AND APPLICATION》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114866757A (en) * 2022-04-22 2022-08-05 深圳市华星光电半导体显示技术有限公司 Stereoscopic display system and method
CN114866757B (en) * 2022-04-22 2024-03-05 深圳市华星光电半导体显示技术有限公司 Stereoscopic display system and method

Similar Documents

Publication Publication Date Title
US11238568B2 (en) Method and system for reconstructing obstructed face portions for virtual reality environment
EP2568355A2 (en) Combined stereo camera and stereo display interaction
CN110023814A (en) Mask capture is carried out by wearable device
EP1431798A2 (en) Arbitrary object tracking in augmented reality applications
WO2016021034A1 (en) Algorithm for identifying three-dimensional point of gaze
CN107357434A (en) Information input equipment, system and method under a kind of reality environment
WO2013171731A1 (en) A system worn by a moving user for fully augmenting reality by anchoring virtual objects
CN106598252A (en) Image display adjustment method and apparatus, storage medium and electronic device
US20210278671A1 (en) Head wearable device with adjustable image sensing modules and its system
CN113866987A (en) Method for interactively adjusting interpupillary distance and image surface of augmented reality helmet display by utilizing gestures
Tharatipyakul et al. Pose estimation for facilitating movement learning from online videos
Jeanne et al. A study on improving performance in gesture training through visual guidance based on learners' errors
CN111399662B (en) Human-robot interaction simulation device and method based on high-reality virtual avatar
Mania et al. Gaze-aware displays and interaction
WO2023240999A1 (en) Virtual reality scene determination method and apparatus, and system
JPH11195131A (en) Virtual reality method and device therefor and storage medium
Krishna et al. Gan based indian sign language synthesis
US20230139989A1 (en) Videoconference method and videoconference system
Madritsch CCD-Camera Based Optical Tracking for Human-Computer Interaction
CN110097644B (en) Expression migration method, device and system based on mixed reality and processor
Wu et al. Depth-disparity calibration for augmented reality on binocular optical see-through displays
Miura et al. SynSLaG: Synthetic sign language generator
WO2023282913A1 (en) Blendshape weights prediction for facial expression of hmd wearer using machine learning model trained on rendered avatar training images
Khan et al. Face-off: A face reconstruction technique for virtual reality (VR) scenarios
Filer A 3-D Virtual Environment Display System

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211231

RJ01 Rejection of invention patent application after publication