CN113282166A - Interaction method and apparatus for a head-mounted display device, and head-mounted display device

Interaction method and apparatus for a head-mounted display device, and head-mounted display device

Info

Publication number
CN113282166A
CN113282166A
Authority
CN
China
Prior art keywords
gesture
action
image
interaction event
recognition result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110501365.2A
Other languages
Chinese (zh)
Inventor
吴涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Xiaoniao Kankan Technology Co Ltd
Original Assignee
Qingdao Xiaoniao Kankan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Xiaoniao Kankan Technology Co Ltd filed Critical Qingdao Xiaoniao Kankan Technology Co Ltd
Priority to CN202110501365.2A priority Critical patent/CN113282166A/en
Publication of CN113282166A publication Critical patent/CN113282166A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2203/00 - Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F 2203/01 - Indexing scheme relating to G06F3/01
    • G06F 2203/012 - Walk-in-place systems for allowing a user to walk in a virtual environment while constraining him to a given position in the physical environment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an interaction method and apparatus for a head-mounted display device, and a head-mounted display device. The method comprises the following steps: acquiring a gesture image in real time; recognizing the gesture image with a gesture recognition model to obtain a gesture recognition result; determining a preset interaction event corresponding to the gesture recognition result; and performing a corresponding operation on the virtual content displayed on the display interface according to that preset interaction event. The method and apparatus largely remove the need for the user to hold additional hardware to provide input, reduce the communication and processing steps between the user's hands and the separate components of the head-mounted display device, simplify user operation, lower the user's learning cost, and thereby improve the overall interaction efficiency of the head-mounted display device. They also improve the accessibility of head-mounted display devices such as virtual reality devices to users who cannot conveniently hold an external input device, and thus improve the user's virtual interaction experience.

Description

Interaction method and apparatus for a head-mounted display device, and head-mounted display device
Technical Field
The application relates to the technical field of head-mounted display devices, and in particular to an interaction method and apparatus for a head-mounted display device, and to a head-mounted display device.
Background
With advances in technology and increasingly diverse market demands, head-mounted display devices such as virtual reality devices are becoming more and more popular and are used in many fields such as computer games, health and safety, and industrial and educational training. For example, mixed virtual reality systems are being integrated into many areas of life, such as mobile communication devices, game consoles, personal computers, movie theaters, theme parks, university laboratories, student classrooms, and hospital exercise rooms.
In general, the technologies involved in existing head-mounted display devices mainly include virtual reality (VR), augmented reality (AR), mixed reality (MR), and combinations and/or derivatives thereof. Their common principle is to adjust the display content in some way before it is presented to the user, so as to provide a more immersive experience.
Taking a virtual reality system as an example, a typical virtual reality system generally includes one or more devices for presenting and displaying content to a user, for example a head-mounted display (HMD) worn by the user and configured to output virtual reality content, which may be fully generated content or generated content combined with captured content (e.g., real-world video or images). During use, the user typically interacts with the virtual reality system to select content, launch applications, or otherwise configure the system.
In the existing approach, the user interacts with the virtual reality system through an external input device, such as a tracked handle controller, added to the head-mounted display: the user interacts with the virtual reality content presented by the head-mounted display device by actuating the handle controller.
However, the inventors have found that this interaction scheme has at least the following problems: 1) in some virtual scenes, such as multi-user virtual cinemas, controlling the head-mounted display device to present virtual reality content through an external input device is cumbersome, and the interaction efficiency is low; 2) requiring the user to hold an external input device reduces accessibility for users who cannot conveniently hold such a device, which in turn leads to a poor virtual interaction experience.
Disclosure of Invention
In view of this, the main object of the present application is to provide an interaction method and apparatus for a head-mounted display device, and a head-mounted display device, so as to solve the technical problems of low interaction efficiency and poor user experience of existing interaction approaches for head-mounted display devices.
According to a first aspect of the present application, there is provided an interaction method of a head-mounted display device, including:
acquiring a gesture image in real time;
recognizing the gesture image by using a gesture recognition model to obtain a gesture recognition result, wherein the gesture recognition result comprises a gesture action and position information of the gesture action in the gesture image;
determining a preset interaction event corresponding to the gesture recognition result;
and performing a corresponding operation on the virtual content displayed on the display interface according to the preset interaction event corresponding to the gesture recognition result.
According to a second aspect of the present application, there is provided an interaction apparatus of a head-mounted display device, comprising:
the gesture image acquisition unit is used for acquiring a gesture image in real time;
the gesture image recognition unit is used for recognizing the gesture image by utilizing a gesture recognition model to obtain a gesture recognition result, wherein the gesture recognition result comprises a gesture action and position information of the gesture action in the gesture image;
the preset interaction event determining unit is used for determining a preset interaction event corresponding to the gesture recognition result;
and the virtual content operation unit is used for performing a corresponding operation on the virtual content displayed on the display interface according to the preset interaction event corresponding to the gesture recognition result.
In accordance with a third aspect of the present application, there is provided a head-mounted display device comprising: a processor and a memory storing computer-executable instructions,
wherein the executable instructions, when executed by the processor, implement the aforementioned interaction method of the head-mounted display device.
According to a fourth aspect of the present application, there is provided a computer readable storage medium storing one or more programs which, when executed by a processor, implement the aforementioned method of interacting with a head-mounted display device.
The beneficial effects of the present application are as follows. According to the interaction method of the head-mounted display device, gesture images acquired in real time are recognized by a pre-trained gesture recognition model, the preset interaction event that the user wants to trigger is determined from the gesture recognition result, and the virtual content displayed by the head-mounted display device is then operated on according to the triggered preset interaction event. This interaction method largely removes the need for the user to hold additional hardware to provide input, reduces the communication and processing steps between the user's hands and the separate components of the head-mounted display device, simplifies user operation, lowers the user's learning cost, and thereby improves the overall interaction efficiency of the head-mounted display device. It also improves the accessibility of head-mounted display devices such as virtual reality devices to users who cannot conveniently hold an external input device, and thus improves the user's virtual interaction experience.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flowchart of an interaction method of a head mounted display device according to an embodiment of the present application;
FIG. 2 is a block diagram of an interaction device of a head mounted display apparatus according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a head-mounted display device according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. These embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein.
Fig. 1 is a flowchart illustrating an interaction method of a head-mounted display device according to an embodiment of the present application, and referring to fig. 1, the interaction method of the head-mounted display device according to the embodiment of the present application includes the following steps S110 to S140:
and step S110, acquiring a gesture image in real time.
The head-mounted display device in the embodiment of the present application may be any device capable of implementing technologies such as VR or AR, for example VR glasses or a VR helmet. The interaction method of the embodiment of the present application can therefore be applied to any form of head-mounted display device.
When the user interacts with the head-mounted display device, gesture images of the user can be acquired in real time. A gesture image can be understood as an image containing the user's hand motion; the user's hand motion can be tracked in real time by a gesture tracking camera built into the head-mounted display device, so that gesture images of the user are acquired in real time.
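A minimal acquisition-loop sketch in Python, assuming an OpenCV-accessible camera; the real device would use the headset's own gesture tracking camera interface, so the camera index and the OpenCV dependency are assumptions, not part of the disclosure:

import cv2  # assumed dependency; the headset SDK would normally expose its own camera API

def acquire_gesture_frames(camera_index=0):
    # Yield one gesture image per iteration from the gesture tracking camera.
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # camera disconnected or no frame available
                break
            yield frame
    finally:
        cap.release()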
Step S120, recognizing the gesture image by using the gesture recognition model to obtain a gesture recognition result, wherein the gesture recognition result comprises a gesture action and position information of the gesture action in the gesture image.
After the gesture image is acquired, the gesture image can be recognized by using a pre-trained gesture recognition model, where the gesture recognition model can be understood as a model for recognizing a predefined gesture action, and is used for recognizing the gesture action contained in the gesture image acquired in real time, so as to obtain a gesture recognition result.
Step S130, determining a preset interaction event corresponding to the gesture recognition result.
Through the above steps, a gesture recognition result for the current gesture image is obtained. Different gesture recognition results may correspond to different preset interaction events, so it can be determined from the currently recognized gesture recognition result whether the user wants to trigger a preset interaction event and, if so, which one.
The preset interaction events may be defined in advance for different gesture actions of the user. For example, a fist-making gesture may correspond to an interaction event of sliding a menu, and a pinching gesture of the thumb and forefinger of one hand may correspond to an interaction event of selecting a menu element. Those skilled in the art can flexibly set the preset interaction events according to actual requirements, and they are not specifically limited herein.
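As a sketch only, such a correspondence could be kept in a simple lookup table; the action names and event names below are illustrative assumptions, not definitions from the disclosure:

# Hypothetical gesture-action -> preset-interaction-event table.
PRESET_INTERACTION_EVENTS = {
    "fist": "slide_menu",                        # fist gesture slides the menu
    "thumb_index_pinch": "select_menu_element",  # one-hand pinch selects a menu element
}

def lookup_preset_event(gesture_action):
    # Returns None when the recognized action is not bound to any interaction event.
    return PRESET_INTERACTION_EVENTS.get(gesture_action)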
Step S140, according to the preset interaction event corresponding to the gesture recognition result, performing a corresponding operation on the virtual content displayed on the display interface.
After the preset interaction event corresponding to the currently recognized gesture recognition result is determined, corresponding operation can be performed on the virtual content displayed on the current head-mounted display device according to the preset interaction event.
The virtual content here may include user interface elements, which may specifically include interactive GUI elements, such as menus or sub-menus that the user interacts with to operate on the display interface, and may also include individual GUI elements, such as elements that can be selected and/or manipulated by the user. In various virtual reality interaction scenarios, such an individual GUI element may include one or more of a toggle (or switchable) element, a drop-down element, a menu selection element such as a check-box based menu, a two- or three-dimensional shape, a content display window, and the like. Of course, which types of virtual content are displayed can be flexibly set by those skilled in the art according to actual requirements and is not specifically limited herein.
The interaction method of the head-mounted display device can thus largely remove the need for the user to hold additional hardware to provide input, reduce the communication and processing steps between the user's hands and the separate components of the head-mounted display device, simplify user operation, lower the user's learning cost, and thereby improve the overall interaction efficiency of the head-mounted display device. It also improves the accessibility of head-mounted display devices such as virtual reality devices to users who cannot conveniently hold an external input device, and thus improves the user's virtual interaction experience.
In one embodiment of the present application, the head mounted display device includes a gesture tracking camera, which is any one of a depth camera, a binocular infrared camera, or a binocular color camera.
The head-mounted display device of the embodiment of the present application may use any one of a depth camera, a binocular infrared camera, or a binocular color camera as the gesture tracking camera to acquire gesture images in real time. If a depth camera is used, the three-dimensional spatial information of the gesture action can be obtained directly; if a binocular infrared camera or a binocular color camera is used, the two-dimensional position information of the gesture action is obtained directly and can be further converted into three-dimensional spatial information by stereo vision techniques.
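For the binocular case, the conversion from a two-dimensional image position to three-dimensional spatial information can be done with standard stereo geometry; a minimal sketch, in which the calibration values fx, fy, cx, cy and the baseline are placeholders obtained from camera calibration:

def stereo_point_to_3d(u, v, disparity, fx, fy, cx, cy, baseline):
    # Standard pinhole/stereo relations: Z = fx * B / d, X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = fx * baseline / disparity
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z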
In order to ensure tracking stability and tracking precision during gesture recognition, the embodiment of the present application places some basic requirements on the camera specification: 1) FOV (field of view): 100° or more; 2) resolution: at least 640 x 480; 3) frame rate: at least 30 Hz; 4) tracking distance: 10 cm to 100 cm. Of course, the specific camera parameters can be flexibly configured by those skilled in the art according to actual requirements and are not specifically limited herein.
In an embodiment of the present application, the gesture image is a plurality of consecutive frames of gesture images, and recognizing the gesture images with the gesture recognition model to obtain the gesture recognition result includes: recognizing the consecutive gesture images frame by frame with a pre-trained gesture recognition model to obtain, for each frame, the gesture action of the current frame; and judging, for each current frame, whether its gesture action matches a preset action; if so, directly outputting the position information of the gesture action in the current frame of the gesture image; if not, not outputting that position information.
The gesture images in the embodiment of the present application are continuously collected frames, and the gesture recognition result contains the recognized gesture action of the user. For the current frame of the gesture image, the gesture recognition model first recognizes the gesture action of the current frame; it is then judged whether this gesture action satisfies the preset action. If it does, the specific position information of the gesture action in the current frame can be output directly; if it does not, the position information of the gesture action in the current frame does not need to be output.
It should be noted that each frame of gesture image acquired in real time can be processed according to the above steps, which will not be repeated here.
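A per-frame processing sketch, assuming the trained model exposes a predict method that returns the recognized action and the contact-point position in image coordinates; this interface is an assumption for illustration only:

def process_frame(gesture_image, model, preset_actions):
    # Recognize the current frame and output the position only when a preset action is matched.
    action, contact_point = model.predict(gesture_image)
    if action in preset_actions:
        return action, contact_point   # position in the current frame's image coordinates
    return action, None                # preset action not met: no position is output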
In an embodiment of the present application, the preset interaction events include static interaction events and dynamic interaction events, and determining the preset interaction event corresponding to the gesture recognition result includes: if the gesture actions in the plurality of consecutive gesture images all satisfy the preset action, obtaining the position information of these gesture actions in the gesture images; determining, from this position information, whether the gesture action has moved; and if the gesture action has not moved, determining that the preset interaction event corresponding to the gesture recognition result is a static interaction event, and if it has moved, determining that the preset interaction event is a dynamic interaction event.
The preset interaction events of the embodiment of the present application can be divided into static interaction events and dynamic interaction events according to how the virtual content is operated on. A static interaction event can be understood as follows: when the user's gesture recognition result triggers a static interaction event, the user only interacts statically with the virtual content displayed on the display interface, and no dynamic change of the virtual content is involved. Conversely, a dynamic interaction event can be understood as follows: when the user's gesture recognition result triggers a dynamic interaction event, the virtual content displayed on the display interface changes as the user's gesture action moves.
Whether the event is a static or a dynamic interaction event, it is first determined, based on the gesture action in the recognized gesture images, whether the user's gesture action satisfies a preset action. If it does, it can further be determined from the position information of the gesture action in the gesture images whether a preset displacement condition is satisfied, i.e., whether the gesture action has moved. If the gesture actions recognized over a certain number of frames do not satisfy the preset displacement condition, the user only wants to trigger a static interaction event; if they do satisfy the preset displacement condition, the user wants to trigger a dynamic interaction event.
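A sketch of this static/dynamic decision over a run of frames; the displacement threshold and the use of only the first and last positions are illustrative choices, not values from the disclosure:

import math

def classify_interaction(contact_points, displacement_threshold_px=5.0):
    # contact_points: grip contact positions (image coordinates) from the consecutive
    # frames in which the preset action was detected.
    if len(contact_points) < 2:
        return "static"
    (x0, y0), (x1, y1) = contact_points[0], contact_points[-1]
    displacement = math.hypot(x1 - x0, y1 - y0)
    return "dynamic" if displacement > displacement_threshold_px else "static"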
It should be noted that the preset actions set for static and dynamic interaction events in the embodiment of the present application may be the same gesture action, which reduces the cost of training and running the gesture recognition model; different gesture actions may of course also be set, which enriches the diversity of gesture interactions. How the preset actions are set can be chosen flexibly according to actual requirements.
In an embodiment of the present application, the dynamic interaction events include a first dynamic interaction event and a second dynamic interaction event. When no user interface element is displayed in the currently displayed virtual content, the preset interaction event corresponding to the gesture recognition result is determined to be the first dynamic interaction event; when a user interface element is displayed in the currently displayed virtual content, the preset interaction event corresponding to the gesture recognition result is determined to be the second dynamic interaction event.
The dynamic interaction events designed in the embodiment of the present application can thus be further divided into a first dynamic interaction event and a second dynamic interaction event, and which of the two is triggered is determined by the virtual content displayed on the current display interface.
In an embodiment of the present application, the preset action is a grasping gesture formed by at least two fingers of a single hand, and the position information of the gesture action in the gesture image is the position information of the grasping contact point corresponding to the grasping gesture in the image coordinate system. Performing a corresponding operation on the virtual content displayed on the display interface according to the preset interaction event corresponding to the gesture recognition result then includes: if the preset interaction event is a static interaction event, determining the corresponding virtual content on the display interface according to the position information of the grasping contact point in the image coordinate system, so as to select and confirm that virtual content; if the preset interaction event is the first dynamic interaction event, displaying a directional ray in the virtual content displayed on the display interface according to the direction of the grasping gesture and the position information of the grasping contact point in the image coordinate system, moving the directional ray as the grasping gesture moves in the real scene, and completing the selection and confirmation of the virtual content when the grasping gesture is released; and if the preset interaction event is the second dynamic interaction event, determining the corresponding user interface element in the virtual content displayed on the display interface according to the position information of the grasping contact point in the image coordinate system, moving the user interface element as the grasping gesture moves in the real scene, and completing the movement of the user interface element when the grasping gesture is released.
The preset action designed in the embodiment of the present application may thus be a grasping gesture formed by at least two fingers of a single hand, and the position information of the gesture action in the gesture image is the position information of the grasping contact point corresponding to the grasping gesture in the image coordinate system. When the user's gesture recognition result triggers a static interaction event, the corresponding virtual content on the display interface is determined from the position information of the grasping contact point in the image coordinate system, and the selection and confirmation of that virtual content are completed, thereby realizing a static interaction between the user and the virtual content.
When the user's gesture recognition result triggers the first dynamic interaction event, a directional ray is displayed in the virtual content shown on the display interface according to the direction of the grasping gesture and the position information of the grasping contact point in the image coordinate system. As the grasping gesture moves in the real physical environment, the ray displayed in the virtual content translates accordingly; releasing the grasping gesture then completes the selection and confirmation of the corresponding virtual content, thereby realizing the first dynamic interaction between the user and the virtual content.
When the user's gesture recognition result triggers the second dynamic interaction event, one or more user interface elements in the displayed virtual content that the grasping contact point touches are determined from the position information of the grasping contact point in the image coordinate system. The corresponding user interface elements can then be moved by moving or throwing the grasping gesture, and releasing the grasping gesture finally completes the repositioning of the user interface elements within the virtual content, thereby realizing the second dynamic interaction between the user and the user interface elements displayed in the virtual content.
Thus, the gesture actions designed in the embodiment of the present application for triggering preset interaction events are one-handed gestures; the user does not need to use both hands at the same time to interact with the virtual content displayed in the virtual reality scene. This simplifies user operation, lowers the user's learning cost, and improves the efficiency of interaction between the user and the head-mounted display device. Of course, in practical applications, those skilled in the art can flexibly design other forms of preset actions and interaction events according to actual requirements, which are not listed one by one here.
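The three behaviours described above could be dispatched as in the following sketch; the scene object and its methods are hypothetical placeholders for the headset's rendering/UI layer, not an API from the disclosure:

def handle_preset_event(event_type, contact_point, scene):
    if event_type == "static":
        # Select and confirm the virtual content under the grip contact point.
        scene.select(scene.element_at(contact_point))
    elif event_type == "first_dynamic":
        # Show a directional ray; it follows the moving grip gesture until release.
        ray = scene.cast_ray(origin=contact_point, direction=scene.view_direction())
        scene.track_ray_until_release(ray)
    elif event_type == "second_dynamic":
        # Grab the UI element at the contact point and move it until the gesture is released.
        scene.drag_until_release(scene.element_at(contact_point))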
In an embodiment of the present application, displaying a directional ray in the virtual content displayed on the display interface according to the direction of the grasping gesture and the position information of the grasping contact point in the image coordinate system includes: converting the position information of the grasping contact point in the image coordinate system into the camera coordinate system to obtain the three-dimensional spatial information of the grasping contact point in the camera coordinate system; converting that three-dimensional spatial information into the world coordinate system in which the virtual content displayed on the display interface is located; determining the direction of the display field of view of the current head-mounted display device, and determining the direction of the grasping gesture from it; and displaying a directional ray in the virtual content displayed on the display interface according to the three-dimensional spatial information of the grasping contact point in the world coordinate system and the direction of the grasping gesture.
For the first dynamic interaction event of the above embodiment, the position information of the grasping contact point output by the gesture recognition model of the embodiment of the present application is in the image coordinate system, whereas the directional ray displayed in the display interface lives in the 6DoF (six degrees of freedom) world coordinate system, so the position information needs to be transformed from the image coordinate system through the camera coordinate system to the world coordinate system.
Specifically, the position information of the grasping contact point in the image coordinate system is first converted into the camera coordinate system by stereo vision techniques to obtain the three-dimensional spatial information of the grasping contact point in the camera coordinate system; this three-dimensional spatial information is then converted into the world coordinate system; and finally, a directional ray is displayed in the virtual content shown on the display interface according to the three-dimensional spatial information of the grasping contact point in the world coordinate system and the direction of the grasping gesture.
When the directional ray is shown in the virtual content, the direction of the generated ray can be determined from the direction of the user's grasping gesture. Specifically, the direction of the display field of view of the current head-mounted display device is determined; for example, if the head-mounted display device worn by the user faces 45° southeast, this indicates that the user wants to interact in that direction, so the direction of the display field of view can be taken as the direction of the grasping gesture and the ray generated in that direction.
It should be noted that if the gesture tracking camera used in the head-mounted display device of the embodiment of the present application is a binocular infrared camera or a binocular color camera, the position information of the grasping contact point output by the gesture recognition model is two-dimensional position information in the image coordinate system, so the position information needs to be converted from the image coordinate system to the camera coordinate system. If the gesture tracking camera is a depth camera, the position information of the grasping contact point output by the gesture recognition model is already three-dimensional spatial information in the camera coordinate system, and no conversion between the image coordinate system and the camera coordinate system is needed.
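A sketch of the image-to-camera-to-world chain and ray construction using NumPy; K is the 3x3 camera intrinsic matrix and T_world_from_camera the 4x4 pose of the tracking camera in the world frame, both assumed to come from calibration and the headset's 6DoF tracking:

import numpy as np

def image_to_world(contact_uv, depth, K, T_world_from_camera):
    # Lift the grip contact point from image coordinates to the world coordinate system.
    u, v = contact_uv
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    p_cam = np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth, 1.0])
    return (T_world_from_camera @ p_cam)[:3]

def build_ray(origin_world, view_direction_world):
    # The ray starts at the grip contact point and points along the current display view direction.
    d = np.asarray(view_direction_world, dtype=float)
    return origin_world, d / np.linalg.norm(d)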
In an embodiment of the present application, in order to improve the accuracy of the sliding operation, a sliding-window smoothing filter may be applied to the three-dimensional spatial information of the grasping contact point before it is mapped, so as to stabilize the jitter of the position information in three-dimensional space and reduce jitter errors caused by image data noise or model recognition errors.
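A minimal sliding-window (moving-average) filter over the 3D contact point; the window size of 5 is an illustrative value, not one specified by the disclosure:

from collections import deque
import numpy as np

class SlidingWindowSmoother:
    def __init__(self, window_size=5):
        self.window = deque(maxlen=window_size)

    def update(self, point_3d):
        # Append the newest 3D contact point and return the average over the window.
        self.window.append(np.asarray(point_3d, dtype=float))
        return np.mean(self.window, axis=0)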
In an embodiment of the present application, the gesture recognition model may be obtained by training as follows: a gesture detection and recognition model based on a convolutional neural network is designed for a single-hand grasping gesture in which the forefinger and thumb form an approximate right angle, and is trained offline to obtain a gesture recognition model applicable to this gesture on both the left hand and the right hand.
Specifically, more than 180 user gesture action cases were collected through the built-in gesture tracking camera of the head-mounted display device of the embodiment of the present application, totaling more than 3.3 million images; the gesture action information was labeled, the labeled image data was used as training samples, and a grasping gesture recognition model applicable to both the left and the right hand was trained based on a convolutional neural network.
To keep recognition efficient on the head-mounted display device, and in line with the characteristics of the user scenario, the gesture recognition model trained in the embodiment of the present application recognizes the gesture action data of only one hand per frame. The model takes as input the gesture image data captured in real time by the gesture tracking camera and outputs whether a grasping gesture formed by at least two fingers of a single hand exists in the current gesture image: if such a grasping gesture exists, the recognition state 1 is output together with the position of the grasping contact point in the gesture image; if it does not exist, the recognition state 0 is output. A training loss value is calculated from the gesture recognition result output by the gesture recognition model and the labeled gesture action information, and the model is updated according to this loss value to obtain the trained gesture recognition model.
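A toy PyTorch sketch of such a model with a grip-present head and a contact-point head; the architecture, input size, and loss weighting are assumptions, since the disclosure does not specify a concrete network:

import torch
import torch.nn as nn
import torch.nn.functional as F

class GripGestureNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.grip_head = nn.Linear(32, 1)    # logit: grip gesture present (state 1) or not (state 0)
        self.point_head = nn.Linear(32, 2)   # normalized (u, v) of the grip contact point

    def forward(self, x):
        features = self.backbone(x)
        return self.grip_head(features), self.point_head(features)

def training_loss(grip_logit, point_pred, grip_label, point_label):
    # grip_label: float tensor of shape (N, 1) with values 1.0/0.0; point_label: shape (N, 2).
    # Classification loss for grip/no-grip plus a contact-point regression loss,
    # the latter counted only on frames that actually contain a grip gesture.
    cls = F.binary_cross_entropy_with_logits(grip_logit, grip_label)
    reg = F.l1_loss(point_pred, point_label, reduction="none").mean(dim=1)
    return cls + (reg * grip_label.squeeze(1)).mean()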
The interaction apparatus of the head-mounted display device belongs to the same technical concept as the interaction method of the head-mounted display device, so an embodiment of the present application also provides an interaction apparatus of a head-mounted display device. Fig. 2 shows a block diagram of an interaction apparatus of a head-mounted display device according to an embodiment of the present application. Referring to fig. 2, the interaction apparatus 200 of a head-mounted display device includes: a gesture image obtaining unit 210, a gesture image recognition unit 220, a preset interaction event determining unit 230, and a virtual content operating unit 240. Specifically:
a gesture image obtaining unit 210, configured to obtain a gesture image in real time;
the gesture image recognition unit 220 is configured to recognize a gesture image by using a gesture recognition model to obtain a gesture recognition result, where the gesture recognition result includes a gesture action and position information of the gesture action in the gesture image;
a preset interaction event determining unit 230, configured to determine a preset interaction event corresponding to the gesture recognition result;
and the virtual content operating unit 240 is configured to perform corresponding operations on the virtual content displayed on the display interface according to the preset interaction event corresponding to the gesture recognition result.
In an embodiment of the present application, the gesture image is a plurality of frames of continuous gesture images, the gesture recognition result includes a gesture motion, and the gesture image recognition unit 220 is specifically configured to: recognizing multiple continuous gesture images frame by using a pre-trained gesture recognition model, and obtaining a gesture action of a current frame corresponding to each frame of gesture image; and judging whether the gesture action of the current frame meets the preset action or not for the gesture action of each current frame, if so, directly outputting the position information of the gesture action of the current frame in the gesture image of the current frame, and if not, not outputting the position information of the gesture action of the current frame in the gesture image of the current frame.
In an embodiment of the present application, the preset interactive event includes a static interactive event and a dynamic interactive event, and the preset interactive event determining unit 230 is specifically configured to: if the gesture actions in the multiple continuous gesture images meet the preset action, obtaining the position information of the multiple gesture actions in the gesture images; determining whether the gesture movement occurs or not according to the position information of the plurality of gesture movements in the gesture image; and if the gesture action is determined not to move, determining that the preset interaction event corresponding to the gesture recognition result is a static interaction event, and if the gesture action is determined to move, determining that the preset interaction event corresponding to the gesture recognition result is a dynamic interaction event.
In an embodiment of the present application, the dynamic interaction event includes a first dynamic interaction event and a second dynamic interaction event, and the preset interaction event determining unit 230 is specifically configured to: when no user interface element is displayed in the currently displayed virtual content, determining a preset interaction event corresponding to the gesture recognition result as a first dynamic interaction event; and when the user interface element is displayed in the currently displayed virtual content, determining that the preset interaction event corresponding to the gesture recognition result is a second dynamic interaction event.
In an embodiment of the present application, the preset action is a grasping gesture formed by at least two fingers of a single hand, the position information of the gesture action in the gesture image is the position information of the grasping contact point corresponding to the grasping gesture in the image coordinate system, and the virtual content operating unit 240 is specifically configured to: if the preset interaction event corresponding to the gesture recognition result is a static interaction event, determine the corresponding virtual content on the display interface according to the position information of the grasping contact point in the image coordinate system, so as to select and confirm that virtual content; if the preset interaction event is the first dynamic interaction event, display a directional ray in the virtual content displayed on the display interface according to the direction of the grasping gesture and the position information of the grasping contact point in the image coordinate system, move the directional ray as the grasping gesture moves in the real scene, and complete the selection and confirmation of the virtual content when the grasping gesture is released; and if the preset interaction event is the second dynamic interaction event, determine the corresponding user interface element in the virtual content displayed on the display interface according to the position information of the grasping contact point in the image coordinate system, move the user interface element as the grasping gesture moves in the real scene, and complete the movement of the user interface element when the grasping gesture is released.
In an embodiment of the present application, the virtual content operating unit 240 is specifically configured to: converting the position information of the grasping contact point of the grasping gesture action in the image coordinate system into the camera coordinate system to obtain the three-dimensional space information of the grasping contact point of the grasping gesture action in the camera coordinate system; converting three-dimensional space information of a grasping contact point of a grasping gesture action in a camera coordinate system into a world coordinate system where virtual content displayed on a display interface is located; determining the direction of the display visual field of the current head-mounted display equipment, and determining the direction of the gripping gesture action according to the direction of the display visual field of the current head-mounted display equipment; and displaying a ray with a direction in the virtual content displayed on the display interface according to the three-dimensional space information of the grasping contact point of the grasping gesture action in the world coordinate system and the direction of the grasping gesture action.
Fig. 3 illustrates a schematic structural diagram of a head-mounted display device. Referring to fig. 3, at the hardware level, the head-mounted display device includes a memory and a processor, and optionally further includes an interface module, a communication module, and the like. The memory may include an internal memory, such as a random-access memory (RAM), and may also include a non-volatile memory, such as at least one disk storage. Of course, the head-mounted display device may also include hardware needed for other services.
The processor, the interface module, the communication module, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 3, but this does not indicate only one bus or one type of bus.
A memory for storing computer executable instructions. The memory provides computer executable instructions to the processor through the internal bus.
A processor executing computer executable instructions stored in the memory and specifically configured to perform the following operations:
acquiring a gesture image in real time;
recognizing the gesture image by using a gesture recognition model to obtain a gesture recognition result, wherein the gesture recognition result comprises a gesture action and position information of the gesture action in the gesture image;
determining a preset interaction event corresponding to the gesture recognition result;
and executing corresponding operation on the virtual content displayed on the display interface according to the preset interaction event corresponding to the gesture recognition result.
The functions performed by the interaction apparatus of the head-mounted display device in the embodiment shown in fig. 2 of the present application may be implemented in, or by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The methods, steps, and logic blocks disclosed in the embodiments of the present application may thereby be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The head-mounted display device may also perform steps performed by the interaction method of the head-mounted display device in fig. 1, and implement the functions of the interaction method of the head-mounted display device in the embodiment shown in fig. 1, which are not described herein again.
An embodiment of the present application further provides a computer-readable storage medium storing one or more programs, which when executed by a processor, implement the aforementioned interaction method for a head-mounted display device, and are specifically configured to perform:
acquiring a gesture image in real time;
recognizing the gesture image by using a gesture recognition model to obtain a gesture recognition result, wherein the gesture recognition result comprises a gesture action and position information of the gesture action in the gesture image;
determining a preset interaction event corresponding to the gesture recognition result;
and executing corresponding operation on the virtual content displayed on the display interface according to the preset interaction event corresponding to the gesture recognition result.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) that include computer-usable program code.
The present application is described in terms of flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) characterized by computer-usable program code.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. An interaction method of a head-mounted display device, comprising:
acquiring a gesture image in real time;
recognizing the gesture image by using a gesture recognition model to obtain a gesture recognition result, wherein the gesture recognition result comprises a gesture action and position information of the gesture action in the gesture image;
determining a preset interaction event corresponding to the gesture recognition result;
and executing corresponding operation on the virtual content displayed on the display interface according to the preset interaction event corresponding to the gesture recognition result.
2. The method according to claim 1, wherein the gesture image is a plurality of frames of continuous gesture images, and the recognizing the gesture image by using the gesture recognition model to obtain the gesture recognition result comprises:
recognizing multiple continuous gesture images frame by using a pre-trained gesture recognition model, and obtaining a gesture action of a current frame corresponding to each frame of gesture image;
and judging whether the gesture action of the current frame meets a preset action or not for the gesture action of each current frame, if so, directly outputting the position information of the gesture action of the current frame in the gesture image of the current frame, and if not, not outputting the position information of the gesture action of the current frame in the gesture image of the current frame.
3. The method according to claim 2, wherein the preset interaction events include a static interaction event and a dynamic interaction event, and the determining the preset interaction event corresponding to the gesture recognition result includes:
if the gesture actions in the multiple continuous gesture images meet the preset action, obtaining the position information of the multiple gesture actions in the gesture images;
determining whether the gesture movement occurs or not according to the position information of the plurality of gesture movements in the gesture image;
and if the gesture action is determined not to move, determining that the preset interaction event corresponding to the gesture recognition result is a static interaction event, and if the gesture action is determined to move, determining that the preset interaction event corresponding to the gesture recognition result is a dynamic interaction event.
4. The method according to claim 3, wherein the dynamic interaction event comprises a first dynamic interaction event and a second dynamic interaction event, wherein,
when no user interface element is displayed in the currently displayed virtual content, determining that a preset interaction event corresponding to the gesture recognition result is a first dynamic interaction event; and when the user interface element is displayed in the currently displayed virtual content, determining that the preset interaction event corresponding to the gesture recognition result is a second dynamic interaction event.
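Claim 4 reduces to a single query on the currently displayed content; a one-line sketch with a hypothetical display.has_ui_element() check:

```python
def refine_dynamic_event(display):
    """Pick the dynamic interaction event type from the currently displayed content."""
    return "dynamic_second" if display.has_ui_element() else "dynamic_first"
```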
5. The method according to claim 4, wherein the preset action is a grasping gesture action formed by at least two fingers of a single hand, the position information of the gesture action in the gesture image is position information of a grasping contact point corresponding to the grasping gesture action in an image coordinate system, and the executing of a corresponding operation on the virtual content displayed on the display interface according to the preset interaction event corresponding to the gesture recognition result comprises:
if the preset interaction event corresponding to the gesture recognition result is a static interaction event, determining corresponding virtual content on the display interface according to the position information of a grasping contact point corresponding to the grasping gesture action in an image coordinate system, so as to select and confirm the corresponding virtual content;
if the preset interaction event corresponding to the gesture recognition result is a first dynamic interaction event, displaying a directional ray in virtual content displayed on the display interface according to the direction of the grasping gesture action and the position information of the grasping contact point in an image coordinate system, moving the directional ray based on the movement of the grasping gesture action in a real scene, and completing the selection and confirmation of the virtual content through the release of the grasping gesture action;
if the preset interaction event corresponding to the gesture recognition result is a second dynamic interaction event, determining a corresponding user interface element in virtual content displayed on the display interface according to the position information of a grasping contact point corresponding to the grasping gesture action in an image coordinate system, moving the user interface element based on the movement of the grasping gesture action in a real scene, and completing the movement of the user interface element through the release of the grasping gesture action.
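To make the three branches of claim 5 concrete, the dispatch below pairs each preset interaction event with the corresponding operation on the displayed virtual content; every display method named here is a hypothetical placeholder mirroring the behaviour the claim describes, not an API from the specification.

```python
def handle_grasp_event(event, grip_point, grip_direction, display):
    """Dispatch the operation for a grasping gesture according to the event type."""
    if event == "static":
        target = display.content_at(grip_point)        # virtual content under the grasp contact point
        display.select_and_confirm(target)
    elif event == "dynamic_first":                      # no UI element displayed
        ray = display.show_ray(origin=grip_point, direction=grip_direction)
        display.follow_until_release(ray)               # ray moves with the grasp; release confirms
    elif event == "dynamic_second":                     # a UI element is displayed
        element = display.ui_element_at(grip_point)
        display.drag_until_release(element)             # element moves with the grasp; release drops it
```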
6. The method according to claim 5, wherein the displaying a directional ray in the virtual content displayed on the display interface according to the direction of the grasping gesture action and the position information of the grasping contact point in the image coordinate system comprises:
converting the position information of the grasping contact point of the grasping gesture motion in an image coordinate system into a camera coordinate system to obtain three-dimensional space information of the grasping contact point of the grasping gesture motion in the camera coordinate system;
converting the three-dimensional space information of the grasping contact point of the grasping gesture motion in the camera coordinate system into a world coordinate system where virtual content displayed on the display interface is located;
determining the direction of the display field of view of the current head-mounted display device, and determining the direction of the grasping gesture action according to the direction of the display field of view of the current head-mounted display device;
and displaying a ray with a direction in virtual content displayed on the display interface according to the three-dimensional space information of the grasping contact point of the grasping gesture action in the world coordinate system and the direction of the grasping gesture action.
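The coordinate chain of claim 6 (image coordinate system to camera coordinate system to the world coordinate system of the displayed virtual content) can be illustrated with a standard pinhole back-projection. The intrinsic matrix K, the assumed depth of the grasping contact point, and the 4x4 camera-to-world transform are assumptions; the claim itself does not prescribe how the conversions are computed.

```python
import numpy as np

def grasp_point_to_world_ray(grip_px, depth, K, cam_to_world, view_direction):
    """Lift the grasping contact point into the world frame and attach a ray direction.

    grip_px        -- (u, v) position of the grasping contact point in image coordinates
    depth          -- assumed distance of the contact point from the camera
    K              -- 3x3 pinhole intrinsic matrix
    cam_to_world   -- 4x4 homogeneous camera-to-world transform
    view_direction -- direction of the current display field of view (3-vector)
    """
    u, v = grip_px
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Image coordinate system -> camera coordinate system (back-projection at the given depth).
    p_cam = np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth, 1.0])
    # Camera coordinate system -> world coordinate system of the displayed virtual content.
    p_world = cam_to_world @ p_cam
    # The ray's direction follows the display field of view of the head-mounted device.
    direction = np.asarray(view_direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    return p_world[:3], direction
```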
7. An interaction apparatus of a head-mounted display device, comprising:
the gesture image acquisition unit is used for acquiring a gesture image in real time;
the gesture image recognition unit is used for recognizing the gesture image by utilizing a gesture recognition model to obtain a gesture recognition result, wherein the gesture recognition result comprises a gesture action and position information of the gesture action in the gesture image;
the preset interaction event determining unit is used for determining a preset interaction event corresponding to the gesture recognition result;
and the virtual content operation unit is used for executing corresponding operation on the virtual content displayed on the display interface according to the preset interaction event corresponding to the gesture recognition result.
8. The apparatus according to claim 7, wherein the gesture image is a plurality of frames of continuous gesture images, and the gesture image recognition unit is specifically configured to:
recognizing the multiple frames of continuous gesture images frame by frame by using a pre-trained gesture recognition model, and obtaining the gesture action of the current frame corresponding to each frame of gesture image;
and for the gesture action of each current frame, judging whether the gesture action of the current frame meets a preset action; if so, directly outputting the position information of the gesture action of the current frame in the gesture image of the current frame, and if not, not outputting the position information of the gesture action of the current frame in the gesture image of the current frame.
9. The apparatus according to claim 8, wherein the preset interaction event comprises a static interaction event and a dynamic interaction event, and the preset interaction event determining unit is specifically configured to:
if the gesture actions in the multiple continuous gesture images meet the preset action, obtaining the position information of the multiple gesture actions in the gesture images;
determining whether the gesture action has moved according to the position information of the plurality of gesture actions in the gesture images;
and if the gesture action is determined not to move, determining that the preset interaction event corresponding to the gesture recognition result is a static interaction event, and if the gesture action is determined to move, determining that the preset interaction event corresponding to the gesture recognition result is a dynamic interaction event.
10. A head-mounted display device, comprising: a processor and a memory storing computer-executable instructions,
wherein the executable instructions, when executed by the processor, implement the interaction method of a head-mounted display device according to any one of claims 1 to 6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110501365.2A CN113282166A (en) 2021-05-08 2021-05-08 Interaction method and device of head-mounted display equipment and head-mounted display equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110501365.2A CN113282166A (en) 2021-05-08 2021-05-08 Interaction method and device of head-mounted display equipment and head-mounted display equipment

Publications (1)

Publication Number Publication Date
CN113282166A 2021-08-20

Family

ID=77278311

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110501365.2A Pending CN113282166A (en) 2021-05-08 2021-05-08 Interaction method and device of head-mounted display equipment and head-mounted display equipment

Country Status (1)

Country Link
CN (1) CN113282166A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117289788A (en) * 2022-11-28 2023-12-26 清华大学 Interaction method, interaction device, electronic equipment and computer storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103246351A (en) * 2013-05-23 2013-08-14 刘广松 User interaction system and method
CN105912110A (en) * 2016-04-06 2016-08-31 北京锤子数码科技有限公司 Method, device and system for performing target selection in virtual reality space
CN106980362A (en) * 2016-10-09 2017-07-25 阿里巴巴集团控股有限公司 Input method and device based on virtual reality scenario
CN107885317A (en) * 2016-09-29 2018-04-06 阿里巴巴集团控股有限公司 A kind of exchange method and device based on gesture
CN107885316A (en) * 2016-09-29 2018-04-06 阿里巴巴集团控股有限公司 A kind of exchange method and device based on gesture
CN111158467A (en) * 2019-12-12 2020-05-15 青岛小鸟看看科技有限公司 Gesture interaction method and terminal
CN111291338A (en) * 2020-02-17 2020-06-16 Oppo广东移动通信有限公司 User identification method, user identification device, storage medium and head-mounted device
CN111580677A (en) * 2020-05-21 2020-08-25 深圳布莱克实验室科技有限公司 Man-machine interaction method and man-machine interaction system
CN111857356A (en) * 2020-09-24 2020-10-30 深圳佑驾创新科技有限公司 Method, device, equipment and storage medium for recognizing interaction gesture
CN111880648A (en) * 2020-06-19 2020-11-03 华为技术有限公司 Three-dimensional element control method and terminal
US20200387213A1 (en) * 2019-06-07 2020-12-10 Facebook Technologies, Llc Artificial reality systems with personal assistant element for gating user interface elements
CN112198962A (en) * 2020-09-30 2021-01-08 聚好看科技股份有限公司 Method for interacting with virtual reality equipment and virtual reality equipment
CN112462937A (en) * 2020-11-23 2021-03-09 青岛小鸟看看科技有限公司 Local perspective method and device of virtual reality equipment and virtual reality equipment
CN112527112A (en) * 2020-12-08 2021-03-19 中国空气动力研究与发展中心计算空气动力研究所 Multi-channel immersive flow field visualization man-machine interaction method

Similar Documents

Publication Publication Date Title
Memo et al. Head-mounted gesture controlled interface for human-computer interaction
CN113282169B (en) Interaction method and device of head-mounted display equipment and head-mounted display equipment
US10569172B2 (en) System and method of configuring a virtual camera
WO2022237268A1 (en) Information input method and apparatus for head-mounted display device, and head-mounted display device
US11625841B2 (en) Localization and tracking method and platform, head-mounted display system, and computer-readable storage medium
WO2018057663A1 (en) Gesture based control of autonomous vehicles
CN105518575A (en) Two-hand interaction with natural user interface
CN107479712B (en) Information processing method and device based on head-mounted display equipment
CN109189302B (en) Control method and device of AR virtual model
US10528145B1 (en) Systems and methods involving gesture based user interaction, user interface and/or other features
KR20210124307A (en) Interactive object driving method, apparatus, device and recording medium
CN109992111B (en) Augmented reality extension method and electronic device
CN111860252A (en) Image processing method, apparatus and storage medium
Qian et al. Modality and depth in touchless smartphone augmented reality interactions
US11423549B2 (en) Interactive body-driven graphics for live video performance
CN113282166A (en) Interaction method and device of head-mounted display equipment and head-mounted display equipment
CN113282167B (en) Interaction method and device of head-mounted display equipment and head-mounted display equipment
CN106293078A (en) Virtual reality exchange method based on photographic head and device
US10430778B2 (en) Using augmented reality for secure transactions
Tseng Development of a low-cost 3D interactive VR system using SBS 3D display, VR headset and finger posture motion tracking
Wang et al. Intuitional 3D museum navigation system using Kinect
CN110264568B (en) Three-dimensional virtual model interaction method and device
EP3549009A1 (en) Realtime recording of gestures and/or voice to modify animations
CN107526439A (en) A kind of interface return method and device
US20230368452A1 (en) Interactive virtual graphics with physical objects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination