CN112486322A

CN112486322A - Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition

Info

Publication number: CN112486322A
Application number: CN202011416500.5A
Authority: CN
Inventors: 朱翔宇; 段强; 李锐; 王建华
Original assignee: Jinan Inspur Hi Tech Investment and Development Co Ltd
Current assignee: Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date: 2020-12-07
Filing date: 2020-12-07
Publication date: 2021-03-12

Abstract

A multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition is provided, which fills a gap in this area. The user can operate in a single-modality interaction mode, can combine voice and gesture operations to expand the available operations, and can also configure recognition modes for different voice + gesture combinations according to personal preference. Combined with AR (augmented reality) wearable equipment, this gives the user a new multimodal, highly immersive interaction experience.

Description

Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition

Technical Field

The invention relates to the technical field of augmented reality, and in particular to a multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition.

Background

Among the sensory modalities, vision and hearing together account for 94% of the information a person captures, and they are the two channels used by the currently popular GUIs (graphical user interfaces) and VUIs (voice user interfaces). Fusing these two dominant interaction modes, and using techniques from computing and graphics to interpret the input, can therefore provide a system that the user operates efficiently, and lets the user interact with AR (augmented reality) wearable equipment comfortably, efficiently, and safely. However, existing AR glasses cannot recognize voice and gestures, and thus cannot improve the user interaction experience.

Disclosure of Invention

To overcome the shortcomings of the above technology, the invention provides a multimodal AR glasses interaction system based on voice recognition and gesture recognition, which combines voice and gesture operations and improves user interaction.

The technical solution adopted by the invention to solve the above technical problem is as follows:

A multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition comprises the following steps:

a) collecting voice and gesture data of a user according to different scenes;

b) preprocessing the collected voice and gesture data;

c) training an artificial intelligence model on the preprocessed data;

d) optimizing the model according to the training result, and improving the accuracy of the model in recognizing the voice and the gesture;

e) deploying the trained model into a wearable device system of the AR glasses;

f) after the user activates the interaction mode, performing different operations and interactions on the AR glasses through different voice and gesture combinations.

Further, in step b), the data are labeled and aligned for different combinations of voice and gesture.
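The labeling and alignment in step b) can be illustrated with a minimal sketch. The source does not specify how alignment is done; the sketch below assumes, purely for illustration, that each recording has already been cut into labeled, time-stamped segments and that a voice segment is paired with the gesture segment it overlaps most in time. The names `Segment`, `overlap`, `align`, and `min_overlap`, and the example labels, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """A labeled span of a recording (hypothetical structure)."""
    label: str    # e.g. "open" for voice, "pinch" for gesture
    start: float  # seconds from the start of the recording
    end: float


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the time interval shared by two segments."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def align(
    voice: List[Segment],
    gesture: List[Segment],
    min_overlap: float = 0.1,  # assumed threshold, not from the source
) -> List[Tuple[str, Optional[str]]]:
    """Pair each voice segment with the gesture segment that overlaps it most.

    A voice segment with no sufficiently overlapping gesture is paired
    with None, i.e. it is labeled as a voice-only sample.
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    for v in voice:
        best = max(gesture, key=lambda g: overlap(v, g), default=None)
        if best is not None and overlap(v, best) >= min_overlap:
            pairs.append((v.label, best.label))
        else:
            pairs.append((v.label, None))
    return pairs


voice_segments = [Segment("open", 0.0, 1.0), Segment("close", 3.0, 4.0)]
gesture_segments = [Segment("pinch", 0.5, 1.5), Segment("swipe", 5.0, 6.0)]
aligned = align(voice_segments, gesture_segments)
```

Here "open" overlaps "pinch" by 0.5 s and forms a combined sample, while "close" overlaps no gesture and is kept as a voice-only sample. Other alignment criteria (for example, nearest onset time) would fit the same step equally well.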

Further, in step f), the user activates the interaction mode through a combination of voice and gesture.
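The activation behaviour of step f) can be sketched as a small state machine. This is an illustration under stated assumptions, not the implementation described by the source: it assumes the deployed model emits recognized voice and gesture labels with timestamps, that a voice label and a gesture label form one combination when they arrive within a short time window, and that the session ignores every combination until a designated wake combination is seen. The class `InteractionSession`, the `window` parameter, and all labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Combo = Tuple[str, str]  # (voice label, gesture label)


@dataclass
class InteractionSession:
    wake_combo: Combo                        # combination that activates the mode
    bindings: Dict[Combo, Callable[[], str]]  # combination -> operation
    window: float = 1.0                      # assumed max gap in seconds
    active: bool = False
    _last_voice: Optional[Tuple[str, float]] = None
    _last_gesture: Optional[Tuple[str, float]] = None

    def on_voice(self, label: str, t: float) -> Optional[str]:
        """Feed one recognized voice label with its timestamp."""
        self._last_voice = (label, t)
        return self._try_fuse()

    def on_gesture(self, label: str, t: float) -> Optional[str]:
        """Feed one recognized gesture label with its timestamp."""
        self._last_gesture = (label, t)
        return self._try_fuse()

    def _try_fuse(self) -> Optional[str]:
        # A combination needs one event from each modality.
        if self._last_voice is None or self._last_gesture is None:
            return None
        (v, tv), (g, tg) = self._last_voice, self._last_gesture
        if abs(tv - tg) > self.window:
            return None  # too far apart; wait for a fresher event
        # Both events are consumed once they have formed a combination.
        self._last_voice = self._last_gesture = None
        combo = (v, g)
        if not self.active:
            if combo == self.wake_combo:
                self.active = True
                return "activated"
            return None  # ignored until the interaction mode is active
        action = self.bindings.get(combo)
        return action() if action else None


session = InteractionSession(
    wake_combo=("hello glasses", "palm_open"),
    bindings={("next", "swipe_left"): lambda: "next_page"},
)
```

Before activation, feeding ("next", "swipe_left") has no effect; after ("hello glasses", "palm_open") arrives within the window, the same combination triggers its bound operation.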

The beneficial effects of the invention are as follows: a multimodal AR (augmented reality) wearable device interaction system based on voice recognition and gesture recognition is provided, which fills a gap in this area. The user can operate in a single-modality interaction mode, can combine voice and gesture operations to expand the available operations, and can also configure recognition modes for different voice + gesture combinations according to personal preference. Combined with AR (augmented reality) wearable equipment, this gives the user a new multimodal, highly immersive interaction experience.
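One way to picture the coexistence of single-modality operation, combined operation, and user-defined combinations is a lookup table keyed by (voice, gesture). The sketch below is an assumption made for illustration; the source does not describe a data structure. A key with `None` in one position stands for a single-modality command, a user table overrides the defaults, and a combined input that has no binding of its own falls back to the single-modality bindings. `DEFAULT_BINDINGS`, `resolve`, and every label and operation name are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (voice label, gesture label); None means that modality is not used.
Key = Tuple[Optional[str], Optional[str]]

# Hypothetical factory defaults.
DEFAULT_BINDINGS: Dict[Key, str] = {
    ("open menu", None): "show_menu",           # voice only
    (None, "swipe_left"): "next_page",          # gesture only
    ("select", "point"): "select_pointed_item",  # voice + gesture
}


def resolve(
    voice: Optional[str],
    gesture: Optional[str],
    user_bindings: Optional[Dict[Key, str]] = None,
) -> Optional[str]:
    """Return the operation bound to the recognized input, if any.

    User bindings take precedence over the defaults. The combined key is
    tried first, then the voice-only key, then the gesture-only key.
    """
    table = {**DEFAULT_BINDINGS, **(user_bindings or {})}
    for key in ((voice, gesture), (voice, None), (None, gesture)):
        if key != (None, None) and key in table:
            return table[key]
    return None


# A user who prefers "select" + "point" to zoom instead of select:
custom = {("select", "point"): "zoom_in"}
```

With the defaults, `resolve("select", "point")` yields the combined operation, while `resolve("select", "point", custom)` yields the user's override. The fallback order (voice before gesture) is an arbitrary choice in this sketch.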

Drawings

FIG. 1 is a flow chart of the system of the present invention.

Detailed Description

The invention is further described below with reference to FIG. 1.

a) collecting voice and gesture data of a user according to different scenes;

b) preprocessing the collected voice and gesture data;

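c) training an artificial intelligence model on the preprocessed data;

d) optimizing the model according to the training result, and improving the accuracy of the model in recognizing the voice and the gesture;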

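e) deploying the trained model into a wearable device system of the AR glasses;

f) after the user activates the interaction mode, performing different operations and interactions on the AR glasses through different voice and gesture combinations.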

The invention thus provides a multimodal AR (augmented reality) wearable device interaction system based on voice recognition and gesture recognition, which fills a gap in this area. The user can operate in a single-modality interaction mode, can combine voice and gesture operations to expand the available operations, and can also configure recognition modes for different voice + gesture combinations according to personal preference. Combined with AR (augmented reality) wearable equipment, this gives the user a new multimodal, highly immersive interaction experience.

Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition, characterized by comprising the following steps:

a) collecting voice and gesture data of a user according to different scenes;

b) preprocessing the collected voice and gesture data;

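c) training an artificial intelligence model on the preprocessed data;

d) optimizing the model according to the training result, and improving the accuracy of the model in recognizing the voice and the gesture;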

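e) deploying the trained model into a wearable device system of the AR glasses;

f) after the user activates the interaction mode, performing different operations and interactions on the AR glasses through different voice and gesture combinations.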

2. The multimodal AR glasses interaction system based on voice recognition and gesture recognition of claim 1, wherein: in step b), the data are labeled and aligned for different combinations of voice and gesture.

3. The multimodal AR glasses interaction system based on voice recognition and gesture recognition of claim 1, wherein: in step f), the user activates the interaction mode through a combination of voice and gesture.