CN112486322A - Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition - Google Patents
- Publication number
- CN112486322A (application number CN202011416500.5A)
- Authority
- CN
- China
- Prior art keywords
- voice
- gesture
- recognition
- mode
- glasses
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T19/00—Manipulating 3D models or images for computer graphics
- G06T19/006—Mixed reality
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- General Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Computer Graphics (AREA)
- Computer Hardware Design (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
A multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition, which fills a gap in this area. The user can use a single interaction mode, combine voice and gesture operations to extend the available operations, and configure different voice + gesture recognition modes according to personal preference. Combined with AR wearable equipment, this gives the user a new multimodal, highly immersive interaction experience.
Description
Technical Field
The invention relates to the technical field of augmented reality, in particular to a multi-mode AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition.
Background
Vision and hearing together account for 94% of the information captured across the sensory modalities, and they are the two channels used by the currently popular GUIs (graphical user interfaces) and VUIs (voice user interfaces). Fusing these two dominant interaction modes in the computing and graphics fields to interpret information can therefore provide a system that the user operates efficiently, so that interaction with AR (augmented reality) wearable equipment is comfortable, efficient and safe. However, existing AR glasses cannot recognize voice and gestures, and thus cannot improve the user interaction experience.
Disclosure of Invention
To overcome the shortcomings of the prior art described above, the invention provides a multimodal AR glasses interaction system based on voice recognition and gesture recognition, which combines voice and gesture operations and improves user interaction.
The technical solution adopted by the invention to solve the above technical problem is as follows:
A multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition comprises the following steps:
a) collecting voice and gesture data of a user according to different scenes;
b) preprocessing the collected voice and gesture data;
c) training an artificial intelligence model on the preprocessed data;
d) optimizing the model according to the training result, and improving the accuracy of the model in recognizing the voice and the gesture;
e) deploying the trained model into a wearable device system of the AR glasses;
f) after the user activates the interaction mode, different operations and interactions are performed on the AR glasses through different voice and gesture combinations.
Further, in step b), the data are labeled and aligned for different combinations of voice and gesture.
Further, in step f), the user activates the interaction mode by combining voice and gesture.
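The labeling and alignment in step b) can be sketched as follows. This is a minimal illustration, not the method of the patent, which does not specify an alignment procedure. It assumes that voice and gesture events carry start and end timestamps and that two events belong together when they overlap or lie within a short gap; the names `VoiceEvent`, `GestureEvent`, `align` and the `max_gap` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceEvent:
    start: float   # seconds from the beginning of the recording (assumed unit)
    end: float
    command: str   # hypothetical label for the spoken command


@dataclass(frozen=True)
class GestureEvent:
    start: float
    end: float
    gesture: str   # hypothetical label for the hand gesture


def align(voice, gestures, max_gap=0.5):
    """Pair voice and gesture events that overlap or lie within max_gap seconds.

    Returns one labeled sample per aligned pair. The 0.5 s default is an
    illustrative assumption, not a value taken from the source.
    """
    samples = []
    for v in voice:
        for g in gestures:
            # Distance between the two intervals; negative when they overlap.
            gap = max(v.start, g.start) - min(v.end, g.end)
            if gap <= max_gap:
                samples.append({
                    "label": f"{v.command}+{g.gesture}",
                    "start": min(v.start, g.start),
                    "end": max(v.end, g.end),
                })
    return samples


voice = [VoiceEvent(0.0, 1.0, "open"), VoiceEvent(5.0, 6.0, "close")]
gestures = [GestureEvent(0.8, 1.5, "pinch"), GestureEvent(9.0, 9.5, "swipe")]
samples = align(voice, gestures)
```

In this example only "open" and "pinch" are paired, because the other events are several seconds apart. A real preprocessing stage would likely also resample and normalize the raw audio and image data before training.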
The beneficial effects of the invention are as follows: a multimodal AR (augmented reality) wearable device interaction system based on voice recognition and gesture recognition is provided, which fills a gap in this area. The user can use a single interaction mode, combine voice and gesture operations to extend the available operations, and configure different voice + gesture recognition modes according to personal preference. Combined with AR wearable equipment, this gives the user a new multimodal, highly immersive interaction experience.
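One way to picture the user-configurable voice + gesture modes described above is a lookup table from combinations to operations. The sketch below is an assumption for illustration only; the patent does not describe a data structure. `BindingTable`, the command names and the operation names are hypothetical, and a missing modality is represented by `None` so that single-mode use remains possible.

```python
from typing import Dict, Optional, Tuple

# (voice command, gesture); None means that modality is not part of the binding.
Combo = Tuple[Optional[str], Optional[str]]


class BindingTable:
    """User-editable mapping from voice/gesture combinations to operations."""

    def __init__(self) -> None:
        self._bindings: Dict[Combo, str] = {}

    def bind(self, voice: Optional[str], gesture: Optional[str], operation: str) -> None:
        if voice is None and gesture is None:
            raise ValueError("a binding needs at least one modality")
        self._bindings[(voice, gesture)] = operation

    def lookup(self, voice: Optional[str], gesture: Optional[str]) -> Optional[str]:
        # Prefer the exact combination, then fall back to single-mode bindings.
        for key in ((voice, gesture), (voice, None), (None, gesture)):
            if key in self._bindings:
                return self._bindings[key]
        return None


table = BindingTable()
table.bind("zoom", "pinch", "zoom_in")   # combined voice + gesture
table.bind("home", None, "go_home")      # voice only
table.bind(None, "swipe_left", "next")   # gesture only
```

With these bindings, `table.lookup("zoom", "pinch")` resolves to the combined operation, while a recognized "home" command resolves to `go_home` regardless of the accompanying gesture.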
Drawings
FIG. 1 is a flow chart of the system of the present invention.
Detailed Description
The invention is further described below with reference to FIG. 1.
A multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition comprises the following steps:
a) collecting voice and gesture data of a user according to different scenes;
b) preprocessing the collected voice and gesture data;
c) training an artificial intelligence model on the preprocessed data;
d) optimizing the model according to the training result, and improving the accuracy of the model in recognizing the voice and the gesture;
e) deploying the trained model into a wearable device system of the AR glasses;
f) after the user activates the interaction mode, different operations and interactions are performed on the AR glasses through different voice and gesture combinations.
A multimodal AR (augmented reality) wearable device interaction system based on voice recognition and gesture recognition is thus provided, which fills a gap in this area. The user can use a single interaction mode, combine voice and gesture operations to extend the available operations, and configure different voice + gesture recognition modes according to personal preference. Combined with AR wearable equipment, this gives the user a new multimodal, highly immersive interaction experience.
Further, in step b), the data are labeled and aligned for different combinations of voice and gesture.
Further, in step f), the user activates the interaction mode by combining voice and gesture.
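The runtime behavior of step f) can be sketched as a small state machine: recognized combinations are ignored until the activation combination is seen, after which each combination is mapped to an operation. This is a hedged illustration under stated assumptions, not the implementation of the patent. `InteractionSession`, the activation phrase, the gesture names and the operation names are hypothetical, and the recognizer that produces the `(voice, gesture)` pairs is assumed to exist elsewhere.

```python
class InteractionSession:
    """Tracks whether interaction mode is active and dispatches combinations."""

    def __init__(self, activation, operations):
        self.activation = activation    # (voice, gesture) pair that turns the mode on
        self.operations = operations    # dict: (voice, gesture) -> operation name
        self.active = False

    def handle(self, voice, gesture):
        """Return the operation for a recognized combination, or None if ignored."""
        combo = (voice, gesture)
        if not self.active:
            if combo == self.activation:
                self.active = True
                return "activated"
            return None  # nothing is executed before activation
        return self.operations.get(combo)


session = InteractionSession(
    activation=("hello glasses", "open_palm"),
    operations={("next", "swipe_left"): "next_page"},
)
before = session.handle("next", "swipe_left")           # ignored, mode not active
activated = session.handle("hello glasses", "open_palm")
after = session.handle("next", "swipe_left")            # dispatched after activation
```

Requiring both modalities for activation reduces the chance that ordinary speech or an incidental hand movement triggers the glasses. A practical system would probably also add a timeout that returns the session to the inactive state.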
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (3)
1. A multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition, characterized by comprising the following steps:
a) collecting voice and gesture data of a user according to different scenes;
b) preprocessing the collected voice and gesture data;
c) training an artificial intelligence model on the preprocessed data;
d) optimizing the model according to the training result, and improving the accuracy of the model in recognizing the voice and the gesture;
e) deploying the trained model into a wearable device system of the AR glasses;
f) after the user activates the interaction mode, different operations and interactions are performed on the AR glasses through different voice and gesture combinations.
2. The multimodal AR glasses interaction system based on voice recognition and gesture recognition according to claim 1, wherein: in step b), the data are labeled and aligned for different combinations of voice and gesture.
3. The multimodal AR glasses interaction system based on voice recognition and gesture recognition according to claim 1, wherein: in step f), the user activates the interaction mode by combining voice and gesture.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011416500.5A CN112486322A (en) | 2020-12-07 | 2020-12-07 | Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011416500.5A CN112486322A (en) | 2020-12-07 | 2020-12-07 | Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112486322A true CN112486322A (en) | 2021-03-12 |
Family
ID=74940381
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011416500.5A Pending CN112486322A (en) | 2020-12-07 | 2020-12-07 | Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112486322A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113986111A (en) * | 2021-12-28 | 2022-01-28 | 北京亮亮视野科技有限公司 | Interaction method, interaction device, electronic equipment and storage medium |
CN114167994A (en) * | 2022-02-11 | 2022-03-11 | 北京亮亮视野科技有限公司 | Knowledge base adding method, device, equipment and medium |
CN115756161A (en) * | 2022-11-15 | 2023-03-07 | 华南理工大学 | Multi-modal interactive structure mechanics analysis method, system, computer equipment and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106997235A (en) * | 2016-01-25 | 2017-08-01 | 亮风台(上海)信息科技有限公司 | Method, equipment for realizing augmented reality interaction and displaying |
CN108334199A (en) * | 2018-02-12 | 2018-07-27 | 华南理工大学 | The multi-modal exchange method of movable type based on augmented reality and device |
CN110211240A (en) * | 2019-05-31 | 2019-09-06 | 中北大学 | A kind of augmented reality method for exempting from sign-on ID |
CN110415166A (en) * | 2019-07-29 | 2019-11-05 | 腾讯科技(深圳)有限公司 | Training method, image processing method, device and the storage medium of blending image processing model |
CN110554774A (en) * | 2019-07-22 | 2019-12-10 | 济南大学 | AR-oriented navigation type interactive normal form system |
CN111680594A (en) * | 2020-05-29 | 2020-09-18 | 北京计算机技术及应用研究所 | Augmented reality interaction method based on gesture recognition |
-
2020
- 2020-12-07 CN CN202011416500.5A patent/CN112486322A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106997235A (en) * | 2016-01-25 | 2017-08-01 | 亮风台(上海)信息科技有限公司 | Method, equipment for realizing augmented reality interaction and displaying |
CN108334199A (en) * | 2018-02-12 | 2018-07-27 | 华南理工大学 | The multi-modal exchange method of movable type based on augmented reality and device |
CN110211240A (en) * | 2019-05-31 | 2019-09-06 | 中北大学 | A kind of augmented reality method for exempting from sign-on ID |
CN110554774A (en) * | 2019-07-22 | 2019-12-10 | 济南大学 | AR-oriented navigation type interactive normal form system |
CN110415166A (en) * | 2019-07-29 | 2019-11-05 | 腾讯科技(深圳)有限公司 | Training method, image processing method, device and the storage medium of blending image processing model |
CN111680594A (en) * | 2020-05-29 | 2020-09-18 | 北京计算机技术及应用研究所 | Augmented reality interaction method based on gesture recognition |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113986111A (en) * | 2021-12-28 | 2022-01-28 | 北京亮亮视野科技有限公司 | Interaction method, interaction device, electronic equipment and storage medium |
CN114167994A (en) * | 2022-02-11 | 2022-03-11 | 北京亮亮视野科技有限公司 | Knowledge base adding method, device, equipment and medium |
CN114167994B (en) * | 2022-02-11 | 2022-06-28 | 北京亮亮视野科技有限公司 | Knowledge base adding method, device, equipment and medium |
CN115756161A (en) * | 2022-11-15 | 2023-03-07 | 华南理工大学 | Multi-modal interactive structure mechanics analysis method, system, computer equipment and medium |
CN115756161B (en) * | 2022-11-15 | 2023-09-26 | 华南理工大学 | Multi-mode interactive structure mechanics analysis method, system, computer equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112486322A (en) | Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition | |
JP7432556B2 (en) | Methods, devices, equipment and media for man-machine interaction | |
CN107220235B (en) | Speech recognition error correction method and device based on artificial intelligence and storage medium | |
CN109889920B (en) | Network course video editing method, system, equipment and storage medium | |
US11093769B2 (en) | Stroke extraction in free space | |
Gao et al. | Sign language recognition based on HMM/ANN/DP | |
Dreuw et al. | Benchmark databases for video-based automatic sign language recognition | |
CN103218842B (en) | A kind of voice synchronous drives the method for the three-dimensional face shape of the mouth as one speaks and facial pose animation | |
CN107293296A (en) | Voice identification result correcting method, device, equipment and storage medium | |
CN108985358A (en) | Emotion identification method, apparatus, equipment and storage medium | |
CN109508687A (en) | Man-machine interaction control method, device, storage medium and smart machine | |
CN107665708A (en) | Intelligent sound exchange method and system | |
CN108710704B (en) | Method and device for determining conversation state, electronic equipment and storage medium | |
CN110148399A (en) | A kind of control method of smart machine, device, equipment and medium | |
CN112667068A (en) | Virtual character driving method, device, equipment and storage medium | |
CN109992765A (en) | Text error correction method and device, storage medium and electronic equipment | |
CN114495130B (en) | Cross-modal information-based document reading understanding model training method and device | |
CN109817210A (en) | Voice writing method, device, terminal and storage medium | |
CN104267922A (en) | Information processing method and electronic equipment | |
CN111160004A (en) | Method and device for establishing sentence-breaking model | |
CN112382287A (en) | Voice interaction method and device, electronic equipment and storage medium | |
CN115050354B (en) | Digital human driving method and device | |
CN110032736A (en) | A kind of text analyzing method, apparatus and storage medium | |
Ghosh et al. | Eyeditor: Towards on-the-go heads-up text editing using voice and manual input | |
US20150073772A1 (en) | Multilingual speech system and method of character |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210312 |