CN112486322A - Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition - Google Patents

Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition

Info

Publication number
CN112486322A
Authority
CN
China
Prior art keywords
voice
gesture
recognition
mode
glasses
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011416500.5A
Other languages
Chinese (zh)
Inventor
朱翔宇
段强
李锐
王建华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Inspur Hi Tech Investment and Development Co Ltd
Original Assignee
Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan Inspur Hi Tech Investment and Development Co Ltd
Priority to CN202011416500.5A
Publication of CN112486322A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • G06T19/006Mixed reality
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A multimodal AR (augmented reality) glasses interaction system based on speech recognition and gesture recognition, which effectively fills the gap in this area. The user can interact through a single modality, combine voice and gesture operations to extend the range of available operations, or define personalized voice-plus-gesture recognition commands according to individual preference, and thereby enjoy a brand-new, highly immersive multimodal interaction experience on AR (augmented reality) wearable devices.

Description

Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition
Technical Field
The invention relates to the technical field of augmented reality, in particular to a multi-mode AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition.
Background
Vision and hearing together account for about 94% of the information a person takes in across all modalities, and they are the two channels used by the currently popular GUIs (graphical user interfaces) and VUIs (voice user interfaces). Fusing these two dominant interaction channels to interpret user input in the computing and graphics domains therefore makes it possible to provide a system that users can operate efficiently, and to let them interact with AR (augmented reality) wearable devices comfortably, efficiently and safely. However, existing AR glasses cannot recognize both voice and gestures, and therefore cannot provide this improved interaction experience.
Disclosure of Invention
To overcome the shortcomings of the above technologies, the invention provides a multimodal AR glasses interaction system based on voice recognition and gesture recognition, which combines voice and gesture operations to improve user interaction.
The technical solution adopted by the invention to solve the above technical problem is as follows:
a multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition comprises the following steps:
a) collecting voice and gesture data of a user according to different scenes;
b) preprocessing the collected voice and gesture data;
c) training the preprocessed data by using an artificial intelligence model;
d) optimizing the model according to the training result, and improving the accuracy of the model in recognizing the voice and the gesture;
e) deploying the trained model into a wearable device system of the AR glasses;
f) after the user activates the interaction mode, different operations and interactions are performed on the AR glasses through different voice and gesture combinations.
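By way of illustration only, the sketch below (in Python) shows one way the voice and gesture combinations of step f) could be mapped to operations on the AR glasses; the command table, keyword names and action names are assumptions introduced here for clarity and are not part of the disclosure.

```python
from typing import Optional

# Hypothetical mapping from (voice keyword, gesture label) pairs to AR actions.
# The concrete vocabulary and actions are illustrative assumptions.
COMMAND_TABLE = {
    ("open", "palm"):  "open_main_menu",
    ("close", "fist"): "close_current_window",
    ("zoom", "pinch"): "zoom_virtual_object",
    ("next", "swipe"): "switch_to_next_page",
}

def dispatch(voice_label: Optional[str],
             gesture_label: Optional[str]) -> Optional[str]:
    """Resolve a recognized (voice, gesture) pair into an AR action.

    Single-mode input still works: if only one modality is recognized,
    fall back to the first command that matches on that modality alone.
    """
    if voice_label and gesture_label:
        return COMMAND_TABLE.get((voice_label, gesture_label))
    for (v, g), action in COMMAND_TABLE.items():
        if voice_label == v or gesture_label == g:
            return action
    return None

if __name__ == "__main__":
    print(dispatch("open", "palm"))   # -> open_main_menu
    print(dispatch(None, "fist"))     # single-mode gesture fallback
```

Users could extend such a table with their own voice-plus-gesture pairs, matching the personalized command set described above.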
Further, in step b), the data for the different combinations of voice and gesture are labeled and aligned.
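The patent does not prescribe a concrete alignment procedure; a minimal sketch of one plausible approach is given below, pairing voice and gesture samples whose time spans coincide (within a small tolerance) into a single labeled multimodal example. The Sample fields, the 0.5 s tolerance and the "conflict" marker are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    modality: str      # "voice" or "gesture"
    start: float       # start time in seconds
    end: float         # end time in seconds
    label: str         # annotated command label

def align_pairs(voice: List[Sample], gesture: List[Sample],
                max_gap: float = 0.5) -> List[dict]:
    """Pair voice and gesture samples that overlap (or nearly overlap)
    in time into a single labeled multimodal training example."""
    pairs = []
    for v in voice:
        for g in gesture:
            # Negative gap means the two intervals overlap in time.
            gap = max(v.start, g.start) - min(v.end, g.end)
            if gap <= max_gap:
                pairs.append({
                    "voice": v, "gesture": g,
                    # keep the annotation only if both modalities agree
                    "label": v.label if v.label == g.label else "conflict",
                })
    return pairs
```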
Further, in step f), the user activates the interaction mode through a combined voice-plus-gesture action.
The invention has the following beneficial effects: a multimodal AR (augmented reality) wearable device interaction system based on speech recognition and gesture recognition is provided, which effectively fills the gap in this area. The user can interact through a single modality, combine voice and gesture operations to extend the range of available operations, or define personalized voice-plus-gesture recognition commands according to individual preference, and thereby enjoy a brand-new, highly immersive multimodal interaction experience on AR (augmented reality) wearable devices.
Drawings
FIG. 1 is a flow chart of the system of the present invention.
Detailed Description
The invention is further described below with reference to FIG. 1.
A multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition comprises the following steps:
a) collecting voice and gesture data of a user according to different scenes;
b) preprocessing the collected voice and gesture data;
c) training the preprocessed data by using an artificial intelligence model;
d) optimizing the model according to the training result to improve its accuracy in recognizing voice and gestures (a minimal training sketch follows this list);
e) deploying the trained model into a wearable device system of the AR glasses;
f) after the user activates the interaction mode, different operations and interactions are performed on the AR glasses through different voice and gesture combinations.
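The description only states that an artificial intelligence model is trained and optimized in steps c) and d), without naming an architecture. The sketch below, assuming PyTorch, shows one plausible form: a small fusion network that consumes a voice feature vector and a gesture feature vector and predicts a command class. All sizes, layers and hyper-parameters are assumptions for illustration, not the disclosed implementation.

```python
import torch
import torch.nn as nn

class VoiceGestureFusionNet(nn.Module):
    def __init__(self, voice_dim=40, gesture_dim=63, num_commands=10):
        super().__init__()
        self.voice_branch = nn.Sequential(nn.Linear(voice_dim, 64), nn.ReLU())
        self.gesture_branch = nn.Sequential(nn.Linear(gesture_dim, 64), nn.ReLU())
        self.classifier = nn.Linear(128, num_commands)   # fused features -> command

    def forward(self, voice_feat, gesture_feat):
        fused = torch.cat([self.voice_branch(voice_feat),
                           self.gesture_branch(gesture_feat)], dim=-1)
        return self.classifier(fused)

def train_step(model, optimizer, voice_feat, gesture_feat, labels):
    """One optimization step; repeated over the labeled, aligned dataset
    (step c), with hyper-parameters tuned on validation accuracy (step d)."""
    optimizer.zero_grad()
    logits = model(voice_feat, gesture_feat)
    loss = nn.functional.cross_entropy(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with random stand-in data (real data comes from step b).
model = VoiceGestureFusionNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = train_step(model, opt,
                  torch.randn(8, 40),            # e.g. 40-dim MFCC vectors
                  torch.randn(8, 63),            # e.g. 21 hand keypoints x 3
                  torch.randint(0, 10, (8,)))    # command class labels
```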
The invention thus provides a multimodal AR (augmented reality) wearable device interaction system based on speech recognition and gesture recognition, which effectively fills the gap in this area. The user can interact through a single modality, combine voice and gesture operations to extend the range of available operations, or define personalized voice-plus-gesture recognition commands according to individual preference, and thereby enjoy a brand-new, highly immersive multimodal interaction experience on AR (augmented reality) wearable devices.
Further, in step b), the data for the different combinations of voice and gesture are labeled and aligned.
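In addition to labeling and alignment, the preprocessing of step b) has to turn raw recordings into feature vectors the model can consume. The sketch below assumes the librosa and MediaPipe libraries (neither is named in the patent) and the same 40-dimensional voice and 63-dimensional gesture features used in the training sketch above; it is an illustrative choice, not the disclosed implementation.

```python
import numpy as np
import librosa
import mediapipe as mp

def voice_features(wav_path: str, n_mfcc: int = 40) -> np.ndarray:
    """Load one utterance and summarize it as a fixed-size MFCC vector."""
    audio, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)          # (n_mfcc,) average over time frames

def gesture_features(image_rgb: np.ndarray) -> np.ndarray:
    """Extract 21 hand keypoints (x, y, z) from one RGB camera frame."""
    with mp.solutions.hands.Hands(static_image_mode=True,
                                  max_num_hands=1) as hands:
        result = hands.process(image_rgb)
    if not result.multi_hand_landmarks:
        return np.zeros(63)           # no hand detected in this frame
    lm = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm]).flatten()   # (63,)
```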
Further, in step f), the user activates the interaction mode through a combined voice-plus-gesture action.
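As an example of this combined activation, the interaction mode could be switched on only when a wake word and an activation gesture are both recognized within a short time window. The wake word, gesture name and 1.5 s window in the sketch below are assumptions chosen for illustration.

```python
import time

class InteractionModeActivator:
    """Turns the multimodal interaction mode on when a wake word and an
    activation gesture are recognized close together in time."""

    def __init__(self, wake_word="hello glasses", wake_gesture="open_palm",
                 window_s=1.5):
        self.wake_word = wake_word
        self.wake_gesture = wake_gesture
        self.window_s = window_s
        self._last_voice = None      # timestamp of last wake-word detection
        self._last_gesture = None    # timestamp of last wake-gesture detection
        self.active = False

    def on_voice(self, text, t=None):
        if text == self.wake_word:
            self._last_voice = t if t is not None else time.monotonic()
            self._check()

    def on_gesture(self, name, t=None):
        if name == self.wake_gesture:
            self._last_gesture = t if t is not None else time.monotonic()
            self._check()

    def _check(self):
        if self._last_voice is not None and self._last_gesture is not None:
            if abs(self._last_voice - self._last_gesture) <= self.window_s:
                self.active = True   # interaction mode is now on

activator = InteractionModeActivator()
activator.on_voice("hello glasses", t=10.0)
activator.on_gesture("open_palm", t=10.8)
print(activator.active)   # True: both cues arrived within the window
```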
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (3)

1. A multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition is characterized by comprising the following steps of:
a) collecting voice and gesture data of a user according to different scenes;
b) preprocessing the collected voice and gesture data;
c) training the preprocessed data by using an artificial intelligence model;
d) optimizing the model according to the training result, and improving the accuracy of the model in recognizing the voice and the gesture;
e) deploying the trained model into a wearable device system of the AR glasses;
f) after the user activates the interaction mode, different operations and interactions are performed on the AR glasses through different voice and gesture combinations.
2. The multimodal AR glasses interaction system based on voice recognition and gesture recognition of claim 1, characterized in that: in step b), the data for the different combinations of voice and gestures are labeled and aligned.
3. The multimodal AR glasses interaction system based on voice recognition and gesture recognition of claim 1, characterized in that: in step f), the user activates the interaction mode through a combined voice-and-gesture action.
CN202011416500.5A 2020-12-07 2020-12-07 Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition Pending CN112486322A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011416500.5A CN112486322A (en) 2020-12-07 2020-12-07 Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition


Publications (1)

Publication Number Publication Date
CN112486322A (en) 2021-03-12

Family

ID=74940381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011416500.5A Pending CN112486322A (en) 2020-12-07 2020-12-07 Multimodal AR (augmented reality) glasses interaction system based on voice recognition and gesture recognition

Country Status (1)

Country Link
CN (1) CN112486322A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106997235A (en) * 2016-01-25 2017-08-01 亮风台(上海)信息科技有限公司 Method, equipment for realizing augmented reality interaction and displaying
CN108334199A (en) * 2018-02-12 2018-07-27 华南理工大学 The multi-modal exchange method of movable type based on augmented reality and device
CN110211240A (en) * 2019-05-31 2019-09-06 中北大学 A kind of augmented reality method for exempting from sign-on ID
CN110554774A (en) * 2019-07-22 2019-12-10 济南大学 AR-oriented navigation type interactive normal form system
CN110415166A (en) * 2019-07-29 2019-11-05 腾讯科技(深圳)有限公司 Training method, image processing method, device and the storage medium of blending image processing model
CN111680594A (en) * 2020-05-29 2020-09-18 北京计算机技术及应用研究所 Augmented reality interaction method based on gesture recognition

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113986111A (en) * 2021-12-28 2022-01-28 北京亮亮视野科技有限公司 Interaction method, interaction device, electronic equipment and storage medium
CN114167994A (en) * 2022-02-11 2022-03-11 北京亮亮视野科技有限公司 Knowledge base adding method, device, equipment and medium
CN114167994B (en) * 2022-02-11 2022-06-28 北京亮亮视野科技有限公司 Knowledge base adding method, device, equipment and medium
CN115756161A (en) * 2022-11-15 2023-03-07 华南理工大学 Multi-modal interactive structure mechanics analysis method, system, computer equipment and medium
CN115756161B (en) * 2022-11-15 2023-09-26 华南理工大学 Multi-mode interactive structure mechanics analysis method, system, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210312)