CN110895931A - VR (virtual reality) interaction system and method based on voice recognition - Google Patents
VR (virtual reality) interaction system and method based on voice recognition
- Publication number
- CN110895931A
- Authority
- CN
- China
- Prior art keywords
- module
- voice
- user
- processing
- scene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G10L15/063—Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/16—Speech classification or search using artificial neural networks
- G10L15/1822—Parsing for meaning understanding
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L21/0208—Noise filtering
- G10L2015/223—Execution procedure of a spoken command
Abstract
The invention relates to the field of voice recognition systems and discloses a VR (virtual reality) interaction system based on voice recognition. The system comprises a cloud end and a VR peripheral end: the cloud end comprises a voice recognition module, a semantic recognition module, a scene processing module, a storage module and a communication module; the VR peripheral end comprises a display module, a voice input module, a voice output module and likewise a communication module. The invention also discloses a method for the VR interaction system based on voice recognition, comprising the following steps: constructing the knowledge base and dialog library; opening the cloud end and the VR peripheral end; wearing the VR peripheral; user input; and cloud processing. The method effectively overcomes the poor interactivity and strong sense of detachment of existing VR products, and achieves a more natural interactive experience between the user and virtual scene characters.
Description
Technical Field
The invention relates to the field of voice recognition systems, and in particular to a VR (virtual reality) interaction system and method based on voice recognition.
Background
VR, short for virtual reality, is an important direction of simulation technology. It brings together simulation technology, computer graphics, man-machine interface technology, multimedia technology, sensing technology, network technology and others, and is a challenging interdisciplinary frontier of research. Virtual reality technology mainly covers the simulated environment, perception, natural skills and sensing devices. The simulated environment is a real-time, dynamic, three-dimensional realistic image generated by a computer. Perception means that an ideal VR system should provide every perception a person has: besides the visual perception generated by computer graphics, there are auditory, tactile, force and motion perceptions, and even smell and taste, collectively called multi-perception. Natural skills refer to head rotation, eye movement, gestures and other human actions; the computer processes data matching the participant's actions, responds to the user's input in real time, and feeds the results back to the user's senses. The sensing devices are three-dimensional interaction devices.
Virtual reality was proposed by the American company VPL Research in the 1980s. Its specific connotation is: a technology that comprehensively uses a computer graphics system and various real and control interface devices to provide an immersive sensation in an interactive three-dimensional environment generated on a computer. Virtual reality technology is a computer simulation system that creates and lets users experience a virtual world: the computer generates an interactive three-dimensional dynamic scene together with a simulation of entity behaviors based on multi-source information fusion, immersing the user in the environment.
VR technology has broad prospects in medicine, education, real estate and design. At present, VR interaction mainly relies on motion capture and gesture recognition, and the user experience is poor, so voice interaction has become a strong demand from users. Speech recognition techniques today largely fall into two directions: traditional acoustic models and deep learning models. The traditional technique, the acoustic model, builds a model by extracting the speaker's audio features under certain algorithms. The deep learning model has risen rapidly in recent years; a currently popular variant is the hidden Markov model based on a deep neural network, which learns a discriminative model from data. With continuing algorithmic progress and hardware upgrades, the advantages of deep learning models are increasingly obvious, so a speech recognition model based on deep learning is adopted here. Existing VR products based on speech recognition have poor interactivity and a strong sense of detachment, cannot achieve natural interaction between the user and virtual scene characters, and need improvement.
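The template-matching idea behind the traditional acoustic model mentioned above can be illustrated with a minimal sketch (the framing, the log-energy feature and the stored templates are hypothetical simplifications for illustration, not the patent's method): frame the audio, extract a crude per-frame feature, and pick the stored word template most similar to the input.

```python
import math

def frame_energies(samples, frame_len=160):
    """Per-frame log energy: a crude stand-in for the acoustic features
    (e.g. MFCCs) a real recognizer would extract."""
    feats = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append(math.log(energy + 1e-10))
    return feats

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match(feats, templates):
    """Template matching: return the stored word whose feature vector is
    most similar to the input features."""
    return max(templates, key=lambda w: cosine(feats, templates[w]))
```

A deep-learning recognizer replaces the hand-built feature/template comparison with a trained network, which is why it scales better as data and hardware grow.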
Disclosure of Invention
The present invention is directed to a VR interaction system and method based on speech recognition, so as to solve the problems in the background art.
In order to achieve the purpose, the invention provides the following technical scheme: a VR (virtual reality) interaction system based on voice recognition comprises a cloud end and a VR peripheral end. The cloud end comprises a voice recognition module, a semantic recognition module, a scene processing module, a storage module and a communication module; the VR peripheral end comprises a display module, a voice input module and a voice output module, and also comprises a communication module;
the voice recognition module performs primary processing of the user's voice: on the basis of the voice input module, it extracts voice features after noise reduction and dereverberation, then generates and validates a voice model through a deep-learning-based algorithm; the voice recognition module uses a plurality of algorithms and processing tools and is connected with the semantic recognition module;
the semantic recognition module performs further semantic processing on the output of the voice recognition module and deduces the user's intention; this stage analyzes the context to improve accuracy, and the semantic recognition module is connected with the scene processing module;
the scene processing module analyzes the recognition result of the semantic recognition module, adjusts the layout transformation of the scene accordingly, and outputs the result through the display module; this requires the module to call the knowledge base in the storage module for the relevant processing, and the scene processing module is connected with the storage module and the display module;
the storage module stores the knowledge base and the dialog library; the scene processing module calls the required entries according to the result of the previous step, the dialog is output through the voice output module, and the knowledge base result through the display module;
the voice input module comprises a plurality of audio input devices and is connected with the voice output module;
the voice output module carries out voice output on the result in the storage module;
the communication module is responsible for communication among the peripheral devices.
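The preprocessing chain the voice recognition module describes (noise reduction before feature extraction) might be sketched as follows. This is a hedged illustration only: a real system would use spectral denoising and dereverberation, whereas here a pre-emphasis filter and a crude energy gate stand in for them, and all names are invented.

```python
def pre_emphasis(samples, alpha=0.97):
    """High-pass pre-emphasis commonly applied before feature extraction."""
    return [samples[0]] + [samples[i] - alpha * samples[i - 1]
                           for i in range(1, len(samples))]

def noise_gate(samples, frame_len=160, threshold=0.01):
    """Keep only frames whose mean energy reaches the threshold: a very
    crude noise-reduction step standing in for real spectral denoising."""
    kept = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        if sum(s * s for s in frame) / frame_len >= threshold:
            kept.extend(frame)
    return kept
```

The gated, emphasized signal would then be framed and fed to the feature extractor and the deep-learning model described above.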
Preferably, the audio input device of the voice input module comprises a microphone.
Preferably, the device of the voice output module comprises an earphone power amplifier.
A method for the VR interaction system based on voice recognition comprises the following steps:
constructing the knowledge base and dialog library: first, store the corresponding dialog database in the storage module;
opening the cloud end and the VR peripheral end: after both are opened, verify that the communication module works normally;
wearing the VR peripheral: the user perceives the virtual scene after putting on the VR peripheral;
user input: the user inputs voice through the audio input peripheral, either following the prompts of the virtual scene or actively;
cloud processing: after processing at the cloud, the user receives response information through the earphone at the VR peripheral end, and at the same time sees the response actions and expressions of the virtual scene on the display device of the display module.
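The five steps above can be sketched end to end as a toy session (the dialog library entries, the intent keys, and the text-based stand-in for speech recognition are illustrative assumptions; the patent's cloud performs real voice and semantic recognition):

```python
def build_dialog_library():
    # Step 1: knowledge base / dialog library; the entries are illustrative
    return {
        "hello": ("Hello, traveler!", "wave"),
        "open door": ("The door creaks open.", "door_opens"),
    }

def handshake(cloud_ready, peripheral_ready):
    # Step 2: both ends opened and the communication module confirmed normal
    return cloud_ready and peripheral_ready

def cloud_process(utterance, library):
    # Steps 4-5: the utterance is "recognized" (here it is already text),
    # its intent is looked up, and the reply (for the earphone) plus the
    # scene action (for the display module) are returned
    return library.get(utterance.lower().strip(),
                       ("Sorry, say that again?", "idle"))

def run_session(utterances):
    library = build_dialog_library()            # step 1
    if not handshake(True, True):               # step 2
        raise RuntimeError("communication module not ready")
    # step 3 (wearing the peripheral) has no software counterpart here
    return [cloud_process(u, library) for u in utterances]
```

Each returned pair corresponds to the dual feedback the method describes: the first element goes to the voice output module, the second to the display module.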
Preferably, the method is applied as follows: in use, the user inputs audio through an input device such as a microphone; the audio is transmitted to the cloud, where the voice recognition module performs voice recognition to obtain the user information preliminarily, and the semantic recognition module then performs semantic recognition so that the cloud understands the user's instruction and deduces the user's intention; the instruction is then processed in the scene processing module according to that intention, and the result is transmitted to the display module. Meanwhile, the knowledge base in the storage module is called and the corresponding result is returned to the VR peripheral end, where the user watches the scene through the display module and listens to the dialog information through the audio device of the voice output module, obtaining dual visual and auditory feedback and stronger immersion.
Compared with the prior art, the invention has the following beneficial effects. The user communicates with the character in the virtual scene through the voice input module on the VR peripheral end, such as a microphone. The voice recognition module first removes interference factors from the surrounding environment by noise reduction, dereverberation and the like, then extracts voice features and performs analysis and modeling through a deep-learning algorithm based on a deep neural network to generate a voice model; it then compares and recognizes the voice information input by the user, parsing out its content and instruction information. On this basis, the semantic recognition module performs NLP word segmentation, keyword analysis and the like, combining the context to deduce the user's probable intention. The scene processing module then performs scene processing according to the result of the semantic recognition module: it calls the knowledge base in the storage module to produce the corresponding scene response, including graphics adjustment and context processing, and feeds back the action transformation of the virtual character as output. The processing results are transmitted to the display module on the VR peripheral end through a data line or other means, and the corresponding voice information and actions are output at the same time, so that dual perception of vision and hearing is achieved and the user's immersion is greatly enhanced.
The method effectively overcomes the poor interactivity and strong sense of detachment of existing VR products, and achieves a more natural interactive experience between the user and virtual scene characters.
Drawings
Fig. 1 is a schematic diagram of a module structure according to the present invention.
In the figure: 1. a cloud end; 2. a VR peripheral end; 3. a voice recognition module; 4. a semantic recognition module; 5. a scene processing module; 6. a storage module; 7. a display module; 8. a voice input module; 9. and a voice output module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
Referring to fig. 1, the present invention provides a technical solution: a VR interaction system based on voice recognition comprises a cloud end 1 and a VR peripheral end 2. The cloud end 1 comprises a voice recognition module 3, a semantic recognition module 4, a scene processing module 5, a storage module 6 and a communication module; the VR peripheral end 2 comprises a display module 7, a voice input module 8 and a voice output module 9, and also comprises a communication module;
the voice recognition module 3 performs primary processing of the user's voice: on the basis of the voice input module 8, it extracts voice features after noise reduction and dereverberation, then generates and validates a voice model through a deep-learning-based algorithm; the voice recognition module 3 uses a plurality of algorithms and processing tools and is connected with the semantic recognition module 4;
the semantic recognition module 4 performs further semantic processing on the output of the voice recognition module 3 and deduces the user's intention; this stage analyzes the context to improve accuracy, and the semantic recognition module 4 is connected with the scene processing module 5;
the scene processing module 5 analyzes the recognition result of the semantic recognition module 4, adjusts the layout transformation of the scene accordingly, and outputs the result through the display module 7; this requires the module to call the knowledge base in the storage module 6 for the relevant processing, and the scene processing module 5 is connected with the storage module 6 and the display module 7;
the storage module 6 stores the knowledge base and the dialog library; the scene processing module 5 calls the required entries according to the result of the previous step, the dialog is output through the voice output module 9, and the knowledge base result through the display module 7;
the voice input module 8 comprises a plurality of audio input devices, and the voice input module 8 is connected with the voice output module 9;
the voice output module 9 outputs the result in the storage module 6 in voice;
the communication module is responsible for communication among the peripheral devices.
Further, the audio input device of the voice input module 8 includes a microphone, and the device of the voice output module 9 includes an earphone power amplifier.
The principle method based on the system in the embodiment comprises the following steps:
constructing the knowledge base and dialog library: first, store the corresponding dialog database in the storage module 6;
opening the cloud end 1 and the VR peripheral end 2: after both are opened, verify that the communication module works normally;
wearing the VR peripheral: the user perceives the virtual scene after putting on the VR peripheral;
user input: the user inputs voice through the audio input peripheral, either following the prompts of the virtual scene or actively;
cloud 1 processing: after processing at the cloud 1, the user receives the response information through the earphone at the VR peripheral end, and sees the response actions and expressions of the virtual scene on the display device of the display module 7.
Based on the above method steps, the specific use is as follows: the user inputs audio through an input device such as a microphone; the audio is transmitted to the cloud 1, where the voice recognition module 3 performs voice recognition to obtain the user information preliminarily, and the semantic recognition module 4 then performs semantic recognition so that the cloud 1 understands the user's instruction and deduces the user's intention; the instruction is then processed in the scene processing module 5 according to that intention, and the result is transmitted to the display module 7. Meanwhile, the knowledge base in the storage module 6 is called and the corresponding result is returned to the VR peripheral end 2, where the user watches the scene through the display module 7 and listens to the dialog information through the audio devices of the voice input module 8 and voice output module 9, obtaining dual visual and auditory feedback and stronger immersion.
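The context analysis that the semantic recognition module 4 performs (combining recent utterances to deduce intent) might look like the following sketch: keyword matching over a sliding window of recent utterances. The intent and keyword tables are invented for illustration; the patent's actual NLP pipeline is not specified at this level of detail.

```python
class SemanticRecognizer:
    """Toy intent inference over recognized text, using a sliding window
    of recent utterances as context."""

    def __init__(self, intents):
        self.intents = intents   # intent name -> list of trigger keywords
        self.history = []        # recent utterances kept for context

    def infer(self, text):
        self.history.append(text)
        window = " ".join(self.history[-3:])   # last few utterances
        best, best_hits = "unknown", 0
        for intent, keywords in self.intents.items():
            hits = sum(1 for k in keywords if k in window)
            if hits > best_hits:
                best, best_hits = intent, hits
        return best
```

Keeping a window rather than the single latest utterance is what lets an ambiguous follow-up ("yes, that one") resolve against what was said before.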
It should be noted that the invention is not limited to a cloud plus VR peripheral: the scene control system and the intelligent voice system may reside on any device independent of the VR peripheral, of which a cloud is one possibility; the cloud is used as the example here for ease of understanding.
A general-purpose core processor is kept independent of the VR peripheral end: because a large amount of data must be computed, a processor of high computing performance is needed, and at the present stage no processor meets the requirements of an all-in-one machine integrating the peripherals. The invention therefore changes the approach and places the processing in the cloud; this arrangement benefits from better networking performance and is better suited to big-data processing.
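Moving the processing to the cloud means microphone audio must be transmitted over the network in some wire format. A minimal length-prefixed framing could look like the sketch below; the patent does not specify a protocol, so the field layout here is purely an assumption for illustration.

```python
import struct

def pack_audio_chunk(seq, pcm_bytes):
    """Length-prefixed frame for streaming microphone audio to the cloud:
    4-byte big-endian sequence number, 4-byte payload length, raw PCM."""
    return struct.pack(">II", seq, len(pcm_bytes)) + pcm_bytes

def unpack_audio_chunk(frame):
    """Inverse of pack_audio_chunk: recover the sequence number and PCM."""
    seq, length = struct.unpack(">II", frame[:8])
    return seq, frame[8:8 + length]
```

The sequence number lets the cloud detect dropped or reordered chunks before feeding the audio into the recognition pipeline.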
Under this new approach, the user communicates with the character in the virtual scene through the voice input module 8 on the VR peripheral 2, such as a microphone. The voice recognition module 3 first removes interference factors from the surrounding environment by noise reduction, dereverberation and the like, then extracts voice features and performs analysis and modeling through a deep-learning algorithm based on a deep neural network to generate a voice model; it then compares and recognizes the voice information input by the user, parsing out its content and instruction information. On this basis, the semantic recognition module 4 performs NLP word segmentation, keyword analysis and the like, and the cloud 1 combines the context to deduce the user's probable intention. The scene processing module 5 then performs scene processing according to the result of the semantic recognition module 4: it calls the knowledge base in the storage module 6 to produce the corresponding scene response, including graphics adjustment and context processing, and feeds back the action transformation of the virtual character as output. The storage module 6 contains a knowledge graph base, a dialog library and the like; on the basis of the scene processing, the cloud 1 returns a response dialog from the dialog library of the storage module 6 while performing the related scene processing, such as response expressions or actions, in the scene processing module 5 according to the user's instructions. The processing results are transmitted to the display module 7 on the VR peripheral end 2 through a data line or other means, and the corresponding voice information is output at the same time, so that dual perception of vision and hearing is achieved and the user's immersion is greatly enhanced.
Through the above process analysis, the poor interactivity and strong sense of detachment of existing VR products are effectively overcome, and a more natural interactive experience between the user and virtual scene characters is realized.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative examples and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims should not be construed as limiting the claim concerned.
Claims (5)
1. A VR interaction system based on voice recognition, characterized in that: the system comprises a cloud end (1) and a VR peripheral end (2), wherein the cloud end (1) comprises a voice recognition module (3), a semantic recognition module (4), a scene processing module (5), a storage module (6) and a communication module; the VR peripheral end (2) comprises a display module (7), a voice input module (8) and a voice output module (9), and also comprises a communication module;
the voice recognition module (3) performs primary processing of the user's voice: on the basis of the voice input module (8), voice features are extracted after noise reduction and dereverberation, and a voice model is then generated and validated through a deep-learning-based algorithm; a plurality of algorithms and processing tools are used in this processing, and the voice recognition module (3) is connected with the semantic recognition module (4);
the semantic recognition module (4) performs further semantic processing on the output of the voice recognition module (3) and deduces the user's intention; this stage analyzes the context to improve accuracy, and the semantic recognition module (4) is connected with the scene processing module (5);
the scene processing module (5) analyzes the recognition result of the semantic recognition module (4), adjusts the layout transformation of the scene accordingly, and outputs the result through the display module (7); this requires the module to call the knowledge base in the storage module (6) for the relevant processing, and the scene processing module (5) is connected with the storage module (6) and the display module (7);
the storage module (6) stores the knowledge base and the dialog library; the scene processing module (5) calls the required entries according to the result of the previous step, the dialog is output through the voice output module (9), and the knowledge base result through the display module (7);
the voice input module (8) comprises a plurality of audio input devices, and the voice input module (8) is connected with the voice output module (9);
the voice output module (9) outputs the result in the storage module (6) in a voice mode;
the communication module is responsible for communication among the peripheral devices.
2. The VR interaction system of claim 1, wherein: the audio input device of the voice input module (8) comprises a microphone.
3. The VR interaction system of claim 1, wherein: the equipment of the voice input module (8) comprises an earphone power amplifier.
4. A method for the voice recognition based VR interaction system of any of claims 1-3, characterized by comprising the following steps:
constructing the knowledge base and dialog library: first, storing the corresponding dialog library in the storage module (6);
starting the cloud (1) and the VR peripheral end (2): after the cloud (1) and the VR peripheral end (2) are started, verifying that the communication module works normally;
wearing the VR peripheral: after putting on the VR peripheral, the user perceives the virtual scene;
user input: the user inputs voice through the audio input peripheral, either following prompts from the virtual scene or on the user's own initiative;
cloud (1) processing: after processing at the cloud (1), the user receives response information through the earphone at the VR peripheral end, while the response actions and expressions of the virtual scene appear on the display device of the display module (7).
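The five method steps above run as a linear sequence. A minimal sketch follows; the function names, dictionary keys, and sample utterances are hypothetical illustrations, not terms from the patent:

```python
# Hypothetical sketch of the claimed method steps (claim 4). Each step
# is a stub; a real system would talk to the cloud (1) and the VR
# peripheral end (2) over the communication module.

def build_knowledge_base():
    # Step 1: store the dialog library and knowledge base in the
    # storage module (6).
    return {"dialog": {"hi": "Hello!"}, "knowledge": {"hi": "greeting scene"}}

def start_endpoints():
    # Step 2: start the cloud (1) and the VR peripheral end (2),
    # and verify that the communication module works.
    return {"cloud": "up", "peripheral": "up", "link_ok": True}

def run_session(storage, endpoints, user_utterance):
    # Steps 3-5: the user wears the VR peripheral, speaks, and the
    # cloud returns an audio reply plus a scene update.
    assert endpoints["link_ok"], "communication module must be verified first"
    reply = storage["dialog"].get(user_utterance, "")
    scene = storage["knowledge"].get(user_utterance, "")
    return {"audio_reply": reply, "scene_update": scene}

storage = build_knowledge_base()
endpoints = start_endpoints()
result = run_session(storage, endpoints, "hi")
print(result["audio_reply"])   # heard through the earphone
print(result["scene_update"])  # shown by the display module (7)
```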
5. The method of claim 4, wherein the method comprises the following specific application of the method steps: in use, the user inputs audio through an input device such as a microphone; the audio is transmitted to the cloud (1), where the voice recognition module (3) performs voice recognition to obtain preliminary user information, and the semantic recognition module (4) then performs semantic recognition, so that the cloud (1) understands the user's instruction and infers the user's intention; the instruction is then processed in the scene processing module (5) according to that intention, and the result is transmitted to the display module (7); at the same time, the knowledge base in the storage module (6) is called and the corresponding result is returned to the VR peripheral end (2), so that the user watches the response through the display module (7) and listens to the dialog through the audio devices corresponding to the voice input module (8) and the voice output module (9), obtaining dual visual and auditory feedback and therefore greater immersion.
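The dual visual/auditory feedback of claim 5 amounts to fanning one cloud result out to two output channels. A minimal sketch, with all names assumed for illustration:

```python
# Hypothetical fan-out of a single cloud result to the two feedback
# channels described in claim 5: the display module (7) for vision and
# the voice output module (9) for hearing.

def fan_out(cloud_result, channels):
    # Deliver the same response to every registered output channel,
    # returning the delivery order for inspection.
    delivered = []
    for name, sink in channels.items():
        sink(cloud_result)
        delivered.append(name)
    return delivered

visual_log, audio_log = [], []
channels = {
    "display_module_7": visual_log.append,
    "voice_output_module_9": audio_log.append,
}

delivered = fan_out({"text": "Welcome", "action": "wave"}, channels)
print(delivered)  # both channels received the same response
```

Dispatching the same structured response to both channels is what yields the simultaneous visual and auditory feedback the claim describes.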
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910986351.7A CN110895931A (en) | 2019-10-17 | 2019-10-17 | VR (virtual reality) interaction system and method based on voice recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110895931A true CN110895931A (en) | 2020-03-20 |
Family
ID=69786337
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910986351.7A Pending CN110895931A (en) | 2019-10-17 | 2019-10-17 | VR (virtual reality) interaction system and method based on voice recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110895931A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106550156A (en) * | 2017-01-23 | 2017-03-29 | 苏州咖啦魔哆信息技术有限公司 | A kind of artificial intelligence's customer service system and its implementation based on speech recognition |
CN109841217A (en) * | 2019-01-18 | 2019-06-04 | 苏州意能通信息技术有限公司 | A kind of AR interactive system and method based on speech recognition |
US20190198019A1 (en) * | 2017-12-26 | 2019-06-27 | Baidu Online Network Technology (Beijing) Co., Ltd | Method, apparatus, device, and storage medium for voice interaction |
CN110335595A (en) * | 2019-06-06 | 2019-10-15 | 平安科技(深圳)有限公司 | Slotting based on speech recognition asks dialogue method, device and storage medium |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111696536A (en) * | 2020-06-05 | 2020-09-22 | 北京搜狗科技发展有限公司 | Voice processing method, apparatus and medium |
CN111696536B (en) * | 2020-06-05 | 2023-10-27 | 北京搜狗智能科技有限公司 | Voice processing method, device and medium |
CN111768768A (en) * | 2020-06-17 | 2020-10-13 | 北京百度网讯科技有限公司 | Voice processing method and device, peripheral control equipment and electronic equipment |
CN111768768B (en) * | 2020-06-17 | 2023-08-29 | 北京百度网讯科技有限公司 | Voice processing method and device, peripheral control equipment and electronic equipment |
CN111986297A (en) * | 2020-08-10 | 2020-11-24 | 山东金东数字创意股份有限公司 | Virtual character facial expression real-time driving system and method based on voice control |
CN111939558A (en) * | 2020-08-19 | 2020-11-17 | 北京中科深智科技有限公司 | Method and system for driving virtual character action by real-time voice |
CN112216278A (en) * | 2020-09-25 | 2021-01-12 | 威盛电子股份有限公司 | Speech recognition system, instruction generation system and speech recognition method thereof |
CN113672155A (en) * | 2021-07-02 | 2021-11-19 | 浪潮金融信息技术有限公司 | Self-service operating system, method and medium based on VR technology |
CN113672155B (en) * | 2021-07-02 | 2023-06-30 | 浪潮金融信息技术有限公司 | VR technology-based self-service operation system, method and medium |
CN117391822A (en) * | 2023-12-11 | 2024-01-12 | 中汽传媒(天津)有限公司 | VR virtual reality digital display method and system for automobile marketing |
CN117391822B (en) * | 2023-12-11 | 2024-03-15 | 中汽传媒(天津)有限公司 | VR virtual reality digital display method and system for automobile marketing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110895931A (en) | VR (virtual reality) interaction system and method based on voice recognition | |
WO2022048403A1 (en) | Virtual role-based multimodal interaction method, apparatus and system, storage medium, and terminal | |
CN106653052B (en) | Virtual human face animation generation method and device | |
WO2022052481A1 (en) | Artificial intelligence-based vr interaction method, apparatus, computer device, and medium | |
CN111145322B (en) | Method, apparatus, and computer-readable storage medium for driving avatar | |
US20230042654A1 (en) | Action synchronization for target object | |
CN113454708A (en) | Linguistic style matching agent | |
CN108877336A (en) | Teaching method, cloud service platform and tutoring system based on augmented reality | |
JP2022524944A (en) | Interaction methods, devices, electronic devices and storage media | |
CN107003825A (en) | System and method with dynamic character are instructed by natural language output control film | |
CN112668407A (en) | Face key point generation method and device, storage medium and electronic equipment | |
Morishima | Real-time talking head driven by voice and its application to communication and entertainment | |
CN205451551U (en) | Speech recognition driven augmented reality human -computer interaction video language learning system | |
El Haddad et al. | Laughter and smile processing for human-computer interactions | |
KR20060091329A (en) | Interactive system and method for controlling an interactive system | |
US20220301250A1 (en) | Avatar-based interaction service method and apparatus | |
Ding et al. | Interactive multimedia mirror system design | |
CN114201596A (en) | Virtual digital human use method, electronic device and storage medium | |
Chandrasiri et al. | Internet communication using real-time facial expression analysis and synthesis | |
Morishima et al. | Face-to-face communicative avatar driven by voice | |
Leandro Parreira Duarte et al. | Coarticulation and speech synchronization in MPEG-4 based facial animation | |
Ohsugi et al. | A Comparative Study of Statistical Conversion of Face to Voice Based on Their Subjective Impressions. | |
Santos-Pérez et al. | AVATAR: an open source architecture for embodied conversational agents in smart environments | |
Sundblad et al. | OLGA—a multimodal interactive information assistant | |
Zoric et al. | Towards facial gestures generation by speech signal analysis using huge architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20200320 |