CN110471531A - Multi-modal interactive system and method in virtual reality - Google Patents
- Publication number: CN110471531A
- Application number: CN201910749937.1A
- Authority
- CN
- China
- Prior art keywords
- user
- information
- dialog
- virtual reality
- content
- Prior art date: 2019-08-14
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
Abstract
The present invention provides a multi-modal human-computer dialogue learning system, method and apparatus in virtual reality. A user can interact with the computer intuitively through multiple interaction modes; the three-dimensional scene changes dynamically with the user's operations and intentions; the various senses and interaction modes are interpreted, and contextual understanding is better realized through data fusion. On this basis, anthropomorphic interlocutor feedback is given to the user, achieving a better practice effect for human-machine natural-language foreign-language learning.
Description
Technical Field
The invention relates to the technical field of data processing, and in particular to a system, method and device for multi-modal human-machine dialogue learning in Virtual Reality (VR).
Background
The invention is motivated by two observations. First, real foreign-teacher education resources are scarce or expensive: the pain point of the current one-to-one North American foreign-teacher model is that it does not scale economically, and the losses of many companies grow as they scale up. The online language education industry is therefore trying to relieve the bottlenecks of high teacher cost and scarce supply through teaching by AI virtual teachers. Second, mainstream foreign-language human-machine dialogue learning systems ask the user to repeat standard sentences, which are then scored by a speech evaluation system; such learning modes only train the user to understand what others say, improving listening and spoken pronunciation, but do not exercise the user's own subjective expression.
Disclosure of Invention
In recent years, with the development of speech recognition, spoken dialogue systems, speech evaluation models, speech synthesis and virtual reality technologies, natural dialogue between humans and computers has improved greatly: a computer can understand a user's weather queries, answer shopping questions, look up ticket information, and even ask follow-up questions when speech recognition is uncertain, answering in various tones according to the required persona. However, through long-term observation and experiments the inventor has found that, applied to online foreign-language teaching, this technology has at least the following shortcomings:
First, on a mobile phone or tablet, a realistic language context is created through rich-text content, but this mode of recorded foreign-teacher pronunciation matched with pictures and sound effects still cannot produce a satisfying sense of immersion.
Second, VR is well suited to creating an immersive foreign-language learning environment; however, existing online-teaching VR systems are limited to delivering pre-programmed VR video in a fixed mode. It should be understood that a person's language ability does not only mean being able to form grammatically correct sentences, but also includes the ability to use language appropriately and vividly, and this ability needs a high-frequency, natural, interactive dialogue process to cultivate language sense and improve comprehensive foreign-language competence. A VR system lacking a human-machine dialogue function is therefore of limited help to the user's foreign-language ability.
Third, single-channel speech dominates human-machine interaction in foreign-language teaching; however, a person's head movements, gestures, body movements and emotional changes while speaking are important information feedback in natural conversation, and multi-modal human-machine dialogue technology has so far seen only preliminary academic exploration.
In view of the above defects in the prior art, the present invention provides a system, method and device for multi-modal human-machine dialogue learning in virtual reality, so that a student user can interact with the computer intuitively through multiple interaction modes; the three-dimensional space changes dynamically with the student user's operations and intentions, the various senses and interaction modes are interpreted, and contextual understanding is better realized through data fusion. On this basis, anthropomorphic interlocutor feedback is given, thereby achieving a better training effect for human-machine natural-language foreign-language learning.
In one aspect, the present invention provides a multi-modal human-machine dialog system, comprising: a virtual reality device configured to create a virtual space and manipulate a virtual avatar; an information acquisition module configured to acquire and receive user information from a user through the virtual reality device; an information processing module configured to fuse the received user information to generate multi-modal collaborative dialog content; and an information output module configured to output the multi-modal collaborative dialog content to the virtual reality device so as to correspondingly manipulate the virtual avatar in the virtual space.
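For illustration only, this four-part decomposition might be outlined in code as follows. This is a minimal Python sketch under assumed interfaces: ChannelSignal, read_signals, play_response and every other name below are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ChannelSignal:
    channel: str      # e.g. "speech", "head_tracking", "handle"
    payload: object   # raw or pre-processed signal content
    timestamp: float  # capture time, used later for temporal fusion

class InformationAcquisition:
    """Collects multi-channel user information from the VR device."""
    def collect(self, device) -> List[ChannelSignal]:
        # `device.read_signals()` stands in for a real VR device API.
        return device.read_signals()

class InformationProcessing:
    """Fuses multi-channel signals into multi-modal dialog content."""
    def fuse(self, signals: List[ChannelSignal]) -> Dict[str, list]:
        # Speech is kept as the primary channel; everything else is
        # retained as fusion context (see the constraint relationships below).
        speech = [s for s in signals if s.channel == "speech"]
        others = [s for s in signals if s.channel != "speech"]
        return {"speech": speech, "context": others}

class InformationOutput:
    """Drives the virtual avatar in the virtual space with the response."""
    def render(self, device, dialog_content: Dict[str, list]) -> None:
        # `device.play_response(...)` is likewise a placeholder call.
        device.play_response(dialog_content)
```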
In some embodiments, optionally, the virtual reality device includes one or more of the following: a virtual reality head-mounted display, a virtual reality base station, a handheld controller, a somatosensory device, a host computer, a display screen, sensors, an audio capture device and an audio playback device; and the user information comprises one or more of the following channel signals: user speech, emotion analysis, head tracking, gaze interaction, spatial localization, handle and gesture signals, and somatosensory signals.
In some embodiments, optionally, the information processing module is further configured to fuse different channel signals in the user information at different timings.
In some embodiments, optionally, the information processing module is further configured to fuse different channel signals in the user information according to different timing relationships and/or constraint relationships between other features and the speech to obtain the multi-modal collaborative dialog content.
In some embodiments, optionally, the constraint relationship comprises one or more of the following relationships: an alternative relationship, a complementary relationship, and an enhancement relationship. An alternative relationship means that the semantic information represented by different channel signals is similar and/or the signals can replace each other; a complementary relationship means that the speech content in the conversation needs other channel signals as a supplement to form complete semantics; and an enhancement relationship means that the semantic information represented by different channel signals is relatively independent and/or one signal can enhance the expressive effect of another.
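As a reading aid, the three constraint relationships could be encoded as a small classification. The sketch below merely restates the definitions above; the enum, the lookup table and the cue pairs in it are invented for illustration and are not part of the patent.

```python
from enum import Enum, auto

class ConstraintRelation(Enum):
    ALTERNATIVE = auto()    # channels carry similar, mutually replaceable semantics
    COMPLEMENTARY = auto()  # speech needs another channel to complete the semantics
    ENHANCEMENT = auto()    # semantics are independent; one channel strengthens another

# Hand-written placeholder mapping from (speech cue, other-channel cue) pairs
# to the relation they are assumed to stand in. A real system would derive
# this from rules or a trained model rather than from a fixed table.
EXAMPLE_RELATIONS = {
    ("yes", "nod"): ConstraintRelation.ALTERNATIVE,        # the nod restates "yes"
    ("he", "pointing"): ConstraintRelation.COMPLEMENTARY,  # pointing resolves "he"
    ("statement", "fear"): ConstraintRelation.ENHANCEMENT, # emotion adds intensity
}
```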
In some embodiments, optionally, when the different channel signals are ambiguous with respect to semantic understanding, the information processing module is further configured to: if the information between the channels is in an alternative relationship, make a dialog response according to the speech content; if the information between the channels is in a complementary relationship, combine the information of the complementary channels to eliminate the ambiguity; if the information between the channels is in an enhancement relationship, make a dialog response according to the speech content and the emotional intensity; and if the ambiguity cannot be eliminated, make a suggestive inquiry according to the context of the current conversation so as to obtain further feedback from the user.
In some embodiments, optionally, the information processing module is further configured to switch the dialog control right between the user and the system according to questions or counter-questions raised by the user or the system during their conversation.
In another aspect, the present application further provides a multi-modal human-machine dialog method, comprising the following steps: receiving user information from a user, the user information comprising multi-channel signals; fusing the multi-channel signals according to a timing relationship and/or a constraint relationship to generate multi-modal collaborative dialog content; and outputting the multi-modal collaborative dialog content to a virtual reality device so as to correspondingly manipulate a virtual avatar in the virtual space.
In some embodiments, optionally, the fusing step comprises: if the multi-channel signal comprises only speech information, making a dialog response according to the speech content; if the multi-channel signal is semantically ambiguous, fusing the multi-channel signals according to the constraint relationship; and if the ambiguity cannot be eliminated, making a suggestive inquiry according to the context of the current conversation so as to obtain further feedback from the user.
In some embodiments, optionally, during a user's dialog with the system, the dialog control right passes between the user and the system according to questions or counter-questions raised by either side.
Compared with the prior art, the invention has at least the following beneficial effects:
First, the invention combines multi-modal interaction technology with virtual reality technology; the resulting novel human-machine dialogue system lets student users engage in truly immersive learning activities, and the whole interaction process is vivid and lifelike, greatly raising students' interest in learning and promoting learning transfer.
Second, the technique of the present invention improves the naturalness of human-machine conversation. A multi-modal dialog system built on the fusion of multi-channel signals such as speech, head pose, gestures and emotion gives the computer more information to work with, so that student users have a more natural experience throughout the dialog.
The conception, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawings, so that its objects, features and effects can be fully understood.
Drawings
The present invention will become more readily understood from the following detailed description when read in conjunction with the accompanying drawings, wherein like reference numerals designate like parts throughout the figures, and in which:
FIG. 1 is a schematic structural diagram of functional modules according to an embodiment of the present invention.
FIG. 2 is a block diagram of program modules according to an embodiment of the present invention.
FIG. 3 is a conversation policy logic diagram of one embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below; obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The present invention may be embodied in many different forms, and its scope is not limited to the embodiments set forth herein. All other embodiments obtained by a person of ordinary skill in the art, based on the embodiments of the present invention and without any inventive step, shall fall within the scope of protection of the present invention.
The multi-modal human-machine dialog learning system and device can include a contextual classroom and a multi-modal human-machine dialog learning system. Virtual reality equipment such as a VR head-mounted display, a VR base station, a handheld controller, a somatosensory device and a computer host is arranged in the contextual classroom. The VR head-mounted display can further comprise a display screen, sensors, an audio capture device and an audio playback device. The VR base station and the handheld controller turn the room into a tracked three-dimensional space, allowing the user to move around in the virtual world and to manipulate the virtual avatar through the motion-tracked handheld controller. The computer host runs the multi-modal human-machine dialog learning system, which is connected to the VR head-mounted display and the somatosensory device respectively.
FIG. 1 is a schematic structural diagram of functional modules according to an embodiment of the present invention. As shown in FIG. 1, the multi-modal human-machine dialog system provided by the present invention may include, besides the virtual reality device, an information acquisition module, an information processing module and an information output module. The virtual reality device can create a virtual space and manipulate a virtual avatar, and may include one or more of the following: a virtual reality head-mounted display, a virtual reality base station, a handheld controller, a somatosensory device, a host computer, a display screen, sensors, an audio capture device and an audio playback device. The information acquisition module can acquire and receive user information from the user through the virtual reality device; the user information can include one or more of the following channel signals: user speech, emotion analysis, head tracking, gaze interaction, spatial localization, handle and gesture signals, and somatosensory signals. The information processing module fuses the received user information to generate multi-modal collaborative dialog content, and the information output module outputs this content to the virtual reality device so as to correspondingly control the virtual avatar in the virtual space.
In some embodiments, the information acquisition module receives information from different channels, such as speech, speech emotion, head tracking, spatial orientation, handle signals and somatosensory signals, through the VR head-mounted display and its input devices; the information processing module then generates the multi-modal collaborative dialog content. The information output module synchronously outputs the virtual character's spoken response in the virtual reality scene, together with the matching visuals and actions, to the VR head-mounted display.
FIG. 2 is a block diagram of program modules according to an embodiment of the present invention. With this technical solution, the student user wears the VR headset in the contextual classroom and enters the multi-modal human-machine dialog learning system shown in FIG. 2. A novice guidance module first familiarizes the student user with operations such as voice recording, head and spatial positioning, and handle buttons; the system then enters the learning target module, where the student user learns the background of the story and the learning tasks to be completed. The system next displays the virtual scene with scene sequence number 1 together with the displayable virtual characters in it, and the system's virtual characters conduct a human-machine dialog with the student user according to the storyline of the functional script.
During the conversation between the user and the system, the information processing module can also switch the dialog control right between the user and the system according to questions or counter-questions raised by either side. In some embodiments, the storyline of the functional script adopts a mixed-initiative mode: both the user and the system can hold the control right of the dialog, both can ask or counter-ask questions, and the control right passes back and forth as the conversation progresses, making the multi-modal human-machine dialog more like the interaction pattern of a real human conversation.
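A minimal sketch of this mixed-initiative turn-taking is given below, under the simplifying assumption that a turn ending in a question mark is a question; the class name and heuristic are illustrative, not the patent's mechanism.

```python
class MixedInitiativeManager:
    """Toy model of mixed initiative: whichever party asks or counter-asks
    a question takes the dialog control right."""

    def __init__(self) -> None:
        self.controller = "system"  # the system opens the scripted story

    def observe_turn(self, speaker: str, utterance: str) -> None:
        # Naive heuristic: ending a turn with "?" hands control to the
        # asker; a real system would use dialog-act recognition instead.
        if utterance.strip().endswith("?"):
            self.controller = speaker

    def next_speaker(self) -> str:
        # The party NOT holding control is expected to respond next.
        return "user" if self.controller == "system" else "system"

mgr = MixedInitiativeManager()
mgr.observe_turn("system", "May I know your name?")  # the system asks
assert mgr.next_speaker() == "user"                  # the user should answer
mgr.observe_turn("user", "Who is the suspect?")      # the user counter-asks
assert mgr.next_speaker() == "system"                # control has passed over
```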
With this technical solution, human-machine dialog requires handling the fusion of different channel signals at different timings in the conversation. In some embodiments, for example in an English teaching scenario, the technical solution of the present invention may treat speech as the primary channel and perform fusion feedback on the dialog-initiative and dialog-control strategies according to the different timing relationships and constraint relationships between the other features and the speech; this is key to improving the naturalness of multi-modal human-machine dialog.
In some embodiments, the information processing module can fuse different channel signals in the user information at different timings, and can fuse them according to the different timing relationships and/or constraint relationships between the other features and the speech to obtain the multi-modal collaborative dialog content.
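One possible reading of timing-based fusion is to group each non-speech event with the speech segment it overlaps in time. In the sketch below, the event tuple layouts and the window size are assumptions, not details taken from the patent.

```python
def align_to_speech(speech_events, other_events, window=1.5):
    """Attach non-speech channel events to the speech segment they accompany.

    speech_events: list of (utterance, t_start, t_end)
    other_events:  list of (channel, payload, t)
    An event is grouped with a segment if its timestamp falls within
    `window` seconds of the segment's span.
    """
    aligned = []
    for text, t_start, t_end in speech_events:
        attached = [e for e in other_events
                    if t_start - window <= e[2] <= t_end + window]
        aligned.append({"speech": text, "signals": attached})
    return aligned

# Example: a nod 0.4 s after "Yes, I am" ends is grouped with that utterance.
print(align_to_speech([("Yes, I am", 2.0, 3.0)],
                      [("head_tracking", "nod", 3.4)]))
```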
The constraint relationships may include one or more of the following: an alternative relationship, a complementary relationship, and an enhancement relationship. An alternative relationship means that the semantic information represented by different channel signals is similar and/or the signals can substitute for each other; a complementary relationship means that the speech content in the conversation needs other channel signals as a supplement to form complete semantics; an enhancement relationship means that the semantic information represented by different channel signals is relatively independent and/or one signal can enhance the expressive effect of another.
In some embodiments, in the multi-modal human-machine dialog learning system, the different channel signals may include: user speech, emotion analysis, head tracking, gaze interaction, spatial localization, handle and gesture signals, and somatosensory signals. Different channels influence the voice interaction differently, and the multi-modal collaborative dialog content can be fused according to the information alternative relationship, the information complementary relationship, the information enhancement relationship, and so on.
With this technical solution, the dialog management module takes into account the semantic relevance of the different channel signals, realizes fusion processing at different levels, and formulates various dialog strategies according to the teaching targets; this effectively improves the naturalness of the human-machine dialog and lets student users learn appropriate language expressions in the scene. The dialog information is finally delivered by the virtual character through the speech synthesis module.
FIG. 3 is a conversation-policy logic diagram of one embodiment of the present invention. As shown in FIG. 3, after a multi-channel signal input is received, when there are multiple channel signals and the semantic understanding is ambiguous: if the information between the channels is in an alternative relationship, a dialog response is made according to the speech content, because the word senses can substitute for each other; if the information between the channels is in a complementary relationship, the information of the complementary channels is combined to eliminate the ambiguity, since the speech content needs the other channel information to form complete semantics; if the information between the channels is in an enhancement relationship, the expressive effect of the speech channel can be strengthened, so a dialog response is made according to the speech content and the emotional intensity; and if the ambiguity cannot be resolved, a suggestive inquiry is made according to the context of the current conversation, asking the user for feedback.
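The branching logic of FIG. 3 can be condensed into a short sketch. The function below is an illustrative reading of the figure's description only; the argument names and return convention are invented, not the patent's implementation.

```python
def dialog_policy(speech, signals, relation=None, emotion_intensity=None):
    """`relation` is "alternative", "complementary", "enhancement",
    or None when the relation between channels cannot be determined."""
    if not signals:                        # speech-only input
        return ("respond", speech)
    if relation == "alternative":          # channels restate each other,
        return ("respond", speech)         # so the speech alone suffices
    if relation == "complementary":        # e.g. pointing resolves a pronoun
        resolved = f"{speech} [disambiguated by {signals[0]}]"
        return ("respond", resolved)
    if relation == "enhancement":          # emotion modulates the response
        return ("respond", speech, emotion_intensity)
    # Ambiguity persists: ask a suggestive, context-based follow-up question.
    return ("clarify", "suggestive question built from the dialog context")
```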
The invention also provides a multi-modal human-machine dialog method, comprising the following steps: receiving user information from a user, the user information comprising multi-channel signals; fusing the multi-channel signals according to the timing relationship and/or constraint relationship to generate multi-modal collaborative dialog content; and outputting the multi-modal collaborative dialog content to the virtual reality device so as to correspondingly control the virtual avatar in the virtual space.
In some embodiments, if the multi-channel signal includes only speech information, a dialog response is made based on the speech content; if the multi-channel signal is semantically ambiguous, the multi-channel signals are fused according to the constraint relationship; and if the ambiguity cannot be eliminated, a suggestive inquiry is made according to the context of the current conversation so as to obtain further feedback from the user. During the conversation, the dialog control right passes between the user and the system according to questions or counter-questions raised by either side.
In some embodiments, a multi-modal human-machine dialog learning system and device comprises a contextual classroom and a multi-modal human-machine dialog learning system. In practice, the working process is as follows:
The student user wears the VR headset in the contextual classroom and enters the multi-modal human-machine dialog learning system shown in the program-module block diagram of FIG. 2. The novice guidance module first familiarizes the student user with operations such as voice recording, head and spatial positioning, and handle buttons; the system then enters the learning target module, where the student user learns the background of the story and the learning tasks to be completed, for example: in a detective story context, expressing "What" and "Why" questions. The system then displays the virtual scene with scene sequence number 1 together with the displayable virtual characters in it, and the system's virtual characters conduct a human-machine dialog with the student user according to the storyline of the functional script. The storyline may be that the student user role-plays a detective assisting the police in handling a case, conversing with a virtual police-officer character to find out who the suspect is.
In the multi-mode information fusion module, a voice signal and a head tracking signal Are received at the same time, a student user answers the question "Are you for?" and the head tracking signal receives a nod, the dialogue management module regards the answer as a positive reply according to a dialogue strategy that the information is in an alternative relation, and in order to enable the student user to learn a proper language expression statement in the scene, if the student user needs to be actively introduced for the first time, the student user can inquire about "May I know your name?" through a virtual role in the voice synthesis module.
In some examples, the multi-modal information fusion module receives the speech signal and the handle signal at the same time: the student user answers the question "Who is the suspect?" with "I think he is the suspect" while pointing at a photo with the handle. Following the dialog strategy for information in a complementary relationship, the dialog management module combines the information of the complementary channels and understands that the "he" in the sentence is the person in the photo the handle is pointing at; fusing the multi-channel information eliminates the ambiguity.
In some examples, the multi-modal information fusion module receives the speech signal and the spatial-positioning signal at the same time: the student user retreats several steps while saying "What? Is this person dead?", and the emotion analysis registers rising surprise and fear. Following the dialog strategy for information in an enhancement relationship, the dialog management module adds the emotional intensity to the speech content when making the dialog response: it understands the student user's surprise and fear about the matter and has the virtual character say some comforting words.
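As a toy version of the complementary-relationship example above, the following sketch resolves a deictic pronoun through the entity the handle points at; the helper name and the entity string are invented for illustration.

```python
def resolve_deixis(utterance: str, pointed_entity: str) -> str:
    """Replace the first deictic pronoun with the entity the handle indicates."""
    return utterance.replace("he", pointed_entity, 1)

speech = "I think he is the suspect"          # speech channel
pointing_target = "the man in photo #3"       # handle channel (assumed label)
print(resolve_deixis(speech, pointing_target))
# -> "I think the man in photo #3 is the suspect"
```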
In some embodiments, the various methods, processes, modules, apparatuses, devices or systems described above may be implemented or performed by one or more processing devices (e.g., digital processors, analog processors, digital circuits designed to process information, analog circuits designed to process information, state machines, computing devices, computers, and/or other mechanisms for electronically processing information). The one or more processing devices may include one or more devices that perform some or all of the operations of a method in response to instructions stored electronically on an electronic storage medium, and may include one or more devices configured through hardware, firmware and/or software to be specifically designed to perform one or more operations of a method. The above description covers only preferred embodiments of the present invention, but the scope of the present invention is not limited thereto; any equivalent replacement or modification of the technical solutions and inventive concepts of the present invention that a person skilled in the art can readily conceive shall fall within the scope of the present invention.
Embodiments of the invention may be implemented in hardware, firmware, software, or various combinations thereof. The invention may also be implemented as instructions stored on a machine-readable medium, which may be read and executed using one or more processing devices. In one implementation, a machine-readable medium may include various mechanisms for storing and/or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable storage medium may include read-only memory, random-access memory, magnetic disk storage media, optical storage media, flash-memory devices and other media for storing information, and a machine-readable transmission medium may include various forms of propagated signals (including carrier waves, infrared signals and digital signals) and other media for transmitting information. While firmware, software, routines or instructions may be described above as performing certain actions in certain exemplary aspects and embodiments, it will be apparent that such descriptions are merely for convenience and that such actions in fact result from a computing device, processing device, processor, controller, or other device or machine executing the firmware, software, routines or instructions.
This written description uses examples to disclose the invention, one or more of which are described or illustrated in the specification and drawings. Each example is provided by way of explanation of the invention, not limitation of it. In fact, it will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from its scope or spirit. For instance, features illustrated or described as part of one embodiment can be used with another embodiment to yield a still further embodiment. It is therefore intended that the present invention cover such modifications and variations as come within the scope of the appended claims and their equivalents. The protection scope of the present invention is subject to the protection scope of the claims.
Claims (10)
1. A multi-modal human-machine dialog system, comprising:
a virtual reality device configured to create a virtual space and manipulate a virtual avatar;
an information acquisition module configured to acquire and receive user information from a user through the virtual reality device;
an information processing module configured to fuse the received user information to generate multi-modal collaborative dialog content; and
an information output module configured to output the multi-modal collaborative dialog content to the virtual reality device so as to correspondingly manipulate the virtual avatar in the virtual space.
2. The multi-modal human-machine dialog system of claim 1, characterized in that:
the virtual reality device includes one or more of the following: a virtual reality head-mounted display, a virtual reality base station, a handheld controller, a somatosensory device, a host computer, a display screen, sensors, an audio capture device and an audio playback device; and
the user information comprises one or more of the following channel signals: user speech, emotion analysis, head tracking, gaze interaction, spatial localization, handle and gesture signals, and somatosensory signals.
3. A multi-modal human-machine dialog system as claimed in any one of the preceding claims, characterized in that:
the information processing module is further configured to fuse different channel signals in the user information at different timings.
4. A multi-modal human-machine dialog system as claimed in any one of the preceding claims, characterized in that:
the information processing module is further configured to fuse different channel signals in the user information according to different timing relationships and/or constraint relationships between other features and the speech, to obtain the multi-modal collaborative dialog content.
5. A multi-modal human-machine dialog system as claimed in any one of the preceding claims, characterized in that:
the constraint relationship comprises one or more of the following relationships: an alternative relationship, a complementary relationship, an enhancement relationship; wherein
the alternative relationship means that the semantic information represented by different channel signals is similar and/or the signals can substitute for each other;
the complementary relationship means that the speech content in the conversation needs other channel signals as a supplement to form complete semantics; and
the enhancement relationship means that the semantic information represented by different channel signals is relatively independent and/or one signal can enhance the expressive effect of other channel signals.
6. A multi-modal human-machine dialog system as claimed in any one of the preceding claims, characterized in that,
when the different channel signals are ambiguous with respect to semantic understanding, the information processing module is further configured to:
if the information between the channels is in the alternative relationship, make a dialog response according to the speech content;
if the information between the channels is in the complementary relationship, combine the information of the complementary channels to eliminate the ambiguity;
if the information between the channels is in the enhancement relationship, make a dialog response according to the speech content and the emotional intensity; and
if the ambiguity cannot be resolved, make a suggestive inquiry according to the context of the current conversation so as to obtain further feedback from the user.
7. A multi-modal human-machine dialog system as claimed in any one of the preceding claims, characterized in that:
the information processing module is further configured to switch the dialog control right between the user and the system according to questions or counter-questions raised by the user or the system during their conversation.
8. A multi-modal human-machine dialog method, comprising the following steps:
receiving user information from a user, the user information comprising multi-channel signals;
fusing the multi-channel signals according to a timing relationship and/or a constraint relationship to generate multi-modal collaborative dialog content; and
outputting the multi-modal collaborative dialog content to a virtual reality device so as to correspondingly control the virtual avatar in the virtual space.
9. The method of claim 8, wherein the fusing step comprises:
if the multi-channel signal comprises only speech information, making a dialog response according to the speech content;
if the multi-channel signal is semantically ambiguous, fusing the multi-channel signals according to the constraint relationship; and
if the ambiguity cannot be resolved, making a suggestive inquiry according to the context of the current conversation so as to obtain further feedback from the user.
10. A multi-modal human-machine dialog method according to claim 8 or 9, characterized in that:
during the conversation between the user and the system, the dialog control right passes between the user and the system according to questions or counter-questions raised by either side.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910749937.1A CN110471531A (en) | 2019-08-14 | 2019-08-14 | Multi-modal interactive system and method in virtual reality |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110471531A true CN110471531A (en) | 2019-11-19 |
Family
ID=68511178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910749937.1A Pending CN110471531A (en) | 2019-08-14 | 2019-08-14 | Multi-modal interactive system and method in virtual reality |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110471531A (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130325482A1 (en) * | 2012-05-29 | 2013-12-05 | GM Global Technology Operations LLC | Estimating congnitive-load in human-machine interaction |
CN103793060A (en) * | 2014-02-14 | 2014-05-14 | 杨智 | User interaction system and method |
CN104965592A (en) * | 2015-07-08 | 2015-10-07 | 苏州思必驰信息科技有限公司 | Voice and gesture recognition based multimodal non-touch human-machine interaction method and system |
CN106420254A (en) * | 2016-09-14 | 2017-02-22 | 中国科学院苏州生物医学工程技术研究所 | Multi-person interactive virtual reality rehabilitation training and evaluation system |
CN106569613A (en) * | 2016-11-14 | 2017-04-19 | 中国电子科技集团公司第二十八研究所 | Multi-modal man-machine interaction system and control method thereof |
CN108399427A (en) * | 2018-02-09 | 2018-08-14 | 华南理工大学 | Natural interactive method based on multimodal information fusion |
CN108334199A (en) * | 2018-02-12 | 2018-07-27 | 华南理工大学 | The multi-modal exchange method of movable type based on augmented reality and device |
CN108942919A (en) * | 2018-05-28 | 2018-12-07 | 北京光年无限科技有限公司 | A kind of exchange method and system based on visual human |
CN109933272A (en) * | 2019-01-31 | 2019-06-25 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | The multi-modal airborne cockpit man-machine interaction method of depth integration |
CN110070944A (en) * | 2019-05-17 | 2019-07-30 | 段新 | Training system is assessed based on virtual environment and the social function of virtual role |
Non-Patent Citations (1)
Title |
---|
杨明浩 (Yang Minghao), 陶建华 (Tao Jianhua) et al.: "面向自然交互的多通道人机对话系统" ["A Multi-channel Human-Machine Dialogue System for Natural Interaction"], 《计算机科学》 [Computer Science] *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110956142A (en) * | 2019-12-03 | 2020-04-03 | 中国太平洋保险(集团)股份有限公司 | Intelligent interactive training system |
CN112905754A (en) * | 2019-12-16 | 2021-06-04 | 腾讯科技(深圳)有限公司 | Visual conversation method and device based on artificial intelligence and electronic equipment |
CN112905754B (en) * | 2019-12-16 | 2024-09-06 | 腾讯科技(深圳)有限公司 | Visual dialogue method and device based on artificial intelligence and electronic equipment |
CN111968470A (en) * | 2020-09-02 | 2020-11-20 | 济南大学 | Pass-through interactive experimental method and system for virtual-real fusion |
CN111968470B (en) * | 2020-09-02 | 2022-05-17 | 济南大学 | Pass-through interactive experimental method and system for virtual-real fusion |
CN118445759A (en) * | 2024-07-01 | 2024-08-06 | 南京维赛客网络科技有限公司 | Method, system and storage medium for recognizing user intention in VR device |
CN118445759B (en) * | 2024-07-01 | 2024-08-27 | 南京维赛客网络科技有限公司 | Method, system and storage medium for recognizing user intention in VR device |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address after: 10th Floor, Building B7, Huaxin Tiandi, 188 Yizhou Road, Xuhui District, Shanghai, 2003. Applicant after: Shanghai squirrel classroom Artificial Intelligence Technology Co.,Ltd. Address before: 10th Floor, Building B7, Huaxin Tiandi, 188 Yizhou Road, Xuhui District, Shanghai, 2003. Applicant before: SHANGHAI YIXUE EDUCATION TECHNOLOGY Co.,Ltd. |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20191119 |