CN115167674A - Intelligent interaction method based on digital human multi-modal interaction information standard


Info

Publication number
CN115167674A
Authority
CN
China
Prior art keywords
information
interaction
modal
interactive
human
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210797802.4A
Other languages
Chinese (zh)
Inventor
郭锐
张晓东
游世学
Current Assignee (the listed assignees may be inaccurate)
Beijing Zhongke Huilian Technology Co ltd
Original Assignee
Beijing Zhongke Huilian Technology Co ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Beijing Zhongke Huilian Technology Co., Ltd.
Priority claimed from application CN202210797802.4A
Publication of CN115167674A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 — Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 — Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/048 — Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0487 — Interaction techniques based on GUIs using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F 3/0488 — Interaction techniques based on GUIs using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F 3/16 — Sound input; sound output
    • G06F 3/167 — Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses an intelligent interaction method based on a digital-human multi-modal interaction information standard, which specifically comprises the following steps: acquiring interaction information at the human-computer interface; recognizing information through multi-modal fusion; performing interaction processing and service processing in the application system; and synthesizing the digital human's multi-modal information. The invention comprehensively uses interaction data obtained over channels of different modalities, integrates it into multi-modal behavior features, and finally obtains a unified semantic description conforming to the multi-modal interaction information standard. It can thus fully understand the user's real intention, narrow the human-machine gap in information exchange, enrich the interaction content, create a harmonious human-computer interaction environment, improve the naturalness and efficiency of human-computer interaction, and advance intelligent human-machine interaction to a new level.

Description

Intelligent interaction method based on digital human multi-modal interaction information standard
Technical Field
The invention relates to the technical field of intelligent human-computer interaction, and in particular to an intelligent interaction method based on a digital-human multi-modal interaction information standard.
Background
Human-computer interaction technology is the bridge for information exchange between people and computers. Multi-modal human-computer interaction combines information from several modalities so that each modality can play to its strengths and the user's intention can be expressed more completely and accurately; the interaction environment thereby conforms to human habits, realizing "human-centered" interaction.
Traditional modal interaction mainly extracts single-modality behavior features and processes each of them independently, so the multi-modal interaction information of a user cannot be analyzed and used in a unified, comprehensive way. This reduces the convenience of human-computer interaction and holds back the development of convenient interaction between humans and machines. Furthermore, in current multi-modal interaction, information must be input and transmitted between human and machine over multiple channels. Because the information carried by those channels is not integrated and processed under a single standard, the data transmitted by the different channel groups cannot follow a uniform specification; this reduces the convenience of multi-channel data transmission and increases the processing burden on downstream information-processing equipment.
Disclosure of Invention
The invention provides an intelligent interaction method based on a digital-human multi-modal interaction information standard, which effectively solves the problems of the traditional modal interaction described in the background art: because single-modality behavior features are extracted and processed independently, a user's multi-modal interaction information cannot be analyzed and used in a unified, comprehensive way, which reduces the convenience of human-computer interaction and inhibits the development of convenient interaction between humans and machines.
To achieve this purpose, the invention provides the following technical solution: an intelligent interaction method based on a digital-human multi-modal interaction information standard, which comprehensively uses interaction data of different modalities, re-integrates it into multi-modal behavior features, finally obtains a unified semantic description conforming to the multi-modal interaction information standard, and completes the information exchange;
the intelligent interaction method specifically comprises the following steps:
s1, acquiring interaction information of a human-computer interface;
s2, information identification based on multi-mode fusion;
s3, interactive processing and service processing of the application system;
s4, synthesizing multi-modal information of the digital person;
and S5, outputting multi-modal human-computer interaction information.
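The five steps S1–S5 above can be pictured as one processing loop. The following is a minimal illustrative sketch only: every function name and data shape is an assumption made for exposition, not part of the claimed method.

```python
# Toy one-cycle pipeline for S1-S5; all names and shapes are illustrative.

def acquire(raw):                      # S1: capture human-computer interface input
    return {"modalities": raw}

def recognize(captured):               # S2: multi-modal fusion recognition
    return {"intent": "greet" if "speech" in captured["modalities"] else "idle"}

def process(intent):                   # S3: interaction and service processing
    return {"reply": "Hello!" if intent["intent"] == "greet" else ""}

def synthesize(response):              # S4: digital-human multi-modal synthesis
    return {"speech": response["reply"],
            "expression": "smile" if response["reply"] else "neutral"}

def render(synth):                     # S5: output multi-modal information
    return synth

def interaction_cycle(raw_inputs):
    """Run one S1-S5 cycle and return the rendered multi-modal output."""
    return render(synthesize(process(recognize(acquire(raw_inputs)))))
```

Each stage in a real system would of course be a substantial subsystem; the point here is only the fixed S1→S5 ordering.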
In the above technical solution, establishing the unified data specification means integrating the interaction information acquired over the multi-modal input channels during human-computer interaction into a single unified data specification. Fixing a standard format for transmitted data allows the information exchanged during multi-modal interaction to be transmitted and used promptly and effectively, and avoids data transmission being blocked by mismatched formats; the unified data specification must therefore be established before normal interaction begins.
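One way to picture such a unified specification is a single record shape into which every channel's raw data is wrapped before any downstream processing. The field names below are assumptions for illustration, not the patent's schema.

```python
# Hypothetical unified record: every input channel is normalized to this shape.
import time
from dataclasses import dataclass, field

@dataclass
class InteractionRecord:
    modality: str      # "expression", "motion", "voice" or "language"
    channel: str       # producing device, e.g. "camera", "microphone"
    payload: dict      # modality-specific content, already decoded
    timestamp: float = field(default_factory=time.time)

def normalize(modality, channel, payload):
    """Wrap raw channel data in the unified record shape."""
    return InteractionRecord(modality=modality, channel=channel, payload=payload)
```

Because every channel emits the same record type, downstream stages never need per-channel parsing, which is the "unblocked transmission" property the text describes.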
In the above technical solution, in S1, acquiring the interaction information of the human-computer interface means collecting, in real time at the input end of the machine's human-computer interface, the interaction information and related parameters entered by the user, using human-interface devices consisting of image and sound acquisition devices, a touch screen, and a mouse and keyboard, so that the machine obtains the user's multi-modal interaction data;
the multi-modal interaction information and related parameters entered by the user specifically include expression information, motion information, voice information and language information.
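The four input kinds listed above can be gathered once per polling cycle roughly as follows. The device interfaces are mocked as plain callables, since the text names the devices but specifies no API.

```python
# Illustrative per-cycle capture of the four listed modalities; each device
# is modeled as a callable returning data or None when it has nothing new.
def collect_frame(camera, touch, microphone, keyboard):
    """Poll each human-interface device once and bundle whatever it returned."""
    frame = {}
    if (img := camera()) is not None:
        frame["expression"] = img        # facial-expression source
    if (gesture := touch()) is not None:
        frame["motion"] = gesture        # touch/gesture source
    if (audio := microphone()) is not None:
        frame["voice"] = audio           # speech audio source
    if (text := keyboard()) is not None:
        frame["language"] = text         # typed language source
    return frame
```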
In the above technical solution, after the machine acquires the multi-modal interaction information entered by the user, the information is classified inside the machine according to the type of each modality. A multi-modal fusion classification model is used so that the interaction information of the different modalities can be accurately recognized and classified, improving the accuracy of user-intention recognition, and the information is then converted into multi-modal data input under a unified specification according to the digital-human multi-modal interaction information standard.
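The fusion classification model itself is left unspecified. As a toy stand-in, a late-fusion vote shows the basic idea of combining per-modality predictions into one intent class; the real model would be learned, not hand-weighted.

```python
# Toy late-fusion classifier: each modality votes for a label with a
# confidence, and the summed score picks the fused label.
from collections import defaultdict

def fuse_predictions(per_modality):
    """per_modality: {modality: (label, confidence)} -> best fused label."""
    scores = defaultdict(float)
    for label, confidence in per_modality.values():
        scores[label] += confidence
    return max(scores, key=scores.get)
```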
In the above technical solution, in S1, before the interaction information of the human-computer interface is acquired, the user's intention is sensed by a sensing device consisting of a distance sensor and a camera, in order to predict whether the interacting subject has a potential intention to interact;
when the sensed user has an intention to interact, the interaction information is acquired through the machine's human-computer interface and multi-modal human-computer interaction proceeds;
when the sensed user has no intention to interact, a simple, general greeting interaction is initiated according to preset interaction rules.
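This pre-interaction gate can be sketched as a simple rule over the two sensors. The distance threshold and greeting text are invented for illustration; the patent only states that preset rules drive the greeting branch.

```python
# Sketch of the intention gate: distance sensor + camera decide whether to
# start full multi-modal interaction or fall back to a simple greeting.
GREETING = "Hello! Feel free to come closer and talk to me."  # placeholder text

def engage(distance_m, face_detected, max_range_m=1.5):
    """Return 'interact' when the sensed user appears to intend interaction,
    otherwise the preset greeting behavior."""
    if face_detected and distance_m <= max_range_m:
        return "interact"
    return GREETING
```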
In the above technical solution, in S2, different interaction tasks are converted into a unified task description by building a unified multi-modal data conversion model, which makes it convenient to obtain uniform interaction-task information;
when building the unified conversion model, the complementarity among a user's multiple modal input channels must be fully considered: the acquired expression, motion, voice and language information is encoded in the same semantic space, and the unified conversion model converts the interaction information of the different modalities into the unified task description, yielding a complete, unambiguous and exchangeable interaction task.
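One way to picture the shared semantic space is that each modality parser emits fragments of one common task structure, and the fragments are merged into a single unambiguous description — e.g. speech supplies the action while a gesture supplies the target. The structure and merge rule below are assumptions for illustration.

```python
# Toy merge of per-modality fragments into one unified task description.
def to_task(fragments):
    """Merge modality fragments into one complete, unambiguous task dict."""
    task = {"action": None, "target": None, "parameters": {}}
    for fragment in fragments:            # later fragments refine earlier ones
        for key, value in fragment.items():
            if key == "parameters":
                task["parameters"].update(value)
            elif value is not None:
                task[key] = value
    return task
```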
In the above technical solution, in S3, after the interaction information entered by the user is obtained, the user's multi-modal data is processed by the combined use of computer vision, intelligent speech processing and natural language processing;
specifically, the multi-modal interaction data is recognized, analyzed, normalized and fused by artificial-intelligence techniques; an appropriate feedback behavior is chosen by an interaction decision mechanism; a response to the user is produced by a natural-language generation algorithm or an intelligent question-answering algorithm; other anthropomorphic feedback behaviors or characteristics are generated at the same time; and finally a multi-modal data output conforming to the digital-human multi-modal interaction information standard is produced.
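The "interaction decision mechanism" can be imagined as a mapping from recognized task type to a feedback behavior plus the generator that should produce the response. The table entries are illustrative; the text names natural-language generation and intelligent question answering as generators but fixes no such mapping.

```python
# Toy interaction-decision mechanism: task type -> (behavior, generator).
DECISIONS = {
    "question": ("answer", "intelligent_qa"),
    "greeting": ("greet_back", "nlg"),
    "command": ("execute_service", "service_api"),
}

def decide(task_type):
    """Return (feedback_behavior, generator) for a task, with a safe default."""
    return DECISIONS.get(task_type, ("ask_clarification", "nlg"))
```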
In the above technical solution, generating standard-conforming feedback data in S4 means recognizing the user's emotion by analyzing speech, expression, micro-expression and text content together to form multi-dimensional features. Unlike the traditional approach of extracting and processing single-modality behavior features independently, an accurate classification of the user's emotion is obtained by building a multi-modal model based on deep learning;
then, under the same unified information data standard, the digital human's feedback data conforming to the multi-modal interaction information standard is generated by combining natural-language generation, speech synthesis, and computer graphics and image processing.
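As a toy stand-in for the multi-dimensional emotion features, the four cues named above can be fused into one score. The weights and threshold are invented; the patent prescribes a learned deep-learning model, not this hand-weighted rule.

```python
# Illustrative weighted fusion of the four emotion cues (speech, expression,
# micro-expression, text); each cue is a valence in [-1, 1].
CUE_WEIGHTS = {"speech": 0.3, "expression": 0.3, "micro_expression": 0.2, "text": 0.2}

def emotion_score(cues):
    """Return the weighted overall valence of the supplied cues."""
    return sum(CUE_WEIGHTS.get(name, 0.0) * value for name, value in cues.items())

def classify_emotion(cues, threshold=0.2):
    """Collapse the fused score into a coarse emotion class."""
    score = emotion_score(cues)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```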
In the above technical solution, the feedback data of the digital human specifically include expression information, gesture information, limb-action information and voice information.
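Bundling those four feedback channels into one digital-human response can be sketched as below; the mapping values are placeholders, not the patent's synthesis algorithms.

```python
# Illustrative composition of one digital-human response frame.
def make_feedback(emotion, reply_text):
    """Compose expression, gesture, limb action and voice for one response."""
    expression = {"positive": "smile", "negative": "concern"}.get(emotion, "neutral")
    return {
        "expression": expression,
        "gesture": "wave" if reply_text.lower().startswith("hello") else "rest",
        "limb_action": "nod",
        "voice": reply_text,  # text handed on to the speech synthesizer
    }
```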
In the above technical solution, in S5, outputting the multi-modal human-computer interaction information means presenting, in the machine's user interface, multi-modal output composed of the digital human's lifelike figure actions, voice and text, using a real-time rendering engine and a multimedia processing system, so as to achieve real-time interactive communication between the machine and the user.
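The hand-off to the rendering layer can be pictured as queueing synthesized frames. `Renderer` and its API are stand-ins invented here; a real system would submit these frames to an actual real-time rendering engine and multimedia stack.

```python
# Stand-in for the S5 output stage: synthesized frames queued for rendering.
class Renderer:
    def __init__(self):
        self.frames = []

    def submit(self, feedback):
        """Queue one multi-modal frame (figure action, voice, text); return depth."""
        self.frames.append(feedback)
        return len(self.frames)

    def flush(self):
        """Deliver and clear all queued frames (here: simply return them)."""
        out, self.frames = self.frames, []
        return out
```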
Compared with the prior art, the invention has the following beneficial effects:
1. The invention comprehensively uses interaction data obtained over channels of different modalities, integrates it into multi-modal behavior features, and finally obtains a unified semantic description conforming to the multi-modal interaction information standard. It can therefore fully understand the user's real intention, narrow the human-machine gap in information exchange, enrich the interaction content, create a harmonious human-computer interaction environment, improve the naturalness and efficiency of human-computer interaction, and advance intelligent human-machine interaction to a new level.
2. Establishing the unified data standard makes it convenient to integrate the interaction information acquired over the multi-modal input channels into one specification, so that in subsequent multi-modal interaction the information can be transmitted and used promptly and effectively; blocked data transmission caused by mismatched formats is avoided, and the user's interaction experience is improved.
3. The combined use of computer vision, intelligent speech processing and natural language processing makes it convenient to recognize, analyze, normalize and fuse the multi-modal data entered by the user, so that it can be used effectively in subsequent interaction; the human-machine interaction process becomes more convenient and the exchange of interaction information more accurate.
4. The established unified multi-modal data conversion model conveniently converts different interaction tasks into a unified task description, yielding uniform interaction-task information. During interaction, the complementarity among a user's multiple modal input channels is considered and the acquired multi-modal information is encoded in the same semantic space, realizing the unified task description and producing complete, unambiguous and exchangeable interaction tasks; this avoids over-complex understanding of multi-modal interaction information and realizes its semantic processing.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
In the drawings:
FIG. 1 is a flow chart of the steps of the intelligent interaction method of the present invention;
FIG. 2 is a schematic diagram of the intelligent interaction process of the present invention.
Detailed Description
The preferred embodiments of the present invention are described below in conjunction with the accompanying drawings; it should be understood that they serve only to illustrate and explain the invention, not to limit it.
Example 1: as shown in figs. 1-2, the invention provides a technical solution: an intelligent interaction method based on a digital-human multi-modal interaction information standard, in which interaction data of different modalities is comprehensively used and re-integrated into multi-modal behavior features, so as finally to obtain a unified semantic description conforming to the multi-modal interaction information standard and complete the information exchange;
the intelligent interaction method specifically comprises the following steps:
s1, acquiring interaction information of a human-computer interface;
s2, information identification based on multi-mode fusion;
s3, interactive processing and service processing of the application system;
s4, synthesizing multi-modal information of the digital person;
and S5, outputting multi-modal human-computer interaction information.
In the above technical solution, establishing the unified data specification means integrating the interaction information acquired over the multi-modal input channels during human-computer interaction into a single unified data specification. Fixing a standard format for transmitted data allows the information exchanged during multi-modal interaction to be transmitted and used promptly and effectively, and avoids data transmission being blocked by mismatched formats; the unified data specification must therefore be established before normal interaction begins.
In the above technical solution, in S1, acquiring the interaction information of the human-computer interface means collecting, in real time at the input end of the machine's human-computer interface, the interaction information and related parameters entered by the user, using human-interface devices consisting of image and sound acquisition devices, a touch screen, and a mouse and keyboard, so that the machine obtains the user's multi-modal interaction data;
the multi-modal interaction information and related parameters entered by the user specifically include expression information, motion information, voice information and language information.
In the above technical solution, after the machine acquires the multi-modal interaction information entered by the user, the information is classified inside the machine according to the type of each modality. A multi-modal fusion classification model is used so that the interaction information of the different modalities can be accurately recognized and classified, improving the accuracy of user-intention recognition, and the information is then converted into multi-modal data input under a unified specification according to the digital-human multi-modal interaction information standard.
In the above technical solution, in S1, before the interaction information of the human-computer interface is acquired, the user's intention is sensed by a sensing device consisting of a distance sensor and a camera, in order to predict whether the interacting subject has a potential intention to interact;
when the sensed user has an intention to interact, the interaction information is acquired through the machine's human-computer interface and multi-modal human-computer interaction proceeds;
when the sensed user has no intention to interact, a simple, general greeting interaction is initiated according to preset interaction rules.
In the above technical solution, in S2, different interaction tasks are converted into a unified task description by building a unified multi-modal data conversion model, which makes it convenient to obtain uniform interaction-task information;
when building the unified conversion model, the complementarity among a user's multiple modal input channels must be fully considered: the acquired expression, motion, voice and language information is encoded in the same semantic space, and the unified conversion model converts the interaction information of the different modalities into the unified task description, yielding a complete, unambiguous and exchangeable interaction task.
In the above technical solution, in S3, after the interaction information entered by the user is obtained, the user's multi-modal data is processed by the combined use of computer vision, intelligent speech processing and natural language processing;
specifically, the multi-modal interaction data is recognized, analyzed, normalized and fused by artificial-intelligence techniques; an appropriate feedback behavior is chosen by an interaction decision mechanism; a response to the user is produced by a natural-language generation algorithm or an intelligent question-answering algorithm; other anthropomorphic feedback behaviors or characteristics are generated at the same time; and finally a multi-modal data output conforming to the digital-human multi-modal interaction information standard is produced.
In the above technical solution, generating standard-conforming feedback data in S4 means recognizing the user's emotion by analyzing speech, expression, micro-expression and text content together to form multi-dimensional features. Unlike the traditional approach of extracting and processing single-modality behavior features independently, an accurate classification of the user's emotion is obtained by building a multi-modal model based on deep learning;
then, under the same unified information data standard, the digital human's feedback data conforming to the multi-modal interaction information standard is generated by combining natural-language generation, speech synthesis, and computer graphics and image processing.
In the above technical solution, the feedback data of the digital human specifically include expression information, gesture information, limb-action information and voice information.
In the above technical solution, in S5, the multi-modal human-computer interaction information is output in the machine's user interface using a real-time rendering engine and a multimedia processing system, so as to achieve real-time interactive communication between the machine and the user.
Example 2: as shown in figs. 1-2, the invention provides a technical solution: an intelligent interaction method based on a digital-human multi-modal interaction information standard, in which interaction data of different modalities is comprehensively used and re-integrated into multi-modal behavior features, so as finally to obtain a unified semantic description conforming to the multi-modal interaction information standard and complete the information exchange;
the intelligent interaction method specifically comprises the following steps:
s1, acquiring interaction information of a human-computer interface;
s2, information identification based on multi-mode fusion;
s3, interactive processing and service processing of the application system;
s4, synthesizing multi-modal information of the digital person;
and S5, outputting multi-modal human-computer interaction information.
In the above technical solution, establishing the unified data specification means integrating the interaction information acquired over the multi-modal input channels during human-computer interaction into a single unified data specification. Fixing a standard format for transmitted data allows the information exchanged during multi-modal interaction to be transmitted and used promptly and effectively, and avoids data transmission being blocked by mismatched formats; the unified data specification must therefore be established before normal interaction begins.
In the above technical solution, in S1, acquiring the interaction information of the human-computer interface means collecting, in real time at the input end of the machine's human-computer interface, the interaction information and related parameters entered by the user, using human-interface devices consisting of image and sound acquisition devices, a touch screen, and a mouse and keyboard, so that the machine obtains the user's multi-modal interaction data;
the multi-modal interaction information and related parameters entered by the user specifically include expression information, motion information, voice information and language information.
In the above technical solution, after the machine acquires the multi-modal interaction information entered by the user, the information is classified inside the machine according to the type of each modality. A multi-modal fusion classification model is used so that the interaction information of the different modalities can be accurately recognized and classified, improving the accuracy of user-intention recognition, and the information is then converted into multi-modal data input under a unified specification according to the digital-human multi-modal interaction information standard.
In the above technical solution, in S1, before the interaction information of the human-computer interface is acquired, the user's intention is sensed by a sensing device consisting of a distance sensor and a camera, in order to predict whether the interacting subject has a potential intention to interact;
when the sensed user has an intention to interact, the interaction information is acquired through the machine's human-computer interface and multi-modal human-computer interaction proceeds;
when the sensed user has no intention to interact, a simple, general greeting interaction is initiated according to preset interaction rules.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described above, or equivalents may be substituted for elements thereof. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An intelligent interaction method based on a digital-human multi-modal interaction information standard, characterized in that: interaction data of different modalities is comprehensively used and re-integrated into multi-modal behavior features, so as finally to obtain a unified semantic description conforming to the multi-modal interaction information standard and complete the information exchange;
the intelligent interaction method specifically comprises the following steps:
s1, acquiring interaction information of a human-computer interface;
s2, information identification based on multi-mode fusion;
s3, interactive processing and service processing of the application system;
s4, synthesizing multi-modal information of the digital person;
and S5, outputting multi-modal human-computer interaction information.
2. The intelligent interaction method based on a digital-human multi-modal interaction information standard according to claim 1, characterized in that: establishing the unified data specification means integrating the interaction information acquired over the multi-modal input channels during human-computer interaction into a single unified data specification; fixing a standard format for transmitted data allows the information exchanged during multi-modal interaction to be transmitted and used promptly and effectively and avoids data transmission being blocked by mismatched formats, so the unified data specification must be established before normal interaction.
3. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 2, characterized in that: in S1, acquiring the interaction information of the human-computer interface means acquiring, in real time at the input end of the interactive machine's human-computer interface, the interaction information and related parameter information entered by the user, using human-computer interface devices consisting of image and sound acquisition devices, touch-screen devices, and a mouse and keyboard, so that the interactive machine obtains the user's multi-modal interaction information data;
and the multi-modal interaction information and related parameter information entered by the user specifically include expression information, motion information, voice information and language information.
4. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 2, characterized in that: after the interactive machine acquires the multi-modal interaction information input by the user, each piece of interaction information is classified within the interactive machine according to the type of interaction information in each different modality, making full use of a multi-modal fusion classification model, so that interaction information of different modality types can be accurately recognized and classified, improving the accuracy of user intention recognition; the multi-modal interaction information is then converted into multi-modal data input under a unified specification according to the digital human multi-modal interaction information standard.
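The classification and normalization described in claim 4 can be sketched as routing each input to a modality class and wrapping it in one unified record. The rule table, record schema, and `"v1"` spec tag below are all invented for illustration; the patent names no concrete format.

```python
# Hypothetical sketch of claim 4: classify each input by modality,
# then normalize it into a unified-specification record.

MODALITY_RULES = {
    "expression": {"smile", "frown"},
    "motion": {"wave", "nod"},
    "voice": {"audio_clip"},
    "language": {"text_message"},
}

def classify_modality(token):
    """Assign an input token to one of the four modality classes."""
    for modality, keywords in MODALITY_RULES.items():
        if token in keywords:
            return modality
    return "unknown"

def to_unified_record(token):
    """Wrap a classified input in a record under one unified spec."""
    return {"modality": classify_modality(token), "payload": token, "spec": "v1"}

records = [to_unified_record(t)
           for t in ["smile", "wave", "audio_clip", "text_message"]]
```

A real system would replace the keyword table with the multi-modal fusion classification model the claim refers to; the point here is only the uniform output schema.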
5. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 2, characterized in that: in S1, before the interaction information of the human-computer interface is acquired, the user's intention needs to be perceived through perception equipment consisting of a distance sensor and a camera, so as to predict whether the interacting subject has a potential information interaction intention;
when the perceived user has an interaction intention, the interaction information is acquired through the machine's human-computer interface to carry out human-computer interaction of multi-modal information;
and when the perceived user has no interaction intention, a simple, general greeting-type interaction is initiated according to a preset interaction rule.
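The pre-interaction gating in claim 5 amounts to a two-way branch on perceived intent. The sketch below assumes a distance reading and a camera-derived "facing the machine" flag; the 1.5 m threshold and both signal names are illustrative guesses, not values from the patent.

```python
# Hypothetical sketch of claim 5's intent gating: a distance sensor and a
# camera-based flag decide between full interaction and a preset greeting.

INTENT_DISTANCE_M = 1.5  # assumed threshold, not specified in the patent

def perceive_intent(distance_m, facing_camera):
    """Predict whether the subject has a potential interaction intention."""
    return distance_m < INTENT_DISTANCE_M and facing_camera

def interaction_mode(distance_m, facing_camera):
    """Branch per claim 5: full multi-modal interaction vs. simple greeting."""
    if perceive_intent(distance_m, facing_camera):
        return "full_multimodal"
    return "greeting"
```

The preset greeting rule of the claim would live behind the `"greeting"` branch; everything else in the method only runs on the `"full_multimodal"` path.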
6. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 1, characterized in that: in S2, different interaction tasks are converted into a unified task description by establishing a unified multi-modal data conversion model, so that unified information interaction task information is conveniently obtained;
in the process of establishing the unified multi-modal data conversion model, the complementarity among the user's multiple modal interaction information input channels needs to be fully considered; the acquired expression information, motion information, voice information and language information are encoded in the same semantic space, and information conversion is carried out through the unified conversion model for the interaction information of different modalities to realize the unified task description, thereby obtaining a complete, unambiguous and exchangeable interaction task.
7. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 2, characterized in that: in S3, after the interaction information input by the user is obtained, the multi-modal data information input by the user is processed by comprehensively utilizing computer vision technology, intelligent speech processing technology and natural language processing technology;
the processing specifically refers to recognizing, analyzing, normalizing and fusing the multi-modal interaction information data through artificial intelligence technology, making an appropriate feedback behavior according to an interaction decision mechanism, obtaining a response to the user according to a natural language generation algorithm or an intelligent question-answering algorithm, simultaneously generating other anthropomorphic feedback behaviors or characteristics, and finally generating multi-modal data output that conforms to the digital human multi-modal interaction information standard.
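The normalize-then-decide flow of claim 7 can be sketched with a rule table standing in for the natural language generation or intelligent question-answering algorithms the claim names. The response table and gesture rule are hypothetical placeholders.

```python
# Hypothetical sketch of claim 7: normalize the fused input, then pick a
# speech response and an anthropomorphic gesture via a decision mechanism.

def normalize(inputs):
    """Normalization step: trim and lowercase every channel's payload."""
    return {k: v.strip().lower() for k, v in inputs.items()}

RESPONSES = {  # toy stand-in for an NLG / intelligent-QA algorithm
    "hello": "Hello! How can I help you today?",
    "bye": "Goodbye, see you next time.",
}

def decide_feedback(fused):
    """Interaction decision mechanism: speech plus a matching gesture."""
    text = fused.get("language", "")
    reply = RESPONSES.get(text, "Could you rephrase that?")
    gesture = "wave" if text in ("hello", "bye") else "idle"
    return {"speech": reply, "gesture": gesture}

feedback = decide_feedback(normalize({"language": "  Hello "}))
```

The returned dictionary pairs the verbal response with a non-verbal behavior, matching the claim's requirement that feedback include anthropomorphic characteristics beyond the text answer.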
8. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 2, characterized in that: in S4, generating feedback data that meets the standard means that, through recognition of the user's emotion, a comprehensive analysis can be carried out from the perspectives of voice, expression, micro-expression and text content to form multi-dimensional features; unlike the traditional approach of extracting each single-modality behavior feature separately and processing it independently, an accurate classification of the user's emotion is obtained by establishing a deep-learning-based multi-modal model;
and based on the same set of unified information data standards, the feedback data of the digital human that meets the multi-modal interaction information standard is generated by combining natural language generation technology, speech synthesis technology, and computer graphics and image processing technology.
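The joint emotion classification in claim 8 contrasts with per-modality scoring: the four feature sets are fused first and classified once. The concatenation-plus-threshold "model" below is a deliberately simple stand-in for the deep multi-modal model the claim describes; feature values and thresholds are invented.

```python
# Hypothetical sketch of claim 8: fuse voice, expression, micro-expression
# and text features into one multi-dimensional vector, then classify the
# user's emotion from the fused vector rather than per modality.

def fuse_features(voice, expression, micro_expression, text):
    """Concatenate per-modality feature lists into one joint vector."""
    return voice + expression + micro_expression + text

def classify_emotion(features):
    """Toy stand-in for a trained deep multi-modal emotion classifier."""
    score = sum(features) / len(features)
    if score > 0.6:
        return "positive"
    if score < 0.4:
        return "negative"
    return "neutral"

emotion = classify_emotion(fuse_features([0.9], [0.8], [0.7], [0.9, 0.8]))
```

The design point is that `classify_emotion` never sees which value came from which modality: cross-modal cues (a positive phrase with a negative micro-expression, say) are weighed together in one pass.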
9. The intelligent interaction method based on digital human multi-modal interaction information standard according to claim 8, wherein: the feedback data of the digital person specifically comprises expression information, gesture information, limb action information and voice information.
10. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 2, characterized in that: in S5, outputting the multi-modal human-computer interaction information means outputting, in the user interface of the interactive machine, the multi-modal interaction information composed of a realistic digital human figure's actions, voice and text through the use of a real-time rendering engine and a multimedia processing system, thereby realizing real-time interactive communication between the machine and the user.
CN202210797802.4A 2022-07-06 2022-07-06 Intelligent interaction method based on digital human multi-modal interaction information standard Pending CN115167674A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210797802.4A CN115167674A (en) 2022-07-06 2022-07-06 Intelligent interaction method based on digital human multi-modal interaction information standard

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210797802.4A CN115167674A (en) 2022-07-06 2022-07-06 Intelligent interaction method based on digital human multi-modal interaction information standard

Publications (1)

Publication Number Publication Date
CN115167674A true CN115167674A (en) 2022-10-11

Family

ID=83490749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210797802.4A Pending CN115167674A (en) 2022-07-06 2022-07-06 Intelligent interaction method based on digital human multi-modal interaction information standard

Country Status (1)

Country Link
CN (1) CN115167674A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383620A (en) * 2023-03-29 2023-07-04 北京鹅厂科技有限公司 Method and device for applying multi-mode artificial intelligence
CN116383620B (en) * 2023-03-29 2023-10-20 北京鹅厂科技有限公司 Method and device for applying multi-mode artificial intelligence

Similar Documents

Publication Publication Date Title
US20230316643A1 (en) Virtual role-based multimodal interaction method, apparatus and system, storage medium, and terminal
CN106653052B (en) Virtual human face animation generation method and device
CN110688911B (en) Video processing method, device, system, terminal equipment and storage medium
Wu et al. Multimodal large language models: A survey
CN103218842B (en) A kind of voice synchronous drives the method for the three-dimensional face shape of the mouth as one speaks and facial pose animation
US20210191506A1 (en) Affective interaction systems, devices, and methods based on affective computing user interface
Schiel et al. The SmartKom Multimodal Corpus at BAS.
JP2018014094A (en) Virtual robot interaction method, system, and robot
CN106985137A (en) Multi-modal exchange method and system for intelligent robot
CN105046238A (en) Facial expression robot multi-channel information emotion expression mapping method
CN101923858A (en) Real-time and synchronous mutual translation voice terminal
TW201937344A (en) Smart robot and man-machine interaction method
CN107808191A (en) The output intent and system of the multi-modal interaction of visual human
CN106502382A (en) Active exchange method and system for intelligent robot
JP2023552854A (en) Human-computer interaction methods, devices, systems, electronic devices, computer-readable media and programs
CN115330911A (en) Method and system for driving mimicry expression by using audio
CN113793398A (en) Drawing method and device based on voice interaction, storage medium and electronic equipment
CN106648054B (en) A kind of Multimodal interaction method of the company robot based on RealSense
WO2023246163A1 (en) Virtual digital human driving method, apparatus, device, and medium
CN115953521B (en) Remote digital person rendering method, device and system
CN116661603A (en) Multi-mode fusion user intention recognition method under complex man-machine interaction scene
CN117251057A (en) AIGC-based method and system for constructing AI number wisdom
CN117032453A (en) Virtual reality interaction system for realizing mutual recognition function
CN115167674A (en) Intelligent interaction method based on digital human multi-modal interaction information standard
CN114708849A (en) Voice processing method and device, computer equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination