CN115167674A - Intelligent interaction method based on digital human multi-modal interaction information standard


Info

Publication number
CN115167674A
Authority
CN
China
Prior art keywords
information
interaction
modal
interactive
human
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210797802.4A
Other languages
Chinese (zh)
Inventor
郭锐
张晓东
游世学
Current Assignee (the listed assignees may be inaccurate)
Beijing Zhongke Huilian Technology Co ltd
Original Assignee
Beijing Zhongke Huilian Technology Co ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Beijing Zhongke Huilian Technology Co., Ltd.
Priority claimed from application CN202210797802.4A
Publication of CN115167674A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 — Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 — Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/048 — Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0487 — Interaction techniques based on GUIs using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F 3/0488 — Interaction techniques based on GUIs using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F 3/16 — Sound input; sound output
    • G06F 3/167 — Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention discloses an intelligent interaction method based on a digital-human multi-modal interaction information standard, which specifically comprises the following steps: acquiring interaction information at the human-computer interface; recognizing information through multi-modal fusion; performing interaction processing and service processing in the application system; and synthesizing the digital human's multi-modal information. The invention comprehensively uses interaction data obtained over channels of different modalities, integrates it into multi-modal behavior features, and finally obtains a unified semantic description conforming to the multi-modal interaction information standard. It can thus fully understand the user's real intention, narrow the human-machine gap in information exchange, enrich the interaction content, create a harmonious human-computer interaction environment, improve the naturalness and efficiency of human-computer interaction, and advance intelligent human-machine interaction to a new level.

Description

Intelligent interaction method based on digital human multi-modal interaction information standard
Technical Field
The invention relates to the technical field of intelligent human-computer interaction, and in particular to an intelligent interaction method based on a digital-human multi-modal interaction information standard.
Background
Human-computer interaction technology is the bridge for information exchange between people and computers. Multi-modal human-computer interaction combines information from several modalities so that each modality can play to its strengths and the user's intention can be expressed more completely and accurately; the interaction environment thereby conforms to human habits, realizing "human-centered" interaction.
Traditional modal interaction mainly extracts single-modality behavior features and processes each of them independently, so the multi-modal interaction information of a user cannot be analyzed and used in a unified, comprehensive way. This reduces the convenience of human-computer interaction and holds back the development of convenient interaction between humans and machines. Furthermore, in current multi-modal interaction, information must be input and transmitted between human and machine over multiple channels. Because the information carried by those channels is not integrated and processed under a single standard, the data transmitted by the different channel groups cannot follow a uniform specification; this reduces the convenience of multi-channel data transmission and increases the processing burden on downstream information-processing equipment.
Disclosure of Invention
The invention provides an intelligent interaction method based on a digital-human multi-modal interaction information standard, which effectively solves the problems of the traditional modal interaction described in the background art: because single-modality behavior features are extracted and processed independently, a user's multi-modal interaction information cannot be analyzed and used in a unified, comprehensive way, which reduces the convenience of human-computer interaction and inhibits the development of convenient interaction between humans and machines.
To achieve this purpose, the invention provides the following technical solution: an intelligent interaction method based on a digital-human multi-modal interaction information standard, which comprehensively uses interaction data of different modalities, re-integrates it into multi-modal behavior features, finally obtains a unified semantic description conforming to the multi-modal interaction information standard, and completes the information exchange;
the intelligent interaction method specifically comprises the following steps:
s1, acquiring interaction information of a human-computer interface;
s2, information identification based on multi-mode fusion;
s3, interactive processing and service processing of the application system;
s4, synthesizing multi-modal information of the digital person;
and S5, outputting multi-modal human-computer interaction information.
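The five steps S1–S5 above can be pictured as one processing loop. The following is a minimal illustrative sketch only: every function name and data shape is an assumption made for exposition, not part of the claimed method.

```python
# Toy one-cycle pipeline for S1-S5; all names and shapes are illustrative.

def acquire(raw):                      # S1: capture human-computer interface input
    return {"modalities": raw}

def recognize(captured):               # S2: multi-modal fusion recognition
    return {"intent": "greet" if "speech" in captured["modalities"] else "idle"}

def process(intent):                   # S3: interaction and service processing
    return {"reply": "Hello!" if intent["intent"] == "greet" else ""}

def synthesize(response):              # S4: digital-human multi-modal synthesis
    return {"speech": response["reply"],
            "expression": "smile" if response["reply"] else "neutral"}

def render(synth):                     # S5: output multi-modal information
    return synth

def interaction_cycle(raw_inputs):
    """Run one S1-S5 cycle and return the rendered multi-modal output."""
    return render(synthesize(process(recognize(acquire(raw_inputs)))))
```

Each stage in a real system would of course be a substantial subsystem; the point here is only the fixed S1→S5 ordering.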
In the above technical solution, establishing the unified data specification means integrating the interaction information acquired over the multi-modal input channels during human-computer interaction into a single unified data specification. Fixing a standard format for transmitted data allows the information exchanged during multi-modal interaction to be transmitted and used promptly and effectively, and avoids data transmission being blocked by mismatched formats; the unified data specification must therefore be established before normal interaction begins.
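One way to picture such a unified specification is a single record shape into which every channel's raw data is wrapped before any downstream processing. The field names below are assumptions for illustration, not the patent's schema.

```python
# Hypothetical unified record: every input channel is normalized to this shape.
import time
from dataclasses import dataclass, field

@dataclass
class InteractionRecord:
    modality: str      # "expression", "motion", "voice" or "language"
    channel: str       # producing device, e.g. "camera", "microphone"
    payload: dict      # modality-specific content, already decoded
    timestamp: float = field(default_factory=time.time)

def normalize(modality, channel, payload):
    """Wrap raw channel data in the unified record shape."""
    return InteractionRecord(modality=modality, channel=channel, payload=payload)
```

Because every channel emits the same record type, downstream stages never need per-channel parsing, which is the "unblocked transmission" property the text describes.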
In the above technical solution, in S1, acquiring the interaction information of the human-computer interface means collecting, in real time at the input end of the machine's human-computer interface, the interaction information and related parameters entered by the user, using human-interface devices consisting of image and sound acquisition devices, a touch screen, and a mouse and keyboard, so that the machine obtains the user's multi-modal interaction data;
the multi-modal interaction information and related parameters entered by the user specifically include expression information, motion information, voice information and language information.
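The four input kinds listed above can be gathered once per polling cycle roughly as follows. The device interfaces are mocked as plain callables, since the text names the devices but specifies no API.

```python
# Illustrative per-cycle capture of the four listed modalities; each device
# is modeled as a callable returning data or None when it has nothing new.
def collect_frame(camera, touch, microphone, keyboard):
    """Poll each human-interface device once and bundle whatever it returned."""
    frame = {}
    if (img := camera()) is not None:
        frame["expression"] = img        # facial-expression source
    if (gesture := touch()) is not None:
        frame["motion"] = gesture        # touch/gesture source
    if (audio := microphone()) is not None:
        frame["voice"] = audio           # speech audio source
    if (text := keyboard()) is not None:
        frame["language"] = text         # typed language source
    return frame
```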
In the above technical solution, after the machine acquires the multi-modal interaction information entered by the user, the information is classified inside the machine according to the type of each modality. A multi-modal fusion classification model is used so that the interaction information of the different modalities can be accurately recognized and classified, improving the accuracy of user-intention recognition, and the information is then converted into multi-modal data input under a unified specification according to the digital-human multi-modal interaction information standard.
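The fusion classification model itself is left unspecified. As a toy stand-in, a late-fusion vote shows the basic idea of combining per-modality predictions into one intent class; the real model would be learned, not hand-weighted.

```python
# Toy late-fusion classifier: each modality votes for a label with a
# confidence, and the summed score picks the fused label.
from collections import defaultdict

def fuse_predictions(per_modality):
    """per_modality: {modality: (label, confidence)} -> best fused label."""
    scores = defaultdict(float)
    for label, confidence in per_modality.values():
        scores[label] += confidence
    return max(scores, key=scores.get)
```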
In the above technical solution, in S1, before the interaction information of the human-computer interface is acquired, the user's intention is sensed by a sensing device consisting of a distance sensor and a camera, in order to predict whether the interacting subject has a potential intention to interact;
when the sensed user has an intention to interact, the interaction information is acquired through the machine's human-computer interface and multi-modal human-computer interaction proceeds;
when the sensed user has no intention to interact, a simple, general greeting interaction is initiated according to preset interaction rules.
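This pre-interaction gate can be sketched as a simple rule over the two sensors. The distance threshold and greeting text are invented for illustration; the patent only states that preset rules drive the greeting branch.

```python
# Sketch of the intention gate: distance sensor + camera decide whether to
# start full multi-modal interaction or fall back to a simple greeting.
GREETING = "Hello! Feel free to come closer and talk to me."  # placeholder text

def engage(distance_m, face_detected, max_range_m=1.5):
    """Return 'interact' when the sensed user appears to intend interaction,
    otherwise the preset greeting behavior."""
    if face_detected and distance_m <= max_range_m:
        return "interact"
    return GREETING
```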
In the above technical solution, in S2, different interaction tasks are converted into a unified task description by building a unified multi-modal data conversion model, which makes it convenient to obtain uniform interaction-task information;
when building the unified conversion model, the complementarity among a user's multiple modal input channels must be fully considered: the acquired expression, motion, voice and language information is encoded in the same semantic space, and the unified conversion model converts the interaction information of the different modalities into the unified task description, yielding a complete, unambiguous and exchangeable interaction task.
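One way to picture the shared semantic space is that each modality parser emits fragments of one common task structure, and the fragments are merged into a single unambiguous description — e.g. speech supplies the action while a gesture supplies the target. The structure and merge rule below are assumptions for illustration.

```python
# Toy merge of per-modality fragments into one unified task description.
def to_task(fragments):
    """Merge modality fragments into one complete, unambiguous task dict."""
    task = {"action": None, "target": None, "parameters": {}}
    for fragment in fragments:            # later fragments refine earlier ones
        for key, value in fragment.items():
            if key == "parameters":
                task["parameters"].update(value)
            elif value is not None:
                task[key] = value
    return task
```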
In the above technical solution, in S3, after the interaction information entered by the user is obtained, the user's multi-modal data is processed by the combined use of computer vision, intelligent speech processing and natural language processing;
specifically, the multi-modal interaction data is recognized, analyzed, normalized and fused by artificial-intelligence techniques; an appropriate feedback behavior is chosen by an interaction decision mechanism; a response to the user is produced by a natural-language generation algorithm or an intelligent question-answering algorithm; other anthropomorphic feedback behaviors or characteristics are generated at the same time; and finally a multi-modal data output conforming to the digital-human multi-modal interaction information standard is produced.
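The "interaction decision mechanism" can be imagined as a mapping from recognized task type to a feedback behavior plus the generator that should produce the response. The table entries are illustrative; the text names natural-language generation and intelligent question answering as generators but fixes no such mapping.

```python
# Toy interaction-decision mechanism: task type -> (behavior, generator).
DECISIONS = {
    "question": ("answer", "intelligent_qa"),
    "greeting": ("greet_back", "nlg"),
    "command": ("execute_service", "service_api"),
}

def decide(task_type):
    """Return (feedback_behavior, generator) for a task, with a safe default."""
    return DECISIONS.get(task_type, ("ask_clarification", "nlg"))
```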
In the above technical solution, generating standard-conforming feedback data in S4 means recognizing the user's emotion by analyzing speech, expression, micro-expression and text content together to form multi-dimensional features. Unlike the traditional approach of extracting and processing single-modality behavior features independently, an accurate classification of the user's emotion is obtained by building a multi-modal model based on deep learning;
then, under the same unified information data standard, the digital human's feedback data conforming to the multi-modal interaction information standard is generated by combining natural-language generation, speech synthesis, and computer graphics and image processing.
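As a toy stand-in for the multi-dimensional emotion features, the four cues named above can be fused into one score. The weights and threshold are invented; the patent prescribes a learned deep-learning model, not this hand-weighted rule.

```python
# Illustrative weighted fusion of the four emotion cues (speech, expression,
# micro-expression, text); each cue is a valence in [-1, 1].
CUE_WEIGHTS = {"speech": 0.3, "expression": 0.3, "micro_expression": 0.2, "text": 0.2}

def emotion_score(cues):
    """Return the weighted overall valence of the supplied cues."""
    return sum(CUE_WEIGHTS.get(name, 0.0) * value for name, value in cues.items())

def classify_emotion(cues, threshold=0.2):
    """Collapse the fused score into a coarse emotion class."""
    score = emotion_score(cues)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```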
In the above technical solution, the feedback data of the digital human specifically include expression information, gesture information, limb-action information and voice information.
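Bundling those four feedback channels into one digital-human response can be sketched as below; the mapping values are placeholders, not the patent's synthesis algorithms.

```python
# Illustrative composition of one digital-human response frame.
def make_feedback(emotion, reply_text):
    """Compose expression, gesture, limb action and voice for one response."""
    expression = {"positive": "smile", "negative": "concern"}.get(emotion, "neutral")
    return {
        "expression": expression,
        "gesture": "wave" if reply_text.lower().startswith("hello") else "rest",
        "limb_action": "nod",
        "voice": reply_text,  # text handed on to the speech synthesizer
    }
```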
In the above technical solution, in S5, outputting the multi-modal human-computer interaction information means presenting, in the machine's user interface, multi-modal output composed of the digital human's lifelike figure actions, voice and text, using a real-time rendering engine and a multimedia processing system, so as to achieve real-time interactive communication between the machine and the user.
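The hand-off to the rendering layer can be pictured as queueing synthesized frames. `Renderer` and its API are stand-ins invented here; a real system would submit these frames to an actual real-time rendering engine and multimedia stack.

```python
# Stand-in for the S5 output stage: synthesized frames queued for rendering.
class Renderer:
    def __init__(self):
        self.frames = []

    def submit(self, feedback):
        """Queue one multi-modal frame (figure action, voice, text); return depth."""
        self.frames.append(feedback)
        return len(self.frames)

    def flush(self):
        """Deliver and clear all queued frames (here: simply return them)."""
        out, self.frames = self.frames, []
        return out
```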
Compared with the prior art, the invention has the following beneficial effects:
1. The invention comprehensively uses interaction data obtained over channels of different modalities, integrates it into multi-modal behavior features, and finally obtains a unified semantic description conforming to the multi-modal interaction information standard. It can therefore fully understand the user's real intention, narrow the human-machine gap in information exchange, enrich the interaction content, create a harmonious human-computer interaction environment, improve the naturalness and efficiency of human-computer interaction, and advance intelligent human-machine interaction to a new level.
2. Establishing the unified data standard makes it convenient to integrate the interaction information acquired over the multi-modal input channels into one specification, so that in subsequent multi-modal interaction the information can be transmitted and used promptly and effectively; blocked data transmission caused by mismatched formats is avoided, and the user's interaction experience is improved.
3. The combined use of computer vision, intelligent speech processing and natural language processing makes it convenient to recognize, analyze, normalize and fuse the multi-modal data entered by the user, so that it can be used effectively in subsequent interaction; the human-machine interaction process becomes more convenient and the exchange of interaction information more accurate.
4. The established unified multi-modal data conversion model conveniently converts different interaction tasks into a unified task description, yielding uniform interaction-task information. During interaction, the complementarity among a user's multiple modal input channels is considered and the acquired multi-modal information is encoded in the same semantic space, realizing the unified task description and producing complete, unambiguous and exchangeable interaction tasks; this avoids over-complex understanding of multi-modal interaction information and realizes its semantic processing.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
In the drawings:
FIG. 1 is a flow chart of the steps of the intelligent interaction method of the present invention;
FIG. 2 is a schematic diagram of the intelligent interaction process of the present invention.
Detailed Description
The preferred embodiments of the present invention are described below in conjunction with the accompanying drawings; it should be understood that they serve only to illustrate and explain the invention, not to limit it.
Example 1: as shown in figs. 1-2, the invention provides a technical solution: an intelligent interaction method based on a digital-human multi-modal interaction information standard, in which interaction data of different modalities is comprehensively used and re-integrated into multi-modal behavior features, so as finally to obtain a unified semantic description conforming to the multi-modal interaction information standard and complete the information exchange;
the intelligent interaction method specifically comprises the following steps:
s1, acquiring interaction information of a human-computer interface;
s2, information identification based on multi-mode fusion;
s3, interactive processing and service processing of the application system;
s4, synthesizing multi-modal information of the digital person;
and S5, outputting multi-modal human-computer interaction information.
In the above technical solution, establishing the unified data specification means integrating the interaction information acquired over the multi-modal input channels during human-computer interaction into a single unified data specification. Fixing a standard format for transmitted data allows the information exchanged during multi-modal interaction to be transmitted and used promptly and effectively, and avoids data transmission being blocked by mismatched formats; the unified data specification must therefore be established before normal interaction begins.
In the above technical solution, in S1, acquiring the interaction information of the human-computer interface means collecting, in real time at the input end of the machine's human-computer interface, the interaction information and related parameters entered by the user, using human-interface devices consisting of image and sound acquisition devices, a touch screen, and a mouse and keyboard, so that the machine obtains the user's multi-modal interaction data;
the multi-modal interaction information and related parameters entered by the user specifically include expression information, motion information, voice information and language information.
In the above technical solution, after the machine acquires the multi-modal interaction information entered by the user, the information is classified inside the machine according to the type of each modality. A multi-modal fusion classification model is used so that the interaction information of the different modalities can be accurately recognized and classified, improving the accuracy of user-intention recognition, and the information is then converted into multi-modal data input under a unified specification according to the digital-human multi-modal interaction information standard.
In the above technical solution, in S1, before the interaction information of the human-computer interface is acquired, the user's intention is sensed by a sensing device consisting of a distance sensor and a camera, in order to predict whether the interacting subject has a potential intention to interact;
when the sensed user has an intention to interact, the interaction information is acquired through the machine's human-computer interface and multi-modal human-computer interaction proceeds;
when the sensed user has no intention to interact, a simple, general greeting interaction is initiated according to preset interaction rules.
In the above technical solution, in S2, different interaction tasks are converted into a unified task description by building a unified multi-modal data conversion model, which makes it convenient to obtain uniform interaction-task information;
when building the unified conversion model, the complementarity among a user's multiple modal input channels must be fully considered: the acquired expression, motion, voice and language information is encoded in the same semantic space, and the unified conversion model converts the interaction information of the different modalities into the unified task description, yielding a complete, unambiguous and exchangeable interaction task.
In the above technical solution, in S3, after the interaction information entered by the user is obtained, the user's multi-modal data is processed by the combined use of computer vision, intelligent speech processing and natural language processing;
specifically, the multi-modal interaction data is recognized, analyzed, normalized and fused by artificial-intelligence techniques; an appropriate feedback behavior is chosen by an interaction decision mechanism; a response to the user is produced by a natural-language generation algorithm or an intelligent question-answering algorithm; other anthropomorphic feedback behaviors or characteristics are generated at the same time; and finally a multi-modal data output conforming to the digital-human multi-modal interaction information standard is produced.
In the above technical solution, generating standard-conforming feedback data in S4 means recognizing the user's emotion by analyzing speech, expression, micro-expression and text content together to form multi-dimensional features. Unlike the traditional approach of extracting and processing single-modality behavior features independently, an accurate classification of the user's emotion is obtained by building a multi-modal model based on deep learning;
then, under the same unified information data standard, the digital human's feedback data conforming to the multi-modal interaction information standard is generated by combining natural-language generation, speech synthesis, and computer graphics and image processing.
In the above technical solution, the feedback data of the digital human specifically include expression information, gesture information, limb-action information and voice information.
In the above technical solution, in S5, the multi-modal human-computer interaction information is output in the machine's user interface using a real-time rendering engine and a multimedia processing system, so as to achieve real-time interactive communication between the machine and the user.
Example 2: as shown in figs. 1-2, the invention provides a technical solution: an intelligent interaction method based on a digital-human multi-modal interaction information standard, in which interaction data of different modalities is comprehensively used and re-integrated into multi-modal behavior features, so as finally to obtain a unified semantic description conforming to the multi-modal interaction information standard and complete the information exchange;
the intelligent interaction method specifically comprises the following steps:
s1, acquiring interaction information of a human-computer interface;
s2, information identification based on multi-mode fusion;
s3, interactive processing and service processing of the application system;
s4, synthesizing multi-modal information of the digital person;
and S5, outputting multi-modal human-computer interaction information.
In the above technical solution, establishing the unified data specification means integrating the interaction information acquired over the multi-modal input channels during human-computer interaction into a single unified data specification. Fixing a standard format for transmitted data allows the information exchanged during multi-modal interaction to be transmitted and used promptly and effectively, and avoids data transmission being blocked by mismatched formats; the unified data specification must therefore be established before normal interaction begins.
In the above technical solution, in S1, acquiring the interaction information of the human-computer interface means collecting, in real time at the input end of the machine's human-computer interface, the interaction information and related parameters entered by the user, using human-interface devices consisting of image and sound acquisition devices, a touch screen, and a mouse and keyboard, so that the machine obtains the user's multi-modal interaction data;
the multi-modal interaction information and related parameters entered by the user specifically include expression information, motion information, voice information and language information.
In the above technical solution, after the machine acquires the multi-modal interaction information entered by the user, the information is classified inside the machine according to the type of each modality. A multi-modal fusion classification model is used so that the interaction information of the different modalities can be accurately recognized and classified, improving the accuracy of user-intention recognition, and the information is then converted into multi-modal data input under a unified specification according to the digital-human multi-modal interaction information standard.
In the above technical solution, in S1, before the interaction information of the human-computer interface is acquired, the user's intention is sensed by a sensing device consisting of a distance sensor and a camera, in order to predict whether the interacting subject has a potential intention to interact;
when the sensed user has an intention to interact, the interaction information is acquired through the machine's human-computer interface and multi-modal human-computer interaction proceeds;
when the sensed user has no intention to interact, a simple, general greeting interaction is initiated according to preset interaction rules.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described above, or equivalents may be substituted for elements thereof. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An intelligent interaction method based on a digital-human multi-modal interaction information standard, characterized in that: interaction data of different modalities is comprehensively used and re-integrated into multi-modal behavior features, so as finally to obtain a unified semantic description conforming to the multi-modal interaction information standard and complete the information exchange;
the intelligent interaction method specifically comprises the following steps:
s1, acquiring interaction information of a human-computer interface;
s2, information identification based on multi-mode fusion;
s3, interactive processing and service processing of the application system;
s4, synthesizing multi-modal information of the digital person;
and S5, outputting multi-modal human-computer interaction information.
2. The intelligent interaction method based on a digital-human multi-modal interaction information standard according to claim 1, characterized in that: establishing the unified data specification means integrating the interaction information acquired over the multi-modal input channels during human-computer interaction into a single unified data specification; fixing a standard format for transmitted data allows the information exchanged during multi-modal interaction to be transmitted and used promptly and effectively and avoids data transmission being blocked by mismatched formats, so the unified data specification must be established before normal interaction.
3. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 2, characterized in that: in S1, acquiring the interaction information of the human-computer interface means acquiring, in real time at the input end of the interactive machine's human-computer interface, the interaction information and related parameter information entered by the user, using human-computer interface devices consisting of image and sound acquisition devices, touch-screen devices, and a mouse and keyboard, so that the interactive machine obtains the user's multi-modal interaction information data;
and the multi-modal interaction information and related parameter information entered by the user specifically include expression information, motion information, voice information and language information.
4. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 2, characterized in that: after the interactive machine acquires the multi-modal interaction information input by the user, each piece of interaction information is classified within the interactive machine according to the type of interaction information in each different modality, making full use of a multi-modal fusion classification model, so that interaction information of different modality types can be accurately recognized and classified, improving the accuracy of user intention recognition; the multi-modal interaction information is then converted into multi-modal data input under a unified specification according to the digital human multi-modal interaction information standard.
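The classification and normalization described in claim 4 can be sketched as routing each input to a modality class and wrapping it in one unified record. The rule table, record schema, and `"v1"` spec tag below are all invented for illustration; the patent names no concrete format.

```python
# Hypothetical sketch of claim 4: classify each input by modality,
# then normalize it into a unified-specification record.

MODALITY_RULES = {
    "expression": {"smile", "frown"},
    "motion": {"wave", "nod"},
    "voice": {"audio_clip"},
    "language": {"text_message"},
}

def classify_modality(token):
    """Assign an input token to one of the four modality classes."""
    for modality, keywords in MODALITY_RULES.items():
        if token in keywords:
            return modality
    return "unknown"

def to_unified_record(token):
    """Wrap a classified input in a record under one unified spec."""
    return {"modality": classify_modality(token), "payload": token, "spec": "v1"}

records = [to_unified_record(t)
           for t in ["smile", "wave", "audio_clip", "text_message"]]
```

A real system would replace the keyword table with the multi-modal fusion classification model the claim refers to; the point here is only the uniform output schema.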
5. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 2, characterized in that: in S1, before the interaction information of the human-computer interface is acquired, the user's intention needs to be perceived through perception equipment consisting of a distance sensor and a camera, so as to predict whether the interacting subject has a potential information interaction intention;
when the perceived user has an interaction intention, the interaction information is acquired through the machine's human-computer interface to carry out human-computer interaction of multi-modal information;
and when the perceived user has no interaction intention, a simple, general greeting-type interaction is initiated according to a preset interaction rule.
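The pre-interaction gating in claim 5 amounts to a two-way branch on perceived intent. The sketch below assumes a distance reading and a camera-derived "facing the machine" flag; the 1.5 m threshold and both signal names are illustrative guesses, not values from the patent.

```python
# Hypothetical sketch of claim 5's intent gating: a distance sensor and a
# camera-based flag decide between full interaction and a preset greeting.

INTENT_DISTANCE_M = 1.5  # assumed threshold, not specified in the patent

def perceive_intent(distance_m, facing_camera):
    """Predict whether the subject has a potential interaction intention."""
    return distance_m < INTENT_DISTANCE_M and facing_camera

def interaction_mode(distance_m, facing_camera):
    """Branch per claim 5: full multi-modal interaction vs. simple greeting."""
    if perceive_intent(distance_m, facing_camera):
        return "full_multimodal"
    return "greeting"
```

The preset greeting rule of the claim would live behind the `"greeting"` branch; everything else in the method only runs on the `"full_multimodal"` path.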
6. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 1, characterized in that: in S2, different interaction tasks are converted into a unified task description by establishing a unified multi-modal data conversion model, so that unified information interaction task information is conveniently obtained;
in the process of establishing the unified multi-modal data conversion model, the complementarity among the user's multiple modal interaction information input channels needs to be fully considered; the acquired expression information, motion information, voice information and language information are encoded in the same semantic space, and information conversion is carried out through the unified conversion model for the interaction information of different modalities to realize the unified task description, thereby obtaining a complete, unambiguous and exchangeable interaction task.
7. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 2, characterized in that: in S3, after the interaction information input by the user is obtained, the multi-modal data information input by the user is processed by comprehensively utilizing computer vision technology, intelligent speech processing technology and natural language processing technology;
the processing specifically refers to recognizing, analyzing, normalizing and fusing the multi-modal interaction information data through artificial intelligence technology, making an appropriate feedback behavior according to an interaction decision mechanism, obtaining a response to the user according to a natural language generation algorithm or an intelligent question-answering algorithm, simultaneously generating other anthropomorphic feedback behaviors or characteristics, and finally generating multi-modal data output that conforms to the digital human multi-modal interaction information standard.
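The normalize-then-decide flow of claim 7 can be sketched with a rule table standing in for the natural language generation or intelligent question-answering algorithms the claim names. The response table and gesture rule are hypothetical placeholders.

```python
# Hypothetical sketch of claim 7: normalize the fused input, then pick a
# speech response and an anthropomorphic gesture via a decision mechanism.

def normalize(inputs):
    """Normalization step: trim and lowercase every channel's payload."""
    return {k: v.strip().lower() for k, v in inputs.items()}

RESPONSES = {  # toy stand-in for an NLG / intelligent-QA algorithm
    "hello": "Hello! How can I help you today?",
    "bye": "Goodbye, see you next time.",
}

def decide_feedback(fused):
    """Interaction decision mechanism: speech plus a matching gesture."""
    text = fused.get("language", "")
    reply = RESPONSES.get(text, "Could you rephrase that?")
    gesture = "wave" if text in ("hello", "bye") else "idle"
    return {"speech": reply, "gesture": gesture}

feedback = decide_feedback(normalize({"language": "  Hello "}))
```

The returned dictionary pairs the verbal response with a non-verbal behavior, matching the claim's requirement that feedback include anthropomorphic characteristics beyond the text answer.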
8. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 2, characterized in that: in S4, generating feedback data that meets the standard means that, through recognition of the user's emotion, a comprehensive analysis can be carried out from the perspectives of voice, expression, micro-expression and text content to form multi-dimensional features; unlike the traditional approach of extracting each single-modality behavior feature separately and processing it independently, an accurate classification of the user's emotion is obtained by establishing a deep-learning-based multi-modal model;
and based on the same set of unified information data standards, the feedback data of the digital human that meets the multi-modal interaction information standard is generated by combining natural language generation technology, speech synthesis technology, and computer graphics and image processing technology.
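The joint emotion classification in claim 8 contrasts with per-modality scoring: the four feature sets are fused first and classified once. The concatenation-plus-threshold "model" below is a deliberately simple stand-in for the deep multi-modal model the claim describes; feature values and thresholds are invented.

```python
# Hypothetical sketch of claim 8: fuse voice, expression, micro-expression
# and text features into one multi-dimensional vector, then classify the
# user's emotion from the fused vector rather than per modality.

def fuse_features(voice, expression, micro_expression, text):
    """Concatenate per-modality feature lists into one joint vector."""
    return voice + expression + micro_expression + text

def classify_emotion(features):
    """Toy stand-in for a trained deep multi-modal emotion classifier."""
    score = sum(features) / len(features)
    if score > 0.6:
        return "positive"
    if score < 0.4:
        return "negative"
    return "neutral"

emotion = classify_emotion(fuse_features([0.9], [0.8], [0.7], [0.9, 0.8]))
```

The design point is that `classify_emotion` never sees which value came from which modality: cross-modal cues (a positive phrase with a negative micro-expression, say) are weighed together in one pass.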
9. The intelligent interaction method based on digital human multi-modal interaction information standard according to claim 8, wherein: the feedback data of the digital person specifically comprises expression information, gesture information, limb action information and voice information.
10. The intelligent interaction method based on the digital human multi-modal interaction information standard according to claim 2, characterized in that: in S5, outputting the multi-modal human-computer interaction information means outputting, in the user interface of the interactive machine, the multi-modal interaction information composed of a realistic digital human figure's actions, voice and text through the use of a real-time rendering engine and a multimedia processing system, thereby realizing real-time interactive communication between the machine and the user.
CN202210797802.4A 2022-07-06 2022-07-06 Intelligent interaction method based on digital human multi-modal interaction information standard Pending CN115167674A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210797802.4A CN115167674A (en) 2022-07-06 2022-07-06 Intelligent interaction method based on digital human multi-modal interaction information standard

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210797802.4A CN115167674A (en) 2022-07-06 2022-07-06 Intelligent interaction method based on digital human multi-modal interaction information standard

Publications (1)

Publication Number Publication Date
CN115167674A true CN115167674A (en) 2022-10-11

Family

ID=83490749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210797802.4A Pending CN115167674A (en) 2022-07-06 2022-07-06 Intelligent interaction method based on digital human multi-modal interaction information standard

Country Status (1)

Country Link
CN (1) CN115167674A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383620A (en) * 2023-03-29 2023-07-04 北京鹅厂科技有限公司 Method and device for applying multi-mode artificial intelligence
CN116383620B (en) * 2023-03-29 2023-10-20 北京鹅厂科技有限公司 Method and device for applying multi-mode artificial intelligence

Similar Documents

Publication Publication Date Title
US20230316643A1 (en) Virtual role-based multimodal interaction method, apparatus and system, storage medium, and terminal
CN106653052B (en) Virtual human face animation generation method and device
CN110688911B (en) Video processing method, device, system, terminal equipment and storage medium
Wu et al. Multimodal large language models: A survey
CN103218842B (en) A kind of voice synchronous drives the method for the three-dimensional face shape of the mouth as one speaks and facial pose animation
US20210191506A1 (en) Affective interaction systems, devices, and methods based on affective computing user interface
Schiel et al. The SmartKom Multimodal Corpus at BAS.
JP2018014094A (en) Virtual robot interaction method, system, and robot
CN106985137A (en) Multi-modal exchange method and system for intelligent robot
CN105046238A (en) Facial expression robot multi-channel information emotion expression mapping method
CN101923858A (en) Real-time and synchronous mutual translation voice terminal
TW201937344A (en) Smart robot and man-machine interaction method
CN107808191A (en) The output intent and system of the multi-modal interaction of visual human
CN106502382A (en) Active exchange method and system for intelligent robot
JP2023552854A (en) Human-computer interaction methods, devices, systems, electronic devices, computer-readable media and programs
CN115330911A (en) Method and system for driving mimicry expression by using audio
CN113793398A (en) Drawing method and device based on voice interaction, storage medium and electronic equipment
CN106648054B (en) A kind of Multimodal interaction method of the company robot based on RealSense
WO2023246163A1 (en) Virtual digital human driving method, apparatus, device, and medium
CN115953521B (en) Remote digital person rendering method, device and system
CN116661603A (en) Multi-mode fusion user intention recognition method under complex man-machine interaction scene
CN117251057A (en) AIGC-based method and system for constructing AI number wisdom
CN117032453A (en) Virtual reality interaction system for realizing mutual recognition function
CN115167674A (en) Intelligent interaction method based on digital human multi-modal interaction information standard
CN114708849A (en) Voice processing method and device, computer equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination