CN117153157B - Multi-mode full duplex dialogue method and system for semantic recognition
- Publication number: CN117153157B (application CN202311212596.7A)
- Authority: CN (China)
- Prior art keywords: dialogue, vector, determining, decision result, mode
- Prior art date: 2023-09-19
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/063—Training (creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
- G10L15/1822—Parsing for meaning understanding
- G10L15/183—Speech classification or search using natural language modelling, using context dependencies, e.g. language models
- G10L2015/225—Feedback of the input speech
Abstract
The invention provides a multi-modal full-duplex dialogue method and system for semantic recognition, wherein the method comprises the following steps: step 1: acquiring an initiated dialogue between a dialogue user and a preset dialogue model; step 2: determining the dialogue modality selected by the dialogue user; step 3: acquiring the semantic meaning of the initiated dialogue according to a semantic recognition technology and the dialogue modality; step 4: conducting a multi-modal full-duplex dialogue according to the semantic meaning and the dialogue modality. According to the multi-modal full-duplex dialogue method and system for semantic recognition, the semantic meaning is determined from the acquired initiated dialogue and the dialogue modality selected by the dialogue user, and a multi-modal full-duplex dialogue is conducted according to the semantic meaning and the dialogue modality, so that user intent is expressed and understood more richly, more diversified responses are provided, and interactivity is stronger.
Description
Technical Field
The invention relates to the technical field of semantic recognition, in particular to a multi-mode full duplex dialogue method and system for semantic recognition.
Background
A multi-modal full-duplex dialogue refers to a dialogue in which input and output of multiple modalities (e.g., text, voice, image, video, etc.) are processed simultaneously during the dialogue process. Semantic recognition refers to identifying what a piece of content (e.g., text, audio, or image) represents. A multi-modal full-duplex dialogue method for semantic recognition is therefore a method in which the dialogue system processes input and output of multiple modalities simultaneously; it realizes full-duplex dialogue interaction and extends semantic understanding and generation to multiple modalities, so that the dialogue system can understand user input more comprehensively and generate multi-modal replies.
Chinese invention patent application CN201811010816.7 discloses a method for realizing full-duplex voice dialogue and page control based on a web page, the method comprising: a user accesses the web page; the user initiates a voice session request in the web page; a server responds to the voice session request; the server establishes a full-duplex voice dialogue with the user and outputs the user intention; and a command control module receives the user intention to realize page control. That invention addresses the poor interaction experience and low communication efficiency of dialogues between existing web-page administrators and website visitors.
However, the above prior art conducts the dialogue only in the voice modality. Voice alone is a comparatively limited form of expression: some content cannot be conveyed intuitively through voice, and the human-computer interaction experience is consequently poor.
In view of the foregoing, there is a need for a multi-modal full duplex dialogue method and system for semantic recognition that addresses at least the above-mentioned shortcomings.
Disclosure of Invention
The invention aims to provide a multi-modal full-duplex dialogue method and system for semantic recognition, which determine the semantic meaning according to the acquired initiated dialogue and the dialogue modality selected by the dialogue user, and conduct a multi-modal full-duplex dialogue according to the semantic meaning and the dialogue modality, so that user intent is expressed and understood more richly, more diversified responses are provided, and interactivity is stronger.
The multi-modal full-duplex dialogue method for semantic recognition provided by an embodiment of the invention comprises the following steps:
step 1: acquiring an initiated dialogue between a dialogue user and a preset dialogue model;
step 2: determining the dialogue modality selected by the dialogue user;
step 3: acquiring the semantic meaning of the initiated dialogue according to a semantic recognition technology and the dialogue modality;
step 4: conducting a multi-modal full-duplex dialogue according to the semantic meaning and the dialogue modality.
Preferably, step 1, acquiring an initiated dialogue between a dialogue user and a preset dialogue model, comprises:
Determining an input port for initiating a dialog;
Acquiring input information of a dialogue user;
Determining a target port of the input information according to the information type of the input information;
Determining port information according to input information based on an analysis rule corresponding to a target port;
determining the initiated dialogue according to the port information based on real-time Web technology.
Preferably, step 2, determining the dialogue modality selected by the dialogue user, comprises:
acquiring a modality selection instruction input by the dialogue user, and determining the dialogue modality according to the modality selection instruction;
and/or,
acquiring context information input by the dialogue user, determining the user's modality switching intention according to the context information, and determining the dialogue modality according to the modality switching intention.
Preferably, step 3, acquiring the semantic meaning of the initiated dialogue according to the semantic recognition technology and the dialogue modality, comprises:
collecting training data according to the dialogue modality;
determining a plurality of extracted samples according to the training data based on a preset extraction rule;
training semantic recognition decision trees according to the extracted samples based on a random forest algorithm;
determining a plurality of decision results according to the initiated dialogue and the semantic recognition decision trees;
acquiring a decision result expression heat map;
Carrying out hierarchical clustering on each decision result according to the decision result expression heat map to obtain a clustering tree;
Determining tree nodes with the maximum volume in the cluster tree;
Obtaining a central heat map value of a decision result corresponding to the tree node;
determining the semantic meaning according to the central heat map value;
wherein carrying out hierarchical clustering of the decision results according to the decision result expression heat map to obtain the cluster tree comprises:
and calculating the similarity between every two decision results, wherein the calculation formula of the similarity is as follows:
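(The similarity formula is rendered only as an image in the published patent; a plausible reconstruction from the variable definitions below, assuming a Gaussian kernel over the Euclidean heat-map distance, is:)

$$\mathrm{Dis}(D_m,D_n)=\sqrt{(X_m-X_n)^2+(Y_m-Y_n)^2},\qquad \mathrm{Correlation}(D_m,D_n)=\exp\!\left(-\frac{\mathrm{Dis}(D_m,D_n)^2}{2\sigma^2}\right)$$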
wherein D_m is the m-th decision result, D_n is the n-th decision result, Correlation(D_m, D_n) is the similarity calculation result of the m-th and n-th decision results, Dis(D_m, D_n) is the distance between the m-th and n-th decision results on the decision result expression heat map, X_m and X_n are the calibration values of the m-th and n-th decision results in the X dimension of the decision result expression heat map, Y_m and Y_n are the calibration values of the m-th and n-th decision results in the Y dimension of the decision result expression heat map, and σ is a preset similarity normalization coefficient;
and iteratively merging the decision results with the highest similarity to obtain the cluster tree.
Preferably, collecting training data according to the dialogue modality comprises:
acquiring the modality type of the dialogue modality;
determining a collection rule according to the modality type;
determining the preset collection template corresponding to the collection rule;
Acquiring a dialogue scene of initiating a dialogue;
Extracting dialogue scene characteristics of a dialogue scene based on a preset dialogue scene characteristic extraction rule;
Determining feature setting parameters of the collection template according to dialogue scene features;
setting corresponding feature setting parameters of the collecting template to obtain a target template;
Training data is collected based on the target template.
Preferably, step 4, conducting a multi-modal full-duplex dialogue according to the semantic meaning and the dialogue modality, comprises:
acquiring a dialogue requirement according to the semantic meaning;
determining the output channel of the dialogue modality;
determining output content according to the dialogue requirement and the output channel;
and conducting a multi-modal full-duplex dialogue according to the output content.
Preferably, determining output content according to the dialogue requirement and the output channel includes:
Determining a first dialogue vector of dialogue requirements based on a preset dialogue vector model;
Acquiring a corpus group corresponding to an output channel;
based on a preset sentence-breaking rule, determining a plurality of first sentence-breaking corpora according to the corpus group;
determining a second dialogue vector of the first sentence-breaking corpus based on the dialogue vector model;
Aligning the vector starting points of the first dialogue vector and the second dialogue vector, and acquiring a first vector included angle between the first dialogue vector and the second dialogue vector after the vector starting points are aligned;
If the first vector included angle is smaller than a preset vector included angle threshold value, the corresponding second dialogue vector is used as a third dialogue vector;
determining a first vector included angle between the first dialogue vector and the third dialogue vector, and taking the first vector included angle as a second vector included angle;
Rotating the third dialogue vector by a second vector included angle to obtain a fourth dialogue vector;
Calculating a vector modulus difference value between the fourth dialogue vector and the first dialogue vector;
Determining output content according to the second vector included angle and the vector module value difference value;
Wherein calculating a vector modulus difference between the fourth dialog vector and the first dialog vector comprises:
calculating a dimension difference value of the same vector dimension of the fourth dialogue vector and the first dialogue vector;
acquiring the dimension weight of the vector dimension according to the vector dimension and a preset dimension weight library;
And determining a vector modulus value difference value according to the dimension difference value and the dimension weight based on a preset calculation rule.
Preferably, determining the output content according to the second vector included angle and the vector modulus difference value includes:
acquiring a first conversion rule corresponding to the vector included angle and a second conversion rule corresponding to the vector module value difference;
According to the first conversion rule, determining a first conversion value corresponding to the second vector included angle, and correlating with a corresponding third dialogue vector;
Determining a second conversion value corresponding to the vector module value difference value according to a second conversion rule, and correlating with a corresponding third dialogue vector;
summing the first conversion value and the second conversion value associated with the third dialogue vector to obtain a statistical value;
Taking the first sentence-breaking corpus corresponding to the third dialogue vector with the smallest statistic value as the second sentence-breaking corpus;
and determining a third sentence-breaking corpus according to the second sentence-breaking corpus and the first sentence-breaking corpus in the corpus group, and taking the third sentence-breaking corpus as output content.
Preferably, conducting the multi-modal full-duplex dialogue according to the output content comprises:
acquiring a presentation requirement of output content;
Determining a first support parameter according to the presentation requirement;
acquiring a second support parameter of the local server;
judging whether the output content can be presented on the user interface or not according to the first support parameter and the second support parameter;
if the output content can be presented on the user interface, presenting the output content through the user interface;
if the output content cannot be presented on the user interface, establishing a communication link through the local server with a target platform meeting the presentation requirement, and sending the presentation requirement to the target platform;
and acquiring the presentation information produced after the target platform receives the presentation requirement, and returning the presentation information in real time.
An embodiment of the invention provides a multi-modal full-duplex dialogue system for semantic recognition, which comprises:
an initiated dialogue acquisition subsystem for acquiring an initiated dialogue between a dialogue user and a preset dialogue model;
a dialog modality determination subsystem for determining a dialog modality selected by a dialog user;
a semantic meaning acquisition subsystem for acquiring the semantic meaning of the initiated dialogue according to the semantic recognition technology and the dialogue modality;
and a dialogue subsystem for conducting a multi-modal full-duplex dialogue according to the semantic meaning and the dialogue modality.
The beneficial effects of the invention are as follows:
According to the method and the device, the semantic meaning is determined according to the acquired initiated dialogue and the dialogue modality selected by the dialogue user, and a multi-modal full-duplex dialogue is conducted according to the semantic meaning and the dialogue modality, so that user intent is expressed and understood more richly, more diversified responses are provided, and interactivity is stronger.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a schematic diagram of a multi-modal full-duplex dialogue method for semantic recognition according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a multi-modal full-duplex dialogue system for semantic recognition according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
An embodiment of the invention provides a multi-modal full-duplex dialogue method for semantic recognition, which, as shown in fig. 1, comprises the following steps:
step 1: acquiring an initiated dialogue between a dialogue user and a preset dialogue model; wherein the dialogue user is: a user of the multi-modal full-duplex dialogue system for semantic recognition; the dialogue model is: the multi-modal full-duplex dialogue system; the initiated dialogue is: a dialogue initiated on the dialogue interface of the multi-modal full-duplex dialogue system for semantic recognition;
step 2: determining the dialogue modality selected by the dialogue user; wherein the dialogue modality is, for example: text, voice, image, video, etc.;
step 3: acquiring the semantic meaning of the initiated dialogue according to a semantic recognition technology and the dialogue modality; wherein the semantic meaning is: the meaning contained in the initiated dialogue;
step 4: conducting a multi-modal full-duplex dialogue according to the semantic meaning and the dialogue modality; wherein multi-modal means that the dialogue can take various forms, for example: dialogue through text, audio, images, and the like.
The working principle and the beneficial effects of the technical scheme are as follows:
According to the method and the device, the semantic meaning is determined according to the acquired initiated dialogue and the dialogue modality selected by the dialogue user, and a multi-modal full-duplex dialogue is conducted according to the semantic meaning and the dialogue modality, so that user intent is expressed and understood more richly, more diversified responses are provided, and interactivity is stronger.
In a specific application, a dialogue user initiates a dialogue in any modality supported by the servers of the multi-modal full-duplex dialogue system for semantic recognition; after the system receives the initiated dialogue, it generates a reply in the modality selected by the dialogue user.
In one embodiment, step 1, acquiring an initiated dialogue between a dialogue user and a preset dialogue model, comprises:
determining an input port for the initiated dialogue; wherein the input port is, for example: a sound-receiving microphone, or a file input box;
acquiring the input information of the dialogue user; wherein the input information is, for example: input text, input pictures, input voice, and the like;
determining a target port of the input information according to the information type of the input information; wherein the information type is, for example: text, image, audio, etc.; when determining the target port of the input information, for example: when the input information is text or an image, an input box of a preset user interface is determined, the preset user interface being the dialogue interface presented to the dialogue user; the dialogue user inputs or drags the text or image to be input into the corresponding input box to complete the determination of the target port (that is, the input port corresponding to the input box of the user interface is the target port for text and images); for another example: the sound-receiving microphone picks up ambient sound in real time, and when the input information is voice information, the target port is the port corresponding to the sound-receiving microphone;
determining port information according to input information based on an analysis rule corresponding to a target port; the analysis rule is determined according to the port equipment; the port information is: dialogue information transmitted by the port in real time;
determining the initiated dialogue according to the port information based on real-time Web technology; real-time Web technology belongs to the prior art and is not described in detail here.
The working principle and the beneficial effects of the technical scheme are as follows:
The application determines the target port of the input information according to the determined input port and the information type of the dialogue user's input information, introduces the analysis rule of the target port, and determines the port information according to the input information. Real-time Web technology is introduced so that the dialogue is initiated in real time according to the port information, which improves the timeliness of acquiring the initiated dialogue, makes the dialogue more efficient, and improves the user experience.
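To make the routing step concrete, here is a minimal, hypothetical sketch (port names and parsing rules are illustrative assumptions; the patent does not specify them):

```python
# Hypothetical sketch of the input-port routing described above.
# Port names, types, and parsing rules are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Port:
    name: str
    parse: Callable[[bytes], str]  # analysis rule bound to this port

PORTS = {
    "text":  Port("ui_input_box", lambda raw: raw.decode("utf-8")),
    "image": Port("ui_input_box", lambda raw: f"<image:{len(raw)} bytes>"),
    "audio": Port("microphone",   lambda raw: f"<audio:{len(raw)} bytes>"),
}

def route_input(info_type: str, raw: bytes) -> tuple[str, str]:
    """Pick the target port by information type and parse the payload."""
    port = PORTS[info_type]
    return port.name, port.parse(raw)

if __name__ == "__main__":
    print(route_input("text", "hello".encode("utf-8")))
    print(route_input("audio", b"\x00\x01\x02"))
```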
In one embodiment, step 2: determining a dialog modality selected by a dialog user, comprising:
acquiring a modality selection instruction input by the dialogue user, and determining the dialogue modality according to the modality selection instruction; wherein the modality selection instruction is: the trigger instruction generated when a modality change operation is performed on the server;
and/or,
acquiring context information input by the dialogue user, determining the user's modality switching intention according to the context information, and determining the dialogue modality according to the modality switching intention; wherein the context information is: the historical conversations of the dialogue user, and the modality switching intention is: the modality to which the user desires to switch.
The working principle and the beneficial effects of the technical scheme are as follows:
The application introduces two ways of determining the dialogue modality, which makes the determination of the dialogue modality more reasonable.
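A minimal sketch of the two determination paths, under the assumption of a simple keyword-based detector for the switching intention (the patent does not specify how that intention is extracted):

```python
# Hypothetical sketch: determine the dialogue modality either from an
# explicit selection instruction or from the switching intention implied
# by the user's context (history). Keyword matching is an assumption.
SUPPORTED = ("text", "voice", "image", "video")

def modality_from_instruction(instruction: str | None) -> str | None:
    return instruction if instruction in SUPPORTED else None

def modality_from_context(history: list[str]) -> str | None:
    for utterance in reversed(history):          # most recent intent wins
        for m in SUPPORTED:
            if f"switch to {m}" in utterance.lower():
                return m
    return None

def determine_modality(instruction=None, history=()):
    # Explicit instruction takes precedence; fall back to context intent.
    return (modality_from_instruction(instruction)
            or modality_from_context(list(history))
            or "text")                           # assumed default

print(determine_modality(history=["hi", "please switch to voice"]))  # voice
```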
In one embodiment, step 3, acquiring the semantic meaning of the initiated dialogue according to the semantic recognition technology and the dialogue modality, comprises:
collecting training data according to the dialogue modality; wherein the training data are: records of semantic recognition for dialogues of the corresponding modality, for example: machine-learning records of recognizing the semantics represented by text, or machine-learning records of recognizing the semantics represented by speech;
determining a plurality of extracted samples according to the training data based on a preset extraction rule; wherein the extraction rule is: sampling the training data with replacement, one sample at a time, until extracted sample sets of the preset sample size are formed;
training semantic recognition decision trees according to the extracted samples based on a random forest algorithm; the random forest algorithm belongs to the prior art and its principle is not repeated; a semantic recognition decision tree is: a decision tree that determines the semantic meaning from the initiated sentence;
determining a plurality of decision results according to the initiated dialogue and the semantic recognition decision trees; wherein a decision result is: the semantic decision result for the initiated sentence output by each semantic recognition decision tree;
acquiring a decision result expression heat map; wherein the decision result expression heat map is: a heat map used to visualize the different decision results; heat map analysis belongs to the prior art;
Carrying out hierarchical clustering on each decision result according to the decision result expression heat map to obtain a clustering tree;
determining the tree node with the largest volume in the cluster tree; wherein the tree node with the largest volume is: the tree node into which the largest number of decision results is divided;
Obtaining a central heat map value of a decision result corresponding to the tree node; wherein, the central heat map value is: average heat map value of decision result corresponding to tree node;
determining the semantic meaning according to the central heat map value; wherein each heat map value has a correspondingly characterized semantic meaning, so the semantic meaning can be determined;
wherein carrying out hierarchical clustering of the decision results according to the decision result expression heat map to obtain the cluster tree comprises:
and calculating the similarity between every two decision results, wherein the calculation formula of the similarity is as follows:
wherein D_m is the m-th decision result, D_n is the n-th decision result, Correlation(D_m, D_n) is the similarity calculation result of the m-th and n-th decision results, Dis(D_m, D_n) is the distance between the m-th and n-th decision results on the decision result expression heat map, X_m and X_n are the calibration values of the m-th and n-th decision results in the X dimension of the decision result expression heat map, Y_m and Y_n are the calibration values of the m-th and n-th decision results in the Y dimension of the decision result expression heat map, and σ is a preset similarity normalization coefficient;
and iteratively merging the decision results with the highest similarity to obtain the cluster tree; wherein iterative merging means: at each iteration, merging the pair of decision results with the highest similarity in the current iteration, until the number of merged results reaches a manually preset value.
The working principle and the beneficial effects of the technical scheme are as follows:
The application collects training data corresponding to the dialogue modality and determines, from the training data, the extracted samples used to generate the semantic recognition decision trees. A random forest algorithm is introduced: the semantic recognition decision trees are trained on the extracted samples, and the decision results of these trees on the initiated dialogue are obtained. Considering the discreteness of the distribution of decision results, a decision result expression heat map is introduced and the decision results are hierarchically clustered to obtain a cluster tree; during hierarchical clustering, the similarity between decision results is introduced for iterative merging, and a similarity normalization coefficient is introduced into the similarity calculation, which improves the rationality of the hierarchical clustering. The central heat map value of the tree node with the largest volume in the cluster tree is then determined to determine the semantic meaning, which improves the efficiency of determining the semantic meaning.
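As an illustration of this pipeline, the following minimal sketch (the heat-map coordinates, the Gaussian similarity form from the reconstruction above, and the stopping count are assumptions, not taken from the patent) clusters per-tree decision results by their heat-map positions and reads off the central heat-map value of the largest node:

```python
# Hedged sketch of step 3: Gaussian similarity over heat-map positions
# (assumed formula), greedy pairwise merging into a cluster tree, and the
# mean position of the largest node as its "central heat map value".
import math

def correlation(a, b, sigma=1.0):
    """Assumed similarity: Gaussian kernel over heat-map distance."""
    dis = math.hypot(a[0] - b[0], a[1] - b[1])
    return math.exp(-dis ** 2 / (2 * sigma ** 2))

def build_cluster_tree(points, target_clusters=2):
    """Iteratively merge the most similar pair of clusters."""
    clusters = [[i] for i in range(len(points))]
    centers = list(points)
    while len(clusters) > target_clusters:
        i, j = max(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: correlation(centers[ab[0]], centers[ab[1]]),
        )
        merged = clusters[i] + clusters[j]
        cx = sum(points[k][0] for k in merged) / len(merged)
        cy = sum(points[k][1] for k in merged) / len(merged)
        for idx in sorted((i, j), reverse=True):
            del clusters[idx], centers[idx]
        clusters.append(merged)
        centers.append((cx, cy))
    return clusters, centers

# Decision-result positions on the heat map (X, Y calibration values).
heat_positions = [(0.1, 0.2), (0.15, 0.22), (0.8, 0.9), (0.82, 0.88), (0.5, 0.1)]
clusters, centers = build_cluster_tree(heat_positions)
largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
print("largest node:", clusters[largest], "central heat-map value:", centers[largest])
```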
In one embodiment, collecting training data according to a dialog modality includes:
acquiring the modality type of the dialogue modality; wherein the modality type is, for example: text, image, video, etc.;
determining a collection rule according to the modality type; wherein the collection rule specifies how to collect training data corresponding to the modality type, for example: how to collect records of text semantic recognition, or how to collect records of speech semantic recognition;
determining the preset collection template corresponding to the collection rule; wherein the collection template is: a template conforming to the collection rule; under the template's constraints, only records conforming to the collection rule are collected;
acquiring the dialogue scene of the initiated dialogue; wherein the dialogue scene is: the application scenario of the dialogue, for example: a visual question-answering system, a chat robot, etc.;
extracting the dialogue scene features of the dialogue scene based on a preset dialogue scene feature extraction rule; wherein the dialogue scene features are, for example: the topic of the dialogue, the form of the dialogue, and the like;
determining the feature setting parameters of the collection template according to the dialogue scene features; wherein the feature setting parameters are: constraint parameters that restrict the collection behavior of the collection template so that more accurate training data are acquired;
setting corresponding feature setting parameters of the collecting template to obtain a target template;
Training data is collected based on the target template.
The working principle and the beneficial effects of the technical scheme are as follows:
In general, the semantic recognition process differs across dialogue modalities, so the application introduces modality types, determines collection rules according to the modality types, and acquires collection templates according to the collection rules. In addition, the full-duplex dialogue scene is introduced: dialogue scene features are extracted to determine the feature setting parameters of the collection template, the corresponding feature setting parameters of the collection template are set, and the training data are then collected, making the collected training data better suited to the scene.
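A small, hypothetical configuration sketch of how a collection template could be specialized by scene features (all field names are assumptions):

```python
# Hypothetical sketch: pick a collection rule by modality type, then
# specialize its template with scene-derived feature setting parameters.
COLLECTION_RULES = {
    "text":  {"template": {"source": "chat_logs",  "fields": ["utterance", "semantic_label"]}},
    "voice": {"template": {"source": "call_audio", "fields": ["waveform", "semantic_label"]}},
}

def build_target_template(modality: str, scene_features: dict) -> dict:
    template = dict(COLLECTION_RULES[modality]["template"])
    template.update(scene_features)   # feature setting parameters
    return template

print(build_target_template("text", {"topic": "visual QA", "form": "Q&A"}))
```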
In one embodiment, step 4, conducting a multi-modal full-duplex dialogue according to the semantic meaning and the dialogue modality, comprises:
acquiring a dialogue requirement according to the semantic meaning; wherein the dialogue requirement is: the requirement characterized by the semantic meaning of the dialogue, for example: 'please use AI to draw me a picture with the theme "xx"';
determining the output channel of the dialogue modality; wherein the output channel is: the form or medium through which the dialogue reply is delivered to the user;
determining output content according to the dialogue requirement and the output channel; wherein the output content is: the reply to the dialogue requirement, for example: an AI drawing entitled "xx";
and conducting a multi-modal full-duplex dialogue according to the output content.
The working principle and the beneficial effects of the technical scheme are as follows:
According to the application, the output content is determined from the acquired dialogue requirement and the output channel of the dialogue modality, and a multi-modal full-duplex dialogue is conducted according to the output content, providing more diversified responses and stronger interactivity.
In one embodiment, determining output content based on dialog requirements and output channels includes:
determining a first dialogue vector of the dialogue requirement based on a preset dialogue vector model; wherein the dialogue vector model is: a bag-of-words model; the first dialogue vector is obtained by treating each word in the dialogue as an independent feature, constructing a vocabulary, and representing each dialogue by the occurrence count of each word (or by weights such as TF-IDF), finally yielding a fixed-length vector representation;
acquiring the corpus group corresponding to the output channel; wherein the corpus group corresponding to the output channel is: dialogue records in the dialogue modality corresponding to the output channel;
determining a plurality of first sentence-breaking corpora according to the corpus group based on a preset sentence-breaking rule; wherein the sentence-breaking rule is, for example: for the questioner, a sentence is broken each time the enter key is pressed; for the responder, a sentence is broken when nothing is output for 3 s;
determining a second dialogue vector of the first sentence-breaking corpus based on the dialogue vector model; wherein, the construction rule of the second dialogue vector is the same as the first dialogue vector;
Aligning the vector starting points of the first dialogue vector and the second dialogue vector, and acquiring a first vector included angle between the first dialogue vector and the second dialogue vector after the vector starting points are aligned; the first vector included angle is, for example: 15 degrees;
If the first vector included angle is smaller than a preset vector included angle threshold value, the corresponding second dialogue vector is used as a third dialogue vector; the vector angle threshold is preset manually, for example: 10 degrees;
determining a first vector included angle between the first dialogue vector and the third dialogue vector, and taking the first vector included angle as a second vector included angle;
Rotating the third dialogue vector by a second vector included angle to obtain a fourth dialogue vector;
calculating a vector modulus difference value between the fourth dialogue vector and the first dialogue vector; the vector modulus difference is, for example: 0.2;
Determining output content according to the second vector included angle and the vector module value difference value;
Wherein calculating a vector modulus difference between the fourth dialog vector and the first dialog vector comprises:
Calculating a dimension difference value of the same vector dimension of the fourth dialogue vector and the first dialogue vector; the dimension difference value is a numerical value difference value of vector elements of a fourth dialogue vector and a first dialogue vector of the same vector dimension;
acquiring the dimension weight of the vector dimension according to the vector dimension and a preset dimension weight library; the vector dimension and the corresponding dimension weight in the preset dimension weight library are input in advance by manpower;
and determining the vector modulus difference value according to the dimension difference values and the dimension weights based on a preset calculation rule. The preset calculation rule is: squaring each dimension difference value, multiplying by the corresponding dimension weight, summing to obtain a result value, and taking the square root of the result value to obtain the vector modulus difference value.
The working principle and the beneficial effects of the technical scheme are as follows:
When replying to the initiated dialogue, a vector matching technique is adopted. However, because each vector dimension represents a different meaning, a dialogue reply determined by simply calculating the vector difference is not accurate enough. The application therefore introduces a dialogue vector model, determines the first dialogue vector of the dialogue requirement, breaks the corpus group corresponding to the output channel into sentences according to the sentence-breaking rule to determine the first sentence-breaking corpora, and determines the second dialogue vectors of the first sentence-breaking corpora. The first vector included angle between the first dialogue vector and each second dialogue vector is calculated, and the candidates whose first vector included angle is smaller than the vector included angle threshold are screened to obtain the second vector included angles. The vector modulus difference value between the fourth dialogue vector and the first dialogue vector is then calculated; during this calculation, the vector dimensions and a dimension weight library are introduced to determine the dimension weights, and the vector modulus difference value is determined from the dimension difference values and the dimension weights. The output content is determined from the second vector included angle and the vector modulus difference value, which further improves the accuracy of the output-content acquisition process and makes the result better suited.
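The following sketch illustrates the matching machinery under stated assumptions (the vocabulary, dimension weights, and the 10-degree threshold are illustrative; the "rotation" is realized here as aligning the candidate onto the query direction, which is one way to read the fourth-dialogue-vector step):

```python
# Hedged sketch of the vector-matching step: bag-of-words vectors, an
# included-angle filter, alignment ("rotation") of the candidate onto the
# query direction, and a dimension-weighted modulus difference.
import math

def bow_vector(text: str, vocab: list[str]) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def angle_deg(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.hypot(*u), math.hypot(*v)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (nu * nv)))))

def weighted_modulus_diff(u, v, weights):
    # Square each dimension difference, weight, sum, take the square root.
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))

vocab = ["draw", "picture", "theme", "song"]
weights = [1.0, 0.8, 0.5, 0.3]                 # assumed dimension weights
v1 = bow_vector("draw a picture with theme xx", vocab)   # first vector
v2 = bow_vector("draw picture theme landscape", vocab)   # candidate

theta = angle_deg(v1, v2)
if theta < 10.0:                               # assumed angle threshold
    # "Rotate" the candidate onto the query direction: keep its modulus,
    # take the query's direction; then only magnitudes differ.
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    v4 = [n2 * a / n1 for a in v1]             # fourth dialogue vector
    print(theta, weighted_modulus_diff(v4, v1, weights))
```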
In one embodiment, determining output content based on the second vector included angle and the vector modulus difference value comprises:
acquiring a first conversion rule corresponding to the vector included angle and a second conversion rule corresponding to the vector module value difference; the first conversion rule is as follows: a rule for converting the vector angle into a first conversion value; the second conversion rule is: a rule for converting the vector modulus difference value into a second converted value;
According to the first conversion rule, determining a first conversion value corresponding to the second vector included angle, and correlating with a corresponding third dialogue vector; wherein the first conversion value is a numerical value;
Determining a second conversion value corresponding to the vector module value difference value according to a second conversion rule, and correlating with a corresponding third dialogue vector; wherein the second conversion value is a numerical value;
summing the first conversion value and the second conversion value associated with each third dialogue vector to obtain a statistical value; wherein the larger the statistical value, the worse the requirement characterized by the first sentence-breaking corpus corresponding to that third dialogue vector matches the dialogue requirement;
Taking the first sentence-breaking corpus corresponding to the third dialogue vector with the smallest statistic value as the second sentence-breaking corpus;
and determining a third sentence-breaking corpus according to the second sentence-breaking corpus and the first sentence-breaking corpora in the corpus group, and taking the third sentence-breaking corpus as the output content; wherein the third sentence-breaking corpus is: the reply sentence and reply content of the second sentence-breaking corpus within the corpus group.
The working principle and the beneficial effects of the technical scheme are as follows:
The application introduces a first conversion rule and a second conversion rule, determines the first conversion value according to the first conversion rule and the second vector included angle, determines the second conversion value according to the vector modulus difference value and the second conversion rule, and calculates the sum of the first and second conversion values associated with each third dialogue vector to obtain a statistical value; the reply to the first sentence-breaking corpus corresponding to the third dialogue vector with the smallest statistical value is taken as the output content, making the determination of the output content more reasonable.
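Since the patent does not state the conversion rules numerically, here is a sketch assuming simple linear conversions:

```python
# Hypothetical sketch: combine the angle and modulus scores of each
# retained candidate into a statistical value and select the minimum.
# The linear conversion factors (0.5 and 2.0) are assumptions.
candidates = [  # (first sentence-breaking corpus, angle deg, modulus diff)
    ("reply A", 4.0, 0.30),
    ("reply B", 8.0, 0.05),
]

def first_conversion(angle_deg: float) -> float:
    return 0.5 * angle_deg          # assumed rule for the included angle

def second_conversion(mod_diff: float) -> float:
    return 2.0 * mod_diff           # assumed rule for the modulus difference

best = min(candidates, key=lambda c: first_conversion(c[1]) + second_conversion(c[2]))
print("second sentence-breaking corpus:", best[0])
```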
In one embodiment, a multi-modal full duplex conversation is conducted based on output content, comprising:
acquiring the presentation requirement of the output content; wherein the presentation requirement is, for example: an image with certain parameters, or a dynamic video presentation with certain parameters;
determining a first support parameter according to the presentation requirement; wherein the first support parameter includes: the platform supporting the presentation requirement, the server carrying that platform, and its setting parameters;
acquiring a second support parameter of the local server; wherein the second support parameters include: configuration parameters of the local server;
Judging whether the output content can be presented on the user interface or not according to the first support parameter and the second support parameter; judging whether the output content can be presented on the user interface, and if the second support parameter of the local server can support the first support parameter, the output content can be presented on the user interface, otherwise, the output content cannot be presented;
if the output content can be presented on the user interface, presenting the output content through the user interface;
if the output content cannot be presented on the user interface, establishing a communication link through the local server with a target platform meeting the presentation requirement, and sending the presentation requirement to the target platform; wherein the target platform is, for example: a three-dimensional modeling platform;
and acquiring the presentation information produced after the target platform receives the presentation requirement, and returning the presentation information in real time. The presentation information is, for example: a three-dimensional animation produced with SolidWorks-aided design.
The working principle and the beneficial effects of the technical scheme are as follows:
When a dialogue user conducts a dialogue, different requirements arise, for example: drawing graphics with a large demand on computing power, for which the local server may lack the corresponding configuration. The application therefore determines the first support parameter from the acquired presentation requirement of the output content and judges, from the first support parameter and the second support parameter of the local server, whether the output content can be presented on the user interface. When it can, the output content is output directly on the user interface; otherwise, a communication link is established with a target platform meeting the presentation requirement, and the presentation information produced after the target platform receives the presentation requirement is returned in real time. This expands the range of forms in which the multi-modal dialogue can present content and improves the user experience.
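A hedged sketch of the presentation decision (parameter names and the fallback interface are assumptions):

```python
# Hypothetical sketch: compare required support parameters with the local
# server's configuration; on failure, hand the presentation requirement to
# a remote target platform and return its presentation information.
LOCAL_CONFIG = {"gpu_memory_gb": 4, "supports_3d": False}

def can_present_locally(required: dict, local: dict = LOCAL_CONFIG) -> bool:
    return (local["gpu_memory_gb"] >= required.get("gpu_memory_gb", 0)
            and (local["supports_3d"] or not required.get("supports_3d", False)))

def present(required: dict) -> str:
    if can_present_locally(required):
        return "render on local user interface"
    # Fallback: forward the presentation requirement to a capable platform
    # and stream its presentation information back in real time.
    return f"forward to target platform: {required}"

print(present({"gpu_memory_gb": 2}))                        # presented locally
print(present({"gpu_memory_gb": 16, "supports_3d": True}))  # forwarded
```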
An embodiment of the invention provides a multi-modal full-duplex dialogue system for semantic recognition, which, as shown in fig. 2, comprises:
an initiated dialogue acquisition subsystem 1, configured to acquire an initiated dialogue between a dialogue user and a preset dialogue model;
a dialog modality determination subsystem 2 for determining a dialog modality selected by a dialog user;
a semantic meaning acquisition subsystem 3, configured to acquire the semantic meaning of the initiated dialogue according to semantic recognition technology and the dialogue modality; and
a dialogue subsystem 4, configured to conduct a multi-modal full-duplex dialogue according to the semantic meaning and the dialogue modality.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (8)
1. A multi-modal full-duplex dialogue method for semantic recognition, comprising:
step 1: acquiring an initiated dialogue between a dialogue user and a preset dialogue model;
step 2: determining the dialogue modality selected by the dialogue user;
step 3: acquiring the semantic meaning of the initiated dialogue according to a semantic recognition technology and the dialogue modality;
step 4: conducting a multi-modal full-duplex dialogue according to the semantic meaning and the dialogue modality;
wherein step 3, acquiring the semantic meaning of the initiated dialogue according to the semantic recognition technology and the dialogue modality, comprises:
collecting training data according to the dialogue modality;
determining a plurality of extracted samples according to the training data based on a preset extraction rule;
training semantic recognition decision trees according to the extracted samples based on a random forest algorithm;
determining a plurality of decision results according to the initiated dialogue and the semantic recognition decision trees;
acquiring a decision result expression heat map;
Carrying out hierarchical clustering on each decision result according to the decision result expression heat map to obtain a clustering tree;
Determining tree nodes with the maximum volume in the cluster tree;
obtaining a central heat map value of a decision result corresponding to the tree node, wherein the central heat map value is as follows: average heat map value of decision result corresponding to tree node;
determining the semantic meaning according to the central heat map value and the semantic meaning correspondingly characterized by each heat map value;
wherein carrying out hierarchical clustering of the decision results according to the decision result expression heat map to obtain the cluster tree comprises:
and calculating the similarity between every two decision results, wherein the calculation formula of the similarity is as follows:
wherein D_m is the m-th decision result, D_n is the n-th decision result, Correlation(D_m, D_n) is the similarity calculation result of the m-th and n-th decision results, Dis(D_m, D_n) is the distance between the m-th and n-th decision results on the decision result expression heat map, X_m and X_n are the calibration values of the m-th and n-th decision results in the X dimension of the decision result expression heat map, Y_m and Y_n are the calibration values of the m-th and n-th decision results in the Y dimension of the decision result expression heat map, and σ is a preset similarity normalization coefficient;
and iteratively merging the decision results with the highest similarity to obtain the cluster tree.
2. The multi-modal full-duplex dialogue method for semantic recognition as claimed in claim 1, wherein step 1, acquiring an initiated dialogue between a dialogue user and a preset dialogue model, comprises:
Determining an input port for initiating a dialog;
Acquiring input information of a dialogue user;
Determining a target port of the input information according to the information type of the input information;
based on the analysis rule corresponding to the target port, determining port information according to the input information, wherein the port information is as follows: dialogue information transmitted by the port in real time;
Based on the real-time Web technology, the initiation of the dialogue is determined according to the port information.
3. The multi-modal full-duplex dialogue method as claimed in claim 1, wherein step 2, determining the dialogue modality selected by the dialogue user, comprises:
acquiring a modality selection instruction input by the dialogue user, and determining the dialogue modality according to the modality selection instruction;
and/or,
acquiring context information input by the dialogue user, determining the user's modality switching intention according to the context information, and determining the dialogue modality according to the modality switching intention.
4. The multi-modal full-duplex dialogue method as claimed in claim 1, wherein collecting training data according to the dialogue modality comprises:
acquiring the modality type of the dialogue modality;
determining a collection rule according to the modality type;
determining the preset collection template corresponding to the collection rule;
Acquiring a dialogue scene of initiating a dialogue;
Extracting dialogue scene characteristics of a dialogue scene based on a preset dialogue scene characteristic extraction rule;
Determining feature setting parameters of the collection template according to dialogue scene features;
setting corresponding feature setting parameters of the collecting template to obtain a target template;
Training data is collected based on the target template.
5. The multi-modal full-duplex dialogue method as claimed in claim 1, wherein step 4, conducting a multi-modal full-duplex dialogue according to the semantic meaning and the dialogue modality, comprises:
acquiring a dialogue requirement according to the semantic meaning;
determining the output channel of the dialogue modality;
determining output content according to the dialogue requirement and the output channel;
and conducting a multi-modal full-duplex dialogue according to the output content.
6. The method of claim 5, wherein determining output content based on the dialog requirements and the output channel comprises:
Determining a first dialogue vector of dialogue requirements based on a preset dialogue vector model;
acquiring the corpus group corresponding to the output channel, wherein the corpus group corresponding to the output channel is: dialogue records in the dialogue modality corresponding to the output channel;
based on a preset sentence-breaking rule, determining a plurality of first sentence-breaking corpora according to the corpus group;
wherein determining a plurality of first sentence-breaking corpora according to the corpus group comprises: for the questioner, a first sentence-breaking corpus is determined each time the enter key is pressed; for the responder, a first sentence-breaking corpus is determined when nothing is output for 3 seconds;
determining a second dialogue vector of the first sentence-breaking corpus based on the dialogue vector model;
Aligning the vector starting points of the first dialogue vector and the second dialogue vector, and acquiring a first vector included angle between the first dialogue vector and the second dialogue vector after the vector starting points are aligned;
If the first vector included angle is smaller than a preset vector included angle threshold value, the corresponding second dialogue vector is used as a third dialogue vector;
determining a first vector included angle between the first dialogue vector and the third dialogue vector, and taking the first vector included angle as a second vector included angle;
Rotating the third dialogue vector by a second vector included angle to obtain a fourth dialogue vector;
Calculating a vector modulus difference value between the fourth dialogue vector and the first dialogue vector;
Determining output content according to the second vector included angle and the vector module value difference value;
Wherein calculating a vector modulus difference between the fourth dialog vector and the first dialog vector comprises:
calculating a dimension difference value of the same vector dimension of the fourth dialogue vector and the first dialogue vector;
acquiring the dimension weight of the vector dimension according to the vector dimension and a preset dimension weight library;
Based on a preset calculation rule, determining a vector modulus value difference value according to the dimension difference value and the dimension weight;
Determining output content according to the second vector included angle and the vector modulus difference value, including:
acquiring a first conversion rule corresponding to the vector included angle and a second conversion rule corresponding to the vector module value difference;
According to the first conversion rule, determining a first conversion value corresponding to the second vector included angle, and correlating with a corresponding third dialogue vector;
Determining a second conversion value corresponding to the vector module value difference value according to a second conversion rule, and correlating with a corresponding third dialogue vector;
summing the first conversion value and the second conversion value associated with the third dialogue vector to obtain a statistical value;
Taking the first sentence-breaking corpus corresponding to the third dialogue vector with the smallest statistic value as the second sentence-breaking corpus;
determining a third sentence-breaking corpus according to the second sentence-breaking corpus and the first sentence-breaking corpora in the corpus group, and taking the third sentence-breaking corpus as the output content, wherein the third sentence-breaking corpus is: the reply sentence and reply content of the second sentence-breaking corpus within the corpus group.
7. The multi-modal full-duplex dialogue method for semantic recognition as claimed in claim 5, wherein performing the multi-modal full-duplex dialogue according to the output content comprises:
acquiring the presentation requirement of the output content;
determining a first support parameter according to the presentation requirement;
acquiring a second support parameter of the local server;
judging, according to the first support parameter and the second support parameter, whether the output content can be presented on the user interface;
if the output content can be presented on the user interface, presenting it through the user interface;
if the output content cannot be presented on the user interface, establishing a communication link through the local server with a target platform that meets the presentation requirement, and sending the presentation requirement to the target platform;
acquiring the presentation information after the target platform receives the presentation requirement, and returning the presentation information in real time.
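A compact sketch of this capability check; the parameter names and the hand-off string are assumptions standing in for the real user interface and the communication link to a target platform.

```python
def can_present(required: dict[str, float], local: dict[str, float]) -> bool:
    """True if every first support parameter is met by the local server's
    second support parameters."""
    return all(local.get(name, 0.0) >= value for name, value in required.items())

def route(output: str, required: dict[str, float], local: dict[str, float]) -> str:
    if can_present(required, local):
        return f"[user interface] {output}"
    # Stand-in for establishing a communication link via the local server and
    # returning the target platform's presentation information in real time.
    return f"[target platform] {output}"

print(route("4K video reply", {"resolution": 2160}, {"resolution": 1080}))
# -> [target platform] 4K video reply
```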
8. A multi-modal full-duplex dialogue system for semantic recognition, comprising:
an initiated-dialogue acquisition subsystem for acquiring an initiated dialogue between a dialogue user and a preset dialogue model;
a dialogue-modality determination subsystem for determining the dialogue modality selected by the dialogue user;
an utterance-meaning acquisition subsystem for acquiring the utterance meaning of the initiated dialogue according to a semantic recognition technology and the dialogue modality;
a dialogue subsystem for performing a multi-modal full-duplex dialogue according to the utterance meaning and the dialogue modality;
wherein the utterance-meaning acquisition subsystem performs the following operations:
collecting training data according to the dialogue modality;
determining a plurality of extraction samples from the training data based on a preset extraction rule;
training a semantic recognition decision tree from the extraction samples based on a random forest algorithm;
determining a plurality of decision results according to the initiated dialogue and the semantic recognition decision tree;
acquiring a decision-result expression heat map;
performing hierarchical clustering of the decision results according to the decision-result expression heat map to obtain a cluster tree;
determining the tree node with the largest volume in the cluster tree;
obtaining the central heat-map value of the decision results corresponding to that tree node, the central heat-map value being the average heat-map value of the decision results corresponding to the tree node;
determining the utterance meaning according to the central heat-map value and the meaning represented by each heat-map value;
wherein performing hierarchical clustering of the decision results according to the decision-result expression heat map to obtain the cluster tree comprises:
calculating the similarity between every two decision results, where the similarity is computed as:

Correlation(D_m, D_n) = exp( −Dis(D_m, D_n)² / (2σ²) ), with Dis(D_m, D_n) = √((X_m − X_n)² + (Y_m − Y_n)²)

where D_m is the m-th decision result, D_n is the n-th decision result, Correlation(D_m, D_n) is the similarity of the m-th and n-th decision results, Dis(D_m, D_n) is the distance between the m-th and n-th decision results on the decision-result expression heat map, X_m and X_n are the X-dimension calibration values of the m-th and n-th decision results on the heat map, Y_m and Y_n are the corresponding Y-dimension calibration values, and σ is a preset similarity normalization coefficient;
iteratively merging the decision results with the highest similarity to obtain the cluster tree.
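Under the Gaussian reading of the similarity formula above, the clustering loop might look like the following sketch. Representing a merged node by the centroid of its members is an assumption, since the claim does not fix a linkage rule.

```python
import numpy as np

def similarity(p: np.ndarray, q: np.ndarray, sigma: float = 1.0) -> float:
    """Correlation(D_m, D_n) = exp(-Dis^2 / (2 * sigma^2)) over heat-map coordinates."""
    dis = np.hypot(p[0] - q[0], p[1] - q[1])
    return float(np.exp(-dis ** 2 / (2.0 * sigma ** 2)))

def cluster_tree(points: list, sigma: float = 1.0) -> list:
    """Iteratively merge the most similar pair of nodes; the merge sequence defines the tree."""
    nodes = [([i], np.asarray(p, dtype=float)) for i, p in enumerate(points)]
    merges = []
    while len(nodes) > 1:
        pairs = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))]
        i, j = max(pairs, key=lambda ij: similarity(nodes[ij[0]][1], nodes[ij[1]][1], sigma))
        (la, ca), (lb, cb) = nodes[i], nodes[j]
        merges.append((la, lb))
        merged = (la + lb, (ca + cb) / 2.0)  # centroid stands for the new tree node
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return merges

# Example: three decision results placed on the expression heat map.
print(cluster_tree([(0.0, 0.0), (0.1, 0.0), (2.0, 2.0)]))
# -> [([0], [1]), ([0, 1], [2])]  (the two nearby results merge first)
```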
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202311212596.7A | 2023-09-19 | 2023-09-19 | Multi-mode full duplex dialogue method and system for semantic recognition
Publications (2)

Publication Number | Publication Date
---|---
CN117153157A | 2023-12-01
CN117153157B | 2024-06-04
Family

ID=88900745

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202311212596.7A (Active) | Multi-mode full duplex dialogue method and system for semantic recognition | 2023-09-19 | 2023-09-19

Country Status (1)

Country | Link
---|---
CN | CN117153157B
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant