CN111639168A - Multi-turn conversation processing method and device, electronic equipment and storage medium - Google Patents

Multi-turn conversation processing method and device, electronic equipment and storage medium

Info

Publication number
CN111639168A
CN111639168A
Authority
CN
China
Prior art keywords
current
entity
data
scene
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010437955.9A
Other languages
Chinese (zh)
Other versions
CN111639168B (en)
Inventor
赵筱军
罗雪峰
白常福
范良煌
何谐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010437955.9A priority Critical patent/CN111639168B/en
Publication of CN111639168A publication Critical patent/CN111639168A/en
Application granted granted Critical
Publication of CN111639168B publication Critical patent/CN111639168B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a multi-turn conversation processing method and apparatus, an electronic device and a storage medium, and relates to the fields of natural language processing and cloud computing. One embodiment of the method comprises: acquiring current dialog data of a user; identifying a current intent, a current entity and a current session scene in the current dialog data; when it is determined, according to the current intent and/or the current entity, that a scene switching condition is satisfied, taking the current session scene as the target session scene; and processing the multi-turn dialog in the target session scene. The embodiment can improve the universality of a multi-turn dialog system and thereby improve the intelligence and fluency of multi-turn dialogs.

Description

Multi-turn conversation processing method and device, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a natural language processing and cloud computing technology.
Background
As a popular and promising application, multi-turn dialog systems are being used in more and more fields. A multi-turn dialog system uses technologies such as speech recognition and natural language understanding to provide users with services such as consultation, guidance and daily chat through voice interaction.
In a multi-turn dialog application scenario, the user remains the dominant party. In a continuous conversation, the user's chat topics show continuity and complementarity across the context, while jumps between topics occur randomly.
However, current multi-turn dialog systems usually adopt a simple question-answer library matching mechanism; most of them support only single-turn dialogs and slot-based task-oriented dialogs, and can provide only simple conversations and services. They are poorly suited to application scenarios that require scene switching, so during multi-turn interaction with a user, problems such as topic or scene drift, irrelevant answers or even no answer at all may occur.
Disclosure of Invention
The embodiments of the application provide a multi-turn dialog processing method and apparatus, an electronic device and a storage medium, so as to improve the universality of a multi-turn dialog system and thereby improve the intelligence and fluency of multi-turn dialogs.
In a first aspect, an embodiment of the present application provides a method for processing multiple rounds of dialogs, including:
acquiring current conversation data of a user;
identifying a current intent, a current entity, and a current session scene in the current dialog data;
when determining that a scene switching condition is met according to the current intention and/or the current entity, taking the current session scene as a target session scene;
processing multiple rounds of dialog in the target session scenario.
In a second aspect, an embodiment of the present application provides a processing apparatus for multiple rounds of dialogues, including:
the current dialogue data acquisition module is used for acquiring current dialogue data of a user;
the information identification module is used for identifying the current intention, the current entity and the current conversation scene in the current conversation data;
a first target session scene determining module, configured to, when it is determined that a scene switching condition is satisfied according to the current intention and/or the current entity, take the current session scene as a target session scene;
and the multi-turn dialog processing module is used for processing multi-turn dialogues in the target session scene.
In a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the processing method of the multi-turn dialog provided by the embodiment of the first aspect.
In a fourth aspect, the present application further provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are configured to cause a computer to perform the multi-turn dialog processing method provided in the first aspect.
According to the method and the device, the current intent, the current entity and the current session scene in the user's current dialog data are identified, so that when it is determined according to the current intent and/or the current entity that the scene switching condition is satisfied, the current session scene is taken as the target session scene and the multi-turn dialog is processed in it. This solves the problem that existing multi-turn dialog systems adapt poorly to scene switching, improves the universality of the multi-turn dialog system, and thereby improves the intelligence and fluency of multi-turn dialogs.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a flowchart of a processing method of a multi-turn dialog according to an embodiment of the present application;
FIG. 2a is a flowchart of a processing method for multiple rounds of dialog according to an embodiment of the present disclosure;
FIG. 2b is a schematic flowchart of a processing method for multi-turn dialogs according to an embodiment of the present disclosure;
fig. 3 is a block diagram of a processing apparatus for multi-turn dialog according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device for implementing a processing method of a multi-turn dialog according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In an example, fig. 1 is a flowchart of a processing method for multiple rounds of conversations, which is provided in an embodiment of the present application, and this embodiment may be applicable to a case where a conversation scene is accurately determined to perform multiple rounds of conversations, and the method may be executed by a processing apparatus for multiple rounds of conversations, and the apparatus may be implemented by software and/or hardware, and may be generally integrated in an electronic device. The electronic device may be a terminal device for human-computer interaction. Accordingly, as shown in fig. 1, the method comprises the following operations:
and S110, acquiring current conversation data of the user.
Wherein the current dialogue data may be dialogue data currently input by the user.
In the embodiment of the application, the multi-turn dialogue system can collect the current dialogue data of the user in real time through a voice collection technology.
And S120, identifying the current intention, the current entity and the current conversation scene in the current conversation data.
Wherein the intent represents a business action the user wants to perform, such as checking the weather, checking a balance or transferring money. Intents can be divided into top-level intents, which can be triggered at any time during a dialog, and sub-intents, which can only be triggered in their corresponding scenes. An entity may be a parameter required to complete the business action, such as a time, a place or a card number. Entities are core concepts in dialog data and are associated to some extent with the user's intent; an intent plus several entities can complete a business transaction. Correspondingly, the current intent is the intent corresponding to the current dialog data, and the current entity is the entity corresponding to the current dialog data. The current session scene may be a scene consisting of the current intent and all dialog interactions under that intent.
In the embodiment of the application, after the current dialogue data of the user is acquired, the current intention, the current entity and the current conversation scene in the current dialogue data need to be identified. Alternatively, a learning model, such as a convolutional neural network or a machine learning model, may be employed to identify the current intent, the current entity, and the current session scenario in the current dialog data. Alternatively, the current intention may be identified by using a method based on template matching or text classification, or the current entity may be identified by using a method based on keyword matching, template matching, or a statistical model, and the specific identification technology type is not limited in the embodiment of the present application.
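As a non-limiting illustration of this recognition step, the following sketch uses toy template and keyword tables; the names INTENT_TEMPLATES, ENTITY_KEYWORDS and the Recognition structure are assumptions introduced here rather than part of the disclosure, and a real system could instead use a trained classifier and entity model as described above.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative template/keyword tables; a production system could instead use a
# trained text classifier and a statistical entity model, as described above.
INTENT_TEMPLATES = {
    "query_weather": [r"\bweather\b", r"\brain\b"],
    "rent_car": [r"rent .*car", r"car rental"],
}
ENTITY_KEYWORDS = {
    "time": [r"\btoday\b", r"\btomorrow\b", r"the next day"],
    "city": [r"\bBeijing\b", r"\bShanghai\b"],
}

@dataclass
class Recognition:
    intent: Optional[str]   # current intent, or None if none is expressed
    entity: Optional[dict]  # current entity as {type: value}, or None
    scene: Optional[str]    # current session scene implied by the intent

def recognize(utterance: str) -> Recognition:
    """Identify the current intent, current entity and current session scene."""
    intent = next((name for name, patterns in INTENT_TEMPLATES.items()
                   if any(re.search(p, utterance, re.I) for p in patterns)), None)
    entity = None
    for etype, patterns in ENTITY_KEYWORDS.items():
        match = next((re.search(p, utterance, re.I) for p in patterns
                      if re.search(p, utterance, re.I)), None)
        if match:
            entity = {etype: match.group(0)}
            break
    # Each intent defines one scene: the intent plus all interactions under it.
    return Recognition(intent, entity, scene=intent)

if __name__ == "__main__":
    print(recognize("help me check the weather for the next day"))
```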
S130, when the scene switching condition is determined to be met according to the current intention and/or the current entity, the current conversation scene is used as a target conversation scene.
The scene switching condition may be a condition for determining whether to perform scene switching. Optionally, in this embodiment of the present application, the scene switching condition may be set according to the current intention and/or the current entity, and this embodiment of the present application does not limit the specific content of the scene switching condition.
Since a session scene consists of an intent and all dialog interactions under that intent, scene switching can normally be performed simply by recognizing the current intent in the current dialog data. It should be noted, however, that if both a current intent and a current entity can be identified in the current dialog data, switching scenes based only on the current intent does not give an ideal dialog effect.
In a specific example, assume the current session scene is a car-rental scene, and that after the user provided city information in the previous turn, the multi-turn dialog system, following its predetermined configuration, asks the user to continue by providing date information. If the user wants to pick a specific rental date according to the weather, the user's current dialog data may express a weather-query intent, such as "help me check the weather for the next day". Both the weather-query intent and a time entity can be identified in this dialog data, and it is clearly more appropriate to switch the current session to the weather-query scene. If instead the user has already decided to rent a car on the next day regardless of the weather, and simply inputs "the next day, then" without caring what the weather will be, the weather-query intent and the time entity may again both be identified, yet it is clearly more appropriate to keep the current car-rental scene, i.e. not to switch scenes.
Therefore, judging whether to switch to the current session scene based only on the current intent lacks universality, and can harm the dialog effect in scenes whose dialog data contains both a current intent and a current entity, so that the intelligence and fluency of the multi-turn dialog fail to meet requirements. The embodiment of the application therefore judges whether the scene switching condition is satisfied by considering the current intent and the current entity together, and switches scenes, taking the current session scene as the target session scene, only when the condition is determined to be satisfied. This processing method for multi-turn dialogs is more universal and further improves the intelligence and fluency of multi-turn dialogs.
And S140, processing multiple rounds of conversations in the target conversation scene.
In the embodiment of the application, when it is determined according to the current intent and/or the current entity that the scene switching condition is satisfied, the confidence of the current session scene is high, so the current session scene is taken as the target session scene, and processing the multi-turn dialog in the target session scene gives a better result.
In an optional embodiment of the present application, the target session scenario may include: a consultation scenario, a guidance scenario, a daily voice interaction scenario, or a learning coaching scenario.
Alternatively, the consultation scene may be a scene for consulting information, where the type of consulted information may include, but is not limited to, weather consultation or product consultation. The guidance scene may be a scene involving guidance information such as navigation routes, the daily voice interaction scene may be a chat scene, and the learning coaching scene may be a scene for exam-question search, exam-question answering, online teaching and the like; the embodiment of the present application does not limit the specific scene type of the target session scene.
According to the method and the device, the current intent, the current entity and the current session scene in the user's current dialog data are identified, so that when it is determined according to the current intent and/or the current entity that the scene switching condition is satisfied, the current session scene is taken as the target session scene and the multi-turn dialog is processed in it. This solves the problem that existing multi-turn dialog systems adapt poorly to scene switching, improves the universality of the multi-turn dialog system, and thereby improves the intelligence and fluency of multi-turn dialogs.
In an example, fig. 2a is a flowchart of a multi-turn dialog processing method provided in an embodiment of the present application. This embodiment optimizes and refines the technical solutions of the foregoing embodiments, and provides a specific implementation in which whether the scene switching condition is satisfied is judged according to the current intent and/or the current entity: when the condition is satisfied, the current session scene is taken as the target session scene for processing the multi-turn dialog; when it is not satisfied, the previous session scene is taken as the target session scene for processing the multi-turn dialog.
A method for processing multiple rounds of conversations as shown in fig. 2a, comprising:
and S210, acquiring the current conversation data of the user.
And S220, identifying the current intention, the current entity and the current conversation scene in the current conversation data.
And S230, judging whether the current intention and/or the current entity meet scene switching conditions, if so, executing S240, otherwise, executing S250.
In an optional embodiment of the present application, determining that the scene switching condition is satisfied according to the current intent and/or the current entity may include: determining that the scene switching condition is satisfied when the current dialog data includes the current intent and does not include the current entity; and determining that the scene switching condition is satisfied when the current dialog data includes both the current intent and the current entity and it is determined that the confidences of the current intent and the current entity satisfy the scene switching sub-condition.
In an optional embodiment of the present application, determining that the scene switching condition is not satisfied according to the current intent and/or the current entity may include: determining that the scene switching condition is not satisfied when the current dialog data includes the current entity and does not include the current intent; determining that the scene switching condition is not satisfied when the current dialog data includes both the current intent and the current entity and it is determined that their confidences do not satisfy the scene switching sub-condition; and determining that the scene switching condition is not satisfied when the current intent and the current entity are both empty.
The scene switching sub-condition may be a condition for determining whether to perform scene switching when the current dialog data includes the current intention and the current entity at the same time.
Specifically, when the current dialogue data includes only the current intention, it is determined that the scene switching condition is satisfied, which indicates that the dialogue data of the user does need to switch the session scene. And when the current dialogue data simultaneously comprises the current intention and the current entity, if the confidence degrees of the current intention and the current entity meet the scene switching sub-condition, determining that the scene switching condition is met, and otherwise, determining that the scene switching condition is not met. And when the current dialogue data only comprises the current entity or neither comprises the current intention and the current entity, namely the current intention is not identified, determining that the scene switching condition is not met.
In an optional embodiment of the present application, determining that the confidences of the current intent and the current entity satisfy the scene switching sub-condition may include: determining an intent confidence for the current intent and an entity confidence for the current entity; and when it is determined that the intent confidence is greater than the entity confidence, determining that the confidences of the current intent and the current entity satisfy the scene switching sub-condition.
Wherein the intention confidence may be a confidence calculated for the current intention, and the entity confidence may be a confidence calculated for the current entity.
Accordingly, when the current dialogue data includes both the current intention and the current entity, the intention confidence of the current intention and the entity confidence of the current entity need to be determined, respectively. If the intention confidence is greater than the entity confidence, determining that the current intention and the confidence of the current entity meet the scene switching sub-condition, namely determining that the scene switching condition is met; otherwise, determining that the current intention and the confidence of the current entity do not satisfy the scene switching sub-condition, that is, determining that the scene switching condition is not satisfied.
In the above scheme, as long as the current dialog data does not include the current intention, it is determined that the scene switching condition is not satisfied, and under the condition that the current intention and the current entity are included at the same time, it is determined whether the scene switching is required or not according to the intention confidence of the current intention and the entity confidence of the current entity, so that the problem that the dialog effect is not ideal because the scene switching is still performed when the entity confidence is smaller than the intention confidence can be effectively solved.
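A minimal sketch of the switching decision described above; it assumes the recognition results and their confidences are already available, and the function name is illustrative rather than taken from the disclosure.

```python
from typing import Optional

def should_switch_scene(intent: Optional[str], entity: Optional[dict],
                        intent_confidence: float, entity_confidence: float) -> bool:
    """Return True when the scene switching condition is satisfied."""
    if intent is not None and entity is None:
        # Intent expressed without an entity: switch to the scene of that intent.
        return True
    if intent is not None and entity is not None:
        # Both expressed: switch only if the scene switching sub-condition holds,
        # i.e. the intent confidence exceeds the entity confidence.
        return intent_confidence > entity_confidence
    # Entity only, or neither intent nor entity: keep the previous scene.
    return False
```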
In an optional embodiment of the present application, determining the intent confidence of the current intent and the entity confidence of the current entity may include: determining the intent confidence of the current intent according to a set classification model; filtering interference words included in the current dialog data to obtain filtered dialog data; and calculating the text similarity between the entity value of the current entity and the filtered dialog data, and determining the entity confidence of the current entity according to the calculation result.
The set classification model may be, for example, a rule-based classifier, a conventional machine learning algorithm or a deep learning algorithm; the present application does not limit the type of the set classification model. The interference words may be words such as stop words or modal particles, and the filtered dialog data is the data obtained after filtering the interference words out of the current dialog data.
Specifically, intent recognition operates on the whole sentence: the current dialog data can be recognized with the set classification model, which yields the intent confidence of the current intent. To compute the entity confidence, the interference words in the current dialog data are first filtered out to obtain the filtered dialog data; the text similarity between the entity value of the current entity and the filtered dialog data is then calculated on that basis, giving the entity confidence of the current entity.
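The confidence calculation might be sketched as follows; the stop-word list, the stand-in classifier and the character-overlap similarity are placeholders for whatever classification model and text-similarity measure a concrete system adopts.

```python
from typing import Tuple

STOP_WORDS = {"please", "the", "a", "um", "then"}  # illustrative interference words

def classify_intent(utterance: str) -> Tuple[str, float]:
    # Placeholder for the set classification model (rules, classic ML or deep learning).
    return ("query_weather", 0.92) if "weather" in utterance.lower() else ("unknown", 0.10)

def filter_interference(utterance: str) -> str:
    # Remove interference words such as stop words or modal particles.
    return " ".join(w for w in utterance.split() if w.lower() not in STOP_WORDS)

def text_similarity(a: str, b: str) -> float:
    # Character-set overlap as a stand-in for any text-similarity measure.
    sa, sb = set(a.lower()), set(b.lower())
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 0.0

def confidences(utterance: str, entity_value: str) -> Tuple[float, float]:
    """Return (intent confidence, entity confidence) for the current dialog data."""
    _, intent_confidence = classify_intent(utterance)
    entity_confidence = text_similarity(entity_value, filter_interference(utterance))
    return intent_confidence, entity_confidence
```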
And S240, taking the current session scene as a target session scene.
And S250, taking the previous session scene as the target session scene.
Accordingly, when it is determined according to the current intent and/or the current entity that the scene switching condition is not satisfied, the previous session scene may be taken as the target session scene; that is, no scene switch is performed.
In the above scheme, when it is determined according to the current intent and/or the current entity that the scene switching condition is not satisfied, the previous session scene is taken as the target session scene, which avoids the unsatisfactory dialog effect that results from switching scenes as soon as a new scene is recognized.
And S260, processing multiple rounds of conversations in the target conversation scene.
In an optional embodiment of the present application, processing the multi-turn dialog in the target session scene may include: acquiring the dialog-associated data stored for the previous session scene, where the dialog-associated data includes the preceding entity data and the correspondence between dialog nodes and entity-collection operations; updating the target entity data of the target session scene according to the preceding entity data; and determining the response script according to the target entity data and the correspondence between dialog nodes and entity-collection operations.
Here, the preceding entity data may be the entity data collected in the previous session scene. A dialog node may be one round of interaction between the user and the machine during the conversation: the user's dialog data triggers a dialog node to process the request and produce an answer, and the specific processing may include entity collection, context inheritance, context transition, interfacing with a business system, answer generation and the like. The correspondence between dialog nodes and entity-collection operations can be used for functions such as switching entities or modifying already-collected entity data. The target entity data is the entity data collected in the target session scene.
In the embodiment of the present application, if the target session scene is the current session scene after switching, then when the multi-turn dialog is processed in the target session scene, the preceding entity data stored for the previous session scene and the correspondence between dialog nodes and entity-collection operations can be obtained. The target entity data of the target session scene is then updated according to the preceding entity data, and the response script is determined according to the updated target entity data and the correspondence between dialog nodes and entity-collection operations. For example, node migration is performed according to the dialog-node jump configuration, the target entity data is switched or modified according to the correspondence between nodes and entity collection, and an entity-collection script for the target session scene is then generated from the finally obtained target entity data.
In a specific example, assume the current session scene is a car-rental scene, and that after the user provided city information in the previous turn, the multi-turn dialog system, following its predetermined configuration, asks the user to continue by providing date information. If the user wants to pick a specific rental date according to the weather, the user's current dialog data may express a weather-query intent, such as "help me check the weather for the next day". The utterance contains both the weather-query intent and a time entity, and the intent confidence may be greater than the entity confidence, so the conversation is switched to the weather-query scene. Considering the dialog experience and the correlation between the two scenes, the location information of the weather-query scene can inherit the location information already collected in the previous car-rental scene, or the user can be guided to clarify the location. After the weather query is finished, the user switches back to the car-rental scene by expressing the car-rental intent, and the previous flow continues by collecting the rental date information.
In the above technical scheme, the target entity data of the target session scene is updated with the preceding entity data of the previous session scene, so data from the previous session scene can be inherited and the multi-turn dialog can be processed with scene context, making the multi-turn dialog intelligent and fluent.
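A sketch of this cross-scene inheritance, folding in the data-inheritance-tag check described next; the SceneState structure, the INHERITABLE_ENTITY_TYPES set and the example values are assumptions made for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class SceneState:
    name: str
    entities: dict = field(default_factory=dict)          # collected entity data
    node_collect_map: dict = field(default_factory=dict)  # dialog node -> entity-collection op

# Entity types a developer has marked with the data inheritance tag.
INHERITABLE_ENTITY_TYPES = {"city", "time"}

def inherit_entities(previous: SceneState, target: SceneState) -> SceneState:
    """Update the target scene's entity data from the preceding scene's entity data,
    copying only entities whose type carries the data inheritance tag."""
    for etype, value in previous.entities.items():
        if etype in INHERITABLE_ENTITY_TYPES and etype not in target.entities:
            target.entities[etype] = value
    return target

if __name__ == "__main__":
    # The car-rental scene already collected a city; the weather-query scene
    # inherits it, so the system only needs to ask for the missing date.
    rental = SceneState("rent_car", entities={"city": "Beijing"})
    weather = SceneState("query_weather")
    print(inherit_entities(rental, weather).entities)  # {'city': 'Beijing'}
```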
In an optional embodiment of the present application, updating the target entity data of the target session scene according to the preceding entity data may include: when it is determined that the preceding entity data satisfies the entity data inheritance condition, updating the target entity data of the target session scene according to the preceding entity data.
The entity data inheritance condition may be a condition for determining whether the preceding entity data needs to be inherited.
Correspondingly, only when the preceding entity data satisfies the entity data inheritance condition can the target entity data of the target session scene be updated according to the preceding entity data. If the preceding entity data does not satisfy the inheritance condition, the entity data of the current session scene is collected directly from the current dialog data and taken as the target entity data. The benefit of this arrangement is that it avoids mismatched response scripts caused by inheriting preceding entity data that does not satisfy the entity data inheritance condition.
In an optional embodiment of the present application, determining that the preceding entity data satisfies the entity data inheritance condition may include: when it is determined that the preceding entity data is configured with a data inheritance tag, determining that the preceding entity data satisfies the entity data inheritance condition.
Here, the data inheritance tag may be a manually configured data tag used to mark data as inheritable. For example, if a developer configures a data inheritance tag for the time entity of the multi-turn dialog system in advance, then when the preceding entity data includes a preceding time entity, the multi-turn dialog system can attach the data inheritance tag to that preceding time entity.
Optionally, determining whether the preceding entity data satisfies the entity data inheritance condition may specifically be: when it is determined that the preceding entity data is configured with the data inheritance tag, determining that the preceding entity data satisfies the entity data inheritance condition.
In this scheme, manually configuring the data inheritance tag on entity data avoids mismatched response scripts caused by the system inheriting entity data that should not be inherited as a result of automatic configuration errors.
In an optional embodiment of the present application, processing the multi-turn dialog in the target session scene may include: determining the scene type of the target session scene; when it is determined that the scene type of the target session scene is a target scene type, determining the response script according to the current dialog data; and when it is determined that the scene type of the target session scene is null, determining the response script according to a preset unmatched-script list.
The target scene type may be any scene type. The preset unmatched-script list may be a preset list of fallback scripts used when no match succeeds, for example "Sorry, I do not understand what you mean." It can be set according to actual requirements; the embodiment of the present application does not limit the content or number of scripts in the preset unmatched-script list.
Correspondingly, if the target session scene is the previous session scene, then when the multi-turn dialog is processed in the target session scene, it can first be judged whether the previous session scene belongs to a target scene type, i.e. whether the session is currently in some scene. If the scene type of the target session scene is a target scene type, i.e. the session is in some scene, the response script can be determined according to the current dialog data. If the scene type of the target session scene is null, i.e. the session is not in any scene, the response script can be determined according to the preset unmatched-script list. The benefit of this arrangement is that when no scene switch is performed and the previous session scene is found to be empty, determining the response through the preset unmatched-script list improves the user experience and the intelligence of the multi-turn dialog, and also covers the case where the user and the machine are starting their first round of conversation.
In an optional embodiment of the present application, determining the response script according to the current dialog data may include: when the current intent and the current entity are not both empty, the semantics expressed by the user are determined from the current dialog data to be an entity query, and it is determined to perform entity clarification, determining the response according to the matched entity-clarification script; when the current intent and the current entity are not both empty, the semantics expressed by the user are determined from the current dialog data to be an entity query, and it is determined not to perform entity clarification, determining the response according to an entity-collection script; when the current intent and the current entity are not both empty and the semantics expressed by the user are determined from the current dialog data to be a non-entity query, collecting the current entity data and determining the response according to the current entity data; when the current intent and the current entity are both empty and it is determined to perform entity clarification, determining the response according to the matched entity-clarification script; and when the current intent and the current entity are both empty and it is determined not to perform entity clarification, determining the response according to an entity-collection script.
Here, an entity query is a question about a specific entity, for example "What is a sport utility vehicle (SUV)?". Entity clarification means clarifying the entity with the user, e.g. "Are you asking what an SUV is?". The clarification-script list may include multiple types of clarification scripts, for example scripts for clarifying an entity or scripts for clarifying an intent; a clarification script is the question the machine asks when it wants to obtain certain information. The list can be set according to actual needs, and the embodiment of the present application does not limit the content or number of scripts in the clarification-script list.
Correspondingly, if the scene type of the previous session scene belongs to a target scene type, i.e. the session is in some scene, the response script needs to be determined according to the current intent, the current entity, the semantics expressed by the current dialog data, whether entity clarification is needed, and so on. Specifically, when the current intent and the current entity are not both empty and the semantics expressed by the user are determined from the current dialog data to be an entity query, then if it is determined to perform entity clarification, the response is determined according to the matched entity-clarification script; otherwise the response can be determined directly from the entity-collection script. When the current intent and the current entity are not both empty and the semantics expressed by the user are determined from the current dialog data to be a non-entity query, the current entity data can be collected and the response determined according to it: for example, node migration is performed according to the dialog-node jump configuration, and another entity-collection script or a business-completion script is generated (a business-completion script might be "All set, goodbye"). If the current intent and the current entity are both empty, then if it is determined to perform entity clarification, the response is determined according to the matched entity-clarification script; otherwise the response is determined according to the entity-collection script.
In this scheme, the response script is determined according to the current intent, the current entity, the semantics expressed by the current dialog data, whether entity clarification is needed, and so on, which clarifies the user's dialog requirements, allows an accurate response to be chosen, and further improves the intelligence and fluency of the multi-turn dialog.
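The in-scene response selection above might be sketched as follows, including the fuzzy match against the candidate clarification-script list described next; all script texts, the threshold and the is_entity_query heuristic are illustrative assumptions.

```python
import difflib
from typing import Optional

UNMATCHED_SCRIPTS = ["Sorry, I do not understand what you mean."]
CLARIFICATION_SCRIPTS = ["Are you asking what an SUV is?"]
ENTITY_COLLECTION_SCRIPT = "Which date would you like to rent the car for?"
BUSINESS_DONE_SCRIPT = "All set, goodbye."

def has_matching_clarification(utterance: str, threshold: float = 0.4) -> bool:
    # Fuzzy match between the utterance and the candidate clarification-script list.
    return any(difflib.SequenceMatcher(None, utterance.lower(), s.lower()).ratio() >= threshold
               for s in CLARIFICATION_SCRIPTS)

def is_entity_query(utterance: str) -> bool:
    # Placeholder heuristic for "the user is asking what a specific entity is".
    return utterance.strip().lower().startswith(("what is", "what's"))

def respond_in_scene(utterance: str, intent: Optional[str], entity: Optional[dict],
                     scene_type: Optional[str], collected: dict,
                     required_entities: int = 2) -> str:
    if scene_type is None:
        # The session is not in any scene: fall back to the preset unmatched-script list.
        return UNMATCHED_SCRIPTS[0]
    if (intent is None and entity is None) or is_entity_query(utterance):
        # Cases that call for entity clarification when a matching clarification
        # script exists, and for the entity-collection script otherwise.
        return (CLARIFICATION_SCRIPTS[0] if has_matching_clarification(utterance)
                else ENTITY_COLLECTION_SCRIPT)
    # Non-entity query: collect the entity data and move on according to the
    # dialog-node jump configuration (next collection or business completion).
    if entity:
        collected.update(entity)
    return ENTITY_COLLECTION_SCRIPT if len(collected) < required_entities else BUSINESS_DONE_SCRIPT
```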
In an alternative embodiment of the present application, determining to perform entity clarification may include: performing fuzzy matching between the candidate clarification-script list and the current dialog data, and determining to perform entity clarification when a matching clarification script is found. Determining not to perform entity clarification may include: performing fuzzy matching between the candidate clarification-script list and the current dialog data, and refusing to perform entity clarification when no matching clarification script exists.
Here, the candidate clarification-script list may be a preset list containing several clarification scripts.
Specifically, fuzzy matching is performed between the current dialog data and the candidate clarification-script list; entity clarification is performed when a matching clarification script is found, and refused when no matching clarification script exists.
Fig. 2b is a schematic flowchart of a multi-turn dialog processing method according to an embodiment of the present application. In a specific example, as shown in fig. 2b, the intent and entity of the current dialog data (query) expressed by the user are identified, and the flow is divided into the following four cases according to what is recognized:
(1) If the user expresses an intent but no entity, the session is switched to the new scene corresponding to that intent, and the dialog-associated data collected in the previous scene is temporarily stored; this data includes the preceding entity data and the correspondence between dialog nodes and entity-collection operations. The entity data of the new scene is updated with the inheritable entity data of the previous scene, node migration is performed according to the dialog-node jump configuration, and the entity-collection script of the new scene is generated.
(2) If the session is in some scene, whether the semantics expressed by the user are an entity query is judged from the current dialog. If it is an entity query, the entity is not collected; instead, whether entity clarification can be performed is judged, and if a matching clarification script exists, the entity-clarification script is returned, otherwise the entity-collection script is returned. If the semantics expressed by the user are not an entity query, the entity data is collected, node migration is performed according to the dialog-node jump configuration, and another entity-collection script or a business-completion script is generated. If the session is not in any scene, a preset unmatched script is returned.
(3) If the user expresses both an intent and an entity, the intent confidence and the entity confidence need to be compared. If the intent confidence is greater than the entity confidence, the session is switched to the new scene corresponding to the intent, and the subsequent flow is the same as in (1). If the intent confidence is less than or equal to the entity confidence, the session scene is not switched, and the subsequent flow is the same as in (2).
(4) If the user expresses neither an intent nor an entity, then if the session is in some scene, whether entity clarification can be performed is judged: if a matching clarification script exists, the entity-clarification script is returned; otherwise the entity-collection script is returned. If the session is not in any scene, a preset unmatched script is returned.
Thus, the technical scheme supports switching among, and recovery of, multiple scenes; scene switching is decided according to the recognized intent and entity and their confidences, so multi-scene switching has good universality. Meanwhile, cross-scene data inheritance can be configured according to business requirements, which improves the intelligence and fluency of the conversation and gives the human-machine dialog an experience close to human-to-human conversation.
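Putting the four cases together, a compact sketch of the top-level turn handler corresponding to fig. 2b; it treats the in-scene response logic as a pluggable callable and keeps a simple stack of suspended scenes, all of which are illustrative assumptions rather than the disclosed implementation.

```python
from typing import Callable, Optional

def handle_turn(intent: Optional[str], entity: Optional[dict],
                intent_confidence: float, entity_confidence: float,
                session: dict, respond: Callable[[dict], str]) -> str:
    """Dispatch one user turn over the four cases of fig. 2b.

    `session` holds the active scene name, its collected entities, and a stack of
    temporarily stored (suspended) scenes; `respond` produces the in-scene reply.
    """
    def switch_to(new_scene: str) -> None:
        if session.get("scene"):
            # Temporarily store the dialog-associated data of the previous scene.
            session.setdefault("stack", []).append(
                {"scene": session["scene"], "entities": dict(session.get("entities", {}))})
        session["scene"], session["entities"] = new_scene, {}

    if intent and not entity:
        switch_to(intent)                       # (1) intent only: switch scenes
    elif intent and entity:
        if intent_confidence > entity_confidence:
            switch_to(intent)                   # (3) both: switch only if intent wins
    # (2) entity only and (4) neither: stay in the current scene (possibly none).
    return respond(session)

if __name__ == "__main__":
    session = {"scene": "rent_car", "entities": {"city": "Beijing"}}
    reply = handle_turn("query_weather", {"time": "the next day"}, 0.92, 0.35,
                        session, respond=lambda s: f"[in scene {s['scene']}]")
    print(session["scene"], reply)  # query_weather [in scene query_weather]
```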
According to the technical scheme, the current intent, the current entity and the current session scene in the user's current dialog data are identified; when it is determined according to the current intent and/or the current entity that the scene switching condition is satisfied, the current session scene is taken as the target session scene and the multi-turn dialog is processed in it, and when the condition is not satisfied, the previous session scene is taken as the target session scene instead. This improves the universality of the multi-turn dialog system and thereby the intelligence and fluency of the multi-turn dialog.
In an example, fig. 3 is a structural diagram of a processing apparatus for multiple rounds of conversations, which is provided in an embodiment of the present application, and the embodiment of the present application is applicable to a case where a conversation scene is accurately determined to perform multiple rounds of conversations, and the apparatus is implemented by software and/or hardware and is specifically configured in an electronic device. The electronic device may be a terminal device for human-computer interaction.
A processing apparatus 300 for multi-turn dialog, as shown in fig. 3, comprises: a current conversation data acquisition module 310, an information recognition module 320, a first target session scenario determination module 330, and a multi-turn conversation processing module 340. Wherein,
a current dialogue data acquisition module 310, configured to acquire current dialogue data of a user;
an information identification module 320, configured to identify a current intention, a current entity, and a current session scene in the current dialog data;
a first target session scene determining module 330, configured to, when it is determined that a scene switching condition is satisfied according to the current intention and/or the current entity, take the current session scene as a target session scene;
a multi-turn dialog processing module 340, configured to process multiple turns of dialog in the target session scene.
According to the method and the device, the current intent, the current entity and the current session scene in the user's current dialog data are identified, so that when it is determined according to the current intent and/or the current entity that the scene switching condition is satisfied, the current session scene is taken as the target session scene and the multi-turn dialog is processed in it. This solves the problem that existing multi-turn dialog systems adapt poorly to scene switching, improves the universality of the multi-turn dialog system, and thereby improves the intelligence and fluency of multi-turn dialogs.
Optionally, the first target session scene determining module includes: a scene switching condition determining unit configured to determine that the scene switching condition is satisfied when the current dialog data includes the current intent and does not include the current entity, and to determine that the scene switching condition is satisfied when the current dialog data includes both the current intent and the current entity and it is determined that the confidences of the current intent and the current entity satisfy the scene switching sub-condition.
Optionally, the scene switching condition determining unit is specifically configured to: determine an intent confidence for the current intent and an entity confidence for the current entity; and when it is determined that the intent confidence is greater than the entity confidence, determine that the confidences of the current intent and the current entity satisfy the scene switching sub-condition.
Optionally, the scene switching condition determining unit is specifically configured to: determine the intent confidence of the current intent according to a set classification model; filter interference words included in the current dialog data to obtain filtered dialog data; and calculate the text similarity between the entity value of the current entity and the filtered dialog data, determining the entity confidence of the current entity according to the calculation result.
Optionally, the multi-turn dialog processing module 340 includes: a dialog-associated data acquisition unit configured to acquire the dialog-associated data stored for the previous session scene, where the dialog-associated data includes the preceding entity data and the correspondence between dialog nodes and entity-collection operations; a target entity data updating unit configured to update the target entity data of the target session scene according to the preceding entity data; and a first response script determining unit configured to determine the response script according to the target entity data and the correspondence between dialog nodes and entity-collection operations.
Optionally, the target entity data updating unit is specifically configured to: when it is determined that the preceding entity data satisfies the entity data inheritance condition, update the target entity data of the target session scene according to the preceding entity data.
Optionally, the target entity data updating unit is specifically configured to: when it is determined that the preceding entity data is configured with a data inheritance tag, determine that the preceding entity data satisfies the entity data inheritance condition.
Optionally, the processing apparatus 300 for multiple rounds of conversations further includes: and the second target session scene determining module is used for taking the previous session scene as the target session scene when determining that the scene switching condition is not met according to the current intention and/or the current entity.
Optionally, the second target session scene determining module is specifically configured to: determine that the scene switching condition is not satisfied when the current dialog data includes the current entity and does not include the current intent; determine that the scene switching condition is not satisfied when the current dialog data includes both the current intent and the current entity and it is determined that their confidences do not satisfy the scene switching sub-condition; and determine that the scene switching condition is not satisfied when the current intent and the current entity are both empty.
Optionally, the multi-turn dialog processing module 340 includes: a scene type determining unit configured to determine the scene type of the target session scene; a second response script determining unit configured to determine the response script according to the current dialog data when it is determined that the scene type of the target session scene is a target scene type; and a third response script determining unit configured to determine the response script according to a preset unmatched-script list when it is determined that the scene type of the target session scene is null.
Optionally, the second response script determining unit is specifically configured to: when the current intent and the current entity are not both empty, the semantics expressed by the user are determined from the current dialog data to be an entity query, and it is determined to perform entity clarification, determine the response according to the matched entity-clarification script; when the current intent and the current entity are not both empty, the semantics expressed by the user are determined from the current dialog data to be an entity query, and it is determined not to perform entity clarification, determine the response according to an entity-collection script; when the current intent and the current entity are not both empty and the semantics expressed by the user are determined from the current dialog data to be a non-entity query, collect the current entity data and determine the response according to it; when the current intent and the current entity are both empty and it is determined to perform entity clarification, determine the response according to the matched entity-clarification script; and when the current intent and the current entity are both empty and it is determined not to perform entity clarification, determine the response according to an entity-collection script.
Optionally, the second response script determining unit is specifically configured to: perform fuzzy matching between the candidate clarification-script list and the current dialog data, and determine to perform entity clarification when a matching clarification script is found; and perform fuzzy matching between the candidate clarification-script list and the current dialog data, and refuse to perform entity clarification when no matching clarification script exists.
Optionally, the target session scenario includes: a consultation scenario, a guidance scenario, a daily voice interaction scenario, or a learning coaching scenario.
The processing device for multi-round conversations can execute the processing method for multi-round conversations provided by any embodiment of the application, and has corresponding functional modules and beneficial effects of the execution method. For details of the technology not described in detail in this embodiment, reference may be made to a processing method for multiple rounds of dialog provided in any embodiment of the present application.
Since the processing apparatus for multi-turn dialogs described above is an apparatus capable of executing the multi-turn dialog processing method of the embodiments of the present application, those skilled in the art can, based on the method described herein, understand the specific implementation of the apparatus and its various modifications; therefore, how the apparatus implements the method is not described in detail here. Any apparatus with which those skilled in the art implement the multi-turn dialog processing method of the embodiments of the present application falls within the scope intended to be protected by the present application.
In one example, the present application also provides an electronic device and a readable storage medium.
Fig. 4 is a schematic structural diagram of an electronic device for implementing a processing method of a multi-turn dialog according to an embodiment of the present application. Fig. 4 is a block diagram of an electronic device of a processing method of a multi-turn dialog according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in Fig. 4, the electronic device includes: one or more processors 401, a memory 402, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In Fig. 4, one processor 401 is taken as an example.
The memory 402 is a non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the processing method of the multi-turn dialog provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the processing method of the multi-turn dialog provided by the present application.
The memory 402, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the processing method of the multi-turn dialog in the embodiment of the present application (for example, the current dialog data acquisition module 310, the information recognition module 320, the first target session scene determination module 330, and the multi-turn dialog processing module 340 shown in Fig. 3). By running the non-transitory software programs, instructions, and modules stored in the memory 402, the processor 401 executes the various functional applications and data processing of the server, that is, implements the processing method of the multi-turn dialog in the above-described method embodiments.
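For illustration only, the four modules stored in the memory 402 could be composed in the manner sketched below; the class name and method names are hypothetical and only mirror the module names shown in Fig. 3.

class MultiTurnDialogDevice:
    # Hypothetical composition of modules 310-340; not the embodiment's own API.
    def __init__(self, acquire_dialog_data, identify_information,
                 determine_target_scene, process_dialog):
        self.acquire_dialog_data = acquire_dialog_data        # module 310
        self.identify_information = identify_information      # module 320
        self.determine_target_scene = determine_target_scene  # module 330
        self.process_dialog = process_dialog                  # module 340

    def handle_turn(self):
        data = self.acquire_dialog_data()
        intent, entity, scene = self.identify_information(data)
        target_scene = self.determine_target_scene(intent, entity, scene)
        return self.process_dialog(target_scene, data)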
The memory 402 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by use of the electronic device implementing the processing method of the multi-turn dialog, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 402 may optionally include memory located remotely from the processor 401, and these remote memories may be connected over a network to an electronic device that implements the processing method of the multi-turn dialog. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the processing method of multiple rounds of conversations may further include: an input device 403 and an output device 404. The processor 401, the memory 402, the input device 403 and the output device 404 may be connected by a bus or other means, and fig. 4 illustrates an example of a connection by a bus.
The input device 403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus implementing the processing method of the multi-turn dialog, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or the like. The output devices 404 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the method and device of the present application, the current intent, the current entity, and the current session scene in the user's current dialogue data are identified, and when it is determined according to the current intent and/or the current entity that the scene switching condition is satisfied, the current session scene is taken as the target session scene and the multi-turn dialog is processed in that scene. This solves the problem of poor scene-switching adaptability in existing multi-turn dialog systems, improves the generality of the multi-turn dialog system, and improves the intelligence and fluency of multi-turn dialogs.
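The overall flow and the scene switching condition summarized above (see also claims 1 to 3 and 8 below) can be expressed by the following sketch; the function names and the use of plain Python values for intents, entities, and scenes are assumptions made only for illustration.

def scene_switch_satisfied(current_intent, current_entity,
                           intent_confidence, entity_confidence):
    # Switch scenes when only an intent is recognized, or when both an intent
    # and an entity are recognized and the intent confidence is greater than
    # the entity confidence.
    if current_intent is not None and current_entity is None:
        return True
    if current_intent is not None and current_entity is not None:
        return intent_confidence > entity_confidence
    return False

def select_target_scene(current_scene, previous_scene,
                        current_intent, current_entity,
                        intent_confidence, entity_confidence):
    # Use the current session scene as the target scene when the switching
    # condition holds; otherwise keep the previous session scene.
    if scene_switch_satisfied(current_intent, current_entity,
                              intent_confidence, entity_confidence):
        return current_scene
    return previous_scene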
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present application is not limited in this respect as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (28)

1. A method for processing multiple rounds of conversations, comprising:
acquiring current conversation data of a user;
identifying a current intent, a current entity, and a current session scene in the current dialog data;
when determining that a scene switching condition is met according to the current intention and/or the current entity, taking the current session scene as a target session scene;
processing multiple rounds of dialog in the target session scenario.
2. The method of claim 1, wherein determining, according to the current intent and/or the current entity, that a scene switching condition is satisfied comprises:
determining that the scene switching condition is satisfied when the current dialogue data includes the current intent and does not include the current entity; and
determining that the scene switching condition is satisfied when the current dialogue data includes both the current intent and the current entity, and it is determined that the confidences of the current intent and the current entity satisfy a scene switching sub-condition.
3. The method of claim 2, wherein determining that the confidences of the current intent and the current entity satisfy the scene switching sub-condition comprises:
determining an intent confidence of the current intent and an entity confidence of the current entity; and
when it is determined that the intent confidence is greater than the entity confidence, determining that the confidences of the current intent and the current entity satisfy the scene switching sub-condition.
4. The method of claim 3, wherein determining the intent confidence of the current intent and the entity confidence of the current entity comprises:
determining the intent confidence of the current intent according to a preset classification model;
filtering out interference words included in the current dialogue data to obtain filtered dialogue data; and
calculating a text similarity between an entity value of the current entity and the filtered dialogue data, and determining the entity confidence of the current entity according to the calculation result.
5. The method of claim 1, wherein processing multiple turns of dialog in the target session scenario comprises:
acquiring dialogue-associated data stored in a previous session scene, wherein the dialogue-associated data comprises preceding entity data and a correspondence between dialogue nodes and entity collection scripts;
updating target entity data of the target session scene according to the preceding entity data; and
determining a dialogue script according to the target entity data and the correspondence between the dialogue nodes and the entity collection scripts.
6. The method of claim 5, wherein updating the target entity data of the target session scene according to the preceding entity data comprises:
when it is determined that the preceding entity data satisfies an entity data inheritance condition, updating the target entity data of the target session scene according to the preceding entity data.
7. The method of claim 6, wherein determining that the preceding entity data satisfies the entity data inheritance condition comprises:
when it is determined that the preceding entity data is configured with a data inheritance tag, determining that the preceding entity data satisfies the entity data inheritance condition.
8. The method of claim 1 or 4, further comprising:
when it is determined according to the current intent and/or the current entity that the scene switching condition is not satisfied, taking the previous session scene as the target session scene.
9. The method of claim 8, wherein determining, according to the current intent and/or the current entity, that the scene switching condition is not satisfied comprises:
determining that the scene switching condition is not satisfied when the current dialogue data includes the current entity and does not include the current intent;
determining that the scene switching condition is not satisfied when the current dialogue data includes both the current intent and the current entity, and it is determined that the confidences of the current intent and the current entity do not satisfy the scene switching sub-condition; and
when the current intent and the current entity are both empty, determining that the scene switching condition is not satisfied.
10. The method of claim 8, wherein processing multiple turns of dialog in the target session scenario comprises:
determining a scene type of the target session scene;
when it is determined that the scene type of the target session scene is a target scene type, determining a dialogue script according to the current dialogue data; and
when it is determined that the scene type of the target session scene is null, determining a dialogue script according to a preset unmatched dialogue script list.
11. The method of claim 10, wherein determining a dialogue script according to the current dialogue data comprises:
when the current intent and the current entity are not both empty, the semantics expressed by the user are determined according to the current dialogue data to be an entity query, and it is determined to perform entity clarification, determining a dialogue script according to a matched entity clarification script;
when the current intent and the current entity are not both empty, the semantics expressed by the user are determined according to the current dialogue data to be an entity query, and it is determined not to perform entity clarification, determining a dialogue script according to an entity collection script;
when the current intent and the current entity are not both empty and the semantics expressed by the user are determined according to the current dialogue data to be a non-entity query, collecting current entity data and determining a dialogue script according to the current entity data;
when the current intent and the current entity are both empty and it is determined to perform entity clarification, determining a dialogue script according to the matched entity clarification script; and
when the current intent and the current entity are both empty and it is determined not to perform entity clarification, determining a dialogue script according to the entity collection script.
12. The method of claim 11, wherein determining to perform entity clarification comprises:
performing fuzzy matching between a candidate clarification script list and the current dialogue data, and determining to perform entity clarification when it is determined that a matched clarification script exists; and
determining not to perform entity clarification comprises:
performing fuzzy matching between the candidate clarification script list and the current dialogue data, and determining not to perform entity clarification when it is determined that no matched clarification script exists.
13. The method of claim 1, wherein the target session scenario comprises: a consultation scenario, a guidance scenario, a daily voice interaction scenario, or a learning coaching scenario.
14. A device for processing multiple rounds of conversations, comprising:
the current dialogue data acquisition module is used for acquiring current dialogue data of a user;
the information identification module is used for identifying the current intent, the current entity, and the current session scene in the current dialogue data;
a first target session scene determining module, configured to, when it is determined that a scene switching condition is satisfied according to the current intention and/or the current entity, take the current session scene as a target session scene;
and the multi-turn dialog processing module is used for processing multi-turn dialogues in the target session scene.
15. The apparatus of claim 14, wherein the first target session scenario determination module comprises:
a scene switching condition determining unit, configured to determine that the scene switching condition is satisfied when the current dialogue data includes the current intent and does not include the current entity; and
determine that the scene switching condition is satisfied when the current dialogue data includes both the current intent and the current entity, and it is determined that the confidences of the current intent and the current entity satisfy a scene switching sub-condition.
16. The apparatus according to claim 15, wherein the scene switching condition determining unit is specifically configured to:
determine an intent confidence of the current intent and an entity confidence of the current entity; and
when it is determined that the intent confidence is greater than the entity confidence, determine that the confidences of the current intent and the current entity satisfy the scene switching sub-condition.
17. The apparatus according to claim 16, wherein the scene switching condition determining unit is specifically configured to:
determine the intent confidence of the current intent according to a preset classification model;
filter out interference words included in the current dialogue data to obtain filtered dialogue data; and
calculate a text similarity between an entity value of the current entity and the filtered dialogue data, and determine the entity confidence of the current entity according to the calculation result.
18. The apparatus of claim 14, wherein the multi-turn dialog processing module comprises:
a dialogue-associated data acquiring unit, configured to acquire dialogue-associated data stored in a previous session scene, wherein the dialogue-associated data comprises preceding entity data and a correspondence between dialogue nodes and entity collection scripts;
a target entity data updating unit, configured to update target entity data of the target session scene according to the preceding entity data; and
a first dialogue script determining unit, configured to determine a dialogue script according to the target entity data and the correspondence between the dialogue nodes and the entity collection scripts.
19. The apparatus according to claim 18, wherein the target entity data updating unit is specifically configured to:
when it is determined that the preceding entity data satisfies an entity data inheritance condition, update the target entity data of the target session scene according to the preceding entity data.
20. The apparatus according to claim 19, wherein the target entity data updating unit is specifically configured to:
when it is determined that the preceding entity data is configured with a data inheritance tag, determine that the preceding entity data satisfies the entity data inheritance condition.
21. The apparatus of claim 14 or 17, wherein the apparatus further comprises:
and the second target session scene determining module is used for taking the previous session scene as the target session scene when determining that the scene switching condition is not met according to the current intention and/or the current entity.
22. The apparatus of claim 21, wherein the second target session scene determining module is specifically configured to:
determine that the scene switching condition is not satisfied when the current dialogue data includes the current entity and does not include the current intent;
determine that the scene switching condition is not satisfied when the current dialogue data includes both the current intent and the current entity, and it is determined that the confidences of the current intent and the current entity do not satisfy the scene switching sub-condition; and
when the current intent and the current entity are both empty, determine that the scene switching condition is not satisfied.
23. The apparatus of claim 21, wherein the multi-turn dialog processing module comprises:
a scene type determining unit, configured to determine a scene type of the target session scene;
a second dialogue script determining unit, configured to determine a dialogue script according to the current dialogue data when it is determined that the scene type of the target session scene is a target scene type; and
a third dialogue script determining unit, configured to determine a dialogue script according to a preset unmatched dialogue script list when it is determined that the scene type of the target session scene is null.
24. The apparatus of claim 23, wherein the second dialogue script determining unit is specifically configured to:
when the current intent and the current entity are not both empty, the semantics expressed by the user are determined according to the current dialogue data to be an entity query, and it is determined to perform entity clarification, determine a dialogue script according to a matched entity clarification script;
when the current intent and the current entity are not both empty, the semantics expressed by the user are determined according to the current dialogue data to be an entity query, and it is determined not to perform entity clarification, determine a dialogue script according to an entity collection script;
when the current intent and the current entity are not both empty and the semantics expressed by the user are determined according to the current dialogue data to be a non-entity query, collect current entity data and determine a dialogue script according to the current entity data;
when the current intent and the current entity are both empty and it is determined to perform entity clarification, determine a dialogue script according to the matched entity clarification script; and
when the current intent and the current entity are both empty and it is determined not to perform entity clarification, determine a dialogue script according to the entity collection script.
25. The apparatus of claim 24, wherein the second dialogue script determining unit is further configured to:
perform fuzzy matching between a candidate clarification script list and the current dialogue data, and determine to perform entity clarification when it is determined that a matched clarification script exists; and
perform fuzzy matching between the candidate clarification script list and the current dialogue data, and determine not to perform entity clarification when it is determined that no matched clarification script exists.
26. The apparatus of claim 14, wherein the target session scenario comprises: a consultation scenario, a guidance scenario, a daily voice interaction scenario, or a learning coaching scenario.
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
28. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-13.
CN202010437955.9A 2020-05-21 2020-05-21 Multi-round dialogue processing method and device, electronic equipment and storage medium Active CN111639168B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010437955.9A CN111639168B (en) 2020-05-21 2020-05-21 Multi-round dialogue processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010437955.9A CN111639168B (en) 2020-05-21 2020-05-21 Multi-round dialogue processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111639168A true CN111639168A (en) 2020-09-08
CN111639168B CN111639168B (en) 2023-06-09

Family

ID=72331480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010437955.9A Active CN111639168B (en) 2020-05-21 2020-05-21 Multi-round dialogue processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111639168B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197191A (en) * 2017-12-27 2018-06-22 神思电子技术股份有限公司 A kind of scene of more wheel dialogues is intended to interrupt method
CN109918546A (en) * 2019-02-02 2019-06-21 上海奔影网络科技有限公司 Task configuration method and device for dialogue
CN110096191A (en) * 2019-04-24 2019-08-06 北京百度网讯科技有限公司 A kind of interactive method, device and electronic equipment
CN110675867A (en) * 2019-08-26 2020-01-10 北京百度网讯科技有限公司 Intelligent dialogue method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU Jiao et al., "A Survey of Intent Recognition Methods in Human-Machine Dialogue Systems", Computer Engineering and Applications *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112164401A (en) * 2020-09-18 2021-01-01 广州小鹏汽车科技有限公司 Voice interaction method, server and computer-readable storage medium
CN112164401B (en) * 2020-09-18 2022-03-18 广州小鹏汽车科技有限公司 Voice interaction method, server and computer-readable storage medium
CN112183098B (en) * 2020-09-30 2022-05-06 完美世界(北京)软件科技发展有限公司 Session processing method and device, storage medium and electronic device
CN112183098A (en) * 2020-09-30 2021-01-05 完美世界(北京)软件科技发展有限公司 Session processing method and device, storage medium and electronic device
CN112597287A (en) * 2020-12-15 2021-04-02 深圳市优必选科技股份有限公司 Statement processing method, statement processing device and intelligent equipment
CN112597287B (en) * 2020-12-15 2024-03-15 深圳市优必选科技股份有限公司 Statement processing method, statement processing device and intelligent equipment
CN112528002A (en) * 2020-12-23 2021-03-19 北京百度网讯科技有限公司 Dialog recognition method and device, electronic equipment and storage medium
CN112528002B (en) * 2020-12-23 2023-07-18 北京百度网讯科技有限公司 Dialogue identification method, device, electronic equipment and storage medium
CN113282725A (en) * 2021-05-21 2021-08-20 北京市商汤科技开发有限公司 Dialogue interaction method and device, electronic equipment and storage medium
CN113282708A (en) * 2021-05-31 2021-08-20 平安国际智慧城市科技股份有限公司 Method and device for replying to robot dialog, computer equipment and storage medium
CN113282736A (en) * 2021-07-08 2021-08-20 北京百度网讯科技有限公司 Dialogue understanding and model training method, device, equipment and storage medium
CN113704432A (en) * 2021-08-31 2021-11-26 广州方舟信息科技有限公司 Artificial intelligence customer service system construction method and device based on Internet hospital
CN115809669A (en) * 2022-12-30 2023-03-17 联通智网科技股份有限公司 Conversation management method and electronic equipment
CN115809669B (en) * 2022-12-30 2024-03-29 联通智网科技股份有限公司 Dialogue management method and electronic equipment
CN117076620A (en) * 2023-06-25 2023-11-17 北京百度网讯科技有限公司 Dialogue processing method and device, electronic equipment and storage medium
CN116777568A (en) * 2023-08-17 2023-09-19 浙江网新恒天软件有限公司 Financial market transaction advanced intelligent dialogue ordering method, device and storage medium
CN117059074A (en) * 2023-10-08 2023-11-14 四川蜀天信息技术有限公司 Voice interaction method and device based on intention recognition and storage medium
CN117059074B (en) * 2023-10-08 2024-01-19 四川蜀天信息技术有限公司 Voice interaction method and device based on intention recognition and storage medium

Also Published As

Publication number Publication date
CN111639168B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN111639168A (en) Multi-turn conversation processing method and device, electronic equipment and storage medium
US10540965B2 (en) Semantic re-ranking of NLU results in conversational dialogue applications
CN110674314B (en) Sentence recognition method and device
CN111611368B (en) Method and device for backtracking public scene dialogue in multiple rounds of dialogue
CN110955675B (en) Robot dialogue method, apparatus, device and computer readable storage medium
CN111177355B (en) Man-machine conversation interaction method and device based on search data and electronic equipment
CN111105800B (en) Voice interaction processing method, device, equipment and medium
CN111881254B (en) Speaking operation generation method and device, electronic equipment and storage medium
CN111241245B (en) Human-computer interaction processing method and device and electronic equipment
US20140257792A1 (en) Anaphora Resolution Using Linguisitic Cues, Dialogue Context, and General Knowledge
US20160070696A1 (en) Task switching in dialogue processing
CN111666380A (en) Intelligent calling method, device, equipment and medium
US20140257793A1 (en) Communicating Context Across Different Components of Multi-Modal Dialog Applications
CN111191450A (en) Corpus cleaning method, corpus entry device and computer-readable storage medium
CN111831813A (en) Dialog generation method, dialog generation device, electronic equipment and medium
CN112131885A (en) Semantic recognition method and device, electronic equipment and storage medium
CN116615727A (en) Keyword data augmentation tool for natural language processing
CN111831795A (en) Multi-turn conversation processing method and device, electronic equipment and storage medium
CN112133307A (en) Man-machine interaction method and device, electronic equipment and storage medium
CN116635862A (en) Outside domain data augmentation for natural language processing
CN112364147A (en) Cross-domain multi-turn dialogue method based on knowledge graph and implementation system
US20220215180A1 (en) Method for generating dialogue, electronic device, and storage medium
CN112559715B (en) Attitude identification method, device, equipment and storage medium
CN114444462A (en) Model training method and man-machine interaction method and device
CN112382291B (en) Voice interaction processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant