CN113157893A - Method, medium, apparatus, and computing device for intent recognition in multiple rounds of conversations - Google Patents


Info

Publication number
CN113157893A
CN113157893A · CN202110571576.3A · CN202110571576A
Authority
CN
China
Prior art keywords
entity
candidate
entities
sentences
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110571576.3A
Other languages
Chinese (zh)
Other versions
CN113157893B (en)
Inventor
沙雨辰
俞霖霖
胡光龙
汪源
刘秀颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN202110571576.3A
Publication of CN113157893A
Application granted
Publication of CN113157893B
Legal status: Active
Anticipated expiration

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 — Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 — Querying
    • G06F 16/332 — Query formulation
    • G06F 16/3329 — Natural language query formulation or dialogue systems
    • G06F 40/00 — Handling natural language data
    • G06F 40/20 — Natural language analysis
    • G06F 40/279 — Recognition of textual entities
    • G06F 40/289 — Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 — Named entity recognition
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present disclosure provide a method, medium, apparatus, and computing device for intent recognition in multi-turn dialog. The method includes: acquiring an initial question contained in the first turn of a multi-turn dialog, determining a plurality of candidate entities corresponding to the initial question, generating candidate questions based on the initial question and the candidate entities, and determining the intent recognition result of the first turn of dialog according to the candidate questions and reference example sentences pre-configured with explicit intents. Because intent recognition is performed on the basis of the recognized candidate entities, intent recognition and entity recognition are effectively combined; compared with performing intent recognition alone, the influence of entities is taken into account, which greatly improves the accuracy of intent recognition in multi-turn dialog.

Description

Method, medium, apparatus, and computing device for intent recognition in multiple rounds of conversations
Technical Field
Embodiments of the present disclosure relate to the field of intent recognition technology, and more particularly, to a method, medium, apparatus, and computing device for intent recognition in multiple rounds of dialog.
Background
This section is intended to provide a background or context for the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art merely by its inclusion in this section.
Multi-turn dialog is generally divided into chit-chat multi-turn dialog and task-oriented multi-turn dialog. Chit-chat multi-turn dialog is typically found in chatbots, such as Microsoft XiaoIce and Xiaomi's XiaoAI, which can hold free-ranging, open-domain conversations with users without restricting the chat topic. Task-oriented multi-turn dialog is typically found in customer-service robots, such as NetEase Qiyu and the like, and aims to help users solve specific problems in a certain field or achieve a certain goal. Users of task-oriented multi-turn dialog typically interact with the system with one or more explicit intents, such as "book an air ticket" or "self-help health diagnosis". Such intents cannot be satisfied by a simple question-and-answer scheme, i.e., a single-turn dialog; the final goal can only be reached by confirming the user's intent step by step and completing the relevant attribute information over multiple turns of dialog between the user and the customer-service robot. For example, for the "book air ticket" intent, the multi-turn dialog system first needs to establish that the user's intent is "book air ticket", and then collect the user's related information, such as the departure place, the destination, the departure time, and the cabin class, in order to finally complete the ticket-booking service for the user.
Currently, existing multi-turn dialog systems usually determine the user's intent for the session in the user's first turn, for example, confirming whether the user needs to "book an air ticket" or "change an air ticket"; slot filling is then performed to collect the other information required to complete the user's intent, for example, collecting the "departure place" and "destination" entity information for the "book air ticket" intent.
However, the above prior art performs intent recognition and entity recognition separately, so the correctness of the whole session flow depends strongly on the step performed first. If the initial intent recognition step is wrong, the subsequent entity recognition task is meaningless. For example, in an airport scenario with the two intents "book air ticket" and "change air ticket", if the intent recognition step mistakes the user's "book air ticket" intent for "change air ticket", then extracting the "change date" entity afterwards is pointless.
Disclosure of Invention
The present disclosure contemplates a method and apparatus for intent recognition in multiple rounds of dialog.
In a first aspect of embodiments of the present disclosure, there is provided a method of intent recognition in a multi-turn dialog, comprising:
acquiring an initial question sentence contained in a first turn of conversation in multiple turns of conversations;
determining a plurality of candidate entities corresponding to the initial question;
generating a candidate question based on the initial question and the plurality of candidate entities;
and determining the intention recognition result of the first turn of dialogue according to the candidate question sentences and the reference example sentences which are pre-configured with definite intentions.
In one embodiment of the present disclosure, the types of the candidate entities include: a preset entity, an enumerated entity, a regular entity, and a descriptive entity, wherein the preset entity is a predefined and directly packaged entity type, the enumerated entity is an entity type whose entity values can be enumerated, the regular entity is an entity type whose entity values can be summarized by a regular expression, and the descriptive entity is an entity type that describes an attribute or state of a thing.
In one embodiment of the present disclosure, the determining a plurality of candidate entities corresponding to the initial question includes:
and identifying a plurality of candidate entities corresponding to the initial question by using a pre-trained first identification model, wherein the candidate entities comprise a preset entity, an enumerated entity and a regular entity.
In one embodiment of the present disclosure, the generating a candidate question based on the initial question and the plurality of candidate entities includes:
combining the candidate entities to form a plurality of candidate entity sets corresponding to the initial question, wherein the candidate entity sets take at least one of the candidate entities as an element thereof;
and respectively configuring the elements in each candidate entity set in corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question.
In an embodiment of the disclosure, the respectively configuring elements in each candidate entity set in corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question includes:
configuring the elements in each candidate entity set at the corresponding positions of the initial question respectively, and filtering out sentences in which the number of entities is less than a specified number, to obtain a plurality of candidate questions corresponding to the initial question.
In an embodiment of the disclosure, the respectively configuring elements in each candidate entity set in corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question includes:
and respectively configuring elements in each candidate entity set in corresponding positions of the initial question sentences, and filtering sentences which do not appear in reference example sentences which are configured with definite intentions in advance to obtain a plurality of candidate question sentences corresponding to the initial question sentences.
In an embodiment of the disclosure, the respectively configuring elements in each candidate entity set in corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question includes:
configuring the elements in each candidate entity set at the corresponding positions of the initial question respectively, and, when the total number of obtained sentences reaches a preset upper limit, stopping at the sentences obtained so far and randomly selecting a plurality of them as the candidate questions corresponding to the initial question.
In an embodiment of the present disclosure, before the combining the plurality of candidate entities to form the plurality of candidate entity sets corresponding to the initial question, the method further includes:
for a candidate entity having a plurality of entity values, acquiring the character string length of each entity value corresponding to the candidate entity, determining the starting position of the corresponding entity value in the initial question, and screening out the corresponding entity values for combination according to the character string lengths and the starting positions.
In an embodiment of the present disclosure, the screening out the corresponding entity values for combination according to the length of the character string and the starting position includes:
if the character string lengths of the entity values of the candidate entity differ, selecting the entity value with the longest character string for combination;
if the character string lengths of the entity values of the candidate entity are all the same, and the character strings covered by the entity values overlap, selecting the entity value with the earliest starting position for combination.
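As a concrete illustration of this screening rule, the following sketch keeps the longest entity value, and on equal lengths keeps the value whose overlapping match starts earliest (the helper name and signature are our own assumptions, not from the patent):

```python
# Hypothetical sketch of the entity-value screening rule described above:
# prefer the longest match; on equal lengths, keep the earliest overlapping match.

def screen_entity_values(question: str, values: list[str]) -> list[str]:
    """Return the entity values of one candidate entity kept for combination."""
    # Collect (value, start position, length) for values actually found in the question.
    found = []
    for v in values:
        pos = question.find(v)
        if pos != -1:
            found.append((v, pos, len(v)))
    if not found:
        return []

    lengths = {length for _, _, length in found}
    if len(lengths) > 1:
        # Different string lengths: keep only the longest value.
        best = max(found, key=lambda t: t[2])
        return [best[0]]

    # Same length: if spans overlap, keep the value that starts earliest.
    found.sort(key=lambda t: t[1])
    kept = [found[0]]
    for v, pos, length in found[1:]:
        _, last_pos, last_len = kept[-1]
        if pos < last_pos + last_len:   # overlapping spans
            continue                    # the earlier one wins
        kept.append((v, pos, length))
    return [v for v, _, _ in kept]
```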
In an embodiment of the present disclosure, the determining an intention recognition result of the first turn dialog according to the candidate question sentences and reference example sentences configured with explicit intentions in advance includes:
if, among the reference example sentences pre-configured with explicit intents, there is a reference example sentence including a descriptive entity, expanding the entity value of the descriptive entity in the reference example sentence to obtain extended example sentences, and adding the obtained extended example sentences to the intent;
and determining the intent recognition result of the first turn of dialog according to the candidate questions and the reference example sentences and extended example sentences of the intent.
In an embodiment of the present disclosure, the expanding the entity value of the descriptive entity in the reference example sentence to obtain an expanded example sentence includes:
acquiring other entity values except the entity value currently appearing in the reference example sentence from a plurality of entity values configured for the description type entity in advance;
and sequentially replacing the current entity value in the reference example sentence with the obtained other entity values to respectively obtain the corresponding extended example sentences.
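A minimal sketch of this expansion step, assuming a simple string-replacement implementation (function and argument names are illustrative only):

```python
# Illustrative sketch: replace the descriptive entity value currently in a
# reference example sentence with every other value configured for that entity.

def expand_reference_sentence(sentence: str, current_value: str,
                              configured_values: list[str]) -> list[str]:
    """Return the extended example sentences derived from one reference sentence."""
    others = [v for v in configured_values if v != current_value]
    return [sentence.replace(current_value, v) for v in others]

# Example based on the "self-help health diagnosis" intent described later:
extended = expand_reference_sentence(
    "I have been coughing for a week and it has not gotten better.",
    current_value="coughing",
    configured_values=["coughing", "runny nose", "insomnia", "headache"],
)
# -> one extended example sentence per remaining symptom value
```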
In an embodiment of the present disclosure, the determining an intention recognition result of the first turn dialog according to the candidate question sentences and reference example sentences configured with explicit intentions in advance includes:
respectively coding the candidate question sentences and the reference example sentences which are pre-configured with definite intentions;
calculating the similarity of the coded candidate question sentences and the reference example sentences;
and determining the intention recognition result of the first turn of conversation according to the result of the similarity calculation.
In one embodiment of the present disclosure, the method is performed by a pre-trained first recognition model, and the first recognition model is trained, before the multi-turn dialog starts, on training data of triples constructed using a pairwise framework.
In one embodiment of the present disclosure, the triple includes a first text string, a second text string, and a third text string, and the training objective is that the similarity between the first text string and the second text string is higher than the similarity between the first text string and the third text string.
In an embodiment of the present disclosure, training the first recognition model on training data of triples constructed using a pairwise framework includes:
if the first text string and the second text string in the triple contain the same entity and the first text string and the third text string do not contain the same entity, training the first recognition model so that the recognized similarity between the first text string and the second text string is higher than that between the first text string and the third text string.
In an embodiment of the present disclosure, training the first recognition model on training data of triples constructed using a pairwise framework includes:
in the case that the first text string and the second text string in the triple contain a plurality of identical entities, selecting a specified number of entities from the plurality of identical entities;
and finding, in the third text string of the triple, entities different from the specified number of entities, and training the first recognition model so that the recognized similarity between the first text string and the second text string is higher than that between the first text string and the third text string.
In an embodiment of the present disclosure, training the first recognition model on training data of triples constructed using a pairwise framework includes:
in the case that the first text string and the second text string in the triple contain a plurality of identical entities, selecting at least two entities from the plurality of identical entities;
if an entity that is the same as one of the at least two entities exists in the third text string of the triple, training the first recognition model so that the recognized similarity between the first text string and the second text string is higher than that between the first text string and the third text string.
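For illustration, one possible form of this training objective is sketched below in PyTorch (the framework, the encoder, and the margin value are assumptions; the patent only specifies that the anchor–positive similarity should end up higher than the anchor–negative similarity):

```python
# A rough sketch of a pairwise/triplet training objective on encoded text strings:
# make sim(anchor, positive) exceed sim(anchor, negative) by a margin.

import torch
import torch.nn.functional as F

def triplet_similarity_loss(enc_a: torch.Tensor,   # encoded first text string (anchor)
                            enc_p: torch.Tensor,   # encoded second text string (positive)
                            enc_n: torch.Tensor,   # encoded third text string (negative)
                            margin: float = 0.2) -> torch.Tensor:
    sim_ap = F.cosine_similarity(enc_a, enc_p)     # should be trained to be higher
    sim_an = F.cosine_similarity(enc_a, enc_n)     # should be trained to be lower
    # Hinge on the similarity gap: zero loss once sim_ap > sim_an + margin.
    return F.relu(margin - (sim_ap - sim_an)).mean()
```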
In one embodiment of the present disclosure, the method further comprises:
encoding the constructed training data before training the first recognition model as follows:
reserving a plurality of placeholder entities in the dictionary, and setting corresponding placeholder codes according to the rank of each placeholder entity in the dictionary;
sorting the entities in the constructed training data, and replacing the sorted entities in order with the corresponding placeholders according to their rank;
and converting the other characters in the training data into character codes using a conversion function.
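The following sketch illustrates one possible reading of this encoding scheme (the placeholder tokens and the character-to-id conversion function are assumptions for illustration):

```python
# Sketch of the placeholder encoding described above: sorted entities become
# reserved placeholder codes, remaining characters go through a conversion function.

PLACEHOLDERS = ["[ENT_0]", "[ENT_1]", "[ENT_2]", "[ENT_3]"]   # reserved in the dictionary
PLACEHOLDER_IDS = {p: i for i, p in enumerate(PLACEHOLDERS)}  # codes follow dictionary rank

def encode_sample(text: str, entities: list[str], char_to_id) -> list[int]:
    """Replace sorted entities with placeholders, then convert remaining chars to ids."""
    for rank, entity in enumerate(sorted(entities, key=text.find)):
        text = text.replace(entity, PLACEHOLDERS[rank], 1)
    ids, i = [], 0
    while i < len(text):
        matched = next((p for p in PLACEHOLDERS if text.startswith(p, i)), None)
        if matched:
            ids.append(PLACEHOLDER_IDS[matched])   # placeholder code
            i += len(matched)
        else:
            ids.append(char_to_id(text[i]))        # conversion function for ordinary characters
            i += 1
    return ids
```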
In one embodiment of the present disclosure, the method further comprises:
and after the similarity is calculated, determining the entity combination in the candidate question sentence corresponding to the highest similarity as the entity recognition result of the first round of conversation.
In one embodiment of the present disclosure, the method further comprises:
collecting other entities except the entity combination in the dialog after the first turn of dialog;
performing a reply or response of the multiple rounds of dialog based on the intent recognition result and the combination of entities and other entities.
In one embodiment of the present disclosure, the method further comprises at least one of:
receiving configuration information input by a configuration staff, generating an enumeration entity according to the configuration information, and configuring an entity name and an enumerable entity value for the enumeration entity;
receiving configuration information input by a configuration worker, generating a regular type entity according to the configuration information, and configuring an entity name and a regular expression for the regular type entity;
receiving configuration information input by a configuration person, generating a descriptive entity according to the configuration information, and configuring an entity name and a descriptive entity value for the descriptive entity.
In a second aspect of embodiments of the present disclosure, there is provided an apparatus for intent recognition in multiple rounds of dialog, comprising:
the acquisition module is used for acquiring initial question sentences contained in the first-turn dialog in the multiple rounds of dialogues;
a determining module for determining a plurality of candidate entities corresponding to the initial question;
a generating module for generating candidate question sentences based on the initial question sentences and the plurality of candidate entities;
and the recognition module is used for determining the intention recognition result of the first-turn dialog according to the candidate question sentences and the reference example sentences which are pre-configured with definite intentions.
In one embodiment of the present disclosure, the types of the candidate entities include: a preset entity, an enumerated entity, a regular entity, and a descriptive entity, wherein the preset entity is a predefined and directly packaged entity type, the enumerated entity is an entity type whose entity values can be enumerated, the regular entity is an entity type whose entity values can be summarized by a regular expression, and the descriptive entity is an entity type that describes an attribute or state of a thing.
In one embodiment of the disclosure, the determining module is to:
and identifying a plurality of candidate entities corresponding to the initial question by using a pre-trained first identification model, wherein the candidate entities comprise a preset entity, an enumerated entity and a regular entity.
In one embodiment of the present disclosure, the generating module includes:
a combining unit, configured to combine the multiple candidate entities to form multiple candidate entity sets corresponding to the initial question, where the candidate entity sets have at least one of the candidate entities as an element thereof;
and the generating unit is used for respectively configuring the elements in each candidate entity set in the corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question.
In one embodiment of the present disclosure, the generating unit is configured to:
configuring the elements in each candidate entity set at the corresponding positions of the initial question respectively, and filtering out sentences in which the number of entities is less than a specified number, to obtain a plurality of candidate questions corresponding to the initial question.
In one embodiment of the present disclosure, the generating unit is configured to:
and respectively configuring elements in each candidate entity set in corresponding positions of the initial question sentences, and filtering sentences which do not appear in reference example sentences which are configured with definite intentions in advance to obtain a plurality of candidate question sentences corresponding to the initial question sentences.
In one embodiment of the present disclosure, the generating unit is configured to:
configuring the elements in each candidate entity set at the corresponding positions of the initial question respectively, and, when the total number of obtained sentences reaches a preset upper limit, stopping at the sentences obtained so far and randomly selecting a plurality of them as the candidate questions corresponding to the initial question.
In an embodiment of the disclosure, the combining unit is further configured to:
for a candidate entity having a plurality of entity values, acquiring the character string length of each entity value corresponding to the candidate entity, determining the starting position of the corresponding entity value in the initial question, and screening out the corresponding entity values for combination according to the character string lengths and the starting positions.
In an embodiment of the present disclosure, the combining unit, when screening out the corresponding entity values for combining according to the character string length and the starting position, is specifically configured to:
if the character string lengths of the entity values of the candidate entity differ, selecting the entity value with the longest character string for combination;
if the character string lengths of the entity values of the candidate entity are all the same, and the character strings covered by the entity values overlap, selecting the entity value with the earliest starting position for combination.
In one embodiment of the present disclosure, the identification module includes:
the system comprises an extension unit, a display unit and a control unit, wherein the extension unit is used for extending an entity value of a description type entity in a reference example sentence to obtain an extension example sentence and supplementing the obtained extension example sentence into an intention if the reference example sentence which is configured with a definite intention in advance contains the reference example sentence comprising the description type entity;
and the recognition unit is used for determining the intention recognition result of the first-turn dialog according to the candidate question sentences and the reference example sentences and the extended example sentences of the intention.
In one embodiment of the present disclosure, the extension unit is configured to:
if a reference example sentence comprising a descriptive entity exists in a reference example sentence which is configured with a definite intention in advance, acquiring other entity values except the entity value which is currently appeared in the reference example sentence from a plurality of entity values which are configured for the descriptive entity in advance;
sequentially replacing the current entity value in the reference example sentence with the obtained other entity values to respectively obtain corresponding extended example sentences;
and supplementing the obtained extended example sentence into the intention.
In one embodiment of the disclosure, the identification module is to:
respectively coding the candidate question sentences and the reference example sentences which are pre-configured with definite intentions;
calculating the similarity of the coded candidate question sentences and the reference example sentences;
and determining the intention recognition result of the first turn of conversation according to the result of the similarity calculation.
In one embodiment of the present disclosure, the apparatus implements functions by a pre-trained first recognition model, and further includes:
and the training module is used for training the first recognition model, before the multi-turn dialog starts, on training data of triples constructed using a pairwise framework.
In one embodiment of the present disclosure, the triple includes a first text string, a second text string, and a third text string, and the training objective is that the similarity between the first text string and the second text string is higher than the similarity between the first text string and the third text string.
In one embodiment of the disclosure, the training module is to:
before the multi-turn dialog starts, if the first text string and the second text string in the triple contain the same entity and the first text string and the third text string do not contain the same entity, training the first recognition model so that the recognized similarity between the first text string and the second text string is higher than that between the first text string and the third text string.
In one embodiment of the disclosure, the training module is to:
before the multi-turn dialog starts, in the case that the first text string and the second text string in the triple contain a plurality of identical entities, selecting a specified number of entities from the plurality of identical entities;
and finding, in the third text string of the triple, entities different from the specified number of entities, and training the first recognition model so that the recognized similarity between the first text string and the second text string is higher than that between the first text string and the third text string.
In one embodiment of the disclosure, the training module is to:
before the multi-turn dialog starts, in the case that the first text string and the second text string in the triple contain a plurality of identical entities, selecting at least two entities from the plurality of identical entities;
if an entity that is the same as one of the at least two entities exists in the third text string of the triple, training the first recognition model so that the recognized similarity between the first text string and the second text string is higher than that between the first text string and the third text string.
In one embodiment of the present disclosure, the apparatus further comprises:
an encoding module, configured to encode the constructed training data before training the first recognition model in the following manner:
reserving a plurality of placeholder entities in the dictionary, and setting corresponding placeholder codes according to the rank of each placeholder entity in the dictionary;
sorting the entities in the constructed training data, and replacing the sorted entities in order with the corresponding placeholders according to their rank;
and converting the other characters in the training data into character codes using a conversion function.
In one embodiment of the present disclosure, the identification module is further configured to:
and after the similarity is calculated, determining the entity combination in the candidate question sentence corresponding to the highest similarity as the entity recognition result of the first round of conversation.
In one embodiment of the present disclosure, the apparatus further comprises:
the collection module is used for collecting other entities except the entity combination in the dialog after the first round of dialog;
and the response module is used for executing the reply or response of the multi-turn dialog according to the intention recognition result, the entity combination and other entities.
In an embodiment of the present disclosure, the apparatus further includes a configuration module configured to perform configuration in at least one of the following manners:
receiving configuration information input by a configuration staff, generating an enumeration entity according to the configuration information, and configuring an entity name and an enumerable entity value for the enumeration entity;
receiving configuration information input by a configuration worker, generating a regular type entity according to the configuration information, and configuring an entity name and a regular expression for the regular type entity;
receiving configuration information input by a configuration person, generating a descriptive entity according to the configuration information, and configuring an entity name and a descriptive entity value for the descriptive entity.
In a third aspect of embodiments of the present disclosure, a computer-readable medium is provided, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the steps of the method for intent recognition in a multi-turn dialog described above.
In a fourth aspect of embodiments of the present disclosure, there is provided a computing device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of intent recognition in multiple sessions when executing the program.
According to the method and apparatus for intent recognition in multi-turn dialog provided by the embodiments of the present disclosure, an initial question contained in the first turn of a multi-turn dialog is acquired, a plurality of candidate entities corresponding to the initial question are determined, candidate questions are generated based on the initial question and the candidate entities, and the intent recognition result of the first turn of dialog is determined according to the candidate questions and reference example sentences pre-configured with explicit intents. Because intent recognition is performed on the basis of the recognized candidate entities, intent recognition and entity recognition are effectively combined; compared with performing intent recognition alone, the influence of entities is taken into account, which greatly improves the accuracy of intent recognition in multi-turn dialog.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
FIG. 1 schematically illustrates a first flowchart of a method implementation of intent recognition for multiple rounds of dialog according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates an intended configuration diagram one, according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates an intended configuration diagram two, according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart for implementing a method for intent recognition in a multi-turn dialog according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a schematic diagram of determining candidate entities according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a schematic diagram of generating a candidate question according to an embodiment of the present disclosure;
FIG. 7 schematically shows a diagram of candidate question filtering results according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a diagram of screening entity values according to an embodiment of the present disclosure;
FIG. 9 schematically illustrates a schematic diagram of a descriptive entity extension according to an embodiment of the present disclosure;
FIG. 10 schematically illustrates a flow diagram for training a first recognition model according to an embodiment of the present disclosure;
FIG. 11 schematically illustrates an overall flow diagram of a multi-turn dialog according to an embodiment of the present disclosure;
FIG. 12 schematically illustrates a media diagram for a method of intent recognition in multiple rounds of dialog, according to an embodiment of the present disclosure;
FIG. 13 schematically illustrates an apparatus for intent recognition in multiple rounds of dialog, according to an embodiment of the present disclosure;
FIG. 14 schematically shows a structural diagram of a computing device according to an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present disclosure, and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to an embodiment of the disclosure, a method, a medium, an apparatus and a computing device for intention recognition in multiple rounds of conversations are provided.
In this document, any number of elements in the drawings is by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments of the present disclosure.
Summary of The Invention
The present disclosure finds that, in the existing intent recognition technology of the first-turn conversation of multiple turns of conversations, two steps of intent recognition and entity recognition are often performed independently, so that the accuracy of the recognition result of the whole conversation process strongly depends on the first executed step, and if the first executed step is wrong, the accuracy of the final recognition result is also greatly affected.
In view of this, the present disclosure provides a method and an apparatus for intent recognition in multi-turn dialog, which acquire the initial question contained in the first turn of a multi-turn dialog, determine a plurality of candidate entities corresponding to the initial question, generate candidate questions based on the initial question and the candidate entities, and determine the intent recognition result of the first turn of dialog according to the candidate questions and reference example sentences pre-configured with explicit intents. Because intent recognition is performed on the basis of the recognized candidate entities, intent recognition and entity recognition are effectively combined; compared with performing intent recognition alone, the influence of entities is taken into account, which greatly improves the accuracy of intent recognition in multi-turn dialog.
Having described the general principles of the present disclosure, various non-limiting embodiments of the present disclosure are described in detail below.
Exemplary method
A method of intent recognition in a multi-turn dialog according to an exemplary embodiment of the present disclosure is described below with reference to fig. 1.
As shown in fig. 1, the method for intention recognition in multiple rounds of dialog according to the embodiment of the present disclosure includes the following steps:
s11: acquiring an initial question sentence contained in a first turn of conversation in multiple turns of conversations;
s12: determining a plurality of candidate entities corresponding to the initial question;
s13: generating a candidate question based on the initial question and the plurality of candidate entities;
s14: and determining the intention recognition result of the first-turn dialog according to the candidate question sentences and the reference example sentences which are pre-configured with clear intentions.
In the embodiments of the present disclosure, a multi-turn dialog refers to a dialog involving multiple turns of interaction with a user, each turn taking the form of one question and one answer; the first turn of dialog refers to the first turn in the multi-turn dialog.
In the embodiments of the present disclosure, an entity refers to an object or thing that exists in the real world and can be distinguished from other objects or things. Furthermore, an entity need not be physically present; it may also be an abstract concept, for example: a location, a vehicle, a cell phone brand, a duration, a cabin class, and the like.
The above explicit intent is pre-configured and may include a variety of application scenarios. For example, in an airport scenario, "help user book ticket" may be configured as an intent; in a health scenario, "help user initially diagnose illness and make suggestions" may be configured as an intention; in an educational scenario, "helping a user complete a course reservation" may be configured as an intent.
Through the above process, candidate entities are determined based on the initial question and intent recognition is performed based on the candidate entities, so intent recognition and entity recognition are effectively combined; compared with performing intent recognition alone, the influence of entities is taken into account, which greatly improves the accuracy of intent recognition in multi-turn dialog.
The types of candidate entities involved in the embodiments of the present disclosure include: preset entities, enumerated entities, regular entities, and descriptive entities. A preset entity is a predefined and directly packaged entity type, an enumerated entity is an entity type whose entity values can be enumerated, a regular entity is an entity type whose entity values can be summarized by a regular expression, and a descriptive entity is an entity type that describes an attribute or state of a thing.
In one possible implementation, the preset entities may include named-entity types such as person name, place name, and time. These entity types are common, can be applied to intent recognition tasks in different businesses and fields, and have richer open-source labeled data and more mature entity recognition solutions. For example, under the "book air ticket" intent, "departure place", "destination", and "departure time" may be treated as preset entities.
In one possible implementation, the enumerated entities may be configured by a configurator, including configuring which entities are enumerated entities and defining the corresponding entity names and entity values. For example, under the "book air ticket" intent, the configurator may configure "cabin class" as an enumerated entity whose entity values are enumerable and may include: first class, business class, and economy class. Generally, in each business scenario the wording of such entity values is precise and limited, and the configurator can refer to the common terminology when describing them.
In one possible implementation, regular entities can be freely defined by the configurator as business-related entity names with corresponding regular expressions. For example, under the "self-help health diagnosis" intent, the "user case number" is a string with a preset user identification format, and the configurator may directly use a set of regular rules to represent all case-number IDs. The entity values of a regular entity are controllable and never fall outside the range specified by the regular expression.
In one possible implementation, the descriptive entity differs from the enumerated entity in that its values may be described with different expressions, so a certain degree of understanding and generalization over the entity values is required. For example, under the "self-help health diagnosis" intent, "user symptom" is configured as a descriptive entity whose entity values may be configured as: cough, runny nose, insomnia, and headache. However, when users actually describe their symptoms in a multi-turn dialog, they may not use words that exactly match the configuration, for instance: "coughing for a while", "a very runny nose", or "pain in the temples". Therefore, by taking the semantic generalization of entity values into account, the descriptive entity provided by the embodiments of the disclosure can solve the problem that ordinary entity values cannot be semantically generalized. Moreover, when the plurality of candidate entities corresponding to the initial question are determined, only preset, enumerated, and regular entities are considered, while descriptive entities are handled in the intent recognition step, which not only saves system resources but also achieves a better entity recognition effect.
In the embodiments of the present disclosure, each entity has, in addition to the two attributes of entity name and entity type, another attribute: the corresponding entity value. An entity can have one or more entity values, which represent the real values of the entity in different scenarios. For example, the entity values of the entity "address" include "Beijing" and "Shanghai", and the entity values of the entity "vehicle" include "train" and "airplane".
In the embodiments of the disclosure, the configuration of enumerated, regular, and descriptive entities is fully open to the configurator and can be configured and used in various business scenarios and fields at no extra cost; the configurator can divide and configure the entities associated with an intent according to the different characteristics of the entity types, which greatly improves the flexibility of the application and makes it more widely applicable.
In the present disclosure, the intent name, the entity types, and the reference example sentences may be configured when configuring an intent, and the specific content can be set according to actual needs. It should be noted that a configured reference example sentence may contain different entity information or may contain no entity information at all; both are acceptable configurations of reference example sentences, and the configurator can design and configure the intent according to the relevant business scenarios. For example, referring to FIG. 2, reference example sentence 3 of the "book air ticket" intent, "Book an air ticket.", is a reference example sentence containing no entity information; it can indicate the corresponding intent without any entity information, and the configurator can set up such sentences according to actual needs.
FIG. 2 schematically illustrates an intent configuration diagram according to an embodiment of the present disclosure. Referring to FIG. 2, an intent named "book air ticket" is configured for the airport scenario, with four associated entities named "departure place", "destination", "departure time", and "cabin class". The entity types of "departure place", "destination", and "departure time" are preset entities. The entity type of "cabin class" is an enumerated entity whose entity values include: first class, business class, and economy class. The reference example sentences configured for the intent include: 1. Help me book a ticket to Shanghai for tomorrow. 2. I want to book a business class ticket to Beijing. 3. Book an air ticket. The number of configured reference example sentences can be set as needed and is not limited to a specific value; for example, 5 or 8 reference example sentences may be configured. In addition, entity values can be marked in the reference example sentences for distinction, and the specific marking manner is not limited. For example, in the reference example sentence "Help me book a ticket to Shanghai for tomorrow", the entity value "tomorrow" of the entity "departure time" and the entity value "Shanghai" of the entity "destination" are marked.
FIG. 3 schematically illustrates an intent configuration diagram according to an embodiment of the present disclosure. Referring to FIG. 3, in the health scenario an intent named "self-help health diagnosis" is configured, with three associated entities named "user case number", "user symptom", and "duration", whose entity types are a regular entity, a descriptive entity, and a preset entity, respectively. The entity values of the entity "user symptom" include: cough, runny nose, insomnia, and headache. The reference example sentences configured for the intent include: 1. I have been coughing for a week and it has not gotten better. 2. I have not been able to sleep at night for a month. 3. My case number is 123456; what should I take for a runny nose? In addition, the entity values may also be marked in the reference example sentences for distinction, such as marking the entity value "cough" of the entity "user symptom" and the entity value "one week" of the entity "duration" in the reference example sentence "I have been coughing for a week and it has not gotten better".
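For illustration, the configuration of FIG. 2 could be represented by a structure like the following (the schema and field names are assumptions; the patent does not prescribe a concrete configuration format):

```python
# Illustrative configuration structure for the "book air ticket" intent of FIG. 2.

BOOK_TICKET_INTENT = {
    "intent_name": "book air ticket",
    "entities": [
        {"name": "departure place", "type": "preset"},
        {"name": "destination",     "type": "preset"},
        {"name": "departure time",  "type": "preset"},
        {"name": "cabin class",     "type": "enumerated",
         "values": ["first class", "business class", "economy class"]},
    ],
    "reference_sentences": [
        "Help me book a ticket to Shanghai for tomorrow.",     # marked: departure time, destination
        "I want to book a business class ticket to Beijing.",  # marked: cabin class, destination
        "Book an air ticket.",                                  # no entity information
    ],
}
```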
In a possible implementation, the step S12 may include:
and identifying a plurality of candidate entities corresponding to the initial question sentence by using a pre-trained first identification model, wherein the plurality of candidate entities comprise a preset entity, an enumerated entity and a regular entity.
In a possible implementation, the step S14 may include:
encoding the candidate questions and the reference example sentences pre-configured with explicit intents respectively; calculating the similarity between the encoded candidate questions and reference example sentences; and determining the intent recognition result of the first turn of dialog according to the result of the similarity calculation.
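A minimal sketch of this similarity-based selection, assuming some sentence encoder `encode` that returns fixed-size vectors and cosine similarity as the metric (both assumptions, not prescribed by the patent):

```python
# Sketch of step S14: pick the intent whose reference sentence is most similar
# to any of the candidate questions.

import numpy as np

def recognize_intent(candidate_questions, intents, encode):
    """intents: dict mapping intent name -> list of reference example sentences."""
    best = (None, -1.0)
    cand_vecs = [encode(q) for q in candidate_questions]
    for intent_name, reference_sentences in intents.items():
        for ref_vec in (encode(s) for s in reference_sentences):
            for q_vec in cand_vecs:
                sim = float(np.dot(q_vec, ref_vec) /
                            (np.linalg.norm(q_vec) * np.linalg.norm(ref_vec)))
                if sim > best[1]:
                    best = (intent_name, sim)
    return best   # (intent recognition result, highest similarity)
```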
In a possible implementation, after performing the similarity calculation, the method may further include:
the entity combination in the candidate question with the highest similarity is determined as the entity recognition result of the first turn of dialog. The entity recognition result determined based on similarity reflects the entities corresponding to the initial question more accurately and can serve as the basis for subsequent replies or responses in the multi-turn dialog, thereby improving user satisfaction.
In a possible embodiment, the method may further include:
collecting other entities except the entity combination in the conversation after the first round of conversation; and executing reply or response of multiple rounds of conversations according to the intention recognition result and the entity combination and other entities.
A general dialog management method can be used to track the state of the current intent at any time and to judge whether the currently required entity information has been fully recognized; if not, the user is guided to supplement the remaining entity information through follow-up questions until all entities required by the current intent have been recognized. Executing a reply of the multi-turn dialog means sending reply content to the user, while executing a response of the multi-turn dialog means triggering and executing a corresponding operation as the response. Specifically, reply content can be generated and sent to the user according to the configured reply manner, for example replying with the diagnosis result and suggested measures; alternatively, the corresponding operation can be triggered and executed directly as the response of the multi-turn dialog, for example triggering the operation of booking a ticket or issuing an invoice for the user.
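The follow-up loop described above might look roughly as follows (all function names here are illustrative assumptions, not APIs defined by the patent):

```python
# Rough sketch of the dialog-management loop: keep asking follow-up questions
# until every entity required by the current intent has been collected.

def complete_intent(intent, recognized_entities, ask_user, extract_entities):
    slots = dict(recognized_entities)                     # entities from the first turn
    missing = [e for e in intent["entities"] if e["name"] not in slots]
    while missing:
        answer = ask_user(f"Please provide your {missing[0]['name']}.")  # follow-up question
        slots.update(extract_entities(answer, missing))   # entities collected in later turns
        missing = [e for e in intent["entities"] if e["name"] not in slots]
    return slots                                          # ready for the final reply/response
```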
Replying or responding to the multi-turn dialog based on both the intent recognition result and the entity recognition result improves the accuracy of the reply or response, which further improves the user's satisfaction with the result and the overall user experience.
Fig. 4 schematically illustrates a flow chart of a method implementation of multiple rounds of dialog intention recognition according to an embodiment of the present disclosure. As shown in fig. 4, the method for intention recognition in multiple rounds of dialog according to the embodiment of the present disclosure includes the following steps:
s41: acquiring an initial question sentence contained in a first turn of conversation in multiple turns of conversations;
s42: determining a plurality of candidate entities corresponding to the initial question;
s43: combining the candidate entities to form a plurality of candidate entity sets corresponding to the initial question, wherein each candidate entity set takes at least one candidate entity as an element of the candidate entity set;
The elements of the candidate entity sets are formed from the candidate entities, and a candidate entity set may include one or more candidate entities. In order to improve the accuracy of the intent recognition result, the determined candidate entities may be combined in as many ways as possible to obtain as many candidate entity sets as possible. Of course, the number of candidate entity sets is not specifically limited in the embodiments of the present disclosure.
S44: respectively configuring elements in each candidate entity set in corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question;
s45: if a reference example sentence comprising a description type entity exists in a reference example sentence which is configured with a definite intention in advance, extending the entity value of the description type entity in the reference example sentence to obtain an extended example sentence, and supplementing the obtained extended example sentence into the intention;
s46: and determining an intention identification result of the first-turn dialog according to the candidate question sentences and the reference example sentences and the extended example sentences of the intention.
In the above process, a plurality of candidate entity sets are obtained by combining the candidate entities, and the elements of each candidate entity set are then configured at the corresponding positions of the initial question to form a plurality of candidate questions, so that multiple objects to be matched are obtained; the intent recognition result of the first turn of dialog is then determined from these objects and the reference example sentences pre-configured with explicit intents. The candidate questions are therefore rich, the influence of entity combinations on intent recognition is fully considered, and the accuracy of intent recognition is effectively improved. In addition, the extended example sentences obtained by expanding the descriptive entities are also used as objects to be matched, which provides richer and more complete data support for intent recognition, further improving its accuracy and user satisfaction.
In the present disclosure, when step S42 determines the plurality of candidate entities corresponding to the initial question, the candidate entities may be determined according to whether the entity type is a preset entity, an enumerated entity, or a regular entity. For preset entities, given their generality, an open-source entity recognition module or a self-developed and trained entity extraction model can be used directly, with commonly used methods such as LSTM-CRF (Long Short-Term Memory with a Conditional Random Field layer) or BiGRU-CRF (Bidirectional Gated Recurrent Unit with a Conditional Random Field layer). For enumerated entities, since their values are relatively precise, entity recognition can be performed by rule matching. For regular entities, whose entity values never fall outside the range specified by the regular expression, the regular expression can be used directly for matching and recognition.
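A sketch of the rule- and regex-based recognition for enumerated and regular entities (the patterns and values shown are examples only; preset entities would instead be handled by an NER model such as LSTM-CRF, which is not sketched here):

```python
# Illustrative rule matching for enumerated entities and regex matching for
# regular entities, as described above.

import re

ENUMERATED = {"cabin class": ["first class", "business class", "economy class"]}
REGULAR = {"user case number": re.compile(r"\b\d{6}\b")}   # assumed case-number pattern

def match_rule_entities(question: str) -> dict:
    found = {}
    for name, values in ENUMERATED.items():
        hits = [v for v in values if v in question]        # exact rule matching
        if hits:
            found[name] = hits
    for name, pattern in REGULAR.items():
        hits = pattern.findall(question)                   # regular-expression matching
        if hits:
            found[name] = hits
    return found
```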
For descriptive entities, the wording is relatively open-ended, all possible entity values cannot be exhausted by enumeration, and the requirement for semantic generalization and comprehension is the same as that of the intention recognition step itself; descriptive entities are therefore handled in step S45.
Fig. 5 schematically shows a diagram of determining candidate entities according to an embodiment of the present disclosure. Referring to fig. 5, after the initial question included in the first turn of a multi-turn dialog is obtained, a plurality of candidate entities corresponding to the initial question are determined, including preset entities, enumerated entities, and regular entities. The left example in the figure is an airline scenario. The initial question is "I am going to Guangzhou on business tomorrow, book me a business class ticket", and the identified entities include "departure time", "departure place", "destination", and "cabin class", corresponding to the entity values "tomorrow", "Guangzhou", and "business class". Since "Guangzhou" may be either the departure place or the destination and it cannot yet be confirmed which entity it belongs to, both possibilities are retained, and the entity attribute is reconfirmed in the final intention recognition step. The right example in the figure is a health scenario. The initial question is "I have had a headache all day today, what should I do? My case number is 654321", and the identified entities include "duration" and "user case number", corresponding to the entity values "all day today" and "654321".
Fig. 6 schematically shows a diagram of generating candidate questions according to an embodiment of the present disclosure. Referring to fig. 6, two intents, "ticket change" and "ticket booking", are configured in the airline scenario, and each intent is configured with the related entity information and reference example sentences. The entities associated with the intent "ticket change" include the regular entity "flight number" and the preset entity "change date", and its reference example sentences include: "1. Help me change tomorrow's air ticket to Shanghai to the day after tomorrow", and so on. The entities associated with the intent "ticket booking" include the preset entities "departure place", "destination", and "departure time" and the enumerated entity "cabin class", and its reference example sentences include: "1. Help me book a ticket flying to Shanghai tomorrow. 2. I am in Hangzhou and fly to Guangzhou on the 20th, book an air ticket for me", and so on.
When the initial question "I am going on a business trip tomorrow, help me book an air ticket" is obtained in the first turn of the multi-turn dialog, 4 entities are first identified in the initial question: "departure time", "change date", "departure place", and "destination", and the types of these 4 entities are all recognized as preset entities. These 4 entities are then combined to form 8 candidate entity sets corresponding to the initial question. Referring to fig. 6, the 8 candidate entity sets respectively include the following combinations:
"Change date";
"Change sign date" + "destination";
"Change sign date" + "origin";
"departure time" + "destination";
"departure time" + "departure place";
"departure time";
"destination";
"origin".
By configuring the elements of the 8 candidate entity sets at the corresponding positions of the initial question, the 8 candidate questions shown in the figure are obtained, which completes the process of generating the candidate questions.
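The following sketch illustrates one way such candidate entity sets and candidate questions could be produced. It enumerates the non-empty subsets of the identified entities, skips subsets whose entities cover the same text span (an assumed conflict rule, which is why four entities yield 8 sets rather than 15), and then substitutes entity symbols into the initial question. The question text, the city, and the start positions are illustrative assumptions, not the exact data of fig. 6.

```python
from itertools import combinations

def spans_conflict(e1, e2):
    # Assumed rule: two entities conflict when they cover the same text span,
    # e.g. "departure time" and "change date" both matching "tomorrow".
    return e1["start"] == e2["start"] and e1["value"] == e2["value"]

def candidate_entity_sets(entities):
    """Enumerate non-empty, conflict-free subsets of the identified entities."""
    sets_ = []
    for size in range(1, len(entities) + 1):
        for combo in combinations(entities, size):
            if all(not spans_conflict(a, b) for a, b in combinations(combo, 2)):
                sets_.append(combo)
    return sets_

def candidate_questions(question, entity_sets):
    """Replace each entity value with its entity symbol to form candidate questions."""
    results = []
    for combo in entity_sets:
        sentence = question
        for ent in combo:
            sentence = sentence.replace(ent["value"], f"${ent['name']}$", 1)
        results.append(sentence)
    return results

question = "Tomorrow I am going to Guangzhou on business, help me book an air ticket"
entities = [
    {"name": "departure_time",  "value": "Tomorrow",  "start": question.find("Tomorrow")},
    {"name": "change_date",     "value": "Tomorrow",  "start": question.find("Tomorrow")},
    {"name": "departure_place", "value": "Guangzhou", "start": question.find("Guangzhou")},
    {"name": "destination",     "value": "Guangzhou", "start": question.find("Guangzhou")},
]
sets_ = candidate_entity_sets(entities)
print(len(sets_))                          # 8 conflict-free candidate entity sets
print(candidate_questions(question, sets_)[:2])
```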
In a possible implementation, the step S44 may include:
and respectively configuring the elements in each candidate entity set at the corresponding positions of the initial question, and filtering out sentences whose number of entity elements is less than a specified number, to obtain a plurality of candidate questions corresponding to the initial question. This implementation associates as many entities as possible so as to determine more representative candidate questions and improve the accuracy of intention recognition. The specified number can be set as required, for example 2; the present disclosure does not limit the specific value.
Taking the scenario of fig. 6 as an example, if the specified number is set to 2, sentences with fewer than 2 elements are filtered out; that is, a candidate question containing only one element (a single entity) involves no entity combination, is regarded as not representative, and can be filtered out. Candidate questions 1, 6, 7, and 8 in fig. 6 each include only one entity and could therefore be filtered out. However, since candidate question 1 includes only the "change date", and the reference example sentences of the intent "ticket change" contain a first reference example sentence with exactly that entity, candidate question 1 may be retained and only candidate questions 6, 7, and 8 are filtered out.
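A minimal sketch of this filter, under the assumption that a candidate question is kept either when it combines at least the specified number of entities or when its exact entity combination already appears in some reference example sentence (the exception discussed above for candidate question 1):

```python
def filter_by_entity_count(candidates, reference_combinations, min_entities=2):
    """candidates: (candidate question, set of entity names) pairs.
    reference_combinations: entity-name combinations observed in the configured
    reference example sentences, e.g. {frozenset({"change date"})}."""
    kept = []
    for text, names in candidates:
        if len(names) >= min_entities or frozenset(names) in reference_combinations:
            kept.append((text, names))
    return kept

candidates = [
    ("Change my ticket to $change_date$", {"change date"}),             # kept: matches a reference
    ("Book a ticket to $destination$ for $departure_time$", {"destination", "departure time"}),
    ("Book a ticket to $destination$", {"destination"}),                # filtered out
]
print(filter_by_entity_count(candidates, {frozenset({"change date"})}))
```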
In a possible implementation, the step S44 may include:
and respectively configuring the elements in each candidate entity set at the corresponding positions of the initial question, and filtering out sentences whose entity combination does not appear in any reference example sentence pre-configured with a definite intention, to obtain a plurality of candidate questions corresponding to the initial question. This implementation ensures that each retained entity combination matches an entity combination already present in the reference example sentences, so that intention conflicts are filtered out and the accuracy of entity combination is improved. A sentence whose entity combination does not appear in any pre-configured reference example sentence is regarded as an intention conflict that does not meet the requirements of the actual scenario and is therefore filtered out.
Taking the scenario of fig. 6 as an example, candidate questions 2 and 3 both involve intention conflicts. Candidate question 2 combines the "change date" with the "destination", and candidate question 3 combines the "change date" with the "departure place"; neither combination appears in the reference example sentences of the intent "ticket change", so both can be filtered out.
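The intent-conflict filter could be sketched as follows, assuming each intent's reference example sentences have already been reduced to the entity combinations they contain:

```python
def filter_intent_conflicts(candidates, intent_reference_combinations):
    """Drop candidate questions whose entity combination never occurs in any
    reference example sentence of any configured intent."""
    allowed = {
        frozenset(combo)
        for combos in intent_reference_combinations.values()
        for combo in combos
    }
    return [(text, names) for text, names in candidates if frozenset(names) in allowed]

# Hypothetical configuration mirroring fig. 6.
intent_refs = {
    "ticket change":  [{"flight number", "change date"}, {"change date"}],
    "ticket booking": [{"departure time", "destination"}, {"departure time", "departure place"}],
}
candidates = [
    ("Change my $change_date$ ticket to $destination$", {"change date", "destination"}),   # conflict
    ("Book a ticket to $destination$ for $departure_time$", {"departure time", "destination"}),
]
print(filter_intent_conflicts(candidates, intent_refs))
```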
Fig. 7 schematically shows a schematic diagram of a candidate question filtering result according to an embodiment of the present disclosure. Referring to fig. 7, after filtering the 8 question candidates obtained in fig. 6 according to the above method, 3 final question candidates, i.e., question candidates 1, 4, and 5, are obtained. Further, entities in the candidate question sentence can be replaced by entity symbols, so that subsequent coding and model training are facilitated.
In a possible implementation, the step S44 may include:
and respectively configuring the elements in each candidate entity set at the corresponding positions of the initial question, and, when the total number of obtained sentences reaches a preset upper limit, stopping at the sentences obtained so far and randomly selecting a plurality of them as the candidate questions corresponding to the initial question.
The situation in which the total number of obtained sentences reaches the preset upper limit is usually an extreme case indicating a combinatorial explosion, which easily occurs, for example, for entities closely related to numbers such as identification numbers or train ticket numbers. To avoid wasting resources, generation of new sentences can be stopped at this point, that is, no further entity combination is performed; instead, a plurality of sentences are randomly selected from the sentences obtained so far as the candidate questions corresponding to the initial question. The number of randomly selected sentences may be set as required, for example 3 or 5, and the embodiments of the present disclosure do not limit the specific value.
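A sketch of this cap, with the upper limit and the sample size treated as configurable assumptions rather than values fixed by the disclosure:

```python
import random

def generate_with_cap(entity_sets, build_sentence, max_total=1000, sample_size=5, seed=0):
    """Stop combining entities once the preset upper limit is reached, then randomly
    sample a few of the sentences generated so far as the candidate questions."""
    sentences = []
    for combo in entity_sets:
        if len(sentences) >= max_total:      # combination explosion: stop generating
            break
        sentences.append(build_sentence(combo))
    if len(sentences) >= max_total:
        random.seed(seed)
        return random.sample(sentences, min(sample_size, len(sentences)))
    return sentences

# Toy usage with a tiny cap to show the sampling branch.
sets_ = [("id number",), ("id number", "train number")]
print(generate_with_cap(sets_, lambda combo: " + ".join(combo), max_total=2, sample_size=1))
```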
In a possible implementation manner, the step S43 may further include:
for a candidate entity having a plurality of entity values, obtaining the character string length of each entity value corresponding to the candidate entity, determining the start position of each entity value in the initial question, and screening out the entity value to be used for combination according to the character string length and the start position.
Screening out the entity value for combination according to the character string length and the start position may specifically include:
if the character string lengths of the entity values of the candidate entity are different, selecting the entity value with the longest character string for combination; if the character string lengths are the same and the character strings of the entity values overlap, selecting the entity value whose start position comes first for combination.
In other words, the entity value with the longest character string is preferentially selected for combination, and when the character string lengths are the same, the entity value with the earliest start position is selected.
Fig. 8 schematically shows a diagram of screening entity values according to an embodiment of the present disclosure. Referring to fig. 8, in a shopping-guide scenario, the configurator configures an intent "consult mobile phone details", whose associated entities include the entity "mobile phone model" and the like. The initial question "I want to know how the configuration of the iphone12 ProMax is" is obtained in the first turn of the multi-turn dialog. The enumerated entity "mobile phone model" is identified from this initial question, and three entity values can be extracted: iphone12, iphone12Pro, and iphone12ProMax. When the final entity value is selected for combination, since the character string lengths of the three entity values are different, the entity value iphone12ProMax with the longest character string is selected for combination; that is, entity-combination candidate question 3 is retained, and candidate questions 1 and 2 are filtered out.
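The screening rule could look like the sketch below: the sort key orders candidate entity values by descending character string length and then by ascending start position, which prefers the longest value and, on ties, the earliest one.

```python
def select_entity_value(matches):
    """matches: (entity value, start position) candidates for one candidate entity."""
    return min(matches, key=lambda m: (-len(m[0]), m[1]))

matches = [("iphone12", 30), ("iphone12Pro", 30), ("iphone12ProMax", 30)]
print(select_entity_value(matches))   # ('iphone12ProMax', 30): the longest value wins
```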
In a possible implementation manner, expanding the entity value of the descriptive entity in the reference example sentence in step S45 to obtain an extended example sentence may include:
acquiring other entity values except the entity value currently appearing in the reference example sentence from a plurality of entity values configured for the description type entity in advance; and sequentially replacing the current entity value in the reference example sentence with the obtained other entity values to respectively obtain the corresponding extended example sentences.
In the embodiment of the present disclosure, the candidate questions obtained based on the preset, enumerated, and regular entities are used as sentences to be matched for intention recognition. For descriptive entities, extended example sentences are obtained by expanding the phrasing of the reference example sentences of the configured intents, and these extended example sentences are also used as sentences to be matched, thereby enriching the sentences to be matched and further improving the accuracy of intention recognition.
Fig. 9 schematically shows a diagram of descriptive entity expansion according to an embodiment of the present disclosure. Referring to fig. 9, the intent "self-service health diagnosis" configured in the medical scenario includes the reference example sentence: "I have had a cough for a week and it has not gotten better". Here "cough" is configured as a descriptive entity whose entity name is "user symptom" and whose configured entity values include: cough, runny nose, insomnia, and headache. For the descriptive entity "user symptom", all configured entity values are enumerated, and the current entity value "cough" in the reference example sentence is replaced in turn with the other 3 entity values, thereby obtaining 3 extended example sentences. After these 3 extended example sentences are supplemented into the intent, the reference example sentences are expanded from 1 to 4, covering all the enumerated user symptoms, and are used as sentences to be matched for intention recognition.
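A sketch of the expansion, assuming the configured entity values and the value currently appearing in the reference example sentence are known:

```python
def expand_reference_sentence(reference, configured_values, current_value):
    """Replace the descriptive entity value in the reference example sentence with
    each of the other configured values to obtain the extended example sentences."""
    return [reference.replace(current_value, other)
            for other in configured_values if other != current_value]

reference = "I have had a cough for a week and it has not gotten better"
values = ["cough", "runny nose", "insomnia", "headache"]
print(expand_reference_sentence(reference, values, "cough"))   # 3 extended example sentences
```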
The method provided by the embodiment of the present disclosure may be executed by a first recognition model trained in advance; before the multi-turn dialog begins, the first recognition model is trained using triplet training data constructed under a pairwise framework. A triplet includes a first text string, a second text string, and a third text string, and the training objective is that the similarity between the first text string and the second text string be higher than the similarity between the first text string and the third text string.
FIG. 10 schematically shows a flow diagram for training the first recognition model according to an embodiment of the present disclosure. Referring to fig. 10, before the multi-turn dialog begins, training the first recognition model using triplet training data constructed under the pairwise framework may include the following steps:
S101: constructing triplet training data containing entities by using the pairwise framework;
Here, the pairwise framework uses a ternary training mode, so the triplet training data needs to be constructed in advance. Of course, other frameworks may also be adopted, such as the pointwise (single-document) method, which uses a binary training mode; the embodiments of the present disclosure do not specifically limit this. The following description takes the pairwise framework as an example.
S102: reserving a plurality of placeholder entities in the dictionary, and setting corresponding placeholder codes according to the order of the placeholder entities in the dictionary;
In general, since it is almost impossible to collect all entities of all domains, and users may configure entities by themselves, the embodiments of the present disclosure regard an entity as a symbol that has no linguistic meaning but is significant for text similarity calculation. The placeholder entities reserved in the dictionary are not fixed to any particular entity; all entities can share them. The number of reserved placeholder entities may be set as required, such as 8, 10, or 20.
For example, 10 placeholder entities are reserved in the dictionary; no entity names or entity types need to be configured, and only the corresponding placeholder codes are set. The placeholder codes may be 10 consecutive codes, such as id1 to id10, with the 10 placeholder entities corresponding one-to-one to the 10 placeholder codes.
S103: sorting the entities in the constructed training data, and sequentially replacing the sorted entities with the corresponding placeholder codes according to their order;
The rule for sorting the entities can be set as required; any specific rule may be chosen as long as the entities are sorted according to a consistent strategy, for example by the initial letters of their names. Taking the 10 placeholder entities and 10 placeholder codes as an example, if the constructed training data contains 3 entities, the sorted entities are replaced in order with the first 3 placeholder codes in the dictionary.
S104: converting other characters in the training data into character codes according to the conversion function;
S105: training the first recognition model with the encoded training data.
In this process, the conversion from entities to codes is completed based on the placeholder entities reserved in the dictionary and their corresponding placeholder codes, so the first recognition model can learn the general meaning of an entity, that is, the importance of the entity rather than the linguistic meaning of each individual entity. This ensures that the first recognition model migrates well to new scenarios and achieves low-threshold migration across scenarios. Furthermore, the training data is constructed as triplets under the pairwise framework and training is then performed on the encoded training data, so the first recognition model can reach the learning objective of the pairwise framework, which provides a strong guarantee for subsequent intention recognition and a reliable basis for improving its accuracy.
In the disclosed embodiment, the triplet training data may be represented as (S1, S2, S3), where S1 is the first text string, S2 is the second text string, and S3 is the third text string. The learning objective of the pairwise framework is that the similarity S12 between the first text string S1 and the second text string S2 be higher than the similarity S13 between the first text string S1 and the third text string S3, expressed as:

S12 > S13

where the similarity of two text strings is defined as:

Sij = Sim(Si, Sj)
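One way to realize this objective during training (an assumption, since the disclosure does not fix a particular loss) is a margin-based hinge loss over the two similarities:

```python
def pairwise_hinge_loss(sim_12, sim_13, margin=0.2):
    """Penalize a triplet whenever Sim(S1, S2) does not exceed Sim(S1, S3) by the margin."""
    return max(0.0, margin - (sim_12 - sim_13))

print(pairwise_hinge_loss(0.9, 0.4))   # 0.0: the objective S12 > S13 is already satisfied
print(pairwise_hinge_loss(0.5, 0.6))   # 0.3: S1 is closer to S3, so the loss is positive
```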
in a possible implementation, the step S101 may include:
firstly, a plain-text triplet (S1, S2, S3) is constructed; if the first text string S1 and the second text string S2 contain the same entity and the first text string S1 and the third text string S3 do not contain the same entity, a specified number of entities can be randomly selected from the same entities contained in S1 and S2 and replaced with entity symbols, so as to obtain triplet training data containing entities. This way of constructing training data lets the first recognition model learn that entities are important words in a text string and, further, that two text strings containing the same entity are more similar.
For example, a plain-text triplet is ("I want to book a ticket for today", "Book a ticket for today", "I want to book a ticket for the afternoon"), where S1 and S2 contain the same entity "today" and S3 does not contain the entity "today". The same entity "today" is then replaced with an entity symbol, resulting in a triplet containing entities ("I want to book a ticket for $time$", "Book a ticket for $time$", "I want to book a ticket for the afternoon"). The entity symbol may be set as required, as long as it represents the corresponding entity; its specific content is not limited.
In a possible implementation, the step S101 may include:
firstly, a plain-text triplet (S1, S2, S3) is constructed; if the first text string S1 and the second text string S2 contain a plurality of identical entities, a specified number of entities are selected from those identical entities, entities different from the selected ones are found in S3, and all of them are replaced with entity symbols to obtain triplet training data containing entities. This way of constructing training data lets the first recognition model learn that text strings sharing the same entities are more similar than text strings with different entities.
For example, a plain-text triplet is ("I want to book today's air ticket", "Book today's ticket", "I want to book an afternoon train ticket"), where S1 and S2 contain the two identical entities "today" and "ticket". The identical entity "today" is selected in S1 and S2 and replaced with the entity symbol $time$, and an entity different from "today" is found in S3, such as the "train ticket", which is replaced with the entity symbol $ticket_type$, resulting in a triplet containing entities ("I want to book $time$'s air ticket", "Book $time$'s ticket", "I want to book an afternoon $ticket_type$"). The entity symbol may be set as required, as long as it represents the corresponding entity; its specific content is not limited.
In a possible implementation, the step S101 may include:
first, a plain-text triplet (S1, S2, S3) is constructed; if the first text string S1 and the second text string S2 contain a plurality of identical entities, at least two of those identical entities are selected and replaced with entity symbols. If an entity in S3 is the same as one of the at least two entities, it is replaced with the corresponding entity symbol as well, giving triplet training data containing entities. This way of constructing training data lets the first recognition model learn that text strings sharing a larger number of identical entities are more similar than text strings sharing fewer.
For example, a plain-text triplet is ("I want to book today's ticket", "Book today's ticket", "I want to book the day after tomorrow's ticket"), where S1 and S2 contain the two identical entities "today" and "ticket", which are replaced with the corresponding entity symbols $time$ and $ticket_type$ respectively. Since the entity "ticket" in S3 is the same as one of these two entities, it is also replaced with the entity symbol $ticket_type$, resulting in a triplet containing entities ("I want to book $time$'s $ticket_type$", "Book $time$'s $ticket_type$", "I want to book the day after tomorrow's $ticket_type$"). The entity symbol may be set as required, as long as it represents the corresponding entity; its specific content is not limited.
The above three ways of constructing triplet training data containing entities may be applied individually or, preferably, all in sequence, so that the training data is expanded to the greatest extent and the training effect of the first recognition model is further improved. Moreover, this way of constructing training data requires neither collecting additional corpora nor manual labeling, which greatly saves manpower.
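A sketch of the first construction way above, with the example sentences and entity symbols as placeholders for whatever the configured corpus actually provides:

```python
def build_entity_triplet(s1, s2, s3, shared_entities, symbol_for):
    """Replace entities shared by S1 and S2 (and absent from S3) with entity symbols,
    so the model can learn that shared entities make two text strings more similar."""
    for entity in shared_entities:
        symbol = symbol_for[entity]
        s1, s2 = s1.replace(entity, symbol), s2.replace(entity, symbol)
    return s1, s2, s3

print(build_entity_triplet(
    "I want to book a ticket for today",
    "Book a ticket for today",
    "I want to book a ticket for the afternoon",
    shared_entities=["today"],
    symbol_for={"today": "$time$"},
))
```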
In the embodiment of the present disclosure, the encoded training data is obtained after steps S103 and S104 are performed. A specific example follows. Suppose the constructed training data (S1, S2, S3) contains three entities: brand, product name, and quantity, which are replaced with the entity symbols $brand$, $product$, and $num$ respectively. The substituted S1 and S2 are as follows:
S1 = "the production date of the $brand$ $product$";
S2 = "I want to buy $num$ $brand$ facial masks";
The three entities are sorted according to the preset rule; the resulting order is $brand$, $num$, $product$, and they are sequentially replaced with the corresponding placeholder codes 1, 2, and 3 in the dictionary. The other characters are converted into character codes according to the conversion function, and the codes corresponding to S1 and S2 are obtained as follows:
The code corresponding to S1 is "1, 3, f(of), f(production), f(date)", i.e., the placeholder codes of $brand$ and $product$ followed by one character code per remaining character of the original text;
the code corresponding to S2 is "f(I), f(want), f(buy), 2, f(a), 1, f(of), f(facial), f(mask)", i.e., the character codes of the surrounding characters interleaved with the placeholder codes 2 ($num$) and 1 ($brand$);
wherein f(x) is the conversion function from a character to its code id.
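The encoding of steps S103 and S104 could be sketched as follows; the tokenization, the placeholder codes, and the stand-in for the f(x) conversion function are all assumptions made for illustration:

```python
def encode(tokens, placeholder_codes, char_code):
    """tokens: the training sentence with entity symbols such as "$brand$" kept whole.
    placeholder_codes: sorted entity symbols mapped to the reserved codes 1..N.
    char_code: the character-to-id conversion function f(x)."""
    codes = []
    for token in tokens:
        if token in placeholder_codes:
            codes.append(placeholder_codes[token])       # entity symbol -> placeholder code
        else:
            codes.extend(char_code(ch) for ch in token)  # ordinary character -> character code
    return codes

placeholder_codes = {"$brand$": 1, "$num$": 2, "$product$": 3}   # order after sorting
char_code = lambda ch: 100 + ord(ch) % 100                       # stand-in for f(x)
print(encode(["$brand$", "$product$", "production date"], placeholder_codes, char_code))
```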
According to the embodiment of the present disclosure, through the conversion from entities to codes, the first recognition model can learn the general meaning of an entity, that is, the importance of the entity rather than the linguistic meaning of each individual entity. This ensures that the first recognition model can be migrated to new scenarios well and achieves low-threshold migration across scenarios.
In an embodiment of the present disclosure, the step S105 may include at least one of the following:
if the first text string and the second text string in the triplet contain the same entity and the first text string and the third text string do not, training the first recognition model so that the recognized similarity between the first text string and the second text string is higher than the similarity between the first text string and the third text string;
in the case that the first text string and the second text string in the triplet contain a plurality of identical entities, selecting a specified number of entities from the identical entities, finding entities different from the selected ones in the third text string of the triplet, and training the first recognition model so that the recognized similarity between the first text string and the second text string is higher than the similarity between the first text string and the third text string;
in the case that the first text string and the second text string in the triplet contain a plurality of identical entities, selecting at least two entities from the identical entities; and, if an entity in the third text string of the triplet is the same as one of the at least two entities, training the first recognition model so that the recognized similarity between the first text string and the second text string is higher than the similarity between the first text string and the third text string.
In a possible embodiment, the method further comprises at least one of:
receiving configuration information input by a configurator, generating an enumerated entity according to the configuration information, and configuring an entity name and enumerable entity values for the enumerated entity;
receiving configuration information input by a configurator, generating a regular entity according to the configuration information, and configuring an entity name and a regular expression for the regular entity;
and receiving configuration information input by a configurator, generating a descriptive entity according to the configuration information, and configuring an entity name and descriptive entity values for the descriptive entity.
In this way, configurators can set enumerated, regular, and descriptive entities according to service requirements. The application is very flexible: the model does not need to be retrained for each configuration, and a configuration can be used directly in the target service scenario and migrated to any domain, so the scenario migration capability is very strong.
Fig. 11 schematically shows an overall flow diagram of a multi-turn dialog according to an embodiment of the present disclosure. Referring to fig. 11, intention recognition for multiple rounds of dialog includes a configuration flow and a recognition flow. The configuration flow includes configuring intents and configuring entities, and the training data obtained from the configuration is used to train the first recognition model. In the recognition flow, the trained first recognition model performs intention recognition on the multi-turn dialog initiated by the user to obtain the user's intent and the related entity information, and a corresponding reply or response is then produced through dialog management and reply generation. For example, a health diagnosis result may be returned to the user in a health scenario, a shipping time may be returned in an after-sales scenario, and so on. In a ticket-booking scenario, an operation of booking an air ticket may be triggered as the response; in an invoicing scenario, an operation of issuing an invoice to the user may be triggered as the response; and so on. The intention recognition performed by the first recognition model includes: recognition of candidate entities, combination of candidate entities, and intention recognition combined with the entities. Through this flow, the intention recognition and response of the multi-turn dialog are finally completed.
Exemplary Medium
Having described the method of the exemplary embodiment of the present disclosure, the medium of the exemplary embodiment of the present disclosure is explained next with reference to fig. 12.
In some possible embodiments, various aspects of the disclosure may also be implemented as a computer-readable medium having a program stored thereon, which when executed by a processor, is used to implement steps in a method for intent recognition in multiple rounds of dialog according to various exemplary embodiments of the disclosure described in the "exemplary methods" section above of this specification.
Specifically, the processor is configured to implement the following steps when executing the program: acquiring an initial question sentence contained in a first turn of conversation in multiple turns of conversations; determining a plurality of candidate entities corresponding to the initial question; generating a candidate question based on the initial question and the plurality of candidate entities; and determining the intention recognition result of the first turn of dialogue according to the candidate question sentences and the reference example sentences which are pre-configured with definite intentions.
It should be noted that: the above-mentioned medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As shown in fig. 12, a medium 120 that can employ a portable compact disc read only memory (CD-ROM) and include a program and can be run on a device according to an embodiment of the present disclosure is described. However, the disclosure is not so limited, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take a variety of forms, including, but not limited to: an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).
Exemplary devices
Having described the media of the exemplary embodiments of the present disclosure, the apparatus of the exemplary embodiments of the present disclosure is described next with reference to fig. 13.
As shown in fig. 13, the apparatus for recognizing intent in multiple rounds of dialog according to the embodiment of the present disclosure may include:
an obtaining module 1301, configured to obtain an initial question included in a first-turn dialog in multiple turns of dialogues;
a determining module 1302 for determining a plurality of candidate entities corresponding to the initial question;
a generating module 1303, configured to generate a candidate question based on the initial question and the plurality of candidate entities;
and the identification module 1304 is used for determining the intention identification result of the first-turn dialog according to the candidate question sentences and the reference example sentences which are pre-configured with clear intentions.
In one possible implementation, the types of candidate entities include: the system comprises a preset entity, an enumerated entity, a regular entity and a descriptive entity, wherein the preset entity is a preset and directly packaged entity type, the enumerated entity is an entity type with an entity value capable of being enumerated, the regular entity is an entity type capable of inducing the entity value by using a regular expression, and the descriptive entity is an entity type for describing the attribute or the state of the object.
In one possible embodiment, the determining module is configured to:
and identifying a plurality of candidate entities corresponding to the initial question sentence by using a pre-trained first identification model, wherein the plurality of candidate entities comprise a preset entity, an enumerated entity and a regular entity.
In one possible implementation, the generating module includes:
the combination unit is used for combining the candidate entities to form a plurality of candidate entity sets corresponding to the initial question, wherein the candidate entity sets take at least one candidate entity as an element of the candidate entity sets;
and the generating unit is used for respectively configuring the elements in each candidate entity set in the corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question.
In one possible embodiment, the generation unit is configured to:
and respectively configuring the elements in each candidate entity set at the corresponding positions of the initial question, and filtering out sentences whose number of entity elements is less than a specified number, to obtain a plurality of candidate questions corresponding to the initial question.
In one possible embodiment, the generation unit is configured to:
and respectively configuring elements in each candidate entity set in corresponding positions of the initial question sentences, and filtering sentences which do not appear in the reference example sentences which are configured with definite intentions in advance to obtain a plurality of candidate question sentences corresponding to the initial question sentences.
In one possible embodiment, the generation unit is configured to:
and respectively configuring elements in each candidate entity set in corresponding positions of the initial question, and randomly selecting a plurality of sentences as candidate question sentences corresponding to the initial question sentences by taking the currently obtained sentences as a limit when the total number of the obtained sentences reaches a preset upper limit value.
In a possible embodiment, the combination unit is further configured to:
for a candidate entity having a plurality of entity values, obtaining the character string length of each entity value corresponding to the candidate entity, determining the start position of each entity value in the initial question, and screening out the entity value to be used for combination according to the character string length and the start position.
In a possible embodiment, the combination unit is specifically configured to, in filtering out the respective entity values for combining based on the string length and the starting position:
if the character string lengths of the entity values of the candidate entity are different, selecting the entity value with the longest character string length for combination;
if the string lengths of the entity values of the candidate entity are all the same, the entity value with the top initial position is selected for combination when the strings included in the entity values overlap.
In one possible embodiment, the identification module comprises:
the system comprises an extension unit, a display unit and a control unit, wherein the extension unit is used for extending an entity value of a description type entity in a reference example sentence to obtain an extension example sentence and supplementing the obtained extension example sentence into an intention if the reference example sentence which is configured with a definite intention in advance contains the reference example sentence which comprises the description type entity;
and the recognition unit is used for determining the intention recognition result of the first-turn dialog according to the candidate question sentences and the intended reference example sentences and extended example sentences.
In one possible embodiment, the expansion unit is configured to:
if a reference example sentence comprising a descriptive entity exists in a reference example sentence which is configured with a definite intention in advance, acquiring other entity values except the entity value which is currently appeared in the reference example sentence from a plurality of entity values which are configured for the descriptive entity in advance;
sequentially replacing the current entity value in the reference example sentence with the obtained other entity values to respectively obtain corresponding extended example sentences;
and supplements the resulting expanded example sentence into the intention.
In one possible embodiment, the identification module is configured to:
respectively coding the candidate question sentences and the reference example sentences which are pre-configured with definite intentions;
calculating the similarity of the coded candidate question sentences and the reference example sentences;
and determining the intention recognition result of the first turn of the conversation according to the result of the similarity calculation.
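A sketch of how the recognition module could combine these three steps, with encode and similarity standing in for the first recognition model's encoder and scoring function (the toy Jaccard similarity below is only an illustration, not the model actually used):

```python
def recognize_intent(candidate_questions, intent_examples, encode, similarity):
    """intent_examples: intent name -> its reference (and extended) example sentences."""
    best_intent, best_score, best_candidate = None, float("-inf"), None
    for candidate in candidate_questions:
        candidate_vector = encode(candidate)
        for intent, examples in intent_examples.items():
            for example in examples:
                score = similarity(candidate_vector, encode(example))
                if score > best_score:
                    best_intent, best_score, best_candidate = intent, score, candidate
    # The entity combination in best_candidate doubles as the entity recognition result.
    return best_intent, best_candidate

jaccard = lambda a, b: len(a & b) / len(a | b)
encode = lambda text: set(text.lower().split())
intents = {"ticket booking": ["book a ticket to $destination$ for $departure_time$"]}
print(recognize_intent(["Book a ticket to $destination$ for $departure_time$"], intents, encode, jaccard))
```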
In a possible embodiment, the apparatus implements functions by a first recognition model trained in advance, and further includes:
and the training module is used for adopting the matching frame to construct training data of the triples to train the first recognition model before the start of the multi-round conversation.
In one possible implementation, the triplet includes a first text string, a second text string, and a third text string, and the training is targeted to have a similarity between the first text string and the second text string that is higher than a similarity between the first text string and the third text string.
In one possible embodiment, the training module is configured to:
before the multi-turn conversation starts, if the first text string and the second text string in the triple contain the same entity and the first text string and the third text string do not contain the same entity, training the first recognition model to recognize the similarity of the first text string and the second text string, which is higher than the similarity of the first text string and the third text string.
In one possible embodiment, the training module is configured to:
before starting a plurality of rounds of conversations, under the condition that a first text string and a second text string in a triple contain a plurality of identical entities, selecting a specified number of entities from the plurality of identical entities;
and finding out entities different from the specified number of entities in a third text string of the triple, and training a first recognition model to recognize the similarity between the first text string and the second text string, wherein the similarity is higher than the similarity between the first text string and the third text string.
In one possible embodiment, the training module is configured to:
selecting at least two entities from the plurality of identical entities if the first text string and the second text string in the triple contain the plurality of identical entities before the start of the multi-turn conversation;
if an entity exists in the third text string of the triple and the entity is the same as one of the at least two entities, training the first recognition model to recognize the similarity between the first text string and the second text string, which is higher than the similarity between the first text string and the third text string.
In a possible embodiment, the above apparatus further comprises:
the coding module is used for coding the constructed training data before training the first recognition model according to the following modes:
reserving a plurality of place-occupying entities in the dictionary, and setting corresponding place-occupying codes according to the ranking sequence of the corresponding place-occupying entities in the dictionary;
sequencing the entities in the constructed training data, and sequentially replacing the sequenced entities with corresponding placeholders according to a ranking sequence;
and converting other characters in the training data into character codes according to the conversion function.
In one possible embodiment, the identification module is further configured to:
and, after the similarity calculation, determining the combination of entities in the candidate question with the highest similarity as the entity recognition result of the first-turn dialog.
In a possible embodiment, the above apparatus further comprises:
the collection module is used for collecting other entities except the entity combination in the conversation after the first round of conversation;
and the response module is used for executing reply responses of multiple rounds of conversations according to the intention recognition result, the entity combination and other entities.
In a possible implementation, the apparatus further includes a configuration module configured to perform configuration in at least one of the following manners:
receiving configuration information input by a configuration staff, generating an enumeration entity according to the configuration information, and configuring an entity name and an enumerable entity value for the enumeration entity;
receiving configuration information input by a configuration worker, generating a regular entity according to the configuration information, and configuring an entity name and a regular expression for the regular entity;
and receiving configuration information input by a configurator, generating a descriptive entity according to the configuration information, and configuring an entity name and a descriptive entity value for the descriptive entity.
According to the apparatus provided by the embodiment of the present disclosure, an initial question included in the first turn of a multi-turn dialog is obtained, a plurality of candidate entities corresponding to the initial question are determined, candidate questions are generated based on the initial question and the plurality of candidate entities, and the intention recognition result of the first-turn dialog is determined according to the candidate questions and the reference example sentences pre-configured with definite intentions. Because intention recognition is performed on the basis of recognizing candidate entities, intention recognition and entity recognition are effectively combined; compared with performing intention recognition alone, the factors influenced by entities are taken into account, and the accuracy of intention recognition in multi-turn dialogs is greatly improved.
Exemplary computing device
Having described the methods, media, and apparatus of the exemplary embodiments of the present disclosure, a computing device of the exemplary embodiments of the present disclosure is described next with reference to fig. 14.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
In some possible implementations, a computing device according to embodiments of the present disclosure may include at least one processing unit and at least one memory unit. Wherein the storage unit stores program code that, when executed by the processing unit, causes the processing unit to perform steps in a method of intent recognition in multiple dialogs according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above in this specification.
A computing device 140 according to such an embodiment of the present disclosure is described below with reference to fig. 14. The computing device 140 shown in fig. 14 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the disclosure.
As shown in fig. 14, computing device 140 is in the form of a general purpose computing device. Components of computing device 140 may include, but are not limited to: the at least one processing unit 1401 and the at least one memory unit 1402 are connected to a bus 1403 which connects different system components (including the processing unit 1401 and the memory unit 1402).
The bus 1403 includes a data bus, a control bus, and an address bus.
The storage unit 1402 may include readable media in the form of volatile memory, such as Random Access Memory (RAM)14021 and/or cache memory 14022, and may further include readable media in the form of non-volatile memory, such as Read Only Memory (ROM) 14023.
Storage unit 1402 may also include a program/utility 14025 having a set (at least one) of program modules 14024, such program modules 14024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Computing device 140 may also communicate with one or more external devices 1404 (e.g., keyboard, pointing device, etc.). Such communication may occur via an input/output (I/O) interface 1405. Also, computing device 140 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through network adapter 1406. As shown in FIG. 14, the network adapter 1406 communicates with the other modules of the computing device 140 over a bus 1403. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computing device 140, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
It should be noted that although several units/modules or sub-units/sub-modules of the apparatus for intent recognition in multiple rounds of dialog are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided into a plurality of units/modules.
Further, while the operations of the disclosed methods are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the present disclosure is not limited to the particular embodiments disclosed, and the division into aspects is for convenience of description only; it does not imply that features in these aspects cannot be combined to advantage. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A method of intent recognition in a multi-turn dialog, comprising:
acquiring an initial question sentence contained in a first turn of conversation in multiple turns of conversations;
determining a plurality of candidate entities corresponding to the initial question;
generating a candidate question based on the initial question and the plurality of candidate entities;
and determining the intention recognition result of the first turn of dialogue according to the candidate question sentences and the reference example sentences which are pre-configured with definite intentions.
2. The method of claim 1, wherein the type of the candidate entity comprises: the system comprises a preset entity, an enumerated entity, a regular entity and a descriptive entity, wherein the preset entity is a preset and directly packaged entity type, the enumerated entity is an entity type with an entity value capable of being enumerated, the regular entity is an entity type with an entity value capable of being induced by a regular expression, and the descriptive entity is an entity type for describing the attribute or the state of things.
3. The method of claim 1, wherein generating candidate question sentences based on the initial question sentence and the plurality of candidate entities comprises:
combining the candidate entities to form a plurality of candidate entity sets corresponding to the initial question, wherein the candidate entity sets take at least one of the candidate entities as an element thereof;
and respectively configuring the elements in each candidate entity set in corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question.
4. The method of claim 3, wherein the respectively arranging the elements in the respective candidate entity sets in corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question comprises:
and respectively configuring the elements in each candidate entity set at the corresponding positions of the initial question, and filtering out sentences whose number of elements is less than a specified number, to obtain a plurality of candidate questions corresponding to the initial question.
5. The method according to claim 1, wherein the determining the intention recognition result of the first-turn dialog according to the candidate question sentences and reference example sentences which are pre-configured with definite intentions comprises:
if a reference example sentence comprising a description type entity exists in a reference example sentence which is configured with a definite intention in advance, expanding the entity value of the description type entity in the reference example sentence to obtain an expanded example sentence, and supplementing the obtained expanded example sentence into the intention;
and determining the intention recognition result of the first-turn dialog according to the candidate question sentences and the reference example sentences and the extended example sentences of the intention.
6. The method of claim 1, wherein the method is performed by a pre-trained first recognition model, and wherein, prior to the start of the multi-turn dialog, the first recognition model is trained using triplet training data constructed under a pairwise framework.
7. The method of claim 2, further comprising at least one of:
receiving configuration information input by a configuration staff, generating an enumeration entity according to the configuration information, and configuring an entity name and an enumerable entity value for the enumeration entity;
receiving configuration information input by a configuration worker, generating a regular type entity according to the configuration information, and configuring an entity name and a regular expression for the regular type entity;
receiving configuration information input by a configuration person, generating a descriptive entity according to the configuration information, and configuring an entity name and a descriptive entity value for the descriptive entity.
8. An apparatus for intent recognition in a multi-turn dialog, comprising:
the acquisition module is used for acquiring initial question sentences contained in the first-turn dialog in the multiple rounds of dialogues;
a determining module for determining a plurality of candidate entities corresponding to the initial question;
a generating module for generating candidate question sentences based on the initial question sentences and the plurality of candidate entities;
and the recognition module is used for determining the intention recognition result of the first-turn dialog according to the candidate question sentences and the reference example sentences which are pre-configured with definite intentions.
9. A medium storing a computer program, characterized in that the program, when being executed by a processor, carries out the method according to any one of claims 1-7.
10. A computing device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-7.
CN202110571576.3A 2021-05-25 2021-05-25 Method, medium, apparatus and computing device for intent recognition in multiple rounds of conversations Active CN113157893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110571576.3A CN113157893B (en) 2021-05-25 2021-05-25 Method, medium, apparatus and computing device for intent recognition in multiple rounds of conversations


Publications (2)

Publication Number Publication Date
CN113157893A true CN113157893A (en) 2021-07-23
CN113157893B CN113157893B (en) 2023-12-15

Family

ID=76877350


Country Status (1)

Country Link
CN (1) CN113157893B (en)


Also Published As

Publication number Publication date
CN113157893B (en) 2023-12-15

Similar Documents

Publication Publication Date Title
CN112131366B (en) Method, device and storage medium for training text classification model and text classification
US11599808B2 (en) Artificial intelligence assisted hybrid enterprise/candidate employment assistance platform
Griol et al. A statistical approach to spoken dialog systems design and evaluation
US11581069B2 (en) Intelligent generation of customized questionnaires
CN110377716A (en) Exchange method, device and the computer readable storage medium of dialogue
CN112560496A (en) Training method and device of semantic analysis model, electronic equipment and storage medium
US11763212B2 (en) Artificially intelligent computing engine for travel itinerary resolutions
US11645561B2 (en) Question answering system influenced by user behavior and text metadata generation
CN113268610B (en) Intent jump method, device, equipment and storage medium based on knowledge graph
CN106528759A (en) Information processing method and device for intelligent question-answering system
JP2022516227A (en) Natural language solution
US20220284171A1 (en) Hierarchical structure learning with context attention from multi-turn natural language conversations
CN112199486A (en) Task type multi-turn conversation method and system for office scene
CN114357125A (en) Natural language identification method, device and equipment in task type dialogue system
CN115688920A (en) Knowledge extraction method, model training method, device, equipment and medium
CN112131372A (en) Knowledge-driven conversation strategy network optimization method, system and device
CN114547270B (en) Text processing method, training method, device and equipment for text processing model
CN116959433A (en) Text processing method, device, electronic equipment and storage medium
CN114399772A (en) Sample generation, model training and trajectory recognition methods, devices, equipment and medium
US20200143273A1 (en) Intelligent recommendation of convenient event opportunities
CN117742856A (en) Auxiliary terminal service processing method and device
CN113157893B (en) Method, medium, apparatus and computing device for intent recognition in multiple rounds of conversations
CN116881876A (en) Rights management method, device, equipment and storage medium
CN116108918A (en) Training method and related device for dialogue pre-training model
CN115062629A (en) Session information identification method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant