CN113157893B - Method, medium, apparatus and computing device for intent recognition in multiple rounds of conversations

Info

Publication number
CN113157893B
CN113157893B (application CN202110571576.3A)
Authority
CN
China
Prior art keywords
entity
entities
candidate
text string
similarity
Prior art date
Legal status
Active
Application number
CN202110571576.3A
Other languages
Chinese (zh)
Other versions
CN113157893A (en)
Inventor
沙雨辰
俞霖霖
胡光龙
汪源
刘秀颖
Current Assignee
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202110571576.3A
Publication of CN113157893A
Application granted
Publication of CN113157893B
Active legal status (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

Embodiments of the present disclosure provide a method, medium, apparatus, and computing device for intent recognition in a multi-round dialog. The method comprises the following steps: acquiring an initial question contained in a first-round dialog of the multi-round dialog; determining a plurality of candidate entities corresponding to the initial question; generating candidate questions based on the initial question and the plurality of candidate entities; and determining an intent recognition result of the first-round dialog according to the candidate questions and reference example sentences pre-configured with explicit intents. Because intent recognition is performed on the basis of the recognized candidate entities, intent recognition and entity recognition are effectively combined; compared with performing intent recognition alone, the influence of entities is taken into account, so the accuracy of intent recognition in multi-round dialogs is greatly improved.

Description

Method, medium, apparatus and computing device for intent recognition in multiple rounds of conversations
Technical Field
Embodiments of the present disclosure relate to the field of intent recognition technology, and more particularly, to methods, media, apparatuses, and computing devices for intent recognition in multiple rounds of conversations.
Background
This section is intended to provide a background or context to the embodiments of the disclosure recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
Multi-round dialogs generally fall into two categories: chit-chat multi-round dialogs and task-oriented multi-round dialogs. Chit-chat multi-round dialogs are typically found in chatbots, such as Microsoft XiaoIce and Xiao AI, which can hold open-domain, free-ranging conversations with users on unrestricted topics. Task-oriented multi-round dialogs are commonly found in customer-service robots, such as AliMe and Netease Qiyu, which help users solve a specific problem or achieve a specific goal in a given domain. Users of task-oriented multi-round dialogs typically interact with the system with one or more explicit intents, such as "reserving an air ticket" or "self-help health diagnosis". Such an intent usually cannot be satisfied by a single question-and-answer exchange, i.e. a single-round dialog; instead, the user's intent and the related attribute information are confirmed and completed step by step through multiple rounds of dialog between the user and the customer-service robot, so that the final goal can be achieved. For example, under the "reservation ticket" intent, the multi-round dialog system needs to determine that the user's intent is to reserve an air ticket, and then collect related information such as the departure place, destination, departure time, and cabin class, so as to finally complete the ticket-reservation service for the user.
Currently, existing multi-round dialog systems typically determine the intent with which the user enters the conversation in the first round of the dialog, for example confirming whether the user needs a "reservation ticket" or a "ticket change"; slot filling is then performed to collect the other information required to complete the user's intent, such as the "departure place" and "destination" entity information that the "reservation ticket" intent requires.
However, the above prior art performs intent recognition and entity recognition as two independent steps, so the correctness of the entire dialog flow depends on whichever step is performed first. If the earlier intent recognition step is wrong, the subsequent entity recognition task becomes meaningless. For example, in an airport scenario with the two intents "reservation ticket" and "ticket change", if intent recognition mistakes the user's "reservation ticket" intent for "ticket change", the entity extracted afterwards will wrongly be the "change date".
Disclosure of Invention
To this end, the present disclosure provides a method and apparatus for intent recognition in a multi-round dialog.
In a first aspect of embodiments of the present disclosure, there is provided a method of intent recognition in a multi-round dialog, comprising:
Acquiring an initial question included in a first-round dialogue in multiple rounds of dialogues;
determining a plurality of candidate entities corresponding to the initial question;
generating a candidate question based on the initial question and the plurality of candidate entities;
and determining the intention recognition result of the first-round dialogue according to the candidate question sentence and the reference example sentence which is pre-configured with the explicit intention.
In one embodiment of the disclosure, the types of candidate entities include: a preset-type entity, an enumeration-type entity, a regular-type entity, and a description-type entity, wherein the preset-type entity is a preconfigured and directly packaged entity type, the enumeration-type entity is an entity type whose entity values can be enumerated, the regular-type entity is an entity type whose entity values can be summarized by a regular expression, and the description-type entity is an entity type describing object attributes or states.
In one embodiment of the disclosure, the determining a plurality of candidate entities corresponding to the initial question includes:
and identifying a plurality of candidate entities corresponding to the initial question by using a first pre-trained identification model, wherein the plurality of candidate entities comprise preset type entities, enumeration type entities and regular type entities.
In one embodiment of the disclosure, the generating a candidate question based on the initial question and the plurality of candidate entities includes:
combining the plurality of candidate entities to form a plurality of candidate entity sets corresponding to the initial question, wherein the candidate entity sets have at least one candidate entity as an element thereof;
elements in each candidate entity set are respectively configured in corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question.
In one embodiment of the present disclosure, the configuring the elements in each candidate entity set in the corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question includes:
and respectively configuring elements in each candidate entity set in corresponding positions of the initial question, and filtering out sentences with the number of elements less than the designated number to obtain a plurality of candidate question sentences corresponding to the initial question.
In one embodiment of the present disclosure, the configuring the elements in each candidate entity set in the corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question includes:
And respectively configuring elements in each candidate entity set in the corresponding positions of the initial question, and filtering out sentences which do not appear in the reference example sentences which are pre-configured with clear intention to obtain a plurality of candidate question sentences corresponding to the initial question.
In one embodiment of the present disclosure, the configuring the elements in each candidate entity set in the corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question includes:
and respectively configuring the elements in each candidate entity set at the corresponding positions of the initial question, restricting generation to the sentences already obtained once their total number reaches a preset upper limit, and randomly selecting a plurality of sentences from them as the candidate questions corresponding to the initial question.
In one embodiment of the present disclosure, before the combining the plurality of candidate entities to form the plurality of candidate entity sets corresponding to the initial question, the method further includes:
and for the candidate entity with a plurality of entity values, acquiring the character string length of each entity value corresponding to the candidate entity, determining the initial position of the corresponding entity value in the initial question, and screening out the corresponding entity value for combination according to the character string length and the initial position.
In one embodiment of the present disclosure, the screening the corresponding entity values for combining according to the string length and the start position includes:
if the character string lengths of the plurality of entity values of the candidate entity are different, selecting the entity value with the longest character string for combination;
if the character strings of the plurality of entity values of the candidate entity have the same length, and the character strings contained in the plurality of entity values overlap, selecting the entity value whose start position is earliest for combination.
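For illustration only, the screening rule described above can be sketched in Python as follows; the EntityValue structure and the function name are assumptions made for this example and are not prescribed by the disclosure:

    from dataclasses import dataclass

    @dataclass
    class EntityValue:
        text: str    # matched entity value, e.g. "Guangzhou"
        start: int   # start position of the match in the initial question

    def screen_entity_value(values: list[EntityValue]) -> EntityValue:
        """Pick one entity value of a candidate entity for combination."""
        max_len = max(len(v.text) for v in values)
        longest = [v for v in values if len(v.text) == max_len]
        if len(longest) == 1:
            # Lengths differ: keep the entity value with the longest character string.
            return longest[0]
        # Same length and overlapping strings: keep the value with the earliest start position.
        return min(longest, key=lambda v: v.start)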
In one embodiment of the present disclosure, the determining, according to the candidate question sentence and a reference example sentence configured with an explicit intention in advance, an intention recognition result of the first-round dialog includes:
if a reference example sentence which is pre-configured with an explicit intention exists, expanding the entity value of the descriptive entity in the reference example sentence to obtain an expanded example sentence, and supplementing the obtained expanded example sentence into the intention;
and determining an intention recognition result of the first-round dialogue according to the candidate question sentence, the reference example sentence and the extended example sentence of the intention.
In one embodiment of the present disclosure, the expanding the entity value of the descriptive entity in the reference example sentence to obtain an expanded example sentence includes:
Acquiring other entity values except the entity value currently appearing in the reference example sentence from a plurality of entity values configured for the descriptive entity in advance;
and sequentially replacing the current entity value in the reference example sentence with the other acquired entity values to respectively obtain corresponding extended example sentences.
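For illustration only, this expansion step can be sketched as follows, assuming the reference example sentence is handled as a plain string and the descriptive entity's configured values are given as a list (both assumptions for the example):

    def expand_reference_sentence(sentence: str, current_value: str,
                                  configured_values: list[str]) -> list[str]:
        """Replace the descriptive-entity value appearing in a reference example sentence
        with each of the other configured entity values to obtain extended example sentences."""
        others = [v for v in configured_values if v != current_value]
        return [sentence.replace(current_value, v) for v in others]

    # Example: expand_reference_sentence("What medicine should I take for a cough?",
    #                                    "cough", ["cough", "runny nose", "headache"])
    # returns two extended example sentences, one for "runny nose" and one for "headache".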
In one embodiment of the present disclosure, the determining, according to the candidate question sentence and a reference example sentence configured with an explicit intention in advance, an intention recognition result of the first-round dialog includes:
encoding the candidate questions and the reference example sentences which are pre-configured with clear intention respectively;
performing similarity calculation on the encoded candidate question and the reference example sentence;
and determining an intention recognition result of the first-round dialogue according to the result of the similarity calculation.
In one embodiment of the present disclosure, the method is performed by a pre-trained first recognition model, and the first recognition model is trained, before the multi-round dialog starts, with training data in which triplets are constructed using a pairwise framework.
In one embodiment of the present disclosure, the triplet includes a first text string, a second text string, and a third text string, and the training objective is that the similarity between the first text string and the second text string is higher than the similarity between the first text string and the third text string.
In one embodiment of the disclosure, training the first recognition model with training data in which triplets are constructed using a pairwise framework includes:
and if the first text string and the second text string in the triplet contain the same entity and the first text string and the third text string do not have the same entity, training the first recognition model to recognize the similarity of the first text string and the second text string, wherein the similarity is higher than the similarity of the first text string and the third text string.
In one embodiment of the disclosure, training the first recognition model with training data in which triplets are constructed using a pairwise framework includes:
selecting a specified number of entities from a plurality of identical entities in the case that the first text string and the second text string in the triplet contain the identical entities;
and finding out entities which are different from the specified number of entities in the third text string of the triplet, and training the first recognition model to recognize the similarity of the first text string and the second text string, wherein the similarity is higher than the similarity of the first text string and the third text string.
In one embodiment of the disclosure, training the first recognition model with training data in which triplets are constructed using a pairwise framework includes:
Selecting at least two entities from a plurality of identical entities in the case that the first text string and the second text string in the triplet include the plurality of identical entities;
and if one entity exists in the third text string of the triplet and is the same as one entity in the at least two entities, training the first recognition model to recognize the similarity of the first text string and the second text string, wherein the similarity is higher than the similarity of the first text string and the third text string.
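For illustration only, the triplet objective described above could be trained with a margin-based loss over sentence encodings as sketched below; the encoder, the cosine similarity measure, and the margin value are assumptions for the example and are not specified by the disclosure:

    import torch.nn.functional as F

    def triplet_similarity_loss(enc_first, enc_second, enc_third, margin: float = 0.2):
        """enc_*: encoded first, second and third text strings, shape (batch, dim).
        Pushes similarity(first, second) above similarity(first, third) by at least `margin`."""
        sim_pos = F.cosine_similarity(enc_first, enc_second, dim=-1)
        sim_neg = F.cosine_similarity(enc_first, enc_third, dim=-1)
        return F.relu(margin - (sim_pos - sim_neg)).mean()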
In one embodiment of the present disclosure, the method further comprises:
the constructed training data is encoded before training the first recognition model in the following manner:
reserving a plurality of placeholder entities in a dictionary, and setting corresponding placeholder codes according to the order in which the placeholder entities are arranged in the dictionary;
sorting the entities in the constructed training data, and replacing the sorted entities in turn with the corresponding placeholder codes according to their rank order;
and converting the other words in the training data into word codes according to a conversion function.
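For illustration only, this placeholder-based encoding can be sketched as follows, assuming the text is processed as a character sequence and the conversion function is a simple per-character vocabulary lookup (both assumptions for the example):

    NUM_PLACEHOLDERS = 8                                # reserved placeholder entities in the dictionary
    PLACEHOLDER_CODES = list(range(NUM_PLACEHOLDERS))   # codes follow their order in the dictionary

    def encode_sample(text: str, entities: list[str], vocab: dict[str, int]) -> list[int]:
        """Replace sorted entities with placeholder codes, then convert other words to word codes."""
        # Sort the entities (here by their position of appearance) before substitution.
        entities = sorted({e for e in entities if e in text}, key=text.find)
        codes, pos = [], 0
        while pos < len(text):
            for i, ent in enumerate(entities):
                if i < NUM_PLACEHOLDERS and text.startswith(ent, pos):
                    codes.append(PLACEHOLDER_CODES[i])   # placeholder code for the i-th entity
                    pos += len(ent)
                    break
            else:
                # Conversion function for ordinary words: a per-character vocabulary lookup.
                codes.append(vocab.get(text[pos], len(vocab)))
                pos += 1
        return codes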
In one embodiment of the present disclosure, the method further comprises:
and after the similarity is calculated, determining the entity combination in the candidate question corresponding to the highest similarity as an entity identification result of the first-round dialogue.
In one embodiment of the present disclosure, the method further comprises:
collecting other entities except the entity combination in a dialogue after the first round of dialogue;
and executing the reply or response of the multi-round dialogue according to the intention recognition result, the entity combination and other entities.
In one embodiment of the present disclosure, the method further comprises at least one of:
receiving configuration information input by a configuration person, generating an enumeration type entity according to the configuration information, and configuring an entity name and an enumerated entity value for the enumeration type entity;
receiving configuration information input by a configuration person, generating a regular type entity according to the configuration information, and configuring an entity name and a regular expression for the regular type entity;
and receiving configuration information input by a configuration person, generating a description type entity according to the configuration information, and configuring an entity name and a descriptive entity value for the description type entity.
In a second aspect of embodiments of the present disclosure, there is provided an apparatus for intent recognition in a multi-round dialog, comprising:
the acquisition module is used for acquiring initial questions contained in a first-round dialogue in the multi-round dialogue;
a determining module for determining a plurality of candidate entities corresponding to the initial question;
The generation module is used for generating candidate questions based on the initial questions and the candidate entities;
and the recognition module is used for determining the intention recognition result of the first-round dialogue according to the candidate question sentence and the reference example sentence which is pre-configured with the explicit intention.
In one embodiment of the disclosure, the types of candidate entities include: a preset-type entity, an enumeration-type entity, a regular-type entity, and a description-type entity, wherein the preset-type entity is a preconfigured and directly packaged entity type, the enumeration-type entity is an entity type whose entity values can be enumerated, the regular-type entity is an entity type whose entity values can be summarized by a regular expression, and the description-type entity is an entity type describing object attributes or states.
In one embodiment of the disclosure, the determining module is configured to:
and identifying a plurality of candidate entities corresponding to the initial question by using a first pre-trained identification model, wherein the plurality of candidate entities comprise preset type entities, enumeration type entities and regular type entities.
In one embodiment of the present disclosure, the generating module includes:
a combination unit, configured to combine the plurality of candidate entities to form a plurality of candidate entity sets corresponding to the initial question, where the candidate entity sets have at least one candidate entity as an element thereof;
And the generating unit is used for respectively configuring elements in each candidate entity set in the corresponding positions of the initial question so as to form a plurality of candidate questions corresponding to the initial question.
In one embodiment of the present disclosure, the generating unit is configured to:
and respectively configuring elements in each candidate entity set in corresponding positions of the initial question, and filtering out sentences with the number of elements less than the designated number to obtain a plurality of candidate question sentences corresponding to the initial question.
In one embodiment of the present disclosure, the generating unit is configured to:
and respectively configuring elements in each candidate entity set in the corresponding positions of the initial question, and filtering out sentences which do not appear in the reference example sentences which are pre-configured with clear intention to obtain a plurality of candidate question sentences corresponding to the initial question.
In one embodiment of the present disclosure, the generating unit is configured to:
and respectively configuring elements in each candidate entity set in the corresponding positions of the initial question, limiting the currently obtained sentences when the total number of the obtained sentences reaches a preset upper limit value, and randomly selecting a plurality of sentences as candidate question sentences corresponding to the initial question.
In one embodiment of the present disclosure, the combining unit is further configured to:
and for the candidate entity with a plurality of entity values, acquiring the character string length of each entity value corresponding to the candidate entity, determining the initial position of the corresponding entity value in the initial question, and screening out the corresponding entity value for combination according to the character string length and the initial position.
In one embodiment of the present disclosure, the combining unit is specifically configured to, when screening out the corresponding entity values for combining according to the string length and the start position:
if the character string lengths of the plurality of entity values of the candidate entity are different, selecting the entity value with the longest character string length for combination;
if the character strings of the plurality of entity values of the candidate entity are the same in length, the entity value with the forefront initial position is selected for combination under the condition that the character strings contained in the plurality of entity values are overlapped.
In one embodiment of the present disclosure, the identification module includes:
an expansion unit, configured to, if a reference example sentence with an explicit intention is preset, expand an entity value of a description type entity in the reference example sentence to obtain an expanded example sentence, and supplement the obtained expanded example sentence to the intention;
And the recognition unit is used for determining an intention recognition result of the first-round dialogue according to the candidate question sentence, the reference example sentence and the extended example sentence of the intention.
In one embodiment of the present disclosure, the extension unit is configured to:
if a reference example sentence with clear intention is pre-configured, acquiring other entity values except the entity value currently appearing in the reference example sentence from a plurality of entity values pre-configured for the descriptive entity;
sequentially replacing the current entity value in the reference example sentence with the other acquired entity values to respectively obtain corresponding extended example sentences;
and supplementing the obtained extended example sentence into the intention.
In one embodiment of the present disclosure, the identification module is configured to:
encoding the candidate questions and the reference example sentences which are pre-configured with clear intention respectively;
performing similarity calculation on the encoded candidate question and the reference example sentence;
and determining an intention recognition result of the first-round dialogue according to the result of the similarity calculation.
In one embodiment of the present disclosure, the apparatus performs functions by a pre-trained first recognition model, the apparatus further comprising:
And the training module is used for training the first recognition model, before the multi-round dialogue starts, with training data in which triplets are constructed using a pairwise framework.
In one embodiment of the present disclosure, the triplet includes a first text string, a second text string, and a third text string, and the training is aimed at a similarity of the first text string and the second text string that is higher than a similarity of the first text string and the third text string.
In one embodiment of the present disclosure, the training module is configured to:
before the multi-turn dialogue starts, if the first text string and the second text string in the triplet contain the same entity and the first text string and the third text string do not have the same entity, training the first recognition model to recognize the similarity of the first text string and the second text string, wherein the similarity is higher than the similarity of the first text string and the third text string.
In one embodiment of the present disclosure, the training module is configured to:
before the multi-round dialogue starts, selecting a specified number of entities from a plurality of identical entities under the condition that the first text string and the second text string in the triplet contain the identical entities;
And finding out entities which are different from the specified number of entities in the third text string of the triplet, and training the first recognition model to recognize the similarity of the first text string and the second text string, wherein the similarity is higher than the similarity of the first text string and the third text string.
In one embodiment of the present disclosure, the training module is configured to:
before the multi-round dialogue starts, selecting at least two entities from a plurality of identical entities under the condition that the first text string and the second text string in the triplet contain the identical entities;
and if one entity exists in the third text string of the triplet and is the same as one entity in the at least two entities, training the first recognition model to recognize the similarity of the first text string and the second text string, wherein the similarity is higher than the similarity of the first text string and the third text string.
In one embodiment of the present disclosure, the apparatus further comprises:
the coding module is used for coding the constructed training data before training the first recognition model in the following way:
reserving a plurality of placeholder entities in a dictionary, and setting corresponding placeholder codes according to the order in which the placeholder entities are arranged in the dictionary;
sorting the entities in the constructed training data, and replacing the sorted entities in turn with the corresponding placeholder codes according to their rank order;
and converting the other words in the training data into word codes according to a conversion function.
In one embodiment of the present disclosure, the identification module is further configured to:
and after the similarity is calculated, determining the entity combination in the candidate question corresponding to the highest similarity as an entity identification result of the first-round dialogue.
In one embodiment of the present disclosure, the apparatus further comprises:
the collecting module is used for collecting other entities except the entity combination in the dialogue after the first round of dialogue;
and the response module is used for executing the reply or response of the multi-round dialogue according to the intention recognition result, the entity combination and other entities.
In one embodiment of the disclosure, the apparatus further comprises a configuration module configured to employ at least one of:
receiving configuration information input by a configuration person, generating an enumeration type entity according to the configuration information, and configuring an entity name and an enumerated entity value for the enumeration type entity;
Receiving configuration information input by a configuration person, generating a regular type entity according to the configuration information, and configuring an entity name and a regular expression for the regular type entity;
and receiving configuration information input by a configuration person, generating a description type entity according to the configuration information, and configuring an entity name and a descriptive entity value for the description type entity.
In a third aspect of the disclosed embodiments, a computer-readable medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method of intent recognition in a multi-turn conversation described above.
In a fourth aspect of embodiments of the present disclosure, there is provided a computing device comprising: memory, a processor and a computer program stored on the memory and executable on the processor, which when executed implements the steps of the method for intent recognition in a multi-round dialog.
According to the method and the device for identifying the intention in the multi-round dialogue, through acquiring an initial question included in a first-round dialogue in the multi-round dialogue, a plurality of candidate entities corresponding to the initial question are determined, candidate questions are generated based on the initial question and the plurality of candidate entities, and the intention identification result of the first-round dialogue is determined according to the candidate questions and a reference example sentence which is pre-configured with clear intention. Because the intention recognition is carried out on the basis of recognizing the candidate entity, the intention recognition and the entity recognition are effectively combined for use, and compared with a mode of simply carrying out the intention recognition, factors of the influence of the entity are considered, so that the accuracy of the intention recognition in the multi-round dialogue is greatly improved.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
FIG. 1 schematically illustrates a first implementation flow diagram of a method for intent recognition in a multi-round conversation in accordance with an embodiment of the present disclosure;
FIG. 2 schematically illustrates an intent configuration diagram one in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates an intent configuration diagram II in accordance with an embodiment of the present disclosure;
FIG. 4 schematically illustrates a second implementation flow diagram of a method for intent recognition in a multi-round conversation in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a schematic diagram of determining candidate entities according to an embodiment of the disclosure;
FIG. 6 schematically illustrates a schematic diagram of generating candidate questions according to an embodiment of the disclosure;
FIG. 7 schematically illustrates a schematic diagram of candidate question filtering results according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a diagram of screening entity values according to an embodiment of the present disclosure;
FIG. 9 schematically illustrates a schematic diagram of a descriptive entity extension in accordance with an embodiment of the present disclosure;
FIG. 10 schematically illustrates a flow chart of training a first recognition model according to an embodiment of the present disclosure;
FIG. 11 schematically illustrates an overall flow diagram of a multi-round dialog in accordance with an embodiment of the present disclosure;
FIG. 12 schematically illustrates a media schematic of a method for intent recognition in a multi-round conversation in accordance with an embodiment of the present disclosure;
FIG. 13 schematically illustrates a schematic diagram of an apparatus for intent recognition in a multi-round conversation in accordance with an embodiment of the present disclosure;
fig. 14 schematically illustrates a structural schematic diagram of a computing device according to an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable one skilled in the art to better understand and practice the present disclosure and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present disclosure may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to embodiments of the present disclosure, a method, medium, apparatus, and computing device for intent recognition in a multi-round dialog are presented.
Any number of elements in the figures are for illustration and not limitation, and any naming is used for distinction only, and not for any limiting sense.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments thereof.
Summary of The Invention
The present disclosure finds that in the existing intent recognition technology of the first-round session of the multi-round session, two steps of intent recognition and entity recognition are often performed independently, so that the accuracy of the recognition result of the whole session flow is strongly dependent on the first executed step, and if the first executed step is wrong, the accuracy of the final recognition result is also greatly affected.
In view of this, the present disclosure provides a method and apparatus for intent recognition in a multi-round dialogue, by acquiring an initial question included in a first-round dialogue in the multi-round dialogue, determining a plurality of candidate entities corresponding to the initial question, generating a candidate question based on the initial question and the plurality of candidate entities, and determining an intent recognition result of the first-round dialogue according to the candidate question and a reference example sentence configured with explicit intent in advance. Because the intention recognition is carried out on the basis of recognizing the candidate entity, the intention recognition and the entity recognition are effectively combined for use, and compared with a mode of simply carrying out the intention recognition, factors of the influence of the entity are considered, so that the accuracy of the intention recognition in the multi-round dialogue is greatly improved.
Having described the basic principles of the present disclosure, various non-limiting embodiments of the present disclosure are specifically described below.
Exemplary method
A method of intent recognition in a multi-round dialog according to an exemplary embodiment of the present disclosure is described below with reference to fig. 1.
As shown in fig. 1, the method for intent recognition in a multi-round dialog according to an embodiment of the present disclosure includes the steps of:
s11: acquiring an initial question included in a first-round dialogue in multiple rounds of dialogues;
S12: determining a plurality of candidate entities corresponding to the initial question;
s13: generating a candidate question based on the initial question and the plurality of candidate entities;
s14: and determining the intention recognition result of the first-round dialogue according to the candidate question sentence and the reference example sentence which is pre-configured with the explicit intention.
In the embodiment of the disclosure, multiple rounds of conversations refer to conversations that interact with a user in multiple rounds, each of which is in the form of a question-and-answer, wherein the first round of conversations refers to the first round of conversations in the multiple rounds of conversations.
In the disclosed embodiments, an entity refers to an object or thing that exists in the real world and can be distinguished from other objects or things. Moreover, an entity need not be a physical existence; it may also be an abstract concept. For example: a location, a vehicle, a mobile phone brand, a duration, a cabin class, and the like.
The above explicit intent is preconfigured and may include a variety of application scenarios. For example, in an airport scenario, "helping a user to reserve an air ticket" may be configured as an intent; in a health scenario, "help user to diagnose condition and make advice" may be configured as an intention; in an educational scenario, "helping a user to complete a course reservation" may be configured as an intention.
Through the above process, the embodiment of the disclosure determines the candidate entity based on the initial question, performs the intention recognition based on the candidate entity, effectively combines the intention recognition and the entity recognition, and considers the factors of the influence of the entity compared with the mode of performing the intention recognition only, thereby greatly improving the accuracy of the intention recognition in the multi-round dialogue.
Types of candidate entities to which embodiments of the present disclosure relate include: preset type entity, enumeration type entity, regular type entity and descriptive type entity. The preset type entity is a preconfigured and directly packaged entity type, the enumerated type entity is an entity type with an enumerated entity value, the regular type entity is an entity type with an available regular expression inducing the entity value, and the descriptive type entity is an entity type describing the object attribute or state.
In one possible implementation, the preset-type entity may include named entity types such as a person name, a place name, a time, and the like. The entity types are common, can be suitable for intention recognition tasks in different businesses and different fields, and have rich open source annotation data and mature entity recognition solutions. For example, in the intent of "reservation ticket," the "departure place", "destination", "departure time" may be categorized as preset entities.
In one possible implementation, the enumerated entities may be configured by a configurator, including configuring which entities are enumerated entities and defining the corresponding entity names and entity values. For example, in the "reservation ticket" intent, a configurator may configure "cabin class" as an enumerated entity whose entity values are enumerable and may include: first class, business class, and economy class. In general, in various business scenarios, the description of such entity values is accurate and limited, and configurators use the popular terminology of the domain when describing them.
In one possible implementation, regular type entities may be freely defined by a configurator with respect to business-related entity names and corresponding regular expressions. For example, in the "self-help health diagnosis" intention, the "user case number" is a string of preset user identification IDs, and the configurator can directly express all case numbers IDs by using a set of regular rules. The entity value of the regular type entity is controllable and does not jump out of the range specified by the regular expression.
In one possible implementation, the description-type entity differs from the enumeration-type entity in that different expressions may be used when describing its entity values, so a certain degree of understanding and generalization capability over the entity values is required. For example, in the "self-help health diagnosis" intent, "user symptom" is configured as a descriptive entity whose entity values may be configured as: cough, runny nose, insomnia, and headache. However, when the user actually describes his or her own symptoms in the multi-round dialog, the wording may not be completely consistent with the configuration; the user may instead say: coughing for a while, a runny nose, pain in the temples, and so on. Therefore, by taking the semantic generalization of entity values into account, the description-type entity provided by the embodiment of the disclosure can solve the problem that the semantics of ordinary entity values cannot be generalized. Moreover, when the plurality of candidate entities corresponding to the initial question are determined, only the preset-type, enumeration-type, and regular-type entities are considered, and the description-type entity is handled in the intent recognition step, which saves system resources and achieves a better entity recognition effect.
In the embodiment of the disclosure, besides the two attributes of entity name and entity type, each entity has another attribute: the corresponding entity value. An entity can have one or more entity values, and each entity value represents the actual value the entity takes in a particular scenario. For example, the entity values corresponding to the entity "address" are "Beijing" and "Shanghai", and the entity values corresponding to the entity "vehicle" are "train", "plane", and so on.
In the embodiment of the disclosure, the configuration of the three types of entity, namely the enumeration type entity, the regular type entity and the description type entity, is completely open to configuration personnel, can be configured and used in various service scenes and fields without cost, and the configuration personnel can divide and configure the entity which is intended to be associated according to different characteristics of the entity types, so that the flexibility of application is greatly improved, and the application is wider.
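For illustration only, such an entity configuration could be represented by a simple data structure like the sketch below; the field names and the example pattern are assumptions for the example rather than a format prescribed by the disclosure:

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class EntityConfig:
        name: str                                        # entity name, e.g. "cabin class"
        entity_type: str                                 # "preset" | "enumerated" | "regular" | "descriptive"
        values: list[str] = field(default_factory=list)  # enumerated or descriptive entity values
        pattern: Optional[str] = None                    # regular expression for a regular-type entity

    cabin_class = EntityConfig("cabin class", "enumerated",
                               values=["first class", "business class", "economy class"])
    case_number = EntityConfig("user case number", "regular", pattern=r"\d{6}")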
In the present disclosure, when an intent is configured, an intent name, entity types, and reference example sentences may be configured, and the specific contents can be set according to actual needs. It should be noted that a configured reference example sentence may contain different entity information or contain no entity information at all; both are valid ways of configuring reference example sentences, and the configurator can design and configure the intent according to the business scenario. For example, referring to Fig. 2, reference example sentence 3 under the "reservation ticket" intent, "Book an air ticket.", is a reference example sentence that contains no entity information; the corresponding intent can be indicated without any entity information, and this can be set by the configurator according to actual needs.
Fig. 2 schematically illustrates an intent configuration according to an embodiment of the present disclosure. Referring to Fig. 2, the intent name "reservation ticket" is configured in the airport scenario, and four entities are associated with this intent, whose names are "departure place", "destination", "departure time", and "cabin class". The entity types of "departure place", "destination", and "departure time" are preset-type entities. The entity type of "cabin class" is an enumeration-type entity, and its entity values include: first class, business class, and economy class. The reference example sentences configured for this intent include: 1. Help me book an air ticket to Shanghai for tomorrow. 2. I want to book a business-class ticket, flying to Beijing. 3. Book an air ticket. The number of configured reference example sentences can be set as needed, and the specific number is not limited, for example 5 or 8 reference example sentences. In addition, entity values may be marked in the reference example sentences to show the distinction, and the specific marking manner is not limited. For example, in the reference example sentence "Help me book an air ticket to Shanghai for tomorrow", the entity value "tomorrow" of the entity "departure time" and the entity value "Shanghai" of the entity "destination" are marked.
Fig. 3 schematically illustrates another intent configuration according to an embodiment of the present disclosure. Referring to Fig. 3, in the health scenario the intent name is configured as "self-help health diagnosis", and three entities are associated with this intent, whose names are "user case number", "user symptom", and "duration", with the corresponding entity types being a regular-type entity, a description-type entity, and a preset-type entity respectively. The entity values of the entity "user symptom" include: cough, runny nose, insomnia, and headache. The reference example sentences configured for this intent include: 1. I have been coughing for a week and it is not getting better. 2. I have not been able to sleep at night, and it has lasted for a month. 3. What medicine should I take for a runny nose? In addition, entity values may be marked in the reference example sentences to show the distinction; for example, in the reference example sentence "I have been coughing for a week and it is not getting better", the entity value "cough" of the entity "user symptom" and the entity value "one week" of the entity "duration" are marked.
In one possible implementation manner, the step S12 may include:
Using a first recognition model trained in advance, a plurality of candidate entities corresponding to the initial question are recognized, the plurality of candidate entities including a preset type entity, an enumerated type entity, and a regular type entity.
In one possible implementation manner, the step S14 may include:
encoding candidate questions and reference example sentences which are pre-configured with clear intention respectively; performing similarity calculation on the encoded candidate question and the reference example sentence; and determining the intention recognition result of the first-round dialogue according to the result of the similarity calculation.
In one possible implementation manner, after performing the similarity calculation, the method may further include:
determining the entity combination in the candidate question corresponding to the highest similarity as the entity recognition result of the first-round dialog. An entity recognition result determined on the basis of similarity reflects the entities corresponding to the initial question more accurately, and can serve as a basis for the replies or responses of the subsequent multi-round dialog, which helps improve user satisfaction.
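For illustration only, this matching step can be sketched as follows, assuming an encode function that maps a sentence to a vector (for example, the output of the first recognition model) and cosine similarity as the similarity measure; both are assumptions for the example:

    import numpy as np

    def recognize_intent(candidate_questions, reference_sentences, encode):
        """candidate_questions: list of (question text, entity combination) pairs.
        reference_sentences:  list of (example sentence text, intent name) pairs.
        Returns the intent and entity combination of the highest-similarity pair."""
        best_intent, best_entities, best_sim = None, None, -1.0
        for q_text, entities in candidate_questions:
            q_vec = encode(q_text)
            for s_text, intent in reference_sentences:
                s_vec = encode(s_text)
                sim = float(np.dot(q_vec, s_vec) /
                            (np.linalg.norm(q_vec) * np.linalg.norm(s_vec)))
                if sim > best_sim:
                    best_intent, best_entities, best_sim = intent, entities, sim
        return best_intent, best_entities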
In one possible embodiment, the method may further include:
collecting other entities except the entity combination in the dialogue after the first round of dialogue; and executing replies or responses of multiple rounds of conversations according to the intention recognition result, the entity combination and other entities.
A general dialog management method can be adopted to track the state of the current intent at any time and judge whether the currently required entity information has been fully recognized; if not, the user is guided through follow-up questions to supply the remaining entity information until all the entities required by the current intent have been recognized. A reply in the multi-round dialog means sending reply content to the user, while a response means triggering the corresponding operation to be executed as the response of the multi-round dialog. Specifically, according to the configured reply mode, reply content can be generated and sent to the user, such as a reply containing the diagnosis result and suggested measures. Alternatively, the corresponding operation may be triggered directly as the response of the multi-round dialog, for example an operation of reserving the air ticket for the user or issuing an invoice for the user.
The method for replying or responding to the multi-round dialogue after the comprehensive intention recognition result and the entity recognition result can improve the accuracy of replying or responding, further improve the satisfaction degree of the user on the replying or responding result and improve the user experience.
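For illustration only, the dialog-management loop described above can be sketched as follows; the slot names, the prompt text, and the callback functions are assumptions for the example rather than an interface prescribed by the disclosure:

    def run_dialog(intent: str, recognized: dict, required_slots: list[str], ask_user, respond):
        """Track the current intent's state and ask back for missing entities until complete."""
        for slot in required_slots:
            if slot not in recognized:
                # Follow-up question guiding the user to supply the missing entity information.
                recognized[slot] = ask_user(f"Please provide the {slot}.")
        # All required entities are collected: send a reply or trigger the corresponding operation.
        respond(intent, recognized)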
Fig. 4 schematically illustrates a flow diagram of a method implementation of intent recognition in a multi-round dialog in accordance with an embodiment of the disclosure. As shown in fig. 4, the method for intent recognition in a multi-round dialog of an embodiment of the present disclosure includes the steps of:
S41: acquiring an initial question included in a first-round dialogue in multiple rounds of dialogues;
s42: determining a plurality of candidate entities corresponding to the initial question;
s43: combining the plurality of candidate entities to form a plurality of candidate entity sets corresponding to the initial question, wherein each candidate entity set takes at least one candidate entity as an element thereof;
wherein the elements of the candidate entity set are made up of candidate entities, a candidate entity set may comprise one or more candidate entities. In order to improve accuracy of the intention recognition result, the determined plurality of candidate entities may be combined as much as possible to obtain the maximum candidate entity set. Of course, the number of candidate entity sets is not specifically limited in the embodiments of the present disclosure.
S44: respectively configuring elements in each candidate entity set in corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question;
s45: if a reference example sentence which is pre-configured with a clear intention exists, expanding the entity value of the description type entity in the reference example sentence to obtain an expanded example sentence, and supplementing the obtained expanded example sentence into the intention;
S46: and determining the intention recognition result of the first-round dialogue according to the candidate question sentence, the reference example sentence and the extension example sentence of the intention.
In the above manner, a plurality of candidate entity sets are obtained by combining the plurality of candidate entities, and a plurality of candidate questions are formed by configuring the elements of the candidate entity sets at the corresponding positions of the initial question, so that multiple objects to be matched are obtained; the intent recognition result of the first-round dialogue is then determined from these objects to be matched and the reference example sentences pre-configured with explicit intents. Because the candidate questions are rich and the influence of entity combinations on intent recognition is fully considered, the accuracy of intent recognition is effectively improved. In addition, the description-type entities are expanded to obtain extended example sentences that also serve as objects to be matched, which provides richer and more complete data support for intent recognition, further improves the accuracy of intent recognition, and improves user satisfaction.
In the embodiment of the present disclosure, when the plurality of candidate entities corresponding to the initial question are determined in step S42, the candidate entities may be determined according to whether the entity type is a preset-type, an enumeration-type, or a regular-type entity. For preset-type entities, given their generality, an open-source entity recognition module or a self-trained entity extraction model can be adopted directly for entity recognition, using common methods such as LSTM-CRF (Long Short-Term Memory network combined with a Conditional Random Field) or BiGRU-CRF (Bidirectional Gated Recurrent Unit combined with a Conditional Random Field). For enumeration-type entities, because their wording is relatively precise, entity recognition can be performed by rule matching. For regular-type entities, the range of entity values does not fall outside the range specified by the regular expression, so the regular expression can be used directly for matching and recognition.
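For illustration only, the rule matching and regular-expression matching mentioned above can be sketched as follows; the entity names and patterns are assumptions for the example, and the preset-type sequence-labelling model is only referenced in a comment:

    import re

    ENUM_ENTITIES = {"cabin class": ["first class", "business class", "economy class"]}
    REGEX_ENTITIES = {"user case number": r"\d{6}"}     # illustrative pattern

    def match_candidate_entities(question: str) -> list[tuple[str, str]]:
        """Return (entity name, entity value) pairs found by rule and regex matching."""
        found = []
        for name, values in ENUM_ENTITIES.items():       # enumeration-type entities: rule matching
            found += [(name, v) for v in values if v in question]
        for name, pattern in REGEX_ENTITIES.items():     # regular-type entities: regex matching
            found += [(name, m.group()) for m in re.finditer(pattern, question)]
        # Preset-type entities (person name, place name, time, ...) would come from a
        # sequence-labelling model such as LSTM-CRF or BiGRU-CRF, which is not shown here.
        return found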
For descriptive entities, since the words are relatively broad, all possible entity values cannot be exhausted by means of enumeration, and the requirements for semantic generalization and understanding capability are the same as those of the intended recognition link, the determination of descriptive entities can be performed in step S45.
Fig. 5 schematically illustrates a schematic diagram of determining candidate entities according to an embodiment of the disclosure. Referring to fig. 5, after the initial question included in the first-round dialogue of a multi-round dialogue is acquired, a plurality of candidate entities corresponding to the initial question are determined, including preset type entities, enumeration type entities and regular type entities. The example on the left side of the figure is an airport scenario; the initial question is "I am going to Guangzhou on a business trip tomorrow, help me book a business class ticket", and the identified entities include "departure time", "departure place", "destination" and "cabin class", with the corresponding entity values "tomorrow", "Guangzhou" and "business class". Since "Guangzhou" may be either the departure place or the destination entity value, if it cannot yet be determined which entity it belongs to, both possibilities are retained and the actual attribute is confirmed in the final intention recognition step. The example on the right of the figure is a health scenario; the initial question is "Why do I always have a headache today? My case number is 654321", the identified entities include "duration" and "user medical record number", and the corresponding entity values are "today always" and "654321".
Fig. 6 schematically illustrates a schematic diagram of generating candidate questions according to an embodiment of the present disclosure. Referring to fig. 6, two intentions, "ticket change" and "ticket booking", are configured in the airport scenario, and each intention is configured with related entity information and reference example sentences. The entities associated with the "ticket change" intention include the regular type entity "flight number" and the preset type entity "change date", and its reference example sentences include "1. Help me change my flight ticket to Shanghai for tomorrow", etc. The entities associated with the "ticket booking" intention include the preset type entities "departure place", "destination" and "departure time" and the enumeration type entity "cabin class", and its reference example sentences include "1. Help me book a flight ticket for tomorrow. 2. I am in Hangzhou, flying to Guangzhou on the 20th, book a ticket", etc.
When the initial question "I am going to Guangzhou on a business trip tomorrow, help me book a flight ticket" is acquired in the first-round dialogue of the multi-round dialogue, 4 entities are first identified in the initial question: "departure time", "date of change", "departure place" and "destination", and the types of these 4 entities are all identified as preset type entities. The 4 entities are then combined to form 8 candidate entity sets corresponding to the initial question. Referring to fig. 6, the resulting 8 candidate entity sets contain the following combinations:
"date of change";
"date of change" + "destination";
"date of change" + "place of departure";
"departure time" + "destination";
"departure time" + "departure place";
"departure time";
"destination";
"place of departure".
The elements in the 8 candidate entity sets are respectively configured in the corresponding positions of the initial question, so that 8 candidate questions as shown in the figure can be obtained, and the generation process of the candidate questions is completed.
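For illustration only, the following is a minimal Python sketch of combining candidate entities into candidate entity sets. It assumes, as suggested by the 8 sets of fig. 6, that alternative entities anchored to the same text span (e.g. "departure time" / "date of change" for "tomorrow", and "departure place" / "destination" for "Guangzhou") are never placed in the same set; the function name is hypothetical.

```python
from itertools import product

def build_candidate_entity_sets(span_alternatives):
    """span_alternatives: one list of alternative entities per text span."""
    sets = set()
    # For each span, either pick one of its alternative entities or leave the span unused (None).
    for choice in product(*[alts + [None] for alts in span_alternatives]):
        chosen = tuple(entity for entity in choice if entity is not None)
        if chosen:
            sets.add(chosen)
    return sorted(sets, key=lambda s: (len(s), s))

spans = [["departure time", "date of change"],   # both are readings of "tomorrow"
         ["departure place", "destination"]]     # both are readings of "Guangzhou"
for entity_set in build_candidate_entity_sets(spans):
    print(entity_set)   # yields the 8 combinations listed above
```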
In one possible implementation manner, the step S44 may include:
and respectively configuring the elements of each candidate entity set at the corresponding positions of the initial question, and filtering out sentences containing fewer elements than a specified number, to obtain a plurality of candidate questions corresponding to the initial question. This implementation associates entities as much as possible, so that more representative candidate questions are determined and the accuracy of intention recognition is improved. The specified number may be set as required, for example to 2; the present disclosure does not limit its specific value.
Taking the scenario of fig. 6 as an example, the specified number may be set to 2, and sentences with fewer than 2 elements are filtered out. That is, if a candidate question contains only one element, i.e. a single entity, it represents a case with no entity combination, is regarded as not representative, and can be filtered out. Candidate questions 1, 6, 7 and 8 in fig. 6 each include only one entity and could thus be filtered out. However, since candidate question 1 contains only the "date of change" and this combination appears in a reference example sentence of the "ticket change" intention, it may be retained, so only candidate questions 6, 7 and 8 are filtered out.
In one possible implementation manner, the step S44 may include:
the elements of each candidate entity set are respectively configured at the corresponding positions of the initial question, and sentences that do not appear in the reference example sentences pre-configured with explicit intentions are filtered out, to obtain a plurality of candidate questions corresponding to the initial question. This embodiment ensures that each entity combination matches an entity combination already present in the reference example sentences, thereby filtering out cases of intention conflict and improving the accuracy of entity combination. Sentences that do not appear in any reference example sentence pre-configured with an explicit intention are regarded as cases of intention conflict that do not meet the needs of the actual scenario, and are therefore filtered out.
Taking the scenario of fig. 6 as an example, candidate questions 2 and 3 both involve intention conflicts. Candidate question 2 combines the "date of change" with the "destination", and candidate question 3 combines the "date of change" with the "departure place"; neither combination appears in the reference example sentences of the "ticket change" intention, so both can be filtered out.
Fig. 7 schematically illustrates a schematic diagram of candidate question filtering results according to an embodiment of the present disclosure. Referring to fig. 7, after the 8 candidate questions obtained in fig. 6 are filtered according to the above method, 3 final candidate questions, that is, the above candidate questions 1, 4 and 5, are obtained. Further, the entities in the candidate question may be replaced with entity symbols to facilitate subsequent coding and model training.
In one possible implementation manner, the step S44 may include:
and respectively configuring the elements of each candidate entity set at the corresponding positions of the initial question, and, when the total number of sentences obtained reaches a preset upper limit, restricting generation to the sentences already obtained and randomly selecting several of them as the candidate questions corresponding to the initial question.
The total number of sentences reaching the preset upper limit is usually an extreme case indicating that a combinatorial explosion has occurred, which is particularly likely when combining number-related entities such as ID card numbers or train ticket numbers. To avoid wasting resources, the generation of new sentences can be stopped, i.e. no further entity combination is performed; instead, the sentences already obtained are taken as the pool, and several of them are randomly selected as the candidate questions corresponding to the initial question. The number of randomly selected sentences may be set as required, for example 3 or 5, and the embodiments of the present disclosure are not limited in this respect.
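For illustration only, the following Python sketch summarizes the three alternative treatments of step S44 described above (filtering by element count, filtering out intention conflicts, and capping the total when a combinatorial explosion occurs); representing each reference example sentence by its set of entity names, as well as the thresholds and function names, are simplifying assumptions.

```python
import random

def filter_by_element_count(candidate_sets, reference_combinations, min_elements=2):
    """Keep sets with at least min_elements entities, or single-entity sets that already
    appear in a reference example sentence (as with candidate question 1 in fig. 6)."""
    return [s for s in candidate_sets
            if len(s) >= min_elements or s in reference_combinations]

def filter_by_intent_conflict(candidate_sets, reference_combinations):
    """Drop entity combinations that never occur in any configured reference sentence."""
    return [s for s in candidate_sets if s in reference_combinations]

def cap_total(candidate_sets, max_total=100, sample_size=5):
    """If a combinatorial explosion produced too many sentences, randomly sample a few."""
    if len(candidate_sets) > max_total:
        return random.sample(candidate_sets, sample_size)
    return candidate_sets

reference_combinations = {frozenset({"date of change"}),
                          frozenset({"departure time", "departure place", "destination"})}
candidate_sets = [frozenset({"date of change"}),
                  frozenset({"date of change", "destination"}),
                  frozenset({"departure time", "departure place", "destination"})]

kept = filter_by_element_count(candidate_sets, reference_combinations)
kept = filter_by_intent_conflict(kept, reference_combinations)
print(cap_total(kept))
```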
In a possible implementation manner, the step S43 may further include:
for a candidate entity with a plurality of entity values, acquiring the character string length of each entity value corresponding to the candidate entity, determining the initial position of the corresponding entity value in the initial question, and screening out the corresponding entity value for combination according to the character string length and the initial position.
Screening out the corresponding entity values for combination according to the string length and the start position may specifically include:
if the string lengths of the candidate entity's multiple entity values differ, selecting the entity value with the longest string for combination; if the string lengths are the same and the strings contained in the entity values overlap, selecting the entity value with the earliest start position for combination.
In the above embodiment, the entity value with the longest string is preferentially selected for combination, and when the string lengths are the same, the entity value with the earliest start position is selected. This strategy ensures that the entity value used for combination reflects the information of the initial question more faithfully.
Fig. 8 schematically illustrates a diagram of screening entity values according to an embodiment of the present disclosure. Referring to fig. 8, in a shopping guide scenario, the configurator configures the intention "consult mobile phone details", whose associated entities include the entity "mobile phone model" and so on. The initial question "I want to ask about the configuration of the iphone12ProMax" is acquired in the first-round dialogue of the multi-round dialogue. The enumeration type entity "mobile phone model" is identified from the initial question, and three entity values can be extracted: iphone12, iphone12Pro and iphone12ProMax. When the final entity value is selected for combination, the string lengths of the three entity values are found to differ, so the entity value with the longest string, iphone12ProMax, is selected for combination; that is, entity combination candidate question 3 is retained, and entity combination candidate questions 1 and 2 are filtered out.
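For illustration only, the following Python sketch shows the screening rule described above: prefer the entity value with the longest string and, for equal lengths, the one with the earliest start position; the example values follow fig. 8 and the function name is hypothetical.

```python
def select_entity_value(question: str, values):
    """values: candidate entity values extracted for one entity, e.g. 'mobile phone model'."""
    candidates = []
    for value in values:
        pos = question.find(value)
        if pos != -1:
            candidates.append((value, len(value), pos))
    # Longest string first; for equal lengths, the earliest start position wins.
    candidates.sort(key=lambda c: (-c[1], c[2]))
    return candidates[0][0] if candidates else None

question = "I want to ask about the configuration of the iphone12ProMax"
print(select_entity_value(question, ["iphone12", "iphone12Pro", "iphone12ProMax"]))
# -> iphone12ProMax
```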
In a possible implementation manner, the expanding the entity value of the description type entity in the reference example sentence in the step S45 to obtain an expanded example sentence may include:
acquiring other entity values except the entity value currently appearing in the reference example sentence from a plurality of entity values configured for the descriptive entity in advance; and sequentially replacing the current entity value in the reference example sentence with the obtained other entity values to respectively obtain corresponding extended example sentences.
In the embodiment of the disclosure, the candidate questions obtained from the preset type, enumeration type and regular type entities serve as sentences to be matched for intention recognition. For description type entities, expanded example sentences are obtained by expanding the phrasing of the configured reference example sentences, and these also serve as sentences to be matched for intention recognition. This enriches the sentences to be matched and improves the accuracy of intention recognition.
Fig. 9 schematically illustrates a schematic diagram of description type entity expansion according to an embodiment of the present disclosure. Referring to fig. 9, the intention "self-help health diagnosis" configured in a medical scenario includes one reference example sentence: "I have had a cough for a week and it has not gotten better." Here "cough" is configured as a description type entity, the corresponding entity name is "user symptom", and the corresponding entity values include: cough, runny nose, insomnia and headache. For the description type entity "user symptom", all configured entity values are enumerated, and the current entity value "cough" in the reference example sentence is replaced in turn with the other 3 entity values, giving 3 expanded example sentences. After these 3 expanded example sentences are supplemented into the intention, the reference example sentences are expanded from 1 to 4 and cover all the enumerated user symptoms, and the expanded example sentences then serve as sentences to be matched for intention recognition.
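For illustration only, the following Python sketch reproduces the expansion of fig. 9: the current entity value of the description type entity is replaced in turn with every other configured value; the function name is hypothetical.

```python
def expand_reference_sentence(reference, current_value, configured_values):
    """Replace the current entity value with every other configured value in turn."""
    return [reference.replace(current_value, value)
            for value in configured_values if value != current_value]

reference = "I have had a cough for a week and it has not gotten better."
symptom_values = ["cough", "runny nose", "insomnia", "headache"]   # entity "user symptom"
for sentence in expand_reference_sentence(reference, "cough", symptom_values):
    print(sentence)   # 3 expanded example sentences, giving 4 example sentences in total
```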
The method provided by the embodiments of the present disclosure can be executed by a first recognition model trained in advance, and training data in the form of triplets is constructed under a Pairwise framework before the multi-round dialogue starts. Each triplet comprises a first text string, a second text string and a third text string, and the training objective is that the similarity between the first text string and the second text string is higher than the similarity between the first text string and the third text string.
FIG. 10 schematically illustrates a flow chart of training the first recognition model according to an embodiment of the present disclosure. Referring to fig. 10, training the first recognition model with triplet training data constructed under the Pairwise framework before the multi-round dialogue starts may include the following steps:
S101: constructing triplet training data containing entities under the Pairwise framework;
The Pairwise framework trains on ternary data, so triplet training data needs to be constructed in advance. Of course, other frameworks, such as the Pointwise single-document approach that trains on binary data, are also possible; the embodiments of the present disclosure do not specifically limit this. The following description takes the Pairwise framework as an example.
S102: reserving a plurality of space occupying entities in a dictionary, and setting corresponding space occupying codes according to the arrangement sequence of the corresponding space occupying entities in the dictionary;
In general, since it is almost impossible to collect all entities in all fields, and users may configure custom entities, the embodiments of the present disclosure treat entities as symbols that carry no linguistic meaning but are important for text similarity calculation. The placeholder entities reserved in the dictionary are not fixed to any particular entity; they are shared by all entities. The number of reserved placeholder entities can be set as required, for example 8, 10 or 20.
For example, 10 placeholder entities are reserved in the dictionary; no entity names or entity types need to be configured for them, and only the corresponding placeholder codes need to be set. Accordingly, the placeholder codes may be 10 consecutive codes, such as id1 to id10, with the 10 placeholder entities in one-to-one correspondence with the 10 placeholder codes.
S103: sorting the entities in the constructed training data, and sequentially replacing the sorted entities with corresponding space occupying codes according to the ranking order;
The entity ordering rule can be set as required; the specific rule does not matter, as long as the entities are ordered according to a consistent strategy. For example, the entities may be sorted by the initial letters of their names, and so on. Taking 10 placeholder entities corresponding to 10 placeholder codes as an example, if 3 entities exist in the constructed training data, they are, after sorting, replaced in turn with the first 3 placeholder codes in the dictionary.
S104: converting other characters in the training data into character codes according to a conversion function;
s105: and training the first recognition model by using the encoded training data.
In this process, based on the placeholder entities reserved in the dictionary and their corresponding placeholder codes, the conversion from entities to codes is completed, so the first recognition model can learn the meaning that entities have in common, i.e. the importance of entities rather than the linguistic meaning of each individual entity. This ensures that the first recognition model migrates well to new scenarios and gives it low-threshold migration capability across scenarios. Moreover, the training data is constructed as triplets under the Pairwise framework and training is then performed on the encoded training data, so the first recognition model can reach the learning objective of the Pairwise framework, providing a strong guarantee for subsequent intention recognition and a reliable basis for improving its accuracy.
In embodiments of the present disclosure, the triplet training data may be represented as (S1, S2, S3), where S1 is the first text string, S2 is the second text string, and S3 is the third text string. The learning objective of the Pairwise framework is set as follows: the similarity S12 between the first text string S1 and the second text string S2 is higher than the similarity S13 between the first text string S1 and the third text string S3, expressed as:
S12 > S13
where the similarity of two text strings can be expressed as:
Sij = Sim(Si, Sj)
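For illustration only, the following Python sketch expresses the Pairwise objective S12 > S13 as a margin ranking loss over cosine similarities; the use of PyTorch, the margin value and the toy encoder are assumptions and are not prescribed by the embodiments.

```python
import torch
import torch.nn.functional as F

def pairwise_triplet_loss(encoder, s1, s2, s3, margin=0.2):
    """Push Sim(S1, S2) above Sim(S1, S3) by at least `margin`."""
    e1, e2, e3 = encoder(s1), encoder(s2), encoder(s3)
    sim_12 = F.cosine_similarity(e1, e2, dim=-1)
    sim_13 = F.cosine_similarity(e1, e3, dim=-1)
    # Zero loss once sim_12 exceeds sim_13 by the margin; otherwise penalize the gap.
    return F.relu(margin - (sim_12 - sim_13)).mean()

# Toy usage with a dummy bag-of-ids encoder (an embedding mean); real training would use
# the encoded triplets described below and the model's own encoder.
embedding = torch.nn.Embedding(200, 32)
encoder = lambda ids: embedding(ids).mean(dim=0)
s1, s2, s3 = torch.tensor([1, 3, 101]), torch.tensor([104, 2, 1]), torch.tensor([110, 111])
print(pairwise_triplet_loss(encoder, s1, s2, s3).item())
```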
in one possible implementation manner, the step S101 may include:
firstly, a plain-text triplet (S1, S2, S3) is constructed; if the first text string S1 and the second text string S2 contain the same entity and the first text string S1 and the third text string S3 do not contain the same entity, a specified number of entities are randomly selected from the entities shared by S1 and S2 and replaced with entity symbols, to obtain entity-containing triplet training data. Constructing training data in this way lets the first recognition model learn that entities are important words in the text strings and that two text strings sharing the same entity have a higher similarity.
For example, the plain-text triplet is (I want to book a ticket today, I want a ticket today, I want to book a ticket for the next day), where S1 and S2 contain the same entity "today" and S3 does not. The shared entity "today" is then replaced with an entity symbol, giving the entity-containing triplet (I want to book a ticket $time$, I want a ticket $time$, I want to book a ticket for the next day). The entity symbol may be set as required and may represent the corresponding entity; its specific content is not limited.
In one possible implementation manner, the step S101 may include:
firstly, a plain-text triplet (S1, S2, S3) is constructed; if the first text string S1 and the second text string S2 contain a plurality of identical entities, a specified number of entities are selected from those identical entities, entities different from the selected ones are found in S3, and these entities are replaced with entity symbols, to obtain entity-containing triplet training data. Constructing training data in this way lets the first recognition model learn that text strings sharing the same entity have a higher similarity than text strings with different entities.
For example, the plain-text triplet is (I want to book today's air ticket, I want to buy today's air ticket, I want to book an air ticket for the next day), where S1 and S2 contain two identical entities, "today" and "air ticket". In S1 and S2, the identical entity "today" is selected and replaced with the entity symbol $time$, and in S3 an entity different from "today", such as "air ticket", is found and replaced with the entity symbol $ticket_type$, giving the entity-containing triplet (I want to book the air ticket for $time$, I want to buy the air ticket for $time$, I want to book a $ticket_type$ for the next day). The entity symbol may be set as required and may represent the corresponding entity; its specific content is not limited.
In one possible implementation manner, the step S101 may include:
first, a plain-text triplet (S1, S2, S3) is constructed; if the first text string S1 and the second text string S2 contain a plurality of identical entities, at least two of those identical entities are selected and replaced with entity symbols. If an entity exists in S3 that is identical to one of those at least two entities, it is also replaced with its entity symbol, to obtain entity-containing triplet training data. Constructing training data in this way lets the first recognition model learn that text strings sharing a larger number of identical entities have a higher similarity than text strings sharing fewer.
For example, the plain-text triplet is (I want to book today's air ticket, I want to buy today's air ticket, I want to book an air ticket for the next day), where S1 and S2 contain two identical entities, "today" and "air ticket", which are selected and replaced with the corresponding entity symbols $time$ and $ticket_type$ respectively. Since the entity "air ticket" in S3 is identical to one of those two entities, it is also replaced with the entity symbol $ticket_type$, giving the entity-containing triplet (I want to book the $ticket_type$ for $time$, I want to buy the $ticket_type$ for $time$, I want to book a $ticket_type$ for the next day). The entity symbol may be set as required and may represent the corresponding entity; its specific content is not limited.
At least one of the above three ways of constructing entity-containing triplet training data may be used, and preferably all of them are applied in turn, so that the training data can be expanded to the greatest extent and the training effect of the first recognition model improved. Moreover, constructing training data in this way requires neither collecting additional corpora nor manual annotation, which greatly saves manpower.
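For illustration only, the following Python sketch shows the first of the three construction rules: when S1 and S2 share an entity that S3 lacks, the shared entity is replaced with its entity symbol; the entity symbol $time$ follows the example above and the function name is hypothetical.

```python
def build_entity_triplet(s1, s2, s3, shared_entity, entity_symbol):
    """If S1 and S2 share `shared_entity` and S3 does not, replace it with its symbol."""
    if shared_entity in s1 and shared_entity in s2 and shared_entity not in s3:
        return (s1.replace(shared_entity, entity_symbol),
                s2.replace(shared_entity, entity_symbol),
                s3)
    return (s1, s2, s3)

print(build_entity_triplet("I want to book a ticket today",
                           "I want a ticket today",
                           "I want to book a ticket for the next day",
                           shared_entity="today", entity_symbol="$time$"))
```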
In the embodiments of the present disclosure, encoded training data is obtained after steps S103 and S104. A specific example follows. Suppose the constructed training data (S1, S2, S3) contains three entities: brand, product name and quantity, replaced by the entity symbols $brand$, $product$ and $num$ respectively. The replaced S1 and S2 are as follows:
S1 = "the production date of the $brand$ $product$";
S2 = "I want to buy $num$ $brand$ masks";
After the three entities are sorted according to the preset rule, their order is $brand$, $num$, $product$, and they are replaced in turn with the corresponding placeholder codes 1, 2 and 3 in the dictionary according to this rank order. The remaining characters are then converted into character codes according to the conversion function, giving the following codes for S1 and S2:
The code corresponding to S1 = "1, 3, f(c1), f(c2), …", where c1, c2, … are the remaining characters of S1 ("the production date of");
The code corresponding to S2 = "f(c1), f(c2), f(c3), 2, f(c4), 1, f(c5), …", where the characters around the placeholder codes 2 and 1 are the remaining characters of S2 ("I want to buy … masks");
where f(x) is the function that converts a character into its code id.
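For illustration only, the following Python sketch mirrors steps S102 to S104 on the example above: sorted entity symbols are mapped to reserved placeholder codes, and every remaining token goes through the conversion function f (applied per token here for brevity, whereas the embodiment applies it per character); the code values are assumptions.

```python
PLACEHOLDER_CODES = list(range(1, 11))   # 10 reserved placeholder codes, id1..id10
CHAR_VOCAB = {}                          # toy vocabulary, filled lazily

def f(token):
    """Toy stand-in for the character-to-code conversion function f(x)."""
    return CHAR_VOCAB.setdefault(token, 100 + len(CHAR_VOCAB))

def encode(tokens, entity_symbols):
    """Map the sorted entity symbols to placeholder codes and every other token through f."""
    rank = {symbol: PLACEHOLDER_CODES[i] for i, symbol in enumerate(sorted(entity_symbols))}
    return [rank[token] if token in rank else f(token) for token in tokens]

entities = ["$brand$", "$num$", "$product$"]          # sorted order gives codes 1, 2, 3
s2_tokens = ["I", "want", "to", "buy", "$num$", "$brand$", "masks"]
print(encode(s2_tokens, entities))                    # [100, 101, 102, 103, 2, 1, 104]
```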
According to the embodiments of the present disclosure, through this conversion from entities to codes, the first recognition model learns the meaning entities have in common, i.e. the importance of entities rather than the linguistic meaning of each individual entity. This ensures that the first recognition model migrates well to new scenarios and gives it low-threshold migration capability across scenarios.
In an embodiment of the disclosure, the step S105 may include at least one of:
if the first text string and the second text string in the triplet contain the same entity and the first text string and the third text string do not contain the same entity, training a first recognition model to recognize the similarity of the first text string and the second text string, wherein the similarity is higher than that of the first text string and the third text string;
selecting a specified number of entities from a plurality of identical entities in the case that the first text string and the second text string in the triplet include the plurality of identical entities; finding out entities different from the specified number of entities in the third text strings of the triples, and training a first recognition model to recognize the similarity of the first text strings and the second text strings, wherein the similarity is higher than that of the first text strings and the third text strings;
In the case that the first text string and the second text string in the triplet contain a plurality of identical entities, selecting at least two entities from the plurality of identical entities; if an entity exists in the third text string of the triplet and is identical to one entity of the at least two entities, training the first recognition model to recognize the similarity of the first text string and the second text string higher than the similarity of the first text string and the third text string.
In one possible embodiment, the above method further comprises at least one of:
receiving configuration information input by a configuration person, generating an enumeration type entity according to the configuration information, and configuring an entity name and an enumerated entity value for the enumeration type entity;
receiving configuration information input by a configuration person, generating a regular type entity according to the configuration information, and configuring an entity name and a regular expression for the regular type entity;
and receiving configuration information input by a configuration person, generating a description type entity according to the configuration information, and configuring an entity name and a descriptive entity value for the description type entity.
With the above method, configuration personnel can set enumeration type, regular type and description type entities according to business requirements. This is very flexible: each configuration can be used in the target business scenario without retraining the model and can be migrated to any field, so the scene migration capability is very strong.
Fig. 11 schematically illustrates an overall flow diagram of a multi-round dialogue according to an embodiment of the present disclosure. Referring to fig. 11, the intention recognition of the multi-round dialogue includes a configuration flow and a recognition flow. The configuration flow comprises configuring intentions and configuring entities, and the training data obtained through configuration is used to train the first recognition model. In the recognition flow, the trained first recognition model performs intention recognition on the multi-round dialogue initiated by the user, obtaining the user's intention and the related entity information, and the corresponding reply or response is produced through session management and reply generation. For example, the result of a health diagnosis may be returned to the user in a health scenario, and the shipping time may be returned to the user in an after-sales scenario. An operation to book an air ticket may be triggered in response in a ticket-booking scenario, and an operation to issue an invoice to the user may be triggered in an invoicing scenario, and so on. The intention recognition performed by the first recognition model includes: identification of candidate entities, combination of candidate entities, and intention recognition combined with the entities. Through this flow, the intention recognition and response of the multi-round dialogue are finally completed.
Exemplary Medium
Having described the method of an exemplary embodiment of the present disclosure, next, a medium of an exemplary embodiment of the present disclosure will be described with reference to fig. 12.
In some possible implementations, aspects of the present disclosure may also be implemented as a computer-readable medium having stored thereon a program for implementing the steps in a method for intent recognition in a multi-round dialog according to various exemplary embodiments of the present disclosure described in the above-described "exemplary methods" section of this specification when the program is executed by a processor.
Specifically, the processor is configured to implement the following steps when executing the program: acquiring an initial question included in a first-round dialogue in multiple rounds of dialogues; determining a plurality of candidate entities corresponding to the initial question; generating a candidate question based on the initial question and the plurality of candidate entities; and determining the intention recognition result of the first-round dialogue according to the candidate question sentence and the reference example sentence which is pre-configured with the explicit intention.
It should be noted that: the medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As shown in fig. 12, a medium 120 is depicted that may employ a portable compact disc read-only memory (CD-ROM) and that includes a program and that may run on a device, in accordance with an embodiment of the present disclosure. However, the disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take many forms, including, but not limited to: electromagnetic signals, optical signals, or any suitable combination of the preceding. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the context of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).
Exemplary apparatus
Having described the media of the exemplary embodiments of the present disclosure, next, an apparatus of the exemplary embodiments of the present disclosure will be described with reference to fig. 13.
As shown in fig. 13, an apparatus for intent recognition in a multi-round dialog of an embodiment of the present disclosure may include:
an obtaining module 1301, configured to obtain an initial question included in a first-round dialogue in a multi-round dialogue;
a determining module 1302 for determining a plurality of candidate entities corresponding to the initial question;
a generating module 1303, configured to generate a candidate question based on the initial question and a plurality of candidate entities;
and a recognition module 1304 for determining an intention recognition result of the first-round dialogue according to the candidate question sentence and the reference example sentence which is pre-configured with the explicit intention.
In one possible implementation, the types of the candidate entities include: a preset type entity, an enumeration type entity, a regular type entity and a description type entity, wherein the preset type entity is a pre-configured and directly packaged entity type, the enumeration type entity is an entity type whose entity values can be enumerated, the regular type entity is an entity type whose entity values can be summarized by a regular expression, and the description type entity is an entity type describing the attributes or state of an object.
In one possible implementation, the determining module is configured to:
using a first recognition model trained in advance, a plurality of candidate entities corresponding to the initial question are recognized, the plurality of candidate entities including a preset type entity, an enumerated type entity, and a regular type entity.
In one possible implementation, the generating module includes:
a combination unit, configured to combine the plurality of candidate entities to form a plurality of candidate entity sets corresponding to the initial question, where the candidate entity sets have at least one candidate entity as an element thereof;
and the generating unit is used for respectively configuring the elements in each candidate entity set in the corresponding positions of the initial question to form a plurality of candidate questions corresponding to the initial question.
In a possible embodiment, the generating unit is configured to:
and respectively configuring elements in each candidate entity set in corresponding positions of the initial question, and filtering out sentences with the number of elements less than the appointed number to obtain a plurality of candidate question sentences corresponding to the initial question.
In a possible embodiment, the generating unit is configured to:
elements in each candidate entity set are respectively configured in corresponding positions of the initial question, sentences which do not appear in the reference example sentences which are pre-configured with clear intention are filtered, and a plurality of candidate question sentences corresponding to the initial question are obtained.
In a possible embodiment, the generating unit is configured to:
and respectively configuring elements in each candidate entity set in corresponding positions of the initial question, limiting the currently obtained sentences when the total number of the obtained sentences reaches a preset upper limit value, and randomly selecting a plurality of sentences as candidate question sentences corresponding to the initial question.
In a possible embodiment, the combination unit is further configured to:
for a candidate entity with a plurality of entity values, acquiring the character string length of each entity value corresponding to the candidate entity, determining the initial position of the corresponding entity value in the initial question, and screening out the corresponding entity value for combination according to the character string length and the initial position.
In one possible implementation, the combining unit is specifically configured to, when screening out the corresponding entity values for combining according to the string length and the start position:
if the character string lengths of the plurality of entity values of the candidate entity are different, selecting the entity value with the longest character string length for combination;
if the character strings of the plurality of entity values of the candidate entity are the same in length, the entity value with the forefront initial position is selected for combination under the condition that the character strings contained in the plurality of entity values are overlapped.
In one possible implementation, the identification module includes:
an expansion unit, configured to, if a reference example sentence with an explicit intention is preset, expand an entity value of a description type entity in the reference example sentence to obtain an expanded example sentence, and supplement the obtained expanded example sentence to the intention;
and the recognition unit is used for determining the intention recognition result of the first-round dialogue according to the candidate question sentence, the reference example sentence and the extension example sentence of the intention.
In one possible embodiment, the expansion unit is configured to:
if the reference example sentence with the explicit intention is pre-configured, acquiring other entity values except the entity value currently appearing in the reference example sentence from a plurality of entity values pre-configured for the descriptive entity;
replacing the current entity value in the reference example sentence with the obtained other entity values in turn to respectively obtain corresponding extended example sentences;
and supplementing the obtained extended example sentence into the intention.
In one possible embodiment, the identification module is configured to:
encoding candidate questions and reference example sentences which are pre-configured with clear intention respectively;
Performing similarity calculation on the encoded candidate question and the reference example sentence;
and determining the intention recognition result of the first-round dialogue according to the result of the similarity calculation.
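For illustration only, the following Python sketch shows one way the recognition module's flow could look: encode the candidate questions and the reference and expanded example sentences, compute similarities, and take the intention of the best match; encode_text stands in for the trained first recognition model's encoder, and the use of cosine similarity is an assumption.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def recognize_intent(candidate_questions, examples_by_intent, encode_text):
    """Return the best-matching intention, the winning candidate question and its score."""
    best = (None, None, -1.0)
    for question in candidate_questions:
        q_vec = encode_text(question)
        for intent, examples in examples_by_intent.items():
            for example in examples:           # reference plus expanded example sentences
                score = cosine(q_vec, encode_text(example))
                if score > best[2]:
                    best = (intent, question, score)
    # The entity combination of the winning question also serves as the entity result.
    return best
```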
In a possible implementation manner, the apparatus performs functions by using a first recognition model trained in advance, and the apparatus further includes:
and the training module is used for training the first recognition model by adopting training data of the pairing frame construction triplet before the multi-round dialogue starts.
In one possible implementation, the triplet includes a first text string, a second text string, and a third text string, the training targeting a similarity of the first text string and the second text string that is higher than a similarity of the first text string and the third text string.
In one possible implementation, the training module is configured to:
before the multi-turn conversation starts, if the first text string and the second text string in the triplet contain the same entity and the first text string and the third text string do not have the same entity, training the first recognition model to recognize the similarity of the first text string and the second text string, wherein the similarity is higher than the similarity of the first text string and the third text string.
In one possible implementation, the training module is configured to:
Before the start of the multi-round dialogue, selecting a specified number of entities from a plurality of identical entities under the condition that the first text string and the second text string in the triplet contain the identical entities;
and finding out entities which are different from the specified number of entities in the third text string of the triplet, and training a first recognition model to recognize the similarity of the first text string and the second text string, wherein the similarity is higher than that of the first text string and the third text string.
In one possible implementation, the training module is configured to:
before the start of the multi-round dialogue, selecting at least two entities from the plurality of identical entities under the condition that the first text string and the second text string in the triplet contain the plurality of identical entities;
if an entity exists in the third text string of the triplet and is identical to one entity of the at least two entities, training the first recognition model to recognize the similarity of the first text string and the second text string higher than the similarity of the first text string and the third text string.
In one possible embodiment, the apparatus further includes:
the coding module is used for coding the constructed training data before training the first recognition model in the following way:
Reserving a plurality of space occupying entities in a dictionary, and setting corresponding space occupying codes according to the arrangement sequence of the corresponding space occupying entities in the dictionary;
sorting the entities in the constructed training data, and sequentially replacing the sorted entities with corresponding space occupying codes according to the ranking order;
and converting other characters in the training data into character codes according to the conversion function.
In one possible implementation, the identification module is further configured to:
after the similarity calculation, the entity combination in the candidate question corresponding to the highest similarity is determined as the entity identification result of the first-round dialogue.
In one possible embodiment, the apparatus further includes:
the collecting module is used for collecting other entities except entity combinations in the conversation after the first-round conversation;
and the response module is used for executing reply response of multiple rounds of conversations according to the intention recognition result, the entity combination and other entities.
In a possible implementation manner, the apparatus further includes a configuration module, configured in at least one of the following manners:
receiving configuration information input by a configuration person, generating an enumeration type entity according to the configuration information, and configuring an entity name and an enumerated entity value for the enumeration type entity;
Receiving configuration information input by a configuration person, generating a regular type entity according to the configuration information, and configuring an entity name and a regular expression for the regular type entity;
and receiving configuration information input by a configuration person, generating a description type entity according to the configuration information, and configuring an entity name and a descriptive entity value for the description type entity.
According to the apparatus provided by the embodiments of the present disclosure, the initial question included in the first-round dialogue of a multi-round dialogue is acquired, a plurality of candidate entities corresponding to the initial question are determined, candidate questions are generated based on the initial question and the plurality of candidate entities, and the intention recognition result of the first-round dialogue is determined according to the candidate questions and the reference example sentences pre-configured with explicit intentions. Because intention recognition is performed on the basis of the recognized candidate entities, intention recognition and entity recognition are effectively combined; compared with performing intention recognition alone, the influence of entities is taken into account, so the accuracy of intention recognition in multi-round dialogues is greatly improved.
Exemplary computing device
Having described the methods, media, and apparatus of exemplary embodiments of the present disclosure, a computing device of exemplary embodiments of the present disclosure is next described with reference to fig. 14.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit", "module" or "system".
In some possible implementations, a computing device according to embodiments of the present disclosure may include at least one processing unit and at least one storage unit. Wherein the storage unit stores program code that, when executed by the processing unit, causes the processing unit to perform steps in the method of intent recognition in a multi-turn conversation according to various exemplary embodiments of the present disclosure described in the section of the "exemplary method" above.
A computing device 140 according to such an implementation of the present disclosure is described below with reference to fig. 14. The computing device 140 shown in fig. 14 is merely an example and should not be taken as limiting the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 14, computing device 140 is in the form of a general purpose computing device. Components of computing device 140 may include, but are not limited to: the at least one processing unit 1401, the at least one storage unit 1402, and a bus 1403 connecting different system components (including the processing unit 1401 and the storage unit 1402).
Bus 1403 includes a data bus, a control bus, and an address bus.
The storage unit 1402 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 14021 and/or cache memory 14022, and may further include readable media in the form of non-volatile memory, such as Read Only Memory (ROM) 14023.
The storage unit 1402 may also include a program/utility 14025 having a set (at least one) of program modules 14024, such program modules 14024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Computing device 140 can also communicate with one or more external devices 1404 (e.g., keyboard, pointing device, etc.). Such communication may occur through an input/output (I/O) interface 1405. Moreover, computing device 140 may also communicate with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 1406. As shown in FIG. 14, the network adapter 1406 communicates with other modules of the computing device 140 over a bus 1403. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computing device 140, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be noted that although the above detailed description mentions several units/modules or sub-units/sub-modules of the apparatus for intention recognition in a multi-round dialogue, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the present disclosure, the features and functionality of two or more units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.
Furthermore, although the operations of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the disclosure is not limited to the particular embodiments disclosed, nor does the division into aspects imply that features in these aspects cannot be combined to advantage; this division is made for convenience of description only. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (26)

1. A method for intent recognition in a multi-round dialog, comprising:
acquiring an initial question included in a first-round dialogue in multiple rounds of dialogues;
determining a plurality of candidate entities corresponding to the initial question;
combining the plurality of candidate entities to form a plurality of candidate entity sets corresponding to the initial question, wherein the candidate entity sets have at least one candidate entity as an element thereof;
respectively configuring elements in each candidate entity set in corresponding positions of the initial question, filtering out sentences with the number of elements being less than the designated number or sentences which do not appear in the reference example sentences configured with clear intention in advance, and obtaining a plurality of candidate question sentences corresponding to the initial question; or, respectively configuring elements in each candidate entity set at corresponding positions of the initial question, and randomly selecting a plurality of sentences as candidate questions corresponding to the initial question by limiting the currently obtained sentences when the total number of the obtained sentences reaches a preset upper limit value;
determining an intention recognition result of the first-round dialogue according to the candidate question sentence and a reference example sentence which is pre-configured with clear intention;
If a reference example sentence with an explicit intention is preset, expanding the entity value of the descriptive entity in the reference example sentence to obtain an expanded example sentence, supplementing the obtained expanded example sentence into the intention, and determining an intention recognition result of the first-round dialogue according to the candidate question sentence, the reference example sentence of the intention and the expanded example sentence; the descriptive entity is an entity type for describing the attribute or state of the object;
expanding the entity value of the descriptive entity in the reference example sentence to obtain an expanded example sentence, wherein the expanding example sentence comprises the following steps:
acquiring other entity values except the entity value currently appearing in the reference example sentence from a plurality of entity values configured for the descriptive entity in advance; sequentially replacing the current entity value in the reference example sentence with the other acquired entity values to respectively obtain corresponding extended example sentences;
the method is performed by a first recognition model which is trained in advance, training data of a pairing frame construction triplet is adopted to train the first recognition model before the multi-round dialogue starts, and the constructed training data is encoded before the first recognition model is trained in the following mode:
Reserving a plurality of space occupying entities in a dictionary, and setting corresponding space occupying codes according to the arrangement sequence of the corresponding space occupying entities in the dictionary; sorting the entities in the constructed training data, and sequentially replacing the sorted entities with corresponding space occupying codes according to the ranking order; converting other characters in the training data into character codes according to a conversion function;
wherein the types of the candidate entities include: a preset type entity, an enumeration type entity, a regular type entity and a description type entity, wherein the preset type entity is a pre-configured and directly packaged entity type, the enumeration type entity is an entity type whose entity values can be enumerated, and the regular type entity is an entity type whose entity values can be summarized by a regular expression.
2. The method of claim 1, wherein the determining a plurality of candidate entities corresponding to the initial question comprises:
and identifying a plurality of candidate entities corresponding to the initial question by using the first identification model, wherein the plurality of candidate entities comprise preset type entities, enumeration type entities and regular type entities.
3. The method of claim 1, further comprising, prior to combining the plurality of candidate entities to form a plurality of candidate entity sets corresponding to the initial question:
And for the candidate entity with a plurality of entity values, acquiring the character string length of each entity value corresponding to the candidate entity, determining the initial position of the corresponding entity value in the initial question, and screening out the corresponding entity value for combination according to the character string length and the initial position.
4. A method according to claim 3, wherein said screening out the corresponding entity values for combining based on the string length and the start position comprises:
if the character string lengths of the plurality of entity values of the candidate entity are different, selecting the entity value with the longest character string length for combination;
if the character strings of the plurality of entity values of the candidate entity are the same in length, the entity value with the forefront initial position is selected for combination under the condition that the character strings contained in the plurality of entity values are overlapped.
5. The method of claim 1, wherein the determining the intention recognition result of the first-round dialog according to the candidate question and the reference example sentence pre-configured with an explicit intention comprises:
encoding the candidate questions and the reference example sentences which are pre-configured with clear intention respectively;
Performing similarity calculation on the encoded candidate question and the reference example sentence;
and determining an intention recognition result of the first-round dialogue according to the result of the similarity calculation.
6. The method of claim 1, wherein the triplet includes a first text string, a second text string, and a third text string, the training targeting similarity of the first text string and the second text string higher than similarity of the first text string and the third text string.
7. The method of claim 6, wherein training the first recognition model using training data of paired frame construction triples comprises:
and if the first text string and the second text string in the triplet contain the same entity and the first text string and the third text string do not have the same entity, training the first recognition model to recognize the similarity of the first text string and the second text string, wherein the similarity is higher than the similarity of the first text string and the third text string.
8. The method of claim 6, wherein training the first recognition model using training data of paired frame construction triples comprises:
Selecting a specified number of entities from a plurality of identical entities in the case that the first text string and the second text string in the triplet contain the identical entities;
and finding out entities which are different from the specified number of entities in the third text string of the triplet, and training the first recognition model to recognize the similarity of the first text string and the second text string, wherein the similarity is higher than the similarity of the first text string and the third text string.
9. The method of claim 6, wherein training the first recognition model using the training data of triplets constructed by the pairing framework comprises:
in the case that the first text string and the second text string in the triplet contain a plurality of identical entities, selecting at least two entities from the plurality of identical entities;
if the third text string of the triplet contains an entity identical to one of the at least two entities, training the first recognition model to recognize that the similarity of the first text string and the second text string is higher than the similarity of the first text string and the third text string.
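Claims 6-9 define a relative target over triplets: the first and second text strings (which share entities) must score more similar than the first and third (which differ in at least one entity). A margin-based triplet loss is one common way to express such a target; the PyTorch sketch below is an assumption about how it could be trained, not the patent's exact procedure.

```python
# Illustrative margin-based triplet objective for the target in claims 6-9
# (an assumption; the patent only states the relative-similarity target).
import torch
import torch.nn.functional as F

def triplet_loss(first: torch.Tensor, second: torch.Tensor, third: torch.Tensor,
                 margin: float = 0.2) -> torch.Tensor:
    """Push sim(first, second) above sim(first, third) by at least `margin`.
    Inputs are encoded text strings of shape (batch, dim)."""
    sim_pos = F.cosine_similarity(first, second, dim=-1)
    sim_neg = F.cosine_similarity(first, third, dim=-1)
    return F.relu(margin - (sim_pos - sim_neg)).mean()
```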
10. The method of claim 5, wherein the method further comprises:
after the similarity calculation, determining the entity combination in the candidate question corresponding to the highest similarity as the entity recognition result of the first-round dialogue.
11. The method of claim 10, wherein the method further comprises:
collecting, in dialogues after the first-round dialogue, entities other than the entity combination;
and executing a reply or response of the multi-round dialogue according to the intention recognition result, the entity combination and the other entities.
12. The method of claim 1, further comprising at least one of:
receiving configuration information input by configuration personnel, generating an enumeration type entity according to the configuration information, and configuring an entity name and enumerated entity values for the enumeration type entity;
receiving configuration information input by configuration personnel, generating a regular type entity according to the configuration information, and configuring an entity name and a regular expression for the regular type entity;
and receiving configuration information input by configuration personnel, generating a description type entity according to the configuration information, and configuring an entity name and descriptive entity values for the description type entity.
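The three configurable entity types in claim 12 could be captured by configuration records such as the following; the field names and sample values are made up for illustration.

```python
# Hypothetical configuration records for the entity types of claim 12.
entity_configs = [
    {   # enumeration type entity: entity name plus enumerated entity values
        "type": "enumeration",
        "name": "city",
        "values": ["Beijing", "Shanghai", "Hangzhou"],
    },
    {   # regular type entity: entity name plus a regular expression for its values
        "type": "regex",
        "name": "phone_number",
        "pattern": r"1\d{10}",
    },
    {   # description type entity: entity name plus descriptive entity values
        "type": "description",
        "name": "room_state",
        "values": ["too cold", "too noisy", "not clean"],
    },
]
```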
13. An apparatus for intent recognition in a multi-round conversation, comprising:
an acquisition module, used for acquiring an initial question contained in a first-round dialogue of the multi-round dialogue;
a determining module for determining a plurality of candidate entities corresponding to the initial question;
a generating module, configured to combine the plurality of candidate entities to form a plurality of candidate entity sets corresponding to the initial question, where each candidate entity set has at least one candidate entity as an element thereof; and to respectively configure the elements in each candidate entity set at corresponding positions of the initial question and take the obtained sentences as candidate questions corresponding to the initial question; or, respectively configure the elements in each candidate entity set at corresponding positions of the initial question, and, when the total number of the obtained sentences reaches a preset upper limit value, limit the currently obtained sentences and randomly select a plurality of sentences from them as the candidate questions corresponding to the initial question;
a recognition module, used for determining an intention recognition result of the first-round dialogue according to the candidate questions and a reference example sentence pre-configured with an explicit intention;
wherein the recognition module comprises:
an expansion unit, configured to, if a reference example sentence with an explicit intention is pre-configured, expand an entity value of a description type entity in the reference example sentence to obtain an expanded example sentence, and supplement the obtained expanded example sentence into the intention, where the description type entity is an entity type for describing an attribute or state of an object;
a recognition unit, used for determining the intention recognition result of the first-round dialogue according to the candidate questions, the reference example sentence and the expanded example sentence of the intention;
wherein the expansion unit is used for:
if a reference example sentence with an explicit intention is pre-configured, acquiring, from a plurality of entity values pre-configured for the description type entity, the entity values other than the entity value currently appearing in the reference example sentence; sequentially replacing the current entity value in the reference example sentence with the other acquired entity values to obtain corresponding expanded example sentences; and supplementing the obtained expanded example sentences into the intention;
wherein the apparatus performs its functions through a pre-trained first recognition model, the apparatus further comprising:
a training module, used for training the first recognition model, before the multi-round dialogue starts, by using training data of triplets constructed by a pairing framework;
an encoding module, used for encoding the constructed training data, before the first recognition model is trained, in the following way: reserving a plurality of placeholder entities in a dictionary, and setting corresponding placeholder codes according to the arrangement order of the placeholder entities in the dictionary; sorting the entities in the constructed training data, and sequentially replacing the sorted entities with the corresponding placeholder codes according to their rank order; and converting the other characters in the training data into character codes according to a conversion function;
wherein the types of the candidate entities include: a preset type entity, an enumeration type entity, a regular type entity and a description type entity, where the preset type entity is a pre-configured and directly packaged entity type, the enumeration type entity is an entity type whose entity values can be enumerated, and the regular type entity is an entity type whose entity values can be summarized by a regular expression.
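The encoding module of claim 13 reserves placeholder entities in a dictionary, replaces the sorted entities in the training data with the corresponding placeholder codes, and converts the remaining characters with a conversion function. A rough sketch follows; the dictionary layout, the number of placeholders and the character-conversion function are assumptions for illustration.

```python
# Rough sketch of the placeholder encoding described in claim 13; the layout and the
# character-conversion function are hypothetical.
NUM_PLACEHOLDERS = 8                    # reserved placeholder entities in the dictionary
PLACEHOLDER_BASE = 1                    # placeholder codes occupy ids 1..NUM_PLACEHOLDERS
CHAR_OFFSET = 1 + NUM_PLACEHOLDERS      # ordinary characters are encoded after the placeholders

def encode(text: str, entities: list[str]) -> list[int]:
    """Replace sorted entities with placeholder codes in rank order; convert
    the other characters with an assumed conversion function (offset + ord)."""
    ranked = sorted(set(entities))[:NUM_PLACEHOLDERS]
    codes = {entity: PLACEHOLDER_BASE + rank for rank, entity in enumerate(ranked)}
    ids, i = [], 0
    while i < len(text):
        for entity, code in codes.items():
            if text.startswith(entity, i):
                ids.append(code)                    # whole entity span -> one placeholder code
                i += len(entity)
                break
        else:
            ids.append(CHAR_OFFSET + ord(text[i]))  # non-entity character -> character code
            i += 1
    return ids

print(encode("play Blue and White Porcelain", ["Blue and White Porcelain"]))
```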
14. The apparatus of claim 13, wherein the determining module is configured to:
identifying a plurality of candidate entities corresponding to the initial question by using the pre-trained first recognition model, wherein the plurality of candidate entities include preset type entities, enumeration type entities and regular type entities.
15. The apparatus of claim 13, wherein the generating module is further configured to:
for a candidate entity having a plurality of entity values, acquiring the string length of each entity value of the candidate entity, determining the start position of each entity value in the initial question, and screening out the entity value to be used for combination according to the string length and the start position.
16. The apparatus of claim 15, wherein, when screening out the corresponding entity values for combination according to the string length and the start position, the generating module is specifically configured to:
if the string lengths of the plurality of entity values of the candidate entity differ, select the entity value with the longest string length for combination;
if the string lengths of the plurality of entity values of the candidate entity are the same and the strings of the entity values overlap, select the entity value with the earliest start position for combination.
17. The apparatus of claim 13, wherein the recognition module is configured to:
respectively encode the candidate questions and the reference example sentences pre-configured with an explicit intention;
perform similarity calculation between the encoded candidate questions and the encoded reference example sentences;
and determining an intention recognition result of the first-round dialogue according to the result of the similarity calculation.
18. The apparatus of claim 13, wherein the triplet comprises a first text string, a second text string, and a third text string, and the training target is that the similarity of the first text string and the second text string is higher than the similarity of the first text string and the third text string.
19. The apparatus of claim 18, wherein the training module is configured to:
before the multi-round dialogue starts, if the first text string and the second text string in the triplet contain the same entity and the first text string and the third text string contain no identical entity, train the first recognition model to recognize that the similarity of the first text string and the second text string is higher than the similarity of the first text string and the third text string.
20. The apparatus of claim 18, wherein the training module is configured to:
before the multi-round dialogue starts, in the case that the first text string and the second text string in the triplet contain a plurality of identical entities, select a specified number of entities from the plurality of identical entities;
and find, in the third text string of the triplet, entities different from the specified number of entities, and train the first recognition model to recognize that the similarity of the first text string and the second text string is higher than the similarity of the first text string and the third text string.
21. The apparatus of claim 18, wherein the training module is configured to:
before the multi-round dialogue starts, in the case that the first text string and the second text string in the triplet contain a plurality of identical entities, select at least two entities from the plurality of identical entities;
and if the third text string of the triplet contains an entity identical to one of the at least two entities, train the first recognition model to recognize that the similarity of the first text string and the second text string is higher than the similarity of the first text string and the third text string.
22. The apparatus of claim 17, wherein the recognition module is further configured to:
after the similarity calculation, determine the entity combination in the candidate question corresponding to the highest similarity as the entity recognition result of the first-round dialogue.
23. The apparatus of claim 22, wherein the apparatus further comprises:
a collecting module, used for collecting, in dialogues after the first-round dialogue, entities other than the entity combination;
and a response module, used for executing a reply or response of the multi-round dialogue according to the intention recognition result, the entity combination and the other entities.
24. The apparatus of claim 13, further comprising a configuration module configured to perform at least one of:
receiving configuration information input by configuration personnel, generating an enumeration type entity according to the configuration information, and configuring an entity name and enumerated entity values for the enumeration type entity;
receiving configuration information input by configuration personnel, generating a regular type entity according to the configuration information, and configuring an entity name and a regular expression for the regular type entity;
and receiving configuration information input by configuration personnel, generating a description type entity according to the configuration information, and configuring an entity name and descriptive entity values for the description type entity.
25. A medium storing a computer program which, when executed by a processor, performs the method of any one of claims 1-12.
26. A computing device, comprising:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-12.
CN202110571576.3A 2021-05-25 2021-05-25 Method, medium, apparatus and computing device for intent recognition in multiple rounds of conversations Active CN113157893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110571576.3A CN113157893B (en) 2021-05-25 2021-05-25 Method, medium, apparatus and computing device for intent recognition in multiple rounds of conversations

Publications (2)

Publication Number Publication Date
CN113157893A (en) 2021-07-23
CN113157893B (en) 2023-12-15

Family

ID=76877350

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110571576.3A Active CN113157893B (en) 2021-05-25 2021-05-25 Method, medium, apparatus and computing device for intent recognition in multiple rounds of conversations

Country Status (1)

Country Link
CN (1) CN113157893B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446286A (en) * 2017-02-16 2018-08-24 阿里巴巴集团控股有限公司 A kind of generation method, device and the server of the answer of natural language question sentence
CN110019712A (en) * 2017-12-07 2019-07-16 上海智臻智能网络科技股份有限公司 More intent query method and apparatus, computer equipment and computer readable storage medium
CN111178077A (en) * 2019-12-26 2020-05-19 深圳市优必选科技股份有限公司 Corpus generation method, corpus generation device and intelligent device
CN111177186A (en) * 2019-12-20 2020-05-19 北京淇瑀信息科技有限公司 Question retrieval-based single sentence intention identification method, device and system
CN111191016A (en) * 2019-12-27 2020-05-22 车智互联(北京)科技有限公司 Multi-turn conversation processing method and device and computing equipment
CN111581361A (en) * 2020-04-22 2020-08-25 腾讯科技(深圳)有限公司 Intention identification method and device
CN111832305A (en) * 2020-07-03 2020-10-27 广州小鹏车联网科技有限公司 User intention identification method, device, server and medium
CN111897930A (en) * 2020-06-13 2020-11-06 南京奥拓电子科技有限公司 Automatic question answering method and system, intelligent device and storage medium
CN112035635A (en) * 2020-08-28 2020-12-04 康键信息技术(深圳)有限公司 Medical field intention recognition method, device, equipment and storage medium
CN112182189A (en) * 2020-10-10 2021-01-05 网易(杭州)网络有限公司 Conversation processing method and device, electronic equipment and storage medium
CN112328710A (en) * 2020-11-26 2021-02-05 北京百度网讯科技有限公司 Entity information processing method, entity information processing device, electronic equipment and storage medium
CN112328748A (en) * 2020-11-11 2021-02-05 上海昌投网络科技有限公司 Method for identifying insurance configuration intention

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7496500B2 (en) * 2004-03-01 2009-02-24 Microsoft Corporation Systems and methods that determine intent of data and respond to the data based on the intent
US9767201B2 (en) * 2011-12-06 2017-09-19 Microsoft Technology Licensing, Llc Modeling actions for entity-centric search
WO2013155619A1 (en) * 2012-04-20 2013-10-24 Sam Pasupalak Conversational agent
JP6310150B2 (en) * 2015-03-20 2018-04-11 株式会社東芝 Intent understanding device, method and program
US10372763B2 (en) * 2015-07-13 2019-08-06 International Business Machines Corporation Generating probabilistic annotations for entities and relations using reasoning and corpus-level evidence
US11093707B2 (en) * 2019-01-15 2021-08-17 International Business Machines Corporation Adversarial training data augmentation data for text classifiers

Also Published As

Publication number Publication date
CN113157893A (en) 2021-07-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant