CN110929001B

CN110929001B - Matching method and matching system in voice dialogue system

Info

Publication number: CN110929001B
Application number: CN201811019607.9A
Authority: CN
Inventors: 徐龙生; 马天泽; 葛斯函; 林锋
Original assignee: NIO Anhui Holding Co Ltd
Current assignee: NIO Holding Co Ltd
Priority date: 2018-09-03
Filing date: 2018-09-03
Publication date: 2023-09-01
Anticipated expiration: 2038-09-03
Also published as: CN110929001A

Abstract

The invention relates to a matching method in a voice dialogue system and a matching system thereof. The method comprises: a conversion mapping step of judging whether the NLU output result of the voice dialogue can be conversion-mapped and, if so, performing conversion mapping on the NLU output result; a behavior execution condition judgment step of judging whether the current NLU result meets the behavior execution condition and, if so, continuing to the execution step; and an execution step of executing the corresponding behavior and ending the flow. The invention can improve the intelligence of human-machine interaction in the voice dialogue system and ensure a good user experience.

Description

Matching method and matching system in voice dialogue system

Technical Field

The invention relates to man-machine interaction technology, in particular to a matching method and a matching system in a voice dialogue system.

Background

The core of a voice dialogue system is to understand the user's intent, perform the relevant operations, and generate appropriate voice feedback. A dialogue system with strong understanding and processing capability cannot be built overnight; it requires long-term iteration in many areas, including natural language processing, model algorithms, and system implementation. To keep interaction with the dialogue system user-friendly, a complete set of diversified fallback strategies therefore needs to be designed. When the existing system capability cannot fully understand or execute the user's intent, the fallback strategies handle the request, so that the interaction experience remains friendly.

Disclosure of Invention

In view of the above, the present invention aims to provide a matching method in a voice dialogue system and a matching system in a voice dialogue system that offer diversified fallback strategies.

The matching method in the voice dialogue system is characterized by comprising the following steps:

a conversion mapping step, namely judging whether the NLU output result of the voice conversation can be subjected to conversion mapping, and if so, carrying out conversion mapping on the NLU output result;

a behavior execution condition judgment step of judging whether the current NLU result meets the behavior execution condition and, if so, continuing to the following execution step; and

an execution step of executing the corresponding behavior and ending the flow.

Optionally, the method further comprises the following steps:

a dynamic query step of judging, if the judgment in the behavior execution condition judgment step is negative, whether to perform a dynamic query and, if so, calculating the elements that the NLU output result lacks for executing the behavior, supplementing those elements by querying the user, and ending the flow.

Optionally, the method further comprises the following steps:

a function manual reply step of judging, if the judgment in the dynamic query step is negative, whether the user can be answered using a predefined function manual and, if so, replying to the user using the predefined function manual and ending the flow.

Optionally, the method further comprises the following steps:

a dedicated fallback script step of judging, if the judgment in the function manual reply step is negative, whether a dedicated fallback script model is configured and, if so, matching the NLU output result against the dedicated fallback script model and ending the flow.

Optionally, the method further comprises the following steps:

a general fallback script step of judging, if the judgment in the dedicated fallback script step is negative, whether a general fallback script model is configured and, if so, executing the general fallback script model on the NLU output result and ending the flow.

Optionally, the conversion mapping step includes:

a domain judging sub-step of judging the domain of the user's question in the NLU output result;

an intention judgment sub-step of performing intention judgment on the user's question within that domain;

and a conversion mapping sub-step of judging whether the user intention of the NLU output result can be converted into another preset intention.

Optionally, in the conversion mapping sub-step, the user intent is converted into another preset intent according to the domain, intent, slot, and slot attribute.

Optionally, in the function manual reply step, an independent model marks all intents in the user query that correspond to the predefined function manual, the query is searched within the range of the intents marked by the model, and whether a function point is hit is judged; if so, the user is answered using the predefined function manual and the flow ends.

Optionally, in the general fallback script step, the general fallback model is executed according to domain, intent, and slot.

Optionally, in the general fallback script step, the longest-match principle is used when matching, in the following order:

domain, intent, and slot;

domain and intent;

domain.

The matching system in the voice dialogue system of the invention is characterized by comprising:

the conversion mapping module is used for judging whether the NLU output result of the voice dialogue can be conversion-mapped, and for performing conversion mapping on the NLU output result when it can;

the behavior execution condition judging module is used for judging whether the current NLU result can meet the behavior execution condition; and

and the execution module is used for executing corresponding behaviors when the behavior execution condition judgment module judges that the behavior execution condition is met.

Optionally, the system further comprises:

a dynamic query module for judging whether to perform a dynamic query and, if so, calculating the elements that the NLU output result lacks for executing the behavior and supplementing those elements by querying the user.

Optionally, the system further comprises:

a function manual reply module for judging whether the user can be answered using a predefined function manual and, if so, replying to the user using the predefined function manual.

Optionally, the system further comprises:

a dedicated fallback module for judging whether a dedicated fallback model is configured and, if so, matching the NLU output result against the dedicated fallback model.

Optionally, the system further comprises:

a general fallback module for judging whether a general fallback model is configured and, if so, executing the general fallback model on the NLU output result.

Optionally, the conversion mapping module includes:

a domain judging sub-module for judging the domain of the user's question in the NLU output result;

an intention judgment sub-module for performing intention judgment on the user's question within that domain;

and the conversion mapping sub-module is used for judging whether the user intention of the NLU output result can be converted into another preset intention.

Optionally, in the conversion mapping sub-module, the user intention is converted into another preset intention according to the domain, intent, slot, and slot attribute.

Optionally, in the function manual reply module, an independent model marks all intents in the user query that correspond to the predefined function manual, the query is searched within the range of the intents marked by the model, and whether a function point is hit is judged; if so, the user is answered using the predefined function manual.

Optionally, in the general fallback module, the general fallback model is executed according to domain, intent, and slot.

Optionally, in the general fallback module, the longest-match principle is used when matching, in the following order:

domain, intent, and slot;

domain and intent;

domain.

The voice dialogue system of the invention executes the matching method in the voice dialogue system and/or comprises the matching definition system in the voice dialogue system.

The computer-readable storage medium of the present invention has a program stored thereon and is characterized in that the program, when executed by a processor, implements the matching method in the voice dialogue system described above.

The data processing device of the invention comprises a memory, a processor and a program stored on the memory and capable of running on the processor, and is characterized in that the processor executes the program to realize the matching method in the voice dialogue system.

According to the matching method and the matching system in the voice dialogue system of the present invention, an intention of the user can be converted into another preset intention, for example a similar one, so that the user's intention can be handled flexibly. Furthermore, the invention allows fallback strategies to be set: when the existing system capability cannot fully understand or execute the user's intention, the fallback strategies handle it, which ensures a good user experience and improves the intelligence of human-machine interaction. In addition, with multi-level fallback strategies, when the system cannot process the user's intention, a suitable fallback reply is generated from the information contained in the dialogue through several layers of matching strategies that go from specific to broad, so that the dialogue is completed and the user experience is further improved.

Drawings

Fig. 1 is a flow chart illustrating a matching method in an embodiment of a voice conversation system of the present invention.

Fig. 2 is a block diagram showing the configuration of a matching system in a voice conversation system according to an embodiment of the present invention.

Fig. 3 is a block diagram showing the construction of the conversion mapping module of the present invention.

Detailed Description

The following presents a simplified summary of the invention in order to provide a basic understanding of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention.

The definition method of the intention behavior relation in the voice dialogue system and the execution method of the intention conversion behavior in the voice dialogue system and the system for realizing the method can be suitable for various man-machine interaction voice systems. In the following, a vehicle-mounted voice system is taken as an example to specifically describe, but this is only one embodiment, and the present invention can also be used in, for example, a voice dialogue system in an intelligent home system, and the like.

First, an embodiment in which the method for defining the intended behavior relationship in the speech dialogue system of the present invention is applied to the in-vehicle speech system will be described.

Some terms appearing in the following description are explained here.

(1) Domain: the scene domain to which the user dialogue belongs. In-vehicle scenes mainly include navigation, telephone, media, vehicle control, chat, and the like.

(2) Intent, also called category: the specific intention of the user dialogue, that is, the specific scene category of the series of operations that the user wants to execute, such as searching for a POI in the navigation domain, searching for a contact in the telephone domain, or searching for a singer's songs in the media domain.

(3) Slot: the detailed information contained in the user dialogue; the smallest unit of information usable in natural language understanding.

(4) Behavior (Action): in the case of an in-vehicle voice system, the behavior of the vehicle, i.e. the response of the vehicle to the result of the user dialogue. In the present invention, behaviors are divided so that they are independent of one another and each represents one class of similar behaviors. For example, appOpen covers opening the vehicle's Bluetooth, dashboard camera, hotspot, WIFI, and so on, and AirCondition WindDirectionNode covers the air-conditioner airflow direction modes such as defrosting, front blowing, and foot blowing. The granularity of the division can be adjusted according to service requirements.

(5) Operation: in the case of an in-vehicle voice system, the specific operation generated by the in-vehicle head unit. Operations can be designed independently according to the functions supported by different vehicle models, but the defined standard should contain a type and parameters (params), where the type indicates the kind of operation and the params carry the parameters of the specific operation.

(6) Element: a data structure that the dialogue system uses to process slots, defined to represent slots in different states.

(7) Path: the minimum set of elements that satisfies the execution condition of an action. An action may include multiple paths, and the action can be executed if any one of its defined paths is satisfied.

Next, the actions, paths, and relationships between the elements will be described in detail.

Both Element and Path are metadata for an Action; they define the requirements for completing the action.

The elements (Element) take the following forms:

Form 1: only the slot type is considered, and the condition is satisfied as long as a slot of this type occurs. For example, when adjusting the temperature, it is only necessary to know that a slot of the temperature type is present.

Form 2: in addition to the type meeting the specification, the property must fall within a certain range. For example, when the intent is to open an application (app_open), we know that the user wants to open an application, but to perform the action we also need to know which application. An Element of form 2 (type == 2) is therefore needed, whose slot type is control_target_app and whose slot property falls within a given range. For example, in "open WeChat", the Element described above has the slot property Wechat.

A Path consists of N elements, and the value of N has the following meanings:

N = 0: the current path can be executed without any additional information; such an intent usually has a definite target, a single task to execute, and fine granularity.

N = 1: the current intent needs one specific Element to execute; such an intent typically covers several distinct targets.

N > 1: the current intent needs more than one Element to execute; such an intent generally has broad coverage, supports many phrasings, and can generate many operations, so more precise information is needed to execute it correctly.
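As a rough illustration of how elements, paths, and actions relate, the following Python sketch models the two element forms and the "any path satisfied" rule. The class names, the dictionary representation of slots (slot type mapped to slot property), and the example data are assumptions made for illustration and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Element:
    """A slot requirement. Form 1 checks only the slot type; form 2 also
    restricts the slot property to an allowed set of values."""
    slot_type: str
    allowed_properties: Optional[FrozenSet[str]] = None  # None means form 1

    def matches(self, slots: Dict[str, str]) -> bool:
        if self.slot_type not in slots:
            return False
        if self.allowed_properties is None:
            return True
        return slots[self.slot_type] in self.allowed_properties


@dataclass
class Action:
    """A behavior with one or more paths; each path is a list of elements."""
    name: str
    paths: List[List[Element]] = field(default_factory=list)

    def executable(self, slots: Dict[str, str]) -> bool:
        # The action can run if every element of at least one path matches.
        # An empty path (N = 0) is satisfied trivially.
        return any(all(e.matches(slots) for e in path) for path in self.paths)


# Hypothetical example data, loosely following the app_open example above.
app_open = Action(
    name="appOpen",
    paths=[[Element("control_target_app", frozenset({"Wechat", "Bluetooth"}))]],
)
set_temperature = Action(name="setTemperature", paths=[[Element("temperature")]])
```

Here `app_open.executable({"control_target_app": "Wechat"})` is true because the single form-2 element matches, while an empty slot dictionary fails the check.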

Next, a matching method in a voice conversation system according to an embodiment of the present invention will be described.

As shown in fig. 1, the matching method in the voice dialogue system according to an embodiment of the invention is characterized by comprising the following steps:

step S10: judging whether the NLU (Natural Language Understanding) output result of the voice dialogue can be conversion-mapped, and if so, performing conversion mapping on the NLU output result;

step S11: judging whether the current NLU result can correctly execute the behavior, that is, whether the current NLU result satisfies the condition for executing the behavior (action); if yes, continuing to step S12, and if no, continuing to step S13, wherein satisfying the behavior execution condition means that any path defined in the behavior is satisfied, the path being composed of elements, and whether the dialogue contains the necessary elements can be judged directly from the NLU result;

step S12: executing corresponding behaviors and ending the flow;

step S13: judging whether a dynamic inquiry scene is configured, if yes, continuing to step S14, and if no, continuing to step S15;

step S14: performing dynamic inquiry, namely calculating elements lacking in execution behaviors for the NLU output result, supplementing the elements through inquiry to a user, and ending the flow;

step S15: judging whether the user can be replied by utilizing a predefined function manual, if so, proceeding to step S16, otherwise proceeding to step S17;

step S16: replying to the user using the predefined function manual and ending the flow; specifically, an independent model marks all intents in the user query that correspond to the function manual, the query is searched within the range of the intents marked by the model, and whether a function point is hit is judged; if so, the user is answered using the predefined function manual and the flow ends;

step S17: judging whether a dedicated fallback model is configured; if yes, executing step S18, and if no, executing step S19;

step S18: matching the NLU output result against the dedicated fallback model and ending the flow;

step S19: judging whether a general fallback model is configured (the general fallback model is executed according to domain, intent, and slot); if yes, executing step S20, and if no, executing step S21;

step S20: executing the general fallback model on the NLU output result and ending the flow;

step S21: entering the chat mode; the chat mode is only an example, and another preset mode may be entered instead.
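The cascade of steps S10 to S21 can be pictured as an ordered list of check-and-handle stages with a default at the end. The Python sketch below is a simplified illustration under stated assumptions: the `Stage` type, the `run_flow` function, the dictionary shape of the NLU result, and the example wiring are all hypothetical, and the direct jumps from the conversion mapping sub-steps to step S15 are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Nlu = dict  # hypothetical shape: {"domain": str, "intent": str, "slots": dict}


@dataclass
class Stage:
    """One branch of the flow: a yes/no check and the step run on 'yes'."""
    check: Callable[[Nlu], bool]
    handler: Callable[[Nlu], str]


def run_flow(nlu: Nlu,
             remap: Callable[[Nlu], Optional[Nlu]],
             stages: List[Stage],
             default: Callable[[Nlu], str]) -> str:
    # S10: apply the conversion mapping when one exists.
    mapped = remap(nlu)
    if mapped is not None:
        nlu = mapped
    # S11 to S20: the first stage whose check passes handles the turn.
    for stage in stages:
        if stage.check(nlu):
            return stage.handler(nlu)
    # S21: nothing matched, so fall through to the default (chat) mode.
    return default(nlu)


# Hypothetical wiring for a temperature request.
example_stages = [
    # S11 -> S12: execute when the required slot is present.
    Stage(lambda n: "temperature" in n["slots"], lambda n: "S12: execute"),
    # S13 -> S14: otherwise ask, if a dynamic query scene is configured.
    Stage(lambda n: n["intent"] == "set_temperature", lambda n: "S14: ask user"),
    # S15 -> S16, S17 -> S18 and S19 -> S20 would follow in the same way.
]
```

Keeping the stages in a list makes the order explicit and lets a deployment omit any stage that is not configured, which matches the "judge whether configured, else continue" pattern of the steps.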

The purpose of the conversion mapping performed in step S10 is as follows: when the system can understand the user dialogue but cannot execute the user's intention for some reason (for example, the function is not supported by the vehicle, the environment does not allow it, or the request is unreasonable), the intention is converted into another, similar intention, so as to partially satisfy the user's need or to provide the user with some information.

In the conversion mapping, a specific NLU result, i.e. "domain + intent + slot + slot property", is mapped to another preset NLU result according to rules defined in a preset configuration, and the subsequent flow proceeds according to the latter. For example, "open cooling mode" is mapped to "air conditioner maximum cooling", and "open face defrost mode" is mapped to "open defrost face foot defrost mode".
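A configured mapping of this kind can be illustrated as a lookup table keyed by the four-part NLU result. In the following Python sketch the `NluResult` type, the rule table, and all field values are hypothetical placeholders; the patent does not specify how the rules are stored.

```python
from typing import Dict, NamedTuple, Optional


class NluResult(NamedTuple):
    domain: str
    intent: str
    slot: str
    slot_property: str


# Hypothetical rule table; real rules would come from configuration.
CONVERSION_RULES: Dict[NluResult, NluResult] = {
    NluResult("car_control", "open_mode", "ac_mode", "cooling"):
        NluResult("car_control", "set_ac", "ac_level", "max_cooling"),
}


def convert(result: NluResult) -> Optional[NluResult]:
    """Return the preset replacement result, or None if no rule applies."""
    return CONVERSION_RULES.get(result)
```

A result with no matching rule returns `None`, in which case the flow would continue with the original NLU result.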

Next, a specific procedure of the conversion map determination performed in step S10 will be described. Here, the specific procedure of the conversion mapping includes the following sub-steps:

a domain judging sub-step: judging the domain of the user's question in the NLU output result, that is, judging whether the user's question belongs to a certain domain; if so, continuing to the following sub-step, and if not, jumping to step S15;

an intention judging sub-step: performing intention judgment on the user's question within the domain, that is, judging whether the user's question has a clear intent in that domain; if so, continuing to the following sub-step, and if not, jumping to step S15; and

a conversion mapping sub-step: judging whether the user intention of the NLU output result can be converted into another preset intention, that is, judging whether the "domain + intent + slot attribute" in the current NLU result can be conversion-mapped; if so, performing conversion mapping on the NLU result, and if not, jumping to step S11.
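The three sub-steps and their jump targets can be sketched as a function that returns the (possibly remapped) NLU result together with the label of the next step. This is a minimal sketch: the function name, the dictionary shape of the NLU result, and the `lookup` callback are hypothetical, and the sketch assumes that the flow also continues at step S11 after a mapping has been applied.

```python
from typing import Callable, Optional, Tuple


def conversion_mapping_step(
    nlu: dict,
    lookup: Callable[[dict], Optional[dict]],
) -> Tuple[dict, str]:
    """Return the (possibly remapped) NLU result and the next step label."""
    # Domain judging sub-step: no recognized domain, jump to S15.
    if not nlu.get("domain"):
        return nlu, "S15"
    # Intention judging sub-step: no clear intent, jump to S15.
    if not nlu.get("intent"):
        return nlu, "S15"
    # Conversion mapping sub-step: remap when a rule exists.
    mapped = lookup(nlu)
    # Assumption: with or without a mapping, the flow continues at S11.
    return (mapped if mapped is not None else nlu), "S11"
```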

In step S14, a dynamic query is performed. Specifically, when the intention of the user's query can be understood but some information necessary for the subsequent action is missing from the query, the user is asked for the missing information: the elements that must be supplemented to complete the action of the scene are calculated from a predefined scene, and the user is queried accordingly. For example, when the user says "adjust the air conditioner temperature", the system asks "To what temperature?", thereby obtaining the elements needed to complete the behavior of the scene.
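One way to picture the "calculate the missing elements" part is to compare the slot types already present with the paths of the target action. The Python sketch below is illustrative only: the function names, the prompt table, and in particular the choice of the path with the fewest missing elements are assumptions, since the patent does not state how a path is selected.

```python
from typing import Dict, List, Optional, Set

# Hypothetical prompts, keyed by the slot type that is still missing.
PROMPTS: Dict[str, str] = {"temperature": "To what temperature?"}


def missing_elements(paths: List[List[str]], present: Set[str]) -> List[str]:
    """Return the slot types still needed for the path closest to completion."""
    if not paths:
        return []
    # Assumption: prefer the path with the fewest missing elements.
    best = min(paths, key=lambda p: sum(1 for e in p if e not in present))
    return [e for e in best if e not in present]


def next_question(paths: List[List[str]], present: Set[str]) -> Optional[str]:
    """Return the prompt for the first missing element, if any."""
    missing = missing_elements(paths, present)
    return PROMPTS.get(missing[0]) if missing else None
```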

In step S16, the user is answered using the predefined function manual. The function manual is a collection of all the function points of the vehicle, and it serves as a fallback for the user when the requested function cannot be handled by the voice function but is actually within the functional range of the vehicle. Specifically, the function manual data is put into ES to build an index. The index needs to reflect the hierarchy of the function manual, namely function classification → function point → detailed function. An independent model marks all intents in a user query that correspond to manual functions; the query is then searched in ES within the range of the intents marked by the model, and if a function point is hit, the front-end function manual is called to display the corresponding manual content to the user. For example, for the query "How does keyless unlocking work?", the content of the "keyless unlocking" section of the function manual is shown.
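The restricted lookup can be illustrated with a small in-memory stand-in. Note that this sketch deliberately replaces the ES index with a nested dictionary and plain substring matching, so it only shows the hierarchy and the "search within the marked range" idea; the manual data, the function name, and the `marked` parameter (standing in for the intents marked by the independent model) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical manual data: function classification -> function point -> detail.
MANUAL: Dict[str, Dict[str, str]] = {
    "doors": {"keyless unlocking": "<manual section on keyless unlocking>"},
    "climate": {"seat heating": "<manual section on seat heating>"},
}


def find_function_point(query: str,
                        marked: List[str]) -> Optional[Tuple[str, str, str]]:
    """Search only the marked classifications for a function point in the query."""
    text = query.lower()
    for classification in marked:
        for point, detail in MANUAL.get(classification, {}).items():
            if point in text:
                return classification, point, detail
    return None
```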

The matching method in the voice conversation system of the present invention is explained above. Next, a matching system in the voice conversation system of the present invention will be described.

As shown in fig. 2, a matching system in a voice dialogue system according to an embodiment of the present invention is characterized by comprising:

a conversion mapping module 100, configured to determine whether the NLU output result of the voice dialog can be converted and mapped, and to convert and map the NLU output result if it is determined that the conversion mapping is possible;

the behavior execution condition judging module 200 judges whether the current NLU result can meet the behavior execution condition;

the execution module 300 executes the corresponding behavior when the behavior execution condition judgment module judges that the behavior execution condition is satisfied;

a dynamic query module 400, configured to determine whether to perform dynamic query, and if the determination result is yes, calculate elements missing in the execution behavior for the NLU output result and supplement the elements by querying the user;

a function manual reply module 500 for judging whether the user can be replied with a predefined function manual, and if so, replying with a predefined function manual;

a dedicated fallback module 600, configured to determine whether a dedicated fallback model is configured, and if so, match the NLU output result against the dedicated fallback model; and

a general fallback module 700, configured to determine whether a general fallback model is configured, and if so, execute the general fallback model on the NLU output result.

In the function manual reply module 500, an independent model marks all intents in the user query that correspond to the predefined function manual, and the query is searched within the range of the intents marked by the model; if a function point is hit, the user is answered using the predefined function manual.

In the general fallback module 700, the general fallback model is executed according to domain, intent, and slot. The longest-match principle is used when matching, in the following order:

domain, intent, and slot;

domain and intent;

domain.
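The longest-match order above (domain, intent, and slot; then domain and intent; then domain alone) can be sketched as a lookup that drops trailing key parts until a configured reply is found. The reply table and function name in this Python sketch are hypothetical examples, not content from the patent.

```python
from typing import Dict, Optional, Tuple

# Hypothetical replies keyed by (domain, intent, slot) prefixes.
FALLBACKS: Dict[Tuple[str, ...], str] = {
    ("media", "play_song", "singer"): "I could not find songs by that singer.",
    ("media", "play_song"): "I cannot play that song yet.",
    ("media",): "That media function is not supported yet.",
}


def general_fallback(domain: str,
                     intent: Optional[str] = None,
                     slot: Optional[str] = None) -> Optional[str]:
    """Return the reply for the longest matching key prefix, or None."""
    parts = []
    for part in (domain, intent, slot):
        if part is None:
            break
        parts.append(part)
    # Try the most specific key first, then drop trailing parts.
    for length in range(len(parts), 0, -1):
        reply = FALLBACKS.get(tuple(parts[:length]))
        if reply is not None:
            return reply
    return None
```

With this table, a request in the media domain with an unlisted slot falls back to the domain-and-intent reply, and an unlisted intent falls back to the domain-level reply.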

Next, the conversion mapping module 100 will be specifically described.

Fig. 3 is a block diagram showing the configuration of the conversion mapping module 100 of the present invention.

As shown in fig. 3, the conversion mapping module 100 includes:

a domain judging sub-module 110 for judging the domain of the user's question in the NLU output result;

an intention judgment sub-module 120 for performing intention judgment on the user's question within that domain; and

the conversion mapping sub-module 130 determines whether the user intention of the NLU output result can be converted into another preset intention, wherein in the conversion mapping sub-module 130, the user intention is converted into another preset intention according to the domain, the intention, the slot, and the slot attribute.

The invention also provides a voice dialogue system, which executes the matching method in the voice dialogue system and/or comprises the matching definition system in the voice dialogue system.

The present invention also provides a computer readable storage medium having a program stored thereon, wherein the program when executed by a processor implements the matching method in a speech dialogue system as described above.

The invention also provides a data processing device comprising a memory, a processor and a program stored on the memory and capable of running on the processor, characterized in that the processor implements the matching method in the voice dialogue system when executing the program.

The above examples mainly illustrate the matching method in the voice conversation system and the matching system in the voice conversation system of the present invention. Although only a few specific embodiments of the present invention have been described, those skilled in the art will appreciate that the present invention may be embodied in many other forms without departing from the spirit or scope thereof. Accordingly, the present examples and embodiments are to be considered as illustrative and not restrictive, and the invention is intended to cover various modifications and substitutions without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method of matching in a voice dialog system, comprising the steps of:

step S10: a conversion mapping step, namely judging whether the NLU output result of the voice conversation can be subjected to conversion mapping, and if so, carrying out conversion mapping on the NLU output result;

the conversion mapping step includes:

a conversion mapping sub-step of judging whether the user intention of the NLU output result can be converted into another preset intention; in the conversion mapping sub-step, converting the user intention into another preset intention according to the field, intention, slot and slot attribute;

step S11: a behavior execution condition judging step of judging whether the current NLU result can meet the behavior execution condition, if yes, executing step S12, and if no, executing step S13, wherein meeting the behavior execution condition means that: any path defined in the behavior is satisfied, wherein the path is composed of elements, and whether the dialogue contains necessary elements can be directly judged by NLU results;

step S12: executing corresponding behaviors and ending the flow;

step S13: judging whether a dynamic inquiry scene is configured, if yes, executing a step S14, and if no, executing a step S15;

step S14: a dynamic query step, namely performing dynamic query, calculating elements lacking in execution behaviors for the NLU output result, supplementing the elements through query to a user, and ending the flow;

step S16: a function manual reply step of replying to a user by using a predefined function manual and ending the flow;

step S17: judging whether a dedicated fallback model is configured, if yes, executing step S18, and if no, executing step S19;

step S18: a dedicated fallback step, namely matching the NLU output result against the dedicated fallback model and ending the flow;

step S19: judging whether a general fallback model is configured, if yes, executing step S20, and if no, executing step S21;

step S20: a general fallback step of executing the general fallback model on the NLU output result and ending the flow, wherein in the general fallback step, the general fallback model is executed according to the domain, the intent and the slot;

step S21: and entering a chat mode.

2. The method for matching in a voice dialog system of claim 1,

in the function manual reply step, as a predefined function manual, all intentions corresponding to the function manual in the user query are marked by an independent model, searching is carried out by the query within the range of the intentions marked by the model, whether a certain function point is hit is judged, and if the function point is hit, the user is replied by the predefined function manual and the flow is ended.

3. The method for matching in a voice dialog system of claim 1,

in the general fallback step, the longest-match principle is used when matching, in the following order:

domain, intent, and slot;

domain and intent;

domain.

4. A matching system in a voice dialog system, comprising: a conversion mapping module, a behavior execution condition judging module, a dynamic query module, a function manual reply module, a dedicated fallback module and a general fallback module, wherein:

step S10: the conversion mapping module judges whether the NLU output result of the voice dialogue can be conversion-mapped, and if so, performs conversion mapping on the NLU output result;

wherein the conversion mapping module comprises:

the conversion mapping sub-module is used for judging whether the user intention of the NLU output result can be converted into another preset intention or not; in the conversion mapping sub-module, converting the user intention into another preset intention according to the field, the intention, the slot and the slot attribute;

step S11: the behavior execution condition judging module judges whether the current NLU result can meet the behavior execution condition, if yes, the step S12 is executed, and if no, the step S13 is executed, wherein, the step of meeting the behavior execution condition is that: any path defined in the behavior is satisfied, wherein the path is composed of elements, and whether the dialogue contains necessary elements can be directly judged by NLU results;

step S12: executing corresponding behaviors and ending the flow;

step S14: the dynamic inquiry module performs dynamic inquiry, calculates elements lacking in execution behaviors for the NLU output result, supplements the elements through inquiry to a user, and ends the flow;

step S16: the function manual reply module replies to the user by utilizing a predefined function manual and ends the flow;

step S18: the dedicated fallback module matches the NLU output result against the dedicated fallback model and ends the flow;

step S20: the general fallback module executes the general fallback model on the NLU output result and ends the flow, wherein in the general fallback module, the general fallback model is executed according to the domain, the intent and the slot;

step S21: and entering a chat mode.

5. The matching system in a voice dialog system of claim 4,

in the function manual reply module, as a predefined function manual, all intentions corresponding to the function manual in the user query are marked by an independent model, searching is carried out by the query within the range of the intentions marked by the model, whether a certain function point is hit is judged, and if the function point is hit, the user is replied by the predefined function manual and the flow is ended.

6. The matching system in a voice dialog system of claim 4,

in the general fallback module, the longest-match principle is used when matching, in the following order:

domain, intent, and slot;

domain and intent;

domain.

7. A speech dialog system performing the matching method in a speech dialog system as claimed in any of claims 1 to 3 and/or comprising the matching definition system in a speech dialog system as claimed in any of claims 4 to 6.

8. A computer-readable storage medium, on which a program is stored, which program, when being executed by a processor, implements a matching method in a speech dialog system as claimed in any of claims 1 to 3.

9. A data processing apparatus comprising a memory, a processor and a program stored on the memory and executable on the processor, characterized in that the processor implements a matching method in a speech dialog system as claimed in any of claims 1-3 when executing the program.